Named after the mountain in the Bernese Alps, Eiger is the production partition on Alps, the HPE Cray EX Supercomputer. It is accessible via SSH as eiger.cscs.ch from the frontend ela.cscs.ch.

Known issues

You might occasionally get a "Bus error" in the logs of your simulations: unfortunately this is a bug affecting Eiger that has been reported to HPE/Cray support and will be fixed in the next update of Eiger, starting March 18th.

Latest news

You can read the Latest news on Eiger, which includes announcements about planned interventions on the system. The list of past and ongoing interventions is available on the Past interventions page.

Architecture

Compute Nodes (CN) feature two sockets with one AMD EPYC™ 7742 64-core processor per socket, interfacing with the high-speed HPE Slingshot interconnect: please check the system specifics of the HPE Cray EX Supercomputer. More detailed information on the supercomputer cabinets and the HPE Slingshot interconnect is available in the HPE Cray EX Liquid-Cooled Cabinet for Large-Scale Systems brochure.

Optimal CPU affinity settings require an understanding of the architectural features of the AMD EPYC™ 7742 processor: please refer to the technical description of the processor for further details.

Additional details are available on the dedicated Compute node configuration and CPU configuration pages.

File systems

The user space on the scratch file system can be reached using the environment variable $SCRATCH, which points to the user's personal folder /capstor/scratch/cscs/$USER.
The scratch file system is connected to the compute nodes of the system. File system access (read, write, none) from compute nodes (CN) and user access nodes (UAN) is summarized in the table below:


        scratch   users   project   store
CN      r+w       r+w     r         r
UAN     r+w       r+w     r+w       r+w

Please read carefully the general information on file systems at CSCS, especially with respect to the soft quota and the cleaning policy enforced on scratch.
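As a minimal sketch, preparing a working directory on scratch before submitting jobs could look like the commands below (the folder name myrun is a hypothetical example):

Change to the personal scratch folder
$ cd $SCRATCH                  # personal folder /capstor/scratch/cscs/$USER
$ mkdir -p myrun && cd myrun   # hypothetical run directory for job submissions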

Cray Programming Environment (CPE)

User Access Nodes (UAN) are the login nodes of the system and feature the Cray Programming Environment (CPE). The CPE provides tools for application development and code performance analysis, including compilers, analyzers, optimized libraries, and debuggers. The CPE implementation on the system is controlled using the environment modules system Lmod, written in the Lua programming language, which provides a hierarchical mechanism to access compilers, tools and applications. Please find out more on the page Lmod with the CPE.

No modules are loaded by default at login: users need to load the module cray first; they will then be able to load the modules available in the default Cray programming environment. Therefore users are invited to add the command module load cray to their scripts and workflows.

Cray meta-modules

CPE modulefiles are organized into meta-modules, each one supporting a different compiler suite, that can be loaded as usual with the command module load. The CPE on the system comes with the definition of the meta-modules PrgEnv-aocc, PrgEnv-cray, PrgEnv-gnu and PrgEnv-intel.
Depending on the chosen CPE meta-module, the environment variable PE_ENV will be defined (with value AOCC, CRAY, GNU or INTEL respectively), loading the module craype with the compiler wrappers (cc for C code, CC for C++ code and ftn for Fortran code), the compiler (aocc, cce, gcc or intel according to the selected environment), the MPI library (cray-mpich) and the scientific library (cray-libsci).
The wrappers call the correct compiler with appropriate options to build and link applications with the relevant libraries, as required by the loaded modules (only dynamic linking is supported), and therefore should replace direct calls to compiler drivers in Makefiles and build scripts.
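As a minimal sketch, building a hypothetical source file hello.c with the GNU environment through the wrappers could look like this:

Build with the compiler wrappers (GNU environment)
$ module load cray             # make the CPE modules available
$ module load PrgEnv-gnu       # select the GNU compiler suite
$ cc -fopenmp -o hello hello.c # the cc wrapper invokes gcc and links the loaded libraries (e.g. cray-mpich, cray-libsci)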

Supported applications and libraries

Supported applications and libraries are built using the EasyBuild toolchains cpeAMD, cpeCray, cpeGNU and cpeIntel, which load compilers and libraries of the modules PrgEnv-aocc, PrgEnv-cray, PrgEnv-gnu and PrgEnv-intel respectively: please have a look at Building software with EasyBuild for more details. After loading a toolchain module, you will be able to list (with module avail) and load (with module load) additional applications and libraries built with the currently loaded toolchain. The available CPE toolchains can be listed using the following command:

CPE toolchains
module avail cpe

You can list all the modules on the system by typing the command module spider. Add an argument to the module spider command to get the instructions to load a specific module. When multiple versions are available, you have to provide the module's full name as argument to the module spider command in order to see how to load it, as in the example below:

Search for a module
$ module spider gromacs

--------------------------------------------------------------------------------------------------------------------------------
  GROMACS: GROMACS/2020.5
--------------------------------------------------------------------------------------------------------------------------------
    Description:
      GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems
      with hundreds to millions of particles.       

    Versions:
        GROMACS/2020.5
        GROMACS/2021.3
        GROMACS/2021.5

-----------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "GROMACS" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider GROMACS/2021.5

As explained in the section Module hierarchy, the search is case insensitive; however, you need to type the correct case when loading the module.

Please note that the toolchains follow the naming convention of the Cray Programming Environment, which is released on a monthly basis: as a consequence, the version of the toolchain modules has the format YY.MM (two digits for the year, two digits for the month). If you don't specify the version in your module command, the default will be selected by Lmod; when no default is set, Lmod will load the latest one:

Load a module with its toolchain
$ ml cpeGNU GROMACS

Due to MODULEPATH changes, the following have been reloaded:
  1) cray-mpich/8.1.12

$ ml

Currently Loaded Modules:
  1) craype-x86-rome                             6) cray-dsmml/0.2.2       11) perftools-base/21.12.0
  2) libfabric/1.11.0.4.79                       7) cray-libsci/21.08.1.2  12) cpe/21.12
  3) craype-network-ofi                          8) cray-mpich/8.1.12      13) cpeGNU/21.12
  4) xpmem/2.2.40-7.0.1.0_2.4__g1d7a24d.shasta   9) craype/2.7.13          14) craype-hugepages8M
  5) PrgEnv-gnu/8.3.0                           10) gcc/11.2.0             15) GROMACS/2021.5

The compilers available on the system support different implementations of the OpenMP API. Please check the _OPENMP macro for each compiler version, which helps map it to the supported OpenMP API. For instance, the clang-cpp pre-processor of the aocc and cce compilers available with the default modules will return 201811 (November 2018) using the command below:

$ echo | clang-cpp -fopenmp -dM | grep _OPENMP

#define _OPENMP 201811
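A similar check can be run for the other compilers; a minimal sketch for the GNU compiler (assuming gcc from PrgEnv-gnu is loaded) is shown below:

Check the OpenMP API supported by gcc
$ echo | gcc -fopenmp -dM -E - | grep _OPENMP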

You can retrieve the complete list of modules available on the system with the command module spider, as described above. Software commonly used on the system is listed below:

Available Applications on Eiger

Toolchain: Application list

cpeAMD: GSL
cpeCray: GSL, ParaView-OSMesa-python3
cpeIntel: Amber, GSL, QuantumESPRESSO, VASP
cpeGNU: Boost, Boost-python3, CDO, CP2K, GROMACS, GSL, Julia, JuliaExtensions, jupyterlab, LAMMPS, matplotlib, NAMD, NCO

Running Jobs

Parallel programs compiled with the Cray MPI library cray-mpich must be run on the compute nodes using the Slurm srun command: running applications on the login nodes is not allowed, as they are a shared resource.
Slurm batch scripts should be submitted with the Slurm sbatch command from the user's $SCRATCH folder: for performance reasons, users should NOT run jobs from other file systems.

A template Slurm job submission script to run a simulation with 16 MPI tasks per node and 8 OpenMP threads per task on 6 nodes is provided below: please replace <executable> with the name of the real executable file.

Template Slurm batch script
#!/bin/bash -l

#SBATCH --job-name=template
#SBATCH --time=01:00:00
#SBATCH --nodes=6
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=8
#SBATCH --account=<project>
#SBATCH --constraint=mc

export FI_CXI_RX_MATCH_MODE=hybrid             # libfabric message matching mode (see the notes below)
export MPICH_OFI_STARTUP_CONNECT=1             # establish MPI connections at startup rather than on demand
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}  # one OpenMP thread per allocated CPU of the task
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export OMP_STACKSIZE=8M                        # larger per-thread stack to avoid segmentation faults (see the notes below)
srun --cpu-bind=verbose,cores <executable>     # bind tasks to cores and report the binding

The srun option --cpu-bind binds MPI tasks to CPUs: in the template above, the keyword cores binds tasks to cores (note that if the number of tasks differs from the number of allocated cores, this can result in sub-optimal binding). We have also enabled verbose mode with the keyword verbose, which reports in the Slurm output the CPU binding selected for all commands executed in the script; alternatively, you can set the environment variable SLURM_CPU_BIND to verbose. As an alternative, the srun option --hint=nomultithread avoids extra threads with in-core multi-threading, a configuration that can benefit communication-intensive applications: in this case, please remove --cpu-bind (see man srun for details; a minimal sketch follows the list below). Please note as well:

  • the default OMP_STACKSIZE is small for the GNU compiler, therefore you may get a segmentation fault with multithreaded simulations: in this case, try to increase it as in the template above. The actual value of OMP_STACKSIZE at runtime is limited by the free memory on the node, therefore you might get an error like libgomp: Thread creation failed: Resource temporarily unavailable if you request more memory than is currently available
  • some applications might fail at runtime reporting an error related to FI_CXI_RX_MATCH_MODE. In this case, please add export FI_CXI_RX_MATCH_MODE=hybrid as in the template above or export FI_CXI_RX_MATCH_MODE=software to your Slurm batch script. Other environment variables might be fine-tuned as well, for instance FI_CXI_RDZV_THRESHOLD, FI_CXI_REQ_BUF_SIZE, FI_CXI_REQ_BUF_MIN_POSTED and FI_CXI_REQ_BUF_MAX_CACHED
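A minimal sketch of the alternative settings mentioned above is shown below; whether they help depends on your application, so please treat them as a starting point rather than a recommended configuration:

Alternative binding and libfabric settings
export FI_CXI_RX_MATCH_MODE=software    # software matching, if the hybrid mode still fails
srun --hint=nomultithread <executable>  # avoid extra in-core threads; --cpu-bind removed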

Please remember to include the active project that you would like this job to be charged to. This can be done with the Slurm option #SBATCH --account=<project> in the submission script or as a flag with the srun command, i.e. --account=<project> or -A <project>, where the string <project> is the ID of the active project. You also need to specify the Slurm constraint #SBATCH --constraint=mc in the batch script or as an srun option (--constraint=mc or -C mc).
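For example, a minimal sketch of an interactive allocation on the multicore nodes, charged to a project (keeping the <project> placeholder from above), could be:

Interactive allocation on the multicore nodes
$ salloc -A <project> -C mc --nodes=1 --time=00:30:00
$ srun --cpu-bind=cores <executable>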

The list of queues and partitions is available by typing sinfo or scontrol show partition. Note that not all projects are enabled on every partition: please check the AllowGroups entry of the command scontrol show partition.

The command sinfo -l provides a summary of the Slurm batch queues that is easy to visualize. Please check the other options of the command with sinfo --help.

You can choose the queue where to run your job by adding the Slurm directive --partition to your batch script as follows: #SBATCH --partition=<partition_name>. The list of Slurm queues available on the system is presented in the table below:

Name      Wall time    Max nodes per job   Max jobs per user   Brief description
debug     30 minutes   10                  1                   Quick turnaround for tests
normal    24 hours                                             Standard production jobs
prepost   30 minutes   1                                       High priority pre- and post-processing
low       24 hours                                             Low priority queue
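As an illustration, a minimal sketch of the header lines needed to request the debug queue (adapting the template script shown earlier) could be:

Request the debug queue
#SBATCH --partition=debug
#SBATCH --time=00:30:00
#SBATCH --constraint=mc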

Allocating large memory nodes

Slurm supports the --mem option to specify the real memory required per node. For applications requiring more than 256 GB of memory per node, users should add the Slurm directive #SBATCH --mem=497G in their job script. See the sbatch man page for more details.
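For instance, a minimal sketch of the directives requesting a large memory node (adapting the template above) could be:

Request a large memory node
#SBATCH --nodes=1
#SBATCH --constraint=mc
#SBATCH --mem=497G    # large memory node, as described above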

Container solutions

Available Tools   Functionalities (highlights)
Buildah           • Build OCI compliant (Docker) container images directly on Eiger
                  • Execute commands inside a running container
                  • Additional documentation
Sarus
Singularity

Please click on the name of the tool to access the dedicated pages available in the CSCS Knowledge Base.
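As an illustration only, a minimal sketch of pulling and running a public image with Sarus might look like the commands below; the image name is a placeholder and the exact workflow on the system may differ, so please refer to the dedicated Sarus page:

Pull and run a container image with Sarus
$ sarus pull ubuntu:22.04                                              # download a public image from Docker Hub
$ srun -A <project> -C mc sarus run ubuntu:22.04 cat /etc/os-release   # run a command inside the container on a compute node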

Debugging and Performance analysis tools 

The HPE/Cray EX Programming Environment provides a set of debugging and performance analysis tools to analyse the behaviour and performance of programs running on the system:

Tool           Version
cpe            22.05
atp            3.14.11
cray-ccdb      4.12.11
cray-stat      4.11.10
gdb4hpc        4.14.0
valgrind4hpc   2.12.8
papi           6.0.0.14
perftools      22.05

In addition to the tools provided by HPE/Cray, we also support other debuggers and performance analysis tools:

Tool              Version
arm-forge (ddt)   21.1.1-linux-x86_64
Scalasca          2.6
Score-P           7.0

As explained above in the section Module hierarchy, there are two ways to find the list of installed tools:

  1. Use the command module spider as shown in the examples below

    Installed tools can be listed by using the module spider command with the module name:

    module spider Score-P
    $ module spider Score-P
    
    ----------------------------------------------------------------------------------------------
      Score-P:
    ----------------------------------------------------------------------------------------------
        Description:
          The Score-P measurement infrastructure is a highly scalable and easy-to-use tool suite
          for profiling, event tracing, and online analysis of HPC applications.
         Versions:
            Score-P/7.0
    ----------------------------------------------------------------------------------------------
      For detailed information about a specific "Score-P" package (including how to load the modules) use the module's full name.
      Note that names that have a trailing (E) are extensions provided by other modules.
      For example:
         $ module spider Score-P/7.0
    ----------------------------------------------------------------------------------------------

    A more detailed description of a module can be printed by the command module spider using as argument the full module name with a version number:

    module spider Score-P/7.0
    $ module spider Score-P/7.0
    
    ----------------------------------------------------------------------------------------------
      Score-P: Score-P/7.0
    ----------------------------------------------------------------------------------------------
        Description:
          The Score-P measurement infrastructure is a highly scalable and easy-to-use tool suite
          for profiling, event tracing, and online analysis of HPC applications.
        You will need to load all module(s) on any one of the lines below before the "Score-P/7.0" module is available to load.
          cpeGNU/21.12
        Help:
          Description
          ===========
          The Score-P measurement infrastructure is a highly scalable
          and easy-to-use tool suite for profiling, event tracing, and online analysis of
          HPC applications.
    
          More information
          ================
           - Homepage: http://www.score-p.org
  2. Use the command module avail only after a specific toolchain module (cpeAMD, cpeCray, cpeGNU, cpeIntel) has been loaded

    module load cpeGNU ; module avail Score-P
    $ module load cpeGNU/21.12   # module avail will then list only the tools compatible with the loaded cpeGNU
    
    $ module avail Score-P
    ---------------- /apps/eiger/UES/jenkins/1.4.0/modules/all/Toolchain/cpeGNU/21.12 ----------------
       Score-P/7.0

Interactive Computing with JupyterLab

JupyterLab is available on the system and can be used as described in the dedicated page on JupyterLab in the CSCS Knowledge Base. For further information on creating Jupyter kernels from virtual environments, or if you run into any issues using any of our tools, please contact us as described below.

Contact us

Please have a look at the article How to submit a support request on the CSCS Knowledge Base: after logging in with your credentials, you can select the request type that best matches your question; it will be directed to the team in charge and will help us react faster.
Please make sure to include all relevant information to help us address your request: kindly note that the request summary, description and project are mandatory fields for all request types, while the system is mandatory for some request types only.