...
Sarus is available from /usr/bin on Eiger, while on Piz Daint you should load daint-gpu or daint-mc before loading the sarus modulefile, as shown in the example below:
```bash
module load daint-gpu   # or daint-mc
module load sarus
```
The previous set of commands will load the GPU (or multicore) enabled software stack on Piz Daint and then load the environment of the default version of the program. You can either type these commands every time you intend to use the program within a new session, or you can load them automatically by including them in your shell configuration file. The following module commands will print the environment variables set by loading the program and a help message:
```bash
module show sarus
module help sarus
```
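If you want the module available automatically in every new session, as mentioned above you can add the commands to your shell configuration file. A minimal sketch, assuming bash is your login shell:

```bash
# ~/.bashrc (assumption: bash is your login shell)
module load daint-gpu   # or daint-mc for the multicore nodes
module load sarus
```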
...
Note for Eiger users
...
The version of Sarus on Eiger has been configured to work with the default CPE. It is recommended that you use a container image with MPICH 3.3.x. Please note that, due to a limitation of cray-mpich when using the PID namespace, if you use the native MPI hook (--mpi flag) you will only be able to use one rank per node. To overcome this limitation, please set MPICH_NOLOCAL=1 when submitting a job with multiple ranks per node, e.g.:
```bash
MPICH_NOLOCAL=1 srun -N1 --ntasks-per-node=2 sarus run --mpi ethcscs/osu-mb:5.3.2-mpich3.1.4 ./osu_latency
```
...
For more instructions, please refer to the section How to run on Piz Daint below.
How to run on Piz Daint
Here we provide essential instructions and system-specific information about Sarus on Piz Daint. For the full details about Sarus commands, options and features, please refer to the official User Guide on Read the Docs.
...
```bash
srun -C gpu sarus pull debian:latest
```
We strongly recommend running the sarus pull command on the compute nodes through Slurm, so that Sarus can take advantage of their large RAM filesystem, which greatly reduces the pull time and allows pulling larger images. Should you run into problems because the pulled image does not fit in the default filesystem, you can specify an alternative temporary directory with the --temp-dir option. When pulling images from an interactive terminal, we advise using the --pty option of the srun command to improve the quality and clarity of the terminal output.
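Combining both suggestions, a pull could look like the following sketch (the temporary directory path under $SCRATCH is just an example):

```bash
# Create a temporary directory on scratch (example path)
mkdir -p $SCRATCH/tmp

# Pull on a compute node; --pty improves interactive terminal output
srun -C gpu --pty sarus pull --temp-dir=$SCRATCH/tmp debian:latest
```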
...
IMPORTANT: Please be aware that the local Sarus repository is individual for each user, and on Piz Daint it is located inside the personal ...
Sarus tries to closely follow Docker's command-line interface. To remove images you no longer need and recover disk space, use the sarus rmi command:
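For example, to remove the debian image pulled earlier (a minimal sketch, following the same srun pattern as the other examples on this page):

```bash
srun -C gpu sarus rmi debian:latest
```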
...
```bash
srun -C gpu sarus run python:3-slim python --version
Python 3.7.4

srun -C gpu sarus run debian cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
```
Accessing host directories from the container
On Piz Daint, host filesystems are not automatically mounted inside containers.
...
```bash
# Mount $SCRATCH
srun -C gpu sarus run --mount=type=bind,source=$SCRATCH,destination=$SCRATCH debian ls -l $SCRATCH

# Mount $HOME
srun -C gpu sarus run --mount=type=bind,source=$HOME,destination=$HOME debian ls -l $HOME
```
Warning: Please be aware that mounting ...
...
The following OCI hooks are enabled on Piz Daint:
Native MPI support (MPICH-based)
Containers with native MPI performance can be launched by passing the --mpi option to the sarus run command, e.g.:

```bash
srun -N16 -n16 -C gpu sarus run --mpi <repo name>/<image name>:<image tag> <mpi_application>
```
In order to access the high-speed Cray Aries interconnect, the container application must be dynamically linked to an MPI implementation that is ABI-compatible with the compute node's MPI on Piz Daint. We recommend one of the following MPI implementations:
- MPICH v3.1.4 (February 2015)
- MVAPICH2 2.2 (September 2016)
- Intel MPI Library 2017 Update 1
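To check which MPI library a containerized application is dynamically linked against, one possibility is to inspect it with ldd inside the container. This is only a sketch: the image name and binary path are placeholders, and it assumes the image provides bash and ldd:

```bash
# List the application's shared libraries and keep only the MPI-related ones
srun -C gpu sarus run <repo name>/<image name>:<image tag> \
    bash -c 'ldd /path/to/mpi_application | grep -i mpi'
```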
...
- the software stack in the container should not be altered in any way
- non-performance-critical testing
- impossibility to satisfy ABI compatibility for native hardware acceleration
On Piz Daint, the default process manager interface used by Slurm is the Cray PMI. It is possible to select the PMI-2 interface with the --mpi=pmi2 option to srun. PMI-2 is adopted by MPICH and MPICH-derived implementations, while Open MPI has to be configured explicitly at build time to support it.
The following example shows how to run the OSU point-to-point latency test from the Sarus cookbook on Piz Daint without native interconnect support:
```bash
srun -C gpu -N2 -t2 --mpi=pmi2 sarus run ethcscs/mpich:ub1804_cuda92_mpi314_osu ./osu_latency
###MPI-3.0
# OSU MPI Latency Test v5.6.1
# Size          Latency (us)
0                       6.66
1                       6.81
2                       6.88
4                       6.88
8                       6.85
16                      6.79
32                      6.88
64                      6.86
128                     6.85
256                     6.84
512                     6.66
1024                    9.14
2048                   10.03
4096                   10.49
8192                   11.21
16384                  12.85
32768                  16.11
65536                  26.95
131072                 51.95
262144                 77.97
524288                128.89
1048576               229.30
2097152               432.25
4194304               839.49
```
...
```bash
srun -C gpu -p debug -N2 -t2 sarus run --mpi ethcscs/mpich:ub1804_cuda92_mpi314_osu ./osu_latency
###MPI-3.1
# OSU MPI Latency Test v5.6.1
# Size          Latency (us)
0                       1.15
1                       1.12
2                       1.10
4                       1.09
8                       1.10
16                      1.10
32                      1.10
64                      1.10
128                     1.11
256                     1.12
512                     1.15
1024                    1.39
2048                    1.67
4096                    2.27
8192                    4.16
16384                   5.03
32768                   6.65
65536                   9.98
131072                 16.64
262144                 29.94
524288                 56.40
1048576               109.25
2097152               216.19
4194304               430.82
```
...
Known issues
- When running on a Piz Daint compute node, using the --workdir option to set the initial container directory to a location bind mounted from a subdirectory of the user's home directory results in an error. For example, the following command will encounter the aforementioned issue:

  ```bash
  srun -C gpu -A csstaff sarus run --mount=type=bind,source=$HOME/subdir,destination=/cwd --workdir /cwd alpine pwd
  ```
...