It is possible to use the Singularity container platform on Piz Daint. To make Singularity available in your environment, use the following command:
module load singularity/3.6.4
Singularity can run GPU-enabled and MPI containers. CSCS provides an additional modulefile, customized for Piz Daint, to fully exploit the system features: it defines the environment variables needed so that Singularity mounts the required host system directories inside the container, and it sets LD_LIBRARY_PATH so that the necessary dynamic libraries are available at runtime. With the CSCS-provided module, the MPI installed in the container image is replaced by the one of the host (Piz Daint), which takes advantage of the high-speed Cray Aries interconnect. The aforementioned module can be loaded as follows:
module load daint-gpu # or daint-mc
module load singularity/3.6.4-daint
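If you want to see exactly which bind paths and library settings the Daint-specific module configures (the details may vary between module versions), a quick check is to display the modulefile:
# display the environment variables and paths set by the Daint-specific module
module show singularity/3.6.4-daint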
The following requirements have to be met by the container images (a quick way to check them is sketched after the list):
- For GPU-enabled containers, the version of CUDA inside the container has to be supported by the Nvidia driver of the host
- For MPI-enabled containers, the application inside the container must be dynamically linked to an MPI version that is ABI-compatible with the host MPI
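As a rough sketch of how these requirements can be checked (the image name and application path below are placeholders, not part of any example on this page), you can query the host driver and inspect the dynamic dependencies of the containerized application:
# show the host NVIDIA driver version and the CUDA version it supports (first requirement)
srun -C gpu --account=<project> nvidia-smi
# list the MPI library the containerized application is dynamically linked against (second requirement)
srun -C gpu --account=<project> singularity exec <image>.sif ldd /path/to/application | grep -i mpi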
Pulling container images from a registry
Singularity allows you to pull images available in container registries such as DockerHub and Singularity Hub. For example, we can pull the latest Ubuntu image from DockerHub:
srun -C gpu --account=<project> singularity pull docker://ubuntu:latest
Please replace the <project> string with the ID of the active project that will be charged for the allocation. The final lines of the command output should look like this:
INFO: Creating SIF file...
INFO: Build complete: ubuntu_latest.sif
Then we can check the version of Ubuntu within the container by running:
srun -C gpu --account=<project> singularity exec ubuntu_latest.sif cat /etc/os-release
which prints:
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
IMPORTANT: Running containers is only allowed from the /scratch filesystem.
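As a minimal sketch, assuming the usual $SCRATCH environment variable points to your personal scratch folder on Piz Daint, you can change to the scratch filesystem before pulling and running images:
# switch to the scratch filesystem before working with container images
cd $SCRATCH
srun -C gpu --account=<project> singularity pull docker://ubuntu:latest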
Building container images on your local computer
In order to build container images from Singularity definition files, root privileges are required. Therefore, it is not possible to use Singularity to build your images on Piz Daint. The suggested method is to build the image on your local computer and then transfer the resulting image to Piz Daint.
Running a GPU-enabled container
In this example we use the following Singularity definition file, cuda_device_query.def, to build a container image:
Bootstrap: docker
From: nvidia/cuda:10.2-devel
%post
apt-get update
apt-get install -y git
git clone https://github.com/NVIDIA/cuda-samples.git /usr/local/cuda_samples
cd /usr/local/cuda_samples
git fetch origin --tags
git checkout v10.2
cd Samples/deviceQuery && make
%runscript
/usr/local/cuda_samples/Samples/deviceQuery/deviceQuery
Based on the cuda_device_query.def definition file given above, we can build the image cuda_device_query.sif using singularity on our local computer:
sudo singularity build cuda_device_query.sif cuda_device_query.def
The final lines of the above command output should look like this:
INFO: Adding runscript
INFO: Creating SIF file...
INFO: Build complete: cuda_device_query.sif
The above command will produce the cuda_device_query.sif image, which can then be transferred to Piz Daint (e.g. using scp; a sketch follows below).
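A minimal transfer sketch, assuming the image is copied from your local computer with scp (the hostname and destination path are illustrative and depend on your account setup):
# copy the image to your scratch folder on Piz Daint (hypothetical username and path)
scp cuda_device_query.sif <username>@daint.cscs.ch:/scratch/snx3000/<username>/
Then, on Piz Daint, the following commands are used to run the created image: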
# load the corresponding modules if not already loaded
module load daint-gpu
module load singularity # or singularity/3.6.4-daint
# run singularity using cuda_device_query.sif
srun -C gpu --account=<project> singularity run --nv cuda_device_query.sif
The output of the above commands is the following:
/usr/local/cuda_samples/Samples/deviceQuery/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Tesla P100-PCIE-16GB"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 6.0
Total amount of global memory: 16281 MBytes (17071734784 bytes)
(56) Multiprocessors, ( 64) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1329 MHz (1.33 GHz)
Memory Clock rate: 715 Mhz
Memory Bus Width: 4096-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Exclusive Process (many threads in one process is able to use ::cudaSetDevice() with this device) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
IMPORTANT: In order to run a CUDA-enabled container, the --nv option has to be passed to singularity run. With this option, Singularity sets up the container environment to use the NVIDIA GPU and the basic CUDA libraries.
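As a quick sanity check (a sketch, assuming --nv also makes the host's NVIDIA utilities and driver libraries available inside the container, which is the usual behaviour of Singularity's NVIDIA support), you can verify that the GPU is visible from within the image:
# check that the GPU and driver libraries are visible inside the container
srun -C gpu --account=<project> singularity exec --nv cuda_device_query.sif nvidia-smi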
Running an MPI-enabled container
The following Singularity definition file, mpi_osu.def, can be used to build a container with the OSU micro-benchmarks using MPI:
Bootstrap: docker
From: debian:jessie
%post
# Install software
apt-get update
apt-get install -y file g++ gcc gfortran make gdb strace realpath wget ca-certificates --no-install-recommends
# Install mpich
wget -q http://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz
tar xf mpich-3.1.4.tar.gz
cd mpich-3.1.4
./configure --disable-fortran --enable-fast=all,O3 --prefix=/usr
make -j$(nproc)
make install
ldconfig
# Build osu benchmarks
wget -q http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.3.2.tar.gz
tar xf osu-micro-benchmarks-5.3.2.tar.gz
cd osu-micro-benchmarks-5.3.2
./configure --prefix=/usr/local CC=$(which mpicc) CFLAGS=-O3
make
make install
cd ..
rm -rf osu-micro-benchmarks-5.3.2
rm osu-micro-benchmarks-5.3.2.tar.gz
%runscript
/usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
Based on the mpi_osu.def definition file given above, we can build the image mpi_osu.sif using Singularity on our local computer:
sudo singularity build mpi_osu.sif mpi_osu.def
The final lines of the above command output should look like this:
INFO: Adding runscript
INFO: Creating SIF file...
INFO: Build complete: mpi_osu.sif
The above command will produce the mpi_osu.sif image which can be transferred to Piz Daint. Then on Piz Daint, the following commands are used to run the created image:
# load the corresponding modules if not already loaded
module load daint-gpu # or daint-mc
module load singularity/3.6.4-daint
# run mpi_osu.sif with singularity using 2 compute nodes
srun -C gpu -N2 --account=<project> singularity run mpi_osu.sif
The output of the above command should resemble the following:
# OSU MPI Bandwidth Test v5.3.2
# Size Bandwidth (MB/s)
1 1.55
2 3.05
4 6.26
8 12.60
16 24.30
32 49.68
64 103.10
128 190.24
256 400.60
512 778.16
1024 1048.91
2048 1708.01
4096 2459.37
8192 5669.78
16384 8452.91
32768 8916.47
65536 7097.49
131072 9194.56
262144 9614.99
524288 9151.01
1048576 9783.56
2097152 9545.63
4194304 9917.88
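For non-interactive runs, the same command can be submitted as a batch job. Below is a minimal job script sketch; the job name, time limit and task layout are illustrative choices:
#!/bin/bash -l
#SBATCH --job-name=osu_bw          # illustrative job name
#SBATCH --nodes=2                  # the bandwidth test uses one rank on each of two nodes
#SBATCH --ntasks-per-node=1
#SBATCH --constraint=gpu           # or mc, matching the daint-* module loaded below
#SBATCH --account=<project>
#SBATCH --time=00:10:00            # illustrative time limit

module load daint-gpu              # or daint-mc
module load singularity/3.6.4-daint

srun singularity run mpi_osu.sif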
Running an MPI-enabled container without binding the MPI of the host
It is possible to run an MPI container without replacing the container's MPI with the host one. In order to do so, we have to instruct Slurm to use the PMI-2 process management interface. Furthermore, the container's MPI has to be configured with PMI-2 enabled. Therefore, the container image used in the previous example can be run as follows:
# load the corresponding modules if not already loaded
module load daint-gpu # or daint-mc
module load singularity/3.6.4
# run using 2 compute nodes using the PMI-2 interface
srun --mpi=pmi2 -C gpu -N2 --account=<project> singularity run mpi_osu.sif
The output of the above command should resemble the following:
# OSU MPI Bandwidth Test v5.3.2
# Size Bandwidth (MB/s)
1 0.44
2 0.87
4 1.75
8 3.49
16 7.02
32 14.12
64 27.69
128 55.51
256 110.65
512 161.96
1024 181.88
2048 355.33
4096 678.33
8192 1328.71
16384 2440.92
32768 3277.84
65536 4343.05
131072 4139.05
262144 4596.31
524288 4888.90
1048576 5094.95
2097152 5149.54
4194304 5180.42
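The measured bandwidth is noticeably lower than in the previous run, since in this mode the container's MPI cannot take advantage of the host MPI and the Aries interconnect. If you are unsure which process-management interfaces the Slurm installation supports, you can list them (the available plugins depend on the Slurm configuration):
# list the MPI/PMI plugin types supported by srun
srun --mpi=list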
IMPORTANT: To use the container's MPI, the singularity/3.6.4 module should be loaded instead of the singularity/3.6.4-daint one.
Additional information
Please consult the official Singularity documentation for additional information.
Debugging
Debugging support is provided by the DDT Debugger.