The Singularity container platform is available on Eiger and Piz Daint. To make Singularity available in your environment, use the following commands:
    # load singularity on Eiger
    ml cray singularity

    # load singularity on Piz Daint
    module load singularity
Singularity can run GPU-enabled and MPI containers. To fully exploit the system features, CSCS offers the additional modulefile singularity/3.6.4-daint, customized for Piz Daint. The module defines the environment variables needed for Singularity to mount the required host system directories inside the container, and it sets LD_LIBRARY_PATH so that the necessary dynamic libraries are available at runtime. With the CSCS-provided module, the MPI installed in the container image is replaced by the host MPI of Piz Daint, which takes advantage of the high-speed Cray Aries interconnect. The module can be loaded as follows:
    module load daint-gpu # or daint-mc
    module load singularity/3.6.4-daint
The following requirements have to be met by the container images:
- For GPU-enabled containers, the version of CUDA inside the container has to be supported by the NVIDIA driver of the host.
- For MPI-enabled containers, the application inside the container must be dynamically linked to an MPI version that is ABI-compatible with the host MPI.
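The MPI requirement above can be checked before moving an image to the system by looking at the shared libraries the application links against. The following is a minimal sketch, not a CSCS-provided tool: `links_mpich_abi` is a hypothetical helper, and it assumes that an MPICH-ABI-compatible build shows up in `ldd` output under the soname `libmpi.so.12`. The sample `ldd` output is illustrative and was not captured from a real system.

```shell
# Hypothetical helper: succeeds if the ldd output read from stdin lists
# an MPICH-ABI MPI library (the soname libmpi.so.12 is an assumption).
links_mpich_abi() {
    grep -q 'libmpi\.so\.12'
}

# Illustrative ldd output. On a real system one would instead pipe
#   singularity exec <image>.sif ldd /path/to/application
# into the function.
sample_ldd='linux-vdso.so.1 (0x00007ffd0000)
libmpi.so.12 => /usr/lib/libmpi.so.12 (0x00007f1c0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1b0000)'

if printf '%s\n' "$sample_ldd" | links_mpich_abi; then
    echo "dynamically linked against an MPICH-ABI libmpi"
else
    echo "no MPICH-ABI libmpi found (static build or a different MPI?)"
fi
```

A statically linked application prints no `libmpi` entry at all, so the host MPI could not be substituted for it.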
Please note that running containers is only allowed from the /scratch filesystem.
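A job script can guard against launching from the wrong filesystem with a simple path check. This is a sketch under the assumption that the scratch filesystem is mounted at /scratch; `is_on_scratch` is a hypothetical helper name, and the check is purely textual (it does not resolve symbolic links).

```shell
# Hypothetical helper: succeeds if the given path (default: the current
# working directory) is /scratch itself or lies below it.
is_on_scratch() {
    case "${1:-$PWD}" in
        /scratch|/scratch/*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_on_scratch; then
    echo "OK: running from $PWD"
else
    echo "Warning: $PWD is not under /scratch" >&2
fi
```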
Pulling container images from a registry
Singularity allows you to pull images available in container registries such as DockerHub and Singularity Hub. For example, we can pull the latest Ubuntu image from DockerHub:
srun -A <project> -C gpu singularity pull docker://ubuntu:latest
Please replace the string <project> with the ID of the active project that will be charged for the allocation. The end of the command output should look like the following:
    INFO: Creating SIF file...
    INFO: Build complete: ubuntu_latest.sif
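The name of the resulting file follows from the image reference: in the output above, docker://ubuntu:latest becomes ubuntu_latest.sif. The sketch below reproduces that naming so a script can predict the file to use afterwards. `sif_name_for` is a hypothetical helper, and the rule it encodes (last path component, tag joined with an underscore, `latest` when no tag is given) is an assumption generalized from this one example rather than a documented guarantee.

```shell
# Hypothetical helper: print the SIF file name expected for an image
# reference such as docker://ubuntu:latest.
sif_name_for() {
    ref="${1#*://}"    # drop the scheme, e.g. "docker://"
    name="${ref##*/}"  # keep the last path component, e.g. "ubuntu:latest"
    case "$name" in
        *:*) ;;                     # tag given explicitly
        *)   name="$name:latest" ;; # assumed default tag
    esac
    printf '%s_%s.sif\n' "${name%%:*}" "${name#*:}"
}

sif_name_for docker://ubuntu:latest   # prints ubuntu_latest.sif
```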
Then we can check the version of Ubuntu within the container by running:
srun -A <project> -C gpu singularity exec ubuntu_latest.sif cat /etc/os-release
which prints:
    NAME="Ubuntu"
    VERSION="20.04.1 LTS (Focal Fossa)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 20.04.1 LTS"
    VERSION_ID="20.04"
    HOME_URL="https://www.ubuntu.com/"
    SUPPORT_URL="https://help.ubuntu.com/"
    BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
    PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
    VERSION_CODENAME=focal
    UBUNTU_CODENAME=focal
Building container images on your local computer
In order to build container images from singularity definition files, root privileges are required. Therefore, it is not possible to use singularity to build your images on CSCS systems. The suggested method in this case is to build the image using your local computer and then transfer the resulting image to CSCS systems using the Data Transfer service.
Running a GPU-enabled container
In this example, we use the following Singularity definition file, cuda_device_query.def, to build a container image:
    Bootstrap: docker
    From: nvidia/cuda:10.2-devel

    %post
        apt-get update
        apt-get install -y git
        git clone https://github.com/NVIDIA/cuda-samples.git /usr/local/cuda_samples
        cd /usr/local/cuda_samples
        git fetch origin --tags
        git checkout v10.2
        cd Samples/deviceQuery && make

    %runscript
        /usr/local/cuda_samples/Samples/deviceQuery/deviceQuery
Based on the cuda_device_query.def definition file given above, we can build the image cuda_device_query.sif with Singularity on a local workstation with root access via sudo:
sudo singularity build cuda_device_query.sif cuda_device_query.def
The final lines of the above command output should look like this:
    INFO: Adding runscript
    INFO: Creating SIF file...
    INFO: Build complete: cuda_device_query.sif
The command creates the image file cuda_device_query.sif, which can be transferred to the CSCS system using the Data Transfer service. For instance, the following commands run the image on Piz Daint GPU nodes after loading the singularity module as explained above:
    # load the singularity modules if not already loaded
    module load daint-gpu
    module load singularity/3.6.4-daint

    # run singularity using cuda_device_query.sif
    srun -A <project> -C gpu singularity run --nv cuda_device_query.sif
The output of the command above is the following:
    /usr/local/cuda_samples/Samples/deviceQuery/deviceQuery Starting...

     CUDA Device Query (Runtime API) version (CUDART static linking)

    Detected 1 CUDA Capable device(s)

    Device 0: "Tesla P100-PCIE-16GB"
      CUDA Driver Version / Runtime Version          10.2 / 10.2
      CUDA Capability Major/Minor version number:    6.0
      Total amount of global memory:                 16281 MBytes (17071734784 bytes)
      (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
      GPU Max Clock rate:                            1329 MHz (1.33 GHz)
      Memory Clock rate:                             715 Mhz
      Memory Bus Width:                              4096-bit
      L2 Cache Size:                                 4194304 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
      Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
      Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Device supports Compute Preemption:            Yes
      Supports Cooperative Kernel Launch:            Yes
      Supports MultiDevice Co-op Kernel Launch:      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
      Compute Mode:
         < Exclusive Process (many threads in one process is able to use ::cudaSetDevice() with this device) >

    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
    Result = PASS
In order to run a CUDA-enabled container, the --nv option has to be passed to singularity run. With this option, Singularity sets up the container environment to use the NVIDIA GPU and the basic CUDA libraries.
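The requirement that the container's CUDA version be supported by the host driver amounts to a version comparison once both numbers are known, for instance from the "CUDA Driver Version / Runtime Version" line printed by deviceQuery. The following is a minimal sketch, assuming that a driver supports every CUDA runtime up to the version it reports and that GNU `sort -V` is available; `version_le` and `cuda_supported` are hypothetical helper names.

```shell
# Succeeds if version $1 is lower than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical check: $1 = CUDA version inside the container,
#                     $2 = highest CUDA version reported by the host driver.
cuda_supported() {
    version_le "$1" "$2"
}

if cuda_supported 10.2 10.2; then
    echo "container CUDA 10.2 is supported by a driver reporting 10.2"
fi
```

A plain string or numeric comparison would order 9.2 after 10.2, which is why the sketch uses a version-aware sort.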
Running an MPI-enabled container
The following Singularity definition file, mpi_osu.def, can be used to build a container with the OSU benchmarks using MPI:
    bootstrap: docker
    from: debian:jessie

    %post
        # Install software
        apt-get update
        apt-get install -y file g++ gcc gfortran make gdb strace realpath wget ca-certificates --no-install-recommends

        # Install mpich
        wget -q http://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz
        tar xf mpich-3.1.4.tar.gz
        cd mpich-3.1.4
        ./configure --disable-fortran --enable-fast=all,O3 --prefix=/usr
        make -j$(nproc)
        make install
        ldconfig

        # Build osu benchmarks
        wget -q http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.3.2.tar.gz
        tar xf osu-micro-benchmarks-5.3.2.tar.gz
        cd osu-micro-benchmarks-5.3.2
        ./configure --prefix=/usr/local CC=$(which mpicc) CFLAGS=-O3
        make
        make install
        cd ..
        rm -rf osu-micro-benchmarks-5.3.2
        rm osu-micro-benchmarks-5.3.2.tar.gz

    %runscript
        /usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
Then build the image mpi_osu.sif with Singularity on your local workstation with root access via sudo, as already shown above:
sudo singularity build mpi_osu.sif mpi_osu.def
The final lines of the output of the command above should look like this:
    INFO: Adding runscript
    INFO: Creating SIF file...
    INFO: Build complete: mpi_osu.sif
The command will create the image mpi_osu.sif which can be transferred to the CSCS system. For instance on Piz Daint, the following commands are used to run the created image:
    # load the corresponding modules if not already loaded
    module load daint-gpu # or daint-mc
    module load singularity/3.6.4-daint

    # run mpi_osu.sif with singularity using 2 compute nodes
    srun -C gpu -N2 --account=<project> singularity run mpi_osu.sif
The output of the above command should resemble the following:
    # OSU MPI Bandwidth Test v5.3.2
    # Size      Bandwidth (MB/s)
    1                       1.55
    2                       3.05
    4                       6.26
    8                      12.60
    16                     24.30
    32                     49.68
    64                    103.10
    128                   190.24
    256                   400.60
    512                   778.16
    1024                 1048.91
    2048                 1708.01
    4096                 2459.37
    8192                 5669.78
    16384                8452.91
    32768                8916.47
    65536                7097.49
    131072               9194.56
    262144               9614.99
    524288               9151.01
    1048576              9783.56
    2097152              9545.63
    4194304              9917.88
Running an MPI-enabled container without binding the MPI of the host
It is possible to run an MPI container without replacing the container's MPI with the host one. In order to do so, we have to instruct Slurm to use the PMI-2 process management interface. Furthermore, the container's MPI has to be configured with PMI-2 enabled. Therefore, the container image used in the previous example can be run as follows on Piz Daint:
    # load the corresponding modules if not already loaded
    module load daint-gpu # or daint-mc
    module load singularity/3.6.4 # not singularity/3.6.4-daint, which binds the host MPI

    # run using 2 compute nodes using the PMI-2 interface
    srun --mpi=pmi2 -A <project> -C gpu -N2 singularity run mpi_osu.sif
The output of the command above should resemble the following:
    # OSU MPI Bandwidth Test v5.3.2
    # Size      Bandwidth (MB/s)
    1                       0.44
    2                       0.87
    4                       1.75
    8                       3.49
    16                      7.02
    32                     14.12
    64                     27.69
    128                    55.51
    256                   110.65
    512                   161.96
    1024                  181.88
    2048                  355.33
    4096                  678.33
    8192                 1328.71
    16384                2440.92
    32768                3277.84
    65536                4343.05
    131072               4139.05
    262144               4596.31
    524288               4888.90
    1048576              5094.95
    2097152              5149.54
    4194304              5180.42
In order to use the MPI library of the container, the module singularity/3.6.4 should be loaded instead of singularity/3.6.4-daint.
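The effect of binding the host MPI is easiest to see by comparing the peak bandwidth of the two runs: 9917.88 MB/s with the host MPI against 5180.42 MB/s with the container's MPI in the outputs shown on this page. The sketch below extracts that peak from OSU bandwidth output; `peak_bandwidth` is a hypothetical helper, and the sample input is the tail of the container-MPI run.

```shell
# Hypothetical helper: read osu_bw output on stdin and print the highest
# value of the bandwidth column, skipping the "#" header lines.
peak_bandwidth() {
    awk '!/^#/ && NF == 2 && $2 + 0 > max { max = $2 + 0 }
         END { printf "%.2f\n", max }'
}

peak_bandwidth <<'EOF'
# OSU MPI Bandwidth Test v5.3.2
# Size      Bandwidth (MB/s)
1048576              5094.95
2097152              5149.54
4194304              5180.42
EOF
# prints 5180.42
```

In a job script the same function could read the benchmark output directly, for example by piping the srun command into it.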
Additional information
Please consult the official Singularity documentation for additional information.
Debugging
Debugging support is provided by the DDT debugger.