To use GPU-aware MPI on Alps systems, you must use the Cray MPICH provided through user environments. Most user environments provided by CSCS for GPU systems link against the libraries required for GPU-aware MPI.

To make Cray MPICH actually enable GPU-aware MPI support at runtime, you must ensure that the environment variable MPICH_GPU_SUPPORT_ENABLED=1 is set, for example by exporting it in a SLURM batch script, as in the sketch below.
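A minimal SLURM batch script sketch: the job name, node and task counts, GPU allocation, and the executable name myexecutable are placeholder assumptions and should be adapted to your job:

$ cat job.sh
#!/bin/bash
#SBATCH --job-name=gpu-aware-mpi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1

# Enable GPU-aware MPI in Cray MPICH at runtime
export MPICH_GPU_SUPPORT_ENABLED=1

srun ./myexecutable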

If you attempt to communicate GPU buffers through MPI without setting MPICH_GPU_SUPPORT_ENABLED=1, the application will fail with segmentation faults, usually without any indication that the communication is what failed.

To check whether your application is linked against the required library, run ldd on your executable; the output should contain something similar to:

$ ldd myexecutable | grep gtl
        libmpi_gtl_cuda.so => /user-environment/linux-sles15-neoverse_v2/gcc-13.2.0/cray-gtl-8.1.30-fptqzc5u6t4nals5mivl75nws2fb5vcq/lib/libmpi_gtl_cuda.so (0x0000ffff82aa0000)

The path may differ, but the libmpi_gtl_cuda.so library should appear when using CUDA. In ROCm environments, libmpi_gtl_hsa.so should be linked instead.
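For example, on a ROCm system the same check would look similar to the following (the installation path is elided here and will differ on your system):

$ ldd myexecutable | grep gtl
        libmpi_gtl_hsa.so => /user-environment/.../lib/libmpi_gtl_hsa.so (...)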
