Relevant for GH200 nodes

Overview

To share the GPU(s) on a node between multiple MPI ranks, you currently need to manually start the NVIDIA daemon that provides the CUDA Multi-Process Service (MPS). On previous machines this was enabled by setting the environment variable CRAY_CUDA_MPS, but this is no longer supported.
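
To check whether an MPS control daemon is already running on a node, you can query it through nvidia-cuda-mps-control. The following is a minimal sketch, assuming CUDA_MPS_PIPE_DIRECTORY points at the same pipe directory used by the wrapper script below; if no daemon is running, the query reports an error rather than a server list.

MPS status check
# Ask the MPS control daemon for its list of active servers.
# Assumes the pipe directory matches the one set in mps-wrapper.sh below.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
echo get_server_list | nvidia-cuda-mps-control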

Wrapper Script

Use the following wrapper script to start the CUDA Multi-Process Service (MPS) on each node and to bind each rank to the GPU and memory of its local NUMA node:

MPS wrapper script
#!/bin/bash
# Example mps-wrapper.sh usage:
# > srun --cpu-bind=socket [srun args] mps-wrapper.sh [cmd] [cmd args]

# only this path is supported by MPS
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log-$(id -un)
# Launch MPS from a single rank per node
if [[ $SLURM_LOCALID -eq 0 ]]; then
    CUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-cuda-mps-control -d
fi

# Select the GPU(s) attached to this rank's NUMA node(s): taskset reports
# the rank's CPU affinity mask, and hwloc-calc converts it into physical
# NUMA node indices, which the script assumes match the GPU indices on GH200.
numa_nodes=$(hwloc-calc --physical --intersect NUMAnode $(taskset -p $$ | awk '{print "0x"$6}'))
export CUDA_VISIBLE_DEVICES=$numa_nodes

# Wait for MPS to start
sleep 1
# Run the command
numactl --membind=$numa_nodes "$@"
result=$?
# Quit MPS control daemon before exiting
if [[ $SLURM_LOCALID -eq 0 ]]; then
    echo quit | nvidia-cuda-mps-control
fi
exit $result
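
The batch example in the next section launches the wrapper directly with srun, so the script must be executable and reachable from the job's working directory. Assuming it was saved as mps-wrapper.sh in the current directory, a one-off setup step is:

Make the wrapper executable
chmod +x ./mps-wrapper.sh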

Example Usage

An example of using the wrapper script above is provided in the following sample Slurm batch script:

Batch submission script
#!/bin/bash -l
#SBATCH --job-name=<job_name>
#SBATCH --time=01:30:00 #HH:MM:SS
#SBATCH --nodes=2
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=32 #32 MPI ranks per node
#SBATCH --cpus-per-task=8 #8 OMP threads per rank
#SBATCH --account=<account>
#SBATCH --hint=nomultithread
#SBATCH --exclusive

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MPICH_MALLOC_FALLBACK=1

ulimit -s unlimited

srun --cpu-bind=socket ./mps-wrapper.sh <code> <args>
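
Before running a full application, a short check can confirm that ranks are spread across the node's four GPUs as expected. The following sketch (an illustrative single-node, 32-rank job step; adjust the srun options to your allocation) prints the GPU binding the wrapper assigns to each rank:

Rank-to-GPU mapping check
srun -N 1 -n 32 --cpu-bind=socket ./mps-wrapper.sh \
    bash -c 'echo "rank ${SLURM_PROCID}: CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"'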

Further Documentation

Full documentation on CUDA MPS is available in the NVIDIA Multi-Process Service documentation at https://docs.nvidia.com/deploy/mps/index.html.