NAMD is a parallel molecular dynamics code based on Charm++ designed for high-performance simulation of large biomolecular systems.
NAMD Single Node
Currently, NAMD is provided without MPI support; it therefore runs only on a single node, and can take advantage of the new GPU-resident mode.
Licensing Terms and Conditions
ALPS (GH200)
You can obtain NAMD's Spack stack as follows:
```bash
# List available images
uenv image find namd
# Pull the image of interest
uenv image pull namd/3.0b6:latest
# Start uenv
uenv start namd/3.0b6:latest
```
NAMD and its main dependencies are conveniently provided as modules. Therefore, once you have started the user environment, you can simply run
```bash
uenv modules use
module load namd
```
in order to have the NAMD executable available.
Single-node, single- or multi-GPU
The single-node build runs on a single node with one or more GPUs and benefits from the new GPU-resident mode (see the NAMD 3.0b6 GPU-Resident benchmarking results for more details). For example:
```bash
srun -N 1 -n 1 namd3 +p 8 +setcpuaffinity +devices 0 stmv_gpures_nve.namd
srun -N 1 -n 1 namd3 +p 15 +pmepes 7 +setcpuaffinity +devices 0,1 stmv_gpures_nve.namd
srun -N 1 -n 1 namd3 +p 29 +pmepes 5 +setcpuaffinity +devices 0,1,2,3 stmv_gpures_nve.namd
```
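For batch jobs, a minimal sketch of a submission script wrapping one of the commands above could look like the following. The job name, wall time and input file are placeholders, and the --uenv option assumes the uenv Slurm plugin is available on the system; adapt it to your setup.

```bash
#!/bin/bash -l
# Sketch of a single-node, 4-GPU GPU-resident run (placeholders throughout).
#SBATCH --job-name=namd-gpures
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --uenv=namd/3.0b6:latest   # assumes the uenv Slurm plugin is available

# Make the NAMD module visible inside the uenv and load it
uenv modules use
module load namd

# One task driving all four GPUs, as in the examples above
srun -N 1 -n 1 namd3 +p 29 +pmepes 5 +setcpuaffinity +devices 0,1,2,3 stmv_gpures_nve.namd
```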
Scaling of the tobacco mosaic virus (STMV) benchmark with the GPU-resident mode on our system is as follows:
| GPUs | ns/day | Speed-up | Parallel efficiency |
|---|---|---|---|
| 1 | 31.1463 | - | - |
| 2 | 53.6525 | 1.72 | 86% |
| 4 | 92.6927 | 2.98 | 74% |
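The speed-up is computed relative to the single-GPU run and the parallel efficiency is the speed-up divided by the number of GPUs: for example, on 2 GPUs the speed-up is 53.6525 / 31.1463 ≈ 1.72 and the efficiency is 1.72 / 2 ≈ 86%.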
The official NAMD 3.0b6 GPU-Resident benchmarking results also include numbers for A100 GPUs as well as for the older GPU-offload mode. The following graph compares results on A100 (official benchmarks) and GH200 (our results) for both the GPU-resident and GPU-offload modes.
Piz Daint
Setup
You can see a list of the available versions of the program installed on the machine, after loading the gpu or multicore module, by typing:
```bash
module load daint-gpu
module avail NAMD
```
for the GPU version or
```bash
module load daint-mc
module avail NAMD
```
for the multicore one. The commands above show the GPU-enabled or multicore-enabled modules of the application. The following module command will then load the environment of the default version of the program:
```bash
module load NAMD
```
You can either type this command every time you intend to use the program within a new session, or you can automatically load it by including it in your shell configuration file.
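For example, assuming a bash shell, you could add the load commands to your ~/.bashrc (a minimal sketch; adapt it to your own shell and workflow):

```bash
# Load the GPU-enabled NAMD environment automatically in every new session
module load daint-gpu
module load NAMD
```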
The following module commands will print the environment variables set by loading the program and a help message:
```bash
module show NAMD
module help NAMD
```
How to Run
The CUDA-enabled version of NAMD is installed on Daint. When using this version you should set outputEnergies to 100 or higher in the simulation configuration file, since outputting energies from the GPU is slower than from the CPU, and you should add +idlepoll to the command line so that NAMD polls the GPU for results rather than sleeping while idle. Note that some features are unavailable in the CUDA build, including alchemical free energy perturbation and the Lowe-Andersen thermostat.
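A minimal sketch of a launch command following these recommendations is shown below; the input file name input.namd and the core count are placeholders only.

```bash
# Run the CUDA build with +idlepoll so NAMD polls the GPU instead of sleeping while idle.
# The input file (here the placeholder input.namd) should contain a line such as:
#   outputEnergies  100
namd2 +idlepoll +p 12 input.namd > namd.out
```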
The GPU code in NAMD is relatively new (first introduced in NAMD 2.7), and forces evaluated on the GPU differ slightly from those of a CPU-only calculation, so you should test your simulations well before launching production runs.
Note that multiple NAMD processes (or threads) can share the same GPU, and thus it is possible to run with multiple processes per node (see below).
The following job script asks for 16 nodes, using 1 MPI task per node and 24 threads per MPI task with hyperthreading turned on. If you use more than one MPI task per node, you will need to set CRAY_CUDA_MPS=1 to enable the tasks to access the GPU device on each node at the same time (a sketch for that case follows the script below).
```bash
#!/bin/bash -l
#
# NAMD on Piz Daint
#
# 16 nodes, 1 MPI task per node, 24 OpenMP threads per task with hyperthreading (--ntasks-per-core=2)
#
#SBATCH --job-name="namd"
#SBATCH --time=00:30:00
#SBATCH --nodes=16
#SBATCH --ntasks-per-core=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --constraint=gpu
#========================================
# load modules and run simulation
module load daint-gpu
module load NAMD
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun namd2 +idlepoll +ppn $[SLURM_CPUS_PER_TASK-1] input.namd > namd.out
```
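If you do run more than one MPI task per node, the relevant changes relative to the script above could look like the following sketch; the task and thread counts are illustrative only, not a recommendation.

```bash
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=12

# Let both tasks on a node access the GPU at the same time
export CRAY_CUDA_MPS=1
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun namd2 +idlepoll +ppn $[SLURM_CPUS_PER_TASK-1] input.namd > namd.out
```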
Scaling
We provide a NAMD scaling example simulating the dynamics of the tobacco mosaic virus (STMV).
We run the scaling jobs with the constraint gpu on the Cray XC50, using 1 MPI task per node and 24 threads per task. The performance metric is the average of the values reported in days/ns in the output file of each simulation.
Running this small example on 16 nodes, the parallel efficiency is around 50%, which is the usual lower limit for this scaling indicator, while on 32 nodes the efficiency is ~30%.
The scaling data are reported in the table below:
| Nodes | Days/ns | Speed-up |
|---|---|---|
| 2 | 0.286 | 1.00 |
| 4 | 0.185 | 1.55 |
| 8 | 0.115 | 2.49 |
| 16 | 0.071 | 4.03 |
| 32 | 0.061 | 4.69 |
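The speed-up in the table is relative to the 2-node run, and the parallel efficiencies quoted above follow from dividing it by the ideal speed-up: going from 2 to 16 nodes the ideal speed-up is 8, so the efficiency is 4.03 / 8 ≈ 50%, while on 32 nodes it is 4.69 / 16 ≈ 29%.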
Strong scaling results are plotted against ideal scaling as follows: