LAMMPS is a classical molecular dynamics code that models an ensemble of particles in a liquid, solid, or gaseous state. It can model atomic, polymeric, biological, metallic, granular, and coarse-grained systems using a variety of force fields and boundary conditions. The current version of LAMMPS is written in C++.
Licensing Terms and Conditions
LAMMPS is a freely-available open-source code, distributed under the terms of the GNU General Public License (GPL).
ALPS (GH200)
Setup
On Alps, LAMMPS is precompiled and available in a user environment (uenv). LAMMPS has been built with both the KOKKOS and the GPU accelerator packages.
To find which LAMMPS uenv is provided, you can use the following command:
```
uenv image find lammps
uenv/version:tag      uarch  date        id                size
lammps/2024:v1        gh200  2024-06-03  3483b476b75a1801  3.6GB
```
To pull and start the uenv for this specific version of LAMMPS, you can use:
```bash
uenv image pull lammps/2024:v1
uenv start lammps/2024:v1
```
You can load a view from the uenv which contains the lmp executable. The executables in both of the views below support GPUs:
```bash
# lammps + KOKKOS package
uenv view kokkos

# lammps + GPU package, kokkos disabled
uenv view gpu
```
A development view is also provided, which contains all libraries and command-line tools necessary to build LAMMPS from source, without including the LAMMPS executable:
```bash
uenv view develop
```
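As a starting point, a minimal (untested) sketch of building LAMMPS from source inside the develop view could look like the following; the branch, build directory, and CMake options are illustrative assumptions, not a supported recipe:

```bash
uenv start lammps/2024:v1
uenv view develop

# fetch the LAMMPS sources and configure an out-of-source CMake build
git clone -b stable https://github.com/lammps/lammps.git
cd lammps && mkdir build && cd build

# enable the KOKKOS package with CUDA support; the Kokkos GPU architecture
# flag below (Hopper, for GH200) and the package selection are assumptions
cmake -D PKG_KOKKOS=yes -D Kokkos_ENABLE_CUDA=yes -D Kokkos_ARCH_HOPPER90=yes ../cmake
make -j 16
```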
How to run lammps+kokkos
To start a job, two bash scripts and a LAMMPS input file are required:
- A standard Slurm submission script (launch.sh in this example):
```bash
#!/bin/bash -l
#SBATCH --job-name=<job_name>
#SBATCH --time=01:00:00          # HH:MM:SS
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4      # Number of MPI ranks per node, 1 MPI rank per GPU
#SBATCH --gres=gpu:4             # 4 GPUs per node
#SBATCH --account=<account>
#SBATCH --uenv=lammps/2024:v1

export MPICH_GPU_SUPPORT_ENABLED=1
ulimit -s unlimited

uenv view kokkos

srun ./wrapper.sh lmp -in lj_kokkos.in -k on g 1 -sf kk -pk kokkos gpu/aware on
```
- A wrapper script (wrapper.sh) to control CUDA_VISIBLE_DEVICES and NUMA bindings, which allows multi-node, multi-GPU jobs:
```bash
#!/bin/bash
# Map each local MPI rank to one GPU and its corresponding NUMA domain
export LOCAL_RANK=$SLURM_LOCALID
export GLOBAL_RANK=$SLURM_PROCID
export GPUS=(0 1 2 3)
export NUMA_NODE=$(echo "$LOCAL_RANK % 4" | bc)
export CUDA_VISIBLE_DEVICES=${GPUS[$NUMA_NODE]}
export MPICH_GPU_SUPPORT_ENABLED=1

# Bind the rank's CPUs and memory to the NUMA node of its GPU, then run the command
numactl --cpunodebind=$NUMA_NODE --membind=$NUMA_NODE "$@"
```
- A LAMMPS input file (lj_kokkos.in in this example):
```
# 3d Lennard-Jones melt

variable     x index 200
variable     y index 200
variable     z index 200
variable     t index 1000

variable     xx equal 1*$x
variable     yy equal 1*$y
variable     zz equal 1*$z
variable     interval equal $t/2

units        lj
atom_style   atomic/kk
lattice      fcc 0.8442
region       box block 0 ${xx} 0 ${yy} 0 ${zz}
create_box   1 box
create_atoms 1 box
mass         1 1.0

velocity     all create 1.44 87287 loop geom

pair_style   lj/cut/kk 2.5
pair_coeff   1 1 1.0 1.0 2.5

neighbor     0.3 bin
neigh_modify delay 0 every 20 check no

fix          1 all nve

thermo       ${interval}
thermo_style custom step time temp press pe ke etotal density

run_style    verlet/kk
run          $t
```
With the above scripts you can run a calculation on 2 nodes, using 8 GPUs, with the command sbatch launch.sh. You may need to make the wrapper script executable with chmod +x wrapper.sh. Also make sure to replace <account> with your CSCS account name.
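Putting it together, the submission sequence might look like this (assuming the Slurm script above is saved as launch.sh and the wrapper as wrapper.sh):

```bash
chmod +x wrapper.sh    # make the wrapper executable (only needed once)
sbatch launch.sh       # submit the 2-node, 8-GPU job
squeue -u $USER        # optionally, check the job status
```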
How to run lammps+gpu
To start a job, two bash scripts and a LAMMPS input file are required:
- A standard Slurm submission script:
```bash
#!/bin/bash -l
#SBATCH --job-name=<job_name>
#SBATCH --time=01:00:00          # HH:MM:SS
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32     # Number of MPI ranks per node, >=1 MPI rank per GPU
#SBATCH --gres=gpu:4             # 4 GPUs per node
#SBATCH --account=<account>
#SBATCH --uenv=lammps/2024:v1

export MPICH_GPU_SUPPORT_ENABLED=1
ulimit -s unlimited

uenv view gpu

srun ./mps-wrapper.sh lmp -sf gpu -pk gpu 4 -in lj.in
```
- A wrapper script (mps-wrapper.sh) to control the CUDA MPS daemon, which allows oversubscription of MPI ranks per GPU. Unlike with the KOKKOS package, this can yield some benefit with the GPU package:
```bash
#!/bin/bash
# Example mps-wrapper.sh usage:
# > srun [srun args] mps-wrapper.sh [cmd] [cmd args]

export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log

# Launch MPS from a single rank per node
if [ $SLURM_LOCALID -eq 0 ]; then
    CUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-cuda-mps-control -d
fi

# Wait for MPS to start
sleep 5

# Run the command
"$@"

# Quit MPS control daemon before exiting
if [ $SLURM_LOCALID -eq 0 ]; then
    echo quit | nvidia-cuda-mps-control
fi
```
- A LAMMPS input file (lj.in in this example):
```
# 3d Lennard-Jones melt

variable     x index 200
variable     y index 200
variable     z index 200
variable     t index 1000

variable     xx equal 1*$x
variable     yy equal 1*$y
variable     zz equal 1*$z
variable     interval equal $t/2

units        lj
atom_style   atomic
lattice      fcc 0.8442
region       box block 0 ${xx} 0 ${yy} 0 ${zz}
create_box   1 box
create_atoms 1 box
mass         1 1.0

velocity     all create 1.44 87287 loop geom

pair_style   lj/cut 2.5
pair_coeff   1 1 1.0 1.0 2.5

neighbor     0.3 bin
neigh_modify delay 0 every 20 check no

fix          1 all nve

thermo       ${interval}
thermo_style custom step time temp press pe ke etotal density

run_style    verlet
run          $t
```
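Submission then follows the same pattern as the KOKKOS example; the script name launch_gpu.sh below is only an assumed placeholder for the Slurm script above:

```bash
chmod +x mps-wrapper.sh    # make the MPS wrapper executable (only needed once)
sbatch launch_gpu.sh       # submit the job
```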
Using the LAMMPS uenv as an upstream Spack instance
If you'd like to extend the existing uenv with additional packages (or your own), you can use the provided LAMMPS uenv to supply all dependencies needed to build your customization. See https://eth-cscs.github.io/alps-uenv/uenv-compilation-spack/ for more information.
First, set up an environment:
```bash
uenv start lammps/2024:v1
git clone -b v0.22.0 https://github.com/spack/spack.git
source spack/share/spack/setup-env.sh
export SPACK_SYSTEM_CONFIG_PATH=/user-environment/config/
```
Then create the directory env and, within it, the file spack.yaml. We'll disable the KOKKOS package (and enable the GPU package via the +cuda spec), and add the CG-SPICA package (via the +cg-spica spec) as an example. You can get the full list of options here: https://packages.spack.io/package.html?name=lammps
```yaml
spack:
  specs:
  - lammps@20240417 ~kokkos +cuda cuda_arch=90 +python +extra-dump +cuda_mps +cg-spica
  packages:
    all:
      prefer:
      - +cuda cuda_arch=90
    mpi:
      require: cray-mpich +cuda
  view: true
  concretizer:
    unify: true
```
Then concretize and build (note, you will of course be using a different path):
```bash
spack -e /capstor/scratch/cscs/browning/SD-61924/env/ concretize -f
spack -e /capstor/scratch/cscs/browning/SD-61924/env/ install
```
During concretization, you'll notice a hash being printed alongside the lammps package name. Take note of this hash. If you now try to load lammps:
```
# naively try to load lammps
# it shows two versions installed (the one in the uenv, and the one we just built)
spack load lammps
==> Error: lammps matches multiple packages.
  Matching packages:
    rd2koe3 lammps@20240207.1%gcc@12.3.0 arch=linux-sles15-neoverse_v2
    zoo2p63 lammps@20240207.1%gcc@12.3.0 arch=linux-sles15-neoverse_v2
  Use a more specific spec (e.g., prepend '/' to the hash).

# use the hash that's listed in the output of the build
# and load using the hash
spack load /zoo2p63

# check the lmp executable:
which lmp
/capstor/scratch/cscs/browning/SD-61924/spack/opt/spack/linux-sles15-neoverse_v2/gcc-12.3.0/lammps-20240417-zoo2p63rzyuleogzn4a2h6yj7u3vhyy2/bin/lmp
```
You should now see the CG-SPICA package in the list of installed packages:
```
> lmp -h

...

Installed packages:

CG-SPICA GPU KSPACE MANYBODY MOLECULE PYTHON RIGID
```
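To use this custom build in a job, one possible approach (a sketch, not a tested recipe) is to load the Spack-built package by hash inside the Slurm script before calling srun; the spack path and hash below are the illustrative ones from this example, and the wrapper and input file are assumed to be those from the GPU example above:

```bash
#!/bin/bash -l
#SBATCH --job-name=<job_name>
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --account=<account>
#SBATCH --uenv=lammps/2024:v1

# assumption: Spack was cloned and the environment built as in the steps above
source /capstor/scratch/cscs/browning/SD-61924/spack/share/spack/setup-env.sh
spack load /zoo2p63

export MPICH_GPU_SUPPORT_ENABLED=1
ulimit -s unlimited

srun ./mps-wrapper.sh lmp -sf gpu -pk gpu 4 -in lj.in
```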
Scaling
Scaling tests were performed using a simple Lennard-Jones potential with 32M particles. Each GPU was assigned to a single MPI rank, with GPU-direct communication enabled.
Single Node Performance
Multiple Node Performance
Piz Daint
Setup
You can see a list of the available versions of the program installed on the machine after loading the gpu or multicore modulefile. In the examples below we use the daint-gpu modulefile:
```bash
module load daint-gpu
module avail LAMMPS
```
The following module command will load the environment of the default version of the program:
```bash
module load LAMMPS
```
You can either type this command every time you intend to use the program within a new session, or you can automatically load it by including it in your shell configuration file.
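For example, a minimal sketch of what you could append to your shell configuration file (assuming a bash shell and ~/.bashrc; adapt to your own setup):

```bash
# load the Piz Daint GPU environment and LAMMPS in every new session
module load daint-gpu
module load LAMMPS
```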
The following module commands will print the environment variables set by loading the program and a help message:
```bash
module show LAMMPS
module help LAMMPS
```
How to Run
The following job script asks for 64 nodes, using 8 MPI tasks per node and 1 OpenMP thread per MPI task. If you use multiple MPI tasks per node, you need to set CRAY_CUDA_MPS=1 to enable the tasks to access the GPU device on each node at the same time.
```bash
#!/bin/bash -l
#
# LAMMPS on Piz Daint: 64 nodes, 8 MPI tasks per node, 1 OpenMP thread per task, no hyperthreading
#
#SBATCH --job-name=lammps
#SBATCH --time=01:00:00
#SBATCH --nodes=64
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread
#SBATCH --constraint=gpu
#SBATCH --account=<project>

#========================================
# load modules and run simulation
module load daint-gpu
module load LAMMPS

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export CRAY_CUDA_MPS=1

srun lmp_mpi -sf gpu -in input.in
```
Please replace the string <project> with the ID of the active project that will be charged for the allocation.
Please note that if you want to use the hybrid MPI+OpenMP version of LAMMPS, you need to load the daint-mc module and use the lmp_omp executable.
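For illustration only, a hybrid MPI+OpenMP job on the multicore partition might look like the following sketch; the node count, task geometry, and input file name are assumptions to adapt to your case:

```bash
#!/bin/bash -l
# Hypothetical sketch: hybrid MPI+OpenMP LAMMPS job on the Piz Daint multicore partition
#SBATCH --job-name=lammps-omp
#SBATCH --time=01:00:00
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=9
#SBATCH --constraint=mc
#SBATCH --account=<project>

module load daint-mc
module load LAMMPS

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# -sf omp applies the OMP suffix to supported styles; -pk omp sets threads per MPI rank
srun lmp_omp -sf omp -pk omp $SLURM_CPUS_PER_TASK -in input.in
```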
Scaling
We provide a LAMMPS scaling example simulating the NPT dynamics of a Lennard-Jones gas at 300 K and 1 bar pressure. The input file can be downloaded here.
For any given number of nodes, the best performance was achieved with 8 or 12 MPI tasks and one thread per task. In the following chart, we show the highest performance achieved for each number of nodes. The ns/day value of each job is retrieved from the performance section of the LAMMPS output file; the relative speed-up is computed taking the runtime with two nodes as the reference value. In this example, parallel efficiency falls to roughly 50% at around 32 nodes.
The scaling data are reported in the table below:
| Nodes | ns/day | Speed-up |
|-------|--------|----------|
| 2     | 1.64   | 1.00     |
| 4     | 2.97   | 1.80     |
| 8     | 5.48   | 3.34     |
| 16    | 9.17   | 5.59     |
| 32    | 14.3   | 8.69     |
| 64    | 19.9   | 11.7     |
| 128   | 28.6   | 17.4     |
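As a quick consistency check, the speed-up column is simply the ratio of ns/day relative to the two-node reference:

$$
S(N) = \frac{\text{ns/day}(N)}{\text{ns/day}(2)}, \qquad \text{e.g.}\quad S(4) = \frac{2.97}{1.64} \approx 1.81,
$$

which agrees with the tabulated value of 1.80 within rounding.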
Strong scaling results are plotted against ideal scaling as follows: