LAMMPS is a classical molecular dynamics code that models an ensemble of particles in a liquid, solid, or gaseous state. It can model atomic, polymeric, biological, metallic, granular, and coarse-grained systems using a variety of force fields and boundary conditions. The current version of LAMMPS is written in C++.

Licensing Terms and Conditions

LAMMPS is a freely available open-source code, distributed under the terms of the GNU General Public License (GPL).

ALPS (GH200)

Setup

On Alps, LAMMPS is precompiled and available in a user environment (uenv). The uenv provides one build with the KOKKOS package and one with the GPU package.

To find which LAMMPS uenv is provided, you can use the following command:

uenv image find lammps
uenv/version:tag uarch date id size
lammps/2024:v1 gh200 2024-06-03 3483b476b75a1801 3.6GB

To get and start the uenv for this specific version of LAMMPS, you can use

uenv image pull lammps/2024:v1
uenv start lammps/2024:v1

You can load a view from the uenv that contains the lmp executable. The executables in both of these views support GPUs:

#lammps+KOKKOS package
uenv view kokkos
#lammps+GPU package, kokkos disabled
uenv view gpu

A development view is also provided. It contains all libraries and command-line tools necessary to build LAMMPS from source, but does not include the LAMMPS executable:

uenv view develop

How to run lammps+kokkos

To start a job, 2 bash scripts are required:

  • A standard slurm submission script.

launch.sh
#!/bin/bash -l
#SBATCH --job-name=<job_name>
#SBATCH --time=01:00:00           # HH:MM:SS
#SBATCH --nodes=2                                                                        
#SBATCH --ntasks-per-node=4      # Number of MPI ranks per node, 1 MPI rank per GPU
#SBATCH --gres=gpu:4 #4 GPUs per node
#SBATCH --account=<account>
#SBATCH --uenv=lammps/2024:v1

export MPICH_GPU_SUPPORT_ENABLED=1
 
ulimit -s unlimited

uenv view kokkos
 
srun ./wrapper.sh lmp -in lj_kokkos.in -k on g 1 -sf kk -pk kokkos gpu/aware on
  • A wrapper to control CUDA_VISIBLE_DEVICES to allow multi-node jobs.

wrapper.sh
#!/bin/bash

export LOCAL_RANK=$SLURM_LOCALID
export GLOBAL_RANK=$SLURM_PROCID
export GPUS=(0 1 2 3)
export NUMA_NODE=$(echo "$LOCAL_RANK % 4" | bc)
export CUDA_VISIBLE_DEVICES=${GPUS[$NUMA_NODE]}

export MPICH_GPU_SUPPORT_ENABLED=1
 
numactl --cpunodebind=$NUMA_NODE --membind=$NUMA_NODE "$@"
  • As well as a LAMMPS input file.

lj_kokkos.in
# 3d Lennard-Jones melt
variable        x index 200
variable        y index 200
variable        z index 200
variable        t index 1000

variable        xx equal 1*$x
variable        yy equal 1*$y
variable        zz equal 1*$z

variable        interval equal $t/2

units           lj
atom_style      atomic/kk

lattice         fcc 0.8442
region          box block 0 ${xx} 0 ${yy} 0 ${zz}
create_box      1 box
create_atoms    1 box
mass            1 1.0

velocity        all create 1.44 87287 loop geom

pair_style      lj/cut/kk 2.5
pair_coeff      1 1 1.0 1.0 2.5

neighbor        0.3 bin
neigh_modify    delay 0 every 20 check no

fix             1 all nve

thermo          ${interval}
thermo_style custom step time  temp press pe ke etotal density
run_style       verlet/kk
run             $t

With the above scripts you can run a calculation on 2 nodes, using 8 GPUs, with the command sbatch launch.sh. You may need to make the wrapper script executable with chmod +x wrapper.sh. Also make sure to replace <account> with your CSCS account name.
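The wrapper assigns each node-local rank to one GPU and to the matching NUMA node. As a minimal sketch of that mapping, the following uses shell arithmetic rather than bc. The function name gpu_for_local_rank and its optional second argument are illustrative assumptions and are not part of the uenv.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the uenv): map a node-local rank to a GPU index.
# Assumes 4 GPUs per node, numbered 0-3, as in wrapper.sh above.
gpu_for_local_rank() {
    local local_rank="$1"
    local gpus_per_node="${2:-4}"
    echo $(( local_rank % gpus_per_node ))
}

# Print the mapping for the 4 ranks per node requested in launch.sh
for rank in 0 1 2 3; do
    echo "local rank ${rank} -> GPU $(gpu_for_local_rank "${rank}")"
done
```

With one rank per GPU the mapping is the identity. The modulo only matters if more ranks than GPUs are started on a node.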

How to run lammps+gpu

To start a job, 2 bash scripts are required:

  • A standard slurm submission script.

launch.sh
#!/bin/bash -l
#SBATCH --job-name=<job_name>
#SBATCH --time=01:00:00           # HH:MM:SS
#SBATCH --nodes=2                                                                        
#SBATCH --ntasks-per-node=32      # Number of MPI ranks per node, >=1 MPI rank per GPU
#SBATCH --gres=gpu:4 #4 GPUs per node
#SBATCH --account=<account>                                                                   
#SBATCH --uenv=lammps/2024:v1

export MPICH_GPU_SUPPORT_ENABLED=1
 
ulimit -s unlimited
 
uenv view gpu

srun ./mps-wrapper.sh lmp -sf gpu -pk gpu 4 -in lj.in
  • A wrapper to control the CUDA MPS daemon, which allows several MPI ranks to share each GPU (oversubscription). Unlike with the KOKKOS package, oversubscription can yield some benefit with the GPU package.

mps-wrapper.sh
#!/bin/bash
# Example mps-wrapper.sh usage:
# > srun [srun args] mps-wrapper.sh [cmd] [cmd args]
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
# Launch MPS from a single rank per node
if [ $SLURM_LOCALID -eq 0 ]; then
    CUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-cuda-mps-control -d
fi
# Wait for MPS to start
sleep 5
# Run the command
"$@"
# Quit MPS control daemon before exiting
if [ $SLURM_LOCALID -eq 0 ]; then
    echo quit | nvidia-cuda-mps-control
fi
  • As well as a LAMMPS input file.

lj.in
# 3d Lennard-Jones melt
variable        x index 200
variable        y index 200
variable        z index 200
variable        t index 1000

variable        xx equal 1*$x
variable        yy equal 1*$y
variable        zz equal 1*$z

variable        interval equal $t/2

units           lj
atom_style      atomic

lattice         fcc 0.8442
region          box block 0 ${xx} 0 ${yy} 0 ${zz}
create_box      1 box
create_atoms    1 box
mass            1 1.0

velocity        all create 1.44 87287 loop geom

pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5

neighbor        0.3 bin
neigh_modify    delay 0 every 20 check no

fix             1 all nve

thermo          ${interval}
thermo_style custom step time  temp press pe ke etotal density
run_style       verlet
run             $t
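The submission script above starts 32 MPI ranks per node and passes 4 GPUs per node to the GPU package, so several ranks share each GPU through MPS. A small sketch of that arithmetic follows. The helper name ranks_per_gpu is hypothetical, and an even split of ranks across GPUs is assumed for simplicity.

```shell
#!/bin/bash
# Hypothetical helper: number of MPI ranks sharing each GPU on a node.
# Assumes the ranks divide evenly across the GPUs.
ranks_per_gpu() {
    local ranks_per_node="$1"
    local gpus_per_node="$2"
    if [ $(( ranks_per_node % gpus_per_node )) -ne 0 ]; then
        echo "ranks per node is not a multiple of GPUs per node" >&2
        return 1
    fi
    echo $(( ranks_per_node / gpus_per_node ))
}

# Values from launch.sh: --ntasks-per-node=32 and -pk gpu 4
ranks_per_gpu 32 4
```

The best ratio depends on the system being simulated, so it is worth timing a few values of --ntasks-per-node.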


Using the LAMMPS uenv as an upstream Spack instance

If you'd like to extend the existing uenv with additional packages (or your own), you can use the provided LAMMPS uenv to supply all dependencies needed to build your customization. See https://eth-cscs.github.io/alps-uenv/uenv-compilation-spack/ for more information.

First, set up an environment:

uenv start lammps/2024:v1

git clone -b v0.22.0 https://github.com/spack/spack.git
source spack/share/spack/setup-env.sh
export SPACK_SYSTEM_CONFIG_PATH=/user-environment/config/

Then create the path and file env/spack.yaml. We'll disable the KOKKOS package (and enable the GPU package via +cuda spec), and add the CG-SPICA package (via the +cg-spica spec) as an example. You can get the full list of options here: https://packages.spack.io/package.html?name=lammps

spack:
  specs:
  - lammps@20240417 ~kokkos +cuda cuda_arch=90 +python +extra-dump +cuda_mps +cg-spica
  packages:
    all:
      prefer:
        - +cuda cuda_arch=90
    mpi:
      require: cray-mpich +cuda
  view: true
  concretizer:
    unify: true

Then concretize and build (note, you will of course be using a different path):

spack -e /capstor/scratch/cscs/browning/SD-61924/env/ concretize -f
spack -e /capstor/scratch/cscs/browning/SD-61924/env/ install

During concretization, you'll notice a hash being printed alongside the lammps package name. Take note of this hash. If you now try to load lammps:

# naively try to load lammps
# it shows two versions installed (the one in the uenv, and the one we just built)
spack load lammps
==> Error: lammps matches multiple packages.
  Matching packages:
    rd2koe3 lammps@20240207.1%gcc@12.3.0 arch=linux-sles15-neoverse_v2
    zoo2p63 lammps@20240207.1%gcc@12.3.0 arch=linux-sles15-neoverse_v2
  Use a more specific spec (e.g., prepend '/' to the hash).
# use the hash that's listed in the output of the build
# and load using the hash
spack load /zoo2p63
# check the lmp executable:
which lmp
/capstor/scratch/cscs/browning/SD-61924/spack/opt/spack/linux-sles15-neoverse_v2/gcc-12.3.0/lammps-20240417-zoo2p63rzyuleogzn4a2h6yj7u3vhyy2/bin/lmp
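To confirm that the lmp found on the PATH belongs to the build you just installed, you can compare the hash in its install path with the hash noted during concretization. The following is a minimal sketch, assuming the name-version-hash directory layout visible in the output above. The function name spack_short_hash and the /path/to/spack prefix are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extract the 7-character short hash from a Spack install path.
# Assumes the <name>-<version>-<hash> directory layout seen in the output above.
spack_short_hash() {
    local exe_path="$1"
    local prefix="${exe_path%/bin/*}"   # drop the trailing /bin/<exe>
    local dir="${prefix##*/}"           # e.g. lammps-20240417-zoo2p63r...
    local hash="${dir##*-}"             # text after the last '-'
    printf '%.7s\n' "${hash}"
}

# In a session where the package is loaded, one could run:
#   spack_short_hash "$(which lmp)"
# Here a placeholder prefix (/path/to/spack) stands in for the real install root.
spack_short_hash "/path/to/spack/opt/spack/linux-sles15-neoverse_v2/gcc-12.3.0/lammps-20240417-zoo2p63rzyuleogzn4a2h6yj7u3vhyy2/bin/lmp"
```

If the printed value matches the hash passed to spack load, the executable comes from the new build rather than from the uenv.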

You should now see the CG-SPICA package in the list of installed packages:

> lmp -h
...
Installed packages:

CG-SPICA GPU KSPACE MANYBODY MOLECULE PYTHON RIGID

 

Scaling

Scaling tests were performed using a simple Lennard-Jones potential on 32M particles. Each GPU is assigned to a single MPI-rank, with GPU-direct enabled.
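The 32M figure is consistent with the example input files above, which fill a 200 x 200 x 200 block of fcc unit cells, each holding 4 atoms. The sketch below checks this arithmetic. It assumes the scaling runs used the same box size as the example inputs, which is not stated explicitly, and the helper name fcc_atom_count is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: atoms created by filling an nx x ny x nz block of fcc unit cells.
# A conventional fcc unit cell holds 4 atoms.
fcc_atom_count() {
    local nx="$1" ny="$2" nz="$3"
    echo $(( 4 * nx * ny * nz ))
}

# x, y and z index values from lj_kokkos.in and lj.in
fcc_atom_count 200 200 200
```

Changing the x, y and z variables in the input file scales the atom count in the same way, which is useful when sizing a problem for a given number of GPUs.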

Single Node Performance

lammps_single_node.png

Multiple Node Performance

lammps_multi_node.png

Piz Daint

Setup

You can see a list of the available versions of the program installed on the machine after loading the gpu or multicore modulefile. In the examples below we use the daint-gpu modulefile:

module load daint-gpu
module avail LAMMPS

The following module command will load the environment of the default version of the program:

module load LAMMPS

You can either type this command every time you intend to use the program within a new session, or you can automatically load it by including it in your shell configuration file.

The following module commands will print the environment variables set by loading the program and a help message:

module show LAMMPS
module help LAMMPS

How to Run

The following job script asks for 64 nodes, using 8 MPI tasks per node and 1 OpenMP thread per MPI task. If you use multiple MPI tasks per node, you need to set CRAY_CUDA_MPS=1 to enable the tasks to access the GPU device on each node at the same time.

#!/bin/bash -l
#
# LAMMPS on Piz Daint: 64 nodes, 8 MPI tasks per node, 1 OpenMP thread per task, no hyperthreading
#
#SBATCH --job-name=lammps
#SBATCH --time=01:00:00
#SBATCH --nodes=64
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread
#SBATCH --constraint=gpu
#SBATCH --account=<project>
#========================================
# load modules and run simulation
module load daint-gpu
module load LAMMPS
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export CRAY_CUDA_MPS=1
srun lmp_mpi -sf gpu -in input.in

Please replace the string <project> with the ID of the active project that will be charged for the allocation.

Please note that if you want to use the hybrid MPI+OpenMP version of LAMMPS, you need to load the daint-mc module and use the lmp_omp executable.

Scaling

We provide a LAMMPS scaling example simulating the NPT dynamics of a Lennard-Jones gas at 300 K and 1 bar pressure. The input file can be downloaded here.

For any given number of nodes, the best performance was achieved with 8 or 12 MPI tasks per node and one thread per task. In the following chart, we show the highest performance achieved for the given number of nodes. The ns/day of each job is retrieved from the performance section of the LAMMPS output file; the relative speed-up is computed taking the runtime with two nodes as a reference value. Measured against that two-node reference, the speed-up values in the table below correspond to a parallel efficiency of about 54% on 32 nodes and about 27% on 128 nodes in this example.

The scaling data are reported in the table below:

Nodes   ns/day   Speed-up
2       1.64     1.00
4       2.97     1.80
8       5.48     3.34
16      9.17     5.59
32      14.3     8.69
64      19.9     11.7
128     28.6     17.4
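One common definition of parallel efficiency is the speed-up divided by the increase in node count relative to the reference run. The following is a minimal sketch of that calculation applied to the values in the table above. The helper name parallel_efficiency is hypothetical, and other baselines or definitions give different numbers.

```shell
#!/bin/bash
# Hypothetical helper: parallel efficiency in percent relative to a reference node count.
# efficiency = 100 * speed-up / (nodes / reference nodes); the reference defaults to 2 nodes.
parallel_efficiency() {
    local nodes="$1" speedup="$2" ref_nodes="${3:-2}"
    awk -v n="${nodes}" -v s="${speedup}" -v r="${ref_nodes}" \
        'BEGIN { printf "%.1f\n", 100 * s / (n / r) }'
}

# Node counts and speed-up values copied from the table above
while read -r nodes speedup; do
    echo "${nodes} nodes: $(parallel_efficiency "${nodes}" "${speedup}")%"
done <<'EOF'
2 1.00
4 1.80
8 3.34
16 5.59
32 8.69
64 11.7
128 17.4
EOF
```

A threshold such as 50% efficiency can then be used to pick the largest node count that is still worth requesting for a given system size.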

Strong scaling results are plotted against ideal scaling as follows:

Further Documentation

LAMMPS Homepage

LAMMPS Online Manual