
The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.

VASP computes an approximate solution to the many-body Schrödinger equation, either within density functional theory (DFT), solving the Kohn-Sham equations, or within the Hartree-Fock (HF) approximation, solving the Roothaan equations. Hybrid functionals that mix the Hartree-Fock approach with density functional theory are implemented as well. Furthermore, Green's function methods (GW quasiparticles and ACFDT-RPA) and many-body perturbation theory (2nd-order Møller-Plesset) are available in VASP.

In VASP, central quantities, like the one-electron orbitals, the electronic charge density, and the local potential are expressed in plane wave basis sets. The interactions between the electrons and ions are described using norm-conserving or ultrasoft pseudopotentials, or the projector-augmented-wave method. To determine the electronic groundstate, VASP makes use of efficient iterative matrix diagonalisation techniques, like the residual minimisation method with direct inversion of the iterative subspace (RMM-DIIS) or blocked Davidson algorithms. These are coupled to highly efficient Broyden and Pulay density mixing schemes to speed up the self-consistency cycle.

Licensing Terms and Conditions

Users are kindly asked to obtain their own license. CSCS cannot provide free access to the code and must keep VASP Software GmbH informed with an updated list of users. Therefore, access to the precompiled VASP.6 executables and library files is available only to users who have already purchased a VASP.6 license and who, upon request, are added to the CSCS unix group vasp6. Please refer to the VASP web site for more information.


Alps (GH200)

How to run

A precompiled user environment containing VASP with MPI, OpenMP, OpenACC and Wannier90 support is available. Due to license restrictions, the VASP images are not directly accessible in the same way as other applications. A controlled access method has not yet been finalized. Please contact user support for access to VASP images.

To load the VASP user environment:

uenv start <path_to_vasp_image>
uenv view vasp

The vasp_std, vasp_ncl and vasp_gam executables are now available for use.
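
To verify that the view is active, you can check that the binaries are on the search path:

# optional sanity check: all three executables should resolve to the uenv image
which vasp_std vasp_ncl vasp_gam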

Any SLURM script running more than one task must include export MPICH_GPU_SUPPORT_ENABLED=1: VASP relies on GPU-aware MPI, and the job will fail otherwise.


VASP uses GPU-aware MPI and, optionally, NVIDIA NCCL for communication. NCCL improves communication performance, but it is disabled when more than one task per GPU is used.
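
For the default setup of one task per GPU on a full node with four GPUs, a minimal batch script is sketched below. The cpus-per-task value is only an example and should be adjusted to your input; depending on the configuration you may also want to pin each rank to one GPU (as mps-wrapper.sh below does via CUDA_VISIBLE_DEVICES).

#!/bin/bash -l
#SBATCH --job-name=<job_name>
#SBATCH --time=01:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4        # one MPI task per GPU on a node with four GPUs
#SBATCH --cpus-per-task=16         # example value, adjust to your case
#SBATCH --account=<account>

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MPICH_GPU_SUPPORT_ENABLED=1   # required, VASP uses GPU-aware MPI

srun --cpu-bind=socket vasp_std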

In some cases, using multiple tasks per GPU can be beneficial on the GH200 architecture, as it increases GPU utilization. This requires the use of CUDA MPS, which currently has to be launched manually before the application. The following is an example for running two tasks per GPU on a full node with four GPUs:


sbatch.sh
#!/bin/bash -l
#SBATCH --job-name=<job_name>
#SBATCH --time=01:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1                                                                        
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16
#SBATCH --account=<account>                                                                             
#SBATCH --hint=exclusive
  
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MPICH_GPU_SUPPORT_ENABLED=1
  
srun --cpu-bind=socket ./mps-wrapper.sh vasp_std


mps-wrapper.sh
#!/bin/bash
# Example mps-wrapper.sh usage:
# > srun [srun args] mps-wrapper.sh [cmd] [cmd args]
  
set -u
  
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
export CUDA_VISIBLE_DEVICES=$(( SLURM_LOCALID % 4 ))
# Launch MPS from a single rank per node
if [ $SLURM_LOCALID -eq 0 ]; then
    CUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-cuda-mps-control -d
fi
# Wait for MPS to start
sleep 5
# Run the command (without exec, so that the MPS daemon is shut down afterwards)
"$@"
  
# Quit MPS control daemon before exiting
if [ $SLURM_LOCALID -eq 0 ]; then
    echo quit | nvidia-cuda-mps-control
fi
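
Assuming both scripts are in the submission directory, make the wrapper executable before submitting the job:

chmod +x mps-wrapper.sh
sbatch sbatch.sh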

Scaling

On GH200, VASP typically does not scale well to a large number of nodes. However, scaling behavior varies greatly between types of jobs.

The following shows the scaling of three different cases, where the runtime of a single task is used as the reference.


Using more than one task per GPU can be beneficial on one or two GPUs, but there are (currently) performance issues at higher GPU counts, and in some cases VASP can hang during execution. Therefore, in most cases it is recommended to limit a job to one task per GPU when using a full node with four GPUs or more.

Building VASP from Source

To build VASP from source, the develop view must first be loaded:

uenv start <path_to_vasp_image>
uenv view develop


All required dependencies can now be found in /user-environment/env/develop. Note that shared libraries might not be found when executing vasp if the makefile does not include additional rpath linker options or LD_LIBRARY_PATH has not been extended.
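
As a sketch, assuming the binary was built against the libraries in the develop view, missing runtime libraries can be diagnosed and worked around as follows:

# list shared libraries that the dynamic linker cannot resolve
ldd bin/vasp_std | grep "not found"

# workaround if no rpath was embedded at link time: extend the search path
export LD_LIBRARY_PATH=/user-environment/env/develop/lib:/user-environment/env/develop/lib64:$LD_LIBRARY_PATH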

An example makefile.include for GH200:

makefile.include
# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
              -DMPI -DMPI_INPLACE -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dqd_emulate \
              -Dfock_dblbuf \
              -D_OPENMP \
              -D_OPENACC \
              -DUSENCCL -DUSENCCLP2P

CPP         = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX)  > $*$(SUFFIX)

# N.B.: you might need to change the cuda-version here
#       to one that comes with your NVIDIA-HPC SDK
FC          = mpif90 -acc -gpu=cc90,cuda12.2 -mp
FCL         = mpif90 -acc -gpu=cc90,cuda12.2 -mp -c++libs

FREE        = -Mfree

FFLAGS      = -Mbackslash -Mlarge_arrays

OFLAG       = -fast

DEBUG       = -Mfree -O0 -traceback

OBJECTS     = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o

LLIBS       = -cudalib=cublas,cusolver,cufft,nccl -cuda

# Redefine the standard list of O1 and O2 objects
SOURCE_O1  := pade_fit.o minimax_dependence.o
SOURCE_O2  := pead.o

# For what used to be vasp.5.lib
CPP_LIB     = $(CPP)
FC_LIB      = nvfortran
CC_LIB      = nvc -w
CFLAGS_LIB  = -O
FFLAGS_LIB  = -O1 -Mfixed
FREE_LIB    = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS    = nvc++ --no_warnings

##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
#
# NOTE: Using "-tp neoverse-v2" causes some tests to fail. On GH200 architecture, "-tp host"
# is recommended.
VASP_TARGET_CPU ?= -tp host
FFLAGS     += $(VASP_TARGET_CPU)

# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT      =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')

# If the above fails, then NVROOT needs to be set manually
#NVHPC      ?= /opt/nvidia/hpc_sdk
#NVVERSION   = 21.11
#NVROOT      = $(NVHPC)/Linux_x86_64/$(NVVERSION)

## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
#OFLAG_IN   = -fast -Mwarperf
#SOURCE_IN  := nonlr.o

# Software emulation of quadruple precision (mandatory)
QD         ?= $(NVROOT)/compilers/extras/qd
LLIBS      += -L$(QD)/lib -lqdmod -lqd -Wl,-rpath,$(QD)/lib
INCS       += -I$(QD)/include/qd

# BLAS (mandatory)
BLAS        = -lnvpl_blas_lp64_gomp -lnvpl_blas_core

# LAPACK (mandatory)
LAPACK      = -lnvpl_lapack_lp64_gomp -lnvpl_lapack_core

# scaLAPACK (mandatory)
SCALAPACK   = -lscalapack

LLIBS      += $(SCALAPACK) $(LAPACK) $(BLAS) -Wl,-rpath,/user-environment/env/develop/lib -Wl,-rpath,/user-environment/env/develop/lib64 -Wl,--disable-new-dtags

# FFTW (mandatory)
FFTW_ROOT  ?= /user-environment/env/develop
LLIBS      += -L$(FFTW_ROOT)/lib -lfftw3 -lfftw3_omp
INCS       += -I$(FFTW_ROOT)/include

# HDF5-support (optional but strongly recommended)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT  ?= /user-environment/env/develop
LLIBS      += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS       += -I$(HDF5_ROOT)/include

# For the VASP-2-Wannier90 interface (optional)
CPP_OPTIONS    += -DVASP2WANNIER90
WANNIER90_ROOT ?= /user-environment/env/develop
LLIBS          += -L$(WANNIER90_ROOT)/lib -lwannier

# For the fftlib library (recommended)
#CPP_OPTIONS+= -Dsysv
#FCL        += fftlib.o
#CXX_FFTLIB  = nvc++ -mp --no_warnings -std=c++11 -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(FFTW_ROOT)/include
#LIBS       += fftlib
#LLIBS      += -ldl
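
With this makefile.include placed in the root of the VASP source tree, the executables can be built as usual; in recent VASP.6 releases parallel builds require DEPS=1 (the job count below is an arbitrary choice):

# builds vasp_std, vasp_gam and vasp_ncl into the bin/ subdirectory
make DEPS=1 -j8 all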


Piz Daint

Setup

You can see a list of the available versions of the program installed on the machine after loading the gpu or multicore modulefile. In the examples below we use the daint-gpu modulefile:

module load daint-gpu
module avail VASP

The following module command will load the environment of the default version of the program:

module load VASP

You can either type this command every time you intend to use the program within a new session, or you can automatically load it by including it in your shell configuration file.
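
For example, with bash as login shell the following lines can be appended to ~/.bashrc so that the program is loaded in every new session:

# load the GPU software stack and the default VASP module at login
module load daint-gpu
module load VASP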

The following module commands will print the environment variables set by loading the program and a help message:

module show VASP 
module help VASP  

How to Run on Piz Daint

The following job script asks for 16 nodes, using 1 MPI task per node, since the OpenACC version of VASP is currently limited to one MPI rank per GPU.


#!/bin/bash -l 
#
# VASP on Piz Daint: 16 nodes, 1 MPI task per node, 1 OpenMP thread per task 
# 
#SBATCH --job-name=vasp 
#SBATCH --time=00:30:00 
#SBATCH --nodes=16 
#SBATCH --ntasks-per-node=1 
#SBATCH --cpus-per-task=12 
#SBATCH --constraint=gpu 
#SBATCH --account=<project> 
#======================================== 
# load modules and run simulation 
module load daint-gpu 
module load VASP 
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK 
ulimit -s unlimited 
srun vasp_std 

Please replace the string <project> with the ID of the active project that will be charged for the allocation.
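
The job can then be submitted and monitored as usual (run_vasp.sh stands for whatever file name you saved the script under):

sbatch run_vasp.sh
squeue --user=$USER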

Simulations running on Piz Daint GPU nodes with the VASP OpenACC release built with PrgEnv-nvidia might print a warning from libibverbs referring to libibgni (/usr/lib64/libibgni.so.1: undefined symbol: verbs_uninit_context) and rdma-core/22.3-7.0.2.1_2.44__g42f5f32b.ari. The warning does not affect running simulations; it is due to the Cray Programming Environment and has already been reported to HPE support.

Scaling on Piz Daint

We provide a VASP scaling example, simulating the geometry optimization of CeO2 ions. Input files are provided by Peter Larsson's VASP test suite:

INCAR
KPOINTS
POSCAR
POTCAR

We run the scaling jobs with the constraint gpu on the Cray XC50, using 1 MPI task per node. The wall time of each job is retrieved from the total CPU time reported in the VASP output file (grep "Total CPU" OUTCAR); the relative speed-up is computed taking the longest runtime as the reference value. With this small example, parallel efficiency already drops to about 50% on 4 nodes.
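
A minimal sketch of how these timings can be collected, assuming the runs are kept in hypothetical directories named run_1, run_2, ... run_16:

# print the "Total CPU time" reported by VASP for each node count
for n in 1 2 4 8 16; do
    t=$(grep "Total CPU" run_${n}/OUTCAR | awk '{print $NF}')
    echo "${n} nodes: ${t} s"
done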

The scaling data are reported in the table below:

Nodes   Wall time (s)   Speed-up
1       141             1.00
2       110             1.28
4       78              1.81
8       62              2.27
16      66              3.53

Strong scaling results are plotted against ideal scaling as follows:

Further Documentation

VASP Homepage

VASP User Guide
