The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.
VASP computes an approximate solution to the many-body Schrödinger equation, either within density functional theory (DFT), solving the Kohn-Sham equations, or within the Hartree-Fock (HF) approximation, solving the Roothaan equations. Hybrid functionals that mix the Hartree-Fock approach with density functional theory are implemented as well. Furthermore, Green's functions methods (GW quasiparticles, and ACFDT-RPA) and many-body perturbation theory (2nd-order Møller-Plesset) are available in VASP.
In VASP, central quantities, like the one-electron orbitals, the electronic charge density, and the local potential are expressed in plane wave basis sets. The interactions between the electrons and ions are described using norm-conserving or ultrasoft pseudopotentials, or the projector-augmented-wave method. To determine the electronic groundstate, VASP makes use of efficient iterative matrix diagonalisation techniques, like the residual minimisation method with direct inversion of the iterative subspace (RMM-DIIS) or blocked Davidson algorithms. These are coupled to highly efficient Broyden and Pulay density mixing schemes to speed up the self-consistency cycle.
Licensing Terms and Conditions
VASP is available only to users holding a valid VASP 6 (vasp6) license. Please refer to the VASP web site for more information.
Alps (GH200)
How to run
A precompiled user environment containing VASP with MPI, OpenMP, OpenACC, and Wannier90 support is available. Due to license restrictions, the VASP images are not directly accessible in the same way as other applications. To access the VASP uenv images, please follow the guide: Accessing Restricted Software
To load the VASP user environment:
uenv start vasp/v6.5.0:v1 --view=vasp
The vasp_std, vasp_ncl, and vasp_gam executables are now available for use.
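To quickly check that the view is active, you can verify that the executables are on the PATH, for example:
which vasp_std vasp_ncl vasp_gam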
Any SLURM script for more than one task must include export MPICH_GPU_SUPPORT_ENABLED=1, since VASP relies on GPU-aware MPI and the job will fail otherwise.
It is recommended to use the SLURM option --gpus-per-task=1, since VASP may otherwise fail to properly assign ranks to GPUs when running on more than one node. This is not required when using the CUDA MPS wrapper for oversubscription of GPUs.
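As an illustration, a minimal batch script combining these settings could look as follows; the node count, the OpenMP thread count, and the use of the uenv Slurm options (--uenv/--view) are assumptions to adapt to your own job:

#!/bin/bash -l
#SBATCH --job-name=vasp
#SBATCH --nodes=2                    # assumption: adjust to your job size
#SBATCH --ntasks-per-node=4          # one task per GH200 GPU
#SBATCH --gpus-per-task=1
#SBATCH --uenv=vasp/v6.5.0:v1        # restricted image, see Accessing Restricted Software
#SBATCH --view=vasp

export MPICH_GPU_SUPPORT_ENABLED=1   # required: VASP relies on GPU-aware MPI
export OMP_NUM_THREADS=8             # assumption: tune the OpenMP threads per task

srun vasp_std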
VASP uses GPU-aware MPI features and, optionally, NVIDIA NCCL for communication. NCCL provides improved communication, but it is disabled when using more than one task per GPU.
In some cases, using multiple tasks per GPU can be beneficial on the GH200 architecture, as it increases GPU utilization. This requires CUDA MPS, which currently has to be launched manually before the application.
The CUDA MPS wrapper can be found at: Oversubscription of GPU cards
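For orientation, such a wrapper typically follows the pattern sketched below; the file name mps-wrapper.sh is a placeholder, and the actual script from the page linked above should be used:

#!/bin/bash
# mps-wrapper.sh (sketch): start one MPS daemon per node, run the command, then shut MPS down
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
if [ "$SLURM_LOCALID" -eq 0 ]; then
    nvidia-cuda-mps-control -d        # start the MPS control daemon on this node
fi
sleep 1                               # give the daemon time to come up
"$@"                                  # run the actual application, e.g. vasp_std
if [ "$SLURM_LOCALID" -eq 0 ]; then
    echo quit | nvidia-cuda-mps-control
fi

It would then be used as, for example, srun ./mps-wrapper.sh vasp_std with more tasks per node than GPUs.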
Scaling
On GH200, VASP typically does not scale well to large numbers of nodes; however, the scaling behavior varies greatly between types of jobs.
The following shows the scaling of three different cases, where the runtime of a single task is used as the reference.
Using more than one task per GPU can provide a benefit on one or two GPUs, but there are (currently) performance issues at higher GPU counts, and in some cases VASP can hang during execution. Therefore, in most cases it is recommended to limit a job to one task per GPU when using a full node with four GPUs or more.
Building VASP from Source
To build VASP from source, the develop view must first be loaded:
uenv start <path_to_vasp_image>
uenv view develop
All required dependencies can now be found in /user-environment/env/develop. Note that libraries might not be found when executing vasp if the makefile does not include additional rpath linking options or LD_LIBRARY_PATH has not been extended.
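For example, if you rely on LD_LIBRARY_PATH rather than rpath linking, the view's library directories can be added like this before running vasp:
export LD_LIBRARY_PATH=/user-environment/env/develop/lib:/user-environment/env/develop/lib64:$LD_LIBRARY_PATH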
The following makefile.include is an example for VASP 6.4.3 on GH200 with the develop view loaded:
# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
              -DMPI -DMPI_INPLACE -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dqd_emulate \
              -Dfock_dblbuf \
              -D_OPENMP \
              -D_OPENACC \
              -DUSENCCL -DUSENCCLP2P

CPP         = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX) > $*$(SUFFIX)

CUDA_VERSION = $(shell nvcc -V | grep -E -o -m 1 "[0-9][0-9]\.[0-9]," | rev | cut -c 2- | rev)

CC          = mpicc  -acc -gpu=cc90,cuda${CUDA_VERSION} -mp
FC          = mpif90 -acc -gpu=cc90,cuda${CUDA_VERSION} -mp
FCL         = mpif90 -acc -gpu=cc90,cuda${CUDA_VERSION} -mp -c++libs

FREE        = -Mfree

FFLAGS      = -Mbackslash -Mlarge_arrays

OFLAG       = -fast

DEBUG       = -Mfree -O0 -traceback

OBJECTS     = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o

LLIBS       = -cudalib=cublas,cusolver,cufft,nccl -cuda

# Redefine the standard list of O1 and O2 objects
SOURCE_O1  := pade_fit.o minimax_dependence.o
SOURCE_O2  := pead.o

# For what used to be vasp.5.lib
CPP_LIB     = $(CPP)
FC_LIB      = $(FC)
CC_LIB      = $(CC)
CFLAGS_LIB  = -O -w
FFLAGS_LIB  = -O1 -Mfixed
FREE_LIB    = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS    = nvc++ --no_warnings

##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##

# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
#
# NOTE: Using "-tp neoverse-v2" causes some tests to fail. On GH200 architecture, "-tp host"
# is recommended.
VASP_TARGET_CPU ?= -tp host
FFLAGS     += $(VASP_TARGET_CPU)

# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT      =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')

# If the above fails, then NVROOT needs to be set manually
#NVHPC      ?= /opt/nvidia/hpc_sdk
#NVVERSION   = 21.11
#NVROOT      = $(NVHPC)/Linux_x86_64/$(NVVERSION)

## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
#OFLAG_IN   = -fast -Mwarperf
#SOURCE_IN  := nonlr.o

# Software emulation of quadruple precision (mandatory)
QD         ?= $(NVROOT)/compilers/extras/qd
LLIBS      += -L$(QD)/lib -lqdmod -lqd -Wl,-rpath,$(QD)/lib
INCS       += -I$(QD)/include/qd

# BLAS (mandatory)
BLAS        = -lnvpl_blas_lp64_gomp -lnvpl_blas_core

# LAPACK (mandatory)
LAPACK      = -lnvpl_lapack_lp64_gomp -lnvpl_lapack_core

# scaLAPACK (mandatory)
SCALAPACK   = -lscalapack

LLIBS      += $(SCALAPACK) $(LAPACK) $(BLAS) -Wl,-rpath,/user-environment/env/develop/lib -Wl,-rpath,/user-environment/env/develop/lib64 -Wl,--disable-new-dtags

# FFTW (mandatory)
FFTW_ROOT  ?= /user-environment/env/develop
LLIBS      += -L$(FFTW_ROOT)/lib -lfftw3 -lfftw3_omp
INCS       += -I$(FFTW_ROOT)/include

# Use cusolvermp (optional)
# supported as of NVHPC-SDK 24.1 (and needs CUDA-11.8)
#CPP_OPTIONS+= -DCUSOLVERMP -DCUBLASMP
#LLIBS      += -cudalib=cusolvermp,cublasmp -lnvhpcwrapcal

# HDF5-support (optional but strongly recommended)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT  ?= /user-environment/env/develop
LLIBS      += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS       += -I$(HDF5_ROOT)/include

# For the VASP-2-Wannier90 interface (optional)
CPP_OPTIONS    += -DVASP2WANNIER90
WANNIER90_ROOT ?= /user-environment/env/develop
LLIBS          += -L$(WANNIER90_ROOT)/lib -lwannier

# For the fftlib library (recommended)
#CPP_OPTIONS+= -Dsysv
#FCL        += fftlib.o
#CXX_FFTLIB  = nvc++ -mp --no_warnings -std=c++11 -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(FFTW_ROOT)/include
#LIBS       += fftlib
#LLIBS      += -ldl
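With this makefile.include placed in the root of the VASP source tree, a typical build of the three executables looks like the following; the parallel job count (-j8) is an assumption to adapt to the build machine:

make DEPS=1 -j8 std gam ncl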