
VASP 6.4.0 with OpenACC+MPI+OpenMP+MKL

Posted: Fri May 05, 2023 4:55 pm
by seananderson
Hi all,

I am having difficulty compiling the OpenACC GPU port of VASP 6.4.0 with the Intel MKL and OpenMP parallelism. I have been following the official instructions.

I compiled successfully with OpenACC+MPI and can use 4 MPI processes with 4 GPUs. The problem arises with the OpenMP threading, which should be enabled in order to use more of the CPU resources on the local node.

The documentation mentions that 'libiomp5.so' should be linked and references the relevant Makefile.include; however, that is not what happens when I try it. The '-mp' option links NVHPC's internal NVOMP runtime, not the Intel OpenMP 'libiomp5.so'. The correct 'libmkl_intel_thread.so' library is linked in, but no multi-threading happens, which is consistent behavior, since MKL's Intel threading layer expects 'libiomp5.so'.

If I force 'libiomp5.so' to be linked explicitly, the binary segfaults immediately when executed. I have seen some reports that the NVOMP and Intel OpenMP runtimes are incompatible, so this also seems to be consistent behavior.
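
For reference, a quick way to check which OpenMP runtime actually ends up in the binary (assuming the build produces 'bin/vasp_std') is:

Code:

# Show the OpenMP and MKL runtimes the executable links against
# (bin/vasp_std is the usual output path; adjust if yours differs)
ldd bin/vasp_std | grep -iE 'omp|mkl'
# With plain '-mp' from NVHPC this lists the NVOMP runtime (libnvomp.so)
# plus libmkl_intel_thread.so, but no libiomp5.so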

I am sure that there is some easy way to get the OpenMP threading working between the NVHPC toolchain and the Intel MKL, but I am stumped. Any advice is greatly appreciated! Below is my Makefile.include for reference, but it is nothing special.

Code:

# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
              -DMPI -DMPI_INPLACE -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dqd_emulate \
              -Dfock_dblbuf \
              -D_OPENMP \
              -D_OPENACC \
              -DUSENCCL -DUSENCCLP2P

CPP         = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX)  > $*$(SUFFIX)

# N.B.: you might need to change the cuda-version here
#       to one that comes with your NVIDIA-HPC SDK
FC          = mpif90 -acc -gpu=cc70,cc80,cuda11.7,cuda12.0 -mp
FCL         = mpif90 -acc -gpu=cc70,cc80,cuda11.7,cuda12.0 -mp -c++libs

FREE        = -Mfree

FFLAGS      = -Mbackslash -Mlarge_arrays

OFLAG       = -fast

DEBUG       = -Mfree -O0 -traceback

OBJECTS     = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o

LLIBS       = -cudalib=cublas,cusolver,cufft,nccl -cuda

# Redefine the standard list of O1 and O2 objects
SOURCE_O1  := pade_fit.o minimax_dependence.o
SOURCE_O2  := pead.o

# For what used to be vasp.5.lib
CPP_LIB     = $(CPP)
FC_LIB      = nvfortran
CC_LIB      = nvc -w
CFLAGS_LIB  = -O
FFLAGS_LIB  = -O1 -Mfixed
FREE_LIB    = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS    = nvc++ --no_warnings


# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -tp host
FFLAGS     += $(VASP_TARGET_CPU)

# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT      =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')

## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
#OFLAG_IN   = -fast -Mwarperf
#SOURCE_IN  := nonlr.o

# Software emulation of quadruple precision (mandatory)
QD         ?= $(NVROOT)/compilers/extras/qd
LLIBS      += -L$(QD)/lib -lqdmod -lqd
INCS       += -I$(QD)/include/qd

# Intel MKL for FFTW, BLAS, LAPACK, and scaLAPACK
MKLROOT    ?= /opt/intel/oneapi/mkl/2021.2.0
LLIBS_MKL   = -Mmkl -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64
INCS       += -I$(MKLROOT)/include/fftw

LLIBS      += $(LLIBS_MKL)

Re: VASP 6.4.0 with OpenACC+MPI+OpenMP+MKL

Posted: Mon May 08, 2023 8:04 am
by merzuk.kaltak
Dear Sean Anderson,

Could you please share the compiler and MKL versions you used?
I also recommend updating to VASP 6.4.1.

Re: VASP 6.4.0 with OpenACC+MPI+OpenMP+MKL

Posted: Tue Jul 18, 2023 2:56 pm
by seananderson
Dear Merzuk,

Sorry for the very delayed response; I was on paternity leave.

I am using the GCC 10.2.0 compiler, Intel oneAPI 2021.2 MKL, and the NVHPC 23.3 toolkit. I encountered the exact same issues with VASP 6.4.1, although the CPU-only build works perfectly.

Many thanks!

Re: VASP 6.4.0 with OpenACC+MPI+OpenMP+MKL

Posted: Thu Jul 20, 2023 11:59 am
by seananderson
It seems that my problem was not understanding the MPI options required to parallelize across GPUs and OpenMP threads. I was able to get the desired behavior by looking at the options in 'testsuite/ompi+omp.conf', so I think everything is working as expected.
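
For anyone who finds this later, the launch line ends up looking something like the sketch below, assuming one node with 4 GPUs and Open MPI. The rank and thread counts are illustrative; 'testsuite/ompi+omp.conf' remains the authoritative reference.

Code:

# Hybrid MPI+OpenMP launch on one node with 4 GPUs (illustrative values):
# 4 MPI ranks (one per GPU), 8 OpenMP threads per rank, pinned to cores
mpirun -np 4 --map-by ppr:4:node:PE=8 --bind-to core \
       -x OMP_NUM_THREADS=8 -x OMP_STACKSIZE=512m \
       bin/vasp_std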

However, I am curious about the use of 'libiomp' vs. the other OpenMP runtimes, as indicated in this particular part of the official documentation. In my builds, 'libiomp' is never linked automatically, and the binaries segfault if I force it to be linked. Will the threaded portions of the MKL run correctly without 'libiomp' (in particular, 'libmkl_intel_thread')? My concern is that the VASP code itself will run with OpenMP threads, but the MKL portions will not.
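
One diagnostic that might pin this down, assuming MKL's verbose mode behaves as documented, is to let MKL report the thread count it actually uses per call:

Code:

# MKL_VERBOSE=1 makes MKL print one line per BLAS/LAPACK/FFT call,
# including the thread count (NThr); compare runs with different OMP_NUM_THREADS
export MKL_VERBOSE=1
export OMP_NUM_THREADS=8
mpirun -np 1 bin/vasp_std | grep '^MKL_VERBOSE'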

Another question: has Intel MPI been tested with this GPU+NVHPC configuration? I wonder if it has the same level of CUDA-awareness as Open MPI.

Thanks!