Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Wed Sep 20, 2023 8:36 am
by paulfons
I am trying to compile the GPU version of VASP 6.4.2 using the same makefile.include I used for 6.4.0. However, I have since updated my NVIDIA installation and am running into some problems that I would like advice on.

The CUDA compiler (nvcc) is version 12.2, while the hpc_sdk is now version 23.7.

Code: Select all

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
When I attempt to compile, the compiler flags specified in makefile.include are reported as errors. After numerous successful compilations, the errors pasted below occur. In particular, the compiler doesn't recognize the -Mfree option and suggests using -free instead. On the other hand, the option "-acc" is not recognized at all. Do you have any suggestions on how to fix this?

Code: Select all

mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -Mfree -Mbackslash -Mlarge_arrays -tp host -fast -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/include/qd -I/opt/fftw3/include  -c nccl2for.f90
mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -Mfree -Mbackslash -Mlarge_arrays -tp host -fast -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/include/qd -I/opt/fftw3/include  -c base.f90
gfortran: error: unrecognized debug output level ‘pu=cc60,cc70,cc80,cuda12.2’
gfortran: error: unrecognized command-line option ‘-acc’
gfortran: error: unrecognized command-line option ‘-Mfree’; did you mean ‘-free’?
gfortran: error: unrecognized command-line option ‘-Mbackslash’; did you mean ‘-fbackslash’?
gfortran: error: unrecognized command-line option ‘-Mlarge_arrays’
gfortran: error: unrecognized command-line option ‘-tp’; did you mean ‘-p’?
gfortran: error: unrecognized command-line option ‘-fast’; did you mean ‘-Ofast’?
mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -Mfree -Mbackslash -Mlarge_arrays -tp host -fast -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/include/qd -I/opt/fftw3/include  -c simd.f90
make[2]: *** [makefile:167: c2f_interface.o] Error 1
make[2]: *** Waiting for unfinished jobs....
gfortran: error: unrecognized debug output level ‘pu=cc60,cc70,cc80,cuda12.2’
gfortran: error: unrecognized command-line option ‘-acc’
gfortran: error: unrecognized command-line option ‘-Mfree’; did you mean ‘-free’?
gfortran: error: unrecognized command-line option ‘-Mbackslash’; did you mean ‘-fbackslash’?
gfortran: error: unrecognized command-line option ‘-Mlarge_arrays’
gfortran: error: unrecognized command-line option ‘-tp’; did you mean ‘-p’?
gfortran: error: unrecognized command-line option ‘-fast’; did you mean ‘-Ofast’?
make[2]: *** [makefile:167: nccl2for.o] Error 1
mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -Mfree -Mbackslash -Mlarge_arrays -tp host -fast -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/include/qd -I/opt/fftw3/include  -c base.f90
gfortran: error: unrecognized debug output level ‘pu=cc60,cc70,cc80,cuda12.2’
gfortran: error: unrecognized command-line option ‘-acc’
gfortran: error: unrecognized command-line option ‘-Mfree’; did you mean ‘-free’?
gfortran: error: unrecognized command-line option ‘-Mbackslash’; did you mean ‘-fbackslash’?
gfortran: error: unrecognized command-line option ‘-Mlarge_arrays’
gfortran: error: unrecognized command-line option ‘-tp’; did you mean ‘-p’?
gfortran: error: unrecognized command-line option ‘-fast’; did you mean ‘-Ofast’?
gfortran: error: unrecognized debug output level ‘pu=cc60,cc70,cc80,cuda12.2’
gfortran: error: unrecognized command-line option ‘-acc’
gfortran: error: unrecognized command-line option ‘-Mfree’; did you mean ‘-free’?
gfortran: error: unrecognized command-line option ‘-Mbackslash’; did you mean ‘-fbackslash’?
gfortran: error: unrecognized command-line option ‘-Mlarge_arrays’
gfortran: error: unrecognized command-line option ‘-tp’; did you mean ‘-p’?
gfortran: error: unrecognized command-line option ‘-fast’; did you mean ‘-Ofast’?
make[2]: *** [makefile:167: simd.o] Error 1
make[2]: *** [makefile:167: c2f_interface.o] Error 1
make[2]: *** Waiting for unfinished jobs....
gfortran: error: unrecognized debug output level ‘pu=cc60,cc70,cc80,cuda12.2’
gfortran: error: unrecognized command-line option ‘-acc’
gfortran: error: unrecognized command-line option ‘-Mfree’; did you mean ‘-free’?
gfortran: error: unrecognized command-line option ‘-Mbackslash’; did you mean ‘-fbackslash’?
gfortran: error: unrecognized command-line option ‘-Mlarge_arrays’
gfortran: error: unrecognized command-line option ‘-tp’; did you mean ‘-p’?
gfortran: error: unrecognized command-line option ‘-fast’; did you mean ‘-Ofast’?
make[2]: *** [makefile:167: nccl2for.o] Error 1
gfortran: error: unrecognized debug output level ‘pu=cc60,cc70,cc80,cuda12.2’
gfortran: error: unrecognized command-line option ‘-acc’
gfortran: error: unrecognized command-line option ‘-Mfree’; did you mean ‘-free’?
gfortran: error: unrecognized command-line option ‘-Mbackslash’; did you mean ‘-fbackslash’?
gfortran: error: unrecognized command-line option ‘-Mlarge_arrays’
gfortran: error: unrecognized command-line option ‘-tp’; did you mean ‘-p’?
gfortran: error: unrecognized command-line option ‘-fast’; did you mean ‘-Ofast’?
make[2]: *** [makefile:167: base.o] Error 1
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
gfortran: error: unrecognized debug output level ‘pu=cc60,cc70,cc80,cuda12.2’
gfortran: error: unrecognized command-line option ‘-acc’
gfortran: error: unrecognized command-line option ‘-Mfree’; did you mean ‘-free’?
gfortran: error: unrecognized command-line option ‘-Mbackslash’; did you mean ‘-fbackslash’?
gfortran: error: unrecognized command-line option ‘-Mlarge_arrays’
gfortran: error: unrecognized command-line option ‘-tp’; did you mean ‘-p’?
gfortran: error: unrecognized command-line option ‘-fast’; did you mean ‘-Ofast’?
make[2]: *** [makefile:167: simd.o] Error 1
cp: cannot stat ‘vasp’: No such file or directory
make[1]: *** [makefile:129: all] Error 1
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
make: *** [makefile:17: ncl] Error 2
gfortran: error: unrecognized debug output level ‘pu=cc60,cc70,cc80,cuda12.2’
gfortran: error: unrecognized command-line option ‘-acc’
gfortran: error: unrecognized command-line option ‘-Mfree’; did you mean ‘-free’?
gfortran: error: unrecognized command-line option ‘-Mbackslash’; did you mean ‘-fbackslash’?
gfortran: error: unrecognized command-line option ‘-Mlarge_arrays’
gfortran: error: unrecognized command-line option ‘-tp’; did you mean ‘-p’?
gfortran: error: unrecognized command-line option ‘-fast’; did you mean ‘-Ofast’?
make[2]: *** [makefile:167: base.o] Error 1
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
cp: cannot stat ‘vasp’: No such file or directory
make[1]: *** [makefile:129: all] Error 1
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
make: *** [makefile:17: gam] Error 2

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Wed Sep 20, 2023 11:47 am
by paulfons
I thought I should add my makefile.include for reference. Note that the part below the included text is the default from the supplied arch/makefile.include.nvhpc_acc.


Code: Select all

# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
              -DMPI -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dqd_emulate \
              -Dfock_dblbuf \
              -D_OPENACC \
              -DUSENCCL -DUSENCCLP2P

CPP         = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX)  > $*$(SUFFIX)

# N.B.: you might need to change the cuda-version here
#       to one that comes with your NVIDIA-HPC SDK
#FC          = mpif90 -fopenacc -gpu=cc60,cc70,cc80,cuda12.2
#FC          = mpif90 -fopenacc -gpu=cc60,cc70,cc80,cuda12.2
FCL         = mpif90 -fopenacc  -c++libs
FCL         = mpif90 -fopenacc  -c++libs

FREE        = -free

FFLAGS      = -fbackslash 

OFLAG       = -Ofast

DEBUG       = -ffree -O0 -traceback

OBJECTS     = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o

LLIBS       = -cudalib=cublas,cusolver,cufft,nccl -cuda

# Redefine the standard list of O1 and O2 objects
SOURCE_O1  := pade_fit.o
SOURCE_O2  := pead.o

# For what used to be vasp.5.lib
CPP_LIB     = $(CPP)
FC_LIB      = nvfortran
CC_LIB      = nvc -w
CFLAGS_LIB  = -O
FFLAGS_LIB  = -O1 -Mfixed
FREE_LIB    = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS    = nvc++ --no_warnings

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Fri Sep 22, 2023 12:14 pm
by merzuk.kaltak
Dear Paul,

So far we have compiled VASP successfully with hpc_sdk 23.5.
However, the following line in your error logs seems suspicious:

Code: Select all

gfortran: error: unrecognized command-line option ...
Could it be that mpif90 is the MPI wrapper for gfortran in your environment?
An easy way to find out is to run

Code: Select all

mpif90 --version
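As a rough sketch of that check, something along these lines reports which compiler the wrapper ends up calling. The function name wrapper_backend is made up for illustration, and the patterns are assumptions about the usual banner text printed by gfortran, nvfortran and the Intel compilers.

```shell
#!/bin/sh
# Hypothetical helper: classify the backend compiler from the banner that a
# wrapper prints for "--version". The patterns are assumptions about typical
# banner text, not an exhaustive list.
wrapper_backend() {
    case "$1" in
        *"GNU Fortran"*)         echo "gfortran" ;;
        *nvfortran*)             echo "nvfortran" ;;
        *ifort*|*ifx*|*Intel*)   echo "intel" ;;
        *)                       echo "unknown" ;;
    esac
}

# Only query the wrapper if it is actually on the PATH.
if command -v mpif90 >/dev/null 2>&1; then
    echo "mpif90 resolves to: $(command -v mpif90)"
    echo "backend compiler:   $(wrapper_backend "$(mpif90 --version 2>&1)")"
fi
```

For the GPU build, the second line should report nvfortran. Open MPI wrappers also accept --showme, and MPICH-derived ones -show, to print the full underlying command line.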

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Fri Sep 22, 2023 1:57 pm
by paulfons
Looking down at his feet and shuffling them. Indeed, mpif90 was pointing to the Intel MKL binary. I purged the Intel modules and started compiling again, but ran into what is likely a real compatibility problem between nvfortran and the SDK. Might you have any advice on what to do? The hpc_sdk is the latest version as installed by yum, while I just downloaded and installed the latest SDK from NVIDIA (23.7).

Code: Select all

mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.3 -Mfree -Mbackslash -Mlarge_arrays -tp host -fast -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/include/qd -I/opt/fftw3/include  -c base.f90
nvfortran-Error-A CUDA toolkit matching the current driver version (12.2) or a supported older version (11.0 or 12.3) was not installed with this HPC SDK.
make[2]: *** [makefile:167: c2f_interface.o] Error 1

The SDK is version 23.7 while the software compiler seems to be at 12.2.

Code: Select all

>ls -l /opt/nvidia/hpc_sdk/Linux_x86_64/
total 0
drwxr-xr-x  2 root root 102 Dec 14  2022 2022
lrwxrwxrwx  1 root root   4 Aug  8 22:27 2023 -> 23.7
drwxr-xr-x 10 root root 129 Dec 14  2022 22.11
drwxrwxr-x 10 root root 129 Mar  9  2022 22.2
drwxrwxr-x 10 root root 129 Mar 29  2022 22.3
drwxrwxr-x 10 root root 129 Jul  8  2022 22.5
drwxrwxr-x 10 root root 129 Aug  4  2022 22.7
drwxr-xr-x 10 root root 129 Mar  2  2023 23.1
drwxr-xr-x 10 root root 129 Jun 21 16:55 23.3
drwxr-xr-x 10 root root 129 Jun 21 17:02 23.5
drwxr-xr-x 10 root root 129 Aug  8 22:26 23.7
lrwxrwxrwx  1 root root   4 Sep 19 15:08 latest -> 23.7

Code: Select all

>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
:>/data/Software/Vasp/vasp.6.4.2>nvfortran --version

nvfortran 23.7-0 64-bit target on x86-64 Linux -tp icelake-server 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Mon Sep 25, 2023 7:35 am
by merzuk.kaltak
It seems you are trying to link against CUDA 12.3 by using the compiler option:

Code: Select all

-gpu=cc60,cc70,cc80,cuda12.3
Does lowering the linked CUDA version to, say, 12.2 (or even lower) help?
Usually, you can change the CUDA version as follows:

Code: Select all

-gpu=cc60,cc70,cc80,cuda12.2
Sometimes it also helps to raise the version of the nvidia driver.
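To see which cudaX.Y values are available on a given machine, a small sketch like the following lists the toolkit directories under the SDK's cuda/ folder. The function name bundled_cuda_versions is illustrative, and both the directory layout and the default path are assumptions based on the install location shown earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the CUDA toolkit versions found under an HPC SDK
# release directory, one per line (e.g. 11.8, 12.2).
# Assumes the layout <sdk_root>/cuda/<major.minor>/.
bundled_cuda_versions() {
    sdk_root="$1"
    for dir in "$sdk_root"/cuda/[0-9]*.[0-9]*; do
        # Skip the unexpanded glob when nothing matches.
        [ -d "$dir" ] && basename "$dir"
    done
    return 0
}

# Path taken from this thread; adjust it to your installation.
bundled_cuda_versions /opt/nvidia/hpc_sdk/Linux_x86_64/23.7
```

Each version printed should be usable in -gpu=...,cudaX.Y. For comparison, nvidia-smi reports the highest CUDA version that the installed driver supports.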

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Mon Sep 25, 2023 8:06 am
by paulfons
Thank you for the suggestion of trying 12.2. It seems to have gotten me a little further. Now the compiler crashes with a message I have never seen before: "error while writing intermediate language (4) file: File too large". The error points to line 99 of the header file new_allocator.h, which seems pretty innocuous:

Code: Select all

allocate(size_type __n, const void* = 0)
      {
        if (__n > this->max_size())
          std::__throw_bad_alloc();

        return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
      }
but it gives this error:

Code: Select all

"/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/ext/new_allocator.h", line 99: catastrophic error: error while writing intermediate language (4) file: File too large
        allocate(size_type __n, const void* = 0)

The compilation output is listed below. Any ideas of what to try next or what I might be doing incorrectly? Thanks for your time.

Code: Select all

vasp.6.4.2>make 
if [ ! -d build/std ] ; then mkdir -p build/std  ; fi
cp src/makefile src/.objects src/makedeps.awk makefile.include build/std 
make -C build/std VERSION=std check
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std'
exit 0
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std'
make -C build/std VERSION=std cleandependencies -j1
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std'
rm -f .depend
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std'
make -C build/std VERSION=std all
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std'
rsync -ru ../../src/lib .
cp makefile.include lib
make -C lib -j1
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std/lib'
make libdmy.a
make[3]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std/lib'
make[3]: 'libdmy.a' is up to date.
make[3]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std/lib'
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std/lib'
rsync -ru ../../src/parser .
cp makefile.include parser
make -C parser -j1
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std/parser'
make libparser.a
make[3]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std/parser'
nvc++ --no_warnings -D YY_parse_DEBUG=1 -c sites.cpp -o sites.o
"/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/ext/new_allocator.h", line 99: catastrophic error: error while writing intermediate language (4) file: File too large
        allocate(size_type __n, const void* = 0)
                                              ^

1 catastrophic error detected in the compilation of "sites.cpp".
Compilation terminated.
make[3]: *** [makefile:31: sites.o] Error 2
make[3]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std/parser'
make[2]: *** [makefile:12: all] Error 2
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std/parser'
make[1]: *** [makefile:146: parser] Error 2
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std'

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Mon Sep 25, 2023 8:22 am
by merzuk.kaltak
This seems to be a problem with the installed gcc version.
Note that NVIDIA's hpc_sdk requires gcc and c++ (and, I think, also gfortran).
The following output suggests that hpc_sdk uses c++-4.8.5:

Code: Select all

"/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/ext/new_allocator.h", line 99: catastrophic error: error while writing intermediate language (4) file: File too large
        allocate(size_type __n, const void* = 0)
My first guess is that the version is outdated. On our system we have only version 8 of the gcc compiler suite.
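To confirm which GCC the NVIDIA compilers pick up, a sketch along these lines may help. The helper name gcc_version_from_path is made up for illustration. The second half assumes that the SDK records its GCC configuration in a file called localrc next to the compiler binaries, which may not hold for every installation.

```shell
#!/bin/sh
# Hypothetical helper: extract the GCC version from a libstdc++ header path
# such as the one quoted in the error message.
gcc_version_from_path() {
    printf '%s\n' "$1" | sed -n 's|.*/include/c++/\([0-9][0-9.]*\)/.*|\1|p'
}

gcc_version_from_path \
    "/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/ext/new_allocator.h"

# Assumption: a "localrc" file sits next to nvc++ and lists the GCC settings
# chosen at install time. Print its GCC-related lines if the file exists.
if command -v nvc++ >/dev/null 2>&1; then
    rc="$(dirname "$(command -v nvc++)")/localrc"
    if [ -f "$rc" ]; then
        grep -i 'gcc' "$rc" || true
    fi
fi
```

If the version printed here is still the old system GCC after loading a newer toolchain, the SDK configuration itself most likely still points at the old one.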

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Mon Sep 25, 2023 8:47 am
by paulfons
Thank you for your observation. I switched to a more recent compiler using the software collection (scl) of CentOS:

Code: Select all

>source /opt/rh/devtoolset-11/enable
paulfons@kaon:/data/Software/Vasp/vasp.6.4.2>gcc --version
gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

paulfons@kaon:/data/Software/Vasp/vasp.6.4.2>gfortran --version
GNU Fortran (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and the same problem occurs. Do you have any further suggestions?

Code: Select all

kaon:/data/Software/Vasp/vasp.6.4.2>make veryclean
rm -rf build/std
rm -rf build/gam
rm -rf build/ncl
paulfons@kaon:/data/Software/Vasp/vasp.6.4.2>make -j 12
if [ ! -d build/std ] ; then mkdir -p build/std  ; fi
if [ ! -d build/gam ] ; then mkdir -p build/gam  ; fi
if [ ! -d build/ncl ] ; then mkdir -p build/ncl  ; fi
cp src/makefile src/.objects src/makedeps.awk makefile.include build/gam 
cp src/makefile src/.objects src/makedeps.awk makefile.include build/std 
cp src/makefile src/.objects src/makedeps.awk makefile.include build/ncl 
make -C build/gam VERSION=gam check
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
exit 0
make -C build/std VERSION=std check
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
make -C build/gam VERSION=gam cleandependencies -j1
make[1]: warning: -j1 forced in submake: resetting jobserver mode.
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std'
exit 0
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
rm -f .depend
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std'
make -C build/std VERSION=std cleandependencies -j1
make[1]: warning: -j1 forced in submake: resetting jobserver mode.
make -C build/ncl VERSION=ncl check
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std'
rm -f .depend
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
exit 0
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
make -C build/gam VERSION=gam all
make -C build/ncl VERSION=ncl cleandependencies -j1
make[1]: warning: -j1 forced in submake: resetting jobserver mode.
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
rsync -ru ../../src/lib .
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
rm -f .depend
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std'
make -C build/std VERSION=std all
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std'
rsync -ru ../../src/lib .
rsync -ru ../../src/parser .
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
make -C build/ncl VERSION=ncl all
rsync -ru ../../src/parser .
make[1]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
rsync -ru ../../src/lib .
rsync -u ../../src/*.F ../../src/*.inc .
rsync -ru ../../src/parser .
rsync -u ../../src/*.F ../../src/*.inc .
rsync -u ../../src/*.F ../../src/*.inc .
cp makefile.include parser
cp makefile.include parser
cp makefile.include parser
make -C parser -j1
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
make[2]: warning: -j1 forced in submake: resetting jobserver mode.
make libparser.a
make -C parser -j1
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std'
make[2]: warning: -j1 forced in submake: resetting jobserver mode.
make[3]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/gam/parser'
nvc++ --no_warnings -D YY_parse_DEBUG=1 -c sites.cpp -o sites.o
make libparser.a
make[3]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std/parser'
nvc++ --no_warnings -D YY_parse_DEBUG=1 -c sites.cpp -o sites.o
make -C parser -j1
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
make[2]: warning: -j1 forced in submake: resetting jobserver mode.
make libparser.a
make[3]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/ncl/parser'
nvc++ --no_warnings -D YY_parse_DEBUG=1 -c sites.cpp -o sites.o
cp makefile.include lib
cp makefile.include lib
cp makefile.include lib
make -C lib -j1
make -C lib -j1
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std'
make[2]: warning: -j1 forced in submake: resetting jobserver mode.
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
make[2]: warning: -j1 forced in submake: resetting jobserver mode.
make libdmy.a
make libdmy.a
make[3]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std/lib'
nvfortran -Mpreprocess -Mfree -Mextend -E preclib.F > preclib.f90
make[3]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/gam/lib'
nvfortran -Mpreprocess -Mfree -Mextend -E preclib.F > preclib.f90
make -C lib -j1
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
make[2]: warning: -j1 forced in submake: resetting jobserver mode.
make libdmy.a
make[3]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/ncl/lib'
nvfortran -Mpreprocess -Mfree -Mextend -E preclib.F > preclib.f90
"/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/ext/new_allocator.h", line 99: catastrophic error: error while writing intermediate language (4) file: File too large
        allocate(size_type __n, const void* = 0)
                                              ^

"/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/ext/new_allocator.h", line 99: catastrophic error: error while writing intermediate language (4) file: File too large
        allocate(size_type __n, const void* = 0)
                                              ^

nvfortran -O1 -Mfixed -Mfree -c -o preclib.o preclib.f90
nvfortran -O1 -Mfixed -Mfree -c -o preclib.o preclib.f90
"/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../include/c++/4.8.5/ext/new_allocator.h", line 99: catastrophic error: error while writing intermediate language (4) file: File too large
        allocate(size_type __n, const void* = 0)
                                              ^

1 catastrophic error detected in the compilation of "sites.cpp".
Compilation terminated.
make[3]: *** [makefile:31: sites.o] Error 2
make[3]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam/parser'
make[2]: *** [makefile:12: all] Error 2
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam/parser'
make[1]: *** [makefile:146: parser] Error 2
make[1]: *** Waiting for unfinished jobs....
1 catastrophic error detected in the compilation of "sites.cpp".
Compilation terminated.
nvfortran -O1 -Mfixed -Mfree -c -o preclib.o preclib.f90
make[3]: *** [makefile:31: sites.o] Error 2
make[3]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std/parser'
make[2]: *** [makefile:12: all] Error 2
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std/parser'
make[1]: *** [makefile:146: parser] Error 2
make[1]: *** Waiting for unfinished jobs....
1 catastrophic error detected in the compilation of "sites.cpp".
Compilation terminated.
make[3]: *** [makefile:31: sites.o] Error 2
make[3]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl/parser'
make[2]: *** [makefile:12: all] Error 2
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl/parser'
make[1]: *** [makefile:146: parser] Error 2
make[1]: *** Waiting for unfinished jobs....
nvc -w -O -c -o timing_.o timing_.c
nvc -w -O -c -o timing_.o timing_.c
nvc -w -O -c -o timing_.o timing_.c
nvc -w -O -c -o derrf_.o derrf_.c
nvc -w -O -c -o derrf_.o derrf_.c
nvc -w -O -c -o derrf_.o derrf_.c
nvc -w -O -c -o dclock_.o dclock_.c
nvc -w -O -c -o dclock_.o dclock_.c
nvc -w -O -c -o dclock_.o dclock_.c
nvfortran -Mpreprocess -Mfree -Mextend -E diolib.F > diolib.f90
nvfortran -Mpreprocess -Mfree -Mextend -E diolib.F > diolib.f90
nvfortran -Mpreprocess -Mfree -Mextend -E diolib.F > diolib.f90
nvfortran -O1 -Mfixed -Mfree -c -o diolib.o diolib.f90
nvfortran -O1 -Mfixed -Mfree -c -o diolib.o diolib.f90
nvfortran -O1 -Mfixed -Mfree -c -o diolib.o diolib.f90
nvfortran -Mpreprocess -Mfree -Mextend -E dlexlib.F > dlexlib.f90
nvfortran -Mpreprocess -Mfree -Mextend -E dlexlib.F > dlexlib.f90
nvfortran -Mpreprocess -Mfree -Mextend -E dlexlib.F > dlexlib.f90
nvfortran -O1 -Mfixed -Mfree -c -o dlexlib.o dlexlib.f90
nvfortran -O1 -Mfixed -Mfree -c -o dlexlib.o dlexlib.f90
nvfortran -O1 -Mfixed -Mfree -c -o dlexlib.o dlexlib.f90
nvfortran -Mpreprocess -Mfree -Mextend -E drdatab.F > drdatab.f90
nvfortran -Mpreprocess -Mfree -Mextend -E drdatab.F > drdatab.f90
nvfortran -Mpreprocess -Mfree -Mextend -E drdatab.F > drdatab.f90
nvfortran -O1 -Mfixed -Mfree -c -o drdatab.o drdatab.f90
nvfortran -O1 -Mfixed -Mfree -c -o drdatab.o drdatab.f90
nvfortran -O1 -Mfixed -Mfree -c -o drdatab.o drdatab.f90
make build_info
make build_info
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/std'
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
make build_info
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
nvfortran -O1 -Mfixed  -c linpack_double.f
nvfortran -O1 -Mfixed  -c linpack_double.f
nvfortran -O1 -Mfixed  -c linpack_double.f
printf "    character(len=*), parameter :: cpp_options = '&\n&-DHOST=\"LinuxNV\" &\n&-DMPI &\n&-DMPI_BLOCK=8000 &\n&-Duse_collective &\n&-DscaLAPACK &\n&-DCACHE_SIZE=4000 &\n&-Davoidalloc &\n&-Dvasp6 &\n&-Duse_bse_te &\n&-Dtbdyn &\n&-Dqd_emulate &\n&-Dfock_dblbuf &\n&-D_OPENACC &\n&-DUSENCCL &\n&-DUSENCCLP2P'\n" >  build_info.inc
printf "    character(len=*), parameter :: cpp_options = '&\n&-DHOST=\"LinuxNV\" &\n&-DMPI &\n&-DMPI_BLOCK=8000 &\n&-Duse_collective &\n&-DscaLAPACK &\n&-DCACHE_SIZE=4000 &\n&-Davoidalloc &\n&-Dvasp6 &\n&-Duse_bse_te &\n&-Dtbdyn &\n&-Dqd_emulate &\n&-Dfock_dblbuf &\n&-D_OPENACC &\n&-DUSENCCL &\n&-DUSENCCLP2P'\n" >  build_info.inc
printf "    character(len=*), parameter :: link_line   = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2 &\n&-c++libs &\n&-Llib &\n&-ldmy &\n&-Lparser &\n&-lparser &\n&-cudalib=cublas,cusolver,cufft,nccl &\n&-cuda &\n&-L/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/lib &\n&-lqdmod &\n&-lqd &\n&-Mscalapack &\n&-llapack &\n&-lblas &\n&-L/opt/fftw3/lib &\n&-lfftw3'\n"   >> build_info.inc
printf "    character(len=*), parameter :: link_line   = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2 &\n&-c++libs &\n&-Llib &\n&-ldmy &\n&-Lparser &\n&-lparser &\n&-cudalib=cublas,cusolver,cufft,nccl &\n&-cuda &\n&-L/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/lib &\n&-lqdmod &\n&-lqd &\n&-Mscalapack &\n&-llapack &\n&-lblas &\n&-L/opt/fftw3/lib &\n&-lfftw3'\n"   >> build_info.inc
printf "    character(len=*), parameter :: fc     = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2'\n"     >>  build_info.inc
printf "    character(len=*), parameter :: fc     = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2'\n"     >>  build_info.inc
printf "    character(len=*), parameter :: fcl    = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2 &\n&-c++libs'\n"    >> build_info.inc
printf "    character(len=*), parameter :: fcl    = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2 &\n&-c++libs'\n"    >> build_info.inc
printf "    character(len=*), parameter :: cpp_options = '&\n&-DHOST=\"LinuxNV\" &\n&-DMPI &\n&-DMPI_BLOCK=8000 &\n&-Duse_collective &\n&-DscaLAPACK &\n&-DCACHE_SIZE=4000 &\n&-Davoidalloc &\n&-Dvasp6 &\n&-Duse_bse_te &\n&-Dtbdyn &\n&-Dqd_emulate &\n&-Dfock_dblbuf &\n&-D_OPENACC &\n&-DUSENCCL &\n&-DUSENCCLP2P'\n" >  build_info.inc
printf "    character(len=*), parameter :: fflags = '&\n&-Mbackslash &\n&-Mlarge_arrays &\n&-tp &\n&host'\n" >> build_info.inc
printf "    character(len=*), parameter :: fflags = '&\n&-Mbackslash &\n&-Mlarge_arrays &\n&-tp &\n&host'\n" >> build_info.inc
printf "    character(len=*), parameter :: link_line   = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2 &\n&-c++libs &\n&-Llib &\n&-ldmy &\n&-Lparser &\n&-lparser &\n&-cudalib=cublas,cusolver,cufft,nccl &\n&-cuda &\n&-L/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/lib &\n&-lqdmod &\n&-lqd &\n&-Mscalapack &\n&-llapack &\n&-lblas &\n&-L/opt/fftw3/lib &\n&-lfftw3'\n"   >> build_info.inc
printf "    character(len=*), parameter :: llibs  = '&\n&-cudalib=cublas,cusolver,cufft,nccl &\n&-cuda &\n&-L/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/lib &\n&-lqdmod &\n&-lqd &\n&-Mscalapack &\n&-llapack &\n&-lblas &\n&-L/opt/fftw3/lib &\n&-lfftw3'\n"  >>  build_info.inc
printf "    character(len=*), parameter :: llibs  = '&\n&-cudalib=cublas,cusolver,cufft,nccl &\n&-cuda &\n&-L/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/lib &\n&-lqdmod &\n&-lqd &\n&-Mscalapack &\n&-llapack &\n&-lblas &\n&-L/opt/fftw3/lib &\n&-lfftw3'\n"  >>  build_info.inc
rm -f libdmy.a
printf "    character(len=*), parameter :: fc     = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2'\n"     >>  build_info.inc
printf "    character(len=*), parameter :: incs   = '&\n&-I/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/include/qd &\n&-I/opt/fftw3/include'\n"   >> build_info.inc
printf "    character(len=*), parameter :: incs   = '&\n&-I/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/include/qd &\n&-I/opt/fftw3/include'\n"   >> build_info.inc
printf "    character(len=*), parameter :: fcl    = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2 &\n&-c++libs'\n"    >> build_info.inc
rm -f libdmy.a
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std'
printf "    character(len=*), parameter :: fflags = '&\n&-Mbackslash &\n&-Mlarge_arrays &\n&-tp &\n&host'\n" >> build_info.inc
printf "    character(len=*), parameter :: llibs  = '&\n&-cudalib=cublas,cusolver,cufft,nccl &\n&-cuda &\n&-L/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/lib &\n&-lqdmod &\n&-lqd &\n&-Mscalapack &\n&-llapack &\n&-lblas &\n&-L/opt/fftw3/lib &\n&-lfftw3'\n"  >>  build_info.inc
ar vq libdmy.a preclib.o timing_.o derrf_.o dclock_.o diolib.o dlexlib.o drdatab.o linpack_double.o
ar: creating libdmy.a
rm -f libdmy.a
a - preclib.o
a - timing_.o
printf "    character(len=*), parameter :: incs   = '&\n&-I/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/include/qd &\n&-I/opt/fftw3/include'\n"   >> build_info.inc
a - derrf_.o
a - dclock_.o
a - diolib.o
a - dlexlib.o
a - drdatab.o
a - linpack_double.o
ar vq libdmy.a preclib.o timing_.o derrf_.o dclock_.o diolib.o dlexlib.o drdatab.o linpack_double.o
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
ar: creating libdmy.a
a - preclib.o
a - timing_.o
a - derrf_.o
a - dclock_.o
a - diolib.o
a - dlexlib.o
a - drdatab.o
a - linpack_double.o
make[3]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam/lib'
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam/lib'
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
make: *** [makefile:17: gam] Error 2
make: *** Waiting for unfinished jobs....
ar vq libdmy.a preclib.o timing_.o derrf_.o dclock_.o diolib.o dlexlib.o drdatab.o linpack_double.o
make[3]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std/lib'
ar: creating libdmy.a
a - preclib.o
a - timing_.o
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std/lib'
a - derrf_.o
a - dclock_.o
a - diolib.o
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std'
a - dlexlib.o
a - drdatab.o
a - linpack_double.o
make: *** [makefile:17: std] Error 2
make[3]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl/lib'
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl/lib'
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
make: *** [makefile:17: ncl] Error 2
paulfons@kaon:/data/Software/Vasp/vasp.6.4.2>nvfortran --version

nvfortran 23.7-0 64-bit target on x86-64 Linux -tp icelake-server 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Mon Sep 25, 2023 9:22 am
by merzuk.kaltak
It still seems that nvfortran is linked to gcc-4.8.5. My suggestion is to reinstall the SDK with the newer gcc version as the default.

This is most probably not the problem, but I noticed that you are trying to compile VASP in parallel with -j12 without setting DEPS>0. This will not work, because the dependency list is not generated. You would have to run this instead:

Code: Select all

make -j12 DEPS=1

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Wed Sep 27, 2023 7:09 am
by paulfons
Thank you for your comment about dependencies. As I mentioned before, I am using CentOS 7.9. I have installed the latest Nvidia hpc_sdk version (which also includes compatibility with 12.2, 11.8, 11.0). I have also used mamba to install the latest version of gcc (including gfortran).

Code: Select all

GNU Fortran (conda-forge gcc 13.2.0-2) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
and

Code: Select all

(gcc) paulfons@kaon:/data/Software/Vasp/vasp.6.4.2>gcc --version
gcc (conda-forge gcc 13.2.0-2) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The compilation proceeds until gfortran is invoked, at which point the flags specified in the makefile are rejected by gfortran. I will paste the last part of the output below. Is it necessary to update the flags in the makefile? I recall you mentioned compiling successfully against Nvidia 12.2.

Code: Select all

mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -Mfree -Mbackslash -Mlarge_arrays -tp host -fast -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/include/qd -I/opt/fftw3/include  -c nccl2for.f90
/opt/intel/oneapi/mpi/2021.8.0/bin/mpif90: line 752: [: : integer expression expected
mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -Mfree -Mbackslash -Mlarge_arrays -tp host -fast -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.7/compilers/extras/qd/include/qd -I/opt/fftw3/include  -c nccl2for.f90
/opt/intel/oneapi/mpi/2021.8.0/bin/mpif90: line 752: [: : integer expression expected
/opt/intel/oneapi/mpi/2021.8.0/bin/mpif90: line 752: [: : integer expression expected
gfortran: error: unrecognized debug output level 'pu=cc60,cc70,cc80,cuda12.2'
gfortran: error: unrecognized debug output level 'pu=cc60,cc70,cc80,cuda12.2'
gfortran: error: unrecognized debug output level 'pu=cc60,cc70,cc80,cuda12.2'
gfortran: error: unrecognized command-line option '-acc'
gfortran: error: unrecognized command-line option '-Mfree'; did you mean '-free'?
gfortran: error: unrecognized command-line option '-Mbackslash'; did you mean '-fbackslash'?
gfortran: error: unrecognized command-line option '-Mlarge_arrays'
gfortran: error: unrecognized command-line option '-tp'; did you mean '-p'?
gfortran: error: unrecognized command-line option '-fast'; did you mean '-Ofast'?
gfortran: error: unrecognized command-line option '-acc'
gfortran: error: unrecognized command-line option '-Mfree'; did you mean '-free'?
gfortran: error: unrecognized command-line option '-Mbackslash'; did you mean '-fbackslash'?
gfortran: error: unrecognized command-line option '-Mlarge_arrays'
gfortran: error: unrecognized command-line option '-tp'; did you mean '-p'?
gfortran: error: unrecognized command-line option '-fast'; did you mean '-Ofast'?
gfortran: error: unrecognized command-line option '-acc'
gfortran: error: unrecognized command-line option '-Mfree'; did you mean '-free'?
gfortran: error: unrecognized command-line option '-Mbackslash'; did you mean '-fbackslash'?
gfortran: error: unrecognized command-line option '-Mlarge_arrays'
gfortran: error: unrecognized command-line option '-tp'; did you mean '-p'?
gfortran: error: unrecognized command-line option '-fast'; did you mean '-Ofast'?
make[2]: *** [makefile:167: c2f_interface.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [makefile:167: c2f_interface.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/opt/intel/oneapi/mpi/2021.8.0/bin/mpif90: line 752: [: : integer expression expected
make[2]: *** [makefile:167: c2f_interface.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/opt/intel/oneapi/mpi/2021.8.0/bin/mpif90: line 752: [: : integer expression expected
/opt/intel/oneapi/mpi/2021.8.0/bin/mpif90: line 752: [: : integer expression expected
gfortran: error: unrecognized debug output level 'pu=cc60,cc70,cc80,cuda12.2'
gfortran: error: unrecognized debug output level 'pu=cc60,cc70,cc80,cuda12.2'
gfortran: error: unrecognized debug output level 'pu=cc60,cc70,cc80,cuda12.2'
gfortran: error: unrecognized command-line option '-acc'
gfortran: error: unrecognized command-line option '-Mfree'; did you mean '-free'?
gfortran: error: unrecognized command-line option '-Mbackslash'; did you mean '-fbackslash'?
gfortran: error: unrecognized command-line option '-Mlarge_arrays'
gfortran: error: unrecognized command-line option '-tp'; did you mean '-p'?
gfortran: error: unrecognized command-line option '-fast'; did you mean '-Ofast'?
make[2]: *** [makefile:167: nccl2for.o] Error 1
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
gfortran: error: unrecognized command-line option '-acc'
gfortran: error: unrecognized command-line option '-Mfree'; did you mean '-free'?
gfortran: error: unrecognized command-line option '-Mbackslash'; did you mean '-fbackslash'?
gfortran: error: unrecognized command-line option '-Mlarge_arrays'
gfortran: error: unrecognized command-line option '-tp'; did you mean '-p'?
gfortran: error: unrecognized command-line option '-fast'; did you mean '-Ofast'?
make[2]: *** [makefile:167: nccl2for.o] Error 1
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std'
gfortran: error: unrecognized command-line option '-acc'
gfortran: error: unrecognized command-line option '-Mfree'; did you mean '-free'?
gfortran: error: unrecognized command-line option '-Mbackslash'; did you mean '-fbackslash'?
gfortran: error: unrecognized command-line option '-Mlarge_arrays'
gfortran: error: unrecognized command-line option '-tp'; did you mean '-p'?
gfortran: error: unrecognized command-line option '-fast'; did you mean '-Ofast'?
make[2]: *** [makefile:167: nccl2for.o] Error 1
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
cp: cannot stat ‘vasp’: No such file or directory
make[1]: *** [makefile:129: all] Error 1
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/gam'
make: *** [makefile:17: gam] Error 2
make: *** Waiting for unfinished jobs....
cp: cannot stat ‘vasp’: No such file or directory
make[1]: *** [makefile:129: all] Error 1
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/std'
make: *** [makefile:17: std] Error 2
cp: cannot stat ‘vasp’: No such file or directory
make[1]: *** [makefile:129: all] Error 1
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2/build/ncl'
make: *** [makefile:17: ncl] Error 2

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Thu Sep 28, 2023 11:28 am
by juergen.furthmueller
Dear Paul,

You are still using the wrong environment (the wrong mpif90 version is calling the wrong compiler)! All the compiler options are intended for the nvfortran compiler that comes with the Nvidia HPC SDK and NOT for gfortran. The Nvidia compilers (related to the PGI compilers) are the only ones that can compile the GPU version of VASP (gfortran still cannot do this). Further, only the CUDA-aware (modified!) OpenMPI that comes with the Nvidia HPC SDK works together with the GPU. NONE of the standard MPIs that come with any Linux distribution can be used for this purpose, and neither can the Intel MPI that comes with Intel MKL/OneAPI!

For that reason, some modulefile(s) are installed with the Nvidia HPC SDK. The one you need is, using the definitions of the VASP makefile.include, the file "$NVHPC/modulefiles/nvhpc/$NVVERSION". You have to load this modulefile (as the last one!) to set up the necessary environment (binary search paths, library paths, etc.) so that the correct mpif90 version, the correct OpenMPI modified by Nvidia, etc. are used. Otherwise it will never work! Maybe check what the command "which mpif90" returns. It must be, in VASP makefile.include language, "$NVROOT/comm_libs/mpi/bin/mpif90"!

The only non-Nvidia software you may use later is the Intel MKL (but the *_intel_* and *_openmpi_* versions only!), and optionally also the Intel MKL FFT wrappers (to be compiled with the Intel compiler, or maybe also the PGI or the Nvidia compiler) as a (faster) replacement for the original FFTW3. And don't forget: any additional libraries you link in (e.g. wannier90) need to be at least compatible with the Intel (or PGI) compilers (not GNU), or should be compiled with the Intel or, better, the Nvidia (or PGI) compilers. With the correct environment (and compiler compatibilities) everything should then run smoothly ...
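
The check described above can be sketched in shell. This is only an illustration under stated assumptions: classify_wrapper is a hypothetical helper name (not part of VASP or the HPC SDK), and it simply looks for the strings "nvfortran" and "GNU Fortran" that appear in the version banners quoted in this thread.

```shell
#!/bin/sh
# classify_wrapper: hypothetical helper that inspects the text printed by
# `mpif90 --version` and names the compiler behind the wrapper.
classify_wrapper() {
    case "$1" in
        *nvfortran*)     echo "nvfortran" ;;  # banner of the Nvidia HPC SDK compiler
        *"GNU Fortran"*) echo "gfortran" ;;   # banner printed by gfortran
        *)               echo "unknown" ;;
    esac
}

# Only query a real wrapper if one is on PATH.
if command -v mpif90 >/dev/null 2>&1; then
    echo "mpif90 resolves to: $(command -v mpif90)"
    echo "backend compiler:   $(classify_wrapper "$(mpif90 --version 2>&1)")"
fi
```

If the reported backend is anything other than nvfortran, or the path is not the one under the HPC SDK, the environment is presumably still picking up a different MPI wrapper.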

Best regards and good success,
Jürgen Furthmüller

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Mon Oct 16, 2023 7:43 am
by paulfons
Dear Jürgen,

Thank you for your advice, and I apologize for being slow in getting back to you. Last week I switched the OS on my cluster from CentOS 7.9 to Rocky Linux 9.2 due to the impending end of support. I also installed the latest Nvidia hpc_sdk and Intel oneapi compilers. The problem I was having earlier, with the GNU compiler lurking behind my mpif90, was caused by the Intel initialization script. This is fixed. I am now using the Nvidia compiler.

Code: Select all

mpif90 --version

nvfortran 23.9-0 64-bit target on x86-64 Linux -tp icelake-server 
NVIDIA Compilers and Tools
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
Now make proceeds for quite a long time until it encounters the error below. Note that I specified cuda12.2 on the FC line. The compilation stops with the error "NVFORTRAN-S-0017-Unable to open include file: fftw3.f (wannier_interpol.F: 743)". There is no such file in the source tree. Do you have any idea what I might try to fix this?

Code: Select all

make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2_gpu/build/std'
printf "    character(len=*), parameter :: cpp_options = '&\n&-DHOST=\"LinuxNV\" &\n&-DMPI &\n&-DMPI_BLOCK=8000 &\n&-Duse_collective &\n&-DscaLAPACK &\n&-DCACHE_SIZE=4000 &\n&-Davoidalloc &\n&-Dvasp6 &\n&-Duse_bse_te &\n&-Dtbdyn &\n&-Dqd_emulate &\n&-Dfock_dblbuf &\n&-D_OPENACC &\n&-DUSENCCL &\n&-DUSENCCLP2P'\n" >  build_info.inc
printf "    character(len=*), parameter :: link_line   = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2 &\n&-c++libs &\n&-Llib &\n&-ldmy &\n&-Lparser &\n&-lparser &\n&-cudalib=cublas,cusolver,cufft,nccl &\n&-cuda &\n&-L/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/extras/qd/lib &\n&-lqdmod &\n&-lqd &\n&-Mscalapack &\n&-llapack &\n&-lblas &\n&-L/opt/fftw3/lib &\n&-lfftw3'\n"   >> build_info.inc
printf "    character(len=*), parameter :: fc     = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2'\n"     >>  build_info.inc
printf "    character(len=*), parameter :: fcl    = '&\n&mpif90 &\n&-acc &\n&-gpu=cc60,cc70,cc80,cuda12.2 &\n&-c++libs'\n"    >> build_info.inc
printf "    character(len=*), parameter :: fflags = '&\n&-Mbackslash &\n&-Mlarge_arrays &\n&-tp &\n&host'\n" >> build_info.inc
printf "    character(len=*), parameter :: llibs  = '&\n&-cudalib=cublas,cusolver,cufft,nccl &\n&-cuda &\n&-L/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/extras/qd/lib &\n&-lqdmod &\n&-lqd &\n&-Mscalapack &\n&-llapack &\n&-lblas &\n&-L/opt/fftw3/lib &\n&-lfftw3'\n"  >>  build_info.inc
printf "    character(len=*), parameter :: incs   = '&\n&-I/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/extras/qd/include/qd &\n&-I/opt/fftw3/include'\n"   >> build_info.inc
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2_gpu/build/std'
rm -f vasp ; make vasp ; cp vasp ../../bin/vasp_std
make[2]: Entering directory '/data/Software/Vasp/vasp.6.4.2_gpu/build/std'
mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -Mfree -Mbackslash -Mlarge_arrays -tp host -fast -I/opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/extras/qd/include/qd -I/opt/fftw3/include  -c wannier_interpol.f90
NVFORTRAN-S-0017-Unable to open include file: fftw3.f (wannier_interpol.F: 743)
  0 inform,   0 warnings,   1 severes, 0 fatal for fourier_interpol_ktor
make[2]: *** [makefile:167: wannier_interpol.o] Error 2
make[2]: Leaving directory '/data/Software/Vasp/vasp.6.4.2_gpu/build/std'
cp: cannot stat 'vasp': No such file or directory
make[1]: *** [makefile:129: all] Error 1
make[1]: Leaving directory '/data/Software/Vasp/vasp.6.4.2_gpu/build/std'
make: *** [makefile:17: std] Error 2

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Mon Oct 16, 2023 2:42 pm
by juergen.furthmueller
Dear Paul,

File "fftw3.f" is part of the FFTW3 installation. Please check whether FFTW3 is installed on your system at all, and whether it is really installed under "/opt/fftw3" or somewhere else, i.e., whether you need to adapt the search path (the given "-I/opt/fftw3/include") to the correct one. If this directory exists, it should contain the needed file; otherwise the FFTW3 installation is incomplete. If FFTW3 was compiled from the sources, a "make install" should put the file there, but then check the given installation path (and usually you need root permissions for the installation, of course)! If you have the sources, the file is by the way installed from the subdirectory "api" of the FFTW3 source tree. I hope you can find and fix it.
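
A quick way to perform this check is sketched below. It assumes the include path from the failing compile line ("/opt/fftw3/include"); has_fftw3_header and the FFTW_INC variable are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# has_fftw3_header: hypothetical helper; succeeds if directory $1 contains fftw3.f.
has_fftw3_header() {
    [ -f "$1/fftw3.f" ]
}

# Default to the -I path used on the failing compile line; override via FFTW_INC.
FFTW_INC="${FFTW_INC:-/opt/fftw3/include}"
if has_fftw3_header "$FFTW_INC"; then
    echo "found $FFTW_INC/fftw3.f"
else
    echo "missing $FFTW_INC/fftw3.f"
fi
```

If the file is reported missing, either the "-I" path in makefile.include points to the wrong place or the FFTW3 installation did not install its Fortran header.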

Best regards,
Jürgen Furthmüller

Re: Problems with 6.4.2 GPU version and hpc_sdk 23.7/nvcc 12.2

Posted: Wed Oct 18, 2023 5:03 am
by paulfons
Dear Jürgen,

I reinstalled fftw3 and everything linked OK. Thanks for your help.

Best wishes,
Paul