Parallel Vasp successfully compiled (AMD x86_64, 4 core, OpenMPI, Blas, Intel-Fortran-Comp.)

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.


the_big_guy
Newbie
Posts: 1
Joined: Sun Apr 25, 2010 9:09 pm

Parallel Vasp successfully compiled (AMD x86_64, 4 core, OpenMPI, Blas, Intel-Fortran-Comp.)

#1 Post by the_big_guy » Sun Apr 25, 2010 10:25 pm

I found Meister Krause's post on his successful build helpful, so I thought I would post the details of my successful build as well. Hopefully it will save someone the many, many hours it took me to get everything compiled.

system:
- 4 x AMD Opteron(tm) Processor 850
- Arch Linux (X86-64)
- Intel Fortran Compiler 11.1
- Gnu C/C++ Compiler 4.5.0
- OpenMPI 1.4.1
- Vasp 5.2

steps:
- install the Intel Fortran Compiler (ifort)
- install OpenMPI
- build the fftw 3.x Fortran wrapper library
- install the Blas and Lapack libraries
- build the VASP libraries
- build VASP

ifort
The Intel Fortran Compiler works well, but it requires lib32-gcc even though I am compiling the 64-bit version. Without it the installation fails without giving any error message, so it took me a while to figure out the problem.
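For Arch specifically, the 32-bit GCC runtime comes from the multilib repository; the exact package name below is my assumption and may differ between releases.

Code: Select all

# the multilib repository must be enabled in /etc/pacman.conf;
# the package name (lib32-gcc-libs) may vary between Arch releases
pacman -S lib32-gcc-libs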

OpenMPI
OpenMPI must be compiled with ifort as the Fortran compiler, because neither gfortran nor g95 seems able to compile VASP at the moment. The change is as simple as substituting 'ifort' for 'gfortran' or 'g95' in the build files. To keep my system organized I used the PKGBUILD available in the Arch Linux User Repository (AUR) and changed gfortran to ifort.
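If you prefer building OpenMPI straight from source rather than through a PKGBUILD, the equivalent step is selecting ifort at configure time. A minimal sketch, with an install prefix that is purely my assumption:

Code: Select all

# choose ifort for the Fortran wrappers; the prefix is just an example
./configure CC=gcc CXX=g++ F77=ifort FC=ifort --prefix=/usr/local/openmpi
make all
make install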

fftw 3.x
After many attempts at compiling VASP I stumbled onto this article on Intel's website. I followed all of their advice with one slight deviation: I do not have the Intel C compiler, so I used the GNU compiler instead.

Code: Select all

make libem64t compiler=gnu
The resulting library worked well for me. Note that the location assumed on Intel's website is "/opt/intel/mkl/10.2.0.013/interfaces/fftw3xf", but with the current compiler release the default location is actually "/opt/intel/Compiler/11.1/072/mkl/interfaces/fftw3xf"; you will have to find yours.
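Put together, the wrapper build looks roughly like the following; the path matches my installation and is only an example, so adjust it to wherever your MKL lives.

Code: Select all

# build the fftw3 Fortran wrapper shipped with MKL, using the GNU compiler
cd /opt/intel/Compiler/11.1/072/mkl/interfaces/fftw3xf
make libem64t compiler=gnu
# the result is libfftw3xf_gnu.a, referenced later in the FFT3D line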

Blas and Lapack
Not much to do here: I just used the Blas and Lapack packages from the regular Arch Linux repositories.
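On Arch that amounts to something like this (current package names; they may of course change):

Code: Select all

# reference Blas and Lapack from the standard repositories
pacman -S blas lapack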

VASP Libraries
I edited the FC line of the makefile.linux_efc_itanium file slightly to use OpenMPI in conjunction with ifort:

Code: Select all

.SUFFIXES: .inc .f .F

CPP     = gcc -E -P -C $*.F >$*.f
FC=mpiifort

CFLAGS = -O
FFLAGS = -O1 -FI
FREE   =  -FR

DOBJ =  preclib.o timing_.o derrf_.o dclock_.o  diolib.o dlexlib.o drdatab.o


#-----------------------------------------------------------------------
# general rules
#-----------------------------------------------------------------------

libdmy.a: $(DOBJ) lapack_double.o linpack_double.o lapack_atlas.o
	-rm libdmy.a
	ar vq libdmy.a $(DOBJ)

# files which do not require autodouble 
lapack_min.o: lapack_min.f
	$(FC) $(FFLAGS) $(NOFREE) -c lapack_min.f
lapack_double.o: lapack_double.f
	$(FC) $(FFLAGS) $(NOFREE) -c lapack_double.f
lapack_single.o: lapack_single.f
	$(FC) $(FFLAGS) $(NOFREE) -c lapack_single.f
lapack_atlas.o: lapack_atlas.f
	$(FC) $(FFLAGS) $(NOFREE) -c lapack_atlas.f
linpack_double.o: linpack_double.f
	$(FC) $(FFLAGS) $(NOFREE) -c linpack_double.f
linpack_single.o: linpack_single.f
	$(FC) $(FFLAGS) $(NOFREE) -c linpack_single.f

.c.o:
	$(CC) $(CFLAGS) -c $*.c
.F.o:
	$(CPP) 
	$(FC) $(FFLAGS) $(FREE) $(INCS) -c $*.f
.F.f:
	$(CPP) 
.f.o:
	$(FC) $(FFLAGS) $(FREE) $(INCS) -c $*.f
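
With the makefile edited, building the support library is just a matter of running make inside vasp.5.lib. A minimal sketch; copying the file to "makefile" is my own habit, you can equally pass it with -f:

Code: Select all

# build the VASP support library (produces libdmy.a)
cd vasp.5.lib
cp makefile.linux_efc_itanium makefile
make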

VASP
I once again edited the makefile.linux_efc_itanium file, this time the copy in the main VASP source directory, to suit my needs. The changes I made were:
- set the fortran compiler to mpiifort
- change the fortran flags line to match that on Intel's website as mentioned above
- add -heap-arrays to FFLAGS to avoid segfaults (as per this forum post)
- change the BLAS, LAPACK and FFT3D lines to search the proper location for the libraries they need
- uncomment a few lines in the mpi section

Code: Select all

.SUFFIXES: .inc .f .f90 .F
SUFFIX=.f90

FFLAGS =  -I/opt/intel/Compiler/11.1/072/mkl/include/fftw -FR -lowercase -assume byterecl -ftz -heap-arrays

#-----------------------------------------------------------------------
# optimization
# -O3 seems best
#-----------------------------------------------------------------------

OFLAG=-O3

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH = 
OBJ_NOOPT = 
DEBUG  = -FR -O0
INLINE = $(OFLAG)


#-----------------------------------------------------------------------
# the following lines specify the position of BLAS  and LAPACK
#-----------------------------------------------------------------------

BLAS= -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t/ -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

# use the mkl Intel lapack
LAPACK= -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t/ -lmkl_lapack -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

#-----------------------------------------------------------------------

LIB  = -L../vasp.5.lib -ldmy \
     ../vasp.5.lib/linpack_double.o $(LAPACK) \
     $(BLAS)

# options for linking (for compiler version 6.X) nothing is required
LINK    =  

#=======================================================================
# MPI section
#
# the system we used is an SGI test system, and it is best
# to compile using ifort and adding the option -lmpi during
# linking
#=======================================================================

FC=mpiifort
FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf               charge density   reduced in Z direction
# wNGZhalf              gamma point only reduced in Z direction
# scaLAPACK             use scaLAPACK (usually slower on 100 Mbit Net)
#-----------------------------------------------------------------------

CPP    = $(CPP_) -DMPI  -DHOST=\"LinuxIFCmkl\" -DIFC \
     -Dkind8 -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \
     -DMPI_BLOCK=8000 \
     -DRPROMU_DGEMV  -DRACCMU_DGEMV

SCA=

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

LIB     = -L../vasp.5.lib -ldmy  \
      ../vasp.5.lib/linpack_double.o $(LAPACK) \
      $(SCA) $(BLAS) \
      -lmpi

FFT3D   = fftmpi.o fftmpi_map.o fftw3d.o   fft3dlib.o  /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libfftw3xf_gnu.a

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC=   symmetry.o symlib.o   lattlib.o  random.o   

SOURCE=  base.o     mpi.o      smart_allocate.o      xml.o  \
         constant.o jacobi.o   main_mpi.o  scala.o   \
         asa.o      lattice.o  poscar.o   ini.o       xclib.o     xclib_grad.o \
         radial.o   pseudo.o   mgrid.o    gridq.o     ebs.o  \
         mkpoints.o wave.o     wave_mpi.o  wave_high.o  \
         $(BASIC)   nonl.o     nonlr.o    nonl_high.o dfast.o    choleski2.o \
         mix.o      hamil.o    xcgrad.o   xcspin.o    potex1.o   potex2.o  \
         metagga.o constrmag.o cl_shift.o relativistic.o LDApU.o \
         paw_base.o egrad.o    pawsym.o   pawfock.o  pawlhf.o    paw.o   \
         mkpoints_full.o       charge.o   dipol.o    pot.o  \
         dos.o      elf.o      tet.o      tetweight.o hamil_rot.o \
         steep.o    chain.o    dyna.o     sphpro.o    us.o  core_rel.o \
         aedens.o   wavpre.o   wavpre_noio.o broyden.o \
         dynbr.o    rmm-diis.o reader.o   writer.o   tutor.o xml_writer.o \
         brent.o    stufak.o   fileio.o   opergrid.o stepver.o  \
         chgloc.o   fast_aug.o fock.o     mkpoints_change.o sym_grad.o \
         mymath.o   internals.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
         hamil_high.o nmr.o    force.o \
         pead.o     subrot.o   subrot_scf.o pwlhf.o  gw_model.o optreal.o   davidson.o \
         electron.o rot.o  electron_all.o shm.o    pardens.o  paircorrection.o \
         optics.o   constr_cell_relax.o   stm.o    finite_diff.o elpol.o    \
         hamil_lr.o rmm-diis_lr.o  subrot_cluster.o subrot_lr.o \
         lr_helper.o hamil_lrf.o   elinear_response.o ilinear_response.o \
         linear_optics.o linear_response.o   \
         setlocalpp.o  wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
         ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
         ump2.o bse.o acfdt.o chi.o sydmat.o 

INC=

vasp: $(SOURCE) $(FFT3D) $(INC) main.o 
	rm -f vasp
	$(FCL) -o vasp main.o  $(SOURCE)   $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
	$(FCL) -o makeparam  $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
	$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
	$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB) 
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
	$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
	$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:	
	-rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F 
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one structure is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
	$(CPP)
	$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
	$(CPP)
	$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
	$(CPP)
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
	$(CPP)
$(SUFFIX).o:
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
# these special rules are cumulative (that is, once a file failed
#   in one compiler version, it stays in the list forever)
# performance penalties are small however


fft3dlib.o : fft3dlib.F
	$(CPP)
	$(FC) -FR -lowercase -O3 -ip -ftz -c $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

radial.o : radial.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

rot.o : rot.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

symlib.o : symlib.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

acfdt.o : acfdt.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

chi.o : chi.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)
poscar.o : poscar.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

chi_base.o : chi_base.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

symmetry.o : symmetry.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

pead.o : pead.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

dynbr.o : dynbr.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

electron_all.o : electron_all.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

asa.o : asa.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

broyden.o : broyden.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -ftz -c $*$(SUFFIX)

us.o : us.F
	$(CPP)
	$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

LDApU.o : LDApU.F
	$(CPP)
	$(FC) -FR -lowercase -O2 -ftz -c $*$(SUFFIX)
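
With the main makefile in place the final build is again a plain make, this time in the VASP source directory (the directory name below is from my setup and may differ in yours):

Code: Select all

# build the parallel vasp executable
cd vasp.5.2
cp makefile.linux_efc_itanium makefile
make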

To test the parallel build against the serial build I ran each of the Hands On Example files found here. Every example ran without errors, and the parallel build was faster than the serial build for each of them.
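
A typical run on my 4-core machine then looks like this; the executable path and core count are of course specific to my build.

Code: Select all

# run the parallel build on all 4 cores,
# from a directory containing the example input files
mpirun -np 4 /path/to/vasp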

Good Luck!
Last edited by the_big_guy on Sun Apr 25, 2010 10:25 pm, edited 1 time in total.

support_vasp
Global Moderator
Posts: 1817
Joined: Mon Nov 18, 2019 11:00 am

Re: Parallel Vasp successfully compiled (AMD x86_64, 4 core, OpenMPI, Blas, Intel-Fortran-Comp.)

#2 Post by support_vasp » Wed Sep 04, 2024 12:23 pm

Hi,

We're sorry that we didn’t answer your question. This does not live up to the quality of support that we aim to provide. The team has since expanded. If we can still help with your problem, please ask again in a new post, linking to this one, and we will answer as quickly as possible.

Best wishes,

VASP

