vasp 5.2 parallel version running problem

Message

neo · #1 Post by **neo** » Fri Jan 14, 2011 7:07 pm

I have compiled vasp5.2 on ununtu 10.04 using intel 11.1 ifort compiler and mkl library. The serial version worked fine. I used mpich2 distributed by argon national labs for parallelizaion. The parallel version vasp got compiled but when i tried to run it i got following error. I tried to fix it by adding mpi libraries but couldn't do that? If somebody have any idea please reply ?

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libmpi.so.0 00002AD65FA90A66 Unknown Unknown Unknown
vasp 00000000009F711B Unknown Unknown Unknown
vasp 0000000000443316 Unknown Unknown Unknown
vasp 00000000004589A1 Unknown Unknown Unknown
vasp 00000000004065B4 Unknown Unknown Unknown
vasp 00000000004064EC Unknown Unknown Unknown
libc.so.6 00002AD6601B8C4D Unknown Unknown Unknown
vasp 00000000004063E9 Unknown Unknown Unknown

makefile is as below

.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for Intel Fortran compiler for ITANIUM 2 systems
#
# The makefile was tested only under Linux on Intel platforms
# the following compiler versions have been tested
# ifort 9.1.051
# ifort 10.0.023
#
# it might be required to change some library pathes, since
# LINUX installation vary a lot
# Hence check ***ALL**** options in this makefile very carefully
#-----------------------------------------------------------------------
#
# BLAS must be installed and properly functioning on the machine
# there are several options:
# 1) very slow but works:
# retrieve the lapackage from ftp.netlib.org
# and compile the blas routines (BLAS/SRC directory)
# please use g77 or f77 for the compilation. When I tried to
# use pgf77 or pgf90 for BLAS, VASP hang up when calling
# ZHEEV (however this was with lapack 1.1 now I use lapack 2.0)
# 2) most desirable: get an optimized BLAS
#
# the two most reliable packages around are presently:
# 2a) Intels own optimised BLAS (PIII, P4, Itanium)
# http://developer.intel.com/software/products/mkl/
# this is really excellent when you use Intel CPU's
#
# 2b) probably also possible
# Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
# http://www.tacc.utexas.edu/resources/software/
#
#-----------------------------------------------------------------------

# all CPP processed fortran files have the extension .f90
SUFFIX=.f90

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
FC=ifort
# fortran linker
FCL=$(FC)

#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
# CPP_ = /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
# CPP_ = /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
# SUSE X.X, maybe some Red Hat distributions:

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf charge density reduced in X direction
# wNGXhalf gamma point only reduced in X direction
# avoidalloc avoid ALLOCATE if possible
# PGF90 work around some for some PGF90 / IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
#-----------------------------------------------------------------------

CPP = $(CPP_) -DHOST=\"LinuxIFCmkl\" \
-Dkind8 -DCACHE_SIZE=16000 -DPGF90 -Davoidalloc -DNGXhalf \
# -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# general fortran flags (there must a trailing blank on this line)
# byterecl is strictly required for ifc, since otherwise
# the WAVECAR file becomes huge
# ftz works around a known Itanium bug
#-----------------------------------------------------------------------

FFLAGS = -FR -lowercase -assume byterecl -ftz

#-----------------------------------------------------------------------
# optimization
# -O3 seems best
#-----------------------------------------------------------------------

OFLAG=-O3
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)

#-----------------------------------------------------------------------
# the following lines specify the position of BLAS and LAPACK
#-----------------------------------------------------------------------

# use the mkl Intel libraries for Itanium (www.intel.com)
# set -DRPROMU_DGEMV -DRACCMU_DGEMV -DNBLK_default=64
# in the CPP lines
BLAS=-L/opt/intel/Compiler/11.0/083/mkl/lib/em64t -lmkl -lguide -lpthread

# not faster for VASP on Itanium Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
# parallel goto version requires sometimes -libverbs
#BLAS= /opt/libs/libgoto/libgoto.so

# LAPACK, simplest use vasp.5.lib/lapack_double
#LAPACK= ../vasp.5.lib/lapack_double.o

# use the mkl Intel lapack
LAPACK= -lmkl_lapack -lmkl

#-----------------------------------------------------------------------

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) \
$(BLAS)

# options for linking (for compiler version 6.X) nothing is required
LINK =

#-----------------------------------------------------------------------
# fft libraries:
# VASP.4.6 can use fftw.30 (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend to use it
#-----------------------------------------------------------------------

FFT3D = fft3dfurth.o fft3dlib.o

# alternatively: fftw.3.1.X is slighly faster and should be used if available
#FFT3D = fftw3d.o fft3dlib.o /opt/libs/fftw-3.1.2/lib/libfftw3.a

#=======================================================================
# MPI section
#
# the system we used is an SGI test system, and it is best
# to compile using ifort and adding the option -lmpi during
# linking
#=======================================================================

FC=mpif90
FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (usually slower on 100 Mbit Net)
#-----------------------------------------------------------------------

CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFCmkl\" -DIFC \
-Dkind8 -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf -DwNGZhalf \
-DMPI_BLOCK=8000 \
## -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply uncomment the line SCA
#-----------------------------------------------------------------------

#BLACS=$(HOME)/archives/SCALAPACK/BLACS/
#SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK

#SCA= $(SCA_)/libscalapack.a \
# $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a

SCA=

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) \
$(SCA) $(BLAS) \
-lmpi

# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
FFT3D = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o

# fftw.3.1.2 is slighly faster and should be used if available
#FFT3D = fftmpi.o fftmpi_map.o fftw3d.o fft3dlib.o /opt/libs/fftw-3.1.2/lib/libfftw3.a

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o xclib.o xclib_grad.o \
radial.o pseudo.o mgrid.o gridq.o ebs.o \
mkpoints.o wave.o wave_mpi.o wave_high.o \
$(BASIC) nonl.o nonlr.o nonl_high.o dfast.o choleski2.o \
mix.o hamil.o xcgrad.o xcspin.o potex1.o potex2.o \
metagga.o constrmag.o cl_shift.o relativistic.o LDApU.o \
paw_base.o egrad.o pawsym.o pawfock.o pawlhf.o paw.o \
mkpoints_full.o charge.o dipol.o pot.o \
dos.o elf.o tet.o tetweight.o hamil_rot.o \
steep.o chain.o dyna.o sphpro.o us.o core_rel.o \
aedens.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
chgloc.o fast_aug.o fock.o mkpoints_change.o sym_grad.o \
mymath.o internals.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
hamil_high.o nmr.o force.o \
pead.o subrot.o subrot_scf.o pwlhf.o gw_model.o optreal.o davidson.o \
electron.o rot.o electron_all.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o elpol.o \
hamil_lr.o rmm-diis_lr.o subrot_cluster.o subrot_lr.o \
lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o \
linear_optics.o linear_response.o \
setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
ump2.o bse.o acfdt.o chi.o sydmat.o

INC=

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
rm -f vasp
$(FCL) -o vasp main.o $(SOURCE) $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
-rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one strucuture is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
$(CPP)
$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
$(CPP)
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
$(CPP)
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
$(CPP)
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
$(CPP)
$(SUFFIX).o:
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
# these special rules are cummulative (that is once failed
# in one compiler version, stays in the list forever)
# performance penalities are small however

fft3dlib.o : fft3dlib.F
$(CPP)
$(FC) -FR -lowercase -O3 -ip -ftz -c $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

radial.o : radial.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

rot.o : rot.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

symlib.o : symlib.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

acfdt.o : acfdt.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

chi.o : chi.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)
poscar.o : poscar.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

chi_base.o : chi_base.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

symmetry.o : symmetry.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

pead.o : pead.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

dynbr.o : dynbr.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

electron_all.o : electron_all.F
$(CPP)
346,1-8 94% $(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

asa.o : asa.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

broyden.o : broyden.F
$(CPP)
$(FC) -FR -lowercase -O2 -ftz -c $*$(SUFFIX)

us.o : us.F
$(CPP)
$(FC) -FR -lowercase -O1 -ftz -c $*$(SUFFIX)

LDApU.o : LDApU.F
$(CPP)
$(FC) -FR -lowercase -O2 -ftz -c $*$(SUFFIX)

#2 Post by **admin** » Mon Jan 31, 2011 12:17 pm

your makefile looks ok, but please check whether all libraries that are requested ("ldd vasp" gives a list of all of them) are included in your $LD_LIBRARY_PATH during runtime, and whether the libraries are installed in a compatible way with your compiler.
please also make sure that
- you keep the serial and parallel executables in separate directories.
- your stack size is large enough (ulimit -s has to return unlimited)