
Compilation problem on Intel Xeon E5472

Posted: Mon May 19, 2008 11:26 am
by srccmsu
Dear All,

I compiled the program with mpif90: cluster Intel Xeon E5472 / Linux x86-64 / mvapich -rev2368 (Infiniband) / Intel Compiler 10.1.008.

The executable file vasp is created. However, I get a bad result (this same example runs on another cluster without problems):
N E dE d eps ncg rms rms(c)

DAV: 1 0.252670748443E+01 0.25267E+01 -0.56691E+03 976 0.111E+03
DAV: 2 -0.201936552391E+02 -0.22720E+02 -0.21408E+02 1456 0.126E+02
DAV: 3 -0.203468874159E+02 -0.15323E+00 -0.15323E+00 1024 0.140E+01
DAV: 4 -0.203471728609E+02 -0.28545E-03 -0.28545E-03 1392 0.600E-01
DAV: 5 -0.203471730432E+02 -0.18236E-06 -0.18240E-06 968 0.766E-03
BRMIX: very serious problems
the old and the new charge density differ
old charge density: 8.08838 new 8.03933
0.569E+00

My Makefile for 4.6.31 version:

.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for Intel Fortran compiler for ITANIUM 2 systems
#
# The makefile was tested only under Linux on Intel platforms
#
#
# it might be required to change some of the library paths, since
# Linux installations vary a lot
# Hence check ***ALL**** options in this makefile very carefully
#-----------------------------------------------------------------------
#
# BLAS must be installed and properly functioning on the machine
# there are several options:
# 1) very slow but works:
# retrieve the lapackage from ftp.netlib.org
# and compile the blas routines (BLAS/SRC directory)
# please use g77 or f77 for the compilation. When I tried to
# use pgf77 or pgf90 for BLAS, VASP hung when calling
# ZHEEV (however this was with lapack 1.1; now I use lapack 2.0)
# 2) most desirable: get an optimized BLAS
#
# the two most reliable packages around are presently:
# 3a) Intels own optimised BLAS (PIII, P4, Itanium)
# http://developer.intel.com/software/products/mkl/
# this is really excellent when you use Intel CPU's
#
# 3b) or obtain the atlas based BLAS routines
# http://math-atlas.sourceforge.net/
# you certainly need atlas on the Athlon, since the mkl
# routines are not optimal on the Athlon.
# If you want to use atlas based BLAS, check the lines around LIB=
#
# 3c) a little bit faster than mkl and atlas (5 GFlops on Itanium 2, 1.3 GHz)
# Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
# libgoto_it2-r0.9.so seems to be buggy however !!
#
#-----------------------------------------------------------------------

# all CPP processed fortran files have the extension .f
SUFFIX=.f90

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
#FC=efc
#fortran linker
#FCL=$(FC)


#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
# CPP_ = /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
# CPP_ = /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
# SUSE X.X, maybe some Red Hat distributions:

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf charge density reduced in X direction
# wNGXhalf gamma point only reduced in X direction
# avoidalloc avoid ALLOCATE if possible
# IFC work around some IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (usually faster)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (faster on P4)
#-----------------------------------------------------------------------

#CPP = $(CPP_) -DHOST=\"LinuxEFC_mkl\" \
# -Dkind8 -DNGXhalf -DCACHE_SIZE=16000 -DPGF90 -Davoidalloc \
# -DRPROMU_DGEMV -DRACCMU_DGEMV -DNBLK_default=64 -Duse_cray_ptr

#-----------------------------------------------------------------------
# general fortran flags (there must be a trailing blank on this line)
# -cm suppress all comment messages
# -w95 suppress messages about use of non-standard Fortran
# -tpp2 optimize for ITANIUM2
# -ftz flush denormal results to 0
# -stack_temps save temporary dynamic arrays on the stack instead of the heap;
# this improves performance, but requires setting stack limits
# using ulimit or limit
#-----------------------------------------------------------------------

#FFLAGS = -FR -lowercase -cm -w95 -tpp2 -safe_cray_ptr -stack_temps
#FFLAGS = -FR -lowercase -cm -w95 -tpp2 -safe_cray_ptr
FFLAGS = -FR -lowercase -xT -cm -w95

#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# but -O3 seems to result in the best overall performance
# -ip inlining in file
# -ivdep_parallel
#-----------------------------------------------------------------------

# best (unroll0 improves performance slightly; without unroll0, compilation
# fails for a couple of files)
#OFLAG=-O3 -unroll0 -ivdep_parallel -fno-alias
OFLAG=-O2

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -O0
INLINE = $(OFLAG)


#-----------------------------------------------------------------------
# the following lines specify the position of BLAS and LAPACK
#-----------------------------------------------------------------------

# Atlas based libraries
#ATLASHOME= $(HOME)/archives/Linux_IA64Itan_2/lib/

# use specific libraries (default library path points to other libraries)
#BLAS= -L$(ATLASHOME) $(ATLASHOME)/libf77blas.a $(ATLASHOME)/libatlas.a

# use the mkl Intel libraries for Itanium (www.intel.com)
# set -DRPROMU_DGEMV -DRACCMU_DGEMV -DNBLK_default=64
# in the CPP lines
#BLAS=-L/opt/intel/mkl/9.0/lib/em64t/ -lguide
#BLAS=-L/opt/intel/mkl/9.0/lib/em64t/ -lmkl_ipf -lguide
#BLAS=-L/opt/intel/mkl/9.0/lib/em64t/ -lmkl_i2p -lmkl_vml_i2p -lguide

#BLAS=-L/opt/intel/mkl/9.0/lib/64/ -lguide

# Kazushige Goto's BLAS seem to be buggy as of r0.9
# please do not use it
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
#BLAS = /opt/libs/libgoto-it2/libgoto_it2-r0.95.so

# LAPACK, simplest use vasp.4.lib/lapack_double
LAPACK= ../vasp.4.lib/lapack_double.o
BLAS = ../vasp.4.lib/blas.o

# use atlas optimized part of lapack
#LAPACK= ../vasp.4.lib/lapack_atlas.o -llapack -lcblas

# use the mkl Intel lapack
#LAPACK= -L/opt/intel/mkl/9.0/lib/em64t/ -lmkl_lapack64
#LAPACK= -L/opt/intel/mkl/9.0/lib/em64t/ -lmkl_lapack

#-----------------------------------------------------------------------

LIB = -L../vasp.4.lib -ldmy \
../vasp.4.lib/linpack_double.o $(LAPACK) \
$(BLAS)

# options for linking (for compiler version 6.X) nothing is required
LINK =

#-----------------------------------------------------------------------
# fft libraries:
# VASP.4.6 can use fftw.30 (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend to use it
#-----------------------------------------------------------------------

#FFT3D = fft3dfurth.o fft3dlib.o
# fftw is a little bit slower than fft3dfurth on Itanium (cheers Juergen)
#FFT3D = fftw3d.o fft3dlib.o /opt/libs/fftw-3.0.1/lib/libfftw3.a


#=======================================================================
# MPI section, uncomment the following lines
#
# one comment for users of mpich or lam:
# You must *not* compile mpi with g77/f77, because f77/g77
# appends *two* underscores to symbols that contain already an
# underscore (i.e. MPI_SEND becomes mpi_send__). The ifc/efc
# compiler however appends only one underscore.
# Precompiled MPI versions usually do not work !!!
#
# lam-7.0.4 has been used and configured using
# ./configure -prefix /opt/libs/lam-7.0.4 -with-fc=efc --with-f77flags=-O \
# --without-romio
#
# please note that you might be able to use a lam or mpich version
# compiled with f77/g77, but then you need to add the following
# options: -Msecond_underscore (compilation) and -g77libs (linking)
#
# !!! Please do not send me any queries on how to install MPI, we will
# certainly not answer them !!!!
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for mpi: if you use LAM and compiled it with the options
# suggested above, you can use the following line
#-----------------------------------------------------------------------

FC=mpif90
#fortran linker
FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (usually slower on 100 Mbit Net)
#-----------------------------------------------------------------------

CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-Dkind8 -DNGZhalf -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=500 -DPROC_GROUP=8 \
-DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply uncomment the line SCA
#-----------------------------------------------------------------------

BLACS=$(HOME)/archives/SCALAPACK/BLACS/
SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK

SCA= $(SCA_)/libscalapack.a \
$(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a

SCA=

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

#LIB = -L../vasp.4.lib -ldmy \
# ../vasp.4.lib/linpack_double.o $(LAPACK) \
# $(SCA) $(BLAS)

# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
FFT3D = fftmpi.o fftmpi_map.o fft3dlib.o

# fftw.3.0 is slightly faster and should be used if available
#FFT3D = fftmpiw.o fftmpi_map.o fft3dlib.o /opt/libs/fftw-3.0/lib/libfftw3.a

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o setex.o radial.o \
pseudo.o mgrid.o mkpoints.o wave.o wave_mpi.o $(BASIC) \
nonl.o nonlr.o dfast.o choleski2.o \
mix.o charge.o xcgrad.o xcspin.o potex1.o potex2.o \
metagga.o constrmag.o pot.o cl_shift.o force.o dos.o elf.o \
tet.o hamil.o steep.o \
chain.o dyna.o relativistic.o LDApU.o sphpro.o paw.o us.o \
ebs.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
dipol.o xclib.o chgloc.o subrot.o optreal.o davidson.o \
edtest.o electron.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o \
elpol.o setlocalpp.o aedens.o

INC=

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
rm -f vasp
$(FCL) -o vasp $(LINK) main.o $(SOURCE) $(FFT3D) $(LIB)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
-rm -f *.f *.o *.L ; touch *.F

main.o: main$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the includes
# and MODULES: here are only the minimal basic dependencies;
# if one structure is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
$(CPP)
$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
$(CPP)
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
$(CPP)
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
$(CPP)
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
$(CPP)
$(SUFFIX).o:
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
#
# efc 7.1 seems to be reasonably buggy :)

fft3dfurth.o : fft3dfurth.F
$(CPP)
$(FC) $(FFLAGS) -O2 -c $*$(SUFFIX)

fftmpi.o : fftmpi.F
$(CPP)
$(FC) $(FFLAGS) -O2 -c $*$(SUFFIX)

symlib.o : symlib.F
$(CPP)
$(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)

symmetry.o : symmetry.F
$(CPP)
$(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)

dynbr.o : dynbr.F
$(CPP)
$(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)

broyden.o : broyden.F
$(CPP)
$(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)

us.o : us.F
$(CPP)
$(FC) $(FFLAGS) -O1 -c $*$(SUFFIX)

wave.o : wave.F
$(CPP)
$(FC) $(FFLAGS) -O0 -c $*$(SUFFIX)

LDApU.o : LDApU.F
$(CPP)
$(FC) $(FFLAGS) -O2 -c $*$(SUFFIX)

Please help me solve this problem.
Thank you in advance for your help,

srccmsu

Compilation problem on Intel Xeon E5472

Posted: Mon May 19, 2008 1:19 pm
by admin
please check whether the problem goes away if hyper-threading is switched off explicitly (set OMP_NUM_THREADS=1 in your shell rc file).
all reasonable parallelization is done explicitly in the VASP code; further 'automatic' parallelization by the compiler may lead to errors like the one you encountered.
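For example, a minimal sketch of the admin's suggestion (the MKL_NUM_THREADS line is an assumption that only applies if VASP is linked against MKL):

```shell
# Force single-threaded execution before launching VASP.
# Put these lines in the shell rc file or the cluster job script.
export OMP_NUM_THREADS=1   # turn off OpenMP/compiler auto-threading
export MKL_NUM_THREADS=1   # same for MKL threading, if MKL is linked in
```

The variables must be exported in the environment that actually launches the vasp processes (i.e. the batch job script), not only in an interactive login shell.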

Compilation problem on Intel Xeon E5472

Posted: Mon May 19, 2008 5:23 pm
by srccmsu
Dear Admin,

Thank you very much for your reply. However, the Intel Xeon E5472 cluster does not have hyper-threading. Can you help me understand what else might be causing the problem?

srccmsu

Compilation problem on Intel Xeon E5472

Posted: Fri May 23, 2008 1:04 pm
by thda0531

Compilation problem on Intel Xeon E5472

Posted: Wed May 28, 2008 5:42 pm
by srccmsu
Dear Thomas,

I'm very grateful to you for your help. VASP 4.6.34 now runs in both serial and parallel modes.

Yours sincerely,

srccmsu


[ Edited Wed May 28 2008, 07:44PM ]