
VASP 5.2 crashes on NPAR values -ne 4 ?

Posted: Fri Jul 09, 2010 10:20 am
by ariel
Hi, first post here...
I recently compiled VASP 5.2 successfully on our new Linux cluster, with the (much appreciated) help found in this forum.
(EM64T, MKL, Open MPI, everything updated to the latest version)

I seem to be facing a strange problem, though. When I run the medium benchmark with NPAR = 4, I get decent results, except for a crash in the test on only two nodes (16 CPUs). Whenever I change NPAR (e.g. NPAR = 1, 2 or 8), every run crashes or hangs, mostly with the following output (shown here from a build compiled specifically with traceback):

Code:

forrtl: severe (71): integer divide by zero
Image              PC                   Routine            Line        Source
libmpi.so.0        00002B99EF5724A4  Unknown               Unknown  Unknown
libmpi.so.0        00002B99EF57290D  Unknown               Unknown  Unknown
libmpi.so.0        00002B99EF547794  Unknown               Unknown  Unknown
libmpi_f77.so.0    00002B99EF2E1259  Unknown               Unknown  Unknown
vasp_trace         0000000000472209  mpimy_mp_m_divide         224  mpi.f90
vasp_trace         000000000047E72C  main_mpi_mp_init_         167  main_mpi.f90
vasp_trace         0000000000438E2D  MAIN__                    370  main.f90
vasp_trace         000000000043876C  Unknown               Unknown  Unknown
libc.so.6          00002B99F0544586  Unknown               Unknown  Unknown
vasp_trace         0000000000438669  Unknown               Unknown  Unknown

My programming skills are mediocre, and my grasp of MPI is appalling at best, but even so I can't help thinking that either something is wrong with our Open MPI configuration, or VASP somehow receives NPAR = 0 (which would make this a bug report). Other HPC software runs fine.
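To make the NPAR = 0 hypothesis concrete, here is a rough, hypothetical sketch of the arithmetic behind an NPAR split. It is not VASP's M_divide code; it only assumes that the ranks are arranged as an NPAR x (NCPU / NPAR) grid and that NPAR must divide NCPU evenly. The function name split_ranks is made up for illustration. Note also that the innermost frames of the traceback above are in libmpi.so, which suggests the division is actually performed inside the MPI library called from mpi.f90 line 224, not in VASP's own code.

```python
from typing import Tuple


def split_ranks(ncpu: int, npar: int) -> Tuple[int, int]:
    """Hypothetical sketch: arrange ncpu MPI ranks as an npar x (ncpu // npar) grid.

    Illustrates the general idea of NPAR (ranks split into npar groups);
    this is an assumption, not VASP's actual implementation.
    """
    if npar <= 0:
        # Without this guard, npar == 0 would reach ncpu // npar and raise
        # ZeroDivisionError, the Python analogue of "integer divide by zero".
        raise ValueError(f"NPAR must be positive, got {npar}")
    if npar > ncpu:
        raise ValueError(f"NPAR={npar} exceeds the {ncpu} available ranks")
    per_group = ncpu // npar
    if npar * per_group != ncpu:
        raise ValueError(f"{ncpu} ranks cannot be split evenly into {npar} groups")
    return npar, per_group


if __name__ == "__main__":
    ncpu = 16  # two 8-core nodes, as in the failing two-node test
    for npar in (0, 1, 2, 3, 4, 8, 16, 32):
        try:
            print(npar, "->", split_ranks(ncpu, npar))
        except ValueError as err:
            print(npar, "-> error:", err)
```

For 16 ranks the sketch accepts NPAR = 1, 2, 4, 8 and 16 and rejects 0, 3 and 32. Since 1, 2 and 8 divide 16 just as evenly as 4 does, a plain divisibility problem would not single out NPAR = 4; in this model only a zero NPAR reaches a zero divisor.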

What would be your take on the matter?

I believe it is customary to attach the makefile, so here are the relevant parts:

Code:

.SUFFIXES: .inc .f .f90 .F

# all CPP processed fortran files have the extension .f90
SUFFIX=.f90


CPP_ =  ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)


FFLAGS = -I/usr/local/intel/Compiler/11.1/072/mkl/include/fftw -FR -lowercase -assume byterecl -ftz -heap-arrays

OFLAG=-O3 -xSSE4.2

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG  = -FR -O0
INLINE = $(OFLAG)


NEWMKLPATH=/usr/local/intel/Compiler/11.1/072/mkl/lib/em64t

BLAS= -L$(NEWMKLPATH) -lmkl_intel_lp64 -lmkl_sequential  -lmkl_core -lpthread

LAPACK= -L$(NEWMKLPATH) -lmkl_lapack -lmkl_intel_lp64 -lmkl_sequential  -lmkl_core -lpthread

# options for linking, nothing is required (usually)
LINK    =

FC=mpif90 -traceback
FCL=$(FC)


CPP    = $(CPP_) -DMPI  -DHOST=\"LinuxIFCmkl\" -DIFC \
     -Dkind8 -DCACHE_SIZE=12000 -DPGF90 -Davoidalloc  -DNGZhalf \
     -DMPI_BLOCK=50000 -DRPROMU_DGEMV -DRACCMU_DGEMV -DscaLAPACK \
     -Duse_allreduce -Duse_collective

SCA= $(NEWMKLPATH)/libmkl_scalapack_lp64.a $(NEWMKLPATH)/libmkl_blacs_openmpi_lp64.a


LIB     = -L../vasp.5.lib -ldmy  \
      ../vasp.5.lib/linpack_double.o $(LAPACK) \
      $(SCA) $(BLAS)

FFT3D   = fftmpi.o fftmpi_map.o fftw3d.o fft3dlib.o $(HOME)/fftw3xfnew/libfftw3xf_intel.a

Thank you!


Posted: Mon Aug 02, 2010 11:51 am
by ariel
Since there have been no replies: does anyone know how to get more informative output out of VASP (some hidden flag, perhaps?) so that I can provide more details here?