How to install Intel MKL ScaLAPACK

#1 Post by **zhangyg** » Thu Mar 20, 2008 3:02 am

Dear All:

I have a cluster with quad-core Xeon processors on each node. The nodes are connected by InfiniBand, and the OS is Red Hat Linux (kernel 2.6.9, x86_64). I have successfully installed ifort 10, mvapich-1.0, and MKL 10, and VASP already runs smoothly with the MKL BLAS and LAPACK. Now I want to use the Intel MKL ScaLAPACK. Compiling went smoothly, but when I run VASP it gives a segmentation fault and a lot of unknowns for libc.so.6 and libpthread.so.0. Searching previous messages, I found that the head admin pointed out this may be due to an incompatibility. I have built everything, mvapich included, using Intel 10. It seems libc and libpthread come from the operating system, so I do not know what I can do about them. Could someone kindly provide a solution or remedy (or simply a modification of the makefile) for this problem, or tell me where to look?

In addition, could someone who has used ScaLAPACK on InfiniBand tell me the potential gain in performance and scalability? If it is not great, I will give up trying, as I can already run VASP with BLAS and LAPACK.

With great thanks.

yigang zhang

The following is my makefile (not working yet):
.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for Intel Fortran compiler for Opteron systems

# all CPP processed fortran files have the extension .f90
SUFFIX=.f90

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
#FC=ifort
# fortran linker
#FCL=$(FC)

#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
# CPP_ = /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
# CPP_ = /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
# SUSE X.X, maybe some Red Hat distributions:

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf charge density reduced in X direction
# wNGXhalf gamma point only reduced in X direction
# avoidalloc avoid ALLOCATE if possible
# IFC work around some IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
#-----------------------------------------------------------------------

#CPP = $(CPP_) -DHOST=\"LinuxIFC\" \
# -Dkind8 -DNGXhalf -DCACHE_SIZE=12000 -DPGF90 -Davoidalloc \
# -Duse_cray_ptr
## -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# general fortran flags (there must be a trailing blank on this line)
#-----------------------------------------------------------------------

FFLAGS = -FR -lowercase -assume byterecl

#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# -axK SSE1 optimization, but also generate code executable on all mach.
# xK improves performance somewhat on XP, and a is required in order
# to run the code on older Athlons as well
# -xW SSE2 optimization
# -axW SSE2 optimization, but also generate code executable on all mach.
# -axP
# -xT for quad-core Xeon
# -tpp6 P3 optimization
# -tpp7 P4 optimization
#-----------------------------------------------------------------------

#OFLAG=-O3 -xW
OFLAG=-O3 -xPT

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =

OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)

#-----------------------------------------------------------------------
# the following lines specify the position of BLAS and LAPACK
# on Opteron, you really need the libgoto library
#-----------------------------------------------------------------------

#myblas
#BLAS=-L/usr/lib64 -lblas
#-------------------------------------
# Atlas based libraries
#ATLASHOME= /zhang_jobs/ATLAS/Linux_UNKNOWNSSE2_8
#BLAS = -L$(ATLASHOME) -lf77blas -latlas

# use atlas optimized part of lapack
#LAPACK= ../vasp.4.lib/lapack_atlas.o -llapack -lcblas
#-----------------------------------------------
# LAPACK, simplest use vasp.4.lib/lapack_double
#LAPACK= ../vasp.4.lib/lapack_double.o
#-------------------------------------------------------------------
# use intel MKL lib
# need to set -DRPROMU_DGEMV & -DRACCMU_DGEMV
BLAS=
#BLAS = -L/opt/intel/mkl/10.0.1.014/lib/em64t \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_intel_lp64.a \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_intel_thread.a \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_core.a \
/opt/intel/mkl/10.0.1.014/lib/em64t/libguide.a
# -lmkl_em64t \
# -lguide -lpthread
# -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
# /opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_em64t.a
LAPACK =
#LAPACK = -L/opt/intel/mkl/10.0.1.014/lib/em64t \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_lapack.a
# -lmkl_lapack \
# -lmkl_em64t -lguide -lpthread

#--------------------------------------------------------------
#even faster Kazushige Goto's BLAS
#BLAS = /opt/libs/libgoto/libgoto_p4_512-r0.6.so
#BLAS = /GotoBLAS/libgoto.a -lpthread

#----------------------------------------------------------------
# location of scaLAPACK
# if you do not use scaLAPACK simply uncomment the line SCA
#----------------------------------------------------------------
#BLACS=$(HOME)/archives/SCALAPACK/BLACS/
#SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK

#SCA= $(SCA_)/libscalapack.a \
$(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a \
$(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a
#SCA=
SCA = -L/opt/intel/mkl/10.0.1.014/lib/em64t \
-lmkl_scalapack_lp64 \
-lmkl_blacs_lp64 \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_lapack.a \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_intel_lp64.a \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_intel_thread.a \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_core.a \
/opt/intel/mkl/10.0.1.014/lib/em64t/libguide.a \
-lpthread
# -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
# /opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_em64t.a \
# /opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_scalapack_lp64.a \
# /opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_blacs_lp64.a \
# /opt/intel/mkl/10.0.1.014/lib/em64t/libguide.a

#=======================================================================
# MPI section, uncomment the following lines
#-----------------------------------------------------------------------

FC=/usr/local/mvapich/bin/mpif90
FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (usually slower on 100 Mbit Net)
#-----------------------------------------------------------------------

CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-Dkind8 -DNGZhalf -DCACHE_SIZE=6000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=1000 \
-DwNGZhalf \
-Duse_cray_ptr \
-DRPROMU_DGEMV -DRACCMU_DGEMV \
-DscaLAPACK

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

LIB = -L../vasp.4.lib -ldmy \
../vasp.4.lib/linpack_double.o $(LAPACK) \
$(SCA) $(BLAS)
#-----------------------------------------------------------------------
# FFT libraries
#-----------------------------------------------------------------------
# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
FFT3D = fftmpi.o fftmpi_map.o fft3dlib.o

# fftw.3.0.1 is much faster on Opteron
#FFT3D = fftmpiw.o fftmpi_map.o fft3dlib.o /opt/libs/fftw-3.0.1/lib/libfftw3.a
#FFT3D = fftmpiw.o fftmpi_map.o fft3dlib.o \
# -I/usr/local/include \
# /usr/local/lib/libfftw3.a

# MKL wrapper
#FFT3D = fftmpiw.o fftmpi_map.o fft3dlib.o \
-I/opt/intel/mkl/10.0.1.014/include \
-I/opt/intel/mkl/10.0.1.014/include/fftw \
# ex1.f \
-L/opt/intel/mkl/10.0.1.014/lib/em64t \
-lfftw3xf_intel \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_intel_lp64.a \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_intel_thread.a \
/opt/intel/mkl/10.0.1.014/lib/em64t/libmkl_core.a \
/opt/intel/mkl/10.0.1.014/lib/em64t/libguide.a \
-lpthread -lm
# ex1

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o setex.o radial.o \
pseudo.o mgrid.o mkpoints.o wave.o wave_mpi.o $(BASIC) \
nonl.o nonlr.o dfast.o choleski2.o \
mix.o charge.o xcgrad.o xcspin.o potex1.o potex2.o \
metagga.o constrmag.o pot.o cl_shift.o force.o dos.o elf.o \
tet.o hamil.o steep.o \
chain.o dyna.o relativistic.o LDApU.o sphpro.o paw.o us.o \
ebs.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
dipol.o xclib.o chgloc.o subrot.o optreal.o davidson.o \
edtest.o electron.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o \
elpol.o setlocalpp.o aedens.o

INC=

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
rm -f vasp
$(FCL) -o vasp $(LINK) main.o $(SOURCE) $(FFT3D) $(LIB)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
-rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one structure is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
$(CPP)
$(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
$(CPP)
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
$(CPP)
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
$(CPP)
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
$(CPP)
$(SUFFIX).o:
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
# these special rules are cumulative (that is, once a file has failed
# in one compiler version, it stays in the list forever)
# -tpp5|6|7 P, PII-PIII, PIV
# -xW use SIMD (does not pay off on PII, since fft3d uses double prec)
# all other options do not affect the code performance since -O1 is used
#-----------------------------------------------------------------------

fft3dlib.o : fft3dlib.F
$(CPP)
$(FC) -FR -lowercase -O1 -tpp7 -xW -prefetch- -prev_div -unroll0 -vec_report3 -c $*$(SUFFIX)
fft3dfurth.o : fft3dfurth.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

radial.o : radial.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symlib.o : symlib.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

symmetry.o : symmetry.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

dynbr.o : dynbr.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

broyden.o : broyden.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)

us.o : us.F
$(CPP)
$(FC) -FR -lowercase -O1 -c $*$(SUFFIX)

wave.o : wave.F
$(CPP)
$(FC) -FR -lowercase -O0 -c $*$(SUFFIX)

LDApU.o : LDApU.F
$(CPP)
$(FC) -FR -lowercase -O2 -c $*$(SUFFIX)
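
One thing worth checking about the SCA line in the makefile above: BLACS sits directly on top of MPI, so the BLACS library picked up by -lmkl_blacs_lp64 has to be one built for the MPI in use (here mvapich). The thread does not say which BLACS variants this MKL release ships, so the sketch below only lists what is present in the library directory. The helper name `list_blacs_variants` is made up for illustration; the directory and the mpif90 path are the ones from the makefile.

```shell
#!/bin/sh
# Hypothetical helper: list every BLACS library found in an MKL lib directory,
# so the variant named on the link line can be compared with the MPI in use.
list_blacs_variants() {
    ls "$1" | grep -i 'blacs'
}

# Example calls (paths taken from the makefile; adjust to the local install):
#   list_blacs_variants /opt/intel/mkl/10.0.1.014/lib/em64t
#   /usr/local/mvapich/bin/mpif90 -show   # MPICH-style wrappers print the real compile/link line
```

If more than one variant shows up, the one linked into vasp should be the one intended for an MPICH-type MPI, since that is what mvapich is derived from; this is an assumption to verify against the MKL documentation for the installed release.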

#2 Post by **admin** » Mon Apr 07, 2008 6:51 am

It depends on which libpthread and libc you want to load. If they are the ones from your system installation and your compiler release is not the default compiler (release) of your system, there will be incompatibilities.
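
A minimal sketch of how to see which libc and libpthread the executable actually loads, assuming the binary is called ./vasp and sits in the current directory (an assumption, not stated in the thread). `filter_runtime_libs` is a hypothetical helper name; it only trims standard `ldd` output down to the C library, the thread library, MPI, MKL, and any unresolved entries.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines of `ldd` output relevant here:
# libc, libpthread, MPI libraries, MKL/libguide, and "not found" entries.
filter_runtime_libs() {
    grep -E 'libc\.so|libpthread\.so|libmpi|libmkl|libguide|not found'
}

# Example calls (binary name is an assumption):
#   ldd ./vasp | filter_runtime_libs   # which shared libraries get resolved, and from where
#   ldd --version | head -n 1          # C library release provided by the system
#   ifort -V                           # compiler release used for the build
```

Running the same check on a compute node as well as on the build node may help, since the paths that get resolved can differ between the two.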