
Problem of running VASP 4.6 on dual socket-six core nodes

Posted: Thu Feb 24, 2011 9:51 pm
by cosmos
Dear VASP administrator and users,

I compiled VASP 4.6.31 on our new machine, which has dual-socket, six-core nodes (12 cores per node), using Open MPI, the Intel ifort 12.0 compiler, and MKL 10.0.23. The compilation finished without errors. However, when I run the code in parallel, the calculation proceeds for a few minutes and then hangs with no further progress. I looked into the OUTCAR and found that the program reads the input and then seems to stop at the parallel 3D FFT setup. Here is the relevant part, copied from the OUTCAR:
......
......
k-point 1 : 0.0000 0.0000 0.0000 plane waves: 3511
k-point 2 : 0.2000 0.0000 0.0000 plane waves: 3569
k-point 3 : 0.4000 0.0000 0.0000 plane waves: 3598
k-point 4 : 0.4000 0.2000 0.0000 plane waves: 3609
k-point 5 : -.4000 0.2000 0.0000 plane waves: 3653

maximum and minimum number of plane-waves per node : 3653 3511

maximum number of plane-waves: 3653
maximal index in each direction:
IXMAX= 5 IYMAX= 4 IZMAX= 39
IXMIN= -5 IYMIN= -5 IZMIN=-39

WARNING: wrap around error must be expected set NGX to 22
NGY is ok and might be reduce to 20
NGZ is ok and might be reduce to 158

parallel 3dFFT wavefunction:
minimum data exchange during FFTs selected (reduces bandwidth)

(NO MORE OUTPUT after this in the OUTCAR)

I asked our administrator to run some tests, and he found that the program (with a simple test) "runs fine on up to 16 procs, on 18+ procs it deadlocks. Then it also runs fine at 32 procs. I looked at where it deadlocks, and, some processes are at an MPI_Allreduce statement and others are at MPI_Barrier. The deadlock is in a routine which divides the domain for the 3D FFT transform." So it seems that the number of processors has to be a power of 2, which means that some cores on our 12-core nodes would simply sit unused while the job is running.
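To make the cost of a power-of-2 restriction concrete, here is a minimal Python sketch of the arithmetic. It assumes that nodes are allocated whole and exclusively to the job and that each node has 12 cores, as described above; the function names are hypothetical and are not part of VASP or Open MPI.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def idle_cores(nprocs: int, cores_per_node: int = 12) -> int:
    """Cores left unused when nprocs ranks run on whole nodes.

    Assumption: the scheduler hands out complete nodes, so the job
    occupies ceil(nprocs / cores_per_node) nodes.
    """
    nodes = -(-nprocs // cores_per_node)  # ceiling division
    return nodes * cores_per_node - nprocs


def summarize(max_procs: int, cores_per_node: int = 12):
    """List (nprocs, nodes, idle cores) for every power-of-two rank count."""
    rows = []
    n = 1
    while n <= max_procs:
        nodes = -(-n // cores_per_node)
        rows.append((n, nodes, idle_cores(n, cores_per_node)))
        n *= 2
    return rows


if __name__ == "__main__":
    for nprocs, nodes, idle in summarize(32):
        print(f"{nprocs:3d} procs on {nodes} node(s): {idle} core(s) idle")
```

Under these assumptions, 16 processes occupy two nodes and leave 8 cores idle, and 32 processes occupy three nodes and leave 4 idle.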

So my question is: how can we circumvent this problem? Is there anything we can adjust in the Makefile or in the INCAR file?

Thanks in advance.

Posted: Fri Feb 25, 2011 7:34 am
by alex
Do you use MKL fft or some other?
There is also the possibility that you are using a threaded FFT, which is of no use in this application.

Cheers,

alex
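One quick way to rule out library threading is to inspect the usual thread-count environment variables in the job environment. The sketch below is only an illustration: it assumes that `OMP_NUM_THREADS` and `MKL_NUM_THREADS` are the relevant controls for this build, which the thread does not establish, and `threading_report` is a hypothetical helper name.

```python
import os

# Environment variables commonly used to control OpenMP and MKL threading.
# Assumption: these are the ones that matter for this particular build.
THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")


def threading_report(env=None):
    """Describe the thread-count settings found in an environment mapping."""
    env = os.environ if env is None else env
    report = {}
    for name in THREAD_VARS:
        value = env.get(name)
        if value is None:
            report[name] = "unset (library default applies)"
        elif value.strip() == "1":
            report[name] = "single-threaded"
        else:
            report[name] = f"set to {value!r}"
    return report


if __name__ == "__main__":
    for name, status in threading_report().items():
        print(f"{name}: {status}")
```

Running it inside the batch script, just before the MPI launch, shows what the VASP processes would inherit.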

Posted: Fri Feb 25, 2011 9:47 pm
by cosmos
Alex, thank you very much for your reply.

I didn't use the MKL FFT; I used the FFT routines that come with VASP. I didn't use a threaded FFT either.


[quote="alex"]Do you use MKL fft or some other?
There is also the possibility that you are using a threaded FFT, which is of no use in this application.

Cheers,

alex[/quote]