
Fatal error in MPI_Allreduce: Other MPI error, error stack

Posted: Sun Apr 23, 2023 3:33 am
by guanzhi_li
Hello everyone,

I'm hoping to get some help with an error message when running VASP.

I was doing a structural relaxation of a magnetic molecule in its crystal structure. With ALGO = Normal the calculation ran without errors but was hard to converge, so I switched to ALGO = All and then got the error message below.
I was running VASP 5.4.4 on NERSC Cori with the following modules loaded:

Code:

Currently Loaded Modulefiles:
  1) modules/3.2.11.4                                  9) pmi/5.0.17                                       17) atp/3.14.9
  2) darshan/3.4.0                                    10) dmapp/7.1.1-7.0.3.1_3.44__g93a7e9f.ari           18) perftools-base/21.12.0
  3) craype-network-aries                             11) gni-headers/5.0.12.0-7.0.3.1_3.27__gd0d73fe.ari  19) PrgEnv-intel/6.0.10
  4) intel/19.1.2.254                                 12) xpmem/2.2.27-7.0.3.1_3.28__gada73ac.ari          20) craype-haswell
  5) craype/2.7.10                                    13) job/2.2.4-7.0.3.1_3.35__g36b56f4.ari             21) cray-mpich/7.7.19
  6) cray-libsci/20.09.1                              14) dvs/2.12_2.2.224-7.0.3.1_3.45__gc77db2af         22) craype-hugepages2M
  7) udreg/2.3.2-7.0.3.1_3.45__g5f0d670.ari           15) alps/6.6.67-7.0.3.1_3.43__gb91cd181.ari          23) vasp/5.4.4-hsw
  8) ugni/6.0.14.0-7.0.3.1_6.26__g8101a58.ari         16) rca/2.2.20-7.0.3.1_3.48__g8e3fb5b.ari            24) Base-opts/2.4.142-7.0.3.1_3.23__g8f27585.ari
The run log is:

Code:

 running on  128 total cores
 distrk:  each k-point on  128 cores,    1 groups
 distr:  one band on   16 cores,    8 groups
 using from now: INCAR
 vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Mar 12 2022 03:47:05) gamma-only

 POSCAR found type information on POSCAR  Mn O  N  C  H  I
 POSCAR found :  6 types and     286 ions
 scaLAPACK will be used

 -----------------------------------------------------------------------------
|                                                                             |
|  ADVICE TO THIS USER RUNNING 'VASP/VAMP'   (HEAR YOUR MASTER'S VOICE ...):  |
|                                                                             |
|      You have a (more or less) 'large supercell' and for larger cells       |
|      it might be more efficient to use real space projection opertators     |
|      So try LREAL= Auto  in the INCAR   file.                               |
|      Mind:          For very  accurate calculation you might also keep the  |
|      reciprocal projection scheme          (i.e. LREAL=.FALSE.)             |
|                                                                             |
 -----------------------------------------------------------------------------

 LDA part: xc-table for Pade appr. of Perdew
 found WAVECAR, reading the header
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ...
 reading WAVECAR
 the WAVECAR file was read successfully
 charge-density read from file: Mn3
 magnetization density read from file 1
 initial charge from wavefunction
 entering main loop

 -----------------------------------------------------------------------------
|                                                                             |
|           W    W    AA    RRRRR   N    N  II  N    N   GGGG   !!!           |
|           W    W   A  A   R    R  NN   N  II  NN   N  G    G  !!!           |
|           W    W  A    A  R    R  N N  N  II  N N  N  G       !!!           |
|           W WW W  AAAAAA  RRRRR   N  N N  II  N  N N  G  GGG   !            |
|           WW  WW  A    A  R   R   N   NN  II  N   NN  G    G                |
|           W    W  A    A  R    R  N    N  II  N    N   GGGG   !!!           |
|                                                                             |
|      ALGO=A and IALGO=5X tend to fail with the tetrahedron method           |
|      (e.g. Bloechls method ISMEAR=-5 is not variational)                    |
|      please switch to IMSEAR=0-n, except for DOS calculations               |
|      For DOS calculations use IALGO=53 after preconverging with ISMEAR>=0   |
|          I HOPE YOU KNOW, WHAT YOU ARE  DOING                               |
|                                                                             |
 -----------------------------------------------------------------------------

       N       E                     dE             d eps       ncg     rms          ort
 gam= 0.000 g(H,U,f)=  0.143E+02 0.000E+00       NaN ort(H,U,f) = 0.000E+00 0.000E+00       NaN
SDA:   1    -0.185963621438E+04   -0.18596E+04    0.00000E+00   736         NaN       NaN
 gam= 0.000 trial= 0.400  step=   NaN mean= 0.400
 gam= 0.000 trial= 2.600  step= 2.600 mean= 0.631
Rank 19 [Sat Apr 22 20:13:33 2023] [c1-0c0s7n3] Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(212).....................: MPI_Recv(buf=0x100082ff720, count=0, MPI_BYTE, src=45, tag=9, comm=0xc4000015, status=0x7ffffffdcf00) failed
MPIDI_CH3U_Receive_data_found(144): Message from rank 45 and tag 9 truncated; 2304 bytes received but buffer size is 0
Rank 66 [Sat Apr 22 20:13:33 2023] [c1-0c0s8n1] Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(212).......................: MPI_Recv(buf=0x100082dc860, count=0, MPI_BYTE, src=33, tag=9, comm=0xc4000019, status=0x7ffffffdcf00) failed
MPIDI_CH3U_Request_unpack_uebuf(595): Message truncated; 2304 bytes received but buffer size is 0
Rank 74 [Sat Apr 22 20:13:33 2023] [c1-0c0s8n1] Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(212).......................: MPI_Recv(buf=0x10008300740, count=0, MPI_BYTE, src=37, tag=9, comm=0xc4000017, status=0x7ffffffdcf00) failed
MPIDI_CH3U_Request_unpack_uebuf(595): Message truncated; 2304 bytes received but buffer size is 0
My INCAR is:

Code:

SYSTEM = Mn3
####
ISTART = 1
ICHARG = 1
ALGO = All

####
PREC = Accurate
NCORE = 16
ENCUT = 500
#NBANDS = 768
#NELECT = 1040
#NGX = 192
#NGY = 192
#NGZ = 210

#### electron & strut ####
EDIFF = 1E-7
#NELMIN = 10
NELM = 300
IBRION = 2
#ISIF = 3
EDIFFG = -1E-3
NSW = 50

#### sym ####
ISYM = -1

#### mag ####
ISPIN = 2
MAGMOM =  12*6 274*0

#### CHG & WAV ####
#ICHARG = 11
LMAXMIX = 4
#LWAVE = .F.

#### dos ####
ISMEAR = -2
FERWE  = 575*1 0 1 159*0
FERDO  = 528*1 0 207*0
NBANDS = 736
#ISMEAR = 0
#SIGMA = 0.05
#NEDOS = 2001
#EMIN = -10
#EMAX = 10
LORBIT = 10

#### vdW ####
IVDW = 11

#### LDA+U ####
LDAU = T
LDAUTYPE = 1
LDAUPRINT = 1
LDAUL =   2   -1   -1   -1   -1   -1
LDAUU = 2.8  0.0  0.0  0.0  0.0  0.0
LDAUJ = 0.9  0.0  0.0  0.0  0.0  0.0
#LASPH = T
Can anyone offer any advice or suggestions? I'd really appreciate any help you can provide.

Thanks in advance!

Re: Fatal error in MPI_Allreduce: Other MPI error, error stack

Posted: Tue Apr 25, 2023 10:15 pm
by ferenc_karsai
At the very beginning of the calculation there are already NaNs.
That means something is wrong with the setup and the code cannot obtain proper starting values.

You have many tags activated at once: +U together with van der Waals corrections, etc.

Try switching individual features on and off to see which one causes the error.
Start by switching off the van der Waals correction.
Then +U.
Also try different values for MAGMOM.
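
A minimal sketch of such a test INCAR, keeping everything else from the INCAR above unchanged (the commented-out lines are simply the values you posted, shown here only to illustrate the on/off testing):

Code:

#### mag ####
ISPIN = 2
# per the advice above, also experiment with different starting moments here
MAGMOM =  12*6 274*0

#### vdW: switch off for the first test ####
#IVDW = 11

#### LDA+U: switch off for the second test ####
#LDAU = T
#LDAUTYPE = 1
#LDAUPRINT = 1
#LDAUL =   2   -1   -1   -1   -1   -1
#LDAUU = 2.8  0.0  0.0  0.0  0.0  0.0
#LDAUJ = 0.9  0.0  0.0  0.0  0.0  0.0
If the NaNs disappear after one of these changes, you know which feature to look at more closely; then re-enable the remaining tags one at a time.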