Collective abort of all ranks
Posted: Fri Mar 15, 2013 11:56 am
Good morning
For a large system (160 atoms), VASP crashes at the beginning of the calculation with MPI errors (see below):
Do you have any idea how I can fix this? I tried to compile VASP with the -Ddebug fpp flag, but that does not work; I get the following error:
I had also got a second error (in xcspin.F), which I temporarily fixed by renaming NY to NYY.
Thanks
Code:
 running on   72 total cores
 distrk:  each k-point on   72 cores,    1 groups
 distr:  one band on   12 cores,    6 groups
 using from now: INCAR
 vasp.5.3.2 13Sep12 (build Nov 05 2012 17:04:05) complex

 POSCAR found type information on POSCAR  Li Co O  S
 POSCAR found :  4 types and     163 ions
 LDA part: xc-table for Ceperly-Alder, standard interpolation
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ...
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
rank 65 in job 1  node252.cm.cluster_35718   caused collective abort of all ranks
  exit status of rank 65: killed by signal 9
rank 20 in job 1  node252.cm.cluster_35718   caused collective abort of all ranks
  exit status of rank 20: killed by signal 9
rank 18 in job 1  node252.cm.cluster_35718   caused collective abort of all ranks
  exit status of rank 18: killed by signal 9
rank 34 in job 1  node252.cm.cluster_35718   caused collective abort of all ranks
  exit status of rank 34: killed by signal 9
rank 33 in job 1  node252.cm.cluster_35718   caused collective abort of all ranks
  exit status of rank 33: killed by signal 9
rank 32 in job 1  node252.cm.cluster_35718   caused collective abort of all ranks
  exit status of rank 32: killed by signal 9
rank 46 in job 1  node252.cm.cluster_35718   caused collective abort of all ranks
  exit status of rank 46: killed by signal 9
rank 42 in job 1  node252.cm.cluster_35718   caused collective abort of all ranks
etc. ...
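Every entry in this log has the same shape: one "caused collective abort" line and one "exit status ... killed by signal 9" line per rank. On Linux, signal 9 is SIGKILL, which a process cannot catch, so VASP gets no chance to print an error message of its own; the signal normally comes from outside the program, and a common source on clusters is the kernel's out-of-memory killer or a batch-system memory limit. That is a general reading of signal 9, not a confirmed diagnosis of this run. The minimal Python sketch below condenses such a log into a rank-to-signal mapping; the function name `killed_ranks` and the sample text (copied from the first entries of the log) are illustrative assumptions, not part of VASP or of the MPI launcher.

```python
import re
import signal

# Matches lines such as "  exit status of rank 65: killed by signal 9"
EXIT_RE = re.compile(r"exit status of rank (\d+): killed by signal (\d+)")


def killed_ranks(log_text):
    """Return {rank: signal name} for every 'killed by signal' line in the log."""
    result = {}
    for match in EXIT_RE.finditer(log_text):
        rank, signum = int(match.group(1)), int(match.group(2))
        try:
            name = signal.Signals(signum).name  # e.g. 9 -> 'SIGKILL' on Linux
        except ValueError:
            name = f"signal {signum}"  # number not defined on this platform
        result[rank] = name
    return result


# Hypothetical sample: the first two entries of the log above.
sample = """
rank 65 in job 1  node252.cm.cluster_35718   caused collective abort of all ranks
  exit status of rank 65: killed by signal 9
rank 20 in job 1  node252.cm.cluster_35718   caused collective abort of all ranks
  exit status of rank 20: killed by signal 9
"""

print(killed_ranks(sample))  # {65: 'SIGKILL', 20: 'SIGKILL'}
```

With the full log, `set(killed_ranks(log).values())` shows at a glance whether every rank died from the same signal.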
Code:
fpp -f_com=no -free -w0 pawlhf.F pawlhf.f90  -DMPI  -DHOST=\"LinuxIFC\" -DIFC -DCACHE_SIZE=8000 -DPGF90 -Davoidalloc -DNGZhalf -DMPI_BLOCK=10000 -Duse_collective -DRPROMU_DGEMV  -DRACCMU_DGEMV -Ddebug
mpiifort  -FR -names lowercase -assume byterecl -m64 -warn nousage -g -traceback  -I/cm/shared/apps/intel/icsxe/2012.0.032/mkl/include/fftw  -c pawlhf.f90
pawlhf.F(1169): error #6404: This name does not have a type, and must have an explicit type.   [DFOCKAE]
      CALL DUMP_DLLMM( "ONE-CENTRE-CORRECTION AE",DFOCKAE, PP)
--------------------------------------------------^
pawlhf.F(1169): error #6634: The shape matching rules of actual arguments and dummy arguments have been violated.   [DFOCKAE]
      CALL DUMP_DLLMM( "ONE-CENTRE-CORRECTION AE",DFOCKAE, PP)
--------------------------------------------------^
compilation aborted for pawlhf.f90 (code 1)
make: *** [pawlhf.o] Erreur 1
Code:
fpp -f_com=no -free -w0 xcspin.F xcspin.f90 -DMPI -DHOST=\"LinuxIFC\" -DIFC -DCACHE_SIZE=8000 -DPGF90 -Davoidalloc -DNGZhalf -DMPI_BLOCK=10000 -Duse_collective -DRPROMU_DGEMV -DRACCMU_DGEMV -Ddebug
mpiifort -FR -names lowercase -assume byterecl -m64 -warn nousage -g -traceback -check bounds -I/cm/shared/apps/intel/icsxe/2012.0.032/mkl/include/fftw -c xcspin.f90
xcspin.F(1271): error #6423: This name has already been used as an external function name. [NY]
      DO NY=1,GRIDC%NGY
---------^
compilation aborted for xcspin.f90 (code 1)
make: *** [xcspin.o] Erreur 1
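Both build failures use the same ifort diagnostic layout: the `.F` source file, the line number in parentheses, the error number, the message, and the offending name in square brackets. When a `-Ddebug` build stops on several of these, a short script can pull them out of a long `make` log. A minimal Python sketch, assuming exactly the layout shown in these two logs (it deliberately ignores warnings, remarks, and the source/caret lines); `parse_ifort_errors` is a hypothetical helper name:

```python
import re

# Assumed ifort layout, as in the logs above:
#   pawlhf.F(1169): error #6404: <message>   [DFOCKAE]
DIAG_RE = re.compile(
    r"^(?P<file>\S+)\((?P<line>\d+)\): error #(?P<code>\d+): "
    r"(?P<msg>.*?)\s*\[(?P<name>\w+)\]\s*$"
)


def parse_ifort_errors(build_log):
    """Return a list of (file, line, error code, offending name) tuples."""
    errors = []
    for text in build_log.splitlines():
        match = DIAG_RE.match(text.strip())
        if match:
            errors.append((match["file"], int(match["line"]),
                           int(match["code"]), match["name"]))
    return errors


log = """\
pawlhf.F(1169): error #6404: This name does not have a type, and must have an explicit type.   [DFOCKAE]
      CALL DUMP_DLLMM( "ONE-CENTRE-CORRECTION AE",DFOCKAE, PP)
--------------------------------------------------^
xcspin.F(1271): error #6423: This name has already been used as an external function name. [NY]
"""

for entry in parse_ifort_errors(log):
    print(entry)
# ('pawlhf.F', 1169, 6404, 'DFOCKAE')
# ('xcspin.F', 1271, 6423, 'NY')
```

The reported line numbers appear to refer to the `.F` files rather than to the preprocessed `.f90` files that were actually compiled, so they should point straight into the VASP source tree.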