Is this a memory problem or something else?
Posted: Mon May 16, 2016 1:14 pm
Hello all,
I am running a GW calculation on a slab with a 7x6x1 k-point mesh and the following POSCAR:
-------------------------------------------------------------------------------------------------------------------
Li24 O24
1.00000000000000
6.3607230185999999 0.0000000000000000 0.0000000000000000
0.0000000000000000 7.7035222053999997 0.0000000000000000
0.0000000000000000 0.0000000000000000 22.0000000000000000
Li O
24 24
Selective dynamics
Direct
0.2500000000550244 0.7499999999935127 0.0417315059090910 F F F
0.7500000000078586 0.7499999999935127 0.0417315059090910 F F F
0.2500000000550244 0.0000000000000000 0.1251942677727271 F F F
0.7500000000078586 0.0000000000000000 0.1251942677727271 F F F
0.2500000000550244 0.5000000000389448 0.1251942677727271 F F F
0.7500000000078586 0.5000000000389448 0.1251942677727271 F F F
0.4999999999528342 0.7499999999935127 0.1669257699999989 F F F
0.0000000000000000 0.7499999999935127 0.1669257699999989 F F F
0.2500000980048540 0.2500000167121783 0.2052688645769223 T T T
0.7500001022620424 0.2500000194902157 0.2052688661991127 T T T
0.9999999831027253 0.0027676309394735 0.2462410069040288 T T T
0.4999999909262058 0.0027676283595071 0.2462410084879991 T T T
0.9999999896075167 0.4972323759992250 0.2462411043113875 T T T
0.4999999815450380 0.4972323738417330 0.2462411026440279 T T T
0.2500000410409200 0.7499999901229160 0.2874166347982197 T T T
0.7500000420981721 0.7499999885483817 0.2874166357634280 T T T
0.4999999384522340 0.2500000067084471 0.3264903720489727 T T T
0.9999999405782631 0.2500000010045227 0.3264903711083491 T T T
0.2499998593155297 0.9937402371787556 0.3626863213947971 T T T
0.7499998582454452 0.9937402434208664 0.3626863189092759 T T T
0.2499998194365745 0.5062597140084932 0.3626864592192405 T T T
0.7499998218586796 0.5062597186970592 0.3626864627341675 T T T
0.2500005626001496 0.2499995198580081 0.4389166628746324 T T T
0.7500005627464361 0.2499995209723096 0.4389166662839799 T T T
0.4999999999528342 0.8506900071510515 0.0834627581363705 F F F
0.0000000000000000 0.8506900071510515 0.0834627581363705 F F F
0.4999999999528342 0.6493099928359669 0.0834627581363705 F F F
0.0000000000000000 0.6493099928359669 0.0834627581363705 F F F
0.4999999999528342 0.3506900072419157 0.1669257699999989 F F F
0.0000000000000000 0.3506900072419157 0.1669257699999989 F F F
0.4999999999528342 0.1493099927970221 0.1669257699999989 F F F
0.0000000000000000 0.1493099927970221 0.1669257699999989 F F F
0.2499999210757338 0.8541036604356407 0.2044632896614758 T T T
0.7499999183219401 0.8541036767916808 0.2044632890318212 T T T
0.2500000058068181 0.6458963384943885 0.2044632928234833 T T T
0.7500000079276177 0.6458963550621775 0.2044632934887858 T T T
0.2499999271126256 0.3539066321619018 0.2869552670766993 T T T
0.7499999277660763 0.3539066663930086 0.2869552674341023 T T T
0.2499999751363404 0.1460933376666347 0.2869552552217272 T T T
0.7499999765354630 0.1460933715138495 0.2869552555206454 T T T
0.4999999794324523 0.8536104715263875 0.3287724576926365 T T T
0.9999999840773839 0.8536104758010623 0.3287724586312137 T T T
0.4999999300647673 0.6463895117591960 0.3287724665739020 T T T
0.9999999276011309 0.6463895160164554 0.3287724672324472 T T T
0.5000002622113087 0.3547397610432625 0.4080437671185635 T T T
0.0000002603261606 0.3547397665217744 0.4080437670026242 T T T
0.5000003221861320 0.1452601102673654 0.4080437404683650 T T T
0.0000003194025666 0.1452601155010242 0.4080437398718999 T T T
-------------------------------------------------------------------------------------------------------------------
As you can see, the slab contains 24 Li and 24 O atoms (Li24O24, i.e. 12 Li2O2 formula units) with 216 electrons (108 occupied bands). For the GW step I use the following INCAR:
-------------------------------------------------------------------------------------------------------------------
ALGO = SCGW0
ENCUTGW = 150
NOMEGA = 72
ISMEAR = -5
ISPIN = 2
SIGMA = 0.01
LREAL = .FALSE.
NELM = 2
LORBIT = 11
PRECFOCK = FAST
LWANNIER90=.TRUE.
LPEAD = .TRUE.
MAXMEM = 25500
KPAR = 2
NBANDS = 240
-------------------------------------------------------------------------------------------------------------------
The run always reaches the point where the standard output log ends with:
-------------------------------------------------------------------------------------------------------------------
energies w=
0.00 0.00 0.45 0.00 0.90 0.00 1.34 0.00 1.79 0.00
2.23 0.00 2.67 0.00 3.10 0.00 3.54 0.00 3.96 0.00
4.38 0.00 4.80 0.00 5.21 0.00 5.62 0.00 6.03 0.00
6.42 0.00 6.82 0.00 7.21 0.00 7.60 0.00 7.98 0.00
8.36 0.00 8.74 0.00 9.11 0.00 9.49 0.00 9.86 0.00
10.23 0.00 10.60 0.00 10.97 0.00 11.35 0.00 11.72 0.00
12.10 0.00 12.48 0.00 12.87 0.00 13.26 0.00 13.65 0.00
14.06 0.00 14.47 0.00 14.88 0.00 15.31 0.00 15.75 0.00
16.20 0.00 16.66 0.00 17.14 0.00 17.64 0.00 18.15 0.00
18.69 0.00 19.25 0.00 19.83 0.00 20.44 0.00 21.09 0.00
21.77 0.00 22.50 0.00 23.27 0.00 24.09 0.00 24.97 0.00
25.92 0.00 26.95 0.00 28.06 0.00 29.28 0.00 30.62 0.00
32.09 0.00 33.73 0.00 35.55 0.00 37.61 0.00 39.95 0.00
42.64 0.00 45.74 0.00 49.38 0.00 53.70 0.00 58.91 0.00
65.32 0.00 73.40 0.00
responsefunction array rank= 4560
LDA part: xc-table for Pade appr. of Perdew
allocating 1 responsefunctions rank= 4560
shmem allocating 36 responsefunctions rank= 4560
response function shared by NCSHMEM nodes 1
Doing 1 frequencies on each core in blocks of 36
NQ= 1 0.0000 0.0000 0.0000,
|.........|.........
-------------------------------------------------------------------------------------------------------------------
and after that it crashes. I took it for a segmentation fault, although the trace reports SIGTERM, and the error ALWAYS looks like this:
-------------------------------------------------------------------------------------------------------------------
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
vasp_std 00000000015168B5 Unknown Unknown Unknown
libpthread.so.0 00002AF0B878D100 Unknown Unknown Unknown
-------------------------------------------------------------------------------------------------------------------
When I check the memory requirements, they are no more than 13 GB per core, whereas I have made 25 GB available per core. I was hoping this much memory would be sufficient, but apparently it is not. There are several posts about exactly this problem on the VASP forums, none with a solution. Can someone please suggest where to look for the source of the problem? Could it be the installation? In some other test cases I noticed that the memory distribution across cores is very uneven (e.g. 12-13 GB on 22 of the 24 cores of a given node and 25-27 GB on the other 2 cores of that same node). Could this imbalance be what causes the crashes?
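For reference, the size of the response functions can be estimated from the ranks printed in the log above. This is only a back-of-the-envelope sketch in Python. It assumes each response function is stored as a dense rank x rank matrix of double-precision complex numbers (16 bytes per element), which the log does not state, and it ignores every other array VASP allocates. The variable and function names are mine, not VASP's.

```python
# Rough size of the response functions reported in the log.
# ASSUMPTION: each response function is a dense RANK x RANK matrix
# of complex doubles (16 bytes per element). The log does not say this.
RANK = 4560             # from "responsefunction array rank= 4560"
N_SHARED = 36           # from "shmem allocating 36 responsefunctions"
BYTES_PER_ELEMENT = 16  # complex double precision (assumed)
GIB = 1024 ** 3


def response_function_bytes(rank, bytes_per_element=BYTES_PER_ELEMENT):
    """Bytes needed for one dense rank x rank matrix."""
    return rank * rank * bytes_per_element


one = response_function_bytes(RANK)
shared_block = N_SHARED * one

print(f"one response function : {one / GIB:.2f} GiB")
print(f"block of {N_SHARED} functions : {shared_block / GIB:.2f} GiB")
```

Under that assumption a single response function is about 0.31 GiB and the shared block of 36 is roughly 11 GiB, which is the same order as the 12-13 GB per core that I observe. Whether the shared block is counted once per node or against individual cores is something I have not verified.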