Signal code: Address not mapped (1)
Posted: Thu Apr 24, 2008 11:40 pm
Dear all, I ran into this error and can reproduce it reliably. It looks like my VASP program hit a segmentation fault (memory violation), but I do not know how to interpret the MPI part of the output.
Our system is Rocks 4.3 x86_64 with openmpi-1.2.5 and scalapack-1.8.0,
Barcelona CPUs, Gigabit interconnect.
# cat 2156.jupiter.mynetwork.com.out | wc -l
614
# cat 2089.jupiter.mynetwork.com.out | wc -l
157
The interesting part is that the same job, run on different nodes, hit the same error but at different iterations: job 2156 ran much longer before failing, while job 2089 failed earlier.
[test@Jupiter ]$ cat Co0001.e2089
[compute-1-1:14557] *** Process received signal ***
[compute-1-1:14557] Signal: Segmentation fault (11)
[compute-1-1:14557] Signal code: Address not mapped (1)
[compute-1-1:14557] Failing at address: (nil)
[compute-1-1:14557] [ 0] /lib64/tls/libpthread.so.0 [0x3db530c4f0]
[compute-1-1:14557] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
[compute-1-1:14557] [ 2] /usr/local/bin/vaspopenmpi_scala(__rmm_diis__eddrmm+0x6dbd) [0x5b25fd]
[compute-1-1:14557] [ 3] /usr/local/bin/vaspopenmpi_scala(elmin_+0x32fa) [0x608a9a]
[compute-1-1:14557] [ 4] /usr/local/bin/vaspopenmpi_scala(MAIN__+0x15492) [0x425f4a]
[compute-1-1:14557] [ 5] /usr/local/bin/vaspopenmpi_scala(main+0xe) [0x6ed9ee]
[compute-1-1:14557] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3db441c3fb]
[compute-1-1:14557] [ 7] /usr/local/bin/vaspopenmpi_scala [0x410a2a]
[compute-1-1:14557] *** End of error message ***
mpiexec noticed that job rank 0 with PID 14557 on node compute-1-1.local exited on signal 11 (Segmentation fault).
[test@Jupiter ]$ cat Co0001.e2156
[compute-1-2:03847] *** Process received signal ***
[compute-1-2:03847] Signal: Segmentation fault (11)
[compute-1-2:03847] Signal code: Address not mapped (1)
[compute-1-2:03847] Failing at address: (nil)
[compute-1-2:03847] [ 0] /lib64/tls/libpthread.so.0 [0x3984e0c4f0]
[compute-1-2:03847] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
[compute-1-2:03847] [ 2] /usr/local/bin/vaspopenmpi_scala(__rmm_diis__eddrmm+0x6dbd) [0x5b25fd]
[compute-1-2:03847] [ 3] /usr/local/bin/vaspopenmpi_scala(elmin_+0x32fa) [0x608a9a]
[compute-1-2:03847] [ 4] /usr/local/bin/vaspopenmpi_scala(MAIN__+0x15492) [0x425f4a]
[compute-1-2:03847] [ 5] /usr/local/bin/vaspopenmpi_scala(main+0xe) [0x6ed9ee]
[compute-1-2:03847] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3983f1c3fb]
[compute-1-2:03847] [ 7] /usr/local/bin/vaspopenmpi_scala [0x410a2a]
[compute-1-2:03847] *** End of error message ***
mpiexec noticed that job rank 0 with PID 3847 on node compute-1-2.local exited on signal 11 (Segmentation fault).
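To check that both jobs really die at the same place inside the VASP binary (and not just with the same signal), the two backtraces can be compared mechanically. The script below is only a sketch: the names `parse_frames` and `same_crash_site` are made up for illustration, the regular expression assumes the backtrace layout shown above, and the embedded samples are shortened copies of the two error files. Frames from shared libraries are ignored because their load addresses differ between nodes.

```python
import re

# One backtrace frame: "[ N] module(symbol+offset) [0xADDR]".
# The "(symbol+offset)" part is optional (frame 7 above has none).
FRAME_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\S+?)(?:\(([^)]+)\))?\s+\[(0x[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return a list of (index, module, symbol_or_None, address) tuples."""
    frames = []
    for idx, module, symbol, addr in FRAME_RE.findall(log_text):
        frames.append((int(idx), module, symbol or None, addr))
    return frames


def same_crash_site(log_a, log_b, binary):
    """True if both logs contain identical frames inside `binary`."""
    def pick(log):
        return [f for f in parse_frames(log) if f[1] == binary]
    return pick(log_a) == pick(log_b)


# Shortened samples; frames 3 and 4 are left on one line, as in the raw output.
JOB_2089 = """\
[compute-1-1:14557] Signal: Segmentation fault (11)
[compute-1-1:14557] Failing at address: (nil)
[compute-1-1:14557] [ 0] /lib64/tls/libpthread.so.0 [0x3db530c4f0]
[compute-1-1:14557] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
[compute-1-1:14557] [ 2] /usr/local/bin/vaspopenmpi_scala(__rmm_diis__eddrmm+0x6dbd) [0x5b25fd]
[compute-1-1:14557] [ 3] /usr/local/bin/vaspopenmpi_scala(elmin_+0x32fa) [0x608a9a][compute-1-1:14557] [ 4] /usr/local/bin/vaspopenmpi_scala(MAIN__+0x15492) [0x425f4a]
"""

JOB_2156 = """\
[compute-1-2:03847] Signal: Segmentation fault (11)
[compute-1-2:03847] Failing at address: (nil)
[compute-1-2:03847] [ 0] /lib64/tls/libpthread.so.0 [0x3984e0c4f0]
[compute-1-2:03847] [ 1] /usr/local/bin/vaspopenmpi_scala(__dfast__cnorma+0x1e4) [0x4dd884]
[compute-1-2:03847] [ 2] /usr/local/bin/vaspopenmpi_scala(__rmm_diis__eddrmm+0x6dbd) [0x5b25fd]
[compute-1-2:03847] [ 3] /usr/local/bin/vaspopenmpi_scala(elmin_+0x32fa) [0x608a9a][compute-1-2:03847] [ 4] /usr/local/bin/vaspopenmpi_scala(MAIN__+0x15492) [0x425f4a]
"""

if __name__ == "__main__":
    vasp = "/usr/local/bin/vaspopenmpi_scala"
    for frame in parse_frames(JOB_2089):
        print(frame)
    print("same crash site:", same_crash_site(JOB_2089, JOB_2156, vasp))
```

For the two samples the comparison reports the same crash site: in both jobs the innermost VASP frame is `__dfast__cnorma+0x1e4`, called from `__rmm_diis__eddrmm`.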
Could somebody tell me what caused this type of error?
Thank you very much for your help.