Error message when running VASP on more than one node

berni_k86

#1 Post by berni_k86 » Tue Feb 19, 2013 3:32 pm

Hi!

On our cluster each node has two 8-core Intel Xeon processors and 64 GB of RAM, and the nodes communicate over InfiniBand. We use VASP 5.3.2 and OpenMPI, both compiled with the Intel compilers.

If we run VASP on 2 nodes with 32 cores, we get the error message below (sorry for the bad formatting). It happens after the SCF cycle has converged but before VASP writes the total energy and the wavefunctions. On a single node we have no problems, so I think it is a problem with the communication between VASP and OpenMPI.
We tried several different systems and input parameters, and it is always the same: no problems on one node, but an error on 2 nodes.
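One way to test the suspicion that inter-node communication is at fault is to rerun the same two-node job while switching individual Open MPI components off. The sketch below assumes Open MPI 1.6's `--mca btl` and `--mca coll` selection parameters, a placeholder executable name `vasp`, and a host list supplied by the batch system; the helper names, the `DRY_RUN` switch and the rank count are illustrative only.

```bash
# Sketch: rerun the failing job with different Open MPI components to see
# whether the crash follows the InfiniBand transport or the tuned collectives.
# NP, VASP and DRY_RUN are illustrative settings, not VASP or Open MPI defaults.
NP=${NP:-32}
VASP=${VASP:-vasp}

run_variant() {
  # $1 = label for the log, remaining arguments = extra mpirun options
  label=$1
  shift
  echo "== $label: mpirun -np $NP $* $VASP"
  if [ -z "${DRY_RUN:-}" ]; then
    mpirun -np "$NP" "$@" "$VASP"
  fi
}

diagnose_transports() {
  # 1) Default selection: openib between nodes, sm within a node
  run_variant default
  # 2) Exclude InfiniBand: inter-node traffic falls back to TCP
  run_variant no-openib --mca btl tcp,sm,self
  # 3) Keep InfiniBand, but bypass the "tuned" collectives seen in the traceback
  run_variant no-tuned-coll --mca coll ^tuned
}
```

If only the TCP variant finishes cleanly, the openib transport (or the InfiniBand stack below it) becomes the prime suspect. Each variant runs the full calculation and overwrites the previous output, so it is best run in a scratch copy of the input.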

Does anyone have an idea how to solve this problem?

Thanks in advance!

regards,
Berni

Error message:

forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
mca_btl_openib.so 00002B7687273830 Unknown Unknown Unknown
libmpi.so.1 00002B7680B7AFC9 Unknown Unknown Unknown
libmpi.so.1 00002B7680AAD7B2 Unknown Unknown Unknown
mca_coll_tuned.so 00002B768875AA2E Unknown Unknown Unknown
mca_coll_tuned.so 00002B7688760F46 Unknown Unknown Unknown
libmpi.so.1 00002B7680ABBB6C Unknown Unknown Unknown
libmpi_f77.so.1 00002B7680836F5F Unknown Unknown Unknown
vasp 00000000004A14FF Unknown Unknown Unknown
vasp 000000000110CEB1 Unknown Unknown Unknown
vasp 0000000001103703 Unknown Unknown Unknown
vasp 0000000001104B22 Unknown Unknown Unknown
vasp 0000000000BF3771 Unknown Unknown Unknown
vasp 0000000000CD50C7 Unknown Unknown Unknown
vasp 000000000047B326 Unknown Unknown Unknown
vasp 000000000043A8CC Unknown Unknown Unknown
libc.so.6 00002B7681D40EAD Unknown Unknown Unknown
vasp 000000000043A7A9 Unknown Unknown Unknown
[15 more ranks printed the same traceback, identical apart from the load addresses. In three of them the top frame is mca_btl_sm.so instead of mca_btl_openib.so; in two of them the stack starts in libpthread.so.0 and libmlx4-rdmav2.so, for example:]
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpthread.so.0 00002B4DBBBC1FC3 Unknown Unknown Unknown
libmlx4-rdmav2.so 00002B4DC2E4686B Unknown Unknown Unknown
mca_btl_openib.so 00002B4DC1323A6F Unknown Unknown Unknown
libmpi.so.1 00002B4DBAC2AFC9 Unknown Unknown Unknown
libmpi.so.1 00002B4DBAB5D7B2 Unknown Unknown Unknown
mca_coll_tuned.so 00002B4DC280AA2E Unknown Unknown Unknown
mca_coll_tuned.so 00002B4DC2810F46 Unknown Unknown Unknown
libmpi.so.1 00002B4DBAB6BB6C Unknown Unknown Unknown
libmpi_f77.so.1 00002B4DBA8E6F5F Unknown Unknown Unknown
vasp 00000000004A14FF Unknown Unknown Unknown
vasp 000000000110CEB1 Unknown Unknown Unknown
vasp 0000000001103703 Unknown Unknown Unknown
vasp 0000000001104B22 Unknown Unknown Unknown
vasp 0000000000BF3771 Unknown Unknown Unknown
vasp 0000000000CD50C7 Unknown Unknown Unknown
vasp 000000000047B326 Unknown Unknown Unknown
vasp 000000000043A8CC Unknown Unknown Unknown
libc.so.6 00002B4DBBDF0EAD Unknown Unknown Unknown
vasp 000000000043A7A9 Unknown Unknown Unknown
--------------------------------------------------------------------------
mpirun noticed that process rank 25 with PID 19248 on node node11 exited on signal 11 (Segmentation fault).

alex
Hero Member
Posts: 585
Joined: Tue Nov 16, 2004 2:21 pm
License Nr.: 5-67
Location: Germany

#2 Post by alex » Wed Feb 20, 2013 10:09 am

Hi Berni,

you have to set $LD_LIBRARY_PATH in your .profile/.cshrc on every node. I'd suggest putting your folders on an NFS file system that every node can reach.
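A minimal sketch of such a `.profile` fragment for sh/bash (csh users would use `setenv` in `.cshrc` instead), assuming the installation directories that appear in the `ldd` output later in this thread; `OMPI_PREFIX` and `INTEL_ROOT` are helper variables introduced only for this example.

```bash
# Hypothetical ~/.profile fragment: directories taken from the ldd output in
# this thread, adjust them to your own installation.
OMPI_PREFIX=/usr/local/openmpi-1.6.3-intel
INTEL_ROOT=/opt/intel/composer_xe_2013.0.079

# Put Open MPI's mpirun first on the PATH.
export PATH="$OMPI_PREFIX/bin:$PATH"

# Open MPI, the Intel compiler runtime and MKL; the ${VAR:+...} expansion
# appends the old value only if it was non-empty (no trailing ':').
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib:$INTEL_ROOT/compiler/lib/intel64:$INTEL_ROOT/mkl/lib/intel64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Keep in mind that the non-interactive shells that mpirun starts over ssh usually do not read `.profile`. Open MPI's `mpirun --prefix /usr/local/openmpi-1.6.3-intel` or `mpirun -x LD_LIBRARY_PATH` pass the setting to the remote processes without relying on login files.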

Cheers,

alex

berni_k86

#3 Post by berni_k86 » Wed Feb 20, 2013 3:00 pm

Hello Alex,

thanks for your reply!

I found out that on our cluster, changes to .profile on the master node are applied to all nodes.
So I set $LD_LIBRARY_PATH in .profile on the master, but the problem persists: I get the same error message as before when using more than one node.
Do you have any further suggestions?

regards,
Berni

alex

#4 Post by alex » Wed Feb 20, 2013 4:14 pm

Hm,

if that's the case, please check the MPI installation on the node(s). Maybe it's missing or installed under a different path than on the server.

You could also log in (via ssh) to the nodes where VASP was supposed to run and check whether all dynamic libraries are found. You do this with

ldd "pathtoexecutable"

If something is not found, it's either missing or the $LD_LIBRARY_PATH is not set properly!
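The check can be wrapped in a small helper that prints only the libraries the loader cannot resolve (`ldd` marks them with `=> not found`). This is a sketch; `check_libs` is a hypothetical name, and it should be run on each compute node with the same environment the job gets.

```bash
# Sketch: report shared libraries that the dynamic loader cannot resolve.
check_libs() {
  # $1 = path to an executable or shared library
  if [ ! -e "$1" ]; then
    echo "$1: no such file"
    return 2
  fi
  # ldd prints "libfoo.so.1 => not found" for every unresolved dependency
  missing=$(ldd "$1" 2>&1 | grep '=> not found')
  if [ -n "$missing" ]; then
    echo "Unresolved libraries for $1:"
    echo "$missing"
    return 1
  fi
  echo "All libraries resolved for $1"
}
```

For a remote node, the same pipeline can be passed to ssh, e.g. `ssh node11 'ldd /path/to/vasp | grep "not found"'`, where `/path/to/vasp` stands for your executable.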

You should also check whether your job script uses a different login shell than you do. That could be another potential trap.
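To see which interpreter a job script really runs under, a few diagnostic lines near its top are enough. A sketch, assuming Linux (it reads `/proc`); `report_shell` is a hypothetical helper name.

```bash
# Sketch: log the shell a job script actually runs in (Linux-specific, uses /proc).
report_shell() {
  # Interpreter of the current process, e.g. /bin/bash or /bin/dash
  echo "interpreter: $(readlink "/proc/$$/exe")"
  # The user's login shell as recorded in the environment
  echo "login shell: ${SHELL:-unset}"
  # What this shell ended up with after reading (or not reading) startup files
  echo "LD_LIBRARY_PATH: ${LD_LIBRARY_PATH:-<unset>}"
}
```

If the interpreter differs from your login shell, the startup files it reads differ as well; for example, a script run by /bin/sh does not read ~/.bashrc.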

Hth

alex

berni_k86

#5 Post by berni_k86 » Wed Feb 20, 2013 5:19 pm

Hi,

I connected to one of the nodes via ssh and checked the MPI and VASP installation with ldd. As you can see from the output below, all dependencies are resolved.

ldd mpirun:
linux-vdso.so.1 => (0x00007fffa17ff000)
libopen-rte.so.4 => /usr/local/openmpi-1.6.3-intel/lib/libopen-rte.so.4 (0x00007fa4b0200000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa4aff76000)
libnuma.so.1 => /usr/lib/libnuma.so.1 (0x00007fa4afd6b000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa4afb67000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa4af95e000)
libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00007fa4af746000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fa4af543000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa4af32c000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa4af110000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa4aed86000)
libimf.so => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libimf.so (0x00007fa4ae8bb000)
libsvml.so => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libsvml.so (0x00007fa4adff6000)
libirng.so => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libirng.so (0x00007fa4addef000)
libintlc.so.5 => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libintlc.so.5 (0x00007fa4adba0000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa4b0539000)

ldd vasp:
linux-vdso.so.1 => (0x00007fff244ce000)
libmkl_intel_lp64.so => /opt/intel/composer_xe_2013.0.079/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f3d89554000)
libmkl_cdft_core.so => /opt/intel/composer_xe_2013.0.079/mkl/lib/intel64/libmkl_cdft_core.so (0x00007f3d89339000)
libmkl_scalapack_lp64.so => /opt/intel/composer_xe_2013.0.079/mkl/lib/intel64/libmkl_scalapack_lp64.so (0x00007f3d88b6d000)
libmkl_sequential.so => /opt/intel/composer_xe_2013.0.079/mkl/lib/intel64/libmkl_sequential.so (0x00007f3d884ce000)
libmkl_core.so => /opt/intel/composer_xe_2013.0.079/mkl/lib/intel64/libmkl_core.so (0x00007f3d872d2000)
libmpi_f90.so.3 => /usr/local/openmpi-1.6.3-intel/lib/libmpi_f90.so.3 (0x00007f3d870cf000)
libmpi_f77.so.1 => /usr/local/openmpi-1.6.3-intel/lib/libmpi_f77.so.1 (0x00007f3d86e98000)
libmpi.so.1 => /usr/local/openmpi-1.6.3-intel/lib/libmpi.so.1 (0x00007f3d86a99000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f3d8688e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3d8660c000)
libnuma.so.1 => /usr/lib/libnuma.so.1 (0x00007f3d86400000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f3d861f8000)
libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00007f3d85fe0000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f3d85ddc000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f3d85bc0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3d85836000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f3d8561f000)
libifport.so.5 => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libifport.so.5 (0x00007f3d853f0000)
libifcore.so.5 => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libifcore.so.5 (0x00007f3d850ba000)
libimf.so => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libimf.so (0x00007f3d84bef000)
libintlc.so.5 => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libintlc.so.5 (0x00007f3d849a1000)
libsvml.so => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libsvml.so (0x00007f3d840dc000)
libifcoremt.so.5 => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libifcoremt.so.5 (0x00007f3d83d77000)
libirng.so => /opt/intel/composer_xe_2013.0.079/compiler/lib/intel64/libirng.so (0x00007f3d83b70000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3d89ca2000)


I also changed the shell in my job script from sh to bash, but it doesn't make any difference.
Maybe we need to recompile OpenMPI.

Anyway, thanks for your efforts!

regards,
Berni

alex

#6 Post by alex » Wed Feb 20, 2013 5:40 pm

Did you install the InfiniBand libraries everywhere? Line 3 of the first error message block suggests otherwise ...

berni_k86

#7 Post by berni_k86 » Wed Feb 20, 2013 6:41 pm

If you are referring to mca_btl_openib.so and the other libraries starting with mca, they are somewhere in the OpenMPI directory, which is in my $LD_LIBRARY_PATH.
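The `mca_*.so` files are Open MPI components that libmpi loads at run time, so `ldd vasp` lists neither them nor their own dependencies. Since mca_btl_openib.so appears in the traceback, it seems to have been loaded on those ranks (the `Unknown` columns only mean there is no debugging information); what remains to check is whether its InfiniBand dependencies resolve on every node. A sketch, assuming the usual `<prefix>/lib/openmpi/` component directory and the prefix from the ldd output above; `check_openib_component` is a hypothetical helper.

```bash
# Sketch: check the dependencies of Open MPI's InfiniBand component (openib BTL).
# OMPI_PREFIX defaults to the installation seen in this thread's ldd output.
OMPI_PREFIX=${OMPI_PREFIX:-/usr/local/openmpi-1.6.3-intel}

check_openib_component() {
  so="$OMPI_PREFIX/lib/openmpi/mca_btl_openib.so"
  if [ ! -e "$so" ]; then
    echo "missing: $so"
    return 1
  fi
  echo "dependencies of $so:"
  # libibverbs must resolve; any "=> not found" line points at a missing library
  ldd "$so" | grep -E 'ibverbs|=> not found'
  return 0
}
```

The hardware driver (libmlx4-rdmav2.so in some of the tracebacks above) is itself loaded by libibverbs at run time, so it does not show up in ldd output either; `ibv_devinfo` from the libibverbs utilities shows whether the InfiniBand adapter is usable on a node.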

alex

#8 Post by alex » Thu Feb 21, 2013 1:53 pm

So ... why is it not found, then? That's the question you have to answer.
Try a job script that prints $LD_LIBRARY_PATH on the chosen nodes.
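Such a job can reuse the VASP job's mpirun setup but start a shell command instead of `vasp`, so that every rank reports its node and the library path it actually sees. A sketch; `ENV_REPORT`, `show_remote_env` and the default rank count are illustrative, and the node list is assumed to come from the batch system as in the real job.

```bash
# Command each rank runs: print the node name and the LD_LIBRARY_PATH it sees.
ENV_REPORT='echo "$(uname -n): LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"'

show_remote_env() {
  # One line per rank; sort | uniq -c collapses identical lines, so nodes with
  # a different environment stand out. No -x here: each rank keeps whatever
  # environment its node's launcher set up, as vasp does if the real job
  # does not use -x either.
  mpirun -np "${NP:-32}" sh -c "$ENV_REPORT" | sort | uniq -c
}
```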

Cheers,

alex

berni_k86

#9 Post by berni_k86 » Fri Feb 22, 2013 12:42 pm

A colleague of mine told me that the MD code she uses runs on more than one node on our cluster, and that code uses the same OpenMPI installation.
So maybe I have just forgotten a directory in $LD_LIBRARY_PATH, but on the nodes all dependencies seem to be resolved. I also have no idea which directory could be missing.
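One way to narrow down a forgotten directory is to compare the library path of a working job (for example the colleague's MD job) with the one the VASP job gets. A sketch in POSIX sh; `path_diff` is a hypothetical helper, and the two values would come from printing $LD_LIBRARY_PATH inside each job.

```bash
# Sketch: print the directories in the first colon-separated list that are
# missing from the second one. The function body runs in a subshell, so
# changing IFS does not leak into the caller.
path_diff() (
  IFS=:
  for dir in $1; do
    [ -n "$dir" ] || continue          # skip empty entries such as "a::b"
    case ":$2:" in
      *":$dir:"*) ;;                   # also present in the second list
      *) printf '%s\n' "$dir" ;;       # only in the first list
    esac
  done
)
```

For example, `path_diff "$WORKING_PATH" "$VASP_PATH"`, with both (hypothetical) variables holding the printed values, lists the directories the working job has and the VASP job lacks.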

sxy
Newbie
Posts: 1
Joined: Mon Dec 09, 2013 3:07 am

#10 Post by sxy » Mon Dec 09, 2013 3:57 am

Have you been able to solve this problem?
We are encountering the same issue.
