VASP run error under mpich2
Posted: Thu Jan 06, 2011 8:04 am
Hi all. I will try to explain my problem clearly.
I have two nodes, each with 12 cores, connected by a gigabit switch.
I am testing the VASP examples on these nodes.
First, I booted mpd on a single node, and all the examples ran well on each node individually.
Then I booted mpd on both nodes and tried to run the examples on all 24 cores, and I got the error shown below.
I wonder whether this error comes from the low speed of the switch or from a memory limit, although I have set the stack and memory limits to unlimited.
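To double-check the limits, here is a minimal sketch (an assumption on my side, not something VASP or mpich2 provides) that prints the stack and address-space limits a process actually sees. The function names `fmt` and `report_limits` are hypothetical helpers, and it assumes Python is installed on both nodes. Limits set in the login shell are not necessarily the ones inherited by processes started on the remote node, so this may help rule that out.

```python
import resource
import socket


def fmt(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def report_limits():
    """Return the soft and hard limits for stack and address space."""
    limits = {
        "stack": resource.getrlimit(resource.RLIMIT_STACK),
        "address_space": resource.getrlimit(resource.RLIMIT_AS),
    }
    return {name: (fmt(soft), fmt(hard)) for name, (soft, hard) in limits.items()}


if __name__ == "__main__":
    # Print the host name so output from different nodes can be told apart.
    host = socket.gethostname()
    for name, (soft, hard) in report_limits().items():
        print(f"{host} {name}: soft={soft} hard={hard}")
```

If this were saved as, say, `check_limits.py` (a hypothetical file name) and started through the same mpiexec that launches VASP, each rank would report the limits in effect on its own node.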
Thanks for your attention.
Code:
Fatal error in MPI_Waitall: Other MPI error, error stack:
MPI_Waitall(261)..................: MPI_Waitall(count=46, req_array=0x7fffeeca46a0, status_array=0x7fffeeca4760) failed
MPIDI_CH3I_Progress(150)..........:
MPID_nem_mpich2_blocking_recv(948):
MPID_nem_tcp_connpoll(1709).......: Communication error
rank 23 in job 1  node0_55860   caused collective abort of all ranks
  exit status of rank 23: killed by signal 9