Parallel version of VASP installation issue

Posted: Sun Sep 09, 2012 1:21 pm
by felixvasp
Hi everyone,
I ran into a problem after installing VASP 5.2.12.NOV. The source code compiled successfully, but the executable only works properly when the job runs on one node (8 cores). When I run the same job on 2 nodes (8 cores each), it behaves abnormally: after ten minutes it had still not started the SCF cycle, whereas on 1 node the whole job finished within ten minutes. This is very strange to me, since I don't have much experience compiling parallel codes.
The compilation was done with the intel/11.1.056 compiler, openmpi-intel, and fftw-3.2.2. Does anyone know why it is not working across multiple nodes? Any suggestions or thoughts would be highly appreciated.

Posted: Wed Sep 19, 2012 3:53 pm
by peterklaver
Hi felix,

Have you or others successfully run MPI codes other than VASP across multiple nodes? While I can't quite make out from your description what the problem is, it wouldn't surprise me if it lies in the MPI installation on your cluster rather than specifically in VASP.

Posted: Fri Sep 21, 2012 7:23 pm
by felixvasp
Hi Peter,
Thanks for the reply.
I guess MPI runs fine, since I compiled VASP on our university's HPC cluster, where some other parallel programs are installed. Do you think the problem is due to the high bandwidth demand of inter-node communication? When I change -DMPI_BLOCK to a bigger number, it seems to speed up a little, but it is still slower than running on a single node with multiple cores. I also tried this on an HPC cluster with InfiniBand, using the same makefile settings, and there the parallel version of VASP is much faster. But this doesn't rule out the possibility of an incorrectly installed MPI. If this MPI problem does exist, do you know of any way to test for it? Or can you point me to some resource on the web for that? Thanks a lot. Have a nice day.


Felix

Posted: Sat Sep 22, 2012 9:56 am
by peterklaver
Hi felix,

Running on a single node will very likely be faster in most cases, as communication within one node is near-instant, so there is virtually no communication delay. What you describe fits a strong communication bottleneck between nodes, which would be less severe on InfiniBand and similar interconnects. (I have no experience varying the -DMPI_BLOCK pre-processor option, so I have no idea about that one.)

The run you mentioned in your initial post must have been fairly small, as it finished in 10 minutes on a single node. With so little CPU time required, communication may weigh relatively heavily on the total time. You could try a very big system instead. Make sure the calculation still fits easily into RAM, though: if a node starts swapping data to disk, your result will be unrealistically slow. The makeparam utility bundled with VASP can be used to check memory requirements. For a big system with many atoms the communication bottleneck should matter less, so the time required on multiple nodes should improve relative to running on one node.

If that does happen, then your VASP installation may be fine as it is. In that case you should probably just use parallel runs across multiple nodes for big jobs, where multiple nodes are really useful, and run small jobs on a single node.
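
To make the argument above concrete, here is a rough toy model in Python. It is only a sketch under strong assumptions: compute time is divided perfectly over all cores, and inter-node communication adds a fixed penalty that does not grow with the size of the job. The function names and all numbers are hypothetical illustrations, not VASP measurements.

```python
# Toy model only: the constants below are made-up illustrations,
# not timings measured with VASP or any real cluster.

def wall_time(serial_work_s, nodes, cores_per_node=8, inter_node_cost_s=0.0):
    """Estimate wall time as perfectly divided compute time plus a fixed
    inter-node communication penalty that applies only when nodes > 1."""
    compute = serial_work_s / (nodes * cores_per_node)
    communication = inter_node_cost_s * (nodes - 1)
    return compute + communication


def speedup_two_nodes(serial_work_s, inter_node_cost_s):
    """Ratio of 1-node time to 2-node time; below 1 means 2 nodes are slower."""
    one = wall_time(serial_work_s, nodes=1, inter_node_cost_s=inter_node_cost_s)
    two = wall_time(serial_work_s, nodes=2, inter_node_cost_s=inter_node_cost_s)
    return one / two


if __name__ == "__main__":
    slow_link_cost_s = 600.0      # hypothetical penalty on a slow interconnect
    small_job_s = 8 * 600.0       # about 10 minutes on one 8-core node
    big_job_s = 8 * 36000.0       # about 10 hours on one 8-core node
    print(f"small job, 2-node speedup: {speedup_two_nodes(small_job_s, slow_link_cost_s):.2f}")
    print(f"big job, 2-node speedup:   {speedup_two_nodes(big_job_s, slow_link_cost_s):.2f}")
```

With these made-up numbers the small job comes out slower on two nodes (speedup below 1), while the big job gets close to the ideal factor of 2. Real communication costs also grow with system size, so treat this as an illustration of the trend, not a prediction.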