Parallel performance of VASP over Gbit Ethernet

cpp6f
Newbie
Posts: 39
Joined: Sat Nov 12, 2005 2:04 am

Parallel performance of VASP over Gbit Ethernet

#1 Post by cpp6f » Wed Aug 12, 2009 4:07 pm

I am running VASP 4.6.34 on a cluster whose nodes each contain two quad-core 2.1 GHz Opterons, connected by Gbit Ethernet. VASP was compiled with the Intel compiler and Open MPI. So far, I have not observed any significant speedup when running on more than one node. I am wondering whether this is a limit of the Ethernet network or whether it can be improved by tuning VASP or MPI. What experience do other people have running on this type of cluster? Thanks!

forsdan
Sr. Member
Posts: 339
Joined: Mon Apr 24, 2006 9:07 am
License Nr.: 173
Location: Gothenburg, Sweden

Parallel performance of VASP over Gbit Ethernet

#2 Post by forsdan » Thu Aug 13, 2009 12:22 pm

I've only performed scaling tests for a few test problems, so no general conclusions should be drawn from this, but here is my experience anyway. (In all cases in this post I refer to results for a test system consisting of a supercell with 128 Cr atoms in a bcc structure plus an interstitial defect. No ScaLAPACK routines were used.)

1. VASP 4.6.31 compiled with Intel 9.0, MKL and MPICH2 v1.0.5, running on Xeon 5160 (Woodcrest) 3 GHz cores (two dual-core processors per node) with Gbit Ethernet:

Linear scaling is observed up to 6 nodes (24 cores); beyond that, the scaling drops significantly with the default NPAR/NSIM settings. Fine-tuning NPAR/NSIM yields better performance for larger node counts, but nowhere near linear scaling.

2. VASP 4.6.34 compiled with Intel 9.0, MKL and MPICH2 v1.0.5, running on Xeon E5430 (Harpertown) 2.66 GHz cores (two quad-core processors per node) with Gbit Ethernet:

Linear scaling is observed up to 3 nodes (24 cores), and then the performance drops significantly. Fine-tuning NPAR/NSIM again yields better performance for more nodes, but not near the linear-scaling regime (a sketch of typical settings is given below, after point 3).

For the same number of cores, the relative speed of the clusters in cases 1 and 2 is basically 1:1.


3. For comparison: running the test case on a cluster with the same hardware as in case 1, but with an InfiniBand interconnect, the scaling is linear up to 20 nodes (80 cores). Since that cluster isn't any larger, I haven't been able to check the performance for more nodes. A colleague has, however, run VASP on up to 256 cores on another InfiniBand cluster and reports near-linear scaling, but ScaLAPACK was essential to accomplish this.
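
Just to illustrate what I mean by fine-tuning (these are not the exact settings from the runs above, only a typical starting point that has to be adjusted per system and node count), the relevant INCAR tags look something like:

NPAR = 4              ! number of band groups; on Gbit Ethernet a common starting point is roughly one group per node
NSIM = 4              ! bands optimized simultaneously in the RMM-DIIS blocks; larger values mean fewer, larger messages
LSCALAPACK = .FALSE.  ! .TRUE. only with a ScaLAPACK-enabled build; that is what my colleague relied on for the 256-core runs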

Hope this helps to some extent.

Cheers,
/Dan




martonak
Newbie
Posts: 11
Joined: Tue Mar 18, 2008 2:31 pm
License Nr.: 788

Parallel performance of VASP over Gbit Ethernet

#3 Post by martonak » Sat Aug 15, 2009 9:48 pm

Could you please specify the precision, cutoff and pseudopotential you use for the 128 Cr atom problem? Could you perhaps also post the INCAR file?

Thanks

Roman

forsdan
Sr. Member
Posts: 339
Joined: Mon Apr 24, 2006 9:07 am
License Nr.: 173
Location: Gothenburg, Sweden

Parallel performance of VASP over Gbit Ethernet

#4 Post by forsdan » Sun Aug 16, 2009 9:49 am

INCAR
------

SYSTEM = Cr128X
ENCUT = 550 eV
ENAUG = 700 eV
EDIFF = 1E-5
LREAL = A
PREC = Normal
ROPT = -0.0001 -0.0001
ISMEAR = 1
SIGMA = 0.1
NSW = 2
IBRION = 2
ISIF = 3
POTIM = 0.3
MAGMOM = 64*4 64*-4 2
ISPIN = 2
LWAVE = .FALSE.


KPOINTS
---------
Input
6
Reciprocal
0.125000 0.125000 0.125000 8.000000
0.375000 0.125000 0.125000 16.000000
0.375000 0.375000 0.125000 8.000000
0.125000 0.125000 0.375000 8.000000
0.375000 0.125000 0.375000 16.000000
0.375000 0.375000 0.375000 8.000000


I use the PAW-PBE pseudopotentials, with the semi-core 3p states included for the Cr atoms (12 electrons/atom). For the interstitial defect I use a boron atom (3 electrons) positioned at an octahedral site.
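
To spell out how the two species enter the INCAR above: the ROPT entries and the MAGMOM values follow the POSCAR/POTCAR species order, with the 128 Cr atoms first and the single B interstitial last:

ROPT   = -0.0001 -0.0001   ! one real-space projection accuracy per species: Cr, then B
MAGMOM = 64*4 64*-4 2      ! initial moments: 64 Cr up, 64 Cr down, plus one value for the B atom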

The setup is simply one of my previous runs that gives a decent run time and memory requirement, which is why I chose it as one of my test cases. There's nothing else to it, really.

/Dan


