Parallel performance of vasp over Gbit ethernet
Posted: Wed Aug 12, 2009 4:07 pm
by cpp6f
I am running VASP 4.6.34 on a cluster of nodes, each containing two quad-core 2.1 GHz Opterons, connected by Gbit ethernet. VASP was compiled with the Intel compiler and Open MPI. So far I have not observed any significant speedup when running on more than one node. I am wondering whether this is a limit of the ethernet network or whether it can be improved by tuning VASP or MPI. What experience do other people have running on this type of cluster? Thanks!
Parallel performance of vasp over Gbit ethernet
Posted: Thu Aug 13, 2009 12:22 pm
by forsdan
I've only performed scaling tests for a few test problems, so no general conclusions should be drawn from this, but here is my experience anyway. (All results in this post refer to a test system consisting of a supercell with 128 Cr atoms in a bcc structure with an interstitial defect. No ScaLAPACK routines were used.)
1. VASP 4.6.31 compiled with Intel 9.0, MKL and MPICH2 v1.0.5, running on Xeon 5160 (Woodcrest) 3 GHz cores (two dual-core processors per node) with Gbit ethernet:
Linear scaling is observed up to 6 nodes (24 cores). Beyond that, the scaling drops significantly with the default settings for NPAR/NSIM. Fine-tuning NPAR/NSIM yields better performance on more nodes, but nowhere near linear scaling (see the illustrative settings after this list).
2. VASP 4.6.34 compiled with Intel 9.0, MKL and MPICH2 v1.0.5, running on Xeon E5430 (Harpertown) 2.66 GHz cores (two quad-core processors per node) with Gbit ethernet:
Linear scaling is observed up to 3 nodes (24 cores), and then the performance drops significantly. Fine-tuning NPAR/NSIM again yields better performance on more nodes, but not near the linear scaling regime.
The relative speed between the clusters in cases 1 and 2, for the same number of cores, is basically 1:1.
3. For comparison: running the test case on a cluster with the same hardware as in case 1, but with an InfiniBand interconnect, the scaling is linear up to 20 nodes (80 cores). Since that cluster isn't any larger, I haven't been able to check the performance on more nodes. A colleague has, however, run VASP on up to 256 cores on another InfiniBand cluster and claims to have obtained near-linear scaling, but ScaLAPACK was essential to accomplish that.
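To give an idea of what the fine-tuning amounts to, the INCAR lines below are only a rough starting point for a Gbit ethernet cluster, not the exact values I used; the optimum depends on the number of nodes and cores and has to be found by test runs:

NPAR = 4     ! number of band groups; a common starting point on Gbit ethernet is NPAR = number of nodes
NSIM = 4     ! bands optimized simultaneously in the blocked RMM-DIIS algorithm (4 is the VASP default)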
Hope this helps to some extent.
Cheers,
/Dan
Parallel performance of vasp over Gbit ethernet
Posted: Sat Aug 15, 2009 9:48 pm
by martonak
Could you please specify the precision, cutoff and pseudopotential you use for the 128 Cr atom problem? Could you perhaps also post the INCAR file?
Thanks
Roman
Parallel performance of vasp over Gbit ethernet
Posted: Sun Aug 16, 2009 9:49 am
by forsdan
INCAR
------
SYSTEM = Cr128X
ENCUT = 550 eV          ! plane-wave cutoff
ENAUG = 700 eV          ! cutoff for the augmentation charges
EDIFF = 1E-5            ! electronic convergence criterion
LREAL = A               ! real-space projection, automatic optimization
PREC = Normal
ROPT = -0.0001 -0.0001  ! real-space projector precision, one value per species
ISMEAR = 1              ! Methfessel-Paxton smearing, order 1
SIGMA = 0.1             ! smearing width (eV)
NSW = 2                 ! number of ionic steps
IBRION = 2              ! conjugate-gradient ionic relaxation
ISIF = 3                ! relax ions, cell shape and volume
POTIM = 0.3
MAGMOM = 64*4 64*-4 2   ! initial moments: antiferromagnetic Cr sublattices + the interstitial
ISPIN = 2               ! spin-polarized
LWAVE = .FALSE.         ! do not write WAVECAR
KPOINTS
---------
Input
6
Reciprocal
0.125000 0.125000 0.125000 8.000000
0.375000 0.125000 0.125000 16.000000
0.375000 0.375000 0.125000 8.000000
0.125000 0.125000 0.375000 8.000000
0.375000 0.125000 0.375000 16.000000
0.375000 0.375000 0.375000 8.000000
I use the PAW-PBE pseudopotentials, with the semi-core 3p states included for the Cr atoms (12 electrons/atom). For the interstitial defect I use a boron atom (3 electrons) placed at an octahedral site.
The setup is just one of my previous runs that gives a decent time consumption and memory requirement, which is why I chose it as one of my test cases. There's nothing more to it than that.
/Dan