VASP performance issue

Posted: Sat Aug 22, 2009 10:31 am
by prithwish
Hi,
We have installed VASP on the HPC cluster at our center, which has the following specifications:

Kernel : Linux 2.6.18-53.el5
Architecture : x86_64
Each node : two quad-core processors (8 cores per node)
(Hyperthreading disabled)
RAM used : 16 GB per node
Swap for each node : 8 GB
Interconnect : InfiniBand, 20 Gbps
MPI : Intel MPI, version 3.1, 64 bit

In the Makefile we used : MPI_BLOCK = 8000 and CACHE_SIZE = 4000,
O3 level of optimization,
and NSIM = 4 was specified in the INCAR file.

We ran a job with 54 atoms. We submitted the first job on 40 processors, taking 4 processors from each node. It took about 4 hours and 33 minutes.
The very same job was then submitted on the remaining 4 processors of each of the same 5 nodes. This time the job completed in 12 hours and 20 minutes, which is surprising.

We also ran a job with 128 atoms on 40 processors, once submitted through Sun Grid Engine (SGE) and once without SGE (i.e. started directly with mpirun). The job started without SGE ran about 2.5 times faster than the job submitted through SGE.

Is there anything wrong with our installation? Could you please suggest whether we are missing anything?

Regards.
Prithwish

Posted: Sun Aug 30, 2009 10:44 am
by alex
Prithwish,

You did not specify the vendor and model of your CPUs. This can have a considerable influence on the performance under full load (speed of data transfer from memory to CPU).

About the 54-atom job:
I'm not sure I fully understand: you have two jobs, one on 40 cores and one on 20, the former taking 4.5 h and the latter 12.3 h to complete?! Were they started at the same time? What happened to the timings per step of the longer job after the 4.5 h job had finished?

Reliable benchmarking is hard. ;-)

No ideas about the big job, sorry.

cheers

alex

Posted: Mon Aug 31, 2009 8:39 am
by prithwish
Hi Alex,
I am sorry for the mistake. The first job was also submitted with 20 processors. Both jobs were started at the same time.
Prithwish

Posted: Mon Aug 31, 2009 9:05 am
by alex
Go back and try to reproduce it. Check whether others are using the machine. Look at the LOOP timings in the OUTCAR. And try to answer all the questions (CPU model; cat /proc/cpuinfo helps).
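
To compare the LOOP timings of two runs step by step, a small script can pull them out of the OUTCAR. This is only a rough sketch: the line shape it expects ("LOOP:" or "LOOP+:" followed by two labelled times) is an assumption, since the labels differ between VASP versions, and the sample lines are made up for illustration.

```python
import re

# Assumed line shape (labels vary between VASP versions):
#   "LOOP:  <label> time  <t1>: <label> time  <t2>"
# "LOOP:" lines are per electronic step, "LOOP+:" lines per ionic step.
LOOP_RE = re.compile(
    r"^\s*LOOP(\+?):\s+\S+\s+time\s*([\d.]+):\s+\S+\s+time\s*([\d.]+)"
)


def loop_timings(lines):
    """Return a list of (kind, first_time, second_time) tuples."""
    timings = []
    for line in lines:
        match = LOOP_RE.match(line)
        if match:
            kind = "ionic" if match.group(1) else "electronic"
            timings.append((kind, float(match.group(2)), float(match.group(3))))
    return timings


# Made-up sample lines, not taken from a real run.
sample = [
    "      LOOP:  cpu time   12.50: real time   13.00",
    "      LOOP:  cpu time   12.75: real time   19.25",
    "     LOOP+:  cpu time  130.25: real time  140.75",
    " some unrelated OUTCAR line",
]

for kind, t_first, t_second in loop_timings(sample):
    print(f"{kind:10s} {t_first:10.2f} {t_second:10.2f}")
```

For a real run, something like loop_timings(open("OUTCAR")) should work. If the two times on a line drift far apart, or the per-step times jump once the other job finishes, that points to contention rather than to the installation.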

alex

Posted: Mon Aug 31, 2009 9:38 am
by prithwish
Hi,
No, nobody else was using the machine at that time.
The details are as follows:

vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5460 @ 3.16GHz
stepping : 6
cpu MHz : 3166.851
cache size : 6144 KB

Regards.
Prithwish

Posted: Mon Oct 12, 2009 2:00 pm
by admin
Please note that a small part of VASP (ca. 3%) cannot be parallelized, so the performance will never scale linearly with the number of processors.
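
To get a feeling for what a serial part of about 3% means, one can plug it into Amdahl's law. The sketch below assumes the simplest form of that law (no communication overhead, perfectly balanced parallel part), so it gives an upper bound on the speedup, not a prediction for any particular cluster. The function name is just illustrative.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Upper bound on speedup from Amdahl's law.

    serial_fraction: share of the run time that cannot be parallelized.
    n_procs: number of processors used.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


serial = 0.03  # the "ca. 3%" quoted above

for n in (1, 4, 8, 20, 40):
    s = amdahl_speedup(serial, n)
    print(f"{n:3d} processors: speedup <= {s:5.2f}, efficiency <= {s / n:4.0%}")

# With infinitely many processors the bound tends to 1 / serial_fraction.
print(f"limit: {1.0 / serial:.1f}")
```

Under these assumptions, going from 20 to 40 processors cannot even come close to halving the run time, and no number of processors gives more than roughly a 33-fold speedup.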
Please also check how much of the CPU time your jobs spend swapping memory, whether the switches are comparably fast, ...
Last but not least: I hope you are not comparing wallclock times, are you?
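
The difference between wallclock time and CPU time can be illustrated with a few lines of Python. This is a generic sketch, not anything VASP-specific: the sleep merely stands in for a process that is waiting (for swap, I/O, the network, or a busy core) instead of computing.

```python
import time


def measure(work):
    """Run work() and return (wallclock seconds, CPU seconds) it used."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time of this process only
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def waiting():
    # Stand-in for a stalled process: time passes, but no CPU is used.
    time.sleep(0.2)


def computing():
    # Pure computation: CPU time and wallclock time stay close together
    # as long as the core is not shared with other processes.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


for name, work in (("waiting", waiting), ("computing", computing)):
    wall, cpu = measure(work)
    print(f"{name:10s} wall = {wall:6.3f} s   cpu = {cpu:6.3f} s")
```

A job whose wallclock time is much larger than its CPU time spent most of its life waiting, so comparing wallclock times alone says little about how fast the code itself runs.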