My Community

Posted: **Sat Aug 22, 2009 10:31 am**

Hi,
We have installed VASP on the HPC cluster of our center, which has the following specifications:

Kernel : Linux 2.6.18-53.el5
Architecture : x86_64
Each node : two quad-core processors (8 cores per node)
(Hyperthreading disabled)
RAM used : 16 GB per node
Swap for each node : 8 GB
Interconnect : INFINIBAND, 20 Gbps
MPI : Intel MPI, version 3.1, 64 bit

In the Makefile we used: MPI_BLOCK = 8000 and CACHE_SIZE = 4000,
O3 level of optimization,
and NSIM = 4 was specified in INCAR.

We ran a job with 54 atoms. The first job was submitted with 40 processors, taking 4 processors from each node. It took about 4 hours and 33 minutes.
The very same job was then submitted on the remaining 4 processors of each of the same 5 nodes. This time the job took 12 hours and 20 minutes to complete. This is surprising.

Not only that, we also submitted a job with 128 atoms. We ran it once through Sun Grid Engine (SGE) with 40 processors, and once without SGE (i.e. started directly with mpirun). We noticed that the job submitted without SGE was about 2.5 times faster than the job submitted through SGE.

Is there anything wrong with our installation? Can you please suggest what we might be missing?

Regards.
Prithwish

Posted: **Sun Aug 30, 2009 10:44 am**

Prithwish,

You did not specify the vendor and model of your CPUs. This can have a considerable influence on performance under full load (the speed of data transfer from memory to the CPU).

About the 54-atom job:
I'm not sure I fully understand: you have two jobs, one with 40 cores and one with 20, the former taking 4.5h and the latter 12.3h to complete?! Were they started at the same time? What happened to the timings per step after the 4.5h job had finished?

Reliable benchmarking is hard. ;-)

No ideas about the big job, sorry.

cheers

alex

Posted: **Mon Aug 31, 2009 8:39 am**

Hi Alex,
I am sorry for the mistake. The first job was also submitted with 20 processors. Both jobs were started at the same time.
Prithwish

Posted: **Mon Aug 31, 2009 9:05 am**

Go back and try to reproduce this. Check whether others are using the machine. Look at the LOOP timings in OUTCAR. And try to answer all the questions (for the CPU model, cat /proc/cpuinfo helps).
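One way to look at the LOOP timings is to pull them out of OUTCAR with a small script. This is only a rough sketch: it assumes each LOOP line carries two "<label> time <number>" fields separated by a colon (the labels differ between VASP versions, so the pattern does not depend on them), and the sample lines and function names are made up for illustration. If the second number per step is much larger than the first, the job is probably waiting on something (communication, swapping, other processes).

```python
import re

# Assumed line shape: "LOOP:  <label> time  <number>: <label> time  <number>".
# "LOOP+:" lines (whole ionic steps) are skipped on purpose.
LOOP_RE = re.compile(r"^\s*LOOP:\s+\S+ time\s+([\d.]+):\s+\S+ time\s+([\d.]+)")


def loop_timings(text):
    """Return one (first_time, second_time) pair per LOOP line in the text."""
    pairs = []
    for line in text.splitlines():
        match = LOOP_RE.match(line)
        if match:
            pairs.append((float(match.group(1)), float(match.group(2))))
    return pairs


def summarize(pairs):
    """Return (count, mean of first field, mean of second field)."""
    count = len(pairs)
    mean_first = sum(p[0] for p in pairs) / count
    mean_second = sum(p[1] for p in pairs) / count
    return count, mean_first, mean_second


# Invented sample lines, only to show the assumed format.
SAMPLE = """\
      LOOP:  cpu time   10.00: real time   10.50
      LOOP:  cpu time   12.00: real time   30.00
     LOOP+:  cpu time   25.00: real time   45.00
"""

if __name__ == "__main__":
    # For a real run, replace SAMPLE with: open("OUTCAR").read()
    count, mean_first, mean_second = summarize(loop_timings(SAMPLE))
    print(f"{count} steps, mean times {mean_first:.2f} / {mean_second:.2f} s")
```

Running this on the OUTCAR of both the fast and the slow job, and comparing the per-step numbers, should show whether the slowdown is spread evenly over the run or appears only in some steps.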

alex

Posted: **Mon Aug 31, 2009 9:38 am**

Hi,
No, nobody else was using the machine at that time.
The details are as follows:

vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU X5460 @ 3.16GHz
stepping : 6
cpu MHz : 3166.851
cache size : 6144 KB

Regards.
Prithwish

Posted: **Mon Oct 12, 2009 2:00 pm**

Please note that there is a small part of VASP (ca. 3%) which cannot be parallelized; therefore the performance will never scale linearly with the number of processors.
Please also check how much of the CPU time your jobs spend swapping memory, whether the switches are comparably fast, ...
Last but not least: I hope you do not compare wallclock times, do you?
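To put a number on the non-parallelizable part: the usual estimate is Amdahl's law, speedup = 1 / (s + (1 - s) / N), with s the serial fraction and N the number of processors. The sketch below simply evaluates that formula with the ca. 3% quoted above; the function name is made up, and real runs will also lose time to communication and memory bandwidth, which this ignores.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Upper bound on speedup from Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


if __name__ == "__main__":
    serial_fraction = 0.03  # the "ca. 3%" mentioned above, taken at face value
    for n_procs in (1, 8, 20, 40):
        speedup = amdahl_speedup(serial_fraction, n_procs)
        print(f"{n_procs:3d} processors: speedup at most {speedup:.1f}")
    # With infinitely many processors the bound is 1 / serial_fraction.
    print(f"limit: {1.0 / serial_fraction:.1f}")
```

With a 3% serial part this gives at most about 12.7 on 20 processors and about 18.4 on 40, so doubling the processor count from 20 to 40 would buy less than a factor of 1.5 even in the ideal case.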

VASP performance issue
