Parallelization and node memory usage

labumm
Newbie
Posts: 1
Joined: Sun Jul 05, 2020 11:36 pm
License Nr.: 20-0110

#1 Post by labumm » Mon Nov 23, 2020 5:41 pm

I have vasp.5.4.4 running fine on a system at ENCUT = 300 eV. I'd like to increase that to 400 eV, but when I do, the memory usage is a little too high to run on the 20-core 32 GB nodes (VASP fails while setting up the 3D FFT). It will run on the 24-core 64 GB nodes, but those are in higher demand.

I have been submitting the job to 4 nodes (80 cores in total) with the parameters below.

** I am looking for recommendations to change the parallelization options to reduce the memory usage per node.


LPLANE = .TRUE.
LSCALU = .FALSE.

NCORE = 4
NSIM = 4
KPAR = 4
ISYM = 0

The following OUTCAR excerpt shows that VASP is using 80 cores on 4 nodes.

running on 80 total cores
distrk: each k-point on 20 cores, 4 groups
distr: one band on NCORES_PER_BAND= 4 cores, 5 groups
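
For reference, the two distribution lines in this excerpt follow from integer arithmetic on the job settings. A minimal sketch, assuming the cores are split by plain integer division; the function name is illustrative and not part of VASP:

```python
def decomposition(total_cores, kpar, ncore):
    """Return (cores per k-point group, band groups per k-point group)."""
    # Assumption of this sketch: both splits must divide evenly.
    if total_cores % kpar != 0:
        raise ValueError("KPAR does not divide the total number of cores")
    cores_per_kgroup = total_cores // kpar
    if cores_per_kgroup % ncore != 0:
        raise ValueError("NCORE does not divide the cores per k-point group")
    band_groups = cores_per_kgroup // ncore
    return cores_per_kgroup, band_groups


# Settings from the post: 80 cores, KPAR = 4, NCORE = 4.
cores_per_kgroup, band_groups = decomposition(total_cores=80, kpar=4, ncore=4)
print(cores_per_kgroup, band_groups)  # 20 5
```

The result matches the excerpt: 20 cores per k-point group and 5 band groups.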

martin.schlipf
Global Moderator
Posts: 542
Joined: Fri Nov 08, 2019 7:18 am

Re: Parallelization and node memory usage

#2 Post by martin.schlipf » Mon Nov 30, 2020 4:37 pm

There is no one-size-fits-all solution, as this depends on your specific system. Here are some things you can explore:

Your KPAR should be a divisor of the number of (irreducible) k-points of your cell. Do you need that many k-points for your system? For large systems, a single k-point is often sufficient. You may also do the ENCUT test with an underconverged k-point set to check whether 400 eV is really necessary.
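
As a rough illustration of the KPAR point, the sketch below lists KPAR values that split both the k-points and the cores evenly. The k-point count of 8 is a hypothetical placeholder (the thread does not state it), and the helper name is made up for this example:

```python
def kpar_candidates(n_irreducible_kpoints, total_cores):
    """KPAR values that divide both the k-point count and the core count."""
    return [
        k
        for k in range(1, n_irreducible_kpoints + 1)
        if n_irreducible_kpoints % k == 0 and total_cores % k == 0
    ]


N_KPOINTS = 8  # hypothetical value; use the count from your own calculation
print(kpar_candidates(N_KPOINTS, total_cores=80))  # [1, 2, 4, 8]
```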

Increasing NCORE at the expense of NPAR can sometimes be more memory efficient. Make sure it stays compatible with your node size (for 20 cores, NCORE = 4, 5, 10, or 20 may make sense).
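
To see the NCORE/NPAR trade-off for a 20-core k-point group, one can enumerate the divisors. This is only a sketch; it assumes the number of band groups is the cores per k-point group divided by NCORE, and the helper name is illustrative:

```python
def ncore_candidates(cores_per_kgroup):
    """Pairs (NCORE, band groups) for every NCORE that divides the group."""
    return [
        (ncore, cores_per_kgroup // ncore)
        for ncore in range(1, cores_per_kgroup + 1)
        if cores_per_kgroup % ncore == 0
    ]


for ncore, groups in ncore_candidates(20):
    print(f"NCORE = {ncore:2d} -> {groups:2d} band groups")
```

Larger NCORE means fewer band groups; which setting needs the least memory still has to be tested on the actual system.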

If ISYM = 0 is not important for your calculation, you can reduce the number of k-points by removing this line, so that symmetry is used again.

If you just barely exceed the available memory, you can also submit calculations that do not use the whole node, for example using only 16 of the 20 available cores. Of course, this wastes CPU time, as you'll still pay for the whole node.
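
The gain from underpopulating a node can be estimated with simple arithmetic. The sketch below assumes the node memory is shared evenly between MPI ranks and ignores what the operating system needs, so treat the numbers as upper bounds:

```python
def memory_per_rank_gb(node_memory_gb, ranks_per_node):
    """Average memory available to each MPI rank on one node."""
    return node_memory_gb / ranks_per_node


# Node sizes from the thread; 16 ranks is the underpopulated example.
setups = [
    ("32 GB node, 20 ranks", 32, 20),
    ("32 GB node, 16 ranks", 32, 16),
    ("64 GB node, 24 ranks", 64, 24),
]
for label, memory_gb, ranks in setups:
    print(f"{label}: {memory_per_rank_gb(memory_gb, ranks):.2f} GB per rank")
```

Going from 20 to 16 ranks per node raises the average from 1.60 GB to 2.00 GB per rank, compared with about 2.67 GB on the 64 GB nodes.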

In general, memory usage is something you need to explore. You will need to run a few setups to see which particular combination works best for you. Keep in mind that you should do this exploration with as few actual steps as necessary (for example, you might run just a single electronic step).
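
One way to organize such an exploration is to generate a small set of short test inputs. The sketch below only builds the INCAR text for each combination. It assumes NELM = 1 is an acceptable way to limit the run to a single electronic step, and the candidate lists and the helper name are illustrative choices, not recommendations:

```python
from itertools import product

# Tags shared by every test run; NELM = 1 caps the electronic steps at one.
BASE_TAGS = {"ENCUT": 400, "NELM": 1, "NSIM": 4}


def incar_variants(kpar_values, ncore_values):
    """Yield one INCAR text per (KPAR, NCORE) combination."""
    for kpar, ncore in product(kpar_values, ncore_values):
        tags = dict(BASE_TAGS, KPAR=kpar, NCORE=ncore)
        yield "\n".join(f"{name} = {value}" for name, value in tags.items())


# Candidates chosen so NCORE divides 80 / KPAR in every case.
variants = list(incar_variants(kpar_values=[1, 2, 4], ncore_values=[4, 5, 10]))
print(len(variants))
print(variants[0])
```

Each variant would still need the remaining tags of the real calculation before it can be submitted.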

Martin Schlipf
VASP developer

