Problems running VASP in parallel mode with 5 nodes / 64 cores per node / 250 GB RAM per node
Posted: Mon May 27, 2013 11:20 am
Hi,
I have been running VASP 5.3 successfully on a "vacuum + monolayer + substrate" system of about 1500 atoms, using 4 nodes with 64 cores/node and 250 GB RAM/node (ARCH: lx26-amd64; the cluster uses Mellanox InfiniBand QDR 40 Gb/s for parallel communication and file-system access). In that setup I use almost 200 GB/node during execution. For now everything seems fine, but I am close to running out of memory, and I will need to increase the number of atoms.
It would therefore be useful to add a fifth node to my calculations to reduce the memory load per node. However, I cannot get a stable run with 5 nodes. I have tried different combinations of the parameters KPAR, NCORE (or NPAR), NSIM, and LPLANE, but nothing seems to work: the run always crashes during the first iteration in EDDAV, after POTLOK and SETDIJ.
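For illustration, a parallelization block of the kind I have been varying might look like the following; the values here are hypothetical examples of the search space, not a known-good setting:

NCORE  = 16        ! each orbital distributed over 16 cores; NPAR = ranks/NCORE follows
KPAR   = 1         ! only 3 irreducible k-points, and KPAR must divide the number of MPI ranks
NSIM   = 4         ! RMM-DIIS optimizes 4 bands at a time
LPLANE = .TRUE.    ! distribute FFT data plane-by-plane to cut communication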
Am I at the limit of what VASP 5.3 can handle in RAM, or is this a limitation of my cluster?
If VASP 5.3 can manage memory independently of the number of nodes, can anyone help me configure it to run on 5 nodes? Or should I use an even number of nodes instead?
My script for running VASP is:
#!/bin/bash
#
#$ -cwd                      # run the job in the submission directory
#$ -o job.out -j y           # write stdout to job.out and merge stderr into it
#$ -pe mp64 256              # request 256 slots in the mp64 parallel environment
## Create rank file
./mkrnkfile.sh
mpirun -np 128 --rankfile rank.$JOB_ID --bind-to-core vasp
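For comparison, a submission that uses all 5 nodes consistently might look like this; it is only a sketch, assuming the mp64 parallel environment hands out 64 slots per node and I run one MPI rank per core:

#!/bin/bash
#$ -cwd
#$ -o job.out -j y
#$ -pe mp64 320              # 5 nodes x 64 cores = 320 slots
## Create rank file
./mkrnkfile.sh
mpirun -np 320 --rankfile rank.$JOB_ID --bind-to-core vasp

Under the same assumption, my current script requests 256 slots (4 nodes) but starts only 128 ranks, i.e. 32 ranks per node.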
and my INCAR is:
ISTART = 0; ICHARG = 2
GGA = PE
PREC = High
AMIN = 0.01
general:
SYSTEM = (110)system vacuum
LWAVE = .FALSE.
LCHARG = .FALSE.
LREAL = Auto
ISMEAR = 1; SIGMA = 0.2
ALGO = Fast
NGX = 194; NGY = 316; NGZ = 382
linux:
LSCALAPACK = .TRUE.
NCORE = 32
KPAR = 1
LSCALU = .FALSE.
LPLANE = .TRUE.
NSIM = 1
no magnetic:
ISPIN = 1
dynamics:
NSW = 0
IBRION = 0
I am only using 3 irreducible k-points.
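For a rough sense of the memory scale: the FFT grid above holds 194 x 316 x 382, about 23.4 million points, so a single complex double-precision array on that grid is already about 0.35 GiB, and VASP keeps many such arrays besides the wavefunctions. A quick back-of-the-envelope check (plain shell, nothing cluster-specific):

# 16 bytes per complex double-precision number
awk 'BEGIN { n = 194*316*382; printf "%d grid points, %.2f GiB per complex array\n", n, n*16/2^30 }'
# VASP also prints its own memory estimate in OUTCAR, if the run gets that far
grep -i "memory" OUTCAR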
Thank you for your attention.