Issue with NPAR
Posted: Wed Mar 15, 2023 9:02 am
Hi,
With the newly compiled VASP executable, whenever I set an NPAR value in the INCAR, I get errors like the one below. If I leave NPAR out, the simulation runs smoothly, but at only 1/6 to 1/10 of the speed. I asked the HPC support team, and they say the fault is not on their end.
Error with the NPAR tag:
==========================================================================
[cn275:2585 :0:2585] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xfffffffc96ba5a80)
Abort(68273934) on node 24 (rank 24 in comm 0): Fatal error in PMPI_Recv: Message truncated, error stack:
PMPI_Recv(173)........: MPI_Recv(buf=0xc29b120, count=0, MPI_BYTE, src=94, tag=9, comm=0xc4000011, status=0x7ffeae862bc0) failed
MPID_Recv(590)........:
MPIDI_recv_unsafe(205):
(unknown)(): Message truncated
==========================================================================
My script.sh looks like this:
==========================================================================
#!/bin/csh
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=03:25:00
#SBATCH --output=slurm-%A.out
#SBATCH --error=slurm-%A.err
#SBATCH --partition=small
cd $SLURM_SUBMIT_DIR
source /opt/ohpc/admin/lmod/8.1.18/init/csh
module load spack
source /home-ext/apps/spack/share/spack/setup-env.csh
spack load intel-mkl@2020.4.304 /fet6h2j
spack load intel-mpi@2019.10.317 /6icwzn3
spack load fftw@3.3.10%intel@2021.5.0 /tphl5ba
unlimit
#ulimit -s unlimited
mpiexec.hydra -n $SLURM_NTASKS /home/proj/21/chemoh/vasp/vasp.5.4.4.pl2/bin/vasp_std
==========================================================================
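For reference, the script requests 2 nodes x 48 tasks, i.e. 96 MPI ranks, and the INCAR below sets NPAR= 12, so each band group would span 96 / 12 = 8 ranks. A quick Python sketch of that bookkeeping (the numbers are taken from my own files; the divisibility check is just the common guideline that NPAR should divide the total rank count):
==========================================================================
import math

# Values taken from the SLURM script and INCAR in this post
nodes = 2
ntasks_per_node = 48
ranks = nodes * ntasks_per_node      # 96 MPI ranks in total

npar = 12                            # NPAR from the INCAR
assert ranks % npar == 0, "NPAR should divide the total number of ranks"
print(f"{ranks} ranks -> {npar} band groups of {ranks // npar} ranks each")

# Rule of thumb quoted in the INCAR comment: NPAR ~ sqrt(number of cores)
print(f"sqrt({ranks}) is roughly {math.isqrt(ranks)}")  # ~9-10, so NPAR = 8 or 12 is in that range
==========================================================================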
And the INCAR looks like this:
==========================================================================
# Parameters related to accuracy of the simulation
PREC= Normal # Precision of the calculation
ALGO= Normal # Selects the block Davidson diagonalization algorithm
LREAL= .FALSE. # Evaluation of projection operators in reciprocal space
ENCUT= 400 # Plane wave cutoff in eV
EDIFF= 1E-5 # Converge DFT energy till variation is less than 1E-5 eV
# Accelerating convergence through electronic smearing
ISMEAR= 1 # First-order Methfessel-Paxton smearing to accelerate convergence
SIGMA= 0.2 # Smearing width in eV
# Spin polarization setting
ISPIN= 2 # Spin-polarized calculation, i.e., taking spin into account
#LDIPOL = .TRUE.
#IDIPOL=3
# Output settings
LCHARG= .TRUE. # Write CHGCAR
LWAVE= .TRUE. # Write WAVECAR
# Parallelization options
NPAR= 12 # Number of bands that are treated in parallel; NPAR ~ sqrt(number of cores)
# Exchange-correlation functional settings
GGA= PE # Chooses PBE XC functional
IVDW= 12 # Adds dispersion to DFT using Grimme's D3 method, with Becke-Johnson (BJ) damping, see: 10.1021/jp501237c
# Cell optimization details
IBRION= 2 # Optimize ion positions
EDIFFG= -0.05 # Stop optimization once forces on all atoms are less than 0.05 eV/A
NSW= 500 # Number of optimization steps to carry out
==========================================================================
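For what it's worth, if I understand the defaults correctly, leaving NPAR out means VASP falls back to NCORE = 1 (equivalent to NPAR equal to the number of ranks), which would be consistent with the slowdown I see when NPAR is absent. A small Python sketch of how NPAR and NCORE relate for my 96-rank job (assuming KPAR = 1, so ranks = NPAR * NCORE):
==========================================================================
# How NPAR and NCORE partition the 96 MPI ranks (KPAR = 1 assumed),
# using the relation ranks = NPAR * NCORE
ranks = 96
for npar in (96, 24, 12, 8):
    ncore = ranks // npar
    print(f"NPAR = {npar:3d}  <->  NCORE = {ncore:2d}  "
          f"({npar} band groups x {ncore} ranks per band)")
==========================================================================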