Problem when using VASP parallelized using MPI + OpenMP
-
- Newbie
- Posts: 24
- Joined: Fri Mar 24, 2023 1:19 pm
Problem when using VASP parallelized using MPI + OpenMP
Dear fellows,
I have been running VASP parallelized with MPI for a few months on a local computer cluster. I just installed VASP on an HPC facility with the "makefile.include.intel_omp" archetype in order to use MPI + OpenMP.
When I run VASP as I do on my local cluster, for example with "#PBS -l nodes=1:ppn=4", "mpirun -np 4 vasp_std", and "NCORE = 4" in the INCAR file, I get no errors. When I try to run VASP parallelized with MPI + OpenMP, for instance with "#PBS -l nodes=1:ppn=4", "export OMP_NUM_THREADS=2", "mpirun -np 2 vasp_std", and "NCORE = 2", I get several warnings in the stdout saying "WARNING: Sub-Space-Matrix is not hermitian in DAV" and finally the error "Error EDDDAV: Call to ZHEGV failed. Returncode = 15 2 16". I know this error occurs when the matrix is split over an inappropriate number of cores and ends up non-Hermitian, which halts the diagonalization. When I had this issue running VASP with MPI only, it was solved simply by adjusting NCORE in the INCAR file; however, I have tried several combinations of "OMP_NUM_THREADS", "-np", and "NCORE" without success.
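For clarity, here is a minimal sketch of the two job scripts being compared (assuming vasp_std is on the PATH; the log redirection is only illustrative):

Code:
#!/bin/bash
### pure MPI job (works on the local cluster)
#PBS -l nodes=1:ppn=4
cd $PBS_O_WORKDIR
# 4 MPI ranks, 1 thread each; NCORE = 4 in the INCAR
mpirun -np 4 vasp_std > stdout.log 2>&1

Code:
#!/bin/bash
### hybrid MPI + OpenMP job (fails on the HPC facility)
#PBS -l nodes=1:ppn=4
cd $PBS_O_WORKDIR
# 2 MPI ranks with 2 OpenMP threads each; NCORE = 2 in the INCAR
export OMP_NUM_THREADS=2
mpirun -np 2 vasp_std > stdout.log 2>&1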
The relevant files are attached below. Could someone help me with this?
Best regards,
Renan Lira.
-
- Global Moderator
- Posts: 418
- Joined: Mon Sep 13, 2021 11:02 am
Re: Problem when using VASP parallelized using MPI + OpenMP
Hi,
If I understand correctly, the problem occurs only with the new installation on the HPC facility, right? Does it occur systematically, independently of the solid that is considered or choices in INCAR like ALGO, ISMEAR or the functional (I see that you are using the Tkatchenko-Scheffler)?
-
- Newbie
- Posts: 24
- Joined: Fri Mar 24, 2023 1:19 pm
Re: Problem when using VASP parallelized using MPI + OpenMP
Hello,
Yes, the problem only occurs with the new installation on the HPC facility, when trying to use VASP parallelized with MPI + OpenMP (I don't get errors when using only MPI).
I ran VASP for a different system using different parameters, such as ALGO=Normal, ISMEAR=0 and IVDW=202 (Many-body dispersion energy method), and VASP ran successfully. Which parameter do you think is the culprit? I will run a few more calculations changing one parameter at a time to try and isolate the problem.
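A rough sketch of how I plan to do this, one INCAR tag changed per run (the INCAR.<tag> file names and test directories below are just placeholders):

Code:
#!/bin/bash
# each INCAR.<tag> differs from the working INCAR by a single tag
export OMP_NUM_THREADS=2
for tag in ALGO ISMEAR IVDW PREC; do
    mkdir -p test_${tag}
    cp POSCAR POTCAR KPOINTS test_${tag}/
    cp INCAR.${tag} test_${tag}/INCAR
    ( cd test_${tag} && mpirun -np 2 vasp_std > stdout.log 2>&1 )
done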
Although it ran successfully, I realized that in the header of the stdout, instead of getting

running 4 mpi-ranks, with 1 threads/rank, on 1 nodes
distrk: each k-point on 4 cores, 1 groups
distr: one band on 4 cores, 1 groups

which is what I get when running MPI only, I get

running 2 mpi-ranks, with 2 threads/rank, on 1 nodes
distrk: each k-point on 2 cores, 1 groups
distr: one band on 1 cores, 2 groups

I can't get the band distribution to run on 2 cores, 1 group as the k-point distribution. What is wrong here?
Best regards,
Renan Lira.
-
- Global Moderator
- Posts: 418
- Joined: Mon Sep 13, 2021 11:02 am
Re: Problem when using VASP parallelized using MPI + OpenMP
I have run your system in MPI+OpenMP mode (I also compiled with Intel 2022.0.1 and used makefile.include.intel_omp) and the calculation ran properly. Maybe there is a problem with your installation.
In forum/viewtopic.php?f=2&t=19026 you reported that you switched to a more recent version of the Intel compiler. Did you execute "make veryclean" before recompiling with the more recent compiler? If not, then please do it and try again to run your system.
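For reference, a clean rebuild amounts to the following (a sketch; build only the executables you actually need):

Code:
# in the VASP root directory, with the new compiler environment loaded
make veryclean        # remove all objects and binaries of the previous build
make std gam ncl      # rebuild the standard, gamma-only, and non-collinear executables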
Concerning your question "I can't get the band distribution to run on 2 cores, 1 group as the k-point distribution. What is wrong here?", this should be due to what is written at Combining_MPI_and_OpenMP:
"The main difference between the pure MPI and the hybrid MPI/OpenMP version of VASP is that the latter will not distribute a single Bloch orbital over multiple MPI ranks but will distribute the work on a single Bloch orbital over multiple OpenMP threads. "
-
- Newbie
- Posts: 24
- Joined: Fri Mar 24, 2023 1:19 pm
Re: Problem when using VASP parallelized using MPI + OpenMP
Hello,
"I have run your system in MPI+OpenMP mode (I also compiled with Intel 2022.0.1 and used makefile.include.intel_omp) and the calculation ran properly. Maybe there is a problem with your installation."
After successfully running the different system I mentioned, I went back to my original system with ALGO=Normal, ISMEAR=0 and IVDW=202, but it did not work. Then I returned these variables to ALGO=Fast, ISMEAR=-5 and IVDW=20 (as in the attached file), but changed PREC=Accurate to PREC=Normal, and it ran with no problems. What is going on here?
"Did you execute "make veryclean" before recompiling with the more recent compiler?"
Yes, I executed "make veryclean" before recompiling with the other compiler.
Thank you for the explanation regarding band parallelization in MPI + OpenMP mode.
Best regards,
Renan Lira.
-
- Global Moderator
- Posts: 418
- Joined: Mon Sep 13, 2021 11:02 am
Re: Problem when using VASP parallelized using MPI + OpenMP
It is difficult to say what the problem is. Can you provide the files of the case that does not work?
-
- Newbie
- Posts: 24
- Joined: Fri Mar 24, 2023 1:19 pm
Re: Problem when using VASP parallelized using MPI + OpenMP
I should mention that to install VASP, I used a makefile.include that was provided by the support team from the HPC facility. I attached it to this message.
The files I provided are for the case that does not work.
-
- Global Moderator
- Posts: 418
- Joined: Mon Sep 13, 2021 11:02 am
Re: Problem when using VASP parallelized using MPI + OpenMP
I could again run your system without any problem with the same settings as yours:
-vasp.6.4.1
-Compilation with the makefile.include that you provided (I only adapted the paths for HDF5 and WANNIER90)
-MPI+OpenMP mode: "mpirun -np 2 ~/vasp-release.6.4.1/bin/vasp_std" and OMP_NUM_THREADS=2
Which machines/processors are you using?
-
- Newbie
- Posts: 24
- Joined: Fri Mar 24, 2023 1:19 pm
Re: Problem when using VASP parallelized using MPI + OpenMP
Hello,
I am using a Dell EMC R6525 which has 2 AMD EPYC 7662 processors.
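For reference, the layout of such a node can be checked directly on a compute node with standard Linux tools (a sketch; availability of numactl is an assumption):

Code:
lscpu | grep -E 'Socket|Core|Thread|NUMA'   # sockets, cores per socket, SMT, NUMA nodes
numactl --hardware                          # which cores belong to which NUMA domain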
-
- Global Moderator
- Posts: 418
- Joined: Mon Sep 13, 2021 11:02 am
Re: Problem when using VASP parallelized using MPI + OpenMP
There is something that is confusing. In a previous post you wrote:
"After successfully running the different system I mentioned, I went back to my original system with ALGO=Normal, ISMEAR=0 and IVDW=202, but it did not work. Then I returned these variables to ALGO=Fast, ISMEAR=-5 and IVDW=20 (as in the attached file), but changed PREC=Accurate to PREC=Normal, and it ran with no problems. What is going on here?"
What does it mean? Does it mean that "ALGO=Fast, ISMEAR=-5 and IVDW=20" was originally producing the error, but was then later finally working?
You also wrote:
"I am using a Dell EMC R6525 which has 2 AMD EPYC 7662 processors."
This concerns the HPC facility where the error occurs, right?
Something that I somehow forgot is that, in general, some options should be used when combining MPI and OpenMP, as indicated at Combining_MPI_and_OpenMP. In particular, try the following:
Code:
mpirun -np 2 --bind-to core --report-bindings --map-by ppr:2:node:PE=2 -x OMP_NUM_THREADS=2 -x OMP_PLACES=cores -x OMP_PROC_BIND=close path_to_vasp_std
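Note that the --bind-to/--map-by/-x options above are Open MPI syntax. If the mpirun on the HPC facility is Intel MPI instead (an assumption based on the intel_omp toolchain), the same pinning could be expressed with environment variables, for example:

Code:
# sketch for Intel MPI; replace path_to_vasp_std with the actual path
export OMP_NUM_THREADS=2       # 2 OpenMP threads per MPI rank
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export I_MPI_PIN=1             # enable process pinning
export I_MPI_PIN_DOMAIN=omp    # one pinning domain per rank, sized by OMP_NUM_THREADS
mpirun -np 2 path_to_vasp_std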
-
- Newbie
- Posts: 24
- Joined: Fri Mar 24, 2023 1:19 pm
Re: Problem when using VASP parallelized using MPI + OpenMP
Thanks for the reply.
"What does it mean? Does it mean that "ALGO=Fast, ISMEAR=-5 and IVDW=20" was originally producing the error, but was then later finally working?"
ALGO=Fast, ISMEAR=-5 and IVDW=20 with PREC=Accurate were causing the errors. When I changed PREC to Normal, it worked. However, I need high precision.
"This concerns the HPC facility where the error occurs, right?"
Yes, the EPYC 7662 processors are the ones in the HPC facility where I get the errors.
"Something that I somehow forgot is that, in general, some options should be used when combining MPI and OpenMP, as indicated at Combining_MPI_and_OpenMP. In particular, try the following:"
I will try that as soon as possible and post the results.
Best regards,
Renan Lira.