4/5 Images running on Climbing Image NEB

Queries about input and output files, running specific calculations, etc.


myless
Newbie
Posts: 4
Joined: Sat Feb 25, 2023 8:57 pm

4/5 Images running on Climbing Image NEB

#1 Post by myless » Thu May 18, 2023 1:37 pm

Hello,

I am trying to run a Climbing Image Nudged Elastic Band Tutorial that I found on github: https://github.com/drinwater/Nudged-Ela ... d-Tutorial.

Here is what I did (a rough shell sketch follows the list):
0. made directories 00-06
1. I copied the POSCAR and OUTCAR files from 00 and 06 from the example run to my new folder's 00 and 06 directory
2. copied the INCAR and KPOINTS files, and created the POTCAR file using cat Pt N O > POTCAR
3. I used vtst's nebmake.pl command to produce the POSCARs for 01-05 (checked them to make sure no overlap)
4. edited the number of cores (I am running on a 40 core, 4 gpu node)
5. I then ran my slurm script (script included in the zip file).
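A sketch of these steps (all paths and POTCAR file names below are placeholders; the actual endpoints and input files come from the tutorial repository):

Code: Select all

# sketch of the setup; paths are placeholders
mkdir 00 01 02 03 04 05 06

# endpoint structures and reference outputs from the tutorial's example run
cp /path/to/example/00/POSCAR /path/to/example/00/OUTCAR 00/
cp /path/to/example/06/POSCAR /path/to/example/06/OUTCAR 06/

# input files; the POTCAR element order must match the POSCAR
cp /path/to/example/INCAR /path/to/example/KPOINTS .
cat POTCAR_Pt POTCAR_N POTCAR_O > POTCAR

# interpolate 5 intermediate images with VTST's nebmake.pl
nebmake.pl 00/POSCAR 06/POSCAR 5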

When I run the code, only directories 01, 02, 03, and 04 indicate they are running (i.e., have files besides POSCAR in them). I made sure my NCORE was an integer multiple of IMAGES (NCORE = 40, IMAGES = 5), so I'm not entirely sure what is going on here.

Another issue is that the vasprun.xml file alone is 12 MB for each folder. Would sharing my INCAR be enough? That's the only file I changed compared to the tutorial in the assets/NEB run/run1/ folder. I have attached my slurm file, INCAR, and output of nebef.pl for now, but can add anything else.

Appreciate any help received here :) Thanks!

myless
Newbie
Posts: 4
Joined: Sat Feb 25, 2023 8:57 pm

Re: 4/5 Images running on Climbing Image NEB

#2 Post by myless » Thu May 25, 2023 3:17 pm

Is there anyone who has experienced this or could help me diagnose this?

To reiterate, I have folders 00, 01, 02, 03, 04, 05, and 06.

00 and 06 are my endpoints, while 01, 02, 03, 04, and 05 are my images. Folders 01-04 show files besides POSCAR in them, whereas 05 only has POSCAR.

Thanks again!

martin.schlipf
Global Moderator
Posts: 542
Joined: Fri Nov 08, 2019 7:18 am

Re: 4/5 Images running on Climbing Image NEB

#3 Post by martin.schlipf » Fri May 26, 2023 6:43 am

Please revisit the parallel setup of the calculation. It appears that you mix a lot of different parallelization options (MPI + OpenMP + OpenACC) and this causes the observed behavior.

Specifically, you run on 4 MPI ranks with 10 OpenMP threads according to your standard output, but in the INCAR file you set

Code: Select all

IMAGES = 5
NCORE=40
NPAR = 8
First, you should never set both NCORE and NPAR; in fact, VASP will overrule that choice. For GPUs, NCORE is set to 1 and by default everything else goes to band parallelization, i.e., NPAR, so for your specific case you need neither of these flags. Secondly, because you use only 4 MPI ranks, you cannot effectively parallelize over 5 images. This leads to the behavior you see: the MPI ranks start calculating the first 4 images, and only after all of them are done does 1 rank deal with the single remaining image. Until the first 4 images have finished their first electronic self-consistency cycle, there is no output in the directory of the 5th image.

So for your specific setup there are two reasonable choices: you can either run with 40 MPI ranks, where 8 ranks always share a single image, or you can change your NEB run to 4 or 8 images so that you can run with 4 MPI ranks and 10 OpenMP threads. If you can deal with the different number of images, the latter case is probably more efficient.
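For illustration, the two options could look roughly like this in the submission script (a sketch only; the option names follow your attached files, the exact values are assumptions):

Code: Select all

# Option 1: 40 MPI ranks, 8 ranks per image (INCAR: IMAGES = 5, no NCORE/NPAR)
#SBATCH --ntasks-per-node=40
mpirun -np 40 /path/to/vasp_std

# Option 2: 4 MPI ranks with 10 OpenMP threads each, one rank per image
#           (INCAR: IMAGES = 4, no NCORE/NPAR)
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=10
#SBATCH --gres=gpu:4
export OMP_NUM_THREADS=10
mpirun -np 4 /path/to/vasp_std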

You may also check out the NEB tutorial in the wiki for more information.

Martin Schlipf
VASP developer


myless
Newbie
Posts: 4
Joined: Sat Feb 25, 2023 8:57 pm

Re: 4/5 Images running on Climbing Image NEB

#4 Post by myless » Mon May 29, 2023 5:08 pm

Hi Dr. Schlipf,

Thank you for your reply and help!

I indeed made a kerfuffle there...

I found that by changing the number of tasks in my slurm file (--ntasks-per-node) to the number of images, things began to work. (It seems that even though the job would run the first 4 images, it would never get to the final image in the fifth folder before ending.)
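In other words, roughly this change (sketch of the relevant line only):

Code: Select all

#SBATCH --ntasks-per-node=5    # match IMAGES = 5 so each image gets its own MPI rank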

My system has only 3 nodes, each with 4 CPUs and 4 V100 GPUs. There is no high-speed connection between the nodes, so effectively I can use at most 1 node with 40 cores and 4 GPUs per job.
I tried to allocate 4 GPUs to the task using the following lines in my slurm file:

Code: Select all

#SBATCH --gres=gpu:4
#SBATCH --constraint=v100s
but it seems like the other 3 GPUs don't get utilized:

Code: Select all

[gpu-v100s-01:2477212] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[gpu-v100s-01:2477212] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
After looking through the wiki, I wasn't able to figure out how I would use more than one GPU for the same job, is there another resource you could please point me towards?

Thanks,
Myles

martin.schlipf
Global Moderator
Posts: 542
Joined: Fri Nov 08, 2019 7:18 am

Re: 4/5 Images running on Climbing Image NEB

#5 Post by martin.schlipf » Tue May 30, 2023 8:28 am

In principle, 4 MPI ranks with 4 GPUs on 1 node should work. I will inquire whether it would still work if the number of ranks does not match the number of GPUs, or whether the deactivation of NCCL will also lead to some of the GPUs not being used.

Martin Schlipf
VASP developer


martin.schlipf
Global Moderator
Posts: 542
Joined: Fri Nov 08, 2019 7:18 am

Re: 4/5 Images running on Climbing Image NEB

#6 Post by martin.schlipf » Tue May 30, 2023 2:05 pm

I can confirm that this setup should work. Please make sure that the environment variable CUDA_VISIBLE_DEVICES is set when you run VASP. For 4 GPUs and OpenMPI this could be done with

Code: Select all

mpirun -np 4 -x CUDA_VISIBLE_DEVICES=0,1,2,3 /path/to/vasp/executable
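
Alternatively, at least for a single-node run, you could export the variable in the batch script before calling mpirun; note that the -x flag above is specific to OpenMPI. A sketch:

Code: Select all

# inside the Slurm script, before launching VASP
export CUDA_VISIBLE_DEVICES=0,1,2,3
mpirun -np 4 /path/to/vasp/executable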

Martin Schlipf
VASP developer


myless
Newbie
Posts: 4
Joined: Sat Feb 25, 2023 8:57 pm

Re: 4/5 Images running on Climbing Image NEB

#7 Post by myless » Tue May 30, 2023 4:38 pm

Good afternoon Dr. Schlipf,

Thank you for your suggestion and help with this!

After changing:

Code: Select all

/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/comm_libs/mpi/bin/mpirun /home/myless/VASP/vasp.6.3.2/bin/vasp_std
to:

Code: Select all

/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/comm_libs/mpi/bin/mpirun -np 3 -x CUDA_VISIBLE_DEVICES=0,1,2 /home/myless/VASP/vasp.6.3.2/bin/vasp_std
Entire slurm file for reference:

Code: Select all

#!/bin/bash
#
#SBATCH --job-name=V_3gp
#SBATCH --output=std-out
#SBATCH --ntasks-per-node=3
#SBATCH --nodes=1
#SBATCH --gres=gpu:3
#SBATCH --constraint=v100s
#SBATCH --time=1-05:00:00
#SBATCH -p regular

cd "$SLURM_SUBMIT_DIR"
/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/comm_libs/mpi/bin/mpirun -np 3 -x CUDA_VISIBLE_DEVICES=0,1,2 /home/myless/VASP/vasp.6.3.2/bin/vasp_std

exit
I still get the following in std-out:

Code: Select all

btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
 ----------------------------------------------------
    OOO  PPPP  EEEEE N   N M   M PPPP
   O   O P   P E     NN  N MM MM P   P
   O   O PPPP  EEEEE N N N M M M PPPP   -- VERSION
   O   O P     E     N  NN M   M P
    OOO  P     EEEEE N   N M   M P
 ----------------------------------------------------
 running    3 mpi-ranks, with   10 threads/rank
 each image running on    1 cores
 distrk:  each k-point on    1 cores,    1 groups
 distr:  one band on    1 cores,    1 groups
 OpenACC runtime initialized ...    3 GPUs detected
 vasp.6.3.2 27Jun22 (build Mar  8 2023 11:59:18) complex
 POSCAR found type information on POSCAR V
 01/POSCAR found :  1 types and      53 ions
 scaLAPACK will be used selectively (only on CPU)
 -----------------------------------------------------------------------------
|                                                                             |
|               ----> ADVICE to this user running VASP <----                  |
|                                                                             |
|     You have a (more or less) 'large supercell' and for larger cells it     |
|     might be more efficient to use real-space projection operators.         |
|     Therefore, try LREAL= Auto in the INCAR file.                           |
|     Mind: For very accurate calculation, you might also keep the            |
|     reciprocal projection scheme (i.e. LREAL=.FALSE.).                      |
|                                                                             |
 -----------------------------------------------------------------------------

 LDA part: xc-table for Pade appr. of Perdew
 POSCAR found type information on POSCAR V
 00/POSCAR found :  1 types and      53 ions
 POSCAR found type information on POSCAR V
 04/POSCAR found :  1 types and      53 ions
 Jacobian:     17.34300086515033
 POSCAR found type information on POSCAR V
 00/POSCAR found :  1 types and      53 ions
 POSCAR found type information on POSCAR V
 04/POSCAR found :  1 types and      53 ions
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ... GRIDC
 FFT: planning ... GRID_SOFT
 FFT: planning ... GRID
 WAVECAR not read
[gpu-v100s-01:3001370] 2 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[gpu-v100s-01:3001370] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.834099289090E+04    0.83410E+04   -0.29610E+05  8660   0.127E+03
DAV:   2     0.231803422156E+02   -0.83178E+04   -0.81113E+04  8260   0.533E+02
DAV:   3    -0.515463820131E+03   -0.53864E+03   -0.50927E+03 11270   0.135E+02
DAV:   4    -0.545139401809E+03   -0.29676E+02   -0.28658E+02 10256   0.235E+01
I also compared the compute times in the OUTCAR files and got 2643.359 seconds when using the new command-line arguments for mpirun, versus 2675.369 seconds without them. That seems within the realm of normal run-to-run variation? Unsure as of now.
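(For anyone comparing timings: the totals are in the timing summary at the bottom of each OUTCAR and can be collected with something like the following; a sketch, assuming the usual "Elapsed time" line is present.)

Code: Select all

# collect the elapsed time reported at the end of each image's OUTCAR
grep "Elapsed time" 0*/OUTCAR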

Thanks,
Myles

martin.schlipf
Global Moderator
Posts: 542
Joined: Fri Nov 08, 2019 7:18 am

Re: 4/5 Images running on Climbing Image NEB

#8 Post by martin.schlipf » Wed May 31, 2023 6:47 am

Well, it seems like you are using 3 GPUs now; at least that is what the output says. If this is as fast as before, that means one of two things: either you ran on multiple GPUs before, or your system cannot be accelerated much because the limiting factor lies elsewhere. If you want to figure out what is going on, I would recommend compiling VASP with profiling support and then comparing the OUTCARs of the two runs.

One more piece of advice: if you want to find the best possible setup, it is often advisable to reduce the number of steps (NSW or NELM), so you don't need to wait for nearly an hour to get feedback. You can check this in your output, but I expect that every iteration takes about the same time, so the performance can be optimized on a subset of the steps.
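As a sketch, you could temporarily limit the number of ionic steps for such a test run (assuming the INCAR already contains an NSW line; keep a copy of the original first):

Code: Select all

# shorten the run for benchmarking, keeping a backup of the full INCAR
cp INCAR INCAR.full
sed -i 's/^NSW.*/NSW = 3/' INCAR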

Martin Schlipf
VASP developer

