Queries about input and output files, running specific calculations, etc.
#1 by mike_foster (Newbie) » Fri Jan 27, 2023 6:48 pm
Hi,
I have been experiencing large variation in run times when using ML-generated force fields. Rerunning the same job multiple times can sometimes take 2-3 times longer. I'm building with the Intel compilers, MKL, and Intel MPI. My guess is that it has something to do with MPI communication/allocation at run time, but I can't identify any differences between the runs. Any ideas why this happens and/or how to fix it? Thanks for any help.
I'm running short test calculations on 4 nodes, each with 48 CPUs.
Run times for 3 identical runs (Elapsed time (sec) from OUTCAR): 415.773, 613.946, 904.526
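For reference, these values can be collected with a short script; a minimal sketch, assuming the repeat runs sit in directories named run1, run2, run3 and only grepping the "Elapsed time (sec):" line that VASP writes near the end of OUTCAR:
Code:
# Collect the "Elapsed time (sec)" reported at the end of each OUTCAR.
# The run1/run2/run3 directory layout is an assumption for illustration.
import re
from pathlib import Path

for outcar in sorted(Path(".").glob("run*/OUTCAR")):
    text = outcar.read_text(errors="ignore")
    match = re.search(r"Elapsed time \(sec\):\s*([0-9.]+)", text)
    if match:
        print(f"{outcar.parent.name}: {float(match.group(1)):10.3f} s")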
INCAR:
ENCUT = 10
NCORE = 12
ISYM = 0
IBRION = 0
NSW = 2000
POTIM = 2.0
NBLOCK = 10
MDALGO = 3
LANGEVIN_GAMMA = 26
LANGEVIN_GAMMA_L = 10
PMASS = 100
TEBEG = 400
TEEND = 400
ISIF = 3
ML_LMLFF = T
ML_ISTART = 2
ML_WTSIF = 2
RANDOM_SEED = 4786233 0 0
#2 by andreas.singraber (Global Moderator) » Mon Jan 30, 2023 1:08 pm
Hello!
Welcome to the VASP forum! That is indeed a confusing result; there should not be significant variation in the timings. However, it is hard to tell what is going wrong without further information. Could you please provide a complete set of input files according to the forum posting guidelines? The ML_FF file is probably too large to send, but please add the ML_LOGFILE that was created in your last training step (ML_ISTART = 0, 1, or 3). Thank you!
Best,
Andreas Singraber
#3 by mike_foster (Newbie) » Tue Jan 31, 2023 1:08 pm
Thanks for the reply. Attached are the ML_LOGFILE from the last training step and the input/output files from a short run. I have experienced this problem with other systems as well; the issue is general, not specific to this particular system (ML_FF).
#4 by andreas.singraber (Global Moderator) » Mon Feb 06, 2023 5:27 pm
Hello again,
sorry for the delay! Thank you for providing the input and output files. I suspect that the provided ML_ISTART = 2 run does not scale well to such a large number of MPI processes. You are using 192 MPI ranks to handle the workload generated by the 257 atoms in the POSCAR file, so most ranks get only a single atom to work on, and some get two. Because ML force fields are far less computationally demanding than ab initio calculations, the MPI ranks have too little work and spend most of their time waiting for the next communication step. The total run time then depends heavily on the communication speed and on how well the MPI ranks stay synchronized, which is why you see so much variation in the timings.
I would suggest trying a much lower number of MPI processes. Ideally, benchmark how the timings develop starting from a serial run, then 2 cores, 4, 8, and so on, until you find a good compromise between speed and the CPU resources deployed. Please let me know if anything about the parallelization efficiency remains unclear. Also note that the upcoming VASP release 6.4 will come with a major performance gain for the ML prediction mode.
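A minimal benchmark driver could look like the sketch below; the mpirun launcher, the vasp_std binary name, the rank counts, and the inputs/ directory layout are assumptions for illustration, not anything VASP prescribes:
Code:
# Strong-scaling sketch: run the same ML prediction job with increasing
# rank counts and record the wall time of each run.
import shutil
import subprocess
import time
from pathlib import Path

INPUT_DIR = Path("inputs")              # prepared INCAR, POSCAR, POTCAR, KPOINTS, ML_FF
RANK_COUNTS = [1, 2, 4, 8, 16, 24, 48]  # extend according to node/core availability

for nranks in RANK_COUNTS:
    workdir = Path(f"scaling_np{nranks}")
    if workdir.exists():
        shutil.rmtree(workdir)
    shutil.copytree(INPUT_DIR, workdir)

    start = time.perf_counter()
    subprocess.run(["mpirun", "-np", str(nranks), "vasp_std"],
                   cwd=workdir, check=True)
    elapsed = time.perf_counter() - start
    print(f"{nranks:4d} ranks: {elapsed:10.1f} s")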
All the best,
Andreas Singraber
#5 by mike_foster (Newbie) » Thu Feb 09, 2023 1:24 am
Glad to hear that there will be a performance improvement in 6.4. I understand your point regarding the ratio of atoms to CPUs; I should do more testing. I did a few tests in the past and noticed a speed-up with more (maybe too many) CPUs, but then I noticed timing inconsistencies. As I said, I need to do more testing to be sure, but if the new version is coming out soon, I might just wait.
#6 by mike_foster (Newbie) » Tue Feb 28, 2023 9:54 pm
I'm still experiencing run-time variations, now with VASP 6.4.0. I ran 5 calculations at each of several CPU/node counts. I have done this with both ML_MODE = REFIT and REFITFULL and get run-time variations in both cases (REFIT is much faster). The table below is for REFIT mode on a system with 256 atoms running for 5000 steps (ML_OUTBLOCK = 10; ML_OUTPUT_MODE = 0).
Time (sec) for runs 1-5
nodes  cpus   run 1   run 2   run 3   run 4   run 5
  1      12    2529    1607    1663     809     816
  1      24     550     536    1821     541     535
  1      48     352     351     607     355     349
  2      96     202     244     203     203     205
  4     192     227     155     154     153     619
  6     288     118     275     118     187     117
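For what it's worth, a small sketch to summarize a table like this one: median time, the max/min spread among the five repeats, and speedup relative to the 12-core median (the 12-core baseline is itself noisy, so the numbers mainly illustrate the spread):
Code:
# Summarize the repeat timings above: median, spread, and speedup vs. 12 cores.
from statistics import median

timings = {  # cores -> five repeat wall times in seconds (from the table above)
    12:  [2529, 1607, 1663, 809, 816],
    24:  [550, 536, 1821, 541, 535],
    48:  [352, 351, 607, 355, 349],
    96:  [202, 244, 203, 203, 205],
    192: [227, 155, 154, 153, 619],
    288: [118, 275, 118, 187, 117],
}

base = median(timings[12])
print(f"{'cores':>6} {'median(s)':>10} {'max/min':>8} {'speedup':>8}")
for cores, runs in sorted(timings.items()):
    med = median(runs)
    print(f"{cores:>6} {med:>10.0f} {max(runs)/min(runs):>8.2f} {base/med:>8.2f}")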
#7 by mike_foster (Newbie) » Wed Mar 01, 2023 2:17 pm
Attached is an image of the data table; it's hard to read above.
#8 by alex (Hero Member) » Thu Mar 02, 2023 7:54 am
Hello Mike,
is your job alone on the node when you are not using all of the machine's cores? These simulations are memory-heavy, and if you have to compete for memory bandwidth, that could explain some of the delays.
Hth,
alex
#9 by mike_foster (Newbie) » Thu Mar 02, 2023 12:37 pm
Yes, only my job is on the node. It should not be a memory issue; the nodes have 192 GB of memory, and when I logged onto a node during one of the jobs, memory usage was low. If no one else is experiencing this problem, maybe it's related to my VASP build and libraries (Intel compilers and MKL 19.1; Intel MPI 2019). Maybe I should try building with Open MPI.