Restarting on-the-fly ML MD
Posted: Mon Apr 03, 2023 11:09 pm
Dear VASP users,
When I restart on-the-fly ML MD (ML_ISTART=1) from a previous ML_AB(N) file in VASP 6.3.2, two things happen that I would like to change.
(1) ML_MCONF and ML_MB, which default to 1500 in a fresh run, are reset to new values: the number of entries in the restarted ML_AB file plus a small overhead (see the wiki quote below). Often this overhead is too small and leads to premature crashes. The reset seems to assume that the first calculation already covers most of the configuration space for learning. The resulting values of ML_MCONF and ML_MB are often below the original 1500. When I specify them in the INCAR on restart, they are overwritten (as seen in the ML_LOGFILE) with values based on the ML_AB restart file. Can I fix this?
(2) Restarted runs still perform the first 10 steps ab initio. I think this should apply only to fresh calculations (ML_ISTART=0) and not be the default for ML_ISTART=1. Can this perhaps be changed manually through "ML_MHIS", if that tag defines the 10 steps that have to be done ab initio? Or is there another way?
Thanks, Victor
Caution: number of structures and basis functions
The maximum number of structure datasets ML_MCONF and basis functions (local reference configurations) ML_MB potentially constitutes a memory bottleneck, because the required arrays are allocated statically at the beginning of the calculation. It is advised not to use too large numbers initially. For ML_ISTART=0, the defaults are ML_MCONF=1500 and ML_MB=1500. For ML_ISTART=1 and 3, the values are set to the number of entries read from the ML_AB file plus a small overhead. If at any point during the training either the number of structure datasets or the size of the basis set exceeds its respective maximum number, the calculation stops with an error message. Since the ML_ABN is continuously written during on-the-fly learning, not all is lost though. Simply copy the ML_ABN to ML_AB and CONTCAR to POSCAR, increase ML_MCONF or ML_MB, and continue training (see restart the calculation).
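The recovery steps in the wiki quote above (copy ML_ABN to ML_AB and CONTCAR to POSCAR, then raise ML_MCONF or ML_MB) could be scripted roughly as follows. This is only a sketch: the function names `set_incar_tag` and `prepare_restart` are hypothetical helpers, not part of VASP, and the INCAR handling assumes one `TAG = value` per line. It only prepares the files; whether version 6.3.2 actually honours the INCAR values on restart is exactly the open question in point (1).

```python
import re
import shutil
from pathlib import Path


def set_incar_tag(incar_text: str, tag: str, value) -> str:
    """Replace an existing 'TAG = value' line, or append one if the tag is absent.

    Assumes one tag per line (no ';'-separated tags on the same line).
    """
    pattern = re.compile(
        rf"^[ \t]*{re.escape(tag)}[ \t]*=.*$", re.MULTILINE | re.IGNORECASE
    )
    line = f"{tag} = {value}"
    if pattern.search(incar_text):
        # Use a function so the replacement is inserted literally.
        return pattern.sub(lambda _m: line, incar_text)
    sep = "" if (not incar_text or incar_text.endswith("\n")) else "\n"
    return incar_text + sep + line + "\n"


def prepare_restart(run_dir, ml_mconf: int, ml_mb: int) -> None:
    """Prepare a run directory for continuing on-the-fly training.

    Follows the steps from the wiki quote: ML_ABN -> ML_AB, CONTCAR -> POSCAR,
    then request larger ML_MCONF / ML_MB in the INCAR.
    """
    run_dir = Path(run_dir)
    shutil.copyfile(run_dir / "ML_ABN", run_dir / "ML_AB")
    shutil.copyfile(run_dir / "CONTCAR", run_dir / "POSCAR")

    incar = run_dir / "INCAR"
    text = incar.read_text()
    text = set_incar_tag(text, "ML_ISTART", 1)   # continue training from ML_AB
    text = set_incar_tag(text, "ML_MCONF", ml_mconf)
    text = set_incar_tag(text, "ML_MB", ml_mb)
    incar.write_text(text)
```

A call such as `prepare_restart(".", ml_mconf=3000, ml_mb=3000)` would then be run in the directory of the stopped calculation before resubmitting; the value 3000 is an arbitrary illustration, and checking the ML_LOGFILE afterwards shows which values were actually used.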