how to manually generate the ML_AB file
Posted: Mon Feb 21, 2022 4:12 am
Dear all, is it possible to generate ML_AB files from conventional MD calculations that have already been completed, so that the MLFF can learn from them?
Code:
LDA part: xc-table for Pade appr. of Perdew
Machine learning selected
Setting communicators for machine learning
Initializing machine learning
Starting to select new local configurations from ML_AB file (ML_FF_ISTART=3):
Insufficient memory to allocate Fortran RTL message buffer, message #41 = hex 00000029.
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 140995 RUNNING AT node41
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
...
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 47 PID 141041 RUNNING AT node41
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
forrtl: severe (41): insufficient virtual memory
Image PC Routine Line Source
vasp_std 0000000001D586FB Unknown Unknown Unknown
vasp_std 00000000005F3971 Unknown Unknown Unknown
vasp_std 00000000005EB217 Unknown Unknown Unknown
vasp_std 00000000005E81FA Unknown Unknown Unknown
vasp_std 00000000006B80D6 Unknown Unknown Unknown
vasp_std 00000000006B15C6 Unknown Unknown Unknown
vasp_std 00000000006BB8FE Unknown Unknown Unknown
vasp_std 00000000010529EA Unknown Unknown Unknown
vasp_std 0000000001CA65CE Unknown Unknown Unknown
vasp_std 000000000040DBA2 Unknown Unknown Unknown
libc-2.17.so 00002AAB57EF9555 __libc_start_main Unknown Unknown
vasp_std 000000000040DAA9 Unknown Unknown Unknown
ferenc_karsai wrote: Mon Feb 21, 2022 8:25 am Yes, it is possible, but that feature is not yet thoroughly tested and fully supported.
If you still want to do it, do the following steps:
-) First, prepare an ML_AB file that looks like this:
wiki/index.php/ML_AB
Please mind that the order in which the element types appear on line 13 ("The atom types in the data file") has to be the same as the order in which they later appear in the configurations. For example, if the first system is Fe_xNi_y and the second is Co_xSi_y, this line would have to be written as
**************************************************
The atom types in the data file
--------------------------------------------------
Fe Ni Co Si
Please also mind that the reference atomic energies and the atomic masses depend on this order as well. In your case you can set the reference atomic energies to 0.
The basis sets (local reference configurations) for the elements can be set to 1. This part is ignored, but dummy values have to be set so that the reader works correctly.
So, for example, you would write:
**************************************************
The numbers of basis sets per atom type
--------------------------------------------------
1 1 1 1
**************************************************
Basis set for Fe
--------------------------------------------------
1 1
**************************************************
Basis set for Ni
--------------------------------------------------
1 1
**************************************************
Basis set for Co
--------------------------------------------------
1 1
**************************************************
Basis set for Si
--------------------------------------------------
1 1
Also very important: Training structures have to be properly grouped together and given unique names. That means training structures containing the same element types and the same number of atoms per element belong to the same group.
This strict ordering of structures and elements will be lifted in the next update, so that the user doesn't have to be quite so strict with naming and ordering. Nevertheless, it's good practice to order the data correctly. (A small scripted sketch for writing the header blocks above is given after this list.)
-) Second, run a calculation using ML_ISTART=3:
wiki/index.php/ML_ISTART
This calculation loops over all existing training structures, reads them in one by one, and emulates an on-the-fly simulation. Its entire purpose is to select the local reference configurations that become part of the force field. Beware, this step can be quite time consuming. (A minimal INCAR sketch for this run is also given after this list.)
This step produces a new ML_AB file (written as ML_ABN), but also an ML_FFN file that can be used.
-) Optionally, you may want to refine your force field using the new ML_AB file. For that, please have a look at this page:
wiki/index.php/Machine_learning_force_f ... rce_fields
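To make the first step more concrete, here is a minimal Python sketch (my own illustration, not official VASP tooling) that writes only the header blocks quoted above, i.e. the element-ordering line and the dummy basis-set entries, for the Fe/Ni/Co/Si example. The remaining parts of the ML_AB file (version line, number of configurations, reference energies, masses, and the configurations themselves) still have to follow the template on the ML_AB wiki page, including its exact whitespace.
Code:
# Sketch: write the ML_AB header blocks shown above for a mixed
# Fe_xNi_y / Co_xSi_y data set. The element order must match the order in
# which the types first appear in the configurations; the basis-set entries
# are dummies (this part is ignored by the reader, but must be present).

STARS = "*" * 50
DASHES = "-" * 50

def write_header_blocks(types, out):
    # "The atom types in the data file" block
    out.write(STARS + "\n")
    out.write("The atom types in the data file\n")
    out.write(DASHES + "\n")
    out.write(" ".join(types) + "\n")
    # dummy "numbers of basis sets per atom type" block
    out.write(STARS + "\n")
    out.write("The numbers of basis sets per atom type\n")
    out.write(DASHES + "\n")
    out.write(" ".join("1" for _ in types) + "\n")
    # one dummy basis-set block per element, in the same order
    for t in types:
        out.write(STARS + "\n")
        out.write("Basis set for " + t + "\n")
        out.write(DASHES + "\n")
        out.write("1 1\n")

if __name__ == "__main__":
    import sys
    write_header_blocks(["Fe", "Ni", "Co", "Si"], sys.stdout)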
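For the second step, a minimal INCAR sketch for the selection run might look like the following (my own sketch; the numbers are placeholders, ML_MCONF is my assumption for the tag that sets the maximum number of training structures, and older VASP versions use the ML_FF_*-prefixed tag names instead, as seen in the log above):
Code:
ML_LMLFF  = .TRUE.   ! switch on the machine-learned force field machinery
ML_ISTART = 3        ! select local reference configurations from the existing ML_AB
ML_MB     = 2000     ! maximum number of local reference configurations (placeholder)
ML_MCONF  = 2000     ! maximum number of training structures (tag name assumed; placeholder)
The rest of the input (POSCAR, POTCAR, KPOINTS) is set up as for a normal calculation of your system.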
ferenc_karsai wrote: Wed Feb 23, 2022 10:29 am The training of the force field is generally memory consuming, especially if one has many training structures with many different element types.
Please provide some information about your job: how many training structures do you have, how many element types, and what is the maximum number of atoms per structure? You can also upload your ML_AB file so that I can check it.
Did you compile with MPI shared memory (the -Duse_shmem precompiler option)?
The largest matrix that needs to be stored is the design matrix.
Its dimension is number_of_training_structures*(3*N_atom+7)*local_reference_configurations. At the beginning of the ML_ISTART=3 run you don't yet know the number of local reference configurations, so you have to set a maximum via ML_MB (I usually set it to the same value as the number of training structures, but it is system dependent; you may have to repeat the calculation afterwards). The maximum number of training structures also needs to be set, but that value is easy to choose since you know how many training structures are in your ML_AB file. The design matrix is then statically allocated at the beginning of the calculation, and the estimated memory is printed to the ML_LOGFILE before the actual allocations are done. So you can see how much more memory you would need to fit everything into what is available. The entry "FMAT for basis" is the required memory for the design matrix (a rough estimate is sketched below).
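As a back-of-the-envelope check of that formula, here is a short sketch with made-up numbers, assuming the matrix is stored in double precision (8 bytes per entry):
Code:
# Rough estimate of the design-matrix memory from the dimension quoted above:
# number_of_training_structures * (3*N_atom + 7) * local_reference_configurations,
# times 8 bytes per double-precision entry. All numbers below are placeholders.
n_struct = 2000      # training structures in the ML_AB file
n_atom   = 100       # maximum number of atoms per structure
n_lrc    = 2000      # assumed maximum number of local reference configurations (ML_MB)

entries = n_struct * (3 * n_atom + 7) * n_lrc
print("design matrix: %.1f GiB in total" % (entries * 8.0 / 1024**3))
# ~9.1 GiB for these numbers; this total is distributed over the MPI ranks
# (see below) and should match the order of magnitude of "FMAT for basis"
# reported in the ML_LOGFILE.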
Please also read this wiki entry about the memory estimation in the ML_LOGFILE:
wiki/index.php/ML_LOGFILE#Memory_consumption_estimation
The design matrix is also fully parallelized in memory, so the more cores you use, the less memory it needs per core. If you go to more nodes, you may therefore be able to fit it into the available memory.
Another very important point is shared memory:
The covariance matrix and parts of the descriptors need to be present on every core in their full size ("CMAT for basis" and "DESC for basis" in the ML_LOGFILE). If MPI shared memory is not used, these matrices are allocated on every core; with shared memory they are allocated only once per node. Without shared memory one can therefore be strongly limited by memory, so please check whether you use this capability.
Please also see this wiki entry on memory usage and shared memory:
wiki/index.php/Machine_learning_force_f ... mory_usage
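For reference, enabling the MPI shared memory mentioned above is done at compile time. A sketch, assuming the standard VASP 6 makefile.include layout where precompiler flags are collected in CPP_OPTIONS:
Code:
# makefile.include fragment (sketch): add the shared-memory precompiler flag
# to the existing CPP_OPTIONS definition, then recompile VASP.
CPP_OPTIONS += -Duse_shmem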