My Community

Posted: **Fri Jun 14, 2024 5:35 am**

Dear VASP Developers,

I am currently working on training a machine learning force field (MLFF) for the charge density wave material, NbSe2, where the low-symmetry CDW structure is energetically more favorable than the high-symmetry pristine structure. [Please see the attached figure for reference]. However, I've encountered significant challenges during the training process.

Initially, I attempted on-the-fly training, but the resulting MLFF reproduced the DFT energy landscape (blue curve) poorly: around the high-symmetry pristine structure it gave a parabola with positive curvature instead of the expected negative curvature. I then manually selected around 1000 structures near the potential well as training data and used the select mode (ML_LMLFF=.TRUE., ML_MODE=select). Unfortunately, as the red curve in the attachment shows, the depth of the potential well is still greatly underestimated.

Although I've consulted the best practices on your website wiki/index.php/Best_practices_for_machi ... rce_fields and even increased the radius cutoffs (ML_RCUT1 = 9, ML_RCUT2 = 7), there has been little improvement in the MLFF results.
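For readers following along, the settings named in this post would look roughly like the INCAR fragment below. This is only a sketch assembled from the tags and values quoted above; the actual INCAR files were shared as attachments and may contain further tags. The select mode also expects the training structures to be supplied in an ML_AB file.

```
# Sketch of the select-mode settings quoted in the post (not the attached INCAR)
ML_LMLFF = .TRUE.    # enable the machine-learned force field
ML_MODE  = select    # reselect local reference configurations from ML_AB
ML_RCUT1 = 9         # cutoff radius of the radial descriptor (Angstrom), value from the post
ML_RCUT2 = 7         # cutoff radius of the angular descriptor (Angstrom), value from the post
```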

Could you please advise on the most effective strategies to enhance my MLFF for NbSe2? Any guidance or suggestions would be immensely appreciated.

Thank you very much for your assistance!
Best regards,
Yubi

Posted: **Fri Jun 14, 2024 8:14 am**

Dear yubi_chen,

Indeed it does look like the performance could be improved. Would you mind sharing the INCAR files for the runs where on-the-fly training was done and where training was done on the randomly sampled structures?

Sudarshan

Posted: **Fri Jun 14, 2024 6:58 pm**

Dear Sudarshan,

Thank you for your prompt response! Attached are the INCAR files for both the on-the-fly training and the manually-sampled-structure select mode training.

For context, the training supercell contains 216 atoms, with lattice parameters a = b = 21 Å and c = 13 Å. The k-point grid is Gamma-centered 2×2×2. I initially considered giving up on on-the-fly training because I suspected that it does not sample the phase space near the potential well sufficiently. To address this, I manually increased the sampling around the potential well. This time, I restricted the manually selected structures to energies within 0-5 eV of the pristine energy, with 90% of the 1000 training structures lying within 0-1 eV. Despite these efforts, the performance of the MLFF remains unsatisfactory. I suspect that adjusting the training parameters could increase the number of local reference configurations near the potential well. Could you advise which parameters would be most effective for this?

I look forward to your suggestions and am happy to provide any additional information you might need.

Best regards,
Yubi

Posted: **Mon Jun 24, 2024 9:43 am**

Dear Yubi,

There are a couple of things that you could try here:

To make sure that you are picking up enough structures during training, consider adjusting ML_CTIFOR, ML_ICRITERIA and ML_SCLC_CTIFOR. These parameters are described in the "on-the-fly" parameters section here: wiki/index.php/Best_practices_for_machi ... rce_fields

Consider your choice of ML_EPS_LOW when you refit (wiki/index.php/ML_EPS_LOW).
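As a rough illustration of where these tags go, the two INCAR fragments below sketch an on-the-fly run and a subsequent refit. The numerical values are placeholders chosen for illustration only, not recommendations from this thread; the wiki pages give the defaults and guidance on which direction to change them.

```
# On-the-fly training run (placeholder values, check the wiki before use)
ML_LMLFF       = .TRUE.
ML_MODE        = train
ML_CTIFOR      = 0.002   # threshold on the Bayesian force error; lower values trigger more first-principles steps
ML_ICRITERIA   = 1       # controls whether and how ML_CTIFOR is updated during the run
ML_SCLC_CTIFOR = 0.6     # scales the threshold used when picking local reference configurations
```

```
# Refit run on the collected ML_AB data (placeholder value)
ML_LMLFF   = .TRUE.
ML_MODE    = refit
ML_EPS_LOW = 1E-11       # sparsification threshold for local reference configurations
```

Changing one tag at a time and comparing the resulting energy curve against the DFT reference makes it easier to see which setting actually affects the depth of the potential well.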

Sudarshan

How to train small energy difference accurately in MLFF
