Segmentation fault when running VASP with ML_LMLFF = .TRUE.

Posted: Sat Apr 16, 2022 9:03 am
by dominika_melicherova1
Dear VASP developers,

I want to run an MD simulation with MLFF training switched on, but I get a segmentation fault.
I tried setting ulimit -s unlimited, but it didn't help.

All files related to the simulation are attached.

Thank you

Re: Segmentation fault when running VASP with ML_LMLFF = .TRUE.

Posted: Tue Apr 19, 2022 7:24 am
by ferenc_karsai
It looks like your ScaLAPACK has problems setting up the processor grid, so I suspect a faulty ScaLAPACK installation.
Please try compiling without "-DscaLAPACK" and run the code. Without ScaLAPACK you will most likely run out of memory, so run something small just for test purposes (say 8-16 atoms). If that runs properly, please switch ScaLAPACK back on; if the error then returns, we have pinned it down to a faulty ScaLAPACK. Also repeat your tests with fewer cores, and test on a single node as well as on several nodes.

Which toolchains are you using?

Re: Segmentation fault when running VASP with ML_LMLFF = .TRUE.

Posted: Tue Apr 19, 2022 5:25 pm
by dominika_melicherova1
Thank you for your reply,

it seems that scalapack is one part of the problem. I ran a simulation with a small cell (8 atoms): it worked with VASP compiled without "-DscaLAPACK", but it did not work with VASP compiled with "-DscaLAPACK". I then ran the simulation with a large supercell (144 atoms) without scalapack, but there is some other problem. The file with the error output is attached.

I am using Intel-2021.4.0 and OpenMPI-4.1.2.

Thank you

Re: Segmentation fault when running VASP with ML_LMLFF = .TRUE.

Posted: Thu Apr 21, 2022 7:45 am
by ferenc_karsai
The large calculation most likely ran out of memory. The ML_LOGFILE reports the predicted memory consumption per core. In your case it writes:
"Total memory consumption : 16056.6". I guess you don't have 16 GB per core available.
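
As a rough illustration of that check, here is a minimal Python sketch that extracts the number from a line of the form quoted above and compares it with an even per-core share of the node memory. It assumes the value is given in MB (consistent with the 16 GB estimate above). The function names, the node memory and the core counts are hypothetical and are not taken from the attached files.

```python
import re

# Matches a line of the form quoted from the ML_LOGFILE above.
# Assumption: the real file may differ in whitespace or surrounding text.
_MEMORY_LINE = re.compile(r"Total memory consumption\s*:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_total_memory_mb(log_text):
    """Return the last reported per-core memory estimate, or None if absent."""
    matches = _MEMORY_LINE.findall(log_text)
    return float(matches[-1]) if matches else None


def fits_per_core(required_mb, node_memory_gb, cores_per_node):
    """Compare the per-core estimate with an even share of the node memory."""
    available_mb = node_memory_gb * 1024.0 / cores_per_node
    return required_mb <= available_mb


if __name__ == "__main__":
    line = "Total memory consumption : 16056.6"
    required = parse_total_memory_mb(line)
    # Hypothetical node: 128 GB of RAM shared by 32 MPI ranks (4 GB per rank).
    print(required, fits_per_core(required, node_memory_gb=128, cores_per_node=32))
```

With these hypothetical numbers the estimate does not fit, which would be consistent with a crash caused by running out of memory rather than by a second bug.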

In practice, ScaLAPACK is required to run the code for realistic systems (at least the learning part), because the design matrix needs to be distributed. Without ScaLAPACK each core holds the whole design matrix. With ScaLAPACK this array is distributed over the cores, and the distribution scales almost perfectly linearly with the number of cores. We made it possible to run the code without ScaLAPACK only in order to pin down ScaLAPACK errors like the one in your case.
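
The difference can be sketched with simple arithmetic. The Python snippet below is only an idealized estimate, assuming a dense double-precision matrix and a perfectly even split across cores. The function name and the matrix dimensions are hypothetical and do not describe how VASP actually lays out the design matrix.

```python
def design_matrix_mb_per_core(n_rows, n_cols, n_cores, distributed):
    """Estimate per-core memory in MB for a dense float64 matrix."""
    total_mb = n_rows * n_cols * 8 / 1024.0**2  # 8 bytes per float64 entry
    # Not distributed: every core stores the full matrix.
    # Distributed: assume an ideal, perfectly even split across the cores.
    return total_mb / n_cores if distributed else total_mb


if __name__ == "__main__":
    # Hypothetical matrix dimensions, chosen only for illustration.
    rows, cols = 40000, 50000
    for cores in (1, 16, 64):
        replicated = design_matrix_mb_per_core(rows, cols, cores, distributed=False)
        split = design_matrix_mb_per_core(rows, cols, cores, distributed=True)
        print(f"{cores:3d} cores: {replicated:10.1f} MB replicated, {split:10.1f} MB distributed")
```

In the replicated case, adding cores does not reduce the memory needed per core, which is why the large supercell cannot be run that way.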

So this means you need to fix your scalapack installation to be able to run the ML code on your system.

Re: Segmentation fault when running VASP with ML_LMLFF = .TRUE.

Posted: Thu Apr 21, 2022 8:19 am
by dominika_melicherova1
Thank you for your help. I have already solved the problem with scalapack and it works very well now. It was only a trivial mistake in the makefile, related to linking the libraries.

Re: Segmentation fault when running VASP with ML_LMLFF = .TRUE.

Posted: Fri Apr 22, 2022 4:47 am
by ferenc_karsai
Thank you for your reply, I am very glad that it works now.
I am going to close this topic now.