Different results from different computer clusters?

Posted: Sun Nov 02, 2008 4:26 am
by fanghz
Dear Admin,

I've performed MD calculations on two different computer clusters, using exactly the same input files (INCAR, POSCAR, KPOINTS, POTCAR). However, the resulting energies turn out to be very different.

one is
500 T= 2000. E= -.32795538E+03 F= -.36647510E+03 E0= -.36645879E+03 EK= 0.38520E+02 SP= 0.00E+00 SK= 0.00E+00
and the other is
500 T= 2000. E= -.76886428E+03 F= -.80738400E+03 E0= -.80736331E+03 EK= 0.38520E+02 SP= 0.00E+00 SK= 0.00E+00

I'm simulating the liquid structure at 2000 K, starting from a random initial supercell. I'm really puzzled by this problem, since my input files are exactly the same. Could this large energy difference result from the different hardware and software configurations of the two clusters? Or does it result from the MD algorithm or the electronic iteration process?

Posted: Sun Nov 02, 2008 11:36 am
by forsdan
Just to check: have you explicitly defined the velocity distribution in your POSCAR file? Otherwise the input files are never truly the same, since the velocities are randomly assigned according to a Maxwell-Boltzmann distribution before the calculation starts. In that case it is meaningless to compare the last step of the MD calculation. You should plot all energies as a function of time to see how they evolve and fluctuate.

However, the energy difference in your case seems very large with respect to the energy values themselves. It might be that the first calculation simply hasn't reached equilibrium while the other has. Check the time evolution first and see whether the systems behave reasonably.
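As a rough illustration of checking the time evolution, here is a minimal Python sketch that extracts the energies from MD summary lines of the form quoted in the first post. The function names `parse_md_line` and `energy_series` are made up for this example, and the parser only assumes the `step T= ... E= ... F= ... E0= ... EK= ...` layout shown in this thread; it is not an official tool.

```python
import re

# Matches "NAME= value" pairs as they appear in the MD summary lines quoted
# in this thread, e.g. "T= 2000." or "E0= -.36645879E+03".
FIELD = re.compile(r"(\w+)=\s*(\S+)")


def parse_md_line(line):
    """Return (step, {field: value}) for one MD summary line, else None."""
    parts = line.split(None, 1)
    # Summary lines start with the integer step number and contain "T=".
    if len(parts) < 2 or not parts[0].isdigit() or "T=" not in parts[1]:
        return None
    try:
        fields = {name: float(value) for name, value in FIELD.findall(parts[1])}
    except ValueError:
        return None  # a value could not be read as a number; skip the line
    return int(parts[0]), fields


def energy_series(lines, key="E0"):
    """Collect (steps, values) for one field over all MD summary lines."""
    steps, values = [], []
    for line in lines:
        parsed = parse_md_line(line)
        if parsed is not None and key in parsed[1]:
            steps.append(parsed[0])
            values.append(parsed[1][key])
    return steps, values
```

Calling `energy_series(open("OSZICAR"), key="E0")` for each run gives two series that can be plotted against the step number with any plotting tool, which shows at a glance whether one run is still drifting while the other fluctuates around a stable mean.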

Best regards,
/Dan Fors


[ Edited Sun Nov 02 2008, 12:37PM ]

Posted: Wed Nov 12, 2008 3:33 pm
by admin
Please also run very simple standard tests (such as the optimization of a simple bulk unit cell at 0 K) on both clusters, where sources of difference such as different time evolutions can be excluded. Use the same number of CPUs on both clusters for the tests, to make sure that differences in the energies are not due to different NBANDS. The converged energies have to agree to within 0.000001 eV, and the equilibrium geometries have to be exactly the same as well.
After all, there remains the possibility that one of the two clusters simply gives wrong results :( (due to over-optimized compilation, errors in the implementations of libraries, damaged CPUs, ...)
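
For the bulk test, the comparison against the 0.000001 eV criterion could be scripted along these lines. This is only a sketch: `energies_agree` is a hypothetical helper, and the two energies are assumed to have been read by hand from the final step of each run.

```python
def energies_agree(e1_ev, e2_ev, tol_ev=1e-6):
    """True if two converged total energies (in eV) differ by at most tol_ev."""
    return abs(e1_ev - e2_ev) <= tol_ev


def report(e1_ev, e2_ev, tol_ev=1e-6):
    """Return a one-line summary of the comparison between the two clusters."""
    diff = abs(e1_ev - e2_ev)
    verdict = "OK" if diff <= tol_ev else "MISMATCH"
    return f"{verdict}: |dE| = {diff:.3e} eV (tolerance {tol_ev:.0e} eV)"
```

A mismatch in this test, with the same number of CPUs on both machines, would point towards the build or the hardware rather than the MD setup.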