Re: Parallel Wannier Projections
Posted: Fri Sep 03, 2021 10:32 pm
Dear Henrique,
Thanks for the reply. One thing I forgot to mention is that my newest calculations are spin-polarized, so they would take roughly half as long if they were spin-unpolarized. Perhaps that is what you have in mind regarding the run time. Indeed, previous spin-unpolarized calculations from my PhD work took only 2-4 days at very high quality settings (2x the default ENCUT, dense k-point grids, etc.).
Furthermore, for my current spin-polarized calculations, I require roughly 3x the number of occupied bands: I have 492 occupied bands and need to wannierize the lowest 492 conduction states in the final Wannier localization step, so I have set NBANDS = 1476 for the conversion step in order to reliably resolve the first 492 conduction bands. I could perhaps reduce NBANDS to something like 1100 or 1200, but I need the 984th band to be reasonably "converged" in order to trust the final Wannier step. In short, I'm pushing these calculations in ways that perhaps haven't been done before, so the run times have turned out to be pretty severe.
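To make the band bookkeeping concrete, here is a rough sketch of how the numbers above stack up (illustrative only; the exact values are in the attached files):

    occupied bands                        :  492  (bands 1-492)
    conduction bands to wannierize        :  492  (bands 493-984)
    buffer so that band 984 is converged  :  492  (bands 985-1476)
    ----------------------------------------------
    NBANDS = 1476  (= 3 x 492)

Reducing NBANDS to 1100-1200 would shrink that buffer to roughly 120-220 bands above band 984, which may or may not be enough.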
For my current calculations that are causing me issues, I'll address each of your points below. See attached files if you want to see exactly what I'm doing for the vasp2wannier conversion step.
1) I am using a fairly small vacuum spacing, at least compared to the calculations mentioned above: only 13.3 Angstroms. One complication in my current calculations is that a molecule is attached to one surface facet, and this molecule is quite long on its own. So even with only 13.3 Angstroms of vacuum, the surface-normal lattice vector is about 50 Angstroms long. I could potentially reduce the vacuum to 10 Angstroms, but that would shave off at most a day. Still, it may be worth doing, so I'll consider it.
2) I split my calculations into (1) a wavefunction generation step, (2) a vasp2wannier conversion step where I use IALGO=2 to read in the previous WAVECAR file, and (3) a Wannier localization step running wannier90.x as a separate calculation (see the INCAR sketch after point 4 below). For step (2), which generates the wannier90.* files, I use neither NCORE nor KPAR, since the former causes a crash and the latter does not help. For the wavefunction generation step (1), I use both NCORE and KPAR, but that is its own calculation and has no effect on the vasp2wannier conversion.

As for the core count: for the conversion step I use anywhere from 4-12 cores on my university supercomputer's high-memory nodes (2 TB of memory across 48 cores), since this step is very memory intensive. I have found that the number of cores hardly matters, so I typically use the smallest count that gives me adequate memory. The MMN file writing does parallelize over cores to some extent, but those files only take a few hours to write in the first place (roughly 4 hours per file on 4 cores, or 8 hours total for the up and down channels). The AMN file writing, however, is largely unaffected by the number of cores. This matches your reply above from Apr 28, 2021, where you observed that VASP compiled with Intel and MKL showed no AMN parallelization over cores; I have seen the same over the years, since I compile VASP with Intel compilers.

This is the heart of my problem: the AMN file writing is the bottleneck of my vasp2wannier calculations, taking up almost the entire run time. To be precise, the AMN writing took about 116 hours for both the up and down channels, while the whole calculation, start to finish, took about 131 hours. Perhaps if I reduce the vacuum to 10 Angstroms and NBANDS to 1100 or 1200, the run time can fall under 100 hours, in which case I would have some room to breathe. However, I would still be interested in a KPAR parallelization routine.
3) See the end of my reply to point 2 above, regarding AMN files and Intel-compiled VASP.
4) I do not use LWANNIER90_RUN=.TRUE. for the reason you mentioned.
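For reference, the conversion step (2) boils down to INCAR settings along these lines. This is a minimal sketch reconstructed from what I describe above, not a verbatim copy of the attached files:

    ISTART     = 1       ! read the WAVECAR from the wavefunction generation step
    IALGO      = 2       ! keep the orbitals fixed; no electronic optimization
    ISPIN      = 2       ! spin-polarized
    ISYM       = 0       ! no symmetry; 5 unique k-points on the 3x3x1 grid
    NBANDS     = 1476
    LWANNIER90 = .TRUE.  ! write the wannier90.* files for each spin channel
    ! no NCORE (crashes here) and no KPAR (currently no benefit)

wannier90.x is then run on the resulting files as a separate step (3).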
To address your last comment and to reiterate a point I made previously: I have 5 unique k-points (a 3x3x1 gamma-centered grid with ISYM=0), so if I could use KPAR=5 for these vasp2wannier calculations, I could see a substantial decrease in total run time. I know KPAR does not give a perfect 1:1 improvement, but at worst I would expect a 2x speedup, and more realistically a 3-4x speedup, which would make the 7-day max wall-time a non-issue.
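To put rough numbers on that (back-of-the-envelope, assuming the speedup applies mainly to the ~116 h of AMN writing, with ~15 h of everything else from the 131 h total):

    ideal KPAR=5     : 116/5   ~ 23 h AMN  ->  ~38 h total
    worst case, 2x   : 116/2   = 58 h AMN  ->  ~73 h total (~3 days)
    realistic, 3-4x  : 116/3.5 ~ 33 h AMN  ->  ~48 h total (~2 days)

Any of these would fit comfortably inside the 7-day wall-time.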
Many thanks,
Peyton Cline