How to improve parallelised calculation ?
-
- Newbie
- Posts: 44
- Joined: Fri Apr 23, 2010 3:09 am
How to improve parallelised calculation ?
Dear vasp users:
I am running VASP 5.2 on SUSE Linux with an InfiniBand network card (20 GB/s).
My question is: if I want to run a calculation on 2 or more nodes (each node has 12 processors), how should I set NPAR, LPLANE, and the other parallelisation parameters?
Thanks a lot in advance :)
Last edited by vasp16888 on Sun Apr 25, 2010 8:41 pm, edited 1 time in total.
AB INITIO STUDY OF MAGNETIC MATERIALS
-
- Newbie
- Posts: 44
- Joined: Fri Apr 23, 2010 3:09 am
How to improve parallelised calculation ?
Can somebody give me some tips? Thanks a lot :)
Last edited by vasp16888 on Mon Apr 26, 2010 6:44 am, edited 1 time in total.
-
- Hero Member
- Posts: 585
- Joined: Tue Nov 16, 2004 2:21 pm
- License Nr.: 5-67
- Location: Germany
How to improve parallelised calculation ?
Nobody will answer faster if you shout. This part of the forum is run by volunteers.
About your question: you have to try it yourself. It will depend on the number of atoms, the cutoff, the number of k-points, the speed of your memory interface, the speed of your CPU, and so on. I hope you get the idea ...
Start with NPAR = 1 and remove LPLANE from INCAR.
Hth
alex
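As a minimal sketch of that starting point, the Python helper below rewrites INCAR text so that NPAR is set to 1 and any LPLANE line is dropped. The function name `reset_parallel_tags` is hypothetical, and the sketch assumes one `TAG = value` setting per line, which is a common INCAR layout but not the only one VASP accepts.

```python
def reset_parallel_tags(incar_text: str) -> str:
    """Return INCAR text with NPAR = 1 and without any LPLANE line.

    Hypothetical helper; assumes one 'TAG = value' setting per line.
    """
    kept = []
    for line in incar_text.splitlines():
        # The tag name is whatever precedes the first '=' sign.
        tag = line.split("=", 1)[0].strip().upper()
        if tag in ("NPAR", "LPLANE"):
            continue  # drop the old settings; NPAR is re-added below
        kept.append(line)
    kept.append("NPAR = 1")
    return "\n".join(kept) + "\n"
```

All other tags are passed through unchanged, so the timing of this run can serve as the baseline for later tests.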
Last edited by alex on Mon Apr 26, 2010 1:01 pm, edited 1 time in total.
-
- Newbie
- Posts: 44
- Joined: Fri Apr 23, 2010 3:09 am
How to improve parallelised calculation ?
[quote="alex"]Nobody will answer faster if you shout. This part of the forum is run by volunteers.
About your question: you have to try it yourself. It will depend on the number of atoms, the cutoff, the number of k-points, the speed of your memory interface, the speed of your CPU, and so on. I hope you get the idea ...
Start with NPAR = 1 and remove LPLANE from INCAR.
Hth
alex
[/quote]
First, I have to say sorry. I am a newcomer to the VASP forum and there is a lot to learn here. Thank you for all of your suggestions.
Second, regarding your reply (number of atoms, cutoff, number of k-points, speed of the memory interface, speed of the CPU): I already knew this and tested some of these factors before I posted the thread. But thanks anyway.
Third, I read the user guide, which says:
LPLANE = .TRUE.
NPAR = number of nodes.
LSCALU = .FALSE.
NSIM = 4
but the improvement in parallel performance is not obvious.
My supercomputer's network card is InfiniBand (20 GB/s), and all the hardware is the latest. I just want to know how to handle large systems efficiently (I have tested many times, but without success).
If your time permits, any suggestion will be greatly appreciated. Sorry to bother you :)
Hui
Last edited by vasp16888 on Mon Apr 26, 2010 6:51 pm, edited 1 time in total.
-
- Full Member
- Posts: 201
- Joined: Thu Nov 02, 2006 4:35 pm
- License Nr.: 5-532
- Location: Ghent, Belgium
- Contact:
How to improve parallelised calculation ?
The only way to do this is the hard way.
Step one: find a well-behaved calculation containing ~100 atoms that runs for ~5-20 h on a single CPU.
Run it on multiple nodes/CPUs, once with LPLANE=.TRUE. and once with LPLANE=.FALSE.
You should see a clear difference in run time (sometimes 50%). (Note: you need to use exactly the same CPU configuration for both runs.)
After this, choose the better LPLANE value.
Step two: NSIM and NPAR. These two parameters seem to be connected, and their behaviour seems quite system dependent.
Choose a set of values for NPAR and NSIM (choose wisely, so that they have some physical meaning with respect to your CPUs and nodes),
then loop over all of them with your test calculation (i.e. #NPAR x #NSIM calculations) and look for a trend.
Default values are NSIM=1, NPAR=#CPUs.
The current machine I run on needs NSIM=8-16, NPAR=#nodes/2
(a former machine needed NPAR=#cores, while another one needed NPAR=#nodes).
It takes time, but it is worth it.
Cheers
Danny
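The loop over NPAR and NSIM described above can be sketched in Python as follows. This is only an illustration: the function name `write_test_grid` and the directory layout `scan/npar_X_nsim_Y/INCAR` are assumptions, and `base_incar` is expected not to set NPAR or NSIM itself.

```python
from itertools import product
from pathlib import Path


def write_test_grid(base_incar, npar_values, nsim_values, root="scan"):
    """Create one run directory per (NPAR, NSIM) pair and return them.

    The layout root/npar_X_nsim_Y/INCAR is an assumption for illustration.
    """
    run_dirs = []
    for npar, nsim in product(npar_values, nsim_values):
        run_dir = Path(root) / f"npar_{npar}_nsim_{nsim}"
        run_dir.mkdir(parents=True, exist_ok=True)
        # Append the two scanned tags to the otherwise identical INCAR.
        incar = f"{base_incar.rstrip()}\nNPAR = {npar}\nNSIM = {nsim}\n"
        (run_dir / "INCAR").write_text(incar)
        run_dirs.append(run_dir)
    return run_dirs
```

Each directory still needs the same POSCAR, POTCAR, and KPOINTS, and every run should be submitted on the same number of cores so that the timings are comparable.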
Last edited by Danny on Tue Apr 27, 2010 1:46 pm, edited 1 time in total.
-
- Newbie
- Posts: 44
- Joined: Fri Apr 23, 2010 3:09 am
How to improve parallelised calculation ?
Hi Danny,
I am a little confused about the concepts of node, CPU, core, and processor.
As I understand it, taking our machine as an example: we have 6 nodes connected by InfiniBand cards, each node has 2 CPUs on the motherboard, and each CPU has 6 cores, which means each node has 12 cores. I think processor = CPU.
Please correct me if I am wrong.
Thanks :)
Yours sincerely,
Hui
Last edited by vasp16888 on Wed Apr 28, 2010 3:14 am, edited 1 time in total.
-
- Full Member
- Posts: 201
- Joined: Thu Nov 02, 2006 4:35 pm
- License Nr.: 5-532
- Location: Ghent, Belgium
- Contact:
How to improve parallelised calculation ?
Yes, you are right. Present-day machinery is confusing, since node, CPU, and core are often used interchangeably.
In the VASP manual, "node" actually means core. For my part, I try to refer only to
1) nodes = nodes
2) CPU/core/processor = the smallest unit that does the calculation, i.e. in your case I would refer to the 12 cores as 12 CPUs (I know that is technically wrong)
In your case, for a 2-node calculation (= 24 cores = 4 CPUs), I would suggest trying
NPAR=1 (each band on the entire system)
NPAR=2 (one band per node)
NPAR=4 (one band per CPU)
NPAR=24 (one band per core)
NPAR=8 & NPAR=12 (one band per 3 or 2 cores)
combined with NSIM=1,2,4,6,12,24
Danny
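To make the mapping in that list concrete, here is a small Python sketch that computes how many cores share one band for each suggested NPAR. It assumes the layout described in the thread (2 nodes, 2 CPUs per node, 6 cores per CPU) and that the cores are split evenly into NPAR groups; the function name `cores_per_band` is made up for this example.

```python
def cores_per_band(total_cores, npar_values):
    """Map each NPAR to the number of cores that share one band.

    Only NPAR values that divide total_cores evenly are kept.
    """
    return {npar: total_cores // npar
            for npar in npar_values if total_cores % npar == 0}


nodes, cpus_per_node, cores_per_cpu = 2, 2, 6   # layout from the thread
total = nodes * cpus_per_node * cores_per_cpu   # 24 cores
table = cores_per_band(total, [1, 2, 4, 8, 12, 24])
for npar, cores in table.items():
    print(f"NPAR={npar:2d}: {cores:2d} cores per band")
```

With these numbers, NPAR=2 gives 12 cores per band (one node) and NPAR=4 gives 6 cores per band (one CPU), which matches the labels in the list.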
Last edited by Danny on Wed Apr 28, 2010 9:14 am, edited 1 time in total.
-
- Newbie
- Posts: 44
- Joined: Fri Apr 23, 2010 3:09 am
How to improve parallelised calculation ?
Thanks, Danny, that is clearer :)
Are these the combinations you talked about last time:
NPAR=1, NSIM=1
NPAR=2, NSIM=2
NPAR=4, NSIM=4
NPAR=8, NSIM=6
NPAR=12, NSIM=12
NPAR=24, NSIM=24
If this is the case, I think it is relatively easy.
But if the values are not paired up like this, it is going to be 36 combinations, which is really hard work for our limited computing resources :(
Thanks in advance
Last edited by vasp16888 on Thu Apr 29, 2010 1:11 am, edited 1 time in total.
-
- Hero Member
- Posts: 585
- Joined: Tue Nov 16, 2004 2:21 pm
- License Nr.: 5-67
- Location: Germany
How to improve parallelised calculation ?
Hi there again,
some hints:
You've got a fast machine with a fast network, so start with NPAR = 1 and NSIM = 1 for the number of tasks you are most likely to use.
Next: NPAR = # of tasks / 4, NSIM unchanged.
Then: NPAR = # of tasks, NSIM unchanged.
Then take the two fastest and optimize NPAR further. I'd guess you'll end up at NPAR = 2 or 4.
Then touch NSIM, same game.
Gotcha. Hth
alex
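A rough Python sketch of this coarse-then-fine search follows. Two things here are assumptions, not part of the hints: `run_time` is a user-supplied callable that returns the wall time of one test run for a given NPAR (for example, taken from the job output), and "optimize further" is interpreted as testing the divisors of the task count that lie between the two fastest coarse values.

```python
def coarse_npar_search(n_tasks, run_time):
    """Time three coarse NPAR choices, then refine between the two fastest.

    run_time(npar) is a hypothetical callable returning the wall time
    in seconds of one test run with that NPAR.
    """
    coarse = sorted({1, max(1, n_tasks // 4), n_tasks})
    timings = {npar: run_time(npar) for npar in coarse}
    # Keep the two fastest coarse values and test the divisors between them.
    fastest = sorted(timings, key=timings.get)[:2]
    lo, hi = min(fastest), max(fastest)
    for npar in range(lo + 1, hi):
        if n_tasks % npar == 0:
            timings[npar] = run_time(npar)
    best = min(timings, key=timings.get)
    return best, timings
```

The same function could be reused for NSIM afterwards by fixing the best NPAR inside `run_time` and varying NSIM instead.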
Last edited by alex on Thu Apr 29, 2010 7:40 am, edited 1 time in total.
-
- Full Member
- Posts: 201
- Joined: Thu Nov 02, 2006 4:35 pm
- License Nr.: 5-532
- Location: Ghent, Belgium
- Contact:
How to improve parallelised calculation ?
[quote="Danny"]Yes, you are right. Present-day machinery is confusing, since node, CPU, and core are often used interchangeably.
In the VASP manual, "node" actually means core. For my part, I try to refer only to
1) nodes = nodes
2) CPU/core/processor = the smallest unit that does the calculation, i.e. in your case I would refer to the 12 cores as 12 CPUs (I know that is technically wrong)
In your case, for a 2-node calculation (= 24 cores = 4 CPUs), I would suggest trying
NPAR=1 (each band on the entire system)
NPAR=2 (one band per node)
NPAR=4 (one band per CPU)
NPAR=24 (one band per core)
NPAR=8 & NPAR=12 (one band per 3 or 2 cores)
combined with NSIM=1,2,4,6,12,24
Danny[/quote]
Nope, I'm afraid it's the full 36 combinations. Then again, if you have a job that takes 10 h on 1 core, this job might take 30 to 60 minutes on 24 cores, so you will need < 36 hours on 24 cores = < 1000 CPU hours. That is still reasonable, knowing that you will probably face relaxations that take twice that time for a single calculation. Plus, if you can gain 25-30% compared to the normal settings, those 1000 hours are recovered quite quickly ;-)
Danny
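The estimate above can be checked with a few lines of Python. The numbers are the ones from the post (36 runs, at most 1 hour each, 24 cores, a 25% gain); the function names are made up for this sketch, and the break-even figure assumes the gain is a fixed fraction of the core-hours spent at default settings.

```python
def scan_cost(n_runs, hours_per_run, cores):
    """Upper bound on the core-hours spent on the parameter scan."""
    return n_runs * hours_per_run * cores


def break_even_core_hours(cost, saved_fraction):
    """Core-hours of production work (at default settings) after which
    the savings equal the cost of the scan."""
    return cost / saved_fraction


cost = scan_cost(n_runs=36, hours_per_run=1.0, cores=24)
print(cost)                                # 864.0 core-hours, below 1000
print(break_even_core_hours(cost, 0.25))   # 3456.0
```

In other words, under these assumptions the scan pays for itself after roughly 3500 core-hours of production runs.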
Last edited by Danny on Fri Apr 30, 2010 8:33 am, edited 1 time in total.
-
- Newbie
- Posts: 44
- Joined: Fri Apr 23, 2010 3:09 am
How to improve parallelised calculation ?
OK, I am going to do it :)
Last edited by vasp16888 on Sat May 01, 2010 2:46 am, edited 1 time in total.
-
- Newbie
- Posts: 44
- Joined: Fri Apr 23, 2010 3:09 am
How to improve parallelised calculation ?
[quote="alex"]Hi there again,
some hints:
You've got a fast machine with a fast network, so start with NPAR = 1 and NSIM = 1 for the number of tasks you are most likely to use.
Next: NPAR = # of tasks / 4, NSIM unchanged.
Then: NPAR = # of tasks, NSIM unchanged.
Then take the two fastest and optimize NPAR further. I'd guess you'll end up at NPAR = 2 or 4.
Then touch NSIM, same game.
Gotcha. Hth
alex
[/quote]
Last edited by vasp16888 on Mon May 03, 2010 11:03 pm, edited 1 time in total.
-
- Newbie
- Posts: 44
- Joined: Fri Apr 23, 2010 3:09 am
How to improve parallelised calculation ?
I have tested NPAR, NSIM, and LPLANE, which may improve the efficiency. The results are posted in a new thread: http://cms.mpi.univie.ac.at/vasp-forum/ ... php?4.7257
Please take a look. I have some questions about the test results and look forward to your suggestions. Thanks in advance.
Last edited by vasp16888 on Mon May 03, 2010 11:10 pm, edited 1 time in total.