ScaLAPACK .VS. LAPACK
Posted: Wed Jan 27, 2010 3:44 pm
by zoowe
Hi vaspers!
I am wondering when we should use ScaLAPACK versus LAPACK. In other words, what do we gain or lose by using one or the other?
I got more or less the same CPU time with ScaLAPACK and with LAPACK (for various numbers of CPUs).
I used MKL10, intel compiler 10.1, fftw-3.2.1.
D.
ScaLAPACK .VS. LAPACK
Posted: Wed Jan 27, 2010 4:20 pm
by panda
The original goal of the LAPACK project was to make the widely used EISPACK and LINPACK libraries run efficiently on shared-memory vector and parallel processors. On these machines, LINPACK and EISPACK are inefficient because their memory access patterns disregard the multi-layered memory hierarchies of the machines, thereby spending too much time moving data instead of doing useful floating-point operations. LAPACK addresses this problem by reorganizing the algorithms to use block matrix operations, such as matrix multiplication, in the innermost loops. These block operations can be optimized for each architecture to account for the memory hierarchy, and so provide a transportable way to achieve high efficiency on diverse modern machines. We use the term "transportable" instead of "portable" because, for fastest possible performance, LAPACK requires that highly optimized block matrix operations be already implemented on each machine.
Highly efficient machine-specific implementations of the BLAS are available for many modern high-performance computers. For details of known vendor- or ISV-provided BLAS, consult the BLAS FAQ. Alternatively, the user can download ATLAS to automatically generate an optimized BLAS library for the architecture. A Fortran77 reference implementation of the BLAS is available from netlib; however, its use is discouraged as it will not perform as well as a specially tuned implementation.
also see:
http://cms.mpi.univie.ac.at/vasp-forum/ ... php?2.4050
have you tried BLAS and ATLAS as well?
ScaLAPACK .VS. LAPACK
Posted: Wed Jan 27, 2010 4:24 pm
by panda
I think you answered your own question, though: if you get more or less the same performance, it may be that your system is already optimized for both libraries, and if so, that's great! I have not experienced much difference when using BLAS or ATLAS or LAPACK, or MPI versus OpenMP, etc., for VASP or otherwise.
ScaLAPACK .VS. LAPACK
Posted: Wed Jan 27, 2010 7:29 pm
by zoowe
Thanks Panda, but maybe you misunderstood my question. I want to compare ScaLAPACK with LAPACK.
[quote="panda"]
also see:
http://cms.mpi.univie.ac.at/vasp-forum/ ... php?2.4050
[/quote]
Yeah, I read that thread weeks ago, but it doesn't contain the information I need.
[quote="panda"]
have you tried BLAS and ATLAS as well?
[/quote]Maybe I don't get your point here. ATLAS contains the BLAS and only a FEW routines of LAPACK. Usually, if we want to use ATLAS, we use ONLY the BLAS from ATLAS and take LAPACK from another source (netlib, ../vasp.X.lib, or others).
ScaLAPACK .VS. LAPACK
Posted: Wed Jan 27, 2010 8:03 pm
by forsdan
For clusters with Infiniband interconnect, ScaLAPACK can be crucial for running large systems on many cores. On our Infiniband clusters, investigations for core counts between 64 and 256 show that LAPACK scales rather poorly, whereas the use of ScaLAPACK can yield near-linear scaling. So this is what you gain.
On GBit-interconnect clusters we don't see any real difference between LAPACK and ScaLAPACK, so we tend not to use ScaLAPACK on those clusters.
Best regards,
/Dan
[ Edited Wed Jan 27 2010, 09:05PM ]
ScaLAPACK .VS. LAPACK
Posted: Wed Jan 27, 2010 11:00 pm
by zoowe
Thanks Dan,
I did a benchmark test with VASP/LAPACK and got a similar result: scaling is very poor above 48 cores on our cluster.
I haven't done any tests with ScaLAPACK using that many cores.
Thank you for your input.
D.
[ Edited Thu Jan 28 2010, 02:47AM ]
ScaLAPACK .VS. LAPACK
Posted: Fri Jan 29, 2010 5:48 pm
by panda
My point was that I have tested ScaLAPACK, LAPACK, ATLAS, and BLAS, and I don't see any significant performance differences.