Scientific Computing Corner: Optimization tips
Simple optimization techniques on DataStar
—Nick Wright
This article explores some simple techniques you can use on DataStar to improve the performance of your code. To use this guide, you need a small test problem with a known result, so that you can compare output and check accuracy after each change.
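As a minimal sketch of the accuracy check described above, the C helpers below compare a computed array against known reference values using a relative tolerance. The function names and the tolerance are assumptions chosen for illustration; the right tolerance depends on your application.

```c
#include <math.h>
#include <stdio.h>

/* Relative difference between a computed value and the known reference. */
double relative_error(double computed, double reference)
{
    if (reference == 0.0)
        return fabs(computed);          /* fall back to absolute error */
    return fabs(computed - reference) / fabs(reference);
}

/* Returns 1 if every element agrees with the reference within tol, else 0. */
int results_match(const double *computed, const double *reference,
                  int n, double tol)
{
    int i;
    for (i = 0; i < n; i++) {
        double err = relative_error(computed[i], reference[i]);
        if (!(err <= tol)) {            /* written this way so NaN also fails */
            printf("mismatch at %d: got %.17g, expected %.17g\n",
                   i, computed[i], reference[i]);
            return 0;
        }
    }
    return 1;
}
```

Running such a check after every change of compiler flags or libraries makes it easy to see whether a faster build still produces acceptable answers.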
Compiler Flags
Perhaps the simplest way to get better performance from your code is to compile it at a higher level of optimization than the default.
Start with
Compiling without -qstrict gives the compiler a little more room
to optimize and may produce faster code in some cases. However,
it may also produce code that does not strictly perform
IEEE-compliant arithmetic, so carefully validate the results of
any program compiled without -qstrict.
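One way such differences can arise is that floating-point addition is not associative, so an optimizer that is allowed to regroup operations may change the result. The C sketch below illustrates the effect by hand; the function names are hypothetical, and the volatile temporaries are there only to force the grouping shown.

```c
/* Sum three values left to right: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;   /* volatile forces this grouping */
    return t + c;
}

/* The same three values, regrouped: a + (b + c). */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;   /* a small c can be lost when added to a large b */
    return a + t;
}
```

With a = 1e20, b = -1e20 and c = 1.0, the first grouping returns 1.0 and the second returns 0.0, because 1.0 is smaller than the spacing between adjacent doubles near 1e20.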
The -qarch argument produces an executable program that contains
machine instructions specific to the chosen processor architecture,
and the -qtune argument specifies the type of processor for
which the program should be tuned to produce the best performance.
We recommend using both options every time you compile on
DataStar, as they ensure that the executable takes the
architecture of the machine into account.
Some codes also benefit from even higher levels of optimization.
Other flags to try are -qhot (high-order analysis of loop structures)
and -qipa (interprocedural analysis). The man page for the compiler
and the NERSC Web page provide more information.
Use Optimized Libraries
- MASS library (-lmass)
This library provides tuned versions of mathematical intrinsic functions, such as cos or exp. It can speed up the evaluation of a mathematical function by a factor of 1.2 to 5, so a code that makes heavy use of such functions can see significant performance gains. Note that this library is not as accurate as the system library, so testing is recommended.
- ESSL/PESSL library (-lessl for scalar, -lpessl for parallel)
This library contains highly optimized serial and parallel mathematical and scientific algorithms including:
- BLAS Levels 2 and 3
- ScaLAPACK subset
- FFTs
- Linear Algebra Subprograms
- Matrix Operations
- Linear Algebraic Equations
- Eigensystem Analysis
- Fourier Transforms, Convolutions, and Correlations
- Sorting and Searching
- Interpolation
- Numerical Quadrature
- Random Number Generation
The versions of the algorithms in this library are often at least 50% faster than one could
achieve by hand-coding them, so we recommend using them whenever possible, especially if a
code makes heavy use of such techniques.
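As an illustration of swapping a hand-coded kernel for a library routine, the C sketch below multiplies two square matrices either with a triple loop or, when USE_BLAS is defined, with the standard BLAS routine DGEMM. The wrapper name matmul, the USE_BLAS macro, and the Fortran-style symbol name dgemm_ are assumptions; the symbol name and calling convention vary by platform and library, so check the library documentation before relying on them.

```c
#ifdef USE_BLAS
/* Fortran-style BLAS entry point. The symbol name (with or without a
   trailing underscore) is platform dependent and is assumed here. */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);
#endif

/* C = A * B for n-by-n matrices stored column-major, as BLAS expects. */
void matmul(int n, const double *a, const double *b, double *c)
{
#ifdef USE_BLAS
    const double one = 1.0, zero = 0.0;
    dgemm_("N", "N", &n, &n, &n, &one, a, &n, b, &n, &zero, c, &n);
#else
    /* Hand-coded fallback: straightforward triple loop. */
    int i, j, k;
    for (j = 0; j < n; j++) {
        for (i = 0; i < n; i++) {
            double sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i + k * n] * b[k + j * n];
            c[i + j * n] = sum;
        }
    }
#endif
}
```

Keeping both paths behind one wrapper makes it straightforward to validate the library version against the hand-coded one on a small test problem before switching over.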
Beyond these relatively simple techniques there is a vast array of methods that can be used to optimize the performance of a program. Please contact us at consult@sdsc.edu if you have any questions or need assistance with optimizing your code.
Nick Wright can be reached by e-mail at nwright@sdsc.edu.