SDSC Thread, Issue 4, February 2006

User Services Director:
Anke Kamrath

Subhashini Sivagnanam

Graphics Designer:
Diana Diehl

Application Designer:
Fariba Fana

Scientific Computing Corner: Optimization tips

Simple optimization techniques on DataStar

—Nick Wright

This article explores some simple techniques you can use on DataStar to improve the performance of your code. To follow this guide, you need a small test problem with a known result, for comparison and accuracy checking.

Compiler Flags

Perhaps the simplest way to get better performance from your code is to compile it with a higher level of optimization than the default.

Start with

  • -O3 -qtune=auto -qarch=auto -qstrict

Some speedup may be obtained by removing the -qstrict option:

  • -O3 -qtune=auto -qarch=auto

Removing -qstrict gives the compiler a little more room to optimize and may produce faster code in some cases. However, the resulting code may not perform strictly IEEE-compliant arithmetic, so carefully validate the results of any program compiled without -qstrict.
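
As an illustration of why such validation matters, the sketch below shows that regrouping a floating-point sum can change the last bits of the result. It assumes that this kind of regrouping is among the transformations a compiler may apply when strict semantics are not enforced. It is written in Python only for brevity, and `results_agree` is a hypothetical helper, not part of any DataStar tool.

```python
import math

# Floating-point addition is not associative, so the grouping of a sum
# can change its value in the last bits.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

print(left_to_right == regrouped)        # are the two sums bit-identical?
print(abs(left_to_right - regrouped))    # size of the discrepancy


def results_agree(reference, candidate, rel_tol=1e-12, abs_tol=0.0):
    """Return True if every candidate value matches its reference value
    to within the given tolerances (hypothetical validation helper)."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )
```

Comparing the known result of the test problem against each new build with a tolerance, rather than expecting bit-for-bit equality, separates harmless rounding differences from genuine errors. The right tolerance depends on the problem; the default here is only a placeholder.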

With -qarch=auto, the compiler produces an executable that contains machine instructions specific to the processor it was compiled on. The -qtune argument specifies the type of processor for which the program should be tuned to give the best performance. We recommend always using both options on DataStar, as they ensure that the executable takes full advantage of the machine's architecture.

Some codes also benefit from even higher levels of optimization. Other flags to try are -qhot (high-order analysis of loop structures) and -qipa (interprocedural analysis). The man page for the compiler and the NERSC Web page provide more information.
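
To judge whether a given set of flags actually helps, it is worth timing the test problem built each way. The following is a minimal sketch of such a timing harness, assuming a small serial test executable that can be run directly. The function name and the executable names in the usage comment are hypothetical.

```python
import subprocess
import time


def best_wall_time(command, repeats=3):
    """Run `command` (a list of arguments) `repeats` times and return the
    shortest wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises an error if the program exits abnormally.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Example usage, with hypothetical executable names:
# strict = best_wall_time(["./test_strict"])
# relaxed = best_wall_time(["./test_relaxed"])
# print(f"speedup without -qstrict: {strict / relaxed:.2f}x")
```

Taking the minimum of several runs reduces the influence of other activity on the node. Any timing comparison should be paired with a check that the results still match the known answer.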

Use Optimized Libraries

  • MASS library (-lmass)
    This library provides tuned versions of mathematical intrinsic functions, such as cos or exp. It can speed up the evaluation of a mathematical function by a factor of 1.2 to 5, so a code that makes heavy use of such functions can see significant performance gains. Note that this library is not as accurate as the system library, so testing is recommended.
  • ESSL/PESSL library -lessl (scalar) -lpessl (parallel)
    This library contains highly optimized serial and parallel mathematical and scientific algorithms including:
    • BLAS Levels 2 and 3
    • ScaLAPACK subset
    • FFTs
    • Linear Algebra Subprograms
    • Matrix Operations
    • Linear Algebraic Equations
    • Eigensystem Analysis
    • Fourier Transforms, Convolutions, and Correlations
    • Sorting and Searching
    • Interpolation
    • Numerical Quadrature
    • Random Number Generation
    The versions of the algorithms in this library are often at least 50% faster than hand-coded equivalents, so their use is recommended whenever possible, especially if a code makes heavy use of such techniques.
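
How much a faster library helps the whole program depends on how much of the runtime is spent in the accelerated routines. A rough estimate can be made with Amdahl's law, sketched below. The runtime fractions are hypothetical and would come from profiling your own code; the factors 1.2 and 5 are the range quoted above for MASS.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: whole-program speedup when `fraction` of the runtime
    is accelerated by `factor` and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical profiles: share of runtime spent in math intrinsics.
for fraction in (0.1, 0.5, 0.9):
    for factor in (1.2, 5.0):  # range quoted for MASS
        print(f"{fraction:.0%} of runtime, functions {factor}x faster: "
              f"{overall_speedup(fraction, factor):.2f}x overall")
```

The estimate shows why the gain is significant only for codes that make heavy use of the library: when the accelerated routines account for a small share of the runtime, even a fivefold function speedup changes the total very little.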

Beyond these relatively simple techniques, there is a vast array of methods for optimizing the performance of a program. Please contact us if you have any questions or need assistance with optimizing your code.

Nick Wright is reachable via e-mail at

Did you know ..?

Always use the MP_INFOLEVEL environment variable or the -infolevel option when you invoke POE to help troubleshoot abnormal job terminations, for example:
cp: cannot stat `/dsgpfs/username/dir1/program': A file or directory in the path name does not exist.
ERROR: 0031-250 task 160: Terminated
Setting either of these to 6 gives you the maximum number of diagnostic messages when you run your program. - Eva Hocks.