Compiling on DataStar
Programming Models
DataStar supports shared memory (OpenMP or Pthreads), message passing (MPI), and mixed-mode (OpenMP + MPI) programming models. Shared memory programming is done within a node, and message passing is done between nodes. Mixed-mode programming is useful for applications with a limited degree of task-level parallelism.
Memory Usage Modes
The DataStar environment supports compilation in both 32-bit and 64-bit addressing modes. We recommend compiling in 64-bit mode unless your code has compatibility issues with it. To compile in 64-bit mode, either use the -q64 option or set the environment variable OBJECT_MODE to 64 (setenv OBJECT_MODE 64), which makes 64-bit the default mode. Use only thread-safe (_r) compilers when compiling in 64-bit mode.
One advantage of compiling in 64-bit mode is that some codes may run faster due to better memory performance. Another is that memory use is limited only by the hardware, not by the executable; in 32-bit mode the executable itself has a memory limit. By default this limit is 128 MB for the heap and 32 MB for the stack. If a program allocates memory beyond this limit, it aborts with an error message like the following:
exec():
0509-036 Cannot load program a.out because of the following errors:
0509-026 System error: There is not enough memory available now.
The memory limits can be raised with the -bmaxdata (heap) and -bmaxstack (stack) compiler flags. For example, -bmaxdata:0x80000000 raises the heap limit to 2 GB (the 0x prefix marks a hexadecimal number, so 0x80000000 = 8 × 16^7 B = 2^31 B = 2 GB). These flags are not needed in 64-bit mode, where memory is limited only by the hardware.
In both modes (64-bit and 32-bit), the memory actually available on a node is somewhat less than the node's total memory, because some space is reserved for the operating system. On p655 (8-way) nodes, about 13.5 GB of the 16 GB per node is available to user applications.
If an application allocates more than that amount, paging to disk starts occurring, which significantly decreases performance; sufficiently heavy paging may exhaust paging space and cause system problems. Users are therefore urged to monitor the amount of memory their programs use and keep it within the hardware memory limits. Users can measure their application's memory high-water mark with hpmcount.
Most programs are easy to port from 32-bit to 64-bit mode, but users should keep a few issues in mind. The most common problem stems from using 32-bit library components: when compiling in a given mode, all components must be compiled in that mode. Other issues involve variable sizes. For example, long is 32 bits in 32-bit mode but 64 bits in 64-bit mode, while int stays 32 bits, so using int and long interchangeably may cause problems when porting. Similarly, pointers are 64 bits long in 64-bit mode, so interchanging ints and pointers is error-prone. Compiling with the -qwarn64 flag produces warnings about these kinds of issues. Please contact consult@sdsc.edu if you experience porting problems or have questions.
Serial Programs
Compile your programs with xlc_r, xlC_r, xlf_r, xlf90_r, and xlf95_r. Compiler names ending in _r denote the thread-safe versions of the compilers. They are recommended for most programming models and mandatory for 64-bit and multithreaded programs. Unless the -c option is used, these commands produce a single executable. For more information, read the C Compiler Guide and the Fortran compiler man pages.
| Command | Compiles |
| xlc_r options file.c | C |
| xlC_r options file.C | C++ |
| xlf_r options file.f | Fortran 77 |
| xlf90_r options file.f | Fortran 90 |
| xlf95_r options file.f | Fortran 95 |
There are examples of compiler options on dslogin.sdsc.edu at /usr/local/apps/examples. Suggested compiler options include the following:
| Option | Definition |
| -O3 | Optimization level 3 (default is level 2). |
| -qstrict | Used with -O3 to ensure compiler optimization does not alter program semantics. Use only when necessary as it may reduce optimization. |
| -q64 | Compiles in 64 bit mode. This flag should be used only with thread-safe compilers (xlf_r, xlc_r, etc.). |
| -qarch=pwr4 | Produces an object that contains instructions that run on the POWER4 hardware platforms. |
| -qsave (Fortran) | Sets the default storage class for local variables to static. |
| -qnosave | Sets the default storage class for local variables to automatic. |
| -qsigtrap | Turns on a handler for trapped exceptions. Use this option with -g, -C, and -qflttrap. By default, you get a core dump when an error occurs; the handlers below are also available. |
| -qsigtrap=xl__ieee | Produces a traceback and an explanation of the signal, and continues execution by supplying the default IEEE result for the failed computation. |
| -qsigtrap=xl__trce | Produces a traceback and stops the program. |
| -qsigtrap=xl__trcedump | Produces a traceback and a core file, and stops the program. |
| -qflttrap=enable:zerodivide:invalid:overflow -qsigtrap -g | Detects floating-point exceptions. This may slow execution; use it only for debugging. See page 314 of the Fortran Language Manual for a discussion of exception trapping and handling. |
| -qtune=pwr4 | Produces an object optimized for the POWER4 hardware platforms. |
| -g | Includes debugger information in the object files. This will work with all levels of optimization. |
| -C or -qcheck | Array bounds checking (this may slow execution - should be used only for debugging). |
These flags are only suggestions, and every program performs differently with different compiler flags. We suggest that you try various combinations of the above to see which flags are optimal (and produce correct results) for your program. For more information, read the C Compiler Guide and Fortran compiler man pages.
As an example, to compile the simple Fortran 90 program serial_pi.f, which computes pi by integrating the function 4.0 / (1.0 + x^2) from x=0 to x=1 (the file timenow.c contains an accurate wall-clock timing routine), use:
xlc_r -qtune=pwr4 -qarch=pwr4 -O3 -c timenow.c
xlf90_r -qtune=pwr4 -qarch=pwr4 -O3 -o serial_pi serial_pi.f timenow.o
Shared Memory Programs
All DataStar nodes are shared memory (SMP) nodes, some 8-way and some 32-way. One way to parallelize programs is multithreading, with each processor (thread) working on its assigned portion of the data in the node's shared memory.
The IBM compilers can automatically parallelize some code sections, usually loops, to create multithreaded code. In such cases, verify that the parallelized program still behaves correctly. The -qsmp or -qsmp=auto compiler option turns on automatic parallelization. Keep in mind that the resulting program often runs more slowly, because the compiler attempts to parallelize every loop, even those whose parallel work is too small to pay for the overhead of running parallel threads. The -qreport=smplist option produces a report listing the transformations the compiler made to parallelize the code.
When the source code is too complex to be parallelized automatically, the programmer may want to identify loops with enough work to justify parallelizing them and tell the compiler which variables should be shared among threads and which should be private to each thread. This can be done with IBM directives (Fortran) or pragmas (C/C++) (see the IBM Language User Guides), with OpenMP directives (for better portability), or with Pthreads. The -qsmp=noauto option turns off automatic parallelization and creates a multithreaded program based on directives in any SMP syntax (SMP$, IBMP, or $OMP triggers). The -qsmp=omp option also turns off automatic parallelization, and restricts the directive syntax to OpenMP. For information on OpenMP syntax, please visit the OpenMP website.
Note: When using OpenMP with Fortran codes, use the -qnoswapomp flag. This allows the OpenMP runtime library functions (omp_*) to work properly. Also be sure to specify the -qnosave flag on the compilation line (it is the default for xlf90_r). This ensures that local variables are placed on the stack, so that shared memory programs function correctly.
Message Passing Programs
For message-passing programs, the Parallel Environment (PE) has several compiler scripts which link the startup and message passing libraries. Compile your programs with mpcc, mpCC, and mpxlf90 (these automatically link with the appropriate MPI libraries). The thread-safe library (_r) is required for 64-bit MPI compilation, but is optional (though recommended) for 32-bit. There are examples of the use of some compiler options on dslogin.sdsc.edu at /usr/local/apps/examples.
| 32-bit | 64-bit | Language |
| mpcc options file.c | mpcc_r -q64 options file.c | C |
| mpCC options file.C | mpCC_r -q64 options file.C | C++ |
| mpxlf options file.f | mpxlf_r -q64 options file.f | Fortran 77 |
| mpxlf90 options file.f | mpxlf90_r -q64 options file.f | Fortran 90 |
| mpxlf95 options file.f | mpxlf95_r -q64 options file.f | Fortran 95 |
For best performance of message passing programs (MPI), use the serial compiler switches given previously. For more information on these and other compiler switches, please refer to the man page for the respective compiler.
A list of IBM MPI environment variables and their default values is given in the IBM Parallel Environment (PE) for AIX 5L V4.1: MPI Programming Guide.
Mixed Mode Programs
This programming approach should be considered by those users with applications that have a limited degree of task-level parallelism or which exhibit inadequate scaling at large numbers of processors. While it may be possible to improve the performance of an application by using this hybrid programming style, user experience generally has been that performance decreases or the gains are small relative to the amount of effort required.
For mixed-mode (MPI and shared memory) programs, the Parallel Environment (PE) has several compiler wrappers which link in the shared memory and message passing libraries. These are mpcc_r, mpCC_r, and mpxlf90_r. The thread-safe library (_r) is required for both 32-bit and 64-bit mixed mode compilation. There are examples of the use of some compiler options on dslogin.sdsc.edu at /usr/local/apps/examples.
For best performance of mixed mode programs, use the serial compiler switches given previously and add: -qsmp or -qsmp=noauto. For more information on these and other compiler switches, please refer to the man page for the respective compiler.
Note: When using the mpxlf_r compiler, be sure to specify the -qnosave flag on the compilation line. This ensures that local variables are placed on the stack, so that mixed-mode programs function correctly.



