SDSC Thread, Issue 4, February 2006


User Services Director:
Anke Kamrath

Subhashini Sivagnanam

Graphics Designer:
Diana Diehl

Application Designer:
Fariba Fana

Help Desk: Programming Tips

Specifying the format of corefiles

—Eva Hocks

When a parallel application terminates abnormally, one core dump per task is written to the working directory. For example, a user application running on 1 node with 8 tasks produces 8 core dumps.

By default, POE processes that terminate abnormally generate standard AIX corefiles. Traditional AIX corefiles tend to be large and can consume too much available disk space and an unacceptable amount of CPU time and network bandwidth.

SDSC has limited the core file size to 32 MB. To make good use of that limit, we recommend using the MP_COREFILE_FORMAT environment variable (or its associated command-line flag, -corefile_format) to switch to lightweight corefiles, which conform to the Parallel Tools Consortium's Standardized Lightweight Corefile Format (LCF). A lightweight corefile contains thread stack traces (listings of the function calls that led to the error) but omits the often unnecessary low-level detail found in a traditional corefile.

export MP_COREFILE_FORMAT=<any name>
poe program -corefile_format <any name>

One lightweight corefile for each process will be saved in a separate subdirectory.
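
After a failed run it can be handy to see which tasks actually left a lightweight corefile behind. The following is a minimal sketch, not part of POE: list_corefiles is a hypothetical helper, and it assumes the per-task subdirectories sit directly under the directory you pass in.

```shell
#!/bin/sh
# list_corefiles NAME [DIR]   (hypothetical helper, not a POE command)
# Print every regular file called NAME found exactly one directory
# below DIR. NAME is the value given to MP_COREFILE_FORMAT;
# DIR defaults to the current directory.
list_corefiles() {
    name=$1
    dir=${2:-.}
    for f in "$dir"/*/"$name"; do
        # An unmatched glob stays literal, so test before printing.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

For example, with MP_COREFILE_FORMAT set to the illustrative name light_core, running list_corefiles "$MP_COREFILE_FORMAT" in the working directory prints one path per task that produced a corefile.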

Disabling the creation of a new subdirectory may be necessary when programs terminate abnormally because of memory allocation failures (for example, when a failed malloc() call is the cause of the original corefile). In these cases, setting -coredir or MP_COREDIR to none may prevent POE from hanging on a memory allocation problem while it tries to create a new subdirectory to hold the corefile.

export MP_COREDIR=none
poe program -coredir none

In this case, corefiles are written to /tmp/core. If multiple tasks run on a node and each generates a corefile, only the last corefile written can be retrieved; the others are overwritten.
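
Because /tmp/core is a single fixed path, a later run on the same node can also overwrite the one corefile that survived. The sketch below shows one way to copy it somewhere safe under a unique name. save_core is a hypothetical helper and the destination directory is your choice. It cannot recover corefiles that were already overwritten by other tasks of the same job.

```shell
#!/bin/sh
# save_core SRC DESTDIR   (hypothetical helper, not a POE command)
# Copy SRC (for example /tmp/core) into DESTDIR under a name that
# includes the node name and the current time, then print the new path.
save_core() {
    src=$1
    destdir=$2
    if [ ! -f "$src" ]; then
        echo "save_core: no corefile at $src" >&2
        return 1
    fi
    mkdir -p "$destdir" || return 1
    dest="$destdir/core.$(uname -n).$(date +%Y%m%d%H%M%S)"
    cp "$src" "$dest" && printf '%s\n' "$dest"
}
```

Run on the node that holds the corefile, save_core /tmp/core "$HOME/cores" would leave a copy such as core.<node>.<timestamp> in the chosen directory.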

Eva Hocks is reachable via email at

Did you know...?

Always set the MP_INFOLEVEL environment variable or use the -infolevel option when you invoke POE to help troubleshoot abnormal job terminations such as:
cp: cannot stat `/dsgpfs/username/dir1/program': A file or directory in the path name does not exist.
ERROR: 0031-250 task 160: Terminated
Setting either of these to 6 gives you the maximum number of diagnostic messages when you run your program. - Eva Hocks.
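
With many tasks, the extra diagnostics can bury the lines that say which tasks died. As a rough sketch, assuming the job output was saved to a file and that termination messages look like the 0031-250 line shown above, the hypothetical helper below pulls out the affected task numbers.

```shell
#!/bin/sh
# terminated_tasks LOGFILE   (hypothetical helper, not a POE command)
# Print the task numbers from lines of the form
#   ERROR: 0031-250 task N: Terminated
# one per line, sorted numerically with duplicates removed.
terminated_tasks() {
    sed -n 's/^ *ERROR: 0031-250 *task \([0-9][0-9]*\): Terminated.*$/\1/p' "$1" |
        sort -n -u
}
```

For instance, after export MP_INFOLEVEL=6 and a run whose output was redirected to job.out (an illustrative file name), terminated_tasks job.out lists the tasks to investigate first.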