Help Desk: Programming Tips
Specifying the format of corefiles
When a parallel application terminates abnormally, core dumps are written to the working directory, one per task. For example, an application running on 1 node with 8 tasks produces 8 core dumps.
By default, POE processes that terminate abnormally generate standard AIX corefiles. Traditional AIX corefiles tend to be large and can consume excessive disk space, CPU time, and network bandwidth.
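As a rough illustration, the worst-case space taken by one failed run is the number of tasks times the largest size a single corefile may reach. The sketch below uses 8 tasks and 32MB per corefile (the SDSC limit described in the next paragraph, assumed here to apply to each file individually); the helper name core_budget_mb is hypothetical.

```shell
#!/bin/sh
# Worst-case corefile space for one failed run:
# (number of tasks) x (maximum size of a single corefile).
# Illustrative inputs: 8 tasks, 32 MB per corefile.

core_budget_mb() {
    ntasks=$1       # one corefile per task
    per_core_mb=$2  # largest size one corefile may reach, in MB
    echo $((ntasks * per_core_mb))
}

echo "8 tasks x 32 MB: up to $(core_budget_mb 8 32) MB"
```

With 8 nodes running 8 tasks each, the same arithmetic gives 64 x 32 = 2048 MB of traditional corefiles in the worst case.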
SDSC has limited the core file size to 32MB. To make good use of this limit, SDSC recommends using the MP_COREFILE_FORMAT environment variable (or its associated command-line flag, -corefile_format) to produce lightweight corefiles that conform to the Parallel Tools Consortium's Standardized Lightweight Corefile Format (LCF). A lightweight corefile contains thread stack traces (listings of the function calls that led to the error) but omits the often unnecessary low-level detail found in a traditional corefile.
export MP_COREFILE_FORMAT=<any name>
One lightweight corefile for each process will be saved in a separate subdirectory.
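A minimal sketch of this setup, assuming a hypothetical program ./my_app, an arbitrary corefile name light_core, and that MP_COREDIR is unset so POE uses its default per-task subdirectories, assumed here to be named coredir.<taskid>. The helper list_light_cores is hypothetical; it only lists whatever files a failed run left in those subdirectories.

```shell
#!/bin/sh
# Sketch: request lightweight corefiles, then list them after a failed run.
# Assumptions: "light_core" is an arbitrary name, ./my_app is a hypothetical
# program, and MP_COREDIR is unset so POE uses its default per-task
# subdirectories, assumed here to be named coredir.<taskid>.

export MP_COREFILE_FORMAT=light_core

# Equivalent command-line form (requires POE, so it is commented out here):
#   poe ./my_app -procs 8 -corefile_format light_core

# Hypothetical helper: print the path of every file found in the
# coredir.* subdirectories of the current directory.
list_light_cores() {
    for dir in coredir.*; do
        [ -d "$dir" ] || continue      # skip if the glob matched nothing
        for f in "$dir"/*; do
            if [ -f "$f" ]; then
                echo "$f"
            fi
        done
    done
}

list_light_cores
```

Run the helper from the job's working directory after the failure; each line of output is one task's lightweight corefile.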
Disabling the creation of a new subdirectory may be necessary when programs terminate abnormally because of memory allocation failures (for example, when a failed malloc() call caused the original core dump). In these cases, setting -coredir or MP_COREDIR to none may prevent POE from hanging on a memory allocation problem while it attempts to create a new subdirectory to hold the corefile.
In this case, corefiles are written to /tmp/core. If multiple tasks run on a node and each generates a corefile, only the last corefile written is retrieved; the others are overwritten.
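A sketch of the MP_COREDIR=none setup, again assuming the hypothetical program ./my_app. The helper check_tmp_core is hypothetical: it simply reports whether anything exists at /tmp/core on the current node, and its optional argument lets you point it at another path for testing.

```shell
#!/bin/sh
# Sketch: disable per-task corefile subdirectories, then check /tmp/core.
# Assumption: ./my_app is a hypothetical program.

export MP_COREDIR=none

# Equivalent command-line form (requires POE, so it is commented out here):
#   poe ./my_app -procs 8 -coredir none

# Hypothetical helper: report whether anything exists at the corefile path
# (default /tmp/core). With several tasks per node, only the last corefile
# written survives there; the others were overwritten.
check_tmp_core() {
    core_path=${1:-/tmp/core}
    if [ -e "$core_path" ]; then
        echo "found $core_path"
    else
        echo "no corefile at $core_path"
    fi
}

check_tmp_core
```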
Eva Hocks is reachable via email at email@example.com