SDSC Thread, Issue 4, February 2006






User Services Director:
Anke Kamrath

Editor:
Subhashini Sivagnanam

Graphics Designer:
Diana Diehl

Application Designer:
Fariba Fana


Help Desk: Programming Tips

Specifying the format of corefiles

—Eva Hocks

When a parallel application terminates abnormally, one core dump per task is written to the working directory. For example, a user application running on 1 node with 8 tasks produces 8 core dumps.

By default, POE processes that terminate abnormally generate standard AIX corefiles. Traditional AIX corefiles tend to be large and can consume too much available disk space and an unacceptable amount of CPU time and network bandwidth.

SDSC has limited the core file size to 32MB. To make good use of that limit, we recommend using the MP_COREFILE_FORMAT environment variable (or its associated command-line flag -corefile_format) to switch to lightweight corefiles that conform to the Parallel Tools Consortium's Standardized Lightweight Corefile Format (LCF). A lightweight corefile contains thread stack traces (listings of the function calls that led to the error) but omits the often unnecessary low-level detail found in a traditional corefile.

export MP_COREFILE_FORMAT=<any name>
or
poe program -corefile_format <any name>

One lightweight corefile for each process will be saved in a separate subdirectory.
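As a rough illustration of why the lightweight format helps, the sketch below sets the variable in a job script and works out the worst-case disk usage of traditional corefiles from the numbers above (8 tasks, 32MB limit). The name light_core, the program name my_program, and the helper worst_case_core_mb are hypothetical and chosen only for this example.

```shell
#!/bin/sh
# "light_core" is a hypothetical name; the text above says any name may be used.
export MP_COREFILE_FORMAT=light_core

# Worst-case disk usage of traditional corefiles, in MB:
# one corefile per task, each at most as large as the size limit.
worst_case_core_mb() {
    tasks=$1      # number of tasks in the job
    limit_mb=$2   # per-corefile size limit in MB
    echo $((tasks * limit_mb))
}

echo "MP_COREFILE_FORMAT=$MP_COREFILE_FORMAT"
echo "Worst case for 8 tasks at 32MB each: $(worst_case_core_mb 8 32) MB"

# On a POE system the job would then be started as usual, for example:
# poe ./my_program        (my_program is a placeholder)
```

With the values from this article the helper reports 256 MB, which is the upper bound for traditional corefiles from a single 8-task run.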

Disabling the creation of the new subdirectory may be necessary when programs terminate abnormally because of memory allocation failures (for example, when a failed malloc() call is what triggered the core dump in the first place). In these cases, setting -coredir or MP_COREDIR to none may prevent POE from hanging on a memory allocation problem while it attempts to create the subdirectory that would hold the corefile.

export MP_COREDIR=none
or
poe program -coredir none

In this case, corefiles are written to /tmp/core. If multiple tasks run on a node and each generates a corefile, only the last corefile written can be retrieved; the others are overwritten.
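Because a later run can overwrite the same file again, it may be worth copying the surviving corefile to a uniquely named file soon after the job ends. The following is a minimal sketch under that assumption; the function save_core and the destination directory are hypothetical, and the copy would have to run on the node where the task ran.

```shell
#!/bin/sh
# Copy a corefile to a destination directory under a name tagged with
# the node name and a timestamp, so a later run cannot overwrite it.
# Prints the path of the saved copy on success.
save_core() {
    src=$1        # corefile to preserve, e.g. /tmp/core
    dest_dir=$2   # directory that collects saved corefiles
    [ -f "$src" ] || return 1          # nothing to save
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/core.$(uname -n).$(date +%Y%m%d%H%M%S)"
    cp "$src" "$dest" && echo "$dest"
}

# Example call (the destination path is an assumption):
# save_core /tmp/core "$HOME/corefiles"
```

This does not recover corefiles already overwritten by other tasks on the same node; it only preserves the last one written.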

Eva Hocks is reachable via email at hocks@sdc.edu

Did you know...?

Log in to the DSPOE node to run jobs in the DataStar "express" queue. There are 4 nodes set up to run up to 64 tasks, 24/7. -Eva Hocks