LoadLeveler Topics
Overview
The following sections give 1) an example of a more complex submit script for DataStar that handles a mixed MPI/OpenMP or Pthreads code, and 2) a more detailed explanation of the LoadLeveler tags. Please see the Running Jobs on DataStar page for basic information and script samples.
Mixed MPI/OpenMP or Pthreads Script
To use all 8 CPUs on a node, you may use either pure MPI or a mixed-mode style of programming (for example, 1 MPI task per node, each controlling 8 OpenMP or POSIX threads). An example LoadLeveler script for the latter mode, using 2 nodes in total, is shown below.
All environment variables must be included within a single statement in your LoadLeveler script. When the statement is too long to fit on one line, users should:
- use ; to delimit between variables
- use \ to break lines
- use # at the beginning of each new line
- not use ; or \ at the end of the last line in the statement
Note the syntax of the example given below.
If a keyword or variable is defined more than once, only the last definition will take effect.
#!/usr/bin/ksh
#@environment = COPY_ALL;\
#AIXTHREAD_COND_DEBUG=OFF;\
#AIXTHREAD_MUTEX_DEBUG=OFF;\
#AIXTHREAD_RWLOCK_DEBUG=OFF;\
#AIXTHREAD_SCOPE=S;\
#MP_ADAPTER_USE=dedicated;\
#MP_CPU_USE=unique;\
#MP_CSS_INTERRUPT=no;\
#MP_EAGER_LIMIT=64K;\
#MP_EUIDEVELOP=min;\
#MP_LABELIO=yes;\
#MP_POLLING_INTERVAL=100000;\
#MP_PULSE=0;\
#MP_SHARED_MEMORY=yes;\
#MP_SINGLE_THREAD=no;\
#RT_GRQ=ON;\
#SPINLOOPTIME=0;\
#XLSMPOPTS="stack=67108864" ;\
#YIELDLOOPTIME=0
#@account_no = your_account
#@class = normal
#@node = 2
#@tasks_per_node = 1
#@wall_clock_limit = 00:10:00
#@node_usage = not_shared
#@network.MPI = sn_all, shared, US
#@job_type = parallel
#@job_name= job.$(jobid)
#@output = LL_out.$(jobid)
#@error = LL_err.$(jobid)
#@notification = always
#@notify_user = your_email
#@initialdir = /gpfs/your_username
#@queue
export MALLOCMULTIHEAP=heaps:8
export OMP_NUM_THREADS=8
poe hybrid_executable
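The continuation rules listed before the script can be sketched as a small shell helper. This is only an illustration: `ll_environment` is a hypothetical function name, not a LoadLeveler command, and it assumes every setting is passed as a single NAME=VALUE argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of LoadLeveler): print a multi-line
# "#@environment" statement that follows the rules described above.
ll_environment() {
    # The statement opens with COPY_ALL, as in the example script.
    line='#@environment = COPY_ALL'
    for setting in "$@"; do
        # Close the pending line with ";" and "\" ...
        printf '%s;\\\n' "$line"
        # ... and start the continuation line with "#".
        line="#$setting"
    done
    # The last line carries neither ";" nor "\".
    printf '%s\n' "$line"
}

ll_environment MP_SHARED_MEMORY=yes MP_SINGLE_THREAD=no SPINLOOPTIME=0
```

The printed lines can be pasted into a submit script ahead of the other #@ keywords.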
Users can also run jobs with an irregular task distribution on DataStar. For example, to run 8 tasks on one node and 7 tasks on a second node, specify:
#@total_tasks = 15
#@node = 2
instead of:
#@tasks_per_node = 8
#@node = 2
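The arithmetic behind the 8 + 7 split can be sketched as follows. The helper name `task_layout` is hypothetical. It assumes that tasks are spread as evenly as possible and that any remainder goes to the first nodes. This matches the example above, but the actual placement is decided by LoadLeveler.

```shell
#!/bin/sh
# Hypothetical helper: show how total_tasks would spread over the nodes,
# ASSUMING an even split with the remainder given to the first nodes.
task_layout() {
    total=$1
    nodes=$2
    base=$((total / nodes))      # tasks that every node receives
    extra=$((total % nodes))     # number of nodes that get one more task
    n=1
    while [ "$n" -le "$nodes" ]; do
        if [ "$n" -le "$extra" ]; then
            echo "node $n: $((base + 1)) tasks"
        else
            echo "node $n: $base tasks"
        fi
        n=$((n + 1))
    done
}

task_layout 15 2
```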
LoadLeveler Tags
The following table summarizes commonly used LoadLeveler tags and related environment variables. Examples of how to use these tags can be found below in the sections on p655 (8-way) nodes and p690 (32-way) nodes.
| Variable | Purpose |
| #@account_no | If you have more than one account and want to charge a run to a particular one, set this tag to the ID of the account to be charged. |
| #@arguments | A space delimited argument list used to provide arguments to the executable. |
| #@class=[express \| high \| normal] | Requests a batch queue class. The express queue is limited to 4 nodes (see Batch Queues), and jobs in it have exclusive access to their nodes. Jobs for the express queue must be submitted from dspoe.sdsc.edu. |
| #@environment=COPY_ALL;env2;env3 | Copies all currently set environment variables into the job and sets (in this example) two additional values, denoted here by env2 and env3. |
| #@network | Syntax: #@network.[MPI \| LAPI \| PVM] = [sn_all \| css0 \| en0 \| tk1], [shared \| not_shared], [US \| IP]. This keyword specifies the network and adapter types used to run the job. Use the following: #@network.MPI = sn_all, shared, US |
| #@node | Sets the number of nodes requested for the job. |
| #@node_usage=[shared \| not_shared] | Tells LoadLeveler
whether a node may be shared with other jobs. Set this to not_shared when running on 8-way nodes, and to shared when running on 32-way nodes so that other users may use the unused processors on the node. If exclusive node usage on p690 (32-way) nodes is desired, request the full number of processors (the corresponding SUs are charged to your account; see the Accounting Section to calculate charges). |
| #@tasks_per_node | Sets the number of tasks to run on a node. If your program is MPI only, set this value to the number of MPI tasks per node. If your program uses MPI in combination with a multithreading API (such as OpenMP), set this value to 1 and set the OMP_NUM_THREADS environment variable to the required number of threads per node. |
| #@wall_clock_limit | Maximum wall clock time for the job. The syntax is "hours:minutes:seconds" (e.g. 00:10:00 is equivalent to 10 minutes). The maximum allowed value is 18 hours. If you require more than 18 hours, contact SDSC Consulting so that special arrangements can be made. |
| MALLOCMULTIHEAP | When running threaded programs (OpenMP or Pthreads), set this environment variable to heaps:n, where n is the minimum of the number of threads used and the number of processors on the node. |
| MEMORY_AFFINITY | Set this environment variable to MCM when running on 32-way nodes. |
| MP_CSS_INTERRUPT | Set this variable to no in most cases. Setting it to yes can result in a performance advantage when using nonblocking MPI communications. See IBM's AIX guide for more information. |
| MP_EAGER_LIMIT | This variable sets the threshold for message size separating two communication protocols for MPI_Send: eager protocol for small sizes and synchronous protocol for large sizes. See page 29 of IBM's "Scientific Applications on RS/6000 SP Environments" guide and "Distributed Memory Programming and MPI" for more information. |
| MP_LABELIO | Set to yes to label each line of output with the rank of the MPI task that produced it. |
| MP_SHARED_MEMORY=YES | An @environment option to allow MPI calls within a node to use shared memory, in some cases improving communication times between tasks on the same node. |
| MP_SINGLE_THREAD | Set to no if using OpenMP or Pthreads, compiler auto-parallelization, MPI-IO, or MPI-2 one-sided communication; that is, whenever the thread-safe (_r) compilers are used. Otherwise set to yes. |
| OMP_NUM_THREADS=4 | An @environment option to set the number of OpenMP threads per node. |
| SPINS=0:YIELDS=0 | An @environment option to cause threads to hold onto their respective processors for the duration of the execution of a program. This avoids overhead associated with gaining and releasing processors as the execution point in a program moves into and out of parallel regions. |
| SPINLOOPTIME=5000 | An @environment option to allow threads to stay active in critical regions, reducing the overhead associated with starting up and killing threads between parallel and serial regions. |
| XLSMPOPTS | This variable can be set for multithreaded programs. Its most important suboption is stack, which defines the size (in bytes) of the stack available to each thread (for example, XLSMPOPTS="stack=67108864"). See IBM's AIX guide for more information. |
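Two of the rules in the table lend themselves to a quick check before submission: the MALLOCMULTIHEAP value (the minimum of thread count and processor count) and the 18-hour wall_clock_limit ceiling. The sketch below is illustrative only. The function names are hypothetical, and it assumes that thread counts, processor counts, and the limit are supplied by hand in the formats shown in the table.

```shell
#!/bin/sh
# Hypothetical helpers; they are not part of LoadLeveler or AIX.

# Print the MALLOCMULTIHEAP value heaps:n, where n is the smaller of
# the number of threads and the number of processors on the node.
multiheap_setting() {
    threads=$1
    procs=$2
    if [ "$threads" -lt "$procs" ]; then
        echo "heaps:$threads"
    else
        echo "heaps:$procs"
    fi
}

# Convert an hours:minutes:seconds limit to seconds.
wall_clock_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed only if the limit does not exceed the 18-hour maximum.
wall_clock_ok() {
    [ "$(wall_clock_seconds "$1")" -le $((18 * 3600)) ]
}

# 8 threads on an 8-way node, with the limit from the example script.
MALLOCMULTIHEAP=$(multiheap_setting 8 8)
export MALLOCMULTIHEAP
echo "MALLOCMULTIHEAP=$MALLOCMULTIHEAP"
wall_clock_ok 00:10:00 && echo "wall_clock_limit accepted"
```

On a 32-way node with 4 threads, the same call with arguments 4 and 32 yields heaps:4.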




