LoadLeveler Topics

Overview

The following sections give 1) an example of a more complex submit script for DataStar that handles a mixed MPI/OpenMP or Pthreads code, and 2) a more detailed explanation of the LoadLeveler tags. Please see the Running Jobs on DataStar page for basic information and script samples.

Mixed MPI/OpenMP or Pthreads Script

To use 8 CPUs per node, you may use either pure MPI or a mixed-mode style of programming (for example, 1 MPI task per node, each controlling 8 OpenMP or POSIX threads). An example of a LoadLeveler script for the latter mode (using 2 nodes in total) is shown below.

All environment variables must be included within one statement in your LoadLeveler script. When the statement is too long to fit on one line, users should:

  • use ; to delimit variables
  • use \ to break lines
  • use # at the beginning of each new line
  • not use ; or \ at the end of the last line in the statement

Note the syntax of the example given below.

If a keyword or variable is defined more than once, only the last definition will take effect.

#!/usr/bin/ksh
#@environment = COPY_ALL;\
#AIXTHREAD_COND_DEBUG=OFF;\
#AIXTHREAD_MUTEX_DEBUG=OFF;\
#AIXTHREAD_RWLOCK_DEBUG=OFF;\
#AIXTHREAD_SCOPE=S;\
#MP_ADAPTER_USE=dedicated;\
#MP_CPU_USE=unique;\
#MP_CSS_INTERRUPT=no;\
#MP_EAGER_LIMIT=64K;\
#MP_EUIDEVELOP=min;\
#MP_LABELIO=yes;\
#MP_POLLING_INTERVAL=100000;\
#MP_PULSE=0;\
#MP_SHARED_MEMORY=yes;\
#MP_SINGLE_THREAD=no;\
#RT_GRQ=ON;\
#SPINLOOPTIME=0;\
#XLSMPOPTS="stack=67108864" ;\
#YIELDLOOPTIME=0
#@account_no = your_account
#@class = normal
#@node = 2
#@tasks_per_node = 1
#@wall_clock_limit = 00:10:00
#@node_usage = not_shared
#@network.MPI = sn_all, shared, US
#@job_type = parallel
#@job_name= job.$(jobid)
#@output = LL_out.$(jobid)
#@error = LL_err.$(jobid)
#@notification = always
#@notify_user = your_email
#@initialdir = /gpfs/your_username
#@queue

export MALLOCMULTIHEAP=heaps:8
export OMP_NUM_THREADS=8
poe hybrid_executable
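
The continuation rules used by the #@environment statement in the script above can be sketched as a small generator. This is only an illustration: build_environment is a hypothetical helper, not a LoadLeveler command, and it assumes each setting is passed as a single NAME=value argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of LoadLeveler): print a multi-line
# "#@environment" statement from NAME=value arguments.
build_environment() {
    # The first line opens the statement with COPY_ALL.
    printf '#@environment = COPY_ALL'
    for setting in "$@"; do
        # ";" delimits variables, "\" breaks the line,
        # and "#" starts each continuation line.
        printf ';\\\n#%s' "$setting"
    done
    # The last line ends with neither ";" nor "\".
    printf '\n'
}

# Example call with three settings taken from the script above.
build_environment MP_SHARED_MEMORY=yes MP_SINGLE_THREAD=no SPINLOOPTIME=0
```

The output can be pasted at the top of a submit script, directly after the #!/usr/bin/ksh line.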

Users can also run irregular task distribution jobs on DataStar. For example, a user who wanted to run 8 tasks on one node, and 7 tasks on a second node, would specify:

#@total_tasks = 15
#@node = 2

instead of:

#@tasks_per_node = 8
#@node = 2
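
The 8 + 7 split described above can be reproduced with a little shell arithmetic. This is a rough sketch that assumes tasks are spread over the nodes as evenly as possible; which node receives the extra task is an assumption, and distribute_tasks is a hypothetical helper, not a LoadLeveler command.

```shell
#!/bin/sh
# Hypothetical helper: split total_tasks over nodes as evenly as possible.
# Assumption: the first (total % nodes) nodes each take one extra task.
distribute_tasks() {
    total=$1
    nodes=$2
    base=$((total / nodes))     # tasks every node receives
    extra=$((total % nodes))    # nodes that receive one more task
    i=0
    result=""
    while [ "$i" -lt "$nodes" ]; do
        if [ "$i" -lt "$extra" ]; then
            count=$((base + 1))
        else
            count=$base
        fi
        # Append the per-node count, separated by single spaces.
        result="$result${result:+ }$count"
        i=$((i + 1))
    done
    echo "$result"
}

distribute_tasks 15 2   # prints "8 7"
```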

LoadLeveler Tags

A summary of commonly used LoadLeveler tags is given in the table below. Examples of how to use these tags can be found in the sections on p655 (8-way) nodes and p690 (32-way) nodes.

Variable Purpose
#@account_no If you have more than one account and would like to charge a particular run to a given account, set this tag to the ID of the account you want charged.
#@arguments A space delimited argument list used to provide arguments to the executable.
#@class=[express | high | normal] Used to request a batch queue class. Express queues are limited to 4 nodes (see Batch Queues). Jobs in the express queue have exclusive access to the nodes. Job submissions to the express queue must be made from dspoe.sdsc.edu.
#@environment=COPY_ALL;env2;env3 Copies all currently set environment variables into the job and sets (in this example) two additional values (denoted here by env2 and env3).
#@network.[MPI | LAPI | PVM]=[sn_all | css0 | en0 | tk1],[shared | not_shared],[US | IP] This keyword specifies the network and adapter types to use to run the job. Use the following: #@network.MPI = sn_all, shared, US
#@node The number of nodes to request for the job.
#@node_usage=[shared | not_shared] Tells LoadLeveler whether a node may be shared with other jobs. Set this to not_shared when running on 8-way nodes, and to shared when running on 32-way nodes so that other users may use the unused processors on the node. If exclusive node usage on p690 (32-way) nodes is desired, a user may request the full number of processors (with the corresponding SUs charged to their account; see the Accounting Section to calculate charges).
#@tasks_per_node Sets the number of tasks to run on each node. If your program is MPI only, set this value to the number of MPI tasks per node. If your program uses MPI in combination with a multithreading API (such as OpenMP), set this value to 1 and set the OMP_NUM_THREADS environment variable to the required number of threads per node.
#@wall_clock_limit Maximum wall clock time to use for the job. Syntax is "hours:minutes:seconds" (e.g. 00:10:00 is equivalent to 10 minutes). The maximum allowed value is 18 hours. If you require more than 18 hours, contact SDSC Consulting so that special arrangements can be made.
MALLOCMULTIHEAP When running threaded programs (OpenMP or Pthreads), set this environment variable to heaps:n, where n is the minimum of the number of threads used and the number of processors on the node.
MEMORY_AFFINITY Set this environment variable to MCM when running on 32-way nodes.
MP_CSS_INTERRUPT Set this variable to no for most cases. Setting it to yes can result in a performance advantage when using nonblocking MPI communications. See IBM's AIX guide for more information.
MP_EAGER_LIMIT This variable sets the threshold for message size separating two communication protocols for MPI_Send: eager protocol for small sizes and synchronous protocol for large sizes. See page 29 of IBM's "Scientific Applications on RS/6000 SP Environments" guide and "Distributed Memory Programming and MPI" for more information.
MP_LABELIO Set to yes to label each line of output with the rank of the MPI task that produced it.
MP_SHARED_MEMORY=YES An @environment option to allow MPI calls within a node to use shared memory, in some cases improving communication times between tasks on the same node.
MP_SINGLE_THREAD Set to no if using OpenMP or Pthreads, compiler auto-parallelization, MPI-IO, or MPI-2 one-sided communication; that is, whenever the thread-safe (_r) compilers are used. Otherwise set to yes.
OMP_NUM_THREADS=4 An @environment option to set the number of OpenMP threads per node.
SPINS=0:YIELDS=0 An @environment option to cause threads to hold onto their respective processors for the duration of the execution of a program. This avoids overhead associated with gaining and releasing processors as the execution point in a program moves into and out of parallel regions.
SPINLOOPTIME=5000 An @environment option to allow threads to stay active in critical regions, reducing the overhead associated with starting up and killing threads between parallel and serial regions.
XLSMPOPTS This variable can be set for multithreaded programs. Its most important option is stack, which defines the size (in bytes) of the stack available to each thread. See IBM's AIX guide for more information.
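
The MALLOCMULTIHEAP rule in the table (heaps:n, with n the smaller of the thread count and the processor count on the node) can be sketched in the shell portion of a submit script. malloc_heaps is a hypothetical helper, and the value 8 for processors per node assumes the 8-way nodes used in the example script; adjust both to your own job.

```shell
#!/bin/sh
# Hypothetical helper: print the MALLOCMULTIHEAP value "heaps:n",
# where n = min(number of threads, number of processors on the node).
malloc_heaps() {
    threads=$1
    cpus=$2
    if [ "$threads" -lt "$cpus" ]; then
        n=$threads
    else
        n=$cpus
    fi
    echo "heaps:$n"
}

# Assumed job shape: 8 threads per task on an 8-way node.
export OMP_NUM_THREADS=8
export MALLOCMULTIHEAP="$(malloc_heaps "$OMP_NUM_THREADS" 8)"
echo "$MALLOCMULTIHEAP"   # prints "heaps:8"
```

With these values the result matches the export MALLOCMULTIHEAP=heaps:8 line in the mixed-mode script.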
