
Running Jobs on Blue Gene

Running Batch Jobs

Batch jobs are run using the LoadLeveler scheduler. The path to the LoadLeveler binaries should be set by default in your login shell. To submit a job using LoadLeveler:

llsubmit run.script

run.script is a LoadLeveler script such as the following. Replace each value in angle brackets with your own; for example, <your executable> is the full path of your executable:

#!/usr/bin/ksh
#@ environment = COPY_ALL;
#@ job_type = BlueGene
#@ account_no = <your user account>
#@ class = parallel
#@ bg_partition = <partition name; for example: top>
#@ output = file.$(jobid).out
#@ error = file.$(jobid).err
#@ notification = complete
#@ notify_user = <your email address>
#@ wall_clock_limit = 00:10:00
#@ queue
mpirun -mode VN -np <number of procs> -exe <your executable> -cwd <working directory>

Specify the partition only with the bg_partition keyword, NOT in the mpirun arguments. In addition, job_type must be set to BlueGene. Use llq to check the status of jobs and llcancel to cancel them. Additional options are described in the Blue Gene sections of the LoadLeveler user guide (IBM site).
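If you submit many similar jobs, it can help to generate run.script from a few parameters instead of editing it by hand. The following is a minimal sketch in POSIX shell that reproduces the template above. The function name write_run_script and its argument order are hypothetical (not an SDSC tool), the notification lines are omitted, and the mpirun line keeps VN mode as in the template.

```shell
# Hypothetical helper (not an SDSC tool): print a LoadLeveler script for a
# Blue Gene job, following the run.script template above.
# Arguments: account, partition, number of processes, executable (full path),
# working directory, and an optional wall-clock limit (default 00:10:00).
write_run_script() {
    ws_account=$1 ws_partition=$2 ws_np=$3 ws_exe=$4 ws_cwd=$5
    ws_limit=${6:-00:10:00}
    # \$(jobid) is escaped so that LoadLeveler, not the shell, expands it.
    cat <<EOF
#!/usr/bin/ksh
#@ environment = COPY_ALL;
#@ job_type = BlueGene
#@ account_no = $ws_account
#@ class = parallel
#@ bg_partition = $ws_partition
#@ output = file.\$(jobid).out
#@ error = file.\$(jobid).err
#@ wall_clock_limit = $ws_limit
#@ queue
mpirun -mode VN -np $ws_np -exe $ws_exe -cwd $ws_cwd
EOF
}
```

For example (the account, paths, and time limit are placeholders):

write_run_script use300 bot128-1 256 /users/jdoe/a.out /users/jdoe/run 00:30:00 > run.script
llsubmit run.script

Here 256 processes exactly fill bot128-1 in VN mode (128 nodes × 2 processors).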


MPIRUN Options

Some key mpirun parameters are:

Option      Definition
-mode       compute mode: CO or VN
-np         number of compute processors
-mapfile    logical mapping of processors
-cwd        current working directory
-exe        full path of executable
-args       arguments of executable
-env        environment variables

Additional parameters are described in the mpirun user's manual (from IBM).

The parameters that determine the resources a job uses are -mode, -np, and -mapfile, and their effects are interdependent. Jobs run in partitions (also called blocks), whose sizes are typically powers of two. A partition must be allocated (booted) before a run and is restricted to a single user at a time. Please use one of the defined partitions by specifying it in the LoadLeveler script (with bg_partition). If you do not, an ad hoc partition is created for your run; it may not be efficient, and it will interfere with other users who are using the defined partitions.

Two compute modes are available:

  1. In coprocessor (CO) mode, only one processor per compute node performs computation, while the other processor performs communication and I/O.
  2. In virtual node (VN) mode, both processors in a compute node perform computation as well as communication and I/O. Each processor is thus a virtual node.

For a given number of compute nodes, VN mode is usually faster than CO mode and is therefore preferred, since it makes better use of the machine. However, the memory per node on Blue Gene is relatively small, and in VN mode each processor gets half the memory that the single computing processor has in CO mode. As a result, some problems fit in memory only in CO mode.
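Because CO mode runs one MPI process per compute node and VN mode runs two, the largest -np a partition accepts depends on -mode. A minimal POSIX shell sketch of that arithmetic follows; the function names tasks_per_node and max_np are hypothetical, and the two-processors-per-node figure is the one described above.

```shell
# Hypothetical helper: number of MPI processes each compute node runs in a mode.
tasks_per_node() {
    case $1 in
        CO) echo 1 ;;   # one processor computes; the other handles communication and I/O
        VN) echo 2 ;;   # both processors compute, each as a virtual node
        *)  echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: largest -np that fits in a partition.
# Arguments: mode (CO or VN), number of compute nodes in the partition.
max_np() {
    mn_tpn=$(tasks_per_node "$1") || return 1
    echo $(( $2 * mn_tpn ))
}
```

For example, max_np VN 512 prints 1024 and max_np CO 512 prints 512: a 512-node partition such as top accepts twice as many processes in VN mode, each with half the memory.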


Partition Layout and Usage Guidelines

To make effective use of the Blue Gene, production runs should generally use one-fourth or more of a rack, i.e., 256 or more compute nodes. The following predefined partitions are provided for production runs:

Partition name          Number of nodes
SDSC                    3,072 (all nodes)
R01R02                  2,048 (rack 1 and rack 2 combined)
rack, R1, R2            1,024 each (all of rack 0, rack 1, and rack 2, respectively)
top & bot               512 each (top and bottom halves of rack 0)
R01-top & R01-bot       512 each (top and bottom halves of rack 1)
R02-top & R02-bot       512 each (top and bottom halves of rack 2)
top256-1 & top256-2     256 each (the two halves of top)
bot256-1 & bot256-2     256 each (the two halves of bot)

Smaller 64-node (bot64-1, …, bot64-8) and 128-node (bot128-1, …, bot128-4) partitions are available for test runs. The partition layout on rack 0 and the usage guidelines are shown in the following diagram:

Diagram 1: Availability and Time Limits for Blue Gene Partitions

[Diagram not reproduced: partition layout of rack 0, with each partition marked by one of the categories below.]

Partition type   Availability                                  Time limit
Batch            All times                                     18 hrs.
Batch            7PM-7AM (PST) Mon-Fri; all day on weekends    18 hrs.
Test             7AM-7PM (PST) Mon-Fri                         30 min.

Please note that the smaller partitions are contained within the larger ones. Hence, if a job is running on the bot128-1 partition, the bot64-1, bot64-2, bot256-1, bot, rack, and SDSC partitions will be unavailable in addition to bot128-1. Similarly, if a job is running on the rack partition, every other partition on rack 0, as well as SDSC, will be unavailable. If you have a small job, please choose the smallest partition that fits it, so that other users can run on the remaining partitions.
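Choosing the smallest partition that fits amounts to converting the process count into compute nodes (one node per process in CO mode, one per two processes in VN mode) and rounding up to the next predefined partition size. The sketch below hard-codes the sizes listed on this page. The function name smallest_partition is hypothetical, and it returns a size rather than a partition name, because which partition of that size is free depends on what is already running (llq -b shows this).

```shell
# Predefined partition sizes (compute nodes), smallest first, as listed above.
# 64 and 128 are the test partitions; 256 and up are for production runs.
PARTITION_SIZES="64 128 256 512 1024 2048 3072"

# Hypothetical helper: print the smallest partition size that can hold a job.
# Arguments: number of MPI processes, mode (CO or VN).
smallest_partition() {
    case $2 in
        CO) sp_per=1 ;;
        VN) sp_per=2 ;;
        *)  echo "unknown mode: $2" >&2; return 1 ;;
    esac
    # Compute nodes needed, rounded up (e.g. 301 VN processes need 151 nodes).
    sp_need=$(( ($1 + sp_per - 1) / sp_per ))
    for sp_size in $PARTITION_SIZES; do
        if [ "$sp_size" -ge "$sp_need" ]; then
            echo "$sp_size"
            return 0
        fi
    done
    echo "job needs $sp_need nodes; no predefined partition is that large" >&2
    return 1
}
```

For example, smallest_partition 300 VN prints 256, so a 300-process VN run fits in top256-1, top256-2, bot256-1, or bot256-2, while the same job in CO mode (smallest_partition 300 CO) needs a 512-node partition.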


Accounting

The following algorithm determines the Service Units (SUs) charged from your allocation:

SUs = Wallclock_Hours x (Num of nodes in partition) x 2

Specifying a partition on Blue Gene prevents all other users from using the nodes in that partition. Therefore, you are charged for the entire partition, even if you do not use all of its nodes for computation. For example, if you are using bot128-2 (128 nodes with 2 processors each), you are charged for 256 processors, whether they are used or not.

We advise you to use the smallest partition capable of running your job to minimize your charges.
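The charge formula can be checked before you submit. This awk-based sketch takes a wall-clock time in the same HH:MM:SS form as wall_clock_limit plus the partition's node count. The function name su_charge is hypothetical, and it assumes the charged hours are the job's elapsed wall-clock time; the formula above does not say whether elapsed or requested time is used.

```shell
# Hypothetical helper: SUs = wall-clock hours x (nodes in partition) x 2.
# Arguments: wall-clock time as HH:MM:SS, number of nodes in the partition.
su_charge() {
    echo "$1" | awk -F: -v nodes="$2" '{
        hours = $1 + $2 / 60 + $3 / 3600   # convert HH:MM:SS to hours
        printf "%.2f\n", hours * nodes * 2
    }'
}
```

For example, su_charge 00:10:00 128 prints 42.67 for a 10-minute run on bot128-2, while su_charge 00:10:00 3072 prints 1024.00 for the same run on SDSC.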

How To Check Your Remaining Allocation

Users can check their remaining allocation using the reslist command (see the example below). Complete information on the usage and options of reslist may be found by typing reslist --help on bglogin.sdsc.edu.

bg-login1 % reslist
Querying database, this may take several seconds ...
Output shown is local machine usage. For full usage on roaming accounts,
please use tgusage.
                                    SU Hours   SU Hours
  Name       UID  ACID  ACC  PCTG  ALLOCATED       USED  USER
  jdoe     88888   300    U   100     500000    5000.00  DOE, JOHN
use300             300                500000  450000.00

To determine the allocation usage for a single user:

% reslist -u username

To determine the allocation usage for all users under a given account:

% reslist -a grp000

To determine the allocation usage for jobs run within a particular time period:

% reslist -j -u username -a grp000 --begindate=mm-dd-yyyy --enddate=mm-dd-yyyy
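The remaining allocation for an account is the SU Hours ALLOCATED minus the SU Hours USED on the account's summary line; in the sample reslist output above, use300 has 500000 - 450000 = 50000 SUs left. The sketch below extracts that with awk. The function name remaining_sus is hypothetical, and it assumes the summary line has exactly the four columns shown in the sample (account name, ACID, SUs allocated, SUs used); if reslist formats it differently on your system, adjust the field numbers.

```shell
# Hypothetical helper: read reslist output on stdin and print the unused SUs
# for one account. Assumes the account summary line looks like the sample:
#   <account> <ACID> <SUs allocated> <SUs used>
remaining_sus() {
    awk -v acct="$1" '
        $1 == acct && NF == 4 { printf "%.2f\n", $3 - $4; found = 1 }
        END { if (!found) exit 1 }'
}
```

For example: reslist | remaining_sus use300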


Monitoring Jobs

You can monitor jobs in the queue using the llq command with the -b option. This gives details of jobs currently in the queue and the partitions they are using. For example:

bg-login1 /users/consult> llq -b
Id                       Owner      Submitted   LL BG PT Partition        Size
________________________ __________ ___________ __ __ __ ________________ ______
bgsn.13985.0             u8240       9/6  09:47 C     FR bot              512

1 job step(s) in queue, 0 waiting, 0 pending, 0 running, 1 held, 0 preempted
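Before choosing a partition, it can help to see which ones are already in use, since a busy partition also blocks every partition that contains it or is contained in it. This sketch filters llq -b output down to job id, owner, partition, and size. The function name llq_partitions is hypothetical; it assumes that job lines start with a dotted step id (like bgsn.13985.0) and that Partition and Size are always the last two columns, as in the sample above.

```shell
# Hypothetical helper: print "<job id> <owner> <partition> <size>" for each
# job step in `llq -b` output read from stdin. Header, separator, and summary
# lines are skipped because their first field contains no dot.
llq_partitions() {
    awk 'NF >= 4 && $1 ~ /\./ && $NF ~ /^[0-9]+$/ { print $1, $2, $(NF-1), $NF }'
}
```

For example, llq -b | llq_partitions applied to the output above prints bgsn.13985.0 u8240 bot 512.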


