SDSC Thread, Issue 8, September 2006






User Services Director:
Anke Kamrath

Editor:
Subhashini Sivagnanam

Graphics Designer:
Diana Diehl

Application Designer:
Fariba Fana


Help Desk: User Questions

Frequently asked questions from our users

—Mahidhar Tatineni

Question:
Dear SDSC Consulting,

I tried to run my job interactively on dspoe and got an error saying not enough resources for this step as backfill or top-dog. What does this mean?

Answer:
A job is designated as "top-dog" when it is at the top of the queue (in this case, the interactive queue). However, this does not ensure that the job will run immediately. When the interactive queue is busy, there may not be enough free slots for the job to start. For example, consider the following instance of the queue:

ds100 % llstatus | more


[Table: llstatus output for Help Question 1]

This shows that we have no slots free on ds100 (Busy) and some slots free on ds101 and ds102. If we now try running interactively using 3 nodes and 4 tasks per node, we will get the "not enough resources" error as follows:

ds100 % poe a.out -nodes 3 -tasks_per_node 4
Found valid account 'USE300' for queue 'interactive'
Found a default account in ACL 'sdsc_datastar:username:account:sstrnp'
on DataStar NPACI nodes: all queues

Job passed jobfilter
llsubmit: Processed command file through Submit Filter:
"/users00/loadl/loadl/jobfilter-interactive.pl".
ERROR: 0031-365 LoadLeveler unable to run job, reason:
Not enough resources to start now:
Not enough resources for this step as backfill.

If your job is at the top of the queue, you will get the following error instead:

ERROR: 0031-365 LoadLeveler unable to run job, reason:
Not enough resources to start now:
Not enough resources for this step as "top-dog":

In both cases, you will see "NR" (Not Running) in your job status:

ds100 % llq -u mahidhar
Id                       Owner      Submitted   ST PRI Class 
ds100.194081.0           mahidhar    9/8  14:59 NR 50  interactive
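
The reasoning behind the "not enough resources" error can be illustrated with a short sketch. This is a simplified model, not how LoadLeveler actually schedules jobs: it assumes a request for N nodes with T tasks per node can start only if at least N nodes each have T free slots. The function name `can_start` and the free-slot counts are hypothetical; the real values come from the llstatus output.

```python
def can_start(free_slots, nodes, tasks_per_node):
    """Return True if at least `nodes` machines each have
    `tasks_per_node` or more free slots (simplified model)."""
    eligible = [name for name, free in free_slots.items()
                if free >= tasks_per_node]
    return len(eligible) >= nodes


# Hypothetical free-slot counts: ds100 is busy, ds101 and ds102 have room.
free_slots = {"ds100": 0, "ds101": 5, "ds102": 6}

# 3 nodes x 4 tasks per node: only two nodes qualify, so the job must wait.
print(can_start(free_slots, nodes=3, tasks_per_node=4))

# 2 nodes x 4 tasks per node fits in the free slots on ds101 and ds102.
print(can_start(free_slots, nodes=2, tasks_per_node=4))
```

Under these assumed counts, the three-node request cannot start while ds100 is busy, whereas a two-node request could.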





Question:
Dear SDSC Consulting,

I verified that a particular Blue Gene partition was free and submitted a job. Why does the job not run, even though the partition is free?

Answer:
The smaller partitions on the Blue Gene are subsets of larger partitions. For example, "top256-1" and "top256-2" are part of the "top" partition (and also the "rack" partition). Hence, if a job is running on the "top" partition, the "top256-1" and "top256-2" partitions cannot be used, even though they do not show any jobs running. The partition layout is detailed in the Blue Gene user guide.
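
The overlap rule can be sketched in a few lines of Python. This is an illustration only: the `PARENT` map covers just the partitions named above and assumes that "top" sits directly inside "rack"; the full layout is in the user guide. The helper names `ancestors` and `is_usable` are hypothetical.

```python
# Assumed nesting, based only on the partitions named in the answer.
PARENT = {
    "top256-1": "top",
    "top256-2": "top",
    "top": "rack",
}


def ancestors(partition):
    """Return the set of larger partitions that contain `partition`."""
    found = set()
    while partition in PARENT:
        partition = PARENT[partition]
        found.add(partition)
    return found


def is_usable(partition, busy):
    """A partition is usable only if neither it, any partition that
    contains it, nor any partition inside it has a running job."""
    if partition in busy or ancestors(partition) & busy:
        return False
    # A job on any smaller partition inside this one also blocks it.
    return not any(partition in ancestors(b) for b in busy)


print(is_usable("top256-1", busy={"top"}))       # blocked by the job on "top"
print(is_usable("top256-1", busy={"top256-2"}))  # sibling partition, no overlap
print(is_usable("top", busy={"top256-1"}))       # blocked by a job inside "top"
```

A partition can therefore show no jobs of its own and still be unavailable, which is the situation described in the question.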

Please contact Mahidhar Tatineni with any questions or feedback.

Did you know...?

...that home directories should not be used for batch jobs?
Home directories are good places for login scripts and source files. Because the home file system is NFS-mounted (slower than GPFS), it is not suitable for storing large amounts of output from batch jobs. Increased home directory quotas for work that is not I/O-intensive, and a CVSROOT area for storing CVS directory trees, are provided upon request. Users can request the CVSROOT area by emailing consult@sdsc.edu.
- Larry Diegel