Help Desk: User Questions
Frequently asked questions from our users
—Mahidhar Tatineni
Question:
Dear SDSC Consulting,
I tried to run my job interactively on dspoe and got an error saying "Not enough resources for this step as backfill" (or as "top-dog"). What does this mean?
Answer:
A job is designated as "top-dog" when it is at the top of the queue (in
this case, the interactive queue). A job lower in the queue can start only
as "backfill", that is, if it fits into slots that are idle right now
without delaying the top-dog job. Being top-dog does not ensure that the
job will run immediately: when the interactive queue is busy, there may
not be enough free slots for the job to start. For example, consider the
following instance of the queue:
ds100 % llstatus | more
This shows that we have no slots free on ds100 (Busy) and some slots
free on ds101 and ds102. If we now try running interactively
using 3 nodes and 4 tasks per node, we will get the "not enough
resources" error as follows:
ds100 % poe a.out -nodes 3 -tasks_per_node 4
Found valid account 'USE300' for queue 'interactive'
Found a default account in ACL 'sdsc_datastar:username:account:sstrnp'
on DataStar NPACI nodes: all queues
Job passed jobfilter
llsubmit: Processed command file through Submit Filter:
"/users00/loadl/loadl/jobfilter-interactive.pl".
ERROR: 0031-365 LoadLeveler unable to run job, reason:
Not enough resources to start now:
Not enough resources for this step as backfill.
If your job is at the top of the queue, you will instead get the following error:
ERROR: 0031-365 LoadLeveler unable to run job, reason:
Not enough resources to start now:
Not enough resources for this step as "top-dog":
In both cases, the job status shows "NR" (Not Running):
ds100 % llq -u mahidhar
Id             Owner    Submitted  ST PRI Class
ds100.194081.0 mahidhar 9/8 14:59  NR 50  interactive
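The slot shortage in the example above can be illustrated with a small Python sketch. It is only a simplified model, not how LoadLeveler actually schedules jobs: the free-slot counts for ds101 and ds102 are made-up values (the text only says that ds100 has none free and the other two have some), and the function name can_start_now is hypothetical.

```python
# Hypothetical free-slot counts per interactive node. Only "ds100 has no free
# slots, ds101 and ds102 have some" comes from the text; 6 and 5 are assumed.
free_slots = {"ds100": 0, "ds101": 6, "ds102": 5}


def can_start_now(free_slots, nodes, tasks_per_node):
    """Return True if at least `nodes` nodes each have `tasks_per_node` free slots.

    This is a simplified model of the resource check, not LoadLeveler's logic.
    """
    usable = [name for name, free in free_slots.items() if free >= tasks_per_node]
    return len(usable) >= nodes


# The request from the example: 3 nodes with 4 tasks per node.
print(can_start_now(free_slots, nodes=3, tasks_per_node=4))  # False
# A smaller request that fits on the two nodes with free slots.
print(can_start_now(free_slots, nodes=2, tasks_per_node=4))  # True
```

Under these assumed numbers, the three-node request fails because only two nodes have any free slots, which matches the "not enough resources" situation described above.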
Question:
Dear SDSC Consulting,
I verified that a particular Blue Gene partition was free
and submitted a job. Why does the job not run, even though the partition
is free?
Answer:
The smaller partitions on the Blue Gene are subsets of larger
partitions. For example, "top256-1" and "top256-2" are part of the
"top" partition (and also the "rack" partition). Hence if a job is
running on the "top" partition, the "top256-1" and "top256-2"
partitions cannot be used even though they do not show any jobs running.
The partition layout is detailed in the Blue Gene user guide.
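The overlap rule can be sketched in a few lines of Python. This is an illustrative model only: it includes just the partitions named above, it assumes that "top" is itself contained in "rack", and the helper names (PARENT, ancestors, overlaps, is_usable) are hypothetical. The actual layout is the one given in the user guide.

```python
# Containment as described in the text: "top256-1" and "top256-2" are part of
# "top" and of "rack". That "top" sits inside "rack" is an assumption here.
PARENT = {"top256-1": "top", "top256-2": "top", "top": "rack"}


def ancestors(name):
    """Return every partition that contains `name`, innermost first."""
    result = []
    while name in PARENT:
        name = PARENT[name]
        result.append(name)
    return result


def overlaps(a, b):
    """Two partitions share hardware if they are equal or one contains the other."""
    return a == b or a in ancestors(b) or b in ancestors(a)


def is_usable(partition, busy):
    """A partition is usable only if no busy partition overlaps it."""
    return not any(overlaps(partition, b) for b in busy)


print(is_usable("top256-1", busy={"top"}))   # False: "top" contains "top256-1"
print(is_usable("top256-2", busy={"rack"}))  # False: "rack" contains "top256-2"
print(is_usable("top256-1", busy=set()))     # True: nothing is running
```

The first call reproduces the case in the answer: "top256-1" shows no jobs of its own, yet it cannot be used while a job holds "top".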
Please contact Mahidhar Tatineni with any questions or feedback.