Predicting Batch Queue Times on DataStar and the IA-64 Cluster using BQP
Batch Queue Predictor (BQP) Overview
User Services has deployed BQP on DataStar and on the IA-64 Linux Cluster to help users manage their time on SDSC compute resources. BQP, a utility developed at the University of California, predicts queue wait times for individual jobs, assuming that each job is submitted at the same time the BQP tool is run or shortly thereafter.
BQP reports several forms of bound predictions for individual jobs. In 'normal' mode (that is, without the -B or -d options described below), the more information the user supplies on the command line, the less output the tool produces. For instance, to see ALL predictions, the user specifies no machine, queue, node count, or requested time. To see a single prediction for a job with a given node count and requested time, submitted to a specific machine and queue, the user specifies all four parameters. Any combination of these parameters may be specified or omitted.
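To make the 'normal' mode filtering concrete, here is a minimal Python sketch, assuming predictions are stored as rows keyed by machine, queue, node range and requested-time range. The Prediction class, the select function and the table contents (machine names, queue names, bound values) are hypothetical illustrations, not BQP's internals or real SDSC data. Each omitted parameter acts as a wildcard, so every additional parameter can only narrow the output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Prediction:
    # Hypothetical row layout: one bound per machine/queue/node range/req-time range.
    machine: str
    queue: str
    min_nodes: int
    max_nodes: int
    min_req_secs: int
    max_req_secs: int
    bound_secs: int  # illustrative upper bound on queue wait, in seconds


def select(preds: List[Prediction],
           machine: Optional[str] = None,
           queue: Optional[str] = None,
           nodes: Optional[int] = None,
           req_secs: Optional[int] = None) -> List[Prediction]:
    """Return rows matching every parameter that was supplied.

    A parameter left as None matches every row, so supplying more
    parameters can only shrink the result.
    """
    out = []
    for p in preds:
        if machine is not None and p.machine != machine:
            continue
        if queue is not None and p.queue != queue:
            continue
        if nodes is not None and not (p.min_nodes <= nodes <= p.max_nodes):
            continue
        if req_secs is not None and not (p.min_req_secs <= req_secs <= p.max_req_secs):
            continue
        out.append(p)
    return out


# Hypothetical table; names and numbers are placeholders only.
table = [
    Prediction("ia64", "dque", 1, 4, 0, 3600, 1200),
    Prediction("ia64", "dque", 5, 16, 0, 3600, 5400),
    Prediction("datastar", "normal", 1, 4, 0, 3600, 900),
]

print(len(select(table)))                            # 3: nothing specified
print(len(select(table, machine="ia64")))            # 2: machine only
print(len(select(table, "ia64", "dque", 4, 3600)))   # 1: all four specified
```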
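Predictions are parameterized by a quantile and a confidence. The quantile (set with -p) is the fraction of all jobs submitted to a particular batch queue, within a specific node range, whose queue wait is expected to be no longer than the predicted bound; a 0.95 bound, for example, should be exceeded by only about 5% of such jobs. The confidence (set with -c) expresses how certain the tool is that the predicted bound really does cover that fraction of jobs.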
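One standard nonparametric way to turn historical wait times into a quantile bound at a given confidence is the binomial method: pick the smallest order statistic that lies at or above the true quantile with probability at least the confidence. The Python sketch below illustrates that general technique, assuming a list of past queue waits for the relevant queue and node range is available. The function names are hypothetical, and this is not presented as BQP's actual code.

```python
from math import comb
from typing import List, Optional


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def quantile_upper_bound(waits: List[float], q: float = 0.95,
                         c: float = 0.95) -> Optional[float]:
    """Upper bound on the q-quantile of the waits, holding with confidence c.

    The number of samples below the true q-quantile is Binomial(n, q), so
    the k-th smallest sample is at or above that quantile whenever at most
    k-1 samples fall below it.  Choose the smallest k for which that event
    has probability >= c.
    """
    xs = sorted(waits)
    n = len(xs)
    for k in range(1, n + 1):
        if binomial_cdf(k - 1, n, q) >= c:
            return xs[k - 1]
    return None  # too few samples for this quantile/confidence pair


print(quantile_upper_bound(list(range(59))))  # 58
print(quantile_upper_bound(list(range(58))))  # None
```

With the default quantile and confidence of 0.95, the sketch cannot issue any bound until at least 59 historical waits are available (0.95^59 is just under 0.05, while 0.95^58 is just over it), which shows why higher quantiles or confidences need more history.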
Availability
BQP is available from the command line on the login nodes of the IA-64 Cluster (tg-login.sdsc.teragrid.org) and DataStar (dslogin.sdsc.edu).
User Commands for BQP 1.0
| Command or option | Description |
| bqp | requests bound predictions for individual batch queue jobs |
| -B | puts the tool in 'best' mode: shows the shortest predictions for the specified set of jobs/sites |
| -c NUM | specifies the confidence level ('confidence region') of the bound prediction. Defaults to 0.95; 0.99 and 0.90 are also accepted |
| -d NUM | puts the tool in 'deadline' mode. Instead of a bound prediction, the output is the probability that the specified job begins executing within NUM seconds |
| -h | prints a help message with usage information |
| -l | puts the tool in 'list' mode: shows the list of supported machine/queue pairs |
| -m NAME | shows only bound predictions for jobs submitted to machine NAME (see -l for a list of supported machines) |
| -n NUM | shows only predictions for jobs requesting NUM nodes |
| -p NUM | specifies the quantile of the bound prediction. Defaults to 0.95; 0.75 and 0.50 are also accepted |
| -q NAME | shows only bound predictions for jobs submitted to queue NAME (see -l for a list of supported queues) |
| -r NUM | shows only bound predictions for jobs requesting NUM seconds of execution time |
Example Command Line Syntax
| Command line | Description |
| bqp | show 0.95 bound predictions for all machine, queue, node size, and requested time ranges |
| bqp -m <machine> | show 0.95 bound predictions for all queue, node size, and requested time ranges |
| bqp -m <machine> -p 0.5 | show 0.5 (median) predictions for all queue, node size, and requested time ranges |
| bqp -m <machine> -q <queue> -n <node num> -r <req time> | show the 0.95 bound prediction for one job submitted to the specified machine/queue, requesting <node num> nodes and <req time> seconds of runtime |
| bqp -m <machine> -q <queue> -n <node num> -r <req time> -d <deadline> | show the probability that a job submitted to <machine>/<queue>, requesting <node num> nodes and <req time> seconds of runtime, begins executing within <deadline> seconds |
Sample Output
Example: A user has a job requesting 4 nodes for 3600 seconds (one hour) and needs the job to begin execution no later than 6 hours (21,600 seconds) after running the tool.
tg-login1> bqp -q dque -n 4 -r 3600 -d 21600
p=0.405698
tg-login1>
The output shows that, when bqp was executed, the job had roughly a 40% chance (p=0.405698) of making it through the queue in 6 hours or less.
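The -d value in this example is simply the 6-hour limit expressed in seconds. A naive way to estimate such a probability is the fraction of comparable past jobs that started within the deadline; the Python sketch below shows that calculation on hypothetical wait times. It is an illustration only, assuming a list of past waits is available, and is not claimed to be how BQP computes p.

```python
from typing import List


def hours_to_seconds(hours: float) -> int:
    """Convert a deadline in hours to the seconds value passed to -d."""
    return int(hours * 3600)


def deadline_probability(waits: List[float], deadline_secs: float) -> float:
    """Fraction of historical waits that were no longer than deadline_secs."""
    if not waits:
        raise ValueError("need at least one historical wait time")
    return sum(1 for w in waits if w <= deadline_secs) / len(waits)


deadline = hours_to_seconds(6)               # 21600, as used with -d above
history = [600, 4000, 15000, 30000, 86400]   # hypothetical waits in seconds
print(deadline, deadline_probability(history, deadline))  # 21600 0.6
```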
Written by Daniel Nurmi, based on work by John Brevik, Daniel Nurmi and Rich Wolski.
If you have any questions about this, please contact SDSC Consulting.



