Scheduling Parallel Jobs on Partitionable Parallel Computers

by Allen B. Downey.

This paper was submitted as part of my qualifying exam, which I passed on May 9, 1995. Coincidentally, it describes the topic of my Ph.D. dissertation, which I completed in June 1997. The dissertation itself is not available online, but the papers listed above cover pretty much the whole thing.

Click here for the postscript version. Sorry about the page layout; at the time I kinda liked Framemaker. Since then, I have become a LateX junky.


Abstract

Like many shared resources, parallel computers are susceptible to a tragedy of the commons -- individuals acting in their best interests overuse and degrade the resource. In the case of parallel computers, users trying to minimize runtimes for their own jobs might allocate more processors than they can use efficiently. This overindulgence lowers the effective use of the system and, due to long queue times, degrades the performance of all users' jobs.

A partial solution to this problem is a programming model that supports malleable programs, i.e., programs that can be configured to run on clusters (partitions of processors) of various sizes. Although these programs cannot change size after beginning execution, they can improve system use by choosing cluster sizes appropriate to the current state of the system. For example, when the system load is low, jobs might run faster by allocating more processors. If the system load is high, jobs should allocate fewer processors in order to run more efficiently and, indirectly, reduce queue times.

But malleable jobs are not a sufficient solution to the tragedy of the commons, because users have no direct incentive to reduce the number of processors they request when system load is high. Furthermore, even altruistic users might not have the information they need about queue times and the runtimes of their applications to make the best decisions.

One approach to this problem is to let the system choose cluster sizes for incoming jobs according to current load information and heuristics about the runtimes of jobs. The problems with this approach are: (1) it requires all jobs to be malleable, when in reality many workloads include jobs with limited malleability (they might require a fixed number of processors, or a power-of-two cluster size, for example); (2) it forces users to accept decisions that are contrary to their interests -- in practice users often subvert such systems; and (3) it is not easy to design an algorithm that will simultaneously satisfy the goals of high system utilization, short turnaround times, and fairness, especially when these goals are confounded by real-world considerations like different priorities among users.

Our project is intended to explore an alternative approach in which users are encouraged to choose a cluster size according to "enlightened" self-interest, i.e., the cluster size that minimizes the expected turnaround time of each job, considering both run time and queue time.

This project will proceed in four phases:

(1) Workload description: We will examine workloads on parallel computers at the San Diego Supercomputer Center (SDSC) and characterize such properties as runtimes, efficiency and communication patterns.

(2) Scheduler evaluation: We will evaluate the impact of various scheduling parameters on the performance of parallel applications. Specifically, we will examine the scheduling requirements of a distributed ocean/atmosphere model running at SDSC and NCAR.

(3) Analysis and simulation: Using a model of a parallel computer and decision-making users, we will evaluate the effect of the proposed job-by-job scheduling strategy on global system performance.

(4) Implementation: We will implement tools at SDSC that will allow users to estimate the runtime and queue times of their jobs and choose cluster sizes appropriately.


downey@sdsc.edu