This paper will appear at the 3rd Workshop on Job Scheduling Strategies for Parallel Processing which will take place in conjunction with IPPS '97.
A version of this paper has been published as U.C. Berkeley Technical Report CSD-97-929 (gzipped postscript).
This is a more recent version of the paper, the final copy I submitted to the workshop (gzipped postscript). (I think it's much better).
The abstract, introduction, and conclusions are below.
A partial solution to this problem is a programming model that supports malleable jobs, i.e., jobs that can be configured to run on clusters of various sizes. In existing systems, these jobs cannot change their cluster sizes dynamically; that is, once they begin execution, they cannot add or yield processors.
Malleable jobs improve system utilization by using fewer processors when system load is high, thereby running more efficiently and increasing the number of jobs in the system simultaneously. But malleable jobs are not a sufficient solution to the tragedy of the commons, because users have no direct incentive to restrict the cluster sizes of their jobs. Furthermore, even altruistic users might not have the information they need to make the best decisions.
One solution to this problem is a system-centric scheduler that chooses cluster sizes automatically, trying to optimize (usually heuristically) a system-wide performance metric like utilization or average turnaround time. The problem is that such systems often force users to accept decisions that are good for the system as a whole, but contrary to their immediate interests. For example, if there is a job in queue and one idle processor, a utilization-maximizing system might require the job to run, whereas the job might obtain a shorter turnaround time by waiting for more processors.
If such strategies are implemented, there are two possible outcomes: at best, users will be unsatisfied with the system; at worst, they will take steps to subvert it. Since these systems often rely on application information provided by users, it is not hard for a disgruntled user to manipulate the system for his own benefit. In anecdotal reports from supercomputer centers, this sort of user behavior is common, and not restricted to malevolent users; rather, it is an understanding in these environments that users will take advantage of loopholes in system policies.
Given that this is true, an important property for a scheduling strategy is robustness in the presence of self-interested users. As we will show in Section 5, many commonly-proposed allocation strategies do not have this property; their overall performance degrades severely if users try, naively, to improve the performance of individual jobs.
Thus our goal is to find a scheduling strategy that does not make decisions that are contrary to the interests of the users. We propose an application-centric scheduler that uses application information (run times on various cluster sizes) and system state (predicted queue times) to chooses the cluster size with the shortest expected turnaround time for each job.
This strategy optimizes the performance of individual jobs, so users have no incentive to subvert its decisions. The question, though, is what effect these local optimizations will have on the performance of the system as a whole. Using simulations based on a stochastic workload model, we show that the performance of one such strategy exceeds that of the best system-centric scheduler, improving both system utilization and average turnaround time.