A parallel workload model and its implications for
processor allocation
by
Allen B. Downey
This paper will appear at the 6th IEEE International Symposium on High
Performance Distributed Computing
(HPDC 97). The
postscript of the final version is here.
I recently presented this paper at AT&T Research. Here are the
slides I used.
The submitted version of this paper was been published as
U.C. Berkeley Technical Report
CSD-96-922.
The abstract, introduction, and conclusions are below.
Abstract
We develop a workload model based on the observed behavior of
parallel computers at the San Diego Supercomputer Center and the
Cornell Theory Center. This model gives us insight into the performance
of strategies for scheduling malleable jobs on space-sharing parallel
computers. We find that Adaptive Static Partitioning (ASP), which has
been reported to work well for other workloads, is inferior to some
FIFO strategies that adapt better to system load. The best of the
strategies we consider is one that explicitly restricts cluster
sizes when load is high (a variation of Sevcik's A+ strategy).
Introduction
Space-sharing, distributed-memory multiprocessors, like the Intel
Paragon, the Cray T3E and the IBM SP2, are often used in
supercomputing environments to support scientific
applications. These environments typically have the following
characteristics:
- For batch processing, jobs do not share processors, but rather
allocate a cluster of processors exclusively and run to
completion . Many of these machines also have an interactive
partition that uses timesharing, but this paper only addresses
scheduling strategies for batch partitions (pure space-sharing). In
the environments we have observed, the vast majority of computation is
done in batch mode.
- Many jobs on these systems are malleable , meaning that
they are capable of running on a range of cluster sizes. On the other
hand, the programming models used for scientific applications usually
do not generate jobs that can change cluster sizes dynamically .
Thus, once a job begins execution, its cluster size is fixed.
In current systems, users choose cluster sizes for their jobs by hand,
and the system does not have the option of allocating more or fewer
than the requested number of processors. The factors that should
influence the choice of a cluster size include the characteristics of
the application (resource requirements), the load on the system
(resource availability) and the performance requirements of the user
(deadlines, desired throughput, etc.). But users generally do
not have the information, tools, or inclination to weigh all of these
factors accurately. Allowing the system to make this decision has the
potential to improve system utilization and reduce users' wait times.
Toward this end, prior studies have proposed allocation
strategies that choose automatically how many processors to allocate
to each job. Most of these studies evaluate the proposed strategies
with analysis and simulation based on hypothetical workloads. Each of
the strategies we consider in this paper has been evaluated, under
different workload assumptions, in at least three prior studies.
One of these compares the performance of these strategies over
a wide range of workload parameters, and argue that the discrepancies
among various studies are due to differences in the hypothesized
workloads.
The goal of this paper is to focus this debate by constructing a new
workload model based on observations of space-sharing systems running
scientific workloads. We intend this model to cover the range of
workload parameters most relevant to current workloads and hence most
applicable to real systems. By narrowing the range of workload
parameters, we are able to examine the proposed strategies in more
detail and gain insight into the reasons for their success or failure.
Conclusions
- One of the strategies recommended in other studies (ASP) did not
perform well for our workload. We show that this policy is too
sensitive to short-term variations in system load. It performs worse
than simple FIFO strategies that use application characteristics to
bound cluster sizes.
- The application characteristics we examined, average and variance
of parallelism, are useful for choosing the upper bound on cluster size
(and thereby imposing a lower bound on efficiency). We found, though,
that strategies that considered variance of parallelism were no better than
those that considered only average parallelism.
- The processor working set, or ``knee'' of the speedup curve, is
not an optimal processor allocation.
- It is not necessary to set maximum cluster sizes
precisely. There is a tradeoff between large clusters/long queues and
small clusters/short queues, but within a wide range, system performance
(measured by average turnaround time) does not vary greatly.
- Of several strategies with equivalent turnaround times,
Sevcik's strategy, which uses a priori knowledge about system
load to restrict cluster sizes, yielded the lowest slowdowns. Depending
on what metric matters most to users, this strategy might be the
best choice.
- One of Sevcik's underlying assumptions---that cluster sizes
should decrease linearly as load increases---has been validated. The
other underlying assumption---that jobs with a more variable
parallelism profile should allocate fewer processors---has been
contradicted.
- Lifetimes for batch jobs on supercomputers are distributed
uniformly in log space. Our uniform-log model is useful for
summarizing these distributions and generating simulated workloads.
The observed distributions have coefficients of variation in the range
2--4.
downey@sdsc.edu