## A model for speedup of parallel programs

by
Allen B. Downey
A version of this paper has been published as U.C. Berkeley Technical Report
CSD-97-933.

The abstract, introduction, and conclusions are below.

### Abstract

We propose a new model for parallel speedup that is based on two
parameters, the average parallelism of a program and its variance in
parallelism. We present a way to use the model to estimate these
program characteristics using only observed speedup curves (as
opposed to the more detailed program knowledge otherwise
required). We apply this method to speedup curves from real programs
on a variety of architectures and show that the model fits the
observed data well. We propose several applications for the
model, including the selection of cluster sizes for parallel jobs.

### Introduction

Speedup models describe the relationship between cluster size and
execution time for a parallel program. These models are useful for:
- Modeling parallel workloads: Many simulation studies use a
speedup model to generate a stochastic workload. Since our model
captures the behavior of many real programs, it lends itself to a
realistic workload model.
- Summarizing program behavior: If a program has run before
(maybe on a range of cluster sizes), we can record past execution
times and use a speedup model to summarize the historical data and
estimate future execution times. These estimates are useful for
scheduling and allocation.
- Inference of program characteristics: The parameters of
our model correspond to measurable program characteristics. Thus
we hypothesize that we can infer these characteristics by fitting our
model to an observed speedup curve and finding the parameters that
yield the best fit.

Our speedup model is a non-linear function of two parameters: A,
which is the average parallelism of a job, and \sigma, which
approximates the coefficient of variation of
parallelism. The family of curves described by this model spans the
theoretical space of speedup curves. In [7], Eager, Zahorjan
and Lazowska derive upper and lower bounds for the speedup of a
program on various cluster sizes (subject to simplifying assumptions
about the program's behavior). When \sigma = 0, our model matches
the upper bound; as \sigma approaches infinity, our model approaches
the lower bound asymptotically.
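As an aside (this sketch is not from the paper itself), the Eager-Zahorjan-Lazowska bounds referred to above are commonly stated as: speedup on n processors is at most min(n, A) and at least nA/(n + A - 1), where A is the average parallelism. A minimal illustration in Python:

```python
def upper_bound(n, A):
    """EZL upper bound on speedup: linear in the cluster size n
    until the average parallelism A is reached, then flat."""
    return min(n, A)

def lower_bound(n, A):
    """EZL lower bound on speedup: approaches A asymptotically
    as the cluster size n grows."""
    return n * A / (n + A - 1)

# With A = 8, the two bounds bracket any feasible speedup curve:
for n in (1, 4, 8, 16, 64):
    print(f"n={n:3d}  lower={lower_bound(n, 8):5.2f}  "
          f"upper={upper_bound(n, 8):5.2f}")
```

The model in the paper fills in the space between these two curves: setting \sigma = 0 recovers the upper bound, and letting \sigma grow without bound approaches the lower bound.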
This model might be used differently for different applications. In
[5] and
[6]
we use it to generate the
stochastic workload we use to evaluate allocation strategies for
malleable jobs. For
that application, we choose the parameters A and \sigma from
distributions and use them to generate speedup curves. In this paper,
we work the other way around -- we use observed speedup curves to
estimate the parameters of real programs. Our purpose here is to show
that this model captures the behavior of numerous programs running on
diverse parallel architectures. This technique is also useful for
summarizing the speedup curve of a job and interpolating between
speedup measurements.
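The estimation step described above can be sketched generically. The fragment below is a hypothetical stand-in, not the model from the paper: it fits an illustrative two-parameter family S(n) = min(A, n / (1 + \sigma(n-1)/A)) to observed (cluster size, speedup) pairs by least-squares grid search over (A, \sigma).

```python
def speedup(n, A, sigma):
    """Illustrative two-parameter speedup family (not the paper's
    model): contention-style slowdown, capped at the average
    parallelism A."""
    return min(A, n / (1 + sigma * (n - 1) / A))

def fit(observations):
    """observations: list of (cluster_size, measured_speedup) pairs.
    Returns the (A, sigma) pair minimizing squared error over a
    coarse grid."""
    best = None
    for A in [a / 2 for a in range(2, 129)]:        # A in 1.0 .. 64.0
        for s in [k / 100 for k in range(0, 201)]:  # sigma in 0.0 .. 2.0
            err = sum((speedup(n, A, s) - sp) ** 2
                      for n, sp in observations)
            if best is None or err < best[0]:
                best = (err, A, s)
    return best[1], best[2]

# Synthetic speedups generated with A=16, sigma=0.5 are recovered
# exactly, since the true parameters lie on the search grid:
data = [(n, speedup(n, 16, 0.5)) for n in (1, 2, 4, 8, 16, 32)]
A_hat, s_hat = fit(data)
```

In practice a nonlinear least-squares routine would replace the grid search; the point is only that the two parameters can be read back off an observed speedup curve.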

### Conclusions

- Our proposed speedup model captures the behavior of numerous
scientific applications running on a variety of parallel computers,
both shared- and distributed-memory. Thus, we feel that this
model is a realistic choice for modeling parallel workloads.

- The parameters of our model correspond to measurable program
characteristics. Thus, our observations of these benchmarks give
us some insight into the values and distributions of these parameters
in a real workload.

downey@sdsc.edu