A model for speedup of parallel programs

by Allen B. Downey

A version of this paper has been published as U.C. Berkeley Technical Report CSD-97-933 (gzipped postscript).

The abstract, introduction, and conclusions are below.


Abstract

We propose a new model for parallel speedup that is based on two parameters, the average parallelism of a program and its variance in parallelism. We present a way to use the model to estimate these program characteristics using only observed speedup curves (as opposed to the more detailed program knowledge otherwise required). We apply this method to speedup curves from real programs on a variety of architectures and show that the model fits the observed data well. We propose several applications for the model, including the selection of cluster sizes for parallel jobs.

Introduction

Speedup models describe the relationship between cluster size and execution time for a parallel program. These models are useful for: Our speedup model is a non-linear function of two parameters: A, which is the average parallelism of a job, and \sigma which approximates the coefficient of variation of parallelism. The family of curves described by this model spans the theoretical space of speedup curves. In [7], Eager, Zahorjan and Lazowska derive upper and lower bounds for the speedup of a program on various cluster sizes (subject to simplifying assumptions about the program's behavior). When \sigma = 0, our model matches the upper bound; as \sigma approaches infinity, our model approaches the lower bound asymptotically.

This model might be used differently for different applications. In [5] and [6] we use it to generate the stochastic workload we use to evaluate allocation strategies for malleable jobs. For that application, we choose the parameters A and \sigma from distributions and use them to generate speedup curves. In this paper, we work the other way around -- we use observed speedup curves to estimate the parameters of real programs. Our purpose here is to show that this model captures the behavior of numerous programs running on diverse parallel architectures. This technique is also useful for summarizing the speedup curve of a job and interpolating between speedup measurements.

Conclusions


downey@sdsc.edu