Parallel Workloads Archive: Downey '97

The Downey 1997 Model

This model is based on an analysis of the SDSC logs, and was re-affirmed by the CTC log as well.

The goal was to produce a model that allows the remaining runtime of a job to be estimated, given its current age. In addition, this model is for moldable and malleable jobs, where the number of processors allocated can be chosen by the scheduler when the job is started. As a result, the speedup function must be included in the model.

Degrees of Parallelism

This model explicitly disregards the dominance of powers of two sizes in the logs, under the assumption that this is a result of the interface to commonly used queueing systems such as NQS. It is therefore proposed to "smooth" the distribution. The suggested model is log-uniform: the probability that a job use less than n processors is proportional to log n. This is interpreted as the average parallelism in the job, and is used in the definition of the speedup function.

Cumulative Runtime

Rather than model runtime independently from the degree of parallelism, the cummulative runtime (that is the sum over all processors) is modeled. This is taken to represent the total work done by the job, independent of how many processors are used. The actual runtime is then obtained by applying the speedup function, given the number of processors chosen by the scheduler.

The proposed model is again log-uniform: the probability that a job use less than t node-seconds is proportional to log t.

Speedup Function

The speedup function is modeled using 3 parameters: the average parallelism of the job, the variance in parallelism, and the number of processors. Given these parameters, a hypothetical parallelism profile is constructed, and the speedup calculated. The equations are given in the paper.

Distributions from which the parameters can be selected were later proposed by Cirne and Berman.

Arrival Process

The arrival process used is Poisson, with a twist: Each day of simulated time is divided into two parts of 12 hours each. During the first part (the day) jobs arrive according to the Poisson process, but may be queued if they cannot run immediately. During the second part (the night) no more jobs arrive, and the system serves jobs that were queued during the day.

Parallel Workloads Archive - Models