Parallel Workloads Archive: Feitelson '96

The Feitelson 1996 Model

This model is based on observations from 6 accounting logs from production parallel machines:

The 128-node iPSC/860 at NASA Ames
The 128-node IBM SP1 at Argonne
The 400-node Paragon at SDSC
The 126-node Butterfly at LLNL
The 512-node IBM SP2 at CTC
The 96-node Paragon at ETH, Zurich

In the first 3 cases the logs were analyzed directly, and in the others published observations were used.

The goal was to produce a workload model that reflects real workloads, and can be used to drive simulations of parallel schedulers. Thus the emphasis was on capturing various features of the workloads, at the expense of mathematical niceties. In particular, the model captures the discrete nature of the distribution of degrees of parallelism, the correlation between parallelism and runtime, and the repeated executions of the same jobs.

Degrees of Parallelism

Two conspicuous effects were observed in the logs and modeled. One is the dominance of small jobs over large ones: all logs had a significant number of serial jobs, and many others using only less than 10 nodes, even on machines with hundreds of nodes. The second is the discrete nature of the distribution, with some sizes preferred over others. The most popular sizes by far are powers of two, even on machines where there is no architectural reason for using such sizes.

The model for sizes is based on a harmonic distribution of order 1.5 (the probability for size n is proportional to 1/n^1.5), which is then hand-tailored to further emphasize small sizes and powers of two.

Correlation of Runtime with Parallelism

Two observations guided the modeling of runtimes. One is the well-known fact that the variability of runtimes is high, such that the coefficient of variation of the distribution is larger than one (the coefficient of variation is the quotient of the standard deviation divided by the mean; for an exponential distribution it is 1). The other is that there is a weak correlation between size (that is, degree of parallelism) and length (runtime). While there are small jobs that run for a long time and large jobs that run for a short time, the tendancy is for large jobs to run longer.

The model for runtimes is a hyperexponential distribution, so that the coefficient of variation is larger than one. In addition, a linear relation is made between the job size and the probability of using the exponential with the higher mean. Thus there is actually a different distribution for each size, and the mean runtime for larger sizes is higher.

Repeated Executions

It is often assumed that jobs are statistically equivalent and independent of each other. Analysis of the logs shows that this is in fact not always so. In particular, there are many cases where the same user runs the same job repeatadly for many times, creating a sequence of jobs with similar properties.

Defining the runlength of a job to be the number of repeated executions, it was found that runlegths come from a Zipf distribution. The model was therefore that the probability of a runlength of n is proportional to n^-2.5.

An important consequence of this feature is that this model captures a degree of feedback from the scheduler to the workload. Specifically, when a job is repeated n times, each repetition is submitted when the previous one terminates. Thus if the scheduler delays one repetition, it causes a corresponding delay in the submital of the next one.

Arrival Process

The arrival process used in this model is Poisson. However, the fact that each job may be repeated multiple times changes this, because the repetitions are generated immediately after the previous run completes. They do not wait a randomly chosen interarrival time.

Two Versions

The availabe code actually includes two versions: the original version from 1996, and an improved one from 1997. The differences are in the details of the distributions. In particular, the original model the runtimes were in arbitrary units with a mean of around 2, whereas in the improved model they were meant to reflect seconds. It is recomended that the 1997 version be used.

Parallel Workloads Archive - Models