Parallel Workloads Archive: CTC SP2

The Cornell Theory Center (CTC) IBM SP2 log

System:	512-node IBM SP2
Duration:	July 1996 thru May 1997
Jobs:	79,302

This log contains 11 months' worth of accounting records for the 512-node IBM SP2 located at the Cornell Theory Center (CTC). Apparently, only 338 nodes were used for the batch jobs in the log. Scheduling on this machine was performed by EASY and LoadLeveler. For more information about CTC, see URL http://www.tc.cornell.edu/.

The workload log from the CTC SP2 was graciously provided by Dan Dwyer (dwyer@tc.cornell.edu) from the Cornell Theory Center, a high-performance computing center at Cornell University, Ithaca, New York, USA. The information below was provided by Steve Hotovy. If you use this log in your work, please use a similar acknowledgment. Also, please send a notice of your work to cal@tc.cornell.edu.

In addition to the production log from July 1996 to May 1997, an early log covering 75,944 jobs during June 1995 to April 1996 is also available. This is the log used by Hotovy in his analysis of the evolution of the workload soon after the machine was installed ([hotovy96]). During this period only LoadLeveler was used.

Downloads:

CTC-SP2-1996-0	3.6 MB gz	original log
CTC-SP2-1996-3.swf	1.5 MB gz	converted log
CTC-SP2-1996-3.1-cln.swf	1.5 MB gz	cleaned log -- RECOMMENDED, see usage notes
CTC-SP2-1996-1.swf	1.5 MB gz	OLD VERSION of converted log (replaced 1 Aug 2006)
CTC-SP2-1996-1.1-cln.swf	1.5 MB gz	OLD VERSION of cleaned log (replaced 1 Aug 2006)
CTC-SP2-1996-2.swf	1.5 MB gz	OLD VERSION of converted log (replaced 30 Nov 2011)
CTC-SP2-1996-2.1-cln.swf	1.5 MB gz	OLD VERSION of cleaned log (replaced 30 Nov 2011)
CTC-SP2-1995-2.swf	1.4 MB gz	the early log


System Environment

Of the 512 nodes in the system, 430 are dedicated to running batch jobs (but see usage notes below). The remainder of the nodes are used for interactive jobs, I/O nodes, special projects, and system testing. The log pertains to the batch partition.

The CTC SP2 is heterogeneous in the sense that not all 512 nodes are identical. The actual configurations of the 430 nodes in the batch partition are as follows:

Number of nodes, by node type and memory size:

Node type   128MB   256MB   512MB   1024MB   2048MB
Thin        352     30      0       0        0
Wide        0       22      21      4        1

Update (3 June 2013):

The link given above for data about the system is no longer available, but a snapshot from 1997 is available on the Internet Archive. In particular, this includes a page specifying the details of the SP system's configuration. This indicates that the system was divided into several distinct pools that were scheduled in different ways. Specifically, pool 4 was scheduled by EASY-LL, and included 21 racks of 16 thin nodes each, plus 27 nodes from additional racks. Given that 16x21=336, this may be the actual partition and size that gave rise to this log. This also matches the usage data as shown below. If the nodes from the other racks are included, the size is 363, but then the typical usage level is only 0.93.

(Thanks to Dan Tsafrir for digging this up.)
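
The arithmetic in this update can be checked with a short sketch. The constant names are illustrative, and reading the 0.93 figure as the ratio of the 336 rack nodes to the full 363-node pool is an assumption, not something the archived page states.

```python
# Pool 4 figures quoted in the update above.
RACKS = 21
THIN_NODES_PER_RACK = 16
EXTRA_NODES = 27

rack_nodes = RACKS * THIN_NODES_PER_RACK   # 336
pool_nodes = rack_nodes + EXTRA_NODES      # 363

# Assumption: usage saturates at the 336 rack nodes, so measuring it
# against the full pool gives the lower "typical usage level".
usage_vs_full_pool = rack_nodes / pool_nodes

print(rack_nodes, pool_nodes, round(usage_vs_full_pool, 2))
```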

Papers Using these Logs:

These logs were used in the following papers:
[hotovy96] [downey97a] [downey97c] [downey98b] [smith98] [schwiegelshohn98b] [downey99] [squillante99] [krallmann99] [talby99b] [cirne00] [mualem01] [feitelson01] [cirne01b] [streit02] [srinivasan02] [srinivasan02b] [lawson02] [ernemann02] [sabin03] [shmueli03] [ernemann03] [islam03] [feitelson03a] [song04] [schroeder04] [streit04] [aridor04] [england04] [feitelson04b] [feitelson05b] [feitelson05c] [feitelson05d] [tsafrir05b] [dutot05] [heine05] [sabin05] [shmueli05] [zilber05] [feitelson06a] [tsafrir06a] [tsafrir06b] [shmueli06] [franke06] [sabin06] [ranjan06] [tsafrir07a] [feitelson07a] [tsafrir07b] [talby07] [shmueli07] [ranjan08] [iosup08] [feitelson08] [goh08] [shmueli09] [feitelson09] [folling09] [guim09] [minh09] [thebe09] [aida09] [tsafrir10] [yuan11] [lindsay12] [liux12] [utrera12] [niu12] [krakov12] [kumar12] [klusacek12] [etinski12] [ababneh12] [zakay13] [liang13] [chen13] [krakov13] [rajbhandary13] [cao14] [kumar14] [zakay14] [feitelson14] [liu15] [carastans17] [wang18] [soysal19]

Log Format

The original log is available as CTC-SP2-1996-0.

This file contains one line per completed job with the following white-space separated fields:

Job name
LoadLeveler class (defined classes are DSI, piofs, astro, and informix; the early log has over a hundred, which match queue names)
Number of processors allocated
Submission time (seconds since the Unix epoch)
Start time (seconds since the Unix epoch)
Completion time (seconds since the Unix epoch)
Amount of memory requested, in MB per node (see system configuration information above).
Type of nodes (T=thin, W=wide)
Mass storage needed (Y/N)
Type of adaptor (user, ethr, none). User implies user-level communication over the SP2's high performance switch.
Submission date and time
Start date and time
Completion date and time
User node time (wallclock time summed over all nodes)
Job type (Serial/Parallel/Pvm3)
Job completion status (Co=completed, Re=removed)
User ID
Cumulative user CPU time
Cumulative system CPU time
Name of LoadLeveler script
Maximum run time (in minutes). This is the estimate given by users in advance, and used by EASY for backfilling. This field does not exist in the early log, which was captured before EASY was introduced.

Conversion Notes

The converted log is available as CTC-SP2-1996-3.swf. The conversion from the original format to SWF was done subject to the following.

CPU time is computed as sum of user and system time, divided by the number of processors.
In the original production log, all jobs are recorded as having requested 128M memory. As this contains no information, it was changed to -1.
The LoadLeveler class was represented by the queue number.
The conversion loses the following data, that cannot be represented in the SWF:
- Type of nodes requested (thin or wide).
- Request for mass storage.
- Type of adaptor requested (high-performance switch or ethernet).
- Distinction between user and system CPU time.
The following anomalies were identified in the conversion:
- 6 jobs were recorded as requesting 0 runtime; this was changed to -1.
- 1733 jobs were recorded as using 0 CPU time; this was changed to -1. Of these, only 4 had "success" status.
- 7174 jobs got more runtime than they requested. In 1380 cases the extra runtime was larger than 1 minute.
- 156 jobs had an average CPU time higher than their runtime. In all but one the difference was larger than 1 minute.

The conversion was done by a log-specific parser in conjunction with a more general converter module.
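
The CPU-time rule and two of the anomaly checks listed above can be sketched as follows. This is not the converter's actual code: the function names are hypothetical, and it assumes CPU times and runtime are in seconds while the requested maximum run time is in minutes, as stated in the log format.

```python
MISSING = -1  # value the conversion notes use for unknown fields


def average_cpu_time(user_cpu, system_cpu, processors):
    """Sum of user and system CPU time divided by the number of processors.

    A total of 0 is treated as missing, matching the anomaly rule above.
    """
    if processors <= 0:
        return MISSING
    total = user_cpu + system_cpu
    if total == 0:
        return MISSING
    return total / processors


def overrun_seconds(runtime_s, max_runtime_min):
    """Seconds by which a job exceeded its requested maximum run time.

    Returns 0 when the job stayed within its request or the request is unknown.
    """
    if max_runtime_min <= 0:
        return 0
    return max(0, runtime_s - max_runtime_min * 60)


def cpu_exceeds_runtime(avg_cpu_s, runtime_s):
    """True when the average CPU time is higher than the wallclock runtime."""
    return avg_cpu_s != MISSING and avg_cpu_s > runtime_s
```

For example, a job with 300 s of user time and 100 s of system time on 4 processors has an average CPU time of 100 s, and a job that ran 3700 s against a 60-minute request overran by 100 s.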

The only difference between conversion 3 (reflected in CTC-SP2-1996-3.swf) and conversion 2 (CTC-SP2-1996-2.swf) is in the assumed size of the machine: in conversion 3 it is set to 338.

The differences between conversion 2 (reflected in CTC-SP2-1996-2.swf) and conversion 1 (CTC-SP2-1996-1.swf) are:

In conversion 1 the MaxProcs and MaxNodes attributes were specified as 512 (the full machine size). In conversion 2 they are set to 430 (the batch partition size).
In conversion 1 the LoadLeveler class of submitted jobs was omitted. In conversion 2 this is noted using the queue attribute of each job.

The converted early log is available as CTC-SP2-1995-2.swf. The conversion from the original format to SWF was done subject to the following.

10 records were omitted due to format problems that prevented reliable parsing.
CPU time is computed as sum of user and system time, divided by the number of processors.
In contrast with the production log, different jobs request different amounts of memory and this information is retained.
The conversion loses the following data, that cannot be represented in the SWF:
- Type of nodes requested (thin or wide).
- Request for mass storage.
- Type of adaptor requested (high-performance switch or ethernet).
- Distinction between user and system CPU time.
The following anomalies were identified in the conversion:
- 98 jobs had negative wait times (start time before submit time). In all cases, the difference was smaller than 60 seconds. These wait times were changed to 0, effectively adjusting the start and end times.
- 5026 jobs were recorded as using 0 processors. All of these have a "failed" status. The number of processors for these jobs was changed to -1.
- 3838 jobs were recorded as using 0 CPU time; this was changed to -1. Of these jobs, 2973 had a "success" status.
- 2220 jobs had an average CPU time higher than their runtime. In 2175 cases the extra CPU time was larger than 1 minute. In 8864 jobs the CPU time was missing altogether.
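
A minimal sketch of the fix-ups described for the early log, under stated assumptions: the function names are hypothetical, times are in seconds since the epoch, and a clamped job is assumed to keep its original runtime, so that start and end shift by the same amount.

```python
MISSING = -1  # value the conversion notes use for unknown fields


def normalize_times(submit, start, end):
    """Return (wait, runtime, start, end) with negative wait times clamped to 0.

    Clamping moves the start to the submit time; the end moves with it so
    that the runtime is preserved (an assumption of this sketch).
    """
    wait = start - submit
    runtime = end - start
    if wait < 0:
        wait = 0
    new_start = submit + wait
    new_end = new_start + runtime
    return wait, runtime, new_start, new_end


def zero_to_missing(value):
    """Replace a recorded 0 (processors or CPU time) with the missing marker."""
    return MISSING if value == 0 else value
```

With these definitions, a job submitted at 100 that is recorded as starting at 70 and ending at 170 becomes a job with wait 0 and runtime 100, starting at 100 and ending at 200.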

The difference between CTC-SP2-1995-2.swf and CTC-SP2-1995-1.swf is 1 second in the arrival times of the 98 jobs that had negative wait times.

Usage Notes

From the utilization plot of log CTC-SP2-1996-2 it is apparent that the utilization is actually capped at around 78.4%. This implies that the actual batch partition size used was probably 338 nodes, and not 430. This evidence was strong enough to warrant the production of the CTC-SP2-1996-3 version, where the size is indeed set to 338. In the early log it seems to have really been 430, but there is also a period where the utilization is nearly double what it should be. [See update above indicating the real number may actually be 336.]
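
The reasoning from the utilization cap to the partition size can be written out as a small calculation. The function names are illustrative; the only inputs taken from the text are the 78.4% cap, the assumed size of 430, and the candidate size of 338.

```python
def implied_partition_size(utilization_cap, assumed_nodes):
    """Nodes that must be fully busy to produce the observed cap."""
    return utilization_cap * assumed_nodes


def rescale_utilization(utilization, assumed_nodes, actual_nodes):
    """Re-express a utilization computed for one machine size in terms of another."""
    return utilization * assumed_nodes / actual_nodes


size = implied_partition_size(0.784, 430)        # roughly 337 nodes
rescaled = rescale_utilization(0.784, 430, 338)  # close to 1.0
print(round(size, 1), round(rescaled, 3))
```

A cap of 78.4% on 430 nodes corresponds to about 337 busy nodes, which is why a partition of 338 (or 336) nodes brings the cap close to full utilization.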

The original log contains a flurry of activity by one user which may not be representative of normal usage. This has been removed in the cleaned version of the log, and it is recommended that this version be used.
The cleaned log is available as CTC-SP2-1996-3.1-cln.swf.

A flurry is a burst of very high activity by a single user. In this case, it involved 2080 jobs. The filter used to remove it was

user=135 and job>47420 and job<50308

Note that the filter was applied to the original log, and unfiltered jobs remain untouched. As a result, in the cleaned log job numbering is not consecutive.
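
As an illustration, the filter could be applied to an SWF file along the following lines. This is a sketch and not the tool used to produce the cleaned log. It assumes the standard SWF layout, in which header and comment lines start with `;`, the job number is the first whitespace-separated field, and the user ID is the twelfth; the function names are hypothetical.

```python
FLURRY_USER = 135
FLURRY_JOB_MIN = 47420  # exclusive lower bound on job number
FLURRY_JOB_MAX = 50308  # exclusive upper bound on job number


def is_flurry_job(job, user):
    """True for jobs matching: user=135 and job>47420 and job<50308."""
    return user == FLURRY_USER and FLURRY_JOB_MIN < job < FLURRY_JOB_MAX


def remove_flurry(lines):
    """Return the lines of an SWF log with the flurry jobs dropped.

    Comment and blank lines are kept, and surviving jobs are not renumbered,
    so job numbers in the result are no longer consecutive.
    """
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            kept.append(line)
            continue
        fields = stripped.split()
        job = int(fields[0])    # SWF field 1: job number (assumed layout)
        user = int(fields[11])  # SWF field 12: user ID (assumed layout)
        if not is_flurry_job(job, user):
            kept.append(line)
    return kept
```

Reading the unfiltered converted log with `open(path)` and passing the file object to `remove_flurry` would yield the retained lines, which can then be written back out unchanged.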

Further information on flurries and the justification for removing them can be found in:

D. G. Feitelson and D. Tsafrir, “Workload sanitation for performance evaluation”. In IEEE Intl. Symp. Performance Analysis of Systems and Software, pp. 221-230, Mar 2006.
D. Tsafrir and D. G. Feitelson, “Instability in parallel job scheduling simulation: the role of workload flurries”. In 20th Intl. Parallel and Distributed Processing Symp., Apr 2006.

The Log in Graphics

File CTC-SP2-1996-2.swf

Utilization with 430 nodes: this is the utilization graph when assuming 430 nodes, showing that the utilization has a pronounced upper limit of 0.78, and implying that the actual partition size is smaller.

File CTC-SP2-1996-3.swf

Graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance.

File CTC-SP2-1996-3.1-cln.swf (cleaned)

Graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance.

File CTC-SP2-1995-2.swf (early log)

Graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance.
