Parallel Workloads Archive: NASA Ames iPSC/860

The NASA Ames iPSC/860 log

System: 128-node iPSC/860 hypercube
Duration: October 1993 thru December 1993
Jobs: 42050 total, 14794 user jobs

This log contains three months' worth of sanitized accounting records for the 128-node iPSC/860 located in the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center. The NAS facility supports industry, academia, and government labs all across the country. The workload on the iPSC/860 is a mix of interactive and batch jobs (development and production), mainly consisting of computational aeroscience applications. For more information about NAS, see http://www.nas.nasa.gov/.

This somewhat aged log has the distinction of being the first to be analyzed in detail; the results are described in the paper cited below. The log includes basic information about the number of nodes, runtime, start time, user, and command. The number of nodes is limited to powers of two due to the architecture. Note that the log does not include arrival information, only start times.

The workload log from the NASA Ames iPSC/860 was graciously provided by Bill Nitzberg, who also helped with background information and interpretation. If you use this log in your work, please use a similar acknowledgment.

You can also reference the following:
D. G. Feitelson and B. Nitzberg, "Job characteristics of a production parallel scientific workload on the NASA Ames iPSC/860". In Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (Eds.), Springer-Verlag, 1995, Lect. Notes Comput. Sci. vol. 949, pp. 337-360.

Downloads:

NASA-iPSC-1993-0 0.3 MB gz original log
NASA-iPSC-1993-3.swf 0.4 MB gz converted log
NASA-iPSC-1993-3.1-cln.swf 0.2 MB gz cleaned log -- RECOMMENDED, see usage notes
NASA-iPSC-1993-1.swf 0.4 MB gz OLD VERSION of converted log (replaced 1 Aug 2006)
NASA-iPSC-1993-1.1-cln.swf 0.2 MB gz OLD VERSION of cleaned log (replaced 1 Aug 2006)
NASA-iPSC-1993-2.swf 0.4 MB gz OLD VERSION of converted log (replaced 29 Nov 2011)
NASA-iPSC-1993-2.1-cln.swf 0.2 MB gz OLD VERSION of cleaned log (replaced 29 Nov 2011)

Papers Using this Log:

This log was used in the following papers: [feitelson96b] [windisch96] [feitelson98b] [downey99] [talby99b] [feitelson99a] [krevat02] [ernemann03] [song04] [feitelson04b] [song05] [feitelson05c] [feitelson06a] [ranjan06] [talby07] [ranjan08] [iosup08] [feitelson14] [meng15]

System Environment

The iPSC/860 machine located at NASA Ames was a 128-node hypercube. At the time it was the workhorse of the NAS facility for scientific computations (it has since been decommissioned). Up to 9 jobs could run on the system at the same time, by using distinct subcubes. Because jobs run on subcubes, job sizes are limited to powers of two.

The following summarizes the resource usage rules in effect during the time covered by the log.

Batch jobs were handled by NQS, which was configured with the following queues:
Time limit   Number of nodes
             16       32       64      128
0:20         q16s*#   q32s*#   q64s#   q128s#
1:00         q16m*    q32m*    q64m    q128m
3:00         q16l     q32l     q64l    q128l
"*" = active during prime-time ("*" is not part of the name)
"#" = active during weekend day ("#" is not part of the name)

Prime time is defined as Monday to Friday 6:00 to 20:00 PST. During this time, the running queues are q16s, q16m, q32s, and q32m. NQS jobs can use no more than 64 nodes (the size of the batch partition), and NQS will not kill interactive jobs.

The rest of the time is non-prime time. At such times all queues are runnable, and NQS jobs can use the entire cube. Moreover, NQS will kill interactive jobs to make room for NQS jobs.
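The prime-time rules described above can be sketched in a few lines of Python. This is only an illustration, not the actual NQS configuration: the function names are hypothetical, timestamps are assumed to be local wall-clock time, the 20:00 boundary is assumed to be exclusive, holidays are ignored, and the "#" weekend-day marker from the table is not modeled.

```python
from datetime import datetime

# Queue names follow the table: q<nodes><s|m|l>.
ALL_QUEUES = [f"q{n}{c}" for c in "sml" for n in (16, 32, 64, 128)]
PRIME_QUEUES = ["q16s", "q16m", "q32s", "q32m"]

def is_prime_time(t: datetime) -> bool:
    """Monday to Friday, 6:00 to 20:00 (end assumed exclusive)."""
    return t.weekday() < 5 and 6 <= t.hour < 20

def nqs_rules(t: datetime) -> dict:
    """Return the batch rules assumed to be in effect at local time t."""
    if is_prime_time(t):
        # Batch partition is limited to 64 nodes; interactive jobs are safe.
        return {"queues": PRIME_QUEUES, "max_nodes": 64,
                "kills_interactive": False}
    # Non-prime time: all queues runnable, whole cube available.
    return {"queues": ALL_QUEUES, "max_nodes": 128,
            "kills_interactive": True}
```

For example, `nqs_rules(datetime(1993, 10, 4, 10, 0))` (a Monday morning) returns the four prime-time queues with a 64-node limit.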

Log Format

The original log file is available as NASA-iPSC-1993-0. This file contains one line per completed job with the following white-space separated fields:

The log also contains special entries about system status. Again there is one line per entry:

  "special" System  Type  Duration  Start-Date  Start-time  Comments...
These entries are distinguished by the first word in the line, which is "special".

"System" is nearly always "CUBE", referring to the iPSC/860.

"Type" is one of:
D Dedicated Time (reserved for exclusive use by a user or sysadmin)
P Preventative Maintenance
M Scheduled Facility Outage
S Software Failure
H Hardware Failure
F Unscheduled Facility Outage
O Other
Note that during dedicated time (type "D"), jobs may still be run. Dedicated time is used to restrict access to selected users for a period of time.

To be consistent with job entries, the special entries give "Duration" and "Start-time" to the nearest second. However, all times were originally reported in minutes, and are only accurate to within a few minutes.
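As a rough illustration of how the special entries might be separated from job entries, the following Python sketch splits such a line into its fields. The function name is hypothetical, the word "special" is assumed to appear without quotes in the file, and all values are kept as strings because the exact formats of "Duration", "Start-Date", and "Start-time" are not specified here.

```python
# Type codes as listed above.
SPECIAL_TYPES = {
    "D": "Dedicated Time",
    "P": "Preventative Maintenance",
    "M": "Scheduled Facility Outage",
    "S": "Software Failure",
    "H": "Hardware Failure",
    "F": "Unscheduled Facility Outage",
    "O": "Other",
}

def parse_special(line):
    """Return a dict for a 'special' entry, or None for any other line."""
    # Split into at most 7 parts so free-text comments stay in one piece.
    parts = line.split(None, 6)
    if len(parts) < 6 or parts[0] != "special":
        return None
    return {
        "system": parts[1],
        "type": parts[2],
        "type_name": SPECIAL_TYPES.get(parts[2], "Unknown"),
        "duration": parts[3],      # kept as a string (format not assumed)
        "start_date": parts[4],
        "start_time": parts[5],
        "comments": parts[6].strip() if len(parts) > 6 else "",
    }

# Made-up example line, for illustration only (not taken from the log).
entry = parse_special("special CUBE P 3600 10/05/93 08:00:00 weekly PM")
```

Lines that do not start with "special" yield `None`, so the same loop can hand them to a separate job-entry parser.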

Conversion Notes

The converted log is available as NASA-iPSC-1993-3.swf. The conversion from the original format to the standard workload format is generally straightforward. It was done subject to the following.

The conversion was done by a log-specific parser in conjunction with a more general converter module.

The difference between conversion 3 (reflected in NASA-iPSC-1993-3.swf) and conversion 2 (NASA-iPSC-1993-2.swf) is that in the older conversion wait times were listed as 0. In the new one this was changed to -1, as we do not actually know what the wait times were (or what the original submit times were).
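When reading the converted log, it may help to treat the -1 wait time explicitly as "unknown" rather than as a number. The sketch below assumes the usual standard workload format layout, in which the wait time is the third whitespace-separated field of a job line; the function name is hypothetical.

```python
def wait_time_or_none(swf_line):
    """Return the wait time in seconds, or None if it is unknown (-1)."""
    fields = swf_line.split()
    wait = float(fields[2])  # third field: wait time (assumed SWF layout)
    return None if wait < 0 else wait

# Made-up job line for illustration; 18 fields, wait time recorded as -1.
sample = "2 10 -1 600 32 -1 -1 32 -1 -1 1 7 1 4 -1 -1 -1 -1"
```

Here `wait_time_or_none(sample)` returns `None`, so the unknown value cannot silently enter an average or a histogram as if it were a real measurement.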

The difference between conversion 2 (reflected in NASA-iPSC-1993-2.swf) and conversion 1 (NASA-iPSC-1993-1.swf) is that in the original conversion timegm was used to convert dates and times into UTC. This is wrong when daylight saving time is in use. Conversion 2 used timelocal with the correct timezone setting, which is hopefully the right thing to do.
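The effect of interpreting local timestamps as UTC can be illustrated with a small Python sketch. This is not the converter's code: the timestamps are made up, and the fixed offsets of -7 hours (daylight saving time) and -8 hours (standard time) are assumptions about the log's local timezone.

```python
import calendar
import time
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def shift_hours(stamp, offset_hours):
    """Hours by which a timegm-style conversion misplaces a local time."""
    # timegm treats the broken-down time as if it were already UTC.
    as_if_utc = calendar.timegm(time.strptime(stamp, FMT))
    # Correct conversion: attach the (assumed) local UTC offset first.
    tz = timezone(timedelta(hours=offset_hours))
    correct = datetime.strptime(stamp, FMT).replace(tzinfo=tz).timestamp()
    return int(correct - as_if_utc) // 3600

# Hypothetical timestamps: one assumed to fall in daylight saving time,
# one in standard time.
errors = [
    shift_hours("1993-10-15 12:00:00", -7),
    shift_hours("1993-11-15 12:00:00", -8),
]
```

Under these assumptions the two errors differ by one hour, so intervals that span the daylight saving change would be distorted, not merely shifted by a constant.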

Usage Notes

The original log contains 24,025 executions of the Unix pwd command on 1 node by sysadmin staff (out of a total of 42,264 jobs, so this is 56.8% of the log; the slightly different numbers that appear in the original paper are due to the fact that the original analysis ignored all 0-time jobs, and here they are included). This reflects a practice by the system administrators to verify that the system was up and responsive. It is recommended to delete these jobs before using or analyzing this log, as they do not reflect normal usage.

To aid in this, a cleaned version of the log is provided as NASA-iPSC-1993-3.1-cln.swf. The filter used to remove the spurious pwd jobs was

user=3 and application=1 and processors=1
Note that this filter was applied to the original log, and unfiltered jobs remain untouched. As a result, in the filtered log job numbering is not consecutive.
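A filter of this kind could be applied to a file in the standard workload format roughly as follows. This is a sketch, not the script actually used for the archive: the function names are hypothetical, and it assumes the usual SWF field order (allocated processors in field 5, user ID in field 12, application number in field 14) and that "processors" refers to allocated processors.

```python
def is_spurious_pwd(fields):
    """True if a job matches user=3, application=1, processors=1."""
    return fields[11] == "3" and fields[13] == "1" and fields[4] == "1"

def clean_swf(lines):
    """Yield header comments and all jobs except the spurious pwd jobs."""
    for line in lines:
        stripped = line.strip()
        # Keep blank lines and ';' header comments unchanged.
        if not stripped or stripped.startswith(";"):
            yield line
            continue
        if not is_spurious_pwd(stripped.split()):
            yield line  # job numbers are left as they were

# Made-up records for illustration only.
sample = [
    "; hypothetical header comment",
    "1 0 -1 5 1 -1 -1 1 -1 -1 1 3 1 1 -1 -1 -1 -1",
    "2 10 -1 600 32 -1 -1 32 -1 -1 1 7 1 4 -1 -1 -1 -1",
]
kept = list(clean_swf(sample))
```

Because surviving lines are passed through unmodified, job numbers in the output have gaps wherever a job was removed, matching the non-consecutive numbering noted above.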

The Log in Graphics

File NASA-iPSC-1993-3.swf

[Figures: weekly cycle; daily cycle; burstiness and active users; job size and runtime histograms; job size vs. runtime scatterplot]

File NASA-iPSC-1993-3.1-cln.swf

[Figures: weekly cycle; daily cycle; burstiness and active users; job size and runtime histograms; job size vs. runtime scatterplot]

