Parallel Workloads Archive: MetaCentrum

The MetaCentrum log

System: MetaCentrum Czech National Grid
Duration: Jan 2009 to May 2009
Jobs: 103,656

This log contains several months' worth of accounting records from the national grid of the Czech Republic, called MetaCentrum. This grid is composed of 14 clusters (called nodes), each with several multiprocessor machines, for a total of 806 processors.

For more information about the system, see URL http://www.metacentrum.cz/en/.

The MetaCentrum workload log was graciously provided by Czech National Grid Infrastructure MetaCentrum. If you use this log in your work, please use a similar acknowledgment. It was made available via the web page of Dalibor Klusacek. Data about failures and maintenance is also available.

Downloads:

METACENTRUM-2009-0 2.0 MB gz original log
METACENTRUM-2009-2.swf 1.5 MB gz converted log
METACENTRUM-2009-1.swf 1.5 MB gz OLD VERSION of converted log (replaced 13 Dec 2011)

Papers Using this Log:

This log was used in the following papers:
[klusacek10] [di12] [klusacek12] [feitelson14] [lic14] [lucarelli17]

System Environment

MetaCentrum is composed of 14 Linux clusters, with different configurations, as follows:
Cluster  Processor        Nodes  Total CPUs
0        Itanium2 1.5GHz      8           8
1        Opteron 2.2GHz      16          16
2        Xeon 3.2GHz         10          10
3        Opteron 2.6GHz       5          80
4        AthlonMP 1.6GHz     16          32
5        Xeon 2.4GHz         32          64
6        Xeon 2.7GHz         36         148
7        Xeon 3.1GHz         35          70
8        Opteron 1.6GHz      10          20
9        Opteron 2.4GHz       3           6
10       Opteron 2.0GHz      23          92
11       Xeon 3.0GHz         19         152
12       Xeon 2.7GHz          8          64
13       Xeon 2.3GHz         11          44
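
The per-cluster CPU counts in the table can be cross-checked against the 806-processor total quoted in the introduction. The following is a minimal Python sketch of that arithmetic; the names `CLUSTERS` and `total_cpus` are illustrative and not part of the log or of any archive tool.

```python
# (cluster id, processor, nodes, total CPUs), transcribed from the table above.
CLUSTERS = [
    (0, "Itanium2 1.5GHz", 8, 8),
    (1, "Opteron 2.2GHz", 16, 16),
    (2, "Xeon 3.2GHz", 10, 10),
    (3, "Opteron 2.6GHz", 5, 80),
    (4, "AthlonMP 1.6GHz", 16, 32),
    (5, "Xeon 2.4GHz", 32, 64),
    (6, "Xeon 2.7GHz", 36, 148),
    (7, "Xeon 3.1GHz", 35, 70),
    (8, "Opteron 1.6GHz", 10, 20),
    (9, "Opteron 2.4GHz", 3, 6),
    (10, "Opteron 2.0GHz", 23, 92),
    (11, "Xeon 3.0GHz", 19, 152),
    (12, "Xeon 2.7GHz", 8, 64),
    (13, "Xeon 2.3GHz", 11, 44),
]


def total_cpus(clusters):
    """Sum the 'Total CPUs' column over all clusters."""
    return sum(cpus for _cid, _proc, _nodes, cpus in clusters)


print(total_cpus(CLUSTERS))  # 806
```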

Jobs could run on processors from more than one cluster. While relatively rare, this did happen for 586 jobs in the log.

Scheduling is done with PBSpro, employing a system of 11 queues as follows:
Queue  Priority  Time limit (hr)
q1           62              720
q2           70              720
q3           50               24
q4           60                2
q5           80               24
q6           65              720
q7           70              720
q8           70                4
q9           70              720
q10          99              720
q11          65              720
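
For simulations it can be convenient to hold the queue configuration as a lookup table. The sketch below transcribes the table into a Python dictionary and derives the longest time limit in days, which matches the 30-day maximal runtime mentioned in the usage notes. The names `QUEUES` and `max_limit_days` are hypothetical.

```python
# queue name -> (priority, time limit in hours), transcribed from the table above.
QUEUES = {
    "q1": (62, 720),
    "q2": (70, 720),
    "q3": (50, 24),
    "q4": (60, 2),
    "q5": (80, 24),
    "q6": (65, 720),
    "q7": (70, 720),
    "q8": (70, 4),
    "q9": (70, 720),
    "q10": (99, 720),
    "q11": (65, 720),
}


def max_limit_days(queues):
    """Longest queue time limit, converted from hours to days."""
    return max(limit_hr for _priority, limit_hr in queues.values()) / 24


print(max_limit_days(QUEUES))  # 30.0
```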

Importantly, data about failures and other special circumstances is provided together with the log. This is considered important for reliable evaluations, and in fact is the main point of the paper that introduced this log:

D. Klusacek and H. Rudova, "The Importance of Complete Data Sets for Job Scheduling Simulations". In Job Scheduling Strategies for Parallel Processing, Springer-Verlag LNCS vol. 6253, pp. 132-153, 2010.

Log Format

The original log is available as METACENTRUM-2009-0.

This file contains one line per completed job with the following tab-separated fields:

Conversion Notes

The converted log is available as METACENTRUM-2009-2.swf. The conversion from the original format to SWF was done subject to the following:

The difference between the first conversion (reflected in METACENTRUM-2009-1.swf) and the second conversion (reflected in METACENTRUM-2009-2.swf) is:

The conversion was done by a log-specific parser in conjunction with a more general converter module.
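
As an illustration of how the converted file might be read, here is a minimal Python sketch. It assumes the usual Standard Workload Format layout: header and comment lines start with `;`, and each job line holds 18 whitespace-separated numeric fields, with -1 standing for unknown values. The `Job` tuple, the function names, and the sample line are hypothetical and are not taken from the log itself.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple


class Job(NamedTuple):
    job_id: int     # SWF field 1
    submit: int     # field 2: submit time in seconds from the start of the log
    wait: int       # field 3: wait time in seconds
    runtime: int    # field 4: run time in seconds
    procs: int      # field 5: number of allocated processors
    user: int       # field 12: user ID
    queue: int      # field 15: queue number
    partition: int  # field 16: partition number


def parse_swf(lines: Iterable[str]) -> Iterator[Job]:
    """Yield one Job per data line, skipping blank and ';' comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        f = line.split()
        if len(f) < 18:
            continue  # malformed line; ignored in this sketch
        num = lambda i: int(float(f[i]))  # tolerate values written as floats
        yield Job(num(0), num(1), num(2), num(3), num(4),
                  num(11), num(14), num(15))


def open_swf(path: str):
    """Open a plain or gzip-compressed SWF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Hypothetical sample data, not real records from the log.
SAMPLE = [
    "; hypothetical header comment",
    "1 0 10 3600 4 -1 -1 4 7200 -1 1 3 -1 -1 2 5 -1 -1",
]
jobs = list(parse_swf(SAMPLE))
print(jobs[0].runtime, jobs[0].procs)  # 3600 4
```

With the downloaded file in place, the same parser could be applied as `parse_swf(open_swf(path))`, where `path` points to the local copy of the converted log.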

Usage Notes

Flurries seem to exist but have not been cleaned yet.
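
Since flurries (bursts of very high activity by a single user) have not been removed, users of the log may want to look for them before running evaluations. The following Python sketch counts jobs per user in fixed time windows and reports the windows that exceed a threshold. The window length, the threshold, and the function name are assumptions chosen for illustration; this is not the archive's own cleaning procedure.

```python
from collections import Counter


def find_flurries(jobs, window_s=86400, threshold=1000):
    """Report candidate flurries.

    jobs: iterable of (user_id, submit_time_s) pairs.
    Returns a sorted list of (user_id, window_index, job_count) for every
    user/window pair with at least `threshold` submitted jobs.
    """
    counts = Counter((user, submit // window_s) for user, submit in jobs)
    return sorted((u, w, c) for (u, w), c in counts.items() if c >= threshold)


# Hypothetical data: user 7 submits 50 jobs within the first day.
demo = [(7, t) for t in range(50)] + [(8, 10), (8, 90000)]
print(find_flurries(demo, threshold=20))  # [(7, 0, 50)]
```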

The log contains all the jobs that terminated in the logging period. Some of these jobs are extremely long, as the maximal runtime allowed on this system is 30 days. Thus some of the logged jobs may have started up to 30 days before the start of the logging period, and as a result the initial portion of the log is extremely sparse. A similar effect occurs (to a lesser degree) towards the end of the log: extremely long jobs that were running in this period are not logged if they did not terminate by the end of the logging period.
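
One possible way to cope with the sparse edges is to discard a margin at both ends of the log before analysis. The Python sketch below drops jobs submitted within 30 days of the earliest or latest submit time. Trimming in this way is an assumption made for illustration, not a recommendation from the log's providers, and the margin is a parameter that may well need to be smaller for a log of only a few months.

```python
MAX_RUNTIME_S = 30 * 24 * 3600  # 30-day maximal runtime, in seconds


def trim_edges(jobs, margin_s=MAX_RUNTIME_S):
    """Keep only jobs submitted at least margin_s away from both log ends.

    jobs: list of (job_id, submit_time_s) pairs.
    """
    if not jobs:
        return []
    start = min(submit for _job_id, submit in jobs)
    end = max(submit for _job_id, submit in jobs)
    return [(job_id, submit) for job_id, submit in jobs
            if start + margin_s <= submit <= end - margin_s]


DAY = 24 * 3600
# Hypothetical jobs submitted on days 0, 40 and 100.
demo_jobs = [(1, 0), (2, 40 * DAY), (3, 100 * DAY)]
print(trim_edges(demo_jobs))  # [(2, 3456000)]
```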

The Log in Graphics

File METACENTRUM-2009-2.swf

Graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, clusters, utilization, offered load, performance

