The MetaCentrum 2 log

System: MetaCentrum Czech National Grid
Duration: Jan 2013 to Apr 2015
Jobs: 5,731,100

This log contains over two years' worth of accounting records from MetaCentrum, the national grid of the Czech Republic. It is a longer log from a later period than the original MetaCentrum log.

The MetaCentrum grid is composed of a varying number of clusters, each with several multiprocessor machines with multicore CPUs or GPUs. Importantly, the scheduling system underwent a significant reconfiguration in the middle of this period, which is the subject of a paper based on this log.

For more information about the system, see URL

The MetaCentrum workload log was graciously provided by the Czech National Grid Infrastructure MetaCentrum. If you use this log in your work, please use a similar acknowledgment. It was made available via the web page of Dalibor Klusacek, which also includes data about the configuration: 19 clusters with 495 nodes and 8412 cores in total (however, the log appears to contain some jobs that ran on additional clusters as well). To acknowledge Dalibor's work, please consider citing the paper that introduced this log:

D. Klusacek, S. Toth, and G. Podolnikova, ``Real-life Experience with Major Reconfiguration of Job Scheduling System''. In Job Scheduling Strategies for Parallel Processing, May 2015.


METACENTRUM-2013-1.swf 72 MB gz original log in augmented SWF format as received
METACENTRUM-2013-3.swf 58 MB gz re-converted log
METACENTRUM-2013-2.swf 58 MB gz OLD VERSION of re-converted log (replaced 16 Sep 2015)
(May need to click with right mouse button to save to disk)

Papers Using this Log:

This log was used in the following papers:

System Environment

MetaCentrum is composed of up to around 30 Linux clusters with different configurations, some of which changed during the logging period.
no. Cluster From To NxC Cores Mem/node (GB) GPUs/node
1 start end 1x8 8 72 -
2 start 5-Oct-2013 12x8 96 32 -
3 30-Sep-2013 end 30x16 480 67 2xGPU
4 start end 2x32 64 264 -
5 start end 10x16 160 67 4xGPU
6 2-Apr-2013 end 1x64 64 1040 -
7 (zapat) start end 112x16 1792 134 -
8 (zigur) 22-Apr-2013 end 32x8 256 134 -
9 (zegox) 22-Apr-2013 end 48x12 576 94 -
10 19-Feb-2013 end 11x8 88 14 -
11a start 11-Feb-2013 (renamed) 26x16 416 67 -
11b 11-Feb-2013 end 26x16 416 67 -
12 start end 9x12 108 24 2xGPU
13 start end 2x48 96 64 -
14 start end 14x12 168 12 -
15 start end 47x16 752 96 -
16 start end 14x64 896 264 -
17 16-Dec-2014 end 4x32 128 512 -
18 start end 7x16 112 66 -
19 start end 49x12 588 20 -
20 17-May-2014 end 12x4 48 3 -
21 start end 19x8 152 14 -
22a start 29-May-2014 20x8 160 25 -
22b start end 20x8 160 25 -
22c start end 16x12 192 50 -
23 start end 3x8 24 18 -
24 start end 1x32 32 1058 -
25 start 5-Oct-2013 28x4 112 3 -
26 start end 28x8 224 22 -
27 12-Dec-2013 end 1x288 288 6144 -
28 19-Nov-2014 end 1x384 384 6144 -
29a start 12-Nov-2014 (split) 20x80 1600 512 -
29b 12-Nov-2014 end 8x80 640 512 -
30 12-Nov-2014 end 12x80 960 512 -
31 11-Dec-2014 end 4x10 40 1536 -
32 6-Nov-2013 end 2x24 48 256 -
The notation NxC means N nodes with C cores each; the Cores column gives the total number of cores in the cluster.
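The NxC notation in the table above can be handled mechanically. A minimal sketch (the function names are illustrative, not part of any published tool):

```python
# Parse the "NxC" notation used in the cluster table above:
# N nodes with C cores each.
def parse_nxc(nxc: str) -> tuple[int, int]:
    """Split e.g. '112x16' into (nodes=112, cores_per_node=16)."""
    nodes, cores = nxc.split("x")
    return int(nodes), int(cores)

def total_cores(nxc: str) -> int:
    """Total core count of a cluster, matching the Cores column."""
    nodes, cores_per_node = parse_nxc(nxc)
    return nodes * cores_per_node

# Example: cluster 7 (zapat) is listed as 112x16, i.e. 1792 cores.
assert total_cores("112x16") == 1792
```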

Jobs could run on processors from more than one cluster. While relatively rare, this happened for 7011 jobs in the log.

Scheduling is done by TORQUE with a custom-built scheduler, employing a system of general queues served by two scheduling servers. The scheduler uses common approaches such as backfilling and fairshare. Documentation is available on the MetaCentrum site. The main queues are as follows:
Queue Priority Time limit
q_2h 50 2h
q_4h 500 4h
q_1d 50 24h
q_2d 50 48h
q_4d 50 96h
q_1w 50 168h
q_2w 50 336h
q_2w_plus 50 720/1488h
backfill 20 24h
short 50 2h
normal 50 24h
long 50 720h
uv 30 96h
gpu 75 24h
gpu_long 55 168h
In addition, there are multiple special queues for specific users and groups and for administrative purposes. The full list is available in the MetaCentrum documentation.

The above data is valid for the second half of the log, from January 2014. Note that nearly all the queues have the same priority (50). The practical effect is that jobs are prioritized just by fairshare, and queues are basically only used to define various per-user/group limits. Thus, the system operates over one "virtual" queue which is ordered by fairshare. Before January 2014, there was a fixed queue ordering, where the highest priority was for "long" (70), followed by "short" (60), "normal" (50) and "backfill" (20). Fairshare was only used "locally", within a given queue. The changes in configuration are described in detail in the paper which introduced the log [klusacek15]. The change in configuration apparently led to a change in utilization as seen in the figures below.
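As a toy illustration (not the actual TORQUE scheduler), ordering waiting jobs first by queue priority and then by fairshare usage shows how equal queue priorities collapse into a single fairshare-ordered virtual queue:

```python
# Hypothetical waiting jobs; priorities and usage values are invented
# for illustration, with all queue priorities equal at 50 as in the
# post-January-2014 configuration described above.
jobs = [
    {"id": 1, "queue_priority": 50, "fairshare_usage": 0.7},
    {"id": 2, "queue_priority": 50, "fairshare_usage": 0.1},
    {"id": 3, "queue_priority": 50, "fairshare_usage": 0.4},
]

# Sort by descending queue priority, then ascending recent usage.
order = sorted(jobs, key=lambda j: (-j["queue_priority"], j["fairshare_usage"]))

# With all priorities equal, the first key is a tie for every job, so
# users with lower recent usage simply go first: one "virtual" queue
# ordered purely by fairshare.
```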

Importantly, data about the specific requests made by users is included as an additional field in the original log. This is a ':'-separated list of properties, such as the number of nodes and cores requested, the architecture, and specific clusters to use or to avoid. The possible properties and the mapping of properties to clusters is available in the MetaCentrum documentation. This is considered important as it enables evaluations that take all these different constraints into account.
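The ':'-separated properties field lends itself to simple parsing. The sketch below assumes a generic "key=value or bare flag" structure; the sample string and property names are hypothetical, and the real vocabulary is defined in the MetaCentrum documentation:

```python
# Sketch of parsing the ':'-separated request-properties field from
# the original log. Property names here are illustrative only.
def parse_properties(field: str) -> dict:
    props = {}
    for item in field.split(":"):
        if "=" in item:
            key, value = item.split("=", 1)
            props[key] = value
        else:
            props[item] = True  # flag-style property with no value
    return props

# Hypothetical example string: 2 nodes, 8 cores per node, plus a
# cluster-selection flag.
example = parse_properties("nodes=2:ppn=8:cl_zapat")
```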

Log Format

The original log is available as METACENTRUM-2013-1.swf, although it does not completely adhere to the SWF format. Note that fields that do not contain valid information are identically 1 instead of -1. Moreover, 3 additional fields are included at the end of each line:
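Given that structure, the augmented file can be read by splitting off whatever trails the 18 standard SWF fields. A minimal sketch (not the converter actually used for this log):

```python
# Minimal sketch of parsing the augmented SWF: lines beginning with
# ';' are header comments; each job line carries the 18 standard SWF
# fields followed by the extra fields mentioned above.
def parse_augmented_swf(lines):
    for line in lines:
        if line.startswith(";") or not line.strip():
            continue
        fields = line.split()
        standard, extra = fields[:18], fields[18:]
        # Per the note above, invalid fields appear as 1 rather than
        # the usual SWF convention of -1, so values of 1 need care
        # downstream.
        yield standard, extra
```

Usage would be e.g. `for std, extra in parse_augmented_swf(open("METACENTRUM-2013-1.swf")): ...`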

Conversion Notes

The converted log is available as METACENTRUM-2013-3.swf. The conversion from the original format to valid SWF was done subject to the following. The difference between the first conversion (reflected in METACENTRUM-2013-2.swf) and the second conversion (reflected in METACENTRUM-2013-3.swf) is that additional information about the clusters (partitions) became available, and as a result the numbering of partitions changed. The conversion was done by a log-specific parser in conjunction with a more general converter module. This version of the converter can handle multi-partition (multi-cluster) jobs.

Usage Notes

Large scale flurries apparently exist but have not been cleaned yet.

The log contains all the jobs that started in the logging period, which is all of 2013-2014. Some of these jobs are extremely long, as the maximal runtime allowed on this system is 30 days. Therefore edge effects may happen at both ends of the log, where the logged data does not represent the actual load faithfully. In particular, all the jobs executing in 2015 are actually leftovers from 2014.
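One common way to mitigate such edge effects (a general analysis technique, not something applied to the log itself) is to discard jobs that start within one maximum runtime of either end of the log:

```python
# Sketch of trimming edge effects: keep only jobs whose start time is
# at least `margin` seconds after the log start and before its end.
# Here `margin` defaults to the 30-day maximum runtime noted above,
# and jobs are (submit_time, wait_time, ...) tuples with times in
# seconds relative to the start of the log, as in the SWF convention.
def trim_edges(jobs, log_end, margin=30 * 24 * 3600):
    kept = []
    for job in jobs:
        start = job[0] + job[1]  # start = submit time + wait time
        if margin <= start <= log_end - margin:
            kept.append(job)
    return kept
```

The margin choice is a judgment call: one maximum runtime guarantees that every kept job both started and could have finished well inside the logged period.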

The Log in Graphics

File METACENTRUM-2013-3.swf

Graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, clusters, utilization, offered load, performance.
