The University of Luxembourg Gaia Cluster Log
System: University of Luxembourg Gaia Cluster
Duration: May to August 2014
This log contains three months' worth of data from the Gaia
cluster at the University of Luxembourg.
The cluster is used mainly by biologists working on large-data problems
and by engineers running physical simulations.
The workload data includes CPU and memory usage.
I/O activity is provided in a separate file, because the
Standard Workload Format (SWF) has no fields for it.
The workload log from the Gaia cluster system was graciously provided
by Joseph Emeras (email@example.com).
If you use this log in your work, please use a similar acknowledgment.
Papers Using this Log:
This log was used in the following papers:
The Gaia cluster is one of the 4 clusters operated by the ULHPC
(University of Luxembourg HPC Center).
Initially released in 2011, Gaia is now a heterogeneous cluster that
has been upgraded several times.
It currently features 151 nodes, manufactured by Bull and Dell, with a
total of 2004 cores.
Twenty of the nodes feature NVIDIA Tesla-class GPGPU accelerators.
Full details about its configuration and history are available from
the University of Luxembourg.
The scheduler used is OAR (oar.imag.fr/).
The log is available directly in SWF.
It is based on accounting data collected by the scheduler.
In addition, a companion
log with I/O data is available.
For each job, it lists the total amount of data read and written by
all the processes of this job.
The job ID field is the same as in the SWF files, to enable merging the data.
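The merge on job ID described above can be sketched as follows. This is a minimal illustration, not the archive's own tooling. The SWF field order (18 whitespace-separated fields, `;` header comments) follows the standard workload format, but the layout of the companion I/O file is an assumption made here for illustration: one line per job with `job_id bytes_read bytes_written`. The function and key names are hypothetical.

```python
# The 18 fields of a standard workload format (SWF) job record, in order.
SWF_FIELDS = [
    "job_id", "submit_time", "wait_time", "run_time", "alloc_procs",
    "avg_cpu_time", "used_memory", "req_procs", "req_time", "req_memory",
    "status", "user_id", "group_id", "executable", "queue", "partition",
    "preceding_job", "think_time",
]


def _num(text):
    """Parse an SWF value as int when possible, otherwise as float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_swf(lines):
    """Yield one dict per job, skipping blank lines and ';' header comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        yield dict(zip(SWF_FIELDS, (_num(v) for v in line.split())))


def parse_io(lines):
    """Map job ID to (bytes_read, bytes_written).

    ASSUMPTION: each line is 'job_id bytes_read bytes_written'; the real
    companion file may use a different layout.
    """
    io = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        job_id, read, written = line.split()[:3]
        io[int(job_id)] = (int(read), int(written))
    return io


def merge(swf_lines, io_lines):
    """Attach I/O totals to each SWF job; -1 marks jobs with no I/O record."""
    io = parse_io(io_lines)
    merged = []
    for job in parse_swf(swf_lines):
        read, written = io.get(job["job_id"], (-1, -1))
        merged.append({**job, "io_read": read, "io_written": written})
    return merged
```

Using -1 for jobs without an I/O record mirrors the SWF convention for unknown values, so downstream filters can treat both files uniformly.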
No information is available about problems encountered during the
conversion process.
Nevertheless, an SWF parser
(customized for this log) was used in conjunction with a general
converter module to check the file.
Due to the heterogeneity of the cluster it is not clear that all jobs
received the same level of service.
This may affect their wait times and maybe also the activity patterns
of certain users.
The following anomalies were observed and in some cases corrected:
- In 99 jobs runtime was missing and approximated using CPU time.
- In 28 additional "failed" jobs both runtime and CPU time were missing.
- 2,880 jobs were recorded as using 0 CPU time; this was changed to -1.
Of these, 155 had "failed" status, but 2,626 had "success" status.
- 1,464 jobs were recorded as using 0 memory; this was changed to -1.
Of these, 64 had "failed" status, but 1,400 had "success" status.
- 1,500 jobs got more runtime than they requested.
In 285 cases the extra runtime was larger than 1 min.
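Checks of the kind listed above can be reproduced on the converted file with a short script. The sketch below is an illustration, not the parser actually used for the conversion. It relies only on the standard SWF field positions (run time is field 4, average CPU time field 6, used memory field 7, requested time field 9) and on -1 meaning "unknown". The function name and the `slack` parameter are hypothetical.

```python
def count_anomalies(lines, slack=60):
    """Count unknown CPU/memory values and runtime overruns in SWF lines.

    `slack` is the overrun, in seconds, above which a job is counted
    separately (60 matches the 1-minute threshold mentioned in the text).
    """
    counts = {
        "cpu_unknown": 0,
        "mem_unknown": 0,
        "overrun": 0,
        "overrun_gt_slack": 0,
    }
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and header comments
        fields = line.split()
        run_time = float(fields[3])
        cpu_time = float(fields[5])
        memory = float(fields[6])
        req_time = float(fields[8])
        if cpu_time == -1:
            counts["cpu_unknown"] += 1
        if memory == -1:
            counts["mem_unknown"] += 1
        # Compare only when both values are known (not -1).
        if run_time >= 0 and req_time >= 0 and run_time > req_time:
            counts["overrun"] += 1
            if run_time - req_time > slack:
                counts["overrun_gt_slack"] += 1
    return counts
```

Because zero CPU time and zero memory were already replaced by -1 in the converted log, the script counts -1 values rather than zeros. Those counts may also include jobs whose values were missing for other reasons.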
Activity in the first 4-5 days is very low and probably reflects
remnants of activity from before logging actually started.
There appears to be a flurry of activity by user 8 towards the end of
the log; it has not been cleaned yet.
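One simple way to look for such bursts, and for the quiet first days, is to count submitted jobs per user per day. The sketch below assumes standard SWF field positions (submit time in seconds since the start of the log is field 2, user ID is field 12). The function names and the threshold are hypothetical, and this is not a formal definition of a flurry.

```python
from collections import Counter

SECONDS_PER_DAY = 86400


def jobs_per_user_day(lines):
    """Count jobs per (user_id, day index), with day 0 starting at log start."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and header comments
        fields = line.split()
        submit_time = int(float(fields[1]))
        user_id = int(fields[11])
        counts[(user_id, submit_time // SECONDS_PER_DAY)] += 1
    return counts


def flag_bursts(counts, threshold):
    """Return sorted (user, day, jobs) tuples with at least `threshold` jobs."""
    return sorted(
        (user, day, n) for (user, day), n in counts.items() if n >= threshold
    )
```

A suitable threshold depends on the typical daily load of the log, so it is left as a parameter rather than fixed here.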
The Log in Graphics