Abstract: Existing research has shown the benefits of running multi-level schedulers, either for single-node parallel computation or multi-node distributed computation. However, several important practical considerations must be addressed before these multi-level scheduling architectures can be used in multi-user production environments. In this presentation we'll discuss these practical considerations through lessons learned deploying Apache Mesos, a 2-level distributed scheduling system used at organizations such as Twitter, Netflix, and Apple. We'll first highlight the multi-level scheduling systems that influenced Mesos and describe the 2-level Mesos architecture in detail. We'll then focus on the 1st-level scheduler of Mesos and the efficient multi-resource fair-sharing algorithm that it employs. Finally, we'll discuss the extensions that practical needs have driven over the years (some still being added today): weights, reservations, quotas, optimistic allocations, and deallocation.
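Mesos's built-in allocator is based on Dominant Resource Fairness (DRF), a multi-resource fair-sharing policy: each framework's "dominant share" is the largest fraction of any single resource it holds, and the allocator favors the framework with the lowest dominant share. The sketch below is a simplified, hypothetical illustration of that idea (progressive filling over one resource pool, one task at a time, with optional per-framework weights). It is not Mesos's allocator code, which makes resource offers to frameworks rather than placing tasks; the names `Framework`, `dominant_share`, and `drf_allocate` are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    """Hypothetical framework record used only for this sketch."""
    name: str
    demand: dict[str, float]  # resources needed per task, e.g. {"cpus": 1, "mem": 4}
    weight: float = 1.0       # larger weight -> framework is entitled to a larger share
    allocated: dict[str, float] = field(default_factory=dict)
    tasks: int = 0


def dominant_share(fw: Framework, total: dict[str, float]) -> float:
    """Largest fraction of any single resource held by fw, divided by its weight."""
    shares = [fw.allocated.get(r, 0.0) / total[r] for r in total]
    return max(shares) / fw.weight


def fits(demand: dict[str, float], free: dict[str, float]) -> bool:
    """True if every resource in the demand is still available."""
    return all(free.get(r, 0.0) >= amt for r, amt in demand.items())


def drf_allocate(frameworks: list[Framework], total: dict[str, float]) -> list[Framework]:
    """Progressive filling: repeatedly grant one task to the framework with the
    lowest weighted dominant share among those whose next task still fits,
    stopping when no framework's next task fits."""
    free = dict(total)
    while True:
        candidates = [fw for fw in frameworks if fits(fw.demand, free)]
        if not candidates:
            return frameworks
        fw = min(candidates, key=lambda f: dominant_share(f, total))
        for r, amt in fw.demand.items():
            free[r] -= amt
            fw.allocated[r] = fw.allocated.get(r, 0.0) + amt
        fw.tasks += 1


if __name__ == "__main__":
    # Cluster with 9 CPUs and 18 GB of memory; A's tasks are memory-heavy,
    # B's tasks are CPU-heavy.
    result = drf_allocate(
        [Framework("A", {"cpus": 1, "mem": 4}), Framework("B", {"cpus": 3, "mem": 1})],
        {"cpus": 9, "mem": 18},
    )
    for fw in result:
        print(fw.name, fw.tasks, fw.allocated)
```

With 9 CPUs and 18 GB, where A asks for 1 CPU and 4 GB per task and B for 3 CPUs and 1 GB, this sketch gives A three tasks (3 CPUs, 12 GB) and B two tasks (6 CPUs, 2 GB). Both end with a dominant share of 2/3: memory for A, CPU for B. Weights simply rescale each dominant share, which is one plausible way to model the weighted fair sharing the abstract mentions.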
Biography: Benjamin Hindman is a co-founder of Mesosphere and co-creator of the Apache Mesos project. Ben was a PhD student at UC Berkeley before bringing Mesos to Twitter, where it now runs on tens of thousands of machines powering Twitter's datacenters. He is now Chief Architect at Mesosphere, where the company is building the Mesosphere Datacenter Operating System (DCOS). An academic at heart, he has published his research in programming languages and distributed systems at leading academic conferences.
Influence of Dynamic Think Times on Parallel Job Scheduler Performances in Generative Simulations.
Stephan Schlagkamp (TU Dortmund, Germany)
Evalix: Classification and Prediction of Job Resource Consumption on HPC Platforms.
Joseph Emeras, Sébastien Varrette, Mateusz Guzek, and Pascal Bouvry (University of Luxembourg, Luxembourg)
Data Driven Scheduling Approach for the Multi-Node Multi-GPU Cholesky Decomposition.
Yuki Tsujita and Toshio Endo (Tokyo Institute of Technology, Japan)
Scheduling for Better Energy Efficiency on Many-core Chips.
Chanseok Kang, Seungyul Lee, Yong-Jun Lee, Jaejin Lee, and Bernhard Egger (Seoul National University, Korea)
On the Design and Implementation of an Efficient Lock-Free Scheduler.
Florian Negele, Felix Friedrich (ETH Zürich, Switzerland), Suwon Oh, and Bernhard Egger (Seoul National University, Korea)