The following listing contains papers dealing specifically with workloads on parallel machines, with few exceptions. It does not cover scheduling, but papers that discuss workloads used to evaluate scheduling are included.
Please send comments, abstracts, annotations, and additional references to feit@cs.huji.ac.il.
[aida09]
Kento Aida and Henri Casanova,
“Scheduling Mixed-Parallel Applications with Advance Reservations”.
Cluster Comput. 12(2), pp. 205--220, Jun 2009.
[almeida96]
Virgílio Almeida, Azer Bestavros, Mark Crovella, and Adriana de Oliveira,
“Characterizing Reference Locality in the WWW”.
In Parallel & Distributed Inf. Syst., pp. 92--103, Dec 1996.
Annotation: On temporal locality, which is related to popularity, and spatial locality, which has fractal properties.
[amar08a]
Lior Amar, Ahuva Mu'alem, and Jochen Stößer,
“On the Importance of Migration for Fairness in Online Grid Markets”.
In 9th IEEE Grid Comput. Conf., pp. 65--74, Sep 2008.
[amar08b]
Lior Amar, Ahuva Mu'alem, and Jochen Stößer,
“The Power of Preemption in Economic Online Markets”.
In Grid Economics and Business Models, pp. 41--57, 2008.
Annotation: Preemption improves both theoretical bounds and empirical simulations of scheduling in a non-cooperative setting.
[aridor04]
Yariv Aridor, Tamar Domany, Oleg Goldshmidt, Edi Shmueli, Jose Moreira, and Larry Stockmeier,
“Multi-Toroidal Interconnects: Using Additional Communication Links to Improve Utilization of Parallel Computers”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 144--159, 2004.
Annotation: The (few) extra links help reduce allocation constraints.
[bansal01]
Nikhil Bansal and Mor Harchol-Balter,
“Analysis of SRPT Scheduling: Investigating Unfairness”.
In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 279--290, Jun 2001.
Annotation: With heavy tailed runtimes, few jobs account for a large part of the load. So starving them (as happens with SRPT under overload)
actually lets most of the jobs run well; trying to be fair to all
(e.g. with processor sharing) ends up being bad for all.
[barsanti06]
Lawrence Barsanti and Angela C. Sodan,
“Adaptive Job Scheduling via Predictive Job Resource Allocation”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 4376, pp. 115--140, 2006.
[batat00]
Anat Batat and Dror G. Feitelson,
“Gang Scheduling with Memory Considerations”.
In 14th Intl. Parallel & Distributed Processing Symp. (IPDPS), pp. 109--114, May 2000.
Annotation: Using addmission control to prevent memory overload. Includes how to estimate memory usage based on historical information.
[bender05]
Michael A. Bender, David P. Bunde, Erik D. Demaine, Sándor P. Fekete, Vitus J. Leung, Hank Meijer, and Cynthia A. Phillips,
“Communication-Aware Processor Allocation for Supercomputers”.
In 9th Workshop Algorithms & Data Structures, Springer-Verlag, Lect. Notes Comput. Sci. vol. 3608, pp. 169--181, Aug 2005.
[brevik06]
John Brevik, Daniel Nurmi, and Rich Wolski,
“Predicting Bounds on Queuing Delay for Batch-Scheduled Parallel Machines”.
In 11th Symp. Principles & Practice of Parallel Programming (PPoPP), pp. 110--118, Mar 2006.
[buyya09]
Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic,
“Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility”.
Future Generation Comput. Syst. 25(6), pp. 599--616, Jun 2009.
[calzarossa93]
Maria Calzarossa and Giuseppe Serazzi,
“Workload Characterization: A Survey”.
Proc. IEEE 81(8), pp. 1136--1150, Aug 1993.
[calzarossa95]
Maria Calzarossa, Luisa Massari, Alessandro Merlo, Mario Pantano, and Daniele Tessera,
“Medea: A Tool for Workload Characterization of Parallel Systems”.
IEEE Parallel & Distributed Technology 3(4), pp. 72--80, Winter 1995.
Annotation: Medea provides various services such as clustering and curve fitting that can be applied to trace data.
[calzarossa95b]
M. Calzarossa, G. Haring, G. Kotsis, A. Merlo, and D. Tessera,
“A Hierarchical Approach to Workload Characterization for Parallel Systems”.
In High-Performance Computing and Networking, Springer-Verlag, Lect. Notes Comput. Sci. vol. 919, pp. 102--109, May 1995.
Annotation: Model the internal structure of the workload at levels of applications, algorithms, and routines.
[calzarossa00]
Maria Calzarossa, Luisa Massari, and Daniele Tessera,
“Workload Characterization Issues and Methodologies”.
In Performance Evaluation: Origins and Directions, Günter Haring, Christoph Lindemann, and Martin Reiser, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1769, pp. 459--482, 2000.
[cao04]
Junwei Cao and Falk Zimmermann,
“Queue Scheduling and Advance Reservations with COSY”.
In 18th Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2004.
[chapin99b]
Steve J. Chapin, Walfredo Cirne, Dror G. Feitelson, James Patton Jones, Scott T. Leutenegger, Uwe Schwiegelshohn, Warren Smith, and David Talby,
“Benchmarks and Standards for the Evaluation of Parallel Job Schedulers”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 67--90, 1999.
Annotation: Discusses what should be included in workload models that serve as benchmarks for evaluation of scheduling schemes, suggests
a standard format, and looks at extensions for metacomputing.
[chiang94]
Su-Hui Chiang, Rajesh K. Mansharamani, and Mary K. Vernon,
“Use of Application Characteristics and Limited Preemption for Run-To-Completion Parallel Processor Scheduling Policies”.
In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 33--44, May 1994.
Annotation: Shows that adaptive partitioning is better than static partitioning based on application characteristics such as the average parallelism.
However, a maximum partition size (which decreases with load) should
be imposed.
[chiang96]
Su-Hui Chiang and Mary K. Vernon,
“Dynamic vs. Static Quantum-Based Parallel Processor Allocation”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 200--223, 1996.
Annotation: Equipartitioning is shown to be better than using the processor working set combined with preemptions and multi-feedback queues.
It is improved even more if the processor working set is used as
the minimal allocation.
[chiang01]
Su-Hui Chiang and Mary K. Vernon,
“Characteristics of a Large Shared Memory Production Workload”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 159--187, 2001.
Annotation: Detailed study of the workload on the NCSA Origin 2000.
[chiang02]
Su-Hui Chiang, Andrea Arpaci-Dusseau, and Mary K. Vernon,
“The Impact of More Accurate Requested Runtimes on Production Job Scheduling Performance”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 103--127, 2002.
[cirne00]
Walfredo Cirne and Francine Berman,
“Adaptive Selection of Partition Size for Supercomputer Requests”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1911, pp. 187--207, 2000.
Annotation: Describes SA, the AppLes application scheduler, that chooses among several partition sizes based on knowledge about the
application and the system state.
[cirne01]
Walfredo Cirne and Francine Berman,
“A Model for Moldable Supercomputer Jobs”.
In 15th Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2001.
Annotation: A model for turning a rigid job into a moldable one, i.e. to create a set of optional partition sizes. Based on results
of a user survey regarding what users really need; for example,
they don't necessarily need power of 2 sizes.
[cirne01b]
Walfredo Cirne and Francine Berman,
“A Comprehensive Model of the Supercomputer Workload”.
In 4th Workshop on Workload Characterization, pp. 140--148, Dec 2001.
[crovella01]
Mark E. Crovella,
“Performance Evaluation with Heavy Tailed Distributions”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 1--10, 2001.
Annotation: Concise overview of what heavy tails are, how they affect performance evaluation, and possible impact on system design.
[cypher96]
Robert Cypher, Alex Ho, Smaragda Konstantinidou, and Paul Messina,
“A Quantitative Study of Parallel Scientific Applications with Explicit Communication”.
J. Supercomput. 10(1), pp. 5--24, 1996.
Annotation: A study of 8 SPMD scientific applications, implemented on the Touchstone Delta and on nCUBE machines. The main result is that they
all require a lot of Flops, but there is a great veriety in requirements
for memory, I/O, and communication bandwidth. Scaling equations are given.
[dasilva00]
Fabricio Alves Barbosa da Silva and Isaac D. Scherson,
“Improving Parallel Job Scheduling Using Runtime Measurements”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1911, pp. 18--38, 2000.
Annotation: Uses fuzzy sets to classify jobs as I/O, computing, or communicating, and use this to decide whether jobs require
gang scheduling.
[downey97a]
Allen B. Downey,
“Using Queue Time Predictions for Processor Allocation”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1291, pp. 35--57, 1997.
Annotation: First, derives a workload model from which the remaining execution time can be estimated conditioned on the time already
running. This is used to estimate when running jobs will
terminate, and free processors for queued jobs. And this is used
to decide whether to use available processors or wait for more to
be freed.
[downey97c]
Allen B. Downey,
“Predicting Queue Times on Space-Sharing Parallel Computers”.
In 11th Intl. Parallel Processing Symp. (IPPS), pp. 209--218, Apr 1997.
Annotation: Suggests a workload model in which runtimes are uniformly distributed in log space. This enables the one to calculate
the distribution of additional runtime conditioned on the
current runtime, and thus how much time will pass until resources
will be freed.
[downey98b]
Allen B. Downey,
“A Parallel Workload Model and Its Implications for Processor Allocation”.
Cluster Computing 1(1), pp. 133--145, 1998.
Annotation: Includes model of speedup as a function of parallelism, and log-uniform models of average parallelism and total work.
[downey99]
Allen B. Downey and Dror G. Feitelson,
“The Elusive Goal of Workload Characterization”.
Performance Evaluation Rev. 26(4), pp. 14--29, Mar 1999.
Annotation: Evidence that workloads are complex and hard to model. For example, using the mean and standard deviation of distributions
is problematic when they are very dispersive, as happens in practice;
also, workloads have internal structure that should not be ignored.
[dutot05]
Pierre-Franccois Dutot, Marco A. S. Netto, Alfredo Goldman, and Fabio Kon,
“Scheduling Moldable BSP Tasks”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Eitan Frachtenberg, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 3834, pp. 157--172, 2005.
[elabdounik03]
Rachid El Abdouni Khayari, Ramin Sadre, and Boudewijn R. Haverkort,
“Fitting World-Wide Web Request Traces with the EM-Algorithm”.
Performance Evaluation 52, pp. 175--191, 2003.
Annotation: Fitting heavy tails to a hyper-exponential distribution so as also to fit the moments.
[england04]
Darin England and Jon B. Weissman,
“Costs and Benefits of Load Sharing in the Computational Grid”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 160--175, 2004.
[ernemann02]
Carsten Ernemann, Volker Hamscher, and Ramin Yahyapour,
“Economic Scheduling in Grid Computing”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 128--152, 2002.
[ernemann03]
Carsten Ernemann, Baiyi Song, and Ramin Yahyapour,
“Scaling of Workload Traces”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 166--182, 2003.
[esbaugh07]
Bryan Esbaugh and Angela C. Sodan,
“Coarse-Grain Time Slicing with Resource-Share Control in Parallel-Job Scheduling”.
In 3rd High Perf. Comput. & Comm., Springer-Verlag, Lect. Notes Comput. Sci. vol. 4782, pp. 30--43, Sep 2007.
Annotation: Vary the time slices at different times and for different job classes to control the allocations.
[feitelson95e]
Dror G. Feitelson and Bill Nitzberg,
“Job Characteristics of a Production Parallel Scientific Workload on the NASA Ames iPSC/860”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 949, pp. 337--360, 1995.
Annotation: Lots of data bout the jobs mix, degrees of parallelism, resource use, classifications into interactive vs. batch or according to time of day,
runtime distributions, arrival process, and user activity.
[feitelson96b]
Dror G. Feitelson,
“Packing Schemes for Gang Scheduling”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 89--110, 1996.
Annotation: Shows that using a buddy system is beneficial for allocating jobs to processors, because then they have a better chance of running in
multiple slots. Includes results of a workload analysis used to
drive the simulation.
[feitelson96c]
Dror G. Feitelson and Larry Rudolph,
“Toward Convergence in Job Schedulers for Parallel Supercomputers”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 1--26, 1996.
Annotation: Contends that one of the problem is lack of precise definition of assumptions and domain. Includes a critique of goals of schedulers
and a classification of jobs types into rigid, moldable, evolving,
and malleable.
[feitelson97a]
Dror G. Feitelson and Morris A. Jette,
“Improved Utilization and Responsiveness with Gang Scheduling”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1291, pp. 238--261, 1997.
Annotation: Shows how gang scheduling can improve not only resopnsiveness but also utilization, despite the overheads, by providing the
scheduler with flexibility that is not possible with only space
slicing, and thus allowing it to reduce fragmentation. This is
backed up by results from the LLNL Cray T3D gang scheduler, which
is described in detail.
[feitelson97b]
Dror G. Feitelson,
“Memory Usage in the LANL CM-5 Workload”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1291, pp. 78--94, 1997.
[feitelson98]
Dror G. Feitelson and Ahuva Mu'alem Weil,
“Utilization and Predictability in Scheduling the IBM SP2 with Backfilling”.
In 12th Intl. Parallel Processing Symp. (IPPS), pp. 542--546, Apr 1998.
Annotation: Shows that a conservative variant of backfilling, in which jobs are moved ahead in the queue provided they do not delay any previously
scheduled job, is as efficient as the more aggressive variant used in
EASY, which just checks that the first job is not delayed.
[feitelson98b]
Dror G. Feitelson and Larry Rudolph,
“Metrics and Benchmarking for Parallel Job Scheduling”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1459, pp. 1--24, 1998.
Annotation: Suggest that a standard workload model is needed as a benchmark for job schedulers. This should include external models (degree of parallelism
and runtime) and internal models (barrier synchronization and granularity).
Points out numerous issues that require additional research.
[feitelson99a]
Dror G. Feitelson and Michael Naaman,
“Self-Tuning Systems”.
IEEE Softw. 16(2), pp. 52--60, Mar/Apr 1999.
Annotation: Suggest tuning of operating system parameters using genetic algorithms to search for optimal values, based on performance
simulations with local workload logs. A case study of batch
scheduling on the Intel iPSC/860 is used.
[feitelson01]
Dror G. Feitelson,
“Metrics for Parallel Job Scheduling and their Convergence”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 188--205, 2001.
Annotation: Different metrics have different convergence properties. For non-preemptive scheduling, response time converges quickly. For preemptive scheduling, bounded
slowdown is better.
[feitelson02b]
Dror G. Feitelson,
“The Forgotten Factor: Facts; on Performance Evaluation and Its Dependence on Workloads”.
In Euro-Par 2002 Parallel Processing, Burkhard Monien and Rainer Feldmann, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2400, pp. 49--60, Aug 2002.
[feitelson02c]
Dror G. Feitelson,
“Workload Modeling for Performance Evaluation”.
In Performance Evaluation of Complex Systems: Techniques and Tools, Maria Carla Calzarossa and Salvatore Tucci, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2459, pp. 114--141, Sep 2002.
[feitelson03a]
Dror G. Feitelson,
“Metric and Workload Effects on Computer Systems Evaluation”.
Computer 36(9), pp. 18--25, Sep 2003.
Annotation: Shows that interactions between the workload, the metric, and the system being analyzed may have a significant impact on evaluation results.
[feitelson04b]
Dror G. Feitelson,
“A Distributional Measure of Correlation”.
InterStat , Dec 2004.
[feitelson05b]
Dror G. Feitelson,
“Experimental Analysis of the Root Causes of Performance Evaluation Results: A Backfilling Case Study”.
IEEE Trans. Parallel & Distributed Syst. 16(2), pp. 175--182, Feb 2005.
Annotation: Details of a triple interaction in which using accurate runtime estimates in a workload model causes schedulers to selectively
bias for or against short jobs, leading to diverging performance
metrics when using average slowdown or runtime.
[feitelson05c]
Dror G. Feitelson,
“On the Scalability of Centralized Control”.
In Workshop on System Management Tools for Large-Scale Parallel Systems, Apr 2005.
Annotation: Claims that centralized control is not so problematic in terms of scalability, because the computational power of the control node
grows exponentially.
[feitelson05d]
Dror G. Feitelson and Ahuva W. Mu'alem,
“On the Definition of ``On-Line'' in Job Scheduling Problems”.
SIGACT News 36(1), pp. 122--131, Mar 2005.
Annotation: On-line also means stability in the sense that the load is spread more or less evenly. This makes scheduling much simpler.
[feitelson06a]
Dror G. Feitelson and Dan Tsafrir,
“Workload Sanitation for Performance Evaluation”.
In IEEE Intl. Symp. Performance Analysis Syst. & Software (ISPASS), pp. 221--230, Mar 2006.
Annotation: Workloads may be multiclass, in which one class is usable and another needs to be filtered out. Examples are abnormal activity
by system personnel and workload flurries.
[feitelson07a]
Dror G. Feitelson,
“Locality of sampling and diversity in parallel system workloads”.
In 21st Intl. Conf. Supercomputing (ICS), pp. 53--63, Jun 2007.
Annotation: Suggests that distributions of workload attributes as seen on short timescales may be quite different from those on
a long timescale. The divergence between the distributions can be
used as a metric to quantify this effect, and localized sampling
(or rather repetitions of a single item) from the global
distribution can be used to generate it.
[feitelson08]
Dror G. Feitelson,
“Looking at Data”.
In 22nd Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2008.
Annotation: Examples of the importance of using real data rather than assumptions.
[feldmann98]
Anja Feldmann and Ward Whitt,
“Fitting Mixtures of Exponentials to Long-Tail Distributions to Analyze Network Performance Models”.
Performance Evaluation 31(3-4), pp. 245--279, Jan 1998.
[ferrari84]
Domenico Ferrari,
“On the Foundations of Artificial Workload Design”.
In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 8--14, Aug 1984.
Annotation: One question is what workload attributes to model, and how to quantify the representativeness of the resulting model.
The answer is that it should lead to the same performence
predictions as the real workload.
The other issue is dynamics: how the workload behaves over time.
The answer is not a random sampling from a population, but rather
a model of moving from one state to another.
These issues are illustrated by modeling of an interactive
computer system.
[ferschweiler01]
Ken Ferschweiler, Mariacarla Calzarossa, Cherri Pancake, Daniele Tessera, and Dylan Keon,
“A Community Databank for Performance Tracefiles”.
In Euro PVM/MPI, Y. Cotronis and J. Dongarra, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2131, pp. 233--240, 2001.
Annotation: Describes a repository for data regarding parallel applications, e.g. MPI calls and other events of interest.
[folling09]
Alexander Fölling, Christian Grimme, Joachim Lepping, and Alexander Papaspyrou,
“Decentralized grid scheduling with evolutionary fuzzy systems”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 16--36, 2009.
[frachtenberg03b]
Eitan Frachtenberg, Dror G. Feitelson, Juan Fernandez, and Fabrizio Petrini,
“Parallel Job Scheduling Under Dynamic Workloads”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 208--227, 2003.
Annotation: Optimizing performance by measuring the effect of parameters such as the scheduling time quantum and the multiprogramming level. Using a low
multiprogramming level is possible if backfilling is used to serve queued
jobs.
[franke99b]
H. Franke, J. Jann, J. E. Moreira, P. Pattnaik, and M. A. Jette,
“An Evaluation of Parallel Job Scheduling for ASCI Blue-Pacific”.
In Supercomputing '99, Nov 1999.
Annotation: Use a workload model fitted to the ASCI workload to evaluate gang scheduling and backfilling. Both are much better than FCFS.
Gang scheduling is better for large jobs, even with a limited
MPL.
[franke06]
Carsten Franke, Joachim Lepping, and Uwe Schwiegelshohn,
“On Avantages of Scheduling using Genetic Fuzzy Systems”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 4376, pp. 68--93, 2006.
[gehring99]
Jörn Gehring and Thomas Preiss,
“Scheduling a Metacomputer With Uncooperative Sub-schedulers”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 179--201, 1999.
Annotation: Derive a model of the expected workload in a meta-computing environment, and use it to evaluate several scheduling algorithms.
The main idea is to interleave meta-scheduling decisions with local
scheduling done by each participating machine.
[ghare99]
Gaurav Ghare and Scott T. Leutenegger,
“The Effect of Correlating Quantum Allocation and Job Size for Gang Scheduling”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 91--110, 1999.
Annotation: By providing longer quanta to rows of the gang scheduling matrix in which many jobs are mapped (i.e. where there is a concentration
of jobs that each use a small number of processors) performance is
improved.
[goh08]
Lee Kee Goh and Bharadwaj Veeravalli,
“Design and Performance Evaluation of Combined First-Fit Task Allocation and Migration Strategies in Mesh Multiprocessor Systems”.
Parallel Comput. 34(9), pp. 508--520, Sep 2008.
[guim09]
Francesc Guim, Ivan Rodero, and Julita Corbalan,
“The resource usage aware backfilling”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 59--79, 2009.
[harcholb97]
Mor Harchol-Balter and Allen B. Downey,
“Exploiting Process Lifetime Distributions for Dynamic Load Balancing”.
ACM Trans. Comput. Syst. 15(3), pp. 253--285, Aug 1997.
Annotation: First, presents evidence that process lifetimes are best modeled by $prob( rm lifetime > t ) propto t^k$, where $k approx -1$, which
means that the probability that a process runs for $t$ sec is about $1/t$,
and a process that already ran for $t$ secs is expected to run for another
$t$. This is then used to choose processes for migration: those whose
current runtime (and hence expected additional runtime) is more than the
migration cost divided by the load difference (and thus are expected to
benefit from the migration).
[harcholb01]
Mor Harchol-Balter, Nikhil Bansal, Bianca Schroeder, and Mukesh Agrawal,
“SRPT Scheduling for Web Servers”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 11--20, 2001.
[heine05]
Felix Heine, Matthias Hoverstadt, Odej Kao, and Achim Streit,
“On the Impact of Reservations from the Grid on Planning-Based Resource Management”.
In ICCS Workshop Grid Computing Security & Resource Management, Lect. Notes Comput. Sci. vol. 3516, pp. 155--162, May 2005.
[hotovy96]
Steven Hotovy,
“Workload Evolution on the Cornell Theory Center IBM SP2”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 27--40, 1996.
Annotation: Traces changes as the load matured after installing the system, and presents some statistics of the mature workload.
[iosup06]
Alexandru Iosup, Dick H. J. Epema, Carsten Franke, Alexander Papaspyrou, Lars Schley, Baiyi Song, and Ramin Yahyapour,
“On Grid Performance Evaluation using Synthetic Workloads”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 4376, pp. 232--255, 2006.
[iosup08]
Alexandru Iosup, Hui Li, Mathieu Jan, Shanny Anoep, Catalin Dumitrescu, Lex Wolters, and Dick H. J. Epema,
“The Grid Workloads Archive”.
Future Generation Comput. Syst. 24(7), pp. 672--686, May 2008.
[islam03]
Mohammad Islam, Pavan Balaji, P. Sadayappan, and D. K. Panda,
“QoPS: A QoS based scheme for Parallel Job Scheduling”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 252--268, 2003.
[jann97]
Joefon Jann, Pratap Pattnaik, Hubertus Franke, Fang Wang, Joseph Skovira, and Joseph Riodan,
“Modeling of Workload in MPPs”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1291, pp. 95--116, 1997.
[jin00]
Shudong Jin and Azer Bestavros,
“Sources and Characteristics of Web Temporal Locality”.
In 8th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 28--35, Aug 2000.
Annotation: Much of temporal locality arrises from skewed popularity: more popular items are requested repeatedly often. But there
are also short-term correlations, which can be seen by focusing
on items that are requested the same number of times.
[kamath02]
Purushotham Kamath, Kun-chan Lan, John Heidemann, Joe Bannister, and Joe Touch,
“Generation of High Bandwidth Network Traffic Traces”.
In 10th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 401--412, Oct 2002.
Annotation: How to scale or merge workload traces to increase the total load.
[kavas01]
Avi Kavas, David Er-El, and Dror G. Feitelson,
“Using Multicast to Pre-load Jobs on the ParPar Cluster”.
Parallel Comput. 27(3), pp. 315--327, Feb 2001.
Annotation: Prealoading executable files by multicasting them generally leads to faster execution than demand loading via NFS.
[kleban03]
Stephen D. Kleban and Scott H. Clearwater,
“Hierarchical Dynamics, Interarrival Times, and Performance”.
In Supercomputing, Nov 2003.
Annotation: Modeling the arrivals of jobs to a supercomputer as the result of a complex hierarchy of decisions by managers, users,
and the scheduler.
[kleineweber11]
Christoph Kleineweber, Axel Keller, Oliver Niehörster, and André Brinkman,
“Rule-Based Mapping of Virtual Machines in Clouds”.
In 19th Euromicro Conf. Parallel, Distrib. & Network-Based Proc., pp. 527--534, Feb 2011.
[klusacek10]
Dalibor Klusávcek and Hana Rudová,
“The Importance of Complete Data Sets for Job Scheduling Simulations”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 6253, pp. 132--153, 2010.
[kotsis97]
Gabriele Kotsis,
“A Systematic Approach for Workload Modeling for Parallel Processing Systems”.
Parallel Comput. 22, pp. 1771--1787, 1997.
Annotation: A shoping list of all that has to be done.
[krallmann99]
Jochen Krallmann, Uwe Schwiegelshohn, and Ramin Yahyapour,
“On the Design and Evaluation of Job Scheduling Algorithms”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 17--42, 1999.
Annotation: Propose a methodology to incorporate complex policies and objective functions into the evaluation of schedulers.
[krevat02]
Elie Krevat, José G. Castaños, and José E. Moreira,
“Job Scheduling for the BlueGene/L System”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 38--54, 2002.
[lawson02]
Barry G. Lawson and Evgenia Smirni,
“Multiple-Queue Backfilling Scheduling with Priorities and Reservations for Parallel systems”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 72--87, 2002.
[lee04]
Cynthia Bailey Lee, Yael schwartzman, Jennifer Hardy, and Allan Snavely,
“Are User Runtime Estimates Inherently Inaccurate?”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 253--263, 2004.
Annotation: It seems that yes, they are.
[lee07]
Cynthia Bailey Lee and Allan E. Snavely,
“Precise and Realistic Utility Functions for User-Centric Performance Analysis of Schedulers”.
In 17th Intl. Symp. High Performance Distributed Comput. (HPDC), Jun 2007.
Annotation: Define synthetic utility functions giving importance to users as function of resopnse time, based on data of how such functions
generally look. Then define a scheduler that uses genetic algorithms
to schedule jobs so as to optimize utility.
[leland86]
W. E. Leland and T. J. Ott,
“Load-Balancing Heuristics and Process Behavior”.
In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 54--69, 1986.
[leland94]
Will E. Leland, Murad S. Taqqu, Walter Willinger, and Daniel V. Wilson,
“On the Self-Similar Nature of Ethernet Traffic”.
IEEE/ACM Trans. Networking 2(1), pp. 1--15, Feb 1994.
Annotation: It is self-similar (bursty) at all time scales, and requires a fractal model rather than common Poisson-related models.
[li04]
Hui Li, David Groep, and Lex Walters,
“Workload Characteristics of a Multi-cluster Supercomputer”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 176--193, 2004.
[li06]
Hui Li, Michael Muskulus, and Lex Wolters,
“Modeling Job Arrivals in a Data-Intensive Grid”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 4376, pp. 210--231, 2006.
[lingrand09]
Diane Lingrand, Johan Montagnat, Janusz Martyniak, and David Colling,
“Analyzing the EGEE production grid workload: application to jobs submission optimization”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 37--58, 2009.
Annotation: Detailed analysis of EGEE data, and in particular job failures and resubmission strategies of users.
[liuz10]
Zhuo Liu, Aihua Liang, and Limin Xiao,
“A Parallel Workload Model and its Implications for Maui Scheduling Policies”.
In 2nd Intl. Conf. Comput. Modeling & Simulation, pp. 384--389, Jan 2010.
[liy07]
Yawei Li, Prashasta Gujrati, Zhiling Lan, and Xian-he Sun,
“Fault-Driven Re-Scheduling for Improving System-Level Fault Resilience”.
In Intl. Conf. Parallel Processing (ICPP), Sep 2007.
[lo98]
Virginia Lo, Jens Mache, and Kurt Windisch,
“A Comparative Study of Real Workload Traces and Synthetic Workload Models for Parallel Job Scheduling”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1459, pp. 25--46, 1998.
[lublin03]
Uri Lublin and Dror G. Feitelson,
“The Workload on Parallel Supercomputers: Modeling the Characteristics of Rigid Jobs”.
J. Parallel & Distributed Comput. 63(11), pp. 1105--1122, Nov 2003.
[medernach05]
Emmanuel Medernach,
“Workload Analysis of a Cluster in a Grid Environment”.
In Job Scheduling Strategies for Parallel Processing, Jun 2005.
[minh09]
Tran Ngoc Minh and Lex Wolters,
“Modeling parallel system workloads with temporal locality”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 101--115, 2009.
[minh11]
Tran Ngoc Minh and Lex Walters,
“Towards a Profound Analysis of Bags-of-Tasks in Parallel Systems and Their Performance Impact”.
In 20th Intl. Symp. High Performance Distributed Comput. (HPDC), pp. 111-122, Jun 2011.
[mualem01]
Ahuva W. Mu'alem and Dror G. Feitelson,
“Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”.
IEEE Trans. Parallel & Distributed Syst. 12(6), pp. 529--543, Jun 2001.
Annotation: Demonstrates that the relative performance of EASY backfilling and conservative backfilling depends on the workload (both for models
and for actual logs) and on the metric used (response time or slowdown).
Also shows user runtime estimates to be inaccurate, but this turns
out to be beneficial for performance.
[nguyen96b]
Thu D. Nguyen, Raj Vaswani, and John Zahorjan,
“Parallel Application Characterization for Multiprocessor Scheduling Policy Design”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 175--199, 1996.
Annotation: Speedup characteristics and the possibility of assessing them at runtime on a dynamically partitioned system.
[parsons95]
Eric W. Parsons and Kenneth C. Sevcik,
“Multiprocessor Scheduling for High-Variability Service Time Distributions”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 949, pp. 127--145, 1995.
Annotation: Propose that preemptions be used to improve average response time.
[pascual09]
Jose Antonio Pascual, Javier Navaridas, and Jose Miguel-Alonso,
“Effects of topology-aware allocation policies on scheduling performance”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 138--156, 2009.
[peris94]
Vinod G. J. Peris, Mark S. Squillante, and Vijay K. Naik,
“Analysis of the Impact of Memory in Distributed Parallel Processing Systems”.
In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 5--18, May 1994.
Annotation: Investigates the tradeoff between improved efficiency when contending jobs are each allocated less processors, and increased paging overhead
due to insufficient memory when less processors are used. The result is
that the paging overhead dominates, so partition sizes should not be
decreased to the point where the application's dataset does not fit
into memory. A stochastic model of paging behavior as an application
moves from one locality to another is introduced.
[petrini03]
Fabrizio Petrini, Eitan Frachtenberg, Adolfy Hoisie, and Salvador Coll,
“Performance Evaluation of the Quadrics Interconnection Network”.
Cluster Comput. 6(2), pp. 1125--142, Apr 2003.
Annotation: A large number of micro-benchmarks with various communication patterns.
[ranjan06]
Rajiv Ranjan, Aaron Harwood, and Rajkumar Buyya,
“SLA-Based Coordinated Superscheduling Scheme for Computational Grids”.
In 8th IEEE Intl. Conf. Cluster Computing, Sep 2006.
[ranjan08]
Rajiv Ranjan, Aaron Harwood, and Rajkumar Buyya,
“A Case for Cooperative and Incentive-Based Federation of Distributed Clusters”.
Future Generation Comput. Syst. 24, pp. 280--295, 2008.
[rosti02]
Emilia Rosti, Giuseppe Serazzi, Evgenia Smirni, and Mark S. Squillante,
“Models of Parallel Applications with Large Computation and I/O Requirements”.
IEEE Trans. Softw. Eng. 28(3), pp. 286--307, Mar 2002.
[sabin03]
Gerald Sabin, Rajkumar Kettimuthu, Arun Rajan, and Ponnuswamy Sadayappan,
“Scheduling of Parallel Jobs in a Heterogeneous Multi-Site Environment”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 87--104, 2003.
[sabin05]
Gerald Sabin and P. Sadayappan,
“Unfairness Metrics for Space-Sharing Parallel Job Schedulers”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Eitan Frachtenberg, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 3834, pp. 238--256, 2005.
[sabin06]
Gerald Sabin, Matthew Lang, and P. Sadayappan,
“Moldable Parallel Job Scheduling Using Job Efficiency: An Iterative Approach”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 4376, pp. 94--114, 2006.
[schroeder04]
Bianca Schroeder and Mor Harchol-Balter,
“Evaluation of Task Assignment Policies for Supercomputing Servers: The Case for Load Unbalancing and Fairness”.
Cluster Computing 7(2), pp. 151--161, Apr 2004.
Annotation: When assigning tasks to hosts, it is best to divide them into groups by size, but the total load imposed by the different
groups should not be equal; rather, it should lead to equal
average slowdowns (which is defined to be fair).
[schwiegelshohn98b]
Uwe Schwiegelshohn and Ramin Yahyapour,
“Improving First-Come-First-Serve Job Scheduling by Gang Scheduling”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1459, pp. 180--198, 1998.
Annotation: Actually a combination of two schemes: jobs that need few processors are scheduled FCFS, while wide (i.e. highly parallel) jobs may preempt
current jobs, but only one such wide job can run at a time, and only
for a limited fraction of the time, so as not to degrade resopnse time
for the smaller jobs that were interrupted.
[sevcik89]
Kenneth C. Sevcik,
“Characterization of Parallelism in Applications and their use in Scheduling”.
In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 171--180, May 1989.
Annotation: Defines the shape of a parallel program to be the proportion of the time that the program has various degrees of
parallelism. This is used to derive parameters of the program,
e.g. average, minimum, and maximum parallelism, which are used
to guide the partitioning of a machine among competing programs.
Using more parameters results in better performance.
[sevcik94]
K. C. Sevcik,
“Application Scheduling and Processor Allocation in Multiprogrammed Parallel Processing Systems”.
Performance Evaluation 19(2-3), pp. 107--140, Mar 1994.
Annotation: Choose partition size and scheduling order based on information about the applications, especially their total work and how the speedup changes
with the number of allocated processors.
[shah10]
Syed Nasir Mehmood Shah, Ahmad Kamil Bin Mahmood, and Alan Oxley,
“Analysis and evaluation of grid scheduling algorithms using real workload traces”.
In Intl. Conf. Mgmt. Emergent Digital EcoSystems, pp. 234--239, Oct 2010.
[sherwood02]
Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder,
“Automatically Characterizing Large Scale Program Behavior”.
In 10th Intl. Conf. Architect. Support for Prog. Lang. & Operating Syst. (ASPLOS), pp. 45--57, Oct 2002.
Annotation: Use clustering to identify phases in the program and a representative slice of each one.
[shmueli03]
Edi Shmueli and Dror G. Feitelson,
“Backfilling with Lookahead to Optimize the Performance of Parallel Job Scheduling”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 228--251, 2003.
Annotation: Using dynamic programming to find a set of jobs that together maximize utilization, rather than considering the jobs one at a time.
[shmueli05]
Edi Shmueli and Dror G. Feitelson,
“Backfilling with Lookahead to Optimize the Packing of Parallel Jobs”.
J. Parallel & Distributed Comput. 65(9), pp. 1090--1107, Sep 2005.
Annotation: Using dynamic programming to find a set of jobs that together maximize utilization, rather than considering the jobs one at a time.
[shmueli06]
Edi Shmueli and Dror G. Feitelson,
“Using site-level modeling to evaluate the performance of parallel system schedulers”.
In 14th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 167--176, Sep 2006.
Annotation: Feedback from system performance affects the submittal of additional work by the users, so a trace collected from a
system with one scheduler may not be suitable for simulation
of another scheduler that has different performance.
[shmueli07]
Edi Shmueli and Dror G. Feitelson,
“Uncovering the Effect of System Performance on User Behavior from Traces of Parallel Systems”.
In 15th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 274--280, Oct 2007.
Annotation: User session lengths are correlated with response times: a long response time is a good predictor for a session end. Slowdown is not
such a good predictor.
[shmueli09]
Edi Shmueli and Dror G. Feitelson,
“On Simulation and Design of Parallel-Systems Schedulers: Are We Doing the Right Thing?”.
IEEE Trans. Parallel & Distributed Syst. 20(7), pp. 983--996, Jul 2009.
Annotation: User behavior induces a feedback cycle from system performance to the generation of additional jobs, and this has to be taken into account in evaluations and can
also lead to improved designs.
[singh82]
Ajay Singh and Zary Segall,
“Synthetic Workload Generation for Experimentation with Multiprocessors”.
In 3rd Intl. Conf. Distributed Comput. Syst. (ICDCS), pp. 778--785, Oct 1982.
Annotation: Exalts the use of synthetic workloads: they are more flexible, controllable, and adjustable than real workloads.
The paper is about the B language, which allows for a very
detailed description of task graphs that can be used to
represent parallel applications.
[smirni97]
Evgenia Smirni and Daniel A. Reed,
“Workload Characterization of Input/Output Intensive Parallel Applications”.
In 9th Intl. Conf. Comput. Performance Evaluation, Springer-Verlag, Lect. Notes Comput. Sci. vol. 1245, pp. 169--180, Jun 1997.
Annotation: Measurements of 3 applications from the SIO initiative, with data on numbers, sizes, and times of I/O requests.
Main results are that distribution of accesses is very modal (e.g. few
distinct sizes), and dominant patterns are sequential and interleaved.
[smith98]
Warren Smith, Ian Foster, and Valerie Taylor,
“Predicting Application Run Times Using Historical Information”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1459, pp. 122--142, 1998.
Annotation: The main contribution is using a search technique to define the templates used to judge whether or not jobs are similar to each
other, and one should be used in the prediction for the other.
[smith99]
Warren Smith, Valerie Taylor, and Ian Foster,
“Using Run-Time Predictions to Estimate Queue Wait Times and Improve Scheduler Performance”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 202--219, 1999.
Annotation: Better knowledge about the current workload allows for optimizations in the scheduler. The knowledge is obtained by collecting and analyzing
historical data about job executions.
[sodan09]
Angela C. Sodan,
“Adaptive scheduling for QoS virtual machines under different resource allocation--Performance Effects and Predictability”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 259--279, 2009.
[sodan10]
Angela C. Sodan and Wei Jin,
“Backfilling with Fairness and Slack for Parallel Job Scheduling”.
In J. Physics: Conf. Ser., High Perf. Comput. Symp., vol. 256 2010.
[sodan11]
A. C. Sodan,
“Service Control with the Preemptive Parallel Job Scheduler Scojo-PECT”.
Cluster Comput. 14(2), pp. 165--182, Jun 2011.
[song04]
Baiyi Song, Carsten Ernemann, and Ramin Yahyapour,
“Parallel Computer Workload Modeling with Markov Chains”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 47--62, 2004.
[song05]
Shanshan Song, Kai Hwang, and Yu-Kwong Kwok,
“Trusted Grid Computing with Security Binding and Trust Integration”.
J. Grid Computing 3(1-2), pp. 53--73, Jun 2005.
Annotation: Uses a trust model based on fuzzy logic.
[sotomayor06]
Borja Sotomayor, Kate Keahey, and Ian Foster,
“Overhead Matters: A Model for Virtual Resource Management”.
In 2nd Workshop Virtualization Technology in Distributed Comput., Nov 2006.
Annotation: Overhead for virtualization should be scheduled rather than being deducted from a user's account.
[squillante99]
Mark S. Squillante, David D. Yao, and Li Zhang,
“The Impact of Job Arrival Patterns on Parallel Scheduling”.
Performance Evaluation Rev. 26(4), pp. 52--59, Mar 1999.
Annotation: Show that the interarrival time is more heavy-tailed than the exponential distribution often assumed, and that this leads to
lower performance.
[squillante99b]
Mark S. Squillante, David D. Yao, and Li Zhang,
“Analysis of Job Arrival Patterns and Parallel Scheduling Performance”.
Performance Evaluation 36--37, pp. 137--163, 1999.
Annotation: Show that the interarrival time is more heavy-tailed than the exponential distribution often assumed, and that this leads to
lower performance.
[srinivasan02]
Srividya Srinivasan, Rajkumar Kettimuthu, Vijay Subramani, and Ponnuswamy Sadayappan,
“Selective reservation Strategies for Backfill Job Scheduling”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 55--71, 2002.
Annotation: Instead of deciding in advance how many reservations to make (e.g. 1 in EASY or for all jobs in conservative), decide
on-the-fly and make reservations only for jobs that are
delayed too much.
[streit02]
Achim Streit,
“A Self-Tuning Job Scheduler Family with Dynamic Policy Switching”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 1--23, 2002.
Annotation: At each scheduling step try FCFS, SJF, and LJF, and choose the one the seems to generate the best schedule.
[streit04]
Achim Streit,
“Enhancements to the Decision Process of the Self-Tuning dynP Scheduler”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 63--80, 2004.
[talby99a]
David Talby and Dror G. Feitelson,
“Supporting priorities and improving utilization of the IBM SP scheduler using slack-based backfilling”.
In 13th Intl. Parallel Processing Symp. (IPPS), pp. 513--517, Apr 1999.
Annotation: Suggests slack-based backfilling, in which the scheduler does not commit to a tight schedule in order to leave space for future
improvements.
[talby99b]
David Talby, Dror G. Feitelson, and Adi Raveh,
“Comparing Logs and Models of Parallel Workloads Using the Co-Plot Method”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 43--66, 1999.
Annotation: A statistical analysis of the similarity of various workload logs and models, showing that the models fall in the same space as the
original logs, but parameterization is needed to capture the specific
characteristics of any one log.
[talby05]
David Talby and Dror G. Feitelson,
“Improving and Stabilizing Parallel Computer Performance Using Adaptive Scheduling”.
In 19th Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2005.
Annotation: Simulate the behavior of several schedulers (specifically, variants of backfilling) in the background, and periodically switch to the one that
gives the best performance results. One of the results is that it is better
to use only few candidates, because even a relatively weak scheduler occasionally
wins but then typically fails to deliver in the next period.
[talby07]
David Talby, Dror G. Feitelson, and Adi Raveh,
“A Co-Plot Analysis of Logs and Models of Parallel Workloads”.
ACM Trans. Modeling & Comput. Simulation 12(3), Jul 2007.
Annotation: A statistical analysis of the similarity of various workload logs and models, showing that the models fall in the same space as the
original logs, but parameterization is needed to capture the specific
characteristics of any one log. Two dimensions seem to suffice.
[tang10]
Wei Tang, Narayan Desai, Daniel Buettner, and Zhiling Lan,
“Analyzing and Adjusting User Runtime Estimates to Improve Job Scheduling on Blue Gene/P”.
In 24th Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2010.
Annotation: Which part of the scheudling is most sensitive to inaccurate runtime estimates, and how to improve it using historical data.
[tang11]
Wei Tang, Zhiling Lan, Narayan Desai, Daniel Buettner, and Yongen Yu,
“Reducing Fragmentation on Torus-Connected Supercomputers”.
In Intl. Parallel & Distributed Processing Symp. (IPDPS), May 2011.
[thebe09]
Ojaswirajanya Thebe, David P. Bunde, and Vitus J. Leung,
“Scheduling restartable jobs with short test runs”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 116--137, 2009.
[tsafrir05b]
Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson,
“Modeling User Runtime Estimates”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Eitan Frachtenberg, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 3834, pp. 1--35, 2005.
Annotation: A detailed model of the distribuion of runtime estimates (it is modal, with around 20 values accounding for the vast majority), and how to assign estimates
to jobs in a way that mimics the estimates of real users.
[tsafrir06a]
Dan Tsafrir and Dror G. Feitelson,
“Instability in Parallel Job Scheduling Simulation: The Role of Workload Flurries”.
In 20th Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2006.
Annotation: A butterfly effect: a small change in one job causes a flurry (large sequence of similar jobs by the same user) to be scheduled differently leading to a big effect
on the perceived overall performance.
[tsafrir06b]
Dan Tsafrir and Dror G. Feitelson,
“The dynamics of backfilling: solving the mystery of why increased inaccuracy may help”.
In IEEE Intl. Symp. Workload Characterization (IISWC), pp. 131--141, Oct 2006.
Annotation: Inaccurate estimates don't really improve performance; they just nudge the system towards a schedule that is more similar to SJF.
[tsafrir07a]
Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson,
“Backfilling Using System-Generated Predictions Rather Than User Runtime Estimates”.
IEEE Trans. Parallel & Distributed Syst. 18(6), pp. 789--803, Jun 2007.
Annotation: Suggests using the average of the last two jobs by the same user as a runtime prediction. requires updating the prediction when new information
becomes available, e.g. when the job runs longer than the previous prediction.
[tsafrir07b]
Dan Tsafrir, Keren Ouaknine, and Dror G. Feitelson,
“Reducing Performance Evaluation Sensitivity and Variability by Input Shaking”.
In 15th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 231--237, Oct 2007.
Annotation: Run the simulation many (e.g. hundreds) of times with slightly different input, to see the distribution of results.
[tsafrir10]
Dan Tsafrir,
“Using Inaccurate Estimates Accurately”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer, Lect. Notes Comput. Sci. vol. 6253, pp. 208--221, 2010.
[vandenbossche11]
Ruben Van den Bossche, Kurt Vanmechelen, and Jan Broeckhove,
“An Evaluation of the Benefits of Fine-Grained Value-Based Scheduling on General Purpose Clusters”.
Future Generation Comput. Syst. 27(1), pp. 1--9, Jan 2011.
[verma08]
Akshat Verma, Puneet Ahuja, and Anindya Neogi,
“Power-Aware Dynamic Placement of HPC Applications”.
In 22nd Intl. Conf. Supercomputing (ICS), pp. 175--184, Jun 2008.
[vetter03]
Jeffrey S. Vetter and Frank Mueller,
“Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures”.
J. Parallel & Distributed Comput. 63(9), pp. 853--865, Sep 2003.
Annotation: A study of 5 applications finds consistent usage of collective communications with small payloads, usage
of non-blobking communications, and different
granularities.
[wan96]
Michael Wan, Reagan Moore, George Kremenek, and Ken Steube,
“A Batch Scheduler for the Intel Paragon with a Non-Contiguous Node Allocation Algorithm”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 48--64, 1996.
Annotation: Nodes are partitioned into sets according to attributes (e.g. amount of memory), and sets are grouped into groups that are used to
schedule jobs from different queues. The allocation is done by a
modified 2-D buddy system. Statistics of the workload on the Paragon
at San-Diego Supercomputer center are given.
[windisch96]
Kurt Windisch, Virginia Lo, Reagan Moore, Dror Feitelson, and Bill Nitzberg,
“A Comparison of Workload Traces from Two Production Parallel Machines”.
In 6th Symp. Frontiers Massively Parallel Comput., pp. 319--326, Oct 1996.
Annotation: Analysis of the workloads on the NASA Ames iPSC/860 and the SDSC Paragon.
[wiseman03]
Yair Wiseman and Dror G. Feitelson,
“Paired Gang Scheduling”.
IEEE Trans. Parallel & Distributed Syst. 14(6), pp. 581--592, Jun 2003.
Annotation: Run pairs of complementary jobs in each time slice, so as to better utilize resources (specifically, I/O and CPU).
[yeo05]
Chee Shin Yeo and Rajkumar Buyya,
“Service Level Agreement Based Allocation of Cluster Resources: Handling Penalty to Enhance Utility”.
In IEEE Intl. Conf. Cluster Comput., Sep 2005.
[yeo10]
Chee Shin Yeo, Srikumar Venugopal, Xingchen Chua, and Rajkumar Buyya,
“Autonomic metered pricing for a utility computing service”.
Future Generation Comput. Syst. 26(8), pp. 1368--1380, Oct 2010.
[yuan11]
Yulai Yuan, Yongwei Wu, Qiuping Wang, Guangwen Yang, and Weimin Zheng,
“Job Failures in High Performance Computing Systems: A Large-Scale Empirical Study”.
Comput. & Math. with Applications 62, 2011.
[zeng09]
Xijie Zeng and Angela C. Sodan,
“Job scheduling with lookahead group matchmaking for time/space sharing on multi-core parallel machines”.
In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 232--258, 2009.
[zhang01]
Y. Zhang, H. Franke, J. E. Moreira, and A. Sivasubramaniam,
“An Integrated Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling, and Migration”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 133--158, 2001.
Annotation: Combining all three is best.
[zhang03a]
Yanyong Zhang, Hubertus Franke, Jose Moreira, and Anand Sivasubramaniam,
“An Integrated Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling, and Migration”.
IEEE Trans. Parallel & Distributed Syst. 14(3), pp. 236--247, Mar 2003.
Annotation: Combining all three is best.
[zhang03b]
Yanyong Zhang, Antony Yang, Anand Sivasubramaniam, and Jose Moreira,
“Gang Scheduling Extensions for I/O Intensive Workloads”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 183--207, 2003.
Annotation: Co-locating a job's processes with partitions of the files it uses.
[zilber05]
Julia Zilber, Ofer Amit, and David Talby,
“What is Worth Learning from Parallel Workloads? A User and Session Based Analysis”.
In 19th Intl. Conf. Supercomputing (ICS), pp. 377--386, Jun 2005.
[zheng11]
Ziming Zheng, Li Yu, Wei Tang, Zhiling Lan, Rinku Gupta, Narayan Desai, Susan Coghlan, and Daniel Buettner,
“Co-Analysis of RAS Log and Job Log on Blue Gene/P”.
In Intl. Parallel & Distributed Processing Symp. (IPDPS), May 2011.
[zhou00]
Bing Bing Zhou, David Welsh, and Richard P. Brent,
“Resource Allocation Schemes for Gang Scheduling”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 1911, pp. 74--86, 2000.
Annotation: Shows that re-packing and running jobs in multiple slots when possible improve the performance of gang scheduling.
[zhou01]
B. B. Zhou and R. P. Brent,
“On the Development of an Efficient Coscheduling System”.
In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 103--115, 2001.
[zotkin99]
Dmitry Zotkin and Peter J. Keleher,
“Job-Length Estimation and Performance in Backfilling Schedulers”.
In 8th Intl. Symp. High Performance Distributed Comput. (HPDC), Aug 1999.