Bibliographical Listing on Parallel Workloads

With few exceptions, the following listing contains papers dealing specifically with workloads on parallel machines, or using data from the Parallel Workloads Archive. It does not cover scheduling, but papers that discuss workloads used to evaluate scheduling are included.

Please send comments, abstracts, annotations, and additional references to feit@cs.huji.ac.il.


download a BibTeX file of this listing.
For a more complete bibliographical listing on workload modeling, see the bibliography of the Workload Modeling book.
[ababneh12]
Ismail Ababneh, Saad Bani-Mohammad, Wail Mardini, Hilal Alawneh, and Mohammadj Hamed, “The Effect of Communication on the Performance of Allocation Request Shape Changes in 2D Mesh-Connected Multicomputers”. Intl. J. Parallel, Emergent & Distributed Syst. 27(5), pp. 409--433, Oct 2012.

[abbes10]
Heithem Abbes, Christophe Cérin, and Mohamed Jemni, “A Decentralized and Fault-Tolerant Desktop Grid System for Distributed Applications”. Concurrency & Computation -- Pract. & Exp. 22(3), pp. 261--277, Mar 2010.

[agmonby11]
Orna Agmon Ben-Yehuda, Muli Ben-Yehuda, Assaf Schuster, and Dan Tsafrir, “Deconstructing Amazon EC2 Spot Instance Pricing”. In 3rd Intl. Conf. Cloud Computing Tech. & Sci., pp. 304--311, Nov 2011.

[aida00]
Kento Aida, “Effect of Job Size Characteristics on Job Scheduling Performance”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Conput. Sci. vol. 1911, pp. 1--17, 2000.
Annotation: Shows that certain priority-based scheduling schemes are sensitive to job sizes, as they may determine the priority.

[aida09]
Kento Aida and Henri Casanova, “Scheduling Mixed-Parallel Applications with Advance Reservations”. Cluster Comput. 12(2), pp. 205--220, Jun 2009.

[almeida96]
Virgílio Almeida, Azer Bestavros, Mark Crovella, and Adriana de Oliveira, “Characterizing Reference Locality in the WWW”. In Parallel & Distributed Inf. Syst., pp. 92--103, Dec 1996.
Annotation: On temporal locality, which is related to popularity, and spatial locality, which has fractal properties.

[amar08a]
Lior Amar, Ahuva Mu'alem, and Jochen Stößer, “On the Importance of Migration for Fairness in Online Grid Markets”. In 9th IEEE Grid Comput. Conf., pp. 65--74, Sep 2008.

[amar08b]
Lior Amar, Ahuva Mu'alem, and Jochen Stößer, “The Power of Preemption in Economic Online Markets”. In 5th Grid Economics and Business Models, Springer-Verlag, Lect. Notes Comput. Sci. vol. 5206, pp. 41--57, 2008.
Annotation: Preemption improves both theoretical bounds and empirical simulations of scheduling in a non-cooperative setting.

[aridor04]
Yariv Aridor, Tamar Domany, Oleg Goldshmidt, Edi Shmueli, Jose Moreira, and Larry Stockmeier, “Multi-Toroidal Interconnects: Using Additional Communication Links to Improve Utilization of Parallel Computers”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 144--159, 2004.
Annotation: The (few) extra links help reduce allocation constraints.

[bacso14]
Gábor Bacsó, Ádám Visegrádi, Attila Kertesz, and Zsolt Németh, “On Efficiency of Multi-Job Grid Allocation Based on Statistical Trace Data”. J. Grid Comput. , 2014.

[bansal01]
Nikhil Bansal and Mor Harchol-Balter, “Analysis of SRPT Scheduling: Investigating Unfairness”. In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 279--290, Jun 2001.
Annotation: With heavy tailed runtimes, few jobs account for a large part of the load. So starving them (as happens with SRPT under overload) actually lets most of the jobs run well; trying to be fair to all (e.g. with processor sharing) ends up being bad for all.

[barsanti06]
Lawrence Barsanti and Angela C. Sodan, “Adaptive Job Scheduling via Predictive Job Resource Allocation”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 4376, pp. 115--140, 2006.

[batat00]
Anat Batat and Dror G. Feitelson, “Gang Scheduling with Memory Considerations”. In 14th Intl. Parallel & Distributed Processing Symp. (IPDPS), pp. 109--114, May 2000.
Annotation: Using addmission control to prevent memory overload. Includes how to estimate memory usage based on historical information.

[bender05]
Michael A. Bender, David P. Bunde, Erik D. Demaine, Sándor P. Fekete, Vitus J. Leung, Hank Meijer, and Cynthia A. Phillips, “Communication-Aware Processor Allocation for Supercomputers”. In 9th Workshop Algorithms & Data Structures, Springer-Verlag, Lect. Notes Comput. Sci. vol. 3608, pp. 169--181, Aug 2005.

[brevik06]
John Brevik, Daniel Nurmi, and Rich Wolski, “Predicting Bounds on Queuing Delay for Batch-Scheduled Parallel Machines”. In 11th Symp. Principles & Practice of Parallel Programming (PPoPP), pp. 110--118, Mar 2006.

[buyya09]
Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, and Ivona Brandic, “Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility”. Future Generation Comput. Syst. 25(6), pp. 599--616, Jun 2009.

[calzarossa93]
Maria Calzarossa and Giuseppe Serazzi, “Workload Characterization: A Survey”. Proc. IEEE 81(8), pp. 1136--1150, Aug 1993.

[calzarossa95]
Maria Calzarossa, Luisa Massari, Alessandro Merlo, Mario Pantano, and Daniele Tessera, “Medea: A Tool for Workload Characterization of Parallel Systems”. IEEE Parallel & Distributed Technology 3(4), pp. 72--80, Winter 1995.
Annotation: Medea provides various services such as clustering and curve fitting that can be applied to trace data.

[calzarossa95b]
M. Calzarossa, G. Haring, G. Kotsis, A. Merlo, and D. Tessera, “A Hierarchical Approach to Workload Characterization for Parallel Systems”. In High-Performance Computing and Networking, Springer-Verlag, Lect. Notes Comput. Sci. vol. 919, pp. 102--109, May 1995.
Annotation: Model the internal structure of the workload at levels of applications, algorithms, and routines.

[calzarossa00]
Maria Calzarossa, Luisa Massari, and Daniele Tessera, “Workload Characterization Issues and Methodologies”. In Performance Evaluation: Origins and Directions, Günter Haring, Christoph Lindemann, and Martin Reiser, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1769, pp. 459--482, 2000.

[cao04]
Junwei Cao and Falk Zimmermann, “Queue Scheduling and Advance Reservations with COSY”. In 18th Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2004.

[cao14]
Zhibo Cao and Shoubin Dong, “Based on User-Trust to Estimate User Runtime in Backfilling”. J. Comput. Inf. Syst. 10(10), pp. 4443--4450, May 2014.
Annotation: Adjust user estimates using a factor that reflects the accuracy of previous estimates.

[carvalho12]
Marcus Carvalho and Francisco Brasileiro, “A User-Based Model of Grid Computing Workloads”. In 13th Intl. Conf. Grid Computing, pp. 40--48, Sep 2012.

[chapin99b]
Steve J. Chapin, Walfredo Cirne, Dror G. Feitelson, James Patton Jones, Scott T. Leutenegger, Uwe Schwiegelshohn, Warren Smith, and David Talby, “Benchmarks and Standards for the Evaluation of Parallel Job Schedulers”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 67--90, 1999.
Annotation: Discusses what should be included in workload models that serve as benchmarks for evaluation of scheduling schemes, suggests a standard format, and looks at extensions for metacomputing.

[chen13]
Junliang Chen, Bing Bing Zhou, Chen Wang andPeng Lu, Penghao Wang, and Albert Y. Zomaya, “Throughput Enhancement Through selective Time Sharing and Dynamic Grouping”. In 27th Intl. Parallel & Distributed Processing Symp. (IPDPS), pp. 1183--1192, May 2013.

[chiang94]
Su-Hui Chiang, Rajesh K. Mansharamani, and Mary K. Vernon, “Use of Application Characteristics and Limited Preemption for Run-To-Completion Parallel Processor Scheduling Policies”. In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 33--44, May 1994.
Annotation: Shows that adaptive partitioning is better than static partitioning based on application characteristics such as the average parallelism. However, a maximum partition size (which decreases with load) should be imposed.

[chiang96]
Su-Hui Chiang and Mary K. Vernon, “Dynamic vs. Static Quantum-Based Parallel Processor Allocation”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 200--223, 1996.
Annotation: Equipartitioning is shown to be better than using the processor working set combined with preemptions and multi-feedback queues. It is improved even more if the processor working set is used as the minimal allocation.

[chiang01]
Su-Hui Chiang and Mary K. Vernon, “Characteristics of a Large Shared Memory Production Workload”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 159--187, 2001.
Annotation: Detailed study of the workload on the NCSA Origin 2000.

[chiang02]
Su-Hui Chiang, Andrea Arpaci-Dusseau, and Mary K. Vernon, “The Impact of More Accurate Requested Runtimes on Production Job Scheduling Performance”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 103--127, 2002.

[cirne00]
Walfredo Cirne and Francine Berman, “Adaptive Selection of Partition Size for Supercomputer Requests”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1911, pp. 187--207, 2000.
Annotation: Describes SA, the AppLes application scheduler, that chooses among several partition sizes based on knowledge about the application and the system state.

[cirne01]
Walfredo Cirne and Francine Berman, “A Model for Moldable Supercomputer Jobs”. In 15th Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2001.
Annotation: A model for turning a rigid job into a moldable one, i.e. to create a set of optional partition sizes. Based on results of a user survey regarding what users really need; for example, they don't necessarily need power of 2 sizes.

[cirne01b]
Walfredo Cirne and Francine Berman, “A Comprehensive Model of the Supercomputer Workload”. In 4th Workshop on Workload Characterization, pp. 140--148, Dec 2001.

[crovella01]
Mark E. Crovella, “Performance Evaluation with Heavy Tailed Distributions”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 1--10, 2001.
Annotation: Concise overview of what heavy tails are, how they affect performance evaluation, and possible impact on system design.

[cypher96]
Robert Cypher, Alex Ho, Smaragda Konstantinidou, and Paul Messina, “A Quantitative Study of Parallel Scientific Applications with Explicit Communication”. J. Supercomput. 10(1), pp. 5--24, 1996.
Annotation: A study of 8 SPMD scientific applications, implemented on the Touchstone Delta and on nCUBE machines. The main result is that they all require a lot of Flops, but there is a great veriety in requirements for memory, I/O, and communication bandwidth. Scaling equations are given.

[dasilva00]
Fabricio Alves Barbosa da Silva and Isaac D. Scherson, “Improving Parallel Job Scheduling Using Runtime Measurements”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1911, pp. 18--38, 2000.
Annotation: Uses fuzzy sets to classify jobs as I/O, computing, or communicating, and use this to decide whether jobs require gang scheduling.

[deb13]
Budhaditya Deb, Mohak Shah, Scott Evans, Manoj Mehta, Anthony Gargulak, and Tom Lasky, “Towards Systems Level Prognostics in the Cloud”. In IEEE Conf. Prognostics & Health Mgmt., Jun 2013.

[deng13]
Kefeng Deng, Junqiang Song, Kaijun Ren, and Alexandru Iosup, “Exploring Protfolio Scheduling for Long-Term Execution ofScientific Workloads in IaaS Clouds”. In Supercomputing, Nov 2013.

[deng13b]
Kefeng Deng, Ruben Verboon, Kaijun Ren, and Alexandru Iosup, “A Periodic Portfolio Scheduler for Scientific Computing in the Data Center”. In Job Scheduling Strategies for Parallel Processing, Narayan Desai and Walfredo Cirne, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 8429, pp. 156--176, 2013.

[di12]
Sheng Di, Derrick Kondo, and Walfredo Cirne, “Characterization and Comparison of Cloud versus Grid Workloads”. In Intl. Conf. Cluster Comput., pp. 230--238, Sep 2012.

[di14]
Sheng Di, Derrick Kondo, and Franck Cappello, “Characterizing and Modeling Cloud Applications/Jobs on a Google Data Center”. J. Supercomput. , 2014.

[dorier14]
Atthieu Dorier, Gabriel Antoniu, Rob Ross, Dries Kimpe, and Shadi Ibrahim, “CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination”. In 28th Intl. Parallel & Distributed Processing Symp. (IPDPS), pp. 155--164, May 2014.

[downey97a]
Allen B. Downey, “Using Queue Time Predictions for Processor Allocation”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1291, pp. 35--57, 1997.
Annotation: First, derives a workload model from which the remaining execution time can be estimated conditioned on the time already running. This is used to estimate when running jobs will terminate, and free processors for queued jobs. And this is used to decide whether to use available processors or wait for more to be freed.

[downey97c]
Allen B. Downey, “Predicting Queue Times on Space-Sharing Parallel Computers”. In 11th Intl. Parallel Processing Symp. (IPPS), pp. 209--218, Apr 1997.
Annotation: Suggests a workload model in which runtimes are uniformly distributed in log space. This enables the one to calculate the distribution of additional runtime conditioned on the current runtime, and thus how much time will pass until resources will be freed.

[downey98b]
Allen B. Downey, “A Parallel Workload Model and Its Implications for Processor Allocation”. Cluster Comput. 1(1), pp. 133--145, 1998.
Annotation: Includes model of speedup as a function of parallelism, and log-uniform models of average parallelism and total work.

[downey99]
Allen B. Downey and Dror G. Feitelson, “The Elusive Goal of Workload Characterization”. Performance Evaluation Rev. 26(4), pp. 14--29, Mar 1999.
Annotation: Evidence that workloads are complex and hard to model. For example, using the mean and standard deviation of distributions is problematic when they are very dispersive, as happens in practice; also, workloads have internal structure that should not be ignored.

[dutot05]
Pierre-François Dutot, Marco A. S. Netto, Alfredo Goldman, and Fabio Kon, “Scheduling Moldable BSP Tasks”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Eitan Frachtenberg, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3834, pp. 157--172, 2005.

[elabdounik03]
Rachid El Abdouni Khayari, Ramin Sadre, and Boudewijn R. Haverkort, “Fitting World-Wide Web Request Traces with the EM-Algorithm”. Performance Evaluation 52, pp. 175--191, 2003.
Annotation: Fitting heavy tails to a hyper-exponential distribution so as also to fit the moments.

[emeras13]
Joseph Emeras, Vinicius Pinheiro, Krzysztof Rzadca, and Denis Trystram, “OStrich: Fair Scheduling for Multiple Submissions”. In 10th Parallel Proc. & App. Math., Springer-Verlag, Lect. Notex Comput. Sci. vol. 8385, pp. 26--37, Sep 2013.

[emeras15]
Joseph Emeras, Sébastien Varrette, Mateusz Guzek, and Pascal Bouvry, “Evalix: Classification and Prediction of Job Resource Consumption on HPC Platforms”. In Job Scheduling Strategies for Parallel Processing, 2015.

[england04]
Darin England and Jon B. Weissman, “Costs and Benefits of Load Sharing in the Computational Grid”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 160--175, 2004.

[ernemann02]
Carsten Ernemann, Volker Hamscher, and Ramin Yahyapour, “Economic Scheduling in Grid Computing”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 128--152, 2002.

[ernemann03]
Carsten Ernemann, Baiyi Song, and Ramin Yahyapour, “Scaling of Workload Traces”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 166--182, 2003.

[esbaugh07]
Bryan Esbaugh and Angela C. Sodan, “Coarse-Grain Time Slicing with Resource-Share Control in Parallel-Job Scheduling”. In 3rd High Perf. Comput. & Comm., Springer-Verlag, Lect. Notes Comput. Sci. vol. 4782, pp. 30--43, Sep 2007.
Annotation: Vary the time slices at different times and for different job classes to control the allocations.

[etinski12]
M. Etinski, J. Corbalan, J. Labarta, and M. Valero, “Parallel Job Scheduling for Power Constrained HPC Systems”. Parallel Comput. 38(12), pp. 615--630, Dec 2012.
Annotation: Using dynamic voltage scaling to run different jobs at different speeds.

[feitelson95e]
Dror G. Feitelson and Bill Nitzberg, “Job Characteristics of a Production Parallel Scientific Workload on the NASA Ames iPSC/860”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 949, pp. 337--360, 1995.
Annotation: Lots of data bout the jobs mix, degrees of parallelism, resource use, classifications into interactive vs. batch or according to time of day, runtime distributions, arrival process, and user activity.

[feitelson96b]
Dror G. Feitelson, “Packing Schemes for Gang Scheduling”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 89--110, 1996.
Annotation: Shows that using a buddy system is beneficial for allocating jobs to processors, because then they have a better chance of running in multiple slots. Includes results of a workload analysis used to drive the simulation.

[feitelson96c]
Dror G. Feitelson and Larry Rudolph, “Toward Convergence in Job Schedulers for Parallel Supercomputers”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 1--26, 1996.
Annotation: Contends that one of the problem is lack of precise definition of assumptions and domain. Includes a critique of goals of schedulers and a classification of jobs types into rigid, moldable, evolving, and malleable.

[feitelson97a]
Dror G. Feitelson and Morris A. Jette, “Improved Utilization and Responsiveness with Gang Scheduling”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1291, pp. 238--261, 1997.
Annotation: Shows how gang scheduling can improve not only resopnsiveness but also utilization, despite the overheads, by providing the scheduler with flexibility that is not possible with only space slicing, and thus allowing it to reduce fragmentation. This is backed up by results from the LLNL Cray T3D gang scheduler, which is described in detail.

[feitelson97b]
Dror G. Feitelson, “Memory Usage in the LANL CM-5 Workload”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1291, pp. 78--94, 1997.

[feitelson98]
Dror G. Feitelson and Ahuva Mu'alem Weil, “Utilization and Predictability in Scheduling the IBM SP2 with Backfilling”. In 12th Intl. Parallel Processing Symp. (IPPS), pp. 542--546, Apr 1998.
Annotation: Shows that a conservative variant of backfilling, in which jobs are moved ahead in the queue provided they do not delay any previously scheduled job, is as efficient as the more aggressive variant used in EASY, which just checks that the first job is not delayed.

[feitelson98b]
Dror G. Feitelson and Larry Rudolph, “Metrics and Benchmarking for Parallel Job Scheduling”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1459, pp. 1--24, 1998.
Annotation: Suggest that a standard workload model is needed as a benchmark for job schedulers. This should include external models (degree of parallelism and runtime) and internal models (barrier synchronization and granularity). Points out numerous issues that require additional research.

[feitelson99a]
Dror G. Feitelson and Michael Naaman, “Self-Tuning Systems”. IEEE Softw. 16(2), pp. 52--60, Mar/Apr 1999.
Annotation: Suggest tuning of operating system parameters using genetic algorithms to search for optimal values, based on performance simulations with local workload logs. A case study of batch scheduling on the Intel iPSC/860 is used.

[feitelson01]
Dror G. Feitelson, “Metrics for Parallel Job Scheduling and their Convergence”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 188--205, 2001.
Annotation: Different metrics have different convergence properties. For non-preemptive scheduling, response time converges quickly. For preemptive scheduling, bounded slowdown is better.

[feitelson02b]
Dror G. Feitelson, “The Forgotten Factor: Facts; on Performance Evaluation and Its Dependence on Workloads”. In Euro-Par 2002 Parallel Processing, Burkhard Monien and Rainer Feldmann, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2400, pp. 49--60, Aug 2002.

[feitelson02c]
Dror G. Feitelson, “Workload Modeling for Performance Evaluation”. In Performance Evaluation of Complex Systems: Techniques and Tools, Maria Carla Calzarossa and Salvatore Tucci, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2459, pp. 114--141, Sep 2002.

[feitelson03a]
Dror G. Feitelson, “Metric and Workload Effects on Computer Systems Evaluation”. Computer 36(9), pp. 18--25, Sep 2003.
Annotation: Shows that interactions between the workload, the metric, and the system being analyzed may have a significant impact on evaluation results.

[feitelson04b]
Dror G. Feitelson, “A Distributional Measure of Correlation”. InterStat , Dec 2004.

[feitelson05b]
Dror G. Feitelson, “Experimental Analysis of the Root Causes of Performance Evaluation Results: A Backfilling Case Study”. IEEE Trans. Parallel & Distributed Syst. 16(2), pp. 175--182, Feb 2005.
Annotation: Details of a triple interaction in which using accurate runtime estimates in a workload model causes schedulers to selectively bias for or against short jobs, leading to diverging performance metrics when using average slowdown or runtime.

[feitelson05c]
Dror G. Feitelson, “On the Scalability of Centralized Control”. In Workshop on System Management Tools for Large-Scale Parallel Systems, Apr 2005.
Annotation: Claims that centralized control is not so problematic in terms of scalability, because the computational power of the control node grows exponentially.

[feitelson05d]
Dror G. Feitelson and Ahuva W. Mu'alem, “On the Definition of ``On-Line'' in Job Scheduling Problems”. SIGACT News 36(1), pp. 122--131, Mar 2005.
Annotation: On-line also means stability in the sense that the load is spread more or less evenly. This makes scheduling much simpler.

[feitelson06a]
Dror G. Feitelson and Dan Tsafrir, “Workload Sanitation for Performance Evaluation”. In IEEE Intl. Symp. Performance Analysis Syst. & Software (ISPASS), pp. 221--230, Mar 2006.
Annotation: Workloads may be multiclass, in which one class is usable and another needs to be filtered out. Examples are abnormal activity by system personnel and workload flurries.

[feitelson07a]
Dror G. Feitelson, “Locality of sampling and diversity in parallel system workloads”. In 21st Intl. Conf. Supercomputing (ICS), pp. 53--63, Jun 2007.
Annotation: Suggests that distributions of workload attributes as seen on short timescales may be quite different from those on a long timescale. The divergence between the distributions can be used as a metric to quantify this effect, and localized sampling (or rather repetitions of a single item) from the global distribution can be used to generate it.

[feitelson08]
Dror G. Feitelson, “Looking at Data”. In 22nd Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2008.
Annotation: Examples of the importance of using real data rather than assumptions.

[feitelson09]
Dror G. Feitelson and Edi Shmueli, “A Case for Conservative Workload Modeling: Parallel Job Scheduling with Daily Cycles of Activity”. In 17th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), Sep 2009.
Annotation: Shows that a user-aware scheduler that prioritizes interactive parallel jobs only leads to improved performance when daily cycles are included in the workload model, because otherwise it does not have when to run the low priority non-interactive jobs.

[feitelson14]
Dror G. Feitelson, Dan Tsafrir, and David Krakov, “Experience with Using the Parallel Workloads Archive”. J. Parallel & Distributed Comput. 74(10), pp. 2967--2982, Oct 2014.
Annotation: A review of data quality issues in the Parallel Workloads Archive, including data cleaning and attempts to reconstruct missing data.

[feldmann98]
Anja Feldmann and Ward Whitt, “Fitting Mixtures of Exponentials to Long-Tail Distributions to Analyze Network Performance Models”. Performance Evaluation 31(3-4), pp. 245--279, Jan 1998.

[ferrari84]
Domenico Ferrari, “On the Foundations of Artificial Workload Design”. In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 8--14, Aug 1984.
Annotation: One question is what workload attributes to model, and how to quantify the representativeness of the resulting model. The answer is that it should lead to the same performence predictions as the real workload. The other issue is dynamics: how the workload behaves over time. The answer is not a random sampling from a population, but rather a model of moving from one state to another. These issues are illustrated by modeling of an interactive computer system.

[ferschweiler01]
Ken Ferschweiler, Mariacarla Calzarossa, Cherri Pancake, Daniele Tessera, and Dylan Keon, “A Community Databank for Performance Tracefiles”. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, Y. Cotronis and J. Dongarra, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2131, pp. 233--240, Sep 2001.
Annotation: Describes a repository for data regarding parallel applications, e.g. MPI calls and other events of interest.

[folling09]
Alexander Fölling, Christian Grimme, Joachim Lepping, and Alexander Papaspyrou, “Decentralized grid scheduling with evolutionary fuzzy systems”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 16--36, 2009.

[frachtenberg03b]
Eitan Frachtenberg, Dror G. Feitelson, Juan Fernandez, and Fabrizio Petrini, “Parallel Job Scheduling Under Dynamic Workloads”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 208--227, 2003.
Annotation: Optimizing performance by measuring the effect of parameters such as the scheduling time quantum and the multiprogramming level. Using a low multiprogramming level is possible if backfilling is used to serve queued jobs.

[franke99b]
H. Franke, J. Jann, J. E. Moreira, P. Pattnaik, and M. A. Jette, “An Evaluation of Parallel Job Scheduling for ASCI Blue-Pacific”. In Supercomputing '99, Nov 1999.
Annotation: Use a workload model fitted to the ASCI workload to evaluate gang scheduling and backfilling. Both are much better than FCFS. Gang scheduling is better for large jobs, even with a limited MPL.

[franke06]
Carsten Franke, Joachim Lepping, and Uwe Schwiegelshohn, “On Avantages of Scheduling using Genetic Fuzzy Systems”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 4376, pp. 68--93, 2006.

[garg11]
Saurabh Kumar Garg, Chee Shin Yeo, and Rajkumar Buyya, “Green Cloud Framework for Improving Carbon Efficiency of Clouds”. In EuroPar, Lect. Notes Comput. Sci. vol. 6852, pp. 491--502, Aug 2011.

[gehring99]
Jörn Gehring and Thomas Preiss, “Scheduling a Metacomputer With Uncooperative Sub-schedulers”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 179--201, 1999.
Annotation: Derive a model of the expected workload in a meta-computing environment, and use it to evaluate several scheduling algorithms. The main idea is to interleave meta-scheduling decisions with local scheduling done by each participating machine.

[ghare99]
Gaurav Ghare and Scott T. Leutenegger, “The Effect of Correlating Quantum Allocation and Job Size for Gang Scheduling”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 91--110, 1999.
Annotation: By providing longer quanta to rows of the gang scheduling matrix in which many jobs are mapped (i.e. where there is a concentration of jobs that each use a small number of processors) performance is improved.

[goh08]
Lee Kee Goh and Bharadwaj Veeravalli, “Design and Performance Evaluation of Combined First-Fit Task Allocation and Migration Strategies in Mesh Multiprocessor Systems”. Parallel Comput. 34(9), pp. 508--520, Sep 2008.

[gomezm13]
César Gómez-Martín, Miguel A. Vega-Rodrígez, José-Luis González-Sánchez, Javier Corral-García, and David Cortés-Polo, “Performance and Energy Aware Scheduling Simulator for High-Performance Computing”. In 7th Iberian Grid Infrastructure Conf., pp. 17--29, Sep 2013.

[guim09]
Francesc Guim, Ivan Rodero, and Julita Corbalan, “The Resource Usage Aware Backfilling”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 59--79, 2009.

[hao14]
Yongsheng Hao, Guanfeng Liu Rongtao Hou, Yongsheng Zhu, and Junwen Lu, “Performance Analysis of Gang Scheduling in a Grid”. J. Netw. & Syst. Mgmt. , 2014.

[harcholb97]
Mor Harchol-Balter and Allen B. Downey, “Exploiting Process Lifetime Distributions for Dynamic Load Balancing”. ACM Trans. Comput. Syst. 15(3), pp. 253--285, Aug 1997.
Annotation: First, presents evidence that process lifetimes are best modeled by $prob( rm lifetime > t ) propto t^k$, where $k approx -1$, which means that the probability that a process runs for $t$ sec is about $1/t$, and a process that already ran for $t$ secs is expected to run for another $t$. This is then used to choose processes for migration: those whose current runtime (and hence expected additional runtime) is more than the migration cost divided by the load difference (and thus are expected to benefit from the migration).

[harcholb01]
Mor Harchol-Balter, Nikhil Bansal, Bianca Schroeder, and Mukesh Agrawal, “SRPT Scheduling for Web Servers”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 11--20, 2001.

[heine05]
Felix Heine, Matthias Hoverstadt, Odej Kao, and Achim Streit, “On the Impact of Reservations from the Grid on Planning-Based Resource Management”. In ICCS Workshop Grid Computing Security & Resource Management, Lect. Notes Comput. Sci. vol. 3516, pp. 155--162, May 2005.

[hotovy96]
Steven Hotovy, “Workload Evolution on the Cornell Theory Center IBM SP2”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 27--40, 1996.
Annotation: Traces changes as the load matured after installing the system, and presents some statistics of the mature workload.

[huang13a]
Kuo-Chan Huang, Tse-Chi Huang, Yuan-Hsin Tung, and Pin-Zei Shih, “Effective Processor Allocation for Moldable Jobs with Application Speedup Model”. In Advances in Intelligent Systems and Applications, Springer-Verlag, vol. 2 pp. 563--572, Dec 2013.

[huang13b]
Kuo-Chan Huang, Tse-Chi Huang, Mu-Jung Tsai, and Hsi-Ya Chang, “Moldable Job Scheduling for HPC as a Service”. In 8th Future Information Technology, Springer-verlag, Lect. Notes Elect. Eng. vol. 276, pp. 43--48, Sep 2013.

[huang13c]
Kuo-Chan Huang, Tse-Chi Huang, Mu-Jung Tsaia, Hsi-Ya Chang, and Yuan-Hsin Tung, “Moldable Job Scheduling for HPC as a Service with Application Speedup Model and Execution Time Information”. J. Convergence 4(4), pp. 14--22, Dec 2013.

[iosup06]
Alexandru Iosup, Dick H. J. Epema, Carsten Franke, Alexander Papaspyrou, Lars Schley, Baiyi Song, and Ramin Yahyapour, “On Grid Performance Evaluation using Synthetic Workloads”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 4376, pp. 232--255, 2006.

[iosup08]
Alexandru Iosup, Hui Li, Mathieu Jan, Shanny Anoep, Catalin Dumitrescu, Lex Wolters, and Dick H. J. Epema, “The Grid Workloads Archive”. Future Generation Comput. Syst. 24(7), pp. 672--686, May 2008.

[islam03]
Mohammad Islam, Pavan Balaji, P. Sadayappan, and D. K. Panda, “QoPS: A QoS based scheme for Parallel Job Scheduling”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 252--268, 2003.

[jackson14]
Gary Jackson, Pete Keleher, and Alan Sussman, “Decentralized Scheduling and Load Balancing for Parallel Programs”. In 14th Intl. Symp. Cluster, Cloud and Grid Comput., pp. 324--333, May 2014.

[jann97]
Joefon Jann, Pratap Pattnaik, Hubertus Franke, Fang Wang, Joseph Skovira, and Joseph Riodan, “Modeling of Workload in MPPs”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1291, pp. 95--116, 1997.

[jin00]
Shudong Jin and Azer Bestavros, “Sources and Characteristics of Web Temporal Locality”. In 8th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 28--35, Aug 2000.
Annotation: Much of temporal locality arrises from skewed popularity: more popular items are requested repeatedly often. But there are also short-term correlations, which can be seen by focusing on items that are requested the same number of times.

[kamath02]
Purushotham Kamath, Kun-chan Lan, John Heidemann, Joe Bannister, and Joe Touch, “Generation of High Bandwidth Network Traffic Traces”. In 10th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 401--412, Oct 2002.
Annotation: How to scale or merge workload traces to increase the total load.

[kavas01]
Avi Kavas, David Er-El, and Dror G. Feitelson, “Using Multicast to Pre-load Jobs on the ParPar Cluster”. Parallel Comput. 27(3), pp. 315--327, Feb 2001.
Annotation: Prealoading executable files by multicasting them generally leads to faster execution than demand loading via NFS.

[kleban03]
Stephen D. Kleban and Scott H. Clearwater, “Hierarchical Dynamics, Interarrival Times, and Performance”. In Supercomputing, Nov 2003.
Annotation: Modeling the arrivals of jobs to a supercomputer as the result of a complex hierarchy of decisions by managers, users, and the scheduler.

[kleineweber11]
Christoph Kleineweber, Axel Keller, Oliver Niehörster, and André Brinkman, “Rule-Based Mapping of Virtual Machines in Clouds”. In 19th Euromicro Conf. Parallel, Distrib. & Network-Based Proc., pp. 527--534, Feb 2011.

[klusacek10]
Dalibor Klusávcek and Hana Rudová, “The Importance of Complete Data Sets for Job Scheduling Simulations”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 6253, pp. 132--153, 2010.
Annotation: Uses the Metacentrum log and others to demonstrate that including information about failures and specific job requirements leads to more complicated scheduling decisions and reduced performance.

[klusacek12]
Dalibor Klusávcek and Hana Rudová, “Performance and Fairness for Users in Parallel Job Scheduling”. In Job Scheduling Strategies for Parallel Processing, Walfredo Cirne and others, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 7698, pp. 235--252, 2012.

[klusacek15]
Dalibor Klusávcek, Simon Tóth, and Gabriela Podolníková, “Real-Life Experience with Major Reconfiguration of Job Scheduling System”. In Job Scheduling Strategies for Parallel Processing, May 2015.

[kotsis97]
Gabriele Kotsis, “A Systematic Approach for Workload Modeling for Parallel Processing Systems”. Parallel Comput. 22(13), pp. 1771--1787, Feb 1997.
Annotation: A shoping list of all that has to be done.

[krakov12]
David Krakov and Dror G. Feitelson, “High-Resolution Analysis of Parallel Job Workloads”. In Job Scheduling Strategies for Parallel Processing, Walfredo Cirne and others, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 7698, pp. 178--195, May 2012.
Annotation: Suggests the use of heatmaps where the X axis is the load experienced by each job, and the Y axis is its performance. This shows the pattern of how jobs distribute.

[krakov13]
David Krakov and Dror G. Feitelson, “Comparing Performance Heatmaps”. In Job Scheduling Strategies for Parallel Processing, Narayan Desai and Walfredo Cirne, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 8429, pp. 42--61, 2013.

[krallmann99]
Jochen Krallmann, Uwe Schwiegelshohn, and Ramin Yahyapour, “On the Design and Evaluation of Job Scheduling Algorithms”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 17--42, 1999.
Annotation: Propose a methodology to incorporate complex policies and objective functions into the evaluation of schedulers.

[krevat02]
Elie Krevat, José G. Castaños, and José E. Moreira, “Job Scheduling for the BlueGene/L System”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 38--54, 2002.

[kubert12]
Roland Kübert and Stefan Wesner, “High-Performance Computing as a Service with Service Level Agreements”. In 9th Intl. Conf. Services Comput., pp. 578--585, Jun 2012.

[kumar12]
Rajath Kumar and Sathish Vadhiyar, “Identifying Quick Starters: Towards an Integrated Framework for Efficient Predictions of Queue Waiting Times of Batch Parallel Jobs”. In Job Scheduling Strategies for Parallel Processing, Walfredo Cirne and others, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 7698, pp. 196--215, 2012.

[kumar14]
Rajath Kumar and Satish Vadhiyar, “Prediction of Queue Waiting Times for Metascheduling on Parallel Batch Systems”. In Job Scheduling Strategies for Parallel Processing, Walfredo Cirne and Narayan Desai, (ed.), Springer, Lect. Notes Comput. Sci. vol. 8828, pp. 108--128, May 2014.

[kurowski12]
Krzysztof Kurowski, Ariel Oleksiak, Wojciech Piatek, and Jan We¸glarz, “Impact of Urgent Computing on Resource Management Policies, Schedules and Resources Utilization”. Procedia Comput. Sci. 9, pp. 1713--1722, 2012.

[lawson02]
Barry G. Lawson and Evgenia Smirni, “Multiple-Queue Backfilling Scheduling with Priorities and Reservations for Parallel systems”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 72--87, 2002.

[leal13]
Katia Leal, “Self-Adjusting Resource Sharing Policies in Federated Grids”. Future Generation Comput. Syst. 29(2), pp. 488--496, Feb 2013.

[lee04]
Cynthia Bailey Lee, Yael Schwartzman, Jennifer Hardy, and Allan Snavely, “Are User Runtime Estimates Inherently Inaccurate?”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 253--263, 2004.
Annotation: It seems that yes, they are.

[lee07]
Cynthia Bailey Lee and Allan E. Snavely, “Precise and Realistic Utility Functions for User-Centric Performance Analysis of Schedulers”. In 17th Intl. Symp. High Performance Distributed Comput. (HPDC), pp. 107--116, Jun 2007.
Annotation: Define synthetic utility functions giving importance to users as function of resopnse time, based on data of how such functions generally look. Then define a scheduler that uses genetic algorithms to schedule jobs so as to optimize utility.

[leland86]
Will E. Leland and Teunis J. Ott, “Load-Balancing Heuristics and Process Behavior”. In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 54--69, May 1986.

[leland94]
Will E. Leland, Murad S. Taqqu, Walter Willinger, and Daniel V. Wilson, “On the Self-Similar Nature of Ethernet Traffic”. IEEE/ACM Trans. Networking 2(1), pp. 1--15, Feb 1994.
Annotation: It is self-similar (bursty) at all time scales, and requires a fractal model rather than common Poisson-related models.

[li04]
Hui Li, David Groep, and Lex Walters, “Workload Characteristics of a Multi-cluster Supercomputer”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 176--193, 2004.

[li06]
Hui Li, Michael Muskulus, and Lex Wolters, “Modeling Job Arrivals in a Data-Intensive Grid”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 4376, pp. 210--231, 2006.

[li15]
Song Li, Yangfan Zhou, Lei Jiao, Xinya Yan, Xin Wang, and Michael Rung-Tsong Lyu, “Towards Operational Cost Minimization in Hybrid Clouds for Dynamic Resource Provisioning with Delay-Aware Optimization”. IEEE Trans. Services Comput. 8(3), pp. 398--409, May/Jun 2015.

[liang13]
Aihua Liang, Limin Xiao, and Li Ruan, “Adaptive Workload Driven Dynamic Power Management for High Performance Computing Clusters”. Comput. & Elect. Eng. 39(7), pp. 2357--2368, Oct 2013.

[lic14]
Chao Li, Rui Wang, Tao Li, Depei Qian, and Jingling Yuan, “Managing Green Datacenters Powered by Hybrid Renewable Energy Systems”. In 11th Intl. Conf. Autonomic Comput., pp. 261--272, Jun 2014.

[lindsay12]
A. M. Lindsay, M. Galloway-Carson, C. R. Johnson, D. P. Bunde, and V. J. Leung, “Backfilling with Guarantees Made as Jobs Arrive”. Concurrency & Computation -- Pract. & Exp. , 2012.
Annotation: Conservative backfilling with the added feature that when the schedule is compressed jobs are prioritized based on a selected desirable trait.

[lingrand09]
Diane Lingrand, Johan Montagnat, Janusz Martyniak, and David Colling, “Analyzing the EGEE Production Grid Workload: Application to Jobs Submission Optimization”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 37--58, 2009.
Annotation: Detailed analysis of EGEE data, and in particular job failures and resubmission strategies of users.

[liu15]
Xiaocheng Liu, Yabing Zha, Quanjun Yin, Yong Peng, and Long Qin, “Scheduling Parallel Jobs with Tentative Runs and Consolidation in the Cloud”. J. Syst. & Softw. 104, pp. 141--151, Jun 2015.

[liux12]
Xiaocheng Liu, Chen Wang, Xiaogang Qiu, Bing Bing Zhou, Bin Chen, and Albert Y. Zomaya, “Backfilling under Two-Tier Virtual Machines”. In Intl. Conf. Cluster Comput., pp. 514--522, Sep 2012.
Annotation: Use two VMs per CPU, and schedule the foreground VM using EASY and the background VM using SJF.

[liuz10]
Zhuo Liu, Aihua Liang, and Limin Xiao, “A Parallel Workload Model and its Implications for Maui Scheduling Policies”. In 2nd Intl. Conf. Comput. Modeling & Simulation, pp. 384--389, Jan 2010.

[lix13]
Xin Li, Zhuzhong Qian, Sanglu Lu, and Jie Wu, “Energy Efficient Virtual Machine Placement Algorithm with Balanced and Improved Resource Utilization in a Data Center”. Math. & Comput. Modelling 58(5-6), pp. 1222--1235, Sep 2013.

[liy07]
Yawei Li, Prashasta Gujrati, Zhiling Lan, and Xian-he Sun, “Fault-Driven Re-Scheduling for Improving System-Level Fault Resilience”. In Intl. Conf. Parallel Processing (ICPP), Sep 2007.

[lo98]
Virginia Lo, Jens Mache, and Kurt Windisch, “A Comparative Study of Real Workload Traces and Synthetic Workload Models for Parallel Job Scheduling”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1459, pp. 25--46, 1998.

[lublin03]
Uri Lublin and Dror G. Feitelson, “The Workload on Parallel Supercomputers: Modeling the Characteristics of Rigid Jobs”. J. Parallel & Distributed Comput. 63(11), pp. 1105--1122, Nov 2003.

[medernach05]
Emmanuel Medernach, “Workload Analysis of a Cluster in a Grid Environment”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Eitan Frachtenberg, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3834, pp. 36--61, 2005.

[meng15]
Jie Meng, Samuel McCauley, Fulya Kaplan, Vitus J. Leung, and Ayse K. Coskun, “Simulation and Optimization of HPC Job Allocation for Jointly Reducing Communication and Cooling Costs”. Sustainable Computing: Informatics & Syst. 5, 2015.

[minh09]
Tran Ngoc Minh and Lex Wolters, “Modeling Parallel System Workloads with Temporal Locality”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 101--115, 2009.

[minh11]
Tran Ngoc Minh and Lex Walters, “Towards a Profound Analysis of Bags-of-Tasks in Parallel Systems and Their Performance Impact”. In 20th Intl. Symp. High Performance Distributed Comput. (HPDC), pp. 111-122, Jun 2011.

[ming13]
Wu Ming, Yang Jian, and Ran Yongyi, “Dynamic Instance Provisioning Strategy in an IaaS Cloud”. In 32nd Chinese Control Conf., pp. 6670--6675, Jul 2013.

[mualem01]
Ahuva W. Mu'alem and Dror G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling”. IEEE Trans. Parallel & Distributed Syst. 12(6), pp. 529--543, Jun 2001.
Annotation: Demonstrates that the relative performance of EASY backfilling and conservative backfilling depends on the workload (both for models and for actual logs) and on the metric used (response time or slowdown). Also shows user runtime estimates to be inaccurate, but this turns out to be beneficial for performance.

[neves12]
Marcelo Veiga Neves, Tiago Ferreto, and César De Rose, “Scheduling MapReduce Jobs in HPC Clusters”. In EuroPar, pp. 179--190, Aug 2012.

[nguyen96b]
Thu D. Nguyen, Raj Vaswani, and John Zahorjan, “Parallel Application Characterization for Multiprocessor Scheduling Policy Design”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 175--199, 1996.
Annotation: Speedup characteristics and the possibility of assessing them at runtime on a dynamically partitioned system.

[niu12]
Shuangcheng Niu, Jidong Zhai, Xiaosong Ma, Mingliang Liu, Yan Zhai, Wenguang Chen, and Weimin Zheng, “Employing Checkpoint to Improve Job Scheduling in Large-Scale Systems”. In Job Scheduling Strategies for Parallel Processing, Walfredo Cirne and others, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 7698, pp. 36--55, 2012.

[parsons95]
Eric W. Parsons and Kenneth C. Sevcik, “Multiprocessor Scheduling for High-Variability Service Time Distributions”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 949, pp. 127--145, 1995.
Annotation: Propose that preemptions be used to improve average response time.

[pascual09]
Jose Antonio Pascual, Javier Navaridas, and Jose Miguel-Alonso, “Effects of Topology-Aware Allocation Policies on Scheduling Performance”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 138--156, 2009.

[peris94]
Vinod G. J. Peris, Mark S. Squillante, and Vijay K. Naik, “Analysis of the Impact of Memory in Distributed Parallel Processing Systems”. In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 5--18, May 1994.
Annotation: Investigates the tradeoff between improved efficiency when contending jobs are each allocated less processors, and increased paging overhead due to insufficient memory when less processors are used. The result is that the paging overhead dominates, so partition sizes should not be decreased to the point where the application's dataset does not fit into memory. A stochastic model of paging behavior as an application moves from one locality to another is introduced.

[petrini03]
Fabrizio Petrini, Eitan Frachtenberg, Adolfy Hoisie, and Salvador Coll, “Performance Evaluation of the Quadrics Interconnection Network”. Cluster Comput. 6(2), pp. 125--142, Apr 2003.
Annotation: A large number of micro-benchmarks with various communication patterns.

[qiu14]
Zhan Qiu and Juan R. Pérez, “Assessing the Impact of Concurrent Replication with Cancelling in Parallel Jobs”. In 22nd Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 31--40, Sep 2014.

[rajbhandary13]
Avinab Rajbhandary, David P. Bunde, and Vitus J. Leung, “Variations of Conservative Backfilling to Improve Fairness”. In Job Scheduling Strategies for Parallel Processing, Narayan Desai and Walfredo Cirne, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 8429, pp. 177--191, 2013.

[ranjan06]
Rajiv Ranjan, Aaron Harwood, and Rajkumar Buyya, “SLA-Based Coordinated Superscheduling Scheme for Computational Grids”. In 8th Intl. Conf. Cluster Comput., Sep 2006.

[ranjan08]
Rajiv Ranjan, Aaron Harwood, and Rajkumar Buyya, “A Case for Cooperative and Incentive-Based Federation of Distributed Clusters”. Future Generation Comput. Syst. 24(4), pp. 280--295, Apr 2008.

[rosti02]
Emilia Rosti, Giuseppe Serazzi, Evgenia Smirni, and Mark S. Squillante, “Models of Parallel Applications with Large Computation and I/O Requirements”. IEEE Trans. Softw. Eng. 28(3), pp. 286--307, Mar 2002.

[sabin03]
Gerald Sabin, Rajkumar Kettimuthu, Arun Rajan, and Ponnuswamy Sadayappan, “Scheduling of Parallel Jobs in a Heterogeneous Multi-Site Environment”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 87--104, 2003.

[sabin05]
Gerald Sabin and P. Sadayappan, “Unfairness Metrics for Space-Sharing Parallel Job Schedulers”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Eitan Frachtenberg, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3834, pp. 238--256, 2005.

[sabin06]
Gerald Sabin, Matthew Lang, and P. Sadayappan, “Moldable Parallel Job Scheduling Using Job Efficiency: An Iterative Approach”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 4376, pp. 94--114, 2006.

[schroeder04]
Bianca Schroeder and Mor Harchol-Balter, “Evaluation of Task Assignment Policies for Supercomputing Servers: The Case for Load Unbalancing and Fairness”. Cluster Comput. 7(2), pp. 151--161, Apr 2004.
Annotation: When assigning tasks to hosts, it is best to divide them into groups by size, but the total load imposed by the different groups should not be equal; rather, it should lead to equal average slowdowns (which is defined to be fair).

[schwiegelshohn98b]
Uwe Schwiegelshohn and Ramin Yahyapour, “Improving First-Come-First-Serve Job Scheduling by Gang Scheduling”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1459, pp. 180--198, 1998.
Annotation: Actually a combination of two schemes: jobs that need few processors are scheduled FCFS, while wide (i.e. highly parallel) jobs may preempt current jobs, but only one such wide job can run at a time, and only for a limited fraction of the time, so as not to degrade resopnse time for the smaller jobs that were interrupted.

[sevcik89]
Kenneth C. Sevcik, “Characterization of Parallelism in Applications and their use in Scheduling”. In SIGMETRICS Conf. Measurement & Modeling of Comput. Syst., pp. 171--180, May 1989.
Annotation: Defines the shape of a parallel program to be the proportion of the time that the program has various degrees of parallelism. This is used to derive parameters of the program, e.g. average, minimum, and maximum parallelism, which are used to guide the partitioning of a machine among competing programs. Using more parameters results in better performance.

[sevcik94]
K. C. Sevcik, “Application Scheduling and Processor Allocation in Multiprogrammed Parallel Processing Systems”. Performance Evaluation 19(2-3), pp. 107--140, Mar 1994.
Annotation: Choose partition size and scheduling order based on information about the applications, especially their total work and how the speedup changes with the number of allocated processors.

[shah10]
Syed Nasir Mehmood Shah, Ahmad Kamil Bin Mahmood, and Alan Oxley, “Analysis and evaluation of grid scheduling algorithms using real workload traces”. In Intl. Conf. Mgmt. Emergent Digital EcoSystems, pp. 234--239, Oct 2010.

[shah11]
Syed Nasir Mehmood Shah, Ahmad Kamil Bin Mahmood, and Alan Oxley, “Dynamic Miltilevel Hybrid Scheduling Algorithms for Grid Computing”. In Intl. Conf. Computational Science, pp. 402--411, Jun 2011.

[shai13]
Ohad Shai, Edi Shmueli, and Dror G. Feitelson, “Heuristics for Resource Matching in Intel's Compute Farm”. In Job Scheduling Strategies for Parallel Processing, Narayan Desai and Walfredo Cirne, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 8429, pp. 116--135, 2013.
Annotation: On selecting nodes so as to balance resource usage with multiple resources, e.g. cores and memory.

[sheikhalishahi11]
Mehdi Sheikhalishahi, Manoj Devare, Lucio Grandinetti, and Demetrio Laganà, “A General-purpose and Multi-level Scheduling Approach in Energy Efficient Computing”. In 1st Proc. Intl. Conf. Cloud Computing and Services Science, pp. 37--42, May 2011.

[sheikhalishahi12]
Mehdi Sheikhalishahi, Ignacio Martín Llorente, and Lucio Grandinetti, “Energy Aware Consolidation Policies”. In Advances in Parallel Computing, Vol. 22: Applications, Tools and Techniques on the Road to Exascale Computing, IOS Press, pp. 109--116, 2012.

[sheikhalishahi14]
Mehdi Sheikhalishahi, Lucio Grandinetti, Richard M. Wallace, and Jose Luiz Vazquez-Poletti, “Autonomic Resource Contention-Aware Scheduling”. spe , 2014.

[sherwood02]
Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder, “Automatically Characterizing Large Scale Program Behavior”. In 10th Intl. Conf. Architect. Support for Prog. Lang. & Operating Syst. (ASPLOS), pp. 45--57, Oct 2002.
Annotation: Use clustering to identify phases in the program and a representative slice of each one.

[shih13]
Po-Chi Shih, Kuo-Chan Huang, Che-Rung Lee, I-Hsin Chung, and Yeh-Ching Chung, “TLA: Temporal Look-Ahead Procesor Allocation Method for Heterogeneous Multi-Cluster Systems”. J. Parallel & Distributed Comput. , 2013.
Annotation: Simulate allocation options to get idea of expected performance taking cluster heterogeneity into account.

[shmueli03]
Edi Shmueli and Dror G. Feitelson, “Backfilling with Lookahead to Optimize the Performance of Parallel Job Scheduling”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 228--251, 2003.
Annotation: Using dynamic programming to find a set of jobs that together maximize utilization, rather than considering the jobs one at a time.

[shmueli05]
Edi Shmueli and Dror G. Feitelson, “Backfilling with Lookahead to Optimize the Packing of Parallel Jobs”. J. Parallel & Distributed Comput. 65(9), pp. 1090--1107, Sep 2005.
Annotation: Using dynamic programming to find a set of jobs that together maximize utilization, rather than considering the jobs one at a time.

[shmueli06]
Edi Shmueli and Dror G. Feitelson, “Using site-level modeling to evaluate the performance of parallel system schedulers”. In 14th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 167--176, Sep 2006.
Annotation: Feedback from system performance affects the submittal of additional work by the users, so a trace collected from a system with one scheduler may not be suitable for simulation of another scheduler that has different performance.

[shmueli07]
Edi Shmueli and Dror G. Feitelson, “Uncovering the Effect of System Performance on User Behavior from Traces of Parallel Systems”. In 15th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 274--280, Oct 2007.
Annotation: User session lengths are correlated with response times: a long response time is a good predictor for a session end. Slowdown is not such a good predictor.

[shmueli09]
Edi Shmueli and Dror G. Feitelson, “On Simulation and Design of Parallel-Systems Schedulers: Are We Doing the Right Thing?”. IEEE Trans. Parallel & Distributed Syst. 20(7), pp. 983--996, Jul 2009.
Annotation: User behavior induces a feedback cycle from system performance to the generation of additional jobs, and this has to be taken into account in evaluations and can also lead to improved designs.

[singh82]
Ajay Singh and Zary Segall, “Synthetic Workload Generation for Experimentation with Multiprocessors”. In 3rd Intl. Conf. Distributed Comput. Syst. (ICDCS), pp. 778--785, Oct 1982.
Annotation: Exalts the use of synthetic workloads: they are more flexible, controllable, and adjustable than real workloads. The paper is about the B language, which allows for a very detailed description of task graphs that can be used to represent parallel applications.

[skowron13]
Piotr Skowron and Krzysztof Rzadca, “Fair Share Is Not Enough: Measuring Fairness in Scheduling with Cooperative Game Theory”. In 10th Parallel Proc. & Applied Math., Springer-Verlag, Lect. Notes Comput. Sci. vol. 8385, pp. 38--48, Sep 2013.

[smirni97]
Evgenia Smirni and Daniel A. Reed, “Workload Characterization of Input/Output Intensive Parallel Applications”. In 9th Intl. Conf. Comput. Performance Evaluation, Springer-Verlag, Lect. Notes Comput. Sci. vol. 1245, pp. 169--180, Jun 1997.
Annotation: Measurements of 3 applications from the SIO initiative, with data on numbers, sizes, and times of I/O requests. Main results are that distribution of accesses is very modal (e.g. few distinct sizes), and dominant patterns are sequential and interleaved.

[smith98]
Warren Smith, Ian Foster, and Valerie Taylor, “Predicting Application Run Times Using Historical Information”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1459, pp. 122--142, 1998.
Annotation: The main contribution is using a search technique to define the templates used to judge whether or not jobs are similar to each other, and one should be used in the prediction for the other.

[smith99]
Warren Smith, Valerie Taylor, and Ian Foster, “Using Run-Time Predictions to Estimate Queue Wait Times and Improve Scheduler Performance”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 202--219, 1999.
Annotation: Better knowledge about the current workload allows for optimizations in the scheduler. The knowledge is obtained by collecting and analyzing historical data about job executions.

[sodan09]
Angela C. Sodan, “Adaptive Scheduling for QoS Virtual Machines under Different Resource Allocation--Performance Effects and Predictability”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 259--279, 2009.

[sodan10]
Angela C. Sodan and Wei Jin, “Backfilling with Fairness and Slack for Parallel Job Scheduling”. In J. Physics: Conf. Ser., High Perf. Comput. Symp., vol. 256 2010.

[sodan11]
A. C. Sodan, “Service Control with the Preemptive Parallel Job Scheduler Scojo-PECT”. Cluster Comput. 14(2), pp. 165--182, Jun 2011.

[song04]
Baiyi Song, Carsten Ernemann, and Ramin Yahyapour, “Parallel Computer Workload Modeling with Markov Chains”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 47--62, 2004.

[song05]
Shanshan Song, Kai Hwang, and Yu-Kwong Kwok, “Trusted Grid Computing with Security Binding and Trust Integration”. J. Grid Computing 3(1-2), pp. 53--73, Jun 2005.
Annotation: Uses a trust model based on fuzzy logic.

[sotomayor06]
Borja Sotomayor, Kate Keahey, and Ian Foster, “Overhead Matters: A Model for Virtual Resource Management”. In 2nd Workshop Virtualization Technology in Distributed Comput., Nov 2006.
Annotation: Overhead for virtualization should be scheduled rather than being deducted from a user's account.

[squillante99]
Mark S. Squillante, David D. Yao, and Li Zhang, “The Impact of Job Arrival Patterns on Parallel Scheduling”. Performance Evaluation Rev. 26(4), pp. 52--59, Mar 1999.
Annotation: Show that the interarrival time is more heavy-tailed than the exponential distribution often assumed, and that this leads to lower performance.

[squillante99b]
Mark S. Squillante, David D. Yao, and Li Zhang, “Analysis of Job Arrival Patterns and Parallel Scheduling Performance”. Performance Evaluation 36--37, pp. 137--163, Aug 1999.
Annotation: Show that the interarrival time is more heavy-tailed than the exponential distribution often assumed, and that this leads to lower performance.

[srinivasan02]
Srividya Srinivasan, Rajkumar Kettimuthu, Vijay Subramani, and Ponnuswamy Sadayappan, “Selective reservation Strategies for Backfill Job Scheduling”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 55--71, 2002.
Annotation: Instead of deciding in advance how many reservations to make (e.g. 1 in EASY or for all jobs in conservative), decide on-the-fly and make reservations only for jobs that are delayed too much.

[srinivasan02b]
Srividya Srinivasan, Vijay Subramani, Rajkumar Kettimuthu, Praveen Holenarsipur, and P. Sadayappan, “Effective Selection of Partition Sizes for Moldable Scheduling of Parallel Jobs”. In 9th High Perf. Comput., Lect. Notes Comput. Sci. vol. 2552, pp. 174--183, Dec 2002.

[streit02]
Achim Streit, “A Self-Tuning Job Scheduler Family with Dynamic Policy Switching”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2537, pp. 1--23, 2002.
Annotation: At each scheduling step try FCFS, SJF, and LJF, and choose the one the seems to generate the best schedule.

[streit04]
Achim Streit, “Enhancements to the Decision Process of the Self-Tuning dynP Scheduler”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3277, pp. 63--80, 2004.

[talby99a]
David Talby and Dror G. Feitelson, “Supporting priorities and improving utilization of the IBM SP scheduler using slack-based backfilling”. In 13th Intl. Parallel Processing Symp. (IPPS), pp. 513--517, Apr 1999.
Annotation: Suggests slack-based backfilling, in which the scheduler does not commit to a tight schedule in order to leave space for future improvements.

[talby99b]
David Talby, Dror G. Feitelson, and Adi Raveh, “Comparing Logs and Models of Parallel Workloads Using the Co-Plot Method”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1659, pp. 43--66, 1999.
Annotation: A statistical analysis of the similarity of various workload logs and models, showing that the models fall in the same space as the original logs, but parameterization is needed to capture the specific characteristics of any one log.

[talby05]
David Talby and Dror G. Feitelson, “Improving and Stabilizing Parallel Computer Performance Using Adaptive Backfilling”. In 19th Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2005.
Annotation: Simulate the behavior of several schedulers (specifically, variants of backfilling) in the background, and periodically switch to the one that gives the best performance results. One of the results is that it is better to use only few candidates, because even a relatively weak scheduler occasionally wins but then typically fails to deliver in the next period.

[talby07]
David Talby, Dror G. Feitelson, and Adi Raveh, “A Co-Plot Analysis of Logs and Models of Parallel Workloads”. ACM Trans. Modeling & Comput. Simulation 12(3), Jul 2007.
Annotation: A statistical analysis of the similarity of various workload logs and models, showing that the models fall in the same space as the original logs, but parameterization is needed to capture the specific characteristics of any one log. Two dimensions seem to suffice.

[tang10]
Wei Tang, Narayan Desai, Daniel Buettner, and Zhiling Lan, “Analyzing and Adjusting User Runtime Estimates to Improve Job Scheduling on Blue Gene/P”. In 24th Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2010.
Annotation: Which part of the scheduling is most sensitive to inaccurate runtime estimates, and how to improve it using historical data.

[tang11]
Wei Tang, Zhiling Lan, Narayan Desai, Daniel Buettner, and Yongen Yu, “Reducing Fragmentation on Torus-Connected Supercomputers”. In Intl. Parallel & Distributed Processing Symp. (IPDPS), pp. 828--839, May 2011.

[tang13]
Wei Tang, Narayan Desai, Daniel Buettner, and Zhiling Lan, “Job Scheduling with Adjusted Runtime Estimates on Production Supercomputers”. J. Parallel & Distributed Comput. 73(7), pp. 926--938, Jul 2013.
Annotation: Predict each jobs accuracy (ratio of actual to requested time) based on historical data, and use the adjusted runtime (i.e. the request multiplied by the adjustment factor) in scheduling decisions.

[thebe09]
Ojaswirajanya Thebe, David P. Bunde, and Vitus J. Leung, “Scheduling Restartable Jobs with Short Test Runs”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 116--137, 2009.

[tian14]
Wenhong Tian and Chee Shin Yeo, “Minimizing Total Busy Time in Offline Parallel Scheduling with Application to Energy Efficiency in Cloud Computing”. Concurrency & Computation -- Pract. & Exp. , 2014.

[toosi11]
Adel Nadjaran Toosi, Rodrigo N. Calheiros, Ruppa K. Thulasiram, and Rajkumar Buyya, “Resource Provisioning Policies to Increase IaaS Provider's Profit in a Federated Cloud environment”. In 13th Intl. Conf. High Performance Comput. & Commun., pp. 279--287, Sep 2011.

[tsafrir05b]
Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson, “Modeling User Runtime Estimates”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Eitan Frachtenberg, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 3834, pp. 1--35, 2005.
Annotation: A detailed model of the distribuion of runtime estimates (it is modal, with around 20 values accounding for the vast majority), and how to assign estimates to jobs in a way that mimics the estimates of real users.

[tsafrir06a]
Dan Tsafrir and Dror G. Feitelson, “Instability in Parallel Job Scheduling Simulation: The Role of Workload Flurries”. In 20th Intl. Parallel & Distributed Processing Symp. (IPDPS), Apr 2006.
Annotation: A butterfly effect: a small change in one job causes a flurry (large sequence of similar jobs by the same user) to be scheduled differently leading to a big effect on the perceived overall performance.

[tsafrir06b]
Dan Tsafrir and Dror G. Feitelson, “The Dynamics of Backfilling: Solving the Mystery of Why Increased Inaccuracy May Help”. In IEEE Intl. Symp. Workload Characterization (IISWC), pp. 131--141, Oct 2006.
Annotation: Inaccurate estimates don't really improve performance; they just nudge the system towards a schedule that is more similar to SJF.

[tsafrir07a]
Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson, “Backfilling Using System-Generated Predictions Rather Than User Runtime Estimates”. IEEE Trans. Parallel & Distributed Syst. 18(6), pp. 789--803, Jun 2007.
Annotation: Suggests using the average of the last two jobs by the same user as a runtime prediction. requires updating the prediction when new information becomes available, e.g. when the job runs longer than the previous prediction.

[tsafrir07b]
Dan Tsafrir, Keren Ouaknine, and Dror G. Feitelson, “Reducing Performance Evaluation Sensitivity and Variability by Input Shaking”. In 15th Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 231--237, Oct 2007.
Annotation: Run the simulation many (e.g. hundreds) of times with slightly different input, to see the distribution of results.

[tsafrir10]
Dan Tsafrir, “Using Inaccurate Estimates Accurately”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer, Lect. Notes Comput. Sci. vol. 6253, pp. 208--221, 2010.

[utrera12]
Gladys Utrera, Siham Tabik, Julita Corbalan, and Jesús Labarta, “A Job Scheduling Approach for Multi-Core Clusters Based on Virtual Malleability”. In 18th EuroPar, pp. 191--203, Aug 2012.

[vandenbossche11]
Ruben Van den Bossche, Kurt Vanmechelen, and Jan Broeckhove, “An Evaluation of the Benefits of Fine-Grained Value-Based Scheduling on General Purpose Clusters”. Future Generation Comput. Syst. 27(1), pp. 1--9, Jan 2011.

[verma08]
Akshat Verma, Puneet Ahuja, and Anindya Neogi, “Power-Aware Dynamic Placement of HPC Applications”. In 22nd Intl. Conf. Supercomputing (ICS), pp. 175--184, Jun 2008.

[vetter03]
Jeffrey S. Vetter and Frank Mueller, “Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures”. J. Parallel & Distributed Comput. 63(9), pp. 853--865, Sep 2003.
Annotation: A study of 5 applications finds consistent usage of collective communications with small payloads, usage of non-blobking communications, and different granularities.

[wan96]
Michael Wan, Reagan Moore, George Kremenek, and Ken Steube, “A Batch Scheduler for the Intel Paragon with a Non-Contiguous Node Allocation Algorithm”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1162, pp. 48--64, 1996.
Annotation: Nodes are partitioned into sets according to attributes (e.g. amount of memory), and sets are grouped into groups that are used to schedule jobs from different queues. The allocation is done by a modified 2-D buddy system. Statistics of the workload on the Paragon at San-Diego Supercomputer center are given.

[windisch96]
Kurt Windisch, Virginia Lo, Reagan Moore, Dror Feitelson, and Bill Nitzberg, “A Comparison of Workload Traces from Two Production Parallel Machines”. In 6th Symp. Frontiers Massively Parallel Comput., pp. 319--326, Oct 1996.
Annotation: Analysis of the workloads on the NASA Ames iPSC/860 and the SDSC Paragon.

[wiseman03]
Yair Wiseman and Dror G. Feitelson, “Paired Gang Scheduling”. IEEE Trans. Parallel & Distributed Syst. 14(6), pp. 581--592, Jun 2003.
Annotation: Run pairs of complementary jobs in each time slice, so as to better utilize resources (specifically, I/O and CPU).

[xiao02]
Li Xiao, Songqing Chen, and Xiaodong Zhang, “Dynamic Cluster Resource Allocations for Jobs with Known and Unknown Memory Demands”. IEEE Trans. Parallel & Distributed Syst. 13(3), pp. 223--240, Mar 2002.

[yang13]
Xu Yang, Zhou Zhou, Sean Wallace, Zhiling Lan, Wei Tang, Susan Coghlan, and Michael E. Papka, “Integrating Dynamic Pricing of Electricity into Energy aware Scheduling for HPC Systems”. In Supercomputing, Nov 2013.

[yeo05]
Chee Shin Yeo and Rajkumar Buyya, “Service Level Agreement Based Allocation of Cluster Resources: Handling Penalty to Enhance Utility”. In Intl. Conf. Cluster Comput., Sep 2005.

[yeo06]
Chee Shin Yeo and Rajkumar Buyya, “Managing Risk of Inaccurate Runtime Estimates for Deadline Constrained Job Admission Control in Clusters”. In Intl. Conf. Parallel Processing (ICPP), pp. 451--458, Aug 2006.

[yeo10]
Chee Shin Yeo, Srikumar Venugopal, Xingchen Chua, and Rajkumar Buyya, “Autonomic metered pricing for a utility computing service”. Future Generation Comput. Syst. 26(8), pp. 1368--1380, Oct 2010.

[yuan11]
Yulai Yuan, Yongwei Wu, Qiuping Wang, Guangwen Yang, and Weimin Zheng, “Job Failures in High Performance Computing Systems: A Large-Scale Empirical Study”. Comput. & Math. with Applications 62, 2011.

[zakay12]
Netanel Zakay and Dror G. Feitelson, “On Identifying User Session Boundaries in Parallel Workload Logs”. In Job Scheduling Strategies for Parallel Processing, Walfredo Cirne and others, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 7698, pp. 216--234, 2012.
Annotation: Suggests using long interarrival times to identify session breaks rather than long think times, because parallel jobs can be very long.

[zakay13]
Netanel Zakay and Dror G. Feitelson, “Workload Resampling for Performance Evaluation of Parallel Job Schedulers”. In 4th Intl. Conf. Performance Engineering, pp. 149--159, Apr 2013.

[zakay14]
Netanel Zakay and Dror G. Feitelson, “Workload Resampling for Performance Evaluation of Parallel Job Schedulers”. Concurrency & Computation -- Pract. & Exp. 26(12), pp. 2079--2105, Aug 2014.

[zakay14b]
Netanel Zakay and Dror G. Feitelson, “Preserving User behavior Characteristics in Trace-Based Simulation of Parallel Job Scheduling”. In 22nd Modeling, Anal. & Simulation of Comput. & Telecomm. Syst. (MASCOTS), pp. 51--60, Sep 2014.
Annotation: Rather than preserving job arrival times one should preserve their order and the dependencies between them.

[zeng09]
Xijie Zeng and Angela C. Sodan, “Job Scheduling with Lookahead Group Matchmaking for Time/Space Sharing on Multi-Core Parallel Machines”. In Job Scheduling Strategies for Parallel Processing, Eitan Frachtenberg and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 5798, pp. 232--258, 2009.

[zhang01]
Y. Zhang, H. Franke, J. E. Moreira, and A. Sivasubramaniam, “An Integrated Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling, and Migration”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 133--158, 2001.
Annotation: Combining all three is best.

[zhang03a]
Yanyong Zhang, Hubertus Franke, Jose Moreira, and Anand Sivasubramaniam, “An Integrated Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling, and Migration”. IEEE Trans. Parallel & Distributed Syst. 14(3), pp. 236--247, Mar 2003.
Annotation: Combining all three is best.

[zhang03b]
Yanyong Zhang, Antony Yang, Anand Sivasubramaniam, and Jose Moreira, “Gang Scheduling Extensions for I/O Intensive Workloads”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2862, pp. 183--207, 2003.
Annotation: Co-locating a job's processes with partitions of the files it uses.

[zilber05]
Julia Zilber, Ofer Amit, and David Talby, “What is Worth Learning from Parallel Workloads? A User and Session Based Analysis”. In 19th Intl. Conf. Supercomputing (ICS), pp. 377--386, Jun 2005.

[zheng11]
Ziming Zheng, Li Yu, Wei Tang, Zhiling Lan, Rinku Gupta, Narayan Desai, Susan Coghlan, and Daniel Buettner, “Co-Analysis of RAS Log and Job Log on Blue Gene/P”. In Intl. Parallel & Distributed Processing Symp. (IPDPS), pp. 840--851, May 2011.

[zhou00]
Bing Bing Zhou, David Welsh, and Richard P. Brent, “Resource Allocation Schemes for Gang Scheduling”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 1911, pp. 74--86, 2000.
Annotation: Shows that re-packing and running jobs in multiple slots when possible improve the performance of gang scheduling.

[zhou01]
B. B. Zhou and R. P. Brent, “On the Development of an Efficient Coscheduling System”. In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson and Larry Rudolph, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 2221, pp. 103--115, 2001.

[zhou13]
Zhou Zhou, Zhiling Lan, Wei Tang, and Narayan Desai, “Reducing Energy Costs for IBM Blue Gene/P via Power-Aware Job Scheduling”. In Job Scheduling Strategies for Parallel Processing, Narayan Desai and Walfredo Cirne, (ed.), Springer-Verlag, Lect. Notes Comput. Sci. vol. 8429, pp. 96--115, 2013.

[zotkin99]
Dmitry Zotkin and Peter J. Keleher, “Job-Length Estimation and Performance in Backfilling Schedulers”. In 8th Intl. Symp. High Performance Distributed Comput. (HPDC), pp. 236--243, Aug 1999.


Back to the Parallel Workloads Archive home page
feit@cs.huji.ac.il / 27 Aug 2015