Muli Ben-Yehuda's journal

May 13, 2011

Expertly running scientific tasks on grids and clouds

Filed under: Uncategorized — Muli Ben-Yehuda @ 10:10 AM

My wife Orna Agmon Ben-Yehuda is a graduate student at the Technion CS department, working with Prof. Assaf Schuster. Orna recently published a paper describing part of her PhD work, and I think it is excellent. ExPERT: Pareto-Efficient Task Replication on Grids and Clouds tackles the following problem. Say you are a scientist with a collection of computational tasks to run, and you have several resources at your disposal for running them. For example, you could run some (or all) of your tasks on Amazon’s EC2 cloud, which costs money but offers fairly high reliability and quick turnaround, or you could run some (or all) of them on a local computational grid, which is free but also unreliable and slow. How do you choose?

Orna’s paper tackles this question systematically and makes several contributions. First, it proposes a useful model for reasoning about the problem. Second, it presents ExPERT, a framework that helps the scientist characterize and understand the range of possible choices and pick the right one. In particular, ExPERT finds the choices that lie on the Pareto frontier: those for which no other choice is at least as good in both response time and cost while being strictly better in one of them. Using ExPERT, a scientist who cares most about cost could pick the cheapest option for running her tasks, while a scientist who cares about response time could pick a more expensive option that provides the quickest response. Another scientist could balance the two, getting the best response time for a given budget or the lowest cost for a given response time. The full abstract is below, and the paper is available here.
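To make the Pareto frontier concrete, here is a minimal Python sketch. It assumes each candidate strategy has already been reduced to a single estimated makespan (in hours) and cost (in dollars). The `Strategy` class, the helper names and the numbers in the example are hypothetical illustrations, not ExPERT's code or data, and the sketch ignores how those estimates would be obtained, which is where the paper's model and task replication come in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    """A candidate way of running the bag of tasks (hypothetical example)."""
    name: str
    makespan: float  # estimated hours until the last task finishes
    cost: float      # estimated dollars spent


def dominates(a: Strategy, b: Strategy) -> bool:
    """True if a is no worse than b in both metrics and strictly better in one."""
    return (a.makespan <= b.makespan and a.cost <= b.cost
            and (a.makespan < b.makespan or a.cost < b.cost))


def pareto_frontier(strategies: list[Strategy]) -> list[Strategy]:
    """Keep the strategies that no other strategy dominates, fastest first."""
    frontier = [s for s in strategies
                if not any(dominates(o, s) for o in strategies)]
    return sorted(frontier, key=lambda s: (s.makespan, s.cost))


def fastest_within_budget(frontier: list[Strategy],
                          budget: float) -> Optional[Strategy]:
    """Best response time for a given budget, or None if nothing fits."""
    affordable = [s for s in frontier if s.cost <= budget]
    return min(affordable, key=lambda s: s.makespan) if affordable else None


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the idea.
    candidates = [
        Strategy("all-grid", makespan=40, cost=0),
        Strategy("all-EC2", makespan=5, cost=120),
        Strategy("mixed", makespan=10, cost=30),
        Strategy("wasteful", makespan=12, cost=60),  # slower and pricier than "mixed"
    ]
    frontier = pareto_frontier(candidates)
    print([s.name for s in frontier])                # ['all-EC2', 'mixed', 'all-grid']
    print(fastest_within_budget(frontier, 50).name)  # mixed
```

Because the frontier is sorted by makespan, its first element is the fastest option and its last is the cheapest; everything in between trades money for time, and a budget query simply picks the fastest point that fits.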

Many scientists perform extensive computations by executing large bags of similar tasks (BoTs) in mixtures of computational environments, such as grids and clouds. Although the reliability and cost may vary considerably across these environments, no tool exists to assist scientists in the selection of environments that can both fulfill deadlines and fit budgets. To address this situation, in this work we introduce the ExPERT BoT scheduling framework. Our framework systematically selects from a large search space the Pareto-efficient scheduling strategies, that is, the strategies that deliver the best results for both makespan and cost. ExPERT chooses from them the best strategy according to a general, user-specified utility function. Through simulations and experiments in real production environments we demonstrate that ExPERT can substantially reduce both makespan and cost, in comparison to common scheduling strategies. For bioinformatics BoTs executed in a real mixed grid+cloud environment, we show how the scheduling strategy selected by ExPERT reduces both makespan and cost by 30%-70%, in comparison to commonly-used scheduling strategies.
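The abstract's final step, picking one strategy from the Pareto-efficient set with a user-specified utility function, can be sketched in a few lines. The linear "each hour of waiting is worth some number of dollars" utility below is purely a hypothetical example, as are the names and numbers; the utility functions used in the paper may differ.

```python
from typing import Callable, NamedTuple


class Point(NamedTuple):
    """One Pareto-efficient strategy (hypothetical labels and numbers)."""
    name: str
    makespan: float  # estimated hours until the whole bag of tasks finishes
    cost: float      # estimated dollars spent


def best_by_utility(frontier: list[Point],
                    utility: Callable[[float, float], float]) -> Point:
    """Return the frontier point with the highest utility(makespan, cost)."""
    return max(frontier, key=lambda p: utility(p.makespan, p.cost))


def time_is_money(makespan: float, cost: float,
                  dollars_per_hour: float = 5.0) -> float:
    """Hypothetical linear utility: every hour of waiting is worth a fixed
    number of dollars, so utility is the negated total effective cost."""
    return -(cost + dollars_per_hour * makespan)


if __name__ == "__main__":
    frontier = [Point("all-EC2", 5, 120), Point("mixed", 10, 30),
                Point("all-grid", 40, 0)]
    for rate in (0.5, 5.0, 50.0):
        choice = best_by_utility(frontier,
                                 lambda m, c: time_is_money(m, c, rate))
        print(f"${rate}/hour -> {choice.name}")
```

Changing the single exchange-rate parameter moves the choice along the frontier: the free grid for a patient user, the mixed strategy in the middle, and all-EC2 for a user in a hurry.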
