Muli Ben-Yehuda's journal

June 14, 2011

Odds and Ends from USENIX FCW11

Filed under: Uncategorized — Muli Ben-Yehuda @ 9:56 PM

This week I am at the USENIX Federated Conferences Week in lovely Portland, Oregon. Yesterday Orna and I walked back and forth through Portland; today I am at the 3rd Workshop on I/O Virtualization, and tomorrow I will be at the USENIX Annual Technical Conference.

I am wearing multiple hats this week. Earlier this morning I presented our work on “SplitX: Split Guest/Hypervisor Execution on Multi-core”, and Abel Gordon presented our work on “VAMOS: Virtualization Aware Middleware”. Both presentations went well, if I might say so myself. Nadav Amit will present our work on “vIOMMU: Efficient IOMMU Emulation” tomorrow — if his flight makes it here in time. I will also participate in a panel on challenges in cloud I/O later today, will be chairing a session on Friday, and will be summarizing several sessions for ;login today and tomorrow.

I like being here at USENIX ATC and WIOV; the topics are close to my heart and the halls are full of familiar faces. It is also good, however, to step outside of one’s comfort zone every so often. Accordingly I agreed to serve on the technical program committee for SPSN ’11: The First International Workshop on Security and Privacy in Social Networks. This should be interesting.

Last but not least, a journey that started almost two years ago reached its end — at least the end of the beginning — recently when Marcelo Tosatti applied the nested VMX patchset to the KVM tree. Kudos to Nadav Har’El who took a bunch of research code (in both the best and worst senses of the term) and continuously polished and rewrote it until it finally conformed to the highest standards of open source development. Thanks to Nadav’s tireless efforts, nested virtualization on Intel machines will soon be available on every KVM deployment near you.

May 13, 2011

Expertly running scientific tasks on grids and clouds

Filed under: Uncategorized — Muli Ben-Yehuda @ 10:10 AM

My wife Orna Agmon Ben-Yehuda is a graduate student at the Technion CS department, working with Prof. Assaf Schuster. Orna recently published a paper on part of her PhD work, which I think is amazing. ExPERT: Pareto-Efficient Task Replication on Grids and Clouds tackles the following problem. Let’s say you are a scientist and you have a collection of computational tasks that you want to run. You also have several resources at your disposal for running these tasks. For example, you could run some, or all, of your tasks on Amazon’s EC2 cloud, which costs money but provides fairly high reliability and quick turnaround, or you could run some, or all, of your tasks on a local computational grid, which is free, but also unreliable and slow. How do you choose?

Orna’s paper tackles this question systematically and makes several contributions. First, it proposes a useful model for reasoning about this problem. Second, it presents ExPERT, a framework that helps the scientist characterize and understand the range of potential choices, assisting the user in picking the right choice. In particular, ExPERT finds those specific choices that are on the Pareto frontier: the set of choices that no other choice beats on every count, so that each one can only be improved in one respect (say, cost) by giving something up in another (say, response time). Using ExPERT, a scientist who cares more about cost could pick the cheapest option for running her tasks, while a scientist who cares about response time could pick a more expensive option that provides the quickest response. Another scientist could choose how to balance the two, getting the best response time for a given budget, or the best cost for a given response time. The full abstract is below, and the paper is available here.

Many scientists perform extensive computations by executing large bags of similar tasks (BoTs) in mixtures of computational environments, such as grids and clouds. Although the reliability and cost may vary considerably across these environments, no tool exists to assist scientists in the selection of environments that can both fulfill deadlines and fit budgets. To address this situation, in this work we introduce the ExPERT BoT scheduling framework. Our framework systematically selects from a large search space the Pareto-efficient scheduling strategies, that is, the strategies that deliver the best results for both makespan and cost. ExPERT chooses from them the best strategy according to a general, user-specified utility function. Through simulations and experiments in real production environments we demonstrate that ExPERT can substantially reduce both makespan and cost, in comparison to common scheduling strategies. For bioinformatics BoTs executed in a real mixed grid+cloud environment, we show how the scheduling strategy selected by ExPERT reduces both makespan and cost by 30%-70%, in comparison to commonly-used scheduling strategies.
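To make the Pareto frontier idea concrete, here is a minimal Python sketch. It is not ExPERT's actual algorithm: the strategy names and the makespan and cost numbers are made up for illustration, and the brute-force filter is just the textbook definition. It keeps only the strategies that no other strategy dominates, then picks one of them with a user-supplied utility function.

```python
from typing import Callable, NamedTuple


class Strategy(NamedTuple):
    name: str        # hypothetical label, for illustration only
    makespan: float  # hours until the whole bag of tasks finishes
    cost: float      # dollars spent


def dominates(a: Strategy, b: Strategy) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a.makespan <= b.makespan and a.cost <= b.cost
    strictly_better = a.makespan < b.makespan or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(strategies: list[Strategy]) -> list[Strategy]:
    """Keep every strategy that no other strategy dominates."""
    return [s for s in strategies
            if not any(dominates(other, s) for other in strategies)]


def pick(frontier: list[Strategy],
         utility: Callable[[Strategy], float]) -> Strategy:
    """Choose the frontier strategy with the lowest utility value (lower is better)."""
    return min(frontier, key=utility)


# Made-up numbers: not measurements from the paper.
CANDIDATES = [
    Strategy("grid only", makespan=40.0, cost=0.0),
    Strategy("cloud only", makespan=6.0, cost=90.0),
    Strategy("grid, then cloud for stragglers", makespan=10.0, cost=25.0),
    Strategy("replicate everything on both", makespan=9.0, cost=120.0),
]

FRONTIER = pareto_frontier(CANDIDATES)

cheapest = pick(FRONTIER, lambda s: s.cost)
fastest = pick(FRONTIER, lambda s: s.makespan)
# Best response time for a given budget of 30 dollars.
within_budget = pick([s for s in FRONTIER if s.cost <= 30.0],
                     lambda s: s.makespan)
```

In this toy data set, "replicate everything on both" drops out because "cloud only" is both faster and cheaper. The three remaining strategies are exactly the ones a scientist has a reason to choose between.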

May 12, 2011

Greek adventure

Filed under: Uncategorized — Muli Ben-Yehuda @ 4:09 PM

Sitting inside working in my office in Haifa on a beautiful day, instead of doing something fun outside in the sun, always feels slightly wrong. Sitting in a conference room in Heraklion, Crete, and looking out of the window at the gorgeous view outside only feels slightly wronger.

Orna and I are currently in beautiful Heraklion, Crete. I am here mostly for work — the IOLanes EU research project meeting and review — and for a little vacationing; Orna is in full vacation mode. We both badly need a vacation after the pressure-cooker of the ACM Symposium on Cloud Computing deadline. During the last couple of weeks of April we worked around the clock, culminating in the submission of three different papers on April 30th. IOLanes provided a much needed opportunity to visit Greece and rest a little.

We were supposed to arrive in Heraklion yesterday morning, but ended up spending a few hours in Athens first, courtesy of striking air traffic controllers. After landing in Athens we took the bus from the airport to the city center, where we walked around and enjoyed Greek hospitality and cooking. We had planned to return to the airport the same way, but when we returned to the bus stop a few hours later, we discovered tens of thousands of demonstrators blocking the bus station and nearby roads. Our first indication of imminent trouble was the large number of uniformed police officers blocking the streets in full riot gear, including gas masks. Moving closer, we saw the more experienced onlookers wearing scarves or surgical masks. While the demonstration was peaceful at that time, tensions were clearly running high and violence was not far off. Orna was feeling adventurous and wanted to hang around and try to intercept the bus somewhere along its route, but cooler heads (mine) prevailed. Once we saw the news agent lock and barricade his shop, we backtracked out of the danger area and found a taxi back to the airport. When we arrived at the airport we discovered that the demonstration had indeed turned violent shortly after we left. Nonetheless, we did not let the unexpected adventure deter us from enjoying Greek hospitality and most excellent cooking.

Today and tomorrow I am working, and Saturday will be dedicated to playing tourist. Next week it’s back to Haifa, with a detour to Tel-Aviv University on Tuesday evening where I will be giving the Turtles talk to the local defcon chapter dc9723. Should be an interesting experience with a different crowd than usual: younger and hopefully rowdier 🙂

April 10, 2011

3rd Workshop on I/O Virtualization, VAMOS and SplitX

Filed under: Uncategorized — Muli Ben-Yehuda @ 1:15 PM

The program committee meeting for the 3rd Workshop on I/O Virtualization was held this Friday. I like the resulting program quite a bit, regardless of the fact that two of our submissions—VAMOS and SplitX—were accepted. WIOV is probably my favorite workshop ever, and this year it will be held again with the USENIX Annual Technical Conference, another favorite venue. The full program will be available online in a week or two.

Our two accepted papers, “SplitX: Split Guest/Hypervisor Execution on Multi-Core” (joint with Alex Landau and Abel Gordon) and “VAMOS: Virtualization Aware Middleware” (joint with Abel Gordon, Dennis Filimonov, and Maor Dahan), tackle the I/O virtualization problem from two different directions. VAMOS follows the same general line of thought as our earlier Scalable I/O and IsoStack work. Raising the level of abstraction of I/O operations (socket calls instead of sending and receiving Ethernet frames, file system operations instead of reading and writing blocks) improves I/O performance because it cuts down the number of protection-domain crossings needed. In VAMOS, we perform I/O at the level of middleware operations, with the guest passing database queries to the hypervisor instead of reading and writing disk blocks. This gives a nice boost to performance, as you might expect, and is fairly easy to do by taking advantage of the inherent modularity of middleware, which to me was a surprising result.

SplitX is a whole other kettle of fish. It has been clear to us for some time that the inherent overhead of x86 machine virtualization is tied to the trap-and-emulate model, as can be seen perhaps most clearly in the Turtles paper. With the trap-and-emulate model, both direct and indirect overheads are inherent in the model, because we time-multiplex two different contexts (the guest and the hypervisor) onto the same CPU core, incurring both the switch overhead and the indirect cost of dirtying the caches. But what if we could run guests on their own cores, and hypervisors on their own cores, and never the twain shall meet? SplitX presents our initial exploration of this—very promising, if I may say so myself—idea.

The papers will be available online later, but shoot me an email to get the current draft.

March 15, 2011

vIOMMU paper to appear in 2011 USENIX Annual Technical Conference

Filed under: Uncategorized — Muli Ben-Yehuda @ 11:33 PM

Well, it’s official: our vIOMMU paper, which I wrote about previously, has been accepted to the 2011 USENIX Annual Technical Conference. I love it when that happens 🙂

March 2, 2011

vIOMMU: Efficient IOMMU Emulation

Filed under: Uncategorized — Muli Ben-Yehuda @ 4:01 PM

My colleague Nadav Amit will be presenting his M.Sc. research, which I had the pleasure of helping with, this upcoming Sunday. The summer before last Nadav did a summer internship with my group at the Haifa Research Lab. Nadav’s internship was dedicated to analyzing the IOTLB behavior of contemporary IOMMUs, and resulted in this WIOSCA paper. In order to analyze IOTLB behavior, we had to first collect traces of how modern operating systems set up their DMA buffers, and to do that, Nadav developed IOMMU emulation in KVM.

For his M.Sc., Nadav researched how to emulate IOMMUs efficiently, leading to two primary contributions. First, waiting just a few milliseconds before tearing down an IOMMU mapping can boost performance substantially due to high temporal reuse. Second, it is possible to emulate a hardware device without trapping to the hypervisor on every device interaction, by using a separate core (a sidecore) to run the device emulation code. The full abstract is below, and everyone is invited to the talk.

Direct device assignment, where a guest virtual machine directly interacts with an I/O device without host intervention, is appealing, because it allows an unmodified (non-hypervisor-aware) guest to achieve near-native performance. But device assignment for unmodified guests suffers from two serious deficiencies: (1) it requires pinning of all the guest’s pages, thereby disallowing memory overcommitment, and (2) it exposes the guest’s memory to buggy device drivers.

We solve these problems by designing, implementing, and exposing an emulated IOMMU (vIOMMU) to the unmodified guest. We employ two novel optimizations to make vIOMMU perform well: (1) waiting a few milliseconds before tearing down an IOMMU mapping in the hope it will be immediately reused (“optimistic teardown”), and (2) running the vIOMMU on a sidecore, and thereby enabling for the first time the use of a sidecore by unmodified guests. Both optimizations are highly effective in isolation. The former allows bare-metal to achieve 100% of a 10Gbps line rate. The combination of the two allows an unmodified guest to do the same.
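For the curious, here is a toy Python model of the “optimistic teardown” idea. It is only a sketch under my own assumptions, not the vIOMMU code: the class name, the operation counter, and the use of explicit timestamps instead of a real clock are all invented for illustration. An unmap request is parked for a grace period, and if the same buffer is mapped again before the period ends, both the teardown and the new setup are skipped.

```python
class DeferredUnmapTable:
    """Toy mapping table with deferred ("optimistic") teardown.

    Hypothetical model for illustration; time is passed in explicitly
    (in milliseconds) so that the behavior is deterministic.
    """

    def __init__(self, grace_ms: float) -> None:
        self.grace_ms = grace_ms
        self.live: set[int] = set()            # buffers mapped and in use
        self.pending: dict[int, float] = {}    # buffer -> time unmap was requested
        self.expensive_ops = 0                 # real setups plus real teardowns

    def _expire(self, now_ms: float) -> None:
        """Really tear down mappings whose grace period has run out."""
        for buf, requested_at in list(self.pending.items()):
            if now_ms - requested_at >= self.grace_ms:
                del self.pending[buf]
                self.expensive_ops += 1        # real teardown

    def map(self, buf: int, now_ms: float) -> None:
        self._expire(now_ms)
        if buf in self.pending:
            # The old mapping is still in place: reuse it for free.
            del self.pending[buf]
        elif buf not in self.live:
            self.expensive_ops += 1            # real setup
        self.live.add(buf)

    def unmap(self, buf: int, now_ms: float) -> None:
        self._expire(now_ms)
        if buf in self.live:
            self.live.remove(buf)
            self.pending[buf] = now_ms         # defer the real teardown


def run(grace_ms: float, rounds: int = 100) -> int:
    """Map and unmap the same buffer once per millisecond; count expensive ops."""
    table = DeferredUnmapTable(grace_ms)
    for i in range(rounds):
        table.map(0x1000, now_ms=float(i))
        table.unmap(0x1000, now_ms=i + 0.5)
    return table.expensive_ops
```

With no grace period, `run(0.0)` pays for a setup on every round and a teardown between rounds. With a 5 ms grace period, `run(5.0)` pays for the first setup only. The price in this toy model is that a buffer stays mapped for up to the grace period after the driver asked to unmap it.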

February 3, 2011

new paper accepted

Filed under: Uncategorized — Muli Ben-Yehuda @ 5:40 PM

Our paper “Ginkgo: Automated, Application-Driven Memory Overcommitment for Cloud Computing” has been accepted to the ASPLOS RESoLVE workshop. Here is the abstract:

Continuous advances in multicore and I/O technologies have caused memory to become a very valuable sharable resource that limits the number of virtual machines (VMs) that can be hosted in a single physical server. While today’s hypervisors implement a wide range of mechanisms to overcommit memory, they lack memory allocation policies and frameworks capable of guaranteeing levels of quality of service to their applications.

In this short paper we introduce Ginkgo, a memory overcommit framework that takes an application-aware approach to the problem. Ginkgo dynamically estimates VM memory requirements for applications without user involvement or application changes. Ginkgo regularly monitors application progress and incoming load for each VM, using this data to predict application performance under different VM memory sizes. It automates the distribution of memory across VMs during runtime to satisfy performance and capacity constraints while optimizing towards one of several possible goals, such as maximizing overall system performance, minimizing application quality-of-service violations, minimizing memory consumption, or maximizing profit for the cloud provider.

Using this framework to run the benchmarks DayTrader 2.0 and SPECweb2009, our initial experimental results indicate that overcommit ratios of at least 2x can be achieved while maintaining application performance, independently of additional memory savings that can be enabled by techniques such as page coalescing.

I will post the final version of the paper on the publications page when it is ready.

First post!

Filed under: Uncategorized — Muli Ben-Yehuda @ 4:46 PM

Well, it was bound to happen sometime. I have a new blog.

October 5, 2010

Filed under: Uncategorized — Muli Ben-Yehuda @ 5:14 PM

You know it’s a good day when you get to use angry turtle in a presentation. Angry turtle is angry!

[Image: angry turtle]

October 4, 2010

Filed under: Uncategorized — Muli Ben-Yehuda @ 8:57 AM

I was hurrying down the Newark airport terminal, wondering whether I
was going to make the connecting flight to Seattle, en route to
Vancouver for the 9th USENIX Symposium on Operating Systems Design and
Implementation. Suddenly, my cell phone rang. It was Michael, a
long-time co-worker and mentor. “Have you seen the email?” “No, I just
landed in Newark and am on the way to catch a connection to Seattle.
Which email?” “Here, let me read you the

Dear Authors,

Your paper has been selected as one of two winners of the OSDI Jay
Lepreau Best Paper award.”

Receiving this award is a unique experience and a great honor. It is
doubly sweet because of all the research projects I’ve worked on, the
Turtles nested virtualization project is perhaps the one I am most
proud of. When Orit, Ben, and I started working on it in 2008, we set
out to do the impossible. Many colleagues claimed that efficient
nested x86 virtualization on the Intel platform could not be
done. Eventually, working long and hard, and with help from friends,
we showed that not only could it be done, it even performs well. I’ve
learned a lot in the process, about x86 virtualization, about leading
a team, and about the art and craft of doing research, but the most
important lesson was to never lose hope, to always believe that
eventually, it will work. And guess what? It did!

If you want to know how we did it, and what we learned in the process,
check out The Turtles Project: Design and Implementation of Nested
Virtualization. Here is the abstract:

In classical machine virtualization, a hypervisor runs multiple
operating systems simultaneously, each on its own virtual machine. In
nested virtualization a hypervisor can run multiple other
hypervisors with their associated virtual machines. As operating
systems gain hypervisor functionality—Microsoft Windows 7 already
runs Windows XP in a virtual machine—nested virtualization will
become necessary in hypervisors that wish to host them. We present the
design, implementation, analysis, and evaluation of high-performance
nested virtualization on Intel x86-based systems. The Turtles project,
which is part of the Linux/KVM hypervisor, runs multiple
unmodified hypervisors (e.g., KVM and VMware) and operating
systems (e.g., Linux and Windows). Despite the lack of architectural
support for nested virtualization in the x86 architecture, it can
achieve performance that is within 6-8% of single-level (non-nested)
virtualization for common workloads, through multi-dimensional
paging for MMU virtualization and multi-level device
assignment for I/O virtualization.

The scientist gave a superior smile before replying, “What
is the tortoise standing on?” “You’re very clever, young man, very
clever”, said the old lady. “But it’s turtles
all the way down!”
