Muli Ben-Yehuda's journal

March 4, 2019

Data center vignettes #5

Filed under: Uncategorized — Muli Ben-Yehuda @ 8:55 AM

It’s Monday. The rain drums on the roof. You are worried about next week’s upgrade to the storage servers. They are struggling. Linux, bless its little heart, is not keeping up with the new NVMe SSDs. With RAID5 and compression, it’s slower than a three-legged turtle. The plan is to put in stronger CPUs and more RAM across the entire storage server fleet.

You sit up straight. An idea has just occurred to you. This could be big. Really big. You know how no one does machine learning in software anymore? How the big cloud guys build custom ASICs? They do it because at scale, a 20% reduction in CPU utilization is huge.

What if you could achieve the same efficiency as the big guys for your storage servers? What if there were a way to accelerate common storage operations in hardware and offload them from the CPU? The performance improvements would be nice, but the TCO savings, beginning with avoiding that messy data-center-wide upgrade next week — that’s going to be huge. It will delight your boss. And your CFO.

As the rain continues drumming, you realize that you don’t need to build the storage accelerator. Lightbits already built it. You put that upgrade on hold and give them a call to order a batch of LightFields. Your day just got a whole lot better. Even the rain has stopped.

March 2, 2019

Data center vignettes #4

Filed under: Uncategorized — Muli Ben-Yehuda @ 11:25 PM

It’s late on a clear and cold Saturday night. You had a few with friends at the pub and now you’re heading home. You can’t wait to get back and snuggle with the cats. And then the email comes in.

“We have a problem. There’s something wrong with the database clusters’ latencies. I’m not sure what’s going on but the tail keeps rising. If this continues for much longer, we are going to be in violation of our SLAs and the brown stuff will hit the fan. Can you take an urgent look?”

Sigh, the cats will have to wait. Good thing you didn’t go for that last round at the pub. Let’s see. The database clusters look OK, no nodes have failed recently, CPU utilization is OK, query processing time within acceptable bounds, what the hell is going on?

And then you see it. Some SSDs from the new batch are failing on a few of the nodes. When they fail, Linux resets them and they come back until they fail again. They have been jittering for the last few hours, slowing down the nodes they’re on. And every time an SSD fails, the latency on that node spikes up, dragging the cluster’s entire tail latency up with it. It’s either kill those nodes, reducing capacity to a dangerous level, or make a midnight trip to the data center and take care of those drives.

As you navigate the quiet and empty streets on the way to the data center, it occurs to you. Drives have always failed and will continue failing. What you need is for those drives to just fail in place while everything continues working, no slowdown, no tail latency increase. Then you could be home with the cats right now. You keep driving.

At home, if anyone were listening, they might or might not hear the cats quietly meowing “LightOS… use LightOS. From Lightbits. Coming soon.”

March 1, 2019

Data center vignettes #3

Filed under: Uncategorized — Muli Ben-Yehuda @ 10:36 AM

It’s Friday. You’ve been hacking on this cool bit of code for a while. It will be such a pleasure to deploy and see the user engagement numbers go up. The CI is green. The code is tight. You take a deep breath and deploy.

Ten minutes later, everything is fine. Ten minutes after that, still good. An hour passes. You check out the Grafana dashboard, and everything looks OK, except… why does the size of one of your key data stores continually increase with the new code?

This is not yet an emergency but it will become one if it keeps up. Each of your servers is limited to two SSDs. The infrastructure guys wanted to keep SKU sprawl to a minimum, and most of the CPU cycles are used for computation anyway, so they decided to “right size” the storage on each server to two SSDs per node. You crunch some numbers and realize that the data store is going to stop growing and stabilize — exactly 100GB after it exhausts all available space on your servers. You curse and roll back the code until they can install more SSDs, sometime next decade.

Wouldn’t it have been nice if there were no limit on the amount of storage your application could use, while still enjoying the benefits of direct-attached SSDs? Enter LightOS, coming soon from Lightbits Labs.

February 28, 2019

Data center vignettes #2

Filed under: Uncategorized — Muli Ben-Yehuda @ 11:28 AM

It’s Thursday. The weekend is near. But you may not be able to enjoy it. The word has come down from above with the final requirements for next year’s Cassandra cluster growth. It’s going to have to grow by a lot. We’re talking seven if not eight figures. Not uptime or nodes. Dollars. Greenbacks. You’re going to have to invest a few million, maybe tens of millions, in a new storage backend for the Cassandra cluster, because what you have right now is already bursting at the seams. When you built the cluster on direct-attached storage, it made sense. That’s what all the cool kids were doing. The one true way to build a cloud — just stick the SSDs into the compute nodes.

As the work week draws to a close, you realize that you’re going to have to think different. To think hyperscale. And hyperscale means storage disaggregation. If it’s good enough for AWS, Facebook, and all those other guys, it’s got to be good enough for your Cassandra cluster. But you don’t have their engineering teams or scale (yet). Where are you going to find a disaggregated storage solution that fits your needs, runs on your servers, with your SSDs, and your existing data center network?

Rest easy and enjoy the weekend, friend. Lightbits LightOS is coming soon and it delivers exactly what you need.

Data center vignettes #1

Filed under: Uncategorized — Muli Ben-Yehuda @ 9:08 AM

It’s a slow Wednesday afternoon. The rain drips outside, collecting in large puddles. The data centers are humming along, the developers are drinking coffee and writing code, the customers’ orders keep coming in through the web sites. All is well in the world of Foo Corp’s infrastructure.

Suddenly, PagerDuty starts paging. The dashboards are turning red. There’s been a massive spike in demand, and Foo Corp’s databases are struggling to meet it. Request latency is shooting through the roof. Demand is high and growing higher, and the systems are unable to handle it.

The ops team jolts into action and the database guys start flooding the relevant Slack channels. In a few minutes, you see what happened: the storage systems everything is built on are no longer serving storage. It might be a network issue with the expensive RDMA network you put in; it might be an issue with the new NVMe SSDs you bought that take a looong tiiiime to run their garbage collection cycles. Maybe you’ll figure it out later and write a nice postmortem no one will read. But right now, whatever it is, it’s painful to leave customer orders on the floor because the infrastructure just can’t serve.

Sounds painful? We think so too. Good thing Lightbits LightOS is coming soon.

 

September 1, 2017

A brief update of the last five years except not really

Filed under: Uncategorized — Muli Ben-Yehuda @ 2:48 PM

So there I was, sitting in a hotel room in Dublin, Ireland, procrastinating before checking out and heading to the airport. Five years older, maybe even five years wiser. Who knows. It’s good to write again, even though I’m not sure I actually have something to say. I’m still here?

If I look back at the last five years, there have been some good times and some horrible times. But overall, the gradient is proceeding in the right direction. Entropy always increases, but so far the system has remained mostly orderly.

I’ve been feeling an itch to go back to writing recently, but I’m not sure what to write about. Technology? I spend most of my waking hours thinking about it and how it applies to Lightbits and how Lightbits applies to the world, and enough is enough. Life? I don’t think so. Some things should remain private. Business? I’m not sure I have enough to say. Perhaps I’ll write about travel and the wanderlust that sometimes overtakes me and the places I’ve been and the things I’ve seen. Or perhaps not. We’ll see.

But here are some pictures of Dublin from last night.

January 16, 2012

Sometimes, a paper is more than just a paper

Filed under: Uncategorized — Muli Ben-Yehuda @ 10:51 AM

Sometimes, a paper is more than just a paper. Around late 2005 or early 2006 I started working on direct device assignment, a useful approach for I/O virtualization where you give a virtual machine direct access to an I/O device so that it can read and write the physical machine’s memory without hypervisor involvement. The main reason to use direct device assignment is performance: since you bypass the hypervisor on the I/O path, it stands to reason that for I/O intensive workloads — the hardest workloads to virtualize — direct device assignment would provide bare-metal performance. Right?

Wrong. Since 2006, we’ve seen again and again that even with direct device assignment, virtual machine performance falls far short of bare-metal performance for the same workload. Sometime in 2009, we realized that after you solve all other problems, one particularly thorny issue remains: interrupts. The interrupt delivery and completion architectural mechanisms in contemporary x86 machines, even with the latest virtualization support, were not designed for delivering interrupts directly to untrusted virtual machines. Instead, every hypervisor programs the interrupt controllers to deliver all interrupts to the hypervisor, which then injects the relevant interrupts into each virtual machine. For interrupt-intensive virtualized workloads, these exits to the hypervisor can lead to a massive drop in performance.
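
To see why these exits hurt so much, here is a tiny back-of-the-envelope sketch. The interrupt rate and per-exit cost below are illustrative assumptions, not measurements from our work; the point is just that multiplying the two quickly eats a large fraction of a core.

# Back-of-the-envelope only -- the numbers below are assumptions, not measurements.
INTERRUPTS_PER_SEC = 150_000   # assumed interrupt rate for an I/O-intensive guest
EXIT_COST_US = 4.0             # assumed cost of one exit, injection, and re-entry, in microseconds

# Fraction of a single core spent just bouncing through the hypervisor:
overhead = INTERRUPTS_PER_SEC * EXIT_COST_US / 1_000_000
print(f"core time lost to interrupt exits: {overhead:.0%}")   # ~60% with these assumptions

Take the exits out of that path and the guest gets that core time back, which is exactly what ELI is after.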

Although it is possible to work around the interrupt issue by modifying the virtual machine’s device drivers to use polling, as we did in the Turtles paper and in the Tamarin paper that will be presented at FAST ’12, it always annoyed me that the promise of bare-metal performance for virtual machines remained unreachable for unmodified virtual machines. That is, until now.

Through the amazing work of a combined IBM and Technion team, we came up with an approach called ELI (Exitless Interrupts) that allows secure handling of interrupts directly within virtual machines, without any changes to the underlying hardware. With ELI, direct device assignment can finally do what it was always meant to do: provide virtual machines with bare-metal performance. It is nice to look back at the research over the last five or six years that led us to this point; it will be even nicer, when we present this work at ASPLOS in London in a couple of months, to ponder what other breakthroughs the next few years hold.

“ELI: Bare-Metal Performance for I/O Virtualization”, by Abel Gordon, Nadav Amit, Nadav Har’El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster and Dan Tsafrir. In ASPLOS ’12: Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems.

Direct device assignment enhances the performance of guest virtual machines by allowing them to communicate with I/O devices without host involvement. But even with device assignment, guests are still unable to approach bare-metal performance, because the host intercepts all interrupts, including those interrupts generated by assigned devices to signal to guests the completion of their I/O requests. The host involvement induces multiple unwarranted guest/host context switches, which significantly hamper the performance of I/O intensive workloads. To solve this problem, we present ELI (ExitLess Interrupts), a software-only approach for handling interrupts within guest virtual machines directly and securely. By removing the host from the interrupt handling path, ELI manages to improve the throughput and latency of unmodified, untrusted guests by 1.3x–1.6x, allowing them to reach 97%–100% of bare-metal performance even for the most demanding I/O-intensive workloads.

January 2, 2012

New year, same stuff

Filed under: Uncategorized — Muli Ben-Yehuda @ 11:10 PM

I guess I should write something here, but I am not quite sure what. My life is a roller-coaster of the mundane; rarely do I have a chance to sit back and pontificate. So, all is well, work work work study research kids school work sleep work work work fun! Not that I am complaining, mind you.

Perhaps I’ll write more tomorrow. Or in three months. We’ll see.

August 25, 2011

New Paper: Deconstructing Amazon EC2 Spot Instance Pricing

Filed under: Uncategorized — Muli Ben-Yehuda @ 2:22 PM

Ever wonder how Amazon prices its spot instances? Or, having dug deeper, perhaps wondered why the prices sometimes appear a little funny? Wonder no more: Orna Agmon Ben-Yehuda tells the gruesome story of how Amazon really prices its spot instances in our new paper Deconstructing Amazon EC2 Spot Instance Pricing. Warning: not for the faint of heart.

Cloud providers possessing large quantities of spare capacity must either incentivize clients to purchase it or suffer losses. Amazon is the first cloud provider to address this challenge, by allowing clients to bid on spare capacity and by granting resources to bidders while their bids exceed a periodically changing spot price. Amazon publicizes the spot price but does not disclose how it is determined.

By analyzing the spot price histories of Amazon’s EC2 cloud, we reverse engineer how prices are set and construct a model that generates prices consistent with existing price traces. We find that prices are usually not market-driven as sometimes previously assumed. Rather, they are typically generated at random from within a tight price interval via a dynamic hidden reserve price. Our model could help clients make informed bids, cloud providers design profitable systems, and researchers design pricing algorithms.
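
For readers who prefer to see a model rather than read about one, here is a toy sketch of the kind of price generator the paper describes: prices drawn at random from a tight interval sitting above a hidden reserve price. The numbers are made up and the reserve is held fixed for simplicity (in the paper’s model the hidden reserve changes dynamically); this is my own illustration, not code or values from the paper.

import random

# Toy illustration only: spot prices drawn at random from a tight band above a
# hidden reserve price. Reserve and band width are invented values, and the
# reserve is kept constant here, unlike the dynamic reserve the paper describes.
HIDDEN_RESERVE = 0.035   # assumed hidden reserve price, in $/hour
BAND_WIDTH = 0.005       # assumed width of the tight price interval, in $/hour

def next_spot_price() -> float:
    """Publish the next spot price: random-looking, but never below the hidden reserve."""
    return round(HIDDEN_RESERVE + random.uniform(0.0, BAND_WIDTH), 4)

trace = [next_spot_price() for _ in range(10)]
print(trace)   # looks market-driven at a glance, yet it is just noise above a floor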

Academic Highs and Lows

Filed under: Uncategorized — Muli Ben-Yehuda @ 12:33 PM

One of the reasons I love the academic life is the built-in highs. There’s nothing quite like the high you get when you make a discovery, or when something finally works like it should. I won’t lie: I’ve been known to do the happy happy joy joy dance in the halls on such occasions. The high when a paper is accepted lasts for a few days; winning a prestigious award is a rare pleasure and the high lasts longer. Learning that someone else cites your work is always nice, especially if it causes the all-important h-index to rise, as it did last night.

But, with the highs also come the lows: rejection never ceases to hurt, and at least statistically, most papers will be rejected before they get accepted. But you know what, that’s OK too, because hurting when your paper gets rejected just means you care. Without lows, there could not be any highs — and it’s the highs that matter.
