Muli Ben-Yehuda's journal

February 28, 2019

Data center vignettes #2

Filed under: Uncategorized — Muli Ben-Yehuda @ 11:28 AM

It’s Thursday. The weekend is near. But you may not be able to enjoy it. The word has come down from above with the final requirements for next year’s growth of the Cassandra cluster. It’s going to have to grow by a lot. We’re talking seven if not eight figures. Not uptime or nodes. Dollars. Greenbacks. You’re going to have to invest a few million, maybe tens of millions, in a new storage backend for the Cassandra cluster, because what you have right now is already bursting at the seams. When you built the cluster on direct attached storage, it made sense. That’s what all the cool kids were doing. The one true way to build a cloud: just stick the SSDs into the compute nodes.

As the work week draws to a close, you realize that you’re going to have to think different. To think hyperscale. And hyperscale means storage disaggregation. If it’s good enough for AWS, Facebook, and all those other guys, it has got to be good enough for your Cassandra cluster. But you don’t have their engineering teams or scale (yet). Where are you going to find a disaggregated storage solution that fits your needs and runs on your servers, with your SSDs, over your existing data center network?

Rest easy and enjoy the weekend, friend. Lightbits LightOS is coming soon and it delivers exactly what you need.

Data center vignettes #1

Filed under: Uncategorized — Muli Ben-Yehuda @ 9:08 AM

It’s a slow Wednesday afternoon. The rain drips outside, collecting in large puddles. The data centers are humming along, the developers are drinking coffee and writing code, and the customers’ orders keep coming in through the websites. All is well in the world of Foo Corp’s infrastructure.

Suddenly, PagerDuty starts paging. The dashboards are turning red. There’s been a massive spike in demand, and Foo Corp’s databases are struggling to meet it. Request latency is shooting through the roof. Demand is high and growing higher, and the systems are unable to handle it.

The ops team jolts into action and the database guys start flooding the relevant Slack channels. In a few minutes, you see what happened: the storage systems everything is built on are no longer serving I/O. It might be a network issue with the expensive RDMA network you put in; it might be an issue with the new NVMe SSDs you bought that take a looong tiiiime to run their garbage collection cycles. Maybe you’ll figure it out later and write a nice postmortem no one will read. But right now, whatever it is, it’s painful to leave customer orders on the floor because the infrastructure just can’t serve.

Sounds painful? We think so too. Good thing Lightbits LightOS is coming soon.

