Cassandra Lessons Learned: Right-Sizing Cassandra Clusters

This is the first of two tragic Cassandra hardware utilization stories. Hopefully you can take away a few nuggets to help you avoid letting this happen to you.

A big customer of mine, an online personal financial platform, had been running mission critical customer-facing systems on Cassandra in production for years. It is a great brand and growing business. Their workloads and data expanded over time. Every year they procured more hardware, rolled out more data environments, and grew their footprint. After 4+ years in production, we’re talking about a lot of Cassandra nodes across a lot of clusters.

Then one day in the car on the way to lunch with a Sr Director of Operations and Engineering…

[Me]  “How’s your cluster setup going for next year?”

[Customer]  “Great. We’re building out capacity, rolling out 60 new nodes.”

[Me]  “Excellent. What boxes are you using?”

[Customer]  “Oh, I think they’re 24 core boxes with 128 GB memory. And finally we got SSDs.”

[Me]  “SSDs. Nice. Those are pretty good sized boxes.”

[Customer]  “Yeah. It’s the only server option I had available for running databases. Expensive.”

[Me]  “Hmm. How many Cassandra nodes are you running on each of those boxes?”

[Customer]  “One. There’s not a Cassandra feature to deploy more than one that we know of.”

[Me] “Holy smokes! You have a lot of extra capacity. You could probably get 3 Cassandra nodes on each of those boxes.”

[Customer] “Really?! I had no idea. Ugh! We need to talk more about this.”

My poor customer was self-inflicting a maximum of 33% hardware utilization on himself: with room for three Cassandra nodes per box but only one deployed, each server could never run above roughly a third of its capacity. He bought 60 servers when he only needed 20.

Considering that Cassandra capacity is usually planned for peak load, chances were good that his cluster would run at 5-15% utilization off-peak (which is most of the time). That's a real waste of server capacity.

“So what,” you say, “why should I care if that cluster had such low hardware utilization? I’m sure mine’s fine.”

Here’s why you should care: this kind of thing happens all the time in the Cassandra world. At the time of that conversation, I had two other customers struggling with the same challenge: a large chip maker and a network equipment manufacturer.

There are many variations of how and why this happens. You just might be unknowingly wasting CapEx and leaving a ton of scalability on the table with your Cassandra clusters.

Here’s the reality check:

My customers needed to deploy a different number of Cassandra software nodes than the number of physical host servers they had available. But they didn’t have a way to do it that was feasible for them.

What my customers really needed was the ability to deploy scenarios like 52 Cassandra nodes on 23 host servers, or maybe 17 nodes on 11 servers, or possibly 500 nodes on as few servers as possible to keep their hardware costs down for such a large cluster.

How does this happen? Because you optimize the number of server hosts and Cassandra nodes separately.

  • Host Count: Provision host server capacity based on the best economics you can get from your IT or Purchasing for gear that meets or exceeds the Cassandra minimum server specifications. Current best practice is to get enough servers to handle the aggregate peak workload or data set size, whichever is the limiter in your use case.
  • Cassandra Node Count: You really need to size your Cassandra software nodes and determine your cluster node count based on testing before you go to production. Sizing Cassandra before you’re in production is both an art and a science. It’s hard. You can guesstimate through calculations, but sizing calculators are usually not effective. Luckily for you, Cassandra scales linearly: you can test a small cluster and then arrive at your total cluster node count by dividing your aggregate workload or data size by the capacity of each node (from your tests). Don’t forget to factor in replication and multi-DC (see the sketch just after this list).
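
To make that "divide aggregate by per-node capacity" arithmetic concrete, here's a minimal back-of-envelope sketch in Python. Every number in it is a hypothetical placeholder; the per-node figures would come from your own load tests at your production replication settings, and the logic is just the simple division described above, not a sizing calculator.

```python
# Back-of-envelope Cassandra node-count estimate.
# All numbers are illustrative placeholders -- substitute results from your
# own load tests (run with your data model, client code, and replication settings).
import math

# Measured per-node capacity from a small test cluster (hypothetical values):
per_node_peak_writes_per_sec = 30_000   # sustainable per-node throughput at SLA
per_node_usable_data_tb      = 1.5      # sustainable data density per node

# Aggregate production requirements (hypothetical values):
peak_writes_per_sec = 900_000           # cluster-wide peak workload
raw_data_tb         = 40                # data set size before replication
replication_factor  = 3                 # copies per data center
data_centers        = 2                 # multi-DC multiplies the replicated footprint

replicated_tb = raw_data_tb * replication_factor * data_centers

nodes_for_throughput = math.ceil(peak_writes_per_sec / per_node_peak_writes_per_sec)
nodes_for_storage    = math.ceil(replicated_tb / per_node_usable_data_tb)

# Whichever dimension is the limiter (throughput or data size) sets the node count.
node_count = max(nodes_for_throughput, nodes_for_storage)
print(f"Estimated Cassandra node count: {node_count}")
```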

Chances are slim to none that the boxes available to you are exactly the right size for a single Cassandra node. Most of my Cassandra customers had no choice but to use servers that were significantly over the minimum Cassandra server requirements. So they tried to over-size their Cassandra node density to get the most out of their servers. That’s a path proven to be fraught with pain.

As your workloads and data sizes change over time, the node:host mismatch is exacerbated. But that’s a topic for another day and another blog post…

Here’s how we get to the sizing of 3 Cassandra nodes per box, based on current practices with Cassandra 3.x (the arithmetic is sketched just after this list):

  • Storage – Under a write or read/write workload, each Cassandra node does its own back-end garbage collection (compaction) and memtable flushing, which causes a lot of storage IO. Under a heavy write workload, when tuned well, it will be compacting almost constantly without falling too far behind, if at all. This means the amount of data you can put on a Cassandra node is often IO bound. Think: a max of ~1.5TB for HDD and ~4-5TB for SSD. Because many variables factor into how much data you can fit on a Cassandra node while meeting your SLAs, you will want to load test this with your own data model, access patterns with realistic payload sizes, client code, driver setup, hardware stack, etc., before deciding your production data density per node.
  • Cores – A single Cassandra node under max load recruits maybe ~12 cores. That’s not a hard limit; it’s a ballpark before diminishing returns set in. Yes, I’m generalizing here and not hitting all the permutations, but it’s still valid. 8 cores is a good sweet spot for a moderately sized yet very capable node.
  • Memory – The Cassandra JVM heap can only be so big before you start to get into GC trouble. Buffer cache will take as much memory as you throw at it, which is good for read-heavy workloads, but the cache benefit might not be there for a write or mixed read/write workload. 42GB is fine for 8 cores and a reasonable data density per node.
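
Putting those rules of thumb together for the box from the story (24 cores, 128 GB memory, SSD), here's a minimal sketch of the packing arithmetic. The per-node targets (8 cores, 42GB) come straight from the bullets above; the small OS reserve is an illustrative assumption.

```python
# Rough per-box packing math using the rules of thumb above.
host_cores, host_ram_gb = 24, 128   # the box from the story

node_cores    = 8    # per-node sweet spot (see "Cores" above)
node_ram_gb   = 42   # per-node memory budget: heap + off-heap + buffer cache (see "Memory" above)
os_reserve_gb = 2    # assumed headroom for the OS and agents (illustrative)

nodes_by_cpu = host_cores // node_cores                       # 24 // 8  = 3
nodes_by_ram = (host_ram_gb - os_reserve_gb) // node_ram_gb   # 126 // 42 = 3

nodes_per_box = min(nodes_by_cpu, nodes_by_ram)
print(f"Cassandra nodes per box: {nodes_per_box}")            # -> 3
```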

So how do you get the node deployment flexibility you need? Let’s look at your options.

Use virtual machines. VMs are a solid, time-tested option. But you lose ~25% performance off the top due to hypervisor overhead, so those 20 servers instantly turn into roughly 26 to make up for the hypervisor. Plus, noisy-neighbor problems mean that each node may not always get all the resources it needs. And for systems like Cassandra, persistent local storage can be a challenge. There are serious tradeoffs with VMs.

Automate it yourself. Run multiple Cassandra processes per box. Hey, Apple does it. Others do it. You can do it, too, with enough elbow grease and scripting and pain and maintenance expense and maybe downtime if you don’t get it right. Now you’re in the database business, which is fine if that’s your thing. For most companies, it’s not their thing.
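
To give a flavor of what "automate it yourself" involves, here's a minimal sketch that stamps out per-instance cassandra.yaml settings for three Cassandra processes sharing one host. The IP aliases and directory layout are hypothetical, and in practice these snippets would be merged into each instance's full cassandra.yaml; a real setup also needs separate JVM and JMX settings, service units, seed lists, log directories, and so on, which is exactly the maintenance burden described above.

```python
# Minimal sketch: generate per-instance cassandra.yaml settings for three
# Cassandra processes on one physical host. Assumes one IP alias per instance
# (hypothetical addresses) so each node can keep the default ports.
# Real deployments also need per-instance JVM/JMX settings, service units,
# seeds, and log dirs -- this only illustrates the idea.
import os

HOST_IP_ALIASES = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # hypothetical IP aliases
BASE_DIR = "/var/lib/cassandra"                             # hypothetical layout

for i, ip in enumerate(HOST_IP_ALIASES, start=1):
    inst_dir = f"{BASE_DIR}/instance{i}"
    os.makedirs(f"{inst_dir}/conf", exist_ok=True)
    settings = f"""\
cluster_name: 'prod_cluster'
listen_address: {ip}
rpc_address: {ip}
data_file_directories:
    - {inst_dir}/data
commitlog_directory: {inst_dir}/commitlog
saved_caches_directory: {inst_dir}/saved_caches
hints_directory: {inst_dir}/hints
"""
    # Snippet to merge into this instance's own cassandra.yaml.
    with open(f"{inst_dir}/conf/cassandra-instance.yaml", "w") as f:
        f.write(settings)
    print(f"instance{i}: listen_address={ip}, data under {inst_dir}")
```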

Use containers. Containers are a great option in many ways, but they’re certainly not perfect. They give much better performance than VMs because there’s no hypervisor overhead, and resource isolation gives each node predictable, consistent performance, avoiding the noisy-neighbor problem (see the sketch after this list). But containers have two big gaps today:

  1. They don’t have a clean way to accommodate persistent locally attached storage, which, for a bunch of reasons, is critical for Cassandra and other low-latency distributed systems. There are options for external persistent storage, but before rolling them out to production you really need to look closely at their performance and at how they handle failure scenarios to deliver high availability. If they rely on shared storage, or if they don’t deliver the required performance, they will struggle or fail to meet your SLAs for scalability and high availability.
  2. Current tooling makes production deployment in containers with appropriate storage management a complex, potentially painful orchestration and automation task for databases like Cassandra. And that’s not to mention handling operational scenarios like scale-out and bootstrap, rolling upgrades, repair, drive failures, compute failures, etc.
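
For a taste of how the container option lines up with the sizing above, here's a minimal sketch that launches one resource-isolated Cassandra container per node slot on a host, using plain docker run flags for CPU and memory limits and a bind-mounted local SSD directory for data. The host paths and the use of the stock cassandra:3.11 image are illustrative assumptions, and the sketch deliberately ignores networking, seed configuration, and the orchestration and failure-handling gaps called out above.

```python
# Minimal sketch: one resource-isolated Cassandra container per node slot,
# with a bind-mounted local SSD directory for data. Host paths and image tag
# are illustrative; networking/seeds and failure handling are out of scope here.
import subprocess

NODES_PER_HOST = 3
DATA_ROOT = "/mnt/ssd"          # hypothetical local SSD mount

for i in range(1, NODES_PER_HOST + 1):
    cmd = [
        "docker", "run", "-d",
        "--name", f"cassandra-node{i}",
        "--cpus", "8",           # CPU isolation per node (see the sizing above)
        "--memory", "42g",       # memory cap per node
        "-v", f"{DATA_ROOT}/node{i}:/var/lib/cassandra",  # local persistent data dir
        "cassandra:3.11",
    ]
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)   # uncomment to actually launch
```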

Any of the three deployment options will work: containers, VMs, or multiple Cassandra processes per OS. Each has tradeoffs.

Robin Systems provides a unique software platform that nails the container-based solution for you. To make the container approach work for real-world production scenarios, you need all of these:

  • Flexible Node Orchestration
  • Storage Management
  • Application Awareness

No one else does the above combination in the way Robin does. And it works.

You can check out our NoSQL databases page to learn more, or stay tuned for an upcoming blog detailing what you need to know about each of them.

About the author

Rich Reffner, Director, Technical Field Operations, is responsible for field engineering operations and customer support at Robin Systems. Before joining Robin, Rich spent his time at DataStax working closely with Cassandra users to help them adopt, scale, and get the most from Cassandra and related technologies like Spark, Solr, and graph databases. With 20+ years of experience in enterprise software, Rich leads an exceptional Robin field technical team. He studied engineering at Ohio State and business at Berkeley.
