Cassandra/DataStax Lessons Learned: Taming the Dev/Test Challenge in Production Ecosystems

The problem with Cassandra/DataStax deployment

In our last blog post, we addressed some of the challenges of sizing a production Cassandra/DataStax cluster and how the work we are doing here at Robin Systems can help with that challenge. In this post, we will talk about the other side of the Cassandra/DataStax ecosystem: all the work that goes on before production.

Anyone developing and planning to deploy Cassandra/DataStax is immediately confronted by the tight coupling between the application, the data model, and the deployment topology. There are countless presentations, blog posts, and webinars directed at these challenges. I am aware of very few that address the challenge of providing the environments needed to support these efforts comprehensively.

Current solutions and the challenges of each

As a developer coming to Cassandra/DataStax, it is very easy to spin up an instance on your laptop or in Docker and immediately begin developing an application. You will quickly master the challenges of language-specific access patterns, data modeling, and perhaps even compaction strategies. Taking this approach, however, you will almost certainly encounter hidden problems when you attempt to scale to a multi-node solution. I ran across one company that had built an application on Cassandra and was running it on a single server. Unmonitored, the data had grown to more than 5 TB! The single-server setup masked all sorts of problems. Key among them was a primary key structure that did not distribute the data properly. This led to massive partitions and, ultimately, failing compaction processes.
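To see why the choice of partition key matters, here is a minimal Python sketch. It is not taken from the application described above: it assumes a hypothetical four-node cluster and a made-up table of sensor readings, and it uses MD5 purely as a stand-in for Cassandra's partitioner to spread keys across nodes. It compares partitioning by a low-cardinality key (region) with partitioning by a compound key (sensor id plus day).

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def node_for(partition_key: str) -> int:
    """Map a partition key to a node; MD5 stands in for the real partitioner."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODES


def rows():
    """Yield made-up rows: 2 regions, 200 sensors, 30 days of readings."""
    for sensor in range(200):
        region = "east" if sensor % 2 == 0 else "west"
        for day in range(30):
            yield region, f"sensor-{sensor}", f"day-{day}"


def summarize(key_fn):
    """Return (partition count, rows in largest partition, rows per node)."""
    partitions = Counter()
    per_node = Counter()
    for row in rows():
        key = key_fn(row)
        partitions[key] += 1
        per_node[node_for(key)] += 1
    return len(partitions), max(partitions.values()), dict(per_node)


# Low-cardinality key: every row lands in one of two huge partitions.
by_region = summarize(lambda r: r[0])
# Compound key: many small partitions spread across the nodes.
by_sensor_day = summarize(lambda r: f"{r[1]}:{r[2]}")

print("region key     ->", by_region)
print("sensor+day key ->", by_sensor_day)
```

On a single server both layouts appear to work, which is how a problem like this stays hidden. The skew only becomes visible once the data has to be spread over several nodes.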

So why not start development with a multi-node ecosystem? Access to infrastructure. Most organizations simply make it too difficult for a developer to gain access to, and control over, a multi-node ecosystem where they can develop and experiment as needed with a Cassandra-style distributed system.

As you move along in the life cycle of a Cassandra-based application, you will eventually reach the point where you want to test it at scale. In this phase, you will want to fully understand the relationship between processor, memory, I/O, and network. In a system that seeks to take advantage of the high performance Cassandra offers, this process can be time-consuming and challenging. During this exercise, you will need to alter the landscape frequently, including the number of nodes and the resources allocated to each node. You will also need to run these tests for long durations to determine the impact of your compaction strategy choices as the data size grows.
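One way to keep such a test campaign organized is to enumerate the configurations up front. The sketch below is illustrative only: the node counts, core counts, and memory sizes are assumed values, and `ClusterConfig` and `build_matrix` are hypothetical helpers, not part of any Cassandra or Robin tooling.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical description of one cluster layout to test."""
    nodes: int
    cores_per_node: int
    memory_gb_per_node: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def total_memory_gb(self) -> int:
        return self.nodes * self.memory_gb_per_node


def build_matrix(node_counts, core_counts, memory_sizes_gb):
    """Build every combination of the given node, core, and memory options."""
    return [
        ClusterConfig(n, c, m)
        for n, c, m in product(node_counts, core_counts, memory_sizes_gb)
    ]


# Assumed values for illustration only.
matrix = build_matrix(node_counts=[3, 5, 7], core_counts=[4, 8], memory_sizes_gb=[16, 32])

for cfg in matrix:
    print(f"{cfg.nodes} nodes x {cfg.cores_per_node} cores x "
          f"{cfg.memory_gb_per_node} GB -> {cfg.total_cores} cores, "
          f"{cfg.total_memory_gb} GB total")
```

Even this small grid yields a dozen layouts, each of which needs a long-running test, which is why the cost of rebuilding an environment matters so much.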

Most organizations cannot easily change the resources allocated to the hardware they test on. Creating, testing, destroying, and re-creating ecosystems to support this process can be very time-consuming and error-prone as you try to ensure consistency across the nodes in the deployment. Duplicating the data during these cycles can also introduce long delays.

Robin approach

So those are just a few of the challenges facing an organization that wants to support Cassandra as a primary development platform. The core challenges stem from a lack of flexibility in the available infrastructure. Robin Cloud Platform (RCP) helps organizations address these challenges as a software solution that makes it possible to easily manage a large group of compute/storage assets. These assets can then be easily allocated and controlled to support specific use cases.

Let us revisit the developer use case from above. The challenge is to provide developers with a multi-node Cassandra ecosystem that can be quickly established, discarded, and iterated upon. Robin Cloud Platform makes this challenge easy to meet. RCP can break a large set of resources into pools. A pool can be devoted to developers, who can then easily provision a Cassandra cluster that fits their needs, as depicted in the image below.

Figure: Creating a multi-node Apache Cassandra/DataStax ecosystem

Through something Robin calls a bundle, operations can set up a template that enforces how a Cassandra cluster should be deployed while still giving developers the ability to quickly provision a cluster that meets the immediate need. Starting a new cluster takes just a few minutes, which reduces the need for developers to hold onto a cluster just in case. The pool also effectively prevents developers from over-allocating resources.

Developers frequently want to take something another developer has done and modify it. Cassandra clusters deployed using Robin can be easily cloned. In essence, this makes it possible for a developer to get an exact duplicate of a cluster another developer is using and then extend or iterate upon it. Since the clone can be a “thin” clone, this can significantly reduce the resources required within the developer pool.

In the use case of deployment and scale testing, Robin brings some unique capabilities that make this much simpler. Robin makes it easy to deploy and re-deploy clusters. Once a cluster is up and running, Robin also makes it possible to scale the cluster both up and out through simple UI selections. For example, suppose you are running your scale tests on a 5-node cluster in which each node has 4 cores and 16 GB of memory. As the data size increases, you notice that the CPUs are maxed out. If you are using Robin, you can scale the servers up by simply adding more cores to the nodes, as shown in the image below.

Figure: Scaling up servers in a Cassandra/DataStax ecosystem by adding more cores to the nodes with Robin Cloud Platform

You could also add more memory or alter the allocated IOPS per attached volume. Additionally, scaling out by adding nodes is as simple as a UI command. This functionality makes it very easy to manipulate a cluster to quickly identify the best configuration to support your workload. In case you are wondering, Robin also makes it possible to specify placement rules for each node and its associated storage. In practice, this makes it possible to ensure that each Cassandra node does not suffer from a noisy neighbor problem and that it benefits from local storage.
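The difference between scaling up and scaling out comes down to simple arithmetic. The sketch below starts from the 5-node, 4-core, 16 GB example in the text; the target values (8 cores per node, 10 nodes) are assumptions chosen for illustration, and the functions are hypothetical helpers, not Robin commands.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cluster:
    nodes: int
    cores_per_node: int
    memory_gb_per_node: int

    def totals(self):
        """Return (total cores, total memory in GB) across the cluster."""
        return (self.nodes * self.cores_per_node,
                self.nodes * self.memory_gb_per_node)


def scale_up(cluster: Cluster, cores_per_node: int) -> Cluster:
    """Scale up: same node count, more cores on each node."""
    return replace(cluster, cores_per_node=cores_per_node)


def scale_out(cluster: Cluster, nodes: int) -> Cluster:
    """Scale out: more nodes, each with the same resources."""
    return replace(cluster, nodes=nodes)


baseline = Cluster(nodes=5, cores_per_node=4, memory_gb_per_node=16)
scaled_up = scale_up(baseline, cores_per_node=8)   # assumed target
scaled_out = scale_out(baseline, nodes=10)         # assumed target

print("baseline  :", baseline.totals())    # (20, 80)
print("scaled up :", scaled_up.totals())   # (40, 80)
print("scaled out:", scaled_out.totals())  # (40, 160)
```

Both changes reach 40 cores in this example, but they are not equivalent. Scaling out also doubles total memory and requires Cassandra to stream data to the new nodes, while scaling up leaves data placement untouched. Being able to try both quickly is what makes it practical to find the right shape for a workload.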

Conclusion

Cassandra/DataStax is an incredibly compelling technology for deploying high-performance applications. The tight linkage between the application, database, and hardware can make it a difficult technology to deploy and operate successfully. Robin Systems can dramatically reduce some of the infrastructure and life cycle problems encountered when using distributed applications, and Cassandra/DataStax in particular.

Coming Soon

Robin Cloud Platform (RCP) Community Edition*

RCP Community Edition (CE) is ideal for small DevOps teams looking to get started with RCP and experimenting with container-based apps.

  • Designed to run any Linux application, especially stateful applications like big data and databases
  • Spin up clusters within minutes
  • Includes QoS, Scaling, Cloning, Snapshots, Time Travel
  • “Free for Life”; auto-deployment on AWS with up to 5 nodes (any size)

*CE includes pre-packaged Cassandra, MongoDB, Elasticsearch and Big Data bundles

While you are waiting for Robin Cloud Platform (RCP) Community Edition, here’s a white paper for you.

Simplifying Data Management with DataStax & RCP - White Paper

Author Cary Bourgeois, Systems Engineer
