Container-based virtualization and microservice architecture have taken the world by storm. Applications built as microservices consist of a set of narrowly focused, independently deployable services, each of which is expected to fail. The advantages are increased agility and resilience. Agility, because individual services can be updated and redeployed in isolation. Resilience, because the distributed nature of microservices means they can be deployed across different platforms and infrastructures, and developers are forced to design for failure from the ground up instead of as an afterthought. These are the defining principles of large web-scale, distributed applications, and web companies like Netflix, Twitter, Amazon, and Google have benefited significantly from this paradigm.
Add containers to the mix
Containers are fast to deploy, bundle all the dependencies an application requires (breaking out of dependency hell), and are portable, which means you can truly write your application once and deploy it anywhere. Together, microservice architecture and containers make applications faster to build, easier to maintain, and of higher overall quality.
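To make the portability claim concrete, here is a minimal sketch using the Docker SDK for Python (pip install docker); the image name and port mapping are hypothetical placeholders, not something from this article:

```python
# Minimal sketch: running a containerized service with the Docker SDK for Python.
# "my-service:1.0" is a hypothetical image; all of its dependencies are baked in
# at build time, so the same image runs unchanged on a laptop or in the cloud.
import docker

client = docker.from_env()

container = client.containers.run(
    "my-service:1.0",          # hypothetical application image
    detach=True,               # run in the background
    ports={"8080/tcp": 8080},  # publish the service port on the host
)
print(container.short_id, container.status)
```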
Image borrowed from Martin Fowler’s excellent blog: http://martinfowler.com/articles/microservices.html
A major change forced by microservice architecture is the decentralization of data. Unlike monolithic applications, which prefer a single logical database for persistent data, microservices prefer letting each service manage its own database: either separate instances of the same database technology, or entirely different database systems.
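As a minimal sketch of this database-per-service idea, consider two hypothetical services, each owning its own store; sqlite3 stands in here for whatever database technology each service picks:

```python
# Each service owns its schema and data; neither reaches into the other's database.
import sqlite3

orders_db = sqlite3.connect("orders_service.db")   # hypothetical orders service store
orders_db.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, item TEXT)")
orders_db.commit()

users_db = sqlite3.connect("users_service.db")     # hypothetical users service store
users_db.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
users_db.commit()

# Cross-service data access goes through each service's API, never its database.
```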
Unfortunately, databases are complex beasts: they depend heavily on storage, need customized solutions for HA, DR, and scaling, and, if not tuned correctly, will directly impact application performance. Consequently, the container ecosystem has largely ignored the heart of most applications, storage, and thus limits the benefits of container-based microservices, since stateful & data-heavy services such as databases cannot easily be containerized.
The majority of container ecosystem vendors have focused mostly on stateless applications. Why? Stateless applications are easy to deploy and manage. For example, they can respond to events by adding or removing instances of a service without needing to significantly change or reconfigure the application. For stateful applications, most container ecosystem vendors have focused on orchestration, which only solves the problems of deployment and scale, while existing storage vendors have tried to retrofit their current solutions to containers via volume plug-ins to orchestration solutions. Unfortunately, this is not sufficient.
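To see why stateless services scale so easily, here is a hedged sketch of event-driven scaling using the Docker SDK for Python; the image name, label, and replica count are assumptions for illustration:

```python
# Sketch: converge on a desired number of identical stateless replicas.
import docker

client = docker.from_env()

def scale_stateless(image: str, desired: int, label: str = "svc=web") -> None:
    """Start or stop containers until `desired` replicas carry the given label."""
    key, value = label.split("=")
    running = client.containers.list(filters={"label": label})
    for _ in range(max(0, desired - len(running))):   # scale up
        client.containers.run(image, detach=True, labels={key: value})
    for extra in running[desired:]:                   # scale down
        extra.stop()

# Because no replica holds state, any instance can serve any request;
# scaling is simply changing the replica count.
scale_stateless("my-stateless-service:1.0", desired=5)
```

A database cannot be scaled this way: each instance owns data that has to be placed, replicated, and rebalanced, which is exactly the gap orchestration alone does not close.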
Robin Application Virtualization Platform (AVP)
Robin is a container-based, application-centric server and storage virtualization platform that turns commodity hardware into a high-performance, elastic, and agile application/database consolidation platform. In particular, Robin is built for data applications such as databases and big data clusters: it provides all the benefits of hypervisor-based virtualization but with bare-metal performance (up to 40 percent better than VMs) and application-level IO resource management capabilities, such as minimum IOPS guarantees and maximum IOPS caps. Robin also dramatically simplifies data lifecycle management with features such as one-click database snapshots, clones, and time travel.
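Robin's actual interface is not shown in this article, so the following is a purely hypothetical Python sketch of what an application-level IO policy with a minimum guarantee and a maximum cap could look like; every name and number here is an assumption:

```python
# Purely hypothetical illustration -- NOT Robin's actual API.
io_policy = {
    "app": "cassandra-prod",   # hypothetical application name
    "min_iops": 20_000,        # guaranteed floor, even with noisy neighbors
    "max_iops": 50_000,        # cap so one application cannot starve the rest
}

def admissible(policies, capacity_iops):
    """Guaranteed floors are only honorable if their sum fits platform capacity."""
    return sum(p["min_iops"] for p in policies) <= capacity_iops

print(admissible([io_policy], capacity_iops=100_000))  # True
```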
To dive deeper into this, let’s take the example of Cassandra, a modern NoSQL database, and look at the scope of management challenges that need to be addressed.
Cassandra Management Challenges
While poor schema design and query performance remain the most prevalent problems, they are application- and use-case-specific and require an experienced database administrator to resolve. In fact, I would say most Cassandra admins, or any DBA for that matter, enjoy this task and pride themselves on being good at it.
The management tasks which database admins would rather avoid and have automated are:
- Low utilization and lack of consolidation
- Complex cluster lifecycle management
- Manual & cumbersome data management
- Costly scaling
Let’s look at these one by one.
1 – Low utilization and lack of consolidation
Cassandra clusters are typically created per use case or SLA (read-intensive, write-intensive). In fact, the common practice is to give each team its own cluster. This would be an acceptable practice if clusters weren't deployed on dedicated physical servers, but to avoid performance and noisy-neighbor issues, most enterprises stay away from virtual machines. This, unfortunately, means that the underlying hardware has to be sized for peak workloads, leaving large amounts of spare capacity and idle hardware due to varying load profiles.
All this leads to poor utilization of infrastructure and very low consolidation ratios, a big issue for enterprises both on-premises and in the cloud.
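A back-of-the-envelope calculation shows how quickly sizing for peak wastes capacity; the load and per-server numbers below are illustrative assumptions, not measurements from any real cluster:

```python
# Sizing for peak load vs. what the cluster actually sustains on average.
peak_load = 100_000   # assumed peak ops/sec the cluster must absorb
avg_load = 25_000     # assumed typical sustained ops/sec
per_server = 10_000   # assumed ops/sec one dedicated server delivers

servers = -(-peak_load // per_server)               # ceiling division -> 10 servers
avg_utilization = avg_load / (servers * per_server)

print(f"servers: {servers}, average utilization: {avg_utilization:.0%}")
# -> servers: 10, average utilization: 25% (75% of the hardware sits idle)
```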
Underutilized servers == Wasted money.