1. Performance, Performance, Performance!
Hypervisors can slow down your database queries by as much as 2x.
There has been a lot of debate about the performance overhead of hypervisors. Virtualization vendors argue that they have optimized hypervisors to the point that there is no real performance overhead. This is nothing but smoke and mirrors.
The fact remains that hypervisors add an extra layer of processing that imposes significant overhead on short queries and writes. In technical jargon, these IO operations are called “random IOs,” and they form the bulk of the IO workload for OLTP databases, which often do index-based lookups. In this study, IBM researchers found that hypervisors reduce random IO throughput by almost 50%.
Source: IBM Research Report RC25482 (AUS1407-001) July 21, 2014
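The random-vs-sequential distinction is easy to see for yourself. Here is a minimal, self-contained Python sketch that reads the same blocks of a scratch file in sequential and then in shuffled order. It measures whatever machine and file cache it runs on, not hypervisor overhead specifically, but it illustrates why the two access patterns behave so differently:

```python
import os
import random
import tempfile
import time

BLOCK = 4096     # typical database page size
BLOCKS = 2048    # 8 MiB scratch file, so the demo runs quickly

# Create a scratch file to read from.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(os.urandom(BLOCK * BLOCKS))

def read_pattern(offsets):
    """Read one 4 KiB block at each offset and return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start

sequential = [i * BLOCK for i in range(BLOCKS)]
shuffled = sequential[:]
random.shuffle(shuffled)  # same blocks, random order

t_seq = read_pattern(sequential)
t_rand = read_pattern(shuffled)
print(f"sequential: {t_seq:.4f}s  random: {t_rand:.4f}s")
os.remove(path)
```

On a real spinning disk (with caches cold), the gap between the two timings is dramatic; on a cached file or an SSD it shrinks, which is exactly why storage configuration matters so much in any benchmark comparison.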
So how do you reconcile this with the claims made by virtualization vendors? For one, not all IOs are alike. Most non-database applications, such as virtual desktops or print servers, read and write data in bulk, resulting in lots of sequential IOs, where the hypervisor overhead gets amortized over many IOs. But databases are different!
In addition, users are often forced to buy expensive flash-based storage just to offset the hypervisor overhead.
So, the next time someone tells you that VMs don’t have any performance overhead, insist on a like-to-like comparison. That is, run your database application on the same compute and storage servers with and without hypervisors, and make sure the workload includes OLTP operations such as index-lookup queries and single-row inserts. The result will not be very different from what the above chart shows.
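A like-to-like OLTP comparison can be as simple as the following sketch, which uses Python’s built-in sqlite3 module to time single-row inserts and index-based lookups. The table and row counts are arbitrary illustrations; for a real comparison you would point an equivalent script at your actual database, on the same hardware, with and without a hypervisor:

```python
import random
import sqlite3
import time

# In-memory DB keeps the demo self-contained; use a file-backed (or real)
# database when actually comparing bare metal against a VM.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, total REAL)"
)
conn.execute("CREATE INDEX idx_customer ON orders(customer)")

# Single-row inserts: the short-write pattern that hypervisors penalize.
start = time.perf_counter()
for i in range(10_000):
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?)",
        (i, random.randrange(1_000), random.random() * 100),
    )
conn.commit()
insert_s = time.perf_counter() - start

# Index-based lookups: the short-read (random IO) pattern typical of OLTP.
start = time.perf_counter()
for _ in range(10_000):
    cust = random.randrange(1_000)
    conn.execute("SELECT total FROM orders WHERE customer = ?", (cust,)).fetchall()
lookup_s = time.perf_counter() - start

print(f"inserts: {insert_s:.3f}s  lookups: {lookup_s:.3f}s")
```

Run the same script in both environments and compare the two timings; that is the like-to-like test the vendors’ bulk-IO benchmarks avoid.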
2. Noise from the Neighbors
VMs can cause unpredictable database performance.
Operating systems (OSs) are designed with the assumption that each machine has exclusive access to its disks. Hypervisors break that premise by encapsulating OSs inside VMs and packing many of them onto a single physical server. And while each guest OS continues to optimize its IO pattern assuming exclusive disk access, the hypervisor mixes the IOs from all the VMs and chops them up in random fashion, completely negating any workload- or machine-specific optimizations.
This effect is called “VM IO blending,” and it results in poor, unpredictable IO performance. The more VMs involved, the worse the effect gets. This Virtualization Review article beautifully captures the effect of IO blending by showing how overall throughput drops as the number of VMs on a physical server increases.
Source: Virtualization Review, “Virtualization IOs: Blended but Not on the Rocks”
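The blending effect can be illustrated with a toy simulation. Each “VM” below issues perfectly sequential block offsets; interleaving the streams round-robin (a rough stand-in for how a hypervisor multiplexes guest IO) destroys the sequentiality the moment a second VM shows up. The numbers are purely illustrative:

```python
import itertools

BLOCK = 4096

def vm_stream(base, nblocks):
    """One guest VM issuing perfectly sequential block offsets."""
    return [base + i * BLOCK for i in range(nblocks)]

def blend(streams):
    """Round-robin interleave: a crude model of hypervisor IO multiplexing."""
    out = []
    for group in itertools.zip_longest(*streams):
        out.extend(o for o in group if o is not None)
    return out

def sequential_fraction(offsets):
    """Fraction of requests landing immediately after the previous one."""
    hits = sum(1 for a, b in zip(offsets, offsets[1:]) if b == a + BLOCK)
    return hits / (len(offsets) - 1)

for nvms in (1, 2, 8):
    streams = [vm_stream(vm * 10_000_000, 256) for vm in range(nvms)]
    frac = sequential_fraction(blend(streams))
    print(f"{nvms} VM(s) -> {frac:.0%} sequential at the disk")
```

With one VM the disk sees a 100% sequential stream; with two or more, the interleaved stream is effectively random, which is exactly the workload profile the IBM study shows hypervisors handle worst.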
This effect is also exacerbated by the nature and volatility of the workloads running in the VMs. Databases perform steady IOs, but if a neighboring machine happens to be an infrequently used test sandbox, a sudden spike in testing activity can result in an unforeseen database slowdown and unpredictable query performance. And if that database happens to be powering your website or your call center applications, your business can ill afford such unpredictability.
3. Thought VMs Reduce Complexity? Think Again!
VMs can further aggravate database management, patching and upgrade woes.
While spinning up a VM from a database VM template may be easy, your problems may have just begun. An uncontrolled sprawl of database software installations can not only result in licensing compliance issues; it also imposes an unnecessary burden on your DBA staff to keep the database software patched and upgraded. Given how time- and effort-intensive these operations are, this can create a significant management burden on your database operations team. What you need is a way to consolidate databases without causing database software sprawl!
Your DBAs spend an enormous amount of time fine-tuning database and query performance. Often this is their #1 priority. Hypervisors only add to that workload by introducing yet another cause of performance degradation and unpredictability.
4. Sprawling Clusters
VMs can’t tame cluster sprawl.
Remember the days when each new application required its own dedicated server and storage? Each of these dedicated servers ran one – and only one – application, leading to underutilized servers and skyrocketing costs. Hypervisors were designed to help address this problem – so that you can continue creating “dedicated servers” (VMs) per application, but share physical hardware across multiple such “virtual servers.”
As long as applications were monolithic, server-level consolidation was a close enough proxy for application consolidation. However, with distributed or scale-out architecture becoming the norm for modern applications, the problem of underutilization returns to haunt us in the form of cluster sprawl. New age applications (such as microservices-based applications) and databases (Hadoop, NoSQL) are increasingly looking like traditional HPC workloads that rely on compute and storage clusters to handle large volumes of data using parallel processing. And each time you need to deploy a new scale-out application, a new cluster appears to run it. The result: a growing number of underutilized clusters that are costing businesses a lot more than they should.
VMs offer very little to address this. First, given how IO intensive these applications are, they are mostly deployed on bare-metal servers to avoid the performance penalty (IaaS clouds such as AWS are the only places where you will see these clusters running on top of VMs, for lack of a bare-metal option). Second, creating dedicated VMs for each cluster is not only operationally challenging, it does not even address the problem, since you will still be stuck with underutilized VMs and no ability to dynamically shift resources across clusters.
5. Data Gravity
Database snapshot and cloning ≠ VM snapshot and cloning.
Your developers and QA staff wait for weeks to get a copy of the production data for testing, and you may be wondering why the promised agility of VM-level snapshot/cloning isn’t helping. Well, for the simple reason that VM-level snapshot and cloning work only on the contents of virtual machine disk (VMDK) files, while the database data is stored outside these files on backend storage systems for performance and availability reasons.
You could fall back on storage-level thin snapshots and clones (if your SAN supports them), but your DBAs will then have to install the database software manually, create a new database instance or cluster on top of the cloned storage volumes, and reconfigure the cloned data. This whole process can be extremely complex, error-prone, and time-consuming. And it won’t even work for distributed applications such as Hadoop or NoSQL, since creating a consistent application snapshot requires coordination across the storage volumes connected to the various cluster nodes (distributed snapshots), which can’t be done without application knowledge.
Given that test environment setup can account for up to 90% of the overall QA cycle time, faster database- or application-level snapshot and cloning can dramatically boost your developer/QA productivity and accelerate time to market.
VMs will give some cost savings but you can do better.
Consolidating databases using VMs will buy you some savings by increasing your hardware utilization and reducing database software licensing costs – but only as long as you are prepared to spend a LOT more on storage. You will probably need to invest in an expensive storage array (possibly an all-flash array) to overcome the hypervisor IO overhead and IO blending effects. And while databases will always benefit from faster storage, be aware that you are investing more than you should just to compensate for VMs.
Second, your consolidation density will be constrained by the need to duplicate the OS and database software in every VM. Since each VM carries a fixed operating overhead, eliminating or even reducing this overhead lets you pack more databases per server, minimizing your database licensing cost even further.
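The density argument is simple arithmetic. The sketch below uses entirely hypothetical numbers (server RAM, per-database working set, and per-instance overhead are illustrative assumptions, not measurements) to show how a fixed per-VM overhead caps how many databases fit on one server:

```python
# Hypothetical numbers, purely for illustration: how fixed per-instance
# overhead caps consolidation density on a single server.
SERVER_RAM_GB = 256
DB_WORKING_SET_GB = 6  # assumed memory each database actually needs

def density(per_instance_overhead_gb):
    """Databases that fit when each carries this much fixed overhead."""
    return int(SERVER_RAM_GB // (DB_WORKING_SET_GB + per_instance_overhead_gb))

vm_dbs = density(4)          # e.g. ~4 GB per guest OS + duplicated software
container_dbs = density(0.5)  # e.g. ~0.5 GB extra footprint per container

print(f"VMs: {vm_dbs} databases/server, containers: {container_dbs}")
```

With these assumed figures the server hosts 25 databases under VMs versus 39 under a lighter-weight scheme – and since database licenses are often priced per server or per core, the denser option compounds the savings.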
Yes, you can virtualize databases without losing performance. But you must be willing to spend less!
So, is there a better way to virtualize databases? Certainly. Containers provide a native, lightweight OS-based virtualization alternative that creates isolated OS partitions to consolidate databases without the hypervisor overhead. Unlike hypervisors, containers impose no IO performance overhead and run applications at bare-metal speed. Per the IBM research report referenced above, containers delivered almost 2x better random IO throughput compared to the KVM hypervisor.
And because containers share the OS kernel, they eliminate OS and database software duplication, thereby delivering significantly higher consolidation density. In a report titled “Containers: economically they appear to be a better option than hardware [server] virtualization,” 451 Research analysts Jay Lyman and Owen Rogers reach the same conclusion.
With bare-metal performance, better consolidation density, and lightweight virtualization, containers certainly provide a more natural platform for database virtualization than VMs. However, just as you need a lot more than the hypervisor for server virtualization, you need a holistic solution that combines containers with intelligent storage technology that can ensure predictable IO performance at the database level and handle agile data lifecycle management for distributed applications.