Compute and Storage for Data
“Apologies, Captain. I seem to have reached an odd functional impasse. I am, uh … stuck.”
Star Trek: The Next Generation, “The Last Outpost”
Data is the lifeblood of the modern data-driven enterprise. It is the fuel that drives digital innovation and is fast becoming the currency of the future. In an increasingly data-centric world, companies will succeed or fail based on their ability to collect, store, and harness massive volumes of data for strategic business imperatives. So it is no surprise that modern data applications such as Hadoop, NoSQL, Spark and ElasticSearch form the cornerstone of next-generation enterprise IT. A sizable chunk of new IT spend is going toward so-called “big data” or “data-driven” projects. Yet the success of those initiatives and their ROI is still in question. According to Wikibon, businesses realize a return of only 55 cents on every dollar invested, against an expected $3.50. There is clearly a disconnect between what businesses expect and what IT is able to deliver today.
Current IT Infrastructure
A key factor behind this disconnect is the “impedance mismatch” between the legacy infrastructure and the needs of modern data-centric applications. The current IT infrastructure dates to a time when the need of the day was for multiple applications to run on a single machine. It is compute-centric in the sense that it was designed with the primary objective of maximizing compute resource utilization. The assumption was that data would be available as needed without any real cost. Hypervisors and virtual machines did a great job of solving that problem.
In contrast, modern applications are highly distributed and consume massive amounts of data. You can no longer ignore the cost of storing and moving data. High-end storage arrays are simply unviable from a cost perspective, and commodity infrastructure lacks enterprise-grade resilience, so applications have to make multiple copies of data. This causes storage budgets to inflate beyond the reach of even the most resourceful organizations. We have been able to buy some time via hyper-converged appliances that co-locate compute and storage so that the data is available locally. To be fair to the hyper-converged approach, it is the first step in at least thinking about data and storage in the context of virtualization. But it is a Band-Aid that goes only so far: compute and storage nodes require fundamentally different performance/capacity tradeoffs. That is why a one-size-fits-all approach that tightly couples them in hyper-converged nodes cannot scale as cluster sizes and the amount of data grow.
Provisioning a Data Pipeline
These limitations lead to multiple problems. Ingesting vast amounts of data takes a long time, which leads to long deployment times. Provisioning a data pipeline takes a long time because we have to copy data for each application. The data sprawl from these multiple copies leads to increased security risks and higher storage budgets. And the increased network traffic and disk I/O lead to longer latencies and performance hits. This in turn leads to cluster sprawl, since the only way to meet your SLAs is to deploy dedicated physical clusters, and that means lower resource utilization and higher costs.
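To make the cost of per-application copies concrete, here is a rough back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration (a 100 TB dataset, three replicas per copy for resilience on commodity disks, four applications in the pipeline), and the function name is hypothetical; it is not a measurement of any real deployment.

```python
def raw_footprint_tb(dataset_tb: float, replication: int, app_copies: int) -> float:
    """Raw capacity consumed when `app_copies` applications each hold
    their own copy of the dataset, and every copy is replicated."""
    return dataset_tb * replication * app_copies


# All values below are illustrative assumptions, not figures from the article.
DATASET_TB = 100.0   # hypothetical logical dataset size
REPLICATION = 3      # assumed replicas per copy on commodity infrastructure
PIPELINE_APPS = 4    # hypothetical number of applications in the pipeline

# Each application keeps its own replicated copy of the data.
copied = raw_footprint_tb(DATASET_TB, REPLICATION, PIPELINE_APPS)
# All applications read one shared, replicated copy.
shared = raw_footprint_tb(DATASET_TB, REPLICATION, 1)

print(f"per-application copies: {copied:.0f} TB raw")
print(f"single shared copy:     {shared:.0f} TB raw")
print(f"sprawl factor:          {copied / shared:.0f}x")
```

Under these assumptions the raw footprint grows linearly with the number of applications that need their own copy, before counting the network traffic needed to make those copies.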
The only way to solve all these problems is to take a data-first approach. An approach where efficient management of data is the primary objective. An approach that places applications where the data is, instead of copying data to where the applications are. We can then get not just 1-click provisioning, but instant provisioning of new applications, better performance from faster queries, and lower TCO from compute networks that can scale out to hundreds or thousands of nodes and storage networks that can easily scale out to petabytes, exabytes and beyond.
Robin Systems was started with a singular goal: to reimagine and redefine the IT infrastructure for today’s data-centric enterprise. This, no doubt, is an extremely challenging and ambitious goal, one that can be realized only by the highest-caliber engineering team and visionary investors. We are blessed to have both. We are creating exciting technology breakthroughs that will change the shape of modern IT forever. We can’t wait to share the details with you in the near future. So stay tuned.
Also, drop us a note if our vision excites you enough and you share our passion. We are always looking for dreamers and doers!
For more information on how ROBIN Hyper-Converged Kubernetes Platform decouples compute and storage to break the data impasse – read this.