Scene: Look what I got myself into.
I just provisioned a Hadoop cluster. And here is how the journey to multi-tenancy started…
I overheard a conversation among geeks and techies (just another name for geeks):
Bob: Hey, we just got started on Hadoop last week.
Bill: We are brainstorming Hadoop strategies. What is multi-tenancy?
Bob: We had Cloudera and Hortonworks present to us. I think we will go with Hortonworks and start with a few POCs.
Me: Hey, this sounds very interesting. Maybe we can use some of this stuff too. BTW, what the heck is Hadoop?
Bob, Bill, Jill and a few assorted others: Words like Big Data, petabytes and analytics get bandied about. I think I get the gist.
Armed with all this half-baked information, I decide to do the same thing: I have calls with Cloudera and Hortonworks, identify a few POCs and hit the road to Big Data.
A couple of months later, two of my POCs are online, 250 TB of data has been ingested, and a couple of workloads are up and running. We are now at the "what next?" stage.
Management decision: onboard a couple more workloads. That means capacity planning, adding 10 servers to the existing 20-node cluster, expanding the cluster, yadda, yadda, yadda.
The hardware procurement cycle is 6 weeks, plus another 2 weeks to ingest data and onboard workloads, so we are looking at a 2-3 month deployment cycle for this additional capacity alone.
Now imagine doing the same project with Robin.
We procure the same 20 bare-metal servers, ingest the data, onboard the 2 applications and are up and running in about the same time. When it's time to expand the POC, there is no need to add new servers. We create the required number of clusters on the same shared physical hardware, ingest the data, onboard the new workloads, and are up and running in about 2 weeks.
Time saved = 6 weeks; Resources saved = 10 new servers.
Worth reading: Global retailer case study
This multi-tenancy edge is just one of Robin's key value propositions. Others include:
a. Virtualized Storage
b. Data Sharing
c. Agility with One-Click Provisioning
d. Efficiency of Containers
e. Performance with SSD caching
f. Elasticity with Separation of compute and storage
g. Data lake – Aggregation of HDFS data across heterogeneous storage such as HDFS, Ceph, etc.
h. High Availability – Monitoring, alerting, failover
i. Quality of Service – Production, QA, Sandbox