A large Kubernetes-based Hadoop cluster requires support from multiple teams, such as Network Admins, IT, Security Admins, and System Admins. Operational costs are also significant: data center expenses for cooling, electricity, and the like add up quickly.
Hadoop runs on ‘commodity’ hardware, but these are not cheap machines; they are server-grade hardware. Running a large Hadoop cluster, say 100 nodes, therefore costs a significant amount of money. For example, at $4,000 per Hadoop node, a 100-node cluster costs $400,000 in hardware alone.
Handling transient load spikes requires downtime, because assigning more CPU, Memory, and IO resources to each Application typically means taking it offline first.
Because different Applications require different compute-to-storage ratios, customers often build separate clusters for each Application. Dedicating an entire server to run a single App results in 40-60% underutilization.
As Hadoop is cutting-edge technology and Kubernetes is complex, experts are hard to find, and unfortunately they must rely on management and deployment tools that are not yet mature.
Without NameNode high availability configured, Hadoop has only one NameNode for the cluster, and if it goes down, the whole Hadoop cluster becomes inoperable. This prevents the use of Hadoop for mission-critical, always-on applications.

Run Hadoop in a Kubernetes-based Enterprise environment with reduced cost and simple administration


ROBIN hyper-converged Kubernetes platform is a software-only solution that runs on-premises in your private data center or in public clouds (AWS, Azure, GCP) to provide self-service deployment of any NoSQL or big data application. Robin brings 1-Click simplicity to deploying an application, as well as to Snapshot, Clone, Patch, Upgrade, Backup, Restore, Scale, and QoS-control operations on the entire application. Each of these is done with a single mouse click or REST API call, independent of the size and complexity of the application.
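As an illustration of the REST-driven lifecycle operations described above, the sketch below snapshots a running application with a single API call. The base URL, endpoint path, payload fields, and token handling are assumptions for illustration, not Robin's documented API; consult the product documentation for the actual interface.

```python
import requests

ROBIN_API = "https://robin.example.com/api/v3"   # hypothetical base URL
TOKEN = "<your-auth-token>"                      # obtained at login

def snapshot_app(app_name: str, snapshot_name: str) -> dict:
    """Trigger a point-in-time snapshot of an entire application.

    One call covers every container and storage volume in the app,
    regardless of how many nodes it spans.
    """
    resp = requests.post(
        f"{ROBIN_API}/apps/{app_name}/snapshots",   # assumed endpoint
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"name": snapshot_name},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# Example: snapshot an entire Hadoop cluster before an upgrade
snapshot_app("hadoop-prod", "pre-upgrade")
```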

Customers Choose ROBIN hyper-converged Kubernetes platform to:

  • Leverage the power of Kubernetes for Hadoop
  • Slash deployment and management times from weeks to minutes
  • Share resources among multiple applications and users while guaranteeing performance isolation
  • Dynamically scale performance and capacity to meet changing needs
  • Decouple Compute and Storage and scale them independently

Focus on your Analytics Apps, not Infrastructure

Hadoop and its ecosystem – distributions such as Hortonworks and Cloudera, and apps like Kafka, Spark, HBase, TensorFlow, and Druid – are popular tools in modern data analytics, AI, and ML projects. However, deploying these apps typically starts with weeks of careful infrastructure planning to ensure good performance, the ability to scale to meet anticipated growth, continued fault tolerance, and high availability of services. Post-deployment, the rigidity of the infrastructure poses operational challenges in adjusting resources to meet changing needs, patching, upgrades, and performance tuning of the analytics apps.

ROBIN hyper-converged Kubernetes platform takes an innovative approach in which application lifecycle workflows are natively embedded into a tightly converged storage, network, and Kubernetes stack, enabling a 1-click self-service experience for both deployment and lifecycle management of big data, database, and AI/ML applications. Enterprises using Robin gain simpler and faster roll-outs of critical IT and LoB initiatives, such as containerization, cloud migration, cost consolidation, and developer productivity.


Deploy Kubernetes-based Hadoop Clusters and their Services with 1-Click

  • Simplify cluster deployment using ROBIN interface — provision an entire operational data pipeline within minutes
  • Deploy, Scale, Patch, Upgrade, Snapshot, Restore with 1-click
  • Control QoS of the entire application stack with 1-click or a single REST API call, regardless of the size and complexity of the application
  • Specify service scale, compute and storage needs, enable service components, and set data locality, anti-affinity constraints, and placement hints
  • For example, 23 minutes to provision (a sketch of such a provisioning request follows this list):
    • A 64-node Hadoop Cluster with 1408 CPU Cores, 4.5 TB of Memory, and 1.5 PB of Storage
    • Services enabled: Atlas, Spark, Hive, Kerberos, Sentry, HDFS, NameNode HA
    • Data locality enabled for DataNodes
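A provisioning request for the 64-node example above might look like the following sketch. The endpoint, field names, and bundle identifier are hypothetical; the point is that the entire cluster topology, service selection, and placement constraints travel in a single API call.

```python
import requests

ROBIN_API = "https://robin.example.com/api/v3"  # hypothetical base URL

# Declarative spec for the whole cluster: topology, services, placement.
# Per-node resources: 64 x 22 cores = 1408 cores, 64 x 72 GB ~= 4.5 TB,
# 64 x 24 TB ~= 1.5 PB.
cluster_spec = {
    "name": "hadoop-prod",
    "bundle": "cloudera-hadoop",          # assumed application bundle name
    "nodes": 64,
    "resources": {"cpu_cores": 22, "memory_gb": 72, "storage_tb": 24},
    "services": ["Atlas", "Spark", "Hive", "Kerberos", "Sentry", "HDFS"],
    "namenode_ha": True,
    "placement": {
        "data_locality": "datanode",      # keep DataNode storage on local disks
        "anti_affinity": "per-host",      # spread replicas across hosts
    },
}

resp = requests.post(f"{ROBIN_API}/apps", json=cluster_spec, timeout=60)
resp.raise_for_status()
print(resp.json())
```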

Self-service deployment of a Cloudera cluster on the Robin platform


Data Analytics Pipelines Span Many Applications

  • One common platform to run any App across any stage of your Analytics Pipeline
  • Robin covers apps in all stages of the Analytics pipeline – ingest, store, process, serve, and visualize
  • Significantly reduce project delivery timelines – pick Apps from any stage to provision an end-to-end Analytics pipeline in minutes
  • Right-size, never over-provision resources

Build an Agile and Elastic big data Pipeline


Maximize Infrastructure Utilization through Consolidation

  • Run multiple big data Application clusters on the same infrastructure, while guaranteeing performance isolation
  • Minimize infrastructure footprint by deploying dense servers
  • Assign each application its own CPU, Memory, and IOPs quota to ensure predictable performance
  • Dynamically trade CPU, Memory, and Storage IOPs resources between application clusters with 1-Click
  • Assign IOPs quotas per application and dynamically adjust them at runtime to meet changing needs (see the sketch after this list)
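A minimal sketch of assigning a per-application IOPS quota, assuming a hypothetical QoS endpoint and payload:

```python
import requests

ROBIN_API = "https://robin.example.com/api/v3"  # hypothetical base URL

def set_qos(app_name: str, min_iops: int, max_iops: int) -> None:
    """Assign an IOPS quota to one application so that noisy neighbors
    on the same consolidated infrastructure cannot starve it."""
    resp = requests.put(
        f"{ROBIN_API}/apps/{app_name}/qos",      # assumed endpoint
        json={"min_iops": min_iops, "max_iops": max_iops},
        timeout=30,
    )
    resp.raise_for_status()

# Guarantee the production cluster a floor of 20k IOPS; cap dev at 5k.
set_qos("kafka-prod", min_iops=20_000, max_iops=100_000)
set_qos("hadoop-dev", min_iops=0, max_iops=5_000)
```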

Extend Compute but share Data between Hadoop Clusters with 1-Click

  • Avoid YARN job queuing and spin up compute clusters that share HDFS data
  • 1-Click Deploy Compute-only Hadoop Clusters – deploy when needed, tear down when done (see the sketch after this list)
  • Customize services per cluster – Impala, Spark, Hive, etc.
  • Deploy on Bare Metal, inside Virtual Machines, or in the Cloud – Robin’s containerized deployment maximizes utilization
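The sketch below shows what spinning up a transient, compute-only cluster that shares an existing HDFS namespace might look like. The endpoint, bundle name, and `shared_hdfs` field are assumptions for illustration, not Robin's documented API.

```python
import requests

ROBIN_API = "https://robin.example.com/api/v3"  # hypothetical base URL

# A compute-only cluster: no DataNodes of its own, points at shared HDFS.
compute_spec = {
    "name": "spark-adhoc",
    "bundle": "cloudera-compute-only",     # assumed bundle name
    "nodes": 8,
    "services": ["Spark", "Hive"],         # customize services per cluster
    "shared_hdfs": "hadoop-prod",          # reuse data from the base cluster
}

resp = requests.post(f"{ROBIN_API}/apps", json=compute_spec, timeout=60)
resp.raise_for_status()

# ... run the ad-hoc jobs, then tear the cluster down when done:
requests.delete(f"{ROBIN_API}/apps/spark-adhoc", timeout=60).raise_for_status()
```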

Dynamically Adjust Resources with 1-Click

  • Dynamically adjust CPU, Memory, and IO resources assigned to each Application through a 1-click operation
  • Alternatively, use Robin’s REST API to automate dynamic adjustment of resources – increase them when a load spike hits and decrease them when it subsides (see the automation sketch after this list)
  • Let Robin automate the many changes to /proc and /sys entries – beyond what the Docker engine handles – that complex data-intensive Applications require
  • Prioritize IOs from multiple Applications differently with Robin’s storage scheduler to ensure each meets its QoS settings
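A minimal sketch of such automation: poll a load metric and resize the application when it crosses a threshold. The metric source, resize endpoint, and thresholds are all assumptions for illustration.

```python
import time
import requests

ROBIN_API = "https://robin.example.com/api/v3"  # hypothetical base URL

def resize(app: str, cpu_cores: int, memory_gb: int) -> None:
    """Adjust the CPU and Memory assigned to an application's containers."""
    requests.patch(
        f"{ROBIN_API}/apps/{app}/resources",     # assumed endpoint
        json={"cpu_cores": cpu_cores, "memory_gb": memory_gb},
        timeout=60,
    ).raise_for_status()

def cpu_load(app: str) -> float:
    """Fetch current CPU utilization (0.0-1.0) from a monitoring source."""
    r = requests.get(f"{ROBIN_API}/apps/{app}/metrics/cpu", timeout=30)
    r.raise_for_status()
    return r.json()["utilization"]

# Grow during spikes, shrink when the load goes away -- no downtime.
while True:
    load = cpu_load("hadoop-prod")
    if load > 0.85:
        resize("hadoop-prod", cpu_cores=32, memory_gb=96)
    elif load < 0.30:
        resize("hadoop-prod", cpu_cores=16, memory_gb=48)
    time.sleep(300)
```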

Scale to Meet Long-Term Growth with 1-Click

  • Robin enables easy permanent cluster growth – for example, increasing the number of DataNodes of a Hadoop cluster or QueryRouters of a MongoDB cluster – through a 1-click operation or a single REST API call (sketched after this list)
  • Robin executes a complex workflow under the hood:
    • Allocates new resources – CPU cores, Memory, and Storage capacity – to accommodate the additional components
    • Generates a new placement plan to determine the best servers to spawn the containers on – taking into account any affinity/anti-affinity, isolation, and multi-tenancy policies that were assigned to the Application when it was first provisioned
    • Carves out new storage volumes including media type (SSD, HDD, NVMe), data protection (2 or 3 way replication), data locality, compression and encryption settings
    • Allocates new IP addresses from the pool of registered IP addresses
    • Brings the containers online with IP addresses and newly provisioned storage volumes
    • Virtualizes cgroup, /proc, /sys and sysinfo(2) for each container
    • Executes Application-specific hook scripts to make the running Application cluster aware of the newly added containers
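The single call that triggers this whole workflow might look like the following; the endpoint and payload are assumptions for illustration.

```python
import requests

ROBIN_API = "https://robin.example.com/api/v3"  # hypothetical base URL

# One call adds four DataNodes; the platform handles placement, storage
# carving, IP allocation, and the application-specific hook scripts
# enumerated in the workflow above.
resp = requests.post(
    f"{ROBIN_API}/apps/hadoop-prod/scale",      # assumed endpoint
    json={"role": "datanode", "add_nodes": 4},
    timeout=120,
)
resp.raise_for_status()
```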

Decouple and Scale Compute and Storage Separately

  • Deploy your big data applications in converged and/or decoupled nodes where compute and storage resources can be scaled and upgraded separately.
  • Create storage volumes of varying characteristics – media type (HDD, SSD, NVMe), capacity (sub-disk, whole-disk, or multi-disk), and data protection (replication) – and programmatically attach them to compute hosts (see the sketch after this list).
  • Choose converged nodes for low latency applications, like MongoDB, Cassandra, Kafka or Postgres.
  • Pick the decoupled compute and storage nodes option for all other use cases including for Hadoop’s datanode services.
  • Optimize provisioning of Applications that require specialized hardware using Robin’s intelligent application placement based on node capabilities (exposed via tags). For example, spin up Spark and TensorFlow on compute nodes with GPUs and use storage from other nodes in the same rack to minimize inter-rack network traffic.
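A sketch of creating such a volume and attaching it to a compute host, again with hypothetical endpoints and field names:

```python
import requests

ROBIN_API = "https://robin.example.com/api/v3"  # hypothetical base URL

# Carve a volume with explicit media, capacity, and protection settings.
vol = requests.post(
    f"{ROBIN_API}/volumes",                      # assumed endpoint
    json={
        "name": "hdfs-data-17",
        "media": "HDD",                          # HDD, SSD, or NVMe
        "size_tb": 24,
        "replication": 2,                        # storage-layer protection
    },
    timeout=60,
)
vol.raise_for_status()

# Attach it to a compute-only host so compute and storage scale separately.
requests.post(
    f"{ROBIN_API}/hosts/compute-07/attach",      # assumed endpoint
    json={"volume": "hdfs-data-17"},
    timeout=60,
).raise_for_status()
```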

ROBIN for Hortonworks - On-demand Webinar

Deploying, right-sizing, meeting seasonal peaks without disrupting availability, and supporting multiple clusters on a shared platform are often seen as the most difficult challenges in a Hadoop deployment.

In this on-demand webinar, Eric Thorsen, VP of Industry Solutions at Hortonworks with a specialty in Retail and Consumer Products, discusses the operational complexities often associated with Hadoop deployments and how they adversely impact the business. The webinar also covers ROBIN and how it can help address these challenges.

ROBIN Hyper-Converged Kubernetes Platform Business Benefits for Hadoop

Storage savings & improved performance

With enterprise-grade data protection at the storage layer, ROBIN hyper-converged Kubernetes platform obviates the need for inefficient application-level data protection, such as the 3-way replication used by distributed applications. This results in 50% or more storage savings and improves write performance.
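As illustrative arithmetic only (actual overheads depend on the protection scheme chosen): storing 1 PB of data with application-level 3-way replication consumes 3 PB of raw capacity, while a storage layer protecting the same data at an assumed effective 1.5x overhead consumes 1.5 PB, a 50% saving.

```python
# Illustrative arithmetic only; overheads depend on the chosen scheme.
data_pb = 1.0

raw_app_level = data_pb * 3.0      # HDFS-style 3-way replication
raw_storage_layer = data_pb * 1.5  # assumed storage-layer overhead

savings = 1 - raw_storage_layer / raw_app_level
print(f"Raw capacity: {raw_app_level} PB vs {raw_storage_layer} PB "
      f"-> {savings:.0%} savings")  # prints: ... -> 50% savings
```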

Months to Minutes

ROBIN hyper-converged Kubernetes platform decides the placement of an Application, provisions containers and storage for each application component, and configures the application – thus enabling single-click deployment of even the most complex applications.

Run 24×7 with QoS

ROBIN hyper-converged Kubernetes platform continuously monitors the entire application and infrastructure stack to automatically recover failed nodes and disks, fail over applications, and ensure that each application dynamically gets adequate disk IO and network bandwidth to deliver the Application-to-Spindle QoS Guarantee.

40% Higher Hardware Utilization

With storage decoupled from compute, ROBIN hyper-converged Kubernetes platform ensures data protection against compute failure. Since no persistent data is stored on the compute nodes, the compute layer can be elastically expanded or shrunk without any data movement or copying.