Robin Systems helps Walmart speed data ingestion

May 16, 2016 | News Item

With the need to speed data ingestion and share files across applications, Walmart turned to Robin Systems to reconfigure and consolidate its Hadoop clusters.

Big data is of little use until applications can get their hands on it. Data ingestion — the process of obtaining, importing and formatting data — becomes critically important as data volumes grow and applications demand immediate availability. Wal-Mart Stores Inc., the world’s largest retailer, with more than 5,000 stores in the United States, turned to Robin Systems for help.

“As big data gets bigger and applications require instant access for real-time streaming analytics, you have to take that data in faster, store it and prepare it for use. That’s data ingestion,” said Judith Hurwitz, CEO of Hurwitz & Associates LLC, a cloud consultancy in Needham, Mass. “For two enterprises that are otherwise equal, the one that can ingest data faster is going to have a distinct advantage.”
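To make that definition concrete, here is a minimal Python sketch of the three ingestion steps: obtaining raw records, formatting them and storing them for querying. The file paths and field names are hypothetical illustrations, not details from Walmart's pipeline.

    import csv
    import json

    def ingest(raw_path: str, store_path: str) -> int:
        """Obtain raw records, format them, and store them for querying.

        A minimal illustration of obtain -> format -> store; the paths
        and field names are hypothetical.
        """
        count = 0
        with open(raw_path, newline="") as raw, open(store_path, "w") as store:
            for row in csv.DictReader(raw):
                # Format: normalize types so downstream queries are consistent.
                record = {
                    "store_id": int(row["store_id"]),
                    "sku": row["sku"].strip(),
                    "price": float(row["price"]),
                }
                # Store: append as JSON lines, ready for immediate use.
                store.write(json.dumps(record) + "\n")
                count += 1
        return count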

That was exactly the challenge facing Walmart. Using a container-based platform for compute and data virtualization, the company sped the ingestion of 250 million files by a factor of 8.5 and improved query performance by 250%. It achieved those gains while cutting its infrastructure from 16 servers with 320 cores to 10 servers with 160 cores.

Despite its massive presence and technology expertise, Walmart — like other retail businesses — is struggling in a world turned upside down by the advent of digital-only retailers, typified by Amazon, according to Sushil Kumar, chief marketing officer at Robin Systems, based in San Jose, Calif.

One challenge facing Walmart was the Savings Catcher component of its mobile app. Shoppers use the app to scan the bar code on their point-of-sale receipts, and Walmart uses the information to compare its prices with those of nearby retailers. If a lower price is found, Walmart refunds the difference to a virtual gift card that shoppers can spend on a subsequent store visit. The ingestion process needs to occur within milliseconds of the bar-code scan, Kumar said.
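As a rough illustration of that comparison step, the sketch below assumes a receipt arrives as a list of (sku, price_paid) pairs and that the lowest advertised price among nearby retailers is already known per SKU; none of these names or structures come from Walmart's actual system.

    from decimal import Decimal

    def savings_catcher(receipt_items, competitor_prices):
        """Compare receipt prices against nearby competitors' lowest
        advertised prices and return the total refund owed.

        Hypothetical shapes: receipt_items is a list of (sku, price_paid)
        pairs; competitor_prices maps sku -> lowest nearby price.
        """
        refund = Decimal("0")
        for sku, price_paid in receipt_items:
            lowest = competitor_prices.get(sku)
            if lowest is not None and lowest < price_paid:
                # Refund the difference to the shopper's virtual gift card.
                refund += price_paid - lowest
        return refund

    # Example: the shopper paid $3.50 for an item advertised at $2.99 nearby.
    items = [("0001234", Decimal("3.50")), ("0005678", Decimal("1.25"))]
    nearby = {"0001234": Decimal("2.99")}
    print(savings_catcher(items, nearby))  # 0.51, credited to a gift card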

Walmart manages about 30 PB of data in Hadoop, with dedicated server clusters for each application, according to Kumar. “For five different Hadoop environments, there would be five different clusters,” he said. This approach slowed application development and testing while individual clusters were configured, and it led to duplicated data, data ingestion bottlenecks and servers that were typically vastly underutilized.
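A back-of-the-envelope sketch shows why dedicated clusters tend to waste capacity. The core counts below echo the article, but the five application names and both utilization figures are assumptions made purely for illustration.

    # Contrast dedicated clusters per application with a shared pool.
    apps = ["ingest", "pricing", "inventory", "analytics", "reporting"]

    # Dedicated model: one cluster per app (16 servers, 320 cores total),
    # each sized for its own peak, so cores often sit idle.
    dedicated_cores = 320
    assumed_dedicated_utilization = 0.30  # hypothetical

    # Consolidated model: apps run as containers on one shared pool
    # (10 servers, 160 cores), so idle capacity is shared across apps.
    shared_cores = 160
    assumed_shared_utilization = 0.70  # hypothetical

    busy_dedicated = dedicated_cores * assumed_dedicated_utilization
    busy_shared = shared_cores * assumed_shared_utilization
    print(f"{len(apps)} apps, dedicated: {busy_dedicated:.0f} busy cores of {dedicated_cores}")
    print(f"{len(apps)} apps, shared:    {busy_shared:.0f} busy cores of {shared_cores}")

Under those assumed utilization rates, the smaller shared pool does more useful work than the larger dedicated one, which is the economic argument for consolidating clusters behind a container platform.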
