Go To Businessweek.com


CEO Guide to Technology September 07, 2011, 12:05 AM EDT

Getting a Handle on Big Data with Hadoop

The flood of information from social media and elsewhere is propelling companies' use of free and customizable software called Hadoop to manage it

By

Wal-Mart Stores, struggling to translate its brick-and-mortar success to the Web, is using free software named after a stuffed elephant to help it gain an edge on Amazon.com in the $165.4 billion U.S. e-commerce market.

As customers flock to social media, Wal-Mart expects sites such as Facebook and Twitter to play a bigger role in online shopping. By analyzing what social network users say about products on those sites, the world’s largest retailer aims to glean insights into what consumers want.

With its online sales less than a fifth of Amazon’s last year, Wal-Mart executives have turned to software called Hadoop that helps businesses quickly and cheaply sift through terabytes or even petabytes of Twitter posts, Facebook updates, and other so-called unstructured data. Hadoop, which is customizable and available free online, was created to analyze raw information better than traditional databases like those from Oracle.

“When the amount of data in the world increases at an exponential rate, analyzing that data and producing intelligence from it becomes very important,” says Anand Rajaraman, senior vice-president of global e-commerce at Wal-Mart and head of @WalmartLabs, the retailer’s division charged with improving its use of the Web.

Walt Disney, General Electric, Nokia, and Bank of America are also using Hadoop. The software can be applied to a variety of tasks, including marketing, advertising, and sentiment and risk analysis. IBM used the software as the engine for its Watson computer, which competed against champions of the TV game show Jeopardy!

Wal-Mart’s Big Bet

For all its girth in retail stores, Wal-Mart’s online operations—started more than a decade ago—are still dwarfed by Amazon.com. According to analysts at Wells Fargo Securities, Wal-Mart has about $6 billion in online sales, compared with Amazon.com’s $34.2 billion in 2010 revenue.

The retailer is making a big bet on Hadoop, so-called open-source software that was started by a group of Yahoo! developers. One of the challenges of Hadoop is getting it to work inside a corporation. Hadoop is made up of a half-dozen separate software pieces that must be integrated before it runs, says Merv Adrian, a research vice-president at Gartner. That requires expertise, which is in short supply, he says.

Still, Hadoop is riding the “big data” wave, where the massive quantity of unstructured information “presents a growth opportunity that will be significantly larger” than the $25 billion relational database industry dominated by Oracle, IBM, and Microsoft, according to a July report by Cowen & Co.

This year, 1.8 zettabytes (1.8 trillion gigabytes) of data will be created and replicated, according to the Digital Universe study, a June report by market research firm IDC that was sponsored by EMC, the world’s biggest maker of storage computers. One zettabyte is the equivalent of the information on 250 billion DVDs, according to Cisco Systems’ Visual Networking Index.

Data Spending Growth

The increasing popularity of Hadoop software also mirrors the growth in corporate spending on handling data. Since 2005, the annual investment by corporations to create, manage, store, and generate revenue from digital information has increased 50 percent to $4 trillion, according to the IDC report.

About 80 percent of corporations’ data is the unstructured type, which includes office productivity documents, e-mail, and Web content, in addition to social media. By contrast, Oracle sells companies its Exadata system to manage huge quantities of structured information such as financial data.

“Hadoop plays in a much larger market than Exadata and is a materially cheaper way to process vast data sets,” says Peter Goldmacher, an analyst at Cowen & Co. in San Francisco.
