Innovation and Engineering in America
September 08, 2011, 5:45 PM EDT

Data Analytics: Crunching the Future

(page 2 of 4)

Now a second wave of startups is finding ways to use cheap but powerful servers to analyze new categories of data such as blog posts, videos, photos, tweets, DNA sequences, and medical images. “The old days were about asking, ‘What is the biggest, smallest, and average?’ ” says Michael Olson, CEO of startup Cloudera. “Today it’s, ‘What do you like? Who do you know?’ It’s answering these complex questions.”

 

The big bang in data analytics occurred in 2006 with the release of an open-source system called Hadoop. The technology was created by a software consultant named Doug Cutting, who had been examining a series of technical papers released by Google. The papers described how the company spread tremendous amounts of information across its data centers and probed that pool of data for answers to queries. Where traditional data warehouses crammed as much information as possible on a few expensive computers, Google chopped up databases into bite-size chunks and sprinkled them among tens of thousands of cheap computers. The result is a lower-cost, higher-capacity system that many people can use at the same time. Google uses the technology throughout its operations. Its systems study billions of search results, match them to the first letters of a query, take a guess at what people are looking for, and display suggestions as they type. You can see the bite-size nature of the technology in action on Google Maps as tiny tiles come together to form a full map.
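For readers who want to see the chop-and-sprinkle idea in miniature, here is a rough sketch in Python. It is not Google's or Hadoop's code. It is a toy that assumes the "database" is a short list of text lines, that worker threads on one machine stand in for the many cheap computers, and that the question being asked is a simple word count. All function names are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Cut one large list of records into bite-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """Work done independently on one chunk (the 'map' step)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """Combine the per-chunk answers into one result (the 'reduce' step)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def analyze(records, chunk_size=2, workers=4):
    """Split the data, process the chunks in parallel, then merge the answers."""
    chunks = split_into_chunks(records, chunk_size)
    # Assumption: threads stand in for separate cheap machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge_counts(partials)


if __name__ == "__main__":
    lines = ["cheap servers", "cheap storage", "big data"]
    print(analyze(lines).most_common(1))
```

Because each chunk is handled without reference to the others, adding more workers (or, in a real system, more machines) lets the same job cover more data. A real cluster also has to move data between machines and cope with machines that fail, which this sketch ignores.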

Cutting created Hadoop to mimic Google’s technology so the rest of the world could have a way to sift through massive data sets quickly and cheaply. (Hadoop was the name of his son’s toy elephant.) The software first took off at Web companies such as Yahoo! and Facebook and then spread far and wide, with Walt Disney, the New York Times, Samsung, and hundreds of others starting their own projects. Cloudera, where Cutting, 48, now works, makes its own version of Hadoop and has sales partnerships with Hewlett-Packard and Dell.

Dozens of startups are trying to develop easier-to-use versions of Hadoop. For example, Datameer, in San Mateo, Calif., has built an Excel-like dashboard that allows regular business people, instead of data priests, to pose questions. “For 20 years you had limited amounts of computing and storage power and could only ask certain things,” says Datameer CEO Stefan Groschupf. “Now you just dump everything in there and ask whatever you want.” Top venture capital firms Kleiner Perkins Caufield & Byers and Redpoint Ventures have backed Datameer, while Accel Partners, Greylock Partners, and In-Q-Tel, the investment arm of the CIA, have helped finance Cloudera.

Past technology worked with data that fell neatly into rows and columns—purchase dates, prices, the location of a store. Amazon.com, for instance, would use traditional systems to track how many people bought a certain type of camera and for what price. Hadoop can handle data that don’t fit into spreadsheets. That ability, combined with Hadoop’s speedy divide-and-conquer approach to data, lets users get answers to questions they couldn’t even ask before. Retailers can dig into not just what people bought but why they bought it. Amazon can (and does) analyze its website logs to see what other items people look at before they buy that camera, how long they look at them, whether certain colors on a Web page generate more sales—and synthesize all that into real-time intelligence. Are they telling their friends about that camera? Is some new model poised to be the next big hit? “These insights don’t come super easily, but the information is there, and we do have the machine power now to process it and search for it,” says James Markarian, chief technology officer at data specialist Informatica.
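To make the camera example concrete, the sketch below shows one way a question like "what else did buyers look at first?" could be answered from a click log. It is an illustration only, not Amazon's method. The log format (a time-ordered list of user, action, item entries) and the function name are assumptions invented for this example.

```python
from collections import Counter


def viewed_before_purchase(events, target_item):
    """Count which other items buyers of target_item viewed before buying it.

    events: time-ordered (user, action, item) tuples; a hypothetical log format.
    """
    seen = {}          # user -> items that user has viewed so far
    counts = Counter()
    for user, action, item in events:
        if action == "view":
            seen.setdefault(user, []).append(item)
        elif action == "buy" and item == target_item:
            # Count each other item once for this purchase.
            for other in set(seen.get(user, [])) - {target_item}:
                counts[other] += 1
    return counts


if __name__ == "__main__":
    log = [
        ("a", "view", "tripod"),
        ("a", "view", "camera"),
        ("a", "buy", "camera"),
        ("b", "view", "lens"),
        ("b", "view", "tripod"),
        ("b", "buy", "camera"),
        ("c", "view", "lens"),
    ]
    print(viewed_before_purchase(log, "camera"))
```

A log like this has no fixed set of columns worth storing in a spreadsheet, yet a single pass over it answers a question about behavior rather than just sales totals.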
