(An earlier version of this story ran online.)
Over the past few decades, the National Security Agency and the Central Intelligence Agency have spent big money to get the latest in spying technology. The supercomputer maker Cray (CRAY) builds multimillion-dollar machines tuned for “pattern matching” and other analytical functions and sells the systems to government agencies. Less recognized is that, in this era of open-source software, the NSA gets direct access to the inventions of thousands of the smartest computer-science minds on the planet for free.
The popularity of open-source software among the latest generation of big-time Web players means private companies disclose to the public much of the core technology behind their data management, search, and social networking services. It started with the founding of Google (GOOG) in 1998. The search engine giant needed to collect and analyze so much data that it couldn’t afford to buy systems from big-name tech companies. Instead, Google built its own software to run across hundreds of thousands of computers and published papers describing the designs, which open-source projects then reimplemented. Yahoo! (YHOO), Facebook (FB), and Twitter have been even more aggressive about open-sourcing their underlying infrastructure.
After keeping a close eye on these developments, the NSA said in 2009 that it was building a system based on Hadoop, a software program that Google and Yahoo had popularized for processing vast amounts of data. The agency also set up its own open-source project for data mining called Accumulo. Among the citizen coders who’ve contributed to the NSA effort are employees of Silicon Valley startups (Hortonworks), cybersecurity firms (Endgame), and federal contractors (you guessed it: Booz Allen Hamilton (BAH)). A leaked NSA PowerPoint presentation shows that the agency considers Hadoop and MapReduce, the programming model at Hadoop’s core for handling big data sets, crucial to its surveillance efforts.
The NSA is fortunate that so many engineers spend their time creating exactly the kind of technology the government needs: programs that can collect huge volumes of information and analyze it for patterns. The social graph technology that Facebook has popularized is a spook’s fantasy, showing how people relate to each other and even finding nonobvious relationships between people. “Open-source tools are becoming the infrastructure that every company is putting themselves on,” says Bob Gourley, chief technology officer at advisory firm Crucial Point and a former member of a U.S. Department of Defense cyberdefense group. “Why would large government enterprises be any different?”
Washington still relies on specialized, pricey systems; the CIA’s venture capital arm, In-Q-Tel, funds the companies that build them out in the open. But when you hear about the NSA sucking up petabytes of information every hour, you can be sure that programs developed by your favorite consumer Web companies are helping to power the effort.