Google's Flu Snafu and the Reliability of Web Data
The Web is full of data—much of it meaningful—but there’s some question as to how much we should actually rely on it. The latest evidence comes at Google’s (GOOG) expense, with some researchers questioning the validity of Google’s Flu Trends algorithm. They say the service, which estimates the number of flu cases around the world by analyzing trends on Google’s search engine, vastly overestimated this year’s season in the U.S. compared with more traditional methods of measuring flu cases.
But this snafu is just a microcosm of a broader debate over how much stock we should put in Web and social media data, and in what cases it’s most valid. It’s hard to figure out how much we should value speed and scale over quality of data. Millions of (presumably) younger people proactively searching or tweeting about a topic provide a huge and theoretically unbiased data set, while traditional methods such as phone calls or focus groups reach a smaller number of (presumably) older people who know they’re being observed, but who are also answering questions directly relevant to the research at hand.
The exact details of the discrepancy are explained in a Nature article published on Wednesday, but it appears to be a case of a lot of data that didn’t mean what Google thought it meant. Google’s search data covers almost the entirety of the Web-surfing world and, in theory, can see outbreaks coming before they hit because it can watch the flu-related searches intensify in volume in real time. The Centers for Disease Control and Prevention says Google Flu Trends usually tracks very closely with its own data and can deliver results days faster, Nature writer Declan Butler reported.
Researchers think this year’s discrepancy might have something to do with hyped-up media reports leading to a volume of Web searches for flu-related terms that was disproportionate (almost double, nationwide) to the actual number of cases. The CDC claims about 6 percent of the U.S. population was affected by flu-like symptoms during the peak period.
On the other hand, one project called Flu Near You, which relies on volunteers to report cases of flu among their friends and family, estimated a number closer to (albeit lower than) the CDC’s official statistics, perhaps because the data is based on clinical definitions of “influenza” and relies on people expressly reporting known cases. However, Flu Near You claims fewer than 45,000 participants and, according to Nature, covers only 70,000 people.
Responding to my inquiry about the discrepancy, a Google spokesperson sent the following statement:
“Flu Trends is meant to be a complementary tool to the surveillance systems used by the CDC. Since its initial launch in 2008 and through this flu season, Flu Trends has accurately predicted the start and peak time of flu season. However, this season our models estimated a higher influenza like illness rate than the Centers for Disease Control in some regions. As we do each year, we will be performing a model analysis and potential model update to improve the accuracy of the tool.”
And while Google’s predictions might be prone to the undue influence of a fear-mongering media environment, CDC researcher Lyn Finelli told Nature she’s even more skeptical of efforts to track flu outbreaks using Twitter data. She cites a low signal-to-noise ratio and a population of largely young-adult users that doesn’t align with the country’s overall demographic makeup.
By contrast, Johns Hopkins University computer scientist Michael Paul told Nature that he’s a big believer in Twitter data, especially because it generates a large data set that’s less susceptible to sampling errors than smaller-scale projects such as Flu Near You. He claims to have developed a model that can accurately track the flu using Twitter, something a handful of other projects are already working on.
But flu statistics aside, questions over the validity of Twitter, Google, and other websites as data sources are nothing new. Last year, for example, I profiled a company called the Dachis Group that has devised a method for tracking companies’ presences, buzz, and sentiment on social media. It claims its algorithms for ranking the buzz around Super Bowl XLVI advertisers were far more accurate (or at least yielded drastically different results) than USA Today’s traditional AdMeter rankings of Super Bowl ads based on phone-based polling.
Although people appear generally willing to do away with phone surveys and other marketing-based polling efforts, there’s a lot more skepticism when it comes to using the Web to predict political elections and gauge response to culturally popular events such as presidential debates or the Olympics. I covered both sides of the debate in October, as pre-election fever was in full force and many people were atwitter about Twitter’s tweets-per-minute counts during the presidential debates. Which side experts fall on seems to depend on how much they trust the demographics, the subjects themselves, the sample size, and how well someone can actually analyze sentiment in text.
Even on Google, politics has proven that interest doesn’t necessarily signify intent. Leading up to the presidential election in November, Mitt Romney was trending quite a bit higher than Barack Obama in search volume. Election night, however, was a different story, with Obama winning a decisive Electoral College victory.
Perhaps the best advice on how to deal with Web data comes from Harvard epidemiologist John Brownstein, who told Nature, “You need to be constantly adapting these models, they don’t work in a vacuum. You need to recalibrate them every year.”
As Web usage and users change along with the world around them, there’s really no guarantee that a single data point means the same thing or has the same effect from year to year. Even search is under attack by companies trying to proactively surface content for consumers before they know to look for it.
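To make the recalibration point concrete, here is a minimal sketch in Python. It is not Google’s model, and every number in it is invented for illustration. It assumes a simple straight-line relationship between a hypothetical search index and an illness rate, then shows what happens when a model fit on one season is applied to a season in which people search twice as often for the same amount of illness.

```python
# Toy illustration only: all numbers are invented, not real flu or search data.
# The linear relationship and the names below are assumptions for this sketch.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, search_index):
    """Estimate the illness rate implied by a search index under a fitted model."""
    slope, intercept = model
    return slope * search_index + intercept


# Hypothetical earlier season: search index vs. illness rate (percent).
old_searches = [10, 20, 30, 40]
old_rates = [1.0, 2.0, 3.0, 4.0]

# Hypothetical later season: heavy media coverage doubles searches
# while the underlying illness rates stay the same.
new_searches = [20, 40, 60, 80]
new_rates = [1.0, 2.0, 3.0, 4.0]

stale_model = fit_line(old_searches, old_rates)  # never recalibrated
fresh_model = fit_line(new_searches, new_rates)  # refit on the new season

peak_search_index = 80
stale_estimate = predict(stale_model, peak_search_index)
fresh_estimate = predict(fresh_model, peak_search_index)

print(f"stale model estimate: {stale_estimate:.1f}%")
print(f"recalibrated estimate: {fresh_estimate:.1f}%")
```

In this made-up scenario the stale model reads the extra searches as extra illness and lands at roughly double the recalibrated estimate. Real systems are far more complicated, but the failure mode is the same in spirit: the meaning of a search shifts, and a model that isn’t refit doesn’t notice.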
When accuracy is paramount, no place—Twitter, Google, the telephone, or the wisdom of crowds—is the holy grail; they’ll all have to play a role.