Three years ago, New York City’s health department was trying to track a suspected outbreak of food-borne illness that went on for weeks at a particular restaurant. One investigator, who was a fan of Yelp (YELP), looked up the restaurant’s reviews on the ratings website and saw that diners reported they’d gotten sick after eating there. The agency created a Yelp account and sent messages to the reviewers to get more information about what they and their companions had eaten, to determine which foods were responsible for the illness.
Soon after, investigators started a program to discover whether Yelp reviews could alert the Department of Health and Mental Hygiene to food-borne outbreaks it hadn’t learned about through complaints the agency gets by phone and online. (New York’s mayor at the time, Michael Bloomberg, is the majority owner of Bloomberg Businessweek parent Bloomberg LP.) The agency asked Yelp for a data feed of public reviews on New York restaurants and worked with researchers at Columbia University to develop algorithms that would flag suspect reviews; keywords such as “vomit,” “diarrhea,” and “food poisoning” were part of the equation.
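To make the keyword idea concrete, here is a minimal sketch of a keyword-based flagger in Python. It is an illustration only, not the algorithm the department and Columbia researchers built, which the article describes only as including these keywords among other factors. The function names `flag_review` and `flag_reviews` and the `(review_id, text)` input shape are assumptions made for the example.

```python
import re

# The three keywords quoted in the article. The real system's full
# feature set is not described, so this list is illustrative only.
ILLNESS_KEYWORDS = ("vomit", "diarrhea", "food poisoning")

# A leading word boundary lets "vomit" also match "vomiting" or "vomited".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in ILLNESS_KEYWORDS) + r")",
    re.IGNORECASE,
)


def flag_review(text):
    """Return the sorted list of distinct illness keywords found in a review."""
    return sorted({m.group(1).lower() for m in _PATTERN.finditer(text)})


def flag_reviews(reviews):
    """Yield (review_id, matched_keywords) for reviews with at least one hit.

    `reviews` is a hypothetical iterable of (review_id, text) pairs.
    """
    for review_id, text in reviews:
        hits = flag_review(text)
        if hits:
            yield review_id, hits
```

A flagger this simple would also catch reviews that mention illness in passing or in jest, which is the kind of false signal human reviewers then have to screen out.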
Public health officials have been trying to harvest useful information from social media for years with varying degrees of success. Google (GOOG) search queries appeared to be an early indicator of seasonal flu patterns, until they weren’t. Companies are trying to use data on social networks and Internet forums to learn about drug side effects.
The challenge is that Internet data is full of noise and false signals—things that look like problems but aren’t—and it’s time-consuming to investigate. “We only have so many public health resources to respond to them—most of them will not be outbreaks, so how do we narrow it down?” says Dr. Sharon Balter, a medical epidemiologist at the New York City Department of Health.
Over nine months in 2012 and 2013, the agency got 294,000 reviews from Yelp, and software identified 893 that might signal outbreaks, according to a report on the project published on Thursday by the Centers for Disease Control and Prevention. Epidemiologists reviewed those and found 468 with reports of potentially recent illnesses. Eventually the department interviewed 27 Yelp reviewers who responded to its queries, and detected three suspected outbreaks that it hadn’t known about before. The agency dispatched inspectors, who found such violations as barehanded contact with food, vegetables that were served without being washed, and mice and roaches.
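The figures in that report form a narrowing funnel, and a few lines of Python show how sharply each stage cuts the pool. The counts come from the paragraph above; the stage labels and the helper name `funnel_rates` are assumptions for illustration, and the calculation is simple division rather than anything taken from the report.

```python
# Counts reported for the nine-month pilot, in the order they narrow.
STAGES = [
    ("reviews received", 294_000),
    ("flagged by software", 893),
    ("potentially recent illness", 468),
    ("reviewers interviewed", 27),
    ("previously unknown outbreaks", 3),
]


def funnel_rates(stages):
    """Return (stage_name, count, share_of_previous_stage) for each step after the first."""
    rows = []
    for (_, previous), (name, count) in zip(stages, stages[1:]):
        rows.append((name, count, count / previous))
    return rows


if __name__ == "__main__":
    for name, count, share in funnel_rates(STAGES):
        print(f"{name}: {count:,} ({share:.2%} of previous stage)")
```

By this arithmetic the software flagged roughly 0.3 percent of all reviews, and epidemiologists kept about 52 percent of those flagged, which gives a rough sense of how much manual screening each automated flag still requires.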
Balter describes the work as a pilot program and hopes the algorithm will become more precise over time. Other places are experimenting with similar ideas, notably Chicago and Utah. New York also wants to pull in data from other review websites.
The department now gets a daily bundle of reviews from Yelp, and the software identifies about 23 a week that warrant further evaluation. That compares with about 60 complaints a week—3,000 a year—that come through the city’s 311 reporting system, though some of those have to do with rodents in restaurants, not food-borne illness, Balter says.