Innovation August 18, 2008, 11:17AM EST

IBM's Speech Recognition

Demand for the technology is expected to rise dramatically over the next few years, and IBM's speech research group is focusing on forming partnerships to take it to market

There aren't too many good-news stories coming out of Iraq, but here's one. The U.S. military is bridging the communications gap between its soldiers and Iraqis by tapping some innovative speech recognition technology from IBM Research (IBM). Using a laptop computer or PDA, soldiers speak into a microphone and the software translates what they say in English into Arabic. Iraqi soldiers or civilians see and hear the words in Arabic, and their answers are immediately translated into English. About 10,000 of these systems are in use in the battle zone.

But what's a boon for the U.S. military highlights a conundrum for IBM Research, which provides the technology gratis. When the military selected speech recognition technology for a new medical records network, it chose an offering from the market leader, Burlington (Mass.)-based Nuance Communications (NUAN). For all of IBM's expertise and resources, the 3,000 or so scientists in its basic research facilities worldwide face a major challenge in shepherding their innovations from the lab into the marketplace.

Partnering Up

David Nahamoo, the chief technology officer for IBM Research's speech and translation division, is out to change that. On Aug. 18, Nahamoo announced a new strategy at SpeechTEK 2008, a gathering of the leaders in the speech recognition industry in New York City. Rather than trying to push its technology mainly through IBM's product and services divisions, the speech research group is focusing on forming partnerships with other companies to take the technology to market. Partners include Vlingo, the company that provides speech services for Yahoo! oneSearch (YHOO); PhoneTag, which converts mobile voice mail to text; and Jajah, which offers real-time phone translation between English and Mandarin. "We can find partners, spread the risk, and improve our ability to address these markets," says Nahamoo.

IBM has been performing research into speech recognition for four decades. Some of the technology has found its way into products sold by the company's software and services business, notably in the auto industry. But the technology hasn't had the kind of impact that Nahamoo and his bosses believe is possible, in applications including autos, mobile phones, call centers, medical systems, and transcription services. The issue for IBM? Each of these applications on its own represents a relatively small market. That's why IBM needs partners who are experts in different niches. "This new strategy gives very talented people in IBM an outlet for their work," says William Meisel, president of technology consulting firm TMA Associates.

A Combined Technology

Overall, demand for speech recognition technology is expected to rise dramatically over the next few years as people use their mobile phones as all-purpose lifestyle devices (so barking "find pizza" into your phone would load directions to the nearest pizza parlor). In-car entertainment and navigation systems are increasingly controlled by voice commands. This growth in adoption is being fueled by steady improvements in speech recognition accuracy.

Speech recognition isn't one technology but several combined. You start building a speech recognition engine by recording words, phrases, and sentences, and putting them in a database. Then you create a library of the specific pronunciations of the different words to be recognized. Then you map the sounds in the recordings to the word pronunciations.
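For readers who want to see those three steps in concrete form, here is a minimal toy sketch in Python. It is not IBM's method: the `Recording` class, the `LEXICON` entries, the phoneme symbols, and the frame counts are all hypothetical, and the even split of audio frames across phonemes is only a crude starting guess of the kind a real system would refine with statistical training.

```python
from dataclasses import dataclass


# Step 1: a tiny "database" of recordings. A real system stores audio;
# here each recording is reduced to its transcript and a frame count.
@dataclass
class Recording:
    transcript: str
    num_frames: int  # number of fixed-length audio frames (hypothetical)


# Step 2: a pronunciation library mapping each word to its phonemes.
# These symbols are illustrative and not taken from any IBM system.
LEXICON = {
    "find": ["F", "AY", "N", "D"],
    "pizza": ["P", "IY", "T", "S", "AH"],
}


def phonemes_for(transcript):
    """Expand a transcript into one flat phoneme sequence."""
    phones = []
    for word in transcript.lower().split():
        phones.extend(LEXICON[word])
    return phones


def uniform_alignment(recording):
    """Step 3: map sounds to pronunciations by splitting the frames evenly.

    Returns (phoneme, start_frame, end_frame) tuples, end exclusive.
    """
    phones = phonemes_for(recording.transcript)
    n = len(phones)
    spans = []
    for i, phone in enumerate(phones):
        start = i * recording.num_frames // n
        end = (i + 1) * recording.num_frames // n
        spans.append((phone, start, end))
    return spans


rec = Recording("find pizza", 90)
alignment = uniform_alignment(rec)
for phone, start, end in alignment:
    print(phone, start, end)
```

The even split ignores the fact that some sounds last longer than others. It stands in here for the mapping step only because it is the simplest version that can be written down.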
