Ray Kurzweil got his start by inventing a machine-reader for blind people, which he sold to Xerox in 1980. Kurzweil's second company, Kurzweil Applied Intelligent Systems, developed one of the first voice-recognition engines, capable of understanding discrete words and turning them into text or commands. Kurzweil AI was rocked by an accounting fraud uncovered in 1994, resulting in prison sentences for the company's CEO and vice-president of sales. No charges were brought against Kurzweil, who denied any knowledge of wrongdoing. The company was sold in 1996 to Lernout & Hauspie. Today, Kurzweil is again focusing on computer systems to aid the disabled. His current venture, Kurzweil Educational Systems, is developing a reading machine for people with dyslexia and other reading and learning disabilities--a group that numbers as many as 50 million Americans. BUSINESS WEEK correspondent Paul C. Judge interviewed Kurzweil in KESI's Waltham (Mass.) offices.

Q: You've been involved in speech recognition from an early stage. What are some of the key factors that are making speech systems more widely available?
Kurzweil Applied Intelligent Systems was founded in 1982, with the goal of creating a voice-activated word processor. The grail has been very large vocabulary, speaker independence, and continuous speech. One thing that makes it possible today is Moore's Law. It's only been in the last six months that we've had PCs that can support the processing requirements of continuous speech. The next step now is to integrate natural-language understanding with continuous speech dictation. People don't want to say "open file" and "close file." They want to say, "Get me the current letter to engineering." It's awkward to go back and forth between different modalities.

Q: How did you first get involved in speech-recognition technology?
I started with an interest in pattern recognition, which was the science project that I developed to win the Westinghouse Science Award as a high school student. From there, I moved into optical character recognition. That was a solution in search of a problem. That's what led me into reading machines for the blind. It combined optical character recognition and a speech synthesizer, which took the text from a page that was scanned in and read it out loud in a synthesized voice.

At the time, I had a blind guy who was head of international sales. He traveled all over the world on business for us, but he was limited to reading documents in Braille. Print-to-speech opened up any printed document to him.

Q: What other kinds of disabilities can be ameliorated through speech-recognition technology?
Reading-disabled people are another population that can use speech synthesis. Speech programs are also useful for people who can't use their hands, from the profoundly physically impaired to people with mild repetitive stress injuries. In fact, people who used the keyboard a lot and developed RSI [became] an urgent early market for speech recognition. Now, dictation products that can handle 80 to 90 words per minute are better than all but the fastest typists.

Computers are an ideal technology for overcoming the handicaps of disabled people. We're not creating computers that are far-ranging cybernetic geniuses. When these systems go outside their area of expertise, they start to flounder. But a disabled person is a normal intelligent person, with a narrow deficit. That's a perfect match for today's computer technology.

Another technology for the disabled is a sensory aid for the deaf. Most deaf people can't lip-read unless they are close. It doesn't work over the phone. That's a significant handicap for deaf people, not to be able to speak over the phone. Real-time text-to-speech and speech-to-text can address that. The technology is good enough right now. It needs to be packaged in the right way. Ultimately, I can imagine a screen with a speech-to-text reader embedded into a pair of eyeglasses that deaf people could wear.

Q: What about applications for general use that will incorporate speech recognition?
Over the next several years, speech technology will be providing the most effective interface for everyone. Computers have gradually moved towards human ways of communicating. I expect that over the next 12 to 24 months this technology will become relatively ubiquitous. It's like E-mail, which was around for years before people started using it. Now that we have computers powerful enough to do it, everyone uses it.

Q: What are the key obstacles to widespread use of speech-recognition systems?
The technology is far from perfect. One key principle is that the systems should learn as they go. Accuracy continues to be the most important parameter, because it's critical for acceptance. We've got systems that are 97% or 98% accurate now. But getting from 98% to 99% accuracy takes a lot of work.

Another obstacle is natural-language understanding. It's not a yes-or-no feature, but something that will get built in gradually. It makes sense, for example, that speech technology should move into Web browsers. The universe of knowledge on the Web is huge; speech is a good way for people to get access to it. But speech-enabled browsers won't be able to understand the subtleties of a request for another couple of decades.

So the key obstacles I see ahead are 1) improving accuracy; 2) improving natural-language understanding, to the point where people can search for articles on the Web using normal conversational commands; and 3) opening up foreign languages to non-native speakers.

Q: Do you see any particular applications in the next few years?
Over the next 24 months, continuous speech dictation products are going to become very common. I expect we'll see a translating telephone, for example. If you're talking to someone who doesn't speak English, it would be immensely helpful to be able to converse with them over a telephone, using a speech-recognition system that could translate back and forth. By the middle of the next decade, I expect it will be ubiquitous.

The PC will become like a personal research assistant. The computer will clarify things, ask questions about what you want it to do, the kinds of things you're looking for, and then go do it. It's the same kind of discourse people have when they are working together.

Q: How soon before natural language capabilities are built into these systems?
Talking to a processor is a narrow task, although not as narrow as we once thought. Within one year you will be able to talk to all of your major software programs. Ten years out, we will be able to have substantive human discourse with computers. By 2020 or 2030, I think computers will surpass basic human intelligence.

Q: You had a close view of Microsoft's decision to invest $45 million in Lernout & Hauspie last fall, by virtue of your advisory role at L&H after it bought Kurzweil Applied Intelligent Systems. What was Microsoft's rationale for that investment?
First, L&H has a very strong patent portfolio in basic speech-recognition technology, thanks partly to patents acquired with Kurzweil AI. Second, Microsoft was interested in L&H's core speech-recognition technology. Third, L&H has extensive capabilities with foreign languages. That's very important to Microsoft, which sells software in many different languages. Fourth was L&H's strength in the vertical market of medical dictation systems. Those systems draw upon some pretty sophisticated medical knowledge systems, which understand the requirements of HMOs, Medicare, and a host of other parameters that doctors operate within. Microsoft also has licensed L&H's text-to-speech technology. It's a formidable array of related technology that works well together. Microsoft and L&H are pooling technology and resources. Our speech technology will go into Microsoft Office and eventually into the operating system itself.

Updated Feb. 12, 1998 by bwwebmaster
Copyright 1998, Bloomberg L.P.