Science fiction TV shows typically portray people speaking when they interact with futuristic computers, not clacking away on keyboards or fiddling with mice. How close is speech recognition technology to this kind of dream system?
At a recent Microsoft forum held in Singapore, chairman Bill Gates said that giving instructions to PCs by voice will become mainstream in three to four years, although he expects the computer keyboard to remain an important device, according to an AFX report on Forbes.com. Gates said Microsoft is spending tens of millions annually on the technology.
Alex Acero, research area manager in natural language processing at Microsoft Research in Redmond, Wash., qualifies the boss's comments somewhat: "The problem is that the keyboard is a very good data entry mechanism, and the way I envision things, dictation on the desktop will probably be one of the last places where the technology will go mainstream." A big reason speech recognition isn't more widely used is that the technology is far from perfect, he said.
"Microsoft XP products have had speech baked into them for the past few years," said Dan Miller, analyst with Opus Research, a San Francisco-based conversational access technology (CAT) consultancy. "But toggling back and forth between voice commands and content has always driven people nuts. Compounding the problem is the need to train the darned things. And if you cough or hiccup or there's background noise, the error recovery feature is something that only stalwarts will put up with," he said.
Acero said researchers studying artificial intelligence and pattern recognition are making steady progress towards the Holy Grail: a more human-like system that can learn from its mistakes and generalize from context. Microsoft Research is even delving into areas such as programming "self-doubt" into PCs, so that they check with users when they encounter an ambiguous statement, and emotion detection, which causes the system to react differently when users are angry or upset, for example by giving a correction more weight.
However, Acero points out that technology in general doesn't need to be 100 per cent perfect to be useful. In the medical and legal fields, speech dictation technology is widely used and is clearly preferable to the time-consuming alternative of transcribing audio tapes.
“For these applications, where the system is fed terms and context, it’s good enough if they’re about 90 per cent correct,” he said. “I don’t believe we’ll ever get it 100 per cent right, just like humans don’t. I think the right measure to use is: are users willing to use the technology, even if it’s not perfect?”
Acero believes there are more compelling reasons to use speech recognition in areas where data entry is difficult, such as hand-held devices and telephony. In telephony, the technology is an improvement over menu-driven customer service systems that often frustrate callers.
Network-based CAT technology for enterprises is where the real action is today, rather than at the desktop level for general-purpose users. According to a study conducted by Genesys Telecommunications Laboratories, 42 per cent of organizations have already deployed or plan to deploy speech recognition in their contact centres, and 65 per cent of users prefer voice self-service to touchtone. Enterprises see clear value in automating and improving customer service interactions, and they are driving development and growth in this direction.
Case in point: Toronto-based VoiceGenie Technologies Inc., which provides the infrastructure that supports speech recognition of input from telephone systems for network-based applications, reported record revenue and 50 per cent growth in sales over last year, said Frank Tersigni, vice-president of marketing and business development at VoiceGenie.
Tersigni points out that Microsoft has yet to develop products in languages other than English, unlike its partners, Peabody, Mass.-based ScanSoft Inc. and Menlo Park, Calif.-based Nuance Communications Inc. But Miller said Microsoft is softening its English-only position and expanding its vocabularies.
“They were pretty stubborn in the first couple of years about developing products in English only, but that single language barrier is falling away fairly rapidly. It’s a question of months,” he said.
Miller believes Microsoft’s initiatives are having a salutary effect on the CAT industry by providing an impetus to “get its act together.”
ScanSoft and Nuance, for example, recently announced plans to merge. "If you look at speech licenses, then the world belongs to ScanSoft and Nuance. Together, they may not be an 800-pound gorilla like Microsoft, but they're maybe a 100-pound gorilla. You have to take Microsoft seriously even though you don't measure their power in market share at this point," he said.
Voice apps spreading as standards mature