China leads the way in Microsoft voice interfaces

Voice-interface applications being developed at a Microsoft Corp. lab in Beijing may spark an explosion of Internet use in the world’s most populous country and later find their way to customers around the world.

A team at Microsoft Research China (MRC) is developing voice recognition software that lets users control a PC and dictate text using speech, as well as speech synthesis software that shapes the tone of a computer’s “voice” based on the structure of the sentence. Speech-based control of PocketPC handheld devices also is under development.

It’s all part of a broad effort at the lab to devise easier ways for the Chinese to use computers. Today they rely on a keyboard that was clearly designed for alphabetic languages, especially European ones.

However, the research may also benefit users elsewhere, eventually allowing people everywhere to surf the Web, send and retrieve text messages, dictate documents and operate a computer simply by speaking.

The difficulty of text input for Chinese users makes another input method an urgent need, said Eric Chang, research manager in the speech group at MRC. Entering Chinese text with a keyboard requires knowing a phonetic system for spelling out Chinese words and then, for each syllable, choosing the intended character from among several possible matches. Another method, handwriting recognition, is difficult because of the complexity of Chinese characters, Chang said.

In the upcoming Chinese version of the Office XP application suite, due to go on sale in the third quarter, Microsoft will include the capability to control applications and dictate text by speaking Mandarin Chinese. The dictation function uses context to choose among several characters that sound the same, and a spell-check function can underline combinations of characters that seem ambiguous.

Although some experienced computer users in China have adapted to phonetic input methods using the keyboard, wider embrace of Internet use over PCs and handheld devices probably will require easier input methods, said Sean Zhang, managing director of Microsoft (China) Research & Development Center, which includes MRC.

“Even for one character, there are so many ways to spell it, it just kills us,” Zhang said.

The voice recognition technology Microsoft has developed is already faster than typing, except for users accustomed to typing Chinese with a phonetic system, Chang said.

Zhang envisions what he calls a multimode input system, which would offer future users the options of inputting words via handwriting, speech recognition and a numeric keypad. The different methods would be appropriate to different devices, such as PCs, mobile phones and set-top boxes.

Another application of voice recognition being tested by Chang’s team will allow users to navigate a Web site by voice. He demonstrated calling up Microsoft’s Encarta digital encyclopaedia on a Compaq Computer Corp. iPAQ PocketPC and entering a search term, in English, by speaking into the device’s microphone. A server retrieved the list of Encarta entries under that category and displayed them on the device.

The advent of 115Kbps GPRS (General Packet Radio Service), a big step up from current speeds on most wireless networks, will be critical for adoption of the technology, he added.

The voice research team headed by Chang has also developed voice-synthesis software that reads Chinese a sentence at a time, rather than one syllable at a time, which allows for a more natural speech flow. The software, which Chang demonstrated at the lab, draws on syllables sampled from 10,000 recited sentences.

The goal, Chang said, is to make synthesized speech enjoyable to hear, whether for retrieving e-mail over a phone or listening to a typed letter to make sure it sounds right. The new system is a step above an earlier version that simply produced one syllable after another in a monotone.

“You really have to be motivated to listen to that voice,” Chang said.

A remaining challenge for voice synthesis is that it requires a lot of data storage – 500MB for the software demonstrated last week – so the function probably will remain server-based in the near future.

MRC’s mission is not just to explore technologies better suited to Chinese users but to make advancements for Microsoft users around the world, explained Hongjiang Zhang, assistant managing director and senior researcher at MRC. Because the early leaders of the lab set it on a course for research mostly in multimedia – including voice, graphics and automatic indexing of multimedia content – its work has universal application.

The lab’s work in voice recognition and other device interfaces is now extending to the Japanese market and beyond, Hongjiang Zhang said.

Voice recognition technology is destined to be the major input method of the future, said Chang, a native of Taiwan who specialized in voice research in his doctoral thesis at the Massachusetts Institute of Technology. Processing power that doubles every 18 months, along with data storage that keeps getting cheaper and more plentiful, will help make this a reality, he believes.

“Moore’s Law is on our side,” Chang said.

– IDG News Service