The speech

“Wassup?” That phrase has insinuated itself into the consciousness of most North American TV viewers. Even if you’ve never said it, you may have heard it before.

What happens when you try to input a phrase like that into your messaging software so that it can be spoken to your visually impaired friend waiting somewhere in cyberspace for a hello? Frankly, most of the time the technology does not understand it, and the message makes no sense.

So maybe technology isn’t the great enabler it is purported to be, but it’s getting there – inching toward the technology we know it could be. We’ve seen Star Trek; we’ve watched the captain hold complete conversations with the ship’s computer, requesting and receiving data in the most casual of voices.

Elizabeth Herrell, research director at Cambridge, Mass.-based Giga Information Group, said when people look at the speech recognition market, they are really looking at two different technologies – text to speech and speech to text.

“Text to speech is being done in every language,” Herrell said. “AT&T came out with a text-to-speech (technology) which sounds really good – you’d be amazed. You can type the words and then listen to what you’ve typed. The quality is good. One of the problems is perception and if people hear a robotic machine talking then they feel that ‘This is a machine and I don’t know if I want to talk to a machine’. But if the speech is natural then people will be more comfortable.”

Text to speech is all about programming: the words are programmed to come out as the sounds people recognize them by, according to Matthew Ki.

Ki, marketing manager for Samsung Electronics Canada Inc., said OCR (optical character recognition) technology is becoming mainstream.

“There are many different devices for this now. There is even a pen that you can move across the word and the technology decodes what is written and brings that out in voice,” he said.

But Paul Loba, technology consultant for the Canadian National Institute for the Blind (CNIB) in Toronto, pointed out one aspect of voice output technology that could stand some improvement.

“When it comes to the graphical user interface, there is room for some changes. If I am using a page reader and it comes to a graphic, most speech output technology developers have not thought to explain some of the graphics, so I don’t get to find out about any of that,” Loba said.

Paul Snayd, program director of IBM’s Accessibility Centre in Austin, Tex., agreed with Loba. “The GUI was one technological advancement that was good and bad. It made it so that the screen was better to look at – instead of characters everywhere we could incorporate graphics. But that presented a problem as to how to present that to a blind person. This is no small task.”

He added that as technology evolves, it will create challenges for vendors to overcome to ensure that those with disabilities have the best use of the technology.

“In terms of technology making new things possible for people, it does, absolutely, and it will continue to do so,” Snayd said. “If you think of what some of the technology has done already – take a home page reader for example – that was a big change.”

Companies first need to make sure they have the basics down, Herrell predicted; then they will ensure the graphics are accessible.

“I know the visually impaired have a great tool in speech recognition with the voice portals that are beginning to come out, but they are still at the beginning. We need to develop the platforms and applications. It’s not like you can go to a Web site and automatically it becomes speech – someone has to rewrite that page in VoiceXML,” she said.
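Herrell’s point that pages must be rewritten for voice can be illustrated with a minimal, purely hypothetical sketch. The Python below builds a trivial VoiceXML 2.0 document; the element names (`vxml`, `form`, `block`, `prompt`) come from the VoiceXML specification, while the function and sample text are invented for illustration.

```python
# A minimal sketch (not a production tool): wrapping a page's plain
# text in a trivial VoiceXML document that a voice browser could
# speak. Element names follow VoiceXML 2.0; everything else here
# (function name, sample paragraphs) is illustrative only.
import xml.etree.ElementTree as ET

def page_text_to_vxml(paragraphs):
    """Wrap plain paragraphs in a VoiceXML document that speaks each one."""
    vxml = ET.Element("vxml", version="2.0")
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    for text in paragraphs:
        prompt = ET.SubElement(block, "prompt")
        prompt.text = text
    return ET.tostring(vxml, encoding="unicode")

doc = page_text_to_vxml(["Welcome to the site.", "Today's headlines follow."])
print(doc)
```

A real voice browser would fetch a document like this from a server and read each prompt aloud; nothing on an ordinary HTML page becomes speech automatically, which is exactly Herrell’s point.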

Loba, who is blind, said his talking computer and Braille display are most important to him right now.

Rejean Proulx, a database manager with IBM in Toronto, who is also blind, said his screen reader is his central piece of technology.

He said it would be nice to have something that would say, “Okay, there are two columns of text here”.

“How do we read things – left to right – that’s how I read too, but if there are columns I don’t know about, then it gets jumbled,” Proulx said.
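The jumbling Proulx describes is easy to demonstrate. In this hypothetical Python sketch (the sample text and function names are invented), a reader that scans each visual line strictly left to right interleaves the two columns, while a column-aware reader finishes one column before starting the next:

```python
# Illustrating the column problem: two side-by-side stories on screen.
# Each tuple is one visual line, with the left-column text and the
# right-column text. Purely illustrative; not a real screen-reader API.
two_column_screen = [
    ("Stocks rose today", "Rain is expected"),
    ("on tech earnings,", "through Thursday,"),
    ("analysts said.",    "forecasters said."),
]

def read_left_to_right(rows):
    # Naive: speak everything on each visual line, left to right.
    return " ".join(left + " " + right for left, right in rows)

def read_by_column(rows):
    # Column-aware: finish the left column before starting the right.
    left = " ".join(l for l, _ in rows)
    right = " ".join(r for _, r in rows)
    return left + " " + right

jumbled = read_left_to_right(two_column_screen)
correct = read_by_column(two_column_screen)
print(jumbled)
print(correct)
```

The naive reading mixes the two stories together mid-sentence; knowing where the column boundaries are is what lets a screen reader keep them apart.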

Changes in the air

Loba said there are some wireless advances he would like speech recognition technology developers to work toward.

“How about a speech-enabled cell phone?” he asked. “I don’t understand why my cell phone doesn’t talk to me – it’s just a matter of putting in a chip.”

The speech recognition and mobile markets are going to be intrinsically tied together, Herrell said.

“They go hand in hand. I think a lot of the new speech applications that are being developed are going to be developed for a mobile workforce, as well as the B2B applications. I see the CRM applications for getting contact or customer information. I see the service type applications,” she said.

There is a future for server applications as well, Snayd added. He said many of the client-based speech products have been disappearing from the market.

“The companies have not done well and several have gone out of business in the past 18 months. IBM still offers ViaVoice as a client-based product, but the future of that is much more server-based,” Snayd said. “What may be bad about that type of product for the visually-impaired person is that if they are not connected to a network of some kind, it would make it difficult to use the product.”

IBM has some ideas in the works for the future of speech and voice recognition tools, but Snayd stressed these are concepts and may never see the light of day.

“We’re looking at stenographic technology as an interface for deaf users. It would be a speech recognition technology so that if a deaf employee walked into your office for an impromptu meeting, the individual could hand you a microphone and, as you speak into it … he or she would be able to read the text of what you’re saying on a laptop, or they could use the text as reassurance while they read lips.” That type of technology is very dependent on voice recognition technology – how it works or doesn’t work, Snayd said.

He said that as long as you came up with a limited vocabulary, that type of system should only take 20 or 30 seconds of training time.

“For planned meetings,” Snayd said, “there could be a relay service, a stenography service – we use one in Chicago – and I understand these services are growing. This allows the deaf to receive text transmissions of everything being said and they can respond vocally or through text to voice. One hearing impaired employee told me that this technology has made him feel part of the team. He is finally getting information in real time and it makes him feel more productive and part of the conversation.”

In St. Marys, Ont., IBM is using voice to text technology for the classroom. Professors are testing a tool that puts their lectures onto a large screen for the hearing impaired, and other students, to use.

“The professors and teaching aides are finding this is a great tool for all students, and it can allow for more discussion time, after which the professor can leave and his lecture will appear on the screen,” Snayd said.

Snayd didn’t mention any type of speech recognition in home appliances, but a couple of others did.

Loba said he has a talking microwave but that the price for this type of appliance more than doubles when you add that technology. “If we can put this kind of technology in the general market – you may not want (a talking microwave), but have you ever left the door open, which drains the light? Because this one will tell you. It can have uses for the general public, and that kind of large market will drive the prices down.”

Proulx said he would like to see one of those talking refrigerators everyone always mentions.

“It is important to think about webifying appliances and giving them speech capabilities,” Proulx said. “How about a digital washing machine, so I know when I hit the gentle cycle button. Technology could be a real enabler.”

He stressed that text to voice should be a part of every Web page. “Make it so that all of this is palatable to all of mankind. If people don’t want to talk/hear from the computer – give them a mute button.”

Governing speech

Section 508 of the U.S. Rehabilitation Act states that the government, as an employer and buyer, will not purchase any equipment that is not accessible and usable by everyone, Proulx said.

“Canada doesn’t have anything like that yet. This is a big deal and it’s time this country got on board.”

In 1990, the U.S. passed the Americans with Disabilities Act, which states that all companies must supply hearing impaired persons with relay telephones, and that workers with special needs must have those needs met.

Snayd said IBM is trying to take into account each country’s requirements and legislation as it makes all of its technology, hardware and software, accessible.

“Technology has yet to unravel where we all go with this, but it could be the great equalizer,” Snayd said.

One area where speech technology is already pervasive is telephone service and call centre applications.

Steven Chambers, vice-president of worldwide marketing for SpeechWorks International Inc. in Boston, said he thinks speech recognition technology used in phone systems could be an equalizer.

“People said the Web would be a democratizer, spreading information far and wide, but it was really only for those who could afford a computer and access. Speech doesn’t do that. If you need information or a service you can use a landline or rotary phone. It means the technology has the power to democratize across all socio and economic boundaries,” Chambers said.

Voice recognition in phone systems reaches up to 90 per cent accuracy, although part of this is due to the fact that it is programmed for key words and phrases, he said.
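Chambers’ caveat – that accuracy is high partly because these systems listen for expected key words and phrases – can be sketched as follows. This toy Python example (not the SpeechWorks API; the menu phrases and function are invented) shows why a small grammar makes the problem tractable: the system only has to pick the closest of a few known phrases, rather than transcribe open speech.

```python
# A toy sketch of limited-grammar phone recognition: the "recognizer"
# only has to map a (possibly misheard) utterance onto the closest
# entry in a short menu of expected phrases. Hypothetical example,
# not any vendor's real API.
import difflib

MENU = ["check balance", "transfer funds", "speak to an agent"]

def match_utterance(heard):
    """Return the closest menu phrase, or None if nothing is close enough."""
    matches = difflib.get_close_matches(heard.lower(), MENU, n=1, cutoff=0.5)
    return matches[0] if matches else None

print(match_utterance("czech balans"))   # garbled, but closest to "check balance"
print(match_utterance("transfur fund"))  # closest to "transfer funds"
```

Even badly garbled input lands on the right menu entry because there are only three candidates to choose from; open-vocabulary dictation has no such shortlist to fall back on, which is part of why its accuracy lags.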

One of the reasons phone applications are everywhere is that it just makes good business sense, Herrell said.

“We no longer have to make employees sit and answer questions about forgotten passwords. It can be automated. That’s the value card,” she said. “We were paying people to sit and answer every question that came in, and finally someone said, ‘How can we automate this?'”

Herrell said speech automation will not replace the people sitting at the desk – there always needs to be a point at which you can talk to a person, who can be a bit more intuitive than a computer ever could be, and can give an emotional response if that is what is needed.