Speech technologies trim costs and boost service

Speech integration technology is nothing new, as any telephone caller who has ever barked back responses to a seemingly endless series of voice prompts can testify. But an improved generation of speech integration software, based on more powerful processors and emerging Internet-focused standards, promises to make the technology more useful and cost-effective.

Until recently, organizations tended to shy away from speech integration because of the technology’s complexity and cost. “I had one client who had 60 people on its (speech integration) project,” says Elizabeth Ussher, Meta Group Inc.’s vice-president of global networking strategies, who covers speech technologies.

Today, preconfigured speech templates, drop-in objects and other packaged tools make speech integration development less burdensome. Hardware improvements, particularly speedier processors, also help make speech integration a more practical technology. “Speech recognition is now very widely deployable,” says Ussher. “I’m seeing clients with a return on their investment within three to six months.”

Yet another reason for increased interest in enterprise speech integration can be found in the almost exponential proliferation of mobile phones, PDAs and other portable wireless devices. Speech input/output is an attractive alternative to cramped keyboards and minuscule displays. “If I’m on my mobile phone while driving my car, I’m not going to push buttons for my account number,” says Ussher. “I’m going to wait for an agent – living or virtual.”

Enterprises looking into speech integration face two basic technology choices. The oldest and simplest type, known as “directed dialogue,” prompts callers with a series of questions and recognizes only a limited set of responses, such as “yes” and “no,” specific names, and numbers.
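To make the distinction concrete, here is a minimal Python sketch of a directed-dialogue step. It assumes the recognizer has already turned the caller’s speech into a text hypothesis; the vocabularies and the functions `match_yes_no` and `match_digits` are hypothetical illustrations, not any vendor’s product. Anything outside the small vocabulary is rejected, and the caller has to be prompted again.

```python
from typing import Optional

# Hypothetical vocabularies for two directed-dialogue prompts. A real system
# would load these as recognition grammars; here they are plain lookup tables.
YES_NO = {"yes": True, "yeah": True, "no": False, "nope": False}
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def match_yes_no(hypothesis: str) -> Optional[bool]:
    """Return True/False for an in-vocabulary answer, or None to reprompt."""
    return YES_NO.get(hypothesis.strip().lower())


def match_digits(hypothesis: str) -> Optional[str]:
    """Accept only a sequence of spoken digits, e.g. an account number."""
    words = hypothesis.strip().lower().split()
    if not words or any(word not in DIGIT_WORDS for word in words):
        return None  # out of vocabulary: the caller must be prompted again
    return "".join(DIGIT_WORDS[word] for word in words)


if __name__ == "__main__":
    print(match_yes_no("Yes"))           # True
    print(match_yes_no("sure thing"))    # None, so the system reprompts
    print(match_digits("four one six"))  # 416
```

The narrow vocabulary is what keeps such systems cheap and accurate, and also what limits them to simple tasks.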

A newer and more sophisticated approach to speech integration, known as “natural language,” handles complete sentences and aims to engage callers in lifelike banter with a virtual call centre agent. The technology is also more forgiving of word choice. “If a customer calls Thrifty and asks about rates from JFK Airport in New York, they might say ‘JFK’ or ‘John F. Kennedy’ or ‘Kennedy Airport,’” says SpeechWorks cofounder and CTO Michael Phillips. “The system has to be prepared for the different variations that might be used.”
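Handling the Thrifty example means mapping many phrasings onto one value. Commercial natural language systems rely on statistical language models for this; the Python sketch below is a far simpler stand-in that spots known phrases inside a full sentence. The synonym table, the `LGA` entries and the `find_airport` function are hypothetical, assumed purely for illustration.

```python
import re
from typing import Optional

# Hypothetical synonym table: each phrasing a caller might use, mapped to one
# canonical airport code. Entries are stored lowercase with no punctuation.
AIRPORT_VARIANTS = {
    "jfk": "JFK",
    "j f k": "JFK",
    "john f kennedy": "JFK",
    "kennedy airport": "JFK",
    "kennedy": "JFK",
    "laguardia": "LGA",
    "la guardia": "LGA",
}


def _normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces and collapse runs of spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(cleaned.split())


def find_airport(utterance: str) -> Optional[str]:
    """Return the canonical code for the first airport phrase found, or None."""
    text = _normalize(utterance)
    # Check longer phrases first so "kennedy airport" is preferred to "kennedy".
    for variant in sorted(AIRPORT_VARIANTS, key=len, reverse=True):
        if re.search(r"\b" + re.escape(variant) + r"\b", text):
            return AIRPORT_VARIANTS[variant]
    return None


if __name__ == "__main__":
    for request in ("What are your rates from JFK?",
                    "How much is a car at John F. Kennedy?",
                    "I'm landing at Kennedy Airport tomorrow"):
        print(find_airport(request))  # JFK each time
```

Checking longer phrases first is a common tie-breaking rule: with a richer table, the longest matching phrase is usually the most specific one.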

Directed dialogue tools, while less expensive than natural language systems, suffer from their limited recognition capabilities. As a result, they are mostly used for simple applications, such as automated switchboard attendants or credit card activation lines. Natural language systems have a wide range of applications, including product and service ordering, telebanking, and travel reservation booking.

A pair of emerging technologies, VoiceXML and Speech Application Language Tags (SALT), are also helping to advance speech integration. Both specifications rely on Web technology to make speech integration applications easier to develop and deploy.

“You don’t have to reinvent the wheel and program a new interface to get speech recognition access to your data,” says Brian Strachman, a speech recognition analyst at technology research company In-Stat/MDR.
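VoiceXML, a W3C specification, shows what relying on Web technology means in practice: a voice dialog is an XML document that a voice browser fetches over HTTP from an ordinary web server, much as a desktop browser fetches HTML. The Python sketch below assembles a minimal VoiceXML 2.0 form with the standard library’s `xml.etree.ElementTree`; the form id, field name and prompt text are hypothetical. SALT takes a different route, adding speech tags to existing Web pages, and is not shown here.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_confirmation_dialog(question: str) -> str:
    """Return a minimal VoiceXML 2.0 document that asks one yes/no question."""
    root = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(root, "form", {"id": "confirm_form"})
    # type="boolean" selects VoiceXML's built-in yes/no grammar,
    # so this field needs no custom grammar file.
    field = ET.SubElement(form, "field", {"name": "answer", "type": "boolean"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = question
    # <filled> runs once the caller's answer has been recognized.
    filled = ET.SubElement(field, "filled")
    echo = ET.SubElement(filled, "prompt")
    echo.text = "You said "
    value = ET.SubElement(echo, "value", {"expr": "answer"})
    value.tail = "."
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_confirmation_dialog("Would you like to activate your card?"))
```

In a deployment the generated document would typically be served by a web application, which can pull prompts and data from the same back-end systems its HTML pages already use.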

While most people think of speech integration in terms of customer self-service, the technology can also be used internally to connect an enterprise’s employees and business partners to critical information. Aircraft mechanics, for example, can use speech integration to call up technical data onto a PDA or notebook screen. Likewise, inventory takers can enter data directly into databases via speech-enabled PDAs, without ever using their hands.

The Bank of New York, for example, has tied speech recognition into its phone directory and human resources systems. Using technology supplied by Phonetic Systems, the bank operates an automated voice attendant that lets callers reach a specific employee simply by speaking that person’s name. In the event of a major emergency that forces entire departments to move to a new location, employees can call into the system to update their contact information immediately. The updated information then becomes available to anyone calling the bank’s attendant.

The speech-based approach is designed to help bank employees resume their work as soon as possible, even before they have access to computers. “The automated attendant was already connected to our back-end systems,” says Jeffrey Kuhn, senior vice-president of business continuity and planning. “We simply expanded the number of data fields that are shared between Phonetic’s product, our HR system and our phone directory system.”
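The Bank of New York’s continuity scheme rests on one idea: the directory that callers search by name is the same record that employees update by phone, so a change is visible to the very next caller. The Python sketch below models that with an in-memory class. It is not Phonetic Systems’ software or API; `VoiceDirectory`, `Contact` and the sample name and numbers are hypothetical, and the sketch assumes speech recognition has already produced the spoken name as text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Contact:
    name: str
    department: str
    phone: str


class VoiceDirectory:
    """Hypothetical in-memory stand-in for a speech-enabled phone directory."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Contact] = {}

    @staticmethod
    def _key(spoken_name: str) -> str:
        # Normalize case and spacing so "Jane  Doe" and "jane doe" match.
        return " ".join(spoken_name.lower().split())

    def add(self, contact: Contact) -> None:
        self._by_name[self._key(contact.name)] = contact

    def lookup(self, spoken_name: str) -> Optional[Contact]:
        """What the attendant does when a caller speaks a name."""
        return self._by_name.get(self._key(spoken_name))

    def update_phone(self, spoken_name: str, new_phone: str) -> bool:
        """What a relocated employee does by calling in; False if unknown."""
        contact = self.lookup(spoken_name)
        if contact is None:
            return False
        contact.phone = new_phone  # shared record: the next lookup sees it
        return True


if __name__ == "__main__":
    directory = VoiceDirectory()
    directory.add(Contact("Jane Doe", "Treasury", "555-0100"))
    directory.update_phone("jane doe", "555-0199")  # after an emergency move
    print(directory.lookup("Jane Doe").phone)        # 555-0199
```

The bank’s real system shares fields among three separate back-end systems; this sketch collapses them into a single dictionary to show only the data flow.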

The biggest challenge Kuhn faced in deploying the technology was getting it to mesh with the bank’s older analog PBX systems.

That problem was eventually solved, although the interface ports on the old PBX units must now be manually set, which is a minor inconvenience.

Speech integration’s primary benefit for callers is convenience, since the technology eliminates the need to wait for a live agent. Problems handling foreign accents, minor speech impediments and quirky word pronunciations are largely fading away as software developers give their products the capability to recognize and match a wider array of voice types. “Every four to five years, speech technologies improve by a factor of two,” says Kai-Fu Lee, vice-president of Microsoft Speech Technologies.
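Taken literally, Lee’s rule of thumb describes a doubling time T of four to five years. Compounding it shows what that implies over a decade and per year. The quote does not say which measure of performance improves, so F(t) below is simply an unspecified performance factor relative to today, assumed for illustration.

```latex
\begin{align*}
F(t)  &= 2^{t/T}, \qquad 4 \le T \le 5 \text{ years} \\
F(10) &= 2^{10/5} = 4 \quad\text{to}\quad 2^{10/4} = 2^{2.5} \approx 5.7 \\
F(1)  &= 2^{1/5} \approx 1.15 \quad\text{to}\quad 2^{1/4} \approx 1.19
\end{align*}
```

In other words, the claim amounts to a gain of roughly 15 to 19 percent a year.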

For enterprises, speech integration’s bottom-line benefits include cheaper 24/7 user support and data access. Bank of New York’s Kuhn estimates that his system handles the work of five full-time employees.

Despite the potential benefits, CIOs shouldn’t view speech integration as a panacea for their rising call centre costs. The technology itself requires constant attention, which adds to its base cost and detracts from potential savings. “It’s labour intensive,” says Meta Group’s Ussher. “It’s not like a washing machine that runs on its own. It’s a technology that requires constant tweaking, pushing and updating.”

While speech integration will certainly become more capable and self-sufficient in the years ahead, few observers believe the technology will ever fully replace living, breathing call centre agents.

In-Stat/MDR’s Strachman says that speech integration will primarily be used to eliminate call centre grunt work, such as the recitation of fares and schedules, and to give end users a new way to access critical data. The handling of complex issues, such as technical support, will probably always require access to a live expert. “For call centre agents to stay employed, they’re going to have to be more highly skilled and trained than they are now,” says Strachman.