Trend update: Speech and voice self-service

META Trend: Spurred by lower transaction costs, the adoption of speech-oriented industry standards (e.g., VXML, SALT) will accelerate during 2005-07 and simplify the development of speech-enabled self-service applications, which will extend beyond the call center into the broader business. Vendor consolidation will continue through 2008, influenced by integration and packaging in the struggle for differentiation. Vertical packaging and expansion into intelligent applications (e.g., knowledge bases) will be readily available in 2009, reducing customization to less than 20% of the total effort.

Enterprises are expected to provide convenient, consistent, and accurate information, enabling customers to use the path of least resistance to get what they need. Customers desire a quick route to information, using many different interaction channels, as illustrated by the evolution of customer support – first with live call-center agents, followed by interactive voice response systems, then via the Web (including e-mail), and currently with speech recognition systems. Speech deployments have enjoyed nearly 50% growth year over year since 1999, a trajectory that is expected to continue as the spectrum of applications increases beyond simple stock quotes and customer service. In coming years, leading companies envision using speech recognition in more complex problem-solving and translation applications. Customers will use self-service applications initially, but if this proves cumbersome or simply does not work, callers will head directly to live agents, dramatically increasing service costs. Speech recognition capability in converged/pervasive devices will largely remain targeted at command-and-control applications through 2006/07. By 2008/09, as processing power and memory are enhanced, we expect speech recognition to be used for more mainstream application interfaces (e.g., dictation, input to search engines). However, environmental concerns (e.g., noise) will hamper adoption in uncontrolled environments (e.g., office, factory floor).

When this technology is used in the customer interaction center, the enterprise fully controls the customer experience. Intuitive and easy-to-use applications encourage customers to access information through self-service channels, not only improving service but also reducing the cost of delivering it. When systems are implemented properly, the human voice is the most intuitive and widely applicable method of interaction. Like outsourcing, which has been much hyped as the cure for operational budget pressure, speech recognition systems play a key role in reducing cost. While many sources cite savings of 25%-35% from moving live customer service to lower-cost labor forces, self-service speech technology can cut cost even more dramatically. By avoiding the cost of the live interaction altogether, speech recognition can cut operational cost by a further 75%-85% below the cost of offshoring alone. With speech technology, savings also come from many other areas: telecommunications, project management, training costs for the outsourcer’s agents and supervisors, downtime, transition time, etc. As a bonus, true 24×7 service is available to customers. While we are not suggesting that a customer never speak with an agent, users should consider that workload shifted offshore tends to be first- or second-level inquiry – a need that can be fulfilled via self-service, specifically speech-driven models.
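The cost comparison above can be made concrete with a short calculation. The sketch below is illustrative only: the baseline cost per contact of 5.00 is a hypothetical figure, not a number from our research, and the function names are invented for the example. It also assumes that the 75%-85% reduction is measured against the offshored cost per contact rather than the original onshore cost.

```python
# Illustrative cost-per-contact comparison.
# BASELINE is a hypothetical onshore cost per live contact, not a researched figure.
BASELINE = 5.00


def offshore_cost(baseline: float, offshore_saving: float) -> float:
    """Cost per contact after moving live service to a lower-cost labor force."""
    return baseline * (1.0 - offshore_saving)


def speech_cost(baseline: float, offshore_saving: float, speech_saving: float) -> float:
    """Cost per contact for speech self-service.

    Assumption: the speech saving is applied on top of the offshored cost.
    """
    return offshore_cost(baseline, offshore_saving) * (1.0 - speech_saving)


# Low and high ends of the ranges cited in the text.
for offshore_saving, speech_saving in [(0.25, 0.75), (0.35, 0.85)]:
    off = offshore_cost(BASELINE, offshore_saving)
    spk = speech_cost(BASELINE, offshore_saving, speech_saving)
    print(f"offshore saving {offshore_saving:.0%}, speech saving {speech_saving:.0%}: "
          f"offshore {off:.2f}, speech {spk:.2f} per contact")
```

Under these assumptions, a contact that costs 5.00 onshore costs between 3.25 and 3.75 offshore, and roughly 0.49 to 0.94 through speech self-service.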


Our research indicates that more than 80% of speech deployments currently require extensive customization, since users are largely unable to take advantage of reusable dialogs. Recently, vendors have started offering a small number of prepackaged applications, but the reusable components are still limited to less than 40%. This will improve with the evolution of standards such as Voice Extensible Markup Language (VXML) and Speech Application Language Tags (SALT). In fact, more than 90% of new deployments are exploiting these standards, in some form, along with components from the proprietary application vendors (e.g., ScanSoft, Nuance). These “voice browser” standards take advantage of Web servers that are already in place, providing the availability and flexibility to scale from small to very large enterprise needs. The functional requirements and the need to leverage existing investment are no different for speech technology than for any other technology within the organization. Fortunately, the operational payoff is very good – by making a single change in the Web application, users can quickly and easily change speech applications. Because the underlying technology is XML, the more expensive voice application remains untouched, reducing (and in many cases removing) the requirement for expensive professional services resources.
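To illustrate why a change in the Web application can change the speech application, the following sketch shows a Web-side function that renders a minimal VoiceXML page for a voice browser to fetch. It is a simplified illustration rather than a production design: the function name, the form id, and the store-hours prompt are hypothetical, and a real deployment would add grammars, error handling, and the vendor's speech components.

```python
import xml.etree.ElementTree as ET

# Namespace defined for VoiceXML 2.0 documents.
VXML_NS = "http://www.w3.org/2001/vxml"


def render_hours_dialog(prompt_text: str) -> str:
    """Return a minimal VoiceXML 2.0 document that speaks a single prompt.

    Hypothetical helper: a Web application would serve this string to the
    voice browser in the same way it serves HTML to a Web browser.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "hours"})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


# Changing this one string on the Web side changes what callers hear.
print(render_hours_dialog("We are open from nine to five, Monday through Friday."))
```

The point of the sketch is the division of labor: the dialog content lives in the Web tier, so updating it does not require touching the speech recognition platform itself.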

Vendor Consolidation

As with any maturing technology, the speech market is currently in the throes of consolidation. In the past three years, ScanSoft scored an enterprise hat trick with Philips, SpeechWorks, and Phonetic, while Genesys snapped up Telera early in the VXML game. The Telera technology provided Genesys with an immediate revenue stream, and its use continues to grow both within the industry and within the company. Vertical and intelligent applications will sustain the need for systems integration partners, but creativity will be required to differentiate these providers. However, as standards-based application development continues to grow, we expect more enterprises to take over the daily operational aspects of these technologies.

Cost: Speech Has Its Price

Getting started in the speech recognition area can be costly. The initial development costs for a speech recognition application are generally 3x the cost of licenses and equipment. This may sound high at first; however, the track record of speech applications has shown very favorable return on investment – often less than nine months from system acceptance to breakeven. Alternatively, there are hosted options run by application service providers that are viable for many enterprises, particularly those that are ramping into the technology. These provide attractive “per transaction” pricing to help ease the initial burden, though there will be both resource and financial startup requirements. When pursuing this alternative, we recommend enterprises secure the option to purchase the application and systems integration for deployment within the enterprise IT span of control. This enhances the value of the investment and ensures options in subsequent decisions – guaranteeing competitive pricing and service levels going forward.
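The relationship between upfront cost and breakeven can be sketched as a simple calculation. All monetary figures below are hypothetical inputs chosen for illustration; only the 3x development multiple comes from the text. The sketch also assumes that the total initial investment is the sum of licenses/equipment and development, and that savings accrue evenly each month after system acceptance.

```python
# Development cost as a multiple of licenses and equipment (from the text).
DEVELOPMENT_MULTIPLE = 3


def initial_investment(licenses_and_equipment: float) -> float:
    """Licenses and equipment plus development at the stated multiple."""
    return licenses_and_equipment * (1 + DEVELOPMENT_MULTIPLE)


def months_to_breakeven(licenses_and_equipment: float, monthly_savings: float) -> float:
    """Months of steady savings needed to recover the initial investment."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return initial_investment(licenses_and_equipment) / monthly_savings


# Hypothetical inputs, not figures from our research.
licenses = 100_000.0
savings_per_month = 50_000.0

print(f"initial investment: {initial_investment(licenses):,.0f}")
print(f"months to breakeven: {months_to_breakeven(licenses, savings_per_month):.1f}")
```

With these assumed inputs, the total outlay is 400,000 and breakeven arrives after eight months of operation. The same function can be used to test how sensitive the payback period is to lower-than-expected call deflection.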

Breaking Out of the Call Center and Into the Enterprise

Much like its IVR predecessor, the speech application is basically an “electronic agent.” As such, its most important attribute is its ability to conduct a dialog with the user, determine user needs, and provide an appropriate response – much as a real person does. To accommodate intelligent applications, speech will require integration with underlying search technologies that include artificial reasoning, such as case-based reasoning or neural network algorithms, which enable search threads to continually evolve (rather than simply executing a series of discrete searches and refining the results from one search to the next). We expect this to be a key market, supplied through partnerships with a few best-in-class vendors whose technology helps mimic human dialog. Furthermore, this enables business units (other than call centers and help desks) to exploit this technology for access to knowledge repositories and problem solving – taking self-service to the next level.

Bottom Line: Speech recognition technologies are maturing, and standards usage is rapidly increasing. Deployments and major enhancements to current applications must consider tight integration with existing information resources to transition from pure customer service requests into intelligent applications.

Business Impact: Enterprises must continue to look beyond the call center for self-service opportunities.

