Something to talk about: Voice recognition

Sponsored By: AWS

What’s New in Amazon’s AI Toolkit? Meet Lex, Polly and Rekognition

Lex is a fully managed service for building voice and text conversational interfaces that mimic the way humans think and communicate. Powered by the same conversational engine as Amazon’s Alexa, it reduces the multi-platform development effort and allows developers to build and deploy conversational bots on mobile platforms. Common uses include building an automated customer support agent that answers questions, building custom bots that connect to enterprise data resources, and controlling connected devices.

Want to turn text into high-quality spoken output? Look no further than Amazon’s Polly, a service that enables existing applications to speak and creates opportunities for a range of speech-enabled products, such as automobiles, appliances, and mobile apps. Choose from dozens of lifelike voices in multiple languages. Simply send text to the Amazon Polly API, and Polly will convert it to speech and return an audio stream that you can either play immediately or save to an audio file for future use. Developers can use the standardized Speech Synthesis Markup Language (SSML) to customize pronunciation and control speech variables such as volume and pitch.
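As a rough illustration of the SSML step described above, the Python sketch below wraps plain text in a prosody element that sets volume and pitch. The helper names build_ssml and synthesize_to_file are hypothetical and are not part of any AWS library. The second function assumes the boto3 package is installed and that AWS credentials and a region are configured; it is one possible way to send the markup to Polly and save the returned audio stream, not a definitive implementation.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, volume="medium", pitch="medium"):
    """Wrap plain text in SSML that sets volume and pitch via <prosody>."""
    # escape() protects characters such as & and < that would break the XML.
    body = escape(text)
    return (
        "<speak>"
        f"<prosody volume={quoteattr(volume)} pitch={quoteattr(pitch)}>"
        f"{body}</prosody>"
        "</speak>"
    )


def synthesize_to_file(text, path="speech.mp3", voice_id="Brian"):
    """Send SSML to Amazon Polly and save the returned audio stream.

    Assumption: boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so build_ssml works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=build_ssml(text, volume="loud", pitch="high"),
        TextType="ssml",
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


if __name__ == "__main__":
    # Only builds the markup; no network call is made here.
    print(build_ssml("Fish & chips, anyone?", volume="loud", pitch="high"))
```

Keeping the markup builder separate from the network call means the SSML can be inspected or tested without an AWS account.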

Amazon Rekognition is an artificial intelligence service designed for deep learning-based image recognition. Based on the same technology developed by Amazon to analyze billions of images each day for Prime Photos, Rekognition uses a probability-based model to detect and identify objects, scenes and faces in images. Why scroll through endless photos looking for shots of people, places and things when Rekognition can detect them in a matter of minutes and present you with a numerical confidence score for each match?
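To make the probability scores concrete, here is a minimal Python sketch that keeps only the labels a caller is willing to trust. It assumes a response shaped like the output of Rekognition’s label detection, that is, a Labels list whose entries carry a Name and a Confidence between 0 and 100. The function name confident_labels, the 90 percent threshold and the sample values are all made up for illustration; no service call is made.

```python
def confident_labels(response, min_confidence=90.0):
    """Return (name, confidence) pairs at or above the threshold, best first."""
    labels = [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


# Hand-written sample in the shape of a label-detection response.
# The values are invented for illustration, not real service output.
SAMPLE_RESPONSE = {
    "Labels": [
        {"Name": "Beach", "Confidence": 97.2},
        {"Name": "Person", "Confidence": 99.1},
        {"Name": "Dog", "Confidence": 62.5},
    ]
}

if __name__ == "__main__":
    for name, confidence in confident_labels(SAMPLE_RESPONSE):
        print(f"{name}: {confidence:.1f}%")
```

Where to set the threshold is a judgment call: a photo search tool can tolerate a lower bar than an application that acts on a match automatically.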

How does a city in Western Canada distinguish itself from other contenders in a bid to become home to Amazon’s second global headquarters? Winnipeg’s approach is a YouTube video in which a celebrity football player tours the city with Amazon’s digital assistant, Alexa. The jury remains out on which city will win, but there’s no uncertainty about the future of voice recognition and artificial intelligence.

With the launch of new voice-controlled devices, voice recognition will soon be ubiquitous, both at home and in the workplace. Today’s toddlers will grow up thinking it’s natural to talk to their lights, locks, automobiles, appliances, and entertainment gear. When they have homes of their own, AI, voice and robotics will be integrated into almost every aspect of daily life. There may still be the occasional knob and switch, but most will be used only as manual alternatives.

Your remote will soon be old-fashioned


In a June 2017 blog post entitled “I’ve Seen The Future and It’s All Talk”, marketing expert Damian Dutton predicts a voice revolution in the way people access services and businesses maintain a competitive edge. “We believe that voice is going to be increasingly important in interactive marketing,” says Dutton, CEO and Founder of Beeliked, a UK-based digital marketing platform. “I guarantee that in a few years’ time, using a remote to control the functions on your TV or buttons on your car stereo will seem very old-fashioned. Why would you fiddle around working out which button to press when you can just say what you want?”

As part of a digital campaign to herald the release of a novel by Da Vinci Code author Dan Brown, Dutton’s company created an online video experience featuring Amazon’s Polly, a cloud-based text-to-speech service that is rapidly changing the way businesses engage with consumers. After voting on several different covers for the book, guests are welcomed by Brian, Polly’s British English male voice, and Brown signs a virtual book with the cover they have just chosen.

Alexa, take a memo

As Amazon, Apple and other technology vendors compete for top billing, early iterations of voice-based technologies have grown to include services that integrate easily with entertainment apps, e-books, e-learning, personal assistants and public address systems. Advances in artificial intelligence and machine learning have made it possible for text-to-speech services, such as Polly, to improve the consumer experience with natural language processing.

2017 as the turning point

The logical extension of advanced speech and listening capacities is a smart environment, where voice control serves as the key interface in a connected ecosystem. Speaking this year in Las Vegas at CES, the annual showcase of consumer electronics, Shawn DuBravac, chief economist of the Consumer Technology Association, described 2017 as a turning point for speech recognition technologies. According to DuBravac, we are on the verge of a new era of computing in which computers will reach parity with humans when it comes to translating speech into text.

With speech technologies fast approaching a 100% accuracy rate, the implications reach into areas such as support for people who are blind. The Canadian National Institute for the Blind is using an iPhone app called BlindSquare to provide voice directions that help users in one Toronto neighbourhood navigate the interiors of the businesses they frequent. Similarly, in the UK, the Royal National Institute of Blind People is using Polly to convert periodicals into audio files.

The enormous amount of cloud-based data is also fuelling other AI tools for customer engagement, including scalable, cost-effective apps that use neural network models to identify objects and faces within images. Commuters in Berlin’s Suedkreuz station, for example, receive warnings that they are entering a facial-recognition zone, where video cameras are using biometric technology to identify volunteers who have contributed their passport photos to a test database. If things go well, stations across the country will soon be recording facial images and checking them against those on wanted lists.

Endgame: Simplifying the customer experience

As Internet of Things (IoT) products continue to improve, the next decade will bring unparalleled opportunities for businesses to connect with consumers through new technologies and innovative ways to simplify the consumer experience. A 2017 report from Juniper Research estimates that by 2021 the number of connected IoT devices will exceed 46 billion. With much of our interaction with these devices expected to be verbal, there’s definitely something to talk about.

Enjoy reading this? Check out our other articles in the What’s Possible series

What’s Possible 2025 | Deep learning and artificial intelligence

What’s Possible 2025 | The citizen experience

What’s Possible 2025 | Smart government 



Jim Love, Chief Content Officer, IT World Canada


Suzanne Robicheau
Suzanne Robicheau is a communications specialist based in Wolfville, Nova Scotia, where working remotely continues to fuel her passion for new mobile technologies -- especially on snowy days.