The Amazon Echo was just the beginning of Alexa’s Voice First revolution

“Why does Amazon’s Alexa team need 1,000 people?” originally appeared on Quora, the knowledge-sharing network where compelling questions are answered by people with unique insights.

Answer by Brian Roemmele, Alchemist & Metaphysician, on Quora:

“We’ve been working behind the scenes for the last four years, we have more than 1,000 people working on Alexa and the Echo ecosystem … It’s just the tip of the iceberg.”—Jeff Bezos, May 31st, 2016 [1]

In 10 Years You Will Do More Talking And Less Typing

I have stated for quite some time that 50% of all human interaction with computers will be through voice-assisted AI in the next ten years [2]. I went on to say:

“What if we didn’t need to learn arcane commands? What if you could use the most effective and powerful communication tool ever invented? This tool evolved over millions of years and allows you to express complex ideas in very compact and data dense ways yet can be nuanced to the width of a hair. What is this tool? It is our voice… The last 60 years of computing, humans were adapting to the computer. The next 60 years the computer will adapt to us. It will be our voices that will lead the way; it will be a revolution.”

The Humble Echo Beginnings: “Let’s Build A Self-Contained Talking Book Appliance”

In early 2014, the original premise of Echo (Project C) was a stand-alone, speech-synthesizer-based book reader built around seven powerful omnidirectional microphones and a surprisingly good Wi-Fi/Bluetooth speaker (with a separate woofer and tweeter). This small-scale mission by Amazon’s secret Lab126 soon morphed into a far more robust product that is only now taking shape for most people. By the time Echo shipped in late 2014, Jeff Bezos realized that the combination of Amazon Web Services (AWS), speaker-independent speech recognition, and high-quality speech synthesis, tied to specialized and affordable hardware, could effectively invent a new computing platform and modality. Jeff was right.

Alexa operates via discrete programs called skills. When the first Echo shipped in late 2014, it had about 20 skills. Today there are more than 5,200 skills, with more than 100 new ones added every day. A skill lets Alexa interact with a particular domain; for example, the Domino’s Pizza skill lets customers order directly from the company through Alexa. A skill can be viewed as a combination of an app and a website, with highly organized keywords and actions.
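To make the idea of a skill more concrete, here is a minimal sketch of what a skill’s backend does: Alexa turns the spoken request into a structured intent, and the skill maps that intent to an action and a spoken reply. The JSON shape follows the general Alexa Skills Kit request/response envelope, but the `OrderPizzaIntent` intent, the `size` slot, and the `handle_request` and `build_response` functions are hypothetical names chosen for illustration. This is not Domino’s actual skill.

```python
import json


def handle_request(event: dict) -> dict:
    """Map an Alexa-style request for a hypothetical pizza skill to a spoken response."""
    request = event.get("request", {})
    request_type = request.get("type")

    if request_type == "LaunchRequest":
        # The user opened the skill without asking for anything specific yet.
        return build_response("Welcome. What size pizza would you like?", end_session=False)

    if request_type == "IntentRequest":
        intent = request.get("intent", {})
        if intent.get("name") == "OrderPizzaIntent":  # hypothetical intent name
            # Slots carry the parameters Alexa extracted from the utterance;
            # "size" is a hypothetical slot, defaulting to "medium" if absent.
            size = intent.get("slots", {}).get("size", {}).get("value", "medium")
            return build_response(f"Okay, ordering a {size} pizza.", end_session=True)

    # Anything else falls through to a short help message.
    return build_response("Sorry, I can only help you order a pizza.", end_session=True)


def build_response(text: str, end_session: bool) -> dict:
    """Wrap plain text in an Alexa Skills Kit style response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


if __name__ == "__main__":
    sample = {
        "request": {
            "type": "IntentRequest",
            "intent": {
                "name": "OrderPizzaIntent",
                "slots": {"size": {"name": "size", "value": "large"}},
            },
        }
    }
    print(json.dumps(handle_request(sample), indent=2))
```

The “highly organized keywords and actions” live in two places: the interaction model (which utterances map to which intent and slots) and a handler like this one, which only ever sees structured data, never raw audio.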

Currently Echo, the hardware, interacts with Alexa, the software. In the future, however, Alexa will move onto countless systems and apps. The first non-Amazon device to run Alexa was the Raspberry Pi, and Amazon has also extended Alexa to its Android-based tablets. But the most important milestone came on November 30th, 2016, at the AWS re:Invent 2016 conference, where Amazon announced it would make advanced AI, including the technology behind Alexa, available for developers to build on and add to their own applications.

In 2016 Amazon Pressed The Turbo Mode Switch For Developers

Alexa’s use cases are quite limited for developers who want to do things that cannot be done with an existing Alexa skill. Amazon is addressing this with a new direction: it is now offering the same expertise through cloud APIs that developers can call to build intelligent applications. Called Amazon AI (AWS AI), the new service offers powerful AI capabilities such as image analysis, text-to-speech conversion, and natural language processing. These are built on the same deep learning and artificial intelligence that Amazon has used in its retail business for more than ten years to enhance the customer experience.

Here are the major services released under AWS AI:

Amazon Rekognition is an image analysis service that can identify various attributes of an image, such as objects, scenes, and faces [3].
Amazon Polly is a text-to-speech service that accepts text and returns an audio stream (such as an MP3 file) containing the synthesized speech. With support for 47 voices in 24 languages, the service offers rich, lifelike speech capabilities [4].
Amazon Lex is the new service for natural language processing and automatic speech recognition. It is built on the same technology that powers Alexa and Amazon Echo. The service converts text or voice input into intents and parameters (slots) that developers can parse to carry out actions [5].
Amazon Lex makes the same deep learning technologies that power Amazon Alexa, automatic speech recognition (ASR) and natural language understanding (NLU), available for use in any Voice First or voice-enabled application. Developers can use Amazon Lex to build services and other types of web and mobile applications that support engaging, lifelike interactions. Amazon Polly can then add spoken responses, in 47 high-quality voices across 24 languages, to the logic behind a program built with Amazon Lex. Finally, Amazon Rekognition allows Alexa projects to use image recognition, adding visual decoding to Alexa-like solutions.
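What Lex hands back to a developer is easiest to picture as data: an intent name plus slot values that the application’s own code turns into an action. The sketch below assumes a result dictionary shaped like the Amazon Lex (V1) runtime `PostText` response, using only its `intentName`, `slots`, `dialogState`, and `message` fields. The `BookHotel` intent, its `Location` and `Nights` slots, and the `act_on_lex_result` function are hypothetical, and a real application would obtain the result from the Lex runtime API rather than from a literal.

```python
def act_on_lex_result(result: dict) -> str:
    """Turn a Lex-style result (intent + slots) into an application action.

    `result` is assumed to look like a Lex V1 PostText response; only the
    dialogState, message, intentName, and slots fields are read here.
    """
    if result.get("dialogState") != "ReadyForFulfillment":
        # Lex is still gathering information (e.g. ElicitSlot); relay its prompt.
        return result.get("message") or ""

    intent = result.get("intentName")
    slots = result.get("slots") or {}

    if intent == "BookHotel":  # hypothetical intent defined by the developer
        city = slots.get("Location")
        # Slot values arrive as strings (or null); default to one night if missing.
        nights = int(slots.get("Nights") or 1)
        # A real app would call a booking backend here; this just describes the action.
        return f"Booking {nights} night(s) in {city}."

    return "Sorry, I don't know how to do that yet."


if __name__ == "__main__":
    sample = {
        "dialogState": "ReadyForFulfillment",
        "intentName": "BookHotel",
        "slots": {"Location": "Seattle", "Nights": "3"},
    }
    print(act_on_lex_result(sample))
```

Polly would fit at the end of this flow: the returned string is exactly the kind of text that could be handed to a text-to-speech service to be spoken back to the user.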

These tools are unprecedented in their scope and abilities. Amazon is opening up its crown jewels for the world’s developers to build upon. I believe AWS AI will spark a developer land rush not seen since the release of iOS and Android.

Amazon Is Building The AWS For The Voice First Revolution

Amazon created AWS (Amazon Web Services) as a way for companies to use anywhere from one to thousands of servers, based on demand and other conditions. No longer did a company need to hire network engineers and other staff to operate servers locally; the servers were centrally managed by Amazon in the cloud. AWS has become the most profitable part of Amazon’s business and now powers about eight out of ten websites (Quora is hosted on AWS). Amazon clearly wants to do for Voice First what it did for web servers ten years ago, in 2006.

In 2016, Amazon is creating an AWS for Voice First systems, and it is very powerful. Amazon assembled a huge staff not only to build Alexa and future products like Echo but also to build the infrastructure that became AWS AI. This was not a trivial task, and the team quickly grew to more than 1,000 people working solely on Alexa-related AI by the summer of 2016. Today that number is approaching 1,500 and may reach 2,000 in the first quarter of 2017.

Most of the world has clearly been surprised by how fast and pervasive Voice First systems have become. For example, the Echo Dot was the top-selling item at Amazon on Thanksgiving, Black Friday, and Cyber Monday 2016 [6]. It may become the top-selling item in Amazon’s history.

Google Home, a competing product to Echo, has sold out at about 60% of Target stores this holiday season. Apple is releasing AirPods, its first stand-alone Voice First device. Finally, Samsung acquired Viv, built by the creators of Siri, to voice-enable everything from washing machines to X-ray machines.

By 2017 there may be a few hundred million Voice First devices, and we are just getting started.

You Will Need An Army

The reason Amazon is growing an army is that the range and scope of the Voice First world will dwarf just about any other element of the personal computer revolution that began in the 1970s. At some point it will eclipse the smartphone revolution. We will have dozens of Voice First devices and dozens of voice-enabled devices. Amazon is all-in on being the supplier to developers and to the Voice First revolution, while also offering generation after generation of Echo and Alexa devices.

If my prediction that 50% of human-to-computer interactions will be via voice-enabled AI within ten years is even half correct, then even a 5,000-person Alexa army will seem quite small.

[1] Jeff Bezos says more than 1,000 people are working on Amazon Echo and Alexa

[2] There is A Revolution Ahead and It Has A Voice

[3] Amazon Rekognition – Image Detection and Recognition Powered by Deep Learning

[4] Amazon Polly – Text to Speech in 47 Voices and 24 Languages

[5] Amazon Lex – Build Conversational Voice & Text Interfaces

[6] BOOM: “Echo Dot is the best-selling at Amazon on Thanksgiving and Black Friday”—Amazon by Brian Roemmele on Accepting Payments
