Every trend study I have read in 2018 mentions voice search as a critical trend to watch. When Google CEO Sundar Pichai noted during his Google I/O 2016 keynote that “20% of mobile search queries are voice searches on Google,” the world took notice.

Today we know very little about voice search and how it will affect us. Google has promised to launch voice search analytics in Search Console soon. This new feature will allow webmasters to analyze how their sites perform on voice search queries.

To probe deeper into what voice search could mean for countries like India, you need to follow the big four (Amazon, Google, Apple & Facebook) and the progress they have made with the technology.

Voice Search: The Present

Context & Intent

Google launched its Voice Match feature last October. It allows Google to recognize individual voices and provide personalized information such as commute times, your daily briefing, and your favorite music. Google Home supports Voice Match for up to six users.

Alexa also allows you to create your own voice profile, and Amazon gives developers the option of building personalized experiences into new skills. You can delete your voice profile at any time in the app, and voice profiles unused for three years are automatically deleted. If you deactivate a device, your voice profile information is automatically deleted from the cloud.

Apple has also been teaching Siri to recognize your voice. To unlock the device, the company uses a combination of a trigger phrase and authentication of your unique voice to ensure that Siri responds only to you.

Languages

Adding the dimension of language to voice search reveals the complexity involved. Last year Google’s Speech Recognition API added 30 more languages, bringing its total tally of supported languages to 119; however, Dialogflow (formerly API.AI) currently supports only 15 languages, excluding dialects.

At Google, to incorporate a new language variety, the team starts by collecting speech samples of common phrases from native speakers. To improve the accuracy of voice search queries, it then trains the model on these samples with machine learning over time.

Meanwhile, Apple follows a similar model but transcribes accents & dialects by hand so that its system learns the exact representation of the spoken text. The team also captures a range of sounds in a variety of voices to design an acoustic model that tries to predict word sequences.

Apple’s Siri can speak 21 languages localized for 36 countries. Microsoft Cortana, by contrast, has eight languages tailored for 13 countries. Amazon’s Alexa currently supports Japanese & German besides other English locales.

“The only way to leapfrog today’s limited functionality versions is to open the system up and let the world teach them.” – Dag Kittlaus, co-founder of Siri (acquired by Apple) & Viv (acquired by Samsung)

Support for local languages makes voice search more inclusive across the internet. For instance, Siri currently supports Mandarin & Cantonese in China and will soon support Shanghainese, a dialect of Wu Chinese spoken only around Shanghai. To make assistants and voice search smarter, you need to open them up for people to improve.

Voice Search Quality Guidelines

To improve the quality of voice search results, Google has a team of quality raters who evaluate Google Assistant responses on dimensions like information satisfaction, length, formulation, and elocution. Here are the dimensions Google asks these raters to cover in detail:

  • Information Satisfaction: The response to the query should meet the information needs of the user.
  • Length: The response should have an appropriate length and should match the complexity of the question asked.
  • Formulation: The response should be grammatically correct and should be formulated in a way that a native speaker would answer it.
  • Elocution: In the response, the intonation of the voice should sound natural, and every spoken word should have clear pronunciation.
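The four dimensions above can be pictured as a simple scoring rubric. This is a hypothetical sketch: the 0–5 scale, equal weighting, and field names are assumptions for illustration, not Google’s actual rating system.

```python
# Hypothetical sketch of a rater's scorecard for one Assistant response.
# Scale and equal weighting are assumptions, not Google's real rubric.
from dataclasses import dataclass

@dataclass
class RaterScores:
    information_satisfaction: int  # 0-5: does it meet the user's information need?
    length: int                    # 0-5: appropriate for the question's complexity?
    formulation: int               # 0-5: grammatical, phrased like a native speaker?
    elocution: int                 # 0-5: natural intonation, clear pronunciation?

def overall(scores: RaterScores) -> float:
    """Average the four dimensions into a single quality score."""
    values = [scores.information_satisfaction, scores.length,
              scores.formulation, scores.elocution]
    return sum(values) / len(values)

rating = RaterScores(information_satisfaction=5, length=4,
                     formulation=5, elocution=4)
print(overall(rating))  # -> 4.5
```

A real rating pipeline would aggregate many raters per query, but the idea is the same: each spoken answer is judged on all four axes, not just on whether the fact was right.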

Turning Conversations Into Actions

Developers can use the Alexa Skills Kit to create voice-first experiences for brands. With the Alexa Skills Kit, you can voice-enable smart home devices, deliver fresh content, enable hands-free cab-hailing, and more. The Alexa Voice Design Guide helps developers build skills by walking them through designing voice flows and interaction models and understanding intents and utterances.
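The core idea of an interaction model is that sample utterances map to named intents, and each intent is routed to a handler. The sketch below is a toy illustration of that idea in plain Python; the intent names, utterances, and responses are invented, and a real skill would define them in the Alexa Skills Kit rather than in code like this.

```python
# Toy illustration of an interaction model: utterances -> intent -> handler.
# All names here are hypothetical, not part of the Alexa Skills Kit itself.
interaction_model = {
    "BookCabIntent": ["get me a cab", "call me a taxi", "book a ride"],
    "CommuteIntent": ["how is my commute", "traffic to work"],
}

def resolve_intent(utterance: str) -> str:
    """Match a spoken phrase to an intent name (exact match for simplicity)."""
    for intent, samples in interaction_model.items():
        if utterance.lower() in samples:
            return intent
    return "FallbackIntent"

def handle(intent: str) -> str:
    """Return the spoken response for a resolved intent."""
    responses = {
        "BookCabIntent": "Okay, booking a cab for you.",
        "CommuteIntent": "Your commute looks clear today.",
        "FallbackIntent": "Sorry, I didn't catch that.",
    }
    return responses[intent]

print(handle(resolve_intent("call me a taxi")))  # -> Okay, booking a cab for you.
```

A production assistant resolves intents with a trained natural-language model rather than exact string matching, which is exactly why the sample utterances a developer supplies matter so much.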

Similarly, Actions on Google (using conversation actions) allows developers to design and integrate a conversational user interface into their mobile apps, websites, devices and so on. Dialogflow can be used to develop apps for the Google Assistant.
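In Dialogflow, a fulfillment webhook receives the detected intent and parameters and returns the text the Assistant should speak. A minimal sketch of that request/response exchange, assuming a hypothetical `order.pizza` intent and a simplified version of the webhook payload:

```python
# Minimal sketch of a Dialogflow-style fulfillment webhook. The payload is
# simplified; the "order.pizza" intent and "size" parameter are hypothetical.
import json

def fulfillment(request_body: str) -> str:
    """Read the detected intent and parameters, return the speech response."""
    request = json.loads(request_body)
    intent = request["queryResult"]["intent"]["displayName"]
    params = request["queryResult"]["parameters"]
    if intent == "order.pizza":
        speech = f"Ordering a {params['size']} pizza."
    else:
        speech = "Sorry, I can't help with that yet."
    return json.dumps({"fulfillmentText": speech})

# Simulated webhook request, as Dialogflow would POST it after intent detection.
incoming = json.dumps({
    "queryResult": {
        "intent": {"displayName": "order.pizza"},
        "parameters": {"size": "large"},
    }
})
print(fulfillment(incoming))  # -> {"fulfillmentText": "Ordering a large pizza."}
```

The division of labor is the point: Dialogflow handles speech and language understanding, and the developer's webhook only deals with structured intents and parameters.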

Voice Search: Challenges Ahead

Algorithm Bias

The algorithm used for Google Translate was recently reported to show gender bias when translating genderless pronouns: it preferred male pronouns in situations where a human translator would make a more careful choice. Google is aware of the problem of algorithmic bias and has produced a video explaining it.
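The mechanism behind this bias is easy to demonstrate. In the toy sketch below, a model that simply picks the pronoun most frequent in its training data will always resolve a genderless pronoun (like Turkish “o”) the same way for a given profession; the corpus counts are invented for illustration.

```python
# Hypothetical illustration of how a frequency-based model encodes gender
# bias when forced to resolve a genderless source-language pronoun.
# The co-occurrence counts below are made up for the example.
corpus_counts = {
    "doctor": {"he": 90, "she": 10},
    "nurse": {"he": 5, "she": 95},
}

def translate_pronoun(profession: str) -> str:
    """Pick whichever pronoun co-occurred most with this profession."""
    counts = corpus_counts[profession]
    return max(counts, key=counts.get)

# The model always outputs the majority pronoun, erasing the ambiguity
# that the source language preserved.
print(translate_pronoun("doctor"))  # -> he
print(translate_pronoun("nurse"))   # -> she
```

The model is not malicious; it faithfully reproduces the skew of its training data, which is precisely why bias has to be addressed in the data and objectives, not just the output.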

Language

Google uses Neural Machine Translation (NMT) networks to increase the fluency and accuracy of Google Translate. Last year the company used NMT to translate nine Indian languages: Hindi, Bengali, Marathi, Tamil, Telugu, Gujarati, Punjabi, Malayalam, and Kannada.

To train NMT networks, the system is fed paired sentences in the languages to be translated. For instance, for Hindi-to-English translation, the system is shown the same sentence in Hindi and its counterpart in English so that it learns the mapping between them.
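Concretely, the training data for such a system is just a parallel corpus of aligned sentence pairs. The sketch below shows the shape of that data; the transliterated Hindi examples are invented, and a real system learns from millions of such pairs.

```python
# Hypothetical illustration of a parallel corpus: each entry pairs a Hindi
# sentence (transliterated here) with its English counterpart. The pairs
# are made up for illustration.
parallel_corpus = [
    ("mujhe paani chahiye", "I want water"),
    ("aap kaise hain", "how are you"),
]

# Each aligned pair becomes one (source, target) training example for the NMT model.
training_examples = [{"source": hi, "target": en} for hi, en in parallel_corpus]
print(len(training_examples))  # -> 2
```

This is why the scarcity of parallel web content in regional languages, discussed below, is such a bottleneck: no aligned pairs means no training examples.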

Identifying parallel content on the web is a challenge, especially when the content is in a regional language, which makes training machines to translate much harder. Translation also has to overcome finer issues like nuance and tone, which humans can interpret and understand but which are much harder to teach to a machine.

Voice Search Advertising

If the only response to a voice search query is a paid one, why would you trust the answer? People can’t ignore voice ads the way they overlook ads on a screen, and why would anyone sit through ads to get answers to their questions? Google hasn’t shared voice search numbers in a while, as they don’t want to field questions from journalists on how they plan to monetize voice search.

A JWT study of 1,000 smartphone users in the U.K., U.S., Germany, and Spain found that many potential voice search users didn’t see the advantage of voice search over text search, with 29 percent of all non-voice users saying they “don’t see the point.”

Voice Search: The Possible Future

Voice technology providers certainly have a long way to go in tuning assistive experiences and learning what not to do from a consumer-experience standpoint. Still, voice is a likely future interaction model, as it sharply reduces the number of choices one encounters in text search.

During his session at Hardwired NYC, Donn Morril, Senior Manager of Solutions Architecture for Amazon’s Alexa (Echo) team, reiterated Jeff Bezos’s line that Alexa is “a bet that they are willing to be misunderstood for long periods of time.”