
Voice search: 'beam me up, Scotty' for everyone.

With the Klingons in hot pursuit, Captain James Kirk whipped out his communicator and issued voice instructions. As the Klingons moved within striking distance, Kirk calmly said, "Scotty, beam me up."

Before the Klingons could vaporize the prudent captain, Kirk flickered, turned into little sparkles, and materialized in the U.S.S. Enterprise starship's transporter room. Who had time to futz with a dollhouse-sized keyboard while a Klingon war party was advancing with malice?

What's not to like about talking to a mobile device or having voice communication available within any computing session or application? After watching a couple of Star Trek episodes in 1968, I knew that voice interaction was the Spock-logical way to interact with computers. Touch was not what I wanted to do. Voice was easy, convenient, and faster than pawing keys and performing gestures, and it was conversational. The problem was that in 1968, there were mainframes and no BlackBerries other than fruit, no Apples other than Beatle tunes, and no Googles other than an arcane math term.

The New Voice Interfaces

Today, voice interfaces are starting to shout. I can pop into an AT&T or any other mobile device store and buy a device that works much like Captain Kirk's communicator. There's no "Beam me up," but there is "Find pizza," "Call home," and a clutch of other functions that no longer require finger-to-keypad acrobatics. This is progress after 42 years.

Companies such as Huawei (www.hua promote their voice technologies, giving pride of place to voice search. This Shenzhen, China-based company is just one of dozens of companies pushing voice as the next big thing in human-computer interaction. Half a world away, I can download a "free toolbar" from and "start searching the web" by using only my voice.

Googler Hugo Barra, a director of product management, said in August 2010, "Twenty-five percent of those using Android 2.0 are already using voice search" ( future). Google executives pointed out that the company's voice search feature delivers 70% accuracy. Just as important are Google's increasingly aggressive steps to push its voice technology into its communications products. When I logged into my Gmail account in late August 2010, I discovered that I could initiate voice calls from that application. I asked myself, "Is this a Skype killer?" My vision is blurry, particularly when looking into the future. I do see the bright white line that Google is following. That arrow-straight rule leads to voice ubiquity: actions, search, and conversational computing.

Better Search-to-Gadget Technology

We have been testing software that converts spoken words to text and to computer instructions for many years. The early systems were (in a word) horrible. Today, out of the box, the voice functions on the BlackBerry and Apple devices on my desk work reasonably well. There are three reasons for the improvement of speech-to-gadget technology: 1) processors in a modern mobile device perform as well as some desktop computers did 2 or 3 years ago, and Moore's Law has made computationally intense operations a commodity function in many devices; 2) engineers and scientists have found numerical recipes that eliminate the laborious training sessions some of the early speech recognition systems required; and 3) there are new methods that tap into databases of phonemes and then rely on advanced mathematical processes to pluck the "meaning" from the thicket of possibilities in milliseconds. The most forward-pushing methods combine on-phone capabilities with cloud-based resources. For most types of voice interactivity on today's smartphones, latency is not an issue for most users.
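To make the third point a little more concrete, here is a minimal sketch of one common way to pick a single phrase from several candidates: add each candidate's acoustic score to a prior for how likely the phrase is, and keep the highest total. This is an illustration under stated assumptions, not a description of any vendor's method. The function name pick_best, the scores, and the phrase priors are all invented for the example.

```python
import math


def pick_best(hypotheses, phrase_prior, weight=1.0, floor=1e-6):
    """Return the candidate phrase with the best combined score.

    hypotheses   -- list of (text, acoustic_log_prob) pairs; the numbers
                    here are made up, a real recognizer would supply them
    phrase_prior -- dict mapping a phrase to an assumed prior probability
    weight       -- how much the prior counts relative to the acoustics
    floor        -- prior used for phrases missing from phrase_prior
    """
    def score(hypothesis):
        text, acoustic_log_prob = hypothesis
        prior = phrase_prior.get(text, floor)
        return acoustic_log_prob + weight * math.log(prior)

    return max(hypotheses, key=score)[0]


# Hypothetical candidates for one utterance: the acoustics slightly
# favor "fine pizza", but the prior says people rarely say that.
candidates = [("fine pizza", -4.0), ("find pizza", -4.2)]
priors = {"find pizza": 0.05, "fine pizza": 0.001}

best = pick_best(candidates, priors)
print(best)
```

The design choice worth noting is that neither signal decides alone: the sound of the utterance narrows the field, and knowledge of what people usually ask for breaks the tie.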

The popular media focus attention on the voice search capabilities of companies such as Google and Apple. Google continues to make some remarkable voice-device functions possible. I learned that it is possible to access Google's enhanced voice functions from devices equipped with the most recent version of the open source Android operating system. Google's extensions to its voice search capability make it possible to send a text message (SMS or short message service document) by clicking a button, saying the command and the name of the person to whom the message should be sent, and then speaking the message. When I see people texting while they're driving 65 mph on a Kentucky gravel road, I hope these thrill seekers shift to Google's voice-enabled devices. Google has other tricks up its sleeve. For example, ask the Android-equipped device for directions or tell it to send an email. A savvy user can tell a device running Android 2.2 or above to ring an alarm, locate a particular type of music, and find a particular image without any keyboarding required.
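The command, name, and message pattern described above can be sketched in a few lines. The example below assumes the speech has already been turned into text and only shows the step of splitting that text into an action, a recipient, and a message. It does not reflect how Google's software works; the function name parse_voice_command, the accepted phrasings, and the contact list are hypothetical.

```python
import re
from typing import Optional

# Hypothetical contact list; a real device would consult its address book.
CONTACTS = {"alice", "bob", "home"}

# Illustrative phrasings only: "send (a) text to <name> <message>" and "call <name>".
_TEXT_PATTERN = re.compile(r"^send (?:a )?text to (\w+)\s+(.+)$", re.IGNORECASE)
_CALL_PATTERN = re.compile(r"^call (\w+)$", re.IGNORECASE)


def parse_voice_command(transcript: str) -> Optional[dict]:
    """Turn a transcribed utterance into a structured command, or None."""
    text = transcript.strip()

    match = _TEXT_PATTERN.match(text)
    if match and match.group(1).lower() in CONTACTS:
        return {
            "action": "sms",
            "recipient": match.group(1).lower(),
            "message": match.group(2),
        }

    match = _CALL_PATTERN.match(text)
    if match and match.group(1).lower() in CONTACTS:
        return {"action": "call", "recipient": match.group(1).lower()}

    # Unknown phrasing or unknown contact: let the caller decide what to do.
    return None


print(parse_voice_command("Send text to Alice running ten minutes late"))
print(parse_voice_command("Call home"))
print(parse_voice_command("Beam me up"))
```

Returning None for anything unrecognized keeps the sketch honest: a real assistant would need a fallback, such as asking the user to repeat the request.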


Understanding Natural Language

Apple captured headlines with its acquisition of Siri, a vendor of mobile search. Siri offers a voice-based interface. With the deal rumored to be in the $200 million range, Apple seems to be moving along a path that is similar to Google's. The Siri twist is that it has developed a "mobile assistant," that is, the technology can understand natural language, not just a word or a simple phrase. Asking a question out loud strikes me as a more natural action than typing a question into a search box. You can navigate to and search for Siri Assistant. According to Apple's Siri information, "Just ask Siri to book restaurants, movies, taxis and more." How much more? Apple is not saying and, as with Google, you can use the functionality today.

But Google and Apple are not the only companies working to change search from punching a keyboard to talking directly to a device. Voice interaction with a computer is important to professionals as well. Nuance Communications, formed with the merger of ScanSoft and Nuance in October 2005, produces Dragon NaturallySpeaking, a voice-to-text product that runs on desktop computers. Nuance's Medical Mobile Search application makes it possible to speak a medical term into a mobile device. The Medical Mobile Search application parses the search instruction and retrieves information from medical information sources, including MEDLINE. A verbal request is quicker than fumbling with a minikeyboard.

The broad question is, "How does voice search affect information retrieval?"

The proliferation of voice-enabled applications could be interpreted as early rumblings of an interface earthquake. When I instruct my mobile device to "locate gasoline," the search result I want is a map. In a perfect world, the map would be in real time, show my location, and present graphic indicators of the nearest gas station. When I want more information about a particular gasoline station, I want to say "more information" and have the mobile device tell me what I want to know.

The voice search is performing a number of tasks that are often not required for a keyword search on a traditional search engine. The magic includes the voice processing to formulate the query, hooking into the mobile device's geolocation function, sending the query, mashing up necessary data, and delivering a graphic map with visual identifiers for the gas stations. What looks like a straightforward function combines a textbook of algorithms automatically. The user just talks.
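The chain of steps in a "locate gasoline" request can be sketched as a small pipeline: interpret the transcript, read the device position, find matching places, and return markers a map could draw. This is a toy under stated assumptions, not any vendor's implementation. The listings, coordinates, phrase table, and the function name voice_search are all invented, and speech recognition and map rendering are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Place:
    name: str
    category: str
    lat: float
    lon: float


# Hypothetical local listings; a real service would query a live database.
PLACES = [
    Place("Station A", "gas_station", 38.01, -85.00),
    Place("Station B", "gas_station", 38.10, -85.00),
    Place("Pizza C", "pizza", 38.00, -85.01),
]

# Hypothetical mapping from spoken phrases to listing categories.
PHRASE_TO_CATEGORY = {
    "locate gasoline": "gas_station",
    "find pizza": "pizza",
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def voice_search(transcript, device_lat, device_lon, limit=3):
    """Return map markers for the nearest places matching the request."""
    category = PHRASE_TO_CATEGORY.get(transcript.strip().lower())
    if category is None:
        return []  # phrase not understood; nothing to plot

    markers = [
        {
            "name": place.name,
            "lat": place.lat,
            "lon": place.lon,
            "distance_km": round(
                haversine_km(device_lat, device_lon, place.lat, place.lon), 2
            ),
        }
        for place in PLACES
        if place.category == category
    ]
    markers.sort(key=lambda marker: marker["distance_km"])
    return markers[:limit]


for marker in voice_search("Locate gasoline", 38.00, -85.00):
    print(marker["name"], marker["distance_km"], "km")
```

Even in toy form, the sketch shows why the spoken query is more work than a keyword search: the position of the device is part of the question, and the answer is a set of map points, not a list of links.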

The user experience (UX) experts, or what I call "interface specialists," have an important job to do. These professionals have to take the outputs of the search system and present them to a user in a form that is easy to understand, is readable regardless of the type of display on the user's gizmo, and delivers the answer. But in my experience, search systems have not been particularly good at providing exactly what the user wants or needs. Traditional information retrieval has been able to generate laundry lists of results and suggestions that require the user to pick and winnow many links.

Relieving the Headaches

Many of these "hits" are not germane to the user's query. Facets or suggestions can be more confusing than instructive. What happens when interfaces cannot deal with the information the voice-controlled search system generates? The answer is "unhappy users." With search systems leaving 65% or more of their users dissatisfied, a good experience is now as important as the quality of the search results. One interview subject asked me, "Can you make this headache go away?"

On a recent trip to a client location, we ran voice queries on devices from two different manufacturers. Each mobile device used a different voice recognition technology from well-known vendors. In order to avoid legal hassles, I will stop short of identifying which of the industry-leading devices these were.

What we learned was that both systems worked, up to a point. Each system did some things quite well and others not particularly well. When parsing simple voice instructions such as "Call home," each system was correct most of the time. When the systems failed to recognize our instructions, the miss was difficult to correct. Multiple attempts yielded multiple misses. One of our tests was the name "Yankeelov"; both systems struggled to locate the correct item. Another test was the location of a particular restaurant. Both systems struggled to deal with a location with the street name of "US Route 22/3." We had to ask a person, which is not a desirable activity in rural Kentucky.

Voice search systems today are getting better and are being adopted by people who want to eliminate keyboard fumbling or time-consuming tasks such as making certain a word is spelled correctly. Voice search is simply faster than keying a query in certain situations. In other situations, the keyboard is essential. The mobile and medical applications for voice search are obvious. Yet, there may be even larger and more lucrative markets for the technology. These include enterprise applications for search, customer support, and online commerce.

Companies racing to grab land in these markets include Astute Solutions ( in Columbus, Ohio. The firm asserts that 60% of the information delivered in an organization is verbal. Voice technology, according to the company, will play a role in call transcription and text-to-voice and voice-to-text linking. Astute's approach makes use of what the company calls "intent-driven search." The idea is that new content and new access modes can consider what the user wants to do, and then retrieve and display data for that particular contextual activity. Of course, Astute is not the only interesting company in the voice search sector.

The Chinese proverb "Let many flowers blossom" can be shifted to describe what is taking place in the voice search arena, "Let many voices be heard." Voice and voice search technology is becoming one of the juggernauts in information access and retrieval. The impact of the interaction method will alter applications, interfaces, and the nature of "answers" itself. If you listen closely, you can hear the chatter of the searches being launched by mobile voice users. Are traditional search and content processing companies listening? "Beam me up, Scotty. Scotty, are you there?"

Stephen E. Arnold is a consultant. His website is; his blog, Beyond Search, is at / wordpress. Send your comments about this column to
COPYRIGHT 2010 Information Today, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2010 Gale, Cengage Learning. All rights reserved.

Article Details
Title Annotation:Redefining Search
Author:Arnold, Stephen E.
Publication:Information Today
Geographic Code:1USA
Date:Nov 1, 2010