AT&T's Watson answers the call: spanning decades, AT&T's Watson engine continues to evolve and entice developers.
Alexander Graham Bell spoke the famous words "Mr. Watson, come here, I want to see you" to his assistant, Thomas A. Watson, during Bell's first successful use of his invention, the telephone, in 1876. At that moment, the speech technology industry was born. And despite the industry's significant growth since then, Thomas A. Watson's early contribution to speech technology is still honored today with AT&T's Watson technology, a speech and natural language platform that has been decades in the making.
At a basic level, AT&T's Watson is a speech and natural language engine that processes and analyzes speech input, performs at least one service, and returns a result in real time.
AT&T's Watson is frequently confused with IBM's Watson, an artificial intelligence-based computer system that is able to answer questions asked in natural language. "IBM's Watson is more of a question-and-answer application, whereas AT&T's Watson is a platform," Mazin Gilbert, assistant vice president of technical research at AT&T Labs, says.
"The way we have architected AT&T Watson is that we realized that to build an application or a service, which is the ultimate goal here, there are many different technologies that need to be working synchronously. This is how you get a seamless, ubiquitous experience," Gilbert adds.
To that end, AT&T's Watson can combine speech with other modalities, such as touch screens (e.g., Find the closest Starbucks), text, facial recognition, audio files, and gestures. It can also leverage various technologies, such as natural language understanding, automatic speech recognition, text-to-speech, dialogue management, and voice biometrics.
AT&T Launches Watson APIs
While AT&T's Watson has been used in IVRs for more than 20 years, it wasn't until July 2012 that the company launched a developer program, which offered APIs for its Watson technology. The APIs enable developers to create apps and services with voice recognition and transcription capabilities, and include an open, generic API, "a sort of holy grail" of being able to transcribe speech to text, Gilbert says. "That API is trained on a million-plus words and hundreds of thousands of speakers, and that's available to developers who want to do speech recognition and don't have a clear notion of what application they need," he says.
With these APIs, developers don't have to be specifically skilled in creating speech apps; they can send AT&T audio and it can return text of what an end user has said. "We're doing this so people don't have to reinvent the wheel," Gilbert said in an earlier interview with Speech Technology.
The AT&T Developer Platform provides access to software developer kits and code samples for several environments, such as Microsoft Visual Studio, and is compatible across a number of mobile platforms.
"The whole world of mobile development is so complicated now because of all the different options," says Deborah Dahl, principal at Conversational Technologies, chair of the Multimodal Coordination Group, and cochair of the Hypertext Coordination Group at the World Wide Web Consortium.
Dahl says that among mobile intelligent virtual assistants, Nuance's Nina offers the most customizable vocabulary. However, she maintains that AT&T's Watson offers a variety of contexts. Currently, there are nine speech contexts available: gaming, business search, social media, TV, Web search, general purpose or generic, voicemail to text, SMS, and question and answer.
As an example, the business search context is trained on tens of millions of local business entries, and lets users transcribe search queries. The question-and-answer context is trained on 10 million questions and enables users to transcribe questions and have the correct answer returned. "AT&T's option is a good balance between price, ease of use, and flexibility as far as customizing it for your own application," Dahl says. "The things that stand out to me are the levels of customization and the number of environments."
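To make the idea of speech contexts concrete, here is a minimal sketch of how a developer might tag audio with one of the nine contexts before sending it for transcription. It is not AT&T's API: the SpeechContext values, the build_transcription_request function, and the shape of the request are all assumptions made for illustration, and no network call is made.

```python
from enum import Enum


class SpeechContext(Enum):
    """The nine contexts named in the article; the string values are invented."""
    GAMING = "gaming"
    BUSINESS_SEARCH = "business-search"
    SOCIAL_MEDIA = "social-media"
    TV = "tv"
    WEB_SEARCH = "web-search"
    GENERIC = "generic"
    VOICEMAIL_TO_TEXT = "voicemail-to-text"
    SMS = "sms"
    QUESTION_AND_ANSWER = "question-and-answer"


def build_transcription_request(
    audio: bytes, context: SpeechContext = SpeechContext.GENERIC
) -> dict:
    """Describe a transcription request as plain data (hypothetical shape)."""
    if not audio:
        raise ValueError("audio must not be empty")
    return {
        "context": context.value,          # which trained model should handle the audio
        "audio_length_bytes": len(audio),
        "audio": audio,
    }


# A local business query would be tagged with the business search context.
request = build_transcription_request(b"fake-audio", SpeechContext.BUSINESS_SEARCH)
print(request["context"])
```

Defaulting to the generic context mirrors the article's point that the open API suits developers who do not yet have a specific application in mind.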
"The combination of these modalities is key to providing a ubiquitous experience because, given these different environments, such as sitting in your car, you want to have a very different experience than using it at home," Dahl says. "In some cases, more than one modality is required to fulfill an action. I'm not aware of a platform that brings all this rich technology into a single framework."
The API program has proved popular; 43,000 developers currently use the platform. AT&T offers a free 90-day trial period and a yearly subscription for $99. AT&T itself already uses Watson in several products, such as its home automation and security product, AT&T Digital Life; AT&T U-verse Easy Remote for television; and speech-connected cars with partners GM and QNX Software Systems.
With QNX, Watson's speech engine analyzes words spoken by a driver and fits them into known patterns. What has been said is then routed from the cloud to the car. The in-vehicle intent engine from QNX performs the rest of the speech analysis to figure out how to act.
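The division of labor described above can be sketched roughly as follows. This is an illustration only, not QNX or AT&T code: the pattern list, the server_analyze and client_act functions, the intent names, and the region-based unit choice are all assumptions.

```python
import re
from typing import Optional

# Hypothetical server-side patterns: many phrasings map to one normalized intent.
SERVER_PATTERNS = [
    (re.compile(r"^(?:take me to|navigate to|drive to|directions to) (?P<place>.+)$"), "navigate"),
    (re.compile(r"^(?:play|put on) (?P<title>.+)$"), "play_media"),
]


def server_analyze(utterance: str) -> Optional[dict]:
    """Cloud step: fit the spoken words into a known pattern, if any."""
    text = utterance.strip().lower()
    for pattern, intent in SERVER_PATTERNS:
        match = pattern.match(text)
        if match:
            return {"intent": intent, "slots": match.groupdict()}
    return None


def client_act(result: Optional[dict], region: str = "US") -> str:
    """In-car step: decide how to act on the server result, with local adaptation."""
    if result is None:
        return "ask_driver_to_repeat"
    if result["intent"] == "navigate":
        units = "miles" if region == "US" else "kilometers"
        return f"start_route:{result['slots']['place']}:{units}"
    if result["intent"] == "play_media":
        return f"play:{result['slots']['title']}"
    return "ask_driver_to_repeat"


print(client_act(server_analyze("Take me to the airport")))
```

Keeping the phrase matching on the server and the final decision in the car means the vehicle software can change how it reacts without touching the recognition side.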
"Sharing the workload across client and server offers automotive manufacturers and end users the best of both worlds," said Andy Gryc, automotive product marketing manager at QNX Software Systems, in a statement. "The server-side analysis, provided by AT&T Watson, is optimized for complex scenarios, such as a navigation application in which the driver may verbalize destinations in hundreds of different ways. The QNX client-side analysis grants car makers greater flexibility, enabling them to adapt the AT&T Watson results for a variety of in-car applications, regional aspects, or personal tastes."
The Heavy Hitters
While AT&T Watson-based APIs provide access for developers who are not trained in building speech apps or who work on different platforms, the company has also worked with developers who are using the full-fledged Watson engine.
"They're not just using Watson as an API; they want to customize and personalize and have proactive analytics and do sophisticated technologies like speaker verification," Gilbert says. "These developers understand the technology and want to create something unique in the market. Our enterprise customers are not looking for the technology, they're looking for an end-to-end service where speech and natural language is part of that service."
A third way AT&T works with customers is by licensing technology through a joint strategic agreement. "There's the idea of combining one plus one and [coming up with] more than two, and we can go after a market together," Gilbert says. "We can't do it ourselves, they can't do it themselves, and we join forces to create either a speech, virtual assistant, biometric, or translation application for different verticals."
Interacting with Interactions
Interactions is a company in the natural language processing business that typically helps large organizations, such as Humana, Hyatt, and Marriott, build intelligent virtual assistant applications for customer support.
Enterprises can use intelligent virtual assistants in a variety of applications in both customer care and sales settings, serving to route or self-serve customers, explains Mike Iacobucci, CEO of Interactions. "In a routing environment, Interactions has deployed applications fronting a caller with 'How may I help you?' and either routing [her] to existing high-performing self-service applications or...to new Interactions Virtual Assistant self-service applications," Iacobucci says. "In a self-service environment, Interactions can handle incredibly complex self-service processes."
For example, Interactions completes Medicare enrollment applications for several healthcare providers, such as Humana. In other self-service instances, Interactions can complete processes such as creating hotel reservations or filling out insurance claims.
Iacobucci says the company has designed its operations with a human element that uses the same protocols as a speech recognition engine. It is a way of understanding extremely complex human dialogue where speech recognition historically experiences trouble, including open-ended sentences, alphanumeric data, out-of-grammar responses, and scenarios with background noise or accents.
The company's technology is focused on using the combination of human-assisted understanding and an automatic speech recognition (ASR) engine to create conversational, or humanlike, systems. "The application really doesn't know where the understanding is coming from," Iacobucci says. "Our technology determines where the understanding should come from and has a great level of understanding of languages and accents far beyond what's capable from speech recognition alone."
Iacobucci says the company designs its back-end recognition technology on the assumption that some level of ASR will work in tandem with Human Assisted Understanding (HAU). ASR, he explains, takes on the parts of the conversation that can be easily automated, and HAU takes on tasks that are more difficult and not suited for ASR.
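One way to picture this handoff is a router that accepts the ASR result when the engine is confident and otherwise passes the utterance to a person. The sketch below is an assumption, not Interactions' implementation: the article does not say how the decision is made, so the confidence threshold, the AsrResult type, and the understand and fake_human functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class AsrResult:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical ASR engine


def understand(
    asr: AsrResult,
    ask_human: Callable[[AsrResult], str],
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (text, source); the application only needs the text."""
    if asr.confidence >= threshold:
        return asr.text, "asr"       # easy case: keep the automated result
    return ask_human(asr), "hau"     # hard case: hand off to a person


def fake_human(asr: AsrResult) -> str:
    # Stand-in for a person who listens to the audio and types what was said.
    return "my member id is a7 k9 22"


print(understand(AsrResult("book a room for friday", 0.93), fake_human))
print(understand(AsrResult("my member i d is eight seven", 0.41), fake_human))
```

Because both paths return the same kind of value, the calling application does not need to know which one produced the text, which matches Iacobucci's remark that the application does not know where the understanding comes from.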
In April 2013, Interactions signed a licensing agreement allowing it to use Watson in its speech-enabled virtual assistant applications for enterprises in the customer care market. The company uses Watson as a recognition resource to build highly conversational and humanlike virtual assistant applications that are based on its technology suite. "For speech recognition, Interactions' applications leverage ... its own human-assisted understanding in conjunction with the Watson ASR engine," Iacobucci says. "We found Watson to be the most advanced, sophisticated ASR engine that we could leverage [more] than other ASR engines that we explored."
Developers need such sophisticated virtual assistants to recognize an open vocabulary, and not one that is suited just for retail or music, for example, Gilbert says. "You want natural language to go beyond recognizing entities, attributes, and intents," he says. "You want it to recognize context of words and phrases given a particular transaction. You have to include things like different kinds of parsing. These are all intelligence that you have to do as part of natural language."
Iacobucci says the company has found its partnership with AT&T is not only strong technically, but also that AT&T has been very supportive of its endeavors.
"This is really far more than a technical partnership," Iacobucci says. "Together we're building some really sophisticated, high-performing applications and delivering on the promise of speech recognition systems that are providing a great customer experience, and we're doing that hundreds of millions of times a year."
SpeakToIt is the creator of an intelligent virtual assistant for smartphones and tablets, and employs a team of 25 linguists and software engineers. The company's technology uses human-machine interfaces based on natural language interaction and predictive assistance. Its mobile application, SpeakToIt Assistant, has been downloaded more than 9 million times, most of which have been for Google's Android operating system, according to Artem Goncharuk, the company's chief technical officer.
SpeakToIt's intelligent virtual assistant can update statuses on social networking sites such as Twitter and Facebook; access content from sites such as Google, Trip Advisor, Yelp, or Foursquare; shop on sites such as Amazon; and update calendars and notes using Evernote. The virtual assistant supports English, Spanish, German, Russian, Portuguese, and Chinese (Mandarin and Cantonese) and will support French, Japanese, and Korean by the end of the year. The company relaunched the virtual assistant in December with compatibility for iOS 7.
While the SpeakToIt Assistant sounds a lot like Apple's Siri, there are some distinctions, Goncharuk explains. "The SpeakToIt solution is cross platform, so you can use the same assistant based on your preferences and context on different devices. You can use the same assistant on your Android phone or iPad or in your car, for example."
According to Ilya Gelfenbeyn, CEO of SpeakToIt, the company initially started out on Android using Google's ASR. For iOS, SpeakToIt tried Nuance briefly and then switched to Watson. "We use a number of solutions depending on the platform and typically with a solution that's native to the platform," Gelfenbeyn says. "However, there are a number of solutions where ASR is not available, and in that case we use Watson."
Gelfenbeyn says that Watson offers the company the capability to do custom grammars and adjust the recognizers dynamically. "It allows us to do pretty neat things for personalization," he says. "We are looking forward to getting more options for recognizers in terms of languages and also accents.
"For us, it's a great solution in terms of flexibility," Gelfenbeyn says. "Our experience was extremely positive with the support the AT&T speech team and the API team has provided."
The End Game
Between its strategic partnerships, licensing agreements, and developer platform, it looks like the use of Watson's platform will continue to grow, especially considering that using speech is still a relatively novel concept in the world of business, aside from the call center.
"A lot of companies are still in the stage where they're trying to decide what they want to do with speech in a mobile environment," Dahl says. "They might think, 'Well, this is great, but how can it apply to my company?'"
If Gilbert has his way, that question will be answered by the thousands of developers using AT&T's APIs.
"I'm very excited [about] where we are," Gilbert says. "We're investing heavily into all these different areas. We're not trying to be the best speech company in the world, but we're using our partners and our assets to [better] position those that are in high, rapid-growth businesses."
By Staff Writer Michele Masterson.
Publication: Speech Technology Magazine
Article Type: Cover story
Date: Mar 22, 2014