Automatic Speech Recognition Fine-Tunes Self-Service.


Imagine you are a customer and you have just completed a call with your insurance company. For the first time you can recall, you hang up smiling. You have received all the requested information about your policy quickly, without spending an eternity on hold or having to bear those long, annoying touch-tone menus, and the call was completed without ever talking to a live agent.

Does this sound like science fiction? Welcome to the new millennium, where more of the classic touch-tone interactive voice response (IVR) systems that have dominated the enterprise for 30 or more years are being given major face lifts. More to the point, perhaps it's a new set of vocal cords.

Voice recognition technology has come of age and, when integrated into business-critical systems such as IVR and office automation systems, it can provide a new level of service at a surprisingly reasonable cost. In roles that range from replacing basic hierarchical dual-tone multi-frequency (DTMF) menus to driving complex dialogs that employ natural language understanding, automatic speech recognition (ASR) is finding its way into enterprises conducting e-business and carriers deploying voice-activated dialing and automated directory assistance.

Thanks to language modeling, sophisticated grammars and accuracy tuning tools, speaker-independent ASR engines can attain an accuracy rate of 97 percent or better, rivaling that of a live agent. Combined with natural language understanding, this allows callers to navigate an application without having to follow a strict menu structure common in a typical IVR system. For instance, a caller who wants to transfer $100 between his or her bank accounts need not listen to a series of prompts such as "for transfer, press one" and "for checking account, press two."

All that is required is a verbal caller request such as, "I'd like to transfer $100 from my savings account to my checking account, please." The application responds by prompting the caller to articulate the account number and, upon validation, handles the transaction appropriately.
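To make the idea concrete, the step that turns a recognized sentence into a transaction can be pictured as filling a few slots (amount, source account, target account) from the transcript. The following is a minimal sketch in Python, assuming the ASR engine has already produced a text transcript. The names `TransferRequest` and `parse_transfer` and the regular expression are hypothetical illustrations, far simpler than the grammars and language models a production system would use.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransferRequest:
    """Slots extracted from a caller's transfer request (hypothetical type)."""
    amount: float
    source: str
    target: str


# Hypothetical pattern: "transfer <amount> ... from <account> ... to <account>".
# Only two account types are recognized in this sketch.
_TRANSFER = re.compile(
    r"transfer\s+\$?(?P<amount>\d+(?:\.\d{1,2})?)"
    r".*?\bfrom\s+(?:my\s+)?(?P<source>savings|checking)"
    r".*?\bto\s+(?:my\s+)?(?P<target>savings|checking)",
    re.IGNORECASE,
)


def parse_transfer(transcript: str) -> Optional[TransferRequest]:
    """Return the filled slots, or None if the transcript is not a transfer request."""
    match = _TRANSFER.search(transcript)
    if match is None:
        return None
    return TransferRequest(
        amount=float(match.group("amount")),
        source=match.group("source").lower(),
        target=match.group("target").lower(),
    )


request = parse_transfer(
    "I'd like to transfer $100 from my savings account to my checking account, please."
)
```

When `parse_transfer` returns `None`, a real application would fall back to a clarifying prompt rather than a touch-tone menu; that dialog logic is left out here.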

ASR is only part of this remarkable achievement. Text-to-speech (TTS), an application that uses basic computer ASCII text and simulates speech, has produced quality very close to the natural human voice. This human-like automated response allows callers to listen and understand with ease rather than struggle through the tedious, monotone sound that has been the hallmark of TTS for 35 or more years. TTS technology is capable of simulating actual speech while maintaining the appropriate prosody, speed, voice inflections and other characteristics that are important to human communication.

While businesses experiment with voice technology, the industry itself has accelerated the development efforts of linguists, dialog designers and speech technology engineers to create a broader selection of vastly improved products. To aid this development effort, tools are available that range from grammar and vocabulary to call flow and dialog design, enabling the creation of extensive, complex and accurate applications. Additionally, speaker verification, which is a biometric technology, provides unsurpassed security over the telephone when coupled with traditional passwords or account numbers.

In this global business environment, supporting multiple languages is vital. Currently, most of the languages spoken in North and South America, Western and Eastern Europe and Asia are supported through speech recognition, while languages for developing areas such as India, the Middle East and Africa are either available or in development.

VoiceXML: The Emerging Standard

Voice technology is bridging yet another gap to access the enormous expanse of information contained on the Web. VoiceXML, a scripting language born of the same family as HTML, allows voice applications to be served up to speech browsers in the same way that HTML pages are served up to the traditional Web browsers. The similarity of VoiceXML to HTML makes it easy for developers to create voice applications that can leverage the existing Web infrastructure and enable companies to use existing investments with voice access to information on the Web.
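For readers who have not seen the markup, the resemblance to HTML is easiest to appreciate from a small document. The sketch below, in Python, holds a VoiceXML 2.0-style dialog as a string and parses it with the standard library to confirm it is well-formed XML. The dialog itself is hypothetical: the field name, prompt wording, grammar file `accounts.grxml` and the `example.com` URL are placeholders, not part of any real service.

```python
import xml.etree.ElementTree as ET
from typing import List

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog: field name, prompt text, grammar file and submit URL
# are illustrative placeholders only.
TRANSFER_DIALOG = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="transfer">
    <field name="account">
      <prompt>Which account would you like to transfer from?</prompt>
      <grammar src="accounts.grxml" type="application/srgs+xml"/>
    </field>
    <filled>
      <submit next="http://example.com/transfer" namelist="account"/>
    </filled>
  </form>
</vxml>
"""


def field_names(document: str) -> List[str]:
    """Parse a VoiceXML document and list the names of its input fields."""
    root = ET.fromstring(document.encode("utf-8"))
    return [field.get("name") for field in root.iter(f"{{{VXML_NS}}}field")]


names = field_names(TRANSFER_DIALOG)
```

Much as a Web server returns an HTML page with a form, a server could return a document like this to a speech browser, which would play the prompt, listen against the grammar and submit the result.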

VoiceXML is a particularly compelling, emerging technology for voice applications and it represents the first potential standard for voice applications. Originating with the VoiceXML Forum, a consortium of companies that includes Motorola, IBM, Lucent and AT&T, the VoiceXML standard is now the responsibility of the World Wide Web Consortium (W3C), an organization with a long record of establishing technological standards. The involvement of the W3C, coupled with widespread developer acceptance of the VoiceXML specification, will promote interoperability among diverse voice applications and businesses around the globe.

Connecting The Voice Application

Voice recognition over the telephone has created some challenges for telephony equipment vendors. Often when a caller's voice is transmitted over the traditional public switched telephone network (PSTN) to an ASR engine, it can be garbled and difficult to understand. Additionally, satellite repeaters and other telephone equipment can introduce echo, static or noise, which negatively affect the accuracy rate of the speech recognition engine.

Voice over IP (VoIP) technology has become prevalent in large enterprises and, when introduced, can cause packet loss, latency and jitter that affect the voice sample. To counter these negative effects, equipment vendors are designing telephone network interfaces that provide superior echo cancellation, noise filters, jitter buffers and caching to improve voice quality and deliver excellent speech recognition. The accuracy of the ASR, of course, remains the major factor in user adoption of voice applications.
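The jitter buffer mentioned above is conceptually simple: hold a few packets briefly so that ones arriving out of order can be put back in sequence before the audio reaches the recognizer. The Python sketch below shows only that reordering idea. The `Packet` layout and the `depth` parameter are assumptions for illustration, and the sketch ignores timing, lost packets and sequence-number wraparound, all of which real network interfaces must handle.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Assumed packet layout for this sketch: (sequence number, audio payload).
Packet = Tuple[int, bytes]


def jitter_buffer(packets: Iterable[Packet], depth: int = 3) -> Iterator[Packet]:
    """Yield packets in sequence order, holding up to `depth` packets
    so that late arrivals can be slotted back into place."""
    heap: List[Packet] = []
    for packet in packets:
        heapq.heappush(heap, packet)
        # Release the oldest packet once the buffer holds more than `depth`.
        if len(heap) > depth:
            yield heapq.heappop(heap)
    # End of stream: drain whatever is still buffered, in order.
    while heap:
        yield heapq.heappop(heap)


# Packets 2 and 4 arrive late relative to their neighbors.
arrival_order = [(1, b"a"), (3, b"c"), (2, b"b"), (5, b"e"), (4, b"d")]
playout_order = [seq for seq, _ in jitter_buffer(arrival_order, depth=2)]
```

A deeper buffer tolerates more reordering but adds delay, which is the trade-off vendors tune when balancing voice quality against responsiveness.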

The Session Initiation Protocol (SIP) standard is also emerging throughout VoIP networks to handle call control in a distributed network. Easier to use and more flexible than other protocols, such as H.323 and Megaco, SIP is becoming the preferred protocol for voice application developers. SIP has won widespread adoption by high-profile organizations such as Microsoft, which has integrated the SIP standard into its latest operating system.
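Part of SIP's appeal to developers is that its messages are plain text, much like HTTP. As a rough illustration, the Python sketch below assembles a bare-bones INVITE request, the message that starts a call. The function name `build_invite`, the addresses, the tag and the branch value are hypothetical, and the session description body that a real INVITE normally carries is omitted.

```python
def build_invite(caller: str, callee: str, call_id: str, host: str) -> str:
    """Assemble a minimal SIP INVITE request as text (no message body).

    All argument values are caller-supplied placeholders; the tag and
    branch parameters below are fixed example strings, not generated IDs.
    """
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {host};branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=example-tag",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; a blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_invite(
    caller="alice@example.com",
    callee="bob@example.com",
    call_id="abc123@host.example.com",
    host="host.example.com",
)
```

Printing `message` shows a request that a developer can read and debug by eye, which is a large part of what "easier to use" means in practice.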

Adoption Is The Key

Voice recognition over the telephone has reached a critical mass, demonstrated by the response of businesses to dramatically increased end user adoption. For service providers, voice applications provide a competitive differentiation that can drive revenue. Additional benefits to wireless carriers include promotion of the safe use of cell phones while driving and the increase of usable "minutes" that the carriers sell on their networks. Carriers recognize that speech technology represents more than another enhanced service. Speech recognition improves the usability of existing services and allows for the expansion of new, revenue-generating applications, such as instant conferencing and instant messaging services.

An increase in employee productivity, customer satisfaction, sales automation and more efficient service centers all contribute to the bottom line of any enterprise. Currently, there is a wide range of revenue-generating voice applications available, including customer relationship management (CRM) systems, sales automation, e-business systems such as stock trading and voice banking, and more sophisticated interactive voice response (IVR) systems.

IVR vendors are now scrambling to integrate speech recognition into their products, propelled largely by competitive pressures, although this differentiation could be short-lived once it becomes commonplace in the IVR. Many IVRs have reached a limit on functionality constrained by DTMF. Fortunately, speech can broaden the services that an IVR can provide.

Voice recognition and voice-enabled applications have hit the mainstream, and we should expect to see an explosive growth of these types of services now and in the near future.

Steve Parsons is director of product management for the New Network Services division of NMS Communications (formerly Natural MicroSystems, www.nmss.com). In this position he is responsible for product marketing and management of HearSay, the company's high-density voice portal platform, integrating NMS telephony hardware with best-of-breed speech products.
COPYRIGHT 2001 Technology Marketing Corporation
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2001, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author: Parsons, Steve
Publication: Customer Interaction Solutions
Geographic Code: 1USA
Date: Oct 1, 2001