
Can you understand me now? Soon records and information managers will be required to address the output of voice recognition systems as an integral part of their electronic records retention decisions.


At the Core

This article

* defines voice recognition technology

* explains how the technology is used

* discusses how voice recognition will impact records management

"I would like to speak to a human being."

"I am sorry. I do not understand what you have asked. Please repeat your request."

"I would like to speak to a human being."

"I am sorry. I do not understand what you have asked ..."

So went a recent conversation with the voice recognition computer at a health provider's 800 number. The caller needed to explain a rather complex situation to reorder a prescription. He was fully aware that the voice recognition system he was addressing could not answer his question, which required several sentences to explain. For some reason, the system could not interpret the phrase "speak to a human being" well enough to transfer the caller to a live operator. It was not until he spoke the magic phrase "customer service representative" that the system responded, "I will connect you with a customer service representative."

This interchange highlights the advantages and drawbacks of voice recognition technology. On the positive side, individuals can communicate in their own words and no longer have to push a variety of buttons while listening their way through a decision tree. On the other hand, if the voice recognition software is unable to recognize and properly interpret the words spoken, users can become increasingly frustrated and concerned that they cannot accomplish their task.

What Is Voice Recognition Technology?

Voice recognition technology describes the ability of a computer to understand human speech. More completely, according to MSN.com, it is "a system of computer input and control in which the computer can recognize spoken words and transform them into digitized commands or text. With such a system, a computer can be activated and controlled by voice commands or take dictation [that] is input to a word processor or desktop publishing system."

The broadest type of voice recognition technology is automatic speech recognition (ASR). Within ASR the most common divisions are text-to-speech and speech-to-text. In addition, voice recognition can be discrete (i.e., each word must be pronounced distinctly and separately from the next) or continuous (i.e., words flow naturally as in normal speech without breaks between them).
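The practical difference between discrete and continuous recognition shows up at word boundaries. The Python sketch below is a simplified illustration, not any product's method: it assumes speech has already been reduced to one loudness (energy) value per short frame, and the names `split_on_pauses`, `silence`, and `min_gap` are hypothetical. A discrete system can rely on the mandatory pause between words; in continuous speech, brief dips in loudness also occur inside phrases, so short gaps cannot be trusted as boundaries.

```python
def split_on_pauses(energies, silence=0.1, min_gap=3):
    """Group loud frames into words, splitting only at pauses of min_gap quiet frames.

    energies: one loudness value per short frame (hypothetical input format).
    Returns a list of words, each a list of frame indices.
    """
    words, current, quiet = [], [], 0
    for i, energy in enumerate(energies):
        if energy > silence:
            if current and quiet >= min_gap:
                words.append(current)  # a long enough pause ends the previous word
                current = []
            current.append(i)
            quiet = 0
        else:
            quiet += 1  # count consecutive quiet frames
    if current:
        words.append(current)
    return words
```

For example, `split_on_pauses([0.5, 0.6, 0, 0, 0, 0.7, 0.8, 0.0, 0.9])` returns `[[0, 1], [5, 6, 8]]`: the three quiet frames after frame 1 count as a pause between words, while the single quiet frame 7 does not.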

The Technology

Voice recognition technology is based on processing spoken words in increasingly complex phases.

* The first phase is to take sounds and convert them from analog pressure waves in the air into an electrical signal. This is essentially what Alexander Graham Bell did more than 100 years ago. (See "The History of Voice Recognition Technology.")

* Next, an analog-to-digital converter changes the electrical signal into a string of numbers, each representing the strength of the signal at one instant, so that the continuous range of analog values is captured in digital form.

* A set of compression techniques reduces the amount of information and highlights specific features of the captured sound that enable speech recognition.

* The compressed digital representation of the captured sound is then compared to a reference set of sounds that have been previously stored in compressed digital format as part of the system. In the simplest version, a comparison is made between the recorded speech and, in the case of English, the 40 or so phonemes (or distinct speech sounds) that are the building blocks of speech. (In fact, voice recognition systems may actually compare the digitized and compressed speech to as many as 1,024 possible classifications.)

* The systems are now ready to recognize words. Some systems approach this task through statistical models or neural nets. Others match the recorded speech to words whose patterns of phonemes or classifications have been compiled as a reference standard for the system.

* At this point in the process, a measure of artificial intelligence is required. Natural language systems analyze the words as they are decoded and parse them for grammar and meaning. It is in this phase that similar-sounding words, known as homonyms, are properly decoded. A simple example would be discerning the difference between "there" and "their." Through analysis of word position and sentence structure, the voice recognition system's language component determines the final selection of words and ultimately what they mean. It was at this point that the health provider's telephone answering system failed. It could not "understand" what the caller requested.
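To make these phases concrete, here is a deliberately tiny Python sketch. It is illustrative only: the 8-bit resolution, the four-sample frames, the two features (energy and zero-crossing rate), the reference codebook passed to `classify`, and the word-pair counts in `BIGRAMS` are all hypothetical stand-ins for the far larger feature sets, phoneme inventories (or up to 1,024 classifications), and language models real systems use. The function names are likewise invented for this example.

```python
import math

BITS = 8   # assumed ADC resolution
FRAME = 4  # assumed samples per analysis frame (tiny, for readability)

def adc(signal, bits=BITS):
    """Analog-to-digital step: quantize amplitudes in [-1.0, 1.0] to signed integers."""
    levels = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, x)) * levels) for x in signal]

def compress(samples, frame=FRAME):
    """Compression step: reduce each frame to (mean energy, zero-crossing rate)."""
    vectors = []
    for start in range(0, len(samples) - frame + 1, frame):
        f = samples[start:start + frame]
        energy = sum(s * s for s in f) / frame
        crossings = sum(1 for a, b in zip(f, f[1:]) if (a < 0) != (b < 0))
        vectors.append((energy, crossings / (frame - 1)))
    return vectors

def classify(vector, codebook):
    """Comparison step: return the label of the nearest stored reference vector."""
    return min(codebook, key=lambda label: math.dist(vector, codebook[label]))

# Hypothetical language-model counts for neighbouring word pairs.
BIGRAMS = {("over", "there"): 50, ("over", "their"): 1,
           ("their", "car"): 40, ("there", "car"): 2}

def resolve_homonym(prev_word, candidates, next_word):
    """Language step: choose the candidate that fits best between its neighbours."""
    def score(word):
        return BIGRAMS.get((prev_word, word), 0) + BIGRAMS.get((word, next_word), 0)
    return max(candidates, key=score)
```

For example, `adc([0.0, 0.5, -1.0, 2.0])` yields `[0, 64, -127, 127]` (the out-of-range 2.0 is clipped), and `resolve_homonym("over", ["there", "their"], "now")` returns `"there"` because the hypothetical counts favor "over there."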

How the interpreted information is used depends upon the nature of the system. In a command-oriented system, the decoded speech can then prompt performance of an action or completion of a task. The most familiar example is the telephone question-answer program that lets users reorder medicine or cancel newspaper delivery. In a content-oriented system, the interpreted speech is then displayed as text in a word-processing document or e-mail. Most commonly, these are dictation systems used to prepare letters, reports, and other documents.
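The split between command-oriented and content-oriented systems can be sketched in a few lines of Python. This is a hypothetical illustration, not any vendor's design: `CommandSystem` maps an exact recognized phrase to an action, while `DictationSystem` simply accumulates recognized text into a document.

```python
from dataclasses import dataclass, field
from typing import Callable

FALLBACK = "I am sorry. I do not understand what you have asked."

@dataclass
class CommandSystem:
    """Command-oriented: a recognized phrase triggers an action."""
    actions: dict[str, Callable[[], str]]

    def handle(self, text: str) -> str:
        action = self.actions.get(text.strip().lower())
        return action() if action else FALLBACK

@dataclass
class DictationSystem:
    """Content-oriented: recognized text becomes part of a document."""
    document: list[str] = field(default_factory=list)

    def handle(self, text: str) -> str:
        self.document.append(text)
        return " ".join(self.document)

phone = CommandSystem({
    "customer service representative":
        lambda: "I will connect you with a customer service representative.",
})
print(phone.handle("I would like to speak to a human being"))  # fallback message
print(phone.handle("Customer service representative"))         # connects the caller
```

Because the lookup here is an exact match, any phrasing the designers did not anticipate falls through to the apology, much like what the caller in the opening example experienced. The dictation side has no such gate, and it leaves behind a document that may need to be managed as a record.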

It is important to emphasize another basic distinction among voice recognition systems. Some are voice-independent (also called speaker-independent), others voice-dependent. Voice-independent systems can be used by anyone. They usually have a much more limited vocabulary but handle regional dialects and accents well. Telephone response systems are obviously of this kind. Voice-dependent systems, on the other hand, require training to be effective. Depending upon the complexity of the recognition required, training can take as little as five to 10 minutes, while training the system for a highly technical vocabulary may take days or possibly weeks. Many commercial voice recognition products have medical, legal, or engineering dictionaries, which can substantially reduce the need for extended training.
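Training a voice-dependent system amounts to building reference patterns for one particular speaker. The following Python sketch assumes, hypothetically, that each utterance has already been reduced to a short feature vector; `enroll` averages the user's repetitions of each word into a template, and `recognize` accepts the nearest template only if it lies within an assumed `max_distance`. Both names and the input format are invented for illustration.

```python
import math

def enroll(training):
    """Average each word's training vectors into one template for this speaker.

    training maps a word to the feature vectors recorded while the user
    repeated that word during setup (hypothetical format).
    """
    templates = {}
    for word, samples in training.items():
        templates[word] = tuple(sum(values) / len(samples) for values in zip(*samples))
    return templates

def recognize(vector, templates, max_distance=1.0):
    """Return the closest trained word, or None if nothing is close enough."""
    word = min(templates, key=lambda w: math.dist(vector, templates[w]))
    return word if math.dist(vector, templates[word]) <= max_distance else None
```

Rejecting distant matches, rather than always returning the nearest word, lets the system ask the user to repeat instead of guessing.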

Command-Oriented Systems

Command-oriented systems are becoming much more common. In addition to the telephone answering systems, commercial applications include the ability to log inventory changes in warehouses or to control a computer under adverse conditions where a keyboard or mouse would be ineffective.

A New Hampshire police department has begun to deploy a voice recognition system in its patrol cars that allows an officer to accomplish a number of tasks with a single spoken word. For example, if the word "pursuit" is spoken, a number of events occur. First, the flashing warning lights are illuminated, and the siren begins to sound. Next, the car's Global Positioning System (GPS) receiver is queried to determine the exact location of the patrol car. Finally, a message is sent notifying the dispatcher of the pursuit in progress as well as the location and identification of the patrol car involved.
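The "pursuit" command is essentially a macro: one recognized word expands into an ordered series of tasks. The sketch below illustrates that idea in Python with hypothetical function names and a plain dictionary standing in for the patrol car's equipment, GPS receiver, and radio link; it is not the department's actual software.

```python
from typing import Callable

def lights_and_siren(car: dict) -> None:
    car["lights"] = car["siren"] = True

def locate(car: dict) -> None:
    # Stand-in for reading a GPS receiver: reuse the last known fix.
    car["location"] = car.get("gps_fix", "unknown")

def notify_dispatcher(car: dict) -> None:
    car["outbox"].append(
        f"Pursuit in progress: car {car['car_id']} at {car['location']}")

# Hypothetical table: one spoken word -> ordered list of tasks.
VOICE_MACROS: dict[str, list[Callable[[dict], None]]] = {
    "pursuit": [lights_and_siren, locate, notify_dispatcher],
}

def on_spoken_word(word: str, car: dict) -> bool:
    """Run every task bound to the word, in order; return False if the word is unknown."""
    steps = VOICE_MACROS.get(word.strip().lower())
    if steps is None:
        return False
    for step in steps:
        step(car)
    return True
```

Keeping the steps in a list preserves their order, so the car's location is known before the dispatcher message that reports it is composed.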

In the military arena, an integral part of the Joint Strike Fighter (F-35) aircraft is a voice recognition system that allows the pilot to control the aircraft through speech commands. Because the pilot does not have to focus on instruments and can concentrate on the flying environment, greater flying efficiency is achieved, and essential information can be gathered, integrated, and displayed on a heads-up display without manual intervention.

Security Applications

As voice recognition systems become more sophisticated, their accuracy increases, providing opportunities for security-related applications. Such systems not only recognize words but also recognize words spoken by a specific individual. Each person has identifying speech characteristics that result in specific digital patterns. By comparing a spoken phrase or sentence with the same phrase or sentence that has been pre-recorded and encrypted, a voice recognition system can positively identify an individual for admission to a highly secure area or for access to specific data.

Airports such as London's Heathrow are using voice recognition in combination with other biometric systems to ensure that those who work around aircraft, baggage, and maintenance are actually the individuals who are supposed to be in those areas. Another innovative use of the technology in airports is an application that transforms spoken security procedures into American Sign Language (ASL) via a three-dimensional, computer-generated representation of a person signing.

Visa is investigating the implementation of voice recognition technology to authenticate telephone transactions. In one approach, the system would record a voice sample for each Visa customer pronouncing the numbers zero through nine. Subsequently, a purchaser's identity would be verified by asking him or her to pronounce a randomly selected sequence of numbers. The voice recognition system would match the caller to the prerecorded digital representation of that person's voice. Asking the caller to repeat a random sequence of numbers makes it harder for impostors to deceive the system by using recorded voices.
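The security value of the random sequence comes from its unpredictability: a recording made earlier is unlikely to contain exactly the digits requested now. The Python sketch below shows that two-part check under stated assumptions; it is not Visa's system. The recognizer output format (pairs of heard digit and feature vector), the per-digit enrolled vectors, the toy `closeness` measure, and the 0.8 threshold are all hypothetical.

```python
import math
import secrets

def make_challenge(length: int = 5) -> list[int]:
    """Pick an unpredictable sequence of digits for the caller to say."""
    return [secrets.randbelow(10) for _ in range(length)]

def closeness(a, b) -> float:
    """Toy similarity in (0, 1]: 1.0 means identical feature vectors."""
    return 1.0 / (1.0 + math.dist(a, b))

def verify_caller(challenge, heard, enrolled, threshold=0.8) -> bool:
    """heard: (digit, feature_vector) pairs produced by the recognizer.

    enrolled: the caller's stored vector for each digit 0-9 (hypothetical format).
    """
    if [digit for digit, _ in heard] != challenge:
        return False  # wrong digits, e.g. a replayed old recording
    return all(closeness(enrolled[d], vec) >= threshold for d, vec in heard)
```

Both conditions must hold: the right digits in the right order, and a voice that matches the enrolled samples for each digit.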

Content Creation

Voice recognition systems that permit users to create substantial amounts of text without a keyboard or mouse will have the biggest impact on records management.

Perhaps the most extensive use of this type of voice recognition system has occurred in the medical field. In the past, it could take several days or weeks for a physician's or nurse's dictated notes to be transcribed and added to a patient's record. As a result, medical records could be incomplete and subsequent visits could result in improper treatment.

Most major medical centers, health care providers, and many individual physicians now dictate directly to their computers, or to computer-connected handheld recorders or telephone systems, where their dictation is interpreted into text. Commercial voice recognition systems offer specialized dictionaries for the various medical specialties: pathology, hematology, radiology, oncology, etc. As digital medical records systems become more pervasive, this approach permits posting a digital file of transcribed text so that the healthcare professional can validate it online and add it to the medical record the same day or the following day.

In the legal field, voice recognition technology is entering the world of the court reporter. Rather than typing what is heard in the courtroom, the court reporter repeats every word stated by attorneys, witnesses, judges, and other parties during a proceeding while also identifying the speakers. The reporter also describes activities as they take place and, in some cases, marks exhibits. In addition, the court reporter can now produce real-time text and transfer it immediately following a proceeding.

Records Management Implications

The increasing use of voice recognition technology has several implications for records management. As always, the focus is on the content created by the voice recognition system. Records managers should consider several things in determining how to handle information generated through voice recognition systems.

Does the system create and preserve information? This question is the most important. Some systems merely use voice recognition technology as a tool to accomplish other tasks. In this case, speech, whether as a digital voice file or as interpreted text, is discarded once the task is completed. If the system does not maintain information, the records manager has no worries.

On the other hand, if the system creates and preserves information, then it is crucial to determine exactly what has been created and how it is preserved. Several options are possible. For example, the system may

* preserve the spoken words in a standard audio file (e.g., WAV or MP3)

* preserve the spoken words in a proprietary format

* preserve only the interpreted text file

* preserve both the audio file and the file interpreted from voice to text

Each of these cases may require a slightly different records management approach. In particular, if proprietary formats are involved, then appropriate steps must be taken to preserve the application software as well as the documentation so that retrieval will be possible in the future. If standard formats such as WAV or ASCII are involved, only the digital files themselves need to be maintained as a part of the records management system because standard viewers and players are available.
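The format question can be reduced to a simple rule of thumb. The Python sketch below assumes that a file's extension identifies its format (a simplification; a real inventory should verify formats rather than trust names), treats `.wav`, `.mp3`, and `.txt` (ASCII text) as standard, and treats everything else, including the hypothetical `.vrx` used in the example, as proprietary.

```python
from pathlib import PurePath

# Assumed set of standard formats with widely available players and viewers.
STANDARD_FORMATS = {".wav", ".mp3", ".txt"}

def preservation_plan(filename: str) -> list[str]:
    """List what must be kept so the file can still be read in the future."""
    if PurePath(filename).suffix.lower() in STANDARD_FORMATS:
        return ["digital file"]
    # Proprietary (or unrecognized) format: keep the means of reading it as well.
    return ["digital file", "application software", "documentation"]
```

For example, `preservation_plan("dictation.WAV")` returns `["digital file"]`, while `preservation_plan("note.vrx")` adds the application software and documentation.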

What do we need to retain? That a system creates an audio file or an interpreted text file and preserves one or both as part of its operation does not mean that either should be retained for a substantial period of time. As in all records retention decisions, it is the content of the digital file and its value to ongoing business or organizational operations that must determine its retention.

Thus, because of regulatory requirements, one organization may need to retain both the audio file and the interpreted text file to validate the accuracy of the interpretation. The same may be true of answers recorded in a command-oriented system where the user speaks the words "I agree" and thereby creates a contractual relationship. In other instances, the organization may have the option to choose whether to retain one or both of the outputs from a voice recognition system. In many command-oriented systems, no records are created at all, or those that are created need not be retained.
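The reasoning above can be expressed as a small decision function. It is a hypothetical sketch: the `VoiceOutput` fields and the default preference for keeping only the text when the organization has a choice are assumptions for illustration, not a recommended policy. How long the chosen outputs are kept would still come from the content's business value and the retention schedule.

```python
from dataclasses import dataclass

@dataclass
class VoiceOutput:
    keeps_audio: bool       # system preserves the spoken words
    keeps_text: bool        # system preserves the interpreted text
    regulated: bool         # accuracy of the interpretation must be provable
    creates_contract: bool  # e.g., the caller said "I agree"

def outputs_to_retain(o: VoiceOutput, preferred=frozenset({"text"})) -> set[str]:
    """Decide which preserved outputs go on the retention schedule."""
    produced = {name for name, kept in (("audio", o.keeps_audio), ("text", o.keeps_text)) if kept}
    if not produced:
        return set()     # nothing preserved, nothing to manage
    if o.regulated or o.creates_contract:
        return produced  # keep everything that exists
    return (produced & preferred) or produced  # organization's choice (assumed default)
```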

Do voice recognition systems create a new type of record? No. Some voice recognition systems create records; others do not. It is the records manager's responsibility to assist users in identifying records that are created by the systems and then linking them to the business value that determines their retention. Like e-mail, voice recognition technology is a tool involved in the records management process. It is the content of the voice recognition system's output that is the focus of the records management process, not the delivery mechanism.

Where do we go from here? New applications of voice recognition technology currently being developed include intelligent voice-controlled agents (network-based robots) that will place phone calls, track down people you want to reach, and let you know whether these people want to talk to you; help you find merchandise; remind you about appointments and birthdays; and control equipment and appliances from any location. Other new applications include devices for the home and office that will be network-accessible and voice-controllable, such as copiers, refrigerators, and entertainment systems.

In the future, records managers can be certain that they will increasingly address the output of voice recognition systems as an integral part of their electronic records retention decisions.

RELATED ARTICLE: The history of voice recognition technology.

Although the largest strides in the development of voice recognition technology have occurred in the past two decades, this technology really began with Alexander Graham Bell's inventions in the 1870s. By discovering how to convert air pressure waves (sound) into electrical impulses, he began the process of uncovering the scientific and mathematical basis of understanding speech.

In the 1950s, Bell Laboratories developed the first effective speech recognizer for numbers. In the 1970s, the ARPA Speech Understanding Research project developed the technology further, in particular by recognizing that the objective of automatic speech recognition is the understanding of speech, not merely the recognition of words.

By the 1980s, two distinct types of commercial products were available. The first offered speaker-independent recognition of small vocabularies. It was most useful for telephone transaction processing. The second, offered by Kurzweil Applied Intelligence, Dragon Systems, and IBM, focused on the development of large-vocabulary voice recognition systems so that text documents could be created by voice dictation.

Over the past two decades, voice recognition technology has developed to the point of real-time, continuous speech systems that augment command, security, and content creation tasks with exceptionally high accuracy.

Author's Note: Research on voice recognition technology for this article was conducted entirely on the Internet with Copernic Agent Professional and Google. The article was dictated directly into Microsoft Word through Dragon NaturallySpeaking.

Alan A. Andolsen, CRM, CMC, is President of Naremco Services Inc. in New York, a records and information management consulting firm founded by Emmett Leahy in 1948. He also is Vice President of the Institute of Certified Records Managers. He may be contacted at AlAndolsen@NAREMCO.COM.
COPYRIGHT 2004 Association of Records Managers & Administrators (ARMA)
No portion of this article can be reproduced without the express written permission from the copyright holder.


Article Details
Title Annotation:Tech Trends
Author:Andolsen, Alan A.
Publication:Information Management Journal
Date:Jan 1, 2004
Words:2534