Can you understand me now? Soon records and information managers will be required to address the output of voice recognition systems as an integral part of their electronic records retention decisions.
* defines voice recognition technology
* explains how the technology is used
* discusses how voice recognition will impact records management
"I would like to speak to a human being."
"I am sorry. I do not understand what you have asked. Please repeat your request."
"I would like to speak to a human being."
"I am sorry. I do not understand what you have asked ..."
So went a recent conversation with the voice recognition computer at a health provider's 800 number. The caller needed to explain a rather complex situation to reorder a prescription. He was fully aware that the voice recognition system he was addressing could not answer his question, one that required several sentences to explain. For some reason, the system could not interpret the phrase "speak to a human being" well enough to transfer the caller to a live operator. It was not until he spoke the magic phrase "customer service representative" that the system responded, "I will connect you with a customer service representative."
This interchange highlights the advantages and drawbacks of voice recognition technology. On the positive side, individuals can communicate in their own words and no longer have to worry about pushing a variety of buttons and listening to a decision tree. On the other hand, if the voice recognition software is unable to recognize and properly interpret the words spoken, users can become increasingly frustrated and concerned that they cannot accomplish their task.
What Is Voice Recognition Technology?
Voice recognition technology describes the ability of a computer to understand human speech. More completely, according to MSN.com, it is "a system of computer input and control in which the computer can recognize spoken words and transform them into digitized commands or text. With such a system, a computer can be activated and controlled by voice commands or take dictation [that] is input to a word processor or desktop publishing system."
The broadest type of voice recognition technology is automatic speech recognition (ASR). Within ASR and its companion technologies, the most common divisions are speech-to-text and text-to-speech (strictly speaking, the latter is speech synthesis rather than recognition). In addition, voice recognition can be discrete (i.e., each word must be pronounced distinctly and separately from the next) or continuous (i.e., words flow naturally as in normal speech without breaks between them).
Voice recognition technology is based on processing spoken words in increasingly complex phases.
* The first phase is to take sounds and convert them from the analog form of air pressure waves into an electrical signal. This is no different from what Alexander Graham Bell did more than 100 years ago. (See "The History of Voice Recognition Technology.")
* Next, an analog-to-digital converter changes the electrical signal into a string of numbers that reflect a range of analog values.
* A set of compression techniques reduces the amount of information and highlights specific features of the captured sound that enable speech recognition.
* The compressed digital representation of the captured sound is then compared to a reference set of sounds that have been previously stored in compressed digital format as part of the system. In the simplest version, a comparison is made between the recorded speech and, in the case of English, the 40 or so phonemes (or distinct speech sounds) that are the building blocks of speech. (In practice, voice recognition systems may compare the digitized and compressed speech to as many as 1,024 possible classifications.)
* The systems are now ready to recognize words. Some systems approach this task through statistical models or neural nets. Others match the recorded speech to words whose patterns of phonemes or classifications have been compiled as a reference standard for the system.
* At this point in the process, a measure of artificial intelligence is required. Natural language systems analyze the words as they are decoded and parse them for grammar and meaning. It is in this phase that similar-sounding words, known as homophones, are properly decoded. A simple example would be discerning the difference between "there" and "their." Through analysis of word position and sentence structure, the voice recognition system's language component determines the final selection of words and ultimately what they mean. It was at this point that the health provider's telephone answering system failed: it could not "understand" what the caller requested.
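The early phases above (digitizing, compressing, and comparing against a stored reference set) can be illustrated with a toy Python sketch. It is not how any commercial recognizer works and rests on several assumptions: the "analog" input is a synthetic sine tone rather than speech, the "compression" step reduces each clip to two hypothetical features (normalized energy and zero-crossing rate), and the two labeled templates merely stand in for phoneme classes. Matching is simple nearest-template comparison.

```python
import math

# Toy illustration only: the feature set and the templates are hypothetical.

def digitize(signal, sample_rate, duration, bits=8):
    """Analog-to-digital step: sample a continuous signal and quantize
    each sample to one of 2**bits integer levels."""
    levels = 2 ** bits
    samples = []
    for i in range(int(sample_rate * duration)):
        x = max(-1.0, min(1.0, signal(i / sample_rate)))  # clip to [-1, 1]
        samples.append(round((x + 1.0) / 2.0 * (levels - 1)))  # 0..levels-1
    return samples

def features(samples, bits=8):
    """'Compression' step: reduce hundreds of samples to two numbers,
    normalized energy and zero-crossing rate."""
    mid = (2 ** bits - 1) / 2.0
    centered = [s - mid for s in samples]
    energy = sum(c * c for c in centered) / (len(centered) * mid * mid)
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / len(centered))

def classify(vec, references):
    """Comparison step: return the label of the nearest stored template."""
    return min(references, key=lambda label: math.dist(vec, references[label]))

def tone(freq):
    """Synthetic stand-in for a captured sound."""
    return lambda t: math.sin(2 * math.pi * freq * t)

RATE, DUR = 8000, 0.1  # 8,000 samples per second for 0.1 second
# Hypothetical reference set, stored in the same compressed form.
references = {
    "low_tone": features(digitize(tone(100), RATE, DUR)),
    "high_tone": features(digitize(tone(1000), RATE, DUR)),
}

print(classify(features(digitize(tone(110), RATE, DUR)), references))  # low_tone
```

Storing the templates in the same compressed form as the incoming sound is what makes the comparison step a cheap distance calculation; the word-level and language phases described above would sit on top of labels like these.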
How the interpreted information is used depends upon the nature of the system. In a command-oriented system, the decoded speech can then prompt performance of an action or completion of a task. The most familiar example is the telephone question-answer program that lets users reorder medicine or cancel newspaper delivery. In a content-oriented system, the interpreted speech is then displayed as text in a word-processing document or e-mail. Most commonly, these are dictation systems used to prepare letters, reports, and other documents.
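A minimal Python sketch of the two uses, assuming the recognizer has already produced a transcript string. A real product would be one kind or the other; the hypothetical `make_handler` below shows both paths side by side only for contrast. A transcript that matches a known command triggers an action, and anything else is treated as dictation and appended to a document.

```python
from typing import Callable, Dict, List

def make_handler(commands: Dict[str, Callable[[], str]],
                 document: List[str]) -> Callable[[str], str]:
    """Return a function that routes one decoded utterance (hypothetical API)."""
    def handle(transcript: str) -> str:
        phrase = transcript.strip().lower()
        if phrase in commands:
            # Command-oriented path: the words trigger an action.
            return commands[phrase]()
        # Content-oriented path: the words become text in a document.
        document.append(transcript.strip())
        return "added to document"
    return handle

doc: List[str] = []
handle = make_handler({"cancel delivery": lambda: "delivery cancelled"}, doc)
print(handle("Cancel delivery"))   # delivery cancelled
print(handle("Dear Dr. Smith,"))   # added to document
```

The distinction matters later for records management: the command path may leave nothing behind, while the content path produces text by design.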
It is important to emphasize another basic distinction among voice recognition systems. Some are voice-independent, others voice-dependent. Voice-independent systems can be used by anyone. They usually have a much more limited vocabulary but handle regional dialects and accents well. Telephone response systems are obviously of this kind. Voice-dependent systems, on the other hand, require training to be effective. Depending upon the complexity of the recognition required, training can take as little as five to 10 minutes, while training the system for a highly technical vocabulary may take days or possibly weeks. Many commercial voice recognition products have medical, legal, or engineering technical dictionaries, which can substantially reduce the need for extended training.
Command-oriented systems are becoming much more common. In addition to the telephone answering systems, commercial applications include the ability to log inventory changes in warehouses or to control a computer under adverse conditions where a keyboard or mouse would be ineffective.
A New Hampshire police department has begun to deploy a voice recognition system in its patrol cars that allows an officer to accomplish a number of tasks with a single spoken word. For example, when the word "pursuit" is spoken, several events occur. First, the flashing warning lights are turned on, and the siren begins to sound. Next, the car's Global Positioning System (GPS) receiver is queried to determine the exact location of the patrol car. Finally, a message is sent notifying the dispatcher of the pursuit in progress as well as the location and identification of the patrol car involved.
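The single-word command behaves like a macro: one recognized word expands into an ordered series of actions. The Python sketch below assumes hypothetical hooks (`read_gps`, `notify`) in place of the patrol car's real hardware and dispatch interfaces, which the article does not describe; the coordinates in the usage lines are made-up sample values.

```python
from typing import Callable, Dict, List, Tuple

Position = Tuple[float, float]  # (latitude, longitude)

def pursuit_macro(car_id: str,
                  read_gps: Callable[[], Position],
                  notify: Callable[[str], None]) -> List[str]:
    """Run the pursuit steps in order and return a log of what happened."""
    log = ["lights on", "siren on"]          # step 1: warning lights and siren
    lat, lon = read_gps()                    # step 2: locate the car
    log.append(f"position {lat:.4f},{lon:.4f}")
    notify(f"PURSUIT: car {car_id} at {lat:.4f},{lon:.4f}")  # step 3: dispatcher
    log.append("dispatcher notified")
    return log

# Hypothetical table mapping spoken command words to macros.
MACROS: Dict[str, Callable[..., List[str]]] = {"pursuit": pursuit_macro}

def on_word(word: str, **context) -> List[str]:
    """Expand a recognized word into its actions; unknown words do nothing."""
    action = MACROS.get(word.lower())
    return action(**context) if action else []

sent: List[str] = []
print(on_word("Pursuit", car_id="12",
              read_gps=lambda: (43.2081, -71.5376), notify=sent.append))
```

Keeping the steps in one function preserves their order (lights before location before notification), which is the point of the single-word design.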
In the military arena, an integral part of the Joint Strike Fighter aircraft is a voice recognition system that allows the pilot to control the aircraft through speech commands. Because the pilot does not have to focus on instruments and can concentrate on the flying environment, flying efficiency improves, and essential information can be gathered, integrated, and displayed on a heads-up display without manual intervention.
As voice recognition systems become more complex, their accuracy increases, providing opportunities for security-related applications. Such systems not only recognize words but also recognize words spoken by a specific individual. Each person has identifying speech characteristics that result in specific digital patterns. By comparing a spoken phrase or sentence with the same phrase or sentence that has been prerecorded and encrypted, a voice recognition system can positively identify an individual for admission to a highly secure area or for access to specific data.
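A simplified Python sketch of the comparison step, assuming each utterance has already been reduced to a numeric feature vector (a "voiceprint"). The vectors, the cosine-similarity measure, and the 0.95 threshold are all illustrative assumptions; production speaker-verification systems use statistical speaker models, and the encryption mentioned above is not shown.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """1.0 means the two voiceprints point in exactly the same direction."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("empty voiceprint")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def verify_speaker(enrolled: Sequence[float], sample: Sequence[float],
                   threshold: float = 0.95) -> bool:
    """Accept only if the new sample closely matches the enrolled voiceprint.
    The threshold is a hypothetical value, not a vendor setting."""
    return cosine_similarity(enrolled, sample) >= threshold

enrolled = [0.9, 0.1, 0.4]                           # made-up enrolled voiceprint
print(verify_speaker(enrolled, [0.88, 0.12, 0.41]))  # True (same speaker)
print(verify_speaker(enrolled, [0.1, 0.9, 0.2]))     # False (different speaker)
```

Raising the threshold lowers the chance of admitting an impostor at the cost of rejecting more legitimate users, the basic trade-off in any biometric check.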
Airports such as London's Heathrow are using voice recognition in combination with other biometric systems to ensure that those who work around aircraft, baggage, and maintenance are actually the individuals who are supposed to be in those areas. Another innovative use of the technology in airports is an application that transforms spoken security procedures into American Sign Language via a three-dimensional, computer-generated representation of a person signing.
Visa is investigating the implementation of voice recognition technology to authenticate telephone transactions. In one approach, the system would record a voice sample for each Visa customer pronouncing the numbers zero through nine. Subsequently, a purchaser's identity would be verified by asking him or her to pronounce a randomly selected sequence of numbers. The voice recognition system would match the caller to the prerecorded digital representation of that person's voice. Asking the caller to repeat a random sequence of numbers makes it harder for fraudsters to deceive the system by using recorded voices.
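The random-sequence idea is a challenge-response check, sketched below in Python under stated assumptions: each enrolled digit recording has been reduced to a hypothetical two-number feature vector, each spoken digit in the call has been segmented and reduced the same way, and "close enough" means a Euclidean distance under an illustrative 0.2 limit. None of these details come from Visa; they only show why a fresh random sequence defeats replay of an old recording.

```python
import math
import secrets
from typing import Dict, List, Sequence

DIGITS = "0123456789"

def new_challenge(length: int = 6) -> str:
    """Pick an unpredictable digit sequence for this call."""
    return "".join(secrets.choice(DIGITS) for _ in range(length))

def authenticate(challenge: str,
                 spoken: List[Sequence[float]],
                 enrolled: Dict[str, Sequence[float]],
                 max_distance: float = 0.2) -> bool:
    """Accept the caller only if every spoken digit, in order, is close to
    this customer's enrolled recording of the digit that was requested."""
    if len(spoken) != len(challenge):
        return False
    return all(math.dist(vec, enrolled[d]) <= max_distance
               for d, vec in zip(challenge, spoken))

# Made-up enrolled feature vectors, one per digit for one customer.
enrolled = {d: (float(i), 10.0 - i) for i, d in enumerate(DIGITS)}
challenge = "4071"
live = [(enrolled[d][0] + 0.05, enrolled[d][1] - 0.05) for d in challenge]
replay = [enrolled[d] for d in "1234"]  # genuine voice, wrong sequence

print(authenticate(challenge, live, enrolled))    # True
print(authenticate(challenge, replay, enrolled))  # False
```

Even a perfect recording of the customer's real voice fails if it says the wrong digits, which is why the sequence must be random and new for each call.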
Voice recognition systems that permit users to create substantial amounts of text without a keyboard or mouse will have the biggest impact on records management.
Perhaps the most extensive use of this type of voice recognition system has occurred in the medical field. In the past, it could take several days or weeks for a physician's or nurse's dictated notes to be transcribed and added to a patient's record. As a result, medical records could be incomplete and subsequent visits could result in improper treatment.
Most major medical centers, health care providers, and many individual physicians now dictate directly to their computers, or to computer-connected handheld recorders or telephone systems, where their dictation is interpreted into text. Commercial voice recognition vendors have developed specialized dictionaries for the various medical specialties: pathology, hematology, radiology, oncology, etc. As digital medical records systems become more pervasive, this approach permits posting a digital file of transcribed text so that the health care professional can validate it online and add it to the medical record the same day or the following day.
In the legal field, voice recognition technology is invading the world of the court reporter. Rather than typing what is heard in the courtroom, the court reporter repeats every word stated by attorneys, witnesses, judges, and other parties during a proceeding while also identifying the speakers. The reporter also describes activities as they take place and, in some cases, marks exhibits. In addition, the court reporter can now produce real-time text and transfer it immediately following a proceeding.
Records Management Implications
The increasing use of voice recognition technology has several implications for records management. As always, the focus is on the content created by the voice recognition system. Records managers should consider several questions in determining how to handle information generated through voice recognition systems.
Does the system create and preserve information? This question is the most important. Some systems merely use voice recognition technology as a tool to accomplish other tasks. In this case, speech, whether as a digital voice file or as interpreted text, is discarded once the task is completed. If the system does not maintain information, the records manager has no worries.
On the other hand, if the system creates and preserves information, then it is crucial to determine exactly what has been created and how it is preserved. Several options are possible. For example, the system may
* preserve the spoken words in a standard audio file (e.g., WAV or MP3)
* preserve the spoken words in a proprietary format
* preserve only the interpreted text file
* preserve both the audio file and the file interpreted from voice to text
Each of these cases may require a slightly different records management approach. In particular, if proprietary formats are involved, then appropriate steps must be taken to preserve the application software as well as the documentation so that retrieval will be possible in the future. If standard formats such as WAV or ASCII are involved, only the digital files themselves need to be maintained as a part of the records management system because standard viewers and players are available.
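The format rule above can be written as a small Python sketch. The set of "standard" formats is an assumption for illustration (an organization would maintain its own list), and `vendorvox` in the usage lines is a made-up name standing in for any proprietary format.

```python
from typing import Dict, Iterable, List

# Assumed list of formats with widely available players and viewers;
# "txt" stands for plain ASCII text.
STANDARD_FORMATS = {"wav", "mp3", "txt"}

def preservation_plan(stored_formats: Iterable[str]) -> Dict[str, List[str]]:
    """List what must be kept for each format the system preserves."""
    plan: Dict[str, List[str]] = {"files": [], "software_and_documentation": []}
    for fmt in stored_formats:
        name = fmt.lower()
        plan["files"].append(name)  # the digital files are always kept
        if name not in STANDARD_FORMATS:
            # Proprietary format: keep the application and its documentation
            # so the files can still be retrieved in the future.
            plan["software_and_documentation"].append(name)
    return plan

print(preservation_plan(["WAV", "txt"]))
print(preservation_plan(["vendorvox", "txt"]))
```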
What do we need to retain? The fact that the system creates an audio file or an interpreted text file and preserves one or both as part of its operation does not mean that either should be retained for a substantial period of time. As in all records retention decisions, the content of the digital file and its value to ongoing business or organizational operations must determine its retention.
Thus, because of regulatory requirements, one organization may need to retain both the audio file and the interpreted text file to validate the accuracy of the interpretation. The same may be true of answers recorded in a command-oriented system where the user speaks the words "I agree" and thereby creates a contractual relationship. In other instances, the organization may have the option to choose whether to retain one or both of the outputs from a voice recognition system. In command-oriented systems, no records may be created at all, or any that are created may not need to be retained.
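The two cases named in this paragraph can be encoded as a hypothetical Python rule. It is a sketch, not a retention schedule: the field names are invented for illustration, and in practice the "optional" outputs would still be decided by the organization's retention schedule based on their content.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class VoiceOutputs:
    audio: bool                            # system preserved the spoken words
    text: bool                             # system preserved the interpreted text
    must_validate_accuracy: bool = False   # e.g., a regulatory requirement
    creates_agreement: bool = False        # e.g., the caller said "I agree"

def retention_decision(o: VoiceOutputs) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (must_retain, retention_optional) for the preserved outputs."""
    available = frozenset(name for name, kept in
                          (("audio", o.audio), ("text", o.text)) if kept)
    if o.must_validate_accuracy or o.creates_agreement:
        # Keep everything preserved so the speech can be checked
        # against its interpretation.
        return available, frozenset()
    # Otherwise the organization chooses, based on content and business value.
    return frozenset(), available

print(retention_decision(VoiceOutputs(audio=True, text=True, creates_agreement=True)))
print(retention_decision(VoiceOutputs(audio=False, text=True)))
```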
Do voice recognition systems create a new type of record? No. Some voice recognition systems create records; others do not. It is the records manager's responsibility to assist users in identifying records that are created by the systems and then linking them to the business value that determines their retention. Like e-mail, voice recognition technology is a tool involved in the records management process. It is the content of the voice recognition system's output that is the focus of the records management process, not the delivery mechanism.
Where do we go from here? New applications of voice recognition technology currently under development include intelligent voice-controlled agents (network-based robots) that will place phone calls, track down people you want to reach, and let you know whether those people want to talk to you. Such agents could also help you find merchandise, remind you of appointments and birthdays, and control equipment and appliances from any location. Other new applications include network-accessible, voice-controllable devices for the home and office, such as copiers, refrigerators, and entertainment systems.
In the future, records managers can be certain that they will increasingly address the output of voice recognition systems as an integral part of their electronic records retention decisions.
The History of Voice Recognition Technology
Although the largest strides in the development of voice recognition technology have occurred in the past two decades, this technology really began with Alexander Graham Bell's inventions in the 1870s. By discovering how to convert air pressure waves (sound) into electrical impulses, he began the process of uncovering the scientific and mathematical basis of understanding speech.
In the 1950s, Bell Laboratories developed the first effective speech recognizer for numbers. In the 1970s, the ARPA Speech Understanding Research project developed the technology further, in particular by recognizing that the objective of automatic speech recognition is the understanding of speech, not merely the recognition of words.
By the 1980s, two distinct types of commercial products were available. The first offered speaker-independent recognition of small vocabularies. It was most useful for telephone transaction processing. The second, offered by Kurzweil Applied Intelligence, Dragon Systems, and IBM, focused on the development of large-vocabulary voice recognition systems so that text documents could be created by voice dictation.
Over the past two decades, voice recognition technology has developed to the point of real-time, continuous speech systems that augment command, security, and content creation tasks with exceptionally high accuracy.
Author's Note: Research on voice recognition technology for this article was conducted completely on the Internet with Copernic Agent Professional and Google. The article was dictated directly into Microsoft Word through Dragon NaturallySpeaking.
Alan A. Andolsen, CRM, CMC, is President of Naremco Services Inc. in New York, a records and information management consulting firm founded by Emmett Leahy in 1948. He also is Vice President of the Institute of Certified Records Managers. He may be contacted at AlAndolsen@NAREMCO.COM.
Title Annotation: Tech Trends
Author: Andolsen, Alan A.
Publication: Information Management Journal
Date: Jan 1, 2004