Free-text data entry by speech recognition software and its impact on clinical routine.


Abstract

We conducted a study to evaluate speech recognition software in an otorhinolaryngology unit and to assess its impact on productivity prior to general implementation. Current speech recognition software (IBM ViaVoice, version 10) was implemented on a personal computer with a 2-GHz central processing unit, 256 MB of RAM, and a 30-GB hard disk drive, with and without add-on professional vocabulary for otorhinolaryngology. This vocabulary was added by the automated analysis of an additional 12,257 documents from our department. We compared the word recognition error rates for three different text types and determined their impact on the amount of surgeon's time that was invested in the production of an error-free document. Although error rates without any professional vocabulary database were rather high (operation reports: 38.72%; consultation notes: 27.77%), the patient information was edited with a satisfactory result (10.65%). Best results were obtained with the specialty-related vocabulary database added by the analysis of our own documents (operation reports: 5.45%; consultation notes: 5.21%). An increase in productivity compared with that of conventional transcription was found at an error rate of less than 16%.

Introduction

We conducted a study to evaluate the performance of current speech recognition software, in conjunction with three different databases of vocabulary, and to assess its impact on productivity prior to general implementation in an otorhinolaryngology unit. Our underlying hypothesis was that in an environment where all medical documents are entered into a medical records database, the use of computer-based speech recognition technology results in greater overall productivity than manual input by surgeons via a computer keyboard.

By now, the implementation of Diagnosis-Related Groups has marked the preliminary conclusion of a transfer process toward information technology for extended quality management in healthcare administration. In contrast, computer-based documentation of medical records is still subject to further development. A considerable share of information is still communicated orally and by handwritten notes; both of these methods pose major obstacles to the conversion of this information into electronic documents. Nevertheless, day-to-day recording of patients' clinical courses is essential if electronic clinical documentation is implemented to replace handwritten notes. This process has to be reliable, fast, and convenient for the average clinician. Speech recognition software offers a potential solution.

The first reports of this software's practical use were published in the early 1990s. (1,2) In these reports, the software was used in a professional environment in which (1) single workplaces, mostly in laboratories or nonclinical departments, were predominant; (2) a limited professional vocabulary was used; and (3) the writing of the actual medical report was the limiting factor for productivity.

The expansion of this software into radiology departments (3-5) and pathology laboratories prompted the need for extended vocabulary databases. However, further propagation of speech recognition software to clinical departments--surgical and medical--has been limited. This may be attributable to the fact that various workplaces used for different tasks (operating theater, clinical ward, outpatient department, etc.) have to be covered. Healthcare professionals often move between different parts of a unit, and these different areas are not necessarily situated in one building. Therefore, there should be access to a computer platform that either performs speech recognition itself or transfers audio data to a central server on which speech recognition software is implemented and from which the written document is distributed either to the surgeon or to a professional typist for revision.

Only a small number of studies have been conducted on speech recognition software in a clinical environment with a complex spatial structure. (6) The implementation of speech recognition software on computer networks is a challenge because these networks generally perform more slowly than stand-alone personal computers as a result of network capacity limitations. Although a network setup would fulfill the need for covering clinical departments as described above, infrastructure costs are considerably higher than those associated with stand-alone solutions.

The objective of this study was to evaluate the usability of current speech recognition software for the average surgeon in otorhinolaryngology. Intended users are junior and senior clinicians who have a general background of personal computer use at work and in everyday life but who lack expert typing skills. The purpose of speech recognition software is to transcribe different types of information that appear frequently during clinical work--such as operation reports, consultation notes on a patient for other specialties, discharge letters, and written information for patients on a specific topic--into an electronic document that can be integrated into a medical records database. The desired effect is that dictating documents into the speech recognition software plus subsequent revision by the dictating surgeon is faster and more efficient than the surgeon's typing and revising documents manually him/herself or dictating the documents to a professional typist and revising the document at a later stage.

Materials and methods

The computer hardware included a single personal computer with a 2-GHz central processing unit (Pentium 4; Intel; Santa Clara, Calif.), main board VIA KT 266A (VIA Technologies; Taipei, Taiwan), 256 MB of RAM, a 30-GB hard disk drive, and an AC97 sound controller on board. For the human interface device, we used an Andrea NC 61 headset (Andrea Electronics; Melville, N.Y.). The operating system was Microsoft Windows 2000 (Microsoft; Redmond, Wash.), and for text processing we used Microsoft Office XP. Speech recognition software consisted of the IBM ViaVoice engine, German version 10 (IBM; Armonk, N.Y.), supplemented by a commercially available vocabulary database for general medicine and otorhinolaryngology in German (Mende Speech Solutions; Bammental, Germany). Initial setup and speaker training were provided by a local supplier (AS-Anwender Systeme; Aachen, Germany).

In a subsequent step, one user analyzed 12,257 medical documents (operation reports, consultation notes, discharge letters, and interdisciplinary consultations) from our own department for additional vocabulary that was not provided in the add-on vocabulary database. As a result, 8,947 new expressions were added to the database, which is referred to as an "individual" vocabulary in the remainder of this article. Under these conditions, we dictated 15 operation reports with an average word count of 171 (range: 52 to 432) and 53 consultation notes with an average word count of 271 (range: 66 to 385) to the speech recognition system. The average dictation time was 190 sec for operation reports (range: 75 to 510) and 290 sec for consultation notes (range: 90 to 450).
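The vocabulary-extension step described above can be illustrated with a minimal sketch. It assumes that "analysis" amounts to collecting every word in the department's documents that the existing vocabulary lacks; the article does not describe the actual procedure used by the software, and the function name `new_expressions` and the sample texts are hypothetical.

```python
import re


def new_expressions(documents, known_vocabulary):
    """Return the words found in the documents that the vocabulary lacks.

    Assumption: a simple case-insensitive set difference stands in for
    whatever analysis the speech recognition software performs.
    """
    known = {word.lower() for word in known_vocabulary}
    found = set()
    for text in documents:
        # Letters only (any alphabet), allowing hyphenated compounds.
        for word in re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text):
            if word.lower() not in known:
                found.add(word.lower())
    return sorted(found)


# Hypothetical sample documents and add-on vocabulary.
documents = [
    "Tympanoplasty with ossicular chain reconstruction.",
    "The patient tolerated the tympanoplasty well.",
]
known_vocabulary = [
    "with", "the", "patient", "tolerated", "well", "chain", "reconstruction",
]
candidates = new_expressions(documents, known_vocabulary)
```

In this toy case, `candidates` holds the two specialty terms that the vocabulary does not yet cover.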

In addition, all documents had to be revised. The revision process in speech recognition systems, however, entails not only the correction of typing errors, but also speaking the corrected words into the headset microphone and confirming their correct spelling on the computer screen. Thus, the revision process is more complex than conventional text editing. Consequently, the error rate in speech recognition directly influences the time required for the entire editing process and thus has a strong impact on productivity.

To estimate the influence of a document's content on the recognition rate, we dictated three different types of documents (operation reports, consultation notes, and a patient information leaflet) with equal word counts (1 page; size: A4; font: 12-pt Arial; margins: 2 cm) to the speech recognition system under three software configurations (without professional vocabulary, with commercially available professional vocabulary, and with commercially available plus individual professional vocabulary). For purposes of control, an operation report was typed manually by a qualified surgeon in otorhinolaryngology who did not possess professional typing skills.

Results

The standardized text samples were entered with fluent, conversational speech into the speech recognition system in an average of 424 sec (range: 355 to 510; control: 1,590, corresponding to 161 keystrokes/min) (table). The time required for text revision varied, primarily depending on the three software configurations.
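The figures in this paragraph can be checked against the dictation times listed in the table. The short calculation below is only an arithmetic cross-check; the variable names are illustrative, and the keystroke count is an implied value derived from the stated typing rate, not a number reported in the article.

```python
from statistics import mean

# Dictation times (sec) from the table: three document types under
# three vocabulary configurations.
dictation_sec = [
    450, 375, 440,  # operation reports
    510, 470, 390,  # consultation notes
    355, 375, 450,  # patient information
]

average_sec = round(mean(dictation_sec))          # average dictation time
shortest, longest = min(dictation_sec), max(dictation_sec)

# Control: manual typing of the operation report.
control_sec = 1590
control_minutes = control_sec / 60
# Keystrokes implied by the stated rate of 161 keystrokes/min.
implied_keystrokes = 161 * control_minutes

print(average_sec, shortest, longest, control_minutes, implied_keystrokes)
```

The mean and range reproduce the values quoted in the text, and the control time corresponds to 26.5 min of typing.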

The small amount of time required to enter text into the speech recognition system confirmed the idea that current speech software can recognize and record fluent, conversational, open language without major problems. This is made possible by a "continuous speech recognition" program, which segments words into phonemes--that is, their smallest distinguishable phonetic units. The speech recognition software compares the sequence of phonemes to its own vocabulary database and displays the best match on the computer screen. For this reason, a professional vocabulary database must be as comprehensive as possible. The results of the "operation reports" and "consultation notes" parts of this study are unacceptable in the absence of any background medical vocabulary. However, the "patient information" text contains only expressions that laypersons are able to understand, and therefore its speech recognition result was satisfactory with only the conversational vocabulary.
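Why the vocabulary database matters so much can be shown with a toy model of the matching step. The sketch below assumes that "best match" means the vocabulary entry whose phoneme sequence has the smallest edit distance to the spoken sequence; real engines combine acoustic and language models, so this is an illustration only. The phoneme spellings and both lexicons are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def best_match(phonemes, lexicon):
    """Return the lexicon word whose phoneme sequence is closest."""
    return min(lexicon, key=lambda word: edit_distance(phonemes, lexicon[word]))


# Hypothetical pseudo-phoneme lexicons.
general = {
    "steps": ["s", "t", "e", "p", "s"],
    "staples": ["s", "t", "ey", "p", "l", "z"],
}
medical = {"stapes": ["s", "t", "ey", "p", "iy", "z"]}

spoken = ["s", "t", "ey", "p", "iy", "z"]  # the surgeon says "stapes"

without_medical = best_match(spoken, general)
with_medical = best_match(spoken, {**general, **medical})
```

Without the medical entry, the closest available word is a similar-sounding everyday term, which then has to be corrected by hand; with the entry present, the spoken term is matched exactly.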

With the introduction of standardized and individual medical vocabulary databases, the error rates drop markedly. It must be noted that the results with the consultation notes text were slightly better with the individual vocabulary because the system had been trained beforehand with more samples of this type of text than with samples of operation reports (53 vs. 15). As for the control, we assumed that these text samples would have to be entered into the medical records database by a surgeon rather than a professional typist, primarily because most of the text samples described here serve as a direct replacement for handwritten notes (e.g., consultation on an inpatient referred from a different medical specialty); otherwise, the transfer of the document to a professional typist, return for revision, etc. would be too time-consuming. However, as electronically editable text is due to replace handwritten notes, we did not compare the time required for speech recognition or conventional typing with the time required for handwriting. If the average A4 text takes 26.5 min for manual typing, as assumed above, a productivity gain is achieved as soon as the error rate of the speech recognition system drops below 16% (figure). According to our findings, this was the case under all conditions when medical vocabulary was added to the database.
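One way to explore the break-even idea is to fit a straight line through the nine (error rate, total time) pairs in the table and solve for the error rate at which the total time equals a reference time. This is an illustrative reconstruction under a linearity assumption; the article does not state how the threshold was derived, and the function names are hypothetical.

```python
# (error rate in %, total dictation + revision time in sec) from the table.
points = [
    (38.72, 3810), (8.27, 1215), (5.45, 1130),   # operation reports
    (27.77, 3190), (9.54, 1560), (5.21, 985),    # consultation notes
    (10.65, 1515), (13.69, 1935), (9.89, 1550),  # patient information
]


def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def break_even_error_rate(points, reference_sec):
    """Error rate (%) at which the fitted total time equals the reference."""
    slope, intercept = fit_line(points)
    return (reference_sec - intercept) / slope


# Reference times for the manually typed control (sec).
typing_only = break_even_error_rate(points, 1590)      # typing alone
typing_plus_revision = break_even_error_rate(points, 1995)  # typing + revision
```

Under this assumption, using the control's total of 1,995 sec as the reference gives a break-even close to the 16% quoted in the text, whereas typing time alone gives a noticeably lower value. The choice of reference time therefore matters when interpreting the threshold.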

[FIGURE OMITTED]

Discussion

When comparing our results with data in the current literature, we must take into account the pace at which hardware and software continue to improve. The first basic speech recognition models had a very limited vocabulary database, and an increase in productivity was found only under certain circumstances. (1-4)

Rosenthal et al looked at the total amount of time it took for a medical report to leave their radiology department. (4) They found that with a speech recognition system, the time required to finalize reports prepared by residents, whose work had to be revised by a fully trained radiologist, declined from an average of 62 hours to 24 hours. By contrast, radiology consultants, whose work did not have to be reviewed, were able to transmit reports out of the department in only 13 minutes.

As speech recognition technology advanced, Vorbeck et al recorded an average error rate of 5.5% for all radiology reports, which led to a 14 to 24% reduction in overall editing time (mean: 19%). (5) Happe et al linked their speech recognition system to a medical records database for discharge summaries and found a speech recognition rate of 98%. (7) Al-Aynati and Chorneyko compared the accuracy of speech recognition and conventional transcription in 200 pathology laboratory reports and calculated that the mean accuracy of computer software was 93.6% (range: 87.4 to 96%), compared with 99.6% for human transcription (range: 99.4 to 99.8%). (8) However, the time required to edit speech recognition text was 1.4 to 3.5 times greater than the time required to edit human transcription by a professional typist. Al-Aynati and Chorneyko concluded that speech recognition systems are inferior to conventional text editing when professional typists are available, but they offer an alternative in places where these services do not exist.

In contrast, Langer found that the use of speech recognition technology led to an increase of approximately 2.3 reports per day over conventionally edited reports in a radiology department. (9) Groschel et al found that use of both a mobile and a stationary personal computer resulted in recognition rates in the range of 80 to 89% in the emergency setting. (10) Yet there are few reports on the implementation of speech recognition software in clinical departments. Mohr et al performed a randomized, single-blind trial in which resident physicians and psychiatrists dictated their notes via a telephone network to a central server; from there, the audio data files were randomly allocated to either speech recognition software or to a professional typist. (6) The mean speech recognition rate by the software was 84.5% (range: 55 to 95%), which resulted in no increase in productivity. However, the drawback of that single-blind trial was that the medical professionals had no opportunity to revise their documents by means of the speech recognition software. According to our study, training the software to each speaker's voice model is essential to reducing error rates.

Secondary productivity gains are less often the subject of current publications because they are more difficult to assess. However, it must be taken into account that, in practice, a rapid transfer of reports can help reduce hospitalization time for patients with multiple disorders, whose discharge depends on many interdisciplinary consultations. The general availability of information-technology--based data reduces the workload in quality management programs. Furthermore, the omission of illegible handwritten reports precludes the need for oral confirmation and cross-checking while providing a solid base for medicolegal requirements.

In conclusion, computer-based speech recognition programs can increase productivity, particularly in an environment where individual medical reports with a specialized vocabulary are issued frequently and professional typists are unavailable. In addition to suitable hardware, a vocabulary database adapted to the specific task is essential. Furthermore, clinical staff must train the speech recognition system to their individual voices over a period of 20 to 30 sessions before satisfactory results can be obtained. As other medical documentation tasks increase the workload of medical staff, this new technology cannot be taken for granted and should be discussed with the professionals concerned before implementation.

References

(1.) O'Hara SP, Bryant TN, Oji EC, Rowe DJ. Speech recognition and the clinical microbiology laboratory. Med Lab Sci 1992;49:20-6.

(2.) Mrosek B, Grunupp A, Keppel E, et al. [Computer-assisted speech recognition and display of x-ray findings]. Rofo 1993;159:481-3.

(3.) O'Hara SP, Athersuch R. Speech recognition and direct data entry in clinical microbiology. Br J Biomed Sci 1996;53:209-13.

(4.) Rosenthal DI, Chew FS, Dupuy DE, et al. Computer-based speech recognition as a replacement for medical transcription. AJR Am J Roentgenol 1998;170:23-5.

(5.) Vorbeck F, Ba-Ssalamah A, Kettenbach J, Huebsch E. Report generation using digital speech recognition in radiology. Eur Radiol 2000;10:1976-82.

(6.) Mohr DN, Turner DW, Pond GR, et al. Speech recognition as a transcription aid: A randomized comparison with standard transcription. J Am Med Inform Assoc 2003;10:85-93.

(7.) Happe A, Pouliquen B, Burgun A, et al. Automatic concept extraction from spoken medical reports. Int J Med Inform 2003;70:255-63.

(8.) Al-Aynati MM, Chorneyko KA. Comparison of voice-automated transcription and human transcription in generating pathology reports. Arch Pathol Lab Med 2003;127:721-5.

(9.) Langer SG. Impact of speech recognition on radiologist productivity. J Digit Imaging 2002;15:203-9.

(10.) Groschel J, Philipp F, Skonetzki S, et al. Automated speech recognition for time recording in out-of-hospital emergency medicine--An experimental approach. Resuscitation 2004;60:205-12.

Justus Ilgner, MD; Philip Duwel, MD; Martin Westhofen, MD

From the Department of Otorhinolaryngology and Plastic Head and Neck Surgery, RWTH Aachen University, Aachen, Germany.

Reprint requests: Justus Ilgner, MD, Department of Otorhinolaryngology and Plastic Head and Neck Surgery, RWTH Aachen University, Pauwelsstrasse 30, 52057 Aachen, Germany. Phone: 49-241-80-88946; fax: 49-241-80-82523; e-mail: jilgner@ukaachen.de

Table. Time required for document dictation and revision and speech recognition error rate in relation to the type of document and the type of vocabulary database

                                                                           With
                                                          With     standardized
                                      Without     standardized  plus individual
                                      medical          medical          medical
                                   vocabulary       vocabulary       vocabulary        Control *

Operation reports
Dictation time (sec)                      450              375              440            1,590
Revision time (sec)                     3,360              840              690              405
Total (sec)                             3,810            1,215            1,130            1,995
Error rate (%) ([dagger])               38.72             8.27             5.45

Consultation notes
Dictation time (sec)                      510              470              390
Revision time (sec)                     2,680            1,090              595
Total (sec)                             3,190            1,560              985
Error rate (%)                          27.77             9.54             5.21

Patient information
Dictation time (sec)                      355              375              450
Revision time (sec)                     1,160            1,560            1,100
Total (sec)                             1,515            1,935            1,550
Error rate (%)                          10.65            13.69             9.89

* Manually typed.

([dagger]) The error rate of the speech recognition system was calculated as the ratio of incorrectly recognized words to the total number of words, multiplied by 100.
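
The error-rate definition in the footnote can be written out as a small calculation. The function `error_rate_percent` follows the footnote directly. The helper `count_incorrect` is an assumption about how "incorrectly recognized words" might be counted (reference words left unmatched after aligning the two texts with Python's `difflib`), since the article does not describe the counting procedure. The sample sentences are invented.

```python
from difflib import SequenceMatcher


def error_rate_percent(incorrect_words, total_words):
    """Ratio of incorrectly recognized words to total words, times 100."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * incorrect_words / total_words


def count_incorrect(reference_words, recognized_words):
    """Count reference words with no match in the recognized text.

    Assumption: words are aligned with difflib.SequenceMatcher; the
    study's own counting method is not specified.
    """
    matcher = SequenceMatcher(a=reference_words, b=recognized_words,
                              autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return len(reference_words) - matched


# Hypothetical dictated sentence and its recognized version.
reference = "the stapes was mobilized and the prosthesis placed".split()
recognized = "the staples was mobilized and the process placed".split()

incorrect = count_incorrect(reference, recognized)
rate = error_rate_percent(incorrect, len(reference))
```

In this toy sentence, two of eight words are misrecognized, so `rate` is 25.0.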
COPYRIGHT 2006 Vendome Group LLC
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2006, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Westhofen, Martin
Publication:Ear, Nose and Throat Journal
Geographic Code:1USA
Date:Aug 1, 2006