
Monaural performance intensity functions of average esophageal speech.


Laryngectomy, which involves the surgical removal of the larynx, causes the respiratory tract to be separated from the vocal tract. Consequently, breathing now occurs via the tracheostoma, an opening that is created by attaching the trachea to the skin of the neck. After laryngectomy, one of the most important objectives is voice restoration. Currently, there are three methods to achieve voice rehabilitation in these patients: esophageal voice, tracheoesophageal voice and artificial larynx. (1) The artificial larynx is recommended only when the patient is unable to achieve esophageal voice.

Alaryngeal voice production is comparable to laryngeal voice production, because both rely on the combination of a driving force and vibrating tissue. In esophageal voicing, air is injected from the oral cavity into the esophagus, thus insufflating the esophagus beneath the neo-glottis, which is the new voice source. This injected air is then released and causes the neo-glottis to vibrate. (1) The neo-glottis is situated at the entrance to the esophagus and is formed by the same structures as the upper esophageal sphincter. Thus, the source of vibration is composed of mucosa and musculature normally present in this area, such as the cricopharyngeal muscle and the pharyngeal constrictor muscles. (2)

Compared to the larynx, the alaryngeal voice source can at best be described as a grossly controlled structure, and its control may not be as consistent as control of the larynx. It therefore seems reasonable to expect that the auditory quality of alaryngeal speech will be negatively affected.

Both laryngeal and esophageal speakers have a vocal tract to shape speech sounds and a voice source that may be controlled to a greater or lesser degree. However, the intelligibility of alaryngeal speech is known to be poor. Although the voice that is produced may contain the same features as the laryngeal voice, it also has different acoustic characteristics, which influence speech quality and intelligibility, among other properties.

Previous research on word recognition scores for esophageal and normal speakers reveals that the intelligibility of esophageal speech is significantly inferior to that of normal speech. (1,3,4) In comparison to normal laryngeal speech, the average intensity in esophageal speech tends to be lower. (5) In a perceptual study by Williams and Watson (1987), tracheoesophageal and esophageal speech were rated more poorly than normal speech on parameters such as quality and noise, intelligibility, pitch and speaking rate. (6) Also, alaryngeal speech is generally noisier than normal speech, and features such as fundamental frequency (F0) and intensity are less stable.

At present, no study has compared esophageal and normal laryngeal speech in the Greek-speaking population. Thus, the purpose of this paper is to compare the speech intelligibility of esophageal versus normal voice in native Greek speakers. The specific aim of the present study was to compare monaural performance-intensity functions of average esophageal and normal speech in quiet and at various signal-to-noise ratios (SNR) (0, +3, +6, +9, +12 dBHL).



The individuals who participated in this investigation were twenty adult listeners who were native speakers of Modern Greek (10 males and 10 females; mean age = 26.30, SD = 2.74). All subjects had pure-tone thresholds of ≤ 15 dBHL at all octave frequencies from 250 Hz to 8000 Hz, with no known history of auditory dysfunction or neurological disorder. All subjects were unfamiliar with alaryngeal speech.

Selection of speech materials

The duration of words or speech sounds depends on the air supply, in that sounds can only be lengthened if there is a sufficient amount of air to sustain production of a (prolonged) word or speech sound. In contrast to normal laryngeal speakers, who can have an air supply of approximately 3 liters, the air supply available to esophageal speakers is limited to small volumes of approximately 80 milliliters. (7,8) Esophageal speakers further differ from laryngeal speakers with regard to timing. Not only is the maximum phonation time shorter, but the number of syllables produced per phrase is also far lower in esophageal speakers. (5) Therefore, words, rather than longer speech stimuli, were selected.





Stimulus materials

Normal Speech

The speech stimuli administered for both conditions in this experiment were four phonemically balanced lists of 50 bisyllabic words each, recorded with a male voice. (9)

Esophageal Speech

For the esophageal speech stimuli, all 200 words were recorded in an Industrial Acoustics Company booth meeting ANSI S3.1 standards by one 40-year-old male esophageal speaker who had been enrolled in intensive individual speech therapy for three months. His speech was judged to represent average esophageal speech proficiency (on a three-point scale of proficiency: good, average, poor) by two speech pathologists experienced in esophageal speech instruction.


The type of noise employed in the present study was speech noise.

Recording of Esophageal Speech-Stimuli

Each word was produced several times, two judges (speech pathologists) rated the repetitions of each word for perceived quality of production, and the best production of each word was selected. The words were digitized at a sampling frequency of 44.1 kHz with 16-bit resolution. Each word was brought to an equivalent overall loudness level by editing.


Testing was conducted in a sound-isolated booth. The signal was routed from a PC to a GSI 61 audiometer, and from the audiometer to the subject via supra-aural TDH-49 headphones. The order of presentation of the words within each list was randomized for each subject and for each intensity level. An open-set response strategy was employed.


Each list was presented monaurally (right ear) starting at 0 dBHL and ascending in 5 dBHL steps until 100% recognition was achieved.


Each list was presented monaurally (right ear) at 55 dBHL at various SNRs (0, +3, +6, +9, +12 dBHL).


Following monaural presentation of the lists in quiet and at various signal-to-noise ratios, the percent-correct scores were used to construct performance-intensity (PI) functions. The PI functions for mean scores and standard deviations for the men and women in the quiet condition are shown in Figures 1 and 2 respectively. In addition, the PI functions for mean scores and standard deviations for the men and women under various signal-to-noise ratios are shown in Figures 3 and 4 respectively.
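As an illustration of how a PI function of this kind can be tabulated, the following minimal Python sketch computes the mean and standard deviation of percent-correct scores at each presentation level. The function name `pi_function` and the example scores are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def pi_function(scores_by_level):
    """Return (level, mean %, SD) tuples sorted by presentation level.

    scores_by_level maps a presentation level (dB HL) to the list of
    percent-correct scores obtained by the individual listeners.
    """
    rows = []
    for level in sorted(scores_by_level):
        scores = scores_by_level[level]
        # A sample SD needs at least two listeners; fall back to 0 otherwise.
        sd = stdev(scores) if len(scores) > 1 else 0.0
        rows.append((level, mean(scores), sd))
    return rows


# Hypothetical scores for three listeners (NOT data from this study).
example = {
    10: [20.0, 30.0, 40.0],
    0: [0.0, 4.0, 2.0],
    5: [10.0, 10.0, 10.0],
}

pi = pi_function(example)
for level, avg, sd in pi:
    print(f"{level:>3} dB HL  mean = {avg:5.1f}%  SD = {sd:4.1f}")
```

Plotting the mean column against level, with the SD as error bars, gives the kind of curve the figures describe.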

Results indicated that listeners performed significantly better in quiet than in noise. However, word recognition scores for the esophageal speech stimuli were significantly poorer in both conditions.

Statistical analysis

Word recognition scores were assessed in quiet and under 5 different SNRs (SNR = 0, +3, +6, +9, +12 dBHL). These repeated measures were analyzed with mixed models with random effects, using pseudo (dummy) variables. The results of the statistical analysis for the quiet condition and under the different SNRs are presented in Tables 1 and 2 respectively.

The variables "Esophageal Speech" and "Gender" are pseudo variables that take only the values 0 and 1 as follows: Esophageal Speech = 0 for normal speech and = 1 for esophageal speech. Gender = 0 for males and = 1 for females.
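Reading the terms off the variable lists in Tables 1 and 2, the fitted model for the quiet condition can be written roughly as follows. This is a sketch under an assumption: the text says only "mixed models with random effects", so the listener-level random intercept $u_i$ is assumed here rather than taken from the source.

```latex
% V = voice volume (dB), E = Esophageal Speech (0/1), G = Gender (0/1)
% i indexes listeners, j indexes repeated measurements within a listener
y_{ij} = \beta_0 + \beta_1 V_{ij} + \beta_2 V_{ij}^{2}
       + \beta_3 V_{ij} E_{ij} + \beta_4 V_{ij}^{2} E_{ij}
       + \beta_5 G_i + u_i + \varepsilon_{ij}
```

The noise condition has the same form with noise volume in place of $V$. Because $E$ enters only through the interaction terms, the normal and esophageal curves share an intercept and differ in their linear and quadratic dependence on level.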

All the p-values in Tables 1 and 2, except the one for gender in Table 2, are smaller than 0.05, and thus the corresponding variables are statistically significant. Regarding gender, women showed better perception than men in the quiet condition, whereas in noise, as mentioned, there was no significant difference.


The results of this study revealed that the monaural intelligibility of Greek esophageal voice is generally poorer than the intelligibility of normal speech in both conditions. Observation of the intelligibility functions for esophageal and normal speech in Figures 1 and 2 reveals significant differences. However, 100% monaural recognition for the esophageal speech in the quiet condition was obtained at higher intensities. Although this is true of the present study, several authors have commented on the general restriction in intensity variation and the overall reduction in average speech level associated with esophageal speech production. (5,10-12)

In addition, significant differences emerged between esophageal and normal speech in the presence of noise. Although at +12 dBHL SNR normal speech was about 85% intelligible for both groups, esophageal speech was only about 34% intelligible. Therefore, the esophageal speaker has a less desirable SNR than the normal speaker, because of reduced volume and possible extraneous noises associated with breathing from the stoma and air intake. (1)

Earlier studies demonstrated that subjects with normal hearing and hearing loss discriminate a speech signal better binaurally than monaurally in the presence of noise. (13,14,15) Normal listeners showed an average improvement of 20% in word intelligibility under binaural conditions. (13) Additionally, hearing-impaired listeners with hearing aids demonstrated a 10% increase under roughly similar test conditions. (14) In this light, binaural esophageal-word recognition scores are expected to be higher for both conditions and groups employed in this study.

The results of this study do not imply that monaural word recognition scores are the most appropriate standard for evaluating the intelligibility of esophageal speech, since the literature on speech perception shows that other variables, such as contextual cues and visual cues, also contribute to the intelligibility of speech. (16-18) However, monaural word recognition testing can be adopted clinically to determine whether an esophageal speaker has achieved sufficient speech intelligibility to function in a normal speaking world.

Conflict of interest: None declared.


(1.) Boone DR, McFarlane SC, Von Berg SL, Zraick RI. Voice and Voice Therapy. Eighth Edition. Boston, MA: Allyn & Bacon Inc, 2009.

(2.) Weissenbruch R. Voice restoration after total laryngectomy. [Dissertation]. Netherlands: University of Groningen; 1996.

(3.) Tikofsky RS. A comparison of the intelligibility of esophageal and normal speakers. Folia Phoniat 1965;17: 19-32.

(4.) Black JW, Haagen CH. Multiple-choice intelligibility tests, Forms A. and B. Journal Speech Hearing Dis 1963; 28: 77-86.

(5.) Robbins JA, Fisher HB, Blom ED, Singer M. A comparative acoustic study of normal, esophageal and tracheoesophageal speech production. Journal of Speech and Hearing Disorders 1984;49:202-210.

(6.) Williams SE, Watson JB. Speaking proficiency variations according to method of alaryngeal voicing. Laryngoscope 1987;97:737-9.

(7.) Van Den Berg J, Moolenaar-Bijl AJ. Crico-pharyngeal sphincter, pitch, intensity and fluency in oesophageal speech. Pract Otorhinolaryngol (Basel). 1959 Jul; 21(4): 298-315.

(8.) Casper C, Colton R. Vocal rehabilitation. A physiological perspective for diagnosis and treatment. In: Butler JP, Editor. Understanding voice problems. Baltimore, Maryland: Williams and Wilkins; 1996. p 270-316.

(9.) Trimmis N, Papadeas E, Papadas T, Naxakis S, Papathanasopoulos P, Goumas P. Speech Audiometry: The Development of Modern Greek Word Lists for Suprathreshold Word Recognition Testing. The Mediterranean Journal of Otology 2006;2(3):117-126.

(10.) Drummond S. The effects of environmental noise on pseudovoice after laryngectomy. J. Laryng. 1965; 79:193-202.

(11.) Hyman M. An experimental study of artificial larynx and esophageal speech. J Speech Hear. Dis. 1955;20:291-299.

(12.) Diedrich WM. The mechanism of esophageal speech. Sound Production in Man. Annals of the New York Academy of Sciences 1968; 155: 303-317.

(13.) Chappell RG, Kavanagh JF, Zerlin S. Monaural versus binaural discrimination for normal listeners. Journal of Speech and Hearing Research 1963; 6: 263-26.

(14.) Jerger J, Carhart R, Dirks D. Binaural hearing aids and speech intelligibility. Journal Speech Hearing Res. 1961; 4: 137-148.

(15.) Gelfand SA, Hochberg I. Binaural and Monaural Speech Discrimination under Reverberation. Audiology 1976;15:72-84.

(16.) Miller GA, Heise GA, Lichten W. The intelligibility of speech as a function of the context of the test materials. In: Ventry IM, Chaiken JB, Dixon RF, Editors. Hearing Measurement: A Book of Readings. New York: Appleton-Century Crofts; 1971.

(17.) Jeffers J, Barley M. Speechreading. Springfield, IL: Charles C Thomas; 1971.

(18.) Hoops HR, Noll JD. The effects of listener sophistication on judgments of esophageal speech. J. Commun. Dis. 1971;3:250-260.

Corresponding author: Nikolaos Trimmis, PhD CCC A/SLP

Assistant Professor of Speech Pathology & Audiology

Technological Educational Institute of Patras

School of Health & Welfare Professions

Dept. of Speech and Language Therapy

M. Alexandrou 1--Koukouli

26334 Patras, Greece

Tel # +30-2610-325275 (Office)

Tel # +30-2610-322812 (Clinic)

E-Mail :

N. Trimmis [1], S. Papadopoulos [2]

[1] Technological Educational Institute of Patras, Greece

[2] Democritus University of Thrace, Komotini, Greece

Table 1. Quiet condition: Regression Results for Mixed Models with Random Effects.

Dependent Variable: y = % perception

Variable                                  Estimated      Standard    p-value
                                          Coefficient    Error

Voice Volume (dB)                            5.31         0.17       <0.0001
(Voice Volume)^2                            -0.0577       0.0026     <0.0001
(Voice Volume) * (Esophageal Speech)        -2.83         0.16       <0.0001
(Voice Volume)^2 * (Esophageal Speech)       0.0491       0.0031     <0.0001
Gender                                       4.41         1.54        0.0045
Intercept                                  -14.82         2.16        0.0001
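
To make the Table 1 coefficients easier to interpret, the following minimal Python sketch evaluates the fixed-effects part of the quiet-condition model at a given presentation level. The function name `predicted_quiet_score` is hypothetical, and clipping the prediction to the 0-100% range is an assumption added here, since a quadratic fit can overshoot that range; the coefficients themselves are copied from the table.

```python
# Coefficients copied from Table 1 (quiet condition).
INTERCEPT = -14.82
B_LEVEL = 5.31           # Voice Volume (dB)
B_LEVEL_SQ = -0.0577     # (Voice Volume)^2
B_LEVEL_ESO = -2.83      # (Voice Volume) * (Esophageal Speech)
B_LEVEL_SQ_ESO = 0.0491  # (Voice Volume)^2 * (Esophageal Speech)
B_GENDER = 4.41          # Gender (0 = male, 1 = female)


def predicted_quiet_score(level_db, esophageal, female):
    """Fixed-effects prediction of percent recognition in quiet.

    Random effects are ignored, and the result is clipped to 0-100
    (the clipping is an assumption of this sketch, not of the paper).
    """
    e = 1 if esophageal else 0
    g = 1 if female else 0
    y = (INTERCEPT
         + B_LEVEL * level_db
         + B_LEVEL_SQ * level_db ** 2
         + B_LEVEL_ESO * level_db * e
         + B_LEVEL_SQ_ESO * level_db ** 2 * e
         + B_GENDER * g)
    return max(0.0, min(100.0, y))


for level in (10, 20, 30, 40):
    normal = predicted_quiet_score(level, esophageal=False, female=False)
    eso = predicted_quiet_score(level, esophageal=True, female=False)
    print(f"{level} dB: normal = {normal:.1f}%, esophageal = {eso:.1f}%")
```

For a male listener at 30 dB, for example, the coefficients give about 92.6% for normal speech and about 51.8% for esophageal speech, which illustrates how the interaction terms flatten the esophageal curve.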

Table 2. With noise and speech signal at 55 dBHL: Regression Results for Mixed Models with Random Effects.

Dependent Variable: y = % perception

Variable                                  Estimated      Standard    p-value
                                          Coefficient    Error

Noise Volume (dB)                          -23.3          3.31       <0.0001
(Noise Volume)^2                             0.171        0.034      <0.0001
(Noise Volume) * (Esophageal Speech)        -5.15         0.25       <0.0001
(Noise Volume)^2 * (Esophageal Speech)       0.091        0.0049     <0.0001
Gender                                       0.72         1.02        0.48
Intercept                                  772           81           0.0001
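
The Table 2 coefficients can be checked against the scores reported in the text with a short Python sketch. It assumes that "Noise Volume" is the noise level obtained by subtracting the SNR from the 55 dBHL speech level; the paper does not state this explicitly, but the assumption reproduces the roughly 85% and 34% figures quoted for the +12 SNR condition. The function names are hypothetical.

```python
SPEECH_LEVEL_DB = 55         # speech presentation level stated in the text
SNRS_DB = [0, 3, 6, 9, 12]   # signal-to-noise ratios stated in the text

# Coefficients copied from Table 2 (noise condition).
INTERCEPT = 772.0
B_NOISE = -23.3          # Noise Volume (dB)
B_NOISE_SQ = 0.171       # (Noise Volume)^2
B_NOISE_ESO = -5.15      # (Noise Volume) * (Esophageal Speech)
B_NOISE_SQ_ESO = 0.091   # (Noise Volume)^2 * (Esophageal Speech)
B_GENDER = 0.72          # Gender (0 = male, 1 = female)


def noise_level(snr_db):
    """Noise level implied by an SNR at the fixed speech level (assumed)."""
    return SPEECH_LEVEL_DB - snr_db


def predicted_noise_score(snr_db, esophageal, female=False):
    """Fixed-effects prediction of percent recognition in noise."""
    n = noise_level(snr_db)
    e = 1 if esophageal else 0
    g = 1 if female else 0
    return (INTERCEPT
            + B_NOISE * n
            + B_NOISE_SQ * n ** 2
            + B_NOISE_ESO * n * e
            + B_NOISE_SQ_ESO * n ** 2 * e
            + B_GENDER * g)


for snr in SNRS_DB:
    normal = predicted_noise_score(snr, esophageal=False)
    eso = predicted_noise_score(snr, esophageal=True)
    print(f"SNR +{snr}: noise = {noise_level(snr)} dB, "
          f"normal = {normal:.1f}%, esophageal = {eso:.1f}%")
```

At +12 SNR (noise at 43 dB) the sketch gives about 86% for normal speech and about 33% for esophageal speech for a male listener. Because the fit is quadratic, it should not be extrapolated outside the tested SNR range.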
COPYRIGHT 2009 Renaissance Medical Publishing
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2009 Gale, Cengage Learning. All rights reserved.

Article Details
Author:Trimmis, N.; Papadopoulos, S.
Publication:Archives: The International Journal of Medicine
Article Type:Report
Geographic Code:4EUGR
Date:Oct 1, 2009