
Audio and visual cues in a two-talker divided attention speech-monitoring task.

INTRODUCTION

In recent years, improvements in data transmission technology have dramatically reduced the cost of telecommunications bandwidth. To this point, however, relatively little effort has been made to exploit this low-cost bandwidth in improved speech communications systems. In part, this apparent oversight reflects the fact that standard telephone-grade audio speech (with a bandwidth of roughly 3500 Hz) already produces near 100% intelligibility for typical telephone conversations involving a single talker in a quiet listening environment. There is, however, ample opportunity for higher-bandwidth speech communication systems to improve performance in complex listening tasks that involve more than one simultaneous talker. High-bandwidth multichannel speech communication systems could have a wide variety of applications, ranging from simple three-way conference calling to sophisticated command and control tasks that require listeners to monitor and respond to time-critical information that could be present in any one of a number of simultaneously presented competing speech messages.

A question of practical interest, therefore, is how additional bandwidth could best be allocated to improve the effectiveness of multichannel speech communications systems. The most obvious approaches to this problem involve the restoration of the audio and visual cues that listeners rely on to segregate speech signals in real-world multitalker listening environments, such as crowded restaurants and cocktail parties. For example, listeners in the real world rely on interaural differences between the audio signals reaching their left and right ears to help them segregate the voices of spatially separated talkers (see Bronkhorst, 2000, for a recent review of this phenomenon). When these binaural cues are restored to a speech communication signal by adding a second independent audio channel to the system, multitalker listening performance improves dramatically (Abouchacra, Tran, Besing, & Koehnke, 1997; Crispien & Ehrenberg, 1995; Ericson & McKinley, 1997; Nelson, Bolia, Ericson, & McKinley, 1999).

Additional bandwidth could also be used to restore the visual speech cues that are normally available in face-to-face conversations. These cues make it possible to extract some information from visual-only speech stimuli (a process commonly known as speechreading; Summerfield, 1987), and they contribute substantially to audiovisual (AV) speech perception when the audio signal is distorted by the presence of noise (Sumby & Pollack, 1954) or interfering speech (Rudmann, McCarley, & Kramer, 2003).

From earlier experiments, it is clear that multitalker listening performance can be improved both by the addition of binaural spatial audio cues and by the addition of visual speech information. However, relatively little is known about how audio and visual information might interact in high-bandwidth multichannel AV speech displays. Important research issues related to this topic include the following:

Divided attention versus selective attention in AV speech perception. An essential underlying assumption in the design of a multitalker speech display is that neither the system nor the operator will have reliable a priori knowledge about which talker will provide the most important information at any given time. (Otherwise, either the system or the operator would simply turn off the uninformative talkers.) Consequently, it is important to know how well listeners are able to divide their attention across the different talkers in an AV speech stimulus in order to extract important information that might originate from any one of the competing speech signals. However, virtually all experiments that have examined AV speech perception with more than one simultaneous audio speech signal (Driver, 1996; Driver & Spence, 1994; Reisberg, 1978; Rudmann et al., 2003; Spence, Ranson, & Driver, 2000) have examined performance in a selective attention paradigm in which the participants were provided with a priori information about which talker to attend to and which talker to ignore prior to the presentation of each stimulus. This makes it difficult to determine how visual cues might influence performance in situations in which listeners must rely on the content of the competing speech messages to determine where the most important information resides.

AV speech perception with multiple visible talkers. Although a number of studies have examined AV speech perception with multiple simultaneous talkers, most have been limited to cases in which only a single talker was visible at any given time (Driver, 1996; Driver & Spence, 1994; Reisberg, 1978; Rudmann et al., 2003; Spence et al., 2000). Thus it is not clear how well listeners might be able to divide their visual attention across two visible faces in a multitalker AV speech stimulus.

Semantic AV incongruencies. When visual speech stimuli are presented in conjunction with mismatched audio speech stimuli, cross-modal interactions can substantially distort the overall perception of the multimodal stimulus. One classic example of this is the McGurk effect, which causes listeners who see one word spoken and hear another word spoken to report the perception of a third word that was not presented in the stimulus (e.g., they report hearing an "ada" sound when they see a talker saying "aga" and hear a talker saying "aba"; McGurk & MacDonald, 1976). Although it is unlikely that an AV speech display would intentionally present mismatched audio and visual signals, it is possible that AV interactions such as the McGurk effect could cause visual cues produced by one talker to interfere with the perception of the audio speech signals produced by one of the other competing talkers in the AV stimulus.

Spatial AV incongruencies. Another cross-modal interaction that has been reported in AV speech perception is the so-called ventriloquist effect, in which listeners hear speech sounds at the location of a visual representation of the talker's face even when that face is spatially displaced from the origin of the audio signal (Bertelson & Radeau, 1976). The ventriloquist effect can influence AV speech perception in listening configurations in which the audio and visual portions of the target speech stimulus are spatially separated.

Spence and his colleagues have shown that AV speech perception is better when the audio and visual signals are colocated than when they are spatially separated (Driver & Spence, 1994) and that interfering speech signals are more difficult to ignore when they are located at the point of visual fixation (Spence et al., 2000). Similarly, Driver (1996) has shown that performance in an AV speech perception task that combines a visual representation of the target talker's face with an audio mixture containing both the target and masking talkers' voices improves when the visual representation of the face is spatially displaced from the audio mixture, presumably because ventriloquism causes the apparent location of the target voice to shift toward the location of the face and away from the location of the masking voice. These results are important in the design of AV multitalker speech displays because they suggest that the relative spatial locations of the audio and visual portions of a multimodal speech stimulus can substantially influence the overall perception of that stimulus.

In this paper, we present the results of an experiment that attempted to address some of these research issues by examining performance in a two-talker divided attention AV listening task with all possible combinations of three levels of audio fidelity (no audio, one-channel audio, and two-channel audio) and three levels of video fidelity (no video, one visible talker, and two visible talkers). When applicable, performance was also compared between configurations in which the audio and visual portions of the AV stimuli were presented at the same locations (spatially congruent) and configurations in which they were presented at different locations (spatially incongruent). In all cases, performance was analyzed under the assumption that the target information was equally likely to originate from either of the two talkers in the stimulus. The results are discussed in terms of their implications for the design of multitalker AV speech displays.

METHOD

Participants

Eight volunteers (5 men and 3 women) were paid to participate in the experiment. All had normal hearing (<15 dB HL from 500 Hz to 8 kHz), and their ages ranged from 21 to 55 years. All of the participants had taken part in previous experiments that utilized speech materials similar to those used in this experiment.

Audio and Visual Speech Materials

The experiment was conducted with an audiovisual speech corpus based on the Coordinate Response Measure (CRM), a call-sign-based color and number identification task developed by the Air Force Research Laboratory for use in multitalker communications research. The CRM, which has been used in a number of studies of multitalker speech perception with audio-only stimuli (Brungart, Simpson, Ericson, & Scott, 2001), is ideally suited to testing the ability of listeners to divide their attention across two simultaneous stimuli because it requires them to identify the target phrase by listening for a preassigned call sign contained within that phrase.

The CRM speech materials used in this experiment consisted of phrases of the form "Ready (call sign) go to (color) (number) now" spoken with all possible combinations of two call signs (Ringo and Baron), four colors (blue, green, red, white), and eight numbers (1-8). Thus a typical example of a CRM sentence would be the phrase "Ready Baron go to blue five now."
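The size of this stimulus set follows directly from the combinatorics of the corpus. The short Python sketch below simply enumerates the phrase combinations described above; the lists and variable names are ours, introduced only for illustration.

```python
from itertools import product

call_signs = ["Ringo", "Baron"]
colors = ["blue", "green", "red", "white"]
numbers = range(1, 9)                      # the digits 1-8

# One phrase for every call sign x color x number combination.
phrases = [
    f"Ready {cs} go to {color} {number} now"
    for cs, color, number in product(call_signs, colors, numbers)
]

print(len(phrases))      # 64 unique phrases per talker
print(2 * len(phrases))  # 128 recordings for the two talkers
print(phrases[9])        # e.g., "Ready Ringo go to green 2 now"
```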

The CRM phrases were recorded in the corner of a large anechoic chamber with a digital video camera (Sony Digital Handycam) located roughly 1.5 m in front of the talker. The talkers (1 man and 1 woman) stood in front of a black, acoustically transparent background covering the wedges on the wall of the anechoic chamber and were instructed to speak the CRM phrases in a monotone while keeping their heads as still as possible. Breaks were inserted between the CRM phrases to avoid any effects of coarticulation between consecutive recordings. The resulting videotapes were downloaded onto a PC, where they were partitioned into individual Audio Video Interleaved (AVI) files, one for each of the 128 recorded phrases. A commercially available video editor (VirtualDub) was then used to crop the frames of the AVI files around the locations of the talkers' heads, convert them from color to gray scale, and compress them with the Indeo 5.1 codec. Finally, the AVIs were time aligned by padding each file with enough repetitions of its first video frame to place the onset of the word "ready" at the same temporal position within every AVI file. The audio signals associated with each AVI file were also normalized to have the same overall root mean square level. Figure 1 shows example frames from the CRM AVIs for the male and female talkers used in this experiment.
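The time-alignment and level-normalization steps can be sketched in a few lines. This is a minimal illustration using generic scientific Python tools (NumPy and the soundfile package) rather than the video-editing software actually used; the file name, onset values, and target level are placeholders, and a mono recording is assumed.

```python
import numpy as np
import soundfile as sf  # assumed audio I/O library; any WAV reader would do

def normalize_rms(signal: np.ndarray, target_rms: float = 0.05) -> np.ndarray:
    """Scale a signal so that its overall RMS level equals target_rms."""
    rms = np.sqrt(np.mean(signal ** 2))
    return signal * (target_rms / rms)

def align_onset(signal: np.ndarray, onset: int, target_onset: int) -> np.ndarray:
    """Prepend silence so the onset of "ready" lands at the same sample
    position in every file (assumes a mono signal)."""
    pad = max(target_onset - onset, 0)
    return np.concatenate([np.zeros(pad), signal])

# Hypothetical file name and onset positions, for illustration only.
audio, fs = sf.read("crm_phrase_001.wav")
audio = normalize_rms(align_onset(audio, onset=4800, target_onset=8000))
```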

[FIGURE 1 OMITTED]

Apparatus

The experiment was conducted in a 5.5- x 5-m sound-treated conference room with the configuration shown in Figure 2. The participant was seated at the center of the long edge of a 3.3- x 1.2-m conference table located along the midline of the room. Directly across the table from the participant (roughly 1.37 m from the location of the head), a full-sized white cotton sheet was hung from the ceiling perpendicular to his or her line of sight. This sheet, which was held taut by a weighted wooden rod along its bottom edge, served as a projection screen for the visual stimuli used in the experiment. The visual stimuli were projected onto the sheet at eye level by a Boxlight CD-40m projector located on a shelf above and behind the participant's chair. The projector was connected to the SVGA output of a PC-based control computer, which used ActiveX calls to the Microsoft Windows Media Player to present the left and right visual stimuli (AVI files) at the left and right sides of a single MATLAB figure window. The system was run in an 800 x 600 video output mode, and the video files were each 150 x 150 pixels, with a separation of 118 pixels between the two faces. This resulted in an image size of 0.40 x 0.60 m for each of the two AVI frames and a separation of 0.72 m between the midpoints of the two faces on the screen. Note that this orientation results in face locations at +15° azimuth and -15° azimuth relative to the location of the participant (identical to the configuration used by Driver, 1996).
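The reported face locations of roughly ±15° azimuth follow directly from this viewing geometry; a quick check, using the 0.72-m face separation and 1.37-m viewing distance given above:

```python
import math

viewing_distance = 1.37  # m from the participant's head to the screen
face_separation = 0.72   # m between the midpoints of the two faces

# Each face sits half the separation off the participant's midline.
azimuth = math.degrees(math.atan2(face_separation / 2, viewing_distance))
print(round(azimuth, 1))  # ~14.7 degrees, i.e., roughly +/-15 degrees azimuth
```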

[FIGURE 2 OMITTED]

The loudspeakers used to generate the audio portions of the stimuli (Bose JewelCube) were mounted on stands behind the acoustically transparent screen. These speakers were adjusted to place them directly behind the mouths of the two faces projected on the front of the screen. The speakers were driven by a stereo amplifier connected to the sound card of the control PC.

Audiovisual Display Configurations

The experiment was designed to examine all possible combinations of three audio conditions and three visual conditions that could reasonably occur in a two-talker AV speech display. The audio conditions were a no-audio condition, in which neither talker's voice was presented; a one-channel audio condition, in which both of the talkers' voices were mixed together and presented from the same loudspeaker location; and a two-channel audio condition, in which each talker's voice was presented from a different loudspeaker location. The video conditions were a no-video condition, in which neither talker's face was visible; a one-channel video condition, in which only one of the talkers' faces was visible; and a two-channel video condition, in which both talkers' faces were visible. Figure 3 shows the eight AV conditions that result from different combinations of these audio and visual conditions. The third column of the figure shows the basic template of each AV condition for two arbitrary talkers, labeled A and B; in these templates, Talker A is always shown on the left. These templates illustrate each AV condition pictorially as a small box with two circled positions at the top, representing the talkers shown visually at the left and right sides of the screen, and two uncircled positions at the bottom, representing the talkers presented auditorily from the left and right loudspeaker locations (as indicated by the legend in the upper-right corner of the figure).
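Counting the displays is straightforward: the three audio levels and three video levels combine into nine cells, one of which (no audio and no video) contains no stimulus at all, leaving the eight AV conditions shown in Figure 3. A trivial sketch of this enumeration:

```python
from itertools import product

audio_levels = ["no audio", "one-channel audio", "two-channel audio"]
video_levels = ["no video", "one visible talker", "two visible talkers"]

# All audio x video combinations except the empty display with neither
# sound nor picture.
av_conditions = [
    (a, v) for a, v in product(audio_levels, video_levels)
    if (a, v) != ("no audio", "no video")
]

print(len(av_conditions))  # 8 basic AV display conditions
```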

[FIGURE 3 OMITTED]

In a normally configured ("spatially congruent") AV display, in which the audio signals for each talker originate from the same locations as the visual representations of those talkers, each of these basic AV templates corresponds to either two or four different spatial configurations, depending on whether the target talker is designated by A or B in the template and whether Talker A in the template is located on the participant's left or right side. The fourth column of Figure 3 illustrates the 20 possible congruent spatial configurations that could originate from the eight basic AV templates. In configurations with at least one audio channel and at least one video channel, it is also possible to construct "spatially incongruent" variations of each template, in which the target and/or masker voice originates from a location different from that of the corresponding face. The rightmost column of Figure 3 shows the 12 possible incongruent spatial configurations corresponding to the templates with one or more audio channels and one or more video channels. Thus the last two columns of Figure 3 show a total of 32 different spatial configurations corresponding to the eight basic AV conditions shown in the figure, with spatial configurations that were mirror images of each other labeled as "a" and "b" versions of each of 16 unique configuration conditions (labeled Conf 1-16a and Conf 1-16b). All 32 of these possible spatial configurations were tested in the experiment.

Procedure

The experiment was conducted with the participant seated in front of the video screen in the darkened conference room. In each trial, the participant was presented with a stimulus containing the audio and/or video portions of two CRM phrases spoken simultaneously by the same talker: a target phrase, selected randomly from the 32 phrases containing the call sign "Baron," and a masking phrase, selected randomly from the 21 phrases containing the call sign "Ringo" and a color and number different from that of the target phrase. These two phrases were presented auditorily at the left or right loudspeaker locations and/or visually at the left or right talker locations according to one of the 32 spatial configurations shown in Figure 3. The participants were instructed to attend to the target phrase containing the call sign "Baron" and to use the mouse to select the color and number contained in the target phrase from an array of colored digits that was displayed on the projection screen after the end of each stimulus.

The data were collected in 60-trial blocks. Each block was divided into 12 sequences of five consecutive trials with the same target talker location in one of the 32 different spatial configurations shown in Figure 3. Thus, within each block of trials, the locations of the target and masker voices and faces would remain fixed for five consecutive trials, randomly change to a new spatial configuration, remain fixed for five more consecutive trials, randomly change again, and so on until five trials had been collected in each of 12 different randomly selected spatial configurations. Over the course of the experiment, the data collection was balanced to collect an equal number of trials with each of the two target talkers (male or female) in each of the 32 spatial configurations shown in Figure 3 (i.e., an equal number of trials in the "a" and "b" versions of each of the 16 configuration conditions in the figure). Each participant completed 34 60-trial blocks, plus one additional 40-trial block that was required to equalize the number of trials collected in each of the 32 spatial configurations. Thus each participant completed a total of 13 five-trial sequences in each of the 32 possible AV configuration conditions of the experiment, for a total of 2080 trials per participant.
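The trial counts reported above are internally consistent, as the following bookkeeping sketch shows; it reproduces only the arithmetic, not the actual randomization software used to run the experiment.

```python
full_blocks = 34         # 60-trial blocks per participant
trials_per_block = 60
extra_block = 40         # one additional block to balance the design

total_trials = full_blocks * trials_per_block + extra_block
print(total_trials)      # 2080 trials per participant

configurations = 32      # spatial configurations from Figure 3
trials_per_sequence = 5  # consecutive trials per configuration
sequences_per_configuration = total_trials // (configurations * trials_per_sequence)
print(sequences_per_configuration)  # 13 five-trial sequences per configuration
```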

RESULTS

AV Display Condition

Figure 4 shows overall color and number identification performance in each of the eight spatially congruent AV display conditions tested in the experiment. These data were generated by averaging performance across all of the normal spatial configurations associated with each basic AV template in Figure 3. For example, performance in the AV condition with one audio channel and one video channel (speckled bar in the middle of Figure 4) represents mean performance across Spatial Configurations 4a, 4b, 5a, and 5b. This mean value represents the average overall level of performance that would occur if that AV display condition were used in a two-talker listening scenario in which the relevant information was equally likely to originate from either of those two talkers. In the AV conditions in which only one of the two talkers was visible (speckled bars), the black triangles show performance in the spatial configurations in which only the target talker was visible and the white triangle shows performance in the spatial configurations in which only the masking talker was visible.

[FIGURE 4 OMITTED]

The leftmost group of bars shows performance in the visual-only AV conditions, in which no audio signal was present in the stimulus. The speckled bar represents performance in the condition with only one visible talker. The black triangle illustrates performance in the spatial configurations in which the only visible talker happened to be the target talker. The participants' responses were 67% correct in this condition, indicating that they were quite good at determining both the colors and numbers contained in the CRM phrases from just the visual cues available in the stimuli.

The relatively high speechreading scores obtained in this experiment reflect the highly redundant nature of the CRM speech stimuli: With only four color and eight number alternatives, the participants were able to make reasonably accurate guesses about the contents of the CRM phrases, even without the benefit of any audio information. Of course, because only one talker was present in this AV condition, no information could be obtained from the display when the visible talker happened to be the masking talker rather than the target talker. Thus in Figure 4 the white triangle in this condition is placed at the chance level of performance (shown by the horizontal dashed line). The level of the speckled bar represents the average of the target-visible and masker-visible spatial configurations; in other words, it represents the mean performance level that would occur in a speech display that showed the face of only a single talker, who was 50% likely to be the target talker at any given time. Note that this mean performance level was substantially worse than the mean performance level that occurred in the no-audio condition in which both talkers were visible at the same time (shown by the adjacent shaded bar in the figure). This suggests that the participants in the AV condition with two visible faces were able to successfully divide their attention across the two faces and use the embedded call signs in the CRM phrases to help them selectively focus their attention on the face of the target talker.
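The level of the speckled bar can be approximated with a simple expected-value calculation. The sketch below assumes that chance performance corresponds to guessing one of the 32 possible color-number combinations and uses the 67% target-visible score reported above.

```python
p_target_visible = 0.67  # correct responses when the visible talker was the target
p_chance = 1 / (4 * 8)   # guessing one of 4 colors x 8 numbers (~3%)

# With a single visible talker who is equally likely to be the target or
# the masker, expected overall performance is the average of the two cases.
expected_score = 0.5 * p_target_visible + 0.5 * p_chance
print(round(expected_score, 2))  # ~0.35, the expected bar level under these assumptions
```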

The right two groups of bars in Figure 4 show performance in the AV display conditions that contained at least one channel of audio. The arcsine-transformed data from each participant in these six conditions were subjected to a two-factor, within-subjects repeated measures analysis of variance (ANOVA) on the factors of audio channels (one or two) and video channels (zero, one, or two). This ANOVA showed that the main effects of audio channel, F(1, 7) = 96.19, and video channel, F(2, 14) = 25.35, were both significant at the p < .0001 level but that the interaction between the two was not significant. From these data, it is apparent that spatially separating the two audio channels had the largest effect on performance, improving overall identification scores by roughly 32 percentage points. It is also apparent from the mean performance values illustrated by the bars that overall performance was roughly 4 percentage points better than in the corresponding audio-only condition when both talkers in the stimulus were visible, but roughly 4 percentage points worse than in the audio-only condition when only a single talker was visible. Although these differences were small in absolute terms, a post hoc test (Tukey's honestly significant difference) confirmed that both were statistically significant at the p < .001 level.
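For readers who wish to reproduce this style of analysis, the sketch below illustrates the arcsine-square-root transform commonly applied to proportion-correct scores and a two-factor repeated measures ANOVA using the AnovaRM class from statsmodels. The scores here are randomly generated placeholders, not the data from this experiment.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def arcsine_transform(p):
    """Arcsine-square-root transform commonly applied to proportion-correct data."""
    return np.arcsin(np.sqrt(p))

# Hypothetical long-format data: one proportion-correct score per participant
# in each audio (one or two channels) x video (zero, one, or two channels) cell.
rng = np.random.default_rng(0)
rows = []
for subject in range(1, 9):                       # 8 participants
    for audio in ("one", "two"):
        for video in ("zero", "one", "two"):
            score = rng.uniform(0.3, 0.9)         # placeholder scores
            rows.append((subject, audio, video, arcsine_transform(score)))

df = pd.DataFrame(rows, columns=["subject", "audio", "video", "score"])

anova = AnovaRM(df, depvar="score", subject="subject",
                within=["audio", "video"]).fit()
print(anova)
```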

The reason for the drop in mean performance in the AV conditions with only a single video channel is illustrated by the white and black triangles in Figure 4. When a visible representation of just the target talker was added to the audio-only stimulus (black triangles), performance increased slightly in the condition with one audio channel and stayed roughly constant in the condition with two audio channels. However, when the visible representation of just the masking talker was added to the stimulus (white triangles), performance decreased in both audio conditions. This performance penalty might have been the result of a McGurk-like fusion between the visible masking face and the audible target voice, or it might simply have been the result of a strong bias on the part of the participants to incorrectly assume that the visible talker was always the target talker. However, it is worth noting that the performance penalty of showing just the masking talker was much smaller in the condition with two audio channels, in which the bias to assume that the visible talker was the target should have been just as strong. Thus it seems that spatially separating the two talkers' voices produced a substantial increase in overall performance and had the additional benefit of making the participants much more resistant to confusion from the presence of a visible representation of only the masking talker.

Spatial Incongruency

Figure 5 compares performance in the congruent and incongruent spatial configurations of the AV display conditions that included one or two audio channels and one or two visual channels. The arcsine-transformed performance scores for each participant were also subjected to a three-factor within-subjects ANOVA on the factors of audio channels (one or two), video channels (one or two), and spatial congruency (yes or no). This analysis revealed that all three of these main effects were statistically significant at the p < .05 level but that none of their interactions were significant. Thus it seems that performance was slightly but significantly worse in the spatially incongruent configurations than in the spatially congruent configurations.

[FIGURE 5 OMITTED]

In the AV conditions with one audio channel, performance was virtually identical in the spatially congruent and incongruent configurations of the experiment. Thus there does not appear to be any evidence of the improvement in performance reported by Driver (1996) when the visual representation of the target talker was spatially displaced from the location of the audio signal containing both the target and masking speech. In a recent experiment examining bimodal speech perception, Rudmann et al. (2003) also failed to find any evidence of the improvement in performance that Driver (1996) attributed to the ventriloquism effect. (Specifically, Driver, 1996, hypothesized that the spatially displaced visual image changed the apparent location of the corresponding audio signal and thus introduced an apparent spatial separation between the target and masking speech signals.) In light of our results and those of Rudmann et al. (2003), it does not appear that the ventriloquism-based spatial separation effect reported by Driver (1996) is particularly robust.

In the AV conditions with two audio channels, there does appear to be a slight decrease in performance in the spatially incongruent configurations. This is consistent with previous results that have shown that AV speech perception is better when the audio and visual signals are colocated than when they are spatially separated (Driver & Spence, 1994). However, this also seems to be a relatively small effect.

SUMMARY AND CONCLUSIONS

In terms of display design, the important results of this experiment can be summarized by the following three major points:

1. Spatial separation of the audio signals should be the first priority of any system designed to efficiently present more than one simultaneous speech signal. In this experiment, spatially separating the audio signals from the two talkers improved overall performance by 32 percentage points, whereas adding visual representations of both talkers improved performance by only 4 percentage points. Spatially separating the audio signals also substantially reduced the performance penalty that occurred when only the masking talker was visible in the stimulus.

2. Caution should be used when adding only a single visible face to a multichannel speech display when there is no guarantee that the visible talker will be the most important talker in the combined AV stimulus. In this experiment, the performance gains that occurred when the visible talker happened to be the target talker were more than offset by the performance penalties that occurred when the visible talker was the masking talker.

3. Whenever possible, the audio and visual portions of a multitalker speech signal should be spatially congruent. In this experiment, there was no evidence that the ventriloquism effect improved performance when the visible representation of the target talker was spatially separated from a single-channel audio signal containing both the target and masking voices, and there was a significant decrease in performance when the visible target talker was presented with a spatially incongruent two-channel audio signal.

The results of this experiment provide a preliminary assessment of the roles that audio and visual cues play in speech-monitoring tasks, but a great deal of additional research is needed to obtain a clear picture of the full impact that visual cues have on multitalker speech perception. One possible area for future research is a more detailed analysis of the influence that different speech materials have on audiovisual interaction in monitoring tasks. The limited vocabulary of the CRM corpus used in this experiment (two call signs, four colors, and eight numbers) introduced substantial redundancy into the audio and visual signals associated with the different conditions of this experiment. This redundancy may have caused performance to be dominated by the most salient modality. If a larger vocabulary of phonetically balanced words were used to generate the stimuli, some of this redundancy would be eliminated and the participants would probably exhibit more signs of audiovisual integration than they did with the CRM phrases.

Another important area for future research is an extension of this experiment to multitalker listening configurations with more than two competing talkers. In the monitoring task used in this experiment, participants who were able to extract the masker call sign from one of the two competing speech signals were able to use elimination to determine that the other talker had to be the target talker. If a third talker were added to the stimulus, an elimination-based monitoring strategy would be less effective and overall performance in the task would be more dependent on the ability to divide attention across more than one simultaneous speech signal. This would provide a more thorough test of the role that audiovisual cues play in multitalker speech perception and would allow an extension of the results of this experiment to complex communications situations in which listeners are more likely to require the assistance of sophisticated speech display systems to successfully complete their assigned tasks.

ACKNOWLEDGMENTS

The authors would like to thank Bob Bolia and Jessica Young for their help with the collection of the AV corpus. Portions of this research were funded by the Air Force Office of Scientific Research.

REFERENCES

Abouchacra, K., Tran, T., Besing, J., & Koehnke, J. (1997, February). Performance on a selective attention task as a function of stimulus presentation mode. Presented at the Midwinter Meeting of the Association for Research in Otolaryngology, St. Petersburg Beach, Florida.

Bertelson, P., & Radeau, M. (1976). Ventriloquism, sensory interaction, and response bias: Remarks on the paper by Choe, Welch, Gilford, and Juola. Perception and Psychophysics, 29, 578-585.

Bronkhorst, A. (2000). The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions. Acustica, 86, 117-128.

Brungart, D., Simpson, B., Ericson, M., & Scott, K. (2001). Informational and energetic masking effects in the perception of multiple simultaneous talkers. Journal of the Acoustical Society of America, 110, 2527-2538.

Crispien, K., & Ehrenberg, T. (1995). Evaluation of the "cocktail party effect" for multiple speech stimuli within a spatial audio display. Journal of the Audio Engineering Society, 43, 932-940.

Driver, J. (1996). Enhancement of selective attention by illusory mislocation of speech sounds due to lip-reading. Nature, 384, 66-68.

Driver, J., & Spence, C. (1994). Spatial synergies between auditory and visual attention. In C. Umilta & M. Moscovitch (Eds.), Attention and performance XV (pp. 311-331). Cambridge, MA: MIT Press.

Ericson, M., & McKinley, R. (1997). The intelligibility of multiple talkers spatially separated in noise. In R. H. Gilkey & T. R. Anderson (Eds.), Binaural and spatial hearing in real and virtual environments (pp. 701-724). Hillsdale, NJ: Erlbaum.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.

Nelson, W. T., Bolia, R. S., Ericson, M. A., & McKinley, R. L. (1999). Spatial audio displays for speech communication: A comparison of free-field and virtual sources. In Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting (pp. 1202-1205). Santa Monica, CA: Human Factors and Ergonomics Society.

Reisberg, D. (1978). Looking where you listen: Visual cues and auditory attention. Acta Psychologica, 42, 331-341.

Rudmann, D., McCarley, J., & Kramer, A. (2003). Bimodal displays improve speech comprehension in environments with multiple speakers. Human Factors, 45, 529-536.

Spence, C., Ranson, J., & Driver, J. (2000). Cross-modal selective attention: On the difficulty of ignoring sounds at the locus of visual attention. Perception and Psychophysics, 62, 410-424.

Sumby, W., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212-215.

Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audiovisual speech perception. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 3-51). New York: Erlbaum.

Douglas S. Brungart is a senior computer engineer in the Air Force Research Laboratory at Wright-Patterson Air Force Base, Ohio. He received a Ph.D. in electrical engineering from the Massachusetts Institute of Technology in 1998.

Alexander J. Kordik is a human factors psychologist with Sytronics, Inc., in Dayton, Ohio. He received a B.S. in psychology from Wright State University in 2002.

Brian D. Simpson is an engineering research psychologist in the Air Force Research Laboratory at Wright-Patterson Air Force Base, Ohio. He received an M.S. in human factors psychology from Wright State University in 2002.

Date received: March 7, 2003

Date accepted: August 26, 2004

Address correspondence to Douglas S. Brungart, AFRL/HECB, 2610 Seventh St., WPAFB, OH 45433-7901; douglas.brungart@wpafb.af.mil.