Plain Speaking in Mil/Aero Audio Systems: In aerospace and defense audio systems, speech intelligibility is more critical than speech quality.
Key Methods for Measuring Speech
One technique developed to measure speech intelligibility is the Speech Transmission Index (STI). Normal fluctuations in speech signals, from the acoustic separation of sentences, words, and phonemes, carry the most relevant information related to speech intelligibility. Modulation depth and modulation rate form a modulation spectrum. For clear speech, modulation rates range from 0.5 Hz to 16 Hz, with maximum modulation at around 3.0 Hz.
Another measurement technique is the Modified Rhyme Test (MRT), in which trained listeners, in a controlled laboratory environment, are presented with sets of rhyming single-syllable words in random order. Conclusions are drawn from an analysis of the listeners' responses.
A speech algorithm, Articulation-Band Correlation Modified Rhyme Test (ABC-MRT), uses a form of Automatic Speech Recognition (ASR) to conduct an automated MRT that does not involve human subjects, thereby reducing test time, complexity, and expense. Published in 2013, the algorithm was initially limited to narrowband communication systems (250 Hz to 3.85 kHz bandwidth). In 2016, it was extended to cover all speech bandwidths up to full-band (20 Hz to 20 kHz), and an attention model was added. The enhanced method is called ABC-MRT16.
Generally, an STI signal is formed from seven octave-band noise signals, with an overall spectral shape matching the long-term average spectrum of speech. Each noise carrier is modulated with one or more of 14 low-frequency sine waves at modulation rates from 0.63 Hz to 12.5 Hz, spaced at 1/3-octave intervals. The STI signal passes through the transmission channel, and the output signal is captured at a listener position. From this output signal, the modulation transfer ratio (the reduction in modulation depth) is determined and used to calculate a single-number STI value.
Full STI requires 98 test signals to be applied sequentially; each measurement takes about 10 seconds, for a total measurement time of approximately 15 minutes. The later STI for Public Address (STIPA) method is simplified, faster, and more practical, which has made it popular.
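The count of 98 test signals follows from pairing each of the seven octave bands with each of the 14 modulation rates. The short Python sketch below reproduces that arithmetic. It assumes the modulation rates follow the usual base-10 1/3-octave series (a ratio of 10^(1/10) between neighbors), which lands on the nominal 0.63 Hz to 12.5 Hz values; the variable names are illustrative only.

```python
# Sketch only: assumes base-10 1/3-octave spacing for the modulation rates.
N_OCTAVE_BANDS = 7          # seven octave-band noise carriers (from the text)
SECONDS_PER_SIGNAL = 10     # "about 10 seconds" per measurement (from the text)

# Exact 1/3-octave values 10^(k/10) for k = -2..11, i.e. roughly 0.63 Hz to 12.6 Hz
mod_freqs = [10 ** (k / 10) for k in range(-2, 12)]

# Full STI pairs every octave band with every modulation rate
n_signals = N_OCTAVE_BANDS * len(mod_freqs)

total_seconds = n_signals * SECONDS_PER_SIGNAL
total_minutes = total_seconds / 60

print([round(f, 2) for f in mod_freqs])
print(n_signals, "signals,", round(total_minutes, 1), "minutes")
```

At exactly 10 seconds per signal the total comes to a little over 16 minutes, in line with the approximate figure quoted above.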
In STIPA, each of the seven octave bands contains a pre-defined set of two modulation rates. These 14 combinations are generated simultaneously in one signal and processed in parallel to reduce measurement time to between 10 and 20 seconds. To date, however, the method has only been standardized for the male speech spectrum.
Item | Description
Leq (dB) | The overall rms level for each octave band (referred to as the equivalent sound level for acoustic inputs).
Level Ratio (dB) | Indicates the spectral shape of the received signal relative to the source signal, normalized to the 1 kHz octave band.
f-mod (*) (Hz) | The modulation frequency.
Raw MTR (*) | The calculated raw modulation transfer ratio.
Correction MTR (*) | The MTR after correction for auditory masking and reception threshold.
TI (*) | The transmission index.
Octave MTI | The average of TI1 and TI2.
(*) These items have two values for each octave band.
Although the STI approach might seem similar to audio measurements like Perceptual Evaluation of Speech Quality (PESQ) and Perceptual Objective Listening Quality Analysis (POLQA), both PESQ and POLQA were developed to measure speech quality over telephone transmission networks in ideal listening conditions with very little background noise. STI, however, is a metric for speech intelligibility more applicable to high-noise environments where intelligibility is primary and quality is a secondary concern.
STI is based on the modulation transfer function (MTF), which is the ratio of the modulation depth of the received signal to the modulation depth of the transmitted signal, as a function of modulation rate. Distortions in the transmission channel, such as noise, reverberation, echoes, and digital codecs, reduce the modulation depth and degrade speech intelligibility.
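To make the link between modulation depth and intelligibility concrete, the Python sketch below converts a modulation transfer ratio into a transmission index using the commonly published STI relationship: an effective signal-to-noise ratio of 10·log10(m / (1 − m)), limited to ±15 dB and rescaled to the range 0 to 1. It is a simplified illustration, not the full standardized procedure: it omits the corrections for auditory masking and reception threshold, and it stops short of the final band weighting. The function names are hypothetical, and the stationary-noise model in `mtr_from_noise` is an assumption chosen to generate example inputs.

```python
import math

def mtr_from_noise(snr_db):
    """Modulation transfer ratio for a channel degraded only by stationary noise.

    Assumed model: m = 1 / (1 + 10^(-SNR/10)); no reverberation or distortion.
    """
    return 1.0 / (1.0 + 10 ** (-snr_db / 10))

def transmission_index(m, snr_limit_db=15.0):
    """Map a modulation transfer ratio m (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1 - 1e-6)            # keep the logarithm finite
    snr_eff = 10 * math.log10(m / (1 - m))     # effective SNR in dB
    snr_eff = max(-snr_limit_db, min(snr_limit_db, snr_eff))
    return (snr_eff + snr_limit_db) / (2 * snr_limit_db)

def octave_mti(ti_values):
    """Average the transmission indices measured in one octave band."""
    return sum(ti_values) / len(ti_values)

# Example: one octave band with two modulation rates, as in STIPA
ti_1 = transmission_index(mtr_from_noise(0.0))    # 0 dB SNR halves the modulation depth
ti_2 = transmission_index(mtr_from_noise(15.0))   # 15 dB SNR reaches the upper limit
print(ti_1, ti_2, octave_mti([ti_1, ti_2]))
```

With these inputs the two transmission indices come out at about 0.5 and 1.0, so the octave value is about 0.75.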
For ABC-MRT measurement, a specialized ASR algorithm emulates the MRT methodology by recognizing keywords transmitted through a communication system. It also uses frequency bands called Articulation Index (AI) bands. There are 17 AI bands for narrowband speech and 21 for full-band speech.
The algorithm uses time-frequency (T-F) representations of keywords in the MRT. The T-F pattern of an impaired speech signal is correlated with corresponding patterns of the six unimpaired options from a list of six rhyming words in multiple AI bands (Figure 2).
The top image in Figure 2 is a spectrogram of the sentence "Please select the word went," as spoken by Female 1 in the Institute for Telecommunication Sciences (ITS) ABC-MRT speech database. The frequency axis spans 0 to 10 kHz and the time span is approximately 2 seconds. Signal levels are shown on a color scale from blue (low) to red (high). The portion of the spectrogram containing the keyword "went" is highlighted with a red rectangle. The bottom images show the spectrograms of the six keywords from the list, plotted on the same time and frequency scales.
For each trial, a sentence containing the carrier phrase and keyword is generated and passed through the system, or Device Under Test (DUT). The signal is acquired from the output of the DUT and transformed to a T-F pattern using the same technique applied to the 1,200 isolated keyword recordings in the database.
The keyword is located within the T-F pattern by cross-correlating it in two of the AI bands with the T-F pattern of the original keyword in the template file. The portion of the DUT's T-F pattern containing the keyword is extracted and matched with one of the six rhyming words in the AI bands of interest. Eliminating negative results leaves 17 narrowband and 21 full-band AI band correlations for each candidate keyword. Keywords in the 16 AI bands with the highest correlation values are selected and compared to the known correct keyword. The average number of correct keyword selections over the 16 AI bands provides the success rate. The average success rate across all the trials is corrected to account for the effect of guessing in MRT tests, and provides the ABC-MRT Intelligibility Score.
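The selection and scoring steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published ABC-MRT implementation: the correlation values are invented, the function names are hypothetical, and the guessing correction shown is the generic chance correction for a six-alternative test, (p − 1/6) / (1 − 1/6), which is assumed here because the article does not give the exact formula.

```python
def trial_success_rate(correlations, correct_index, n_best_bands=16):
    """Success rate for one trial.

    correlations: one list per AI band, each holding six correlation values
                  (one per candidate rhyming word).
    correct_index: position of the keyword that was actually transmitted.
    """
    per_band = []
    for band_corrs in correlations:
        best_value = max(band_corrs)
        best_word = band_corrs.index(best_value)   # word this band "votes" for
        per_band.append((best_value, best_word))

    # Keep only the bands with the strongest correlations
    per_band.sort(key=lambda pair: pair[0], reverse=True)
    top = per_band[:n_best_bands]

    hits = sum(1 for _, word in top if word == correct_index)
    return hits / len(top)

def correct_for_guessing(success_rate, n_options=6):
    """Assumed chance correction: rescale so pure guessing scores 0."""
    chance = 1 / n_options
    return (success_rate - chance) / (1 - chance)

# Made-up narrowband trial: 17 AI bands, correct keyword at index 2
trial_a = [[0.1, 0.2, 0.9, 0.3, 0.1, 0.0] for _ in range(12)]    # correct word wins
trial_a += [[0.8, 0.2, 0.1, 0.3, 0.1, 0.0] for _ in range(4)]    # wrong word wins
trial_a += [[0.05, 0.02, 0.01, 0.03, 0.01, 0.0]]                 # weakest band, dropped

# Second made-up trial where every band picks the correct word
trial_b = [[0.1, 0.2, 0.9, 0.3, 0.1, 0.0] for _ in range(17)]

rates = [trial_success_rate(t, correct_index=2) for t in (trial_a, trial_b)]
mean_rate = sum(rates) / len(rates)
score = correct_for_guessing(mean_rate)
print(rates, mean_rate, score)
```

In this toy data the first trial has 12 of its 16 strongest bands voting for the correct word, the second has all 16, and the correction is applied to the average across trials as the text describes.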
An ITS evaluation of ABC-MRT and ABC-MRT16 found extremely high correlation with subjective MRT and low estimation error. ABC-MRT16 performed significantly better than ABC-MRT for the 139 narrowband conditions, indicating the importance of the attention model.
STI is used in high-noise environments where intelligibility is paramount. Examples include Public Address (PA) systems, aircraft Voice Announcements (VA) and emergency communication systems, in-vehicle communication systems, auditorium systems, and assistive hearing systems.
STI should not be used to measure systems whose transmission channels contain vocoders (i.e., codecs that operate only on speech elements), but it may be used with digital codecs that operate on the entire signal. In systems with aggressive noise-suppression algorithms, the STI test signal is likely to be suppressed by the algorithm.
The ABC-MRT algorithm works with voice codecs and noise suppression systems, and can be used for any of the standard speech bandwidths.
Having ABC-MRT measurement capability in an audio analyzer complements its traditional audio test functionality. Depending upon the analyzer, it can also provide access to multiple audio interfaces (analog, acoustic, AES3-S/PDIF, digital serial, Bluetooth, PDM, HDMI, and ASIO), a built-in test sequencer and reporting engine, test limits, multiple channels, and a wide variety of specialized audio measurements.
By Joe Begin, director of applications and technical support, Audio Precision
The U.S. National Fire Protection Association (NFPA) 1981 standard covers self-contained breathing apparatus (SCBA) for emergency services. It requires a minimum STI of 0.55 for non-electronic systems, and 0.60 for supplementary systems. The test methods for both non-electronic and supplementary systems are similar. Both require a hemi-anechoic chamber and a Head and Torso Simulator (HATS) with mouth simulator (Figure 3).
NFPA test methods require measuring STI at a test microphone that is 1.5 m in front of the artificial mouth, while simultaneously generating pink noise from a separate speaker located below the test microphone. (Pink noise power per hertz decreases as the frequency increases; white noise has an equal power per hertz through all frequencies.)
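The difference between pink and white noise is easiest to see per octave band. The Python sketch below integrates two idealized power spectral densities over the seven octave bands used for STI: a flat density for white noise and a 1/f density for pink noise. The band centers (125 Hz to 8 kHz), the band edges at fc/√2 and fc·√2, and the unit scaling constants are assumptions for illustration; they are not taken from the NFPA standard.

```python
import math

def band_power_white(f_lo, f_hi, density=1.0):
    """Power of white noise (flat density) between f_lo and f_hi."""
    return density * (f_hi - f_lo)

def band_power_pink(f_lo, f_hi, k=1.0):
    """Power of pink noise (density k/f) between f_lo and f_hi.

    The integral of k/f from f_lo to f_hi is k * ln(f_hi / f_lo).
    """
    return k * math.log(f_hi / f_lo)

centers = [125, 250, 500, 1000, 2000, 4000, 8000]   # assumed octave-band centers, Hz

white = [band_power_white(fc / math.sqrt(2), fc * math.sqrt(2)) for fc in centers]
pink = [band_power_pink(fc / math.sqrt(2), fc * math.sqrt(2)) for fc in centers]

for fc, w, p in zip(centers, white, pink):
    print(fc, round(w, 1), round(p, 4))
```

Under these assumptions pink noise carries the same power in every octave band, while white noise power doubles with each octave.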
NFPA 1981 requires that the mouth simulator is equalized to a specific 1/3-octave spectral shape at the mouth reference point of the mouth simulator. For convenience when adjusting the EQ by trial and error, the output EQ curve in the audio analyzer software can be specified at standard 1/3-octave center frequencies.
The standard suggests using a STIPA signal generator with an equalizer to drive the mouth simulator and a separate pink noise generator with an equalizer to drive the pink noise speaker. Both signals can be generated simultaneously by an APx audio analyzer, using its ability to generate stereo waveforms with different levels on each channel.
Title Annotation: TECH FOCUS: AUDIO SYSTEMS
Publication: ECN-Electronic Component News
Date: Nov 1, 2018