Speech Processing and Waveform Envelope

(see AUDIO FILES for information on the *.wav files)

This demonstration is based on the work of Shannon et al. (see the references below). A speech utterance of the numbers "1 2 3" is first filtered by five gamma-tone filters (each with a Q of 10) centered at 200 Hz, 400 Hz, 800 Hz, 1600 Hz, and 3200 Hz. The envelope of the filtered speech waveform at the output of each filter is extracted by rectifying the filtered waveform and low-pass filtering it at 100 Hz. A broadband noise is then filtered with the same five gamma-tone filters that were used to filter the speech waveform. Each of these filtered noises is then multiplied by the envelope extracted from the corresponding filtered speech waveform. That is, the 200-Hz filtered noise is multiplied by the 200-Hz extracted speech envelope, the 400-Hz filtered noise by the 400-Hz extracted envelope, and so on.

The Figure describes this signal processing and generation procedure for two filter bands: a low-frequency band (low CF, where CF is the center frequency) shown at the top and a high-frequency band (high CF) shown at the bottom. The two envelope-modulated noise bands could be added together or presented in isolation.

Clicking on 1-2-3 plays the original speech waveform.
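The processing chain described above (band-pass filter, rectify, low-pass filter, multiply by band-limited noise) can be sketched roughly as follows. This is a minimal illustration, not the code used to make the demonstration: it uses only the Python standard library, substitutes a second-order band-pass filter with a Q of 10 for each gamma-tone filter, and uses a one-pole smoother for the 100-Hz low-pass stage. The sampling rate, the choice of full-wave rectification, the uniform noise source, and all function names are assumptions made for the example.

```python
import math
import random

FS = 16000                                  # sampling rate in Hz (assumed; not given in the text)
CENTER_FREQS = [200, 400, 800, 1600, 3200]  # band center frequencies in Hz, from the text
Q = 10                                      # filter Q, from the text
ENV_CUTOFF = 100                            # envelope low-pass cutoff in Hz, from the text


def bandpass(x, fc, q=Q, fs=FS):
    """Second-order band-pass filter (a stand-in for a gamma-tone filter)."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def envelope(x, cutoff=ENV_CUTOFF, fs=FS):
    """Full-wave rectify (assumed), then smooth with a one-pole low-pass."""
    a = math.exp(-2 * math.pi * cutoff / fs)
    env, state = [], 0.0
    for xn in x:
        state = (1 - a) * abs(xn) + a * state
        env.append(state)
    return env


def noise_vocode(speech, center_freqs=CENTER_FREQS, seed=0):
    """Sum of noise bands, each modulated by the speech envelope of its band."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in speech]  # broadband noise
    out = [0.0] * len(speech)
    for fc in center_freqs:
        env = envelope(bandpass(speech, fc))  # envelope of the filtered speech
        carrier = bandpass(noise, fc)         # noise filtered by the same band
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c
    return out
```

Passing a shorter list of center frequencies, for example `noise_vocode(speech, [200])`, gives a single modulated band. A real gamma-tone filter has a different shape from the second-order filter used here, so the result would only approximate the sounds in the demonstration.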

Clicking on 200-Hz Band produces the waveform of the 200-Hz band of noise modulated by the envelope of the speech waveform filtered at 200 Hz.

Clicking on 200+400-Hz Bands produces the waveform of the 200-Hz band of noise modulated by the envelope of the speech waveform filtered at 200 Hz, plus the 400-Hz band multiplied by its respective envelope.

Clicking on 200+400+800-Hz Bands produces the waveform of the 200-Hz band of noise modulated by the envelope of the speech waveform filtered at 200 Hz, plus the 400-Hz band and the 800-Hz band, each multiplied by its respective envelope.

Clicking on 200+400+800+1600-Hz Bands produces the waveform of the 200-Hz band of noise modulated by the envelope of the speech waveform filtered at 200 Hz, plus the 400-Hz band, the 800-Hz band, and the 1600-Hz band, each multiplied by its respective envelope.

Clicking on 200+400+800+1600+3200-Hz Bands produces the waveform of the 200-Hz band of noise modulated by the envelope of the speech waveform filtered at 200 Hz, plus the 400-Hz band, the 800-Hz band, the 1600-Hz band, and the 3200-Hz band, each multiplied by its respective envelope.
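The five conditions above are running sums of the envelope-modulated noise bands, taken from the lowest band upward. A minimal sketch of that bookkeeping is shown here, assuming the modulated bands are already available as equal-length lists of samples keyed by center frequency. The function name, the input format, and the label format are hypothetical and chosen only for illustration.

```python
def cumulative_conditions(band_signals):
    """Map condition labels to running sums of the modulated noise bands.

    band_signals: dict of center frequency in Hz -> list of samples,
    all the same length (hypothetical input format).
    """
    conditions = {}
    running = None
    label_parts = []
    for cf in sorted(band_signals):
        band = band_signals[cf]
        if running is None:
            running = list(band)  # first (lowest) band on its own
        else:
            running = [r + b for r, b in zip(running, band)]  # add the next band
        label_parts.append(str(cf))
        conditions["+".join(label_parts) + "-Hz"] = list(running)
    return conditions


# Toy data standing in for real modulated noise bands.
toy_bands = {200: [1.0, 1.0], 400: [2.0, 2.0], 800: [3.0, 3.0]}
for label, samples in cumulative_conditions(toy_bands).items():
    print(label, samples)
```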

You will notice that the intelligibility of the speech-like sound increases as the number of bands that are added together increases. The five-band condition (200+400+800+1600+3200-Hz) produces a highly intelligible version of "1-2-3."

Clicking on Altered Band produces the waveform generated when the bands used to filter the speech differ from those used to filter the noise. In particular, the center frequencies of the bands used to filter the noise were lowered one octave below those used to filter the speech waveform. That is, the envelope extracted from the 200-Hz filtered portion of speech was multiplied by a band of noise filtered at a center frequency of 100 Hz, the 400-Hz extracted envelope by a band of noise centered at 200 Hz, and so on, until the 3200-Hz extracted envelope was multiplied by a band of noise centered at 1600 Hz. All five bands of noise were then added together, as in the 200+400+800+1600+3200-Hz Bands condition. The Altered Band example resembles a downward frequency shift, which one might consider as the basis of a hearing aid for a person with good low-frequency hearing and poor high-frequency hearing. However, as you can hear, intelligibility is very poor in the Altered Band condition. The research done to date with this technique suggests that almost anything that dissociates the envelope extracted from a particular band of speech from the band of noise used for the multiplication significantly lowers intelligibility.
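The band mapping in the Altered Band condition can be written out explicitly. The short sketch below pairs each speech-analysis center frequency with the center frequency of the noise band its envelope modulates. The function name and the `octave_shift` parameter are hypothetical, introduced only for this example.

```python
SPEECH_CENTER_FREQS = [200, 400, 800, 1600, 3200]  # Hz, from the text


def carrier_pairs(speech_cfs, octave_shift=0.0):
    """Pair each speech-band center frequency with its noise-band one.

    octave_shift is in octaves: 0 gives matched bands, and -1 lowers
    every noise band by one octave (the Altered Band condition).
    """
    factor = 2.0 ** octave_shift
    return [(cf, cf * factor) for cf in speech_cfs]


matched = carrier_pairs(SPEECH_CENTER_FREQS)        # envelope and noise share a CF
altered = carrier_pairs(SPEECH_CENTER_FREQS, -1.0)  # noise CF is half the speech CF
for speech_cf, noise_cf in altered:
    print(f"{speech_cf} Hz envelope x {noise_cf:g} Hz noise band")
```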


Suggested References:

Shannon, R.V., Zeng, F., Kamath, V., Wygonski, J., Ekelid, M., Speech Recognition with Primarily Temporal Cues, Science 270, 303-304, 1995

Grant, K.W., Braida, L.D., Renn, R.J., Single Band Amplitude Envelope Cues as an Aid to Speechreading, Quarterly Journal of Experimental Psychology 43(A), 621-645, 1991