There are a number of context effects in speech perception in which the identification of one phoneme is changed due to the identity of neighboring phonemes. It has been proposed that these identity shifts compensate for the spectral deformations caused by coarticulation. For example, listen to the following files.
To hear a synthesized version of /al/ followed by a consonant-vowel (CV) syllable,
click here:
(Was the second syllable a /da/ or a /ga/?)
To hear a synthesized version of /ar/ followed by a consonant-vowel (CV) syllable,
click here:
(Was the second syllable a /da/ or a /ga/?)
In both of these previous cases the second syllable was identical, yet most people hear
the CV as /ga/ when preceded by /al/ and as /da/ when preceded by /ar/. The syllable
sounds like this in isolation:
This perceptual effect may counteract the acoustic consequences of coarticulation: a
syllable is acoustically more /da/-like when produced following /al/ and more /ga/-like
when produced following /ar/. Thus, this context effect is likely important for veridical
speech communication.
Whereas these effects have traditionally been offered as evidence for the existence of
specialized modules for speech perception, recent data suggest that the effects are the
result of more general auditory processes. Similar shifts in syllable "labeling"
have been demonstrated in pre-lingual infants (Fowler et al., 1990) and in birds
(Japanese quail, Coturnix japonica) trained to label syllables by pecking a key
(Lotto et al., 1997).
Figure 1. Japanese quail (Coturnix
japonica) in an operant chamber. Quail trained to peck to /da/ or /ga/ syllables
showed a shift in peck rates dependent on the preceding syllable. More "/ga/
responses" were obtained when CVs were preceded by /al/ and more "/da/
responses" were obtained when CVs were preceded by /ar/. This shift in response is
similar to what is seen in human responses to these syllables.
Experiment 1: Non-Speech Analogue
One of the first questions raised by this context effect is whether it is specific
to speech stimuli.
To answer this question, we presented listeners with CVs (/da/-/ga/) preceded by non-speech stimuli that contained some of the purportedly important spectral properties of the speech contexts (/al/-/ar/) but that did not sound like speech. Each non-speech sound was a sum of two sine-wave tones matched in frequency to the offsets of the second and third formants (F2 and F3) of the speech stimuli (for /al/: 956 and 2700 Hz; for /ar/: 1517 and 1600 Hz). These stimuli preceded the speech CVs with a 50-msec interstimulus gap. Listeners identified the CVs.
To listen to the tones modeled on /al/ followed by a CV, click here:
To listen to the tones modeled on /ar/ followed by a CV, click here:
The CV is identical in both of these examples.
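The construction of the non-speech contexts can be sketched in a few lines of Python. This is only an illustrative sketch, not the synthesis procedure actually used: the tone frequencies and the 50-msec gap come from the description above, while the tone duration, sample rate, onset/offset ramps, and equal tone amplitudes are assumptions, and the function names are hypothetical.

```python
import math

SAMPLE_RATE = 20000  # Hz; assumed, not stated in the text

# F2/F3 offset frequencies (Hz) given in the text for each speech context
AL_ANALOGUE = (956.0, 2700.0)
AR_ANALOGUE = (1517.0, 1600.0)


def sine_pair(f_low_hz, f_high_hz, duration_s=0.25, sample_rate=SAMPLE_RATE, ramp_s=0.005):
    """Equal-amplitude sum of two sine tones with linear onset/offset ramps.

    duration_s and ramp_s are assumed values chosen for illustration.
    """
    n = int(round(duration_s * sample_rate))
    ramp_n = int(round(ramp_s * sample_rate))
    samples = []
    for i in range(n):
        t = i / sample_rate
        value = 0.5 * (math.sin(2 * math.pi * f_low_hz * t)
                       + math.sin(2 * math.pi * f_high_hz * t))
        # Linear ramp up at onset and down at offset to avoid clicks
        gain = min(1.0, i / ramp_n, (n - 1 - i) / ramp_n) if ramp_n > 0 else 1.0
        samples.append(gain * value)
    return samples


def silence(duration_s, sample_rate=SAMPLE_RATE):
    """Silent interval of the requested duration."""
    return [0.0] * int(round(duration_s * sample_rate))


def context_plus_gap(freqs, gap_s=0.050, sample_rate=SAMPLE_RATE):
    """Non-speech context followed by the 50-msec gap; the CV would be appended after it."""
    return sine_pair(*freqs, sample_rate=sample_rate) + silence(gap_s, sample_rate)


al_context = context_plus_gap(AL_ANALOGUE)
ar_context = context_plus_gap(AR_ANALOGUE)
```

The same target CV waveform would then be appended to `al_context` and `ar_context`, so that the two trials differ only in the tones that precede the syllable.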
Results
Figure 2 shows the mean percentage of /ga/ responses for CVs following the speech /al/
and /ar/ contexts and for CVs following the non-speech /al/ and /ar/ sine-pair
analogues. The non-speech contexts produce an identification shift that is
statistically indistinguishable from the shift obtained for the speech contexts. This
suggests that the spectral properties of the context, not its phonemic identity,
determine the context effect. Taken together with the data from avian subjects, these
results point to a general auditory mechanism underlying this important speech context
effect.
Figure 2. Mean percentage of /ga/ responses to CVs
preceded by synthesized speech (/al/ or /ar/) or by non-speech analogues (sine
waves placed at the F2 and F3 offset frequencies of /al/ or /ar/). The shift in CV
identifications does not differ statistically between the speech and non-speech contexts.
Proposed General Explanations
The perceptual context effect may be described in terms of spectral contrast. That
is, following a syllable with high-frequency F3 offset (/al/), an ambiguous
syllable is labeled as if the syllable had a low-frequency F3 onset (/ga/).
Following a syllable with a low-frequency F3 offset (/ar/), an ambiguous syllable
is labeled as if the syllable had a high-frequency F3 onset (/da/). Two mechanisms
have been proposed as underlying this spectral contrast: adaptation of auditory
nerve fibers and auditory enhancement (Holt & Kluender, 2000; see also Delgutte,
1996).
Both of these possible explanations suggest that the context effect arises at a rather peripheral level of the auditory system (auditory enhancement appears to be due in part to interactions in the cochlear nucleus). To evaluate these purported mechanisms, we ran two experiments designed to 1) describe, roughly, the time course of the context effect and 2) determine whether the effect is strictly monaural.
Experiment 2: Temporal Contiguity
A 10-step series of consonant-vowel (CV) syllables was synthesized varying in F3-onset
frequency (1800-2700 Hz). These syllables varied perceptually from a good /da/ to a good
/ga/. These CVs were preceded by synthesized versions of /al/ (F3 offset=2700 Hz) or /ar/
(F3 offset=1600 Hz). The duration of the silent gap between these syllables was varied
from 25 to 400 msec. Participants were asked to identify the second syllable as /da/ or
/ga/ by pressing a button on a response box. Pseudo-spectrograms of the stimuli are
displayed here.
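The stimulus design can be written out as a small Python sketch. It rests on two assumptions that go beyond the description above: the 10 steps are taken to be evenly spaced in F3-onset frequency, and only the gap durations named in this section (25, 275, and 400 msec) are listed, because the full set of gaps is not given here. The function and dictionary names are hypothetical.

```python
from itertools import product

# F3 offset frequencies (Hz) of the two context syllables, as given in the text
CONTEXT_F3_OFFSET_HZ = {"al": 2700.0, "ar": 1600.0}


def f3_onset_series(low_hz=1800.0, high_hz=2700.0, steps=10):
    """F3-onset frequencies of the CV series, assuming evenly spaced steps."""
    step = (high_hz - low_hz) / (steps - 1)
    return [low_hz + k * step for k in range(steps)]


def build_trials(gaps_ms):
    """Cross every context with every CV step and every silent-gap duration."""
    return [
        {
            "context": context,
            "context_f3_offset_hz": CONTEXT_F3_OFFSET_HZ[context],
            "cv_f3_onset_hz": f3_onset,
            "gap_ms": gap,
        }
        for context, f3_onset, gap in product(
            CONTEXT_F3_OFFSET_HZ, f3_onset_series(), gaps_ms
        )
    ]


# Only the gap values mentioned in the text; the complete list is not specified.
trials = build_trials([25, 275, 400])
```

Under the even-spacing assumption, adjacent CVs differ by 100 Hz in F3 onset, so the top of the series coincides with the F3 offset of /al/ (2700 Hz), while the bottom of the series lies 200 Hz above the F3 offset of /ar/ (1600 Hz).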
Results
Figure 3 displays identification boundaries (from probit analysis) for each context (/al/
vs. /ar/) at each silent gap duration (25-400 msec). There is a monotonic decrease
in the size of the context effect as gap duration increases (higher boundary = more /ga/
responses). The boundary shift is significant (p < .05) for each duration up to,
and including, 275 msec.
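For readers unfamiliar with identification boundaries, the following Python sketch shows one simple way to estimate one. It is not the probit analysis used for Figure 3: it fits a straight line by least squares to probit-transformed response proportions rather than using maximum likelihood, it applies a small correction so that proportions of 0 or 1 stay finite, and the response counts in the usage example are invented purely for illustration. The function name `probit_boundary` is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit transform


def probit_boundary(stimulus_steps, n_ga_responses, n_trials):
    """Estimate the step at which /ga/ and /da/ responses are equally likely.

    Simplified approach (an assumption of this sketch): least-squares line
    through probit-transformed proportions, then solve for z = 0 (p = 0.5).
    """
    xs, zs = [], []
    for x, k, n in zip(stimulus_steps, n_ga_responses, n_trials):
        p = (k + 0.5) / (n + 1.0)  # keeps p strictly between 0 and 1
        xs.append(float(x))
        zs.append(_STD_NORMAL.inv_cdf(p))
    mean_x = sum(xs) / len(xs)
    mean_z = sum(zs) / len(zs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return -intercept / slope  # stimulus value where the fitted line crosses z = 0


# Invented counts of /ga/ responses out of 20 presentations per step
steps = list(range(1, 11))
ga_after_al = [20, 20, 19, 17, 14, 11, 9, 6, 3, 1]
ga_after_ar = [20, 19, 17, 14, 11, 9, 6, 3, 1, 0]
trials_per_step = [20] * 10

boundary_al = probit_boundary(steps, ga_after_al, trials_per_step)
boundary_ar = probit_boundary(steps, ga_after_ar, trials_per_step)
context_effect = boundary_al - boundary_ar  # boundary shift between contexts
```

In this sketch the context effect at a given gap is the difference between the two boundaries, which is the quantity that shrinks as the gap grows.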
The fact that the context effect is maintained for gaps up to 275 msec long has
implications for determining the mechanism underlying the effect. This duration appears to
be too long for adaptation at the level of auditory nerve fibers (ANFs) to play an
appreciable role. Viemeister & Bacon (1982) found no auditory enhancement in their masking
study beyond about 100 msec of silent gap. (However, the time course of auditory
enhancement does vary with particulars of the stimuli and tasks.) The context effect
studied here is still quite strong with a 100-msec gap between syllables. These data
suggest that the mechanisms responsible for this effect may not be peripheral.
Figure 3. Identification boundary
(probit) values for CVs preceded by /al/ or /ar/ with varying durations of silent gap
(with s.e. bars). t-tests are significant for all comparisons up to and including the
275-msec gap.
Experiment 3: Dichotic Presentation
ANF adaptation and auditory enhancement are monaural effects. For example, Summerfield
& Assmann (1989) failed to find effects of a precursor stimulus when it was presented
to the contralateral ear. One way to examine the plausibility of auditory enhancement as a
mechanism for coarticulation compensation is to present the context to the ear
contralateral to the one receiving the target syllable.
Here, we use a different speech context from the one used in Experiment 1.
The target CV varies in the onset frequency of the second formant (F2).
Perceptually, it varies from /ba/ to /da/. It is preceded by the vowel /i/
or /u/. We have shown previously that the contexts /i/ (high F2) and /u/ (low F2)
shift identification toward /ba/ (low F2 onset) and toward /da/ (high F2 onset),
respectively. In Experiment 3, one group is presented with the context and target CV
binaurally, and a second group receives the target CV monaurally, with the preceding
context presented to the contralateral ear. The ear receiving the context varied randomly
between trials.
Figure 4. Diagram of presentation
conditions for Experiment 3.
Results
The data demonstrate that the context effect is maintained under dichotic presentation. Figure
5 displays the identification functions for each presentation condition (binaural vs.
dichotic) and each context (/i/ vs. /u/). The size of the context effect does not change
when the context arrives at a different ear. As with the temporal range evidenced in
Experiment 2, these data suggest that the mechanism underlying this context effect is not
peripheral. The maintenance of the effect in the dichotic condition also makes it less
plausible that auditory enhancement is responsible for the effect, as auditory enhancement
is usually a monaural effect.
Figure 5. Identification functions for
CVs preceded by /i/ or /u/ presented in the same ear or contralaterally. The size of the
shift (context effect) does not differ between dichotic and binaural presentation.
Conclusions
There is a class of context effects that have been referred to as "perceptual
compensation for coarticulation". They may be important for maintaining invariant
phonemic perception despite varying acoustic input. The three experiments described here
lead to the following conclusions concerning the mechanisms responsible for these context
effects:
1) Effects of context can occur with a gap of at least 275 msec between syllables. This suggests that the effects are not (completely) due to adaptation in the auditory nerve.
2) Shifts in identification occur even when the context is presented to the contralateral ear. This is evidence against auditory enhancement (a monaural effect) as a plausible mechanism for the context effects.
3) A similar identification shift can be induced by non-speech analogues with some spectral similarity to the speech contexts. These data suggest that the context effect is general in nature and does not require that the context be perceived as speech. The results are also consistent with a general spectral contrast account of the effects.
Bibliography
Delgutte, B. (1996). Auditory neural processing of speech. In W. J. Hardcastle & J. Laver (Eds.), The Handbook of Phonetic Sciences, pp. 507-538. Oxford: Blackwell.
Fowler, C.A., Best, C.T., & McRoberts, G.W. (1990). Young infants' perception of liquid coarticulatory influences on following stop consonants. Perception & Psychophysics, 48, 559-570.
Holt, L.L., & Kluender, K.R. (in press). General auditory processes contribute to perceptual accommodation of coarticulation. Phonetica.
Lotto, A.J., Kluender, K.R., & Holt, L.L. (1997). Perceptual compensation for coarticulation by Japanese quail (Coturnix coturnix japonica). Journal of the Acoustical Society of America, 102, 1134-1140.
Summerfield, Q., & Assmann, P.F. (1989). Auditory enhancement and the perception of concurrent vowels. Perception & Psychophysics, 45, 529-536.
Viemeister, N.F., & Bacon, S.P. (1982). Forward masking by enhanced components in harmonic complexes. Journal of the Acoustical Society of America, 71, 1502-1507.