In: V. Fromkin (ed.), Phonetic Linguistics: Essays in Honor of Peter Ladefoged
Chapter 3
Relative Power of Cues: Fo Shift Versus Voice Timing*
Arthur S. Abramson
Leigh Lisker
1. BACKGROUND
The acoustic features that provide information on the identity of phonetic
segments are commonly called “cues to speech perception.” These cues
do not typically have one-to-one relationships with phonetic distinctions.
Indeed, research usually shows more than one cue to be pertinent to a
distinction, although all such cues may not be equally important. Thus,
if two cues, x and y, are relevant for a distinction, it may turn out that
for any value of x, a variation of y will effect a significant shift in listeners’
phonetic judgments but that there will be some values of y for which
varying x will have negligible effect on phonetic judgments. We say,
then, that y is the more powerful cue.
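The definition of the more powerful cue can be restated as a small check over a table of labeling percentages. The following is only a sketch: the function names, the 10-point threshold standing in for a "significant shift," and the toy percentages are all hypothetical and are not data from this chapter.

```python
def response_range(values):
    """Spread (max minus min) of a set of response percentages."""
    values = list(values)
    return max(values) - min(values)


def is_more_powerful(grid, xs, ys, threshold=10.0):
    """True if cue y is 'more powerful' than cue x in the sense of the text.

    grid maps (x, y) to the percentage of one response category.
    threshold (percentage points) is a HYPOTHETICAL stand-in for a
    'significant shift'; a real analysis would use a statistical test.
    """
    # For every value of x, varying y must shift the judgments.
    y_shifts_for_every_x = all(
        response_range(grid[(x, y)] for y in ys) >= threshold for x in xs
    )
    # For at least one value of y, varying x must have a negligible effect.
    x_negligible_for_some_y = any(
        response_range(grid[(x, y)] for x in xs) < threshold for y in ys
    )
    return y_shifts_for_every_x and x_negligible_for_some_y


# INVENTED toy percentages of /p/ responses, keyed by (Fo onset in Hz, VOT in msec).
TOY_GRID = {
    (98, 5): 0, (98, 20): 30, (98, 50): 100,
    (130, 5): 5, (130, 20): 70, (130, 50): 100,
}

if __name__ == "__main__":
    print(is_more_powerful(TOY_GRID, xs=(98, 130), ys=(5, 20, 50)))
```

With x as Fo onset and y as VOT, the toy table satisfies the definition; swapping the roles of the two cues does not.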
A good deal of evidence now exists to show that the timing of the
valvular action of the larynx relative to supraglottal articulation is widely
used in languages to distinguish homorganic consonants. The detailed
properties of the distinctions thus produced depend on glottal shape and
concomitant laryngeal impedance or stoppage of airflow, as well as on
the phonatory state of the vocal folds. Such acoustic consequences as
the presence or absence of audible glottal pulsing during consonant closures
or constrictions, the turbulence called aspiration between consonant release
and onset or resumption of pulsing, and damping of energy in the region
of the first formant have all been subsumed (Lisker & Abramson 1964,
1971) under a general mechanism of voice timing. In utterance-initial
position, the phonetic environment in which consonantal distinctions
based on differences in the relative timing of laryngeal and supraglottal
action have been most often studied, this phonetic dimension has commonly
been referred to as voice onset time (VOT).
* This work was supported by Grant HD-01994 from the National Institute of Child
Health and Human Development to Haskins Laboratories. An oral version of this chapter
was presented at the Tenth International Congress of Phonetic Sciences, Utrecht, 1-6
August 1983.
PHONETIC LINGUISTICS. Copyright © 1985 by Academic Press, Inc.
All rights of reproduction in any form reserved.
ISBN 0-12-268990-9
Although the acoustic features just mentioned, and perhaps some others,
may be said to vary under the control of the single mechanism of voice
timing, it is of course possible, by means of speech synthesis, to vary
them one at a time to learn which of them are perceptually more important.
We must not forget, however, that such experimentation involves pitting
against one another acoustic features that are not independently controlled
by the human speaker.
A relevant feature not yet mentioned is the fundamental frequency
(Fo) of the voice. If we assume a certain Fo contour as shaped by the
intonation or tone of the moment, there is a good correlation between
the voicing state of an initial consonant and the Fo height and movement
at the beginning of that contour (House & Fairbanks 1953; but see also
O'Shaughnessy 1979 for complications). After a voiced stop, Fo is likely
to be lower and shift upward, while after a voiceless stop it will be higher
and shift downward (Lehiste & Peterson 1961). Although the phenomenon
has not been fully explained, it is at least apparent that it is a function
of physiological and aerodynamic factors associated with the voicing
difference.
The data derived from the acoustic analysis of natural speech can be
matched by experiments with synthetic speech that demonstrate that Fo
shifts can influence listeners’ judgments of consonant voicing (Fujimura
1971; Haggard, Ambler, & Callow 1970; Haggard, Summerfield, & Roberts
1981). Of further interest in this connection is the claim that phonemic
tones have developed in certain language families through increased
awareness of these voicing-induced Fo shifts and their consequent pro-
motion to distinctive pitch features under independent control in production
(Hombert, Ohala, & Ewan 1979; Maspero 1911).
Our motivation for the present study was to put Fo into proper per-
spective as one of a set of potential cues to consonant voicing coordinated
by laryngeal timing. After all, our own earlier synthesis (Abramson &
Lisker 1965; Lisker & Abramson 1970) yielded quite satisfactory voicing
distinctions without Fo as a variable. In addition, Haggard et al. (1970)
may have exaggerated its importance in the perception of natural speech
by their use of a frequency range of 163 Hz, one very much greater
than, for example, the range of less than 40 Hz found for English stop
productions by Hombert (1975). We set out to test the hypothesis that
the separate perceptual effect of Fo is small and dependent upon voice
timing, while the dependence of the voice timing effect on Fo is virtually
nil. We used native speakers of English as test subjects.
2. PROCEDURE
Making use of the Haskins Laboratories formant synthesizer, we prepared
a pattern appropriate to an initial labial stop followed by a vowel
[a]. Variants of this pattern were then synthesized with VOT values of
5, 20, 35, and 50 msec after the simulated stop release.
These values were chosen because of earlier work (Figure 3.1) that
determined English voicing judgments for a VOT continuum ranging from
150 msec before release to 150 msec after release. This range of VOT
values was sampled at 10 msec intervals, except for the span from 10
msec before release to 50 msec after release, which was sampled at 5
msec intervals. Those stimuli for which voice onset followed release,
that is, to the right of 0 msec on the abscissa, had noise-excited upper
formants during the interval between the burst at VOT = 0 and the
onset of voice. In the labial data at the top of the figure, the perceptual
crossover point between /b/ and /p/ falls just after 20 msec of voicing
lag. Thus, we expected that the extreme values of our more limited range
would be heard as unambiguous /b/ and /p/, given an unchanging Fo,
while the category boundary, lying somewhere between, might be shifted
one way or the other as the Fo was varied. In addition to a set of VOT
variants having an Fo fixed at 114 Hz, we imposed onset frequencies of
98, 108, 120, and 130 Hz, values commensurate with ranges reported for
natural speech (Hombert 1975; House & Fairbanks 1953; Lea 1973; Lehiste
& Peterson 1961). That is, the Fo at voicing onset for each variant began
at one of those frequencies and shifted upward or downward to a level
of 114 Hz, where it stayed for the rest of the syllable. These Fo shifts
were of three durations, 50, 100, and 150 msec. These fitted with our
own cursory observations and bracketed the value of 100 msec found
by Hombert (1975). We recorded the resulting 52 stimuli, two tokens
of each, in three randomizations and played the tapes to 11 native
speakers of English for labeling as /b/ or /p/. The subjects, three women
and eight men, represented a wide variety of regional dialects, 10 in the
United States and one in Britain.
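The stimulus count follows from crossing the four VOT values with thirteen Fo conditions (one flat contour plus four onsets at three shift durations). The sketch below enumerates that design. The function names are hypothetical, and two assumptions go beyond the text: the glide from onset frequency to 114 Hz is taken to be linear, and every listener is taken to have heard all three randomizations.

```python
from itertools import product

VOT_MS = (5, 20, 35, 50)            # voice onset times after release (from the text)
F0_LEVEL_HZ = 114                   # steady-state Fo (from the text)
F0_ONSETS_HZ = (98, 108, 120, 130)  # shifted onset frequencies (from the text)
SHIFT_MS = (50, 100, 150)           # durations of the Fo shift (from the text)


def f0_conditions():
    """Return (onset_hz, shift_ms) pairs: one flat contour plus every shift."""
    flat = [(F0_LEVEL_HZ, 0)]
    shifted = list(product(F0_ONSETS_HZ, SHIFT_MS))
    return flat + shifted


def stimuli():
    """Cross every VOT value with every Fo condition."""
    return [(vot, onset, dur) for vot in VOT_MS for onset, dur in f0_conditions()]


def f0_at(t_ms, onset_hz, shift_ms, level_hz=F0_LEVEL_HZ):
    """Fo in Hz at t_ms after voicing onset.

    ASSUMPTION: the glide from onset to level is linear; the chapter does
    not state the shape of the transition.
    """
    if shift_ms == 0 or t_ms >= shift_ms:
        return float(level_hz)
    return onset_hz + (level_hz - onset_hz) * (t_ms / shift_ms)


if __name__ == "__main__":
    design = stimuli()
    print(len(f0_conditions()))  # 13 Fo conditions
    print(len(design))           # 52 stimuli
    # ASSUMPTION: each listener heard both tokens in all three randomizations.
    print(len(design) * 2 * 3)   # 312 labeling trials per listener
    print(f0_at(50, 130, 100))   # 122.0, halfway through a 100 msec fall
```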
3. RESULTS
Figure 3.1 English voicing judgments for stops varying in VOT. Below each pair of
curves is a histogram (from Lisker & Abramson 1964) of frequency distributions of VOT
in speech. Reproduced from Lisker and Abramson (1970). (Ordinate: percent identification; abscissa: voice onset time in msec.)
Figure 3.2 Effects of Fo shifts on identification of VOT variants as English labial stops. (One panel per Fo shift duration; abscissa: VOT in msec; coded lines: Fo onsets in Hz.)
The overall results are shown in Figure 3.2. The three panels are for
the durations of Fo shift. The abscissa of each panel shows the four
VOT values, while the ordinate gives the percentage identified as /p/ for
each VOT. The coded line standing for the variants with a flat Fo of 114
Hz is, of course, a plot of the same data in all three panels. The 50%
perceptual crossover point for the flat Fo falls at about 25 msec of VOT.
This is consistent with the results for the more finely graded series of
stimuli in Figure 3.1. Indeed, for all conditions in Figure 3.2, it is VOT
that is the main causative factor, regardless of Fo, with perceptual cross-
overs in the region of the VOT of 20 msec. With hindsight we can say
that additional stimuli with VOTs of 15 and 25 msec would have given
more precision. At the same time, we do note effects of the fundamental
frequency shifts: In each panel there is much spread of data points for
20 msec and virtually none for 35 and 50 msec.
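A 50% crossover point of the kind discussed above can be estimated from a coarse VOT series by linear interpolation between the two stimuli that bracket it. This is a sketch only: the chapter does not say how its crossover values were obtained, the function name is hypothetical, and the percentages in the example are invented for illustration.

```python
def crossover(vots, pct_p, criterion=50.0):
    """Linearly interpolate the VOT at which percent /p/ crosses `criterion`.

    Returns None if the curve never crosses upward through the criterion.
    """
    points = list(zip(vots, pct_p))
    for (v0, p0), (v1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return v0 + (criterion - p0) * (v1 - v0) / (p1 - p0)
    return None


if __name__ == "__main__":
    vots = (5, 20, 35, 50)          # VOT values used in the experiment
    invented_pct_p = (0, 40, 100, 100)  # INVENTED percentages, not the chapter's data
    print(crossover(vots, invented_pct_p))  # 22.5
```

With only the 20 and 35 msec stimuli bracketing the boundary, the estimate rests on a single 15 msec segment, which is why stimuli at 15 and 25 msec would have given more precision.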
In Figure 3.3 we focus on the results for the stimuli with a VOT of
20 msec, the one that shows the major effect of Fo shifts. For each of
the four Fo onsets we see the percentage of /p/ responses. The coded
lines stand for the three durations of Fo shift. A rather general upward
trend in /p/ responses is evident as Fo onset rises. A two-way analysis
of variance yielded a significant main effect for Fo onset (F[3,30] =
36.45, p < 0.001) and a strong interaction between shift duration and
Fo onset for each duration (F[6,60] = 6.00, p < 0.01).
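The degrees of freedom reported above are what a fully within-subject (repeated-measures) two-way design with 11 listeners would produce. The chapter does not name the design, so that is an assumption, and the function below is a hypothetical helper that only reproduces the arithmetic.

```python
def rm_anova_df(n_subjects, levels_a, levels_b):
    """Degrees of freedom for a two-way fully within-subject ANOVA.

    Each effect is tested against its own effect-by-subject error term,
    so the error df is the effect df times (n_subjects - 1).
    """
    s = n_subjects - 1
    a = levels_a - 1
    b = levels_b - 1
    return {
        "A": (a, a * s),
        "B": (b, b * s),
        "AxB": (a * b, a * b * s),
    }


if __name__ == "__main__":
    # A = Fo onset (4 levels), B = shift duration (3 levels), 11 listeners.
    df = rm_anova_df(n_subjects=11, levels_a=4, levels_b=3)
    print(df["A"])    # (3, 30), matching F[3,30] for Fo onset
    print(df["AxB"])  # (6, 60), matching F[6,60] for the interaction
```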
Figure 3.3 Effects of Fo shifts on VOT of 20 msec. (VOT = +20 msec; ordinate: percent /p/ identification; abscissa: Fo onset in Hz, at 98, 108, 120, and 130; coded lines: Fo-shift durations of 50, 100, and 150 msec.)
Figure 3.4 focuses on the Fo onset of 130 Hz, the one that had the
highest number of /p/ identifications. The /p/ responses for this Fo onset
at all four VOT values are shown. Coded lines stand for the three shift
durations; the flat Fo plot, marked “no shift,” is repeated from Figure
3.2. It is once again obvious that the major effect is at the VOT of 20
msec, with the deviation from “no shift” increasing with greater shift
duration.
Figure 3.4 Effects of VOT and shift durations on onset of 130 Hz. (Fo onset 130 Hz; abscissa: VOT in msec; coded lines: Fo-shift durations in msec and “no shift.”)
The spread of points at the VOT of 5 msec in Figure 3.4, although
much smaller than that at 20 msec, made us look for significant effects
in individual cells of the confusion matrix underlying all our plots. That
is, wherever we found apparent effects of fundamental frequency at VOT
values other than 20, the locus of the main effect, we did a one-tailed
t-test for significant deviations from 100%. All such suspicious clusters
of responses were at VOT values of 5 msec and 35 msec; for the former,
we expected 100% /b/ identifications and for the latter, 100% /p/ iden-
tifications. We found three such significant deviations, all of them at the
VOT of 5 msec: (1) 120 Hz onset and 50 msec duration (t[10] = 2.70,
p < 0.01), (2) 130 Hz onset and 100 msec duration (t[10] = -2.51,
p < 0.025), (3) 130 Hz onset and 150 msec duration (t[10] = 2.799,
p < 0.01). No such significant deviations were found at the VOT values
of 35 msec and 50 msec.
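The statistic behind each of these cell tests can be sketched as a one-sample t computed over the per-listener percentages for one stimulus, against the expected value of 100%. This is an illustration under assumptions: the function name is hypothetical, the listener scores are invented, and the chapter does not describe how its t values were computed beyond naming the test.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(scores, expected):
    """Return (t, df) for a one-sample t-test of `scores` against `expected`.

    Raises ValueError when all scores are identical, because the
    statistic is undefined with zero variance.
    """
    n = len(scores)
    sd = stdev(scores)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("t is undefined when all scores are identical")
    return (mean(scores) - expected) / (sd / sqrt(n)), n - 1


if __name__ == "__main__":
    # INVENTED percent-/b/ scores for 11 listeners on one VOT = 5 msec stimulus.
    invented_scores = [100.0] * 6 + [83.3] * 4 + [66.7]
    t, df = one_sample_t(invented_scores, expected=100.0)
    print(df)     # 10, as in the t[10] values reported in the text
    print(t < 0)  # True: the mean lies below the expected 100%
```

The sign of t depends only on whether the deviation is taken as observed minus expected or the reverse.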
4. CONCLUSION
We conclude that there is a modest effect of fundamental frequency
shifts on judgments of consonant voicing even within more natural ranges
[...] determining the plausibility of arguments on the rise of distinctive tones
(Abramson 1975; Abramson & Erickson 1978). [...]
category or the other by Fo shifts in a forced-choice test. Finally, there
are values of VOT that are firmly categorical; they cannot be affected
by Fo. There are, however, no values of fundamental frequency that
cannot be affected by voice onset time.
NOTES
1. The normal ranges of Fo variation linked to consonant voicing, not only in citation
forms but especially in running speech (Lea 1973; O'Shaughnessy 1979), have still not
been well described. We have begun a study of this matter with different sentence intonations
as a variable (Abramson and Lisker 1984) and hope to present a full report soon.
REFERENCES
Abramson, A. S. (1975). Pitch in the perception of voicing states in Thai: Diachronic
implications. Haskins Laboratories Status Report on Speech Research, SR-41, 165-174.
Abramson, A. S., & Erickson, D. M. (1978). Diachronic tone splits and voicing shifts in
Thai: Some perceptual data. Haskins Laboratories Status Report on Speech Research,
SR-53(2), 85-96.
Abramson, A. S., & Lisker, L. (1965). Voice onset time in stop consonants: Acoustic
analysis and synthesis. Proceedings of the 5th International Congress of Acoustics,
Liège.
Abramson, A. S., & Lisker, L. (1984). Stop voicing, intonation, and the Fo contour.
Journal of the Acoustical Society of America, 75, S40 (Abstract).
Fujimura, O. (1971). Remarks on stop consonants: Synthesis experiments and acoustic
cues. In L. L. Hammerich, R. Jakobson, & E. Zwirner (Eds.), Form and substance:
Phonetic and linguistic papers presented to Eli Fischer-Jørgensen. Copenhagen:
Akademisk Forlag.
Haggard, M. P., Ambler, S., & Callow, M. (1970). Pitch as a voicing cue. Journal of the
Acoustical Society of America, 47, 613-617.
Haggard, M., Summerfield, Q., & Roberts, M. (1981). Psychoacoustical and cultural
determinants of phoneme boundaries: Evidence from trading Fo cues in the
voiced-voiceless distinction. Journal of Phonetics, 9, 49-62.
Hombert, J. M. (1975). Towards a theory of tonogenesis: An empirical, physiologically
and perceptually-based account of the development of tonal contrasts in language.
Unpublished doctoral dissertation, University of California, Berkeley.
Hombert, J. M., Ohala, J., & Ewan, W. (1979). Phonetic explanation for the development
of tones. Language, 55, 37-58.
House, A. S., & Fairbanks, G. (1953). The influence of consonant environment upon the
secondary acoustical characteristics of vowels. Journal of the Acoustical Society of
America, 25, 105-113.
Lea, W. (1973). Segmental and suprasegmental influences on fundamental frequency contours.
In L. Hyman (Ed.), Consonant types and tone. Southern California Papers in Linguistics
(Los Angeles), 1.
Lehiste, I., & Peterson, G. E. (1961). Some basic considerations in the analysis of intonation.
Journal of the Acoustical Society of America, 33, 419-423.
Lisker, L., & Abramson, A. S. (1964). A cross-language study of voicing in initial stops:
Acoustical measurements. Word, 20, 384-422.
Lisker, L., & Abramson, A. S. (1970). The voicing dimension: Some experiments in
comparative phonetics. Proceedings of the 6th International Congress of Phonetic
Sciences. Prague: Academia.
Lisker, L., & Abramson, A. S. (1971). Distinctive features and laryngeal control. Language,
47, 767-785.
Maspero, H. (1911). Contribution à l'étude du système phonétique des langues thai. Bulletin
de l'École Française d'Extrême-Orient, 19, 152-168.
O'Shaughnessy, D. (1979). Linguistic features in fundamental frequency patterns. Journal
of Phonetics, 7, 119-145.