Abramson Lisker 1985

This document discusses the relative power of fundamental frequency (Fo) shifts versus voice timing in distinguishing consonant voicing in speech perception. The authors conducted experiments using synthesized speech to analyze the effects of varying Fo and voice onset time (VOT) on listeners' judgments of consonant sounds. The findings suggest that while Fo shifts have a modest effect on voicing judgments, VOT remains the primary cue, with certain VOT values being categorical and unaffected by Fo.

In: V. Fromkin (ed.), Phonetic Linguistics: Essays in Honor of Peter Ladefoged

Chapter 3

Relative Power of Cues: Fo Shift Versus Voice Timing*

Arthur S. Abramson
Leigh Lisker

* This work was supported by Grant HD-01994 from the National Institute of Child Health and Human Development to Haskins Laboratories. An oral version of this chapter was presented at the Tenth International Congress of Phonetic Sciences, Utrecht, 1-6 August 1983.

PHONETIC LINGUISTICS. Copyright © 1985 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-2689909

1. BACKGROUND

The acoustic features that provide information on the identity of phonetic segments are commonly called "cues to speech perception." These cues do not typically have one-to-one relationships with phonetic distinctions. Indeed, research usually shows more than one cue to be pertinent to a distinction, although all such cues may not be equally important. Thus, if two cues, x and y, are relevant for a distinction, it may turn out that for any value of x, a variation of y will effect a significant shift in listeners' phonetic judgments but that there will be some values of y for which varying x will have negligible effect on phonetic judgments. We say, then, that y is the more powerful cue.

A good deal of evidence now exists to show that the timing of the valvular action of the larynx relative to supraglottal articulation is widely used in languages to distinguish homorganic consonants. The detailed properties of the distinctions thus produced depend on glottal shape and concomitant laryngeal impedance or stoppage of airflow, as well as on the phonatory state of the vocal folds. Such acoustic consequences as the presence or absence of audible glottal pulsing during consonant closures or constrictions, the turbulence called aspiration between consonant release and onset or resumption of pulsing, and damping of energy in the region of the first formant have all been subsumed (Lisker & Abramson 1964, 1971) under a general mechanism of voice timing. In utterance-initial position, the phonetic environment in which consonantal distinctions based on differences in the relative timing of laryngeal and supraglottal action have been most often studied, this phonetic dimension has commonly been referred to as voice onset time (VOT).

Although the acoustic features just mentioned, and perhaps some others, may be said to vary under the control of the single mechanism of voice timing, it is of course possible, by means of speech synthesis, to vary them one at a time to learn which of them are perceptually more important. We must not forget, however, that such experimentation involves pitting against one another acoustic features that are not independently controlled by the human speaker.

A relevant feature not yet mentioned is the fundamental frequency (Fo) of the voice. If we assume a certain Fo contour as shaped by the intonation or tone of the moment, there is a good correlation between the voicing state of an initial consonant and the Fo height and movement at the beginning of that contour (House & Fairbanks 1953; but see also O'Shaughnessy 1979 for complications). After a voiced stop, Fo is likely to be lower and shift upward, while after a voiceless stop it will be higher and shift downward (Lehiste & Peterson 1961). Although the phenomenon has not been fully explained, it is at least apparent that it is a function of physiological and aerodynamic factors associated with the voicing difference.

The data derived from the acoustic analysis of natural speech can be matched by experiments with synthetic speech that demonstrate that Fo shifts can influence listeners' judgments of consonant voicing (Fujimura 1971; Haggard, Ambler, & Callow 1970; Haggard, Summerfield, & Roberts 1981).
Of further interest in this connection is the claim that phonemic tones have developed in certain language families through increased awareness of these voicing-induced Fo shifts and their consequent promotion to distinctive pitch features under independent control in production (Hombert, Ohala, & Ewan 1979; Maspero 1911).

Our motivation for the present study was to put Fo into proper perspective as one of a set of potential cues to consonant voicing coordinated by laryngeal timing. After all, our own earlier synthesis (Abramson & Lisker 1965; Lisker & Abramson 1970) yielded quite satisfactory voicing distinctions without Fo as a variable. In addition, Haggard et al. (1970) may have exaggerated its importance in the perception of natural speech by their use of a frequency range of 163 Hz, one very much greater than, for example, the range of less than 40 Hz found for English stop productions by Hombert (1975). We set out to test the hypothesis that the separate perceptual effect of Fo is small and dependent upon voice timing, while the dependence of the voice timing effect on Fo is virtually nil. We used native speakers of English as test subjects.

2. PROCEDURE

Making use of the Haskins Laboratories formant synthesizer, we prepared a pattern appropriate to an initial labial stop followed by a vowel [a]. Variants of this pattern were then synthesized with VOT values of 5, 20, 35, and 50 msec after the simulated stop release. These values were chosen because of earlier work (Figure 3.1) that determined English voicing judgments for a VOT continuum ranging from 150 msec before release to 150 msec after release. This range of VOT values was sampled at 10 msec intervals, except for the span from 10 msec before release to 50 msec after release, which was sampled at 5 msec intervals.
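The sampling scheme of the earlier VOT continuum can be made concrete with a short sketch. It is only an illustration of the description above, not the authors' procedure: the function name `vot_continuum` is hypothetical, and the sign convention (negative values for voicing that begins before release) is an assumption adopted here for convenience.

```python
def vot_continuum():
    """Return the VOT values (msec) of the earlier continuum, sorted.

    Assumed convention: negative = voicing starts before release,
    positive = voicing starts after release.
    """
    # 10 msec steps across the full range, -150 to +150 msec
    coarse = set(range(-150, 151, 10))
    # 5 msec steps in the span from -10 to +50 msec
    fine = set(range(-10, 51, 5))
    return sorted(coarse | fine)


continuum = vot_continuum()

# The four VOT values used in the present experiment
test_vots = [5, 20, 35, 50]

if __name__ == "__main__":
    print(len(continuum), "VOT variants")
    print([v for v in test_vots if v in continuum])
```

Under these assumptions the continuum has 37 members, and all four VOT values chosen for the present experiment lie on the finely sampled part of it.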
Those stimuli for which voice onset followed release, that is, to the right of 0 msec on the abscissa, had noise-excited upper formants during the interval between the burst at VOT = 0 and the onset of voice. In the labial data at the top of the figure, the perceptual crossover point between /b/ and /p/ falls just after 20 msec of voicing lag. Thus, we expected that the extreme values of our more limited range would be heard as unambiguous /b/ and /p/, given an unchanging Fo, while the category boundary, lying somewhere between, might be shifted one way or the other as the Fo was varied.

In addition to a set of VOT variants having an Fo fixed at 114 Hz, we imposed onset frequencies of 98, 108, 120, and 130 Hz, values commensurate with ranges reported for natural speech (Hombert 1975; House & Fairbanks 1953; Lea 1973; Lehiste & Peterson 1961). That is, the Fo at voicing onset for each variant began at one of those frequencies and shifted upward or downward to a level of 114 Hz, where it stayed for the rest of the syllable. These Fo shifts were of three durations: 50, 100, and 150 msec. These fitted with our own cursory observations and bracketed the value of 100 msec found by Hombert (1975). We recorded the resulting 52 stimuli (two tokens of each) in three randomizations and played the tapes to 11 native speakers of English for labeling as /b/ or /p/. The subjects, three women and eight men, represented a wide variety of regional dialects, 10 in the United States and one in Britain.

[Figure 3.1. English voicing judgments for stops varying in VOT. Below each pair of curves is a histogram (from Lisker & Abramson 1964) of frequency distributions of VOT in speech. Reproduced from Lisker and Abramson (1970). Ordinate: percent identification; abscissa: voice onset time in msec; the panels include LABIAL and APICAL.]

3. RESULTS

The overall results are shown in Figure 3.2. The three panels are for the durations of Fo shift. The abscissa of each panel shows the four VOT values, while the ordinate gives the percentage identified as /p/ for each VOT. The coded line standing for the variants with a flat Fo of 114 Hz is, of course, a plot of the same data in all three panels. The 50% perceptual crossover point for the flat Fo falls at about 25 msec of VOT. This is consistent with the results for the more finely graded series of stimuli in Figure 3.1. Indeed, for all conditions in Figure 3.2, it is VOT that is the main causative factor, regardless of Fo, with perceptual crossovers in the region of the VOT of 20 msec. With hindsight we can say that additional stimuli with VOTs of 15 and 25 msec would have given more precision.

[Figure 3.2. Effects of Fo shifts on identification of VOT variants as English labial stops. One panel per Fo-shift duration; ordinate: percent identified as /p/; abscissa: VOT in msec; coded lines: Fo onsets in Hz.]

At the same time, we do note effects of the fundamental frequency shifts: In each panel there is much spread of data points for 20 msec and virtually none for 35 and 50 msec.

In Figure 3.3 we focus on the results for the stimuli with a VOT of 20 msec, the one that shows the major effect of Fo shifts. For each of the four Fo onsets we see the percentage of /p/ responses. The coded lines stand for the three durations of Fo shift. A rather general upward trend in /p/ responses is evident as Fo onset rises. A two-way analysis of variance yielded a significant main effect for Fo onset (F[3,30] = 36.45, p < 0.001) and a strong interaction between shift duration and Fo onset for each duration (F[6,60] = 6.00, p < 0.01).
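As a consistency check on the numbers reported so far, the following sketch enumerates the stimulus design from the Procedure section and computes the degrees of freedom one would expect for the two F ratios. It rests on an assumption the text does not state: that both factors (Fo onset and shift duration) were treated as within-subject factors across the 11 listeners. All names (`build_stimuli`, `within_subject_df`) are hypothetical.

```python
from itertools import product

VOTS_MSEC = [5, 20, 35, 50]
FO_ONSETS_HZ = [98, 108, 120, 130]
SHIFT_DURATIONS_MSEC = [50, 100, 150]
FLAT_FO_HZ = 114
N_LISTENERS = 11


def build_stimuli():
    """List every stimulus: per VOT, one flat-Fo variant plus 4 x 3 shifted ones."""
    stimuli = []
    for vot in VOTS_MSEC:
        # Flat contour: Fo stays at 114 Hz, so there is no shift duration
        stimuli.append({"vot": vot, "onset_hz": FLAT_FO_HZ, "shift_msec": 0})
        for onset, dur in product(FO_ONSETS_HZ, SHIFT_DURATIONS_MSEC):
            stimuli.append({"vot": vot, "onset_hz": onset, "shift_msec": dur})
    return stimuli


def within_subject_df(levels_a, levels_b, n_subjects):
    """(effect df, error df) for factor A and the A x B interaction,
    assuming a fully within-subject two-way design."""
    df_a = levels_a - 1
    df_ab = (levels_a - 1) * (levels_b - 1)
    df_subjects = n_subjects - 1
    return {
        "A": (df_a, df_a * df_subjects),
        "AxB": (df_ab, df_ab * df_subjects),
    }


if __name__ == "__main__":
    print(len(build_stimuli()), "stimuli")
    print(within_subject_df(len(FO_ONSETS_HZ), len(SHIFT_DURATIONS_MSEC), N_LISTENERS))
```

With four VOT values and thirteen Fo conditions the design has 52 stimuli. Under the within-subject assumption the expected degrees of freedom are (3, 30) for Fo onset and (6, 60) for the interaction, which matches the reported F[3,30] and F[6,60].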
[Figure 3.3. Effects of Fo shifts on VOT of 20 msec. Ordinate: percent /p/ identification; abscissa: Fo onset in Hz (98, 108, 120, 130); coded lines: Fo-shift durations of 50, 100, and 150 msec.]

Figure 3.4 focuses on the Fo onset of 130 Hz, the one that had the highest number of /p/ identifications. The /p/ responses for this Fo onset at all four VOT values are shown. Coded lines stand for the three shift durations; the flat Fo plot, marked "no shift," is repeated from Figure 3.2. It is once again obvious that the major effect is at the VOT of 20 msec, with the deviation from "no shift" increasing with greater shift duration.

The spread of points at the VOT of 5 msec in Figure 3.4, although much smaller than that at 20 msec, made us look for significant effects in individual cells of the confusion matrix underlying all our plots. That is, wherever we found apparent effects of fundamental frequency at VOT values other than 20, the locus of the main effect, we did a one-tailed t-test for significant deviations from 100%. All such suspicious clusters of responses were at VOT values of 5 msec and 35 msec; for the former, we expected 100% /b/ identifications and for the latter, 100% /p/ identifications. We found three such significant deviations, all of them at the VOT of 5 msec: (1) 120 Hz onset and 50 msec duration (t[10] = 2.70, p < 0.01), (2) 130 Hz onset and 100 msec duration (t[10] = -2.51, p < 0.025), (3) 130 Hz onset and 150 msec duration (t[10] = 2.799, p < 0.01). No such significant deviations were found at the VOT values of 35 msec and 50 msec.

[Figure 3.4. Effects of VOT and shift durations on onset of 130 Hz. Ordinate: percent /p/ identification; abscissa: VOT in msec; coded lines: Fo-shift durations of 50, 100, and 150 msec, plus "no shift."]

4. CONCLUSION

We conclude that there is a modest effect of fundamental frequency shifts on judgments of consonant voicing even within more natural ranges [...] determining the plausibility of arguments on the rise of distinctive tones (Abramson 1975; Abramson & Erickson 1978). [...] category or the other by Fo shifts in a forced-choice test. Finally, there are values of VOT that are firmly categorical; they cannot be affected by Fo. There are, however, no values of fundamental frequency that cannot be affected by voice onset time.

NOTES

1. The normal ranges of Fo variation linked to consonant voicing, not only in citation forms but especially in running speech (Lea 1973; O'Shaughnessy 1979), have still not been well described. We have begun a study of this matter with different sentence intonations as a variable (Abramson & Lisker 1984) and hope to present a full report soon.

REFERENCES

Abramson, A. S., & Lisker, L. (1984). Stop voicing, intonation, and the Fo contour. Journal of the Acoustical Society of America, 75, S40 (Abstract).

Abramson, A. S. (1975). Pitch in the perception of voicing states in Thai: Diachronic implications. Haskins Laboratories Status Report on Speech Research, SR-41, 165-174.

Abramson, A. S., & Erickson, D. M. (1978). Diachronic tone splits and voicing shifts in Thai: Some perceptual data. Haskins Laboratories Status Report on Speech Research, SR-53(2), 85-96.

Abramson, A. S., & Lisker, L. (1965). Voice onset time in stop consonants: Acoustic analysis and synthesis. Proceedings of the 5th International Congress of Acoustics, Liège.

Fujimura, O. (1971). Remarks on stop consonants: Synthesis experiments and acoustic cues. In L. L. Hammerich, R. Jakobson, & E. Zwirner (Eds.), Form and substance: Phonetic and linguistic papers presented to Eli Fischer-Jørgensen. Copenhagen: Akademisk Forlag.

Haggard, M. P., Ambler, S., & Callow, M. (1970). Pitch as a voicing cue. Journal of the Acoustical Society of America, 47, 613-617.

Haggard, M., Summerfield, Q., & Roberts, M. (1981). Psychoacoustical and cultural determinants of phoneme boundaries: Evidence from trading Fo cues in the voiced-voiceless distinction. Journal of Phonetics, 9, 49-62.

Hombert, J. M. (1975). Towards a theory of tonogenesis: An empirical, physiologically and perceptually-based account of the development of tonal contrasts in language. Unpublished doctoral dissertation, University of California, Berkeley.

Hombert, J. M., Ohala, J., & Ewan, W. (1979). Phonetic explanation for the development of tones. Language, 55, 37-58.

House, A. S., & Fairbanks, G. (1953). The influence of consonant environment upon the secondary acoustical characteristics of vowels. Journal of the Acoustical Society of America, 25, 105-113.

Lea, W. (1973). Segmental and suprasegmental influences on fundamental frequency contours. In L. Hyman (Ed.), Consonant types and tone. Southern California Papers in Linguistics (Los Angeles), 1.

Lehiste, I., & Peterson, G. E. (1961). Some basic considerations in the analysis of intonation. Journal of the Acoustical Society of America, 33, 419-423.

Lisker, L., & Abramson, A. S. (1964). A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20, 384-422.

Lisker, L., & Abramson, A. S. (1970). The voicing dimension: Some experiments in comparative phonetics. Proceedings of the 6th International Congress of Phonetic Sciences. Prague: Academia.

Lisker, L., & Abramson, A. S. (1971). Distinctive features and laryngeal control. Language, 47, 767-785.

Maspero, H. (1911). Contribution à l'étude du système phonétique des langues thai. Bulletin de l'École Française d'Extrême-Orient, 19, 152-168.

O'Shaughnessy, D. (1979). Linguistic features in fundamental frequency patterns. Journal of Phonetics, 7, 119-145.
