AN ACOUSTIC STUDY OF THE SINGER’S FORMANT: THE COMPARISON BETWEEN WESTERN CLASSICAL AND TRADITIONAL CHINESE OPERA SINGING TECHNIQUES

Wen-Hui Su

Submitted to the faculty of the University Graduate School in partial fulfillment of the
requirements for the degree Doctor of Philosophy in the Department of Speech and Hearing
Sciences, Indiana University.

April 2009
UMI Number: 3354922


UMI Microform 3354922


Copyright 2009 by ProQuest LLC
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.

ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346
Accepted by the Graduate Faculty, Indiana University, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Doctoral Committee

Karen Forrest, Ph.D. (Chairperson)

Moya L. Andrews, Ed.D.

Diane Kewley-Port, Ph.D.

Gary Kidd, Ph.D.

Paul Kiesgen, M.M.

September 30, 2003

Acknowledgements

I would like to thank all the people who have helped and inspired me during my doctoral study. My dissertation would not have been successful without their support and assistance.

I especially want to thank my dissertation advisor, Dr. Karen Forrest, for her guidance during my research and study at Indiana University. With her enthusiasm, her inspiration, and her great efforts to explain things clearly and simply, research life became smooth and rewarding for me. In addition, she was always accessible and willing to help her students with their research. She provided encouragement, advice, good teaching, good company, and lots of good ideas to students. Dr. Forrest is not only a great advisor but also a great friend who deeply cares about my well-being.

I would also like to thank my doctoral mentor, Dr. Moya Andrews, who brought me into this field and completely changed my life. She supported and helped me in every possible way throughout my doctoral study. She provided advice both academically and personally and loves me as her own daughter. Without her strength, I would not have completed my degree. I am truly grateful for all the love, help, and support that she has given me.

Special thanks go to my dissertation committee members, Dr. Moya Andrews, Dr. Diane Kewley-Port, and Dr. Gary Kidd, who provided me with great information, knowledge, and advice for my research. David Montgomery spent countless hours providing the technical support that was critical for my data collection.

I am grateful to have many friends who supported me one hundred percent through this degree. Dr. Hiroya Yamaguchi and Mrs. Hiroko Yamaguchi became close friends of mine. They treated and cared for me like a member of their own family. Chang Liu is an unbelievable friend and colleague who helped and supported me through countless obstacles during my doctoral study and continues to be my close friend through life. Stan Stockton and Michael Johnson always showed their support and made me feel warm and welcome. Lin Lee, Philip Feng, Susan Chiu, Fred Chen, Kwan-Jun Tyan, and Monica Chung are lifetime friends who have been, and always will be, there for me to share the ups and downs of life.

Lastly, and most importantly, I wish to thank my parents, Ching-Shen Su and Kuei-Chen Lee. They raised me, supported me, taught me, loved me, and gave me valuable advice. To them, I dedicate this dissertation.

Abstract

Wen-Hui Su

An Acoustic Study of the Singer’s Formant: The Comparison between Western Classical and Traditional Chinese Opera Singing Techniques

The singer’s formant (Fs) is a prominent spectrum envelope peak near 3000 Hz that appears in voices sung by trained Western classical singers. It is a raised cluster of formants 3, 4, and 5. It is especially important because this energy allows singers’ voices to be heard over a loud orchestra in a large concert hall or opera house (Fant, 1970; Sundberg, 1970).
Over the years, numerous studies have investigated the Fs using many different methodologies. Taken together, these studies give a good picture of the influences that affect the production of the Fs: no single factor can explain the Fs. This study investigated the Fs by comparing two completely different training techniques, traditional Chinese opera singing and Western classical singing. Different methodologies were compared to examine whether they affected the measurement of the Fs. This study also investigated the effects of other factors on the Fs, such as fundamental frequency, intensity, and vowel quality.
Our findings showed that the Fs was present in traditional Chinese opera singing as well as in Western classically trained singing. Perceptual judgments and qualitative analysis of the long-term average spectrum (LTAS) appeared to be reliable tools for investigating the presence or absence of the Fs. The quantitative acoustic measures might not be sufficient tools for determining the Fs. These measures were the energy difference between the high- and low-frequency regions of the LTAS, the L3-L1 of the LTAS, and the L3-L1 of short-term spectrum analysis. Factors such as singing technique, fundamental frequency, intensity, and vowel quality either interacted or traded off to signal the Fs. They also interacted differently for each individual singer.

Table of Contents

Chapter I: Introduction

Chapter II: Literature Review

Mechanism of Fs production

Factors that impact the Fs

Procedure used to investigate the Fs

Influence of singing technique on the Fs

Chapter III: Research questions

1.) Perceptual judgments

2.) Impact of analysis procedure and singing material on the Fs

3.) Influence of independent and combined factors on Fs

Chapter IV: Methods

Experiment 1: Perceptual judgments

Experiment 2: Categorical measurement

Experiment 3: Quantitative measurement of LTAS

Chapter V: Results and discussions

Experiment 1: Perceptual rating

Discussion

Experiment 2: Categorical analysis

Discussion

Experiment 3: Quantitative analysis of Fs

Discussion

Chapter VI: General Discussion and Conclusions

Conclusion

References

Chapter I: Introduction

The singer’s formant (Fs) is defined as a prominent spectral-envelope peak near 2800 Hz that appears in the singing of certain voice types of Western classically trained singers. The Fs is a raised cluster of formants 3, 4, and 5 at an optimal frequency. This cluster allows singers’ voices to be heard over the highest sound level of an orchestra in a large concert hall or opera theater (Bartholomew, 1934; Sundberg, 1974; Sundberg, 1977). The Fs is the acoustic counterpart of what is perceived as the “vocal ring” (Bartholomew, 1934; Vennard, 1967). The Fs occurs when an optimal frequency region in the voice is enhanced by the properties of the vocal tract. These properties arise from two adjustments. The singer lengthens the vocal tract by protruding the lips and lowering the larynx, and the singer expands the pharynx (Sundberg, 1970; 1974; 1977).

Over the years, researchers investigating the Fs have relied on a variety of methodologies. In early studies, most investigators focused on defining the Fs. They measured it from single vowels produced by classically trained singers, especially males (Bartholomew, 1934; Fant, 1970; Sundberg, 1970; Sundberg, 1973; Cleveland & Sundberg, 1985; Schutte & Miller, 1985; Sundberg, 1995). After this initial period of investigation, researchers turned their attention to the magnitude of the Fs and the factors that affect it (Sundberg, 1970; Bloothooft & Plomp, 1984, 1985, 1986; Schutte & Miller, 1985; Seidner, Schutte, Wendler & Rauhut, 1985; Cleveland & Sundberg, 1985; Wang, 1985; Rossing, Sundberg & Ternström, 1986; Ternström & Sundberg, 1989; Sengupta, 1990; Ross, 1992; Barrichelo, Heuer, Dean & Sataloff, 2000; Sundberg, 2001; Weiss, Brown & Morris, 2001; Cleveland, Sundberg, & Stone, 2001). These studies used sung and spoken phrases from both untrained and trained singers as samples. Throughout this body of work, researchers identified many factors as possibly affecting the Fs. The most noteworthy of these are vocal training, voice type, fundamental frequency (F0), intensity, and vowel configuration. Several other studies have focused on clarifying how to measure the level of the Fs, its center frequency region, and its bandwidth (Bloothooft & Plomp, 1984, 1985, 1986; Schutte & Miller, 1985; Seidner, Schutte, Wendler & Rauhut, 1985; Sengupta, 1990; Sundberg, 2001). Still others have focused on quantitatively calculating the singing power ratio (Omori, Kacker, Carroll, Riley, & Blaugrund, 1996; Lundy, Roy, Casiano, Xue, & Evans, 2000).
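The singing power ratio (SPR) is one example of such a quantitative measure. As it is usually described (Omori et al., 1996), the SPR compares the strongest spectral peak between 2 and 4 kHz with the strongest peak below 2 kHz, expressed in dB. The Python sketch below is only an illustration and is not the procedure of any study cited in this chapter. It assumes that spectral peaks have already been measured and are supplied in a hypothetical format of (frequency in Hz, level in dB) pairs. It also assumes band edges of [0, 2000) Hz and [2000, 4000] Hz. Sign conventions differ between reports. Here the SPR is taken as the high-band peak minus the low-band peak, so a stronger 2-4 kHz peak gives a less negative value.

```python
def singing_power_ratio(partials):
    """Singing power ratio (SPR) in dB from a list of spectral peaks.

    partials: iterable of (frequency_hz, level_db) pairs (hypothetical input
    format). Assumed bands: low = [0, 2000) Hz, high = [2000, 4000] Hz.
    Returns max(high-band level) - max(low-band level).
    """
    peaks = list(partials)  # accept generators; the data are scanned twice
    low = [level for freq, level in peaks if 0 <= freq < 2000]
    high = [level for freq, level in peaks if 2000 <= freq <= 4000]
    if not low or not high:
        raise ValueError("need at least one peak in each band")
    return max(high) - max(low)


# Hypothetical harmonic peaks of a sung vowel with F0 = 220 Hz.
example = [(220, 60.0), (440, 66.0), (660, 58.0), (2860, 50.0), (3080, 47.0)]
print(singing_power_ratio(example))
```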

Many questions regarding the Fs remain unanswered, beginning with its operational definition. To some investigators, the Fs is related to a specific vocal tract configuration that generates precise resonances (Sundberg, 2001). Other researchers have relied on listeners’ perception to define the Fs (Wang, 1985; Omori et al., 1996). In the present document, the Fs refers to the resonant features in the voices of highly trained singers. Ideally, the perceptual, acoustic, and physiological definitions would be equivalent to some degree, but this relationship remains largely unexplored. One must therefore ask whether any quantitative criteria exist for determining the presence of the Fs. Research has also mainly evaluated the Fs in Western classically trained singers, which raises a further question: do trained singers of other musical styles also exhibit the Fs?

Previous researchers used different methodologies to determine the Fs and often obtained different results. The relation between methodology and the Fs therefore needs to be investigated. Finally, previous studies used very few subjects, so one must question how well their results generalize to the relevant population. The goal of the current research is to investigate different methodologies and factors that may affect the Fs. The main question addressed in this study is whether the Fs exists in traditional Chinese opera singing as well as in Western classically trained singing. To help answer this question, previously identified factors were measured and compared for Chinese opera and Western classically trained singing. These factors included vowel quality, fundamental frequency, and intensity.

This study also investigated secondary questions closely related to its main purpose: (1) How is the Fs perceived by trained listeners? (2) Do analysis procedures and singing materials affect the measurement of the Fs? (3) What is the impact of the independent and combined acoustic parameters on the Fs, and how do these acoustic parameters differ between Chinese and Western singers?

Chapter II: Literature Review

Many researchers have sought to identify the characteristics of a good-quality singing voice. Bartholomew (1934) defined good voice quality in the male voice as smooth and even control of pitch, intensity, and timbre. A person with a good voice can produce a higher intensity than a person with a poor voice. Bartholomew speculated that increasing the intensity of a good-quality voice requires four adjustments: a wide opening of the throat, more vigorous action of the vocal folds, a greater space between the tongue and the lower pharynx, and tensing of the pharyngeal walls. Bartholomew further suggested that a good-quality male voice has an F0 of 500 Hz or lower. This good quality of singing voice is called the “ring.” In men, the ring is characterized by a strong overtone averaging around 2800-2900 Hz.

Over the past 30 years, Sundberg has conducted many studies of the Western classically trained singing voice. He introduced the term “singer’s formant” (Fs) to characterize a good-quality singing voice. As Sundberg notes, singing uses the same production mechanism as speech, although singers must manipulate the vocal tract differently from what is done in speech (Bartholomew, 1934). This chapter therefore begins with a basic introduction to the acoustic theory of speech production (Fant, 1960). Characteristics of the singing voice are then compared with features of the acoustic speech signal.

Fant (1960) proposed the acoustic theory of speech production. According to this theory, speech sound is generated by the voice source, filtered by the vocal tract, and radiated from the mouth. The voice source produces a complex tone composed of the F0 and its harmonics. Fant (1960) reported that for normal speech produced by men, the amplitude of these harmonics decreases at 12 dB per octave. In the next stage of the theory, the vocal tract filters this complex tone. The vocal tract acts as a resonator that selectively filters energy from the harmonic source. The momentary shape and size of the vocal tract determine whether the amplitude of a given harmonic is attenuated or boosted. As a result, the spectrum of each speech sound shows a specific pattern of spectral peaks and valleys, and these peaks are the formants. The first, second, and third peaks are labeled F1, F2, and F3, respectively. In theory there are an infinite number of formants, but these three are the most dominant in the spectrum (Fant, 1960). They also provide cues for English vowel categorization and perception (Peterson & Barney, 1952).

Across vowels, the voice-source spectrum shows a similar envelope slope of -12 dB/octave. The formants produced by the vocal tract are what differentiate vowel quality, both acoustically and perceptually (Kent & Read, 1992). The length and shape of the vocal tract determine the formant frequencies. The overall length of the vocal tract is the distance from the glottis to the lip opening, which depends on an individual’s morphology. The speaker can also modify this length by raising or lowering the larynx, or by protruding or retracting the lips. Other things being equal, a longer vocal tract yields lower formant frequencies (Fant, 1960).
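The claim that a longer vocal tract lowers the formants can be made concrete with a textbook idealization: a uniform tube closed at the glottis and open at the lips. This tube resonates at F_n = (2n - 1)c / (4L). The Python sketch below assumes a speed of sound of 35,000 cm/s. The tract lengths of 17.5 and 18.5 cm are chosen only for illustration. A real vocal tract is not uniform, so the sketch shows only the direction of the length effect, not actual formant values.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid vocal-tract air


def uniform_tube_formants(length_cm, n_formants=5, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L): the tube resonates when its length equals an
    odd number of quarter wavelengths.
    """
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]


neutral = uniform_tube_formants(17.5)     # illustrative adult male length
lengthened = uniform_tube_formants(18.5)  # e.g. lowered larynx or protruded lips
for n, (a, b) in enumerate(zip(neutral, lengthened), start=1):
    print(f"F{n}: {a:6.0f} Hz -> {b:6.0f} Hz")
```

Every resonance scales with 1/L. Lengthening the tube from 17.5 to 18.5 cm therefore lowers each formant by the same proportion, about 5.4%, so the absolute shift grows with formant number.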

After the vocal tract filters the source sound, the speech is radiated from the mouth into the air. This final stage of the acoustic theory of speech production (Fant, 1960) is the radiation characteristic. It is a filtering effect that causes the output spectrum to rise by 6 dB/octave. The radiated speech sound is thus the product of the voice source, the resonator (vocal tract), and the radiation characteristic of the mouth.
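The three stages multiply in the frequency domain, so their contributions add when expressed in dB. The Python sketch below makes three illustrative assumptions: a source that falls at 12 dB/octave, a formant-free (flat) vocal tract, and a radiation characteristic that rises at 6 dB/octave. Under these assumptions, the radiated harmonics fall at a net 6 dB/octave. Like Fant’s (1960) original formulation, the sketch treats the stages as independent.

```python
import math


def radiated_harmonic_levels(f0_hz, n_harmonics,
                             source_slope_db_per_oct=-12.0,
                             radiation_slope_db_per_oct=6.0):
    """Relative levels (dB re: the fundamental) of the first n harmonics.

    Source, vocal tract, and radiation are treated as independent stages whose
    dB contributions add. The vocal tract is assumed flat (0 dB at every
    frequency), so no formant peaks appear.
    """
    levels = []
    for k in range(1, n_harmonics + 1):
        octaves = math.log2(k)  # harmonic k lies log2(k) octaves above F0
        source_db = source_slope_db_per_oct * octaves
        tract_db = 0.0  # flat filter: placeholder for a real transfer function
        radiation_db = radiation_slope_db_per_oct * octaves
        levels.append((k * f0_hz, source_db + tract_db + radiation_db))
    return levels


for freq, level in radiated_harmonic_levels(110.0, 8):
    print(f"{freq:6.0f} Hz  {level:6.1f} dB")
```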

In his initial formulation of the acoustic theory of speech production, Fant (1960) treated all three components of the model as independent. More recently, theorists and investigators have suggested a non-linear interaction between the source and the filter (Titze, 2008). In this view, certain vocal configurations improve the efficiency of the transfer function, defined as the ratio of the pressure radiated from the mouth to the glottal flow. As the following review will explain, the Fs provides strong support for this hypothesis in many ways.

Mechanism of Fs production

Speaking and singing involve changes in the shape of the vocal tract in addition to variations in its length. The shape of the vocal tract depends on the position and size of the constriction along the length of the tube (Sundberg, 1977; 1987). The movements of the articulators determine this shape; these movements include lip and jaw opening, tongue position, and velar and laryngeal height. Articulatory movements are very complex, and moving any one articulator generally changes the formant frequencies (Sundberg, 1987). Fant (1960) first described the precise relationship between these articulatory movements and the acoustic characteristics of the resulting signal. This relationship has since been an area of great interest in both speech and singing research.

Western classically trained male singers manipulate their articulators differently during singing than during speaking (Bartholomew, 1934). Singers are taught to “cover” their voices by enlarging the cross-sectional area of the pharynx, almost as if they were yawning and singing at the same time. Some voice teachers describe this configuration as the sensation of holding an egg inside the mouth while singing. The result is a darker voice quality than is produced in speech (Vennard, 1967; Hines, 1990). Sundberg (1970) set out to determine how these vocal tract alterations affect formant frequencies during singing.

Sundberg (1970) studied articulatory movements in x-ray pictures of the vocal tract and related these movements to the resulting formant frequencies. The pictures were taken during sustained sung and spoken vowels. They showed the entire vocal tract, including the lips and glottis, as well as the frontal part of the cervical vertebrae. Sundberg related these pictures to the vowels’ intensity and formant frequencies. In general, singing had greater intensity than speech, and this difference could be related to jaw opening and larynx position. Jaw opening had its greatest effect on F1.

X-ray pictures of the larynx taken during the sung /a/ showed features that were absent in conversational speech: a lowered larynx and prominent expansion of the laryngeal ventricle and the piriform sinuses. Sundberg suggested that the lowered larynx lowered the F2 frequency of sung front vowels. In back vowels, singing raised the F3 frequency. Two changes caused this rise: the cavity behind the incisors became smaller, and the posterior part of the oral cavity became larger. The F4 frequency decreased in sung back vowels because the larynx was lower during singing than during speech. These changes reduced the frequency distance between F3 and F4. Finally, Sundberg suggested that the lowered F3 and F4 frequencies of sung front vowels also result from the lowered larynx and from increased lip protrusion. Sundberg concluded that jaw opening and larynx lowering brought these higher formants closer together, and that this clustering yielded the Fs.

In a second study, Sundberg (1973) noted that as vocal intensity increases, the level of the Fs grows faster than the overall spectrum level. This study was an early investigation into the non-linear relationship between the voice source and the vocal tract. Sundberg hypothesized that the Fs could be attributed to the voice source spectrum, the vocal tract transfer function, or both. He studied professional singers to examine how the source and the filter affect the level of the Fs.

Trained singers produced sustained sung and spoken vowels /a/, /i/, and /u/ at four equally spaced pitches and at four intensities: piano, mezzo piano, mezzo forte, and forte. Sundberg (1973) investigated how the overall sound pressure level (SPL) and the SPL of the Fs related as a function of vocal intensity and pitch. Spectrum analysis of the sung vowels showed that as vocal effort increased, the level of the higher partials usually rose more quickly than that of the lower partials.

Sundberg (1973) hypothesized that two factors determine spectral balance: the formant frequencies and the characteristics of the source spectrum. He defined spectral balance as the level difference between the Fs region, denoted L3, and the first formant region, denoted L1. To determine the contribution of the voice source, he compared the spectra of sung vowels with those generated by a synthesizer. He calculated the differences between sung and synthesized vowels to determine whether the singers’ source spectra followed the 12 dB/octave decline typically found in male speakers (Fant, 1960). Sundberg (1973) also investigated the effect of vocal intensity on the source spectrum for sung and spoken vowels produced by two singers, one with a “dark voice” and the other with a “light voice.” He suggested that varying vocal effort did not change the source spectrum slope by a constant amount. As vocal intensity increased during singing, the lower partials (those below 1000 Hz) tended to rise in amplitude more slowly than the higher partials (those above 1000 Hz). Sundberg also reported that the level of the Fs increased more quickly than the level of F1 as vocal intensity increased. Changes in F0 had a similar effect: raising F0 changed the F1 level less than it changed the level of the Fs.
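Checking whether a measured spectrum follows a given decline, such as the 12 dB/octave figure cited for male speech, amounts to estimating a slope in dB per octave. Sundberg’s comparison with synthesized vowels was more elaborate. The Python sketch below shows only one simple way to obtain such a slope: a least-squares fit of partial level against log2(frequency). The input format, a list of (frequency in Hz, level in dB) pairs, is a hypothetical choice. A single fitted slope cannot capture the uneven growth of the lower and higher partials reported above.

```python
import math


def slope_db_per_octave(partials):
    """Least-squares slope of level (dB) versus log2(frequency).

    partials: list of (frequency_hz, level_db) pairs with frequency_hz > 0.
    Returns the fitted spectral slope in dB per octave.
    """
    xs = [math.log2(freq) for freq, _ in partials]  # octave scale
    ys = [level for _, level in partials]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two partials")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("partials must cover more than one frequency")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic check: harmonics of 100 Hz falling exactly 12 dB per octave.
ideal = [(100.0 * k, -12.0 * math.log2(k)) for k in range(1, 11)]
print(round(slope_db_per_octave(ideal), 6))
```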

Sundberg (1973) found that for these singers, the average source spectra of all sung and spoken vowels were quite similar. That is, the amplitude of the lower partials relative to that of the higher partials was weaker in loud spoken vowels than in normal spoken vowels. In this study, the source spectrum showed no difference in vocal effort between singing and speech. Sundberg therefore concluded that the singers used a similar type of source in singing as in their regular speech. He proposed that this type of source may be generated by a special articulation rather than by a different mode of vocal fold vibration. Sundberg also compared his data with studies of untrained singers (e.g., Lindqvist, 1970; Fant, 1970). He found that in untrained singers’ speech, vocal efficiency was limited to a small range of intensities and pitches. Trained singers’ voice spectra differ: increases in vocal intensity and pitch raise the level of the higher formants, that is, the Fs. Sundberg concluded that the voice source differs between trained and untrained voices.

In the same study (Sundberg, 1973), a physical model of the vocal tract was also used. Sundberg compared the acoustic signal obtained from the model with the signal generated by the synthesizer. The comparison showed similar transfer functions for the model and the synthesized signals. Sundberg’s (1970) earlier study had also examined frontal x-ray pictures of the larynx taken during the spoken and sung vowel /a/ produced by trained singers. For the sung vowel, these pictures showed a lowered larynx and expansion of the sinus Morgagni (laryngeal ventricle) and the piriform sinuses. Sundberg (1973) hypothesized that the expanded sinus Morgagni and piriform sinuses somehow affect the Fs. He therefore modeled the vocal tract to include an expanded sinus Morgagni and piriform sinuses. In the comparison, the model’s transfer functions that contained a singer’s formant were equivalent to those obtained from the synthesizer. This result led Sundberg (1974) to propose that the Fs is an extra formant in the region of F3, F4, and F5. He proposed that this formant appears when the larynx is lowered and the pharynx above the laryngeal opening is expanded. Such an effect would support a non-linear interaction between the voice source and the vocal tract.

Sundberg (1974) tested his hypothesis with a simulated male vocal tract. He based the shape of the vocal tract on the tomograms generated by Fant (1960) and estimated the dimensions of the lowered larynx tube from them. Sundberg defined the larynx tube as a small tube above the vocal folds that is inserted vertically into the pharynx tube. He modeled the larynx tube as a twin resonator: a wider tube at the inferior end (sinus Morgagni) and a narrower, longer tube above the sinuses. The larynx tube was 6 cm long with a cross-sectional area of 1 cm². This tube was inserted into the pharyngeal tube, a cylindrical brass tube closed at one end. A high-level, modulated DC voltage served as the sound source for this simulated vocal tract.

Sundberg’s (1974) first experiment examined how lowering the larynx affects the pharynx tube. When the larynx was lowered, the pharynx above the laryngeal opening widened. Sundberg (1974) also confirmed earlier findings that when the pharynx widens during larynx lowering for singing, its cross-sectional area becomes six times larger than the opening area of the larynx tube. At this ratio, the larynx tube acts as a resonator separate from the pharynx and generates the Fs around 3000 Hz.

Sundberg (1974) also noted that the area of the larynx tube opening normally expanded as F0 increased. This expansion could affect the ratio between the larynx tube opening and the cross-sectional area of the pharyngeal tube, and that ratio is what generates the Fs. In the next experiment, Sundberg (1974) simulated the size of the larynx tube opening at different F0s and measured the output of the vocal tract model. As F0 increased, the larynx tube opening widened and the resonance frequency of the larynx tube rose. As noted above, the Fs requires the pharyngeal cross-sectional area to be six times larger than the larynx tube opening. Only then is the larynx tube acoustically independent of the pharyngeal tube. At higher F0s this condition was not met, and the larynx tube could not act as a separate resonator.

Sundberg (1974) hypothesized that generating the Fs at higher F0s requires the larynx tube to remain a separate resonator. He further suggested that the sinus Morgagni might play a large role in counteracting the change in the larynx opening relative to the pharynx at higher F0s. In the next experiment, he therefore simulated the sinus Morgagni and measured the output of the vocal tract model. A small tube acting as the larynx was inserted into a larger “pharyngeal” tube. The volume of the larynx tube was varied as the simulated sinus Morgagni was expanded. Sundberg again compared the model’s output with the formant frequencies derived from Fant’s (1960) equations of vocal tract resonance. The calculated and measured formants agreed. This agreement indicated that expansion of the sinus Morgagni could offset changes in the size of the larynx tube opening, and that laryngeal lowering could produce this expansion. Sundberg concluded that during singing, the sinus Morgagni expands to compensate for the wider larynx tube opening caused by the higher F0. This expansion allows the larynx tube to act as a separate resonator with a resonance frequency near 3 kHz. Laryngeal lowering usually achieves the expansion of the sinus Morgagni.

The sinus Morgagni was not the only structure involved in generating the Fs. Sundberg (1974) reported that the tomograms showed larger cross-sectional area and greater length of the piriform sinuses when the larynx was lowered. In the final experiment, he simulated the piriform sinuses based on the tomogram findings (Fant, 1960). He represented them with one or two cylindrical tubes of variable length and diameter. These tubes were inserted parallel to the larynx tube, into the closed end of a large “pharyngeal” tube. The model’s output agreed with the formant frequencies derived from the equations of vocal tract resonance (Fant, 1960). Sundberg concluded that the lowered larynx expands both the piriform sinuses and the pharynx tube. He further suggested that the piriform sinuses could also be interpreted as an increase in pharynx length. This increased pharyngeal length lowered the frequency of F5 considerably, while the resonance frequency of the larynx tube remained around 3 kHz.

Sundberg (1974) suggested that the Fs occurs only when the larynx is lowered, but several investigations of other singing styles counter this claim (Wang, 1985; Sengupta, 1990). For example, Wang (1985) studied Chinese opera singers and found the Fs with an elevated larynx. However, Sundberg (2003) did not replicate this result in his investigation of one Chinese opera singer. Sengupta’s (1990) studies of North Indian classical singing also disagreed with Sundberg’s suggestions about larynx position and the Fs. Those studies found that Indian singers generated the Fs without lowering the larynx.

Detweiler (1994) tried to resolve this conflict between Sundberg’s findings and those of other researchers regarding larynx height and the Fs. She investigated the laryngeal system using magnetic resonance imaging (MRI), stroboscopic videolaryngoscopy, and acoustic analysis. She studied three trained male singers (one tenor and two baritones) during modal and pulse (vocal fry) phonation. Her main aim was to determine whether the Fs is really generated only when the cross-sectional area of the laryngeal outlet is one-sixth that of the pharynx tube. She also examined the effect of the laryngeal ventricle (sinus Morgagni) on the Fs. Endoscopic videolaryngoscopy was used to examine the cross-sectional areas of the laryngeal outlet and the laryngopharynx during phonation. MRI was used to capture images of the larynx. Acoustic analysis was used to determine whether the Fs was present.

Both the MRI and the laryngoscopic examinations showed a cross-sectional area ratio between the laryngeal outlet and the pharynx of 2.9:1 to 3.7:1, which contradicts Sundberg’s (1974) model. Both examinations also showed a clear laryngeal ventricular space during modal phonation but not during pulse phonation. Nevertheless, the acoustic analysis showed the Fs in both conditions. Detweiler (1994) therefore concluded that the sinus Morgagni was not clearly the cause of the Fs. Moreover, the vertical laryngeal positions obtained from the MRI showed that the sinus Morgagni behaved differently from what Sundberg had suggested. Detweiler (1994) concluded that Sundberg’s model could not account for the Fs in the three subjects she studied.

Although Detweiler (1994) supported her hypothesis strongly with three different analyses, some questions remain. Her acoustic analysis showed the Fs when the singers sang in both the supine and the upright positions (i.e., during MRI and during the laryngoscopic evaluation), but it is doubtful that the singers could really sing with their “best voice” while supine. It is also necessary to ask whether the supine position affected larynx position during singing, which may account for the differences between the results of Detweiler and those of Sundberg (1974). Detweiler did not specify how the Fs was identified in the acoustic analysis; therefore, the results from this study are hard to interpret. Finally, the MRI and laryngoscopic examinations were reported to yield consistent information. However, the MRI images were obtained while the singers phonated the vowel /a/, whereas the vowel /i/ was used during the laryngoscopic examination. Comparing two different examination procedures (i.e., MRI and stroboscopy) with different body positions and different vowels raises the question of how these variations may have affected the study’s results. Also, direct stroboscopy may have made it difficult for the singers to sing with their best voices, so it is difficult to generalize Detweiler’s results to normal singing conditions.

Titze and Story (1997) also conducted a study to evaluate Sundberg’s (1974) model of the physiological and acoustic changes associated with the Fs. They used a computer model to investigate how the vocal tract can be adjusted to produce the best conditions for vocal fold oscillation. The model was based on magnetic resonance images (MRI) of a 30-year-old male. The input impedance (defined as the ratio of supraglottal pressure to glottal flow) and the transfer function of the vocal tract were computed as the vocal tract shape was varied.

Titze and Story (1997) showed that the epilarynx tube (i.e., the narrowed portion of the laryngeal ventricle above the glottis, equivalent to Sundberg’s larynx tube) influenced the resonance frequencies of the output signal. With a narrowed epilarynx, relative to a uniform vocal tract, F1, F2, and F3 were pulled upward, and F4 and F5 were pulled downward toward the region of 2500–3500 Hz. When the resonance frequencies associated with an independent epilarynx were calculated, the first five formant frequencies were affected and moved toward the frequency region of 2756 Hz. In a second simulation, Titze and Story calculated the acoustic consequences of pharyngeal expansion. Their findings confirmed Sundberg’s (1974) conclusion that the epilarynx tube influences the generation of the Fs: when the pharynx is widened and the ratio of the cross-sectional area of the pharynx to that of the epilarynx is 6:1, the narrowed epilarynx tube becomes a separate resonator that causes F3, F4, and F5 to cluster into the Fs. These findings led Titze to suggest modifying Fant’s (1960) acoustic theory of speech production to include nonlinearities.
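As an illustration of the kind of computation described above, the sketch below uses a lossless chain-matrix (transmission-line) model of concatenated cylindrical tubes to compute the input impedance and the formant frequencies of a given area function. It is not Titze and Story’s (1997) model, which was derived from MRI area functions and included losses; here the walls are assumed rigid, losses are ignored, the lips are treated as an ideal open end, and the speed of sound, air density, and tube dimensions in the usage example are assumed values.

```python
import math

SPEED_OF_SOUND = 35000.0  # cm/s, assumed value for warm, humid air
AIR_DENSITY = 0.00114     # g/cm^3, assumed value


def section_matrix(freq_hz, length_cm, area_cm2):
    """Lossless chain matrix of one cylindrical section: [P_in, U_in] = M [P_out, U_out]."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND
    z0 = AIR_DENSITY * SPEED_OF_SOUND / area_cm2  # characteristic acoustic impedance
    c, s = math.cos(k * length_cm), math.sin(k * length_cm)
    return [[c, 1j * z0 * s], [1j * s / z0, c]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


def chain(freq_hz, sections):
    """Multiply section matrices from the glottis (first) to the lips (last)."""
    m = [[1, 0], [0, 1]]
    for length_cm, area_cm2 in sections:
        m = matmul(m, section_matrix(freq_hz, length_cm, area_cm2))
    return m


def input_impedance(freq_hz, sections):
    """Pressure/flow ratio at the glottal end, with the lips as an ideal open end (P = 0)."""
    m = chain(freq_hz, sections)
    return m[0][1] / m[1][1]


def formants(sections, f_max=5000.0, step=1.0):
    """Resonances: frequencies where U_lips / U_glottis = 1 / M[1][1] becomes infinite."""
    def d(f):
        return chain(f, sections)[1][1].real  # M[1][1] stays real for a lossless chain

    found = []
    f = step / 2  # offset grid so resonances at round frequencies fall inside a step
    while f + step <= f_max:
        if d(f) * d(f + step) < 0:  # a sign change brackets a zero of M[1][1]
            lo, hi = f, f + step
            for _ in range(40):  # bisection refinement
                mid = (lo + hi) / 2
                if d(lo) * d(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            found.append((lo + hi) / 2)
        f += step
    return found


if __name__ == "__main__":
    uniform = [(17.5, 3.0)]                        # hypothetical 17.5 cm uniform tract
    narrow_epilarynx = [(2.5, 0.5), (15.0, 3.0)]   # hypothetical 6:1 pharynx-to-epilarynx ratio
    print([round(f) for f in formants(uniform)])
    print([round(f) for f in formants(narrow_epilarynx)])
```

For the uniform tube the resonances fall at odd multiples of c/4L (500, 1500, 2500, 3500, and 4500 Hz for the assumed 17.5 cm length); comparing them with the output for the narrowed configuration shows how a constricted section next to the glottis shifts the resonances.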

From the studies discussed above, it can be concluded that the Fs can be defined as a prominent spectral-envelope peak around 3 kHz that is composed of a raised cluster of F3, F4, and F5. When a Western classically trained male singer lowers his larynx and expands his pharynx during singing, the cross-sectional area of the pharynx tube becomes about six times that of the epilarynx tube; the epilarynx tube therefore becomes a separate resonator and generates the Fs. In addition, laryngeal depression widens the sinus Morgagni and expands the piriform sinuses, which keeps the resonance frequency of the laryngeal tube at around 3 kHz. Because singers also vary pitch, loudness, and vowel during singing, one must examine how these factors influence the Fs.
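The 6:1 area criterion and the idea of a separate epilarynx resonator can be illustrated with simple arithmetic. The sketch below assumes that the epilarynx behaves like a quarter-wavelength resonator (closed at the glottis and opening into the much wider pharynx), a common simplification rather than a model taken from the studies reviewed here; the speed of sound and the example dimensions are assumed.

```python
SPEED_OF_SOUND = 35000.0  # cm/s, assumed value


def pharynx_to_epilarynx_ratio(pharynx_area_cm2, epilarynx_area_cm2):
    """Ratio of pharyngeal to epilaryngeal cross-sectional area."""
    return pharynx_area_cm2 / epilarynx_area_cm2


def meets_six_to_one_criterion(pharynx_area_cm2, epilarynx_area_cm2, criterion=6.0):
    """True when the pharynx is at least `criterion` times wider than the epilarynx."""
    return pharynx_to_epilarynx_ratio(pharynx_area_cm2, epilarynx_area_cm2) >= criterion


def quarter_wave_resonance_hz(tube_length_cm):
    """Lowest resonance of a tube closed at one end and open at the other: c / (4 L)."""
    return SPEED_OF_SOUND / (4.0 * tube_length_cm)


if __name__ == "__main__":
    print(meets_six_to_one_criterion(3.0, 0.5))   # hypothetical areas, ratio 6:1
    print(meets_six_to_one_criterion(3.7, 1.0))   # ratio 3.7:1
    print(round(quarter_wave_resonance_hz(2.9)))  # hypothetical 2.9 cm epilarynx
```

Under this criterion, the pharynx-to-outlet ratios of 2.9:1 to 3.7:1 reported by Detweiler (1994) would not qualify, and an assumed epilarynx length of about 2.9 cm would place the quarter-wave resonance near 3 kHz (about 3017 Hz).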

Factors that impact the Fs

Sundberg’s early studies established the acoustic and physiological bases for the generation of the Fs. Subsequent studies, reviewed in this section, focused on how factors such as vocal training, voice classification, pitch, loudness, and vowel configuration affect the Fs. Because changes in F0 are common during vocal performance, Schutte and Miller (1985) investigated the effect of F0 on the center frequency of the Fs. They asked one tenor to sing the vowel /ɔ/ in chromatic steps over his total vocal range, starting with an F0 below the normal tenor singing range and continuing to an F0 above it. The center frequency of the Fs at each chromatic note was measured from a short-term spectrum; the Fs appeared in the region of 2,200 Hz for the lowest F0, whereas the highest F0 yielded the highest Fs, at 3,100 Hz. Schutte and Miller concluded that, within the frequency range investigated, the frequency of the Fs increased as F0 increased; however, the spectral balance (i.e., L3–L1, the peak level in the Fs region relative to the peak level in the region of the lower formants) remained constant throughout the whole F0 range. Within this tenor’s most commonly used F0 range (131–524 Hz), the spectral balance was about –7 dB. These findings suggest that the frequency of the Fs varies with F0, whereas its level relative to L1 remains constant across the tenor’s singing range. Schutte and Miller therefore defined the Fs by an L3–L1 difference of about –7 dB.

Schutte and Miller’s (1985) study investigated only one singer, and it was not specified how this singer was selected or determined to have the Fs before acoustic measures were made. No details indicated who judged this singer to have the Fs, or whether any other acoustic analysis confirmed that he exhibited the Fs. Therefore, their findings may not provide a general definition of the Fs.

Seidner, Schutte, Wendler and Rauhut (1985) further studied the effect of F0 on the Fs and also investigated the effects of vowel quality and voice type. Five trained singers (3 males and 2 females) with different voice types (tenor, bass, baritone, soprano, and alto) were included. The singers were asked to produce three vowels, /a/, /i/, and /u/, with a loud voice. Each vowel was sung on four notes, C, E, G, and A, over a range of three octaves. Measures included the level and center frequency of the Fs as functions of vowel quality, voice type, and F0. Seidner et al. found that the frequency of the Fs varied with vowel quality and F0: the Fs shifted to higher frequencies at high F0 and to lower frequencies at low F0. There was no relation between the frequency of the Fs and voice type.

Seidner et al. (1985) also found that the spectral balance of the Fs, measured as the intensity of the Fs relative to that of F1, was affected by voice type and was higher in the male singers than in the female singers. Among the male singers, the lower voices (bass and baritone) showed a similar spectral balance of about –10 dB across vowels, whereas the spectral balance of the tenor varied with both F0 and vowel. For the tenor, the spectral balance increased with F0 for all vowels (/a/, /i/, /u/); within the range of A4 (440 Hz) to C5 (524 Hz), the greatest relative intensity of the Fs (+20 dB) was generated for the vowel /a/, and the relative intensity of the Fs decreased beyond this range. Among the female singers, the relative intensity of the Fs for the alto was lower than that of the male singers, and the soprano showed the lowest level of the Fs of all voice types. The relative intensity also varied with both F0 and vowel for the female singers. This investigation, then, identified three primary factors that affect the Fs: voice type, F0, and vowel quality.

Cleveland and Sundberg (1985) also studied the effects of F0 and intensity on the Fs in

different voice classifications as well as the influence of subglottic pressure on these parameters.

Three trained male singers (bass, baritone and tenor) used three loudness levels (forte, mezzo

forte and piano) and three pitch levels (high, medium and low) as they sang the chromatic scale

on the vowel /a/. Fundamental frequency for these singers ranged from E3 (165 Hz) to E4 (330

Hz) during this singing task. The vowel /a/ was preceded by a consonant /p/ so that each singer’s

subglottic pressure could be measured from the oral pressure during the /p/-occlusion.
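The rationale for the /p/ technique is that, with the lips sealed and the glottis open, air pressure equalizes throughout the airway, so the oral pressure peak during the occlusion approximates the subglottic pressure driving the neighboring vowel. A minimal sketch, assuming an oral pressure trace in cm H2O and a hypothetical detection threshold for identifying occlusions (taking the per-occlusion peak rather than, for example, a plateau average is also an assumption), might look like this:

```python
CM_H2O_TO_PA = 98.0665  # pascals per centimeter of water


def occlusion_peaks(oral_pressure_cmh2o, threshold_cmh2o=1.0):
    """Peak oral pressure of each /p/ occlusion, i.e. each run of samples at or above threshold."""
    peaks, current = [], None
    for p in oral_pressure_cmh2o:
        if p >= threshold_cmh2o:
            current = p if current is None else max(current, p)
        elif current is not None:
            peaks.append(current)
            current = None
    if current is not None:  # the trace ended during an occlusion
        peaks.append(current)
    return peaks


def mean_subglottic_estimate_pa(oral_pressure_cmh2o, threshold_cmh2o=1.0):
    """Average of the occlusion peaks, converted to pascals, as a subglottic pressure estimate."""
    peaks = occlusion_peaks(oral_pressure_cmh2o, threshold_cmh2o)
    if not peaks:
        raise ValueError("no occlusions found above threshold")
    return CM_H2O_TO_PA * sum(peaks) / len(peaks)


if __name__ == "__main__":
    trace = [0, 0, 2, 5, 6, 5, 0, 0, 3, 7, 7, 1, 0]  # hypothetical /pa pa/ trace, cm H2O
    print(occlusion_peaks(trace), mean_subglottic_estimate_pa(trace))
```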

Cleveland and Sundberg (1985) first investigated subglottic pressure as the singers produced the three pitch and loudness levels. Although no information was provided to quantify the singers’ loudness levels and F0, Cleveland and Sundberg showed that changes in subglottic pressure had a main effect on vocal loudness: when vocal effort was high (forte), subglottic pressure was high, and when vocal effort was low (piano), subglottic pressure was low. Their findings also showed a relation between F0 and subglottic pressure; as F0 increased, subglottic pressure increased at all levels of vocal effort.

Cleveland and Sundberg (1985) further suggested that even though subglottic pressure was the main factor controlling vocal sound pressure level (SPL), loudness (i.e., the perceived vocal level) was also affected by other factors, such as the frequency spacing of the partials and the formant amplitudes and frequencies. They found that the bass used the lowest subglottic pressure yet produced the highest SPL (measured 50 cm from the mouth), whereas the tenor used the highest subglottic pressure but produced the lowest SPL. Cleveland and Sundberg suggested that singers use different subglottic pressures and/or articulatory movements to achieve given loudness levels. Further, singing the same pitch range could yield different SPLs for singers of different voice types. They hypothesized that singers might generate more similar SPLs if, instead of all singing the same pitches, each sang within his own comfortable pitch range. That is, singers with different vocal ranges need to adjust their phonation differently to reach a given F0 range, which may in turn affect subglottic pressure and SPL.

In the same investigation, Cleveland and Sundberg (1985) examined the relation between the level of the Fs and overall SPL. They suggested that the level of the Fs increased with both F0 and vocal effort (loudness level: high, medium, and low). The baritone generated the lowest level of the Fs and the tenor the highest. SPL and the amplitude of the Fs were highly correlated in all subjects; however, the specific influence of increased F0 on the Fs was not described. Cleveland and Sundberg further found that the level of the Fs increased more rapidly than the amplitude of the partials in the lower frequency region; they therefore concluded that the Fs was affected more by vocal tract shape than by pitch and loudness.

Overall, the studies reviewed suggest that the Fs is influenced by voice training as well as voice type (classification): bass (82–262 Hz), baritone (98–330 Hz), tenor (123–392 Hz), and sometimes alto (220–698 Hz) voices exhibit the Fs. Sundberg (1977) hypothesized that it is difficult for female singers to produce the Fs because it is nearly impossible to lower the larynx when singing in a high frequency range. Very few studies have directly tested whether female singers, especially sopranos, produce the Fs. One study that included both males and females (Seidner et al., 1985) suggested that the level of the Fs was lower for the female singers than for the male singers, but Seidner et al. did not specify how the Fs was defined or whether the female singers exhibited the Fs at all.

In another study designed to investigate the effect of voice classification on the Fs, Sundberg (2001) compared five voice types, including both male and female singers. This study used commercial recordings of 20 classically trained singers, with equal numbers of sopranos, altos, tenors, baritones, and basses. Each sample was approximately 30 seconds long and was analyzed with long-term-average spectra. Both the center frequency and the level of the Fs were measured for each sample. Both the frequency and the level of the high-frequency peak (Fs) varied within and between voice classifications. The altos showed the highest center frequency of the Fs (approximately 3 kHz). Among the male singers, the tenors showed the highest center frequency of the Fs (2.84 kHz) and the basses the lowest (2.42 kHz). The highest level of the Fs was found in the baritones; compared with the baritones, the basses and tenors produced an Fs about 3 dB lower, and the altos an Fs about 9 dB lower. Most sopranos showed two peaks rather than a single peak. Sundberg (2001) related these two peaks to F3 and F4 and thus suggested that the sopranos did not exhibit the Fs, because these formants were not clustered. Sundberg explained that sopranos do not produce the Fs because of their high fundamental frequencies: a higher F0 widens the frequency spacing between partials, so fewer partials fall near the higher formants, which reduces the energy available for vocal tract excitation in that region; therefore, no Fs is found in sopranos.
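Sundberg’s explanation can be made concrete by counting how many harmonics fall inside the Fs region at different fundamental frequencies. The sketch below assumes a 2.5–3.5 kHz band around 3 kHz; the band edges and the example F0 values are illustrative assumptions, not values from Sundberg (2001).

```python
import math


def partials_in_band(f0_hz, low_hz=2500.0, high_hz=3500.0):
    """Harmonic frequencies n * F0 that fall inside [low_hz, high_hz]."""
    first = math.ceil(low_hz / f0_hz)
    last = math.floor(high_hz / f0_hz)
    return [n * f0_hz for n in range(first, last + 1)]


if __name__ == "__main__":
    for f0 in (110.0, 440.0, 880.0):  # hypothetical low male, mid, and high soprano F0s
        print(f0, partials_in_band(f0))
```

With these assumptions, a 110 Hz fundamental places nine partials in the band, whereas an 880 Hz fundamental places only one (2640 Hz), leaving the formant cluster very little source energy to reinforce.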

These results were confirmed in a study by Weiss, Brown and Morris (2001), who performed a spectrographic analysis of 10 sopranos singing five vowels (/a/, /i/, /u/, /e/, and /o/) at three pitch levels: low (261 Hz), mid (622 Hz), and high (932 Hz). The spectral peaks of the sopranos ranged from 2.6 kHz to 4.6 kHz, which extends beyond the definition of the Fs (around 3 kHz) suggested by Sundberg (1970; 1995); therefore, Weiss et al. concluded that the sopranos did not have the Fs.

Results from Weiss et al. (2001) showed that when the sopranos sang low-pitch and mid-pitch vowels, there was a high-frequency reinforcement around 2.5 kHz, the region of the Fs. However, the bandwidth of this peak was 2 to 2.5 times as broad as that of the Fs found in men (Schutte & Miller, 1985). For the high-pitched vowels, there was no clear energy peak in the region of 3 kHz; instead, strong energy extended over a higher region, between 5 and 8 kHz. Weiss et al. concluded that the spectral energy generated by sopranos is related simply to the high-frequency harmonics of the fundamental; thus, high female voices do not need an Fs.

Taken together, the investigations reviewed above suggest that the Fs is affected by vocal intensity, F0, voice classification, and articulatory movements (Fant, 1970; Sundberg, 1973; Sundberg, 1977; Schutte & Miller, 1985; Cleveland & Sundberg, 1985; Seidner et al., 1985; Sundberg, 2001; Weiss et al., 2001); however, these researchers rarely provided specific details on how the Fs was defined. Bloothooft and Plomp (1984, 1985, 1986) recognized this shortcoming and conducted a series of studies to determine specific criteria for the Fs and the factors that affect its generation. This series investigated the relation between the level of the Fs and the overall sound level of vowels sung by different voice types, as well as the interactions between the Fs and F0, intensity, mode of singing (e.g., light, dark, pressed, soft), vowel configuration, and voice classification (e.g., tenor, baritone, bass).

In this series of studies, Bloothooft and Plomp (1984, 1985, 1986) used 1/3-octave filters with center frequencies from 122 to 4000 Hz to approximate the filtering of the auditory system. The spectra of the vowels were measured every 10 ms and normalized for overall sound pressure level to eliminate spectral variation due to level differences. Nine Dutch vowels (/a/, /ɑ/, /i/, /u/, /ɔ/, /œ/, /y/, /ε/, and /e/) were sung by male and female singers representing seven voice types, from bass to soprano. The vowels were sung at several F0s (98, 131, 220, 392, 659, and 880 Hz across singers) and in nine modes of singing (e.g., neutral, light, dark, soft).
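Two of the steps described above, a bank of 1/3-octave bands and normalization of each spectrum for overall level, can be sketched as follows. The band centers here follow the base-ten convention, 1000 · 10^(n/10) Hz, under which the nominal 2.5 kHz and 3.15 kHz bands are centered near 2512 Hz and 3162 Hz (the latter matching the 3.16 kHz cited in this section); whether Bloothooft and Plomp used exactly this filter set, and the power summation used to define overall level, are assumptions.

```python
import math


def third_octave_centers(f_low=122.0, f_high=4000.0):
    """Base-ten 1/3-octave center frequencies, 1000 * 10**(n/10) Hz, within [f_low, f_high]."""
    n = math.ceil(10 * math.log10(f_low / 1000.0))
    centers = []
    while 1000.0 * 10 ** (n / 10) <= f_high:
        centers.append(1000.0 * 10 ** (n / 10))
        n += 1
    return centers


def overall_level_db(band_levels_db):
    """Total level obtained by summing band powers (not dB values)."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in band_levels_db))


def normalize_levels(band_levels_db):
    """Express each band level relative to the overall level, removing level differences."""
    total = overall_level_db(band_levels_db)
    return [level - total for level in band_levels_db]


if __name__ == "__main__":
    print([round(c) for c in third_octave_centers()])
    print(normalize_levels([70.0, 55.0, 62.0]))  # hypothetical band levels in dB
```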

In their first experiment, Bloothooft and Plomp (1984) focused on the factors that influence spectral variance during singing. They found that the variance of vowel spectra depends on factors such as vowel quality, voice type, mode of singing (light, dark, neutral, soft, etc.), and fundamental frequency, and they then examined the interactions between these factors in male and female singers. Spectral variance was associated with interactions among all of these factors. Among them, vowel quality had the greatest effect on spectral variance when F0 was 98 Hz for males and 220 Hz for females. The effect of vowel quality on spectral variance decreased as F0 increased for both males and females: it was constant for F0 up to 392 Hz and decreased when F0 rose above 392 Hz. In other words, vowel distinction was reduced at higher F0.

In the following study, Bloothooft and Plomp (1985) used the same methods and subjects as in the first study to investigate the interactions between F0 and overall SPL. Measurements were made from spectra averaged across singers and across the sung vowels. When F0 increased from 98 to 392 Hz, the average SPL of the sung vowels increased by 16 dB for males; for females, when F0 increased from 220 to 880 Hz, the average SPL increased by 22 dB. When singers of both sexes sang at F0 = 392 Hz, the males exhibited an overall SPL 8 dB higher than that of the female singers.

Bloothooft and Plomp (1985) indicated that the highest sound levels were found in the 1/3-octave bands with mean center frequencies of 2.5 kHz for male singers and 3.16 kHz for female singers. They therefore defined the frequency band between 2.5 kHz and 3.16 kHz as the frequency band of the Fs. Bloothooft and Plomp then measured the sound level in the Fs frequency band from the average spectra and compared it with the overall SPL of the vowels. For the modal register of the male singers, overall SPL and the sound level of the Fs increased proportionally as F0 increased. For the falsetto register, defined as a high-pitched register in which males use only part of the vocal folds, the level of the Fs frequency band was lower than in the modal register. For female singers, increasing F0 increased the difference between the level of the Fs and the overall SPL of the vowel; that is, the overall SPL increased while the level of the Fs decreased at higher F0.
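The comparison of the Fs-band level with overall SPL can be sketched by power-summing the two 1/3-octave bands that make up the Fs band and subtracting the power sum over all bands. The sketch assumes band levels keyed by nominal center frequency (using 2500 Hz and 3160 Hz, as in the text); the input levels in the usage example are hypothetical.

```python
import math


def power_sum_db(levels_db):
    """Combine band levels by summing their powers."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


def fs_band_relative_level(band_levels_db, fs_bands_hz=(2500.0, 3160.0)):
    """Level of the Fs band (power sum of its 1/3-octave bands) minus overall level, in dB.

    band_levels_db maps a nominal band center frequency in Hz to a band level in dB.
    """
    overall = power_sum_db(band_levels_db.values())
    fs_level = power_sum_db(band_levels_db[f] for f in fs_bands_hz)
    return fs_level - overall


if __name__ == "__main__":
    levels = {500.0: 70.0, 1000.0: 70.0, 2500.0: 60.0, 3160.0: 60.0}  # hypothetical dB values
    print(round(fs_band_relative_level(levels), 2))
```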

The findings also showed that the shapes of the average spectra and the level of the Fs were similar for male and female singers at F0 = 220 Hz. This similarity held for F0 up to 392 Hz, even when the males used the falsetto register. When F0 was greater than 392 Hz, female singers showed a decrease in the amplitude of the Fs while overall SPL increased. These findings led Bloothooft and Plomp (1985) to agree with Bartholomew’s (1934) suggestion that the Fs is present in female singing but diminishes when F0 rises above a certain frequency.

In the last study of this series, Bloothooft and Plomp (1986) investigated the sound level of the Fs and how five factors (vowel quality, vocal intensity, F0, mode of singing, and voice classification) interacted to affect it, and then defined the Fs on the basis of the outcome. The methods of the previous two studies were used to determine the variation in the sound level of the Fs relative to the overall SPL of each singer as a function of vowel, F0, voice classification, and mode of singing. Again, the Fs was defined as a peak between 2.5 and 3.16 kHz; the sound level in this band was measured and normalized relative to overall SPL. The results showed that the level of the Fs was influenced by the following interactions:

F0 and gender: At an F0 of 392 Hz or less, the level of the Fs was equivalent for male and female singers; however, the level of the Fs decreased when F0 rose above this frequency.

Vocal intensity: The sound level of the Fs increased with vocal intensity.

Vowel quality and F0: The magnitude of the Fs depended on vowel quality. At F0 = 220 Hz, the level of the Fs was low in the sung vowels /u/ and /ɔ/ but high in the sung vowel /i/, for both male and female singers.

Overall SPL, vowel quality and vocal register: The level of the Fs increased more rapidly with increasing overall SPL for the vowels /ɔ/, /y/, and /u/. This effect was seen for males singing in the modal register and for all females. When the same vowels were sung by males in the falsetto register, however, the level of the Fs increased less rapidly with overall SPL than in the modal register. Vowel quality did not influence the Fs when females sang in a high F0 range.

Modes of singing and F0: In the male modal register, the level of the Fs relative to overall SPL was constant across the three modes of singing (light, neutral, and loud) as F0 increased. For the female singers, the level of the Fs remained constant across these three modes when F0 was 220 Hz and 392 Hz.

Bloothooft and Plomp (1986) suggested that the level of the Fs remained stable only across modes of singing (neutral and loud), for both male and female singers, whereas it varied with vowel quality, F0, vocal intensity, and vocal register. Therefore, the minimum sound level of the Fs measured in these two modes (–20 dB relative to F1) was defined as the threshold of the Fs. Bloothooft and Plomp concluded that when the relative level of the high-frequency spectral peak around 2–4 kHz exceeds a threshold of about –20 dB relative to F1, the peak can be defined as the Fs.
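The threshold criterion can be expressed as a simple test on a spectrum’s partials. In this sketch the F1 level is approximated by the strongest partial below 1 kHz, an assumption made for illustration; Bloothooft and Plomp’s own measurements used 1/3-octave band levels rather than individual partials, and the example spectra are hypothetical.

```python
def relative_high_peak_db(partials, l1_max_hz=1000.0, band_hz=(2000.0, 4000.0)):
    """Strongest partial in the 2-4 kHz band minus the strongest partial below l1_max_hz, in dB.

    partials: iterable of (frequency_hz, level_db) pairs.
    """
    partials = list(partials)
    l1 = max(level for freq, level in partials if freq <= l1_max_hz)
    peak = max(level for freq, level in partials if band_hz[0] <= freq <= band_hz[1])
    return peak - l1


def has_singers_formant(partials, threshold_db=-20.0):
    """Apply the criterion: the high-frequency peak exceeds about -20 dB relative to F1."""
    return relative_high_peak_db(partials) > threshold_db


if __name__ == "__main__":
    spectrum = [(220.0, 70.0), (440.0, 72.0), (660.0, 65.0), (2640.0, 55.0), (2860.0, 50.0)]
    print(relative_high_peak_db(spectrum), has_singers_formant(spectrum))
```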

Procedure used to investigate the Fs

As noted earlier, Schutte and Miller (1985) used short-term spectral analysis in an effort to define the Fs. One singer, a tenor, produced the vowel /ɔ/ in chromatic steps over his entire vocal range. The recordings were passed through a spectral analyzer, and level differences between two regions of the spectrum were calculated. The first region, defined on the basis of acoustic theory of vowel identity, covered the lower formants up to about 1,800 Hz (L1). The second region (L3), where the Fs is located, was defined as the frequency range of about 2.2–3.5 kHz. In each region, the peak level (L1 or L3) was defined as either the level of the highest partial or the average level of the two highest partials, and the level of the Fs was calculated as the difference L3–L1. Schutte and Miller calculated this level difference for each note in the tenor’s most commonly used F0 range (131–524 Hz). The difference between L3 and L1 remained constant throughout the whole F0 range, and Schutte and Miller identified –7 dB relative to F1 as the average level of the Fs for tenor voices. This finding contradicts the findings of Bloothooft and Plomp (1985), in which the amplitude of the Fs was stable over the modal register but decreased when F0 exceeded 392 Hz.
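Schutte and Miller’s L3–L1 measure can be sketched as follows, given a list of partials (frequency in Hz, level in dB) from a short-term spectrum. The region limits follow the description above; forming the two-partial average as the arithmetic mean of the dB levels is an assumption, since the text does not say how the average was computed, and the example spectrum is hypothetical.

```python
def region_level(partials, f_low, f_high, use_two_highest=False):
    """Level (dB) of the highest partial in [f_low, f_high], or the mean of the two highest."""
    levels = sorted((level for freq, level in partials if f_low <= freq <= f_high), reverse=True)
    if not levels:
        raise ValueError("no partials in region")
    if use_two_highest and len(levels) >= 2:
        return (levels[0] + levels[1]) / 2  # arithmetic mean of dB values (assumption)
    return levels[0]


def spectral_balance(partials, use_two_highest=False):
    """L3 - L1 with an L1 region up to 1800 Hz and an L3 region of 2200-3500 Hz."""
    partials = list(partials)
    l1 = region_level(partials, 0.0, 1800.0, use_two_highest)
    l3 = region_level(partials, 2200.0, 3500.0, use_two_highest)
    return l3 - l1


if __name__ == "__main__":
    spectrum = [(262.0, 68.0), (524.0, 70.0), (786.0, 66.0),
                (2358.0, 60.0), (2620.0, 63.0), (2882.0, 61.0)]
    print(spectral_balance(spectrum), spectral_balance(spectrum, use_two_highest=True))
```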

Schutte and Miller (1985) defined the bandwidth of the Fs as the range of frequencies within 15 dB of the peak level. They suggested that the bandwidth of the Fs was constant for F0 = 131–392 Hz; however, the basis for this claim is not clear. Schutte and Miller provided a clear explanation of how to determine the level of the Fs, but they studied only one tenor singing one vowel. As noted earlier, Seidner, Schutte, Wendler and Rauhut (1985) used Schutte and Miller’s methodology to investigate the Fs in five voice types of trained singers (one each of tenor, bass, baritone, soprano, and alto). They found that the level of the Fs was about –10 dB relative to the overall SPL for both the bass and baritone singers. Seidner et al. also compared their results from the short-term spectra of sustained vowels with the long-term-average spectrum (LTAS) of a sung phrase; however, they did not provide details of the LTAS measurement or its results.

Sengupta (1990) investigated the presence of the Fs by adopting the methods of Schutte and Miller (1985). Four male and four female trained Northern Indian classical singers participated. Short-term spectra of the isolated vowels /a/, /i/, and /o/ were measured from productions spanning the singers’ full vocal ranges (2–2.5 octaves), and the spectral balance, center frequency, and bandwidth of the Fs were analyzed. The average results from these eight singers were comparable to Schutte and Miller’s (1985) findings. Sengupta showed that the center frequency and the bandwidth of the Fs increased as F0 increased. Sengupta further found that the spectral balance across all singers for the vowel /a/ was rather stable for F0 between 230 and 400 Hz, with an average Fs level of –4 dB (relative to F1), and decreased as F0 increased further. Sengupta did not specify whether all singers were found to have the Fs; however, the results were presented as averages across all singers, so it is assumed that all of these singers met the criteria for the Fs.

So far, most of the studies reviewed investigated the presence or absence of the Fs without agreement on how the Fs is defined. Omori, Kacker, Carroll, Riley, and Blaugrund (1996) sought to quantify the Fs using measurements from the short-term spectrum. Each sustained vowel /a/ was analyzed with a fast Fourier transform (FFT) using a Hamming window. The investigators measured the Fs quantitatively by calculating the “singing power ratio” (SPR) of the sustained /a/: the singing power peak (SPP), the highest harmonic peak between 2 and 4 kHz, divided by the highest peak between 0 and 2 kHz. The spectra of sustained sung and spoken /a/ vowels from 37 singers (21 professional and 16 non-professional) and 20 non-singers were measured and compared. The 37 singers ranged in age from 19 to 60 years, and their vocal training ranged from 1 to 42 years.

Results from Omori et al. (1996) showed that the SPR of the vowel /a/ sung by the singers was significantly greater than the SPR produced by male and female non-singers. There was no significant difference in SPR between professional and non-professional singers. In a comparison of the sung and spoken vowel /a/, statistical analysis showed a significantly greater SPR in the sung vowel than in the spoken vowel for the trained singers. Omori et al. also found no significant differences in SPR between male and female singers for either the sung or the spoken vowels. Finally, they investigated the effect of vocal training on the SPR of the sung vowel. SPR differed significantly with the duration of training: the SPR produced by singers with longer training (> 4 years) was significantly greater than that produced by singers with shorter training (< 4 years). Based on these results, Omori et al. (1996) concluded that SPR was a reliable tool for analyzing the acoustic characteristics of the singing voice.
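The SPR computation described above can be sketched in a few lines. To stay dependency-free, this version uses a plain discrete Fourier transform in place of an FFT (same values up to rounding, just slower), applies a symmetric Hamming window, and expresses the ratio in dB; the synthetic test tone and its sampling parameters are hypothetical.

```python
import cmath
import math


def hamming(n_samples):
    """Symmetric Hamming window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n_samples - 1)) for i in range(n_samples)]


def magnitude_spectrum(signal, fs_hz, f_max_hz):
    """Hamming-windowed DFT magnitudes as (frequency, magnitude) pairs up to f_max_hz."""
    n = len(signal)
    x = [s * w for s, w in zip(signal, hamming(n))]
    spectrum = []
    for k in range(int(f_max_hz * n / fs_hz) + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * fs_hz / n, abs(acc)))
    return spectrum


def singing_power_ratio_db(signal, fs_hz):
    """SPR in dB: highest peak in 2-4 kHz relative to the highest peak in 0-2 kHz."""
    spectrum = magnitude_spectrum(signal, fs_hz, 4000.0)
    low_peak = max(mag for freq, mag in spectrum if freq < 2000.0)
    high_peak = max(mag for freq, mag in spectrum if 2000.0 <= freq <= 4000.0)
    return 20 * math.log10(high_peak / low_peak)


if __name__ == "__main__":
    fs, n, f0 = 16000, 1024, 250.0          # hypothetical analysis settings
    amplitudes = {1: 1.0, 2: 0.5, 11: 0.1}  # harmonic number -> amplitude (synthetic tone)
    tone = [sum(a * math.sin(2 * math.pi * h * f0 * t / fs) for h, a in amplitudes.items())
            for t in range(n)]
    print(round(singing_power_ratio_db(tone, fs), 1))
```

Because the strongest low-band harmonic has ten times the amplitude of the 2750 Hz harmonic in this synthetic tone, the printed SPR should come out close to –20 dB.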

Although Omori et al. mentioned that the subjects’ vocal training ranged from 1 to 42 years, they did not specify whether years of training were related to professional or non-professional status. If the non-professional singers had fewer years of vocal training than the professional singers, as one might expect, the training effect would contradict the finding of no significant difference between the professional and non-professional singers.

Lundy, Roy, Casiano, Xue, and Evans (2000) attempted to replicate the findings of Omori et al. (1996) to evaluate the utility of SPR for investigating the Fs. Lundy et al. recruited 55 singing students (14 males and 41 females) between the ages of 18 and 37 years. Their results were opposite to those of Omori et al.: Lundy et al. found no significant difference in SPR between the sung and spoken vowels, and no significant effect of training duration on the SPR of the sung vowel. However, both studies found no significant difference between male and female singers for either sung or spoken vowels.

Lundy et al. (2000) attributed their failure to replicate the findings of Omori et al. (1996) to differences in the populations studied: the subjects in Lundy et al.’s study were students, whereas Omori et al. studied professional singers with a broader range of training durations. Lundy et al. further questioned whether SPR adequately represents singing voice quality and suggested that SPR be investigated further in future research.

In a more recent study, Sundberg (2001) used two different acoustic analyses to clarify

the definition and determination of the Fs: The short-term spectrum analysis and the LTAS

29
analysis. According to Fant’s (1960) acoustic theory of speech, formant frequencies affect

formant levels in normal speech. When two formants are close in frequency, the levels of the

formants increase. Sundberg (2001) adopted Fant’s equation for predicting formant levels and

predicted the level of F3 (denoted L3) for different formant configurations. He applied a voice source whose spectrum decreased by 12 dB/octave and varied the values of F1 and F2 to estimate their impact on the Fs. The values predicted by Fant’s equations were referred to as “expected values,” and values measured from the participants in this experiment were referred to as “observed values.” The findings confirmed Fant’s suggestion that the frequency spacing of F1 and F2 affects L3. The next step was to investigate various sung and spoken vowels produced by Western classically trained singers and untrained singers.
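The dependence of L3 on formant spacing can be illustrated with a minimal sketch of an all-pole (cascade) vocal-tract model of the kind Fant (1960) described, combined with a source falling at 12 dB/octave. The formant bandwidths, the 100 Hz reference frequency, and the omission of lip radiation and higher-pole corrections are assumptions made for illustration, so the numbers will not reproduce Sundberg’s (2001) expected values; the sketch only shows the direction of the effect.

```python
import math


def vocal_tract_gain_db(f, formants, bandwidths):
    """Gain (dB) of an all-pole vocal tract at f Hz, normalised to 0 dB at 0 Hz.

    Each formant Fn with bandwidth Bn is one complex pole pair:
        |H(f)| = prod_n (Fn^2 + bn^2) / sqrt(((f - Fn)^2 + bn^2) * ((f + Fn)^2 + bn^2)),
    with bn = Bn / 2.
    """
    gain = 1.0
    for fn, bn in zip(formants, bandwidths):
        b = bn / 2.0
        denom = math.sqrt(((f - fn) ** 2 + b ** 2) * ((f + fn) ** 2 + b ** 2))
        gain *= (fn ** 2 + b ** 2) / denom
    return 20.0 * math.log10(gain)


def source_level_db(f, slope_db_per_octave=-12.0, f_ref=100.0):
    """Relative level of a glottal source falling by 12 dB/octave above f_ref (assumed)."""
    return slope_db_per_octave * math.log2(f / f_ref)


def expected_l3_minus_l1(formants, bandwidths):
    """Predicted L3 - L1 (dB): spectrum level at F3 minus spectrum level at F1."""
    def level(f):
        return source_level_db(f) + vocal_tract_gain_db(f, formants, bandwidths)
    return level(formants[2]) - level(formants[0])


BANDWIDTHS = [60.0, 80.0, 100.0, 120.0, 140.0]  # assumed bandwidths, Hz
far = expected_l3_minus_l1([500.0, 1000.0, 2500.0, 3500.0, 4500.0], BANDWIDTHS)
near = expected_l3_minus_l1([500.0, 2000.0, 2500.0, 3500.0, 4500.0], BANDWIDTHS)
# Moving F2 toward F3 raises the predicted L3 - L1.
print(round(far, 1), round(near, 1))
```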

Sundberg (2001) asked three male speakers (1 trained singer and 2 untrained singers) to

read a standard text (not specified by the author) at their normal conversational loudness. Seven professional singers (4 tenors, 1 baritone, and 2 basses) were asked to sing a vowel sequence (/u/, /o/, /a/, /æ/, /e/, /i/, /ı/) at an intermediate pitch and loudness. Finally, three professional sopranos were asked to sing a solo part of a choir piece with a pitch range of D4-G5. The vowels (/u/, /o/, /a/, /æ/, /e/, /i/, /ı/) sung on sustained long notes in this piece were selected for spectral analysis.
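Because the soprano excerpt spans D4-G5, it can help to convert note names to F0 and count how many harmonics fall in the 2-4 kHz region. The sketch below assumes equal temperament with A4 = 440 Hz, and the helper names are hypothetical. It illustrates why high F0 leaves only a few widely spaced partials near F3, a point that becomes relevant for the female singers discussed below.

```python
import math

# Semitone offsets of the natural notes from A within one octave.
NOTE_OFFSETS = {"C": -9, "D": -7, "E": -5, "F": -4, "G": -2, "A": 0, "B": 2}


def note_to_hz(name, a4=440.0):
    """Equal-tempered frequency of a natural note such as 'D4' or 'G5' (A4 = 440 Hz assumed)."""
    letter, octave = name[0].upper(), int(name[1:])
    semitones = NOTE_OFFSETS[letter] + 12 * (octave - 4)
    return a4 * 2.0 ** (semitones / 12.0)


def partials_in_band(f0, lo=2000.0, hi=4000.0):
    """Harmonics n * f0 that fall inside [lo, hi] Hz."""
    first = math.ceil(lo / f0)
    last = math.floor(hi / f0)
    return [n * f0 for n in range(first, last + 1)]


for note in ("D4", "G5"):
    f0 = note_to_hz(note)
    print(note, round(f0, 1), "Hz,", len(partials_in_band(f0)), "partials in 2-4 kHz")
# D4 293.7 Hz, 7 partials in 2-4 kHz
# G5 784.0 Hz, 3 partials in 2-4 kHz
```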

Spectra of these samples were calculated from the middle portion of each vowel. L3 was determined by measuring the strongest partial in the frequency region between 2 and 4 kHz, while the level of F1 (denoted L1) was measured as the strongest partial near F1. The difference L3-L1 was then calculated, followed by the difference between the expected and the observed values of L3-L1. Sundberg (2001) suggested that if the observed level was significantly higher than the predicted level, the vowel could be defined as having a Fs.
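The L3-L1 procedure described above can be sketched as follows, starting from a list of measured partials. The ±150 Hz search window around F1 and the 3 dB margin standing in for “significantly higher” are hypothetical choices, since the text does not specify them; in practice the expected value would come from a Fant-type prediction.

```python
def l3_minus_l1(partials, f1, f1_window=150.0):
    """L3 - L1 (dB) from (frequency_hz, level_db) partials.

    L3: strongest partial between 2 and 4 kHz.
    L1: strongest partial within +/- f1_window Hz of F1 (window width assumed).
    """
    l3_candidates = [lvl for f, lvl in partials if 2000.0 <= f <= 4000.0]
    l1_candidates = [lvl for f, lvl in partials if abs(f - f1) <= f1_window]
    if not l3_candidates or not l1_candidates:
        raise ValueError("no partial found near F1 or in the 2-4 kHz region")
    return max(l3_candidates) - max(l1_candidates)


def exceeds_expected(observed_db, expected_db, margin_db=3.0):
    """True if observed L3 - L1 exceeds the expected value by more than margin_db.

    margin_db is a hypothetical stand-in for "significantly higher".
    """
    return observed_db - expected_db > margin_db


# Hypothetical partials (F0 = 220 Hz) and a hypothetical expected value.
partials = [(220.0, -2.0), (440.0, 0.0), (660.0, -4.0),
            (2420.0, -12.0), (2640.0, -8.0), (2860.0, -15.0)]
observed = l3_minus_l1(partials, f1=450.0)
print(observed, exceeds_expected(observed, expected_db=-15.0))  # -8.0 True
```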

Findings from this experiment showed that the average level of the Fs across vowels for the male singers was 10.8 dB, with the vowels /u/ and /o/ giving the highest levels of the Fs. For the speakers, the differences between observed and expected values of L3-L1 were close to 0 dB or negative, with an average of –3.1 dB across speakers and vowels. For the female singers, the values of L3-L1 varied greatly between and within vowels, with a mean value of –4 dB. Sundberg (2001) noted that L3-L1 might be difficult to measure in female singers because their high F0 produces a wide frequency spacing between adjacent partials; therefore, L3-L1 varied greatly depending on how close a partial was to F1 and F3. Sundberg (2001) suggested that the LTAS gives clear spectrum envelope peaks at certain formant frequencies during singing because it yields the time average of sound level in adjacent frequency bands. The LTAS is stable for speech and singing samples and, most importantly, is less dependent on F0 and intensity than other analysis techniques; therefore, the LTAS may be the most appropriate analysis for voices with high F0. Sundberg used the LTAS to measure the Fs from commercial recordings of 20 classically trained singers representing five male and female voice types (soprano, alto, tenor, baritone, and bass). Each sample was approximately 30 seconds long. Both the center frequency and the level of the Fs were measured for each sample. As noted earlier, the findings showed that both the frequency and the level of the Fs varied within and between voice classifications.
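The LTAS procedure (averaging the power spectra of many short frames, then reading the center frequency and level of the peak in the Fs region) can be sketched in plain Python. The frame length, hop size, Hann window, and 2-4 kHz search band are assumptions, and the naive DFT is used only to keep the example self-contained; a practical analysis would use an FFT routine such as NumPy’s `numpy.fft.rfft`. The test signal is synthetic, not a voice recording.

```python
import cmath
import math


def hann(n):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """One-sided power spectrum of a Hann-windowed frame (naive DFT for self-containment)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann(n))]
    return [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))) ** 2
            for k in range(n // 2 + 1)]


def ltas_db(signal, fs, frame_len=256, hop=256):
    """Long-term average spectrum: mean power per frequency bin over all frames, in dB."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    spectra = [power_spectrum(signal[s:s + frame_len]) for s in starts]
    if not spectra:
        raise ValueError("signal shorter than one frame")
    mean_power = [sum(col) / len(spectra) for col in zip(*spectra)]
    freqs = [k * fs / frame_len for k in range(len(mean_power))]
    return freqs, [10.0 * math.log10(p + 1e-20) for p in mean_power]


def fs_peak(freqs, levels_db, lo=2000.0, hi=4000.0):
    """Frequency of the strongest LTAS bin in [lo, hi] and its level relative to the LTAS maximum."""
    level, freq = max((lvl, f) for f, lvl in zip(freqs, levels_db) if lo <= f <= hi)
    return freq, level - max(levels_db)


# Synthetic test signal: 500 Hz plus a weaker 3000 Hz component (not a real voice).
fs = 8000
sig = [math.sin(2 * math.pi * 500 * t / fs) + 0.3 * math.sin(2 * math.pi * 3000 * t / fs)
       for t in range(2048)]
freqs, levels = ltas_db(sig, fs)
print(fs_peak(freqs, levels))  # about (3000.0, -10.5)
```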

Each of the studies discussed above made different contributions to the definition of the

Fs. Schutte and Miller (1985), Sengupta (1990), and Sundberg (2001) clarified the definition of the Fs by using varied methods of acoustic analysis, such as the LTAS, the short-term spectrum, and the L3-L1 difference. Bloothooft and Plomp (1984, 1985, 1986) investigated the effects of various factors quantitatively by using 1/3-octave filter spectra. Omori et al. (1996) and Lundy et al. (2000) determined the Fs quantitatively by calculating the singing power ratio (SPR). Together, these studies provide an overall picture of the interactions among the important factors of the Fs: voice classification, fundamental frequency, intensity, and vowel configuration. What has yet to be fully investigated, however, is vocal technique. The following questions are paramount: What is the impact of Western classical vocal training on the Fs? Do other types of singers exhibit the Fs? Are there other methods of vocal training that also yield the Fs?
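For the 1/3-octave filter spectra used by Bloothooft and Plomp, band centers and edges can be generated as in the sketch below. It assumes base-2 bands referenced to 1000 Hz, with edges 1/6 octave on either side of each center; standard filter banks use the same layout but label bands with rounded nominal frequencies. The helper names are hypothetical.

```python
import math


def third_octave_bands(f_min=100.0, f_max=5000.0, f_ref=1000.0):
    """(lower edge, centre, upper edge) of base-2 one-third-octave bands, in Hz.

    Centres are f_ref * 2**(n/3); edges sit 1/6 octave either side. These exact
    centres differ slightly from rounded nominal values (e.g. 99.2 vs 100 Hz).
    """
    n_lo = round(3 * math.log2(f_min / f_ref))
    n_hi = round(3 * math.log2(f_max / f_ref))
    bands = []
    for n in range(n_lo, n_hi + 1):
        fc = f_ref * 2.0 ** (n / 3.0)
        bands.append((fc * 2.0 ** (-1.0 / 6.0), fc, fc * 2.0 ** (1.0 / 6.0)))
    return bands


def band_levels_db(freqs, levels_db, bands):
    """Sum the linear power of spectrum bins falling in each band; return (centre, dB) pairs."""
    out = []
    for lo, fc, hi in bands:
        power = sum(10.0 ** (lvl / 10.0) for f, lvl in zip(freqs, levels_db) if lo <= f < hi)
        out.append((fc, 10.0 * math.log10(power) if power > 0 else None))
    return out


bands = third_octave_bands()
print(len(bands), [round(fc) for _, fc, _ in bands[:6]])  # 18 [99, 125, 157, 198, 250, 315]
```

The first six bands (roughly 100-315 Hz) cover the low-frequency region examined by Rossing et al. (1986), discussed below.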

Influence of singing technique on the Fs

Overall, Sundberg’s studies suggest that Western classical voice training is important to

the development of the Fs. Although studies by Wang (1985) and Sengupta (1990) suggest that singers trained in other musical styles produce the Fs, Sundberg’s limited investigation (2003) does not support those findings. Studies comparing Western classically trained singers with untrained singers (Rossing, Sundberg & Ternström, 1986; Ternström & Sundberg, 1989; Ross, 1992; Sundberg, 2001; Cleveland, Sundberg & Stone, 2001) show that only trained singers exhibit the Fs. Further, these studies also suggest that the Fs occurs only with Western classical training. However, Rossing, Sundberg and Ternström (1986) investigated whether other types of vocal training (i.e., non-classical) could affect the generation of the Fs.

Rossing et al. (1986) compared the timbral differences between solo and choral singing. They first investigated eight trained male singers (3 professional and 5 amateur singers with various amounts of vocal training) singing in both choral and solo modes. Singers were then asked to sing one solo musical phrase, written by the researchers to incorporate most of the same vowels and pitches as the choral and solo passages. Each sample was analyzed by LTAS. Their findings from one professional singer showed that energy increased around 2-4 kHz for both the solo passages (both loud and soft singing) and the choir passages (loud singing), suggesting the presence of the Fs. The results also showed that all professional singers had more energy around 2-4 kHz than the amateur singers. Furthermore, the choir passages, especially the soft singing, exhibited more energy in the low, fundamental frequency region (100-315 Hz), which Rossing et al. attributed to glottal source characteristics.

Following Sundberg’s model, Rossing et al. suggested that prominent energy in the Fs region

was mainly affected by different articulatory factors that resulted from training. Therefore, even

trained singers appear to use different singing techniques for solo versus choral singing.

Ternström and Sundberg (1989) investigated the presence of the Fs in eight untrained choir singers (basses). Singers were asked to speak a song phrase four times at their normal conversational pitch and loudness. They were also asked to sing the song phrase four times. Samples were analyzed by LTAS, and the level of the Fs was measured from these spectra as the difference between the formant peaks in the lower frequency region (around 500 Hz) and the peaks in the Fs region (around 3000 Hz). Sundberg’s (1986) definition of the Fs, as energy around 3 kHz that averages 7.2 dB greater in singing than in spoken phrases, was used to determine the presence of the Fs. The LTAS showed only a small increase in amplitude in the Fs region for the sung phrase compared with the spoken phrase (mean = 1.4 dB). Ternström and Sundberg therefore suggested that the untrained choir singers did not exhibit the Fs, since the increase in high-frequency energy for singing did not approach 7.2 dB. Ternström and Sundberg also compared their results with the previous study by Rossing et al. (1986), in which the professional singers generated the Fs not only in the solo passages but also in the choir passages (loud singing). In comparison, Ternström and Sundberg concluded that the untrained singers in their study were unable to generate the Fs in the choir mode. This contrasts with trained singers, who are able to produce the Fs in both solo and choir modes.
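Ternström and Sundberg’s criterion can be expressed as a small calculation on two LTAS curves sampled at the same frequencies. In this sketch the “around 500 Hz” and “around 3000 Hz” regions are taken as 300-800 Hz and 2500-3500 Hz, and the 7.2 dB average is treated as a simple threshold; both are simplifying assumptions rather than the authors’ exact procedure, and the LTAS values are hypothetical.

```python
def band_peak_db(freqs, levels_db, lo, hi):
    """Highest LTAS level (dB) between lo and hi Hz."""
    values = [lvl for f, lvl in zip(freqs, levels_db) if lo <= f <= hi]
    if not values:
        raise ValueError(f"no LTAS bins between {lo} and {hi} Hz")
    return max(values)


def fs_level_db(freqs, levels_db, low_band=(300.0, 800.0), fs_band=(2500.0, 3500.0)):
    """Fs-region peak minus the low-frequency formant peak (usually negative), in dB."""
    return band_peak_db(freqs, levels_db, *fs_band) - band_peak_db(freqs, levels_db, *low_band)


def sung_minus_spoken(freqs, sung_db, spoken_db, criterion_db=7.2):
    """Increase in Fs level for singing over speech, and whether it reaches the criterion."""
    gain = fs_level_db(freqs, sung_db) - fs_level_db(freqs, spoken_db)
    return gain, gain >= criterion_db


# Hypothetical LTAS peaks (Hz, dB), not values from the study.
freqs = [500.0, 3000.0]
print(sung_minus_spoken(freqs, sung_db=[0.0, -20.0], spoken_db=[0.0, -21.4]))
# a gain of about 1.4 dB, so the 7.2 dB criterion is not met
```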

In a more recent study, Cleveland, Sundberg and Stone (2001) investigated the presence or absence of the Fs in male country singers and compared the spectra of these singers with those of one classical singer. Five male country singers were asked to sing the National Anthem as well as one country song of their choice. Subjects were then asked to speak the text of the National Anthem and of the song that they chose to sing. The classically trained singer was also asked to sing and speak the National Anthem and one piece from the oratorio repertoire. Samples were recorded, normalized, and analyzed by LTAS. Results showed that the classically trained singer exhibited a clear Fs, with increased energy in the Fs region near 2.8 kHz. The LTAS of the country singers were similar for the spoken and sung samples and did not show the Fs. However, Cleveland et al. (2001) noted a slight energy peak between 3 and 4 kHz in all spoken samples, suggesting the presence of the speaker’s formant. The “speaker’s formant” is typically found in “good voices” of singers, actors, radio announcers, etc. (Leino, 1994; Nawka, 1997). The results of this study led Cleveland et al. to conclude that only classically trained singers produce a Fs.

A few studies have investigated the Fs in singing styles from different cultures. Wang (1985) investigated ten male singers using three different singing styles: Western opera, Chinese, and early music. Details of the singers and the number of singers representing each style were not provided. Singers were asked to sing three vowels, /a/, /i/, and /u/, in full voice. All samples were analyzed acoustically with a short-term spectrum. Wang also investigated physiological changes by measuring the vertical position of the larynx during singing. Wang found that the Chinese opera and early music singers exhibited higher larynx positions than the Western opera singers, yet the Fs was still exhibited in these two other singing styles. For all singing styles, the larynx was highest for the vowel /i/, at a similar height for /a/, and lowest for /u/; however, Wang’s report did not provide the formant frequencies and amplitudes for each vowel, nor did it clearly relate laryngeal height to acoustic measurements of vowel quality. It is also not clear how Wang defined the high-energy peak (amplitude) for the Fs.

Sengupta (1990) also compared different cultures and singing styles by investigating four

male and four female trained North Indian classical singers. These singers were asked to sing /a/, /i/, and /o/ across their full vocal range. Samples were recorded on a stereo cassette recorder with a microphone 17 cm from the subject’s mouth. Spectrograms were made over the frequency range up to 8 kHz using both narrow-band and wide-band filters. Power spectra were taken at the steady portion of each note.

Sengupta (1990) first identified the Fs by comparing the spectrum of the sung vowel with that of the spoken vowel for the trained male singers. Results showed energy around 2-4 kHz for the sung vowel /o/ and an absence of energy in the same region for the spoken vowel /o/, indicating that the Fs was present in the sung vowel /o/. Results for the female singers, however, were not detailed. In this study, Sengupta also investigated the center frequency of the Fs by locating the highest-amplitude partial, or by averaging the two highest-amplitude partials, in the region of 2-4 kHz. Results showed that as F0 increased, the center frequency of the Fs for the vowel /a/ sung by the four male and four female trained singers increased. Furthermore, Sengupta measured the resonance balance of the sung vowel /a/ by calculating the amplitude level difference between the spectral region of the Fs (2-4 kHz) and the region of the vowel formants (below 1.8 kHz). The results showed steady values of –4 dB for F0 between 230 and 400 Hz, with values gradually decreasing as F0 increased further. Results also showed that the level of the Fs rose as the pitch rose.
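Sengupta’s two measures can be sketched from a list of partials: the Fs center frequency taken from the strongest partial (or the two strongest partials) in 2-4 kHz, and resonance balance taken as the level difference between the Fs region and the region below 1.8 kHz. Using peak levels for both regions, and a simple mean for the two-partial case, are assumptions, since the text does not state how levels within each region were combined; the partials are hypothetical.

```python
def fs_center_frequency(partials, lo=2000.0, hi=4000.0, use_two=False):
    """Fs centre frequency from (frequency_hz, level_db) partials.

    use_two=False: frequency of the strongest partial in [lo, hi].
    use_two=True: mean frequency of the two strongest partials (one reading of
    "averaging the two highest partials"; an amplitude-weighted mean is another).
    """
    ranked = sorted(((lvl, f) for f, lvl in partials if lo <= f <= hi), reverse=True)
    if not ranked:
        raise ValueError("no partials in the Fs region")
    if use_two and len(ranked) >= 2:
        return (ranked[0][1] + ranked[1][1]) / 2.0
    return ranked[0][1]


def resonance_balance_db(partials, vowel_limit=1800.0, lo=2000.0, hi=4000.0):
    """Strongest partial in the Fs region minus strongest partial below vowel_limit (dB)."""
    fs_region = [lvl for f, lvl in partials if lo <= f <= hi]
    vowel_region = [lvl for f, lvl in partials if f < vowel_limit]
    if not fs_region or not vowel_region:
        raise ValueError("need partials in both regions")
    return max(fs_region) - max(vowel_region)


# Hypothetical partials of a sung /a/ at F0 = 300 Hz.
partials = [(300.0, -3.0), (600.0, 0.0), (900.0, -6.0),
            (2400.0, -10.0), (2700.0, -6.0), (3000.0, -7.0)]
print(fs_center_frequency(partials), fs_center_frequency(partials, use_two=True),
      resonance_balance_db(partials))
# 2700.0 2850.0 -6.0
```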

In another study, Ross (1992) investigated the presence of the Fs in Estonian folk singing performed by two female singers. Both singers performed one Estonian folk song with an F0 range of 200-300 Hz. The LTAS of the first 50 seconds of the song was computed, and the presence of the Fs was assessed. The findings showed no increased energy or clustering of formants in the Fs region: the level of the peak in the Fs region (3 kHz) was about 30-40 dB lower than that of the first formant. Ross therefore concluded that these two female Estonian folk singers did not exhibit the Fs; however, it is not clear whether the Fs was absent because of the singing style or because female singers do not typically exhibit the Fs.

To date, research on the Fs has been primarily based on experiments conducted with

Western classically trained singers although there are a few studies that also identify Fs in other

singing styles (Wang, 1985; Sengupta, 1990). Perhaps because of the population generally

studied, one of the prominent suggestions has been that the techniques employed in Western

classical singing constitute a primary factor for the Fs. A broader perspective on the Fs,

including both the entire range of factors influencing the Fs and the hierarchy of these factors,

can be gained by gathering data from professional singers who have been trained to use

specialized singing techniques other than those used in Western classical music. Traditional

Chinese opera is one such form of professional singing that requires extensive training but

utilizes techniques that are distinct from those used in Western classical singing. The following

section reviews the variables that are distinctive in traditional Chinese opera.

Comparison of traditional Chinese opera to Western classical singing

In general, information on Chinese opera is rather limited. Although there are many

different regional operas that evidence some differences, the general musical style can be

discussed collectively under the umbrella of “Chinese opera” (Grout & Williams, 2002). Hsu

(1992) indicated that traditional Chinese opera in the early era was often performed in the open

air with a small Chinese instrumental ensemble. It later moved into theatres and teahouses, where noise and social activity among the audience below the stage were unavoidable aspects of these performances. Although the Chinese ensemble is smaller than the Western orchestra, it is no less loud, and the timbre its instruments produce is extremely piercing. Nevertheless, traditional Chinese opera singers can still be heard clearly above the loud, piercing instrumental ensemble and the audience noise, in the same way that trained Western classical singers can be heard over a loud orchestra in large concert halls and opera houses. This suggests that traditional Chinese opera singers may also have the Fs to overcome the loud ensemble.

Traditional Chinese opera singing is quite different from that of the Western classical

singing in terms of training and techniques. In an interview with five traditional Chinese opera

singers and teachers (personal communication, 2001), it was noted that traditional Chinese opera

singers begin their training at the age of 5 to 10 years. By comparison, it is recommended that

Western singers not start vocal training until after puberty (Sataloff, 1996). According to the

Chinese opera teachers, traditional Chinese opera performance historically has placed emphasis

on the singers’ appearance, especially the aesthetic appearance of the singers’ faces and their

facial expressions. For centuries, singers have been taught to retract the lips when singing

because opening the mouth and protruding the lips are considered unattractive. Today, traditional

Chinese opera singers also believe that the best way to project the voice is to focus on bright vowel sounds; in contrast to Western classical training techniques, lip retraction is considered the most important technique for producing such bright vowels in Chinese opera (personal interviews with five famous traditional Chinese opera singers and teachers, 2001).

In Western classical singing, a darker voice quality is valued more than a brighter one (Hines, 1990). To produce a more aesthetically pleasing tone quality, Western classical singers are trained to open the jaw and to protrude and round the lips. These techniques serve to project the voice. In addition to the different timbres valued by each musical tradition, the constrictions in the oral cavity due to the positions of the tongue, lips, and mouth differ between Western classical singing and traditional Chinese opera singing, in part because of language differences.

The vowels /a/ and /i/ are the preferred vowels in traditional Chinese opera singing. Traditional Chinese opera singers believe that the vowel /a/ helps project the voice. This belief, based on practical experience, corresponds strikingly with the findings of empirical studies conducted with Western classical singers (Sundberg, 1974), namely that the Fs is most prominent in /a/. Traditional Chinese opera singers also believe that the /i/ sound aids in projecting the voice. When singers retract the lips for the /i/ sound, the vowel is pushed up into the nasal cavity, and this nasal resonance can carry the voice (Hsu, 1992), leading Western listeners to perceive a nasal quality in the singing (Grout & Williams, 2003).

Another difference between Western classical singing and Chinese opera relates to voice

classification. Hsu (1992) and Grout and Williams (2003) suggest that voice classification in

traditional Chinese opera is based on “singing style” or character type. This proposal was

confirmed in personal interviews with professional traditional Chinese opera singers. Singers are

not categorized by vocal ranges, such as soprano, alto, bass, baritone, and tenor, as they are in

Western classical singing. Rather, singers of traditional Chinese opera are classified by the kind

of characters that they typically portray. The timbre and pitch of the voice depend on the age, sex, and social status of the dramatic role. The three major character-types for male singers in Chinese opera are Lao-sheng, Hsiao-sheng, and Wu-sheng. The Lao-sheng character-type is a middle-aged or old man, an official of the imperial court, a general, or some other distinguished person. This character-type sings with a full baritone voice. The Hsiao-sheng, or “scholar-lover,” character-type has a high-pitched voice similar to the tenor and sings in the falsetto region. Wu-sheng, who plays warrior roles and wears costumes that symbolize armor, has a wide vocal range. This character-type is more involved in acting than in singing. Another character, Jing, is known as the “painted face male.” His facial colors symbolize the type of character; for example, red represents good and white represents treachery. Jing often plays the part of a high-ranking army general, a warrior, or an official, depending on his paint. “His robust baritone voice and unique

painted face together with his swaggering self-assertive manner all combine to make him the

most forceful personality in most scenes in which he appears” (Chinese traditional opera, 2003,

para. 5).

Although these character-types of traditional Chinese opera could be located within the

range-based classification system used in Western classical singing, their ranges are not exactly

the same. Traditional Chinese opera singers are trained to have much wider ranges than Western

classical singers. For example, the Lao-Sheng and Jing character-types, who most often sing in

the baritone range, also sing up into the tenor range and down into the bass range during opera

performance.

Given the differences discussed above, one may wonder whether the traditional Chinese opera singer also has a high-frequency peak in the spectrum, i.e., the Fs. This question was addressed in a pilot study investigating the Fs in traditional Chinese opera, particularly in the Lao-Sheng and Jing character types.

As noted, traditional Chinese opera singing uses training techniques that differ from those of Western classical singing, yet its singers can still be heard over a loud instrumental ensemble. Furthermore, Wang (1985) reported the Fs in traditional Chinese opera singing. Su (2000; 2002) hypothesized that the Western vocal training style, in which singers are taught to round the lips, lower the larynx, and lengthen the vocal tract, might not be the only contributor to the Fs. Instead, intensity, F0, and vowel configuration may have more effect on the Fs.

Two male singers (1 Lao-sheng and 1 Jing, with ranges overlapping baritone and tenor) and one female singer (soprano range) of traditional Chinese opera from the National Taiwan Traditional Chinese Opera Department served as subjects in this study. Each singer had at least 15 years of vocal training. Each singer was asked to choose a familiar musical phrase at least 40 seconds in duration. The pitch ranges of their singing phrases were not controlled. The singers were asked to read the text of the musical phrase that they chose three times in their normal conversational voices. They were then asked to sing the phrase in full voice, as if they were singing in a large concert hall. Using the same musical score and range, each subject was then asked to sing the phrase three more times with the single vowels /a/, /i/, and /u/ replacing the text. Samples were recorded on a DAT recorder in a quiet room at the National Taiwan College of Performing Arts. The DAT signals were analyzed using CSpeech (Milenkovic, 1987), a computer-based speech analysis program. All samples were analyzed by long-term average spectrum (LTAS) analysis.

The results from this pilot study showed that the female singer (soprano range) did not exhibit the Fs in any of the samples. One male singer, Lao-Sheng (range overlapping baritone and tenor), exhibited the Fs in all of the sung samples. Only two samples sung by Jing (range overlapping baritone and tenor) exhibited the Fs; the LTAS did show energy extending into the high-frequency region for this singer, but it did not quite meet the criteria developed for the Fs. This result agreed with Weiss et al.’s (2001) suggestion that the Fs is not necessary for the high-pitched voice, because a maximally projected high-pitched voice exhibits strong energy in the high-frequency region. When a phrase sung by Jing that did not exhibit the Fs was edited to eliminate the intervals with F0 of 695 Hz or above, the Fs was exhibited. It was hypothesized that the Fs was not seen in the LTAS because of the high F0 in some segments of the singing. As noted by previous researchers, the Fs does not occur in voices with high F0.
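The editing step described above, removing intervals with F0 of 695 Hz or above before averaging, can be sketched as an F0-gated LTAS. The sketch assumes that per-frame power spectra and per-frame F0 estimates are already available from a prior analysis; it keeps unvoiced frames, which is an assumption, and the function name and frame values are hypothetical.

```python
import math


def gated_ltas_db(frame_power_spectra, frame_f0, f0_limit=695.0):
    """Average per-frame power spectra after dropping frames with F0 >= f0_limit.

    frame_power_spectra: one list of linear power values per analysis frame.
    frame_f0: F0 estimate (Hz) per frame, or None for unvoiced frames; unvoiced
    frames are kept here (an assumption).
    Returns (levels in dB, number of frames kept).
    """
    kept = [spec for spec, f0 in zip(frame_power_spectra, frame_f0)
            if f0 is None or f0 < f0_limit]
    if not kept:
        raise ValueError("every frame was excluded")
    mean_power = [sum(col) / len(kept) for col in zip(*kept)]
    levels = [10.0 * math.log10(p) if p > 0 else float("-inf") for p in mean_power]
    return levels, len(kept)


# Three hypothetical frames (two bins each); the third has F0 = 720 Hz and is dropped.
spectra = [[1.0, 0.01], [1.0, 0.01], [1.0, 100.0]]
levels, n_kept = gated_ltas_db(spectra, [300.0, None, 720.0])
print(levels, n_kept)  # [0.0, -20.0] 2
```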

One possible explanation for the lack of the Fs in some of the traditional Chinese opera singers in this pilot study may relate to the vowel content of the phrases. As noted earlier, vowel quality is a factor that influences the Fs; therefore, the numbers of /a/, /i/, and /u/ vowels in the musical phrases that the traditional Chinese opera singers sang were counted. Few of these vowels occurred in the traditional Chinese opera lyrics. To investigate the vowel-dependent nature of the Fs in the singing phrase, the pilot study compared each singer’s performance of the phrase using the original text with the performance of the same phrase using only the single vowels /a/, /i/, and /u/. Results showed that Lao-Sheng exhibited the Fs in all three vowel samples. Although Jing exhibited no sign of the Fs when the phrase was sung with the original text, when he was asked to sing the same musical phrase with a single vowel, the results showed the Fs in the vowels /a/ and /u/ but not in the vowel /i/. Both male singers showed the highest level of the Fs for the vowel /a/, suggesting that the Fs may be exhibited in the preferred vowels. Thus, the absence of the Fs in the performance of the musical phrase with the original text might be explained by the dominance of non-preferred vowels, rather than by an effect of vocal singing style or training.
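The vowel count described above amounts to tallying syllable nuclei. The sketch assumes the lyrics have already been transcribed into one vowel symbol per syllable; the transcription convention and the example phrase are hypothetical, not the pilot-study lyrics.

```python
from collections import Counter

TARGET_VOWELS = ("a", "i", "u")


def target_vowel_counts(nucleus_vowels):
    """Count /a/, /i/, /u/ among syllable-nucleus vowel symbols and their share of all syllables."""
    counts = Counter(v for v in nucleus_vowels if v in TARGET_VOWELS)
    share = sum(counts.values()) / len(nucleus_vowels) if nucleus_vowels else 0.0
    return {v: counts[v] for v in TARGET_VOWELS}, share


# Hypothetical nucleus vowels for an eight-syllable phrase.
phrase = ["a", "o", "e", "i", "ə", "u", "a", "y"]
print(target_vowel_counts(phrase))  # ({'a': 2, 'i': 1, 'u': 1}, 0.5)
```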

Chapter III: Research questions

As illustrated by the review of the existing literature, different research questions and methodologies have been used to investigate the Fs. Previous studies identified a range of factors influencing the Fs, including vocal training technique, F0, and vowel quality, and suggested that the Fs cannot be explained by any single factor. The main purpose of this study was to determine whether the Fs exists in traditional Chinese opera as well as in Western classical singing. Secondary questions closely related to this main purpose were also investigated. The following three questions were addressed: (1) How is the Fs perceived by trained listeners? (2) What factors affect the Fs? (3) What are the effects of the independent and combined factors on the Fs, and do these differ for Chinese and Western singers?

1.) Perceptual judgments

In previous research, the Fs has been identified by acoustic and physiological analysis; however, few studies have compared these results with the perception of the Fs. Therefore, in the present study, perceptual judgments were obtained from a group of highly trained singers. In the past, researchers investigated the Fs to determine how it enhances the singing voice so that listeners are able to perceive the “ring” over a very loud orchestra (Bartholomew, 1934). Vocal “ring” relates to listeners’ perception of the timbre of professional singing, whereas the Fs is considered to be the physical correlate of vocal “ring.” Unfortunately, few studies have related perceptual judgments of vocal ring to acoustic measures of the Fs, and those that have (Wang, 1985; Omori et al., 1996) were not detailed enough to allow unambiguous interpretation of their results. In the current study, we asked whether trained listeners could identify a “ring” in different types of singing, such as traditional Chinese opera and Western classical singing, and whether listeners’ perceptions corresponded to the physical identification of the Fs. A second question addressed in this perceptual study was whether judgments of vocal ring are reliable within and across listeners.

2.) Impact of analysis procedure and singing material on the Fs

In the past, quantitative definitions of the Fs were based on single vowels, but identification of the Fs from connected passages of singing was mostly based on categorical evaluation alone (i.e., present or absent). Researchers have used a variety of methodologies in these studies (e.g., short-term spectra, LTAS, 1/3-octave bands) and reported inconsistent findings. No studies have compared different methodologies within one experimental design to determine whether the analysis procedure is a factor in identifying the Fs. In this research, we asked whether the method of measurement affects the determination of the Fs.

Many studies have investigated the Fs by using the LTAS because it is stable for speech and singing samples and, most importantly, it is less dependent on F0 and intensity than other analysis procedures (Sundberg, 2001). The LTAS provides information about spectrum envelope peaks during singing because it yields the time average of sound level for adjacent frequency bands (Sundberg, 2001). Mendoza, Munoz and Naranjo (1996) measured voice stability by using the LTAS and suggested that the LTAS is an appropriate measure of the stability of speech signals of 30 seconds or longer. Moreover, Byrne et al. (1994) used the LTAS to compare the spectra of different languages, and their results showed that the LTAS was similar across languages. This suggests that the LTAS is applicable to the present study, in which Chinese and Western singers sang texts in different languages.

In many previous studies, the precise amount of energy increase in the LTAS required to define the Fs was not provided. The present researcher found this description of the Fs hard to apply because it was not clear how a peak should be defined in terms of amplitude and bandwidth. Therefore, in this study, categorical measures (i.e., presence or absence of the Fs) and quantitative measures of the LTAS (center frequency and relative intensity, the L3-L1 difference) and of short-term spectra (L3-L1) were made to decide on the presence or absence of the Fs. These measures were compared for sung versus spoken tokens.

3.) Influence of independent and combined factors on Fs

The final questions that we asked were: What are the effects of the independent and

combined factors that affect the Fs and do these factors differ for Western and Chinese singers?

Voice classification was controlled and other factors such as singing technique (Chinese and

Western), vowel quality (/i/, /a/, /u/), fundamental frequency, and intensity of singing were

measured and related to Fs.

Chapter IV: Methods

Three experiments were conducted to investigate the Fs in Chinese and Western classical opera. Perceptual judgments were obtained in the first experiment by asking a group of highly trained singers to judge the presence or absence of vocal ring. The second experiment examined the Fs categorically by measuring the LTAS. The third experiment evaluated the Fs quantitatively by measuring the LTAS and the short-term spectrum. The purpose and details of these experiments will be discussed further in the following sections.

This section will describe the subjects who participated in this study (Chinese and

Western) and detail the recording procedures and tasks that were used to collect the data for the

three experiments. The details of each experiment will then be provided.

Singers

Ten males (5 Lao-sheng and 5 Jing, with vocal ranges somewhat overlapping those of Western classical tenors and baritones) trained in traditional Chinese opera singing participated in this study. All singers had a minimum of 15 years of individual voice lessons. These singers were all professional singers and were selected from the National Taiwan Traditional Chinese Opera Department in Taiwan. Singers’ ages ranged from 27 to 40 years.

Ten Western classically trained singers (5 tenors, 5 baritones) with a minimum of 5 years of vocal training were selected from the Indiana University School of Music. Singers were referred by their voice professors only if they met the professors’ criteria of “professional singers.” Singers’ ages ranged from 27 to 40 years. Although the two groups of singers were in a similar age range, they had quite different durations of vocal training. Recall that traditional Chinese opera singers typically start training between the ages of 5 and 10 years, whereas Western classical singers normally start their training after puberty. Therefore, singing expertise, as judged by singing professionals, was a more realistic criterion for matching subjects than duration of training if the singers were to be of equivalent age. Age was considered to be an important control variable because of the known effects of aging on the voice (Hollien & Shipp, 1972). All subjects self-reported normal hearing and had no medical history of voice pathology. Before the recording started, each subject was asked to fill out a questionnaire prepared by the researcher (see Appendix A). The purpose of the questionnaire was to collect information about the subjects, such as age, vocal category, and years of training, to ensure that all subjects met the study’s inclusion criteria.

Recording Procedures

Subjects were asked to choose the musical phrase with which they were most familiar and to bring the lyrics with them on the day of the recording. The recordings were made in a quiet room at either the National Taiwan College of Performing Arts or the Indiana University Department of Speech and Hearing Sciences. Each subject was asked to stand in a comfortable position with a condenser microphone (ATM71) positioned 30 cm in front of his lips. The microphone signals were recorded on a DAT recorder (SONY TCD-D8). A sound level meter was placed at the same distance as the microphone, and the sound level of each singer’s voice was measured and noted. A musical keyboard was provided so that singers could identify comfortable keys in which to sing; this instrument also helped to determine the singing range for each singer’s sample.

Data Collection

Subjects were asked to prolong the vowels /a/, /i/, and /u/ for 5 seconds each at their most natural and comfortable habitual pitch and loudness. These samples constituted the sustained “spoken” vowel samples. The purpose of this task and the following one was to compare single vowels in the non-singing and singing samples. Subjects were then asked to sing the vowels /a/, /i/, and /u/ for 5 seconds each at their most comfortable pitch, as loudly as possible, as if they were singing in a large concert hall. These vowels provided the sustained “sung” vowels that were compared with the sustained “spoken” vowels from the previous task. Subjects were also asked to sing specific musical notes, such as C4, D4, G4, and E4, cued by a keyboard. These notes lie within the ranges of baritone and tenor voices. The rationale for giving the same notes to all the singers was to provide a standard for comparison across singers. However, some singers were not able to complete this task because of their different vocal ranges. Even though the notes were within the standard ranges for tenors and baritones, some singers had to sing certain notes an octave higher or lower than the target pitch. Some singers were uncomfortable with certain notes and either refused to sing them or sang them very uncomfortably. Because of this inconsistency, these data were not analyzed. Subjects were also asked to glide up and down the musical scale across their comfortable ranges on the vowel /a/. The lowest and highest notes of this glide were sustained for at least 1-2 seconds.

Prior to the recording session, each singer chose the musical phrase with which he was most familiar, with a minimum length of 40 seconds. This was done to give singers the opportunity to present their best performances in the following tasks. The musical passage for each Western classically trained singer is listed in Appendix B to indicate the language and type of repertoire that each singer used. All Chinese singers sang in Mandarin. The singers were first asked to read the text of the chosen phrase three times, using their normal conversational pitch and loudness. The purpose of this task was to compare running speech with the singing phrases, since one way to define the Fs is as an increase in energy in the F3 region in singing compared with speaking (Sundberg, 1974). Reading the same text as the singing phrase made it possible to control and compare the vowels in the singing and speaking samples. Moreover, repeating the text three times provided a sample long enough for a stable acoustic analysis with the LTAS (Mendoza, Munoz & Naranjo, 1996; Sundberg, 2001).

The singers then sang the same musical phrase in their most comfortable pitch range. Singers were asked to pretend that they were singing in a large concert hall so that they would perform with full voice, the condition in which the Fs may normally occur. The numbers of the vowels /a/, /i/, and /u/ in each musical phrase were also counted to evaluate whether the vowel content affected the magnitude of the Fs. The same musical score and range were then sung with each single vowel /a/, /i/, and /u/. There were two rationales for selecting these three vowels. First, many studies have shown that the vowels /a/, /i/, and /u/ have the most acoustically distinct Fs (Sundberg, 1970, 2001; Bloothooft & Plomp, 1984, 1985; Su, 2000). Second, the vowels /a/ and /i/ are the most common vowels used in traditional Chinese opera (Hsu, 1992). By comparing the musical phrase performed on a sustained vowel with the same musical phrase performed with the original text, we were able to investigate the effect of phonetic context on the Fs.

Experiment 1: Perceptual judgments

Subjects

“Vocal ring” was judged perceptually by 12 doctoral students in voice performance from the Indiana University School of Music, each with a minimum of 5 years of performance experience. One male singing professor, the chair of the voice department, with more than 20 years of teaching experience, also participated in this perceptual study. The average age of the 12 doctoral students was 28 years (range 27 to 36 years). The singing professor was 55 years old. All listeners had attended recitals and concerts for at least 5 years and were familiar with different voice qualities. All listeners passed a hearing screening at 20 dB HL at 500, 1000, 1500, 2000, and 4000 Hz.

Procedures

Stimulus tapes

All 40 samples of the sung passage were digitized (see the section on acoustic analysis) and transferred from the computer to a DAT tape. There were two blocks of stimuli: (1) the original musical phrase and (2) the musical phrase sung with the single vowel /a/. Because of time constraints, the vowels /i/ and /u/ were not included in the perceptual experiment. Each sample was presented twice in succession within its block, with a 3-second interval between the repetitions. Each block had 20 samples (10 Chinese singers and 10 Western singers) and lasted approximately 15 minutes. In addition to the stimulus tapes, two sets of samples were recorded for a practice tape, each set containing 3 samples. Each practice stimulus was 60 seconds in duration, and a 5-second interval preceded the next stimulus. Together, the two practice sets lasted about 6.5 minutes.

Listening procedures

Practice section

Listeners completed a practice session to familiarize them with the “vocal ring.” The practice session was held one day prior to experimental testing in a quiet classroom in the Indiana University School of Music. The stimulus tape was presented through 2 loudspeakers at a distance of 6 feet from the listeners, at 80 dB SPL as measured 6 feet from the loudspeakers. Two sets of samples were presented in this practice session, each containing three samples. The first set contained Western classically trained singing, traditional Chinese opera singing, and popular music singing, all from commercial recordings. These samples were used to illustrate the concept of vocal ring. As judged by the experimenter, both perceptually and acoustically, the Chinese opera and Western classical samples contained a ring, whereas the popular music sample did not.

The second set of stimuli acquainted the listeners with the type of stimuli and procedures that would be used in the experiment. The three samples in the second set contained Western classically trained singing, Chinese opera, and popular music, all sung without orchestral accompaniment. The Western classical sample was obtained from a singing professor at the Indiana University School of Music. The traditional Chinese opera sample came from samples collected for an earlier pilot study (Su, 2000). The popular music sample was recorded by a student from the School of Music who was not a professional singer.

Before the practice session started, each listener was given a rating sheet with a 3-point scale (1 = strong vocal ring, 2 = not sure, 3 = no vocal ring). After listening to each sample, listeners rated the vocal ring on the rating sheet. These answers allowed the researcher to determine whether the listeners’ concept of the vocal ring was consistent within the group and with the experimenter’s judgments.

Experimental session

The experimental session was conducted in a quiet classroom in the School of Music. The stimulus tape was presented through 2 loudspeakers at a distance of 6 feet from the listeners, at 80 dB SPL as measured 6 feet from the loudspeakers. Listeners were given response sheets, which were divided into a rating section and a “comments” section (see Appendix C). All ratings were made on a 3-point scale, with ‘1’ indicating that a “strong vocal ring” was perceived, ‘2’ indicating “not sure,” and ‘3’ indicating “no vocal ring.” Listeners were instructed as follows:

‘During this experiment, you will hear samples of Chinese opera or Western classical singing. Each sample will last about 30 seconds and will be played twice. You are required to judge whether there is a vocal ring in each sample by using a scale from 1 to 3. Rate the sample as 1 if you hear a strong vocal ring, as 2 if you are not sure, and as 3 if you hear no vocal ring. You will be given two answer sheets. The first is the score sheet, which has the rating scale for your responses. The second sheet is for your comments. Record your rating on the score sheet. Please wait until you have heard both repetitions of the sample before marking your score sheet. You may write down your comments at any time during stimulus presentation. On the second sheet, write down your comments on how you feel about the vocal quality of the different samples that you hear. You can also comment on the part of the music in which you heard the ring; for example, at high pitches, at low pitches, or on certain vowels. These comments are optional. We will not play the next sample until you have all finished your ratings and comments and are ready to proceed. There will be 2 blocks of samples. Block 1 will have the phrases with original texts, and Block 2 will present the singers as they sing an isolated vowel. There are a total of 40 samples for you to rate. I will indicate the sample number before each presentation so you can respond in the correct order.’

A 5-minute break was provided between the first (phrase) and second (vowel) blocks.

Reliability of listeners’ perceptual judgments

Five listeners who participated in the perceptual rating procedure were recruited again four months after the test and were asked to rate the same samples (the regular singing phrase and the phrase sung with the vowel /a/) that they had rated during the first experimental session. The same procedures used in the first test session were used in this second rating session.

Data Analysis

The results of the perceptual test were based on 70% agreement across listeners. That is, a sample was considered to have a vocal ring if 70% of the listeners rated it “1” (strong vocal ring). The same 70% criterion was used for ratings of “not sure” and “no vocal ring.” Intra-judge reliability was based on the number of samples that each listener rated identically in both listening sessions. Inter-judge reliability was used as an index of the strength of the vocal ring. It was reasoned that a singing sample that received the same rating across listeners in the two experimental sessions had more salient cues to the vocal ring than a sample that received a wider range of ratings. Therefore, the number of listeners who gave the same rating across sessions was taken as an index of the strength or weakness of a singer’s vocal ring.
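The 70% agreement rule and the two session-to-session indices can be expressed as a few small functions. The Python sketch below is an illustration written for this section, not the procedure actually used in the study. It assumes that each sample's ratings are stored as a list of integers on the 1-3 scale (one entry per listener) and that the two sessions are aligned item by item; the function names are hypothetical.

```python
from collections import Counter

# Scale values used on the rating sheets.
LABELS = {1: "strong vocal ring", 2: "not sure", 3: "no vocal ring"}

def categorize(ratings, criterion=0.70):
    """Return the category chosen by at least `criterion` of listeners, else None.

    "70% of the listeners" is interpreted here as "at least 70%".
    Because the criterion exceeds 50%, at most one category can qualify.
    """
    counts = Counter(ratings)
    for value, label in LABELS.items():
        if counts[value] / len(ratings) >= criterion:
            return label
    return None

def intra_judge_agreement(session1, session2):
    """Fraction of samples that one listener rated identically in both sessions."""
    same = sum(a == b for a, b in zip(session1, session2))
    return same / len(session1)

def consistency_index(first, second):
    """Number of listeners who gave one sample the same rating in both sessions."""
    return sum(a == b for a, b in zip(first, second))
```

For example, if 10 of 13 listeners rate a sample “1”, the agreement is about 77%, and `categorize` returns "strong vocal ring". If no category reaches the criterion, the function returns `None`, which corresponds to the “no clear judgment” cases described in the results.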

Acoustic analysis

One purpose of this study was to investigate the acoustic cues that may affect the Fs. This was done categorically (Experiment 2) and quantitatively (Experiment 3). The singers’ samples were played back from the DAT tape and analyzed with CSpeech and TF32 (Milenkovic, 1987; 1997; 2003), speech-processing tools for PCs. All acoustic signals were low-pass filtered at 9 kHz and digitized at a 22 kHz sampling rate. Formant frequencies for the vowels /a/, /i/, and /u/ were measured with FFT and LPC analysis; the LPC analysis used 26 coefficients. The FFT was used to determine the fundamental frequency, and LPC provided a good estimate of formant frequencies for speech and a smoothed spectrum for sung vowels.
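For readers unfamiliar with LPC, the sketch below shows one common way to obtain a 26-coefficient LPC model and candidate formant frequencies. It uses the autocorrelation method, solving the Yule-Walker equations with `scipy.linalg.solve_toeplitz`, and then finds the roots of the prediction polynomial. This is an illustration under stated assumptions, not the algorithm implemented in CSpeech or TF32. The 22,050 Hz sampling rate, the Hamming window, and the function names are all assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=26):
    """Prediction polynomial A(z) = 1 - sum(a_k z^-k) via the autocorrelation method."""
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]  # lags 0..order
    a = solve_toeplitz(r[:order], r[1:order + 1])  # Yule-Walker equations
    return np.concatenate(([1.0], -a))

def formant_candidates(frame, fs=22050, order=26):
    """Frequencies (Hz) of LPC poles in the upper half of the z-plane, sorted."""
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]  # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))
```

Not every pole is a formant. In practice, candidates with very wide bandwidths are usually discarded, and this sketch does not perform that step.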

Experiment 2: Categorical measurement

Long-term average spectrum (LTAS)

An LTAS was calculated for each digitized sample by averaging the 512-point FFTs of successive 20 ms segments. The 20 ms Hamming window was advanced in 10 ms steps. This analysis was conducted for all phrase-length passages (i.e., the sung passage, the sung vowel passage, and the speaking passage).
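A minimal NumPy sketch of this LTAS computation is shown below. It assumes a 22,050 Hz sampling rate (the text gives 22 kHz) and zero-pads each 20 ms Hamming-windowed frame to 512 points. It also averages power spectra before converting the result to dB. The text does not say whether CSpeech or TF32 averaged power or dB values, so that choice is an assumption, as is the function name.

```python
import numpy as np

def ltas(signal, fs=22050, win_ms=20.0, hop_ms=10.0, nfft=512):
    """Long-term average spectrum: mean power of windowed FFT frames, in dB."""
    win_len = int(round(fs * win_ms / 1000))  # 441 samples at 22,050 Hz
    hop = int(round(fs * hop_ms / 1000))      # 220 samples at 22,050 Hz
    if len(signal) < win_len:
        raise ValueError("signal shorter than one analysis window")
    if win_len > nfft:
        raise ValueError("window longer than the FFT size")
    window = np.hamming(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    power = np.zeros(nfft // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + win_len] * window
        power += np.abs(np.fft.rfft(frame, n=nfft)) ** 2  # zero-padded to nfft points
    power /= n_frames
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, 10 * np.log10(power + 1e-12)  # small floor avoids log(0)
```

At 22,050 Hz, the 512-point FFT gives a bin spacing of about 43 Hz.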

As mentioned before, previous researchers investigated and defined the Fs by finding an extra spectral envelope peak (a cluster of formants 3, 4, and 5) between 2300 Hz and 3500 Hz. The procedure of those studies was applied here: the researcher first identified a region of increasing amplitude between 2300 Hz and 3500 Hz. The bandwidth of this energy peak was then measured between the 3 dB down points on the low- and high-frequency sides of the peak. If the cluster of energy in the Fs region had a bandwidth less than or equal to 1000 Hz, it was defined as a peak, and its low- and high-frequency -3 dB boundaries were recorded. Thus, the Fs in this experiment was defined as a cluster of energy between 2300 and 3500 Hz with a bandwidth less than or equal to 1000 Hz.
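In the study, the peak criterion was applied by inspection, but it can also be stated as a small algorithm. The Python sketch below shows one possible automation. It assumes an LTAS given as matching arrays of frequencies (Hz) and levels (dB). It takes the highest level between 2300 and 3500 Hz and requires that level to be a local maximum. It then walks outward to the -3 dB points and accepts the peak only if the resulting bandwidth is at most 1000 Hz. Treating “increasing amplitude” as “local maximum” is an interpretation, and the function name is hypothetical.

```python
import numpy as np

def detect_fs(freqs, level_db, lo=2300.0, hi=3500.0, max_bw=1000.0):
    """Return (low -3 dB edge, peak, high -3 dB edge) in Hz, or None if no Fs peak."""
    band = np.where((freqs >= lo) & (freqs <= hi))[0]
    k = band[np.argmax(level_db[band])]
    # Require a genuine local maximum: the level must rise into the peak from both sides.
    if k == 0 or k == len(level_db) - 1:
        return None
    if not (level_db[k - 1] < level_db[k] > level_db[k + 1]):
        return None
    target = level_db[k] - 3.0
    left = k
    while left > 0 and level_db[left] > target:  # walk down to the low-side -3 dB point
        left -= 1
    right = k
    while right < len(level_db) - 1 and level_db[right] > target:  # high-side point
        right += 1
    if freqs[right] - freqs[left] <= max_bw:
        return freqs[left], freqs[k], freqs[right]
    return None
```

A spectrum that slopes steadily downward through 2300-3500 Hz, with no peak, returns `None`. A peak wider than 1000 Hz at its -3 dB points also returns `None`.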

The reliability of peak determination was checked on randomly selected samples by two other experienced investigators from the Indiana University Department of Speech and Hearing Sciences. The presence or absence of a measurable Fs was determined for all the regular singing phrases and speaking phrases from both the Chinese and Western singing groups. The same musical phrases sung with the vowels /a/, /i/, and /u/ were then examined to determine whether the Fs is affected by vowel quality. Samples of the regular singing phrase and of the phrase sung with the vowel /a/ that met the Fs criteria were then compared with the perceptual ratings to investigate whether there is a relationship between the categorical acoustic cues to the Fs and the vocal ring.

In addition, the presence or absence of the Fs in the gliding musical scales sung with the vowel /a/ was determined from the LTAS, using the same procedures. Results were compared with those of Sundberg (2001), in which a similar task was performed by a Chinese opera singer.

Experiment 3: Quantitative measurement of LTAS

Relative energy of the LTAS

As discussed before, many researchers have defined the Fs by investigating single vowels; however, there is no operational definition of the Fs for a sung passage. Previous researchers who used LTAS analysis to investigate the Fs simply judged whether there was an energy peak between 2300 Hz and 3500 Hz with amplitude increasing from the 2000 Hz region. However, no study has specified how large an increase in energy is needed to constitute an Fs. In the current study, the researcher attempted to quantify the Fs using the difference in energy (measured in dB) between a high (2000-4000 Hz) and a low (0-2000 Hz) frequency region. This high-low frequency energy difference was compared for sung and spoken samples of the same passage.

All the sung and spoken phrases were filtered using elliptic IIR filters designed in MATLAB SPTool. Each sung and spoken phrase was first low-pass filtered (fc = 2000 Hz) with a pass-band ripple of 3 dB; the stop-band edge frequency was set at 2500 Hz with a 50 dB roll-off. The original full-spectrum sung and spoken phrases were then filtered with a band-pass filter set at 2000-4000 Hz with a pass-band ripple of 3 dB; the stop-band edge frequencies were set at 1500 and 4500 Hz, each with a 50 dB roll-off. All low-pass-filtered and band-pass-filtered samples were exported from the MATLAB workspace as waveform files and analyzed with Cool Edit (Syntrillium, 1999). The energy values (RMS in dB) in the low-frequency region (0-2 kHz) and the high-frequency region (2-4 kHz) were calculated with Cool Edit. Finally, the relative energy level difference between the high-frequency band (2000-4000 Hz) and the low-frequency band (0-2000 Hz) was calculated by subtraction and compared between the sung and spoken samples.
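The filtering and subtraction steps can be approximated in Python with SciPy's elliptic filter design. The sketch below is a stand-in for the MATLAB SPTool and Cool Edit workflow, not a reproduction of it, and several details are assumptions. It lets `scipy.signal.ellipord` choose the minimum filter orders that meet the stated band edges, with 3 dB of pass-band ripple and 50 dB of stop-band attenuation (taken here as the meaning of “50 dB roll-off”). It applies the filters causally with `sosfilt` and assumes a 22,050 Hz sampling rate. The function names are hypothetical. A negative result means that the 2-4 kHz band is weaker than the 0-2 kHz band.

```python
import numpy as np
from scipy import signal

def rms_db(x):
    """RMS level in dB (re 1.0 full scale); small floor avoids log(0)."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))) + 1e-12)

def band_energy_difference(x, fs=22050):
    """High-band (2-4 kHz) RMS level minus low-band (0-2 kHz) RMS level, in dB."""
    # Low-pass: pass-band edge 2000 Hz, stop-band edge 2500 Hz, 3 dB ripple, 50 dB attenuation.
    n_lp, wn_lp = signal.ellipord(2000, 2500, 3, 50, fs=fs)
    lowpass = signal.ellip(n_lp, 3, 50, wn_lp, btype="lowpass", output="sos", fs=fs)
    # Band-pass: pass band 2000-4000 Hz, stop-band edges 1500 and 4500 Hz.
    n_bp, wn_bp = signal.ellipord([2000, 4000], [1500, 4500], 3, 50, fs=fs)
    bandpass = signal.ellip(n_bp, 3, 50, wn_bp, btype="bandpass", output="sos", fs=fs)
    low = signal.sosfilt(lowpass, x)
    high = signal.sosfilt(bandpass, x)
    return rms_db(high) - rms_db(low)
```

Consider a test signal made of a 500 Hz tone plus a 3000 Hz tone at one-tenth the amplitude. The function returns roughly -20 dB, and the exact value depends on where each tone falls within the 3 dB pass-band ripple.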

The relative energy level difference between the two frequency bands was first calculated for the sung phrase; a small absolute difference between the two regions was expected if an Fs was present. The relative energy level difference between the two regions was then calculated for the spoken phrase; a larger negative difference was expected for the spoken samples than for the sung samples. Finally, the absolute difference in relative energy between the two frequency bands was compared between the sung and spoken phrases.

A two-way ANOVA was performed to investigate the main effects of material (spoken and sung phrases) and style (Western and Chinese groups), as well as their interaction. Tukey HSD tests were also performed across all the materials to follow up the interaction and to examine which materials significantly affected the spectral energy. To investigate whether different materials affect the Fs, one-way ANOVAs were used within each group (Western and Chinese) to evaluate the relative energy differences among the spoken phrase, the regular sung phrase, and the musical phrase sung with /a/, /i/, and /u/. Moreover, two-sample t-tests were performed to compare the relative energy difference between the Western and Chinese groups for the vowels /a/, /i/, and /u/.

Finally, Spearman’s rho was used to investigate the correlation between the quantitative LTAS analysis and the perceptual ratings for the regular singing phrase and the phrase sung with the vowel /a/. A composite perceptual score for each singer was determined by multiplying the number of listeners giving each rating (1 = strong vocal ring, 2 = not sure, or 3 = no vocal ring) by that rating’s scale value and summing the products. For example, singer C4 received “1” from four listeners (1 x 4 = 4), “2” from eight listeners (2 x 8 = 16), and “3” from one listener (3 x 1 = 3). Therefore, the composite rating for C4 was 23. Lower composite scores thus indicate a stronger perceived vocal ring.
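The composite score and its rank correlation with an acoustic measure can be sketched in Python as follows. The composite arithmetic follows the text directly. The use of `scipy.stats.spearmanr` and the function names are assumptions for illustration, not the software used in the study.

```python
from scipy.stats import spearmanr

def composite_score(counts):
    """Sum of (scale value x number of listeners giving that value).

    `counts` maps each scale value (1 = strong vocal ring, 2 = not sure,
    3 = no vocal ring) to the number of listeners who chose it.
    Lower scores mean a stronger perceived vocal ring.
    """
    return sum(value * n for value, n in counts.items())

def rank_correlation(composites, energy_differences):
    """Spearman's rho and p-value between composite scores and an acoustic measure."""
    rho, p_value = spearmanr(composites, energy_differences)
    return rho, p_value

print(composite_score({1: 4, 2: 8, 3: 1}))  # singer C4 from the text: prints 23
```

With 13 listeners, the composite score ranges from 13 (all listeners chose “1”) to 39 (all listeners chose “3”). Because lower scores mean a stronger ring, a stronger Fs measure would be expected to show a negative rho.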

Acoustic measures of the F0

The F0 of each entire sung phrase was calculated using CSpeech version 4.0 (Milenkovic, 1987; 1997), whose algorithm uses an autocorrelation procedure to track F0 changes. Simultaneous display of the pitch trace, the waveform, and the spectrogram facilitated determination of the pitch of specific vowels. The mean, minimum, maximum, and standard deviation of F0 for each phrase were computed by positioning the cursors at the endpoints (beginning and end) of the waveform. In addition, the highest and lowest F0 of each sung phrase were measured by manually positioning the cursor at these pitch levels and recording the F0 values displayed on the screen.
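As a rough illustration of autocorrelation-based F0 estimation, the sketch below works on a single short frame. It picks the lag with the largest autocorrelation within an assumed search range of 60-1000 Hz and converts that lag to a frequency. This is not CSpeech's algorithm. A practical tracker would also need voicing decisions and protection against octave errors, which this sketch omits. The search limits, sampling rate, and function name are assumptions.

```python
import numpy as np

def f0_autocorr(frame, fs=22050, fmin=60.0, fmax=1000.0):
    """Estimate F0 (Hz) of one frame from the autocorrelation peak within [fmin, fmax]."""
    x = frame - np.mean(frame)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag
```

The frame must be at least a little longer than the longest period searched (here, 1/60 s). Phrase-level statistics such as the mean, minimum, maximum, and standard deviation would then be computed over the frame-by-frame estimates.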

Comparisons were made between F0 measures and listeners’ perceptual judgments (Experiment 1) to investigate any F0 differences between singers who exhibited a strong vocal ring and singers who did not. Both mean F0 and F0 range were examined to determine their impact on the Fs in traditional Chinese opera singing and Western classically trained singing. Because of the limited sample size, statistical analysis was not undertaken. Instead, a meaningful F0 difference was operationally defined as a minimum of 1 semitone. This definition was based on a study of the differential pitch sensitivity of the ear (Shower & Biddulph, 1931), which found that the minimum change in F0 detectable by the human ear is on the order of 1.0% or less across the whole musical range. The smallest pitch interval between notes in Western music is the semitone, a frequency ratio of 2^(1/12), or about 1.059 (an increase of about 5.9% over the lower frequency). Our definition of a perceptible frequency difference was therefore based on a minimum difference of one semitone.
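The semitone criterion can be checked directly from two F0 values, because the interval in equal-tempered semitones is 12 log2(f2/f1). The Python sketch below applies the 1-semitone threshold described above. The function names are hypothetical.

```python
import math

SEMITONE_RATIO = 2 ** (1 / 12)  # about 1.0595, i.e., a 5.9% frequency increase

def semitones_between(f1_hz, f2_hz):
    """Signed interval from f1 to f2 in equal-tempered semitones."""
    return 12 * math.log2(f2_hz / f1_hz)

def differs_by_semitone(f1_hz, f2_hz, threshold=1.0):
    """True if two F0 values differ by at least `threshold` semitones."""
    return abs(semitones_between(f1_hz, f2_hz)) >= threshold
```

For example, mean F0 values of 220 Hz and 240 Hz are about 1.5 semitones apart and would count as different. Values of 220 Hz and 225 Hz are about 0.4 semitones apart and would not.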

Intensity measurement

The intensity range (the highest and lowest intensities) of each sung phrase was measured with the sound level meter during the recording of each phrase. The sound level meter was placed at the same distance as the microphone in front of the singer. The researcher watched the sound level meter while the singers sang their passages and noted the highest and lowest intensities. The results were compared with the listeners’ perceptions to determine intensity range differences between singers rated as having a strong vocal ring and singers who did not exhibit a strong vocal ring. A perceptible intensity difference was operationally defined as a minimum of 1 dB. This definition was based on the study of Riesz (1928), which showed that for intensities between 70 and 110 dB, the difference limen for intensity was below 1 dB.

Level of the Fs (L3-L1)

Several studies (Schutte & Miller, 1985; Seidner et al., 1985; Rossing et al., 1986; Sengupta, 1996; Sundberg, 2001) determined the Fs by measuring the difference between the level of the formant peak around 3000 Hz (L3) and the level of the first formant (L1) in either the short-term spectrum or the LTAS. In this study, the researcher measured L3-L1 from the LTAS of the regular singing phrase and of the phrase sung with the vowel /a/ for all singers. Again, this measure was compared with listeners’ judgments of the vocal ring in order to investigate the level difference between singers rated as having a strong vocal ring and singers who did not exhibit a strong vocal ring.

Formant peaks in the LTAS were determined using Rossing et al.’s (1986) criteria. Rossing et al. defined the level of the Fs as the level of the formant peak in the 3 kHz region, and the level of the first formant as the level of the peak around 500 Hz. In the current study, the first formant peak (L1) around 500 Hz was identified manually from the LTAS. The frequency and amplitude of the first formant and of the formant peak in the 2-4 kHz region (L3) were obtained from LPC with 26 coefficients. When two or three adjacent peaks appeared in the higher frequency region near the third formant (around 2.3-3.5 kHz), as is common for the Fs, their levels were averaged. The difference L3-L1 was then calculated.
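The L3-L1 computation can be sketched as follows, under several assumptions. The LTAS (or LPC envelope) is given as arrays of frequencies (Hz) and levels (dB). L1 is taken as the maximum level in an assumed 300-800 Hz search window around 500 Hz. L3 is the mean level of all local maxima between 2300 and 3500 Hz, falling back to the band maximum when no local maximum exists. The search windows and the function name are illustrative choices, not values from the study.

```python
import numpy as np

def l3_minus_l1(freqs, level_db, f1_band=(300.0, 800.0), fs_band=(2300.0, 3500.0)):
    """Level of the singer's-formant region minus the first-formant level, in dB."""
    in_f1 = (freqs >= f1_band[0]) & (freqs <= f1_band[1])
    l1 = float(level_db[in_f1].max())
    idx = np.where((freqs >= fs_band[0]) & (freqs <= fs_band[1]))[0]
    # Local maxima in the Fs region; adjacent peaks are averaged, as described above.
    peaks = [i for i in idx
             if 0 < i < len(level_db) - 1
             and level_db[i - 1] < level_db[i] >= level_db[i + 1]]
    if peaks:
        l3 = float(np.mean(level_db[peaks]))
    else:
        l3 = float(level_db[idx].max())
    return l3 - l1
```

The result is negative whenever the Fs region lies below the first-formant level. A value closer to zero indicates relatively more energy near 3 kHz.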

The Fs was also investigated by short-term spectral analysis. The level of the Fs (L3-L1) was calculated for each individual vowel from the sustained sung and spoken vowels and from the vowels selected from the musical phrase. Both FFT and LPC analyses were used to measure the formant frequencies of all sustained vowels /a/, /i/, and /u/ and of all /a/, /i/, and /u/ vowels within the musical phrase. A 50 ms segment was extracted from a steady-state portion of each vowel. The FFT gave a good estimate of F0 and the harmonics, and LPC provided a good estimate of formant frequencies for speech and a smoothed spectrum for sung vowels. A 26-coefficient LPC spectrum and a broad-band spectrogram were used to assist identification of the formant frequencies. The LPC spectrum was overlaid on the FFT plot while the broad-band spectrogram was displayed. In CSpeech, moving the cursor along the spectra displayed both frequency and intensity values on the screen.

The prolonged spoken vowels /a/, /i/, and /u/, produced at the most natural and comfortable pitch and loudness, were measured first, and the mean L3-L1 was calculated for each vowel. The L3-L1 values of the prolonged sung vowels and of the vowels /a/, /i/, and /u/ edited from the sung phrase were then measured. The mean L3-L1 was calculated and compared with the L3-L1 of the spoken sustained vowels. The first formant peak (L1) was identified manually from the frequency and amplitude of the highest harmonic, or the average of the two highest harmonics, near the first formant. The third formant peak (L3) was likewise identified from the highest harmonic, or the average of the two or three highest harmonics, around 2.3-3.5 kHz. Formant values from each singer’s sustained spoken vowels were used as a reference to help determine the F1 of the vowels edited from the sung phrase. Mean L3-L1 values of the spoken and sung vowels were compared within and between the singing groups. Additionally, the mean L3-L1 values were compared with the perceptual judgments to determine whether this parameter was a good reflection of the vocal ring.

The formants of the vowels /a/, /i/, and /u/ in the spoken phrases were not measured with short-term spectral analysis because many of these vowels were not long enough to exhibit a steady-state portion. Also, because of the rate of speech and the rapid articulatory changes across phonetic contexts, these vowels would not provide a reasonable comparison with the lengthened vowels in the sung passages.

Chapter V: Results and discussions

Results for all experiments will be discussed relative to the results of the perceptual judgments (Exp. 1). Comparisons were made between the acoustic measures (i.e., categorical measurements of the LTAS, quantitative measurements of the LTAS, and L3-L1 of the short-term spectra) and listeners’ judgments concerning the presence of the “vocal ring.” The acoustic results are compared with listeners’ judgments using a 70% criterion, whereby a singing sample was considered to have a vocal ring if 70% of listeners rated it as having a “strong vocal ring.” The category for these samples will be termed “vocal ring” for the remaining discussion. The second perceptual category consisted of samples that 70% of listeners rated as “not sure.” This rating indicated that listeners sometimes heard the “strong vocal ring” and sometimes did not. The rating “no vocal ring” will not be related to the acoustic data because this category was rarely used, and no sample was judged by 70% of the listeners as having “no vocal ring.”

Experiment 1: Perceptual rating

This experiment assessed listeners’ perception of the vocal ring for both traditional Chinese opera and Western classically trained singers. The Chinese and Western samples of the regular singing phrases and the phrases sung with the vowel /a/ were rated; no spoken passage was rated in the perceptual rating session. All ratings were made on a 3-point scale, with ‘1’ indicating that a “strong vocal ring” was perceived, ‘2’ indicating “not sure” (i.e., sometimes yes, sometimes no), and ‘3’ indicating “no vocal ring.” A practice session that tested listeners’ ability to hear the vocal ring was provided prior to the experimental session. Results of this practice session showed consistent answers across all listeners, indicating that they agreed on the concept of the “vocal ring.”

The perceptual judgment test was given one day after the practice session. The results showed that 70% of the listeners heard a strong vocal ring in the regular singing phrase produced by 4 of the 10 traditional Chinese opera singers (C8, C11, C14, and C15) (Table 1). However, listeners were “not sure” whether a ring was present throughout the musical phrase for four singers (C3, C4, C5, and C6). For singers C9 and C16, just under half of the listeners (46%) heard a strong vocal ring, while the rest (54%) were not sure. The “no vocal ring” response was essentially unused; it was chosen by only one listener, for just two singers, C8 and C11.

General comments from listeners on the regular phrase sung by Chinese opera singers indicated that the vocal ring was perceived, but not always throughout the entire phrase. Most listeners commented that they heard the vocal ring in the high F0 range but could not hear it in the lower F0 range. Listeners also suggested that not perceiving the vocal ring could result from their unfamiliarity with the language and from bias regarding singing techniques. Listeners reported that the ring was perceived on certain vowels in the Chinese samples, such as /i/, /u/, /e/, and /a/, particularly /a/ and /i/. Listeners also noted that they heard a stronger vocal ring on sustained notes and vowels than on running notes with complex context.

For the same musical phrase sung with the vowel /a/, results showed that two singers (C8 and C15) were perceived to produce a strong vocal ring (Table 2). As noted above, these singers were also perceived to have the vocal ring when they sang the regular phrase. Listeners were “not sure” about the vocal ring of four singers (C3, C6, C14, and C16). Ratings for the remaining singers (C4, C5, C9, and C11) did not yield any clear judgment about the strength of the vocal ring: for these singers, fewer than 70% of listeners placed the samples in any one of the 3 vocal ring categories (i.e., strong vocal ring, not sure, or no vocal ring). All singers except C15 were perceived by at least one listener as having no ring in the phrase sung with the vowel /a/. Listeners’ comments on the phrase sung with the vowel /a/ generally indicated that the vocal ring was mostly heard on higher-F0 and sustained notes. Most listeners commented that, for the samples they rated “not sure,” they sometimes heard the vocal ring, but not throughout the entire phrase. These comments are consistent with those made for the regular singing phrase.

Table 1: Results of perceptual rating for the regular singing phrase: percentage of listeners rating “strong vocal ring”, “no vocal ring” and “not sure” for traditional Chinese opera singers. Percent ratings are based on the judgments of 13 listeners.

Rating              C3    C4    C5    C6    C8    C9    C11   C14   C15   C16
Strong vocal ring   8%    23%   23%   15%   69%   46%   77%   77%   77%   46%
Not sure            92%   77%   77%   85%   23%   54%   15%   23%   23%   54%
No vocal ring       0%    0%    0%    0%    8%    0%    8%    0%    0%    0%

Five of the ten Western classically trained singers (W1, W3, W5, W7, and W9) were perceived to have a vocal ring in the regular singing phrase (Table 3), whereas judgments of “not sure” were obtained for two singers (W2 and W6). Approximately half of the listeners heard the vocal ring and half were “not sure” for singers W4, W8, and W10. There was only 1 singer (W6) for whom any listener indicated no vocal ring. For the phrase sung with the vowel /a/ (Table 4), results from the perceptual rating showed that seven singers (W1, W2, W3, W5, W7, W9, and W10) were perceived to have a strong vocal ring. Five of these singers (W1, W3, W5, W7, and W9) were also perceived to produce the vocal ring during the sung passage. Listeners were unsure of the ringing quality for the remaining three singers (W4, W6, and W8) and commented that the vocal ring of these singers was sometimes perceived and sometimes not. Results from the perceptual rating for the phrase sung with the vowel /a/ showed that only one listener rated one sample (sung by W2) as having no vocal ring.

Second perceptual ratings

Five listeners who had participated in the perceptual rating procedure were recruited again four months after the first session and asked to rate the same samples (the regular singing phrase and the phrase sung with the vowel /a/) from both the traditional Chinese opera singers and the Western classically trained singers. The procedures of the first session were repeated, and the reliability of each listener’s judgments was calculated. Additionally, for each singer, the percentage of listeners who gave the same rating in both sessions was determined and used as an index of the robustness of that singer’s vocal ring.
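The two reliability indices described above can be made concrete with a short sketch. The dissertation does not publish per-listener responses, so the ratings below are hypothetical; only the listener codes (EF, BH, JM, TW, D3) come from Table 5. `listener_agreement` computes the per-listener index of Table 5 (share of samples a listener rated identically in both sessions), and `singer_agreement` computes the per-singer index of Tables 6 and 7 (share of listeners who rated a given singer identically).

```python
STRONG, UNSURE, NO_RING = "strong vocal ring", "not sure", "no vocal ring"


def listener_agreement(session1, session2):
    """Percentage of samples one listener rated identically in both sessions
    (the per-listener index reported in Table 5)."""
    if not session1 or len(session1) != len(session2):
        raise ValueError("sessions must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(session1, session2))
    return 100.0 * same / len(session1)


def singer_agreement(first, second, singer):
    """Percentage of listeners who rated `singer` identically in both sessions
    (the per-singer index reported in Tables 6 and 7).

    `first` and `second` map a listener code to {singer: rating}.
    """
    listeners = sorted(first)
    same = sum(first[code][singer] == second[code][singer] for code in listeners)
    return 100.0 * same / len(listeners)


# Hypothetical ratings of ten samples by one listener in the two sessions.
s1 = [STRONG, UNSURE, UNSURE, STRONG, NO_RING, UNSURE, STRONG, UNSURE, STRONG, UNSURE]
s2 = [STRONG, STRONG, UNSURE, STRONG, UNSURE, UNSURE, STRONG, STRONG, STRONG, STRONG]
print(listener_agreement(s1, s2))  # 60.0

# Hypothetical ratings of one singer by the five retested listeners.
codes = ["EF", "BH", "JM", "TW", "D3"]
first = {code: {"C8": STRONG} for code in codes}
second = {code: {"C8": STRONG} for code in codes}
second["BH"]["C8"] = UNSURE
print(singer_agreement(first, second, "C8"))  # 80.0
```

Because each listener rated ten samples per condition, the Table 5 values move in steps of 10%, and with five retested listeners the Table 6 and 7 values move in steps of 20%.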

Table 2: Results of perceptual rating for the phrase sung with the vowel /a/: percentage of listeners rating “strong vocal ring,” “not sure,” and “no vocal ring” for traditional Chinese opera singers. Percent ratings are based on the judgments of 13 listeners.

                      C3    C4    C5    C6    C8    C9    C11   C14   C15   C16
Strong vocal ring     15%   31%   0%    23%   69%   38%   54%   23%   85%   8%
Not sure              77%   62%   62%   69%   23%   54%   38%   69%   15%   69%
No vocal ring         8%    8%    38%   8%    8%    8%    8%    8%    0%    23%

Table 3: Results of perceptual rating for the regular singing phrase sung by Western classically trained singers: percentage of listeners rating “strong vocal ring,” “not sure,” and “no vocal ring,” based on the judgments of 13 listeners.

                      W1    W2    W3    W4    W5    W6    W7    W8    W9    W10
Strong vocal ring     92%   23%   92%   62%   100%  23%   100%  46%   92%   62%
Not sure              8%    77%   8%    38%   0%    69%   0%    54%   8%    38%
No vocal ring         0%    0%    0%    0%    0%    8%    0%    0%    0%    0%

Table 4: Results of perceptual rating for the phrase sung with the vowel /a/ by Western classically trained singers: percentage of listeners rating “strong vocal ring,” “not sure,” and “no vocal ring.” Percent ratings are based on the judgments of 13 listeners.

                      W1    W2    W3    W4    W5    W6    W7    W8    W9    W10
Strong vocal ring     77%   69%   77%   31%   77%   31%   100%  31%   100%  69%
Not sure              23%   23%   23%   69%   23%   69%   0%    69%   0%    31%
No vocal ring         0%    8%    0%    0%    0%    0%    0%    0%    0%    0%

For the traditional Chinese opera singers, the reliability of listeners’ judgments of the regular singing phrase ranged from 30% to 90%, with an average of 50% (Table 5). For the phrase sung with the vowel /a/, mean reliability was 46%, ranging from 20% to 80% across judges. Reliability was higher for the Western singers: judgments of the regular singing phrase ranged from 50% to 100%, with a mean of 72% (Table 5), and judgments of the phrase sung with the vowel /a/ ranged from 60% to 90%, also with a mean of 72%.
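As a check on these summary statistics, the per-listener values in Table 5 can be re-averaged. This sketch transcribes Table 5 and recomputes each mean and range. Three of the four reported means (46%, 72% and 72%) are reproduced, but the Chinese regular-phrase values (60, 30, 90, 60 and 50) average 58% rather than the 50% reported, so either that mean or one of the cells appears to have been mistranscribed; the sketch cannot tell which.

```python
from statistics import mean

# Per-listener test-retest agreement (%) transcribed from Table 5.
TABLE5 = {
    ("Chinese", "regular singing phrase"): {"EF": 60, "BH": 30, "JM": 90, "TW": 60, "D3": 50},
    ("Chinese", "phrase sung with /a/"): {"EF": 80, "BH": 30, "JM": 50, "TW": 50, "D3": 20},
    ("Western", "regular singing phrase"): {"EF": 100, "BH": 50, "JM": 80, "TW": 70, "D3": 60},
    ("Western", "phrase sung with /a/"): {"EF": 90, "BH": 70, "JM": 70, "TW": 70, "D3": 60},
}
# Means as printed in the Mean column of Table 5.
REPORTED_MEANS = {
    ("Chinese", "regular singing phrase"): 50,
    ("Chinese", "phrase sung with /a/"): 46,
    ("Western", "regular singing phrase"): 72,
    ("Western", "phrase sung with /a/"): 72,
}


def summarize(scores):
    """Mean, minimum and maximum of one row of per-listener percentages."""
    values = list(scores.values())
    return mean(values), min(values), max(values)


for key, scores in TABLE5.items():
    avg, low, high = summarize(scores)
    reported = REPORTED_MEANS[key]
    note = "" if avg == reported else f" (Table 5 reports {reported}%)"
    print(f"{key[0]}, {key[1]}: mean {avg:g}%, range {low}-{high}%{note}")
```

The ranges (30% to 90%, 20% to 80%, 50% to 100% and 60% to 90%) all agree with the text; only the one mean differs.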

On average, 60% of the listeners provided the same rating in both listening sessions for the regular phrase sung by the Chinese singers, with a range of 40% to 80% (Table 6). Interestingly, reliability was higher for the singers judged as having a strong vocal ring (C8, C11, C14 and C15), with a mean of 70%, than for the singers rated “not sure” (C3, C4, C5 and C6), for whom it averaged 50%. Similar results were found for the phrase sung with the vowel /a/: reliability was higher (mean 70%, range 60% to 80%) for the singers judged as having a strong vocal ring (C8 and C15) than for those judged as “not sure” (C3, C6, C14 and C16), for whom it averaged 40% (range 20% to 60%) (Table 6). Overall, reliability was lower for the phrase sung with the vowel /a/ than for the regular singing phrase, with a mean of 50% (range 20% to 80%) for the Chinese singers (Table 6). Judgments of singers C14 and C15 were the most reliable across listeners for the regular singing phrase (80% each), and judgments of singer C8 were the most reliable for the phrase sung with the vowel /a/ (80%).

Table 5: Percentage of samples that received identical ratings across the two listening sessions, for each listener. Top panel: Chinese singers; bottom panel: Western singers.

Chinese singers           EF    BH    JM    TW    D3    Mean
Regular singing phrase    60    30    90    60    50    50
Phrase sung with /a/      80    30    50    50    20    46

Western singers           EF    BH    JM    TW    D3    Mean
Regular singing phrase    100   50    80    70    60    72
Phrase sung with /a/      90    70    70    70    60    72

Table 6: Percentage of listeners who gave identical ratings across the two listening sessions for Chinese singers whose samples were classified as producing a “strong vocal ring” or as “not sure” (70% agreement across listeners).

Regular singing phrase    Strong vocal ring           Not sure
Singer                    C8    C11   C14   C15       C3    C4    C5    C6
Judgment %                60    60    80    80        40    60    60    40

Phrase sung with /a/      Strong vocal ring   Not sure
Singer                    C8    C15           C3    C6    C14   C16
Judgment %                80    60            40    40    60    20
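The 70% agreement criterion used to sort singers into “strong vocal ring” and “not sure” can be written in a few lines. With 13 listeners, 70% is not a whole number of listeners, and the tables consistently treat 9 of 13 (69%) as meeting the criterion (for example, C8 in Table 1 and W2 and W10 in Table 4); this sketch therefore assumes a minimum of 9 agreeing listeners. The listener counts in the example are back-calculated from the percentages in Tables 1 and 2, not taken from raw data.

```python
N_LISTENERS = 13  # listeners in the first rating session (Tables 1-4)
MIN_AGREEING = 9  # assumption: 9 of 13 (69%) is treated as meeting the 70% criterion
CATEGORIES = ("strong vocal ring", "not sure", "no vocal ring")


def to_percentages(counts, n=N_LISTENERS):
    """Convert listener counts per category into whole-number percentages."""
    return [round(100 * counts.get(c, 0) / n) for c in CATEGORIES]


def classify(counts, min_agreeing=MIN_AGREEING):
    """Return the category chosen by at least `min_agreeing` listeners, or None
    when no category reaches the threshold (an unclear judgment)."""
    for category in CATEGORIES:
        if counts.get(category, 0) >= min_agreeing:
            return category
    return None


# Counts back-calculated from Tables 1 and 2 (13 listeners).
c8_regular = {"strong vocal ring": 9, "not sure": 3, "no vocal ring": 1}
c9_regular = {"strong vocal ring": 6, "not sure": 7}
c3_vowel_a = {"strong vocal ring": 2, "not sure": 10, "no vocal ring": 1}

for name, counts in [("C8", c8_regular), ("C9", c9_regular), ("C3 /a/", c3_vowel_a)]:
    print(name, to_percentages(counts), classify(counts))
```

With a threshold of 9 listeners, at most one category can qualify, since two categories would need 18 of the 13 votes.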

The ratings of the Western group showed results similar to those of the Chinese group for the regular singing phrase: listener reliability was higher for singers rated as having a strong vocal ring than for singers rated as “not sure.” Reliability averaged 84% (range 60% to 100%) for singers with a strong vocal ring (Table 7), compared with an average of 50% (range 40% to 60%) for singers rated “not sure.” Unlike in the Chinese group, listener reliability for the Western singers did not vary with the perceived strength of the vocal ring when the phrase was sung with the vowel /a/: mean reliability was 71% for singers with a strong vocal ring and 73% for singers about whom listeners were “not sure.”

For the Western singers singing the phrase with the vowel /a/, listener reliability ranged from 40% to 100% in both vocal ring categories (Table 7). Reliability was similar for the Western singers’ regular singing phrase and for the phrase sung with the vowel /a/: across the 5 listeners, it averaged 74% (range 40% to 100%) for the regular singing phrase and 72% (range 40% to 100%) for the phrase sung with the vowel /a/. These values were higher than those for the Chinese group. Listener ratings were most reliable for singers W7 and W9 on the regular singing phrase and for singers W3, W8 and W9 on the phrase sung with the vowel /a/; for each of these singers, judges were 100% reliable across listening sessions (Table 7).

Table 7: Percentage of listeners who gave identical ratings across the two listening sessions for Western singers whose samples were classified as producing a “strong vocal ring” or as “not sure” (70% agreement across listeners).

Regular singing phrase    Strong vocal ring                 Not sure
Singer                    W1    W3    W5    W7    W9        W2    W6
Judgment %                80    60    80    100   100       40    60

Phrase sung with /a/      Strong vocal ring                             Not sure
Singer                    W1    W2    W3    W5    W7    W9    W10       W4    W6    W8
Judgment %                60    40    100   60    60    100   80        40    80    100

Discussion

Previous research suggests that vocal tract configuration affects the Fs in the Western classically trained singing technique (Bartholomew, 1934; Sundberg, 1970; Sundberg, 2001). The limited number of studies of other singing techniques does not make clear whether the Fs is also produced in non-Western classical singing. This experiment provides general information on listeners’ perceptions of the Fs in two singing techniques, traditional Chinese opera singing and Western classical singing. Contrary to what previous studies (Bartholomew, 1934; Sundberg, 1970; Sundberg, 2001) might suggest, listeners were able to hear the Fs in both the Western classically trained singing style and the traditional Chinese opera singing style. This finding is consistent with Wang (1985), who identified the Fs in three singing styles (Western classical singing, early music singing and traditional Chinese opera singing) using both perceptual judgments and acoustic measurements.

In the present study, more Western than Chinese singers were perceived as having a strong vocal ring, for both the regular singing phrase and the phrase sung with the vowel /a/. Although many samples in the Chinese group were rated “not sure,” listeners reported that they still perceived the vocal ring in most of these samples, though not throughout the whole phrase. Listeners indicated that their uncertainty might have been affected by their unfamiliarity with the Chinese language and the singing style. Even so, listeners’ comments indicated that the vocal ring was still identified on certain common vowels within the musical phrase, such as /a/, /i/, /u/ and /e/, and especially on /a/ and /i/. This is consistent with previous studies (Sundberg, 1970; Seidner, 1985; Bloothooft & Plomp, 1984, 1985, 1986) in which vowel quality, especially for the vowels /a/, /i/ and /u/, was found to influence the Fs.

Listeners also indicated that, for both Chinese and Western singers, the Fs could be heard more easily in sustained notes on single vowels than in running notes with complex contexts. This resembles speech production, in which speech quality is also affected by signal duration. For example, Hillenbrand et al.’s (1995) investigation of speech intelligibility revealed that vowel duration can be an important cue for identifying some vowels. Ferguson and Kewley-Port (2002, 2008) and Picheny et al. (1986) studied the acoustic differences between clear and conversational speech and found that one reason for the superior intelligibility of clear speech was its longer steady-state durations. Based on these speech production and intelligibility results, we suggest that when singing complex texts, singers must change their vocal tract configuration quickly to incorporate all the relevant articulatory gestures. This rapid change might reduce a singer’s ability to achieve the vocal tract configuration needed for the Fs, so less vocal ring was heard. When vowels were sustained, singers had more time to achieve that configuration, and the Fs was therefore heard more often in the sustained vowels than in the complex texts.

The comparison of the first and second perceptual ratings showed that listeners’ judgments were more reliable for the Western group than for the Chinese group, for both the regular singing phrase and the phrase sung with the vowel /a/. This further indicates that listeners’ perceptions were likely affected by their familiarity with the languages and techniques. Recall that the listeners were classically trained, professional Western singers who were familiar with that musical style and its texts; they may therefore have been less distracted by the Western singing style than by that of Chinese opera. As Lundin (1967) suggests, musical preferences may be culturally conditioned. For example, listeners in the present study indicated that they heard harshness in the Chinese singers’ voices and that this influenced their perceptions of the samples. The harshness may reflect a difference in technique: Chinese singers are taught to sing with a bright voice, whereas Western singers are taught to sing with a dark voice, a method that balances the high and low formants. Because all listeners in this experiment were trained Western singers accustomed to the dark timbre, the bright voice might have sounded harsh to them.

Moreover, Chinese music is based on a pentatonic scale, which might sound dissonant to Western listeners, whereas Western music is based on the diatonic scale and sounds consonant to them. It has been hypothesized that predictable musical sequences are preferred and perceived as having greater tonality (Roederer, 1972). Familiarity with Western music may therefore have led the listeners to hear the vocal ring in Western classical opera, whereas unfamiliarity with Chinese music may have produced “musical tension” (Roederer, 1972, p. 148). Previous research on the Fs led the current investigator to assume that if a singing voice contained a vocal ring, all listeners familiar with the concept would be able to perceive it regardless of language or technique. This assumption was contradicted by listeners’ comments indicating that their perceptions were still affected by differences in technique, language, and musical style. These results suggest the need to evaluate the vocal ring in musicians from other musical traditions, including those trained in Chinese opera.

Interestingly, in both groups listeners made more reliable judgments for singers heard as having a strong vocal ring than for singers judged as “not sure.” This suggests that when the vocal ring is strong, listeners can judge its presence consistently. Listeners’ judgments and reliability may also reflect the singers’ comfort with the task and their training methods. Classically trained Western singers are taught to substitute a single vowel for the original text during practice in order to become familiar with the music and singing technique before adding the text. The Western singers were therefore comfortable substituting a vowel for the text of a musical phrase. This may have helped them produce the vocal ring when singing the phrase with the vowel /a/, and in this task listeners were better able to hear the ring in the Western singers. In contrast, traditional Chinese opera singers are not trained to replace the whole text with one particular vowel. When asked to sing a single vowel throughout the musical phrase instead of the regular text, they felt uncomfortable and were not able to project their full voice, even though they were allowed to practice as many times as they wished before the recording. This difference in comfort may explain why more of the Chinese opera singers’ samples were perceived as having a strong vocal ring for the regular singing phrase than for the phrase sung with the vowel /a/.

The few previous studies that investigated the perception of the Fs by multiple listeners presented only the listeners’ average ratings (Wang, 1985; Omori et al., 1996), and none of the previous studies of the Fs investigated individual differences among listeners. In the current investigation, we studied each listener’s perception and reliability and found several factors that may influence listeners’ perceptions: listeners’ skills, singers’ abilities, language differences, and familiarity with the musical style and technique. Future study of these factors is needed to better understand listeners’ ability to perceive the Fs.

Experiment 2: Categorical analysis

Long-term average spectrum (LTAS)

In general, the Fs has been identified in Western classical singing by the presence of a peak in the region of 2300 to 3500 Hz. The categorical analysis of the Fs for the Western classically trained singers in this experiment was therefore based on the presence of a peak in the LTAS around 2300 Hz to 3500 Hz. For the traditional Chinese opera singers, the Fs was identified by the presence of a peak around 2300 Hz to 3700 Hz, because the Chinese singers had a higher overall F0 range than the Western singers. In addition, a peak was required to have a bandwidth of 1000 Hz or less. Peaks were identified by consensus between the current researcher and two other researchers experienced in acoustics. All samples were judged, and the reliability analysis showed 80% agreement across all judges. When the investigators disagreed, the presence or absence of a peak was decided by the majority of judges.
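The categorical analysis rests on locating a peak in the LTAS within a group-specific frequency region. The dissertation does not state its analysis settings (frame length, window, overlap or software), so the following is only a standard-library sketch under assumed settings: non-overlapping periodic Hann frames, a direct DFT (much slower than an FFT but dependency-free), power averaged across frames, and levels in dB relative to the strongest bin. The regions come from the text; the synthetic test signal is hypothetical.

```python
import cmath
import math


def ltas_db(samples, fs, n_fft=128):
    """Long-term average spectrum of `samples` (sampled at `fs` Hz).

    Assumed settings: non-overlapping periodic-Hann frames of `n_fft` samples,
    power averaged over frames, levels in dB re the strongest bin.
    Returns (frequencies in Hz, levels in dB) for bins 0..n_fft/2.
    """
    n_frames = len(samples) // n_fft
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    n_bins = n_fft // 2 + 1
    power = [0.0] * n_bins
    for f in range(n_frames):
        frame = [s * w for s, w in zip(samples[f * n_fft:(f + 1) * n_fft], window)]
        for k in range(n_bins):
            # Direct DFT of bin k; an FFT would be used for real recordings.
            x_k = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n, x in enumerate(frame))
            power[k] += abs(x_k) ** 2 / n_frames
    ref = max(power)
    if ref == 0:
        raise ValueError("signal is silent")
    levels = [10 * math.log10(p / ref) if p > 0 else -math.inf for p in power]
    freqs = [k * fs / n_fft for k in range(n_bins)]
    return freqs, levels


def peak_in_region(freqs, levels, region):
    """Frequency and level of the strongest LTAS bin inside `region` (Hz)."""
    low, high = region
    candidates = [(lvl, f) for f, lvl in zip(freqs, levels) if low <= f <= high]
    if not candidates:
        raise ValueError("no spectrum bins fall inside the region")
    level, freq = max(candidates)
    return freq, level


REGIONS = {"Western": (2300, 3500), "Chinese": (2300, 3700)}

# Hypothetical test signal: a strong 250 Hz tone plus a 3000 Hz tone 20 dB weaker.
fs = 16000
tone_mix = [math.sin(2 * math.pi * 250 * n / fs) + 0.1 * math.sin(2 * math.pi * 3000 * n / fs)
            for n in range(4000)]
freqs, levels = ltas_db(tone_mix, fs)
peak_hz, peak_db = peak_in_region(freqs, levels, REGIONS["Western"])
print(peak_hz, round(peak_db, 1))
```

Here the 3000 Hz component falls exactly on a DFT bin and is 20 dB below the 250 Hz component, so the search reports a peak at 3000 Hz at about -20 dB. With real recordings, the frame length sets the frequency resolution and therefore how finely center frequency and bandwidth can be read.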

Results of the categorical analysis showed that the spectra of four of the ten Chinese singers (C8, C11, C14 and C15) met the criteria for the Fs for the regular singing phrase (Table 8). Judges identified the Fs in five Chinese singers (C5, C6, C8, C14 and C15) for the musical phrase sung with the vowel /a/ (Table 9). For the Western singers, seven of ten (W1, W2, W3, W4, W5, W7 and W9) met the criteria for the Fs for the regular singing phrase (Table 10), and the Fs was identified in eight (all except W6 and W8) for the musical phrase sung with the vowel /a/ (Table 11).

The categorical acoustic measurements were compared with the perceptual ratings to determine how well the acoustic measures corresponded to the vocal ring. For the regular singing phrase, the perceptual ratings for the traditional Chinese opera singers matched the categorical results exactly: the same singers exhibited the Fs both acoustically and perceptually. When the /a/ vowel was sung, the Fs and the perceptual ratings corresponded for C8 and C15. For the sung /a/ phrase, however, the perceptual ratings did not match the categorical results for C5, C6 and C14: although these three samples met the acoustic criteria for the Fs, listeners did not perceive a strong vocal ring. For example, the LTAS for C5 singing /a/ exhibited a peak at 3165 Hz with a bandwidth of 797 Hz, but listeners did not perceive a strong vocal ring (Figure 1). Similarly, the categorical analysis of C14’s phrase sung with the vowel /a/ identified an Fs (peak at 3424 Hz, bandwidth 796 Hz), but listeners did not rate this sample as having a “strong vocal ring” (Figure 2). Several samples, for example those of singers C9 and C4 (Figures 3 and 4), exhibited a cluster of increased energy around the Fs frequency region; because the bandwidth exceeded 1000 Hz, however, the energy cluster did not meet the operational definition of the Fs, and listeners likewise did not identify a strong vocal ring in these samples. The two analyses, perceptual ratings and categorical analysis, consistently suggest that a peak in the 3 kHz region with a bandwidth in excess of 1000 Hz should not be considered an Fs.
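The bandwidth criterion is what separates a peak from a broad cluster of energy such as those of C9 and C4. The text does not say how bandwidth was measured, so this sketch assumes one common convention: the width between the points 3 dB below the peak on either side, with linear interpolation between spectrum bins. `is_singers_formant` then applies the region and 1000 Hz limits stated in the text; because the text says “around” these regions, treating them as hard limits is itself an assumption. The two triangular test spectra are hypothetical.

```python
def peak_bandwidth(freqs, levels_db, peak_index, drop_db=3.0):
    """Width in Hz between the points `drop_db` below the peak on each side.

    Assumption: -3 dB bandwidth with linear interpolation between bins; the
    dissertation does not specify its bandwidth measure. Returns None if the
    spectrum never falls far enough on one side.
    """
    target = levels_db[peak_index] - drop_db

    def edge(step):
        i = peak_index
        while 0 <= i + step < len(levels_db):
            nxt = i + step
            if levels_db[nxt] <= target:
                # Interpolate between bin i (above target) and bin nxt (at/below it).
                frac = (levels_db[i] - target) / (levels_db[i] - levels_db[nxt])
                return freqs[i] + frac * (freqs[nxt] - freqs[i])
            i = nxt
        return None

    low, high = edge(-1), edge(+1)
    if low is None or high is None:
        return None
    return high - low


def is_singers_formant(center_hz, bandwidth_hz, region, max_bandwidth_hz=1000.0):
    """Categorical rule from the text: a peak inside the group's region whose
    bandwidth is at most 1000 Hz."""
    low_hz, high_hz = region
    return (bandwidth_hz is not None
            and low_hz <= center_hz <= high_hz
            and bandwidth_hz <= max_bandwidth_hz)


# Hypothetical spectra sampled every 100 Hz from 2000 to 4000 Hz, peaking at 3000 Hz.
freqs = list(range(2000, 4001, 100))
narrow = [-1.0 * abs(k - 10) for k in range(len(freqs))]  # falls 1 dB per bin
broad = [-0.5 * abs(k - 10) for k in range(len(freqs))]   # falls 0.5 dB per bin

print(peak_bandwidth(freqs, narrow, 10))  # 600.0
print(peak_bandwidth(freqs, broad, 10))   # 1200.0
```

Applied to the tabulated values, the rule accepts C15’s regular-phrase peak (3435 Hz, 431 Hz wide) and rejects C9’s (3531 Hz, 1508 Hz wide), in line with Table 8.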

For the Western classically trained singers, the categorical measurements were also compared with the perceptual ratings to determine how well the acoustic measures corresponded to the vocal ring. Applying the categorical criteria showed that 7 of 10 Western singers (W1, W2, W3, W4, W5, W7 and W9) had the Fs for the regular singing phrase. The categorical analysis and the perceptual rating did not correspond for W2 and W4 on the regular singing phrase (Table 10): although the categorical criteria for the Fs were met, listeners did not perceive a “strong vocal ring” in these two singers.

For the phrase sung with the vowel /a/, the LTAS showed that 8 singers (all except W6 and W8) exhibited the Fs (Table 11). When these results were compared with the perceptual ratings, the categorical measurements and the perceptual rating corresponded for 7 of these singers (W1, W2, W3, W5, W7, W9 and W10) but not for singer W4.

Table 8: Results of the categorical analysis for the regular singing phrase sung by traditional Chinese opera singers. The spectra of 4 singers (C8, C11, C14 and C15) matched the acoustic criteria of the Fs in the categorical analysis (Fs row). The perceptual rating showed that 70% of listeners heard a “strong vocal ring” in these four singers. For the other samples, any high-frequency peak is reported but did not meet the criteria; “No peak” indicates that no peak was found.

                            C3       C4       C5       C6       C8     C9     C11    C14    C15    C16
Center frequency (Hz)       No peak  No peak  No peak  No peak  3639   3531   3489   3251   3435   No peak
Bandwidth (Hz)              N/A      N/A      N/A      N/A      753    1508   517    991    431    N/A
Fs (categorical)            No       No       No       No       Yes    No     Yes    Yes    Yes    No
Strong vocal ring (70%)     No       No       No       No       Yes    No     Yes    Yes    Yes    No

Table 9: Results of the categorical analysis for the phrase sung with the vowel /a/ by traditional Chinese opera singers. The spectra of 5 singers (C5, C6, C8, C14 and C15) matched the criteria of the Fs in the categorical analysis (Fs row). Results of the perceptual rating showed that 70% of listeners heard a “strong vocal ring” in singers C8 and C15, but not in singers C5, C6 and C14. For the other samples, any high-frequency peak is reported but did not meet the criteria; “No peak” indicates that no peak was found.

                            C3       C4     C5     C6     C8     C9     C11    C14    C15    C16
Center frequency (Hz)       No peak  3165   3165   2871   3445   3505   3359   3424   3596   No peak
Bandwidth (Hz)              N/A      1335   797    754    689    1027   1400   796    151    N/A
Fs (categorical)            No       No     Yes    Yes    Yes    No     No     Yes    Yes    No
Strong vocal ring (70%)     No       No     No     No     Yes    No     No     No     Yes    No

Table 10: Results of the categorical analysis for the regular singing phrase sung by Western classically trained singers. The spectra of 7 singers (W1, W2, W3, W4, W5, W7 and W9) matched the criteria of the Fs in the categorical analysis (Fs row). Results of the perceptual rating showed that 70% of listeners heard a “strong vocal ring” in singers W1, W3, W5, W7 and W9. For the other samples, any high-frequency peak is reported but did not meet the criteria; “No peak” indicates that no peak was found.

                            W1     W2     W3     W4     W5     W6     W7     W8     W9     W10
Center frequency (Hz)       2907   2842   3317   3058   2412   3300   2778   2700   2498   No peak
Bandwidth (Hz)              668    818    366    754    431    1080   236    1100   344    N/A
Fs (categorical)            Yes    Yes    Yes    Yes    Yes    No     Yes    No     Yes    No
Strong vocal ring (70%)     Yes    No     Yes    No     Yes    No     Yes    No     Yes    No

Table 11: Results of the categorical analysis for the phrase sung with the vowel /a/ by Western classically trained singers. The spectra of 8 singers (all except W6 and W8) matched the criteria of the Fs (Fs row). Results of the perceptual rating showed that 70% of listeners heard a “strong vocal ring” in all singers except W4, W6 and W8. For W6 and W8, the high-frequency peak is reported but did not meet the criteria.

                            W1     W2     W3     W4     W5     W6     W7     W8     W9     W10
Center frequency (Hz)       2929   2713   3338   3036   2369   3445   2778   2885   2627   2821
Bandwidth (Hz)              344    193    259    732    280    1258   237    1020   345    883
Fs (categorical)            Yes    Yes    Yes    Yes    Yes    No     Yes    No     Yes    Yes
Strong vocal ring (70%)     Yes    Yes    Yes    No     Yes    No     Yes    No     Yes    Yes

Figure 1: The LTAS of the phrase sung with the vowel /a/ by traditional Chinese opera singer C5, showing a clear peak (or cluster of peaks) in a specific frequency region with a bandwidth of less than 1000 Hz. However, most listeners (62%) were “not sure” whether this singer produced a strong vocal ring, and none rated the sample as having one (Table 2).

[Figure 1 plot: amplitude (dB), -100 to -10, versus frequency (kHz), 0 to 10]
Figure 2: The LTAS of the phrase sung with the vowel /a/ by traditional Chinese opera singer C14, showing a clear peak (or cluster of peaks) in a specific frequency region with a bandwidth of less than 1000 Hz. However, most listeners (69%) were “not sure” whether this singer produced a strong vocal ring (Table 2).

[Figure 2 plot: amplitude (dB), -100 to -10, versus frequency (kHz), 0 to 10]
Figure 3: The LTAS of the regular singing phrase sung by traditional Chinese opera singer C9, showing increased energy clustered around a specific frequency region with a bandwidth in excess of 1000 Hz; this cluster was therefore not considered a peak.

[Figure 3 plot: amplitude (dB), -100 to -10, versus frequency (kHz), 0 to 10]
Figure 4: The LTAS of the phrase sung with the vowel /a/ by traditional Chinese opera singer C4, showing increased energy clustered around a specific frequency region with a bandwidth in excess of 1000 Hz; this cluster was therefore not considered a peak.

[Figure 4 plot: amplitude (dB), -100 to -10, versus frequency (kHz), 0 to 10]
A comparison of the categorical Fs analysis for the Chinese and Western groups revealed that, for the regular singing phrase, the mean bandwidth of the Fs was greater in the Chinese group (673 Hz) than in the Western group (409 Hz), and the mean center frequency of the Fs was higher in the Chinese group (3442 Hz) than in the Western group (2782 Hz). Similar results were found for the phrase sung with the vowel /a/: the Chinese group exhibited a greater bandwidth (420 Hz) and a higher center frequency (3520 Hz) of the Fs than the Western group (center frequency 2796 Hz, bandwidth 363 Hz).
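The group means above can be reproduced from Tables 8 to 11 if they are taken over the singers whose Fs was confirmed both categorically and perceptually, rather than over every singer who met the acoustic criteria. That reading is an inference from the arithmetic, not something the text states. Under it, this sketch reproduces the reported bandwidths and three of the four center frequencies; the Chinese regular-phrase center frequency comes out at 3453.5 Hz rather than the 3442 Hz reported. Region limits are taken from the text and treated as hard bounds (an assumption).

```python
from statistics import mean

REGIONS = {"Chinese": (2300, 3700), "Western": (2300, 3500)}

# Peaks transcribed from Tables 8-11: singer -> (center Hz, bandwidth Hz,
# heard as a strong vocal ring by 70% of listeners). "No peak" singers are omitted.
PEAKS = {
    ("Chinese", "regular"): {"C8": (3639, 753, True), "C9": (3531, 1508, False),
                             "C11": (3489, 517, True), "C14": (3251, 991, True),
                             "C15": (3435, 431, True)},
    ("Chinese", "/a/"): {"C4": (3165, 1335, False), "C5": (3165, 797, False),
                         "C6": (2871, 754, False), "C8": (3445, 689, True),
                         "C9": (3505, 1027, False), "C11": (3359, 1400, False),
                         "C14": (3424, 796, False), "C15": (3596, 151, True)},
    ("Western", "regular"): {"W1": (2907, 668, True), "W2": (2842, 818, False),
                             "W3": (3317, 366, True), "W4": (3058, 754, False),
                             "W5": (2412, 431, True), "W6": (3300, 1080, False),
                             "W7": (2778, 236, True), "W8": (2700, 1100, False),
                             "W9": (2498, 344, True)},
    ("Western", "/a/"): {"W1": (2929, 344, True), "W2": (2713, 193, True),
                         "W3": (3338, 259, True), "W4": (3036, 732, False),
                         "W5": (2369, 280, True), "W6": (3445, 1258, False),
                         "W7": (2778, 237, True), "W8": (2885, 1020, False),
                         "W9": (2627, 345, True), "W10": (2821, 883, True)},
}


def confirmed_means(group, phrase, max_bandwidth_hz=1000):
    """Mean Fs center frequency and bandwidth over singers whose peak met the
    categorical criteria AND who were heard as having a strong vocal ring.
    Assumption: this is the subset the reported means were computed over."""
    low, high = REGIONS[group]
    rows = [(fc, bw) for fc, bw, heard in PEAKS[(group, phrase)].values()
            if heard and low <= fc <= high and bw <= max_bandwidth_hz]
    return mean(fc for fc, _ in rows), mean(bw for _, bw in rows)


for key in PEAKS:
    fc, bw = confirmed_means(*key)
    print(key, f"center {fc:.1f} Hz, bandwidth {bw:.1f} Hz")
```

For the Chinese regular phrase the subset (C8, C11, C14, C15) is the same under either reading, so only the center-frequency value there remains unexplained.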

Categorical analyses of the Fs were then conducted for the phrases sung with the vowels /i/ and /u/ and for the spoken phrase, for both the Chinese and Western groups. Recall that no perceptual judgments were obtained for the phrases sung with /i/ or /u/ or for the spoken phrase in either group. The categorical analysis showed that the LTAS of 5 Chinese singers (C3, C4, C8, C11 and C15) exhibited the Fs for the phrase sung with the vowel /i/ (Table 12), and that 8 singers (all but C3 and C8) exhibited the Fs for the phrase sung with the vowel /u/ (Table 13). None of the spoken samples from the Chinese group showed the Fs, although the spoken phrases of singers C3, C4, C5, C6, C9, C14 and C15 showed increased energy in the higher frequency region; this energy did not meet the operational definition of the Fs. For the Western group, the categorical analysis of the LTAS showed that 5 singers (W3, W4, W7, W9 and W10) met the criteria for the Fs for the phrase sung with the vowel /i/ (Table 14), and 8 singers (all except W1 and W8) exhibited the Fs for the phrase sung with the vowel /u/ (Table 15).

Table 12: Results of the categorical analysis for the phrase sung with the vowel /i/ by traditional Chinese opera singers. The spectra of 5 singers (C3, C4, C8, C11 and C15) matched the criteria of the Fs (bandwidth of 1000 Hz or less; Fs row). For the other samples, any high-frequency peak is reported but did not meet the criteria; “No peak” indicates that no peak was found.

Phrase sung with /i/        C3     C4     C5       C6       C8     C9       C11    C14    C15    C16
Center frequency (Hz)       2670   2509   No peak  No peak  2412   No peak  2670   3300   2799   No peak
Bandwidth (Hz)              700    431    N/A      N/A      409    N/A      301    2032   193    N/A
Fs (categorical)            Yes    Yes    No       No       Yes    No       Yes    No     Yes    No

Table 13: Results of the categorical analysis for the phrase sung with the vowel /u/ by traditional Chinese opera singers. The spectra of 8 singers (all but C3 and C8) matched the criteria of the Fs (bandwidth of 1000 Hz or less; Fs row). “No peak” indicates that no high-frequency peak was found.

Phrase sung with /u/        C3       C4     C5     C6     C8       C9     C11    C14    C15    C16
Center frequency (Hz)       No peak  2993   3284   3187   No peak  2950   3521   3618   3553   3069
Bandwidth (Hz)              N/A      129    517    431    N/A      86     559    302    302    150
Fs (categorical)            No       Yes    Yes    Yes    No       Yes    Yes    Yes    Yes    Yes

None of the spoken samples in the Western group had a high-frequency peak that matched the criteria of the Fs. As in the Chinese group, however, almost all of the Western singers’ spoken samples showed a strong energy distribution of partials extending into the high frequency region. The energy in the Western singers’ speech was concentrated between 2 and 4 kHz, whereas the Chinese singers had more diffuse energy in the higher frequencies (Figure 5). This energy concentration in the Western singers’ speech may be the “speaker’s formant” (Oliveira-Barrichelo et al., 2001).

The Fs bandwidths and center frequencies for the phrases sung with the vowels /i/ and /u/ were also compared across the Chinese and Western groups. For the phrase sung with the vowel /u/, the Fs bandwidth was smaller among the Chinese singers (mean 310 Hz) than among the Western singers (mean 415 Hz); for the phrase sung with the vowel /i/, there was essentially no difference between the Chinese (407 Hz) and Western singers (405 Hz). The mean center frequency for the phrase sung with /i/ was lower for the Chinese group (2612 Hz) than for the Western group (2920 Hz), whereas for the phrase sung with /u/ it was higher for the Chinese group (3272 Hz) than for the Western group (2789 Hz).

Recall that one of the tasks required the singers to glide up and down the musical scale on the vowel /a/. Data from this task were also analyzed with the LTAS, and the presence or absence of the Fs was determined categorically as described above. Of the Chinese singers found to have the Fs both perceptually and categorically for the regular singing phrase (C8, C11, C14 and C15), all except C11 also showed the Fs when gliding through the musical scale. The center frequencies and bandwidths for singers C8, C14 and C15 were comparable for the scale glides and the regular singing phrase. Similar results were obtained for the Western singers: all of the singers who exhibited the Fs in the regular singing phrase also showed the Fs when they glided through the musical scale.

Table 14: Results of the categorical analysis for the phrase sung with the vowel /i/ by Western classically trained singers. The spectra of 5 singers (W3, W4, W7, W9 and W10) matched the criteria of the Fs (Fs row). For the other samples, any high-frequency peak is reported but did not meet the criteria; “No peak” indicates that no peak was found.

Phrase sung with /i/        W1       W2     W3     W4     W5     W6       W7     W8     W9     W10
Center frequency (Hz)       No peak  2500   3295   3122   2390   No peak  2778   3381   2412   2993
Bandwidth (Hz)              N/A      1300   323    258    1500   N/A      345    1028   388    710
Fs (categorical)            No       No     Yes    Yes    No     No       Yes    No     Yes    Yes

Table 15: Results of the categorical analysis for the phrase sung with the vowel /u/ by Western classically trained singers. The spectra of 8 singers (all but W1 and W8) matched the criteria of the Fs (Fs row). For W8, the high-frequency peak is reported but did not meet the criteria; “No peak” indicates that no peak was found.

Phrase sung with /u/        W1       W2     W3     W4     W5     W6     W7     W8     W9     W10
Center frequency (Hz)       No peak  2692   3058   2756   2218   3402   2713   3122   2541   2929
Bandwidth (Hz)              N/A      194    689    582    258    194    194    1210   345    862
Fs (categorical)            No       Yes    Yes    Yes    Yes    Yes    Yes    No     Yes    Yes

Figure 5: The LTAS of the spoken phrase for a Chinese singer (C5, top panel) shows increasing energy in the higher frequency region, whereas that of a Western singer (W3, bottom panel) shows a “speaker’s formant” in the higher frequency region.


[Figure 5 plot: two panels, amplitude (dB), -100 to -10, versus frequency (kHz), 0 to 9; the speaker’s formant is labeled in the bottom panel]
Discussion

Results from the categorical measurements of the LTAS matched the results from the perceptual judgments and showed that the Fs was produced by both the traditional Chinese opera singers and the Western classically trained singers in this study. However, there were some exceptions. Some singers from both groups (C5, C6, C14, W2, and W4) exhibited a peak around 3000 Hz with a bandwidth of less than 1000 Hz, yet listeners did not perceive a vocal ring (Tables 9, 10, and 11). The bandwidths of the peaks for these singers were between 700 Hz and 800 Hz. Thus, the categorical results might have been more consistent with the perceptual data if a peak corresponding to the Fs had been defined by a 700 Hz bandwidth cutoff rather than the 1 kHz cutoff used in the present study.
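The categorical rule discussed above can be expressed as a short check. The sketch below is illustrative only: it assumes that a qualifying peak must lie between 2000 and 4000 Hz (the text specifies only "around 3000 Hz") and that its bandwidth must fall below a cutoff, and it uses the Western /i/ values reported earlier in this section. The names `VOWEL_I_WESTERN`, `meets_fs_criterion`, and `singers_with_fs` are hypothetical and are not the author's analysis code.

```python
# Hypothetical sketch of the categorical Fs criterion (not the author's code).
# Assumption: a qualifying peak must lie between 2000 and 4000 Hz.

# (center frequency in Hz, bandwidth in Hz); None means no peak was found.
VOWEL_I_WESTERN = {
    "W1": None, "W2": (2500, 1300), "W3": (3295, 323), "W4": (3122, 258),
    "W5": (2390, 1500), "W6": None, "W7": (2778, 345), "W8": (3381, 1028),
    "W9": (2412, 388), "W10": (2993, 710),
}


def meets_fs_criterion(peak, bandwidth_cutoff_hz=1000.0, band=(2000.0, 4000.0)):
    """Return True if a peak exists inside `band` and is narrower than the cutoff."""
    if peak is None:
        return False
    center_hz, bandwidth_hz = peak
    return band[0] <= center_hz <= band[1] and bandwidth_hz < bandwidth_cutoff_hz


def singers_with_fs(data, cutoff_hz):
    """List the singers whose peak satisfies the criterion at the given cutoff."""
    return [name for name, peak in data.items() if meets_fs_criterion(peak, cutoff_hz)]


print(singers_with_fs(VOWEL_I_WESTERN, 1000))  # ['W3', 'W4', 'W7', 'W9', 'W10']
print(singers_with_fs(VOWEL_I_WESTERN, 700))   # ['W3', 'W4', 'W7', 'W9']
```

With the 1 kHz cutoff, five singers qualify for /i/; lowering the cutoff to 700 Hz removes W10, whose peak bandwidth is 710 Hz.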

Some samples in the present study (W6 and W8) had more than one peak in the high frequency region, a pattern that Seidner et al. (1985), Rossing et al. (1987), and Sundberg (2001) also reported. These researchers noted that two peaks, rather than one, appeared in the high frequency region for some of their baritone, tenor, alto, and soprano singers. Sundberg attributed these two peaks to F3 and F4 rather than to the Fs because there was no clustering of formants. Results from the present perceptual judgments and categorical analyses agree with previous studies in that a single high frequency peak is needed to define the Fs.

Many studies (Sundberg, 1970; Bloothooft and Plomp, 1986; Sundberg, 2001; Cleveland et al., 2001; Oliveira-Barrichelo et al., 2001) defined the Fs by comparing the difference in energy level between spoken and sung phrases or vowels. Findings from these studies suggest that, unlike singing samples, spoken samples show no cluster of formant peaks in the Fs region. In the current study, none of the spoken samples from the Chinese or Western group met the operational definition of the Fs. However, many of the spoken samples showed strong energy in partials extending into the high frequency region. This high-amplitude energy at higher frequencies is not consistent with expectations, because energy is predicted to decrease in the higher frequency region for normal speech (Fant, 1960). This high energy may instead indicate a speaker’s formant, which previous research suggests is found in some trained singers (Oliveira-Barrichelo et al., 2001).

Some of the results from this experiment are inconsistent with previous studies (Seidner et al., 1985; Schutte and Miller, 1985; Sengupta, 1990). Previous studies showed that the bandwidth of the Fs for male singers singing in the Western classical style (bass, baritone, and tenor) ranged from 1000 Hz to 2000 Hz, whereas in the current study listeners did not perceive the ring in our singers when the bandwidth of the Fs exceeded 700 Hz. This inconsistency may be due to differences in how bandwidth was defined. Previous studies defined the bandwidth of the Fs by the frequencies whose intensities were 15 dB below the peak amplitude (–15 dB), whereas the current study used a –3 dB criterion. Also, previous studies measured the Fs using short-term spectra of single vowels, whereas the current study used the LTAS of the entire musical phrase. A further inconsistency between the current data and previous results concerns the bandwidths for tenors and baritones. Seidner et al. (1985) found that the tenor had a broader Fs bandwidth than the baritone and bass singers; however, only one singer was investigated in each vocal category. Our data suggest that the bandwidth of the Fs varies greatly across singers (Tables 10 and 11). In the current study, two vocal categories of Western classical singing, tenor and baritone (5 singers each), were investigated. Results showed that voice classification did not have a consistent impact on the Fs. For example, in the regular singing phrase, the bandwidths for tenors who met the categorical criteria of the Fs ranged from 236 Hz to 668 Hz, and those for baritones ranged from 431 Hz to 818 Hz. For the phrase sung with the vowel /a/, the bandwidths for tenors who met the categorical criteria ranged from 237 Hz to 345 Hz, whereas those for baritones ranged from 193 Hz to 883 Hz. This suggests that the bandwidth varies among Western singers regardless of their vocal classification. Moreover, the median Fs bandwidths for /i/ and /u/ were 345 Hz and 270 Hz, respectively, for the tenors, whereas the median bandwidths for these two vowels were 484 Hz and 420 Hz, respectively, for the baritones. This seems to contradict Seidner et al. (1985), who found greater Fs bandwidths for tenors than for baritones. Investigating only one singer from each voice classification may have led Seidner et al. to an erroneous conclusion about the relationship between voice type and Fs bandwidth. Clearly, future studies of the Fs should include multiple singers.
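Because the –3 dB and –15 dB criteria measure the same peak at different depths, they can yield very different bandwidths. The following minimal sketch, assuming a spectrum sampled as parallel lists of frequencies (Hz) and levels (dB) and using linear interpolation between neighboring points, measures a peak's width at an arbitrary depth below its maximum. The function name and the synthetic peak are hypothetical, chosen only to show the effect of the criterion; they are not data from this study.

```python
def peak_bandwidth_hz(freqs_hz, levels_db, drop_db, band=(2000.0, 4000.0)):
    """Width of the strongest peak inside `band`, measured `drop_db` below its maximum.

    Crossing frequencies on each side are found by linear interpolation
    between neighbouring spectrum points.
    """
    idx = [i for i, f in enumerate(freqs_hz) if band[0] <= f <= band[1]]
    peak = max(idx, key=lambda i: levels_db[i])
    threshold = levels_db[peak] - drop_db

    def crossing(step):
        # Walk away from the peak (step = +1 upward, -1 downward) while above threshold.
        i = peak
        while 0 <= i + step < len(freqs_hz) and levels_db[i + step] > threshold:
            i += step
        j = i + step
        if not 0 <= j < len(freqs_hz):
            return freqs_hz[i]  # spectrum ends before the level drops far enough
        # Point i is above the threshold and point j is at or below it: interpolate.
        frac = (levels_db[i] - threshold) / (levels_db[i] - levels_db[j])
        return freqs_hz[i] + frac * (freqs_hz[j] - freqs_hz[i])

    return crossing(+1) - crossing(-1)


# Synthetic peak at 3000 Hz whose level falls off quadratically (in dB).
freqs = [float(f) for f in range(0, 8001)]
levels = [-((f - 3000.0) / 200.0) ** 2 for f in freqs]
print(round(peak_bandwidth_hz(freqs, levels, 3.0)))   # 693
print(round(peak_bandwidth_hz(freqs, levels, 15.0)))  # 1549
```

For this synthetic peak, the –15 dB width is more than twice the –3 dB width, which illustrates why bandwidths reported under the two criteria are not directly comparable.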

Because the perceptual task was time-consuming, only the regular singing phrase and the phrase sung with the vowel /a/ were used to investigate listeners’ perception. The vowel /a/ was chosen instead of the other two vowels, /i/ and /u/, because it was the vowel most commonly investigated in the previous literature. The vowels /i/ and /u/ were analyzed categorically in this experiment, and the results showed that, in both the Chinese and Western groups, more singers exhibited the Fs for the phrase sung with the vowel /u/ than for /a/ or /i/. Because the vowel /u/ was not investigated perceptually, it is not known whether listeners’ judgments would have been consistent with the categorical measures, that is, whether more listeners would have heard a ring when the vowel /u/ was sung.

Previous researchers used different materials and methods to investigate the Fs. For example, some researchers used sustained sung and spoken vowels, whereas others used sung phrases with complex texts. Some researchers used short-term spectral analysis, and others applied LTAS analysis to the singing material. However, none of the previous studies compared the Fs across a variety of materials or methods. Although it was not stated explicitly, previous studies seemed to assume that different materials or methods would not affect the Fs; in other words, if a singer exhibited the Fs, it would be present regardless of the materials and methods used to evaluate it. In our study, we varied the material to include the regular singing phrase and phrases sung with the vowels /a/, /i/, and /u/. Our investigation indicated that the presence of the Fs differed depending on the material used. Therefore, the Fs should be investigated in a variety of contexts to better understand when it is present.
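The difference between the two analysis methods can be sketched as follows. An LTAS averages the power spectra of many short frames across an entire phrase, whereas a short-term spectrum describes a single frame, such as one sustained vowel. This is a minimal NumPy sketch; the frame length, hop size, and Hann window are assumptions for illustration, since the analysis settings used in this study are not given in this section, and the function names are hypothetical.

```python
import numpy as np


def ltas_db(signal, fs, frame_len=1024, hop=512):
    """Long-term average spectrum (dB): mean power spectrum of Hann-windowed frames."""
    x = np.asarray(signal, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    power = np.zeros(frame_len // 2 + 1)
    for k in range(n_frames):
        frame = x[k * hop : k * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    power /= n_frames  # average over the whole phrase
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, 10.0 * np.log10(power + 1e-12)  # small offset avoids log(0)


def short_term_spectrum_db(signal, fs, start, frame_len=1024):
    """Spectrum of one frame starting at sample `start` (e.g., a single sung vowel)."""
    x = np.asarray(signal, dtype=float)[start : start + frame_len]
    return ltas_db(x, fs, frame_len=frame_len, hop=frame_len)


# Synthetic two-second "phrase": a strong 500 Hz partial and a weaker 3000 Hz partial.
fs = 16000
t = np.arange(2 * fs) / fs
phrase = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
freqs, levels = ltas_db(phrase, fs)
high = (freqs >= 2000) & (freqs <= 4000)
print(float(freqs[np.argmax(levels)]))              # 500.0
print(float(freqs[high][np.argmax(levels[high])]))  # 3000.0
```

Averaging over frames smooths out moment-to-moment variation, which is one reason an LTAS of a whole phrase and a short-term spectrum of one vowel need not agree about a high-frequency peak.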

During the recordings, the Chinese singers reported difficulty with the instruction to glide through the musical scale. This may be because this singing scale is based on Western classical training, and the Chinese singers were not trained to perform this type of task. With the exception of singer C11, the Chinese singers did not follow the instructions; instead, they sang in the Chinese style, using scale steps rather than a glide (D-C#-E-D#-F-E-G-F#…). Even though singer C11 also reported difficulty gliding through the scale, he still tried to follow the instructions. It is interesting to note that the Fs was exhibited by the Chinese singers who did not follow the instructions but maintained their own singing style, whereas singer C11, who followed the instructions, showed no Fs. This suggests that task familiarity and training culture, as well as the singing material, may affect the Fs. Sundberg (2002) investigated one Chinese opera singer by asking him to sing a musical scale on the vowel /a/ and reported that no Fs was found for this subject. This leads us to question whether Sundberg’s subject was instructed to sing a Western musical scale that was unfamiliar to him. If so, the absence of the Fs for this singer might simply have been caused by unfamiliarity with the musical style. Unfortunately, the methodology was not fully described in Sundberg’s report.

Experiment 3: Quantitative analysis of Fs

Acoustic analysis of fundamental frequency (F0)

The highest and lowest fundamental frequencies (F0) of each singing phrase were measured and examined in relation to the perceptual judgments. A significant frequency difference between F0s was operationally defined as a minimum of 1 semitone (Shower and Biddulph, 1931).
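The 1-semitone criterion compares frequencies on a logarithmic scale: in equal temperament, the distance in semitones between two frequencies is 12·log2(f2/f1). A minimal sketch follows; the function names are hypothetical, and the example values are mean F0s reported below for the Chinese singers.

```python
import math


def semitone_difference(f_low_hz, f_high_hz):
    """Signed distance in equal-tempered semitones from f_low_hz to f_high_hz."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def differs_by_semitone(f_a_hz, f_b_hz, threshold_semitones=1.0):
    """Apply the operational 1-semitone criterion for a meaningful F0 difference."""
    return abs(semitone_difference(f_a_hz, f_b_hz)) >= threshold_semitones


print(round(semitone_difference(306, 360), 2))  # 2.81 (regular phrase, mean F0)
print(differs_by_semitone(346, 354))             # False (/a/ phrase, mean F0)
```

Because the scale is logarithmic, the same difference in hertz corresponds to fewer semitones at higher F0s. For example, the 270 Hz versus 286 Hz comparison reported below works out to about 0.997 semitone, which places it right at the boundary of the criterion.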

Analysis of the highest and lowest F0 in the regular singing phrase for the traditional Chinese opera singers showed that samples judged as having a strong vocal ring exhibited a higher F0 than samples judged as “not sure.” This was true for both the highest and the lowest F0 produced during the singing phrase. Across the samples perceived to have a vocal ring, the highest F0 averaged 495.18 Hz (SD = 41.6) and the lowest F0 averaged 289.85 Hz (SD = 55.5) (Fig. 6a). For the samples judged as “not sure,” the highest F0 averaged 421.83 Hz (SD = 23) and the lowest F0 averaged 234.03 Hz (SD = 54.5). There was one exception to this pattern: C16 had a high F0 for both his highest and lowest pitches (highest = 510.3 Hz; lowest = 252.5 Hz), yet the listeners’ ratings were ambiguous: 46% of listeners heard the vocal ring and 54% indicated that they were “not sure” about its presence.

The mean F0 of each regular singing phrase was also measured and examined in relation to the perceptual judgments. For the Chinese group, samples judged as having a strong vocal ring again exhibited a higher mean F0 than samples judged as “not sure”: 360 Hz (SD = 51.8) versus 306 Hz (SD = 43). Similar results were found for the highest F0 when the musical phrase was sung with the vowel /a/ by the Chinese singers. Samples with a strong vocal ring had a mean highest F0 of 524 Hz (SD = 39.2), compared with 463 Hz (SD = 68.2) for samples that did not clearly have a ring throughout the phrase (Fig. 6b). By contrast, the average lowest F0 during the /a/ phrase was slightly lower, by about 1 semitone, when listeners were sure about the ring (270 Hz, SD = 86.2) than when they were unsure about its presence (286 Hz, SD = 44). For the phrase sung with the vowel /a/, samples judged as having a strong vocal ring had essentially the same mean F0 (346 Hz, SD = 51.2) as samples judged as “not sure” (354 Hz, SD = 41), a difference of less than one semitone.

Figure 6a: Scatterplot of the highest and lowest F0 measured from the regular singing phrase for the Chinese singers. Filled diamonds indicate the highest F0 and open squares indicate the lowest F0. On the x-axis, 1 indicates samples perceived as having a “strong vocal ring” and 2 indicates samples rated as “not sure.”

[Figure 6a: “Chinese Highest & Lowest F0 (Sing-regular)”; y-axis F0 (Hz), 0 to 600; x-axis Perception; point annotations 466X2, 440X2, and 311X2.]

Figure 6b: Scatterplot of the highest and lowest F0 measured from the phrase sung with the vowel /a/ for the Chinese singers. Filled diamonds indicate the highest F0 and open squares indicate the lowest F0. On the x-axis, 1 indicates samples perceived as having a “strong vocal ring” and 2 indicates samples rated as “not sure.”

[Figure 6b: “Chinese Highest & Lowest F0 (Sing- /a/)”; y-axis F0 (Hz), 0 to 600; x-axis Perception; point annotation 311X.]

For the Western classically trained singers, the highest and lowest F0 of each singing phrase were also measured and examined in relation to the perceptual judgments. In contrast to the Chinese group, samples of the regular singing phrase judged as having a strong vocal ring exhibited a lower F0 range than samples rated as “not sure.” Samples rated as having a strong vocal ring had F0s ranging from 166 Hz to 387 Hz (Fig. 7a), whereas tokens judged as “not sure” ranged from 175 Hz to 443 Hz. The mean F0 of each regular singing phrase was also examined relative to the perceptual judgments for the Western singers. Across the samples, the means for singers with (267 Hz, SD = 59) and without (277 Hz, SD = 74) a vocal ring were within one semitone of each other.

For the phrase sung with the vowel /a/ by the Western singers, the average highest and lowest F0 were equivalent for tokens perceived to have a strong vocal ring (highest = 370 Hz; lowest = 161.8 Hz) and for tokens rated “not sure” (highest = 368 Hz; lowest = 153 Hz). Samples perceived as having a strong vocal ring also had the same mean F0 (a difference of less than one semitone) as samples rated as “not sure”: 254 Hz (SD = 58.2) versus 247 Hz (SD = 55.7) (Fig. 7b). The Western classically trained singers had highest and lowest F0 values that were at least 100 Hz lower than those of the Chinese singers. This was true for singers perceived as having the strong vocal ring for both the regular singing phrase and the phrase sung with the vowel /a/. For the mean F0 of both singing phrases, the Chinese group averaged about 80 Hz higher than the Western group (roughly 4 to 5 semitones at these F0 levels), regardless of listeners’ perception of vocal ring.

The distribution of the F0 range (lowest and highest) across singers (Figures 6a, 6b, 7a, and 7b) shows great variability in both perceptual categories (strong vocal ring and not sure) for the Western singers, in contrast to the Chinese group, which showed a clear distinction between the F0 ranges produced with a strong vocal ring and those rated “not sure.” For the regular singing phrase, most singers in the Chinese group who were perceived to have a vocal ring produced higher F0s than singers who did not have a strong ring, for both the lowest and the highest F0 (Fig. 6a). In general, similar results were found for the phrase sung with the vowel /a/ (Fig. 6b). Although the regular singing phrase for the Western group showed lower mean values for the lowest and highest F0 in samples perceived as having a strong vocal ring than in samples rated as “not sure,” the F0 distributions of these two perceptual categories overlapped (Figs. 7a and 7b). These F0 results indicate that the F0 range (highest and lowest levels) was associated with the Fs for the regular singing phrase and the phrase sung with the vowel /a/ in the Chinese group, but this relationship was not seen in the Western group. The F0 distribution for all Chinese singers showed that the Fs was evident when the F0 range was higher. For the Western group, the overall distribution suggests that F0 and the Fs were related for only some singers.

Figure 7a: Scatterplot of the highest and lowest F0 measured from the regular singing phrase for the Western singers. Filled diamonds indicate the highest F0 and open squares indicate the lowest F0. On the x-axis, 1 indicates samples perceived as having a “strong vocal ring” and 2 indicates samples rated as “not sure.”

[Figure 7a: “Western Highest & Lowest F0 (Sing-regular)”; y-axis F0 (Hz), 0 to 600; x-axis Perception; point annotations 392X2 and 329X2.]

Figure 7b: Scatterplot of the highest and lowest F0 measured from the phrase sung with the vowel /a/ for the Western singers. Filled diamonds indicate the highest F0 and open squares indicate the lowest F0. On the x-axis, 1 indicates samples perceived as having a “strong vocal ring” and 2 indicates samples rated as “not sure.”

[Figure 7b: “Western Highest & Lowest F0 (Sing- /a/)”; y-axis F0 (Hz), 0 to 600; x-axis Perception; point annotations 392X3, 329X2, 185X2, and 164X2.]

Intensity measured with a sound level meter

The differential threshold for SPL was operationally defined as a minimum of 1 dB (Reisz, 1928). Both the highest and lowest SPLs of each singing phrase were examined in relation to the perceptual ratings. The highest and lowest levels were each averaged on a power basis and then converted back into decibels. For the traditional Chinese opera singers, samples of the regular singing phrase perceived as having a strong vocal ring showed higher mean intensities for both the highest and lowest levels (highest = 99.89 dB SPL; lowest = 96.87 dB SPL) than samples rated as “not sure” (highest = 96.05 dB SPL; lowest = 93.38 dB SPL) (Table 16). Samples of the phrase sung with the vowel /a/ perceived as having a strong vocal ring showed a lower mean intensity range (97.16–99.16 dB SPL) than samples rated as “not sure” (104.12–110.04 dB SPL) (Table 16). This result may have been affected by the sample sung by C16, who showed the highest intensity range (110–116 dB SPL) but in whom listeners could not identify a strong vocal ring. Listeners reported that this singer’s voice was very loud but sounded mostly like “shouting” rather than a vocal ring. When this sample was removed, the average intensity range for the “not sure” group (highest level of 93.42 dB SPL; lowest level of 90.29 dB SPL) was lower than that of the vocal ring group (highest level of 99.16 dB SPL; lowest level of 97.16 dB SPL).
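Averaging on a power basis means converting each level to relative power, averaging the powers, and converting the result back to decibels. A minimal sketch follows, assuming the individual values listed in Table 16 are the inputs; the function name is hypothetical. It reproduces the reported means for the Chinese singers rated as having a strong vocal ring.

```python
import math


def power_average_db(levels_db):
    """Average SPL values on a power basis, then convert back to dB."""
    mean_power = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_power)


# Regular singing phrase, Chinese singers rated "strong vocal ring" (C8, C11, C14, C15).
print(round(power_average_db([102, 102, 98, 90]), 2))  # 99.89 (highest levels)
print(round(power_average_db([100, 98, 94, 88]), 2))   # 96.87 (lowest levels)
```

A simple arithmetic mean of the same four highest levels would give 98 dB; power averaging weights the louder samples more heavily, which is also why a single very loud sample such as C16 can dominate a group mean.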

Table 16: Highest and lowest intensity measured across singers for the regular singing phrase and the phrase sung with the vowel /a/ in the Chinese group. All results are based on samples rated as having a “strong vocal ring” or as “not sure” by 70% or more of the listeners. The top panel is for the regular singing phrase and the bottom panel is for the phrase sung with the vowel /a/.

Regular singing phrase

                    Strong vocal ring      Not sure
                    C8   C11  C14  C15     C3   C4   C5   C6
Highest SPL (dB)    102  102  98   90      90   98   94   98
Lowest SPL (dB)     100  98   94   88      88   96   92   94

Phrase sung with the vowel /a/

                    Strong vocal ring      Not sure
                    C8   C15               C3   C6   C14  C16
Highest SPL (dB)    102  88                92   94   94   116
Lowest SPL (dB)     100  86                88   90   92   110

Table 17: Highest and lowest intensity measured across singers for the regular singing phrase and the phrase sung with the vowel /a/ in the Western group. All results are based on samples rated as having a “strong vocal ring” or as “not sure” by 70% or more of the listeners. The top panel is for the regular singing phrase and the bottom panel is for the phrase sung with the vowel /a/.

Regular singing phrase

                    Strong vocal ring                      Not sure
                    W1   W2   W3   W5   W7   W9   W10      W4   W6   W8
Highest SPL (dB)    100  100  96   96   100  101  92       88   98   88
Lowest SPL (dB)     94   98   90   94   94   99   86       86   96   86

Phrase sung with the vowel /a/

                    Strong vocal ring            Not sure
                    W1   W3   W5   W7   W9       W2   W6
Highest SPL (dB)    96   96   98   100  101      98   100
Lowest SPL (dB)     94   90   94   96   98       96   98

For the Western singers, samples of the regular singing phrase that had a strong vocal ring showed a higher mean SPL range (highest = 98.72 dB SPL; lowest = 95.2 dB SPL) than samples rated as “not sure” (highest = 94.01 dB SPL; lowest = 92.01 dB SPL) (Table 17). For the phrase sung with the vowel /a/, samples with a strong vocal ring showed a similar highest level (98.67 dB SPL, less than 1 dB below the 99.11 dB SPL of the “not sure” samples) and a lower lowest level (95.13 dB SPL versus 97.12 dB SPL) (Table 17).

Comparison across groups indicated that the Chinese and Western singers used similar intensity ranges when the vocal ring was heard in the regular singing passages. There were some intensity differences between groups when listeners were “not sure” whether a vocal ring was heard during the passage; in general, the Western opera singers used lower intensities (both highest and lowest) than the Chinese singers. This pattern did not hold for the passage sung with the vowel /a/. When listeners were “not sure,” the Western singers used higher intensities (both highest and lowest) than the Chinese singers once the outlying C16 sample was excluded, whereas for samples with a strong vocal ring the Western singers’ highest level was equivalent to that of the Chinese singers (within 1 dB) and their lowest level was about 2 dB lower.

Relative energy difference of LTAS

The quantitative analysis first calculated the difference in energy (in dB) between the high (2000–4000 Hz) and low (0–2000 Hz) frequency regions. Each sung and spoken phrase was low-pass filtered (fc = 2000 Hz), and the original waveform was then band-pass filtered at 2000–4000 Hz. The energy (RMS, in dB) in each frequency region was calculated, and the difference between the two values was then computed. Statistical analyses were performed to investigate differences between the traditional Chinese opera singers and the Western classically trained singers, as well as differences between the sung and spoken phrases within and between the two groups. Furthermore, the correlation between the perceptual ratings and the quantitative measure was investigated. Finally, the results of the quantitative analysis were compared with those of the categorical analysis (Experiment 2) to evaluate whether the two measures provided the same information about the presence or absence of the Fs.
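The high-minus-low energy measure can be sketched as below. This is a substitute for the filtering described above, not a reproduction of it: instead of time-domain low-pass and band-pass filters (whose type and order are not specified here), it sums spectral power in the 0–2000 Hz and 2000–4000 Hz bins of a single FFT, which by Parseval's theorem corresponds to ideal brick-wall filters. The function names are hypothetical.

```python
import numpy as np


def band_energy_db(signal, fs, lo_hz, hi_hz):
    """Energy (dB, arbitrary reference) of the spectral components in [lo_hz, hi_hz)."""
    spectrum = np.fft.rfft(np.asarray(signal, dtype=float))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= lo_hz) & (freqs < hi_hz)
    return 10.0 * np.log10(np.sum(np.abs(spectrum[mask]) ** 2))


def relative_energy_db(signal, fs, split_hz=2000.0, upper_hz=4000.0):
    """High-band (split..upper) minus low-band (0..split) energy, in dB."""
    if fs < 2 * upper_hz:
        raise ValueError("sampling rate too low for the upper band edge")
    high = band_energy_db(signal, fs, split_hz, upper_hz)
    low = band_energy_db(signal, fs, 0.0, split_hz)
    return high - low


# Synthetic test signal: the 3000 Hz partial has one tenth the amplitude of the 500 Hz one.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
print(round(float(relative_energy_db(x, fs)), 1))  # -20.0
```

Negative values mean less energy in the high band than in the low band, as for all of the sung samples of the Chinese singers in Table 18. Because both bands share the same reference, the absolute calibration cancels in the difference.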

Results from the quantitative measurements (relative energy between the high and low frequency bands) were first analyzed with a two-way ANOVA (SPSS 11.5) to investigate two main factors, material (2 levels: spoken and sung phrases) and singing style (2 levels: Western and Chinese groups), and their interaction. Data from all subjects were included in the statistical analyses. There was a significant difference in relative energy between the Chinese and Western groups (F(1, 36) = 13.572, p < 0.05) and a significant difference between the sung and spoken phrases (F(1, 36) = 5.609, p < 0.05). There was no interaction between these two main factors.
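The F ratios in a two-way ANOVA of this kind can be computed directly from cell and marginal means. The sketch below is a minimal illustration, not the SPSS 11.5 procedure used in the study: it assumes a balanced design and treats every cell as an independent group. Under that assumption, 2 × 2 cells with 10 values each give error degrees of freedom of 2 × 2 × (10 − 1) = 36, which matches the reported F(1, 36). P-values would additionally require the F distribution (for example, `scipy.stats.f.sf`), which is omitted here. The function name and toy data are hypothetical.

```python
import numpy as np


def two_way_anova(data):
    """F ratios for a balanced two-way ANOVA with independent cells.

    `data` has shape (a, b, n): a levels of factor A, b levels of factor B,
    n observations per cell. Returns (F, df_effect, df_error) for A, B, and A x B.
    """
    data = np.asarray(data, dtype=float)
    a, b, n = data.shape
    grand = data.mean()
    cell = data.mean(axis=2)         # (a, b) cell means
    mean_a = data.mean(axis=(1, 2))  # marginal means of factor A
    mean_b = data.mean(axis=(0, 2))  # marginal means of factor B

    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    interaction = cell - mean_a[:, None] - mean_b[None, :] + grand
    ss_ab = n * np.sum(interaction ** 2)
    ss_error = np.sum((data - cell[:, :, None]) ** 2)

    df_a, df_b, df_ab = a - 1, b - 1, (a - 1) * (b - 1)
    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error
    return {
        "A": (ss_a / df_a / ms_error, df_a, df_error),
        "B": (ss_b / df_b / ms_error, df_b, df_error),
        "AxB": (ss_ab / df_ab / ms_error, df_ab, df_error),
    }


# Toy 2 x 2 design with two observations per cell.
toy = [[[1, 3], [2, 4]], [[5, 7], [6, 8]]]
res = two_way_anova(toy)
print(round(float(res["A"][0]), 3), round(float(res["B"][0]), 3), round(float(res["AxB"][0]), 3))  # 16.0 1.0 0.0
```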

The effect of material on relative energy was investigated within each group. For the Chinese singers, a one-way ANOVA with five levels compared the spoken phrase, the regular sung phrase, and the musical phrase sung with /a/, /i/, and /u/. There was a significant difference between materials (F(4, 45) = 3.368, p < 0.05). A Tukey HSD test was calculated to determine the pairwise differences within the Chinese singers. There was a significant difference in relative energy between the phrases sung with the vowels /a/ and /i/ (p < 0.05) and between the regular singing phrase and the phrase sung with the vowel /i/ (p < 0.05); no other effects of material were significant (/a/ vs. /u/; /i/ vs. /u/; regular singing vs. /a/ or /u/). Interestingly, the Tukey HSD showed no significant difference in relative energy between the spoken phrase and any of the sung phrases produced by the Chinese singers (p > 0.05 for all pairwise comparisons).

Another one-way ANOVA was performed to investigate whether material affected relative energy within the Western group. As with the Chinese singers, the five levels compared were the spoken phrase, the regular sung phrase, and the musical phrase sung with /a/, /i/, and /u/. There was a significant difference between materials (F(4, 45) = 18.582, p < 0.05). The Tukey HSD showed significant differences in relative energy between the phrases sung with the vowels /a/ and /i/ (p < 0.05), between the phrases sung with /i/ and /u/ (p < 0.05), and between the regular singing phrase and the phrase sung with /i/ (p < 0.05). There was no significant difference in relative energy between the phrases sung with /a/ and /u/ (p = 0.275), or between the regular singing phrase and the phrases sung with /a/ (p = 0.931) or /u/ (p = 0.228). As in the Chinese group, these results indicate that different vowels affect the spectra within the Western group. The results were also similar to those of the Chinese group in that there was no significant difference between the regular singing phrase and the spoken phrase produced by the Western singers (p = 0.240). Unlike the Chinese group, however, the Western group showed significant differences between the spoken phrase and the phrases sung with the vowels /a/, /i/, and /u/ (p < 0.05 for all pairwise comparisons).

Independent-samples t-tests were performed to compare the relative energy of the Western and Chinese singers for the phrases sung with the vowels /a/, /i/, and /u/. There was a significant difference in relative energy between Western and Chinese singing for the /a/ phrase (F(1, 18) = 1.548, p < 0.01) and for the /u/ phrase (F(1, 18) = 4.170, p < 0.01). There was no significant difference in relative energy between the Western and Chinese singers for the /i/ phrase (F(1, 18) = 0.496, p = 0.19).

Correlations between the perceptual ratings (i.e., the cumulative ratings across listeners) and the intensity measurements for the regular singing phrase and the musical phrase sung with the vowel /a/ were investigated for each singing group separately. There was no significant correlation between the perceptual ratings and the quantitative measurements for either the regular singing phrase (r = 0.03, p = 0.934 for Chinese; r = 0.321, p = 0.365 for Western) or the phrase sung with the vowel /a/ (r = 0.127, p = 0.726 for Chinese; r = 0.127, p = 0.726 for Western).

Results from the quantitative analysis of the relative intensity differences between the high and low frequency regions were also compared with the results from the categorical analyses of the LTAS to determine whether the two measures yielded similar information about the presence or absence of the Fs. There was no relationship between the categorical and quantitative analyses. For example, both the perceptual and categorical analyses showed that singers C8 and C11 exhibited the Fs and singers C3, C4, C5, and C6 had no Fs in the regular singing phrase. We therefore expected the differences in relative energy between the high and low frequency regions to be smaller for C8 and C11 than for C3, C4, C5, and C6. However, the results (Table 18) showed greater differences for singers C8 (–13.2 dB) and C11 (–12.8 dB) than for C3 (–7.1 dB), C4 (–6.2 dB), C5 (–5 dB), and C6 (–7.4 dB). Similar results were found when the phrase was sung on the vowel /a/ (Table 18).

The results for the Western classically trained singers also showed no relationship between the quantitative (relative intensity) and categorical analyses of the LTAS. For example, the quantitative results showed a –3.1 dB relative energy difference between the high and low frequency regions for singer W6, who exhibited no Fs either categorically or perceptually. By comparison, singer W9 also had a difference of about –3 dB between the high and low frequency regions, but he did exhibit the Fs both categorically and perceptually for the regular singing phrase (Table 19). Similar results were found for the phrase sung with the vowel /a/, in which W1 and W6 showed the same energy difference (0.2 dB) between the two frequency bands; however, W1 had the Fs both categorically and perceptually, whereas W6 did not have the Fs in either analysis (Table 19).

L3-L1 of the LTAS analysis

Previous investigators (Schutte and Miller, 1985; Bloothooft and Plomp, 1986; Sengupta, 1990; Sundberg, 2001) defined the Fs by measuring the level difference L3-L1 in the short-term spectrum. One purpose of this analysis was to compare this level difference between singers who were rated as having a strong vocal ring and singers who were not. Another purpose was to compare the L3-L1 of the LTAS with the findings of previous studies to determine the relationship between these two cues. The criteria for identifying the Fs and F1 in the LTAS were based on the suggestions of Rossing et al. (1986): the level of the Fs corresponded to the formant level in the 2–4 kHz frequency region, and the level of the first formant was identified at a frequency around 500 Hz.

Table 18: Results from the quantitative measurements of the LTAS (relative intensity difference between the high and low frequency regions) and from the categorical measurements for the traditional Chinese opera singers. For the sung phrases, samples are those rated as having a “strong vocal ring” (Yes) or as “not sure” (No) by at least 70% of listeners.

The regular singing phrase

                                     C3    C4    C5    C6    C8     C11    C14   C15
Relative energy, high - low (dB)     -7.1  -6.2  -5    -7.4  -13.2  -12.8  -2.8  -5.9
Perceptual rating (70% agreement)    No    No    No    No    Yes    Yes    Yes   Yes
Categorical measurement              No    No    No    No    Yes    Yes    Yes   Yes

Singing phrase with vowel /a/

                                     C3     C6     C8     C14   C15   C16
Relative energy, high - low (dB)     -10.1  -11.6  -11.4  -2.5  -6.1  -13.3
Perceptual rating (70% agreement)    No     No     Yes    No    Yes   No
Categorical measurement              No     No     Yes    No    Yes   No

Spoken phrase

                                     C3    C4    C5    C6    C8     C9    C11    C14   C15   C16
Relative energy, high - low (dB)     -0.8  -3.5  -2.2  -0.6  -14.8  -3    -17.4  3.3   2.2   3.1
Categorical measurement              No    No    No    No    No     No    No     No    No    No

Table 19: Results from the quantitative measurements of the LTAS (relative intensity difference between the high and low frequency regions) and from the categorical measurements for the Western classically trained singers. For the sung phrases, samples are those rated as having a “strong vocal ring” (Yes) or as “not sure” (No) by at least 70% of listeners.

The regular singing phrase

                                     W1    W2   W3    W5   W6    W7    W9
Relative energy, high - low (dB)     -0.8  0.7  -1.4  1.3  -3.1  -0.5  -3.4
Perceptual rating (70% agreement)    Yes   No   Yes   Yes  No    Yes   Yes
Categorical measurement              Yes   No   Yes   Yes  No    Yes   Yes

Singing phrase with vowel /a/

                                     W1   W2   W3   W4    W5   W6   W7    W8    W9    W10
Relative energy, high - low (dB)     0.2  0.9  0.3  -3.2  1.1  0.2  -1.1  -7.2  -4.1  -0.1
Perceptual rating (70% agreement)    Yes  Yes  Yes  No    Yes  No   Yes   No    Yes   Yes
Categorical measurement              Yes  Yes  Yes  No    Yes  No   Yes   No    Yes   Yes

Spoken phrase

                                     W1   W2   W3    W4    W5   W6    W7   W8   W9  W10
Relative energy, high - low (dB)     5.5  4.4  -0.8  -0.7  0.4  -4.2  3.3  0.2  -1  -1
Categorical measurement              No   No   No    No    No   No    No   No   No  No

However, it was difficult to identify F1 in this study because the singing samples included high F0s that were quite close to the first formant, so the researcher was not able to differentiate F1 from F0. Results from the L3-L1 analysis of the LTAS are therefore not reported, because the first formant peak could not be differentiated from the F0 or its harmonics (Fig. 8).
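For reference, the L3-L1 measure itself reduces to a difference between two band maxima. In the sketch below, L1 is taken as the highest level between 300 and 800 Hz and L3 as the highest level between 2000 and 4000 Hz; both windows are assumptions standing in for "around 500 Hz" and "the 2–4 kHz region," and the function name and toy spectrum are hypothetical. The sketch also makes the limitation described above concrete: when F0 or a low harmonic falls inside the F1 window, the band maximum picks up that harmonic rather than a separable first-formant peak.

```python
def l3_minus_l1(freqs_hz, levels_db, f1_band=(300.0, 800.0), l3_band=(2000.0, 4000.0)):
    """Level of the strongest point in l3_band minus that in f1_band (dB).

    Raises ValueError if either band contains no spectrum points.
    """
    def band_max(band):
        return max(level for f, level in zip(freqs_hz, levels_db) if band[0] <= f <= band[1])

    return band_max(l3_band) - band_max(f1_band)


# Toy spectrum (Hz, dB): L1 = -10 dB at 500 Hz, L3 = -22 dB at 3000 Hz.
freqs = [0, 250, 500, 750, 1000, 2000, 3000, 4000]
levels = [-40, -20, -10, -25, -30, -35, -22, -45]
print(l3_minus_l1(freqs, levels))  # -12
```

Following the sign convention used below, a negative result means that L3 is lower in amplitude than L1.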

L3-L1 of short-term spectra analysis

The short-term spectral analysis was used to measure the difference between the peak level around 3000 Hz and the level of the first formant (L3-L1). Negative values indicate that L3 was lower in amplitude than L1, and positive values indicate that L3 was higher in amplitude than L1. Stimuli included sustained vowels that were sung and spoken, as well as the same vowels edited from the regular singing phrase. Only samples that met the perceptual criteria of “strong vocal ring” or “not sure” were included in this analysis. For the Chinese singers, the sustained sung vowels /a/, /i/, and /u/ that had the Fs categorically showed a smaller L3-L1 difference than the spoken vowels (Table 20). Also, samples that exhibited the Fs showed greater energy in the L3 region for the vowels /a/, /i/, and /u/ edited from the regular singing phrase than for the sustained sung vowels (Table 20). For samples from the Chinese group that had no Fs categorically, the sustained sung vowels /a/ and /u/ also had a smaller negative L3-L1 difference than the sustained spoken vowels. There was no difference in L3-L1 between sung and spoken samples when the Chinese singers produced a sustained /i/ that did not have the Fs categorically. The sustained sung vowels /a/ and /u/ showed a smaller negative L3-L1 difference than the same vowels edited from the regular singing phrase, whereas the sustained vowel /i/ had a smaller positive L3-L1 difference than the vowel /i/ selected from the regular singing phrase (Table 20).

Figure 8: The top panel shows the unexcited first formant in the LTAS of singer W5 for the phrase sung with the vowel /a/, and the bottom panel shows inseparable harmonics and first formant in the LTAS of singer W7 for the phrase sung with the vowel /a/.

[Figure 8 panels: amplitude (dB, -100 to -10) versus frequency (axis ticks 0 to 9).]

For the Western samples that categorically had the Fs, the mean L3-L1 values from the short-term spectral analysis of the sustained spoken vowels /a/, /i/, and /u/ showed less energy in the L3 region relative to L1 than the same sustained sung vowels (Table 21). Moreover, the sustained sung vowels /a/, /i/ and /u/ had higher relative energy in the L3 region than the same three vowels edited from the regular singing phrase (Table 21). For samples from the Western group that categorically had no Fs, the mean L3-L1 values of the sustained spoken vowels /a/, /i/ and /u/ likewise showed less energy in the L3 region than the same sustained sung vowels, and the sustained sung vowels /a/, /i/, and /u/ had higher energy around the L3 region than the same vowels edited from the regular singing phrase (Table 21).

In summary, L3-L1 values from the short-term spectra were much lower (more negative) in all the spoken samples than in the sung samples from both the Chinese and Western groups. There was only one exception: for the Chinese singers rated “not sure” for the vocal ring, the sustained spoken vowel /i/ had almost the same L3-L1 as the sustained sung vowel /i/. An overall comparison between the Chinese and the Western singers indicated a less negative L3-L1 difference for the Western than for the Chinese samples. There were two exceptions, both from singers rated “not sure”: the sustained spoken vowel /i/ and the vowel /a/ selected from the regular singing phrase showed higher relative energy in the L3 region for the Chinese singers than for the Western singers (Tables 20 and 21).

Table 20: Top panel: Mean L3-L1 values for the vowels /a/, /i/ and /u/ for samples from the Chinese group (C8, C11, C14, and C15) that were judged as having the Fs.

                                                               /a/     /i/     /u/
Sustained spoken vowel (dB)                                  -15.4     -12   -34.9
Sustained sung vowel (dB)                                     -6.2    -0.3   -12.1
Mean of vowels selected from regular singing phrase (dB)      -2.4     5.8    -7.6

Bottom panel: Mean L3-L1 values for the vowels /a/, /i/ and /u/ for samples from the Chinese group (C3, C4, C5, and C6) that were judged as “not sure” of the Fs.

                                                               /a/     /i/     /u/
Sustained spoken vowel (dB)                                  -11.1     1.3     -25
Sustained sung vowel (dB)                                       -2     1.2    -9.2
Mean of vowels selected from regular singing phrase (dB)      -2.3     2.4   -14.4

Table 21: Top panel: Mean L3-L1 values for the vowels /a/, /i/ and /u/ for samples from the Western group (W1, W3, W5, W7, and W9) that were judged as having the Fs.

                                                               /a/     /i/     /u/
Sustained spoken vowel (dB)                                  -10.3    -3.1   -12.2
Sustained sung vowel (dB)                                      1.5      12     4.7
Mean of vowels selected from regular singing phrase (dB)       0.3    10.6    -4.1

Bottom panel: Mean L3-L1 values for the vowels /a/, /i/ and /u/ for samples from the Western group (W2, W4, W6, W8, and W10) that were judged as “not sure” of the Fs.

                                                               /a/     /i/     /u/
Sustained spoken vowel (dB)                                     -9     0.1   -12.2
Sustained sung vowel (dB)                                     -1.2     5.8     4.7
Mean of vowels selected from regular singing phrase (dB)      -4.5     3.7    -1.2
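The comparisons in the summary paragraph can be re-checked directly against the mean values in Tables 20 and 21. The following Python sketch transcribes those means and lists the cells that break each pattern; the dictionary layout and the function names are illustrative, not part of the original analysis.

```python
# Mean L3-L1 values (dB) transcribed from Tables 20 and 21, ordered (/a/, /i/, /u/).
# "Fs" = top panels (judged as having the Fs); "not sure" = bottom panels.
VOWELS = ("/a/", "/i/", "/u/")
L3_L1 = {
    ("Chinese", "Fs"): {
        "spoken": (-15.4, -12.0, -34.9),
        "sung": (-6.2, -0.3, -12.1),
        "phrase": (-2.4, 5.8, -7.6),
    },
    ("Chinese", "not sure"): {
        "spoken": (-11.1, 1.3, -25.0),
        "sung": (-2.0, 1.2, -9.2),
        "phrase": (-2.3, 2.4, -14.4),
    },
    ("Western", "Fs"): {
        "spoken": (-10.3, -3.1, -12.2),
        "sung": (1.5, 12.0, 4.7),
        "phrase": (0.3, 10.6, -4.1),
    },
    ("Western", "not sure"): {
        "spoken": (-9.0, 0.1, -12.2),
        "sung": (-1.2, 5.8, 4.7),
        "phrase": (-4.5, 3.7, -1.2),
    },
}


def spoken_not_below_sung():
    """Cells where the sustained spoken L3-L1 is not lower than the sustained sung L3-L1."""
    hits = []
    for (group, rating), rows in L3_L1.items():
        for vowel, spoken, sung in zip(VOWELS, rows["spoken"], rows["sung"]):
            if spoken >= sung:
                hits.append((group, rating, vowel))
    return hits


def chinese_above_western():
    """Matched cells where the Chinese mean L3-L1 exceeds the Western mean."""
    hits = []
    for rating in ("Fs", "not sure"):
        for task in ("spoken", "sung", "phrase"):
            chinese = L3_L1[("Chinese", rating)][task]
            western = L3_L1[("Western", rating)][task]
            for vowel, c, w in zip(VOWELS, chinese, western):
                if c > w:
                    hits.append((rating, task, vowel))
    return hits


print(spoken_not_below_sung())
print(chinese_above_western())
```

Run as written, the first list contains only the Chinese “not sure” /i/ cell (1.3 dB spoken vs. 1.2 dB sung), and the second contains the two “not sure” exceptions named in the summary: the spoken /i/ and the /a/ from the regular singing phrase.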

Discussion

One of the purposes of this study was to define the Fs quantitatively. Listeners noted on the comment sheets that they identified the presence or absence of the vocal ring by focusing on several factors, such as F0, intensity, and vowel quality. Results from the F0 analysis showed that the Chinese singers, whether or not they had the Fs, had a higher F0 range (both highest and lowest F0) than the Western singers for both the regular singing phrase and the phrase sung with the vowel /a/ (Figures 6 and 7). Western singers who were rated as having a strong vocal ring had F0 values up to 392 Hz, whereas samples that were judged as “not sure” had F0 above this point. This is consistent with the findings of Bloothooft and Plomp (1985), which showed that the level of the Fs for male singers increased as F0 increased up to 392 Hz but decreased when F0 increased beyond this point. However, Bloothooft and Plomp used only Western opera singers in their investigation. In contrast to the Western singers, the Chinese singers in the current study who were heard to have a strong vocal ring had an F0 that exceeded 392 Hz. This suggests that different acoustic cues may signal the Fs in the two singing styles.
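For readers who think in pitch names, the 392 Hz boundary can be located on the keyboard. This is a minimal Python sketch that assumes twelve-tone equal temperament with A4 = 440 Hz, a tuning reference that the study does not state; the function name is illustrative.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(f0_hz, a4_hz=440.0):
    """Return the nearest equal-tempered note name and the deviation in cents."""
    semitones_from_a4 = 12 * math.log2(f0_hz / a4_hz)
    midi = 69 + round(semitones_from_a4)  # MIDI note 69 = A4
    cents = 100 * (semitones_from_a4 - (midi - 69))
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents


print(nearest_note(392.0))
```

Run as written, it reports G4, within a fraction of a cent of 392 Hz, so the boundary discussed above sits at G4 under this tuning assumption.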

Several studies (Sundberg, 1973; Cleveland and Sundberg, 1985) have shown that vocal intensity is one of the important factors affecting the Fs. The results from the current study agree with these studies: SPL affected the percept of vocal ring for both traditional Chinese opera singers and Western classically trained singers. SPL ranges were higher in all of the samples that were rated as having the Fs, perceptually and categorically, than in samples that were rated as “not sure” for both groups (Tables 16 and 17). Moreover, Chinese singers who had the Fs showed a higher SPL range in the regular singing phrase than in the phrase sung with the vowel /a/, whereas Western singers who had the Fs had the same SPL in the regular singing phrase and the phrase sung with the vowel /a/. The difference in SPL between the regular singing phrase and the phrase sung with the vowel /a/ may relate to the singers’ comfort with these different tasks. The purpose of requiring the singers (Chinese and Western) to perform the same tasks was to compare the different singing techniques while controlling the material across all singers. However, we found that the Chinese singers were not trained to substitute one particular vowel for the whole singing text; therefore, they were not able to project their full voice in this task. We conclude that familiarity with the singing material and training style also affect the SPL of the singing voice.

Another way to investigate the Fs quantitatively was to determine the relative energy difference between the high and low frequency regions for sung and spoken phrases. The statistical analysis showed a significant difference in relative energy between the Chinese and Western singers, which suggests that the different singing techniques affect the spectral energy distribution. There was also a significant main effect of material, such that relative energy differed between speaking and singing samples. This indicates that the different materials affected the spectral energy. Sundberg (1970) compared sung and spoken vowels and found that the spectrum levels around 3 kHz differed between spoken and sung vowels because of the different articulatory configurations in singing and speaking; however, our post hoc analysis showed no significant difference between the sung and spoken samples within each group. These results suggest that singers in both the Chinese and Western groups used similar vocal tract configurations for their spoken and sung samples.
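The relative energy measure can be sketched in Python with NumPy. The band edges used here (0-2 kHz as the low region, 2-4 kHz as the high region), the 1024-sample Hann-windowed frames, and the definition as a ratio of summed power in dB are assumptions for illustration; this section does not restate the exact settings used in the study, and the function names are hypothetical.

```python
import numpy as np


def ltas_power(signal, fs, frame_len=1024, hop=512):
    """Long-term average spectrum: mean power spectrum over Hann-windowed frames."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, power / n_frames


def relative_energy_db(signal, fs, split_hz=2000.0, upper_hz=4000.0):
    """High-band minus low-band energy in dB; negative means more low-frequency energy."""
    freqs, power = ltas_power(signal, fs)
    low = power[(freqs > 0) & (freqs < split_hz)].sum()
    high = power[(freqs >= split_hz) & (freqs < upper_hz)].sum()
    return 10 * np.log10(high / low)


# Synthetic check: a 3000 Hz tone 20 dB weaker than a 1000 Hz tone
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
print(round(relative_energy_db(x, fs), 1))
```

With this synthetic signal the function returns approximately -20 dB. Summing power within each band, rather than comparing peak levels, makes the result depend on overall energy rather than on the shape of the spectrum.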

We might also suspect that the samples included in the statistical analysis affected the results. That is, all data were included in the statistical analysis for both the Chinese and Western groups regardless of rating (strong vocal ring, not sure, no vocal ring). This was done because the vocal ring was sometimes heard in samples that were rated as “not sure.” Therefore, the lack of a significant difference in relative intensity between spoken and sung phrases may reflect pooling across subjects with strong and weak vocal rings. Furthermore, the statistical analysis showed a significant difference between the Western and the Chinese groups for the regular singing phrase. Data from Tables 18 and 19 show that the singers in both groups had greater energy in the low frequency regions than in the high frequency regions (as indicated by negative values); however, the Chinese singers had greater differences in relative energy (high-low) than the Western singers for both the regular singing phrase and the phrase sung with the vowel /a/. Although the relative energy differed between the Chinese and Western singers, singers in both groups still demonstrated the Fs. There are several possible explanations for this. First, the Chinese singers produced a higher F0 range than the Western singers. Second, there may be language differences; that is, relative energy distributions may differ between Chinese and Western languages.

A more likely possibility is that the Chinese singers have a higher laryngeal position during singing than during speaking (Wang, 1985). This laryngeal elevation may shorten the vocal tract and yield higher frequency energy. Laryngeal elevation, as may be seen in Chinese singers, is not used by Western opera singers (Sundberg, 1974). In fact, Sundberg showed that Western opera singers lower the larynx to produce the Fs. The lowered larynx position seen in Western singers may be incompatible with the higher F0 used by Chinese singers. Also, language requirements may constrain the vocal tract configuration such that laryngeal depression cannot be accomplished during singing in Chinese. Although singers in different language cultures may use different articulatory manipulations, our perceptual and categorical results indicate that the Fs is produced by both Western and Chinese opera singers. Notably, these results are contrary to those of Bartholomew (1934) and Sundberg (2001), who maintain that the Fs is exhibited only in the Western classically trained singing style.
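The link between laryngeal position and formant frequencies can be illustrated with the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / (4L). The tube lengths (17.5 cm and 16.0 cm) and the speed of sound (35,000 cm/s) below are illustrative assumptions, not measurements from this study. A uniform tube cannot reproduce the formant clustering associated with the Fs; it only shows the direction of the shift when the tract is shortened.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air in the vocal tract


def tube_formants(length_cm, n_formants=4, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, n_formants + 1)]


lowered = tube_formants(17.5)  # hypothetical longer tract (lowered larynx)
raised = tube_formants(16.0)   # hypothetical shorter tract (raised larynx)
for n, (f_long, f_short) in enumerate(zip(lowered, raised), start=1):
    print(f"F{n}: {f_long:7.1f} Hz -> {f_short:7.1f} Hz")
```

In this idealization every resonance scales with 1/L, so shortening the tube from 17.5 to 16.0 cm raises each formant by the same factor (17.5/16, about 9%), which is the sense in which laryngeal elevation could move energy upward in frequency.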

Another purpose of this study was to investigate quantitatively whether different vowels affected the spectrum of the singing voice when the musical phrase was controlled. The statistical analysis from this experiment showed that vowel quality affected the spectra within and between groups. Specifically, the vowel /i/ showed significant spectral differences from the other vowels. This is consistent with the results from the categorical measurements discussed earlier, in which /i/ showed a different center frequency than the other two vowels (/a/ and /u/). Moreover, the statistical results showed significant differences between the groups for the vowels /a/ and /u/ but no significant difference between the two groups for the vowel /i/. This suggests that the vocal tract configurations for the sung vowels /a/ and /u/ differed between the Chinese and Western singers, whereas the vocal tract configuration for the vowel /i/ was similar in the two groups.

The quantitative measures were compared to the categorical measurement, as well as to the perceptual judgments, to assess the validity of these measurement techniques for determining the Fs. The relative energy level differences between the high and low frequency regions were inconsistent across the singing samples, and these quantitative measures of the Fs were not consistent with the results from the categorical measures or the perceptual ratings. Calculating the relative energy difference between the high and low frequency regions may therefore not be a good way to define the Fs.
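The inconsistency can be made concrete with the relative energy tables for the Western singers earlier in this section. The Python sketch below transcribes those values and asks two questions: how often the perceptual and categorical labels agree, and whether any single threshold on relative energy separates “Yes” from “No” samples. The data layout and function names are illustrative.

```python
# (relative energy high-low in dB, perceptual rating, categorical measurement)
REGULAR_PHRASE = {
    "W1": (-0.8, "Yes", "Yes"), "W2": (0.7, "No", "No"), "W3": (-1.4, "Yes", "Yes"),
    "W5": (1.3, "Yes", "Yes"), "W6": (-3.1, "No", "No"), "W7": (-0.5, "Yes", "Yes"),
    "W9": (-3.4, "Yes", "Yes"),
}
VOWEL_A_PHRASE = {
    "W1": (0.2, "Yes", "Yes"), "W2": (0.9, "Yes", "Yes"), "W3": (0.3, "Yes", "Yes"),
    "W4": (-3.2, "No", "No"), "W5": (1.1, "Yes", "Yes"), "W6": (0.2, "No", "No"),
    "W7": (-1.1, "Yes", "Yes"), "W8": (-7.2, "No", "No"), "W9": (-4.1, "Yes", "Yes"),
    "W10": (-0.1, "Yes", "Yes"),
}


def label_agreement(rows):
    """Fraction of singers whose perceptual and categorical labels match."""
    matches = sum(perc == cat for _, perc, cat in rows.values())
    return matches / len(rows)


def threshold_separates(rows):
    """True if one cut-off on relative energy splits perceptual Yes from No."""
    yes = [energy for energy, perc, _ in rows.values() if perc == "Yes"]
    no = [energy for energy, perc, _ in rows.values() if perc == "No"]
    return max(no) < min(yes) or max(yes) < min(no)


for name, rows in (("regular phrase", REGULAR_PHRASE), ("vowel /a/", VOWEL_A_PHRASE)):
    print(name, label_agreement(rows), threshold_separates(rows))
```

For both phrases the perceptual and categorical labels agree in every case, yet no threshold works: in the regular phrase, for example, W9 (-3.4 dB) was rated “Yes” while W6 (-3.1 dB) was rated “No”, and W5 (1.3 dB) was rated “Yes” while W2 (0.7 dB) was rated “No”.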

In this experiment, the difference between the levels of the third and first formants (L3-L1) of the short-term spectra was also measured for comparison with previous studies. Previous studies reported various values for L3-L1 (Bloothooft & Plomp, 1986; Schutte & Miller, 1985; Sengupta, 1990). For example, Bloothooft and Plomp found an overall average L3-L1 difference of -20 dB for all vowels; Schutte and Miller suggested that the Fs was noted when L3-L1 was about -7 dB for a tenor who sang the vowel /ɔ/; and Sengupta suggested -4 dB for male singers with the vowel /a/. Our results showed that L3-L1 differed depending on the sung material. Both the Chinese and Western singers who exhibited the Fs had different L3-L1 values for the sustained sung vowels /a/, /i/ and /u/ and for the same vowels edited from the regular singing phrase (Tables 20 and 21). We suspect that L3-L1 varies because of multiple factors, such as singing technique, vowel, and singing task. Thus, it is difficult to establish an operational definition of the Fs based on L3-L1 measurements of the short-term spectrum. More investigation of L3-L1 is needed before such a definition can be adopted.
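A minimal NumPy sketch of an L3-L1 measurement on one short-term frame is shown below. The search bands (200-1000 Hz for L1, 2000-4000 Hz for L3), the Hann window, and taking the strongest spectral component in each band are assumptions for illustration, and the function names are hypothetical. At high F0 the strongest component in the L1 band may simply be a harmonic rather than the first formant, which is the difficulty reported earlier for the LTAS; this sketch does not solve that problem.

```python
import numpy as np


def band_peak_db(freqs, level_db, lo_hz, hi_hz):
    """Highest spectral level (dB) between lo_hz and hi_hz inclusive."""
    mask = (freqs >= lo_hz) & (freqs <= hi_hz)
    return level_db[mask].max()


def l3_minus_l1(frame, fs, l1_band=(200.0, 1000.0), l3_band=(2000.0, 4000.0)):
    """L3-L1 in dB for one frame; negative means L3 is weaker than L1."""
    windowed = frame * np.hanning(len(frame))
    level_db = 20 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    l1 = band_peak_db(freqs, level_db, *l1_band)
    l3 = band_peak_db(freqs, level_db, *l3_band)
    return l3 - l1


# Synthetic 100 ms frame: a 3000 Hz component 20 dB below a 600 Hz component
fs = 16000
t = np.arange(1600) / fs
frame = np.sin(2 * np.pi * 600 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
print(round(l3_minus_l1(frame, fs), 1))
```

For this synthetic frame the result is close to -20 dB; on real voice recordings the value would also depend on the frame position and length, which is one reason L3-L1 can vary with the sung material.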

Singers from both the Chinese and Western groups whose sung vowels did not evidence the Fs (categorically and perceptually) still showed small L3-L1 differences in the short-term spectrum, which means that there was still increased energy in the high frequency region for these sung vowels. In addition, when these samples were compared to the sustained spoken vowels, the spoken vowels also showed small L3-L1 differences, which means that energy in the L3 region was also increased for the spoken vowels. Following Oliveira-Barrichelo et al. (2001), these results suggest that although these singers do not have the Fs, they have a certain level of training that allows them to adjust the vocal tract and generate energy in the higher frequency region. As such, some of the highly trained singers in the present investigation may produce the “speaker’s formant.”

Chapter VI: General Discussion and Conclusions

The main purpose of this study was to investigate whether two different singing techniques, traditional Chinese opera and Western classical singing, exhibit the Fs. This study supported our hypothesis that the Fs is found not only in the Western classical singing technique but also in other singing styles, such as traditional Chinese opera singing. Claims by previous researchers (Bartholomew, 1934; Sundberg, 1987; Sundberg, 2001) that the Fs occurs because of vocal tract configurations unique to Western classically trained singing were not supported by the present investigation. However, previous studies were based mainly on Western trained and untrained subjects; therefore, it is difficult to make inferences about non-Western singing techniques from this previous research.

A universal definition of the Fs is still elusive. In the previous literature, the Fs was defined using either an acoustical or a physiological approach. In the acoustical approach, the Fs is recognized by a raised cluster of formants 3, 4, and 5 in the acoustic spectrum; the precise amount of increase in this energy remains ill-defined (Sundberg, 1974). In the physiological approach, the Fs is associated with lengthening of the singer’s vocal tract by lowering the larynx and expanding the pharynx. In the current study, we sought to define the Fs relative to its percept. As such, quantitative and categorical analyses of the acoustic spectra were compared to listeners’ judgments of a vocal ring. Our data showed that the Fs was found not only in the Western classically trained singers but also in the traditional Chinese opera singers. Specifically, listeners heard a vocal ring in samples that evidenced high frequency peaks in their spectra. However, the high frequency energy found in the Chinese samples seemed to be somewhat different from that found in the Western singers; that is, the center frequency of the peak was higher in the Chinese singers than in the Western singers. Results from the acoustic analysis also showed that the bandwidths of the Fs in the Chinese singers were broader than those of the Western singers. These differences suggest that traditional Chinese opera singers may manipulate the vocal tract differently from Western opera singers to generate the Fs. Further investigation of vocal tract control is needed to understand how Chinese opera singers produce the Fs. It is clear, however, that these physiological adjustments yield an Fs that is distinct in the acoustic spectrum.

As noted earlier, the Fs helps singers’ voices carry above the level of the orchestral accompaniment. Sundberg (1970, 1978) suggested that the Western symphony orchestra has its highest sound level in the vicinity of 450 Hz and that its amplitude declines abruptly above that frequency. Therefore, one might expect the frequency of the Fs to vary depending on the spectral characteristics of the orchestral accompaniment. Instruments that accompany the Chinese opera singer typically have a higher frequency range than instruments of the Western orchestra (Guy, 2003). That is, the spectrum of the Chinese orchestra has high energy that extends through the high frequency region and gradually decreases beyond 4000 Hz. Therefore, we suspect that because the Chinese orchestra has high energy extending throughout a broader spectral range (both high and low regions), Chinese singers need to generate the Fs at a higher frequency than Western singers do; producing the Fs at a higher frequency may be accomplished by using a higher F0. In summary, we hypothesize that singers generate the Fs differently, depending on their cultural and musical styles, to be heard over different orchestras, although certain common elements may be seen.

A primary shortcoming of previous studies of the Fs is that they did not relate their acoustic measurements to listeners’ perceptual judgments. The assumption, based on Bartholomew (1934), was that the vocal ring is the perceptual attribute of the Fs. If the Fs causes the percept of the vocal ring, then perceptual judgments of the presence of the vocal ring are needed. Under Bartholomew’s assumption, the perceptual ratings in this study showed that listeners heard the vocal ring not only in the Western classically trained singers but also in the traditional Chinese opera singers. However, the results also showed that the perceptual judgments of the Fs were not consistent across all listeners. Although listeners were able to reliably differentiate singers who had the vocal ring from singers who did not, in both the traditional Chinese and the Western classically trained groups, they were less confident of these judgments for the Chinese singers. Many factors may have influenced the listeners’ perceptions of the Fs, such as language and technique variations, listeners’ training, singers’ abilities, and bias in the instructions; these are discussed below.

In this study, we found that although listeners were able to perceive the vocal ring in the Chinese singers, they reported that their uncertainty in these judgments might have been caused by their unfamiliarity with the Chinese language and the different singing technique. Recall that the listeners were professionally trained classical Western singers and were familiar with the languages and singing technique of the Western samples; however, they were not familiar with the traditional Chinese singing technique and language, which distracted them when judging the Fs. Although this lack of familiarity seemed to have distracted the listeners, they reported that the vocal ring could still be identified in certain common vowels, such as /a/, /i/, /u/, and /e/, within the musical phrase.

As noted before, 13 listeners (highly trained musicians) took part in this experiment, yet the reliability of the perceptual judgments varied across these listeners. Although the listeners in this study were highly skilled musicians who were familiar with identifying the presence or absence of a vocal ring, they may not have been experienced with rating scales. Studies of speech perception and discrimination commonly use perceptual training and multiple testing sessions to obtain more stable results. In the present study, most listeners had only one short practice session and one test session. Future investigations of the perception of the Fs should provide more training. For example, listeners could receive specific training on the Fs until they are familiar with its definition. To achieve this, multiple training sessions could be provided until the listeners’ responses agree with the qualitative analysis of the Fs at a set criterion level. Only then would listeners be exposed to the experimental material and asked to make judgments about the vocal ring. In addition, listeners should repeat the perceptual tests multiple times to determine the reliability of their judgments. It may also be useful to recruit listeners who are familiar with both of the singing styles being judged.
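As one example of how the reliability of repeated perceptual tests could be quantified, the sketch below computes Cohen’s kappa between two rating sessions by the same listener. Kappa is a swapped-in illustration (the study does not specify which reliability statistic future work should use), the function name is hypothetical, and the ratings in the usage lines are invented for demonstration.

```python
from collections import Counter


def cohens_kappa(session1, session2):
    """Chance-corrected agreement between two paired lists of categorical ratings."""
    if len(session1) != len(session2) or not session1:
        raise ValueError("ratings must be paired and non-empty")
    n = len(session1)
    observed = sum(a == b for a, b in zip(session1, session2)) / n
    counts1, counts2 = Counter(session1), Counter(session2)
    # Agreement expected by chance from each session's category proportions
    expected = sum(counts1[c] * counts2[c] for c in counts1.keys() | counts2.keys()) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both sessions use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of the same six samples in two sessions
first = ["ring", "ring", "not sure", "no ring", "ring", "not sure"]
second = ["ring", "not sure", "not sure", "no ring", "ring", "ring"]
print(round(cohens_kappa(first, second), 3))
```

Here the listener agrees with themselves on four of six samples (67%), but after correcting for chance agreement kappa is about 0.45, a more conservative figure than raw percent agreement.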

Another factor that affected the outcome of this experiment is the singers’ ability to perform the singing tasks and methods used in this study. As mentioned in the previous chapter (p. 77), classically trained Western singers were comfortable substituting a particular vowel for the text of a whole musical phrase because this is one of the basic routines of their vocal training. However, the traditional Chinese opera singers were not comfortable with this singing method and were unable to project their full voice; therefore, more samples were perceived as having a strong vocal ring in the regular singing phrase than in the phrase sung with a single vowel. In future studies, it is important to find singing tasks common to both groups so that singers can perform at full strength.

Another factor that may have influenced perception relates to the instructions given to the listeners. Listeners were not specifically told that the samples might or might not have the vocal ring. Because the practice session emphasized the presence of the vocal ring and the instructions asked listeners to rate the ring, listeners may have been biased toward expecting all samples to have the Fs. Therefore, listeners may have rated some samples as “not sure” (as if they sometimes heard the ring and sometimes did not) rather than indicating that no vocal ring was heard. In future research, the instructions should state that some samples may not have the vocal ring, or tokens without the ring should be included in the test protocol.

Another possibility is that the Fs and the vocal ring are not the same thing. The questions asked in this study were based on the assumption that the vocal ring and the Fs are the same, and the entire analysis was guided by this premise. However, it is possible that listeners were not hearing the Fs but something else that singers manipulated to generate a ringing tone. For example, evidence in the previous literature suggests that sopranos do not have the Fs, yet they still project a loud, ringing sound that is not due to a high-energy peak in the Fs region (Carlsson and Sundberg, 1992; Sundberg, 2001). Unlike Western sopranos, the traditional Chinese singers in the current study showed a relationship between how listeners perceived the ring and the presence of a high frequency peak in the Fs region. However, it is still possible that what the listeners heard in these Chinese singers was not the Fs but some other means of projecting the voice. Therefore, in future studies it is critical to refine what listeners are asked to identify, and to clarify whether listeners are identifying a ring that may be caused by other mechanisms or the ring that is the Fs. To better distinguish the Fs from the ringing tone, it is also important to obtain perceptual ratings of vocal ring for sopranos and to match them with the singers’ acoustic signals, in order to see which acoustic cues lead to the percept.

What factors impacted the Fs? (Factors investigated were analysis procedures and singing materials)

Another issue that may have yielded different results across previous studies is the variety of methodologies used to investigate the Fs. Although the method of investigation was not expected to affect the Fs, this assumption had not been tested previously. Some studies investigated the Fs using short-term spectra (Schutte & Miller, 1985; Sengupta, 1990; Sundberg, 2001; Seidner et al., 1985; Weiss et al., 2001), some used the LTAS (Rossing, 1986; Ternstrom & Sundberg, 1989; Ross, 1992; Cleveland et al., 2001; Sundberg, 2001), and others used other quantitative (spectral) procedures (Bloothooft & Plomp, 1985, 1986, 1987; Omori et al., 1996; Lundy et al., 2000); yet no previous study had applied several different methods to the same samples. In the present study, we found that different methods yielded different results regarding the presence or absence of the Fs.

Our results showed that the categorical analysis of the LTAS, with the Fs defined as a peak around 2300-3500 Hz with a bandwidth of less than 700 Hz, is most consistent with perceptual judgments of vocal ring. This relationship between the acoustic cues and the percept may be explained by the basic assumption of Gentner’s (1980, 1983, 1989) structure-mapping theory. According to Gentner, our psychological concepts have structures that relate percepts and objects. Based on the relations represented in these concept structures, people are able to recognize and map one structure onto another through similarity comparison. In the present study, listeners may have perceived the pattern of the spectra such that high frequency peaks were contrasted with other parts of the acoustic signal. As such, the categorical analysis of the acoustic spectra may represent listeners’ strategies better than the quantitative analysis does. Unlike the categorical analysis, the quantitative analysis considered only overall energy differences, not spectral patterns. Although the categorical analysis is useful for determining whether the Fs is present, it may not reflect the production variables. A quantitative measurement that can define the Fs may provide better insight into the vocal mechanisms used to generate it. Further development of a quantitative index of the Fs may therefore be useful for arriving at an operational definition of the Fs.
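The categorical criterion (a peak between 2300 and 3500 Hz with a bandwidth below 700 Hz) can be sketched as a rule applied to an LTAS expressed in dB. How the bandwidth was measured is not restated in this section, so this Python sketch assumes a -3 dB bandwidth around the highest level in the search band and rejects maxima that sit on the band edge (a slope rather than a peak). The function name, parameter names, and synthetic spectra are hypothetical.

```python
import numpy as np


def has_singers_formant(freqs, ltas_db, band=(2300.0, 3500.0), max_bw_hz=700.0, drop_db=3.0):
    """Categorical Fs rule: a peak inside `band` whose -drop_db bandwidth is below max_bw_hz."""
    in_band = np.flatnonzero((freqs >= band[0]) & (freqs <= band[1]))
    if in_band.size < 3:
        return False
    peak = in_band[np.argmax(ltas_db[in_band])]
    if peak in (in_band[0], in_band[-1]):
        return False  # maximum lies on the band edge: a slope, not a peak
    threshold = ltas_db[peak] - drop_db
    lo = peak
    while lo > 0 and ltas_db[lo] > threshold:  # walk down in frequency
        lo -= 1
    hi = peak
    while hi < len(ltas_db) - 1 and ltas_db[hi] > threshold:  # walk up in frequency
        hi += 1
    if ltas_db[lo] > threshold or ltas_db[hi] > threshold:
        return False  # level never fell by drop_db on one side
    return bool(freqs[hi] - freqs[lo] < max_bw_hz)


# Synthetic LTAS shapes on a 10 Hz grid
freqs = np.arange(0.0, 5001.0, 10.0)
tilt = -0.005 * freqs  # hypothetical spectral tilt of -5 dB per kHz
narrow = tilt + 15 * np.exp(-((freqs - 2800) / 150) ** 2)
broad = tilt + 15 * np.exp(-((freqs - 2800) / 800) ** 2)
print(has_singers_formant(freqs, narrow), has_singers_formant(freqs, broad),
      has_singers_formant(freqs, tilt))
```

The narrow peak passes the rule, while the broad hump (about 800 Hz wide at -3 dB) and the plain spectral tilt do not. Because the rule looks at the shape of the spectrum around a peak rather than at total band energy, it behaves more like the pattern-based judgment described above than the relative energy measure does.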

In the current study, we also investigated other quantitative analyses (F0 and intensity) that were used in previous studies (Sundberg, 1973; Schutte & Miller, 1985; Seidner et al., 1985; Cleveland & Sundberg, 1985; Bloothooft & Plomp, 1985; Sengupta, 1990), and the results of the F0 analysis in our study were not consistent across the Chinese and Western singers. That is, the Western singers who were perceived to have a strong vocal ring had F0 values up to 392 Hz, whereas singers who were heard as “not sure” had F0 above this point. In contrast, the Chinese singers who were perceived to have a strong vocal ring had an F0 that exceeded 392 Hz. This suggests that different acoustic cues may signal the Fs in the two styles. Results from our intensity analysis, however, agreed with previous studies (Sundberg, 1973; Cleveland and Sundberg, 1985) and showed higher intensity ranges in all samples (both Western and Chinese groups) that were rated as having the Fs.

Another quantitative analysis, L3-L1 of the LTAS, was problematic in this study because of the overlap of F0 and F1. Because L3-L1 depends on the determination of F1, it is very important to find a way to differentiate F1 from F0 if this metric is to be used. It is interesting that previous investigators have neither noted this problem nor described how they overcame it, if it did occur. Furthermore, many previous studies investigated the Fs using short-term spectral analysis; however, this procedure was also not adequate in all cases. The problem with this procedure is that normal singing may not always include sustained vowels, so there may not be a sufficiently long steady-state segment for spectral analysis. One interesting finding during the investigation of L3-L1 was that many samples showed a higher L3 than L1 for both the short-term spectrum and the LTAS (see Tables 18-21). The reason for the relatively high L3 level is not clear, and a higher L3 than L1 has not been reported in the previous literature. Finally, our findings indicate that the Fs should not be determined only by measuring L3-L1, because different singers may use different mechanisms to generate the Fs; some of these factors may reflect the Fs acoustically but may not be heard perceptually. Using only one measurement domain may lead to incorrect conclusions about the Fs; therefore, the Fs should be investigated both acoustically and perceptually.

One factor that we found to be closely related to the Fs, and which has not been identified in previous studies, is the singing material. In our study, we compared different singing materials (the regular singing phrase and the phrase sung with the vowels /a/, /i/ and /u/) while controlling the musical phrase of each singer. Results of our categorical and quantitative analyses showed a significant difference between the singing materials. Previous studies generally used only one type of singing material to investigate the Fs. Some studies investigated one vowel and some investigated multiple vowels; some studies investigated sustained sung vowels and some used a musical phrase with a complex text. Perhaps the inconsistent findings across these studies are related to the different materials that were analyzed. For example, Schutte and Miller (1985) measured the short-term spectrum of the vowel /ɔ/ and found the level of the Fs was around -7 dB, whereas Sengupta (1990) found the level of the Fs was around -4 dB for the vowel /a/. Bloothooft and Plomp (1986) averaged across all vowels and found a level of about -20 dB for the Fs. Thus, data pooling and differing singing materials may have influenced the findings of these studies.

Another aspect that demonstrates the importance of the singing material for the Fs is the singers’ familiarity with the material. The Chinese singers had difficulty performing some of the tasks in full voice because these tasks are not typical of Chinese opera training. For example, they had difficulty singing the musical phrase on single vowels, and they also had difficulty gliding up and down the musical scale. As the results of the perceptual task showed, listeners’ judgments and their reliability also reflected the singers’ comfort with each task.
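
Listener reliability can be quantified in several ways, and this section does not restate the statistic used in this study. The sketch below is purely an illustration. It assumes that a listener rated the same samples twice on the three-point scale of Appendix C (1 = yes, 2 = not sure, 3 = no). It then computes exact agreement and Cohen’s kappa between the two presentations; kappa is a standard chance-corrected agreement statistic. Two things are assumptions here: that the two rating blocks in Appendix C contained repeated samples, and the choice of kappa itself. The example ratings are hypothetical.

```python
from collections import Counter

def percent_agreement(first, second):
    """Share of items given the same rating in both presentations."""
    if len(first) != len(second) or not first:
        raise ValueError("rating lists must be non-empty and equal in length")
    return sum(a == b for a, b in zip(first, second)) / len(first)

def cohens_kappa(first, second):
    """Cohen's kappa: observed agreement corrected for the agreement expected
    by chance from each presentation's category frequencies."""
    n = len(first)
    p_o = percent_agreement(first, second)
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both presentations used one identical category throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

Consider the hypothetical ratings [1, 1, 2, 3, 3, 2] and [1, 2, 2, 3, 3, 3]. Exact agreement is 4/6 and kappa is 0.5. Kappa discounts the matches that would be expected by chance, given how often each category was used.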

Data from the acoustic analysis further support the hypothesis that singing materials influence the presence of the Fs. The 10 Western singers sang their phrases in three languages: 5 sang in Italian, 1 in English, and 4 in German (Appendix B). Most of the singers who had the Fs sang the Italian and English repertoire (5 out of 6), and none of the singers who sang the German repertoire had a strong vocal ring. This suggests that language, singing style, and repertoire may affect the Fs. For example, most of the Italian pieces sung by these singers were chosen from operas, which require singers to overcome a loud orchestra. Most of the German pieces selected for this study, by contrast, were Lieder requiring only piano accompaniment, so the Fs might not be necessary in this kind of repertoire. The effects of different repertoires and languages across singing styles should be investigated in future studies of the Fs.

In most previous studies of the Fs, perceptual judgments were made by a single listener (generally the researcher), and often only one singer was studied. Therefore, the reported acoustic characteristics of the Fs may be questioned. For example, Schutte and Miller (1985) investigated one Western male singer and suggested that the bandwidth of the Fs was constant over the vocal range up to 440 Hz for the vowel /ɔ/. Our findings do not confirm this, because bandwidths were inconsistent across the sung phrases (the regular singing phrase and the phrases sung on the vowels /a/, /i/, and /u/). Seidner et al. (1985) found that a tenor had a broader Fs bandwidth than a baritone and a bass; however, only one singer was investigated in each vocal category. In our study, two vocal categories from the Western group were investigated, tenor and baritone (5 singers each). The bandwidth was not consistent within either category. These different findings may relate to the number of subjects used in the studies.

What is the impact of the independent and combined factors on the Fs, and how do these factors differ between Chinese and Western singers?

Previous studies identified many factors that affect the Fs in Western classical singers. These factors include both phonatory and vocal tract adjustments that emphasize higher frequencies during singing. Results of our study support the hypothesis that many different phonatory (e.g., F0 and intensity) and articulatory (e.g., phonetic content) alterations can generate the Fs. Previous studies have also shown that factors such as F0, intensity, and vowel configuration can be adjusted to generate the Fs (Sundberg, 1970; Bloothooft & Plomp, 1984, 1985, 1986; Schutte & Miller, 1985; Seidner et al., 1985; Cleveland & Sundberg, 1985). Physiological studies show that when a Western opera singer lowers the larynx, the pharynx is also expanded. As a result, the cross-sectional area of the pharynx becomes six times that of the epilarynx tube. These physiological adjustments allow the epilarynx tube to act as a separate resonator that generates the Fs (Sundberg, 1973, 1974; Titze & Story, 1997). Other studies of Western opera singers indicate that the magnitude of the Fs increases with F0 until F0 exceeds 392 Hz, and decreases above this cutoff.
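
The area ratio matters because an abrupt widening from a narrow tube into a much wider one behaves acoustically somewhat like an open end. The sketch below is a back-of-the-envelope estimate only. It idealizes the epilarynx tube as a uniform tube closed at the glottis and open into the pharynx, so that its lowest resonance is the quarter-wave value f = c/(4L). The tube length of 3 cm and the speed of sound of 35,000 cm/s are illustrative assumptions, not measurements from this study. The function names are hypothetical, and real vocal tracts are not uniform tubes.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air in the vocal tract

def quarter_wave_resonance_hz(length_cm, n=1, c=SPEED_OF_SOUND_CM_S):
    """n-th resonance of a uniform tube closed at one end and open at the other:
    f_n = (2n - 1) * c / (4 * L)."""
    if length_cm <= 0 or n < 1:
        raise ValueError("length must be positive and n must be at least 1")
    return (2 * n - 1) * c / (4.0 * length_cm)

def area_ratio_at_least(pharynx_area_cm2, epilarynx_area_cm2, min_ratio=6.0):
    """True when the pharynx-to-epilarynx cross-sectional area ratio reaches
    min_ratio (default: the six-fold figure cited in the text)."""
    return pharynx_area_cm2 / epilarynx_area_cm2 >= min_ratio
```

With these assumptions, `quarter_wave_resonance_hz(3.0)` gives about 2,917 Hz, which falls within the 2-4 kHz region in which the Fs is generally measured. A shorter tube would move the estimate upward.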

Results of the current study show that the Chinese singers who had the Fs sang in a higher F0 range than singers who were not perceived to have a strong vocal ring. These results are at odds with studies of Western singers (Bloothooft & Plomp, 1985; Schutte & Miller, 1985; Seidner et al., 1985). We therefore suspect that traditional Chinese opera singers may manipulate the vocal tract differently from Western opera singers to generate the Fs. The higher F0 associated with the Fs in the Chinese group suggests a higher larynx position than in singers without the Fs, because laryngeal elevation commonly occurs at higher F0 (Sundberg, 1977; Titze & Story, 1997). Only one study (Wang, 1985) has undertaken a physiological investigation of Chinese opera singers. That study demonstrated that Chinese opera singers do have an elevated larynx position when they produce the Fs. This contrasts with Western classically trained singers, who typically lower the larynx to produce the Fs (Sundberg, 1974). Chinese singers may therefore use other vocal tract configurations to generate the Fs without lowering the larynx. However, physiological data on the Fs have generally been collected only from Western classically trained singers. Additional physiological data from different singing cultures are needed to determine the possible articulatory and phonatory mechanisms, and their interactions, in the production of the Fs.

Carlsson and Sundberg (1992) studied vocal tract tuning and suggested that singers tune their two lowest formant frequencies to harmonic partials in order to enhance the overall radiated sound level. They also indicated that in high-pitched singing, when F0 is higher than the first formant, sopranos adjust their vocal tract to raise the first formant to a frequency just above the fundamental. According to Carlsson and Sundberg, this strategy increases the sound level significantly but does not generate the Fs. The strategies that the Chinese singers in our study applied seem comparable to those of Western sopranos (higher F0 with a raised larynx). If traditional Chinese opera singing were similar to Western soprano singing, the Fs would not be expected in these singers; however, some of the Chinese singers in this study produced the Fs. This is consistent with Wang’s (1985) study, which showed that the Fs can be generated without a lowered larynx. We further hypothesize that other non-Western singing techniques can generate the Fs by using different vocal tract configurations. Our results for the high F0 range are consistent with this hypothesis: the Chinese singers may generate the Fs by using different vocal tract configurations to compensate for the raised larynx position.
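
The sketch below makes the idea of tuning a formant to a harmonic partial concrete. It finds the harmonic of F0 nearest to a given formant frequency and expresses the remaining mistuning in cents. It illustrates the concept only and is not Carlsson and Sundberg’s (1992) analysis procedure. The function names and example frequencies are hypothetical.

```python
import math

def nearest_harmonic(f0_hz, formant_hz):
    """Harmonic number and frequency of the partial of F0 closest to the formant."""
    n = max(1, round(formant_hz / f0_hz))
    return n, n * f0_hz

def mistuning_cents(f0_hz, formant_hz):
    """Signed distance in cents from the nearest harmonic partial to the formant
    (positive when the formant lies above that partial)."""
    _, partial = nearest_harmonic(f0_hz, formant_hz)
    return 1200.0 * math.log2(formant_hz / partial)
```

For F0 = 220 Hz and F1 = 700 Hz, the nearest partial is the third harmonic (660 Hz), about 102 cents below the formant. The soprano strategy described above can be modeled with F0 = 880 Hz and F1 raised to 900 Hz. This places the formant about 39 cents above the first harmonic.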

Another interesting comparison is between singers W6 and W7, who performed the same repertoire with the same F0 and almost the same intensity levels. However, singer W6 exhibited no Fs, whereas singer W7 did. We hypothesize that singer W7 used a different vocal tract configuration from W6. Therefore, there appears to be an interaction between vocal tract configuration (perhaps reflecting operatic skill) and the material that is sung.

Moreover, some singers generated the Fs better on certain vowels, and other singers on other vowels. Interestingly, many singers who exhibited the Fs in the regular singing phrase sang passages containing more of the vowels that have been shown to generate the Fs. Perhaps these singers know, at some level, which vowels benefit or detract from their vocal quality and choose their repertoire accordingly. In addition, many singing samples were rated “not sure,” because listeners sometimes heard the vocal ring and sometimes did not. This category may have been used more often because listeners heard the Fs only over certain parts of the musical phrase. This is consistent with our finding that different factors and singing materials signal the Fs depending on the individual singer. Future studies should identify common singing tasks with which all subjects are familiar, so that vocal quality can be compared in the absence of secondary influences.

Conclusion

We conclude that Western classical training is not the only singing technique that can produce the Fs. The Fs was found in traditional Chinese opera singing; however, this high-frequency energy seems to differ somewhat from what has been described for Western singers. This difference may arise because the two singing styles rely on different vocal tract configurations. Our findings show that perceptual judgments are necessary to investigate the presence or absence of the Fs. However, the differences and similarities between the Fs and the vocal ring still need to be clarified in future studies. The initial goal of this study was to compare listeners’ perceptual judgments with acoustic measures. We suggest that the categorical analysis of the LTAS reflects perceptual judgments of the Fs well. Quantitative analyses might not be appropriate tools for determining the presence of the Fs, but they may provide insight into the mechanisms that generate it. All of the factors examined (singing technique, F0, intensity, vowel quality, and singing material) affected the Fs. These factors either interacted or traded off to signal the Fs in individual singers. Further investigation is needed to determine how the Fs is generated in different singing techniques. Finally, the primary importance of this study lies in a better understanding of the possible vocal tract and laryngeal actions that affect the production of the Fs. That is, in vocal pedagogy, telling singers what they should do physiologically (e.g., lower the larynx) may not be the best method. It may be more appropriate to help singers recognize and develop the factors that most help them generate the Fs.
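
A simple agreement count can illustrate the conclusion that the categorical LTAS analysis reflects perceptual judgments. The sketch below assumes each sample already has a categorical LTAS label ("present" or "absent"). For each sample, it takes the most frequent rating across listeners on the Appendix C scale. It then reports the proportion of matches, setting aside samples whose modal rating is “not sure” and samples with tied ratings. The label strings, sample names, and tie rule are hypothetical choices for illustration, not the scoring procedure of this study.

```python
from collections import Counter

# Appendix C scale: 1 = yes, 2 = not sure, 3 = no (label strings are hypothetical)
RATING_LABELS = {1: "present", 2: "not sure", 3: "absent"}

def modal_label(ratings):
    """Most frequent rating across listeners; a tie for the top count is treated as 'not sure'."""
    if not ratings:
        raise ValueError("no ratings given")
    counts = Counter(ratings).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "not sure"
    return RATING_LABELS[counts[0][0]]

def agreement_with_ltas(listener_ratings, ltas_labels):
    """Proportion of samples whose modal perceptual label matches the categorical
    LTAS label ('present' or 'absent'); 'not sure' samples are skipped."""
    matched = total = 0
    for sample, ratings in listener_ratings.items():
        label = modal_label(ratings)
        if label == "not sure":
            continue
        total += 1
        matched += label == ltas_labels[sample]
    return matched / total if total else float("nan")
```

In the hypothetical data used to check this sketch, one sample matches, one does not, and two are set aside (one with a “not sure” mode and one with a tie), giving an agreement of 0.5. Excluding “not sure” samples is one design choice; treating them as a third category would be another.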

Appendix A

Questionnaire

What is your age? ______________

How many years of voice training do you have? ______________________

How many years of voice training did you have before and after the age of 18?

Before 18 __________, after 18 _______________.

How would you describe the type of training that you had?

Western: (a) Operatic __________, lyric ______________

(b) Italian ________, French __________, German __________

Chinese: Lao Sheng __________, Jing _________

How many years have you been performing? __________________

What is your voice type? ________________

What is your voice range? Lowest note ________________ highest note ____________

What focus do you believe to be most important to project your voice?

Nasality? __________ Oral cavity? _________ Throat area? ________

Do you round your lips while singing vowels? ______________ or do you retract your

lips while singing vowels? _____________

Notes for sound level meter:

Prolonged vowels: comfortable pitch and loudness: _________________________

Sustained singing vowels: as loud as possible: ___________________________

Given pitch: B3 ____________ C3 __________ C4 _________ D4 _________

G3 (or G4) __________ E3 (or E4) ___________

1. Reading phrase 3 times: __________________

2. Regular singing phrase: __________________

3. Singing phrase sung with the vowel /a/: ____________________

4. Singing phrase sung with the vowel /i/: ____________________

5. Singing phrase sung with the vowel /u/: ____________________

Appendix B

Singer              W1        W2        W3        W4        W5
Voice type          Tenor     Baritone  Tenor     Baritone  Baritone
Repertoire (style)  Lyric     Lieder    Operatic  Lieder    Lyric
Language            Italian   German    Italian   German    English

Singer              W6        W7        W8        W9        W10
Voice type          Tenor     Tenor     Baritone  Tenor     Baritone
Repertoire (style)  Operatic  Operatic  Lieder    Operatic  Lieder
Language            Italian   Italian   German    Italian   German

Appendix C: Rating sheet

Perceptual rating sheet for “vocal ring”

1= Yes, definitely a vocal ring

2= Not sure, or sometimes yes, sometimes no.

3= No, definitely no vocal ring at all

First Block comments

1. 1 2 3 ______________________________

2. 1 2 3 ______________________________

3. 1 2 3 ______________________________

4. 1 2 3 _______________________________

5. 1 2 3 _______________________________

6. 1 2 3 ________________________________

7. 1 2 3 _______________________________

8. 1 2 3 _______________________________

9. 1 2 3 ________________________________

10. 1 2 3 ________________________________

11. 1 2 3 ________________________________

12. 1 2 3 ________________________________

13. 1 2 3 ________________________________

14. 1 2 3 ________________________________

15. 1 2 3 _________________________________

16. 1 2 3 ________________________________

17. 1 2 3 _________________________________

18. 1 2 3 _________________________________

19. 1 2 3 _________________________________

20. 1 2 3 ________________________________

21. 1 2 3 ________________________________

22. 1 2 3 ________________________________

1= Yes, definitely a vocal ring

2= Not sure, or sometimes yes, sometimes no.

3= No, definitely no vocal ring at all

Second Block comments

1. 1 2 3 ______________________________

2. 1 2 3 ______________________________

3. 1 2 3 ______________________________

4. 1 2 3 _______________________________

5. 1 2 3 _______________________________

6. 1 2 3 ________________________________

7. 1 2 3 _______________________________

8. 1 2 3 _______________________________

9. 1 2 3 ________________________________

10. 1 2 3 ________________________________

11. 1 2 3 ________________________________

12. 1 2 3 ________________________________

13. 1 2 3 ________________________________

14. 1 2 3 ________________________________

15. 1 2 3 _________________________________

16. 1 2 3 ________________________________

17. 1 2 3 _________________________________

18. 1 2 3 _________________________________

19. 1 2 3 _________________________________

20. 1 2 3 ________________________________

21. 1 2 3 ________________________________

22. 1 2 3 ________________________________

References

Bartholomew W.T. (1934). A physical definition of ‘good voice quality’ in the male voice. J Acoust Soc Am., 1, 24-33.

Bloothooft G & Plomp R. (1984). Spectral analysis of sung vowels. I. Variation due to

differences between vowels, singers and modes of singing. J Acoust Soc Am., 75(4),

1259-1264.

Bloothooft G & Plomp R. (1985). Spectral analysis of sung vowels. II. The effect of

fundamental frequency on vowel spectra. J Acoust Soc Am., 77(4), 1580-1588.

Bloothooft G & Plomp R. (1986). The sound level of the singer’s formant in

professional singing. J Acoust Soc Am., 79(6), 2029-2033.

Carlsson G & Sundberg J. (1992). Formant frequency tuning in singing. J Voice, 6 (3),

256-260.

China-guide (2003). Beijing opera, from China-Guide Web site http://www.china-travel-tour-guide.com/about-china/beijing-opera.shtml

Chinavoc. (2002). Chinese traditional opera: Beijing opera- roles in Chinese opera, from

Chinavoc.com Web site http://www.chinavoc.com/magicn/yzaj.asp

Cleveland T.F. & Sundberg J. (1985). Acoustic analysis of three male voices of different

quality, SMAC83 Conference Proceedings, Stockholm, 1, 143-156.

Cleveland T.F., Sundberg J., & Stone R.E. (2001). Long-term-average spectrum

characteristics of country singers during speaking and singing. J Voice, 15, 54-60

Detweiler RF. (1994). An investigation of the laryngeal system as the resonance source

of the singer’s formant. J Voice, 8(4), 303-313.


Dmitriev L & Kiselev A. (1979). Relationship between the formant structure of different types of singing voices and the dimension of supraglottal cavities. Folia Phoniatr., 31, 238-241.

Fant G. (1960). Acoustic Theory of Speech Production. The Hague: Mouton.

Ferguson S., & Kewley-Port D. (2002). Vowel intelligibility in clear and conversational

speech for normal hearing and hearing-impaired listeners. J. Acoust. Soc. Am., 112 (1),

259-271.

Ferguson, S., & Kewley-Port, D. (2008). Talker differences in clear and conversational speech:

Acoustic characteristics of vowels. J. Speech-Language-Hearing Res., 50, 1241-1255.

Hines J. (1990). Great singers on great singing. New York: 5th Limelight Edition.

Hillenbrand, J., Getty, L.J., Clark, M.J., & Wheeler, K. (1995). Acoustic characteristics

of American English vowels. J. Acoust. Soc. Am., 97, 3099-3111.

Hollien, H, & Shipp, T. (1972). Speaking fundamental frequency and chronologic age in males.

Journal of Speech and Hearing Research, 15(1), 155-159.

Hsu Lu. (Personal communication, Sept. 20, 2001).

Hsu Y.L. (1992). A Comparison of the Vocal Techniques in Peking Opera and Bel Canto

Opera. MI, USA: UMI.

Kent R.D., & Read C. (1992). The acoustic analysis of speech. San Diego, CA: Singular Publishing Group.

Löfqvist A., & Mandersson B. (1987). Long-time average spectrum of speech and voice analysis. Folia Phoniatr., 39, 221-229.

Liberman A.M., Harris K.S., Hoffman H.S., & Griffith B.C. (1954). The discrimination of

speech sounds within and across phoneme boundaries. Journal of Experimental

Psychology, 54, 179-188.

Leino T. (1994). Long-term average spectrum study on speaking voice quality in male actors. In

A Friberg, J Iwarsson, E Jansson & J Sundberg, Eds. SMAC 93 (Proceedings of the

Stockholm Music Acoustics Conference 1993), Stockholm: Publication Nr 79 issued by

the Royal Swedish Academy of Music, 206-210.

Lundy D.S, Roy S, Casiano R.R, Xue J.W., & Evans J. (2000). Acoustic analysis of the

singing and speaking voice in singing students. J Voice, 14 (4), 490-493.

Mendoza E, Munoz J, Naranjo N.V. (1996). The long- term average spectrum as a

measure of voice stability. Folia Phoniatr Logop., 48, 57-64.

Nawka T, Anders LC, Cebulla M., & Zurakowski D. (1997). The speaker's formant in

male voices. J Voice, 11, 422-428.

Oliveira-Barrichelo V.M., Heuer R.J., Dean C.M., & Sataloff R.T., (2001). Comparison

of singer’s formant, speaker’s ring, and LTA spectrum among classical singers and

untrained normal speakers. J Voice, 15 (3), 344-350.

Omori K, Kacker A, Carroll L.M, Riley W.D., & Blaugrund SM. (1996). Singing power ratio:
Quantitative evaluation of singing voice quality. J Voice, 10(3), 228-235.

Picheny MA., Durlach NI., & Braida LD. (1986). Speaking clearly for the hard of hearing II:

Acoustic characteristics of clear and conversational speech. Journal of Speech

and Hearing Research, 29 (4), 434-46.

Riesz R.R. (1928). Differential intensity sensitivity of the ear for pure tones. Phys. Rev., 31, 867-875.

Roederer JG. (1972). Introduction to the physics and psychophysics of music. London,

English Universities Press; New York.

Ross J. (1992). Formant frequencies in Estonian folk singing. J. Acoust. Soc. Am.,

91(6), 3532-3539.

Rossing TD, Sundberg J., & Ternström S. (1986). Acoustic comparison of voice use in

solo and choir singing. J Acoust Soc Am., 79(6): 1975-1981.

Sataloff R.T. (1998). Vocal health and pedagogy. Singular publishing Group, Inc. San

Diego, CA.

Schutte HK, Miller R. (1985). Intra-individual parameters of the singer’s formant. Folia

Phoniatr., 37, 31-35.

Sengupta R. (1990). Study on some aspects of the “Singer’s Formant” in north Indian

classical singing. J Voice, 4 (2), 129-134.

Seidner W, Schutte H.K., Wendler J., & Rauhut A. (1985). Dependence of the high

singing formant on pitch and vowel in different voice types. SMAC83 Conference

Proceedings, Stockholm, 1, 261-268.

Shower EG., & Biddulph R. (1931). Differential pitch sensitivity of the ear. J Acoust

Soc Am., 275-287.

Stone RE Jr, Cleveland TF., & Sundberg J. (1999). Formant frequency in country

singer’s speech and singing. J Voice, 13(2),161-167.

Su WH & Forrest K.M. (2002, June). An acoustic study of the singer’s formant: The comparison between Western classical and traditional Chinese opera techniques. Paper presented at the 31st Annual Symposium of the Voice Foundation on the Care of the Professional Voice, Philadelphia, PA.

Su WH & Rademacher J. (2000, July). An acoustic study of vowels produced by

female opera singers: Cultural and stylistic differences. Paper presented at the 29th

Annual Symposium of the Voice Foundation on the Care of the Professional Voice,

Philadelphia, PA.

Sundberg J. (1970). Formant structure and articulation of spoken and sung vowels.

Folia Phoniatr., 22, 28-48.

Sundberg J. (1973). The source spectrum in professional singing. Folia Phoniatr.,

25, 71-90.

Sundberg J. (1974). Articulatory interpretation of the ‘singer’s formant’. J Acoust Soc

Am., 55, 838-844.

Sundberg J. (1977). The acoustics of the singing voice. Scientific American, 236(3), 82-4,

86, 88-91.

Sundberg J. (1987). The science of the singing voice. DeKalb: Northern Illinois

University Press.

Sundberg J. (1995). The singer’s formant revisited. J Voice, 4, 106-119.

Sundberg J. (2001). Level and center frequency of the singer’s formant. J Voice, 15 (2),

176-86.

Sundberg J. (2002, Oct). My research on the singing voice from a rear-view-mirror

perspective. The First International Conference on Physiology and Acoustics of Singing.

Ternström S., & Sundberg J. (1989). Formant frequencies of choir singers. J Acoust Soc Am., 86 (2), 517-522.

Titze I.R., & Story B.H. (1997). Acoustic interactions of the voice source with the lower

vocal tract. J Acoust Soc Am., 101(4), 2234-2243.

Vennard W. (1967). Singing: the Mechanism and the Technique. New York: Carl

Fischer Inc.

Wang S. (1985). Singing voice: Bright timbre, singer’s formants and larynx positions.

SMAC83 Conference Proceedings, Stockholm, 1, 313-322.

Weiss R., Brown W.S. Jr., & Morris J. (2001). Singer’s formant in sopranos: Fact or

fiction? J Voice, 15 (4), 457-468.

Wen-Hui Su

Address: 364 S. Prospectors Rd. Unit 134


Diamond Bar, CA 91765
Phone number: (909) 860-1230 (H)
(626) 487-9809 (C)
E-mail address: [email protected]

Education

Ph.D. (expected April 2009) Indiana University, Department of Speech and Hearing Sciences
Major: Professional Voice and Voice Disorders
Minor: Music/Voice Performance and Pedagogy
M.M - May 1996 Indiana University, School of Music
B.M - Dec. 1994 Indiana University, School of Music

Teaching

Private voice counselor and teacher, LA, CA: 2004-2009

Speech Science: Instrumentation and Applications: School of Speech, Language, and Hearing
Sciences, San Diego State University: 2004

Voice counselor at The Cross School of Music, LA, CA: 2004-2006

Associate instructor for a doctoral seminar: “Acoustic Research in Speech, Language and
Hearing Sciences.” Indiana University: 2002

Guest lecturer on “Care of the Professional Voice” at National Taiwan Traditional Chinese
Opera Department in Taipei, Taiwan: 2002.

Teaching assistant for “Videostroboscopy” related to voice and voice disorders. Indiana
University, Speech and Hearing Department: 2000-2002.

Assistant instructor for “Voice Fluency in Children and Adolescents” course. Indiana University,
Speech and Hearing Department: 2000-2001.

Assistant instructor for “Voice Disorders” course. Indiana University, Speech and Hearing
Department: 1999.

Guest lecturer on “Care of the Professional Voice” at Voice Clinic at Veterans General Hospital:
Taipei, Taiwan: 1999.

Publications, Research, Presentations, and Editorial board

Co-Editor for:
Journal of Speech and Hearing Review (2004).

Moya Andrews and Wen-Hui Su (2004). Voice treatment for Children. Journal of Speech and
Hearing Review.

Wen-Hui Su and Karen Forrest (2003, June). The Influence of Training Technique on the
Singer’s Formant. Paper presented at the 32nd Annual Symposium of the Voice Foundation: Care
of the Professional Voice, Philadelphia, PA.

Hiroya Yamaguchi and Wen-Hui Su (2003, June). Perceptual Evaluations of Voice Samples
Using the GRBAS Scale: Comparison of Listeners from Taiwan and the U.S.A. Paper presented
at the 32nd Annual Symposium of the Voice Foundation: Care of the Professional Voice,
Philadelphia, PA.

Wen-Hui Su & Karen Forrest (2002, June). An Acoustic Study of the Singer’s Formant: The
Comparison Between Western Classical and Traditional Chinese Opera Techniques. Paper
presented at the 31st Annual Symposium of the Voice Foundation: Care of the Professional
Voice, Philadelphia, PA.

Wen-Hui Su, Hiroya Yamaguchi, & Moya Andrews (2001, December). GRBAS Rating by
American and Japanese Listeners. Paper presented at the 3rd East Asian Conference on
Phonosurgery, Taipei, Taiwan.

Wen-Hui Su (2001) An Acoustic Study of the Formant Structure of Voices: Male and Female
Chinese Opera Singers. Doctoral second year project.

Wen-Hui Su & Julia Wood Rademacher (2000, July). A Comparison of the Stability of Basal
pitch measures in young adult singers. Paper presented at the 29th Annual Symposium of the
Voice Foundation: Care of the Professional Voice, Philadelphia, PA.

Wen-Hui Su & Julia Wood Rademacher (2000, July). An Acoustic Study of Vowels Produced by
Female Opera Singers: Cultural and Stylistic Differences. Paper presented at the 29th Annual
Symposium of the Voice Foundation: Care of the Professional Voice, Philadelphia, PA.

Wen-Hui Su & Rahul Shrivastav (1999, June). Relationship Between Conversational Range,
Habitual Pitch and Total Musical Range in Trained Singers. Paper presented at the 28th Annual
Symposium of the Voice Foundation: Care of the Professional Voice, Philadelphia, PA.

Honors and Awards

Research grant for Ph.D. dissertation from University of Art and Science. Indiana University:
2002

Bernice Eastwood Covalt Memorial Scholarship: 1999

Travel Grant from the Department of Speech and Hearing Sciences, Indiana University, to attend
the “28th Annual Symposium on the Care of the Professional Voice” in Philadelphia, June 1999

Chi-Mai Culture Foundation Scholarship. Taiwan: 1995-96

First prize from 7th Chi-Mai Culture Foundation Voice Competition. Taiwan: 1995

International Student Award. Indiana University: 1995

First prize from Hsin-Chu City Voice Competition. Taiwan: 1989

Workshops attended

Professional voice/singing workshops. The 32nd Annual Symposium of the Voice Foundation on
the Care of the Professional Voice, Philadelphia, PA, June 2003.

Fitzmaurice voice/singing workshops. New York: Dec 2002.

Throat Singing Workshop. Workshop by B. Odsuren and B. Battuvshin: Indiana University, Bloomington. 2002.

Professional voice/singing workshops. The 31st Annual Symposium of the Voice Foundation on
the Care of the Professional Voice, Philadelphia, PA, June 2002.

Professional voice/singing workshops. The 29th Annual Symposium of the Voice Foundation on
the Care of the Professional Voice, Philadelphia, PA, July 2000.

Professional voice/singing workshops. The 28th Annual Symposium of the Voice Foundation on
the Care of the Professional Voice, Philadelphia, PA, June 1999.
