An Acoustic Study of the Singer's Formant: The Comparison Between Western Classical and Traditional Chinese Opera Singing Techniques
Wen-Hui Su
Submitted to the faculty of the University Graduate School in partial fulfillment of the
requirements for the degree Doctor of Philosophy in the Department of Speech and Hearing
Sciences, Indiana University.
April 2009
UMI Number: 3354922
Accepted by the Graduate Faculty, Indiana University, in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.
_____________________________
Karen Forrest, Ph.D.
(Chairperson)
_____________________________
Moya L. Andrews, Ed.D.
Doctoral Committee
_____________________________
Diane Kewley-Port, Ph.D.
_____________________________
Gary Kidd, Ph.D.
_____________________________
Paul Kiesgen, M.M.
Acknowledgements
I would like to thank all people who have helped and inspired me during my doctoral
study. My dissertation would not have been successful without the support and assistance of
these people.
I especially want to thank my dissertation advisor, Dr. Karen Forrest, for her guidance
during my research and study at Indiana University. With her enthusiasm, her inspiration, and
her great efforts to explain things clearly and simply, research life became smooth and rewarding
for me. In addition, she was always accessible and willing to help her students with their
research. She provided encouragement, advice, good teaching, good company, and lots of good
ideas to students. Dr. Forrest not only is a great advisor, she is also a great friend who deeply
I also would like to thank my doctoral mentor, Dr. Moya Andrews, who brought me into
this field and completely changed my life. She supported and helped me in every possible aspect
through my doctoral study. She provided advice both academically and personally and loved me
as her own daughter. Without her strength, I would not have completed my degree. I am truly
grateful for all the love, help and support that she has given me.
A special thanks to my dissertation committee members, Dr. Moya Andrews, Dr. Diane
Kewley-Port and Dr. Gary Kidd, who provided me with great information, knowledge and advice for
my research. David Montgomery spent countless hours providing the technical support that is
I am grateful to have many friends who supported me one hundred percent through this
degree. Dr. Hiroya Yamaguchi and Mrs. Hiroko Yamaguchi became close friends with me. They
treated and cared for me like a member of their own family. Chang Liu is an unbelievable friend and
colleague who helped and supported me through countless obstacles during my doctoral study
and continues to be my close friend through life. Stan Stockton and Michael Johnson always
showed their support and made me feel warm and welcomed. Lin Lee, Philip Feng, Susan Chiu,
Fred Chen, Kwan-Jun Tyan and Monica Chung are lifetime friends who have and will always be
Lastly, and most importantly, I wish to thank my parents, Ching-Shen Su and Kuei-Chen
Lee. They raised me, supported me, taught me, loved me and gave me valuable advice. To them,
Abstract
Wen-Hui Su
An Acoustic Study of the Singer’s Formant: The Comparison Between Western Classical and Traditional Chinese Opera Singing Techniques
The singer’s formant (Fs) is a prominent spectrum envelope peak near 3000 Hz that
appears in the voices of trained Western classical singers. It is a raised cluster of formants 3,
4, and 5 and is especially important because this energy allows singers’ voices to be heard over a
loud orchestra in a large concert hall or opera house (Fant, 1970; Sundberg, 1970).
Over the years, numerous studies have investigated the Fs using many different
methodologies. Taken together, these studies give a good picture of the influences
that affect the production of a Fs; that is, the Fs cannot be explained by the influence
of one factor alone. This study investigated the Fs by comparing two completely different
training techniques: traditional Chinese opera singing techniques and Western classical
singing techniques. Different methodologies were compared to examine whether they impacted
the Fs. The effects of different factors on the Fs such as fundamental frequency, intensity and
vowel quality were also investigated in this study.
Our findings showed that not only Western classically trained singing but also traditional
Chinese opera singing exhibited the Fs. The perceptual judgments and qualitative analysis
of the long-term average spectrum (LTAS) seemed to be reliable tools for investigating the presence or absence of the Fs. Other
acoustic measures, such as the quantitative analyses (the energy differences between the high and
low frequency regions of the LTAS, the L3-L1 of the LTAS analysis, and the L3-L1 of the short-term
spectrum analysis), might not be sufficient tools to determine the Fs. Factors such as singing
technique, fundamental frequency, intensity, and vowel quality either interacted or traded off to
signal the Fs; they interacted differently for each individual singer.
Table of Contents
Chapter I: Introduction
Mechanism of Fs production
Discussion
Discussion
Conclusion
References
Chapter I: Introduction
The singer’s formant (Fs) is defined as a prominent, spectrum-envelope peak near 2800
Hz that appears in the singing of certain voice types of Western classically trained singers. The
Fs is a raised cluster of formants 3, 4, and 5 at an optimal frequency that allows singers’ voices
to be heard over the highest sound level of an orchestra in a big concert hall or opera theater
(Bartholomew, 1934; Sundberg, 1974; Sundberg, 1977). The Fs is the perceptual equivalent of
the “vocal ring” (Bartholomew, 1934; Vennard, 1967). The Fs occurs when an optimal frequency
in the voice is enhanced by the properties of the vocal tract. These properties include lengthening
the singer’s vocal tract by protruding the lips, lowering the larynx and expanding the pharynx
methodologies. In early studies, most investigators focused on the definition of the Fs and
measured the Fs based on single vowels produced by classically trained singers, especially males
(Bartholomew, 1934; Fant, 1970; Sundberg, 1970; Sundberg, 1973; Cleveland & Sundberg,
1985; Schutte & Miller, 1985; Sundberg, 1995). After this initial period of investigation,
researchers turned their attention to the magnitude of the Fs and what factors affected the Fs
(Sundberg, 1970; Bloothooft & Plomp, 1984, 1985, 1986; Schutte & Miller, 1985; Seidner,
Schutte, Wendler & Rauhut, 1985; Cleveland & Sundberg, 1985; Wang, 1985; Rossing,
Sundberg & Ternström, 1986; Ternström & Sundberg, 1989; Sengupta, 1990; Ross, 1992;
Barrichelo, Heuer, Dean & Sataloff, 2000; Sundberg, 2001; Weiss, Brown & Morris, 2001;
Cleveland, Sundberg, & Stone, 2001). Singing and speaking phrases produced by both untrained
and trained singers were used as samples. Throughout this body of work, researchers identified
many factors as possibly affecting the Fs: The most noteworthy among these variables are vocal
training, voice type, fundamental frequency (F0), intensity, and vowel configuration. Several
other studies have focused on clarifying the measurement of the level of the Fs and its center
frequency region and bandwidth (Bloothooft & Plomp, 1984, 1985, 1986; Schutte & Miller,
1985; Seidner, Schutte, Wendler & Rauhut, 1985; Sengupta, 1990; Sundberg, 2001) while others
have focused on quantitatively calculating the singing power ratio (Omori, Kacker, Carroll,
Riley, & Blaugrund, 1996; Lundy, Roy, Casiano, Xue, & Evans, 2000).
There are many unanswered questions regarding the Fs including the following. What is
the operational definition of a Fs? To some investigators, the Fs is related to a specific vocal tract
configuration that generates precise resonances (Sundberg, 2001), whereas other researchers
have focused on the listeners’ perception to define the Fs (Wang, 1985; Omori et al., 1996). In the
present document, the Fs relates to the resonant features in the voices of highly trained singers.
Ideally, there would be some equivalence between the perceptual, acoustic and physiological
definitions, but this relationship remains largely unexplored. Therefore, one must ask whether
there are any quantitative criteria to determine the Fs. Over the years, research studies mainly
have focused on Western classically trained singers to evaluate the Fs. Do trained singers of
Previous researchers used different methodologies to determine the Fs and often got
different results. Therefore, the relation between methodologies and the Fs needs to be
investigated. Finally, because previous studies used very few subjects, one must question how
well those results may generalize to the relevant population. The goals of the current research are
to investigate different methodologies and factors that may affect the Fs. The main question to be
addressed in this study is: Does the Fs exist in traditional Chinese opera as well as in Western
classically trained singing? Previously identified factors including vowel quality, fundamental
frequency and intensity of Chinese and Western classically trained singing were measured and
Secondary questions that were closely related to the main purpose of this study were also
investigated: (1) How is the Fs perceived by trained listeners? (2) Do analysis procedures and
singing materials impact the measurement of the Fs? (3) What is the impact of the independent
and combined acoustic parameters on the Fs and how do these acoustic parameters differ for
Chapter II: Literature Review
issue that has been investigated by many researchers. Bartholomew (1934) defined a good voice
quality for the male voice as the combination of smooth and even control of pitch, intensity and
timbre. A person who can produce a good voice has the ability to produce a higher intensity than
a person who has a poor voice. Bartholomew speculated that, in order to increase the intensity of
a good quality voice, a wide opening of the throat, more vigorous action of the folds, a greater
space between the tongue and the lower pharynx, and a tensing of the pharyngeal walls are
required. Bartholomew further suggested that a good voice quality for males has a F0 of 500 Hz
or lower. This good quality of singing voice is called the “ring,” and this “ring” is characterized
In the past 30 years, Sundberg has conducted many studies based on the Western,
classically-trained singing voice and introduced the term “singer’s formant” (Fs) to characterize
a good quality of the singing voice. As Sundberg notes, the mechanism of singing shares the
same elements as are used in speech production, although singers must manipulate the vocal tract
differently from what is done in speech (Bartholomew, 1934). Therefore, in this chapter, a basic
introduction to the acoustic theory of speech production (Fant, 1960) will be presented.
Characteristics of the singing voice will then be compared to features of the acoustic speech
signal.
According to the acoustic theory of speech production, as proposed by Fant (1960), the
speech sound is generated by the voice source, filtered by the vocal tract, and radiated from the
mouth. The voice source produces a complex tone that is composed of the F0 and its harmonics.
According to Fant (1960), the amplitude of these harmonics decreases at 12 dB per octave for
normal speech produced by men. The next stage proposed in the acoustic theory of speech
production is the filtering of the complex tone produced by the vocal source. The vocal tract acts
as a resonator that selectively filters energy from the harmonic source. Whether amplitude of the
harmonic frequencies is attenuated or increased is determined by the momentary shape and size
of the vocal tract. As a result of these shape changes, the spectrum of each speech sound shows a
specific pattern that is defined by spectral peaks and valleys. These peaks are the formants. The
first, second, and third peaks are labeled as F1, F2, and F3 respectively. These formants are most
dominant in the spectrum, although theoretically there are an infinite number of formants (Fant,
1960). In addition, the first three formants provide cues for English vowel categorization and
Although the voice-source spectrum shows a similar envelope slope (-12 dB/octave)
across vowels, the formants produced by the vocal tract differentiate vowel quality, both
acoustically and perceptually (Kent & Read, 1992). The formant frequencies are determined by
the length and shape of the vocal tract. The overall length of the vocal tract is measured by the
distance from the glottis to the lip opening and is determined by the morphology of an
individual. The length can also be modified either by raising or lowering the larynx, or by
protruding or retracting the lips. A longer vocal tract yields lower formant frequencies, other
After filtering of the source sound by the vocal tract, speech is radiated from the mouth
into the air. This final stage proposed by the acoustic theory of speech production (Fant, 1960) is the
radiation characteristic, a filtering effect that causes the output spectrum to increase
by 6 dB/oct. Thus, the radiated speech sound is a product of the voice source energy, the
resonator (vocal tract) and the radiation characteristics from the mouth.
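As a rough illustration of the three stages just described, the sketch below adds the source slope, a filter gain, and the radiation characteristic in decibels (a product of linear terms becomes a sum in dB). The -12 dB/octave and +6 dB/octave values come from the text above; the function names and the flat default filter gain are illustrative assumptions, not part of Fant's (1960) formulation.

```python
import math

def source_level_db(harmonic_number, slope_db_per_octave=-12.0):
    """Level of a harmonic relative to the fundamental, for a source
    whose spectrum falls at a fixed slope per octave."""
    return slope_db_per_octave * math.log2(harmonic_number)

def radiation_gain_db(harmonic_number, slope_db_per_octave=6.0):
    """Radiation characteristic, modeled as a fixed rise per octave."""
    return slope_db_per_octave * math.log2(harmonic_number)

def radiated_level_db(harmonic_number, filter_gain_db=0.0):
    """Source level + vocal tract filter gain + radiation gain, all in dB.

    filter_gain_db is a placeholder (assumption) for whatever gain the
    vocal tract resonances apply at this harmonic's frequency.
    """
    return (source_level_db(harmonic_number)
            + filter_gain_db
            + radiation_gain_db(harmonic_number))
```

With a flat filter, the second harmonic comes out 6 dB below the fundamental (-12 dB from the source plus 6 dB from radiation), so the net slope of the radiated spectrum in this idealization is -6 dB/octave before any formant peaks are applied.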
Fant’s initial delineation of the acoustic theory of speech production (1960) indicated
independence of all three components of the model. More recently, theorists and investigators
have suggested a non-linear interaction between the source and filter (Titze, 2008) such that
certain vocal configurations improve the efficiency of the transfer function (defined as the ratio
of oral radiated pressure to glottal flow). In many ways, the Fs provides great support for this
Mechanism of Fs production
Speaking and singing involve changes in the shape of the vocal tract in addition to
variations in its length. The shape of the vocal tract varies depending on the position and size of
the constriction along the length of the tube (Sundberg, 1977; 1987). Vocal tract shape is
determined by the movements of the articulators, including lip and jaw openings, tongue position,
and velar and laryngeal height. Articulatory movements are very complicated, and a movement
in any of the articulators generally affects the frequencies of the formants (Sundberg, 1987). The
precise relationship between these articulatory movements and the acoustic characteristics of the
resultant signal, first described by Fant (1960), has been an area of great interest in speech and
singing.
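The dependence of formant frequencies on vocal tract length can be illustrated with the standard textbook idealization of a uniform tube that is closed at the glottis and open at the lips. This is a minimal sketch, not a method taken from the studies reviewed here: the quarter-wavelength formula, the assumed speed of sound (35,000 cm/s), and the function name are all assumptions introduced for illustration, and a real vocal tract is not uniform.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, moist air

def formant_frequencies(tract_length_cm, count=3):
    """Resonances of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L), for n = 1, 2, 3, ...
    """
    if tract_length_cm <= 0:
        raise ValueError("tract length must be positive")
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * tract_length_cm)
        for n in range(1, count + 1)
    ]
```

For example, `formant_frequencies(17.5)` gives roughly 500, 1500, and 2500 Hz, and lengthening the tube to 20 cm (as lip protrusion or larynx lowering would do in this idealization) lowers all three values.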
singing than during speaking (Bartholomew, 1934). Singers are taught to “cover” their voices
during which they enlarge the cross-sectional area of the pharynx, almost as if they were
yawning and singing at the same time. Some voice teachers describe this vocal configuration as
the sensation of holding an egg inside the mouth while singing. This results in a darker voice
quality than is produced in speech (Vennard, 1967; Hines, 1990). Sundberg (1970) was
interested in determining how these vocal tract alterations affect formant frequencies during
singing.
vocal tract and related these movements to the formant frequencies that were produced. Pictures
of the entire vocal tract including the lips and glottis, as well as the frontal part of the cervical
vertebrae were taken during sustained sung and spoken vowels. These x-ray pictures were
examined and related to the vowels’ intensity and formant frequencies. In general, singing was
characterized by greater intensity than speech and this difference could be related to jaw opening
and larynx position. Jaw opening was shown to have its greatest effect on F1.
X-ray pictures of the larynx that were taken during the sung /a/ showed a lowered larynx, and
prominent expansion of the laryngeal ventricle and of the piriform sinuses in singing but not
during conversational speech. Sundberg suggested that the lowered larynx resulted in a lowered
frequency of F2 of sung front vowels. The frequency of F3 of back vowels was increased during
singing because of a decrease of the size of the cavity behind the incisors and an increase in the
size of the posterior part of the oral cavity. The frequency of F4 was found to decrease in sung
back vowels because of the lower larynx position during singing than during speech. These
changes resulted in a reduced frequency distance between F3 and F4. Finally, Sundberg
suggested that the lowered F3 and F4 frequencies found in the sung front vowels is also an effect
of the lowered larynx as well as increased lip protrusion. Sundberg concluded that the
manipulations of jaw opening and larynx lowering clustered to yield the Fs.
In a second study, Sundberg (1973) noted that the Fs is characterized by a faster growth
in intensity than is found in the overall spectrum intensity. This was an early investigation into
the non-linear relationship between the voice and vocal tract wherein Sundberg hypothesized
that Fs could be attributed to the voice spectrum, the vocal tract transfer function, or both. An
investigation of professional singers was conducted to examine the impact of the source and
Sustained sung and spoken vowels /a/, /i/, and /u/ were produced by trained singers at
four equally spaced pitches and at four different intensities: piano, mezzo piano, mezzo forte
and forte. Sundberg (1973) investigated the relation between the overall sound pressure level
(SPL) and the SPL of the Fs as a function of vocal intensity and pitch. Results from the spectrum
analysis of the sung vowels showed that when vocal effort increased, the amplitude of the higher
partials increased as compared with the intensity of the lower partials. Moreover, the level of the
higher partials usually increased more quickly than that of the lower partials when the vocal
effort increased.
Sundberg (1973) hypothesized that spectral balance (the amplitude difference between
the Fs region, denoted as L3, and the first formant frequency region, denoted as L1) is
determined by two factors: formant frequencies and the source spectrum characteristics. In an
effort to determine the contribution of the voice source, he compared the spectra of sung vowels
with those generated by a synthesizer. Differences between sung and synthesized vowels were
calculated to determine if the singers’ source spectra followed a 12 dB/octave decline that
typically is found in male speakers (Fant, 1960). Sundberg (1973) investigated the effect of vocal
intensity on the source spectrum for both the sung and spoken vowels by two singers. One singer
had a “dark voice” and the other singer produced a “light voice.” Sundberg suggested that the
variation of vocal effort did not change the source spectrum slope by a constant amount. The
lower partials (i.e., those below 1000 Hz) tended to increase more slowly in amplitude than the
higher partials (those above 1000 Hz) while the vocal intensity increased during singing.
Sundberg also indicated that the level of the Fs increased more quickly than the level of F1 when
vocal intensity increased. Similar effects were found for changes in F0; that is, increased F0
yielded less change in F1 intensity than was noted for the level of Fs.
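The spectral balance described above (L3 minus L1) can be sketched as a simple level difference between two frequency bands. This is only one plausible way to compute it: the band edges, the use of the strongest partial in each band, and the function names are assumptions for illustration, not the procedure Sundberg (1973) reported.

```python
def peak_level_db(partials, low_hz, high_hz):
    """Highest level (dB) among partials whose frequency lies in [low_hz, high_hz].

    partials is a list of (frequency_hz, level_db) pairs.
    """
    levels = [level for freq, level in partials if low_hz <= freq <= high_hz]
    if not levels:
        raise ValueError("no partials fall inside the requested band")
    return max(levels)

def spectral_balance_db(partials,
                        f1_band=(300.0, 1000.0),    # assumed F1 region
                        fs_band=(2000.0, 4000.0)):  # assumed Fs region
    """L3 - L1: level in the Fs region minus level in the F1 region."""
    l1 = peak_level_db(partials, *f1_band)
    l3 = peak_level_db(partials, *fs_band)
    return l3 - l1
```

With hypothetical partials for a soft tone, `[(220, 60), (660, 70), (3080, 52)]`, the balance is -18 dB; for a louder tone, `[(220, 70), (660, 82), (3080, 74)]`, it is -8 dB. The less negative value for the louder tone mirrors the reported pattern in which the level of the Fs grows faster than the level of F1 as vocal intensity increases.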
Sundberg (1973) found that the average source spectra of all sung and spoken vowels
were quite similar for singers. That is, the amplitude of the lower partials relative to the
amplitude of the higher partials was weaker in the loud spoken vowels than in the normal spoken
vowels. Because vocal effort in singing did not differ from that of speech as reflected in the
source spectrum in this study, Sundberg concluded that the singers used a similar type of source
in singing as in their regular speech, and this type of source may be generated by a special
articulation rather than having the vocal folds vibrate differently. Sundberg also compared his
data with other studies in which untrained singers were investigated (e.g., Lindqvist, 1970; Fant,
1970) and found that for untrained singers’ speech, vocal efficiency was limited to a small range
of intensities and pitches. This contrasts with the voice spectra of trained singers wherein
increases in vocal intensity and pitch raise the level of the higher formants, i.e., Fs. Sundberg
concluded that the voice source is different between trained and untrained voices.
In this study (Sundberg, 1973), a physical model of the vocal tract was also simulated to
compare the acoustic signal obtained from the model with the acoustic signal generated by the
synthesizer. Results from the comparison showed the transfer functions were similar for both the
model and synthesized signals. Furthermore, in Sundberg’s (1970) previous study, examination
of the frontal x-ray pictures of the larynx during the spoken and sung vowel /a/ produced by
trained singers showed a lowered larynx, expansion of the sinus Morgagni (laryngeal ventricle)
and the piriform sinuses for the sung vowel. Sundberg (1973) hypothesized that the expanded
sinus Morgagni and the piriform sinuses somehow impact the Fs; therefore, he simulated the
vocal tract to include the expanded sinus Morgagni and the piriform sinuses. Comparison of the
transfer functions obtained from the vocal tract model indicated that the transfer functions
comprising a singer’s formant were equivalent to those obtained from the synthesizer. This result
led Sundberg (1974) to hypothesize that the Fs is an extra formant around the frequencies F3, F4,
and F5 that is produced when the larynx is lowered and the pharynx above the opening part of
the larynx is expanded. This effect would substantiate the non-linear interaction of the voice and
vocal tract.
Sundberg (1974) investigated his hypothesis with a simulated male vocal tract. The shape
of the vocal tract was modeled on the tomograms generated by Fant (1960) from which the
dimensions of the lower position of the larynx tube were estimated. Sundberg defined the larynx
tube as a small tube above the vocal folds that is vertically inserted into the pharynx tube. The
larynx tube was modeled as a twin resonator with a wider tube at the inferior end (sinus
Morgagni) and a narrower, longer tube above the sinuses. The dimensions of the larynx tube
were 6 cm in length with a cross-sectional area of 1 cm². This tube was inserted into the pharyngeal
tube which was formed by a cylindrical brass tube that was closed off at one end. A high level,
modulated DC voltage was used as a sound source for this simulated vocal tract.
The first of Sundberg’s (1974) experiments was designed to investigate the effects of the
lower position of the larynx on the pharynx tube. Results showed that when the larynx is
lowered, the pharynx above the opening part of the larynx is widened. Furthermore, Sundberg
(1974) confirmed previous studies that indicated that when the pharynx is widened during larynx
lowering for singing, the cross-sectional area of the pharynx tube is six times larger than the
opening area of the larynx tube. Thus, the larynx tube acts as a separate resonator from the
In addition, Sundberg (1974) noted that when the F0 increased, the area of the larynx
tube opening was normally expanded. This expansion of the larynx tube opening might affect the
ratio between the larynx tube opening and the cross-sectional area of the pharyngeal tube that
generated the Fs. In the next experiment, Sundberg (1974) simulated the size of the larynx tube
opening at different F0 and the output from the vocal tract model was measured. The findings
showed that when F0 increased, the larynx tube opening increased resulting in a raised resonance
frequency of the larynx tube. As indicated before, in order to generate the Fs, the cross-sectional
area of the pharyngeal tube had to be six times larger than the opening area of the larynx tube so
that the larynx tube could be acoustically independent of the pharyngeal tube. When F0
increased, this did not occur and the larynx tube could not become a separate resonator.
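The area-ratio condition described in this paragraph can be expressed as a simple check. The sketch below is illustrative only: the function names and the example areas are hypothetical, and treating a ratio of exactly 6 as sufficient is an assumption about how the "six times larger" criterion would be applied.

```python
def area_ratio(pharynx_area_cm2, larynx_opening_area_cm2):
    """Cross-sectional area of the pharynx divided by the larynx tube opening area."""
    if larynx_opening_area_cm2 <= 0:
        raise ValueError("larynx opening area must be positive")
    return pharynx_area_cm2 / larynx_opening_area_cm2

def is_separate_resonator(pharynx_area_cm2, larynx_opening_area_cm2, min_ratio=6.0):
    """True if the pharynx is at least min_ratio times wider than the larynx opening.

    min_ratio=6.0 follows the ratio quoted in the text; using >= rather
    than > is an assumption of this sketch.
    """
    return area_ratio(pharynx_area_cm2, larynx_opening_area_cm2) >= min_ratio
```

For a hypothetical pharynx area of 3.0 cm², an opening of 0.5 cm² satisfies the condition (ratio 6.0), whereas widening the opening to 0.8 cm², as might happen when F0 rises, drops the ratio to 3.75 and the condition fails.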
Sundberg (1974) hypothesized that in order to generate the Fs at the higher F0, the larynx
tube should be maintained as a separate resonator. Sundberg further suggested that the sinus
Morgagni might have a great impact on counteracting the change in the larynx opening relative
to the pharynx at higher F0. This led to the next experiment in which Sundberg simulated the
sinus Morgagni and then measured the output from the vocal tract model. The small tube that
acted as a larynx was inserted into a larger “pharyngeal” tube and the volume of the larynx tube
was varied when the simulated sinus Morgagni was expanded. Again, comparisons were made
between the model’s output and the formant frequencies that were derived from Fant’s equations
of vocal tract resonance (1960). The agreement between the calculated and measured formants
indicated that changes in the size of the larynx tube opening could be overcome by expansion of
the sinus Morgagni, and that sinus Morgagni expansion could be effected by laryngeal lowering.
Sundberg concluded that during singing, the sinus Morgagni is expanded to compensate for the
increased larynx tube opening caused by the increased F0. This expansion allows the larynx tube
to act as a separate resonator with a resonance frequency at 3 kHz. Moreover, the expansion of
In addition to the contribution of the sinus Morgagni to the generation of Fs, Sundberg
(1974) reported that the tomograms showed that both the cross sectional area and the length of
the piriform sinuses were increased when the larynx is lowered. In the final experiment, the
piriform sinuses were simulated based on the findings from the tomograms (Fant, 1960). The
piriform sinuses were simulated by one or two cylindrical tubes that could vary in length and
diameter. These tubes were inserted parallel to the larynx tube, into the closed end of a large
“pharyngeal” tube. The results from the model’s output agreed with the formant frequencies that
were derived from the equations of vocal tract resonance (Fant, 1960). Sundberg concluded that
the lowered larynx causes an expansion of the piriform sinuses and an expansion of the pharynx
tube. Sundberg further suggested that the piriform sinuses could also be interpreted as an
increased pharynx length. This increase in pharyngeal length caused the frequency of F5 to drop
considerably but the resonance frequency of the larynx tube remained at around 3 kHz.
Although Sundberg (1974) suggested that the Fs only occurs when the larynx is lowered,
several investigations of other singing styles counter this claim (Wang, 1985; Sengupta, 1990).
For example, Wang (1985) studied Chinese opera singers and found the Fs with an elevated
larynx; however, this result was not replicated by Sundberg (2003) in his investigation of one
Chinese opera singer. Sengupta’s (1990) studies of Northern Indian classical singing also
disagreed with Sundberg’s suggestions about larynx position and the Fs because Indian singers
In an effort to resolve the conflict between Sundberg’s findings and those of other
researchers about larynx height and the Fs, Detweiler (1994) investigated the laryngeal system
Three trained male singers (one tenor and two baritones) were investigated during the
production of modal and pulse (involving vocal fry) phonations. The main focus of Detweiler’s
(1994) study was to determine if the Fs really was generated only when the cross-sectional area
of the laryngeal outlet was six times smaller than the pharynx tube. Another focus of Detweiler’s
study was to investigate the effect of the laryngeal ventricle (sinus Morgagni) on Fs. Endoscopic
videolaryngoscopy was used to examine the cross-sectional areas of the laryngeal outlet and
laryngopharynx during phonation and images of the larynx were captured with MRI. In addition,
The results from both MRI and the laryngoscopic examinations showed that the cross-
sectional area ratio between the outlet of the larynx and the pharynx ranged from 2.9:1 to 3.7:1,
thereby contradicting Sundberg’s (1974) model. Supporting evidence from both MRI and
laryngoscopic examinations showed a clear laryngeal ventricular space during modal phonation,
but not for the pulse phonation; nevertheless, the acoustic study showed that both singing
conditions demonstrated the Fs. Detweiler (1994), therefore, concluded that the sinus Morgagni
was not the clear cause of the Fs. Moreover, results of the vertical laryngeal position obtained
from the MRI showed that the sinus Morgagni behaved differently than what Sundberg
suggested. Detweiler (1994) concluded that Sundberg’s model was inadequate to account for the
Detweiler (1994) supported her hypothesis strongly by using the three different analyses
but there were some questions that still needed to be addressed. Although the results of
Detweiler’s acoustic analysis showed the Fs when the singers sang in both the supine and upright
positions (for the laryngoscopic evaluation during MRI), it is doubtful that the singers could
really sing with their “best voice” while in a supine position. It is also necessary to question
whether the supine position affected the larynx position during singing, which may account for
the differences in results between Detweiler and Sundberg (1974). Detweiler did not specify how
the Fs was identified in the acoustic analysis; therefore, results from this study are hard to
interpret. Finally, it was suggested that the results from the MRI and the laryngoscopic
examinations yielded consistent information. However, the MRI was taken while the singers
were phonating a different vowel (/a/) from the vowel /i/ which was used during the
(i.e., MRI and stroboscopy) with different body positions and different vowels, leads one to
question how the study’s results might have been affected by these variations. Also, direct
stroboscopy may have made it difficult for the singers to sing with their best voices, so it is
Titze and Story (1997) also conducted a study to evaluate Sundberg’s (1974) model of
the physiological and acoustic changes associated with the Fs. They used a computer model to
investigate how the vocal tract can be adjusted to produce the best conditions for vocal fold
oscillation. The model was based on magnetic resonance images (MRI) measured from a
30-year-old male. The input impedance (defined as the ratio of supraglottal pressure to glottal flow)
and transfer function of the vocal tract were computed when the vocal tract shape varied.
Titze and Story (1997) showed that the epilarynx tube (i.e., the narrowed portion of the
laryngeal ventricle above the glottis that is equivalent to Sundberg’s definition of the larynx
tube) influenced the resonant frequencies of the output signal. With a narrowed epilarynx relative
to a uniform vocal tract, the frequencies of F1, F2, and F3 were pulled upwards and the
frequencies of F4 and F5 were pulled downward toward the region of 2500-3500 Hz. When the
resonant frequencies associated with an independent epilarynx were calculated, Titze and Story’s
findings showed that the first 5 formant frequencies were affected and moved toward the
frequency region of 2756 Hz. In a second simulation, Titze and Story calculated the acoustic
consequences of pharyngeal expansion. Their findings confirmed Sundberg’s (1974) results that
the epilarynx tube influences the generation of the Fs. They concluded that when the pharynx is
widened and the ratio of the cross sectional areas of the pharynx to the epilarynx is 6:1, the
narrowed epilarynx tube becomes a separate resonator that produces a cluster of F3, F4, and F5, that is, the
Fs. These findings led Titze to suggest modifications to Fant’s (1960) acoustic theory of speech
From the studies discussed above, it can be concluded that the Fs can be defined as a
prominent, spectrum-envelope peak around 3 kHz that is composed of a raised cluster of F3, F4
and F5. When a Western classically trained male singer lowers his larynx and expands his
pharynx during singing, the cross-sectional area of the pharynx is six times that of the
epilarynx tube; therefore, the epilarynx tube becomes a separate resonator and generates the Fs.
In addition, laryngeal depression increases the width of the sinus Morgagni and causes the
piriform sinuses to expand, which maintains the resonance frequency of the laryngeal tube
at around 3 kHz. Because many acoustic variations are used during singing, one must examine
Sundberg’s early studies established the acoustic and physiological bases for the
generation of the Fs. Subsequent studies, reviewed in this section, focused on how different
factors such as vocal training, voice classification, pitch, loudness, and vowel configurations
impact the Fs. Variations in F0 are common during vocal performance, so Schutte and Miller (1985) investigated
the effect of F0 on the center frequency of the Fs. They asked one tenor
to sing the vowel /ɔ/ in chromatic steps over his total vocal range, starting with F0 below the
normal tenor singing range and continuing to an F0 above this range. The center frequency of the
Fs from each chromatic note was analyzed by a short-term spectrum, and the results showed that
the Fs appears in the region of 2,200 Hz for the lowest F0, whereas the highest F0 yielded the
highest Fs at 3,100 Hz. Schutte and Miller concluded that within the frequency range
investigated, the frequency of the Fs increased as the F0 increased; however, the spectral balance
(i.e., L3-L1) remained constant throughout the whole F0 range. Within this tenor’s most
commonly used F0 range (131 Hz–524 Hz), the spectral balance was about –7 dB. These findings
suggest that Fs frequency varies with F0, yet its intensity remains constant across the tenor’s
singing range. Therefore, Schutte and Miller characterized the Fs by an L3-L1 level difference of
about –7 dB.
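As a rough illustration of the F0 steps involved in such a task, the sketch below lists equal-tempered chromatic frequencies from C3 to C5, which correspond approximately to the 131-524 Hz range cited above. The A4 = 440 Hz reference, the use of equal temperament, and the function names are assumptions made here for illustration; the review does not report which tuning the tenor used.

```python
A4_HZ = 440.0  # assumed reference pitch; not stated in the study


def semitone_to_hz(semitones_from_a4, a4_hz=A4_HZ):
    """Equal-tempered frequency a given number of semitones away from A4."""
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


def chromatic_scale(first_semitone, last_semitone):
    """Frequencies for every chromatic step between two offsets (inclusive)."""
    return [semitone_to_hz(n) for n in range(first_semitone, last_semitone + 1)]


# C3 lies 21 semitones below A4 and C5 lies 3 semitones above it.
scale = chromatic_scale(-21, 3)
for hz in scale:
    print(f"{hz:7.2f} Hz")
```

Under these assumptions the two octaves contain 25 chromatic notes, so a short-term spectrum per note yields 25 measurements of the Fs center frequency and of L3-L1.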
Schutte and Miller’s (1985) study investigated only one singer, and it was not specified
how this particular singer was selected or determined to have the Fs before acoustic measures
were made. There were no details that indicated who judged this singer to have the Fs, or if there
was any other acoustic analysis to indicate that the Fs was exhibited in this singer. Therefore,
Seidner, Schutte, Wendler and Rauhut (1985) further studied the effect of F0 on the Fs,
and they also investigated the effects of vowel quality and voice type on the Fs. Five trained
singers (3 males and 2 females) with different voice types (tenor, bass, baritone, soprano, and
alto) were included in their investigation. Singers were asked to produce three different vowels,
/a/, /i/, and /u/ with a loud voice. Each vowel was sung at four notes, C, E, G, and A, over a range
of three octaves. Measures included the level and center frequency of Fs as a function of vowel
quality, voice type and F0. Seidner et al. found that the frequency of the Fs varied depending on
vowel quality and F0. The results showed that the Fs shifted to higher frequencies when F0 was
high and Fs was lower for low F0; the results also showed that there was no relation between the
Seidner et al. (1985) also found that the spectral balance of the Fs, as measured by the
intensity of the Fs relative to that of F1, was affected by the voice types and was higher in the
male singers than in the female singers. For male singers, lower voices (bass and baritone)
showed a similar spectral balance of about -10 dB across vowel qualities, while the relative spectral balance
varied with both F0 and vowels for the tenor. The results for the tenor showed that the spectral
balance increased when the F0 increased for all vowels (/a/, /i/, /u/); within the range of A4 (440
Hz) to C5 (524 Hz), the greatest relative intensity of the Fs (+20 dB) was generated for the vowel
/a/ and the relative intensity of the Fs decreased beyond this frequency range. For female
singers, the relative intensity of the Fs for the alto was lower than that seen in the male singers,
and the soprano showed the lowest level of the Fs among all singing types. Moreover, the
relative intensity level also varied with both F0 and vowels for female singers. This
investigation, then, identified three primary factors that affected the Fs: voice type, F0, and vowel
quality.
Cleveland and Sundberg (1985) also studied the effects of F0 and intensity on the Fs in
different voice classifications as well as the influence of subglottic pressure on these parameters.
Three trained male singers (bass, baritone and tenor) used three loudness levels (forte, mezzo
forte and piano) and three pitch levels (high, medium and low) as they sang the chromatic scale
on the vowel /a/. Fundamental frequency for these singers ranged from E3 (165 Hz) to E4 (330
Hz) during this singing task. The vowel /a/ was preceded by a consonant /p/ so that each singer’s
subglottic pressure could be measured from the oral pressure during the /p/-occlusion.
Cleveland and Sundberg (1985) first investigated the subglottic pressure when singers
produced three different pitch and loudness levels. Although no information was provided to
quantify the singers’ loudness levels and F0, Cleveland and Sundberg showed that changes in
subglottic pressure had a main effect on vocal loudness. When the vocal effort was high (at
forte), the subglottic pressure was high and when the vocal effort was low (at piano), the
subglottic pressure was low. Their findings also showed a relation between F0 and subglottic
pressure; when the F0 increased, the subglottic pressure increased at all different levels of vocal
effort.
Cleveland and Sundberg (1985) further suggested that even though the subglottic
pressure was the main factor in controlling vocal SPL, the loudness (i.e., the perception of vocal
level) was affected by other factors such as the relative distance of the partials, the formant
amplitude and frequency. They found that the bass singer used the lowest subglottic pressure yet
produced the highest sound pressure level (SPL) (measured 50 cm from the mouth) whereas the
tenor used the highest subglottic pressure yet produced the lowest SPL. Cleveland and
Sundberg suggested that singers use different subglottic pressure and/or articulatory movement
in order to achieve certain loudness levels. Further, the same pitch range that was produced by
different singers of different voice types also could cause different SPLs from these singers.
They hypothesized that instead of requiring singers of different voice types to produce the same
pitch, singers might generate more similar SPLs if they sang in their own comfortable pitch
ranges. That is, singers with different vocal ranges need to adjust their phonations differently to
accomplish different fundamental frequency ranges; therefore, subglottal pressure and SPL might
be affected.
Also in this investigation, Cleveland and Sundberg (1985) investigated the relations
between the level of the Fs and the overall SPL values. They suggested that the level of the Fs
increased with both F0 and vocal effort (loudness level: high, medium, and low). Their results
showed that the baritone generated the lowest level of the Fs and the tenor generated the highest
level of the Fs. It was also found that the SPL and the amplitude of the Fs are highly correlated in
all subjects; however, the specific influence of increased F0 on the Fs was not described.
Cleveland and Sundberg further found that the level of the Fs increased more rapidly than the
amplitude of the partials in the lower frequency region; therefore, they concluded that the Fs was
impacted more by the vocal tract shape than by the pitch and loudness.
Overall, the studies reviewed suggest that the Fs is influenced by voice training as well as
voice type (classification); bass (range 82-262 Hz), baritone (range 98-330 Hz), tenor (range
123-392 Hz) and sometimes alto (range 220-698 Hz) voices exhibit the Fs. Sundberg (1977)
hypothesized that it is difficult for female singers to produce the Fs because it is nearly
impossible to lower the larynx when singing in a high frequency range. Very few studies directly
tested whether the Fs is produced by the female singers, especially sopranos. One study that did
include both males and females (Seidner et al., 1985) suggested that the level of the Fs was
lower for the female singers than the male singers. Seidner et al. did not specify how the Fs was
In another study designed to investigate the effect of voice classification on the Fs,
Sundberg (2001) compared the differences between 5 different voice types including male and
female singers. This study used commercial recordings from 20 classically trained singers that
included equal numbers of singers demonstrating soprano, alto, tenor, baritone, and bass voices.
Each sample was approximately 30 seconds long and was analyzed with long-term-average-
spectra. Both center frequency and level of the Fs were measured for each sample. Findings
showed that both the frequency and the level of the high frequency peak (Fs) varied within and
between voice classifications. The alto singers showed the highest center frequency of the Fs (3
kHz, approximately). Within male singers, tenors showed the highest center frequency (2.84
kHz) whereas the bass singers showed the lowest frequency (2.42 kHz) for the Fs. Moreover,
results showed that the highest level of the Fs was found in the baritones. In comparison to the
baritones, the basses and tenors produced a Fs that was 3 dB lower while the Fs for altos was 9
dB lower than in the baritones. Most sopranos obtained two peaks rather than one single peak.
Sundberg (2001) related these two peaks to the F3 and F4 and thus suggested that sopranos did
not exhibit the Fs because there was no clustering of these formants. Sundberg explained that
sopranos do not produce the Fs because of their high fundamental frequencies. Higher
fundamental frequencies affect the frequency distance between partials, which reduces the
These results were confirmed in a study by Weiss, Brown and Morris (2001). Weiss et al.
did a spectrographic analysis of 10 sopranos singing 5 vowels /a/, /i/, /u/, /e/ and /o/ at 3 different
pitch levels: low (261 Hz), mid (622 Hz) and high (932 Hz). Their findings showed that the
spectral peaks for sopranos ranged from 2.6 kHz to 4.6 kHz which was beyond the definition of
the Fs (around 3 kHz) suggested by Sundberg (1970; 1995); therefore, Weiss et al. concluded
Results from Weiss et al. (2001) showed that when the soprano sang low-pitch and mid-
pitch vowels, there was a high frequency reinforcement found around 2.5 kHz, the region of the
Fs. However, the bandwidth of this peak was 2-2.5 times broader than that of the Fs found in
men (Schutte & Miller, 1985). Weiss et al. also showed that for the high-pitched vowel, there
was no clear energy peak found in the region of 3 kHz, but a higher, extended, strong energy was
found between 5-8 kHz. Weiss et al. concluded that the spectral energy generated by sopranos is
simply related to the high-frequency harmonics of the fundamental; thus high female voices
Taken together, the results from the investigations reviewed above suggest that the Fs is
affected by the vocal intensity, F0, voice classifications, and the articulatory movements (Fant,
1970; Sundberg, 1973; Sundberg, 1977; Schutte & Miller, 1985; Cleveland & Sundberg, 1985;
Seidner et al., 1985; Sundberg, 2001; Weiss et al., 2001); however, in these studies, researchers
rarely provided specific details on how the Fs was defined. Bloothooft and Plomp (1984, 1985,
1986) recognized this shortcoming and conducted a series of studies to determine the specific
criteria for the Fs and the factors that impact its generation. This series of studies investigated
the relation between the level of the Fs and the overall sound level of sung vowels with different
voice types. Bloothooft and Plomp investigated the interactions between the Fs and F0, intensity,
mode of singing (e.g., light, dark, pressed voice, and soft), vowel configuration, and voice
In this series of studies, Bloothooft and Plomp (1984, 1985, 1986) used 1/3-octave filters,
with center frequencies from 122-4000 Hz, to approximate the filtering of the auditory system.
The spectra of the vowels were measured every 10 ms and normalized for overall sound-pressure
level to eliminate spectral variation due to level differences. Nine Dutch vowels (/a/, /ɑ/, /i/, /u/,
/ ɑ/, /œ/, /y/, /ε/ and /e/) were sung by male and female singers representing seven voice types ranging
from bass to soprano. Different pitches (F0 = 98, 131, 220, 392, 659 and 880 Hz across
singers), sung in nine different singing modes such as neutral, light, dark, and soft, were
investigated.
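The 1/3-octave analysis and level normalization described above can be sketched as follows. This is a simplified illustration, not Bloothooft and Plomp's procedure: it sums the power of discrete partials into ideal rectangular 1/3-octave bands, whereas the original work used filters. The base-2 band spacing around 1 kHz, the function names, and the example partials are assumptions.

```python
import math


def third_octave_centers(f_min_hz, f_max_hz, ref_hz=1000.0):
    """Center frequencies spaced 1/3 octave apart (base 2) around ref_hz."""
    k_lo = math.ceil(3.0 * math.log2(f_min_hz / ref_hz))
    k_hi = math.floor(3.0 * math.log2(f_max_hz / ref_hz))
    return [ref_hz * 2.0 ** (k / 3.0) for k in range(k_lo, k_hi + 1)]


def normalized_band_levels(partials, centers):
    """Band level in dB relative to the total power of all partials.

    partials: list of (frequency_hz, power) pairs on a linear power scale.
    Dividing by the total power removes differences in overall level.
    """
    total = sum(power for _, power in partials)
    half_band = 2.0 ** (1.0 / 6.0)  # a band spans fc / 2^(1/6) to fc * 2^(1/6)
    levels = []
    for fc in centers:
        band = sum(power for freq, power in partials
                   if fc / half_band <= freq < fc * half_band)
        levels.append(10.0 * math.log10(band / total) if band > 0.0
                      else float("-inf"))
    return levels


# Hypothetical spectrum: a strong partial at 1 kHz and a weaker one at 3 kHz.
centers = third_octave_centers(122.0, 4000.0)
levels = normalized_band_levels([(1000.0, 1.0), (3000.0, 0.1)], centers)
for fc, level in zip(centers, levels):
    print(f"{fc:7.1f} Hz  {level:7.2f} dB")
```

Because every band level is expressed relative to the total power, two productions that differ only in overall level give the same normalized spectrum, which is the purpose of the normalization step.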
In their first experiment, Bloothooft and Plomp (1984) focused on factors that influence
spectral variance during singing. They determined that vowel spectra variances depend on
factors such as vowel quality, voice type, modes of singing (light, dark, neutral, soft, etc.) and
fundamental frequency. They then examined the interactions between these factors in male and
female singers. Bloothooft and Plomp showed that spectral variance is associated with
interactions of all the above factors. Among these factors, vowel quality had the greatest effect
on the spectral variance when the F0 was 98 Hz for males and 220 Hz for females. The impact of
vowel quality on spectral variance decreased when the F0 increased for both males and females.
The results showed that the relation between spectral variance and vowel quality was constant
for F0 up to 392 Hz and decreased when the F0 increased beyond 392 Hz. In other words, vowel
In the following study, Bloothooft and Plomp (1985) used the same methodologies and
subjects used in the first study and investigated the interactions between F0 and overall SPL.
Measurements were made from spectra that were averaged across singers and across the sung
vowels. Their findings showed that when F0 increased from 98-392 Hz, the average SPL of the
sung vowels increased by 16 dB for males. For females, when F0 increased from 220-880 Hz,
there was an average increase of 22 dB for the sung vowels. When singers of both sexes sang
with F0 = 392 Hz, males exhibited an overall SPL 8 dB higher than that of female singers.
Bloothooft and Plomp (1985) indicated that the highest sound levels were found in the
1/3-octave bands with mean center frequencies of 2.5 kHz for male singers and 3.16 kHz for
female singers. They, therefore, defined the frequency band between 2.5 kHz and 3.16 kHz as
the frequency band of the Fs. Bloothooft and Plomp then measured the sound level in the
frequency bands of the Fs from the average spectra and comparisons were made between the
overall SPL of the vowels and the sound level of the Fs. Findings showed that for the modal
register of the male singers, overall SPL and the sound level of the Fs increased proportionally
when F0 increased. For the falsetto register, defined as a high pitch produced by males with use
of only part of the vocal folds, the level of the Fs frequency band was less than in the modal
register. For female singers, increasing F0 increased the difference between the level of the Fs
and overall SPL of the vowel; that is, the overall SPL increased while the level of the Fs
The findings also showed the shapes of the average spectra and the level of Fs were
similar for male and female singers when F0=220 Hz. This similarity between males and females
was found for F0 up to 392 Hz, even if the males used a falsetto register. When F0 was greater
than 392 Hz, female singers showed a decrease in the amplitude of the Fs while the overall
spectral SPL increased. These findings led Bloothooft and Plomp (1985) to agree with
Bartholomew’s (1934) suggestion that the Fs is present in female singing, but it is diminished
In the last study of this series, Bloothooft and Plomp (1986) investigated the sound level
of the Fs and how five different factors (vowel quality, vocal intensity, F0, mode of singing,
and voice classification) interacted to impact the level of the Fs. Bloothooft and Plomp then
defined the Fs based on the outcome of this investigation. The methodologies from the previous
two studies were used to determine the variation in the sound level of the Fs compared to the
overall SPL of each singer as a function of vowel, fundamental frequency, classification, and
mode of singing. Again, the Fs was defined as a peak between 2.5 and 3.16 kHz. The sound level
in this frequency band was measured and normalized relative to overall SPL. The results showed
F0 and gender: At an F0 of 392 Hz or less, the level of the Fs was equivalent for male and female
singers; however, the level of the Fs decreased when F0 increased above this frequency.
Vocal intensity: It was found that sound level of the Fs increased when the vocal intensity
increased.
Vowel quality and F0: The results showed that the magnitude of the Fs depended on the vowel
quality. When F0 was 220 Hz, the level of the Fs was low in the sung vowels /u/ and /ɔ/ for both
females and males, but the level of the Fs was high in the sung vowel /i/ for both males and
females.
Overall SPL, vowel quality and vocal register: The results showed that the level of the Fs
increased more rapidly with increased overall SPL for the vowels /ɔ/, /y/, and /u/. This effect was
seen for males singing in the modal register and for all females. However, when the same vowels
were sung by males in the falsetto register, the level of the Fs increased less rapidly than for
singing in the modal register with increased overall SPL. Vowel quality did not influence the Fs
Modes of singing and F0: In the male modal register, the level of the Fs was constant relative to
overall SPL across the three modes of singing (light, neutral, and loud) as F0 increased. For the female
singers, the level of the Fs remained constant over these three modes of singing when F0 was
Bloothooft and Plomp (1986) suggested that the level of the Fs remained stable only
across different modes of singing (neutral and loud) for both male and female
singers, whereas it varied with vowel quality, F0, vocal intensity,
and vocal register. Therefore, the minimum sound level of the Fs (-20 dB relative to F1)
measured from the two modes was defined as the threshold of the Fs. Bloothooft and Plomp
concluded that when the relative level of the high-frequency spectral peak around 2-4 kHz
exceeds a threshold of about –20 dB relative to F1, this peak is defined as the Fs.
As noted earlier, Schutte and Miller (1985) used a short-term spectral analysis in an effort
to define the Fs. One singer, a tenor, produced the vowel /ɔ/ in chromatic steps over his entire
vocal range. The stimuli were passed through a spectral analyzer and level differences between
two regions in the spectrum were calculated. The first region was defined through acoustic
theory about vowel identity and was based on the frequency of the lower formants, up to about
1,800 Hz (L1). Schutte and Miller defined a second region, L3, where the Fs was located as the
frequency region around 2.2 kHz-3.5 kHz. In these two regions, peaks (L1 and L3) were defined
by either the highest partial or the average of the two highest partials in each region. The level of
Fs was calculated as the difference between L3 and L1. Schutte and Miller calculated the level
differences between the two regions for each note from the most commonly used F0 range of this
tenor (131 Hz–524 Hz). Their findings showed that the level differences between L3 and L1
remained constant throughout the whole F0 range. Schutte and Miller then identified –7 dB
relative to F1 as the average level of the Fs for tenor voices. This finding contradicts the
findings of Bloothooft and Plomp (1985) in which the amplitude of the Fs was stable over the
modal register but when the F0 range was beyond 392 Hz, the amplitude of the Fs decreased.
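The L3-L1 measure described above can be sketched as follows, assuming the partials of a short-term spectrum are already available as frequency and level pairs. The region limits follow the description in the text; the function names, the choice to average the two highest partials as an arithmetic mean of dB values, and the example levels are assumptions made for illustration and are not data from Schutte and Miller (1985).

```python
def region_peak_db(partials, lo_hz, hi_hz, average_top_two=False):
    """Peak level in a frequency region.

    partials: list of (frequency_hz, level_db) pairs.
    Returns the highest partial level, or the mean of the two highest
    levels when average_top_two is set (assumed to be a mean of dB values).
    """
    levels = sorted((level for freq, level in partials
                     if lo_hz <= freq <= hi_hz), reverse=True)
    if not levels:
        raise ValueError(f"no partials between {lo_hz} and {hi_hz} Hz")
    if average_top_two and len(levels) >= 2:
        return (levels[0] + levels[1]) / 2.0
    return levels[0]


def spectral_balance_db(partials, average_top_two=False):
    """L3 - L1: Fs-region peak (2.2-3.5 kHz) minus low-region peak (up to 1.8 kHz)."""
    l1 = region_peak_db(partials, 0.0, 1800.0, average_top_two)
    l3 = region_peak_db(partials, 2200.0, 3500.0, average_top_two)
    return l3 - l1


# Hypothetical partial levels (dB) for a note with F0 = 220 Hz.
example = [(220.0, 70.0), (440.0, 78.0), (660.0, 74.0),
           (2640.0, 71.0), (2860.0, 69.0), (3080.0, 60.0)]
print(spectral_balance_db(example))
print(spectral_balance_db(example, average_top_two=True))
```

With these invented levels the single-partial version gives 71 - 78 = -7 dB, and the two-partial average gives 70 - 76 = -6 dB, which shows that the choice between the two peak definitions can shift the reported balance by about a decibel.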
Schutte and Miller (1985) determined the bandwidth of the Fs as the frequencies within
–15 dB of the peak at the center frequency. They suggested that with F0 = 131-392 Hz, the
bandwidth of the Fs was constant; however, the basis for this claim is not clear. Schutte and
Miller provided a clear explanation of how to determine the level of the Fs; however, they only
used one tenor who sang one vowel. As noted earlier, Seidner, Schutte, Wendler and Rauhut
(1985) used Schutte and Miller’s methodology and investigated the Fs in five different voice
types of trained singers (one each of tenor, bass, baritone, soprano, and alto). Their findings
showed that the level of the Fs was about –10 dB relative to the overall SPL for both bass and
baritone singers. Seidner et al. also compared their results from the short-term spectrum of the
sustained vowels to the long-term-average spectrum (LTAS) of a singing phrase; however, they
did not provide details for the measurement and results from the LTAS.
Sengupta (1990) investigated the presence of the Fs by adopting the methods from
Schutte and Miller’s (1985) study. Four male and four female trained Northern Indian
classical singers participated in this study. Short-term spectra of the single vowels /a/, /i/,
and /o/ were measured from productions that spanned the singers’ full vocal ranges (ranging from
2 to 2.5 octaves). The spectral balance, the center frequency and the bandwidth of the Fs were
analyzed. The average results from these eight singers were comparable to Schutte and Miller’s
(1985) findings. Sengupta showed that the center frequency and the bandwidth of the Fs
increased when the F0 increased. Sengupta further found that the spectral balance across all
singers for the vowel /a/ was rather stable when the F0 range was between 230 Hz – 400 Hz,
with an average Fs level of –4 dB (relative to F1), and decreased when the F0 increased. In this
paper, Sengupta did not specify whether all singers were found to have the Fs; however, their
results were presented as the average of all the singers. Therefore, it is assumed that all these
So far, most of the studies reviewed investigated the presence or absence of the Fs but
without uniform agreement on how the Fs is defined. Omori, Kacker, Carroll, Riley, and
Blaugrund (1996) sought to determine the Fs by using different measurements from the short-
term spectrum. Each sustained vowel /a/ was analyzed by Fast Fourier Transform using a
Hamming window. The investigators measured the Fs quantitatively by calculating the “singing
power ratio” (SPR) of the sustained /a/. The SPR was determined by dividing the singing power
peak (SPP), the highest harmonic peak between 2-4 kHz, by the highest peak between 0-2 kHz.
The spectra of the sustained sung and spoken /a/ vowels from 37 singers (21 professional singers
and 16 non-professional singers) and 20 non-singers were measured and compared. The age
range of the 37 subjects was from 19-60 years with the duration of the vocal training ranging
Results from Omori et al. (1996) showed that the SPR of the vowel /a/ sung by the
singers was significantly greater than the SPR produced by male and female non-singers. There
was no significant difference in the SPR between professional and non-professional singers. In a
comparison of the sung vowel /a/ and the spoken vowel /a/, results from statistical analysis
showed significantly greater SPR in the sung vowel than in the spoken vowel for trained
singers. Omori et al. also found that there were no significant differences in SPR between the
male and female singers for either the sung or spoken vowels. Finally, they investigated the
effectiveness of vocal training in relation to the SPR of the sung vowel. Statistical results showed
that there was a significant difference in SPR related to the duration of training: SPR produced
by singers who had longer durations of training (> 4 years) was significantly greater than the
SPR produced by the singers who had shorter durations of training (< 4 years). Based on these
results, Omori et al. (1996) concluded that SPR was a reliable tool to analyze the acoustic
Although Omori et al. mentioned that vocal training of the subjects ranged from 1-42
years, it was not specified whether the years of training related to whether the singers were
professional or non-professional. If the non-professional singers had fewer years of vocal
training than the professional singers, as one might expect, the training effect would contradict
the finding of no significant difference between the professional and non-professional singers.
Lundy, Roy, Casiano, Xue, and Evans (2000) attempted to replicate the findings from Omori et
al. (1996) to evaluate the utility of SPR in investigating the Fs. Lundy et al. recruited 55 singing
students (14 males and 41 females) between the ages of 18 and 37 years. Their results were
opposite to those of Omori et al.’s study in that Lundy et al. found no significant difference in
SPR between the sung and spoken vowel. Lundy et al. also found no significant difference
between the SPR of the sung vowel related to the duration of training. However, there is one
common finding from both studies in that there was no significant difference between the male
Lundy et al. (2000) suggested that their results did not replicate the findings of Omori et
al. (1996) because of differences in the populations. The subjects in Lundy et al.’s study were
students whereas Omori et al. studied professional singers with a broader variation in length of
study. Lundy et al. further questioned whether SPR can represent the acoustic characteristics of
the singing voice quality and suggested that SPR needed to be investigated in future research.
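The singing power ratio discussed above can be sketched as follows, assuming the harmonic peaks of an FFT spectrum are already available as frequency and linear amplitude pairs. Dividing the two peak amplitudes and expressing the result in decibels is equivalent to subtracting the two peak levels in dB. The function names, the treatment of a partial falling exactly on 2 kHz (assigned to the upper band here), and the example amplitudes are assumptions for illustration only.

```python
import math


def highest_peak(partials, lo_hz, hi_hz):
    """Largest linear amplitude among partials with lo_hz <= frequency < hi_hz."""
    amplitudes = [amp for freq, amp in partials if lo_hz <= freq < hi_hz]
    if not amplitudes:
        raise ValueError(f"no partials between {lo_hz} and {hi_hz} Hz")
    return max(amplitudes)


def singing_power_ratio_db(partials):
    """SPR: highest peak in 2-4 kHz divided by highest peak in 0-2 kHz, in dB."""
    singing_power_peak = highest_peak(partials, 2000.0, 4000.0)
    low_band_peak = highest_peak(partials, 0.0, 2000.0)
    return 20.0 * math.log10(singing_power_peak / low_band_peak)


# Hypothetical harmonic amplitudes for a sustained /a/ (not measured data).
example = [(220.0, 1.0), (440.0, 0.6), (2860.0, 0.1), (3080.0, 0.2)]
print(round(singing_power_ratio_db(example), 2))
```

An SPR closer to 0 dB indicates relatively more energy in the 2-4 kHz region, which is why a greater SPR was interpreted as evidence of the Fs in these studies.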
In a more recent study, Sundberg (2001) used two different acoustic analyses to clarify
the definition and determination of the Fs: the short-term spectrum analysis and the LTAS
analysis. According to Fant’s (1960) acoustic theory of speech, formant frequencies affect
formant levels in normal speech. When two formants are close in frequency, the levels of the
formants increase. Sundberg (2001) adopted Fant’s equation for predicting formant levels and
predicted the different levels of F3 (denoted as L3). He applied a voice source that decreased by
12 dB/octave and varied the values of F1 and F2 to measure the impact on the Fs. The values
predicted by Fant’s equations were referred to as “expected values” and values measured from
participants of this experiment were indicated as “observed values.” Findings confirmed Fant’s
suggestions that different frequency spacing of F1 and F2 affects the L3. The next step was to
investigate various sung and spoken vowels produced by Western classically trained singers and
untrained singers.
Sundberg (2001) asked three male speakers (1 trained singer and 2 untrained singers) to
read a standard text (not specified by the author) with their normal conversational loudness.
Seven professional singers (4 tenors, 1 baritone and 2 basses) were asked to sing a vowel
sequence (/u/, /o/, /a/, /æ/, /e/, /i/, /ı/) at their intermediate pitch and loudness. Finally, three
professional sopranos were asked to sing a solo part of a choir piece with the pitch range from
D4-G5. The vowels (/u/, /o/, /a/, /æ/, /e/, /i/, /ı/) sung on sustained long notes in this choir piece
Spectra of these samples were calculated at the middle part of the vowel. L3 was
determined by measuring the strongest partial in the frequency region between 2-4 kHz while
level of F1 (denoted as L1) was measured as the strongest partial near F1. The difference
between L3-L1 was then calculated. The difference between the expected and the observed
values was then calculated. Sundberg (2001) suggested that if the observed level was
significantly higher than the predicted level, the vowel could be defined as having a Fs.
Findings from this experiment showed that the average level of the Fs across vowels for
the male singers was 10.8 dB, with the vowels /u/ and /o/ giving the highest levels of the Fs. The
differences between observed and expected values of L3-L1 were close to 0 dB or negative for
speakers, with an average of and –3.1 dB across speakers and vowels. The results for female
singers showed that the values of L3-L1 varied greatly between and within vowels with a mean
value of –4 dB. Sundberg (2001) noted L3-L1 might be difficult to measure in the female singers
because of the high F0 and the resultant great frequency distance between adjacent partials;
therefore, L3-L1 varied greatly depending on how close a partial was to F1 and F3. Sundberg
(2001) suggested that the LTAS gives clear spectrum envelope peaks at certain formant
frequencies during singing because it yields the time average of sound level in adjacent
frequency bands. LTAS is stable for speech and singing samples, and most importantly, it is less
dependent on F0 and intensity than other analysis techniques; therefore, LTAS may be most
appropriate in analyses of voices with high F0. Sundberg used the LTAS to measure the Fs from
commercial recordings of 20 classically trained singers representing 5 different male and female
voice types (soprano, alto, tenor, baritone, and bass voices). Each sample was approximately 30
seconds long. Both the center frequency and level of the Fs were measured for each sample. As
noted earlier, findings showed that both the frequency and the level of the Fs varied within and
Each of the studies discussed above made different contributions to the definition of the
Fs. Schutte and Miller (1985), Sengupta (1990) and Sundberg (2001) clarified the definition of
the Fs by using varied methodologies of acoustic analysis, such as LTAS, short-term spectrum,
and the L3-L1 level difference. Bloothooft and Plomp (1984, 1985, 1986) investigated the
effects of various factors quantitatively by using 1/3-octave filter spectra. Omori et al. (1996)
and Lundy et al. (2000) determined the Fs quantitatively by calculating the singing power ratio
(SPR). The results of these studies provide an overall picture of the interactions between the
important factors of the Fs: voice classification, fundamental frequency, intensity, and vowel
configuration. What has yet to be fully investigated, however, is vocal technique. The following
questions are paramount: What is the impact of Western classical vocal training on the Fs? Do
other types of singers exhibit the Fs? Are there other methods of vocal training that also yield
Fs?
Overall, Sundberg’s studies suggest that Western classical voice training is important to
the development of the Fs. Although studies by Wang (1985) and Sengupta (1990) suggest that
singers trained in other musical styles produce the Fs, Sundberg’s limited investigation (2003)
does not support those findings. When comparing Western classically trained singers to untrained
singers, many studies (Rossing, Sundberg & Ternström, 1986; Ternström & Sundberg, 1989;
Ross, 1992; Sundberg, 2001; Cleveland, Sundberg & Stone, 2001) show that only trained singers
exhibit the Fs. Further, these studies also suggest that the Fs only occurs with classical Western
training. However, Rossing, Sundberg and Ternström (1986) investigated whether other types of
Rossing et al. (1986) compared the timbral difference between solo and choral singing.
They first investigated eight trained male singers (3 professional and 5 amateur singers with
various amounts of vocal training) singing in both choral and solo modes. Singers were then
asked to sing one solo musical phrase that was written by the researchers in order to incorporate
most of the same vowels and pitches with the choral and solo passages. Each sample was
analyzed by LTAS. Their findings from one professional singer showed that the energy increased
around 2-4 kHz for both the solo passages (both loud singing and soft singing) and the choir
passages (loud singing) suggesting the presence of the Fs. The results also showed that all
professional singers had more energy around 2-4 kHz than amateur singers. Furthermore, the
choir passages, especially the soft singing, exhibited more energy in the lower, fundamental
frequency region (100-315 Hz), which Rossing et al. attributed to glottal source characteristics.
Following Sundberg’s model, Rossing et al. suggested that prominent energy in the Fs region
was mainly affected by different articulatory factors that resulted from training. Therefore, even
trained singers appear to use different singing techniques for solo versus choral singing.
Ternström and Sundberg (1989) investigated the presence of the Fs in eight untrained
choir singers (bass singers). Singers were asked to speak a song phrase, four times, with their
normal conversational pitch and loudness. They were also asked to sing the song phrase four
times. Samples were analyzed by LTAS and the level of the Fs was measured from these spectra.
The level of the Fs was determined by measuring the difference between formant peaks at the
lower frequency region (around 500 Hz) and the peaks in the Fs region (around 3000 Hz).
Sundberg’s (1986) definition of the Fs as energy around 3 kHz which averages 7.2 dB greater in
singing than in spoken phrases was used to determine the Fs. Findings from the LTAS showed a
small increase in amplitude of the Fs region in the sung phrase when compared to the spoken
phrase (mean = 1.4 dB). Ternström and Sundberg then suggested that untrained choir singers did
not exhibit the Fs since the increase in high frequency energy for singing did not approximate
7.2 dB. Ternström and Sundberg also compared their results with the previous study by Rossing
et al. (1986) in which the professional singers generated the Fs not only in the solo passages but
also in the choir passages (loud singing). In comparison, Ternström and Sundberg concluded that
the untrained singers in their study were unable to generate the Fs in the choir mode. This
contrasts with trained singers who are able to produce the Fs in both solo and choir modes.
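The comparison described above can be sketched as follows. This is a minimal illustration, not the procedure used by Ternström and Sundberg (1989): the search width around 500 Hz and 3000 Hz, the treatment of 7.2 dB as a hard threshold, and all function names and example values are assumptions made here for clarity.

```python
def peak_near_db(ltas, center_hz, half_width_hz=250.0):
    """Highest level (dB) among (frequency_hz, level_db) points near center_hz."""
    levels = [lvl for f, lvl in ltas if abs(f - center_hz) <= half_width_hz]
    if not levels:
        raise ValueError("no spectral points near %.0f Hz" % center_hz)
    return max(levels)


def fs_level_db(ltas):
    """Level of the Fs region (about 3000 Hz) relative to the low region (about 500 Hz)."""
    return peak_near_db(ltas, 3000.0) - peak_near_db(ltas, 500.0)


def singing_gain_db(sung_ltas, spoken_ltas):
    """How much higher the relative Fs level is in singing than in speech (dB)."""
    return fs_level_db(sung_ltas) - fs_level_db(spoken_ltas)


def meets_criterion(gain_db, criterion_db=7.2):
    """Assumed decision rule: the gain must reach the 7.2 dB figure cited above."""
    return gain_db >= criterion_db


# Invented LTAS points for one singer: (frequency in Hz, level in dB).
spoken = [(500.0, -5.0), (1500.0, -20.0), (3000.0, -30.0)]
sung = [(500.0, -4.0), (1500.0, -18.0), (3000.0, -27.6)]
gain = singing_gain_db(sung, spoken)
print(round(gain, 1), meets_criterion(gain))
```

With these invented values the gain is about 1.4 dB, which falls well short of 7.2 dB, mirroring the kind of result reported for the untrained choir singers.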
In a more recent study, Cleveland, Sundberg and Stone (2001) investigated the
presence or absence of the Fs in male country singers and compared the spectra of these singers
to one classical singer. Five male country singers were asked to sing the National Anthem as
well as one country song, chosen by each singer. Subjects were then asked to speak the text of
the National Anthem and of the song that they chose to sing. The classically trained singer was
also asked to sing and speak the National Anthem and one piece from the oratorical collections.
Samples were recorded and then normalized and analyzed by the LTAS. Results showed that the
classically trained singer exhibited a clear Fs, with increased energy in the Fs region near 2.8 kHz.
The LTAS for the country singers were similar for the spoken and sung samples and did not
show the Fs. However, Cleveland et al. (2001) noted a slightly increased energy peak between 3 and
4 kHz in all spoken samples, which suggests the presence of the speaker’s
formant. The “speaker’s formant” is typically found in “good voices” of singers, actors, radio
announcers, etc. (Leino, 1994; Nawka, 1997). The results of this study led Cleveland et al. to
There are a few studies in which researchers investigated the Fs by using singing styles
from different cultures. Wang (1985) investigated ten male singers with three different singing
styles: Western opera singing style, Chinese singing style and early music singing style. Details
of the singers and how many singers represented each singing style were not provided. Singers
were asked to sing three vowels, /a/, /i/ and /u/, with their full voice. All samples were analyzed
acoustically with a short-term spectrum. Wang also investigated the physiological changes
during singing by measuring the vertical distance of the larynx position during singing. Wang
found that the Chinese opera and the early music singers exhibited higher positions of the larynx
than the Western singers, yet the Fs still was exhibited in these two non-Western singing styles.
The position of the larynx was the highest for the vowel /i/ with a similar height for the vowel
/a/, and the lowest larynx for the vowel /u/ for all singing styles; however, Wang’s report
did not provide the formant frequencies and amplitudes for each vowel, nor did it establish clear
relations between laryngeal height and acoustic measurements of vowel quality. Also, it is
not clear how Wang defined the high energy peak (amplitude) for the Fs.
Sengupta (1990) also compared different cultures and singing styles by investigating four
male and four female trained, North-Indian classical singers. These singers were asked to sing
/a/, /i/, and /o/ with their full vocal range. Samples were recorded on a stereo cassette recorder
with a microphone 17 cm from the subject’s mouth. Spectrograms were taken over the frequency
range up to 8 kHz using both narrow band and wide band filters. PWR spectra were taken at the
Sengupta (1990) first identified the Fs by comparing the spectrum of the sung vowel to the
spoken vowel from the trained male singers. Results showed the presence of the energy around
2-4 kHz for the sung vowel /o/ and absence of the energy around the same region for the spoken
vowel /o/ indicating the Fs was found in the sung vowel /o/. Results for the female singers,
however, were not detailed. In this study, Sengupta also investigated the center frequency of Fs
by measuring the amplitude of the highest partial or by averaging the amplitude of the two
highest partials in the region of 2-4 kHz. Results showed that when the F0 increased, the center
frequency of the vowel /a/ sung by four male and four female trained singers increased.
Furthermore, Sengupta measured resonance balance of the sung vowel /a/ by calculating the
amplitude level differences between the spectral region of the Fs (between 2-4 kHz) and region
of vowel formants (frequency under 1.8 kHz). The results showed steady values of –4 dB
between 230 Hz and 400 Hz and a gradual decrease as the F0 increased for the vowel
/a/. Results also showed that the level of the Fs rose when the pitch rose.
In another study, Ross (1992) investigated the presence of the Fs in Estonian folk singing,
performed by two female singers. Both singers performed one Estonian folk song with an F0
range of 200-300 Hz. The LTAS from the first 50 seconds of the song was computed and a
determination of the presence of Fs was made. The findings showed that there was no increased
energy or cluster of formants around the Fs region: the level of the peak around the Fs region
(3 kHz) was about 30-40 dB less than that of the first formant. Ross therefore concluded that these
two female Estonian folk singers did not exhibit the Fs; however, it is not clear whether the Fs
was not exhibited in these singers because of the singing styles or because female singers do not
To date, research on the Fs has been primarily based on experiments conducted with
Western classically trained singers although there are a few studies that also identify Fs in other
singing styles (Wang, 1985; Sengupta, 1990). Perhaps because of the population generally
studied, one of the prominent suggestions has been that the techniques employed in Western
classical singing constitute a primary factor for the Fs. A broader perspective on the Fs,
including both the entire range of factors influencing the Fs and the hierarchy of these factors,
can be gained by gathering data from professional singers who have been trained to use
specialized singing techniques other than those used in Western classical music. Traditional
Chinese opera is one such form of professional singing that requires extensive training but
utilizes techniques that are distinct from those used in Western classical singing. The following
section reviews the variables that are distinctive in traditional Chinese opera.
In general, information on Chinese opera is rather limited. Although there are many
different regional operas that evidence some differences, the general musical style can be
discussed collectively under the umbrella of “Chinese opera” (Grout & Williams, 2002). Hsu
(1992) indicated that traditional Chinese opera in the early era was often performed in the open
air with a small Chinese instrumental ensemble. It was then moved to the theatre or teahouse
where the audience noise and social activities under the stage were unavoidable aspects of these
performances. Although the size of the Chinese ensemble is not as large as the Western
orchestra, the loudness is no less than the Western orchestra, and the timbre that these
instruments produce is extremely piercing to the ears. Nevertheless, traditional Chinese opera
singers can still be heard clearly above the loud, piercing instrumental ensemble and the
audience noise, in the same way that trained Western classical singers can be heard over the loud
orchestra in large concert halls and opera houses. This suggests that the traditional Chinese opera
Traditional Chinese opera singing is quite different from Western classical
singing in terms of training and techniques. In an interview with five traditional Chinese opera
singers and teachers (personal communication, 2001) it was noted that traditional Chinese opera
singers begin their training at the age of 5 to 10 years. By comparison, it is recommended that
Western singers not start vocal training until after puberty (Sataloff, 1996). According to the
Chinese opera teachers, traditional Chinese opera performance historically has placed emphasis
on the singers’ appearance, especially the aesthetic appearance of the singers’ faces and their
facial expressions. For centuries, singers have been taught to retract the lips when singing
because opening the mouth and protruding the lips are considered unattractive. Today, traditional
Chinese opera singers also believe that the best way to project the voice is to focus on bright
vowel sounds; as opposed to the Western classical training techniques, lip retraction is the most
important technique for producing such bright vowels in Chinese opera (personal interviews
from five famous traditional Chinese opera singers and teachers, 2001).
In Western classical singing, darker voice quality is appreciated more than the brighter
voice quality (Hines, 1990). In order to produce a more aesthetically pleasing tone quality,
Western classical singers are trained to open the jaw, protrude and round their lips. These
techniques serve to project the voice. In addition to the different timbres valued by each music
tradition, the constrictions in the oral cavity due to the position of the tongue, the lips and the
mouth are different in Western classical singing than in traditional Chinese opera singing in part
The vowels /a/ and /i/ are the favorite vocalizations for traditional Chinese opera singing.
Traditional Chinese opera singers believe that the vowel /a/ helps project the voice. This belief,
based on practical experience, provides a striking correlation with the findings of empirical
studies conducted with Western classical singers (Sundberg, 1974), i.e. Fs is most prominent in
/a/. Traditional Chinese opera singers also believe that the /i/ sound aids in projecting the voice.
When singers retract the lips for the /i/ sound, the vowel is pushed up into the nasal cavity and
this nasal resonance can carry the voice (Hsu, 1992), leading Western listeners to perceive a
Another difference between Western classical singing and Chinese opera relates to voice
classification. Hsu (1992) and Grout and Williams (2003) suggest that voice classification in
traditional Chinese opera is based on “singing style” or character type. This proposal was
confirmed in personal interviews with professional traditional Chinese opera singers. Singers are
not categorized by vocal ranges, such as soprano, alto, bass, baritone, and tenor, as they are in
Western classical singing. Rather, singers of traditional Chinese opera are classified by the kind
of characters that they typically portray. The timbre and pitch of the voice depend on the age, sex,
and social status of the dramatic roles. The major three characters for male singers in Chinese
opera are Lao-sheng, Hsiao-Sheng and Wu-sheng. The Lao-sheng character-type is a middle-
aged or old man, an official of the imperial court, a general, or some other distinguished person.
This character-type sings with a full baritone voice. The Hsiao-sheng, or “scholar-lover,”
character-type has a high-pitched voice similar to the tenor and sings in the falsetto region. Wu-
sheng, who plays warrior roles and wears costumes which symbolize armor, has a wide vocal
range. This character-type is more involved in acting than in singing. Another character, Jing, is
known as “painted face male.” His facial colors symbolize the type of character; for example, red
represents good and white represents treacherous. Jing often plays the part of a high-ranking
army general, a warrior or official depending on his paint. “His robust baritone voice and unique
painted face together with his swaggering self-assertive manner all combine to make him the
most forceful personality in most scenes in which he appears” (Chinese traditional opera, 2003,
para. 5).
Although these character-types of traditional Chinese opera could be located within the
range-based classification system used in Western classical singing, their ranges are not exactly
the same. Traditional Chinese opera singers are trained to have much wider ranges than Western
classical singers. For example, the Lao-Sheng and Jing character-types, who most often sing in
the baritone range, also sing up into the tenor range and down into the bass range during opera
performance.
Given these differences discussed above, one may wonder whether the traditional
Chinese opera singer also has a high frequency peak in the spectrum, i.e. Fs. This question was
addressed in a pilot study by investigating the Fs in traditional Chinese opera, particularly in Lao-
As noted, traditional Chinese opera singing has different techniques of training than
Western classical singing techniques, yet the voices can still be heard over a loud instrumental
ensemble. Furthermore, Wang (1985) confirmed the Fs in traditional Chinese opera singing. Su
(2000; 2002) hypothesized that the Western vocal training style in which the singers are taught to
round the lips, lower the larynx and lengthen the vocal tract might not be the only contribution to
the Fs. Instead, the intensity, F0, and the vowel configurations may have more effect on the Fs.
Two male (1 Lao-sheng and 1 Jing; ranges overlapping baritone and tenor) and one
female (soprano range) traditional Chinese opera singers from the National Taiwan Traditional
Chinese Opera Department served as subjects in this study. Each singer had at least 15 years of
vocal training. Each singer was asked to choose a familiar musical phrase that was at least 40
seconds in duration. The pitch ranges of their singing phrases were not controlled. The singers
were asked to read the text of the musical phrase that they chose, three times, with their normal
conversational voices. They were then asked to sing the phrase with their full voice as if they
were singing in a large concert hall. Using the same musical score and range, each subject was
then asked to sing the phrase three more times with the single vowels /a/, /i/, and /u/ replacing the
text. Samples were recorded on a DAT in a quiet room in the National Taiwan College of
Performing Arts. The DAT signals were analyzed using CSpeech (Milenkovic,
1987), a computer based speech analysis program. All samples were analyzed by long-term-
The results from this pilot study showed that the female singer (range = soprano) did not
exhibit the Fs for any of the samples. One male singer, Lao-Sheng (range overlaps baritone and
tenor), exhibited the Fs for all of the sung samples. Only two samples sung by Jing (range
overlaps baritone and tenor) exhibited the Fs; the LTAS did show energy extending to
the high frequency region for this singer, but it did not quite meet the criteria developed for the Fs.
This result agreed with Weiss et al.’s (2001) suggestion that Fs is not necessary for the high-
pitch voice because the maximal projection of the high-pitch voice exhibits strong energy in the
high frequency area. When the phrase sung by Jing which did not exhibit the Fs was edited to
eliminate the intervals with F0 of 695 Hz or above, an Fs was exhibited. It was hypothesized that
the Fs was not seen in the LTAS because of the high F0 for some segments of the singing. As
noted by previous researchers, the Fs does not occur in voices with high F0.
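The editing step described above (removing intervals with F0 of 695 Hz or above before computing the LTAS) can be sketched as a simple gating operation on analysis frames. This is a minimal illustration only: the pilot study edited the recording itself, whereas the sketch assumes that each frame already carries an F0 estimate and a power spectrum, and the function name and example values are hypothetical.

```python
import math


def gated_ltas_db(frames, f0_cutoff_hz=695.0):
    """Average the power spectra of frames whose F0 is below the cutoff.

    frames: list of (f0_hz, power_spectrum) pairs, where power_spectrum is a
    list of linear power values with the same length in every frame.
    Returns the averaged spectrum in dB.
    """
    kept = [power for f0, power in frames if f0 < f0_cutoff_hz]
    if not kept:
        raise ValueError("no frames below the F0 cutoff")
    n_bins = len(kept[0])
    mean_power = [sum(p[k] for p in kept) / len(kept) for k in range(n_bins)]
    # A small floor avoids log10(0) for empty bins.
    return [10.0 * math.log10(max(m, 1e-12)) for m in mean_power]


# Invented frames: two low-F0 frames and one frame at 700 Hz that is excluded.
frames = [(220.0, [1.0, 0.1]), (700.0, [1.0, 0.001]), (330.0, [1.0, 0.1])]
gated = gated_ltas_db(frames)
print(gated)
```

In this toy example the excluded high-F0 frame has little energy in the second bin, so removing it raises the averaged level in that bin, which is the kind of effect hypothesized for the edited phrase.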
One possible explanation for the lack of Fs in some of the traditional Chinese opera
singers in this pilot study may relate to the vowel content of the phrases. As noted earlier, vowel
quality is a factor that influences Fs; therefore, the number of /a/, /i/, and /u/ vowels from the
musical phrase that traditional Chinese opera singers sang were counted. It was found that there
were few of these vowels in the traditional Chinese opera lyrics. In order to investigate the
vowel-dependent nature of the Fs in the singing phrase, the pilot study included a comparison of each
singer’s performance of the phrase using the original text to his performance of the same phrase
using only single vowels, /a/, /i/, and /u/. Results showed that Lao-Sheng exhibited the Fs in all
three vowel samples. Although Jing exhibited no sign of the Fs when the phrase was sung with
the original text, when the singer was asked to sing the same musical phrase with a single vowel,
the result showed the existence of the Fs in the vowels /a/ and /u/ but not the vowel /i/. Both male
singers showed the highest level of the Fs for vowel /a/ suggesting that the Fs may be exhibited
in the preferred vowels. Thus, absence of the Fs in the performance of the musical phrase with
the original text might be explained by the dominance of non-preferred vowels, rather than an
Chapter III: Research questions
As illustrated through the review of existing literature, different research questions and
methodologies have been used to investigate the Fs. Previous studies identified a range of factors
influencing the Fs, including vocal training technique, F0, and vowel quality, and suggested that
the Fs cannot be explained merely by one factor. The main purpose of this study was to
determine whether the Fs exists in traditional Chinese opera as well as in Western classical
singing. Secondary questions that were closely related to the main purpose of this study were
also investigated. The three following questions were addressed: (1) How is the Fs perceived by
trained listeners? (2) What factors impact the Fs? (3) What is the impact of the independent and
combined factors on the Fs and do these differ for Chinese and Western singers?
however, few studies have compared these results to the perception of the Fs. Therefore,
perceptual judgments were obtained in the present study from a group of highly trained singers.
In past years, researchers investigated the Fs to determine how it enhances the singing voice so
that listeners are able to perceive the “ring” over a very loud orchestra (Bartholomew, 1934).
Vocal “ring” relates to people’s perception of the timbre of professional singing whereas the Fs
is considered to be the physical correlate of vocal “ring.” Unfortunately, not many studies have
been done to evaluate the vocal ring and its relation to the acoustic measures based on perceptual
judgments of the Fs. Those studies that have investigated the relation between vocal ring and Fs
(Wang, 1985; Omori et al., 1996) were not detailed enough to yield unambiguous interpretation
of their results. In the current study, we asked whether trained listeners could identify a “ring” in
different types of singing such as traditional Chinese opera and Western classical singing and
whether listeners’ perceptions corresponded to the physical identification of the Fs. A second
question addressed in this perceptual study was whether judgments of vocal ring are reliable
In the past, quantitative definitions of Fs were based on a single vowel, but identification
of the Fs from connected passages of singing was mostly based on categorical evaluation alone
(i.e. present or absent). Researchers have used a variety of methodologies in these studies (e.g.
short-term spectra, LTAS, 1/3-octave bands) and reported inconsistent findings. There are no
studies that have investigated different methodologies in one experimental design to determine
whether analysis procedure is a factor in identifying the Fs. In this research, we asked whether
Many studies have investigated the Fs by using LTAS because it is stable for speech and
singing samples, and most importantly, it is less dependent on F0 and intensity than other
analysis procedures (Sundberg, 2001). LTAS provides information about spectrum envelope
peaks during singing because it yields the time average of sound level for adjacent frequency
bands (Sundberg, 2001). Mendoza, Munoz and Naranjo (1996) measured voice stability by using
the LTAS and suggested that the LTAS is an appropriate measurement to detect the stability of
speech signals of 30 seconds or greater. Moreover, Byrne et al. (1994) used the LTAS to compare
the spectra of different languages and their results showed that the LTAS was similar for all
languages which suggested that LTAS is applicable for the present study.
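To make the notion of a time-averaged spectrum concrete, the following minimal sketch computes an LTAS by averaging the power spectra of overlapping frames. It is not the CSpeech procedure used in this research: the frame length, hop size, Hann window, and the naive DFT (used only to keep the example self-contained) are all assumptions, and a practical analysis would use an FFT routine with much longer frames.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def ltas_db(signal, frame_len=64, hop=32):
    """Long-term average spectrum: mean power per bin across frames, in dB."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    spectra = [power_spectrum(signal[s:s + frame_len]) for s in starts]
    if not spectra:
        raise ValueError("signal is shorter than one frame")
    n_bins = frame_len // 2 + 1
    mean_power = [sum(s[k] for s in spectra) / len(spectra) for k in range(n_bins)]
    return [10.0 * math.log10(max(p, 1e-12)) for p in mean_power]


# Test tone: 1000 Hz sine sampled at 8000 Hz, so bins are 8000 / 64 = 125 Hz apart.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(512)]
ltas = ltas_db(tone)
peak_bin = max(range(len(ltas)), key=lambda k: ltas[k])
print(peak_bin * sample_rate / 64)
```

Because the averaging is over power rather than over dB values, brief loud segments contribute more to the LTAS than quiet ones, which is one reason sample duration matters for stability.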
In many previous studies, the precise amount of energy increase in the LTAS to define
the Fs was not provided. The present researcher found that this description of Fs was hard to
apply because it was not clear how a peak should be defined in terms of amplitudes and
bandwidth. Therefore, in this study, categorical (i.e. presence or absence of the Fs) and quantitative
measures of LTAS (center frequency, relative intensity L3-L1 difference) and short-term spectra
(L3-L1) were made in order to decide on the presence or absence of Fs. Comparisons of these
The final questions that we asked were: What are the effects of the independent and
combined factors that affect the Fs and do these factors differ for Western and Chinese singers?
Voice classification was controlled and other factors such as singing technique (Chinese and
Western), vowel quality (/i/, /a/, /u/), fundamental frequency and intensity of singing were
Chapter IV: Methods
Three experiments were conducted to investigate the Fs in Chinese and Western classical
opera. Perceptual judgments were obtained in the first experiment by asking a group of highly-
trained singers to judge the presence or absence of the vocal ring. The second experiment
examined the Fs by measuring LTAS, categorically. The third experiment evaluated the Fs by
measuring LTAS and the short-term spectrum quantitatively. The purpose and details of these
This section will describe the subjects who participated in this study (Chinese and
Western) and detail the recording procedures and tasks that were used to collect the data for the
Singers
Ten males (5 Lao-sheng and 5 Jing with vocal ranges somewhat overlapping the tenor and
baritone ranges of Western classical singing) trained in traditional Chinese opera singing participated in
this study. All singers had a minimum of 15 years of individual voice lessons. These singers
were all professional singers and were selected from National Taiwan Traditional Chinese Opera
Ten Western classically trained singers (5 tenors, 5 baritones), with a minimum of 5 years
of vocal training were selected from the Indiana University School of Music. All singers were
referred by their voice professors only if they met the professors’ criteria of “professional
singers.” Singers’ ages ranged from 27 to 40 years. Although the two groups of singers were in
a similar age range, they had quite different durations of vocal training. Recall that it is typical
for traditional Chinese opera singers to start training between the ages of 5 and 10 years. On
the other hand, Western classical singers normally start their training after puberty. Therefore,
singing expertise, as judged by singing professionals, was a more realistic criterion for matching
subjects than duration of training if singers were to be of equivalent age. Age was considered to
be an important control variable because of the known effects of aging on the voice (Hollien &
Shipp, 1972). All subjects were selected under the conditions of self-report of normal hearing
and had no medical history of voice pathology. Before the recording started, each subject was
asked to fill out a questionnaire, which was prepared by the researcher (see Appendix A). The
purpose of the questionnaire was to collect information about the characteristics of the subjects
such as the singers’ age, the singers’ vocal category, years of training, etc. to assure that all
Recording Procedures
Subjects were asked to choose one of the musical phrases that they were most familiar
with and bring the lyrics with them on the day of the recording. The recordings were done in a
quiet room in either National Taiwan College of Performing Arts or in the Indiana University
Department of Speech and Hearing Sciences. Each subject was asked to stand in a comfortable
position with a condenser microphone (ATM71) positioned 30 cm in front of his lips. The
microphone signals were transduced and recorded on a DAT recorder (SONY TCD-D8). A
sound level meter was placed at the same distance as the microphone position and the sound
level of each singer’s voice was measured and noted. A musical keyboard was provided for the
singers to identify comfortable keys in which they wanted to sing; this instrument also helped to
Data Collection
Subjects were asked to prolong the vowels /a/, /i/, and /u/, with the most natural and
comfortable habitual pitch and loudness for 5 seconds each. These samples constituted the
sustained “spoken” vowel samples. The purpose of this and the following task was to compare
the single vowels of the non-singing samples to the singing samples. Subjects were asked to sing
the vowels /a/, /i/, and /u/. Subjects were instructed to sing these three vowels in their most
comfortable pitch for 5 seconds each. They were asked to sing as loud as possible as if they were
singing in a big concert hall. These vowels provided the sustained “sung” vowels that were
compared with the sustained “spoken” vowels from the previous task. Subjects were asked to
sing certain musical notes such as C4, D4, G4, and E4 given by a keyboard cue. These musical
notes were in the range produced by baritone and tenor voices. The rationale for giving the same
notes to all the singers was to be able to have a standard for comparison across the different
singers. However, when the singers were asked to produce these given notes, some of the singers
were not able to complete the task due to their different vocal ranges. Even though the notes
were within the standard ranges for tenors and baritones, some of the singers had to sing an
octave higher or lower than the target pitch for certain notes. Some of the singers were not
comfortable with certain notes and they either refused to sing them or sang very uncomfortably.
Due to the lack of consistency in these sung notes, this part of the data was not analyzed.
Subjects were also asked to glide up and down the musical scale of their comfortable ranges with
the vowel /a/. The lowest and the highest notes of this glide were sustained for at least 1-2
seconds.
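For reference, the target notes named above can be converted to nominal fundamental frequencies. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz; the tuning of the keyboard used in the recordings is not stated, so the resulting values are nominal only, and the function and table names are hypothetical.

```python
# Semitone offsets of the natural notes within an octave (C = 0).
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(name, a4_hz=440.0):
    """Nominal F0 of a natural note such as 'C4', assuming equal temperament."""
    letter, octave = name[0], int(name[1:])
    midi = 12 * (octave + 1) + NOTE_OFFSETS[letter]  # MIDI numbering: A4 = 69
    return a4_hz * 2.0 ** ((midi - 69) / 12.0)


targets = {name: round(note_to_hz(name), 2) for name in ("C4", "D4", "E4", "G4")}
print(targets)

# Singing a target an octave higher or lower doubles or halves its F0.
octave_up = note_to_hz("C5") / note_to_hz("C4")
print(octave_up)
```

An octave displacement therefore changes the F0 by a factor of two, which illustrates why notes sung an octave away from the target could not be compared directly across singers.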
Prior to the recording session, each singer chose his most familiar musical phrase, with a
minimum length of 40 seconds. This was done to give singers the opportunity to present their
best performances in the following tasks. The musical passage for each Western classically
trained singer is listed in Appendix B to indicate the language and type of repertoire that each
singer used in his performance. All Chinese singers sang in Mandarin. The singers were first
asked to read the phrase that they chose, three times, using their normal conversational pitch and
loudness. The purpose of this task was to compare running speech to the singing phrases. One
1974). The rationale for reading the same text from the singing phrase was to be able to control
and compare the vowels in both singing and speaking samples. Moreover, repeating the text
three times provided a sample of adequate length for a stable acoustic analysis with the LTAS
The singers then sang the same musical phrase with their most comfortable pitch range.
Singers were asked to pretend that they were singing in a large concert hall so that they could
perform with their full voices in which the Fs may normally occur. The number of the vowels /a/,
/i/, and /u/ in each musical phrase was also counted to evaluate whether or not the vowel content
affected the magnitude of the Fs. The same musical score and range were then sung with each
single vowel /a/, /i/, and /u/. There were two rationales for the selection of these three vowels.
First, results of many studies showed that vowels /a/, /i/, and /u/ have the most acoustically
distinct Fs (Sundberg, 1970, 2001; Bloothooft & Plomp, 1984, 1985; Su, 2000). Second, the
vowels /a/ and /i/ are the most common vowels used for traditional Chinese opera (Hsu, 1992).
By comparing the musical phrase performed on a sustained vowel with the same musical phrase
performed with the original text, we were able to investigate the effect of phonetic context on the
Fs.
Subjects
“Vocal ring” was judged perceptually by 12 doctoral students in voice performance, each
with a minimum of 5 years performance experience, from the Indiana University School of
Music. One male singing professor, the chair of the voice department, who has more than 20
years of teaching experience, also participated in this perceptual study. The average age of the 12
doctoral students was 28 years with a range of 27 to 36 years. The age of the singing teacher was
55 years. All listeners had attended recitals and concerts for at least 5 years and were familiar
with different voice qualities. All the listeners passed a hearing screening with thresholds of 20
Procedures
Stimulus tapes
All 40 samples of the sung passage were digitized (see section on acoustic analysis) and
transferred from the computer to a DAT tape. There were two blocks of stimuli: 1. Original
musical phrase and 2. Musical phrase sung with the single vowel /a/. Because of time
constraints, vowels /i/ and /u/ were not included in the perceptual experiment. There were two
consecutive repetitions of each sample within each block, with a 3-second interval between the
repetitions. Each block had 20 samples (10 Chinese singers and 10 Western singers) and lasted
approximately 15 minutes. In addition to the stimulus tapes, two sets of samples were recorded
for a practice tape and each set contained 3 samples. Each set of practice stimuli was 60 seconds
in duration. Samples were presented with a 5-second interval before the next stimulus. Both sets
Listening procedures
Practice section
Listeners received a practice session to familiarize them with the “vocal ring.” The
practice session was held one day prior to experimental testing in a quiet classroom in the
Indiana University School of Music. The stimulus tape was presented through 2 loudspeakers at
a distance of 6 feet from the listeners. The samples were presented at 80 dB SPL, as measured 6
feet from the loudspeakers. There were two sets of samples presented in this practice session and
each set contained three samples. The first set contained Western classically trained singing,
traditional Chinese opera singing and popular music singing, all from commercial recordings.
These samples were used to provide the concept of vocal ring. Both Chinese opera and Western
classical samples contained a ring whereas the popular music did not possess a ring as judged by
The second set of stimuli acquainted the listeners with the type of stimuli and procedures
that would be used in the experiment. The three samples in the second set contained Western
classically trained singing, Chinese opera and popular music all sung without orchestral
accompaniments. The samples of Western opera were obtained from a singing professor at
Indiana University, School of Music. The traditional Chinese opera sample came from samples
collected for an earlier pilot study (Su, 2000). The popular music singing sample was recorded
by a student from the School of Music who was not a professional singer.
Before the practice session started, each listener was given a rating sheet with a 3-point
scale (1= strong vocal ring, 2= not sure, 3= no vocal ring) on the sheet. After listening to each
sample, listeners were asked to rate the vocal ring on the rating sheets. The answers recorded by
the listeners provided information for the researcher to determine whether the listeners’ concept
of the vocal ring was consistent within the group and with the experimenters’ judgments.
Experimental session
The experimental session was conducted in a quiet classroom in the School of Music.
The stimulus tape was presented through 2 loudspeakers at a distance of 6 feet from the listeners.
The samples were presented at 80 dB SPL, as measured 6 feet from the loudspeakers. Listeners
were given the response sheets which were divided into a rating section and a “comments”
section (see Appendix C). All ratings were made on a 3-point scale, with ‘1’ indicating “strong
vocal ring” perceived, ‘2’ indicating “not sure,” and ‘3’ indicating “no vocal ring.” Listeners
During this experiment, you will hear samples of Chinese opera or Western classical
singing. Each sample will last about 30 seconds and will be played twice. You are required to
judge if there is a vocal ring in each sample by using a scale from 1 to 3. Rate the sample as 1 if
you hear a strong vocal ring. Rate the sample as 2 if you are not sure, and rate as 3 if you hear no
vocal ring. You will be getting two answer sheets. The first one is the score sheet which has the
rating scale for you to respond. The second sheet is for you to write your comments. Record your
rating on the score sheet. Please wait until you have heard 2 repetitions of the sample to mark
your score sheet. You are allowed to write down your comments anytime during stimulus
presentation. Write down your comments on the second sheet of paper regarding how you feel
about the vocal quality of different samples that you hear. You can also comment on which part
of the music you heard the ring; for example, at the high pitch, low pitch, or certain vowels, etc.
These comments are optional. We will not play the next sample until you have all finished your
ratings and comments, and are ready to proceed. There will be 2 blocks of samples. Block 1 will
have the phrases with original texts and Block 2 will present the singers as they sing an isolated
vowel. A 5-minute break will be provided between the first (phrase) and second (vowel) blocks.
There are a total of 40 samples for you to rate. I will indicate the sample number before each
Five listeners who participated in the perceptual rating procedure were recruited again
four months after the test and were asked to rate the same samples (the regular singing phrase
and phrase sung with the vowel /a/) that they rated during the first experimental session. The
same procedures as were used in the first test session were used in this second rating session.
Data Analysis
The results for the perceptual test were based on 70% agreement across listeners. That is,
a sample was considered to have a vocal ring if 70% of the listeners rated the sample as “1”
(strong vocal ring). The same criterion of 70% agreement across listeners was used for ratings of
“not sure” and “no vocal ring.” Intra-judge reliability was based on the number of samples that
each of the listeners rated the same in both listening sessions. Inter-judge reliability was used as
an index of the strength of the vocal ring. It was reasoned that a singing sample that received the
same rating across listeners in the two experimental sessions had more salient cues to the vocal
ring than a sample that had a wider range of ratings. Therefore, the number of listeners who
gave the same rating across sessions was taken as an index of the strength or weakness of
Acoustic analysis
One of the purposes of this study was to investigate the different acoustic cues that may
impact the Fs. This was done categorically (Experiment 2) and quantitatively (Experiment 3).
The singers’ samples were output from the DAT tape and analyzed with CSpeech and TF32
(Milenkovic, 1987; 1997; 2003), a speech-processing tool for PC computers. All acoustic
signals were low-pass filtered at 9 kHz and digitized with a 22 kHz sampling rate. Formant
frequencies for the vowels /a/, /i/, and /u/ were measured with FFT and LPC. The LPC analysis
included 26 coefficients. The FFT was used to determine the fundamental frequency and LPC
provided a good estimate of formant frequencies for speech and smoothed the spectrum for sung
vowels.
An LTAS was calculated for each digitized sample by averaging the FFTs (512 points)
This analysis was conducted for all phrase–length passages (i.e. sung passage, sung vowel
extra spectrum envelope peak (a cluster of formant 3, 4 and 5) that appears between 2300 Hz and
3500 Hz. The procedure from previous studies, whereby the researcher first identified increasing
amplitude between 2300 Hz and 3500 Hz, was applied. The bandwidth of this peak energy was
then measured at the 3 dB down points from the high and low frequency sides of the peak. If the
cluster of energy in the frequency region of Fs had a bandwidth that was less than or equal to
1000 Hz, that cluster was defined as a peak and its low and high frequency –3dB boundaries
were recorded. Therefore, the Fs in this experiment was defined by a cluster of energy between
2300 and 3500 Hz, with a bandwidth less than or equal to 1000 Hz.
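The peak criterion described above can be illustrated with a minimal sketch. It assumes the LTAS is already available as two parallel lists (frequency in Hz, level in dB); the function name `find_fs_peak`, the local-maximum check, and the sample values are hypothetical and are not taken from the CSpeech or TF32 procedure used in this study.

```python
def find_fs_peak(freqs_hz, levels_db, band=(2300.0, 3500.0), max_bandwidth_hz=1000.0):
    """Locate the strongest LTAS peak in `band` and measure its -3 dB bandwidth."""
    in_band = [i for i, f in enumerate(freqs_hz) if band[0] <= f <= band[1]]
    if not in_band:
        return None
    peak = max(in_band, key=lambda i: levels_db[i])
    # Assumption: a peak must rise above its immediate neighbours, so a
    # spectrum that simply falls through the band is not counted.
    if peak == 0 or peak == len(freqs_hz) - 1:
        return None
    if levels_db[peak] <= levels_db[peak - 1] or levels_db[peak] < levels_db[peak + 1]:
        return None
    threshold = levels_db[peak] - 3.0
    low = peak
    while low > 0 and levels_db[low - 1] >= threshold:
        low -= 1
    high = peak
    while high < len(freqs_hz) - 1 and levels_db[high + 1] >= threshold:
        high += 1
    bandwidth = freqs_hz[high] - freqs_hz[low]
    return {
        "peak_hz": freqs_hz[peak],
        "low_hz": freqs_hz[low],
        "high_hz": freqs_hz[high],
        "bandwidth_hz": bandwidth,
        "meets_criterion": bandwidth <= max_bandwidth_hz,
    }


# Hypothetical LTAS values for illustration only (not data from this study).
ltas_freqs = [2000.0, 2200.0, 2400.0, 2600.0, 2800.0, 3000.0, 3200.0, 3400.0, 3600.0]
ltas_levels = [-30.0, -28.0, -24.0, -20.0, -15.0, -17.0, -22.0, -27.0, -31.0]
result = find_fs_peak(ltas_freqs, ltas_levels)
```

Because the boundaries are read from discrete spectral points, the measured bandwidth depends on the frequency resolution of the LTAS; a finer grid or interpolation would give closer estimates of the –3 dB points.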
Reliability of peak determination was assessed on randomly selected samples by two other
experienced investigators from the Indiana University Department of Speech and Hearing
Sciences. The presence or absence of a measured Fs in all the regular singing phrases and the
speaking phrases from both Chinese and Western singing groups was determined. The same
musical phrases sung with the vowels /a/, /i/, and /u/ were then investigated to determine if Fs is
impacted by vowel quality. Samples from the regular singing phrase and the phrase sung with the
vowel /a/ that matched the criteria of the Fs were then compared with the perceptual ratings. This
was done to investigate whether there is a relationship between the categorical acoustic cues to
In addition, the presence or absence of the Fs from the gliding musical scales with the
vowel /a/ were also measured by the LTAS. The same procedures for determining the Fs were
used in this analysis. Results were compared with a study from Sundberg (2001) in which a
As discussed before, many researchers defined the Fs by investigating single vowels;
however, there is no operational definition of the Fs from a sung passage. Previous researchers
who used LTAS analysis to investigate the Fs simply decided if there was an energy peak
between 2300 Hz and 3500 Hz with increasing amplitude from the region of 2000 Hz.
Unfortunately, no study specifies how much increase in energy is needed to yield the Fs.
In the current study the researcher attempted to quantify Fs using the differences in energy
(measured in dB) between high (2000-4000 Hz) and low (0-2000 Hz) frequency regions. This
high-low frequency energy difference was compared for sung and spoken samples of the same
passage.
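As a rough illustration of the high-low energy difference, the sketch below sums spectral energy in the two bands with a plain discrete Fourier transform and reports the difference in dB. This is a stand-in technique chosen for a self-contained example; it is not the filter-based procedure used in this study. The function names, the 22050 Hz sampling rate, and the two-tone test signal are assumptions made for illustration.

```python
import cmath
import math


def band_energy_db(samples, sample_rate, f_low, f_high):
    """Energy (dB, arbitrary reference) of DFT bins with f_low <= f < f_high."""
    n = len(samples)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_low <= freq < f_high:
            coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                        for t, s in enumerate(samples))
            energy += abs(coeff) ** 2
    return 10.0 * math.log10(energy) if energy > 0 else float("-inf")


def high_low_difference_db(samples, sample_rate):
    """Level of the 2000-4000 Hz band minus level of the 0-2000 Hz band."""
    high = band_energy_db(samples, sample_rate, 2000.0, 4000.0)
    low = band_energy_db(samples, sample_rate, 0.0, 2000.0)
    return high - low


# Synthetic signal (hypothetical): a 500 Hz tone plus a 3000 Hz tone at one
# tenth of the amplitude. 441 samples at 22050 Hz gives 50 Hz bin spacing,
# so both tones fall exactly on DFT bins.
sample_rate = 22050
signal = [math.sin(2 * math.pi * 500 * t / sample_rate)
          + 0.1 * math.sin(2 * math.pi * 3000 * t / sample_rate)
          for t in range(441)]
difference = high_low_difference_db(signal, sample_rate)
```

For this signal the high band carries one hundredth of the low-band energy, so the difference comes out at about -20 dB. A difference closer to zero would indicate relatively more energy in the 2000-4000 Hz region.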
The spectra from all the sung and spoken phrases were filtered using elliptic IIR filters
designed in MatLab Sptool. Each sung and spoken phrase was first low-pass filtered (fc = 2000
Hz) with a ripple of 3 dB in the pass-band. The stop-band edge frequency was set at 2500 Hz
with 50 dB roll-off. The original full spectra of the sung and spoken phrases were then filtered by
the band-pass filter, which was set at 2000-4000 Hz with a pass-band ripple of 3 dB. The
stop-band edge frequencies were set at 1500 and 4500 Hz, each with a 50 dB roll-off. All samples
filtered by both low-pass and band-pass filters were saved, exported from the MatLab
workspace to waveform files, and analyzed with Cool Edit (Symtrilliam, 1999). The energy
values (RMS in dB) in the low frequency region (0-2 kHz) and high frequency region (2-4 kHz)
were calculated by Cool Edit. Finally the relative energy level differences between the high
frequency band (2000-4000 Hz) and the low frequency band (0-2000 Hz) were calculated by
The relative energy level difference between the two frequency bands for the sung phrase
was first calculated; a small absolute difference between the two regions was expected if there
was a Fs. The relative energy level difference between the two regions for the spoken phrase was
then calculated; a greater negative difference between the two regions was expected for the
spoken, compared to the sung samples. Finally, the absolute difference in relative energy in the 2
A two-way ANOVA was performed to investigate the main factors of material (spoken
and sung phrases) and style (Western and Chinese groups), as well as their interaction effect.
Tukey HSD analysis was also performed to test the interaction effect among all the
materials in order to examine which material significantly affected the spectral energy. In
order to investigate whether different materials impact the Fs, one-way ANOVAs were used to
evaluate the relative energy differences among the materials (the spoken phrase, the regular sung
phrase, and the musical phrase sung with /a/, /i/, and /u/) within each group (Western and
Chinese). Moreover, two-sample t-tests were performed to compare the relative energy difference
between the Western and Chinese groups for the vowels /a/, /i/ and /u/.
Finally, Spearman’s Rho was used to investigate the correlation between the quantitative
analysis of the LTAS and perceptual rating for the regular singing phrase and the phrase sung
with the vowel /a/. Composite scores of the perceptual ratings for each singer were determined
by multiplying the number of responses for each rating (1 = strong vocal ring, 2 = not
sure, or 3 = no vocal ring) by that rating’s scale value and summing the products. For example,
singer C4 received “1” from four listeners (1 x 4 = 4), “2” from eight listeners (2 x 8 = 16) and “3”
from one listener (3 x 1 = 3). Therefore, the composite rating for C4 was 23.
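The composite score and the rank correlation can be sketched as follows. The C4 counts come from the worked example above. The function names are hypothetical, and computing Spearman's rho as a Pearson correlation on average ranks (which handles tied scores) is an assumption about the computation, not a description of the statistical package used in this study.

```python
import math


def composite_score(rating_counts):
    """Sum of (scale value x number of listeners giving that rating)."""
    return sum(rating * count for rating, count in rating_counts.items())


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Singer C4: "1" from four listeners, "2" from eight, "3" from one.
c4_score = composite_score({1: 4, 2: 8, 3: 1})
```

A lower composite score corresponds to a stronger perceived ring, since "1" is the strongest rating on the scale. The sign of any correlation with an acoustic measure has to be read with that in mind.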
Acoustic measures of the F0
The F0 of the entire sung phrase was calculated by using CSpeech version 4.0
(Milenkovic, 1987; 1997). The algorithm uses an autocorrelation procedure to track F0 changes.
The simultaneous use of the pitch trace, the waveform and the spectrogram facilitated
the determination of the pitch of specific vowels. The mean, minimum, maximum, and the
standard deviation of the F0 of all these phrases were computed by positioning the cursors to the
endpoints (beginning and the end) of the waveform. In addition, the highest and the lowest F0
were measured from each sung phrase by manually positioning the cursor to these pitch levels
and recording the output of the F0 values that appeared on the screen.
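The F0 summary measures can be sketched as below, assuming the pitch track is available as a list of F0 values in Hz with 0 marking unvoiced frames. The function names, the use of the sample standard deviation, and the example values are assumptions for illustration and do not reproduce CSpeech output. The semitone conversion is included because F0 differences in this section are judged against a 1-semitone criterion.

```python
import math
import statistics


def semitone_difference(f1_hz, f2_hz):
    """Interval from f1 to f2 in equal-tempered semitones."""
    return 12.0 * math.log2(f2_hz / f1_hz)


def f0_summary(f0_track_hz):
    """Mean, minimum, maximum, SD and range of the voiced frames of a pitch track."""
    voiced = [f for f in f0_track_hz if f > 0]
    lowest, highest = min(voiced), max(voiced)
    return {
        "mean_hz": statistics.mean(voiced),
        "min_hz": lowest,
        "max_hz": highest,
        "sd_hz": statistics.stdev(voiced),  # sample SD (an assumption)
        "range_semitones": semitone_difference(lowest, highest),
    }


# Hypothetical pitch track (Hz); 0.0 marks an unvoiced frame.
summary = f0_summary([220.0, 0.0, 330.0, 440.0])

# One semitone corresponds to a frequency ratio of 2 ** (1 / 12),
# that is, a difference of roughly 5.9%.
semitone_percent = (2 ** (1 / 12) - 1) * 100
```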
(Experiment 1). The purpose of this comparison was to investigate any F0 differences between
the singers who exhibited the strong vocal ring and singers who did not exhibit the strong vocal
ring. Both mean F0 and F0 range were investigated to determine their impact on the Fs for both
traditional Chinese opera singing and Western classically trained singing. Because of the limited
sample size, statistical analysis was not undertaken. Instead, the frequency difference of F0 was
operationally defined as a minimum of 1 semitone. This definition was based on the study of
differential pitch sensitivity of the ear (Shower & Biddulph, 1931). Shower and Biddulph found
that the minimum change in F0 that is detectable by the human ear is on the order of 1.0% or less
in the whole musical range. Because the smallest pitch interval between notes in Western music
is a semitone, with the frequency difference around 5.9% of the frequency range, our definition
Intensity measurement
The intensity range (the highest and lowest intensities) of each sung phrase was measured
by using the sound level meter during the recording of each phrase. The sound level meter was
placed at the same distance as the microphone position in front of the singer. The researcher
watched the sound level meter when singers were singing their passages, and the highest and
lowest intensities were noted during the singing. The results were compared to the listeners’
perceptions to determine intensity range differences between singers who were rated as having a
strong vocal ring and singers who did not exhibit the strong vocal ring. The differential threshold
for intensity was operationally defined as a minimum of 1 dB. This definition was based on
the study of Reisz (1928), which showed that with the intensity between 70-110 dB, intensity
Several studies (Schutte & Miller, 1985; Seidner et al., 1985; Rossing et al., 1986;
Sengupta, 1996; Sundberg, 2001) determined the Fs by measuring the difference between the
level of the formant peak around 3000 Hz (L3) and the level of the first formant (L1) from either
the short-term spectrum or the LTAS. In this study, the researcher measured the L3-L1 from the
LTAS of the regular singing phrase and the phrase sung with the vowel /a/ for all singers. Again,
comparisons were made between this measure and listeners’ judgments of the vocal ring. The
purpose of this comparison was to investigate the level difference between singers who were
rated as having the strong vocal ring and singers who did not exhibit the strong vocal ring.
Formant peaks from the LTAS were determined by using Rossing et al.’s (1986) criteria.
Rossing et al. defined the level of the Fs as the formant frequency level at the 3 kHz frequency
region. They also defined the level of the first formant as the frequency level around 500 Hz. The
first formant peak (L1) around 500 Hz was identified manually from the LTAS in the current
study. The frequency and amplitude of the first formant and the formant (L3) in the region of 2-4
kHz were obtained from LPC with 26 coefficients. When there were two or three peaks adjacent
to each other in the higher frequency region near the third formant (around 2.3-3.5 kHz), as is
common for the Fs, the average of these peaks was calculated. The differences between L3 and
Fs also was investigated by short-term spectral analysis. The level of the Fs (L3-L1) was
calculated for each individual vowel from sustained sung and spoken vowels and the vowels
selected from the musical phrase. Both FFT and LPC analyses were used to measure the formant
frequencies for all sustained vowels /a/, /i/ and /u/ and all vowels (/a/, /i/, and /u/) within the
musical phrase. A 50 ms segment was extracted from a steady-state portion of each vowel. The
FFT gave a good estimate of the F0 and harmonics, and LPC provided a good estimate of
formant frequencies for speech and smoothed the spectrum for sung vowels. A 26-coefficient
LPC spectrum and a broad-band spectrogram were used to assist identification of the formant
frequencies. The LPC spectrum was overlaid on the plot of the FFT when the broad-band
spectrogram was displayed. By moving the cursor on the spectra, both frequency and intensity
The prolonged spoken vowels /a/, /i/, and /u/, with the most natural and comfortable pitch
and loudness were first measured and the mean L3-L1 was calculated for each vowel. The L3-L1
from the prolonged sung vowels and from the vowels /a/, /i/ and /u/ edited from the sung phrase were
then measured. The mean L3-L1 was calculated and compared with L3-L1 from the spoken
sustained vowels. The first formant peak (L1) was identified manually by determining the
frequency and amplitude of the highest harmonic or average of the highest two harmonics near
the first formant. The third formant peak (L3) also was identified by the highest harmonic or
average of the highest two or three harmonics around 2.3 - 3.5 kHz. Formant values from the
sustained spoken vowels from each singer were used as a standard to help determine the F1 of
the vowels edited from the sung phrase. Comparisons were made between the mean L3-L1 values
of the spoken and sung vowels within and between each singing group. Additionally, the mean
L3-L1 values were compared to the perceptual judgments to determine if this parameter was a
The formants of the vowels /a/, /i/, and /u/ from the spoken phrases were not measured
with short-term spectral analysis because many vowels were not long enough to exhibit a steady-
state portion. Also, the speed of speech and the quick changes of articulatory movements due to
the different contexts would not provide a reasonable comparison to the lengthened vowels in the
sung passages.
Chapter V: Results and discussions
Results for all experiments will be discussed relative to the results of the perceptual
judgments (Exp. 1). Comparisons were made between the acoustic measures (i.e., categorical
measurements of LTAS, quantitative measurements of LTAS, L3-L1 of short term spectra) and
listeners’ judgments concerning the presence of the “vocal ring.” The acoustic results are
compared to listeners’ judgments using a 70% criterion wherein a singing sample was considered
to have a vocal ring if 70% of listeners rated “yes” they heard a “strong vocal ring.” The
category for these samples will be termed “vocal ring” for the remaining discussion. The second
perceptual category was formed by 70% of listeners giving a rating of “not sure” to any sample.
This rating indicated that sometimes listeners heard the “strong vocal ring” and sometimes they
did not. The rating “no vocal ring” will not be related to the acoustic data because this category
was rarely used, and no sample was judged by 70% of the listeners as having “no vocal ring.”
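The 70% criterion can be expressed as a short sketch. The function name, the "no consensus" label, and the example counts are hypothetical. The strict comparison against 0.70 is also an assumption: a sample rated "strong vocal ring" by 9 of 13 listeners (69%) falls just below a strict cut-off, so this sketch cannot settle how such borderline cases were treated.

```python
def classify_sample(rating_counts, criterion=0.70):
    """Assign a perceptual category when enough listeners agree on one rating.

    rating_counts maps each scale value (1, 2, 3) to the number of listeners
    who chose it. Returns "no consensus" when no rating reaches the criterion.
    """
    labels = {1: "strong vocal ring", 2: "not sure", 3: "no vocal ring"}
    total = sum(rating_counts.values())
    for rating, label in labels.items():
        if rating_counts.get(rating, 0) / total >= criterion:
            return label
    return "no consensus"


# Hypothetical counts for 13 listeners (illustration only).
clear_ring = classify_sample({1: 10, 2: 3, 3: 0})   # 10/13 = 77%
uncertain = classify_sample({1: 1, 2: 12, 3: 0})    # 12/13 = 92%
split_vote = classify_sample({1: 6, 2: 7, 3: 0})    # neither reaches 70%
```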
This experiment assessed listeners’ perception of the vocal ring for both traditional
Chinese opera and Western classically trained singers. The Chinese and the Western samples of
the regular singing phrases and the phrases sung with the vowel /a/ were rated. No spoken
passage was rated in the perceptual rating session. All ratings were made on a 3-point scale with
‘1’ indicating “strong vocal ring” perceived, ‘2’ indicating “not sure” (i.e., sometimes yes,
sometimes no), and ‘3’ indicating “no vocal ring”. A practice session was provided prior to the
experimental session that tested listeners’ ability to hear the vocal ring. Results of this practice
session showed consistent answers across all listeners, indicating that they agreed on the concept
The perceptual judgment test was given one day after the practice session. The results
showed that 70% of the listeners heard a strong vocal ring in samples produced by 4 out of 10
Chinese traditional opera singers (C8, C11, C14 and C15) for the regular singing phrase (Table
1). However, listeners were “not sure” if a ring was present throughout the musical phrase of
four singers (C3, C4, C5 and C6). Around half of the listeners (46%) heard a strong vocal ring
for singers C9 and C16; but, half of the listeners (54%) were not sure. Essentially the “no vocal
ring” response was not used except by one listener for just two singers, C8 and C11.
General comments from listeners for the regular phrase sung by Chinese opera singers
indicated that the vocal ring was perceived, but not always throughout the entire phrase. Most of
the listeners commented that they heard the vocal ring in the high F0 range but could not hear it
in the lower F0 range. Moreover, listeners indicated the possibility that not perceiving the vocal
ring could be a result of their unfamiliarity with the language and bias in terms of singing
techniques. Listeners did report that the ring was perceived in certain vowels, such as /i/, /u/, /e/
and /a/ in the Chinese samples, particularly in the vowels /a/ and /i/. Listeners also noted that
they heard a stronger vocal ring in sustained notes and vowels than in running notes with
complex context.
For the same musical phrase sung with the vowel /a/, results showed that two singers (C8
and C15) were perceived to produce a strong vocal ring (Table 2). As noted above, these singers
also were perceived to have the vocal ring when they sang the regular phrase. Listeners were
“not sure” about the vocal ring in four singers (C3, C6, C14, and C16). Ratings for the remaining
singers (C4, C5, C9 and C11) did not provide any clear judgments about the strength of the vocal
ring. For these singers, less than 70% of listeners rated samples as falling into any of the 3 vocal
ring categories (i.e., strong vocal ring, not sure, or no vocal ring). All singers, except singer C15,
were perceived by at least one listener as having no ring for the phrase sung with the vowel /a/.
Listeners’ comments generally indicated that the vocal ring was mostly heard in the higher F0
and sustained notes when the sample phrase sung with the vowel /a/ was presented. Most
listeners commented that on the samples that they rated “unsure,” they sometimes heard the
vocal ring, yet not throughout the entire phrase. These comments are consistent with those made
Table 1: Results of perceptual rating for the regular singing phrase: percentage of listeners’ rating
“strong vocal ring”, “no vocal ring” and “not sure” for traditional Chinese opera singers. Percent
ratings are based on the judgments of 13 listeners.
C3 C4 C5 C6 C8 C9 C11 C14 C15 C16
Strong vocal ring 8% 23% 23% 15% 69% 46% 77% 77% 77% 46%
Not sure 92% 77% 77% 85% 23% 54% 15% 23% 23% 54%
No vocal ring 0% 0% 0% 0% 8% 0% 8% 0% 0% 0%
Five out of ten Western classically trained singers (W1, W3, W5, W7 and W9) were
perceived to have a vocal ring for the regular singing phrase (Table 3) whereas judgments of “not
sure” were obtained for two singers (W2 and W6). Approximately half of the listeners heard the
vocal ring and half of the listeners were “not sure” for singers W4, W8, and W10. There was
only 1 singer (W6) for which any listener indicated no vocal ring. For the phrase sung with the
vowel /a/ (Table 4), results from the perceptual rating showed seven singers (W1, W2, W3, W5,
W7, W9 and W10) were perceived to have a strong vocal ring. Five of these singers (W1, W3,
W5, W7 and W9) also were perceived to produce the vocal ring during the sung passage.
Listeners were unsure of the ringing quality for the remaining three singers (W4, W6 and W8).
Listeners commented that the vocal ring of these three singers was sometimes perceived and
sometimes not. Results from the perceptual rating for the phrase sung with the vowel /a/ showed
that only one listener rated one sample (sung by W2) as having no vocal ring.
Five listeners who participated in the perceptual rating procedure were recruited again
four months after the test and were asked to rate the same samples (the regular singing phrase
and phrase sung with the vowel /a/) sung by both traditional Chinese opera singers and Western
classically trained singers. The same procedures as were used in the first test session were used
in this second rating session and the reliability of the listeners’ judgments was calculated.
Additionally, the percentage of listeners that provided the same rating during both listening
sessions was determined and used as an index of the robustness of the singer’s vocal ring.
Table 2: Results of perceptual rating for phrase sung with the vowel /a/: Percentage of listeners’
rating “strong vocal ring”, “no vocal ring” and “not sure” for traditional Chinese opera singers.
C3 C4 C5 C6 C8 C9 C11 C14 C15 C16
Strong vocal ring 15% 31% 0% 23% 69% 38% 54% 23% 85% 8%
Not sure 77% 62% 62% 69% 23% 54% 38% 69% 15% 69%
Table 3: Results of perceptual rating for the regular singing phrase sung by Western classically
trained singers: percentage of listeners’ rating “strong vocal ring”, “no vocal ring” and “not sure”
W1 W2 W3 W4 W5 W6 W7 W8 W9 W10
Strong vocal ring 92% 23% 92% 62% 100% 23% 100% 46% 92% 62%
No vocal ring 0% 0% 0% 0% 0% 8% 0% 0% 0% 0%
Table 4: Results of perceptual rating for the singing phrase sung by Western classically trained
singers with the vowel /a/: percentage of listeners’ rating “strong vocal ring”, “no vocal ring” and
“not sure” for Western classically trained singers. Percent ratings are based on the judgments of
13 listeners.
W1 W2 W3 W4 W5 W6 W7 W8 W9 W10
Strong vocal ring 77% 69% 77% 31% 77% 31% 100% 31% 100% 69%
Not sure 23% 23% 23% 69% 23% 69% 0% 69% 0% 31%
No vocal ring 0% 8% 0% 0% 0% 0% 0% 0% 0% 0%
Results from the regular singing phrase showed that the reliability of listeners’ judgments
regarding the vocal ring ranged from 30% to 90%, with an average of 50% (Table 5) for
traditional Chinese opera singers. There was a mean reliability of 46% for the phrase sung with
the vowel /a/, with a range of 20% to 80% across the judges. In comparison to the traditional
Chinese opera singers, the reliability of judgments for Western singers ranged from 50% to 100%,
with a mean of 72% (Table 5). The reliability for the Western singers for the phrase sung
with the vowel /a/ ranged from 60% to 90%, with a mean of 72%.
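The per-listener reliability figures are percentages of samples that received the same rating in both sessions. A minimal sketch of that calculation follows; the function name and the two rating lists are hypothetical and do not reproduce any listener's actual responses.

```python
def percent_same_rating(first_session, second_session):
    """Percentage of samples given an identical rating in two sessions."""
    if len(first_session) != len(second_session):
        raise ValueError("both sessions must rate the same samples")
    same = sum(a == b for a, b in zip(first_session, second_session))
    return 100.0 * same / len(first_session)


# Hypothetical ratings by one listener for ten samples (1, 2 or 3 per sample).
session_1 = [1, 2, 1, 1, 2, 2, 1, 2, 1, 1]
session_2 = [1, 2, 2, 1, 2, 1, 1, 2, 1, 2]
reliability = percent_same_rating(session_1, session_2)
```

With ten samples per group, each changed rating moves a listener's reliability by 10 percentage points, which is why the reported values fall on multiples of 10%.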
On average, 60% of the listeners provided the same rating in both listening sessions with
a range of 40% to 80% for the regular phrase sung by the Chinese singers (Table 6).
Interestingly, the reliability of listeners’ perception was higher for the singers that were judged as
having a strong vocal ring (C8, C11, C14, and C15), with a mean reliability of 70% compared to
the singers that received a rating of “not sure” (C3, C4, C5 and C6); the reliability within
listeners for this group of Chinese singers was 50%, on average, when they sang the regular
phrase. Similar results were found when the phrase was sung with the vowel /a/; there was a
higher reliability across judges (mean of 70% ranging from 60% to 80%) for the singers that
were judged as having a strong vocal ring (C8 and C15) compared to those that were judged as
Mean reliability for the “not sure” singers was 40% with a range from 20 to 60%
reliability (Table 6). Reliability of ratings was lower for a phrase sung with the vowel /a/ than the
regular singing phrase, with a mean reliability of 50% (ranging from 20% to 80%) for the
Chinese singers (Table 6). Judgments for singers C14 and C15 had the highest reliability across
listeners, with an average of 80% reliability for the regular singing phrase. For a singing phrase
sung with vowel /a/, reliability for singer C8 was the highest (80%).
Table 5. Percentage of samples that received identical ratings across two listening sessions for
each listener. Top panel is for the Chinese singers and bottom panel is for the Western singers.
Chinese EF BH JM TW D3 Mean
Regular singing 60 30 90 60 50 50
Western EF BH JM TW D3 Mean
Table 6 Percentage of listeners that perceived identical ratings across two listening sessions for
samples that were perceived to produce a “strong vocal ring,” based on 70% agreement across
Judgment % 60 60 80 80 40 60 60 40
Judgment % 80 60 40 40 60 20
The ratings of the Western group showed results similar to the Chinese group for the
sung phrase. The reliability of listeners’ judgments was also higher for singers who were rated as
having a strong vocal ring compared to those singers who were rated as not sure. Results showed
an average of 84% reliability, with a range from 60% to 100% when listeners rated singers who
had a strong vocal ring (Table 7). By comparison, singers who received a rating of “not sure”
had an average reliability of 50%, with a range from 40% to 60%. Unlike the reliability for the
Chinese group, listener reliability for the Western singers did not vary with the perceived
strength of vocal ring when singers sang a phrase with the vowel /a/. Reliability across listeners
was 71% for singers with a strong vocal ring and a mean reliability of 73% when listeners were
Listener reliability in both vocal ring conditions ranged from 40% to 100% reliability for
Western singers singing the phrase with the vowel /a/ (Table 7). Listener reliability was similar
for judgments of the Western singer’s original singing phrase and for the judgments of the phrase
sung with the vowel /a/. An average of 74% reliability across the 5 listeners (with a range of 40%
to 100%) was found for the regular singing phrase, and an average listener reliability of 72%
with a range of 40%-100% was noted for the phrase sung with the vowel /a/. These values were
higher than the reliability of ratings of listener judgments for the Chinese group. Singers W7 and
W9 exhibited the highest reliability for the regular singing phrase and listener ratings were most
reliable for the singers W3, W8 and W9 for the phrase sung with the vowel /a/. Judges were
100% reliable across listening sessions for these three singers (Table 7).
Table 7: Percentage of listeners that perceived identical ratings across two listening sessions for
samples that were perceived to produce a “strong vocal ring,” based on 70% agreement across
Discussion
Previous research suggests that vocal tract configuration affects the Fs in the Western
classically trained singing technique (Bartholomew, 1934; Sundberg, 1970; Sundberg, 2001).
The limited numbers of studies of other singing techniques do not provide a clear indication of
the Fs in non-Western classical singing. It was not clear whether other singing techniques also
produce the Fs. This experiment provides general information of the listeners’ perceptions of the
existence of the Fs in two different singing techniques (traditional Chinese opera singing and
Western classical singing). Results disagreed with previous studies (Bartholomew, 1934;
Sundberg, 1970; Sundberg, 2001) and showed that listeners were able to hear the Fs in both the
Western classically trained singing style and the traditional Chinese opera singing style. Our
study is consistent with the results of Wang (1985) who revealed the Fs in 3 different types of
singing styles, Western classical singing, early music singing and the traditional Chinese opera
In the present study, more Western than Chinese singers were perceived as having a
strong vocal ring for both the regular singing phrase and the phrase sung with the vowel /a/.
Although there were many samples rated as “not sure” in the Chinese group, listeners reported
that the vocal ring was still perceived in most of the samples, only not throughout the whole
phrase. Listeners indicated that the uncertainty of these judgments might have been affected by
their unfamiliarity with the Chinese language and the different singing style. Although the lack
of familiarity with the Chinese language seemed to have affected the listeners’ perceptions,
listeners’ comments indicated that the vocal ring was still identified in certain common vowels
such as /a/, /i/, /u/ and /e/ within the musical phrase; this was especially noted in the vowels /a/
and /i/. This is consistent with previous studies (Sundberg, 1970; Seidner, 1985; Bloothooft &
Plomp, 1984, 1985, 1986) in which different vowel qualities, especially the vowels /a/, /i/ and
Listeners also indicated that the Fs could be more easily heard in sustained notes with
single vowels than in running notes with complex contexts for both Chinese and Western
singers. This result is similar to what is seen in speech production wherein speech quality also is
impacted by signal duration. For example, Hillenbrand et al.’s (1995) investigation of speech
intelligibility revealed that vowel duration can be an important cue for identifying some vowels.
Ferguson and Kewley-Port (2002, 2008) and Picheny et al. (1986) studied the acoustic differences
between clear and conversational speech. They found that one reason clear speech had
superior intelligibility was that it has longer steady-state durations than are found in
conversational speech. Based on these speech production and intelligibility results, we suggest
that when singing complex texts, singers had to change their vocal tract configurations quickly
to incorporate all the relevant articulatory gestures. This rapid change might decrease a singer’s
ability to achieve the right vocal tract configuration for the Fs. Therefore, less vocal ring was
heard. However, when vowels were sustained, singers had more time to achieve the right
vocal tract configuration for the Fs. Therefore, Fs was heard more in the steady vowels than that
The comparison of the first and second perceptual ratings showed that the listeners’
judgments were more reliable for the Western group than for the Chinese group for both the
regular singing phrase and the phrase sung with the vowel /a/. This further confirms that the
listeners’ perceptions were likely affected by their familiarity with the languages and techniques.
Recall that listeners were classically-trained, professional Western singers and were familiar with
that music style and the texts; therefore, they may have been less distracted by the singing style
of Western opera than by that of Chinese opera. As Lundin (1967) suggests, musical preferences
may be culturally conditioned. For example, listeners in the present study indicated that
harshness was heard in the Chinese singers’ voices and that it influenced their perceptions of
the singing samples. Harshness may be due to the different techniques: Chinese singers are taught
to sing with a bright voice, whereas Western singers are taught to sing with a dark voice, which is the singing
method that balances the high and low formants. Because all listeners in this experiment were
trained Western singers and were used to the dark timbre, the bright voice might sound harsh to
Moreover, Chinese music is based on a pentatonic scale which might be dissonant to the
Western listener. Western music, however, is based on the diatonic scale and is consonant to
the Western listeners’ ears. It has been hypothesized that predictable musical sequences are
preferred and considered to have greater tonality (Roederer, 1972). Therefore, the familiarity
with Western music may have led the listeners to hear the vocal ring in Western classical opera,
whereas the unfamiliarity with Chinese music may have led to “musical tension” (Roederer,
1972, p. 148). Previous research on the Fs led the current investigator to assume that if a singing
voice contained a vocal ring, all listeners who are familiar with this concept would be able to
perceive the ring regardless of the different languages or techniques. This assumption was
negated by listeners’ comments that indicated that their perceptions were still affected by the
different techniques, language, and music style. These results suggest the need for evaluation of
the vocal ring in musicians from other musical traditions, including those trained in Chinese
opera.
It was interesting that listeners made more reliable judgments for singers who were heard
to have a strong vocal ring than for singers who were judged as “not sure” for both groups. This
suggests that when the vocal ring is strong, listeners can make consistent judgments about its
presence. In addition, listeners’ judgments and reliability may also reflect the singers’ comfort
with the task and the training methods. Classically trained Western singers are taught to
substitute the original texts with a single vowel during their practice in order to become familiar
with the music and singing technique before they include the texts. Therefore, the Western
singers were comfortable when they substituted a vowel for the text of a musical phrase. This
may have influenced the vocal ring when the singing phrase was sung with the vowel /a/; in this
task, listeners were better able to hear the ring in the Western singers. In contrast, in traditional
Chinese opera training, singers are not trained to substitute the whole singing text with one
particular vowel. When they were asked to sing only one vowel throughout the whole musical
phrase instead of the regular texts, they felt uncomfortable and were not able to project their full
voice even though they were allowed to practice as many times as they wished before the
recording. This comfort level may explain why there were more samples perceived as having a
strong vocal ring in the regular singing phrase than the phrase sung with the vowel /a/ by the
The few previous studies that investigated the percept of Fs by multiple listeners only
presented the results from the listeners’ average ratings (Wang, 1985; Omori et al., 1996). None
studied each listener’s perception and reliability and found that there were several factors that
may influence the listeners’ perceptions. These factors include listeners’ skills, singers’ abilities,
language differences, and familiarity with the musical style and technique. Future study of these
In general, the Fs has been identified by the presence of a peak around the region of
2300-3500 Hz in Western classical singing. Therefore, the categorical analysis of the Fs was
based on the presence of a peak around the region of 2300 Hz to 3500 Hz in the LTAS for the
Western classically trained singers in this experiment. For the traditional Chinese opera singers,
the Fs was determined by the presence of a peak around the region of 2300 Hz to 3700 Hz
because the Chinese singers had an overall higher F0 range than the Western singers. In addition, a
peak was defined by a bandwidth that was less than or equal to 1000 Hz. The reliability of
peak identification was assessed by consensus between the current researcher and two other
researchers experienced in acoustics. All samples were judged, and the reliability analysis showed
80% agreement across all judges. When there was a disagreement between investigators, the
presence or absence of the peak was based on the decision of the majority of judges.
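The categorical criteria described in this paragraph can be written as a simple rule. The
following Python sketch is only an illustration and is not the procedure the judges followed,
since they inspected the LTAS visually. The function names are hypothetical, and treating the
region limits as strict bounds is an assumption, because the text only requires a peak “around”
the region. The second function shows the majority rule applied when judges disagreed.

```python
# Illustrative sketch of the categorical Fs criteria; all names are hypothetical.

REGION_LOWER_HZ = 2300
# Upper limit of the Fs region for each group, as stated in the text.
REGION_UPPER_HZ = {"western": 3500, "chinese": 3700}
MAX_BANDWIDTH_HZ = 1000


def meets_fs_criteria(center_hz, bandwidth_hz, group):
    """Return True if a peak satisfies the region and bandwidth criteria.

    Assumption: the region limits are treated as strict bounds, although
    the text only asks for a peak "around" this region.
    """
    in_region = REGION_LOWER_HZ <= center_hz <= REGION_UPPER_HZ[group]
    narrow_enough = bandwidth_hz <= MAX_BANDWIDTH_HZ
    return in_region and narrow_enough


def majority_decision(judgments):
    """Resolve the judges' present/absent calls by simple majority."""
    votes_present = sum(1 for judged_present in judgments if judged_present)
    return votes_present > len(judgments) / 2


# Peak values reported in the text and tables of this chapter.
print(meets_fs_criteria(3165, 797, "chinese"))   # C5, phrase sung with /a/
print(meets_fs_criteria(3300, 1080, "western"))  # W6, regular phrase (Table 10)
print(majority_decision([True, True, False]))    # two of three judges agree
```

Because the bounds are strict here, a peak lying slightly outside the nominal region would be
rejected by this sketch even where the judges accepted it as being “around” the region.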
Results of the categorical analysis showed that the spectra for four out of ten of the
Chinese singers (C8, C11, C14 and C15) matched the criteria of the Fs for the regular singing
phrases (Table 8). Judges identified the Fs in five Chinese singers (C5, C6, C8, C14, and C15)
for the musical phrase sung with vowel /a/ (Table 9). The results of the categorical analysis
showed that seven out of ten Western singers (W1, W2, W3, W4, W5, W7, and W9) matched the
criteria of the Fs for the regular singing phrase (Table 10). The Fs was identified for eight
Western singers (all except W6 & W8) for the musical phrase sung with the vowel /a/ (Table 11).
determine the correspondence of acoustic measures with vocal ring. The results from the
perceptual ratings for the traditional Chinese opera singers matched (exhibited the Fs both
acoustically and perceptually) exactly with the categorical results for regular singing phrases.
There was a correspondence between the Fs and the perceptual ratings for C8 and C15 when the
/a/ vowel was sung. Results from the perceptual rating did not match the categorical results
for C5, C6 and C14, however, for the sung /a/ phrases. Although these three samples fulfilled the
acoustic criteria of the Fs, listeners did not perceive a vocal ring. For example, the LTAS from
C5 for singing /a/ exhibited a peak at 3165 Hz with a bandwidth of 797 Hz, but listeners did not
perceive a strong vocal ring (Figure 1). Similar results were found for C14 in the phrase sung
with vowel /a/ wherein the categorical analyses yielded a Fs (peak at 3424 Hz, bandwidth at 796
Hz), but listeners were not able to perceive this sample as having a “strong vocal ring” (Figure
2). Several samples, for example singers C9 and C4 (Figure 3 & 4), exhibited a cluster of
increasing energy around the Fs frequency region; however, because the bandwidth was over
1000 Hz, the energy cluster did not meet the operational definition of a Fs. Listeners also were
not able to identify a strong vocal ring in these samples. These two analyses, perceptual ratings
and categorical analysis, are consistent in suggesting that a peak in the region of 3 kHz that has a
For the Western classically trained singers, the categorical measurements were also
compared to the perceptual rating to determine the correspondence of the acoustic measures with
vocal ring. Application of the criteria to define Fs, categorically, yielded results that showed 7
out of 10 Western singers (W1, W2, W3, W4, W5, W7, and W9) had the Fs for the regular
singing phrase. There was no correspondence between the categorical analysis and the perceptual
rating for W2 and W4 when the regular singing phrase was sung (Table 10). Although the
categorical criteria of the Fs were met, listeners could not perceive a “strong vocal ring” in
For the phrase sung with the vowel /a/, the LTAS showed 8 singers (all except W6 and
W8) exhibited the Fs (Table 11). When the results were compared to the perceptual ratings, there
was a correspondence of categorical measurements and the perceptual rating for 7 of these
singers (W1, W2, W3, W5, W7, W9 and W10) but no correspondence of these measures for
singer W4.
Table 8: Results of the categorical analysis for the regular singing phrase sung by traditional
Chinese opera singers: Shaded boxes indicated that the spectrum of 4 singers (C8, C11, C14 and
C15) matched the acoustic criteria of the Fs of the categorical analysis. The perceptual rating
showed that 70% of listeners heard a “strong vocal ring” in these four singers. Samples that were
Bandwidth (Hz) N/A N/A N/A N/A 753 1508 517 991 431 N/A
Table 9: The results of the categorical analysis for the phrase with the vowel /a/ sung by
traditional Chinese opera singers: Shaded boxes indicated that the spectrum of 5 singers (C5, C6,
C8, C14, and C15) matched the criteria of the Fs of the categorical analysis. Results of the
perceptual rating showed that 70% of listeners heard a “strong vocal ring” in singers C8 and
C15, but could not hear a “strong vocal ring” in singers C5, C6 and C14. Samples that were not
Center frequency (Hz)  No peak  3165  3165  2871  3445  3505  3359  3424  3596  No peak
Bandwidth (Hz)         N/A      1335  797   754   689   1027  1400  796   151   N/A
Table 10: Results of the categorical analysis for the regular singing phrase sung by Western
classically trained singers: Shaded boxes indicated that the spectrum of 7 singers (W1, W2, W3,
W4, W5, W7, and W9) matched the criteria of the Fs of the categorical analysis. Results of the
perceptual rating showed that 70% of listeners heard a “strong vocal ring” in singers W1, W3,
W5, W7 and W9. Samples that were not shaded provided information, but exhibited no peaks.
                       W1    W2    W3    W4    W5    W6    W7    W8    W9    W10
Center frequency (Hz)  2907  2842  3317  3058  2412  3300  2778  2700  2498  No peak
Bandwidth (Hz)         668   818   366   754   431   1080  236   1100  344   N/A
Table 11: Results of categorical analysis for the phrase with the vowel /a/ sung by Western
classically trained singers: Shaded boxes indicated that the spectrum of 8 singers (all except W6
& W8) matched the criteria of the Fs. Results of the perceptual rating showed that 70% of
listeners heard a “strong vocal ring” in these singers except singers W4, W6 and W8. Samples
                       W1    W2    W3    W4    W5    W6    W7    W8    W9    W10
Center frequency (Hz)  2929  2713  3338  3036  2369  3445  2778  2885  2627  2821
Bandwidth (Hz)         344   193   259   732   280   1258  237   1020  345   883
Figure 1: The LTAS of the phrase sung with vowel /a/ by a traditional Chinese opera singer, C5:
A clear peak or a cluster of peaks around a specific frequency region with a bandwidth less than
1000 Hz. However, more than 70% of listeners were “not sure” if they perceived the strong vocal
[Figure 1 plot: LTAS. X-axis: Frequency (kHz), 0 to 10; Y-axis: Amplitude (dB), -100 to -10.]
Figure 2: The LTAS of a phrase sung with vowel /a/ by traditional Chinese opera singer, C14: A
clear peak or a cluster of peaks around a specific frequency region with a bandwidth less than
1000 Hz. However, more than 70% of listeners were “not sure” if they perceived the strong vocal
[Figure 2 plot: LTAS. X-axis: Frequency (kHz), 0 to 10; Y-axis: Amplitude (dB), -100 to -10.]
Figure 3: The LTAS of the regular singing phrase sung by traditional Chinese opera singer, C9:
An increased cluster of energy around a specific frequency region with a bandwidth in excess of
[Figure 3 plot: LTAS. X-axis: Frequency (kHz), 0 to 10; Y-axis: Amplitude (dB), -100 to -10.]
Figure 4: The LTAS of the phrase sung with vowel /a/ by traditional Chinese opera singer, C4:
An increased cluster of energy around a specific frequency region with a bandwidth in excess of
[Figure 4 plot: LTAS. X-axis: Frequency (kHz), 0 to 10; Y-axis: Amplitude (dB), -100 to -10.]
A comparison of the categorical Fs analysis for the Chinese and the Western groups
revealed that the mean bandwidth of Fs was greater in Chinese singers (673 Hz) than in the
Western group (409 Hz) for the regular singing phrase. Moreover, results showed that the mean
center frequency of the Fs in the Chinese group was higher (3442 Hz) than the mean center
frequency in the Western group (2782 Hz). Similar results were found for the singing of the
phrase with the vowel /a/. The Chinese group exhibited greater bandwidth (420 Hz) and higher
center frequency (3520 Hz) of the Fs than the Western group (fc = 2796 Hz, BW = 363 Hz).
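The Western group means reported above can be recomputed from the values in Tables 10 and 11.
The sketch below is illustrative. It assumes, as the reported figures suggest but the text does
not state, that the means were taken over the singers who both met the categorical criteria and
were rated as having a “strong vocal ring.” Variable and function names are hypothetical.

```python
from statistics import mean

# (center frequency in Hz, bandwidth in Hz) for singers with a qualifying peak.
TABLE_10_REGULAR = {
    "W1": (2907, 668), "W2": (2842, 818), "W3": (3317, 366), "W4": (3058, 754),
    "W5": (2412, 431), "W7": (2778, 236), "W9": (2498, 344),
}
TABLE_11_VOWEL_A = {
    "W1": (2929, 344), "W2": (2713, 193), "W3": (3338, 259), "W4": (3036, 732),
    "W5": (2369, 280), "W7": (2778, 237), "W9": (2627, 345), "W10": (2821, 883),
}

# Singers rated as having a "strong vocal ring" (captions of Tables 10 and 11).
RING_REGULAR = ["W1", "W3", "W5", "W7", "W9"]
RING_VOWEL_A = ["W1", "W2", "W3", "W5", "W7", "W9", "W10"]


def group_means(table, singers):
    """Return (mean center frequency, mean bandwidth), rounded to whole Hz."""
    centers = [table[s][0] for s in singers]
    bandwidths = [table[s][1] for s in singers]
    return round(mean(centers)), round(mean(bandwidths))


print(group_means(TABLE_10_REGULAR, RING_REGULAR))  # (2782, 409)
print(group_means(TABLE_11_VOWEL_A, RING_VOWEL_A))  # (2796, 363)
```

Under this assumption the computed values match the reported Western means for both the regular
singing phrase and the phrase sung with the vowel /a/.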
Categorical analyses of Fs for the phrase sung with the vowels /i/ and /u/ and the spoken
phrase were then conducted for both Chinese and Western groups. Recall that there was no
perceptual judgment on the phrase sung with the vowels /i/, /u/ or the spoken phrase for either
the Chinese or Western group. Results from the categorical analysis showed that the LTAS for 5
singers (C3, C4, C8, C11 and C15) exhibited the Fs for the phrase sung with vowel /i/ (Table
12) and 8 singers (all but C3 and C8) had a Fs for the phrase sung with vowel /u/ (Table 13).
None of the speaking samples from the Chinese group showed the Fs, although the spoken
phrase from singers C3, C4, C5, C6, C9, C14, and C15 showed an increasing energy in the
higher frequency region; however, this energy did not meet the operational definition of Fs. For
the Western group, results of the categorical analysis of the LTAS showed that 5 singers (W3,
W4, W7, W9, and W10) matched the criteria of the Fs for the phrase sung with the vowel /i/
(Table 14) and 8 singers (except W1 and W8) exhibited the Fs for the phrase sung with the
Table 12: Results of the categorical analysis for the phrase with the vowel /i/ sung by the
traditional Chinese opera singers: Shaded boxes indicated that the spectrum of 5 singers (C3, C4,
C8, C11 and C15) matched the criteria (1000 Hz bandwidth) of the Fs. Samples that were not
Bandwidth (Hz) 700 431 N/A N/A 409 N/A 301 2032 193 N/A
Table 13: Results of the categorical analysis for the phrase with the vowel /u/ sung by the
traditional Chinese opera singers: Shaded boxes indicated that the spectrum of 8 singers (all but
C3 and C8) matched the criteria (1000 Hz bandwidth) of the Fs. Samples that were not shaded
Center frequency (Hz)  No peak  2993  3284  3187  No peak  2950  3521  3618  3553  3069
Bandwidth (Hz)         N/A      129   517   431   N/A      86    559   302   302   150
None of the spoken samples had a high frequency peak that matched the criteria of the Fs
in the Western group. Similar to the Chinese group, there was a strong energy distribution of
partials extending to the high frequency region for almost all speaking samples from the Western
singers. The energy in the Western singers’ speech exhibited a concentration between 2 and 4 kHz
whereas the Chinese singers had more diffuse energy in the higher frequencies (Fig. 5). This
energy concentration produced by the Western singers may be the “speaking formant” (Oliveira-
The formant bandwidths and center frequency from the phrases sung with the vowels /i/
and /u/ were compared across the Chinese and Western groups. Results showed that the Fs
bandwidth was smaller among the Chinese singers (mean = 310 Hz) than among the
Western singers (mean = 415 Hz) for the phrase sung with the vowel /u/ and there was no
difference between the Chinese (407 Hz) and Western singers (405 Hz) for the phrase sung with
the vowel /i/. Samples from the phrase sung with the vowel /i/ showed lower mean center
frequency (2612 Hz) for the Chinese group than for the Western group (2920 Hz), whereas
samples from the phrase sung with the vowel /u/ showed higher mean center frequency (3272
Hz) for the Chinese group than the Western group (2789 Hz).
Recall that one of the tasks required the singers to glide up and down the musical scale
with the vowel /a/. Data from this task were also analyzed by the LTAS, and the presence or
absence of the Fs was also determined categorically, as discussed before. For the Chinese singers
who were found to have the Fs (C8, C11, C14 and C15) both perceptually and categorically for
the regular singing phrase, the results for gliding the musical scales showed the Fs in all of them
except singer C11. The center frequency and bandwidths for singers C8, C14 and C15 were
comparable for the gliding musical scales and the regular singing phrase. Similar results were
obtained for the Western singers in which all of the singers who exhibited the Fs in the regular
singing phrase also showed the Fs when they glided through the musical scales.
Table 14: Results of the categorical analysis for the phrase with the vowel /i/ sung by Western
classically trained singers: Shaded boxes indicated that the spectrum of 5 singers (W3, W4, W7,
W9, and W10) matched the criteria of the Fs. Samples that were not shaded provided
Center frequency (Hz)  No peak  2500  3295  3122  2390  No peak  2778  3381  2412  2993
Bandwidth (Hz)         N/A      1300  323   258   1500  N/A      345   1028  388   710
Table 15: Results of the categorical analysis for the phrase with the vowel /u/ sung by Western
classically trained singers: Shaded boxes indicated that the spectrum of 8 singers (all but
W1 and W8) matched the criteria of the Fs. Samples that were not shaded provided information,
Center frequency (Hz)  No peak  2692  3058  2756  2218  3402  2713  3122  2541  2929
Bandwidth (Hz)         N/A      194   689   582   258   194   194   1210  345   862
Figure 5: The LTAS of the speaking phrase for Chinese singer (C5) showed increasing energy
around the higher frequency region (Top panel), whereas the Western singer (W3) showed
[Figure 5 plots: two LTAS panels. X-axis: Frequency (kHz), 0 to 9; Y-axis: Amplitude (dB), -100 to -10; one annotation reads “Speaker’s formant.”]
Discussion
Results from the categorical measurements of the LTAS matched the results from the
perceptual judgments and showed that the Fs was produced by both the traditional Chinese opera
singers and the Western classically trained singers in this study. However, there were some
exceptions. Some of the singers from both groups (C5, C6, C14, W2 and W4) exhibited
a peak around 3000 Hz with a bandwidth of less than 1000 Hz, yet listeners did not perceive a
vocal ring (Tables 9, 10 & 11). The bandwidths of the peaks for these singers were between 700
Hz and 800 Hz. Thus, the categorical results may have been more consistent with the perceptual
data if the bandwidth criterion for a peak corresponding to Fs had been 700 Hz rather than the 1 kHz cutoff used
There were some samples (W6 & W8) in the present study that were noted to have more
than one peak in the high frequency region which Seidner et al. (1985), Rossing et al. (1987) and
Sundberg (2001) also found in their studies. These researchers noted that 2 peaks, rather than 1,
appeared in the high frequency region in some of their baritone, tenor, alto, and soprano singers.
Sundberg related these two peaks to the F3 and F4 rather than the Fs because there was no
cluster of formants. Results from the present perceptual judgments and categorical analyses
agree with previous studies in that a single high frequency peak is needed to define the Fs.
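The suggestion made earlier in this discussion, that a 700 Hz bandwidth cutoff might agree
better with the perceptual data than the 1 kHz cutoff, can be checked against the Western
singers’ values in Tables 10 and 11. The following Python sketch is a hedged illustration and
was not part of the original analysis. The names are hypothetical, and only the bandwidth
criterion is applied (the frequency region is not re-checked).

```python
# Peak bandwidths in Hz from Tables 10 and 11; None means no peak was found.
BANDWIDTH_HZ = {
    "regular phrase": {
        "W1": 668, "W2": 818, "W3": 366, "W4": 754, "W5": 431,
        "W6": 1080, "W7": 236, "W8": 1100, "W9": 344, "W10": None,
    },
    "vowel /a/": {
        "W1": 344, "W2": 193, "W3": 259, "W4": 732, "W5": 280,
        "W6": 1258, "W7": 237, "W8": 1020, "W9": 345, "W10": 883,
    },
}

# Singers rated as having a "strong vocal ring" (captions of Tables 10 and 11).
STRONG_RING = {
    "regular phrase": {"W1", "W3", "W5", "W7", "W9"},
    "vowel /a/": {"W1", "W2", "W3", "W5", "W7", "W9", "W10"},
}


def mismatches(task, cutoff_hz):
    """List singers whose acoustic classification disagrees with the ratings."""
    disagreeing = []
    for singer, bandwidth in BANDWIDTH_HZ[task].items():
        acoustic_fs = bandwidth is not None and bandwidth <= cutoff_hz
        perceived_ring = singer in STRONG_RING[task]
        if acoustic_fs != perceived_ring:
            disagreeing.append(singer)
    return disagreeing


for task in BANDWIDTH_HZ:
    for cutoff in (1000, 700):
        print(task, cutoff, mismatches(task, cutoff))
```

On these tabulated values, lowering the cutoff to 700 Hz removes both mismatches for the
regular singing phrase (W2 and W4). For the phrase sung with the vowel /a/ it resolves W4 but
excludes W10, whose 883 Hz peak was accompanied by a “strong vocal ring” rating.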
Many studies (Sundberg, 1970; Bloothooft and Plomp, 1986; Sundberg, 2001; Cleveland et al.,
2001; Oliveira-Barrichelo et al., 2001) defined the Fs by comparing the energy level difference
between the spoken and sung phrases or vowels. Findings from these studies suggest that unlike
singing samples, no cluster of formant peaks appeared in the Fs region for spoken samples. In the
current study, none of the spoken samples from the Chinese or Western group met the
operational definition of the Fs. However, many of the spoken samples showed a strong energy
distribution of partials extending to the high frequency region. This high amplitude energy in
higher frequencies is not consistent with expectations, in that the energy is predicted to decrease
in the higher frequency region for normal speech (Fant, 1960). However, this high energy may
indicate a speaker’s formant that previous research suggests is found in some trained singers
Some of the results from this experiment are inconsistent with previous studies (Seidner
et al., 1985; Schutte & Miller, 1985; Sengupta, 1990). Previous studies showed that the bandwidth
for male singers singing in the Western classical style (bass, baritone and tenor) ranged from
1000 Hz to 2000 Hz, whereas the current study showed that when the bandwidth of Fs exceeded
700 Hz, listeners did not perceive the ring for our singers. The inconsistency in these studies may
be due to differences in the definition of bandwidth. Previous studies defined the bandwidth of
Fs by the frequencies with intensities that were –15 dB from the peak amplitude, whereas the
current study used a –3 dB criterion. Also, previous studies measured the Fs by using short-term
spectra with single vowels whereas the current study used the LTAS with the entire musical
phrase. A further inconsistency between the current data and results from previous studies was
found in the bandwidths for tenors and baritones. Seidner et al.’s (1985) study found that the
tenor had a broader bandwidth of Fs than baritone and bass singers; however, only one singer
was investigated in each vocal category. Our data suggest that there is great variability across
singers in the bandwidth of the Fs (Tables 10 & 11). In the current study, two different vocal
categories in Western classical singing, tenor and baritone (5 singers each), were investigated.
Results showed that voice classification did not have a consistent impact on Fs. For example, the
bandwidths for tenor singers who matched the criteria of the Fs categorically ranged from 236 Hz
to 668 Hz, and those for baritones ranged from 431 Hz to 818 Hz, in the regular singing phrase.
As for the phrase sung with the vowel /a/, the bandwidths for tenors who matched the categorical
criteria of the Fs ranged from 237 Hz to 345 Hz, whereas baritones showed bandwidths between
193 Hz and 883 Hz. This suggests that the range of the bandwidth varies among the Western singers
regardless of their vocal classification. Moreover, the median bandwidths of Fs for /i/ and /u/
were 345 and 270 Hz, respectively, for the tenors, whereas the median bandwidths of these two
vowels for baritones were 484 and 420 Hz, respectively. This seems to contradict previous results
from Seidner et al. (1985), who found greater Fs bandwidths for tenors than for baritones. The investigation of
only one singer from each voice classification may have led Seidner et al. to an erroneous
relationship between voice-type and Fs bandwidth. Clearly future studies of Fs should include
multiple singers.
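The effect of the bandwidth definition discussed above (a –3 dB criterion in the current study
versus –15 dB in previous studies) can be illustrated on a synthetic spectrum. The sketch below
is a minimal illustration under stated assumptions: the parabolic peak shape, its 3000 Hz center,
and the function name are hypothetical and are not data or procedures from this study.

```python
def bandwidth_at_drop(freqs_hz, levels_db, drop_db):
    """Width of the region around the highest peak that stays within
    drop_db of the peak level."""
    peak_index = max(range(len(levels_db)), key=lambda i: levels_db[i])
    threshold = levels_db[peak_index] - drop_db

    # Walk outward from the peak until the level falls below the threshold.
    lower = peak_index
    while lower > 0 and levels_db[lower - 1] >= threshold:
        lower -= 1
    upper = peak_index
    while upper < len(levels_db) - 1 and levels_db[upper + 1] >= threshold:
        upper += 1
    return freqs_hz[upper] - freqs_hz[lower]


# Synthetic spectrum (an assumption, not measured data): a single parabolic
# peak at 3000 Hz whose level falls by 3 dB at 400 Hz from the center.
freqs = list(range(0, 10001, 10))
levels = [-20.0 - 3.0 * ((f - 3000) / 400) ** 2 for f in freqs]

bw_3db = bandwidth_at_drop(freqs, levels, 3)
bw_15db = bandwidth_at_drop(freqs, levels, 15)
print(bw_3db, bw_15db)
```

For this synthetic peak the –15 dB width is more than twice the –3 dB width, which shows how the
same spectral peak can yield very different bandwidth values depending on the criterion used.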
Because of the time consuming nature of the perceptual task, only the regular singing
phrase and phrase sung with the vowel /a/ were used to investigate listeners’ perception. The
vowel /a/ was chosen instead of the other two vowels, /i/ and /u/, because it was the vowel that was
most commonly investigated in the previous literature. The vowels /i/ and /u/ were categorically
analyzed in this experiment and the results showed that more singers exhibited the Fs for the
phrase sung with the vowel /u/ than for /a/ and /i/ for both the Chinese and the Western groups.
Because the vowel /u/ was not investigated perceptually, it is not known if listeners’ judgments
would have been consistent with the categorical measures such that more listeners would hear a
Previous researchers used different materials and methods to investigate the Fs. For
example, some researchers used sustained sung and spoken vowels whereas others used singing
phrases with complex texts. Some researchers used short-term spectral analysis and others
applied the LTAS analysis to the singing material. However, none of the previous studies
examined the Fs using a variety of materials or methods. Although it was not clearly
stated, the assumption in previous studies seemed to be that different materials or methods
would not affect the Fs. In other words, if a singer exhibits the Fs, it is present no matter
what materials and methods are used to evaluate it. In our study, we varied the material to
include the regular singing phrase and phrases sung with the vowels /a/, /i/ and /u/. Our
investigation of Fs indicated differences in the presence of Fs depending on the material that was
used. Therefore, it appears that the Fs must be investigated in a variety of contexts to better
During the recordings, the Chinese singers expressed difficulty with the task of gliding
the musical scale as they were instructed. This may be due to the fact that this singing scale is
based on Western classical training whereas the Chinese singers were not trained to perform this
type of task. None of the Chinese singers except singer C11 followed the instructions; instead, they sang
in the Chinese style, in which they used scale steps rather than a glide (D-C#-E-D#-F-E-G-F#…
etc.). Even though singer C11 expressed difficulty with gliding the musical scale, he still tried to
follow the instructions. It is interesting to note that the Fs was exhibited in the Chinese singers
who did not follow the instructions but maintained their own singing style. On the other hand,
singer C11 who followed the instructions showed no Fs. This suggests that task familiarity and
training culture, as well as issues about the singing material, may impact the Fs. Sundberg (2002)
investigated one Chinese opera singer by asking him to sing a musical scale with the vowel /a/.
He reported that no Fs was found in this subject. This leads us to question whether Sundberg’s
subject was instructed to sing a Western musical scale that was unfamiliar to him. If this was the
task, then the absence of the Fs for this singer might simply be caused by unfamiliarity with the
musical style. Unfortunately, the methodology was not fully provided in Sundberg’s report.
The highest and lowest fundamental frequencies (F0) from each singing phrase were
difference between F0s was operationally defined by a minimum of 1 semitone (Shower and
Biddulph, 1931).
Analysis of the highest and lowest F0 from the regular singing phrase for the traditional
Chinese opera singers showed that singing samples which were judged as having a strong vocal
ring exhibited a higher F0 than the samples that were judged as not sure. This was true for
comparisons of either the highest or the lowest F0 produced during the singing phrase. The F0
across the samples that were perceived to have a vocal ring showed the highest F0 to be 495.18
Hz (SD = 41.6) on average, and the lowest F0 to have a mean of 289.85 Hz (SD = 55.5) (Fig. 6a).
The F0 for the samples that were judged as “not sure” showed that, on average, the highest F0
was 421.83 Hz (SD = 23) and the lowest F0 was 234.03 Hz (SD = 54.5). There was one exception to
this pattern: C16 had a high F0 for both his highest and lowest pitches (highest = 510.3 Hz and
lowest = 252.5 Hz). However, results from the listeners’ ratings were ambiguous; 46% of
listeners heard the vocal ring and 54% of listeners indicated that they were “not sure” about the
The mean F0 from each regular singing phrase was also measured and investigated in
relation to the perceptual judgments. Results from the regular singing phrase for the Chinese
group also showed that singing samples which were judged as having a strong vocal ring
exhibited a higher mean F0 than the samples that were judged as not sure. Across the samples,
those that were perceived to have a strong vocal ring showed a mean F0 of 360 Hz (SD = 51.8),
whereas the mean was 306 Hz (SD = 43) in samples that were rated as not sure. Similar results
were found for the highest F0 when the musical phrase was sung with the vowel /a/ by the
Chinese singers. Samples which had a strong vocal ring exhibited a mean high F0 of 524 Hz
(SD = 39.2) compared to a mean high F0 of 463 Hz (SD = 68.2) for samples that did not clearly have
a ring throughout the phrase (Fig. 6b). By contrast, the average lowest F0 during the /a/ singing
was slightly lower, by 1 semitone, when listeners were sure about the ring (270 Hz, SD = 86.2)
compared to when they were unsure about the vocal ring’s presence (286 Hz, SD = 44). Results
of the mean F0 measured from each singing phrase sung with the vowel /a/ showed that samples
which were judged as having a strong vocal ring exhibited the same mean (less than one
semitone difference) F0 (346 Hz, SD = 51.2) as the samples that were judged as “not sure” (354
Hz, SD = 41).
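The semitone comparisons used in this paragraph follow from the standard equal-tempered relation,
in which the distance in semitones between two frequencies is 12 times the base-2 logarithm of
their ratio. The short Python sketch below applies that relation to F0 values reported above for
the Chinese singers. The function name is hypothetical, and the sketch is illustrative rather
than the measurement procedure used in the study.

```python
import math


def semitone_difference(f_low_hz, f_high_hz):
    """Distance in equal-tempered semitones from f_low_hz up to f_high_hz."""
    return 12 * math.log2(f_high_hz / f_low_hz)


# F0 pairs reported in the text (Hz): strong vocal ring vs. "not sure".
comparisons = {
    "lowest F0, vowel /a/": (270, 286),
    "mean F0, vowel /a/": (346, 354),
    "mean F0, regular phrase": (306, 360),
}

for label, (low, high) in comparisons.items():
    print(f"{label}: {semitone_difference(low, high):.2f} semitones")
```

The first pair comes out at approximately one semitone, the second at well under one semitone,
and the third at more than one semitone, in line with the descriptions given in the text.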
Figure 6a: Scatterplot for the highest and the lowest F0 measured from the regular singing phrase
for the Chinese singers. Filled diamonds indicate the highest F0 and open squares indicate
the lowest F0. 1 on the X axis indicates the samples that were perceived as having a “strong
vocal ring” and 2 indicates the samples that were rated as “not sure.”
[Figure 6a plot: scatterplot. X-axis: Perception (0 to 2); Y-axis: F0 (0 to 600); legend: Highest F0, Lowest F0; data labels: 466X2, 440X2, 311X2.]
Figure 6b: Scatterplot for the highest and the lowest F0 measured from the phrase sung with the
vowel /a/ for the Chinese singers. Filled diamonds indicate the highest F0 and open squares
indicate the lowest F0. 1 on the X axis indicates the samples that were perceived as having a
“strong vocal ring” and 2 indicates the samples that were rated as “not sure.”
[Figure 6b plot: scatterplot. X-axis: Perception (0 to 2); Y-axis: F0 (0 to 600); legend: Highest F0, Lowest F0; data label: 311X.]
For the Western classically trained singers, the highest and lowest F0 from each singing
phrase also were measured and investigated in relation to the perceptual judgments. In contrast to
the results from the Chinese group, results from the regular singing phrase showed that samples
that were judged as having a strong vocal ring exhibited a lower F0 range than the samples that
were rated as not sure. Samples that were rated as having a strong vocal ring showed a F0 that
ranged between 166 Hz and 387 Hz (Fig. 7a). The F0 range for tokens that were judged as “not
sure” was between 175 and 443 Hz. The mean F0 measured from each regular singing phrase
also was investigated relative to the perceptual judgments for the Western singers. Across the
samples, the mean was within one semitone for singers with (267 Hz, SD = 59) and without (277
For the phrase sung with the vowel /a/ by the Western singers, the average high and low
F0 (highest = 370 Hz and lowest = 161.8 Hz) were equivalent for tokens perceived to have a
strong vocal ring as for tokens with ratings of “not sure” (highest = 368 Hz and lowest = 153
Hz). Samples that were perceived as having a strong vocal ring showed the same mean F0, less
than one semitone difference, as samples that were rated as “not sure” for the phrase sung with
the vowel /a/. The average across the samples that were perceived to have a strong vocal ring
showed a mean F0 of 254 Hz (58.2 SD); mean F0 was 247 Hz (55.7SD) in samples that were
rated as “not sure” (Fig. 7b). The Western classically trained singers had highest and lowest F0
ranges that were at least 100 Hz lower than the Chinese singers. This is true for singers that were
perceived as having the strong vocal ring for both the regular singing phrase and the phrase sung
with the vowel /a/. Results for the mean F0 measured from both singing phrases showed an
average of about 80 Hz (3 semitones) higher mean F0 in the Chinese group than the Western
The distribution of the F0 range (lowest and highest) across singers (Figures 6a & b)
shows great variability for both categories (strong vocal ring and not sure) in the Western singers
in contrast to the Chinese group that showed a clear distinction of the F0 range produced for
strong vocal ring and not sure. For the regular singing phrase, most singers from the Chinese
group who were perceived to have a vocal ring produced higher F0 than singers who did not
have the strong ring. This is true for both the lowest and highest F0 (Fig.6a). In general, similar
results were found in the phrase sung with the vowel /a/ (Fig 6b). Although the results from the
regular singing phrase for the Western groups showed that the mean value for the lowest and the
highest F0 was lower in samples that were perceived as having a strong vocal ring than in
samples that were rated as not sure, the distribution of F0 overlapped for these two perceptual
categories (Fig. 7a & 7b). The results from the analysis of F0, discussed above, indicate that the
F0 range (highest and lowest levels) was associated with the Fs for the regular singing phrase and the phrase
sung with the vowel /a/ for the Chinese group; however, this relationship was not seen for the
Western group. The distribution of F0 from all Chinese singers showed that the Fs was
evidenced when there was a higher F0 range. For the Western group, the overall distribution
Figure 7a: Scatterplot for the highest and the lowest F0 measured from the regular singing phrase
for the Western singers. Filled diamonds indicate the highest F0 and open squares indicate the
lowest F0. 1 on the X axis indicates the samples that were perceived as having a “strong vocal
ring” and 2 indicates the samples that were rated as “not sure.”
[Plot not reproduced. Y axis: F0, 0 to 600; X axis: Perception, 0 to 2; legend: Highest F0, Lowest F0; point labels: 392X2, 329X2.]
Figure 7b: Scatterplot for the highest and the lowest F0 measured from the phrase sung with the
vowel /a/ for the Western singers. Filled diamonds indicate the highest F0 and open squares
indicate the lowest F0. 1 on the X axis indicates the samples that were perceived as having a
“strong vocal ring” and 2 indicates the samples that were rated as “not sure.”
[Plot not reproduced. Y axis: F0, 0 to 600; X axis: Perception, 0 to 2; legend: Highest F0, Lowest F0; point labels: 392X3, 329X2, 185X2, 164X2.]
Intensity measured from sound level meter
(Reisz, 1928). Both the highest and lowest SPLs measured from each singing phrase were
investigated in relation to the perceptual ratings. The average power of the highest and lowest
levels were calculated and then converted into decibels. Samples from the regular singing
phrases which were perceived as having a strong vocal ring showed higher mean intensity for
both highest and lowest levels (highest = 99.89 dB SPL and lowest = 96.87 dB SPL) than
samples that received a rating of “not sure” (highest = 96.05 dB SPL and lowest= 93.38 dB SPL)
for the traditional Chinese opera singers (Table 16). Samples from phrases sung with the vowel
/a/ which were perceived as having a strong vocal ring showed a lower mean intensity range
(97.16 dB–99.16 dB SPL) than the samples that were rated as “not sure” (104.12 dB-110.04 dB
SPL) (Table 16). This result could have been affected by the sample sung by C16, who showed
the highest intensity ranges (110 dB-116 dB); however, listeners could not identify a strong
vocal ring. Listeners reported that this singer’s voice was very loud, but mostly it sounded like
“shouting” instead of the vocal ring. When this sample was removed, the average intensity range
for the “not sure” group (highest level of 93.42 dB SPL and lowest level of 90.29 dB SPL) was lower than for
the vocal ring group (highest level of 99.16 dB SPL and lowest level of 97.16 dB SPL).
Table 16: Results of the highest and lowest intensity measured across singers from both the
regular singing phrase and the phrase sung with the vowel /a/ in the Chinese group. All results
based on singers who were perceived to produce a “strong vocal ring” and “not sure” with 70% or
greater agreement across listeners. The top panel is for the regular singing phrase and the bottom
Table 17: Results of the highest and lowest intensity measured across singers from both the
regular singing phrase and the phrase sung with the vowel /a/ in the Western group. All results
based on singers who were perceived to produce a “strong vocal ring” and “not sure” with 70% or
greater agreement across listeners. Top panel is for the regular singing phrase and the bottom
W1 W2 W3 W5 W7 W9 W10 W4 W6 W8
Lowest SPL(dB) 94 98 90 94 94 99 86 86 96 86
For the Western singers, samples from the regular singing phrases which had a strong
vocal ring showed almost the same mean SPL range (highest = 98.72 dB SPL and lowest = 95.2
dB SPL) as samples which received ratings of “not sure” (highest = 94.01 dB SPL and lowest =
92.01 dB SPL) (Table 17). The phrases sung with the vowel /a/ with a strong vocal ring showed
a higher mean intensity for both the highest and lowest levels (highest = 98.67 dB SPL – lowest
= 95.13 dB SPL) than the samples that were rated as “not sure” (highest = 99.11 to lowest =
Comparison across groups indicated that Chinese and Western singers used similar
intensity ranges when the vocal ring was heard in the regular singing passages. There were some
intensity differences between groups when listeners were “not sure” if a vocal ring was heard
during the passage; in general, the Western opera singers used lower intensities (both highest and
lowest) than the Chinese singers. These results were not maintained for the passage sung with the
vowel /a/. In these samples, the Western singers used higher intensities (both highest and lowest)
than Chinese singers, whether or not the vocal ring was perceived.
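
The mean intensities in this section are described as averages of power that were then converted into decibels. A minimal sketch of that calculation follows, applied to the lowest-SPL row that survives in Table 17. Treating the first seven columns (W1, W2, W3, W5, W7, W9, W10) as the "strong vocal ring" samples and the last three (W4, W6, W8) as the "not sure" samples is an assumption inferred from the column order; the function name is illustrative.

```python
import math

def mean_level_db(levels_db):
    """Average levels in the power domain, then convert the mean back to dB."""
    powers = [10.0 ** (level / 10.0) for level in levels_db]
    return 10.0 * math.log10(sum(powers) / len(powers))

# Lowest SPL (dB) row from Table 17, in the column order printed there.
lowest_spl_db = [94, 98, 90, 94, 94, 99, 86, 86, 96, 86]
strong_ring = lowest_spl_db[:7]   # assumed grouping: W1 W2 W3 W5 W7 W9 W10
not_sure = lowest_spl_db[7:]      # assumed grouping: W4 W6 W8

print(f"strong vocal ring: {mean_level_db(strong_ring):.2f} dB SPL")
print(f"not sure:          {mean_level_db(not_sure):.2f} dB SPL")
```

Under these assumptions the power-domain means fall within a few hundredths of a decibel of the lowest levels reported above for the Western regular singing phrase (95.2 and 92.01 dB SPL), whereas a plain arithmetic mean of the decibel values would be noticeably lower for the first group.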
The quantitative analysis was first carried out by calculating the difference in energy
(measured in dB) between high (2000-4000 Hz) and low (0-2000 Hz) frequency regions. Each
sung and spoken phrase was first filtered by a low-pass filter (fc= 2000 Hz) and then the original
waveform was band pass filtered at 2000-4000 Hz. The energy values (RMS in dB) in both
frequency regions were calculated and the difference between the two intensity values was then
calculated. Statistical analyses were performed to investigate differences between the traditional
Chinese opera singers and the Western classically trained singers as well as differences between
the sung phrases and spoken phrases within and between the two groups. Furthermore, the
correlation between the perceptual ratings and the quantitative analysis was investigated. Finally,
results of the quantitative analysis were compared with the results from the categorical analysis
(Experiment. 2) in order to evaluate if the two measures provided the same information about the
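
The relative energy measure described above (the level in the 2000-4000 Hz band minus the level in the 0-2000 Hz band) can be sketched as follows. This is not the filtering procedure used in the study: the sketch replaces the low-pass and band-pass filters with an idealized split of discrete Fourier transform bins, and the test tone, sampling rate, and function names are hypothetical.

```python
import cmath
import math

def band_energies(signal, fs, split_hz=2000.0, top_hz=4000.0):
    """Sum DFT power below split_hz and from split_hz up to top_hz."""
    n = len(signal)
    low = 0.0
    high = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if freq > top_hz:
            break
        # Direct DFT of bin k; slow, but keeps the sketch dependency-free.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        if freq < split_hz:
            low += abs(coeff) ** 2
        else:
            high += abs(coeff) ** 2
    return low, high

def relative_energy_db(signal, fs):
    """High-band level minus low-band level in dB (negative: low band dominates)."""
    low, high = band_energies(signal, fs)
    return 10.0 * math.log10(high / low)

# Hypothetical test tone: 500 Hz at amplitude 1.0 plus 3000 Hz at amplitude 0.5.
fs = 16000
n = 640
tone = [math.sin(2 * math.pi * 500 * i / fs)
        + 0.5 * math.sin(2 * math.pi * 3000 * i / fs)
        for i in range(n)]
difference_db = relative_energy_db(tone, fs)
print(f"{difference_db:.1f} dB")
```

Because the high-band component has half the amplitude of the low-band component, the sketch returns about -6 dB. A negative value therefore means more energy below 2000 Hz than between 2000 and 4000 Hz, which is the sign convention used in Tables 18 and 19.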
Results from quantitative measurements (calculation of relative energy between high and
low frequency bands) were first analyzed statistically by a two- way ANOVA (SPSS 11.5) to
investigate two main factors: material (2 levels- relative intensity difference between spoken and
sung phrases) and singing style (2 levels- relative intensity difference between Western and
Chinese groups), and their interaction effect. Data from all subjects were included in the statistical
analyses. The results showed that there was a significant difference in relative energy between
the Chinese and Western groups (F (1,36) = 13.572, p<0.05) and there was a significant difference
between the sung and spoken phrases (F (1,36) = 5.609, p<0.05). Results also showed that there
The impact of the materials on the relative intensity was investigated within each group,
Chinese and Western. A one-way ANOVA was performed with five levels to compare the
spoken phrase, regular sung phrase, and musical phrase sung with /a/, /i/ and /u/ produced by the
Chinese singers. Results showed that there was a significant difference between materials (F (4, 45)
= 3.368, p<0.05). A Tukey HSD was calculated to determine the pair-wise differences within the
Chinese singers. The results showed that there was a significant difference in relative energy
between singing phrases sung with the vowels /a/ and /i/ (p<0.05). There was a significant
difference between the regular singing phrase and the phrase sung with the vowel /i/ (p<0.05);
however, no other significant effects of material were found (/a/ vs /u/; /i/ vs /u/; regular singing
vs /a/ or /u/). Interestingly, results from the Tukey HSD showed that there was no significant
difference in relative energy between the spoken phrase and any of the sung phrases (p>0.05 for
impacted on the relative energy within the Western group. As with the Chinese singers, the five
levels compared were the spoken phrase, the regular sung phrase, and the musical phrase sung with
/a/, /i/ and /u/. Results showed that there was a significant difference between materials (F (4, 45) =
18.582, p<0.05). Results from the Tukey HSD showed that there was a significant difference in
relative energy between singing phrases sung with the vowels /a/ and /i/ (p<0.05) and a
significant difference between the sung vowels /i/ and /u/ (p< 0.05). Also, there was a significant
difference between the regular singing phrase and the phrase sung with the vowel /i/ (p<0.05).
There was no significant difference in relative energy between singing phrases sung with /a/ or
/u/ (p=0.275) or between the regular singing phrase and the phrase sung with the vowels /a/
(p=0.931) or /u/ (p=0.228). Similar to the results from the Chinese group, results from the
Western group also indicated that different vowels impact on spectra within the Western group.
Results also were similar to the Chinese group in that there also was no significant difference
between the regular singing phrase and the spoken phrase (p=0.240) produced by the Western
singers. Unlike the Chinese group, results for the Western group showed that there were
significant differences between the spoken phrase and the phrases sung with the vowels /a/, /i/,
Independent sample t-tests were performed to compare the relative energy difference
between the Western and Chinese singers for the phrase sung with the vowels /a/, /i/ and /u/.
Results showed that there was a significant difference in relative energy between Western and
Chinese singing for the /a/ phrase (F (1, 18) = 1.548, p< 0.01) and for the /u/ phrase (F (1, 18) =
4.170, p< 0.01). There was no significant difference in relative energy between the Western and
Chinese singers for the /i/ phrase (F (1, 18) = 0.496, p=0.19).
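
The degrees of freedom reported in this section follow from the group sizes: two groups of ten singers give 18 within-group degrees of freedom, and five materials with ten singers each give (4, 45). For two groups, the one-way ANOVA F ratio equals the square of the pooled-variance t statistic. The sketch below illustrates these relations only; the data values are hypothetical, the function names are illustrative, and nothing here reproduces the SPSS output of the study.

```python
import math

def pooled_t(group_a, group_b):
    """Student's t for two independent samples with pooled variance."""
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

def one_way_f(groups):
    """One-way ANOVA F ratio with its (between, within) degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)

# Hypothetical relative-energy values (dB) for two groups of ten singers.
hypothetical_a = [-1.0, 0.5, -2.0, 1.5, 0.0, -3.0, 2.0, -0.5, 1.0, -1.5]
hypothetical_b = [-6.0, -4.5, -8.0, -5.0, -7.5, -3.5, -9.0, -6.5, -5.5, -4.0]

t_value, df = pooled_t(hypothetical_a, hypothetical_b)
f_value, f_df = one_way_f([hypothetical_a, hypothetical_b])
print(df, f_df, round(t_value ** 2 - f_value, 6))
```

Converting either statistic to a p value requires the corresponding t or F distribution, which is left to a statistics package.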
Correlations between the perceptual ratings (i.e. the cumulative ratings across listeners)
and the intensity measurements for the regular singing phrase and the musical phrase sung with
vowel /a/ were investigated for each singing group separately. Results showed that there was no
significant correlation between the perceptual ratings and the quantitative measurements for
either the regular singing phrase (r =0.03, p=0.934 for Chinese; r =0.321, p=0.365 for Western)
or for the phrase sung with the vowel /a/ (r = 0.127, p=0.726 for Chinese; r=0.127, p=0.726 for
Results from the quantitative analysis of the relative intensity differences between the
high and low frequency regions also were compared to the results from the categorical analyses
of the LTAS in order to determine if the two measures yielded similar information about the
presence or absence of the Fs. Results showed that there was no relationship between categorical
analysis and quantitative analysis. For example, results from both the perceptual and categorical
analyses showed that singers C8 and C11 exhibited the Fs and singers C3, C4, C5 and C6 had no
Fs in the regular singing phrase. Therefore, we expected that the differences in relative energy
between the high and low frequency regions for C8 and C11 would be smaller than for C3, C4,
C5 and C6. However, the results (Table 18) showed greater differences in singers C8 (-13.2 dB)
and C11 (-12.8 dB) than in C3 (-7.1 dB), C4 (-6.2 dB), C5 (-5 dB) and C6 (-7.4 dB). Similar results
also were found when the phrase was sung with the vowel /a/ (Table 18).
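
The comparison described above can be restated compactly. The sketch below uses only the six regular-singing-phrase values quoted in the text from Table 18 and checks whether the singers with the Fs show the smaller high-low energy difference that was expected; the variable names are illustrative.

```python
from statistics import mean

# Relative energy (high band minus low band, in dB) for the regular singing
# phrase, as quoted in the text from Table 18.
with_fs = {"C8": -13.2, "C11": -12.8}
without_fs = {"C3": -7.1, "C4": -6.2, "C5": -5.0, "C6": -7.4}

mean_with = mean(with_fs.values())
mean_without = mean(without_fs.values())

# Expectation stated in the text: singers with the Fs should show the smaller
# difference between the high and low frequency regions.
expectation_holds = abs(mean_with) < abs(mean_without)
# Observed: every singer with the Fs is more negative than every singer without.
fully_separated = max(with_fs.values()) < min(without_fs.values())

print(round(mean_with, 3), round(mean_without, 3), expectation_holds, fully_separated)
```

For these six singers the group means are about -13.0 dB and -6.4 dB, so the expectation is not met, and the two groups do not overlap at all on this measure.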
The results for the Western classically trained singers also showed no relationship
between values from the quantitative analysis (relative intensity) of the LTAS and categorical
analysis. For example, the quantitative results showed a –3.1 dB relative energy difference
between high and low frequency regions in singer W6 who exhibited no Fs either categorically
or perceptually. By comparison, singer W9 also had about a -3 dB difference between high and
low frequency regions, but he did have the Fs categorically and perceptually for the regular
singing phrase (Table 19). Similar results were found for the phrase sung with the vowel /a/ in
which W1 and W6 showed the same energy difference (0.2 dB) between the two frequency
bands; however, W1 had the Fs categorically and perceptually but W6 did not have the Fs in
Previous investigators (Schutte and Miller, 1985; Bloothooft and Plomp, 1986; Sengupta,
1990; Sundberg, 2001) defined the Fs by measuring the difference between L3-L1 of the
short-term spectrum. The purpose of this study was to investigate the level difference between singers
who were rated as having the strong vocal ring and singers who were not rated as having the
strong vocal ring. Another purpose of this experiment was to compare the L3-L1 of the LTAS
with the findings from the previous studies to determine the relation between these two cues. The
criteria for determining the Fs and the F1 of the LTAS were based on Rossing et al.’s (1986)
suggestions, in which the level of the Fs corresponded to the formant level in the 2-4 kHz
frequency region and the level of the first formant was identified by a frequency around 500 Hz.
Table 18: Results from the quantitative measurements of the LTAS-by calculating the relative
intensity differences between high and low frequency regions: with categorical measurements
based on 70% of the listeners who perceived a “strong vocal ring” and “70% of listeners were
Spoken phrase
                                                  C3     C4     C5     C6     C8     C9     C11    C14    C15    C16
Relative energy in dB (difference b/t high-low)   -0.8   -3.5   -2.2   -0.6   -14.8  -3     -17.4  3.3    2.2    3.1
Categorical measurement                           No     No     No     No     No     No     No     No     No     No
Table 19: Results from quantitative measurements-by calculating the relative intensity
differences between high and low frequency regions: with categorical measurements based on
70% of listeners perceived as having a “strong vocal ring” and “70% of listeners perceived as
Relative energy in dB (difference b/t high-low)   0.2    0.9    0.3    -3.2   1.1    0.2    -1.1   -7.2   -4.1   -0.1
Perceptual rating (70% of Yes/not sure)           Yes    Yes    Yes    No     Yes    No     Yes    No     Yes    Yes
Categorical measurement                           Yes    Yes    Yes    No     Yes    No     Yes    No     Yes    Yes

Spoken phrase
                                                  W1     W2     W3     W4     W5     W6     W7     W8     W9     W10
Relative energy in dB (difference b/t high-low)   5.5    4.4    -0.8   -0.7   0.4    -4.2   3.3    0.2    -1     -1
Categorical measurement                           No     No     No     No     No     No     No     No     No     No
However, it was difficult to define F1 in this study because the singing samples included high F0
ranges that were quite close to the first formant. Therefore, the researcher was not able to
differentiate the F1 from F0. Results from the L3-L1 analysis of the LTAS are not reported because
the first formant peak could not be differentiated from either the F0 or its harmonics (Fig. 8).
The short-term spectral analysis was used to measure the difference between the peak
level around 3000 Hz and the level of the first formant (L3-L1). Negative values indicated that
the L3 was lower in amplitude than L1 and positive values indicated that the L3 was higher in
amplitude than L1. Stimuli included sustained vowels that were sung and spoken, as well as
these vowels edited from the regular singing phrase. Only samples that met the perceptual criteria of
strong vocal ring and “not sure” were included in this analysis. For the Chinese singers, the
sustained sung vowels /a/, /i/ and /u/ which had the Fs categorically showed relatively smaller
difference between L3-L1 than in the spoken vowels (Table 20). Also, results from samples that
exhibited the Fs showed that there was greater energy exhibited in the L3 area for the vowels /a/,
/i/ and /u/ edited from the regular singing phrase than in the sustained sung vowels (Table 20).
For samples from the Chinese group that had no Fs categorically, the sustained sung vowels /a/
and /u/ also had smaller negative energy difference between L3-L1 than the sustained spoken
vowels. There was no difference in L3-L1 between sung and spoken samples when the Chinese
singers produced a sustained /i/ that did not have the Fs, categorically. The sustained sung
vowels /a/ and /u/ showed a smaller negative energy difference between L3-L1
Figure 8. Top panel shows the unexcited first formant of the LTAS from singer W5 for the phrase
sung with the vowel /a/, and bottom panel shows the inseparable harmonics and first formant of the
LTAS from singer W7 for the phrase sung with the vowel /a/.
[Plots not reproduced. Both panels: Y axis: Amplitude (dB), -10 to -100; X axis: Frequency (Hz), tick labels 0 to 9.]
than the same vowels edited from the regular singing phrase, whereas the sustained vowel /i/ had
less positive energy difference between L3-L1 than in the vowel /i/ selected from the regular
For the Western samples that had the Fs categorically, the mean values of L3-L1 from the
short-term spectral analysis of sustained spoken vowels /a/, /i/, and /u/ showed less energy in the
L3 region than the same sustained sung vowels (Table 21). Moreover, the results showed that the
sustained sung vowels /a/, /i/ and /u/ had higher energy in L3 region than that of the same three
vowels edited from the regular singing phrase (Table 21). For samples from the Western group
that had no Fs categorically, the mean values of L3-L1 of the vowels /a/, /i/ and /u/ for sustained
spoken vowels showed less energy in the L3 region than the same sustained sung vowels. Results
also showed that sustained sung vowels /a/, /i/, and /u/ had higher energy around L3 region than
the same vowels edited from the regular singing phrase (Table 21).
In summary, L3-L1 from the short-term spectra showed a much lower energy level in all
the speaking samples than the singing samples from both Chinese and Western groups. There
was only one exception, when listeners were “not sure” about the vocal ring; the sustained
spoken vowel /i/ had almost the same L3-L1 as the sustained sung vowel /i/ when produced by
the Chinese singers. An overall comparison between the Chinese and the Western singers
indicated less of a negative L3-L1 difference for the Western than for the Chinese samples.
There were two exceptions from singers that were rated as not sure; the sustained spoken vowel
/i/, and the vowel /a/ selected from the regular singing phrase showed higher energy level in L3
region in the Chinese singers than in the Western singers (Table 20 & 21).
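
The L3-L1 measure summarized above is the peak level near 3000 Hz minus the level of the first formant, so negative values mean that L3 is weaker than L1. The following is a minimal sketch of how such a difference could be read from a short-term spectrum. The 2000-4000 Hz search band follows the criterion cited in the text; the 300-900 Hz search band for the first formant, the spectrum values, and the function names are hypothetical.

```python
def peak_level_db(spectrum, f_lo_hz, f_hi_hz):
    """Highest level (dB) among (frequency_hz, level_db) pairs inside the band."""
    in_band = [level for freq, level in spectrum if f_lo_hz <= freq <= f_hi_hz]
    if not in_band:
        raise ValueError("no spectral points inside the requested band")
    return max(in_band)

def l3_minus_l1(spectrum, f1_band=(300.0, 900.0), fs_band=(2000.0, 4000.0)):
    """Level of the peak in fs_band minus level of the peak in f1_band, in dB."""
    l1 = peak_level_db(spectrum, *f1_band)
    l3 = peak_level_db(spectrum, *fs_band)
    return l3 - l1

# Hypothetical short-term spectrum as (frequency in Hz, level in dB) pairs.
example_spectrum = [
    (250, -20), (500, -12), (750, -18), (1500, -30),
    (2500, -28), (3000, -19), (3500, -26), (5000, -45),
]

print(l3_minus_l1(example_spectrum))
```

In this made-up spectrum the strongest point in the F1 band is -12 dB and the strongest point in the 2000-4000 Hz band is -19 dB, giving an L3-L1 of -7 dB. As the text notes for the LTAS, a fixed F1 search band becomes unreliable when F0 or its harmonics fall close to the first formant.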
Table 20: Top Panel: The mean value of L3-L1 from the vowels /a/, /i/ and /u/ for samples from
Chinese group (C8, C11, C14, and C15) that were judged as having the Fs.
Bottom: The mean value of L3-L1 from the vowels /a/, /i/ and /u/ for samples from the Chinese
group (C3, C4, C5, and C6) that were judged as “not sure” of the Fs.
Table 21: Top Panel: Mean value of L3-L1 from the vowels /a/, /i/ and /u/ for samples from the
Western group (W1, W3, W5, W7, and W9) that were judged as having the Fs.
Bottom Panel: Mean value of L3-L1 from the vowels /a/, /i/ and /u/ for samples from the
Western group (W2, W4, W6, W8, and W10) that were judged as “not sure” of the Fs.
Discussion
One of the purposes of this study was to define the Fs quantitatively. Listeners noted on
the comment sheets that they identified the presence and absence of the vocal ring by focusing
on several factors such as the F0, intensity and vowel quality. Results from the analysis of F0
showed that the Chinese singers had a higher F0 range (highest and lowest), whether they had
the Fs or not, than the Western singers for both the regular singing phrase and the phrase sung
with the vowel /a/ (Figures 6 & 7). Western singers who were rated as having a strong vocal ring
had values of F0 up to 392 Hz, whereas samples that were judged as “not sure” had F0 above this
point. This is consistent with the findings of Bloothooft and Plomp (1985), who showed that the
level of the Fs for male singers increased when the F0 increased up to 392 Hz but the Fs
decreased when F0 increased beyond this point. However, Bloothooft and Plomp only used
Western opera singers in their investigation. In contrast to the Western singers, Chinese singers
in the current study who were heard to have the strong vocal ring had an F0 that exceeded 392 Hz.
This suggests that there may be different acoustic cues that can signal Fs in the two different
singing styles.
Several studies (Sundberg, 1973; Cleveland and Sundberg, 1985) have shown that one of
the important factors that impacts the Fs is vocal intensity. The results from the current study
agree with these studies and found that the SPL affected the percept of vocal ring for both
traditional Chinese opera singers and Western classically trained singers. Results showed higher
SPL ranges in all of the samples that were rated as having the Fs, perceptually and categorically,
than samples that were rated as “not sure” for both groups (Table 16 & 17). Moreover, it was
found that Chinese singers that had a Fs showed a higher SPL range in the regular singing phrase
than in the phrase sung with the vowel /a/, whereas Western singers who had the Fs had the same
SPL in the regular singing phrase and the phrase sung with the vowel /a/. The difference in SPL
between the regular singing phrase and phrase sung with the vowel /a/ may relate to singers’
comfort with these different tasks. The purpose of requiring the singers (Chinese and Western) to
perform the same tasks was to be able to compare the different singing techniques by controlling
the material across all singers. However, we found that the Chinese singers were not trained to
substitute the whole singing text with one particular vowel; therefore, they were not able to
project their full voice with such a task. We conclude that the familiarity of the singing material
and training style also impact the SPL of the singing voice.
Another way to investigate the Fs quantitatively was to determine the relative energy
difference between the high and low frequency regions for sung and spoken phrases. The results
from the statistical analysis showed that there was a significant difference in relative energy
between the Chinese and Western singers, which suggests that the different singing techniques
impact the spectral energy distribution. Also, there was a significant main effect of material such
that a relative energy difference was found between the speaking and singing samples. This indicates
that the different materials used affected the spectral energy. Sundberg (1970) compared sung
and spoken vowels and found that the spectrum levels around 3 kHz were different between
spoken samples and sung vowels because of different articulatory configurations in singing and
speaking; however, results from our post hoc analysis showed that there was no significant
difference between the sung and spoken samples within each group. The results suggested that
singers in either Chinese or Western groups used similar vocal tract configurations between their
We might also suspect that the samples included in the statistical analysis affected the
results; that is, all data were included in the statistical analysis for both Chinese and Western
groups regardless of the different ratings (strong vocal ring, not sure, no vocal ring). This was
done because the vocal ring was sometimes heard in the samples that were rated as “not sure.”
Therefore, the lack of significant difference in relative intensity between spoken and sung
phrases may relate to pooling across these subjects with strong and weak vocal rings.
Furthermore, the results of the statistical analysis showed a significant difference between the
Western and the Chinese groups for the regular singing phrase. Data from Tables 18 and 19 show
that the singers in both groups had greater intensity in the low frequency regions than in the high
frequency regions (as indicated by negative values); however, the data show that Chinese
subjects had greater differences in relative energy (high-low) than the Western singers for both
the regular singing phrase and the phrase sung with the vowel /a/. Although the relative energy
was different between the Chinese and Western singers, singers in both groups still demonstrated
the Fs. There are a number of possible explanations for this. First, Chinese singers produced a
higher F0 range than the Western singers. One reason may relate to language differences; that is,
there may be a difference in relative energy distributions in Chinese and Western languages.
Another more likely possibility is that the Chinese singers have a higher laryngeal
position during singing than during speaking (Wang, 1985). This laryngeal elevation may
shorten the vocal tract and yield higher frequency energy. Laryngeal elevation, as may be seen in
Chinese singers, is not used in Western opera singers (Sundberg, 1974). In fact, Sundberg
showed that Western opera singers lower the larynx to produce the Fs. This lowered larynx
position seen in Western singers may be incompatible with the higher F0 used by Chinese
singers. Also, language requirements may constrain the vocal tract configuration such that
different language cultures may use different articulatory manipulations. Nevertheless, our perceptual and
categorical results indicate that the Fs is produced by both Western and Chinese opera singers.
Notably, these results are contrary to those of Bartholomew (1934) and Sundberg (2001), who
maintain that the Fs is only exhibited in the Western classically trained singing style.
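
The suggestion above that laryngeal elevation shortens the vocal tract and raises its resonances can be illustrated with the textbook model of a uniform tube closed at one end, whose resonances lie at (2n - 1)c / (4L). This idealized model is not part of the present study; the tube lengths, the speed of sound, and the function name below are assumptions chosen only for illustration.

```python
def tube_resonances_hz(length_m, count=4, speed_of_sound=350.0):
    """Resonances of a uniform tube closed at one end: (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]

# Hypothetical vocal tract lengths: a longer tract and a shortened one.
longer = tube_resonances_hz(0.175)
shorter = tube_resonances_hz(0.155)

for f_long, f_short in zip(longer, shorter):
    print(f"{f_long:7.1f} Hz -> {f_short:7.1f} Hz")
```

In this simple model every resonance scales inversely with tube length, so shortening the tube shifts all of the resonances upward. A real vocal tract is not uniform, so the sketch indicates only the direction of the effect, not its size.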
Another purpose of this study was to quantitatively investigate whether different vowels
affected the singing when the musical phrase was controlled. The statistical analysis from this
experiment showed that the different vowel qualities impacted the spectra within and between
groups. Specifically, the vowel /i/ showed significant spectral differences from other vowels. This
is consistent with the results from the categorical measurements discussed before in which /i/
showed a different center frequency than the other two vowels (/a/ & /u/). Moreover, the
statistical results also showed that there were significant differences between the groups for the
vowels /a/ and /u/, but no significant difference between the two groups for the vowel /i/.
This suggests that the vocal tract configurations for the vowels /a/ and /u/ sung by the
Chinese were different from those of the Western singers, whereas the vocal tract configuration is similar
the perceptual judgments, to assess the validity of these different measurement techniques in
determining the Fs. The relative energy level differences between the high and low frequency
regions were inconsistent across the singing samples and the quantitative measures of Fs were
not consistent with results from the categorical measures or perceptual ratings. The quantitative
measure for calculating the relative energy difference between the high and low frequency
In this experiment, differences between the level of the third formant and the first
formant (L3-L1) of the short-term spectra were also measured in order to compare them with
previous studies. Previous studies showed various values for the L3-L1 (Bloothooft & Plomp,
1986; Schutte & Miller, 1985; Sengupta, 1990). For example, Bloothooft and Plomp found an
overall average of –20 dB difference between L3 and L1 for all vowels; Schutte and Miller
suggested that the Fs was noted when L3-L1 was about –7 dB for a tenor who sang the vowel
/ɔ/, and Sengupta suggested –4 dB for the male singers with the vowel /a/. Our results showed
that L3-L1 differed depending on the sung material. Both the Chinese and Western singers who
exhibited the Fs had differences in L3-L1 for the sustained sung vowels /a/, /i/ and /u/ and the
same vowels edited from the regular singing phrase (Table 20 & 21). We suspect that the level of
the L3-L1 varies due to multiple factors such as singing techniques, different vowels, different
Singers from both Chinese and Western groups that did not evidence the Fs (categorically
and perceptually) in the sung vowels still showed small differences between L3-L1 of the
short-term spectrum, which means that there was still an increased energy in the high frequency region for
these sung vowels. In addition, when these samples were compared to the sustained spoken
vowels, the results also showed small energy differences between the L3-L1 in these spoken
vowels, which means that there was also an increase in the L3 region for the spoken vowels.
Oliveira-Barrichelo et al.’s (2001) results suggest that although these singers do not have the Fs,
they do have a certain level of training to adjust their vocal tract and generate the energy in the
higher frequency region. As such, some of the highly trained singers in the present investigation
Chapter VI: General Discussion and Conclusions
The main purpose of this study was to investigate whether different singing techniques,
traditional Chinese opera and Western classical singing, exhibit the Fs. This study supported our
hypothesis that the Fs not only is found in the Western classical singing technique but also marks
other singing styles such as traditional Chinese opera singing. Previous researchers’
(Bartholomew, 1934; Sundberg, 1987; Sundberg, 2001) claims that the Fs occurs because of
vocal tract configurations that are unique to Western classically trained singing were not
supported by the present investigation. However, previous studies were based mainly on
Western trained and untrained subjects; therefore, it is difficult to make inferences about non-
A ubiquitous definition of Fs is still elusive. In the previous literature, the Fs was defined
based on either the acoustical approach or the physiological approach. The acoustical approach
indicates that the Fs is recognized by a raised cluster of formants 3, 4, and 5 in the acoustic
spectra; the precise amt of increase in this energy remains ill-defined (Sundberg, 1974). The
physiological approach indicates a lengthening of the singer’s vocal tract by lowering the larynx
and expanding the pharynx. In the current study, we sought to define Fs relative to its percept. As
such quantitative and categorical analyses of the acoustic spectra were compared to listeners’
judgments of a vocal ring. Our data showed that the Fs was found not only in the Western
classically trained singers but also in the traditional Chinese opera singers. Specifically, listeners
heard a vocal ring in samples that evidenced high frequency peaks in their spectra. However, the
high frequency energy found in the Chinese samples seemed to be somewhat different from that
found in the Western singers; that is, the center frequency of the peak was higher in the Chinese
singers than in the Western singers. Results from the acoustic analysis also showed that
bandwidths of the Fs in the Chinese singers were broader than those of the Western singers.
These differences suggest that the traditional Chinese opera singers may manipulate the vocal
tract differently than Western opera singers to generate the Fs. Further investigation of vocal
tract control is needed to understand how Chinese opera singers produce the Fs. It is clear,
however, that these physiological adjustments yield a Fs that is distinct in the acoustic spectra.
As noted earlier, the Fs serves to amplify singers’ voices above the level of the orchestral
accompaniment. Sundberg (1970, 1978) suggested that the Western symphony orchestra has its
highest level of sound in the vicinity of 450 Hz and the amplitude declines abruptly above that
frequency. Therefore, one might expect that the frequency of the Fs would vary depending on the
Chinese opera singer typically have a higher frequency range than instruments of the Western
orchestra (Guy, 2003). That is, the spectrum of the Chinese orchestra has high energy that
extends through the high frequency region and gradually decreases beyond 4000 Hz. Therefore,
we suspect that because the Chinese orchestra includes a high energy extending throughout a
broader spectral range (both high and low regions), Chinese singers need to generate the Fs at a
higher frequency than is seen in the Western singers: the production of the Fs at a higher
frequency may be accomplished by using a higher F0. In summary, we hypothesize that singers
generate the Fs differently to overcome the various orchestras depending on their cultural and
A primary shortcoming of previous studies of the Fs is that they did not relate their
acoustic measurements to specific judgments about the listeners’ perceptions. The assumption
based on Bartholomew (1934) was that the vocal ring was the perceptual attribute of the Fs. If
the Fs causes the percept of the vocal ring, then perceptual judgments are needed about the
presence of the vocal ring. Based on Bartholomew’s assumption, results from the perceptual
rating of this study showed that listeners heard the vocal ring not only in the Western classically
trained singers but also the traditional Chinese opera singers. However, results also showed that
the perceptual judgments of the Fs were not consistent across all listeners. Although listeners
were able to reliably differentiate the singers who had the vocal ring and singers who did not
have the vocal ring in both traditional Chinese singers and the Western classically trained
singers, they were less confident of these judgments for the Chinese singers. We found that there
were many factors that may influence the listeners’ perceptions of the Fs such as language and
technique variations; listeners’ training; singer’s abilities; and bias in instructions, which will be
discussed below.
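The consistency of judgments across listeners can be summarized in several ways. The following minimal sketch in Python shows one simple option and is not the reliability analysis used in this study. The function names `majority_rating` and `pairwise_agreement` are hypothetical, and the ratings are assumed to be the integers 1, 2, and 3 from the rating sheet (Appendix C); which number corresponds to which category is not assumed here.

```python
from collections import Counter


def majority_rating(ratings):
    """Return the most frequent rating for one sample, or None on a tie."""
    counts = Counter(ratings).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no single category was chosen most often
    return counts[0][0]


def pairwise_agreement(ratings):
    """Proportion of listener pairs that gave one sample the same rating."""
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are needed")
    # For each category chosen by c listeners there are c*(c-1)/2 agreeing pairs.
    agreeing_pairs = sum(c * (c - 1) // 2 for c in Counter(ratings).values())
    total_pairs = n * (n - 1) // 2
    return agreeing_pairs / total_pairs


# Hypothetical ratings from 13 listeners for a single sample.
example = [1] * 9 + [2] * 3 + [3]
print(majority_rating(example), pairwise_agreement(example))
```

A value near 1 from `pairwise_agreement` would indicate that nearly all listeners chose the same category for a sample, whereas lower values indicate the kind of inconsistency noted above.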
In this study, we found that although listeners were able to perceive the vocal ring in the
Chinese singers, they expressed that the uncertainty of these judgments might have been affected
by their unfamiliarity with the Chinese language and the different singing technique. Recall that
listeners were professionally trained classical Western singers and were familiar with the
languages and the singing technique; however, listeners were not familiar with the traditional
Chinese singing technique and language which distracted the listeners’ judgments of the Fs.
Although the lack of familiarity with the Chinese language seemed to have distracted the
listeners’ perception, listeners expressed that the vocal ring was still identified in certain
common vowels such as /a/, /i/, /u/, and /e/ within the musical phrase.
As noted before, 13 listeners (highly-trained musicians) were used in this experiment, yet
the reliability for the perceptual judgments varied across these listeners. Although the listeners in
this study were highly skilled musicians and were familiar with identifying the presence or
absence of a vocal ring, they may not be experienced with using rating scales. Studies on speech
perception and discrimination commonly use perceptual training and multiple testing sessions to
obtain more stable results. In the present study, there was only one short practice session and one
test session for most listeners. In future studies, investigations of the perceptions of the Fs should
provide greater training. For example, listeners could receive specific training of the Fs until they
are familiar with its definition. In order to achieve that, multiple training sessions could be
provided for the listeners until their responses match the qualitative analysis of the Fs to the same
criterion level. Only then would listeners be exposed to the experimental material and asked to make
judgments about the vocal ring. In addition, listeners should repeat the perceptual tests multiple
times to determine the reliability of the perceptual judgments. Also, it may be useful to use
One of the factors that affected the outcome of this experiment is singers’ ability in terms of the
singing tasks and methods given in this study. As we mentioned in the previous chapter (p.77),
classically trained Western singers were comfortable with substituting a particular vowel for the
text of a whole musical phrase because it was one of the basic routines of their vocal training.
However, the traditional Chinese opera singers were not comfortable with such a singing method
and were unable to project their full voice; therefore, there were more samples perceived as
having a strong vocal ring in the regular singing phrase than the phrase sung with the single
vowel. In future studies, it is important to find the common singing tasks for both groups so that
Another factor that may have influenced perception relates to the instructions given to the
listeners. That is, in our instructions, listeners were not specifically told that these samples might
have or might not have the vocal ring. Because the practice session emphasized the presence of
the vocal ring and the instructions asked listeners to rate the ring, listeners may have had a bias
toward expecting all samples to have the Fs. Therefore, it is possible that listeners rated some
samples as “not sure,” (as if they sometimes heard the ring and sometimes not) rather than
indicating that “no vocal ring” was heard. In future research, instructions should either state that
some samples may not have the vocal ring or some tokens that do not have the rings should be
Another possibility is that the Fs and the vocal ring are not the same thing. The questions
asked in this paper were based on the assumption that the vocal ring and the Fs were the same
thing and the entire analysis was guided by this precept. However, it’s possible that people
weren’t hearing the Fs but something else that singers manipulated to generate a ringing tone.
For example, evidence in the previous literature suggested that the sopranos do not have the Fs;
however, they still project a loud and ringing sound that was not due to the high-energy peak at
the Fs region (Carlsson and Sundberg, 1992; Sundberg, 2001). Unlike Western sopranos, results
from the spectral analysis of the traditional Chinese singers in the current study showed a
relationship between how listeners perceived the ring and the presence of the high frequency
peak at the Fs region. However, there is still a possibility that what the listeners in this study
heard in those Chinese singers may not have been the Fs but something else to help project the
voice. Therefore, in future study it is critical to refine what the listeners are asked to identify. It is
important to clarify whether listeners identify the ring that may be caused by other mechanisms
or identify the ring that is the Fs. In order to find a better clarification between the Fs and the
ringing tone, it is also important to conduct a perceptual rating of vocal ring on sopranos and to
match their acoustic signals in order to see what acoustic signals may lead to the percept.
What factors impacted the Fs (Factors investigated were analysis procedures and singing
materials)?
Another issue that may have yielded different results across previous studies is the
different methodologies used to investigate the Fs. Although the method of investigation was not
expected to impact the Fs, this assumption had not been tested previously. Some of the studies
investigated the Fs by using the short-term spectra (Schutte & Miller, 1985; Sengupta, 1990;
Sundberg, 2001; Seidner et al., 1985; Weiss et al. 2001), some of the studies used LTAS
(Rossing, 1986; Ternstrom & Sundberg, 1989; Ross, 1992; Cleveland et al. 2001; Sundberg,
2001) or other quantitative (spectral) procedures (Bloothooft & Plomp, 1985, 1986, 1987;
Omori, et al. 1996; Lundy et al. 2000), yet, no previous study had investigated several
different methods on the same samples. In the present study, we found that different methods
Our results showed that the categorical analysis of the LTAS--with the criteria of the Fs
defined by a peak around 2300-3500 Hz with a bandwidth less than 700 Hz-- is most consistent
with perceptual judgments of vocal ring. This relationship between the acoustic cues and percept
may be explained by Gentner's (1980, 1983, 1989) basic assumption of structure-mapping
theory. According to Gentner, our psychological concepts have structures that relate percepts and
objects. Based on the relations represented in the concept structures, people have the ability to
recognize and map one structure onto another according to a similarity comparison. In the
present study, listeners may have perceived the pattern of the spectra such that high frequency
peaks were contrasted with other parts of the acoustic signal. As such, the categorical analysis of
the acoustic spectra may represent listeners’ strategies better than the quantitative analysis.
Unlike the categorical analysis, quantitative analysis only considered overall energy differences
not spectral patterns. Although the categorical analysis is useful to determine if Fs is present, it
may not reflect the production variables. A quantitative measurement that can define the Fs
may provide better insight into the vocal mechanisms that can be used to generate the Fs. It
may be useful to further study the development of a quantitative index of the Fs to find for an
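The categorical criterion described above (a peak around 2300-3500 Hz with a bandwidth of less than 700 Hz) can be illustrated with a minimal sketch in Python. It is not the analysis procedure used in this study. The function names `find_fs_peak` and `synthetic_ltas`, the representation of the LTAS as two parallel lists, and in particular the definition of bandwidth as the width of the region lying within 3 dB of the peak level are assumptions made for illustration only.

```python
import math


def find_fs_peak(freqs_hz, levels_db, band=(2300.0, 3500.0),
                 max_bw_hz=700.0, drop_db=3.0):
    """Return (has_fs, peak_freq_hz, bandwidth_hz) for one LTAS.

    Assumption: bandwidth is measured between the outermost points around
    the peak whose level stays within `drop_db` of the peak level.
    """
    n = len(freqs_hz)
    in_band = [i for i in range(n) if band[0] <= freqs_hz[i] <= band[1]]
    if not in_band:
        return False, None, None
    peak = max(in_band, key=lambda i: levels_db[i])
    # Reject a band maximum that is only the edge of a rising or falling slope.
    if peak == 0 or peak == n - 1:
        return False, None, None
    if (levels_db[peak] < levels_db[peak - 1]
            or levels_db[peak] < levels_db[peak + 1]):
        return False, None, None
    threshold = levels_db[peak] - drop_db
    lo = peak
    while lo > 0 and levels_db[lo - 1] >= threshold:
        lo -= 1
    hi = peak
    while hi < n - 1 and levels_db[hi + 1] >= threshold:
        hi += 1
    bandwidth = freqs_hz[hi] - freqs_hz[lo]
    return bandwidth < max_bw_hz, freqs_hz[peak], bandwidth


def synthetic_ltas(bump_db=20.0):
    """Hypothetical spectrum: -10 dB per kHz slope plus a bump at 2800 Hz."""
    freqs = [100.0 * i for i in range(51)]  # 0 to 5000 Hz in 100 Hz steps
    levels = [-0.01 * f + bump_db * math.exp(-((f - 2800.0) / 400.0) ** 2)
              for f in freqs]
    return freqs, levels


freqs, levels = synthetic_ltas()
print(find_fs_peak(freqs, levels))
```

Because the check looks for a local peak rather than overall energy, a spectrum that simply declines through the 2300-3500 Hz region (for example `synthetic_ltas(bump_db=0.0)`) is not classified as having the Fs under these assumptions, which mirrors the contrast between the categorical and quantitative analyses drawn above.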
In the current study, we also investigated other quantitative analyses (F0 and intensity)
that were used in previous studies (Sundberg, 1973; Schutte & Miller, 1985; Seidner et al. 1985;
Cleveland & Sundberg, 1985; Bloothooft & Plomp, 1985; Sengupta, 1990), and the results from
the analysis of F0 in our study were not consistent across the Chinese and Western singers. That
is, our results showed that the Western singers who were perceived to have a strong vocal ring
had values of F0 up to 392 Hz, whereas singers that were heard as “not sure” had F0 above this
point. In contrast to the Western singers, Chinese singers who were perceived to have the strong
vocal ring had a F0 that exceeded 392 Hz. This suggests that there may be different acoustic cues
that can signal the Fs in the two different styles. Results from our intensity analysis, however,
agreed with previous studies (Sundberg, 1973; Cleveland and Sundberg, 1985) and showed
higher intensity ranges in all samples (both Western and Chinese groups) that were rated as
Another quantitative analysis, L3–L1 of the LTAS analysis, was problematic in this study
due to the overlap of F0 and F1. Because L3-L1 depends on the determination of F1, it is very
important to find a solution to differentiate the F1 from the F0 if this metric is to be used. It is
interesting that previous investigators have not noted this problem nor how they overcame this
problem if it did occur. Furthermore, many previous studies investigated the Fs by using a short-
term spectral analysis; however, this procedure also was not adequate in all cases. The problem
with this procedure is that normal singing may not always include sustained vowels so there is
not a sufficient steady-state sample for spectral analysis. One interesting situation that
we found during the investigation of the L3-L1 is that there were many samples that showed a
higher L3 than L1 for both the short-term spectrum and the LTAS (see tables 18-21). The reason
for the relatively high L3 level is not clear; however evidence of a higher L3 than L1 has not
been reported in previous literature. Finally, it was found that the Fs should not be determined
only by measuring the L3-L1 because different singers may use different mechanisms to
generate the Fs; some of the factors may reflect the Fs acoustically, but may not be heard
perceptually. The use of only one measurement domain may result in incorrect conclusions about
the Fs; therefore, the Fs should be investigated both acoustically and perceptually.
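The dependence of L3-L1 on the determination of F1 can be made concrete with a minimal sketch in Python. It is an illustration under stated assumptions, not the measurement procedure used in this study: L1 is taken as the highest spectral level in an assumed F1 band of 250-1000 Hz, L3 as the highest level in the 2300-3500 Hz region, and the function names `band_peak_level` and `l3_minus_l1` are hypothetical. The sketch also flags cases in which F0 falls inside the assumed F1 band, where the strongest low-frequency peak may be the fundamental rather than the first formant.

```python
def band_peak_level(freqs_hz, levels_db, lo_hz, hi_hz):
    """Highest level (dB) among spectrum points with lo_hz <= f <= hi_hz."""
    values = [lv for f, lv in zip(freqs_hz, levels_db) if lo_hz <= f <= hi_hz]
    if not values:
        raise ValueError("no spectrum points inside the requested band")
    return max(values)


def l3_minus_l1(freqs_hz, levels_db, f0_hz,
                f1_band=(250.0, 1000.0), fs_band=(2300.0, 3500.0)):
    """Return (L3 - L1 in dB, flag that F0 lies inside the assumed F1 band).

    Both band limits are illustrative assumptions, not values from the study.
    """
    l1 = band_peak_level(freqs_hz, levels_db, *f1_band)
    l3 = band_peak_level(freqs_hz, levels_db, *fs_band)
    f0_in_f1_band = f1_band[0] <= f0_hz <= f1_band[1]
    return l3 - l1, f0_in_f1_band


# Hypothetical spectrum points (Hz, dB) for a tone sung at 392 Hz.
freqs = [200.0, 400.0, 600.0, 800.0, 2800.0, 3000.0]
levels = [-5.0, -2.0, -6.0, -10.0, -9.0, -12.0]
print(l3_minus_l1(freqs, levels, f0_hz=392.0))
```

When the flag is raised, the L1 value should be treated with caution, because the band maximum cannot be attributed to F1 without additional information such as an independent formant estimate.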
One factor that we found to be closely related to the Fs and which has not been identified
in previous studies is that different materials also impact the Fs. In our study, we compared the
different singing materials (regular singing phrase, and phrase sung with the vowels /a/, /i/ and
/u/) by controlling the musical phrase of each singer. Results of our categorical and quantitative
analyses showed a significant difference between the singing materials. Previous studies
generally used only one type of singing material to investigate the Fs. Some studies investigated
one vowel in their study and some investigated multiple vowels; some studies investigated
sustained sung vowels and some used the musical phrase with a complex text. Perhaps the
inconsistent findings across these studies can be related to the different materials that were
analyzed. For example, Schutte and Miller (1985) measured the short-term spectrum of the
vowel / ɔ / and found the level of the Fs was around –7 whereas Sengupta (1990) found the level
of the Fs was around -4 for the vowel /a/. Bloothooft and Plomp (1986) used the average of all
vowels and found about –20 for level of the Fs. Thus, data pooling and differing singing
Another aspect that demonstrated the importance of the singing material on the Fs is
related to the singers’ familiarity with the material. It was found that the Chinese singers had
difficulty performing some of the tasks with their full voice because these tasks are not typical in
Chinese opera training. For example, Chinese singers had difficulty substituting the musical
phrase with single vowels; they also had difficulty gliding up and down the musical scale. As
results of the perceptual task showed, listeners’ judgments and reliability also reflected the
Data from the acoustic analysis further support the hypothesis that singing materials
influence the presence of the Fs. In our study, there were three languages used in the singing
phrase (German, Italian and English) by the 10 Western singers: 5 sang in Italian, 1 sang in
English, and 4 sang in German (Appendix B). Most of the singers who had the Fs sang the Italian
and English repertoires (5 out of 6) and none of the singers who sang the German repertoire had
a strong vocal ring. This suggests that the different languages, singing styles, and repertoires may
affect the Fs. For example, most of the Italian repertoires sung by these singers were chosen from
an opera which requires singers to overcome a loud orchestra. Most of the German repertoires
selected for this study were German Lieder for which only piano accompaniment was required;
therefore, the Fs might not be necessary in this kind of repertoire. The effects of different
repertoires and languages across singing styles should be investigated in future studies of the Fs.
In most previous studies of the Fs, perceptual judgments were made by a single listener
(generally the researcher) and often only one singer was studied. Therefore, the acoustic
characteristics of the Fs may be questioned. For example, Schutte and Miller (1985) investigated
one Western male singer and suggested that the bandwidth of the Fs was constant over the vocal
range, up to 440 Hz for the vowel /ɔ/. Our findings do not confirm this because of the inconsistent
bandwidths across the sung phrases (regular singing phrase, phrases sung with the vowels /a/, /i/
and /u/). Seidner et al.’s (1985) study found that a tenor had a broader bandwidth of the Fs than a
baritone and bass singer; however, only one singer was investigated in each vocal category. In
our study, two different vocal categories, tenor and baritone (5 for each) from the Western
groups, were investigated. Results showed that there was no consistency for the bandwidth
within each vocal category. The different findings may relate to the number of subjects used in
these studies.
What is the impact of the independent and combined factors on the Fs? And how do they differ
Previous studies identified many factors that impact the Fs in Western classical singers
and these factors include both phonatory and vocal tract adjustments to emphasize higher
frequencies during singing. Results of our study support the hypothesis that many different
phonatory (e.g., F0 and intensity) and articulatory alterations (e.g., phonetic content) can
generate the Fs. Studies also showed that other factors such as F0, intensity, vowel configuration
could be adjusted to generate the Fs (Sundberg, 1970; Bloothooft & Plomp, 1984, 1985, 1986;
Schutte & Miller, 1985; Seidner et al., 1985; Cleveland & Sundberg, 1985). Physiological studies
show that when a Western opera singer lowers his larynx he also expands his pharynx so that the
cross-sectional area of the pharynx tube is 6 times larger than that of the epilarynx tube. These
physiological adjustments allow the epilarynx tube to become a separate resonator that generates
the Fs (Sundberg, 1973; 1974; Titze & Story, 1997). Other studies of Western opera singers
indicate that the magnitude of the Fs increases with F0 until the F0 exceeds 392 Hz; the Fs
Results of the current study show that the Chinese singers who had the Fs had a higher F0
range than singers who were not perceived to have a strong vocal ring. Because these results are
at odds with studies of Western singers (Bloothooft & Plomp, 1985; Schutte & Miller, 1985;
Seidner, 1985), we suspect that the traditional Chinese opera singers may manipulate the vocal
tract differently from the Western opera singer to generate the Fs. The higher F0 associated with
the Fs in the Chinese group would suggest a higher larynx position than in singers without the
Fs. That is, laryngeal elevation commonly occurs at a higher F0 (Sundberg, 1977; Titze & Story,
1997). Only one study (Wang, 1985) undertook a physiological investigation of Chinese opera
singers. His study demonstrated that Chinese opera singers do have an elevated larynx position
when they produce the Fs. This is in contrast to Western classically trained singers who typically
lower the larynx to produce the Fs (Sundberg, 1974). Chinese singers may use other vocal tract
configurations to generate the Fs without lowering the larynx. However, as with most studies of
the Fs, physiological data generally have been collected only to show the articulatory
configuration in the Western classically trained singing. Additional physiological data from
different singing cultures are needed to determine the possible articulatory and phonatory
Carlsson and Sundberg (1992) studied the tuning of the vocal tract and suggested that
singers tune their two lowest formant frequencies to harmonic partials in order to enhance the
overall radiated sound level. Carlsson and Sundberg also indicated that for high-pitch singing,
when the F0 is higher than the first formant, sopranos were found to adjust their vocal tract to
raise their first formant to a frequency just above the fundamental. Carlsson and Sundberg
suggest that this approach increases the sound level significantly, but does not generate the Fs.
The strategies that the Chinese singers in our study applied while singing seem to be comparable
to how Western sopranos produce the voice (higher F0 with a rising larynx position). If
traditional Chinese opera singing is similar to that of Western soprano singing, then the Fs would
not be expected in these singers; however, results from this study showed that some of these
Chinese singers produced the Fs. This is consistent with Wang’s (1985) study; that is, the Fs can
still be generated without a lowered larynx position. We further hypothesize that other non-
Western singing techniques can generate the Fs by using different vocal tract configurations.
This is shown by our results for the high F0 range, when the Chinese singers may generate the Fs
by using different vocal tract configurations in order to overcome the raised larynx position.
Another interesting comparison is found between singers W6 and W7 who performed the
same repertoires with the same F0 and almost the same intensity levels; however, singer W6
exhibited no Fs and singer W7 was found to have the Fs. We hypothesize that singer W7 had a
different vocal tract configuration than W6. Therefore, there appears to be an interaction between
vocal tract configuration, perhaps caused by operatic skill and the material that is sung.
Moreover, some singers could generate the Fs better in certain vowels and other singers
generated the Fs better in other vowels. Interestingly, we found that many singers who exhibited
the Fs in the regular singing phrase sang passages that contained more vowels that have been
shown to generate the Fs. Perhaps these singers know, at some level, what vowels benefit or
detract from their vocal quality so that they chose their repertoires accordingly. In addition, there
were many singing samples rated as “not sure” as the listeners sometimes heard the vocal ring
and sometimes not in the experiment. This category may have been used more because listeners
only heard the Fs over certain parts of the musical phrase. This is consistent with our findings in
which different factors and singing materials signal the Fs depending on each individual singer.
In future studies, it is important to find some common singing tasks with which all subjects are
familiar so that comparisons of vocal quality can be made in the absence of secondary
influences.
Conclusion
We conclude that the Western classically trained singing is not the only technique that
affects the Fs. The Fs was found in the traditional Chinese opera singing technique; however,
this high frequency energy seems to be somewhat different from what has been described for the
Western singers. This may be caused by the different singing styles manipulated by different
vocal tract configurations. Our findings showed that the perceptual judgments are necessary to
investigate the presence or absence of the Fs; however, the differences or similarities between
the Fs and the vocal ring still need to be clarified in future study. The initial goal of this study is
based on the comparison between the listeners’ perceptual judgments and other acoustic
measures. We suggest that the categorical analysis of the LTAS is a good reflection of perceptual
judgments of the Fs. Other analyses such as quantitative analyses might not be appropriate tools
to determine the Fs, but they may provide insight into the mechanisms that generate the Fs. All
factors such as singing technique, F0, intensity, vowel quality, and singing material impacted the
Fs. These factors either interacted or traded-off to signal the Fs in individual singers. Further
Finally, the primary importance of this study is to obtain a better understanding of the possible
vocal tract and laryngeal actions that impact the productions of the Fs. That is, in vocal
pedagogy, describing to the singers what they should do physiologically (e.g., lowering the
larynx) may not be the best method; rather, helping singers to realize which factors
affect them the most in generating the Fs may be more appropriate.
Appendix A
Questionnaire
How many years of voice training did you have before age of 18 and after age of 18?
How would you describe the type of training that you had?
What is your voice range? Lowest note ________________ highest note ____________
Do you round your lips while singing vowels? ______________ or do you retract your
Notes for sound level meter:
Appendix B
W1 W2 W3 W4 W5
W6 W7 W8 W9 W10
Appendix C: Rating sheet
1. 1 2 3 ______________________________
2. 1 2 3 ______________________________
3. 1 2 3 ______________________________
4. 1 2 3 _______________________________
5. 1 2 3 _______________________________
6. 1 2 3 ________________________________
7. 1 2 3 _______________________________
8. 1 2 3 _______________________________
9. 1 2 3 ________________________________
10. 1 2 3 ________________________________
11. 1 2 3 ________________________________
12. 1 2 3 ________________________________
13. 1 2 3 ________________________________
14. 1 2 3 ________________________________
15. 1 2 3 _________________________________
16. 1 2 3 ________________________________
17. 1 2 3 _________________________________
18. 1 2 3 _________________________________
19. 1 2 3 _________________________________
20. 1 2 3 ________________________________
21. 1 2 3 ________________________________
22. 1 2 3 ________________________________
1. 1 2 3 ______________________________
2. 1 2 3 ______________________________
3. 1 2 3 ______________________________
4. 1 2 3 _______________________________
5. 1 2 3 _______________________________
6. 1 2 3 ________________________________
7. 1 2 3 _______________________________
8. 1 2 3 _______________________________
9. 1 2 3 ________________________________
10. 1 2 3 ________________________________
11. 1 2 3 ________________________________
12. 1 2 3 ________________________________
13. 1 2 3 ________________________________
14. 1 2 3 ________________________________
15. 1 2 3 _________________________________
16. 1 2 3 ________________________________
17. 1 2 3 _________________________________
18. 1 2 3 _________________________________
19. 1 2 3 _________________________________
20. 1 2 3 ________________________________
21. 1 2 3 ________________________________
22. 1 2 3 ________________________________
References
Bloothooft G & Plomp R. (1984). Spectral analysis of sung vowels. I. Variation due to
differences between vowels, singers and modes of singing. J Acoust Soc Am., 75(4),
1259-1264.
Bloothooft G & Plomp R. (1985). Spectral analysis of sung vowels. II. The effect of
Bloothooft G & Plomp R. (1986). The sound level of the singer’s formant in
Carlsson G & Sundberg J. (1992). Formant frequency tuning in singing. J Voice, 6 (3),
256-260.
guide.com/about-china/beijing-opera.shtml
Chinavoc. (2002). Chinese traditional opera: Beijing opera- roles in Chinese opera, from
Cleveland T.F. & Sundberg J. (1985). Acoustic analysis of three male voices of different
Cleveland T.F., Sundberg J., & Stone R.E. (2001). Long-term-average spectrum
characteristics of country singers during speaking and singing. J Voice, 15, 54-60
Detweiler RF. (1994). An investigation of the laryngeal system as the resonance source
voices and the dimension of supraglottal cavities. Folia Phoniatr (Basel) 1979; (31): 238-
41.
Ferguson S., & Kewley-Port D. (2002). Vowel intelligibility in clear and conversational
speech for normal hearing and hearing-impaired listeners. J. Acoust. Soc. Am., 112 (1),
259-271.
Ferguson, S., & Kewley-Port, D. (2008). Talker differences in clear and conversational speech:
Hines Jerome. (1990). Great singers on great singing. New York: 5th Limelight Edition.
Hillenbrand, J., Getty, L.J., Clark, M.J., & Wheeler, K. (1995). Acoustic characteristics
Hollien, H, & Shipp, T. (1972). Speaking fundamental frequency and chronologic age in males.
Hsu Y.L. (1992). A Comparison of the Vocal Techniques in Peking Opera and Bel Canto
Kent R.D., & Read Charles. (1992). The acoustic analysis of speech. Singular
Löfqvist A., & Mandersson B. (1987). Long-time average spectrum of speech and voice
Liberman, A. M. Harris, K. S., Hoffman, H. S., & Griffith, B. C. (1954). The discrimination of
Leino T. (1994). Long-term average spectrum study on speaking voice quality in male actors. In
Lundy D.S, Roy S, Casiano R.R, Xue J.W., & Evans J. (2000). Acoustic analysis of the
Mendoza E, Munoz J, Naranjo N.V. (1996). The long-term average spectrum as a
Nawka T, Anders LC, Cebulla M., & Zurakowski D. (1997). The speaker's formant in
Oliveira-Barrichelo V.M., Heuer R.J., Dean C.M., & Sataloff R.T., (2001). Comparison
of singer’s formant, speaker’s ring, and LTA spectrum among classical singers and
Omori K, Kacker A, Carroll L.M, Riley W.D., & Blaugrund SM. (1996). Singing power ratio:
Quantitative evaluation of singing voice quality. J Voice, 10(3), 228-235.
Picheny MA., Durlach NI., & Braida LD. (1986). Speaking clearly for the hard of hearing II:
Riesz, R. R. Different intensity sensitivity for the ear for pure tones. Phys. Rev. 1928;
31:867-75.
Roederer JG. (1972). Introduction to the physics and psychophysics of music. London,
Ross J. (1992). Formant frequencies in Estonian folk singing. J. Acoust. Soc. Am.,
91(6), 3532-3539.
Rossing TD, Sundberg J., & Ternström S. (1986). Acoustic comparison of voice use in
Sataloff R.T. (1998). Vocal health and pedagogy. Singular publishing Group, Inc. San
Diego, CA.
Schutte HK, Miller R. (1985). Intra-individual parameters of the singer’s formant. Folia
Sengupta. R. (1990). Study on some aspects of the “Singer’s Formant” in north Indian
Seidner W, Schutte H.K., Wendler J., & Rauhut A. (1985). Dependence of the high
singing formant on pitch and vowel in different voice types. SMAC83 Conference
Shower EG., & Biddulph R. (1931). Differential pitch sensitivity of the ear. J Acoust
Stone RE Jr, Cleveland TF., & Sundberg J. (1999). Formant frequency in country
Su WH & Forrest K.M. (2002, June). An acoustic study of the singer’s formant: The
comparison between Western classical and traditional Chinese opera techniques. Paper
presented at the 31st Annual Symposium of the Voice Foundation on the Care of the
Professional Voice, Philadelphia, PA.
female opera singers: Cultural and stylistic differences. Paper presented at the 29th
Annual Symposium of the Voice Foundation on the Care of the Professional Voice,
Philadelphia, PA.
Sundberg J. (1970). Formant structure and articulation of spoken and sung vowels.
25, 71-90.
Sundberg J. (1977). The acoustics of singing voice. Scientific American, 236(3), 82-4,
86, 88-91.
Sundberg J. (1987). The science of the singing voice. DeKalb: Northern Illinois
University Press.
Sundberg J. (2001). Level and center frequency of the singer’s formant. J Voice, 15 (2),
176-86.
Ternström S., & Sundberg J. (1989). Formant frequencies of choir singers. J Acoust Soc
Am., 86 (2), 517-522.
Titze I.R., & Story B.H. (1997). Acoustic interactions of the voice source with the lower
Vennard W. (1967). Singing: the Mechanism and the Technique. New York: Carl
Fischer Inc.
Wang S. (1985). Singing voice: Bright timbre, singer’s formants and larynx positions.
Weiss R., Brown W.S. Jr., & Morris J. (2001). Singer’s formant in sopranos: Fact or
Wen-Hui Su
Education
Teaching
Speech Science: Instrumentation and Applications: School of Speech, Language, and Hearing
Sciences, San Diego State University: 2004
Voice counselor at The Cross School of Music, LA, CA: 2004-2006
Associate instructor for a doctoral seminar: “Acoustic Research in Speech, Language and
Hearing Sciences.” Indiana University: 2002
Guest lecturer on “Care of the Professional Voice” at National Taiwan Traditional Chinese
Opera Department in Taipei, Taiwan: 2002.
Teaching assistant for “Videostroboscopy” related to voice and voice disorders. Indiana
University, Speech and Hearing Department: 2000-2002.
Assistant instructor for “Voice Fluency in Children and Adolescents” course. Indiana University,
Speech and Hearing Department: 2000-2001.
Assistant instructor for “Voice Disorders” course. Indiana University, Speech and Hearing
Department: 1999.
Guest lecturer on “Care of the Professional Voice” at the Voice Clinic at Veterans General Hospital,
Taipei, Taiwan: 1999.
Co-Editor for:
Journal of Speech and Hearing Review (2004).
Moya Andrews and Wen-Hui Su (2004). Voice treatment for Children. Journal of Speech and
Hearing Review.
Wen-Hui Su and Karen Forrest (2003, June). The Influence of Training Technique on the
Singer’s Formant. Paper presented at the 32nd Annual Symposium of the Voice Foundation: Care
of the Professional Voice, Philadelphia, PA.
Hiroya Yamaguchi and Wen-Hui Su (2003, June). Perceptual Evaluations of Voice Samples
Using the GRBAS Scale: Comparison of Listeners from Taiwan and the U.S.A. Paper presented
at the 32nd Annual Symposium of the Voice Foundation: Care of the Professional Voice,
Philadelphia, PA.
Wen-Hui Su & Karen Forrest (2002, June). An Acoustic Study of the Singer’s Formant: The
Comparison Between Western Classical and Traditional Chinese Opera Techniques. Paper
presented at the 31st Annual Symposium of the Voice Foundation: Care of the Professional
Voice, Philadelphia, PA.
Wen-Hui Su, Hiroya Yamaguchi, & Moya Andrews (2001, December). GRBAS Rating by
American and Japanese Listeners. Paper presented at the 3rd East Asian Conference on
Phonosurgery, Taipei, Taiwan.
Wen-Hui Su (2001) An Acoustic Study of the Formant Structure of Voices: Male and Female
Chinese Opera Singers. Doctoral second year project.
Wen-Hui Su & Julia Wood Rademacher (2000, July). A Comparison of the Stability of Basal
Pitch Measures in Young Adult Singers. Paper presented at the 29th Annual Symposium of the
Voice Foundation: Care of the Professional Voice, Philadelphia, PA.
Wen-Hui Su & Julia Wood Rademacher (2000, July). An Acoustic Study of Vowels Produced by
Female Opera Singers: Cultural and Stylistic Differences. Paper presented at the 29th Annual
Symposium of the Voice Foundation: Care of the Professional Voice, Philadelphia, PA.
Wen-Hui Su & Rahul Shrivastav (1999, June). Relationship Between Conversational Range,
Habitual Pitch and Total Musical Range in Trained Singers. Paper presented at the 28th Annual
Symposium of the Voice Foundation: Care of the Professional Voice, Philadelphia, PA.
Honors and Awards
Research grant for Ph.D. dissertation from University of Art and Science. Indiana University:
2002
Travel Grant from the Department of Speech and Hearing Sciences, Indiana University, to attend
the “28th Annual Symposium on the Care of the Professional Voice” in Philadelphia, June 1999
First prize from the 7th Chi-Mai Culture Foundation Voice Competition. Taiwan: 1995
Workshops attended
Professional voice/singing workshops. The 32nd Annual Symposium of the Voice Foundation on
the Care of the Professional Voice, Philadelphia, PA, June 2003.
Professional voice/singing workshops. The 31st Annual Symposium of the Voice Foundation on
the Care of the Professional Voice, Philadelphia, PA, June 2002.
Professional voice/singing workshops. The 29th Annual Symposium of the Voice Foundation on
the Care of the Professional Voice, Philadelphia, PA, July 2000.
Professional voice/singing workshops. The 28th Annual Symposium of the Voice Foundation on
the Care of the Professional Voice, Philadelphia, PA, June 1999.