Reymore Huron Timbre Qualia Psychomusicology Prepublicationcopy 2020 PDF
2 authors, including:
            Lindsey Reymore
            The Ohio State University
All content following this page was uploaded by Lindsey Reymore on 30 June 2020.
                                    PREPUBLICATION COPY
©American Psychological Association, [2020]. This paper is not the copy of
record and may not exactly replicate the authoritative document published in
the APA journal. Please do not copy or cite without author's permission. The
final article is available, upon publication, at:
https://doi.org/10.1037/pmu0000263
Using auditory imagery tasks to map the cognitive linguistic dimensions of musical instrument
timbre qualia
Author Note
Music, 1866 College Road, The Ohio State University, Columbus, OH 43210-1170, Tel: (772)
Preliminary results from this study were presented at the Timbre 2018 conference and the
15th International Conference on Music Perception and Cognition and appear in the Proceedings
of the 15th International Conference on Music Perception and Cognition. This project is included
Abstract
Two studies are reported related to musical instrument timbre qualia. In the first study, open-
ended interviews were conducted with 23 musicians who were asked to describe their
descriptions. In a second study, 460 musician participants rated subsets of the same 20 imagined
instrument sounds according to the 77 categories derived from Study 1. Principal Component
Analyses were applied to the results of Study 2, yielding several models. Researcher
interpretations of the components in these models were combined with the results of
supplementary polls where musicians rated the descriptive utility of each candidate component,
producing a final 20-dimensional timbre qualia model. The model dimensions include:
Introduction
extraordinary variety of instruments used around the world. In many cultures, more than one
instrument will share similar pitch ranges and intensities, suggesting that differences other than
pitch or intensity are valued in some way. Since most musical lines can be played by more than
one instrument, this raises the question of why a musician might, for a particular passage, choose
one instrument or instrument combination over another. For example, in the second movement
(funeral march) of Beethoven’s third symphony, why does Beethoven assign the solo line to an
oboe rather than a flute or French horn? Various sounds evoke different phenomenological
trumpet is described as “noble,” or a tuba is described as “heavy.” How widespread are such
from?
The first purpose of the current study is to identify the kinds of phenomenological or
subjective experiences commonly associated with the sounds produced by different musical
instruments. In music research, the term quale, or what it is like to experience something, has
been borrowed from philosophy and has been used to represent the “phenomenal character” of a
given musical event (Arthur, 2016, p.4). The concept of qualia has been approached differently
relational and ineffable. Arthur (2016) argues that even though it may be impossible to
communicate musical qualia in words to someone who has not experienced the qualia in
question, those who have had the experiences are nevertheless able to use words to index their
shared descriptions of scale-degree qualia; building on this work, Arthur (2016) reports
experiments with scale degree qualia and rhythm qualia which similarly suggest that musical
components can elicit relatively stable qualia. Accordingly, as described below, we propose to
examine various qualia associated with different tone colors or timbres through the use of verbal
descriptions. Although our investigation would ideally consider a wide range of instruments
from across the globe, our methodology requires participant familiarity with the instrument
sounds. Since we rely on Western-enculturated participants, the scope of our study is necessarily
A second goal of the current study is to create a cognitive linguistic model of timbre
characterizations of musical instruments, which we call Timbre Trait Profiles, that can be applied
in analysis. The aim is to be able to triangulate the Timbre Trait Profiles with information about
instrumentation and music theoretical analysis in order to build an understanding of the role of
Empirical studies of musical timbre have been conducted at least since the time of
Helmholtz (1885). In recent decades, timbre research has emphasized the use of
multidimensional scaling (MDS), with pioneering work by Plomp (1970), Wessel (1973), and
Grey (1977). In MDS, paired similarity judgments are used to construct a multidimensional map
of the relationships between sounds. MDS itself does not offer any interpretation of the
presumed underlying dimensions. Instead, it is up to the researcher to infer and interpret the
physical origin or perceptual meaning of the latent dimensions revealed through MDS.
Following Grey’s work, the MDS approach received considerable extension (e.g., Kendall &
Carterette, 1991; Kendall, Carterette, & Hajda, 1999; McAdams, Winsberg, Donnadieu, De
COGNITIVE LINGUISTIC DIMENSIONS OF TIMBRE QUALIA                                                  5
Soete, & Krimphoff, 1995). In interpreting MDS dimensions, researchers have tended to
approach, which we use here, is to use language as a starting point for the investigation of the
Several studies have addressed the semantics of timbre. Many of these studies have found
three to four semantic dimensions for timbre space (e.g., Pratt & Doak, 1976; von Bismarck,
1974). A 1974 study by von Bismarck is considered by some to be the first comprehensive study
of timbre semantics (Saitis & Weinzierl, 2019). Von Bismarck asked participants to use bipolar
scales to rate synthetic harmonic complex tones and noises with systematically varied spectral
envelopes. Factor analysis of the ratings yielded four factors explaining more than 80% of the
variance. These factors were defined as dull-sharp, compact-scattered, full-empty, and colorful-
colorless. Following up on these results, Kendall and Carterette (1993a) carried out a study in
which participants rated dyads produced by wind instruments using von Bismarck’s semantic
differentials. While the differentials did not result in successful differentiation among timbres, a
different version of the experiment in which verbal attribute magnitude estimation (VAME) was
concluded that von Bismarck’s adjectives were not ecologically valid. Thus, in the second part of
their study (1993b), dyads were rated instead on a list of adjectives from Piston’s Orchestration
(1955). The four-factor model resulting from this study accounted for over 90% of the variance
and included factors interpreted as power, strident, plangent, and reed. Elliot, Hamilton, and
Theunissen (2013) triangulated MDS with acoustical analyses and discriminant analysis of
dimensional space. Zacharakis, Pastiadis, and Reiss (2014, 2015) investigated timbre
words from a predefined vocabulary list of 30 words, and results from both languages were
found to be reducible to three dimensions using factor analysis, interpreted by the authors as
Wallmark (2019a) reports a corpus linguistic study of orchestration treatises and manuals,
from which he derives seven categories of the timbre lexicon: affect, matter, cross-modal
correspondence, mimesis, action, acoustics, and onomatopoeia. These seven categories are
further reduced to three conceptual dimensions, interpreted as material, sensory, and activity.
Our approach shares a key strategy with Wallmark’s, that of deriving dimensionality from
ecological language.
In the current study, we aim to identify labels for timbre qualia dimensions that arise
from a bottom-up selection of descriptive terms provided by participants. This strategy contrasts
with MDS approaches where the researchers themselves provide labels for the implied
dimensions, and where such labels are not necessarily connected to the conventional descriptive
lexicon. We begin our study of timbre by polling musicians directly, asking them to provide
descriptors shared across the musicians’ responses. Our approach aims to go beyond timbre as a
purely perceptual phenomenon to include the broader concept of qualia, that is, of the
phenomenological experience of sound that may extend beyond acoustical and perceptual
correlates to include cognitive, affective, cultural, and other facets. Timbre functions on different
scales of detail, and research has established that timbre is affected by both pitch height and
loudness (Siedenburg & McAdams, 2017). However, the current study invites perceptual
judgments that in principle include natural variations in pitch height and loudness as potential
integral components of timbre qualia insomuch as pitch height and loudness are a part of a
Descriptions of instrumental timbre can be gathered via interviews; for example, Traube
(2004) used interviews to investigate the timbre semantics of the classical guitar, while Nykänen,
Johansson, Lundberg, & Berg (2009) similarly employed interviews to examine descriptions of
saxophone timbre. One approach to investigating instrument qualia might have different
instruments play the same passage, and then have listeners describe the different qualia evoked
(e.g., Hailstone et al., 2009). The selection of suitable stimulus passages raises a number of
questions. A musical passage will convey a number of qualities independent of the nominal
timbre of the performed instrument. For example, the passage itself may be rather quiet or loud,
animated or subdued, somber or joyous, etc. The character of the passage is apt to have a marked
impact on how listeners describe the associated phenomenological qualia. In addition, different
performers are apt to exhibit idiosyncratic interpretations of the passage, as well as variations in
timbre. Even in the case of single isolated instrument tones, there are a number of variables that
can have a marked impact, such as the amount of reverberation, microphone proximity, mode of
When we hear a recording of an actual oboe tone, the tone will differ in various ways
from what we might imagine an oboe tone to sound like. The recorded tone may be higher in pitch
than expected, have a drier reverberation, a slightly breathier character, a faster vibrato, etc. That
is, actual sound stimuli commonly deviate from an internalized prototype in various ways.
This experience is not unique to sound. Suppose you are a computer scientist interested in
identifying the features of a human face. You recruit participants and ask them to describe
various faces. When presented with a photograph of a particular face, we tend not to identify that
the person has a nose, two eyes, and a mouth (the type description). Instead, we offer
descriptions like aquiline nose, close-set eyes, and small mouth (token description).1 Such
descriptions make sense only because they identify deviations from an average or prototypical
face—an intersubjective template that is shared between observers. That is, descriptions tend to
Similarly, when asked to describe a recorded oboe tone, there is a temptation to describe
the token rather than the type. In much timbre research (such as ours), the goal is to identify the
timbral qualities of a type rather than a token. That is, we aim to characterize the qualia features
of an “oboe,” rather than the qualia features associated with a particular recording of an oboe.
One might identify two approaches whose goal is collecting type rather than token
descriptions. One approach is to expose participants to a large number of token instances and
then seek the commonalities. For example, one might have listeners describe 40 contrasting oboe
recordings, including different players, different tempos, different pitches, different reverberant
environments, different vibratos, different musical styles, different microphone proximities, and
so on. In analyzing the results, the challenge is to eliminate the token descriptions while retaining
the type descriptions—a task that must be done without any a priori knowledge of which
An alternative approach might endeavor to tap into listeners’ existing mental stereotypes
or cliché images of typical sounds produced by different instruments. Specifically, one might ask
participants to describe imagined sounds rather than actual sounds. The assumption is that mental
images of instrument sounds are more likely to represent prototypical instrument characteristics
1 Note that we intend the terms type and token to be taken generally here; we do not intend to imply the narrower
definitions of the terms commonly used in linguistics.
This “imagined sounds” approach raises two critical methodological questions. First, how
vivid are these imagined sounds? Secondly, how closely do these imagined sounds approximate
With regard to the vividness of imagined sounds, Halpern, Zatorre, Bouffard, and
Johnson (2004) conducted a multidimensional scaling task and showed that people compare
neuroimaging results confirmed that activity in the auditory association cortices is present for
both perceived and imagined timbre. Similarly, Tużnik, Augustynowicz, and Francuz (2018)
trained participants to imagine synthesized auditory stimuli varying in timbre and found that
suggest that musicians perform timbre imagery tasks more accurately than do non-musicians.
Although these results are suggestive, of course it remains possible that perceived and imagined
sounds nevertheless recruit different networks and may rely on different auditory features.
Regarding the second question of the prototypicality of what people imagine, pertinent
findings are reported in Huron (2006, Chapter 4). Huron asked musicians to imagine a wide
variety of sounds, including individual tones, chords, rhythms, etc. When imagining a single
tone, for example, he found that musician participants tend to imagine tones very near the precise
center of the distribution for actual pitches in Western music. Specifically, the mean pitch for
imagined tones was just two semitones away from the actual mean pitch for a large sample of
music. Similarly, when asked to imagine a chord, musicians tend to imagine a major chord in
root position—the most commonly occurring chord in Western music. When asked to imagine
any rhythm, musicians tend to imagine the most commonly occurring meter at the most
commonly occurring tempo. In general, whether imagining pitches, scale degrees, chords, or
rhythms, Huron found a high rate of intersubjective agreement among his participants suggesting
In the current study, when asked to imagine an oboe tone, for example, many participants
reported imagining a “tuning A” tone commonly played by an oboe when initiating an orchestral
tuning session. One might question whether a “tuning A” oboe tone represents a prototypical
oboe sound. Nevertheless, for non-oboe players, the “tuning A” tone is likely the most common
In summary, although the use of “imagined sounds” may initially seem questionable for
timbre studies, the evidence nevertheless suggests that, especially for musician participants,
imagined sounds can be remarkably vivid, and imagined sounds are more likely to be
actual sound stimuli. If the goal is to identify type rather than token timbral features, imagined
sounds may well be superior to actual sound recordings. Finally, we should note that an
important attraction of the imagined sound method is that it is easier to administer and avoids
many thorny issues related to selecting or recording nominally representative stimuli. Ultimately,
we anticipate that both the value and limitations of the imagined sound method will become
The purpose of this study is to build a timbre qualia model that might prove useful in
descriptions of instrument timbres. Constructing the model involves two phases: an exploratory
phase intended to solicit a wide range of timbre descriptors and a model-building phase in which
the large number of descriptors identified in the exploratory phase is distilled to a more
parsimonious model.
Preliminary Study
participants characterize a wide range of contrasting and diverse sounds in order to capture as
much descriptive variance as possible. One way to achieve such a breadth would be to include a
large number of contrasting musical instruments. For example, a suitable list might include an
African mbira, Chinese er-hu, Australian didgeridoo, Western violin, Indonesian peng ugal, etc.
As will be described later, our methodology requires that the instruments used as stimuli be
familiar to our participants. Consequently, in conducting the qualia survey, it was necessary to
restrict the target sounds to familiar Western musical instruments. Given the criterion of
familiarity, we assembled a list of 44 instruments (shown in Table 1) that are likely to be familiar
large number of descriptive terms. Because our intention is to use these descriptive terms in a
subsequent study, an additional concern was that Study 2 not become excessively long.
list, the greater the tractability of the study. At the same time, reducing the number of
order to select a subset of the instruments from Table 1 considered to be maximally diverse in
timbre. In order to minimize experimenter bias, we recruited 17 independent musician judges for
this preliminary study (for participant demographics, see Appendix A.1.). Given the instruments
in Table 1, the participants were asked to create a subset of 20 instruments exhibiting the most
contrasting timbres. In presenting the list of instruments, the order was uniquely randomized for
each participant.
contrasting timbres. However, no two sets contained the same 20 instruments. Consequently, the
17 sets were used as input to a computational procedure intended to generate the most timbrally
contrasting instrument group based on the musicians’ judgments. For each set of 20 instruments,
all possible pairs of instruments in the set were tallied. These tallies were then combined across
all 17 sets. This resulted in an aggregate dissimilarity score for each pair of instruments; the aim
is to discover which set of 20 instruments produces the highest aggregate dissimilarity score. The
final selected optimal set of maximally contrasting instruments is shown in Table 2.2
2 It should be noted that not all 1.8 trillion 20-instrument combinations were evaluated, so there is no guarantee that
the apparent best set is the true optimum.
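The tallying and search procedure described above can be sketched in Python as follows. This is only an illustrative reading of the text, not the authors' implementation: the function names, the toy judge sets, and in particular the swap-based local search are assumptions, since the source does not state which search procedure was used. The final line reproduces the size of the exhaustive search space mentioned in footnote 2.

```python
import math
import random
from collections import Counter
from itertools import combinations


def tally_pairs(judge_sets):
    """Count how often each instrument pair appears together in a judge's set."""
    tallies = Counter()
    for chosen in judge_sets:
        for pair in combinations(sorted(chosen), 2):
            tallies[pair] += 1
    return tallies


def aggregate_score(candidate, tallies):
    """Sum the pair tallies over every pair inside a candidate set."""
    return sum(tallies[pair] for pair in combinations(sorted(candidate), 2))


def local_search(instruments, tallies, size, iterations=2000, seed=0):
    """Swap one member at a time and keep improvements (assumed heuristic).

    Like the procedure in the text, this does not evaluate every combination,
    so the result is not guaranteed to be the true optimum.
    Requires size < len(instruments).
    """
    rng = random.Random(seed)
    pool = sorted(instruments)
    current = set(rng.sample(pool, size))
    best = aggregate_score(current, tallies)
    for _ in range(iterations):
        out_item = rng.choice(sorted(current))
        in_item = rng.choice(sorted(set(pool) - current))
        trial = (current - {out_item}) | {in_item}
        score = aggregate_score(trial, tallies)
        if score > best:
            current, best = trial, score
    return current, best


# Hypothetical toy input: two judges each choosing 3 of 4 placeholder items.
toy_sets = [{"a", "b", "c"}, {"a", "b", "d"}]
toy_tallies = tally_pairs(toy_sets)
toy_best_set, toy_best_score = local_search("abcd", toy_tallies, size=3)

# Number of ways to choose 20 instruments from the 44 in Table 1.
n_combinations = math.comb(44, 20)
```

With the 44 instruments of Table 1, `n_combinations` is roughly 1.8 trillion, which is why an exhaustive evaluation was not attempted.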
Study 1: Interviews
Having selected 20 contrasting instruments, Study 1 asked musician participants to imagine the
sounds produced by these 20 instruments and to describe their phenomenal experiences of the
imagined sounds. Descriptions were transcribed in situ and the comments later analyzed for
content.
Method
Previous experience with qualia research suggests that participants dislike typing, and that
participants are likely to provide more verbose descriptions in the presence of another person
than when alone. If asked to type an open-ended response, participants typically type a few
words before wanting to move on. On the other hand, participants can be quite effusive when
speaking, especially in conversation. Accordingly, interviews were conducted live, with the
researcher typing a verbatim transcript of the comments while the participant spoke. Participants
were asked to imagine the sound of a specific instrument, to make ratings of familiarity and
vividness, and then to re-imagine and describe the sound they were imagining in as much detail
as possible. Detailed instructions for the interviews are included in Appendix B.2.
After reading and listening to the instructions, participants were asked to imagine and
describe the sounds of the target instrument. Each participant described the 20 instrument sounds
(Table 2) in a unique random order. As evident in the instructions, participants rated the
vividness of the imagined sound, as well as their familiarity with the sound of the instrument.
Additionally, if a participant’s primary instrument was not on the list, the participant was further
Appendix A.2.) Most interviews ranged between 40 and 60 minutes in duration, though some
interviews lasted 90 minutes or more. In post-interview follow-ups, participants were asked their
impressions of the task. Nearly all participants reported that the task was somewhat or
considerably difficult. Despite these reservations, our participants nevertheless were able to
describe at length all of the instruments involved in the study. Moreover, in the debriefing
sessions following the interviews, all participants responded positively to the task, including
Results
Each of the 23 participants described the target 20 instruments. Some participants, whose
primary instrument or instruments were not one of the target instruments, opted to describe their
own instrument(s) in addition to the target 20. Altogether, participants provided 477 descriptions
of individual instruments. Each instrument description was parsed into terms or phrases,
hereafter referred to as component ideas. Across all instrument descriptions, parsing yielded
4,809 component ideas, representing an average total of 240.5 component ideas for each of the
imagined instrument sounds, across all participants. Ratings of familiarity and vividness were
closely correlated, r(475) = .70. The number of component ideas in a given instrument
description was positively correlated with both familiarity and vividness, r(475) = .28.
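As a brief aid to reading the statistics above, the following Python sketch computes a Pearson correlation and the associated degrees of freedom. The degrees of freedom reported in the text (475) are consistent with N - 2 for the 477 instrument descriptions. The function names and the small rating lists are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def degrees_of_freedom(n_pairs):
    """Degrees of freedom for a Pearson correlation: N - 2."""
    return n_pairs - 2


# Hypothetical familiarity and vividness ratings (illustrative only).
familiarity = [5, 4, 3, 5, 2, 4]
vividness = [5, 4, 2, 4, 2, 3]
r_example = pearson_r(familiarity, vividness)

# 477 descriptions give the 475 degrees of freedom reported in the text.
df_reported = degrees_of_freedom(477)
```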
The bagpipes, French horn, flute, and cymbals garnered the highest average number of
component ideas per participant (M = 11.8), whereas the wood block, alto saxophone, bass
clarinet, and snare drum garnered the lowest average number of component ideas per participant
(M = 8.6). Notice, however, that the range of average number of component ideas for each
Content Analysis
The purpose of the content analysis was to distill each comment into one or more component
ideas. In their responses, some participants tended to provide a perfunctory list of adjectives,
whereas other participants offered more effusive descriptions, sometimes resorting to elaborate
metaphors or narratives. The following three responses illustrate the range of response styles,
with each followed by the list of component ideas arising from the content analysis:
[Oboe.] “Oboe is like grass for some reason. I guess it has a lot of
tonal information. Like it produces more around the tone. It’s very
woody, grainy; it’s bright but with a few shades of darkness, shady.
It’s complex; I find it poignant. It’s mostly yellow with some green
really loud, I don’t know why. Piquant, savory-sweet, but not very
                      tendon/skinny muscle.”
ominous harp. I can’t remember when a harp was ever evil. A very
In describing various instrument sounds, it was not uncommon for participants to relay
particular stories or associations. For example, in describing the wood block, one participant
relayed her experience playing a wood block in a percussion class she had taken. Since personal
personal associations from the content analysis. In addition to eliminating anecdotes and
associations, we also eliminated references to specific musical works (e.g., “Bolero”). These
criteria resulted in the exclusion of some 533 component ideas (i.e., roughly 11% of all collected
comments), narrowing the number of component ideas from 4,809 to 4,276. Subsequently, the
modifiers “more,” “most,” “many,” “few,” “less,” “least,” “somewhat,” “a lot of,” and “very”
were also trimmed from the descriptions, as were negations such as “not.”
Eliminating duplicate component ideas, modifiers, and negations from the list of 4,276
resulted in 2,487 unique component ideas—a number deemed too many for manual sorting.
Consequently, we focused on only the 502 component ideas that were mentioned more than
once. Table 3 identifies the 50 most common component ideas. It should be noted that because
we did not control for how many times a single participant used any given term, it is possible
that these counts may be skewed by participants who favored the reuse of certain words. Words
with common roots (e.g., power and powerful) have been combined. Together, these 50 ideas
                       1. round               85
                       2. resonant            84
                       3. warm                83
                       4. loud                83
                       5. low                 73
                       6. bright              69
                       7. clear               67
                       8. metallic            65
                       9. high                58
                       10. sharp              56
                       11. deep/depth         54
                       12. piercing           47
                       13. soft               47
                       14. colorful/color     43
                       15. nasal              40
                       16. airy               39
                       17. full               39
                       18. rich               38
                       19. hollow             38
                       20. mellow             35
                       21. pure               35
                       22. buzz/buzzy         35
                       23. voice/vocal        35
                       24. smooth             34
                       25. dark               34
                       26. focused            34
                       27. light              32
                       28. big                31
                       29. cut/cuts/cutting   31
                       30. reedy              30
                       31. open               29
                       32. percussive         28
                       33. ringing/rings      28
                       34. direct             25
                       35. complex            24
                       36. beautiful          23
                       37. sweet              23
                       38. shrill             22
                       39. brassy             22
                       40. sustain            21
                       41. powerful/power     21
                       42. thin               18
                       43. harsh              17
                       44. rough              17
                       45. pretty             15
                       46. fat                15
                       47. gentle             15
                       48. annoying           13
                       49. woody              12
                       50. twangy             12
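The filtering steps that precede Table 3 (trimming modifiers and negations, merging duplicates, and retaining only ideas mentioned more than once) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and example phrases are hypothetical, the trimming is done by simple word matching, and the merging of words with common roots (e.g., power and powerful) is not implemented.

```python
from collections import Counter

# Modifiers and negations that the text reports trimming from descriptions.
DROPPED_WORDS = {
    "more", "most", "many", "few", "less", "least", "somewhat", "very", "not",
}


def normalize(idea):
    """Lower-case an idea and strip the listed modifiers and negations."""
    text = " " + " ".join(idea.lower().split()) + " "
    text = text.replace(" a lot of ", " ")  # the only multi-word modifier
    kept = [word for word in text.split() if word not in DROPPED_WORDS]
    return " ".join(kept)


def recurring_ideas(ideas, min_count=2):
    """Count normalized ideas and keep those mentioned at least min_count times."""
    counts = Counter(normalize(idea) for idea in ideas)
    counts.pop("", None)  # discard ideas that consisted only of modifiers
    return {idea: n for idea, n in counts.items() if n >= min_count}


# Hypothetical component ideas (illustrative only).
example = ["very bright", "bright", "not warm", "a lot of air"]
kept = recurring_ideas(example)
```

In this toy input only "bright" survives, because it is the only idea that occurs more than once after trimming.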
The 502 component ideas were printed on individual slips of paper that were used in an
ensuing pile sort task (de Munck, 2009; see Appendix B.4. for instructions) that was conducted
independently by both authors. The two sorts grouped the 502 component ideas into 59 and 70
categories, respectively; Table 4 shows the descriptive labels for the categories from both pile sorts. Upon
discussion of both analyses, we agreed on a list of reconciled categories, which are reported in
1’s pile sort for the “aesthetic” category included the terms “beautiful,” “elegant,” “cute,” and
“pretty.” The “valence” category included the terms “positive,” “nice,” “lovely,” “pleasant,”
“beautiful/pleasant” category, which included the terms “beautiful,” “lovely,” “nice,” “pretty,”
In reconciling the two lists and naming the reconciled categories, we referred back to the
list of 4,276 component ideas in order to take into account the number of times a given idea
appeared in the transcripts. In the above example, for the various component ideas included in
the three combined pile sort categories (“aesthetic,” “valence,” and “beautiful/pleasant”), the
most frequent word by far was the word “beautiful”—hence the revised name for the combined
category. Notice that the ideas “painful” and “unpleasant” no longer belong in the final resolved
category. It was agreed that these ideas could be accommodated under the reconciled category of
“shrill/harsh/annoying.”
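The frequency-based naming of reconciled categories can be illustrated with a short Python sketch. The counts below are taken from Table 3; the function name, the tie-breaking rule, and the limit of three terms per label are assumptions made for illustration, since the text states only that the most frequent ideas guided the choice of names.

```python
def label_category(term_counts, max_terms=3):
    """Join the most frequent terms of a category into a slash-separated label.

    Ties are broken alphabetically; max_terms is an assumed cutoff.
    """
    ranked = sorted(term_counts.items(), key=lambda item: (-item[1], item[0]))
    return "/".join(term for term, _ in ranked[:max_terms])


# Frequencies as listed in Table 3 for three of the category's terms.
table3_counts = {"harsh": 17, "annoying": 13, "shrill": 22}
label = label_category(table3_counts)
```

Under these assumptions the sketch yields the label "shrill/harsh/annoying", matching the order of the reconciled category name given above.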
This reconciliation procedure continued until all of the original pile-sort categories from
both experimenters were distilled into 75 descriptive categories shown in Table 5 (for a complete
account of both experimenters’ categories and how these categories are related to the reconciled
categories, see online supplemental material). As described above, in providing labels or names
for our final resolved categories, we paid close attention to the frequency of occurrence of
different ideas. Hence, for example, in the category “gentle/calm,” the word “gentle” occurred
most frequently, followed by the word “calm.” Other terms belonging to that category, including
Recall that each of the 502 component ideas used in the pile sort task occurred a
minimum of twice in the qualia interview transcripts. As a further check of the inclusiveness of
the 75-category list, we went back to the 2,487 unique component ideas in the transcripts and
assigned each idea to its most appropriate category. Using this procedure, we found that 1,607 of
the 2,487 unique component ideas could be accommodated within the 75-fold classification
taxonomy. Of the unclassified component ideas, the experimenters agreed that 67 of these
orphan component ideas could be accommodated by adding two more categories, deemed
visceral (44) and grainy/gravelly (23). These additional categories are included at the bottom of
Table 5 and indicated by the † symbol. Accordingly, our final taxonomy consists of 77
categories that are able to classify all of the timbre descriptors that occurred more than once, and
second study in which participants judged each instrument according to all 77 categories. This
Method
and composers. Participants (n = 460) were recruited in two ways. First, participants were
recruited via the Internet (n = 399) using email listservs and social media. This subset of
participants took the study in a self-determined location. Second, participants were recruited
from the Ohio State University School of Music subject pool (n = 61). Subject pool participants
were second year undergraduate music students and were tested individually in an Industrial
Acoustic Corporation sound attenuation room. All participants took part in the study online using
the same Qualtrics survey. (For participant demographics, see Appendix A.3.)
For each instrument rated, participants were given a list of the 77 component ideas and
asked to rate how well each of the terms described the sound of the instrument as imagined by
the participant. Terms were rated on a scale of 1 (not …) to 7 (very …). Full instructions for this
Recall that we limited the number of instruments used in Study 1 in order to preempt the
problem of having too many judgments for participants in Study 2. However, Study 1 produced
an unexpectedly large number (77) of descriptive categories. Despite our efforts, given the 20
instruments used in Study 1 and 77 descriptors, a full set of ratings would require 1,540
judgments, which remains too many for a single participant. Consequently, in Study 2
participants judged a subset of the 20 instruments. Participants were asked to rate two randomly
selected instruments, resulting in 154 judgments. As noted in the instructions, having completed
As in Study 1, participants were asked to imagine instrument sounds rather than listen to
recorded stimuli. Participants also rated their familiarity with the instrument and the vividness
with which they were able to imagine the sound. Five familiarity ratings were possible:
extremely familiar, very familiar, moderately familiar, slightly familiar, not familiar at all. Data
were retained only for those participants who reported moderate familiarity or better. With regard
to the reported quality of imagining the sound, no data were collected if a participant rated the
Data Quality
In order to better ensure data quality, we established further exclusion criteria to eliminate
responses that deviated significantly from responses by other participants. Specifically, for each
individual set of 77 ratings for a given instrument, we calculated the paired correlation with the
ratings from all other participants for that instrument. A priori we established an exclusion
criterion of r = .25; that is, if a given participant exhibited an average correlation less than r =
.25 with the other participants judging that instrument, then their data were excluded from further
analysis. In order to avoid a situation in which much or most of the data were discarded, we also a priori
established that no more than 25% of instrument-participant judgments would be eliminated and
that at least 20 judgments would be available for each instrument. If necessary, the r = .25
average correlation criterion would be weakened in order to satisfy either or both of these
conditions. However, neither of these retention conditions arose and so there was no need to
weaken the correlation criterion. In the end, 70 of 1,571 instrument judgments failed to achieve
the r = .25 average correlation, representing a total reduction of 4.5% of participant responses.
After this exclusion, a total of 1,501 judgments remained. Overall, the average inter-rater
correlation across all included participants and all instruments after exclusions was r = .50.
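
The exclusion rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis code: the function names (pearson, flag_low_agreement), the toy ratings (six categories instead of 77), and the handling of constant rating vectors are all assumptions. The two a priori retention conditions (no more than 25% of judgments eliminated, at least 20 judgments per instrument) are not implemented here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a completely flat rating vector counts as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def flag_low_agreement(ratings, threshold=0.25):
    """Return rater ids whose mean correlation with all other raters
    of the SAME instrument falls below the threshold.

    ratings: dict mapping a (hypothetical) rater id to that rater's
    list of category ratings for one instrument.
    """
    flagged = []
    for rater, own in ratings.items():
        others = [pearson(own, other)
                  for other_id, other in ratings.items() if other_id != rater]
        if others and sum(others) / len(others) < threshold:
            flagged.append(rater)
    return flagged


# Toy data: four raters broadly agree, one rates in the opposite direction.
toy_ratings = {
    "p1": [1, 2, 3, 4, 5, 6],
    "p2": [2, 2, 3, 5, 5, 7],
    "p3": [1, 3, 3, 4, 6, 6],
    "p4": [6, 5, 4, 3, 2, 1],
    "p5": [2, 3, 4, 5, 6, 7],
}
flagged = flag_low_agreement(toy_ratings)
print(flagged)
```

In this toy example only the rater whose ratings run opposite to the others is flagged; in the study the same comparison would be made separately for each instrument over all 77 categories.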
Results
Recall that participants were asked to rate two instruments but could opt to continue to rate as
many instruments as they wanted. Partial data, in which a participant completed the rating of only
a single instrument, were included. Seventy-nine (17%) participants rated only a single instrument;
148 (32%) participants rated two instruments; 165 (36%) rated between 3 and 5 instruments; 48
(10%) rated between 6 and 10 instruments; 20 (4%) especially eager participants rated between
11 and 20 instruments.
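
As a quick arithmetic check, the five participation groups reported above should sum to the 460 participants and reproduce the rounded percentages. The sketch below uses only the counts given in the text; the variable names are illustrative.

```python
# Participant counts by number of instruments rated, as reported in the text.
counts = {"1": 79, "2": 148, "3-5": 165, "6-10": 48, "11-20": 20}

total = sum(counts.values())

# Whole-number percentage of participants in each group.
shares = {group: round(100 * n / total) for group, n in counts.items()}

print(total)
print(shares)
```

Because each share is rounded independently, the rounded percentages need not sum to exactly 100.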
Since collection of data for a given instrument was avoided if participants gave low
familiarity or vividness ratings, more data were consequently collected for those instruments that
are generally more familiar to the participant pool. Instruments received between 51 and 104
ratings for all 77 categories (for more detail, see Appendix C).
Table 5 identifies the instruments with the highest and lowest average means for each of
the 77 categories.
Table 6 reports the most highly rated descriptive category for each instrument along with the
corresponding average rating value. Ratings were made on a scale of 1 (“not”) to 7 (“very”).
PCA Optimization
PCA is almost always used in an effort to create the most parsimonious model, identifying the
fewest components that are able to account for the greatest variance. However, in
increasing parsimony, the full richness of the phenomenon is diminished. It is evident from the
results of the interviews that when given the opportunity, people can be very effusive in their
descriptions of timbre qualia, using various linguistic approaches and a diverse vocabulary. This
creativity and complexity suggest that a relatively large number of components may need to be
retained in order to explain sufficient variance; however, the ideal model must balance
In PCA, the researchers are tasked with choosing the number of components to retain.
Selecting “too many” factors would mean that some of the factors have little or no utility or
relevance; selecting “too few” factors would mean that useful or relevant information has been
unwisely discarded. Given that the goals of the project include capturing as much of the semantic
richness of timbre perception as possible, one might argue that it is better to err on the side of
tool for testing hypotheses. Often, the number of components to be retained is based on one of a
number of possible rules of thumb, such as setting a cutoff based on some percent of the
cumulative variance explained, eigenvalues, and/or a scree plot, among other methods. These
common approaches are relatively informal, based primarily on their intuitive appeal and
practicality (Jolliffe, 2002). After examining the results and considering various common
heuristics for choosing a model, we determined that we were not satisfied by the models
suggested by these common rules of thumb. Thus, we began our model-building process instead
by exploring a wide range of PCA models (for full details on the PCA and interpretation process,
see Appendix F). To assess sampling adequacy, the Kaiser-Meyer-Olkin (KMO) statistic was
calculated. The overall measure of sampling adequacy (MSA) of .95 suggested that PCA was
appropriate for the data. We began by looking at models containing 10, 20, 30, 40, 50, and 60
components. All potential models were rotated using the promax rotation to aid the interpretation
labels to components. To label each component, we relied on an automated routine that ordered
the terms loading on a component from strongest to weakest. Up to five of the most strongly-
loading terms were included in the interpretive label for a given component. In order to simplify
interpretation and to guarantee that the terms included in the names of the components were
judging the strengths of the loadings; we agreed that ± .65 offered both a reasonable and
only terms which loaded at greater than or equal to ± .65, and we included no more than five
As might be expected, in the case of 50- and 60-component models, a majority of the
resulting components simply echoed one of the original 77 categories. In the case of the 10-
component model, many of the terms that loaded strongly (i.e., greater than or equal to ±.65) on
the same component seemed excessively heterogeneous. For example, one of the components in
3 Models using different rotations (promax, oblimin, varimax, simplimax, and quartimax) were produced using R.
These models were carefully compared in terms of their interpretability, and although they were highly similar, we
considered promax to have consistently produced somewhat simpler models. Our manual assessment was supported
by the MIC (mean item complexity) values for the models, which were slightly lower for promax rotations.
15, 20, 25, 30, 35, and 40 components and then continuing to narrow our exploration. The most
expected, the models within this range were highly similar. Serial neighbors were typically
distinguished by just one or two differences. Naturally, as the models grew in components, more
Common PCA practice has the experimenters choose a model and interpret the individual
components. In the current case, rather than leaving this task solely to the authors, one might
systematically comparing several models, each containing two or three dozen components,
would be excessively time-consuming. Moreover, in comparing models, one often sees aspects
of one model that are appealing while other aspects seem deficient in some way. Accordingly,
rather than having musicians compare complete models, we created a superset of all of the
components evident in those timbre models containing between 23 and 37 components and asked
musicians to assess the pertinence (defined as “relevance” and “usefulness”) of each individual
component. During these assessment tasks, participants made their ratings based on the
descriptive labels that had been previously assigned to each component using the method
described above, which contained up to five of the words that loaded strongly (greater than or
equal to ± .65) onto the component. Among the 59 components of the superset, several groups of
components exhibited very similar labels. For these groups of highly similar components, ten
musician participants were recruited to identify which versions were most musically pertinent.
Their judgments reduced the number of components in the superset from 59 to 33.
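
The labeling rule restated above (terms ordered from strongest to weakest loading, only terms loading at ± .65 or beyond, and no more than five terms per label) can be sketched as follows. This is a hedged illustration rather than the authors' automated routine, which was run in R and is not reproduced in the text: the function name component_label, the placeholder term names, and the loadings are hypothetical, and ranking by absolute loading without marking the sign is an assumption.

```python
def component_label(loadings, cutoff=0.65, max_terms=5):
    """Build an interpretive label for one component.

    loadings: dict mapping a term to its loading on the component.
    Terms whose absolute loading is below the cutoff are dropped; the
    rest are ordered from strongest to weakest and truncated to
    max_terms.
    """
    strong = [(term, value) for term, value in loadings.items()
              if abs(value) >= cutoff]
    strong.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return ", ".join(term for term, _ in strong[:max_terms])


# Hypothetical loadings for a single component (placeholder term names).
example_loadings = {
    "term_a": 0.82,
    "term_b": -0.71,
    "term_c": 0.66,
    "term_d": 0.62,  # just under the cutoff, so it is left out of the label
    "term_e": 0.30,
}
label = component_label(example_loadings)
print(label)
```

A term loading just under the cutoff is excluded from the label even though it may still share considerable variance with the component.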
components using a 100-point scale, where zero was defined as “not at all pertinent” and 100
was defined as “very pertinent.” The value 50 was explicitly defined as “moderately pertinent.”
Participants also selected the single best descriptive term for each component. Finally, freeform
comments were also collected. Pertinence rating results can be found in the online supplemental
material, Appendix F.
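
Pertinence ratings of this kind can be averaged per component and compared against a threshold; the authors report adopting an average cut-off of 60. The sketch below illustrates that selection step under stated assumptions: the function name retain_components, the component names, and the ratings are hypothetical, and treating an average of exactly 60 as retained is an assumption, since the text does not say whether the cut-off is inclusive.

```python
from statistics import mean


def retain_components(ratings_by_component, cutoff=60):
    """Return the names of components whose mean pertinence rating
    (0-100 scale) meets the cutoff.

    Assumption: a mean exactly equal to the cutoff is retained.
    """
    return [name for name, ratings in ratings_by_component.items()
            if mean(ratings) >= cutoff]


# Hypothetical pertinence ratings from three raters for three components.
example_ratings = {
    "component_a": [80, 70, 65],
    "component_b": [40, 55, 60],
    "component_c": [60, 60, 60],
}
retained = retain_components(example_ratings)
print(retained)
```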
As noted earlier, in creating a timbre model, our preference is to err on the side of
including too many descriptive components rather than too few. However, at the same time, it is
important that a model be practical for the purposes of possible future music analyses. In
instrument according to all of the dimensions or components in a timbre model. In the end, we
chose an average cut-off for pertinence ratings of 60, producing a model of 19 components—
which we suggest may offer a reasonable balance between inclusiveness and parsimony, given
our music analytic goal. As described below, we subsequently decided to include a twentieth
Participant Comments
In assessing the pertinence of individual components, we invited participant comments for each
term. In total, the participants provided 115 comments. Participants were notably dissatisfied
with the grouping of “watery/fluid” with the terms “soft/smooth,” “singing/voice-like,” “sweet,”
and “gentle/calm.” At the same time, participants indicated that they thought “watery/fluid” was
also pertinent. Consequently, this category was broken into two separate dimensions in the final
model: “watery, fluid,” and “soft/smooth, singing/voice-like, gentle/calm,” bringing the number
Brightness
It should be noted that at this point, the model did not directly include the classic “brightness”
dimension that is a ubiquitous finding in virtually all studies of timbre. In general, the word
“bright” was the sixth-most common descriptive term used in our interviews, attesting to its
importance. It was also one of the 77 intermediate categories that were distilled in our PCA
analyses. The category “bright” loaded most strongly on the component that included the highly
loading on this dimension across a series of PCA models (.61–.63). Unfortunately, these values
were just below our a priori cut-off of ±.65. It is likely that “bright” did not break off into a
stronger independent category because of the high correlation between pitch height and
brightness. In the PCA models, “bright” generally shared most variance with the
sparkling/brilliant component while also exhibiting lower, but still considerable, negative
research, and because “bright” was a prominent descriptor in the interviews, we post-hoc
The final dimensions of the model do not reflect the results of a single PCA model, but rather are
the products of a process involving close examination of two dozen PCA models guided by the
input of a group of independent musicians. The first word in each dimension description (Table
8, column 1) is the word that was chosen as most pertinent by the most participants in the
supplementary studies. The words that follow are the other categories that loaded strongly onto
In order to facilitate discussion of the timbre qualia dimensions of this model, we offer
shorthand labels for the dimensions, listed in the second column of Table 8. Each shorthand label
contains up to the first two terms listed in the full dimension description. Recall that many of the
descriptive terms during the component stage were compound, which we had indicated with a
slash (e.g., “rumbling/booming”). For the sake of brevity, only the first half of each compound term
was included in the shorthand label. When applicable, this was followed by the second term in
the full dimension description. When a shorthand label contained two terms, we joined these
terms together by using a slash to create a new compound term to serve as the shorthand label for
the dimension.
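
The shorthand rule just described can be expressed as a small function. This is a sketch of one reading of the rule, not the authors' procedure: the function name shorthand_label is hypothetical, and the input format (terms separated by commas, compound terms joined by a slash) is assumed from the dimension descriptions quoted in the text.

```python
def shorthand_label(description, max_terms=2):
    """Derive a shorthand label from a full dimension description.

    Takes up to the first max_terms comma-separated terms, keeps only
    the first half of any compound (slash-joined) term, and joins the
    results with a slash.
    """
    terms = [term.strip() for term in description.split(",") if term.strip()]
    first_halves = [term.split("/")[0].strip() for term in terms[:max_terms]]
    return "/".join(first_halves)


# Full descriptions as quoted in the text for two of the dimensions.
print(shorthand_label("soft/smooth, singing/voice-like, gentle/calm"))
print(shorthand_label("watery, fluid"))
```

Under this reading, the two descriptions quoted earlier yield the shorthand labels soft/singing and watery/fluid, matching the labels used in the text, and a single-term description is returned unchanged.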
Discussion
Results from the open-ended transcriptions are consistent with previous research in timbre
linguistics suggesting that timbre descriptions are largely metaphorical (Saitis & Weinzierl, 2019;
Wallmark & Kendall, 2018). Our participants used all of the strategies described by Porcello
(2004), who also analyzed spoken language: spoken/sung imitations, lexical onomatopoetic
metaphors, pure metaphors, association, and evaluation. Wallmark (2019a) proposes seven
treatises, including Affect (emotional and aesthetic), Matter (physical features), Cross-modal
general, interview transcriptions in the current study exhibited responses that fall into these
categories. More specifically, we can consider the final 20-dimensional model in relation to
Wallmark’s conceptual categories. Because our final dimensions are composed of up to five
terms, dimensions with multiple terms may belong to multiple conceptual categories.
While onomatopoeia was used frequently in interviews, these terms generally did not
make it through the dimension reduction process, largely because of the semantic variety of
onomatopoetic terms used. However, the word “buzzy,” which is part of the final dimension
nasal/reedy, does fall into Wallmark’s category of Onomatopoeia. Mimesis accounts for a few of
the terms, such as “nasal,” “reedy,” “singing,” and “voice-like.” Affective terms are also
relatively less common but include terms on the soft/singing and direct/loud dimensions. Action
terms occur with similar frequency and include “pinched,” “constrained,” and “open.” However,
the conceptual categories that are most common in our final dimension labels are Acoustics,
importance. Seven of the twenty dimensions can be categorized fully as Acoustic, and Acoustic
terms play roles in four other dimensions. A table of the dimension descriptions for the current
study and the conceptual categories into which they would likely be sorted on Wallmark’s
In a recent review of timbre semantics research, Saitis and Weinzierl (2019) write that the
most salient timbre dimensions have been found to be “brightness/sharpness (or luminance),
roughness/harshness (or texture), and fullness/richness (or mass)” (p. 135). Our 20-dimensional
model offers apparent parallels to these three dimensions: sparkling/brilliant maps onto
observation, however, relates to the variance explained by each of our twenty dimensions. As
shown in Appendix F, the dimensions are listed approximately by variance explained, from
greatest to least (this ordering can only be approximate because the components ultimately were
drawn and chosen from the superset of components, rather than a single PCA model).4 However,
our model’s direct parallels to brightness and roughness are surprisingly low in terms of relative
variance explained.
three-dimensional PCA with the promax rotation; this model explains just 39% of the total
variance. The first component in the three-dimensional model from our data corresponds with the
first component of the 20-dimensional model, rumbling/low. The second component corresponds
generally with the second dimension of our final model, soft/singing, with the highest-loading
term, “beautiful,” loading at .83—at first glance, this dimension does not appear to be directly
mappable onto the brightness/roughness/fullness model. The third dimension includes the top-loading
term “shrill/harsh/annoying” at .70; the next highest-loading terms are “piercing/sharp”
(.64), “noisy” (.64), and “buzzy” (.60). The presence of “piercing/sharp” suggests that this
dimension may be related to the conventional brightness/sharpness dimension, yet the top-
loading term includes the word “harsh,” suggesting it is instead parallel to the conventional
roughness/harshness dimension. Given all three components in the model, it seems logical to
harsh/rough and the third component to correspond to brightness/sharpness, despite the inclusion
of the word “harsh.” In sum, while our data can be recast as a three-dimensional model that
model, these correspondences are not completely straightforward, and the three-dimensional
4 One exception to this is watery/fluid, which was separated from the soft/singing dimension at a late stage, based on
responses to the supplementary polls. Thus, the second and third dimensions, soft/singing and watery/fluid together
explain the second greatest amount of variance.
If we can reduce our dimensionality further as illustrated above, and previous studies
have found that semantic timbre space can be, in the words of Saitis and Weinzierl (2019, p.
120), “adequately explained” using just two or three dimensions, why propose a model with 20
dimensions? Our response to this question comes from the observation that adequate explanation
is entirely relative. Depending on the goal, three dimensions may be sufficient. However, for
musical purposes, there are few situations in which three timbral dimensions would be adequate.
A music instructor or ensemble conductor who had only three semantic dimensions of timbre
available would find their vocabulary not only impoverished but absolutely inadequate. For
some purposes of everyday musical life, even 20 dimensions may prove insufficient. We
emphasize that the amount of variance explained by some dimension does not directly determine
its artistic value. Since our motivating purpose for this study is to contribute to the development
of a descriptive language for music theoretical endeavors, the number of useful dimensions
remains an open question. Our experience as music theorists suggests to us that a useful
Two principal concerns come with such high dimensionality. First, is this unwieldy—are
20 dimensions simply too many to handle in future empirical work and in analysis? And second,
do we actually need this many—would fewer dimensions be sufficient? Ultimately, the answer to
both questions will be determined by practice. Our experience with the current study and other
ongoing studies making use of the 20-dimensional model suggests that participants do not have
difficulties working with the 20 dimensions in rating tasks. In terms of computation and analysis,
modern software is quite capable of working with high-dimensional data. We anticipate that
music theorists will be able to use the model for analysis via a computational program, currently
under development, that presents the resulting data to program users in easily interpretable ways,
including through visualizations. It may be the case that future work demonstrates that some of
our dimensions are less useful than others and could be removed from the model; however, it
would be easier to prune, rather than to add, dimensions, and so we prefer to err on the side of
In describing musical timbres, a pertinent question is why certain qualia descriptors arise
but not others. Why do listeners tend to characterize sounds as open but not free? Why
brassy/metallic but not plastic/rubbery? Why watery/fluid but not oily/sticky? In the ensuing
discussion, we propose potential acoustical correlates for many of our dimensions, based on
previous research in acoustics and timbre. While the discussion that follows at present remains
speculative, we anticipate that the ideas presented in this section will stimulate useful conjectures
Acoustic Attributes
A number of the 20 timbre qualia dimensions can be directly related to acoustical properties of
the sound generator. Notably, these include descriptions of modes of activation and the materials
Mode of Activation
Three of the dimensions relate to contrasting sound envelopes. Two represent the extreme
envelope possibilities, namely percussive and sustained/even. The former is associated with
sounds whose mode of activation is struck, hit, or dropped. The latter is associated with sounds
whose mode of activation is blown or rubbed. The third dimension, ringing/long decay, suggests
a sort of intermediate envelope category in which a struck sound generator exhibits low internal
These three dimensions are consistent with evidence from MDS studies suggesting that
attack time is a salient factor in how listeners make dissimilarity judgments, in particular when
the sounds that they are judging contain both sustained and impulsive sounds. Faure, McAdams,
and Nosulenko (1996) found that similar terms in French were correlates of attack time, including
“pas soufflé” (not blown), “pincé” (plucked), and “attaque rapide” (fast attack). The current
results imply that musicians find not only the attack quality to be salient, but also consider the
Physical Material
The terms included in the brassy/metallic and woody dimensions suggest that the material of the
acoustic vibrator is important; however, as we will see, the mode of activation plays a critical
In the case of struck metal plates, Fletcher, Perrin, and Legge (1989) have drawn
attention to distinctive nonlinear acoustical features. Bending metal changes its stiffness, and so
striking a plate can provoke momentary nonlinearities that lead to telltale acoustic features,
notably dynamic frequency shifts. This is evident from the musical instruments rated highest on
“metallic,” which are generally struck instruments made of metal: cymbals (6.5), triangle (6.5),
and vibraphone (5.2). Similar nonlinear pitch shifts can be heard as the distinctive “twang” sound
that is produced when a loose metal string is plucked with a large displacement. Accordingly, the
banjo, which was rated most “twangy” (6.6), was also ranked as the fourth most “metallic”
instrument (4.0).
The problem with identifying material of construction from sounded air columns is
illustrated in the case of brass instruments. Brass instruments are technically “lip reed”
instruments. They employ cup- or funnel-shaped mouthpieces that are activated by buzzing lips
Bowsher, 1982; Freour & Scavone, 2012). Regeneration is a form of feedback that leads to a
ringing dynamic filter effect. In the production of a single tone, the power of successive
harmonics increases over time. The overall result is a sort of “bwah” sound where successive
partials are emphasized in a quick upward and then slower downward frequency sweep.5
instruments. A number of manufacturers market plastic trumpets whose sounds are uncannily
indistinguishable from metal instruments. Furthermore, early “brass” instruments like the
In short, a “brassy” sound is not the result of an instrument’s material, but rather is a
mouthpiece interacting with a resonant tube (Myers et al., 2012). Since at least 1600, the vast
majority of lip-reed instruments have been made of brass, and it is this simple association that
explains the tendency for listeners to describe the characteristic regenerative acoustical
readily apparent in the case of the alto saxophone, which is a blown instrument made of brass,
yet receives low ratings for both “metallic” (3.2) and “brassy” (3.3).
5 Brass instruments with long narrow bores (trumpet and trombone, but not tuba or flugelhorn) are capable of
producing an especially intense brassy sound that some performers refer to as “sizzle.” The narrow bore permits
high air pressures that have been shown to produce a shock wave within the instrument—generating distinctive
inharmonic partials (Hirschberg, Gilbert, Msallam, & Wijnands, 1996). These sounds are particularly associated
with a “brassy” character. It also explains why some wide-bore lip-reed instruments—like the tuba and didgeridoo—
are commonly described as sounding less brassy. In the case of the didgeridoo, the absence of a bell has the further
effect of reducing the intensity of the higher harmonics and so diminishes the potential brassy character.
The top “woodiest” instruments were the wood block (6.5), bass clarinet (5.6), English
horn (5.4), and oboe (4.7). The wood block, a struck sound source, stands nearly a full point
higher on average than the next-woodiest instrument, which is a wind instrument, consistent with
the claim that struck sounds are better at conveying information about the physical material of
the source than are aerophones. As with brass instruments, woodwinds are also sometimes made
with plastic rather than wood. Manufacturers of top-of-the-line plastic oboes and English horns
continue to improve their imitation of the tone color of wood instruments; such plastic
instruments have recently started to be adopted by some professional players. As with the
instruments is largely based on the fact that woodwinds have traditionally been crafted from
wood.
By way of summary, timbral qualities that listeners associate with physical material are
better understood as arising from characteristic acoustical patterns that are only indirectly related
to the material of construction. Terms like “woody” and “brassy” might be regarded as
misnomers. Instead, the distinguishing features arise from phenomena like acoustical
regeneration or dynamic frequency shifts, and these features provide the basis for learned
associations for certain classes of sound generators. Moreover, these acoustical patterns are
strongly dependent on the mode of activation, so struck materials are more likely to exhibit
Intensity-related dimensions
The words “soft” and “smooth” are primarily related to touch, indicating generally pleasurable,
low-intensity tactile sensations. “Soft” is also a common synonym for “quiet.” “Soft,” “smooth,”
“gentle,” and “calm” suggest that a major feature of the soft/singing dimension is low intensity.
However, there is more to this dimension than simply a quieter dynamic, as is evident from the
terms “singing,” “voice-like,” and “sweet.” The soft/singing dimension appears to focus on an
especially light form of singing, based on the high loadings on terms such as “gentle” and
“calm.”
Singing-like sounds are presumably limited to pitches that fall within the range of human
voices. Hence instruments like the tuba and piccolo are less likely to produce sounds
exhibiting the highest “singing/voice-like” character, followed by the English horn and oboe.
These instruments fall in a higher pitch range, indicating that this dimension may be suggestive
of a female voice.
Wallmark (2014) discusses the relationship between timbre and the voice, reviewing
evidence for timbral vocality in relation to an embodied theory of timbre. He reports evidence
from a related neuroimaging study that suggests that motor regions of the brain involved in
component loadings on the soft/singing dimension of the current study, which include “singing”
and “voice-like,” are consistent with this finding, as the other terms included are generally
positively valenced (e.g., “sweet,” “gentle,” “calm”). Terms that also loaded onto this component
strongly but did not meet the criteria to be included with the dimension label are also quite
and “warm.”
pitched harmonic non-percussive sound with relatively steady or slowly evolving spectral and
dynamic features.
The direct/loud dimension implies more than simply a high intensity level—as evident with the
Loud sounds arise from more energetic sound sources, so they are more “powerful” in the literal
sense of physical energy. Loud sounds are less susceptible to auditory masking, so they are also
more noticeable or salient. That is, they are capable of commanding greater attention; they are
In ethology, loud sounds are associated with aggression and alarm (Morton, 1977, 1994).
beyond intensity. Whereas “sweet,” “gentle,” and “calm” load onto the soft/singing dimension,
Hence, the contrast between soft/singing and direct/loud is not merely one of intensity,
but also of sweet versus aggressive, soft versus powerful, singing versus assertive, and gentle
versus commanding. Notice that the emphasis on the terms “singing” and “voice-like” in the
soft/singing dimension suggests a more human, prosocial, or friendly quality that is absent from
One might ask why soft/singing and direct/loud did not collapse into a single dimension.
If these dimensions simply represented opposite levels of intensity, then a single dimension
would surely suffice. We might speculate that it is the affective connotations (aggression versus
gentleness, sweet singing versus commanding assertion, etc.) that account for the independence
Frequency-related dimensions
Large masses or large volumes vibrate at a lower frequency than small masses or small volumes.
Low frequency sounds are therefore readily associated with larger objects (Hinton, Nichols, &
Ohala, 1994)—consistent with the terms “fat,” “heavy,” and “thick.” Moreover, the term “low”
Although “deep” can evoke associations such as profound, heartfelt, and rapturous,
dictionary definitions first associate “deep” with synonyms like cavernous, gaping, or huge—
physical properties that are consistent with large volumes, and so associated with low
frequencies. For example, we generally expect someone with a “deep” voice to be a larger
person.
Finally, a characteristic of very low frequencies is that they are generally devoid of pitch.
For extremely low frequencies, individual cycles of vibration may be evident, such as when a
passing truck creates a trembling vibration. An instrument like the bass drum is quite capable of
producing such low frequencies. Such pitchless sounds may be described as “rumbling.” In short,
rumbling/low appears to represent a coherent qualia dimension strongly linked to low frequency
sounds.
Notice that all four descriptors in the sparkling/brilliant dimension originate as visual descriptors
rather than auditory descriptors. Among timbre researchers, brightness has been especially
valued since it is strongly correlated with spectral centroid, an easily calculated acoustical
measure. “Brilliant” and “bright” also imply an energy-related component, suggesting something
high and loud, as perhaps in the sound of a piccolo or piccolo trumpet. On the other hand,
synonyms including effervescent and animated; definitions for “shimmer” offer synonyms such
as twinkle, glimmer, and glitter. In vision, these terms are associated with the presence of small
points of intense light that are evident in reflected cut glass, gems, or the twinkling of stars. Due
Notice that it is difficult to strike a triangle without the instrument swinging or rotating.
This movement causes phase shifting that adds a dynamic aspect to the sound that might be
likened to twinkling or shimmering effects in vision. In the auditory domain, sparkling and
shimmering sounds may not simply be higher in frequency, but also perhaps involve pointillistic
What makes listeners judge a sound as pure/clear? The instrument rated most pure was the harp,
while the instrument rated most clear was the piccolo. In both cases, the kazoo was rated the
polar opposite. Dictionary synonyms for “pure” emphasize the concept of something being
There are several possible acoustical factors that might account for the descriptions of
sounds as pure or clean. Acoustically, “clean” might be interpreted as referring to sounds with a
high signal-to-noise ratio. For instruments that produce pitched sounds, it is possible to
distinguish noise components (like bow noise or breathiness) from periodic components. Such an
analysis can be made using the tone-to-noise ratio (TNR) (Sottek, Kamp, & Fiebig, 2013; Sottek,
Apart from the ratio of periodic-to-noise components, another factor that might account
for the pure/clear dimension is the degree to which the periodic components of a tone conform to
commonly measured by the harmonics-to-noise ratio (HNR) (Fernandes, et al., 2018; Wayland,
Gargash, & Longman, 1995). The HNR can be calculated by comparing the aggregate energy of
all partials whose frequencies conform to the harmonic series with the aggregate energy for all
Yet another factor that might contribute to the pure/clear quale is Terhardt’s concept of
pitch weight or “toneness” (Huron, 2016; Parncutt, 1989; Terhardt, Stoll, & Seewan, 1982a,
1982b). Even for sounds that are highly harmonic, the clarity of evoked pitches is known to be
influenced by pitch height. Especially high or low tones are associated with weak or vague pitch.
When asked which sounds exhibit the clearest pitch, listeners identify complex tones in the
region between E2 and G5—a region centered near middle C and spanning the combined range
Recall that the instruments rated most and least “pure” were the harp and the kazoo
respectively. Similarly, the instruments rated most and least “clear” were the piccolo and (again)
the kazoo. In contrast to the kazoo, both the harp and piccolo produce strictly harmonic complex
tones with low noise components, suggesting that tone-to-noise ratio and harmonicity may be
pertinent factors. However, the high average rating on “clear” for the high-pitched piccolo
suggests that pitch weight or toneness may not necessarily be critical to the pure/clear qualia
dimension.
Finally, we might consider the term “precise,” which also loaded highly on the pure/clear
dimension. The instrument rated most “precise/clean” was the wood block. Yet again, the kazoo
was rated the polar opposite. Although the wood block involves inharmonic partials, it
nevertheless provides a clear pitch—especially compared with other struck instruments. What
the harp and wood block share is abrupt onsets, which might contribute to a sense of
precision.
In summary, pure/clear timbres might be associated with sounds that have a high tone-to-
noise ratio, high harmonicity, and percussive onsets capable of conveying a sense of precision.
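The harmonics-to-noise comparison described above can be illustrated with a minimal sketch. It assumes that a tone has already been reduced to a list of partials (frequency and amplitude pairs) and that the fundamental is known. The function name, the 3% tolerance, and the example values are hypothetical choices for illustration, not the procedure used in the cited studies.

```python
import math

def harmonics_to_noise_ratio_db(partials, f0, tolerance=0.03):
    """Compare energy in partials near integer multiples of f0 with all other energy.

    partials: iterable of (frequency_hz, amplitude) pairs (assumed input format).
    tolerance: allowed relative deviation from the harmonic series (assumed value).
    """
    harmonic_energy = 0.0
    other_energy = 0.0
    for freq, amp in partials:
        nearest = max(1, round(freq / f0))  # index of the closest harmonic
        deviation = abs(freq - nearest * f0) / (nearest * f0)
        if deviation <= tolerance:
            harmonic_energy += amp ** 2
        else:
            other_energy += amp ** 2
    if other_energy == 0.0:
        return math.inf   # purely harmonic spectrum
    if harmonic_energy == 0.0:
        return -math.inf  # no energy on the harmonic series
    return 10.0 * math.log10(harmonic_energy / other_energy)

# Illustrative values only: three harmonic partials of 220 Hz plus one stray partial.
example = [(220.0, 1.0), (440.0, 0.5), (660.0, 0.25), (300.0, 0.1)]
print(harmonics_to_noise_ratio_db(example, f0=220.0))  # roughly 21.2 dB
```

On this simplified measure, a higher value corresponds to a tone whose energy lies mostly on the harmonic series, which is the property proposed here as one possible correlate of pure/clear.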
Focused/Compact.
Focused/compact may similarly be associated with sounds exhibiting high harmonicity. That is,
one may suppose that a focused/compact sound would include few or no inharmonic partials with
little or no noise components. A single plucked guitar string or oboe tone might qualify as highly
focused/compact, whereas a kazoo or rattle would exhibit low ratings on focused/compact. The term
“compact” might also imply sounds produced by smaller sound sources, such as the contrast
between a piccolo and a bass drum, or narrower bores, such as the contrast between a trumpet
and a tuba.
Airy/breathy.
The component terms that load onto the dimension airy/breathy refer to both a blown mode of
activation and to accompanying unpitched noise components arising from gross air movement.
In jazz styles, singers commonly perform with a notable degree of breathiness, which tends to
convey a sense of proximity or intimacy to the sound. This same breathiness is commonly
audible in the performance of wind instruments played in a jazz style, like the flute, trumpet, or
flugel horn.
In speech and singing, breathiness arises when the vocal folds do not exhibit full closure
or adduction (Grillo & Verdolini, 2008). In brass instruments, breathiness similarly arises when
the vibrating lips of the performer are relaxed so as to avoid full closure. For instruments like the
saxophone, clarinet, oboe, and bassoon, breathiness can be produced by directing a proportion of
ratio (HNR). Recall that this is calculated by comparing the aggregate energy of all partials
whose frequencies conform to the harmonic series with the aggregate energy for all other
Both “shrill” and “noisy” suggest an association with high energy, while the term “shrill” further
suggests an association with high frequency. When activated by high energy, many vibrators
produce non-linear (chaotic) oscillations leading to inharmonic partials or noise bands, which
characteristic of vocal production for animals experiencing stress (Blumstein, Davitian, & Kaye,
2010), an observation that may be pertinent to the potential source of this dimension.
In a discussion of noisy timbres, Wallmark (2014) notes that timbral noise does not have
the relative smoothness/spikiness of a signal, as two of the likely correlates. Such a multiplicity
of sources of noise may be related to the fact that multiple dimensions in our final model seem to
pure/clear, focused/compact, and airy/breathy, along with raspy/grainy (discussed in the section
below). Wallmark considers “noisy” timbres to have components of brightness, noise, and
roughness, each markers of physical exertion; “noisy” timbres parallel the embodied experiences
of anger and fear and are accordingly consistent with negative appraisal. Of our potentially
noise-related dimensions in the current model, shrill/noisy is certainly the most negatively-
valenced and high-arousal: the dimension also includes the term “harsh,” and “annoying” was
also a related term. As noted earlier, pure/clear carries a positive connotation, and for musicians,
kind of low-arousal noise in a sound that is related to intimacy and physical closeness rather than
interpretation of noise that is distinct from Wallmark’s “noisy” timbre, even though airy/breathy
timbres may share some of the same acoustic properties as noisy ones.
Synonyms for “raspy” include scraping, grating, and grinding; synonyms for “guttural” include
throaty, husky, and gruff. “Grainy” and “gravelly” are linked to fragmentary or gritty particles.
Raspy/grainy appears to provide a contrast with pure/clean and a strong relationship with the
concept of timbral roughness, which has featured prominently in recent timbre research (e.g.
Both “grainy” and “gravelly” imply some sort of rapid amplitude modulation that
interrupts the sound, producing a more pointillistic, quick series of sounds. Bowed instruments
are quite capable of producing a highly rough sound when a bow is slowly and forcefully drawn
across a string. The resulting sound might well be characterized as raspy or gravelly. Bowed
strings produce sound according to the so-called “stick-slip principle,” where the bow
momentarily grabs and displaces the string until the restoring pressure releases the string
(Casado, 2017). In regular bowing, this stick-slip cycle occurs hundreds of times per second.
However, with forceful bowing, the cycle is greatly slowed and so the sound descends into the
frequency region of roughness. Notice that this slow stick-slip mechanism characterizes all forms
of scraping. This mode of sound activation is consistent with common synonyms for “rasp,”
including scraping, grating, grinding, and scratching. In the case of the kazoo, the grainy or raspy
quality is likely a consequence of the rapid amplitude modulations associated with the vibrating
tissue paper.
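The link between rapid amplitude modulation and spectral content can be made concrete with a small sketch. For a sinusoidal carrier whose amplitude is modulated sinusoidally, the signal (1 + m cos 2πf_m t) cos 2πf_c t contains the carrier plus two sidebands at f_c ± f_m, each with amplitude m/2. The function name and the example frequencies below are hypothetical; the 70 Hz modulation rate is an assumed value chosen only because it falls within the range of modulation rates commonly associated with roughness.

```python
def am_components(carrier_hz, modulation_hz, depth):
    """Spectral components of (1 + depth*cos(2*pi*fm*t)) * cos(2*pi*fc*t).

    Returns (frequency_hz, amplitude) pairs: lower sideband, carrier, upper sideband.
    """
    sideband_amp = depth / 2.0
    return [
        (carrier_hz - modulation_hz, sideband_amp),
        (carrier_hz, 1.0),
        (carrier_hz + modulation_hz, sideband_amp),
    ]

# Illustrative values only: a 440 Hz tone modulated at an assumed rate of 70 Hz.
for freq, amp in am_components(440.0, 70.0, 0.5):
    print(freq, amp)
```

The sketch covers only the idealized sinusoidal case; real scraping or bowing produces far less regular modulation.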
The raspy/grainy qualia dimension also implies links to a number of acoustic phenomena
Poyatos (1991) catalogued 40 unusual vocal qualities from adenoidal voice to velarized voice.
However, raspy and grainy are not among the voice qualities that he identified. Nevertheless,
Poyatos includes voice qualities that are commonly regarded as synonyms, especially among
linguists. These include creaky voice or vocal fry (Keating, Garellek, & Kreiman, 2015).
sounds that are mostly associated with Semitic languages like Arabic, Assyrian, and Hebrew.
Guttural consonants involve constrictions or closures in the lower vocal tract, such as at the root
of the tongue, with the velum, or via the glottis. These consonants combine low frequency
resonances with temporal interruptions, such as the glottal stop (Goldstein, 1994).
associated with temporal roughness and/or low resonance associated with guttural phonemes.
Hollow.
Several qualia dimensions in the final model are suggestive of unique resonant features. The
dimension hollow is a notably clear example. A space is described as “hollow” when it is empty.
Spaces that are occupied with various objects lead to high acoustic dispersion and greater energy
between hollow and non-hollow cavities is filter Q. In general, low energy absorption produces
resonances whose filter shapes have steep slopes (high Q). By contrast, high energy absorption is
associated with less steep filter slopes (low Q) (Pyzdek, 2015; Chowning, 1973).
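The relationship between energy absorption and filter Q can be sketched numerically. For a simple lightly damped resonance, Q is the center frequency divided by the half-power bandwidth, and the amplitude of the free response decays with time constant Q / (π f0). The function names and the example values below are hypothetical and do not represent measurements of any instrument or cavity.

```python
import math

def q_from_bandwidth(center_hz, bandwidth_hz):
    """Q of a resonance: center frequency over half-power (-3 dB) bandwidth."""
    return center_hz / bandwidth_hz

def decay_time_constant(center_hz, q):
    """Amplitude e-folding time in seconds for a lightly damped resonance."""
    return q / (math.pi * center_hz)

# Illustrative values only: two resonances at 1 kHz with different absorption.
low_absorption_q = q_from_bandwidth(1000.0, 10.0)    # narrow peak, high Q
high_absorption_q = q_from_bandwidth(1000.0, 200.0)  # broad peak, low Q
print(low_absorption_q, decay_time_constant(1000.0, low_absorption_q))
print(high_absorption_q, decay_time_constant(1000.0, high_absorption_q))
```

Under this idealization, the narrow (high Q) resonance rings longer, consistent with the idea that low energy absorption yields steeper filter slopes.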
For very large spaces, the main effect of a “hollow” (or empty) environment is audible
echoes, but for small spaces (like a Chinese wood block), the effect will be high filter Q.
Evidently, listeners hear the high Q as symptomatic of an unoccupied space and so tend to
The timbre of the clarinet sound, which is sometimes described as “hollow,” has been
linked to a concentration of spectral energy near odd harmonics, or the odd-to-even harmonic
energy ratio (Caetano et al., 2019). While the B-flat clarinet was not rated in the current study,
the bass clarinet was ranked fourth on hollow with a rating of 3.71.
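An odd-to-even harmonic energy ratio of the kind mentioned above might be computed along the following lines. This is a sketch under stated assumptions: the input is a list of harmonic amplitudes starting at the fundamental, the fundamental is counted among the odd harmonics, and energy is taken as squared amplitude. Published definitions differ in these details, and the function name and example values are hypothetical.

```python
def odd_to_even_ratio(harmonic_amplitudes):
    """Ratio of energy in odd harmonics (1, 3, 5, ...) to even harmonics (2, 4, ...).

    harmonic_amplitudes[0] is harmonic 1, the fundamental (assumed convention).
    """
    odd_energy = 0.0
    even_energy = 0.0
    for number, amp in enumerate(harmonic_amplitudes, start=1):
        if number % 2 == 1:
            odd_energy += amp ** 2
        else:
            even_energy += amp ** 2
    if even_energy == 0.0:
        return float("inf")  # no even-harmonic energy at all
    return odd_energy / even_energy

# Illustrative values only: a spectrum with weak even harmonics versus a balanced one.
print(odd_to_even_ratio([1.0, 0.05, 0.6, 0.05, 0.4]))
print(odd_to_even_ratio([1.0, 0.5, 0.5, 0.25]))
```

A spectrum dominated by odd harmonics yields a large ratio, which is the pattern the cited work associates with the clarinet.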
Muted/Veiled.
The dimension muted/veiled similarly suggests a frequency-domain filtering effect—in this case,
the effect of a low-pass filter. Merely placing your hand in front of your mouth while speaking
will have the effect of attenuating high frequency partials, producing a “veiled” sonic
impression. The mutes used in brass and string instruments reduce the overall intensity of the
sound, but also attenuate the low frequency harmonics. Instrument mutes also tend to introduce a
resonance, often in the region of 1-3 kHz, corresponding to the region of many speech
Open.
The term open is commonly used in the musical training of both vocalists and instrumentalists.
phonetics, the term “open” has a quite specific meaning. It is used to refer to any articulation in
which the mouth opening is wide—as when the chin is lowered. Slawson (1981) offers a more
general interpretation, suggesting that openness refers to those sounds produced by any relatively
wide tube or a tube that has no significant narrowing. Slawson notes that non-open sounds are
produced by tubes that have at least one cross-sectional narrowing. In speech, the vowel [u] (as
in “food”) involves narrowing between the lips; the vowel [i] (as in “feed”) involves narrowing
between the tongue and the hard palate; open vowels, like [aw], [a], and [ae], exhibit no
significant narrowings (Slawson, 1981; 135–136). Acoustically, Traunmüller (1981) found that
open speech sounds exhibit a higher first formant (F1) relative to the fundamental frequency
(F0). That is, openness is apparent when there is a relatively large gap between the pitch of a
It is open to debate whether what our participants meant by “open” is the same as what
speech researchers mean by open. However, wind instrumentalists are often taught to mimic
open vowel sounds as they play, which suggests a potential relationship between open vowels
and timbral openness. Furthermore, the relationship of the term “open” as applied to timbre and
vocality has been discussed in previous research. Traube, in a study of classical guitar timbre
(2004), notes that words such as “open” seem to refer to phonetic gestures. She details
similarities between speech and timbre perception, proposing a phonetic mode of timbre
attacks and releases, respectively, of guitar tones, provided support for this phonetic mode of
timbre perception.
Resonant/vibrant.
The term “resonant” has a specific acoustic interpretation that may or may not accurately reflect
the meaning of this word as used by our participants. In acoustics, a vibrator is said to be
resonant in situations of especially efficient acoustic transduction: that is, where a small amount
of input mechanical energy generates a large amount of sound energy. This can occur, for
example, when a sound is produced in a highly reverberant room. It also occurs when a sound
continues long after being imparted with some mechanical energy, such as a bell continuing to
sound well after it has been struck. Due to the efficiency of the sound production, resonant
sounds are typically also loud. The joint qualifier “vibrant” seems to add an intentional,
deliberate, or willful character to the dimension. Dictionary definitions of “vibrant” include such
The nasal/reedy dimension is also characterized by the terms pinched, buzzy, and constrained.
In speech, most vowels and consonants are produced with tight velopharyngeal closure, meaning
that there is little or no airflow through the nose. In many languages, exceptions are found in the
nasal consonants: [m], [n], and [ŋ]. Of these sounds, the [m] sound exhibits a “humming” quality,
whereas the [n] and [ŋ] sounds are more clearly “nasal” sounding. The [m] sound is produced
with closed lips, and thus sound is emitted only through the nose. (Ironically, this is the least
nasal sounding of the nasal consonants.) For [n] and [ŋ], the resulting sound issues from both the
Two identical (or near identical) sound sources lead to a pattern of constructive and
Although the acoustics is not clear, it may be that the nasality associated with the oboe, bassoon,
and bagpipes arises from the use of double reeds, where two coupled vibrators produce a comb
spectrum. This speculative account might also explain why single-reed instruments such as the
Especially when played at high intensity, muted brass instruments can also produce a
raspy, piercing, or nasal quality (Smith, 1980). As noted earlier, mutes act as high-pass filters,
attenuating low frequency components and effectively creating a brighter sound (Backus, 1976).
More importantly, mutes produce a series of prominent minima (Causse & Sluchin, 1982). That
is, mutes superimpose a comb-like spectrum akin to those characteristic of “nasal” sounds. The
acoustical effect is stronger for metal mutes than for cardboard mutes due to the higher reflection
frequency content. The high-pass effect of mutes probably contributes to this pinched or
constrained quality. Compared with other instruments, the oboe exhibits relatively intense higher
partials—a high spectral centroid that likely contributes to the pinched sound. The choice of the
bass drum as the polar opposite instrument reinforces the suggestion that the absence of low
the acoustics of reed instruments remains obscure. Much of the difficulty lies in the nonlinear
behavior of the reed as well as the complex geometry of single-reed mouthpieces (Almeida,
George, Smith, & Wolfe, 2013; Chatziioannou & van Walstijn, 2012; Dalmont, Gilbert, &
Watery/Fluid.
Of all of the qualia dimensions in the 20-dimensional model, watery/fluid would seem to be the
most metaphorical. The term “watery” seems quite concrete—implying a liquid, wet, or
potentially teary state. Yet, with the exception of the orchestral “bird whistle,” no common
musical instrument makes use of water or fluid in its sound production. Instead, “watery” is more
likely to evoke the image of a river or stream. Flowing water is often quite quiet; at the same
time, waterfalls, fountains, or streams generate characteristic sounds arising from the frequent
bursting of bubbles.
The instrument with the highest average watery/fluid rating is the harp. Harp arpeggios
and glissandi, in particular, seem most acoustically analogous to the sequential percussive
bursting of bubbles. Although the solo harp literature is relatively small, the repertoire curiously
includes disproportionately many water-themed works. A sample includes Samuel Pratt’s Little
Fountain, John Charles Thomas’s Echoes of a Waterfall, Anthony Sidney’s From a Chinese
Waterfall, Jean-Michel Damase’s Pluie [Rain], Charles Oberthür’s Au bord de la mer [By the
Sea], Alan Hovhaness’s The World Beneath the Sea, and many others.
In comparison to the word “watery,” the term “fluid” is perhaps more abstract.
Definitions for “fluid” include synonyms such as flowing, flexible, effortless, graceful, and
elegant. In the case of music, the term fluid seems more closely tied to performance manner
rather than some static spectral feature per se. A fluid performance is likely fast-paced, yet
In the above discussion, we have focused on possible acoustic origins of the timbre descriptors in
our final model. However, it should also be acknowledged that cultural and historical factors
have likely contributed to shaping the timbre vocabulary observed throughout these studies,
especially those terms that are not predominantly acoustic. In particular, such cultural factors
likely have a strong influence on the ways in which specific instruments are characterized. For
example, the harp, which has a strong Western cultural association with the image of an angel
playing atop a fluffy cloud, may be characterized as pure/clear both because of the harmonicity
of the sound itself and because of the association with the purity of heaven and angels. As is
often the case with understanding the development of meaning, however, we might ask to what
extent the harmonicity of the sound itself supported and encouraged the development of such an
association. Furthermore, while the harp is rated as most “pure” in our study, several other
instruments rated nearly as highly on “pure” do not have such explicitly ethereal connotations,
In the vocabulary of the final model writ large, we can identify potential cultural
overtones, with the inclusion of terms such as “sweet,” “gentle,” and “calm.” On the other hand,
the direct/loud dimension resonates with Western stereotypes of masculinity, including terms
previously mentioned, the pure/clear descriptions may be related through association to an ideal
of moral purity, as evidenced by the symbolic import of the harp in Christian cultural history.
The repertoire of water-themed works for harp suggests a cultural association between the harp
and water. However, it is also notable that relatively few of the final dimensions carry obvious
cultural overtones: the majority of terms, like ringing/long decay and resonant/vibrant seem to
On the other hand, the earlier collection of 77 categories derived from the interviews
contains more terms likely to be reflective of cultural values and/or expressions, such as
suggesting instrument timbres that evoke a certain culture or landscape. We can imagine that
these qualia would not be evoked for individuals from other cultures.
Both in the context of the current study and more generally, more research is needed to
develop an understanding of which aspects of timbre semantics are more or less culturally
Whence Timbre
Where does the experience of timbre come from? What is the function of timbre? In addressing
these questions, a number of possible accounts may be entertained. For example, one or more
timbre categories or dimensions might reflect innate evolved biological functions, represent
social constructions that arise through enculturation, and/or emerge from individual perceptual
useful sensory experiences. Localization provides crucial information about the position of sound
sources, including potential predator and prey (Heffner & Heffner, 1992). Loudness conveys
useful information regarding energy, power, and proximity of sound sources (Huron, Kinney, &
Precoda, 2006). Pitch is an integral aspect of parsing multi-source scenes into distinct acoustical
sources (Bregman, 1994). Plausible arguments might be advanced in support of the idea that
these three auditory phenomena represent adaptive traits that arose through evolution by natural
selection.
Might the timbre categories or dimensions encountered in the current study be susceptible
candidates. For example, “rumbling,” “low,” “shrill,” or “noisy” qualities might be linked to
some survival function, as they can provide ethological cues as to the size and energy of a
confer some adaptive value such as alerting a listener to possible infectious disease; indeed, we
can often tell by a person’s voice whether they have a cold or sinus infection. However, if the
recognition of nasal timbres was intended to help avoid potential disease, one might expect
On the one hand, timbre-based sound source identification is plausibly important for
survival, helping to identify threats and friendly overtures. However, most of the dimensions in
our model do not appear to have an obvious evolutionary rationale. What is the survival value,
for example, of the ability to distinguish the sounds of wood and metal when they are struck?
Qualities like reedy, brassy, metallic, woody, percussive, ringing, sparkling, brilliant, resonant,
vibrant, open, focused, compact, and hollow seem poorly linked to survival. Most of the qualia
categories encountered in this study seem far from having any biologically necessary reason for
their existence.
In the case of social constructions, there are a number of timbre descriptions that have
clear cultural links. Earlier, we mentioned the association of the harp with purity as a possible
consequence of the cultural association of harps with celestial angels and a heavenly afterlife in
Western culture. Other examples include the association of the flute with pastoral settings, the
association of the trumpet with royalty or nobility, the association of the horn with hunting, or
the military connotations of the snare drum. While our interview participants did indeed mention
such associations in the open-ended descriptive task, and some of these associations were
included in the 77 initial timbre qualia categories, they were not sufficiently robust for inclusion
acoustic features. For example, vibrating metal plates generate different spectral patterns
depending on whether the plate is round, triangular, or rectangular. Kunkler-Peck and Turvey
(2000) showed that listeners can discriminate among these different geometries by sound alone.
However, no one has suggested how distinguishing round, triangular, or rectangular steel plates
might enhance survival. When contrasted with listener inability to discriminate metal, wood, or
plastic wind instruments, the research suggests that timbre categories arise not from specific
This “generalized learning” account gains credence when one considers research in the
field of ecological acoustics. A telling example is evident in the work of Li, Logan, and Pastore
(1991) on walking sounds. Li et al. observed that listeners are generally skilled in deciphering
whether a person is male or female on the basis of the sound of their footsteps. Using
standardized footwear, they showed that the key indicator is the time delay between the sound of
the heel contact and the sound of the sole slap. For a given gait, male walkers tend to have long
time delays between these two sounds—a consequence of the fact that men generally exhibit
disproportionately longer feet than women. Although the motivation to decipher sex may be
innate, the specific mechanism discovered by Li et al. cannot be innate since the key acoustic
feature identified in their work can be heard only when individuals wear shoes and walk on hard
surfaces—conditions that would not have existed in the long period of human adaptiveness.
Experience has a demonstrable effect on timbre perception. For example, some listeners
might not be able to recognize the sound of an English horn as distinct from an oboe. But
professional oboe players not only make this distinction, they are readily able to distinguish
American and European oboe timbres and can detect subtle differences in timbre that result from
minuscule adjustments to a reed. As the ability to make and describe finer discriminations of
timbre appears to be related to exposure, more exposure and close attention to a certain range of
timbres might lead to the development of a more refined vocabulary for those timbres. We would
thus expect those individuals who have the greatest experience with a range of musical sounds to
While we have proposed several plausible connections between timbre qualia and
acoustical features, such connections are currently largely theoretical, and future research is
called for to provide empirical evidence supporting such connections. However, if our account of
the timbre qualia dimensions and their connections to acoustical features is accurate, it is striking
that the dimensions that emerged in this study seem to have arisen from specific acoustical
patterns such as comb spectra, frequency-shifts due to changing stiffness, high filter Q,
regeneration, envelope features, or other acoustical patterns that permit general learning across a
class of sound sources. In short, the dimensions that emerge in this study are less suggestive of
evolved adaptations or socially constructed categories, and more suggestive of a general capacity
for auditory learning based on acoustical patterns encountered in the sonic environment.
highly stable over hundreds of thousands of years, natural selection can favor the development of
innate sensory and perceptual mechanisms that aid survival. However, when an environment is
highly variable, survival is better ensured through an individual’s capacity to learn rather than
innate or fixed response behaviors. In variable conditions, learning itself becomes a powerful
adaptive trait. Timbre has long been recognized as the “grab-bag” of auditory phenomena
(Bregman, 1990; Siedenburg & McAdams, 2017). If, as suggested here, timbre qualia traits tend
to reflect the idiosyncrasies of real-world acoustical features, then the apparent haphazard
It is possible that the plasticity of human timbre perception originates in our capacity for
learning speech sounds. Each generation must learn spoken language anew—a process requiring
timbre discrimination. Since the sounds used in language can vary considerably, the human
auditory system may be optimized for learning whatever acoustical features are salient in some
environment.
If timbre categories are learned from experience, there may be no fixed or optimum
Depending on the context in which a timbre model is being applied, one may want either a more
refined or more broad timbre model. The 77 categories arising from our initial content analysis
offer a finer level of detail and explain more variance than the 20 categories in our distilled
model. On the other hand, an MDS study that distills timbre to two dimensions of “brightness”
and “percussiveness” offers an even more streamlined classification scheme than a five-
Conclusion
The research described in this project was motivated by the question: what are the
phenomenological experiences associated with different musical timbres? Our aim was to
construct a model of the cognitive linguistic dimensions of Western musical instrument timbre
qualia that might ultimately prove useful in musical analysis, especially the analysis of classical
instrumental music. The motivating question was approached via two principal studies. In the
first study, open-ended interviews were conducted with musicians who were asked to describe a
variety of instrument sounds. The interview transcripts were subject to content analysis, leading
rated how appropriate each of the 77 descriptive categories was for characterizing a variety of
Western instrument sounds. Principal component analyses were then conducted on these results
in order to reduce the number of dimensions by collapsing those categories exhibiting high
shared variance. In two supplementary studies, musician participants rated the relevance of
various components. The results of several PCAs and the results of the follow-up rating
In examining the 20 dimensions of the final model, we identified possible sources for the
qualia characterizations. In considering possible origins for the qualia dimensions, it was noted
that many of the dimensions can be related to distinctive acoustic patterns, suggesting that timbre
categories may originate in the availability of discriminable acoustic features rather than from
when considering this research. First, all of the research was conducted in English. Potentially
pertinent descriptive categories common in languages other than English may not be represented.
Second, the research relied entirely on musicians and musically sophisticated listeners, whose
descriptions and experiences may not be representative of the general listening population. A
further caveat is that the research relied exclusively on imagined rather than heard sounds and
was focused on the description of prototypical rather than specific or instantiated sounds.
Consequently, our results describe cognitive representations of timbre, which may or may not be
applicable to the perceptual dimensions of timbre. More research is needed to determine how
these cognitive dimensions, derived from imagined sounds, relate to the perception of heard
timbre. Moreover, the imagined sounds were limited to a subset of 20, mostly Western classical
instruments, and not all instruments were equally familiar to the participants. Next, given the
open-ended nature of the descriptive task, participants used descriptive terms that may relate to
pitch and loudness, which have traditionally been definitionally separate from timbre. In fact,
two of the final dimensions, direct/loud and rumbling/low, include loudness- and pitch-related
terms, respectively. We chose not to eliminate such terms from our analyses, as they
represented features of the imagined sounds that were important to participants. However, this
feature of our model is important to keep in mind when comparing our results with other studies
that may take a stricter definition of timbre. During the initial rating task (Study 2), participants
were asked to rate each of their imagined instrument sounds on 77 dimensions; the length of this
list has potential negative implications for participant attention and concentration, and it is
possible that participants may have lost track of which instrument they were rating. Finally, it
must be acknowledged that some of the methodologies used, namely the content analysis and
the PCAs, involved interpretive decisions by the researchers and are thus open to researcher
bias.
The results of the initial interviews confirm that the ways in which we describe timbre are
complex and rich, with musicians in the study making creative use not only of adjectives, but of
practicality, but it is possible that in some contexts, the intermediate 77 categories resulting from
the initial content analysis may prove more useful. For example, a single timbre descriptor (such
as “ping/ding/ting”) may exhibit little utility for distinguishing certain instruments, such as
the oboe from the trombone, but nevertheless have value in discriminating just one or two
sounds from many others, such as the triangle from all other instruments. Though evident in
the interview data and the 77-category model, these useful but
narrowly specific descriptive categories are largely lost in the process of dimensionality
MDS is highly useful when the number of dimensions is small. However, when the
number of dimensions is high, MDS becomes impractical due to the high number of required
paired comparisons. Nevertheless, MDS may prove useful in confirmatory studies of models
involving a high number of dimensions. A high-dimensional model may itself be used to select a
dramatically reduced set of paired comparisons, increasing the tractability of gathering the
perceptual data, while still allowing the potential for a high number of dimensions to emerge in
an MDS solution.
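To give a rough sense of why the number of paired comparisons grows so quickly, the sketch below counts the unordered pairs in a full paired-comparison design, n(n - 1)/2 for n stimuli. It assumes that each pair is judged exactly once per listener and that resolving more dimensions requires more stimuli; the function name `n_pairs` is illustrative and not part of the study. The stimulus counts 20 and 44 are borrowed from the instrument lists used in this project.

```python
def n_pairs(n_stimuli: int) -> int:
    """Number of unordered stimulus pairs in a full paired-comparison design."""
    if n_stimuli < 0:
        raise ValueError("n_stimuli must be non-negative")
    return n_stimuli * (n_stimuli - 1) // 2


# The pair count grows quadratically with the number of stimuli.
for n in (10, 20, 44):
    print(f"{n} stimuli -> {n_pairs(n)} pairs per listener")
```

A reduced design, as suggested above, would amount to rating only a selected subset of these pairs.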
This paper has emphasized a cognitive linguistic approach to timbre by examining the
natural language of musicians’ real-time, conversational descriptions of timbre qualia. Data from
the rating study demonstrate consistencies in how timbre language is applied to the sounds of
musical instruments. These results imply the existence of a large number of timbre categories,
suggesting that future timbre research may need to consider further creative methodological
approaches that allow for models entailing many more qualities or dimensions. Finally, the
current study sets the stage for future research aimed at integrating linguistic characterizations of
References
Almeida, A., George, D., Smith, J., & Wolfe, J. (2013). The clarinet: how blowing pressure, lip
       force, lip position and reed “hardness” affect pitch, sound level, and spectrum. Journal of
       the Acoustical Society of America, 134(3), 2247–2255.
Arthur, C. (2006). When the Leading Tone Doesn’t Lead: Musical Qualia in Context. Doctoral
       dissertation, The Ohio State University.
Backus, J. (1976). Input impedance curves for the brass instruments. The Journal of the
       Acoustical Society of America, 60(2), 470–480.
Blumstein, D. T., Davitian, R., & Kaye, P. D. (2010). Do film soundtracks contain nonlinear
       analogues to influence emotion? Biology Letters, 6(6), 751–754.
Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. MIT
       Press.
Caetano, M., Saitis, C., & Siedenburg, K. (2019). Audio content descriptors of timbre. In
       K. Siedenburg, C. Saitis, S. McAdams, A. Popper, & R. Fay (Eds.), Timbre: Acoustics,
       Perception, and Cognition. Springer Handbook of Auditory Research, vol 69. Springer
       International Publishing.
Casado, S. (2017). Studying friction while playing the violin: exploring the stick–slip
       phenomenon. Beilstein Journal of Nanotechnology, 8(1), 159–166.
Chatziioannou, V., & van Walstijn, M. (2012). Estimation of clarinet reed parameters by inverse
       modelling. Acta Acustica united with Acustica, 98(4), 629–639.
Dalmont, J.P., Gilbert, J., & Ollivier, S. (2003). Nonlinear characteristics of single-reed
       instruments: Quasistatic volume flow and reed opening measurements. Journal of the
       Acoustical Society of America, 114(4), 2253–2262.
de Munck, V. C. (2009). Research design and methods for studying cultures. Rowman Altamira.
Elliott, S. J., & Bowsher, J. M. (1982). Regeneration in brass wind instruments. Journal of Sound
       and Vibration, 83(2), 181–217.
Elliott, T. M., Hamilton, L. S., & Theunissen, F. E. (2013). Acoustic structure of the five
       perceptual dimensions of timbre in orchestral instrument tones. The Journal of the
       Acoustical Society of America, 133(1), 389–404. https://doi.org/10.1121/1.4770244
Fabre, B., Gilbert, J., Hirschberg, A., & Pelorson, X. (2012). Aeroacoustics of musical
       instruments. Annual Review of Fluid Mechanics, 44, 1–25.
Faure, A., McAdams, S., & Nosulenko, V. (1996). Verbal correlates of perceptual dimensions of
       timbre. In 4th International Conference on Music Perception and Cognition, Montréal,
       Canada (pp. 79–84).
Fernandes, J., Teixeira, F., Guedes, V., Junior, A., & Teixeira, J. P. (2018). Harmonic to Noise
       Ratio Measurement-Selection of Window and Length. Procedia Computer Science, 138,
       280–285.
Fletcher, N. H., Perrin, R., & Legge, K. A. (1989). Nonlinearity and chaos in
       acoustics. Acoustics Australia, 18(1), 9–13.
Freour, V., & Scavone, G. (2012). Investigation of the effect of upstream airways impedance on
       regeneration of lip oscillations in trombone performance. Proceedings of the Acoustics
       2012 Nantes Conference. Nantes, France.
Goldstein, L. (1994). Possible articulatory bases for the class of guttural consonants.
       Phonological structure and phonetic form. Papers in Laboratory Phonology, 3, 234–241.
Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. The Journal of the
       Acoustical Society of America, 61(5), 1270–1277.
Grillo, E. U., & Verdolini, K. (2008). Evidence for distinguishing pressed, normal, resonant, and
       breathy voice qualities by laryngeal resistance and vocal efficiency in vocally trained
       subjects. Journal of Voice, 22, 546–552.
Hailstone, J. C., Omar, R., Henley, S. M., Frost, C., Kenward, M. G., & Warren, J. D. (2009). It's
       not what you play, it's how you play it: Timbre affects perception of emotion in music.
       The Quarterly Journal of Experimental Psychology, 62(11), 2141–2155.
Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural
       correlates of perceived and imagined musical timbre. Neuropsychologia, 42(9), 1281–
       1292. https://doi.org/10.1016/j.neuropsychologia.2003.12.017
Heffner, R. S., & Heffner, H. E. (1992). Evolution of sound localization in mammals. In The
       evolutionary biology of hearing (pp. 691–715). Springer, New York, NY.
Hinton, L., Nichols, J., & Ohala, J. J. (Eds.) (1994). Sound symbolism. Cambridge University
       Press.
Hirschberg, A., Gilbert, J., Msallam, R., & Wijnands, A. P. J. (1996). Shock waves in
       trombones. The Journal of the Acoustical Society of America, 99(3), 1754–1758.
Huron, D. (2016). Voice leading: The science behind a musical art. MIT Press.
Jolliffe, I.T. (2002). Principal Components Analysis, Second Edition. Springer: New York.
Keating, P.A., Garellek, M., & Kreiman, J. (2015). Acoustic properties of different kinds of
       creaky voice. Proceedings of the 18th International Congress of Phonetic Sciences.
       Glasgow, UK: The Scottish Consortium for ICPhS 2015, University of Glasgow.
Kendall, R. A., & Carterette, E. C. (1991). Perceptual scaling of simultaneous wind instrument
       timbres. Music Perception: An Interdisciplinary Journal, 8(4), 369–404.
Kendall, R. A., & Carterette, E. C. (1993a). Verbal attributes of simultaneous wind instrument
       timbres: I. von Bismarck's adjectives. Music Perception: An Interdisciplinary Journal,
       10(4), 445–467.
Kendall, R. A., & Carterette, E. C. (1993b). Verbal Attributes of Simultaneous Wind Instrument
       Timbres: II. Adjectives Induced from Piston’s “Orchestration.” Music Perception: An
       Interdisciplinary Journal, 10(4), 469–501.
Kendall, R. A., Carterette, E. C., & Hajda, J. M. (1999). Perceptual and acoustical features of
       natural and synthetic orchestral instrument tones. Music Perception: An Interdisciplinary
       Journal, 16(3), 327–363.
Li, X., Logan, R. J., & Pastore, R. E. (1991). Perception of acoustic source characteristics:
       Walking sounds. The Journal of the Acoustical Society of America, 90(6), 3036–3049.
McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., & Krimphoff, J. (1995). Perceptual
       scaling of synthesized musical timbres: Common dimensions, specificities, and latent
       subject classes. Psychological Research, 58(3), 177–192.
Morton, E.S. (1977). On the occurrence and significance of motivation-structural rules in some
       bird and mammal sounds. American Naturalist, 111(981), 855–869.
Morton, E.S. (1994). Sound symbolism and its role in non-human vertebrate communication. In
       L. Hinton, J. Nichols & J. Ohala (eds.), Sound Symbolism, Cambridge: Cambridge
       University Press, pp. 348–365.
Myers, A., Pyle Jr., R.W., Gilbert, J., Campbell, D.M., Chick, J.P., & Logie, S. (2012). Effects of
       nonlinear sound propagation on the characteristic timbres of brass instruments. Journal of
       the Acoustical Society of America, 131(1), 678–688.
Nykänen, A., Johansson, Ö., Lundberg, J., & Berg, J. (2009). Modelling perceptual dimensions
       of saxophone sounds. Acta Acustica United with Acustica, 95(3), 539–549.
Pratt, R. L., & Doak, P. E. (1976). A subjective rating scale for timbre. Journal of Sound and
       Vibration, 45(3), 317–328.
Qi, Y., & Hillman, R. E. (1997). Temporal and spectral estimations of harmonics-to-noise ratio
       in human voice signals. The Journal of the Acoustical Society of America, 102(1), 537–
       543.
Saitis, C. & Weinzierl, S. (2019). The Semantics of Timbre. In Siedenburg, K., Saitis,
       C., McAdams, S., Popper, A.N., Fay, R.R. (eds.), Timbre: Acoustics, Perception,
       Cognition (pp. 119–149). Springer International Publishing.
Siedenburg, K., & McAdams, S. (2017). Four Distinctions for the Auditory “Wastebasket” of
       Timbre. Frontiers in Psychology, 8, 1747.
Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory
       Spectrum, 3, 132–141.
Smith, N. (1980). The Horn Mute: An Acoustical and Historical Study. DMA dissertation,
       Eastman School of Music, University of Rochester.
Sottek, R., Kamp, F., & Fiebig, A. (2013). A new hearing model approach to tonality. In INTER-
       NOISE and NOISE-CON Congress and Conference Proceedings, Innsbruck.
Terhardt, E., Stoll, G., & Seewann, M. (1982a). Pitch of complex signals according to virtual-
       pitch theory: Tests, examples, and predictions. The Journal of the Acoustical Society of
       America, 71(3), 671–678.
Terhardt, E., Stoll, G., & Seewann, M. (1982b). Algorithm for extraction of pitch and pitch
       salience from complex tonal signals. The Journal of the Acoustical Society of
       America, 71(3), 679–688.
Thompson, A. E. (1978). Nasal air flow during normal speech production. Master’s thesis,
       Department of Speech and Hearing Sciences, University of Arizona.
Tużnik, P., Augustynowicz, P., & Francuz, P. (2018). Electrophysiological correlates of timbre
       imagery and perception. International Journal of Psychophysiology, 129, 9–17.
       https://doi.org/10.1016/j.ijpsycho.2018.05.004
von Bismarck, G. (1974). Timbre of steady sounds: A factorial investigation of its verbal
       attributes. Acta Acustica united with Acustica, 30(3), 146–159.
von Helmholtz, H. (1885). On the sensations of tone as a physiological basis for the
       theory of music, 2nd English edition. Translated by Alexander J. Ellis. London:
       Longmans, Green, and Co.
Wallmark, Z. (2019b). Semantic Crosstalk in Timbre Perception. Music & Science, 2, 1–18.
Wallmark, Z., Iacoboni, M., Deblieck, C., & Kendall, R. A. (2018). Embodied listening and
       timbre: Perceptual, acoustical, and neural correlates. Music Perception: An
       Interdisciplinary Journal, 35(3), 332–363.
Wallmark, Z., & Kendall, R. A. (2018). Describing sound: The cognitive linguistics of timbre.
       The Oxford handbook of timbre. New York, NY: Oxford University Press.
       http://dx.doi.org/10.1093/oxfordhb/9780190637224.013.14
Wayland, R., Gargash, S., & Longman, A. (1995). Acoustic and perceptual investigation of
       breathy voice. The Journal of the Acoustical Society of America, 97(5), 3364–3364.
Wessel, D. L. (1973). Psychoacoustics and music: A report from Michigan State University.
       PACE: Bulletin of the Computer Arts Society, 30, 1–2.
Yoshikawa, S., & Nobara, Y. (2017). Acoustical modeling of mutes for brass instruments. In
       A. Schneider (Ed.), Studies in Musical Acoustics and Psychoacoustics. Current Research
       in Systematic Musicology, vol 4. Springer.
Zacharakis, A., Pastiadis, K., & Reiss, J. D. (2014). An Interlanguage Study of Musical Timbre
       Semantic Dimensions and Their Acoustic Correlates. Music Perception: An
       Interdisciplinary Journal, 31(4), 339–358. https://doi.org/10.1525/mp.2014.31.4.339
Zacharakis, A., Pastiadis, K., & Reiss, J. D. (2015). An Interlanguage Unification of Musical
       Timbre: Bridging Semantic, Perceptual, and Acoustic Dimensions. Music Perception: An
       Interdisciplinary Journal, 32(4), 394–412.
Zhang, J. D., & Schubert, E. (2019). A Single Item Measure for Identifying Musician and
      Nonmusician Categories Based on Measures of Musical Sophistication. Music
      Perception: An Interdisciplinary Journal, 36(5), 457–467.
The average age of the 17 participants was 28.5 years (range 19–42, SD=6.3). As a group, the
participants reported an average of 17.1 years of regular music practice (range 7–30, SD=5.2).
Participants listed 11 different primary instruments: flute, oboe, clarinet,
bassoon, French horn, trombone, percussion, piano, violin, viola, and cello. Five of these
participants identified primarily as composers, and two of those five identified secondarily as
conductors.
In recruiting participants for our interview study, we concentrated our efforts on professional
orchestral musicians, conductors, and composers. The average age of the 23 participants was
32.8 (range 19–69, SD=13.5). As a group, the 23 participants reported an average of 21.1 years
of regular musical practice (range 8–59, SD=12.5) and 18.9 years of large ensemble experience
(range 7–50, SD=11.9). The principal instruments of these musicians included flute, oboe, alto
saxophone, trumpet, violin, viola, cello, double bass, guitar, and percussion. Five participants
Participants were also asked to report secondary instruments on which they had practiced
regularly at any time and on which they considered themselves to be proficient. Each musician
reported at least one additional instrument. In total, 30 unique primary and secondary
and composers. Participants (n = 460) were recruited in two ways. First, participants were
recruited via the Internet (n = 399) using email listservs and social media. This subset of
participants took the study in a self-determined location. Second, participants were recruited
from the Ohio State University School of Music subject pool (n = 61). Subject pool participants
were second-year undergraduate music students and were tested individually in an Industrial
Acoustic Corporation sound attenuation room. All participants took part in the study using the
The average age of the 460 participants was 26.1 years (range 18–69, SD=9.3). As a
group, the participants reported an average of 13.4 years of regular music practice (median=10.5,
range 1–60, SD=9.7). Recall that participants identified their musical backgrounds as one of six
Although recruitment was aimed at musicians, we did not exclude data from the six participants
who self-identified as music-loving non-musicians for two reasons: all reported at least two years
of regular musical practice, and they all successfully provided sufficient ratings for familiarity
Appendix B, Instructions.
“Below is a list of 44 musical instruments. From this list, we want you to select 20 instruments
that, as a group, you consider to exhibit the greatest contrast—that is, where each instrument
sound is relatively unique compared with the other 19 instruments you select. Do you have any
questions?”
“The purpose of this experiment is to gather information about how people experience the
sounds of different musical instruments. Rather than play any sounds to you, we simply want
you to imagine the sounds in your head. For example, we might ask you to imagine the sound of
a violin playing a single sustained tone. In imagining these sounds, you should imagine a sound
produced by a professional musician rather than a beginner or amateur. In addition, you should
imagine a typical or common sound rather than some unusual sound that an instrument might be
able to make.
I’ll mention the name of a specific instrument. Then you should do your best to imagine the
sound of that instrument. You can take your time doing that. When you are ready, I’ll ask you to
describe the sound. What does the sound sound like? Think of as many words or phrases,
I want you to say as much as you can about the sound and about how you experience the sound. I
will be transcribing your remarks, so I might ask you to slow down or repeat what you said.
I may ask you some questions, but the purpose of the questions is simply to get you to talk about
the sound you imagine. Ideally, I wouldn’t ask you any questions at all.
Feel free to talk about any aspect of the sound—whatever catches your attention, whatever you
think, whatever it reminds you of. My preference is for you to simply talk about what you
When I first mention an instrument, I’ll ask you to judge on a 10-point scale how familiar you
are with the sound. For example, you might give a “10” to an instrument that you play regularly
and so are very familiar with. Conversely, if you really don't know an instrument well, you might
give it a rating of 2 or 3.
After you finish imagining the sound, I’ll then again ask you to rate on a 10-point scale how
clearly or vividly you were able to imagine the sound. Then I’ll ask you to imagine the sound
again and we’ll continue with you describing the sound. Do you have any questions about this?”
“In this task you are required to sort ideas into what you consider appropriate categories. There
are some 502 items requiring sorting. There is no prescribed number of categories. Use as many
categories as you feel necessary to form coherent groups. After sorting some of the items you
may find it helpful to revise some of your existing categories. If necessary, feel free to create a
“miscellaneous” category that might be used to group items that don’t seem to fit with anything
else.
After you have completed this task, provide an identifying label for each category. Write the
If you use a “miscellaneous” category, after sorting all of the items, please return to the
miscellaneous category and determine whether any items might be reasonably added to one of
“In this study, we are interested in how people describe the sound quality or character of
You will be asked to imagine the sounds produced by particular instruments, such as the trumpet
or oboe. For each instrument you will be presented with a list of terms (such as “metallic” or
“buzzy”) and asked to rate how well the term describes the instrument. For example, you might
respond that “heroic/noble” describes the trumpet well, but that “timid” is a poor descriptor of
the trumpet.
For each instrument, you will be asked to rate 77 descriptive terms. It is likely that it will take no
more than 5 to 10 minutes to complete the task for one instrument. However, the survey asks you
to complete the task for two instruments. Accordingly, it should take about 15 to 20 minutes to
rate both. After you rate two, you will be given the option to end the survey or to rate another
instrument (which would really help us out with this project). After each instrument you rate,
you will have the option to rate another instrument, or to end the survey.
Before making your ratings, you will be asked how familiar you are with the instrument. We
only want to collect data for instruments that you recognize pretty well. If you are not at all
We will also ask you to rate how clearly or vividly you were able to imagine the sound.
Do your best to continue imagining the sound of the instrument as you make your judgments.
        gentle/calm          (arousal: relaxing-to-irritating)   peaceful/relaxing
                             speed
        reedy                sound source                        reedy
        warm                 (temperature)                       warm
        happy/joyful         happy                               happy/joyful
        resonant/vibrant     (resonance)                         spaciousness/reverb/presence
        watery/fluid         fluidity                            liquid/watery
        heavy                weight                              (big/heavy)
        rich/complex         (complexity)                        (richness)
                             overtones
        wavy/undulating      (stability)                         vibrato/wavy
        heroic/noble         (character-like)                    (triumphant)
                                                                 (majestic/noble)
        ringing/long decay   (resonance)                         (decay/taper)
                             (decay)                             (crash/clang)
        woody                sound source                        woody
N.B. Parentheses indicate that the words in the listed category were distributed among more than
Figure F.1 identifies the 33 components, rank-ordered by average pertinence. In seeking a more
parsimonious model, one would hope to see evidence of some discontinuity in the ratings where
one group of components is rated considerably lower than another group. However, no obvious
discontinuity is evident in Figure F.1. The largest drop in pertinence ratings is between the 15th and 16th
components, from 74.7 (“loud, aggressive, commanding, assertive, powerful, direct, projecting”)
to 64.9 (“quick decay”). However, 64.9 is still a high pertinence rating, given that 50 was labeled
as “moderately pertinent.”
Figure F.1. Mean pertinence ratings for each of the 33 components. On the x-axis, each set
is represented by its initial descriptive term.
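The search for a discontinuity described above can be sketched as a scan for the largest gap between adjacent values in a rank-ordered list. This is only an illustration: the function name `largest_drop` is hypothetical, and of the example ratings only 74.7 and 64.9 come from the text; the remaining values are made-up placeholders, not the study's data.

```python
def largest_drop(ratings):
    """Locate the largest drop between adjacent ratings.

    `ratings` must be sorted from highest to lowest. Returns (position, size),
    where position p means the drop falls between rank p and rank p + 1
    (1-based) and size is rounded to one decimal place.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    drops = [a - b for a, b in zip(ratings, ratings[1:])]
    size = max(drops)
    return drops.index(size) + 1, round(size, 1)


# Placeholder ratings: only 74.7 and 64.9 are taken from the text above.
example = [80.2, 77.5, 74.7, 64.9, 62.0, 58.3]
print(largest_drop(example))
```

Whether such a gap counts as a meaningful discontinuity remains a judgment call, as the discussion of the 64.9 rating indicates.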
We began our analysis of PCA results with the typical approaches: looking at the eigenvalues,
considering the scree plot/knee point, and examining the cumulative percentage of variance
explained. Table D.1 below includes a summary of the unrotated PCA on the ratings dataset;
Component   Eigenvalue   Proportion of variance explained   Cumulative variance explained
PC1         12.41        0.16                               0.16
PC2         11.08        0.14                               0.31
PC3          6.33        0.08                               0.39
PC4          5.46        0.07                               0.46
PC5          3.08        0.04                               0.50
PC6          2.45        0.03                               0.53
PC7          1.71        0.02                               0.55
PC8          1.47        0.02                               0.57
PC9          1.45        0.02                               0.59
PC10         1.24        0.02                               0.61
PC11         1.15        0.01                               0.62
PC12         0.96        0.01                               0.63
PC13         0.89        0.01                               0.65
PC14         0.86        0.01                               0.66
PC15         0.82        0.01                               0.67
PC16         0.81        0.01                               0.68
PC17         0.78        0.01                               0.69
PC18         0.76        0.01                               0.70
PC19         0.74        0.01                               0.71
PC20         0.71        0.01                               0.72
PC21         0.69        0.01                               0.73
PC22         0.67        0.01                               0.73
PC23         0.66        0.01                               0.74
PC24         0.63        0.01                               0.75
PC25         0.60        0.01                               0.76
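As a reading aid, the proportion and cumulative columns of the table can be recomputed from the eigenvalues alone. The sketch below assumes that the PCA was run on the correlation matrix of the 77 rated categories, so that the eigenvalues of all components sum to 77; this assumption is not stated in the text, although it is consistent with the tabled values. The names `EIGENVALUES`, `N_VARIABLES`, and `variance_table` are illustrative.

```python
# Eigenvalues of the first 25 components, copied from Table D.1.
EIGENVALUES = [12.41, 11.08, 6.33, 5.46, 3.08, 2.45, 1.71, 1.47, 1.45, 1.24,
               1.15, 0.96, 0.89, 0.86, 0.82, 0.81, 0.78, 0.76, 0.74, 0.71,
               0.69, 0.67, 0.66, 0.63, 0.60]

# Assumption: correlation-matrix PCA on 77 variables, so total variance is 77.
N_VARIABLES = 77


def variance_table(eigenvalues, total_variance):
    """Return (component, eigenvalue, proportion, cumulative) rows."""
    rows = []
    running = 0.0
    for i, ev in enumerate(eigenvalues, start=1):
        running += ev
        rows.append((f"PC{i}", ev, ev / total_variance, running / total_variance))
    return rows


table = variance_table(EIGENVALUES, N_VARIABLES)
for name, ev, prop, cum in table:
    print(f"{name:<5} {ev:>6.2f} {prop:>6.2f} {cum:>6.2f}")
```

Because the tabled eigenvalues are themselves rounded, small discrepancies in the second decimal place would not be surprising in general.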
As can be seen in both the Table and the Figure, the cutoff based on the arbitrary rule of
preliminary data from this study in a conference proceeding (Reymore & Huron, 2018), this is
how we approached the model, which contained 11 dimensions. The “knee point” method would
suggest only about seven components, depending on exactly where the “knee” is read.
However, after collecting more data, looking carefully at possible interpretations for rotated
components, and considering our research goals, which include the use of this model in music
theoretical scholarship, we were ultimately not satisfied with the 11- or 7-component models. To
start, the 11-component model suggested by the eigenvalue method explains just 62% of the
variance. The 7-component model suggested by the knee point explains just 55% of the variance.
After considering the variances explained and interpreting the components of these models, we
felt, as professional musicians, that even the 11-dimension model was intuitively incomplete.
The process described below, of considering the available models and alternative methods of
interpretation, was the result of about six months of active consideration and continuous
To address our concerns about variance explained, we might have turned to another rule
based on the acceptable amount of variance explained (Jolliffe, 2002). We considered variance
explained by models from previous research in timbre semantics: von Bismarck (1974) found that
four factors explained 81% of the variance for synthetic sounds, and Kendall and Carterette
(1993a), testing a restricted set of natural recorded tones, found that only two factors explained
nearly 98% of the variance. Kendall and Carterette (1993b) also reported a different
two-dimensional space explaining 96% of the variance in the data. To reach even the lowest of
these figures, 81%, in the current study, we would need to include 32 dimensions. To achieve
variance explained on the order of 96–98%, as reported by Kendall and Carterette, our data would
require 63–71 dimensions.
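The two kinds of rule discussed here can be compared with a short sketch using the eigenvalues listed in Table D.1. It rests on two assumptions that the text does not state outright: that the “Eigenvalue method” is the common eigenvalue-greater-than-one rule, and that the total variance equals 77 (a correlation-matrix PCA of the 77 categories). The function names are illustrative. Only the first 25 eigenvalues are tabled, so targets above roughly 76% cannot be resolved from the table, and the function returns `None` in that case.

```python
from itertools import accumulate

# Eigenvalues of the first 25 components, copied from Table D.1.
EIGENVALUES = [12.41, 11.08, 6.33, 5.46, 3.08, 2.45, 1.71, 1.47, 1.45, 1.24,
               1.15, 0.96, 0.89, 0.86, 0.82, 0.81, 0.78, 0.76, 0.74, 0.71,
               0.69, 0.67, 0.66, 0.63, 0.60]
N_VARIABLES = 77  # assumption: total variance of a 77-variable correlation PCA


def kaiser_count(eigenvalues):
    """Number of components whose eigenvalue exceeds 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def components_needed(eigenvalues, total_variance, target):
    """Smallest number of components whose cumulative proportion of variance
    reaches `target`, or None if the listed eigenvalues never reach it."""
    for k, running in enumerate(accumulate(eigenvalues), start=1):
        if running / total_variance >= target:
            return k
    return None


print("eigenvalue > 1 rule:", kaiser_count(EIGENVALUES))
for target in (0.55, 0.62, 0.81):
    print(f"target {target:.0%}:", components_needed(EIGENVALUES, N_VARIABLES, target))
```

With the full set of 77 eigenvalues, the same function could be used to check the component counts quoted for the 81% and 96–98% targets.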
Thus, we found ourselves in a position where none of the typical rules of thumb for PCA
interpretation yielded a model that would sufficiently meet our goals. Eigenvalues and scree
plots suggested models that did not explain enough variance, but a model based simply on a
variance goal would include far too many dimensions to be practical. Accordingly, we needed
another way of choosing a model. Rather than inventing further arbitrary rules of thumb, we set
out on an exploratory expedition and dug deep into many possible models. One approach would
have been simply to pick the model that we felt best represented our music theoretical
intuitions. However, such a choice would have been especially biased in that it would reflect
only our own timbral values. Ultimately, researcher bias is unavoidable in the interpretation of
PCA, and we knew we would at some point have to simply make a decision. However, to
mitigate that bias as much as possible, we computed the component superset and added further
The left column contains the twenty dimensions of the final timbre qualia model constructed in
the current paper. The column on the right contains what we have judged to be the corresponding
contain multiple terms, some can be sorted into multiple categories; the terms we have attributed
Table G.1.