Using auditory imagery tasks to map the cognitive linguistic dimensions of


musical instrument timbre qualia.

Article in Psychomusicology: Music, Mind, and Brain · June 2020


DOI: 10.1037/pmu0000263


All content following this page was uploaded by Lindsey Reymore on 30 June 2020.



Running head: COGNITIVE LINGUISTIC DIMENSIONS OF TIMBRE QUALIA 1

PREPUBLICATION COPY
©American Psychological Association, [2020]. This paper is not the copy of
record and may not exactly replicate the authoritative document published in
the APA journal. Please do not copy or cite without author's permission. The
final article is available, upon publication, at:

https://doi.org/10.1037/pmu0000263

Using auditory imagery tasks to map the cognitive linguistic dimensions of musical instrument

timbre qualia

Lindsey Reymore and David Huron

Ohio State University

Author Note

School of Music, Ohio State University

Correspondence regarding this paper should be addressed to Lindsey Reymore, School of

Music, 1866 College Road, The Ohio State University, Columbus, OH 43210-1170, Tel: (772)

486-8177, Email: [email protected]

Preliminary results from this study were presented at the Timbre 2018 conference and the

15th International Conference on Music Perception and Cognition and appear in the Proceedings

of the 15th International Conference on Music Perception and Cognition. This project is included

in the first author’s published dissertation.



Abstract

Two studies are reported related to musical instrument timbre qualia. In the first study, open-

ended interviews were conducted with 23 musicians who were asked to describe their

phenomenal experiences of imagined sounds for 20 Western instruments. A content analysis of

the transcribed interviews suggests 77 qualitative categories underlying the musicians’

descriptions. In a second study, 460 musician participants rated subsets of the same 20 imagined

instrument sounds according to the 77 categories derived from Study 1. Principal Component

Analyses were applied to the results of Study 2, yielding several models. Researcher

interpretations of the components in these models were combined with the results of

supplementary polls where musicians rated the descriptive utility of each candidate component,

producing a final 20-dimensional timbre qualia model. The model dimensions include:

rumbling/low, soft/singing, watery/fluid, direct/loud, nasal/reedy, shrill/noisy, percussive,

pure/clear, brassy/metallic, raspy/grainy, ringing/long decay, sparkling/brilliant, airy/breathy,

resonant/vibrant, hollow, woody, muted/veiled, sustained/even, open, and focused/compact.

Keywords: timbre, semantics, orchestration, qualia, content analysis



Introduction

Organologists—that is, people who study musical instruments—have documented the

extraordinary variety of instruments used around the world. In many cultures, more than one

instrument will share similar pitch ranges and intensities, suggesting that differences other than

pitch or intensity are valued in some way. Since most musical lines can be played by more than

one instrument, this raises the question of why a musician might, for a particular passage, choose

one instrument or instrument combination over another. For example, in the second movement

(funeral march) of Beethoven’s third symphony, why does Beethoven assign the solo line to an

oboe rather than a flute or French horn? Various sounds evoke different phenomenological

experiences, including shared cultural implications as well as personal associations. Informal

descriptions of instrument sounds often resort to stereotypic characterizations, such as when a

trumpet is described as “noble,” or a tuba is described as “heavy.” How widespread are such

phenomenological experiences and associations, and where do these characterizations come

from?

The first purpose of the current study is to identify the kinds of phenomenological or

subjective experiences commonly associated with the sounds produced by different musical

instruments. In music research, the term quale, or what it is like to experience something, has

been borrowed from philosophy and has been used to represent the “phenomenal character” of a

given musical event (Arthur, 2016, p. 4). The concept of qualia has been approached differently

in music research in comparison to philosophy, which commonly treats qualia as intrinsic/non-

relational and ineffable. Arthur (2016) argues that even though it may be impossible to

communicate musical qualia in words to someone who has not experienced the qualia in

question, those who have had the experiences are nevertheless able to use words to index their

shared experiences. In an open-ended description task, Huron (2006) observed intersubjectively



shared descriptions of scale-degree qualia; building on this work, Arthur (2016) reports

experiments with scale degree qualia and rhythm qualia which similarly suggest that musical

components can elicit relatively stable qualia. Accordingly, as described below, we propose to

examine various qualia associated with different tone colors or timbres through the use of verbal

descriptions. Although our investigation would ideally consider a wide range of instruments

from across the globe, our methodology requires participant familiarity with the instrument

sounds. Since we rely on Western-enculturated participants, the scope of our study is necessarily

limited to Western musical instruments.

A second goal of the current study is to create a cognitive linguistic model of timbre

qualia that will be useful in musical analysis. Specifically, we anticipate generating

characterizations of musical instruments, which we call Timbre Trait Profiles, that can be applied

in analysis. The aim is to be able to triangulate the Timbre Trait Profiles with information about

instrumentation and music theoretical analysis in order to build an understanding of the role of

timbre qualia in musical organization and the structure of musical experience.

Empirical studies of musical timbre have been conducted at least since the time of

Helmholtz (1885). In recent decades, timbre research has emphasized the use of

multidimensional scaling (MDS), with pioneering work by Plomp (1970), Wessel (1973), and

Grey (1977). In MDS, paired similarity judgments are used to construct a multidimensional map

of the relationships between sounds. MDS itself does not offer any interpretation of the

presumed underlying dimensions. Instead, it is up to the researcher to infer and interpret the

physical origin or perceptual meaning of the latent dimensions revealed through MDS.

Following Grey’s work, the MDS approach received considerable extension (e.g., Kendall &

Carterette, 1991; Kendall, Carterette, & Hajda, 1999; McAdams, Winsberg, Donnadieu, De

Soete, & Krimphoff, 1995). In interpreting MDS dimensions, researchers have tended to

emphasize physical (acoustical) rather than perceptual or cognitive interpretations. An alternative

approach, which we adopt here, is to use language as a starting point for the investigation of the

cognitive dimensions of timbre.
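
As a rough illustration of the MDS idea described above, the following Python sketch implements classical (Torgerson) scaling for a single dimension using only the standard library. It is not the procedure of any study cited here, which generally use multidimensional and often nonmetric variants. The function name `classical_mds_1d` and the dissimilarity matrix for four unnamed sounds are hypothetical.

```python
import math


def classical_mds_1d(dist):
    """Recover one-dimensional coordinates from a symmetric dissimilarity matrix."""
    n = len(dist)
    squared = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into an inner-product matrix.
    b = [
        [-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
         for j in range(n)]
        for i in range(n)
    ]
    # Power iteration approximates the leading eigenvector of b.
    v = [float(i + 1) for i in range(n)]
    for _ in range(100):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the matching eigenvalue (v has unit length).
    eigenvalue = sum(
        v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return [math.sqrt(eigenvalue) * x for x in v]


# Hypothetical dissimilarity judgments among four sounds.
dissimilarity = [
    [0.0, 1.0, 3.0, 6.0],
    [1.0, 0.0, 2.0, 5.0],
    [3.0, 2.0, 0.0, 3.0],
    [6.0, 5.0, 3.0, 0.0],
]
coords = classical_mds_1d(dissimilarity)
```

The recovered coordinates reproduce the input dissimilarities as distances along a line, but the procedure says nothing about what that line means. Labeling the axis remains the researcher's task, which is the limitation noted above.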

Several studies have addressed the semantics of timbre. Many of these studies have found

three to four semantic dimensions for timbre space (e.g. Pratt & Doak, 1976; von Bismarck,

1974). A 1974 study by von Bismarck is considered by some to be the first comprehensive study

of timbre semantics (Saitis & Weinzierl, 2019). Von Bismarck asked participants to use bipolar

scales to rate synthetic harmonic complex tones and noises with systematically varied spectral

envelopes. Factor analysis of the ratings yielded four factors explaining more than 80% of the

variance. These factors were defined as dull-sharp, compact-scattered, full-empty, and colorful-

colorless. Following up on these results, Kendall and Carterette (1993a) carried out a study in

which participants rated dyads produced by wind instruments using von Bismarck’s semantic

differentials. While the differentials did not result in successful differentiation among timbres, a

different version of the experiment in which verbal attribute magnitude estimation (VAME) was

used showed improved timbre differentiation. However, the authors

concluded that von Bismarck’s adjectives were not ecologically valid. Thus, in the second part of

their study (1993b), dyads were rated instead on a list of adjectives from Piston’s Orchestration

(1955). The four-factor model resulting from this study accounted for over 90% of the variance

and included factors interpreted as power, strident, plangent, and reed. Elliott, Hamilton, and

Theunissen (2013) triangulated MDS with acoustical analyses and discriminant analysis of

participant ratings of experimenter-selected adjectives using bipolar scales, yielding a five-

dimensional space. Zacharakis, Pastiadis, and Reiss (2014, 2015) investigated timbre

descriptions across languages: Greek- and English-speaking participants selected descriptor

words from a predefined vocabulary list of 30 words, and results from both languages were

found to be reducible to three dimensions using factor analysis, interpreted by the authors as

texture, luminance, and mass.

Wallmark (2019a) reports a corpus linguistic study of orchestration treatises and manuals,

from which he derives seven categories of the timbre lexicon: affect, matter, cross-modal

correspondence, mimesis, action, acoustics, and onomatopoeia. These seven categories are

further reduced to three conceptual dimensions, interpreted as material, sensory, and activity.

Our approach shares a key strategy with Wallmark’s, that of deriving dimensionality from

ecological language.

In the current study, we aim to identify labels for timbre qualia dimensions that arise

from a bottom-up selection of descriptive terms provided by participants. This strategy contrasts

with MDS approaches where the researchers themselves provide labels for the implied

dimensions, and where such labels are not necessarily connected to the conventional descriptive

lexicon. We begin our study of timbre by polling musicians directly, asking them to provide

descriptions of various instrumental sounds. We then look for inter-subjectively reliable

descriptors shared across the musicians’ responses. Our approach aims to go beyond timbre as a

purely perceptual phenomenon to include the broader concept of qualia, that is, of the

phenomenological experience of sound that may extend beyond acoustical and perceptual

correlates to include cognitive, affective, cultural, and other facets. Timbre functions on different

scales of detail, and research has established that timbre is affected by both pitch height and

loudness (Siedenburg & McAdams, 2017). However, the current study invites perceptual

judgments that in principle include natural variations in pitch height and loudness as potential

integral components of timbre qualia insomuch as pitch height and loudness are a part of a

“typical” auditory image for any given instrument.

Descriptions of instrumental timbre can be gathered via interviews; for example, Traube

(2004) used interviews to investigate the timbre semantics of the classical guitar, while Nykänen,

Johansson, Lundberg, & Berg (2009) similarly employed interviews to examine descriptions of

saxophone timbre. One approach to investigating instrument qualia might have different

instruments play the same passage, and then have listeners describe the different qualia evoked

(e.g., Hailstone et al., 2009). The selection of suitable stimulus passages raises a number of

questions. A musical passage will convey a number of qualities independent of the nominal

timbre of the performed instrument. For example, the passage itself may be rather quiet or loud,

animated or subdued, somber or joyous, etc. The character of the passage is apt to have a marked

impact on how listeners describe the associated phenomenological qualia. In addition, different

performers are apt to exhibit idiosyncratic interpretations of the passage, as well as variations in

timbre. Even in the case of single isolated instrument tones, there are a number of variables that

can have a marked impact, such as the amount of reverberation, microphone proximity, mode of

articulation, pitch height, and overall dynamic level.

When we hear a recording of an actual oboe tone, the tone will differ in various ways

from how we might imagine an oboe tone to sound. The recorded tone may be higher in pitch

than expected, have a drier reverberation, a slightly breathier character, a faster vibrato, etc. That

is, actual sound stimuli commonly deviate from an internalized prototype in various ways.

This experience is not unique to sound. Suppose you are a computer scientist interested in

identifying the features of a human face. You recruit participants and ask them to describe

various faces. When presented with a photograph of a particular face, we tend not to identify that

the person has a nose, two eyes, and a mouth (the type description). Instead, we offer

descriptions like aquiline nose, close-set eyes, and small mouth (token description).¹ Such

descriptions make sense only because they identify deviations from an average or prototypical

face—an intersubjective template that is shared between observers. That is, descriptions tend to

focus on how something differs from a common internalized norm or prototype.

Similarly, when asked to describe a recorded oboe tone, there is a temptation to describe

the token rather than the type. In much timbre research (such as ours), the goal is to identify the

timbral qualities of a type rather than a token. That is, we aim to characterize the qualia features

of an “oboe,” rather than the qualia features associated with a particular recording of an oboe.

One might identify two approaches whose goal is collecting type rather than token

descriptions. One approach is to expose participants to a large number of token instances and

then seek the commonalities. For example, one might have listeners describe 40 contrasting oboe

recordings, including different players, different tempos, different pitches, different reverberant

environments, different vibratos, different musical styles, different microphone proximities, and

so on. In analyzing the results, the challenge is to eliminate the token descriptions while retaining

the type descriptions—a task that must be done without any a priori knowledge of which

descriptions might be token or type.

An alternative approach might endeavor to tap into listeners’ existing mental stereotypes

or cliché images of typical sounds produced by different instruments. Specifically, one might ask

participants to describe imagined sounds rather than actual sounds. The assumption is that mental

images of instrument sounds are more likely to represent prototypical instrument characteristics

and common associations.

¹ Note that we intend the terms type and token to be taken generally here; we do not intend to imply the narrower
definitions of the terms commonly used in linguistics.

This “imagined sounds” approach raises two critical methodological questions. First, how

vivid are these imagined sounds? Secondly, how closely do these imagined sounds approximate

a prototypical or commonplace sound for a given instrument?

With regard to the vividness of imagined sounds, Halpern, Zatorre, Bouffard, and

Johnson (2004) conducted a multidimensional scaling task and showed that people compare

imagined timbres similarly to how they compare perceived timbres. Accompanying

neuroimaging results confirmed that activity in the auditory association cortices is present for

both perceived and imagined timbre. Similarly, Tużnik, Augustynowicz, and Francuz (2018)

trained participants to imagine synthesized auditory stimuli varying in timbre and found that

electrophysiological measurements reflected differences in imagined timbres. Their results also

suggest that musicians perform timbre imagery tasks more accurately than do non-musicians.

Although these results are suggestive, it remains possible that perceived and imagined

sounds nevertheless recruit different networks and may rely on different auditory features.

Regarding the second question of the prototypicality of what people imagine, pertinent

findings are reported in Huron (2006, Chapter 4). Huron asked musicians to imagine a wide

variety of sounds, including individual tones, chords, rhythms, etc. When imagining a single

tone, for example, he found that musician participants tend to imagine tones very near the precise

center of the distribution for actual pitches in Western music. Specifically, the mean pitch for

imagined tones was just two semitones away from the actual mean pitch for a large sample of

music. Similarly, when asked to imagine a chord, musicians tend to imagine a major chord in

root position—the most commonly occurring chord in Western music. When asked to imagine

any rhythm, musicians tend to imagine the most commonly occurring meter at the most

commonly occurring tempo. In general, whether imagining pitches, scale degrees, chords, or

rhythms, Huron found a high rate of intersubjective agreement among his participants, suggesting

broadly shared prototypes (Huron, 2006).

In the current study, when asked to imagine an oboe tone, for example, many participants

reported imagining a “tuning A” tone commonly played by an oboe when initiating an orchestral

tuning session. One might question whether a “tuning A” oboe tone represents a prototypical

oboe sound. Nevertheless, for non-oboe players, the “tuning A” tone is likely the most common

experience they have of hearing an isolated oboe sound.

In summary, although the use of “imagined sounds” may initially seem questionable for

timbre studies, the evidence nevertheless suggests that, especially for musician participants,

imagined sounds can be remarkably vivid, and imagined sounds are more likely to be

representative of average, commonplace, or prototypical sounds compared with descriptions of

actual sound stimuli. If the goal is to identify type rather than token timbral features, imagined

sounds may well be superior to actual sound recordings. Finally, we should note that an

important attraction of the imagined sound method is that it is easier to administer and avoids

many thorny issues related to selecting or recording nominally representative stimuli. Ultimately,

we anticipate that both the value and limitations of the imagined sound method will become

evident through future practical research experience.

The purpose of this study is to build a timbre qualia model that might prove useful in

music analysis. Our approach concentrates on identifying intersubjectively consistent verbalized

descriptions of instrument timbres. Constructing the model involves two phases: an exploratory

phase intended to solicit a wide range of timbre descriptors and a model-building phase in which

the large number of descriptors identified in the exploratory phase is distilled to a more

parsimonious model.

Preliminary Study

In soliciting timbre descriptions as part of Study 1 (described below), we aimed to have

participants characterize a wide range of contrasting and diverse sounds in order to capture as

much descriptive variance as possible. One way to achieve such a breadth would be to include a

large number of contrasting musical instruments. For example, a suitable list might include an

African mbira, Chinese er-hu, Australian didgeridoo, Western violin, Indonesian peng ugal, etc.

As will be described later, our methodology requires that the instruments used as stimuli be

familiar to our participants. Consequently, in conducting the qualia survey, it was necessary to

restrict the target sounds to familiar Western musical instruments. Given the criterion of

familiarity, we assembled a list of 44 instruments (shown in Table 1) that are likely to be familiar

to Western musician participants.

Table 1. List of instruments used in the preliminary study.

piccolo             French horn    cymbals        acoustic guitar
flute               trumpet        bass drum      electric guitar
oboe                trombone       marimba        bass guitar
English horn        bass trombone  vibraphone     steel guitar
clarinet            tuba           triangle       banjo
bass clarinet       violin         temple blocks  mandolin
bassoon             viola          wood block     harmonica
contrabassoon       cello          piano          accordion
alto saxophone      double bass    harpsichord    bagpipes
tenor saxophone     timpani        organ          kazoo
baritone saxophone  snare drum     harp           didgeridoo

In describing these 44 instruments, we anticipated that our participants might produce a

large number of descriptive terms. Because our intention is to use these descriptive terms in a

subsequent study, an additional concern was that Study 2 should not become excessively long.

Consequently, it is appropriate to consider using a reduced list of instruments—the shorter the



list, the greater the tractability of the study. At the same time, reducing the number of

instruments is likely to reduce variance and generality.

Accordingly, prior to the interviews in Study 1, a preliminary study was conducted in

order to select a subset of the instruments from Table 1 considered to be maximally diverse in

timbre. In order to minimize experimenter bias, we recruited 17 independent musician judges for

this preliminary study (for participant demographics, see Appendix A.1.). Given the instruments

in Table 1, the participants were asked to create a subset of 20 instruments exhibiting the most

contrasting timbres. In presenting the list of instruments, the order was uniquely randomized for

each participant.

The participants produced seventeen 20-instrument sets deemed to exhibit maximally

contrasting timbres. However, no two sets contained the same 20 instruments. Consequently, the

17 sets were used as input to a computational procedure intended to generate the most timbrally

contrasting instrument group based on the musicians’ judgments. For each set of 20 instruments,

all possible pairs of instruments in the set were tallied. These tallies were then combined across

all 17 sets. This resulted in an aggregate dissimilarity score for each pair of instruments; the aim

is to discover which set of 20 instruments produces the highest aggregate dissimilarity score. The

final selected optimal set of maximally contrasting instruments is shown in Table 2.²

Table 2. Final 20-instrument set resulting from the preliminary study.

alto saxophone  cymbals       kazoo       timpani
bagpipes        English horn  oboe        triangle
banjo           flute         piano       tuba
bass clarinet   French horn   piccolo     vibraphone
bass drum       harp          snare drum  wood block

² It should be noted that not all 1.8 trillion 20-instrument combinations were evaluated, so there is no guarantee that
the apparent best set is the true optimum.
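
The pair-tally procedure from the preliminary study can be sketched as follows. The text does not specify the search strategy, so the greedy search below is an assumption chosen for illustration, as are the function names (`pair_tallies`, `set_score`, `greedy_contrast_set`) and the toy judge data. The final line confirms the size of the full search space mentioned in the footnote.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_tallies(judge_sets):
    """Count how many judges included both members of each instrument pair."""
    tallies = Counter()
    for chosen in judge_sets:
        for pair in combinations(sorted(chosen), 2):
            tallies[pair] += 1
    return tallies


def set_score(instruments, tallies):
    """Sum the pair tallies over every pair in a candidate set."""
    return sum(tallies[pair] for pair in combinations(sorted(instruments), 2))


def greedy_contrast_set(candidates, tallies, size):
    """Assumed heuristic: seed with the top pair, then add the best gain each step."""
    seed = min(tallies, key=lambda pair: (-tallies[pair], pair))
    chosen = list(seed)
    while len(chosen) < size:
        remaining = [c for c in sorted(candidates) if c not in chosen]
        best = max(
            remaining,
            key=lambda c: sum(tallies[tuple(sorted((c, m)))] for m in chosen),
        )
        chosen.append(best)
    return sorted(chosen)


# Toy data: four hypothetical judges each pick 3 of 5 instruments.
candidates = ["banjo", "flute", "kazoo", "oboe", "tuba"]
judge_sets = [
    {"flute", "kazoo", "tuba"},
    {"flute", "kazoo", "tuba"},
    {"banjo", "kazoo", "tuba"},
    {"flute", "oboe", "tuba"},
]
tallies = pair_tallies(judge_sets)
selected = greedy_contrast_set(candidates, tallies, size=3)
selected_score = set_score(selected, tallies)

# Exhaustive search is feasible for the toy case but not for 44 choose 20.
exhaustive_best = max(set_score(s, tallies) for s in combinations(candidates, 3))
search_space = comb(44, 20)
```

On this toy input the greedy result matches the exhaustive optimum, but a heuristic of this kind offers no such guarantee in general, which is consistent with the caveat in the footnote.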

Study 1: Interviews

Having selected 20 contrasting instruments, Study 1 asked musician participants to imagine the

sounds produced by these 20 instruments and to describe their phenomenal experiences of the

imagined sounds. Descriptions were transcribed in situ and the comments later analyzed for

content.

Method

Previous experience with qualia research suggests that participants dislike typing, and that

participants are likely to provide more verbose descriptions in the presence of another person

than when alone. If asked to type an open-ended response, participants typically type a few

words before wanting to move on. On the other hand, participants can be quite effusive when

speaking, especially in conversation. Accordingly, interviews were conducted live, with the

researcher typing a verbatim transcript of the comments while the participant spoke. Participants

were asked to imagine the sound of a specific instrument, to make ratings of familiarity and

vividness, and then to re-imagine and describe the sound they were imagining in as much detail

as possible. Detailed instructions for the interviews are included in Appendix B.2.

After reading and listening to the instructions, participants were asked to imagine and

describe the sounds of the target instrument. Each participant described the 20 instrument sounds

(Table 2) in a unique random order. As evident in the instructions, participants rated the

vividness of the imagined sound, as well as their familiarity with the sound of the instrument.

Additionally, if a participant’s primary instrument was not on the list, the participant was further

asked to imagine and describe the sound of their primary instrument.



Twenty-three musicians participated in the interviews (for participant demographics, see

Appendix A.2.). Most interviews lasted between 40 and 60 minutes, though some

interviews lasted 90 minutes or more. In post-interview follow-ups, participants were asked their

impressions of the task. Nearly all participants reported that the task was somewhat or

considerably difficult. Despite these reservations, our participants nevertheless were able to

describe at length all of the instruments involved in the study. Moreover, in the debriefing

sessions following the interviews, all participants responded positively to the task, including

those who described it as challenging.

Results

Each of the 23 participants described the target 20 instruments. Some participants, whose

primary instrument or instruments were not among the target instruments, opted to describe their

own instrument(s) in addition to the target 20. Altogether, participants provided 477 descriptions

of individual instruments. Each instrument description was parsed into terms or phrases,

hereafter referred to as component ideas. Across all instrument descriptions, parsing yielded

4,809 component ideas, representing an average total of 240.5 component ideas for each of the

imagined instrument sounds, across all participants. Ratings of familiarity and vividness were

closely correlated, r(475) = .70. The number of component ideas in a given instrument

description was positively correlated with both familiarity and vividness, r(475) = .28.
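
The counts reported above can be cross-checked with a few lines of arithmetic. This is only a consistency sketch. The variable names are ours, and the number of additional primary-instrument descriptions is derived from the reported totals rather than stated in the text.

```python
participants = 23
target_instruments = 20
descriptions = 477
component_ideas = 4809

# Descriptions beyond the 23 x 20 target grid (derived, not reported directly).
extra_descriptions = descriptions - participants * target_instruments

# Degrees of freedom for a Pearson correlation over all descriptions: n - 2.
degrees_of_freedom = descriptions - 2

# Average number of component ideas per target instrument, pooled over participants.
ideas_per_instrument = component_ideas / target_instruments
```

The degrees of freedom agree with the reported r(475), and the pooled average of 240.45 agrees with the reported 240.5 after rounding.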

The bagpipes, French horn, flute, and cymbals garnered the highest average number of

component ideas per participant (M = 11.8), whereas the wood block, alto saxophone, bass

clarinet, and snare drum garnered the lowest average number of component ideas per participant

(M = 8.6). Notice, however, that the range of average number of component ideas for each

participant-instrument combination is comparatively small (8.4–12.2).

Content Analysis

The purpose of the content analysis was to distill each comment into one or more component

ideas. In their responses, some participants tended to provide a perfunctory list of adjectives,

whereas other participants offered more effusive descriptions, sometimes resorting to elaborate

metaphors or narratives. The following three responses illustrate the range of response styles,

with each followed by the list of component ideas arising from the content analysis:

[Oboe] “Oboe is like grass for some reason. I guess it has a lot of

tonal information. Like it produces more around the tone. It’s very

woody, grainy; it’s bright but with a few shades of darkness, shady.

It’s complex; I find it poignant. It’s mostly yellow with some green

undertones. It’s sort of protagonistic, I think, I always think of oboe as

really loud, I don’t know why. Piquant, savory-sweet, but not very

full-bodied. It sounds like a tendon. Yeah, like a skinny muscle.”

COMPONENT IDEAS: “like grass,” “has a lot of tonal

information,” “produces more around the tone,” “woody,”

“grainy,” “bright,” “shades of darkness,” “shady,” “complex,”

“poignant,” “yellow with green undertones,” “protagonistic,”

“loud,” “piquant,” “savory-sweet,” “not full-bodied,” “like a

tendon/skinny muscle.”

[Harp] “Magical, resonant again in a way. That even though it is a

percussive sound, it seems to last longer than other percussive

instruments. Elegant, sometimes twinkling; it’s hard to imagine an

ominous harp. I can’t remember when a harp was ever evil. A very

visceral instrument—you can hear the sound being produced even

though it’s so delicate-sounding usually in character.”

COMPONENT IDEAS: “magical,” “resonant,” “percussive,”

“sound lasts longer,” “elegant,” “twinkling,” “not

ominous/evil,” “visceral,” “delicate.”

[Bass clarinet] “Deep, woody, dark, hollow, very smooth, even

mysterious or dark and brooding.”

COMPONENT IDEAS: “deep,” “woody,” “dark,” “hollow,”

“very smooth,” “mysterious,” “dark,” “brooding.”

In describing various instrument sounds, it was not uncommon for participants to relay

particular stories or associations. For example, in describing the wood block, one participant

relayed her experience playing a wood block in a percussion class she had taken. Since personal

associations can be rather arbitrary, we specifically excluded autobiographical anecdotes or

personal associations from the content analysis. In addition to eliminating anecdotes and

associations, we also eliminated references to specific musical works (e.g., “Bolero”). These

criteria resulted in the exclusion of some 533 component ideas (i.e., roughly 11% of all collected

comments), narrowing the number of component ideas from 4,809 to 4,276. Subsequently, the

modifiers “more,” “most,” “many,” “few,” “less,” “least,” “somewhat,” “a lot of,” and “very”

were also trimmed from the descriptions, as were negations such as “not.”

Eliminating duplicate component ideas, modifiers, and negations from the list of 4,276

resulted in 2,487 unique component ideas, a number deemed too large for manual sorting.

Consequently, we focused on only the 502 component ideas that were mentioned more than

once. Table 3 identifies the 50 most common component ideas. It should be noted that because

we did not control for how many times a single participant used any given term, it is possible

that these counts may be skewed by participants who favored the reuse of certain words. Words

with common roots (e.g., power and powerful) have been combined. Together, these 50 ideas

account for 1,874 of 4,276 component ideas (43.8%).
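The trimming and counting steps described above were carried out on the interview transcripts themselves. Purely as an illustration, they could be sketched in Python as follows. The function names (`normalize`, `frequent_ideas`) and the sample phrases in the usage note are hypothetical, and the sketch does not attempt to merge words with common roots (e.g., power and powerful).

```python
import re
from collections import Counter

# Modifiers and negations named in the text; multiword items are listed first.
MODIFIERS = ["a lot of", "somewhat", "least", "less", "most", "more", "many", "few", "very"]
NEGATIONS = ["not"]


def normalize(idea: str) -> str:
    """Lowercase a component idea and strip modifiers and negations."""
    text = idea.lower().strip()
    for word in MODIFIERS + NEGATIONS:
        # Whole-word match only, so "less" does not alter "breathless".
        text = re.sub(rf"\b{re.escape(word)}\b", " ", text)
    return " ".join(text.split())


def frequent_ideas(ideas, min_count=2):
    """Count normalized ideas and keep those mentioned at least min_count times."""
    counts = Counter(normalize(idea) for idea in ideas)
    counts.pop("", None)  # drop entries that consisted only of modifiers
    return {idea: n for idea, n in counts.items() if n >= min_count}
```

For example, given the hypothetical list `["very smooth", "smooth", "not ominous", "Deep", "deep"]`, the sketch would retain "smooth" and "deep" (two mentions each) and set aside "ominous" (one mention).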

Table 3. Top 50 most-frequently used component ideas from interviews

Descriptive Term Number of Occurrences

1. round 85
2. resonant 84
3. warm 83
4. loud 83
5. low 73
6. bright 69
7. clear 67
8. metallic 65
9. high 58
10. sharp 56
11. deep/depth 54
12. piercing 47
13. soft 47
14. colorful/color 43
15. nasal 40
16. airy 39
17. full 39
18. rich 38
19. hollow 38
20. mellow 35
21. pure 35
22. buzz/buzzy 35
23. voice/vocal 35
24. smooth 34
25. dark 34
26. focused 34
27. light 32
28. big 31
29. cut/cuts/cutting 31
30. reedy 30
31. open 29
32. percussive 28
33. ringing/rings 28
34. direct 25
35. complex 24
36. beautiful 23
37. sweet 23
38. shrill 22
39. brassy 22
40. sustain 21
41. powerful/power 21
42. thin 18
43. harsh 17
44. rough 17
45. pretty 15
46. fat 15
47. gentle 15
48. annoying 13
49. woody 12
50. twangy 12

The 502 component ideas were printed on individual slips of paper that were used in an

ensuing pile sort task (de Munck, 2009; see Appendix B.4 for instructions) that was conducted

independently by both authors. We assembled the 502 component ideas into 59 and 70

categories; Table 4 shows the descriptive labels for the categories from both pile sorts. Upon

discussion of both analyses, we agreed on a list of reconciled categories, which are reported in

the first column of Table 4.

Consider, by way of example, the categories “aesthetic” and “valence” created by

Experimenter 1, and the category “beautiful/pleasant” created by Experimenter 2. Experimenter

1’s pile sort for the “aesthetic” category included the terms “beautiful,” “elegant,” “cute,” and

“pretty.” The “valence” category included the terms “positive,” “nice,” “lovely,” “pleasant,”

“unpleasant,” and “painful.” By comparison, Experimenter 2 created a single

“beautiful/pleasant” category, which included the terms “beautiful,” “lovely,” “nice,” “pretty,”

“pleasant,” “positive,” “melodious,” and “melodic.” The divergent interpretations were

reconciled by forming a single category, “beautiful.”

In reconciling the two lists and naming the reconciled categories, we referred back to the

list of 4,276 component ideas in order to take into account the number of times a given idea

appeared in the transcripts. In the above example, for the various component ideas included in

the three combined pile sort categories (“aesthetic,” “valence,” and “beautiful/pleasant”), the

most frequent word by far was the word “beautiful”—hence the revised name for the combined

category. Notice that the ideas “painful” and “unpleasant” no longer belong in the final resolved

category. It was agreed that these ideas could be accommodated under the reconciled category of

“shrill/harsh/annoying.”

This reconciliation procedure continued until all of the original pile-sort categories from

both experimenters were distilled into 75 descriptive categories shown in Table 5 (for a complete

account of both experimenters’ categories and how these categories are related to the reconciled

categories, see online supplemental material). As described above, in providing labels or names

for our final resolved categories, we paid close attention to the frequency of occurrence of

different ideas. Hence, for example, in the category “gentle/calm,” the word “gentle” occurred

most frequently, followed by the word “calm.” Other terms belonging to that category, including

“relaxing” and “peaceful,” occurred less frequently.



Recall that each of the 502 component ideas used in the pile sort task occurred a

minimum of twice in the qualia interview transcripts. As a further check of the inclusiveness of

the 75-category list, we went back to the 2,487 unique component ideas in the transcripts and

assigned each idea to its most appropriate category. Using this procedure, we found that 1,607 of

the 2,487 unique component ideas could be accommodated within the 75-fold classification

taxonomy. Of the unclassified component ideas, the experimenters agreed that 67 of these

orphan component ideas could be accommodated by adding two more categories, deemed

visceral (44) and grainy/gravelly (23). These additional categories are included at the bottom of

Table 5 and indicated by the † symbol. Accordingly, our final taxonomy consists of 77

categories that are able to classify all of the timbre descriptors that occurred more than once, and

67% of all the unique timbre descriptions offered by our participants.

Table 4. Final list of 77 categories resulting from content analysis.

aggressive             funny/comical          pinched/constrained      soaring/floating
airy/breathy           gentle/calm            ping/ding/ting           soft/smooth
beautiful              grainy/gravelly †      powerful                 sparkling/shimmering
big                    happy/joyful           precise/clean            supportive/foundational
brassy                 heavy                  pure                     sustained/even
bright                 heroic/noble           quick decay              sweet
brilliant              high                   raspy/guttural           thick/fat
buzzy                  hollow                 reedy                    thin/narrow
clear                  light (in weight)      resonant/vibrant         twangy
colorful               loud                   rich/complex             unclear/indistinct
commanding/assertive   low                    ringing/long decay       unique/distinct
cool/cold              metallic               rough                    versatile/flexible
cute/innocent          mournful/wailing       round                    visceral †
dark                   muted/veiled           rumbling/booming         warm
deep                   mysterious/ethereal    sad/melancholy           watery/fluid
direct/projecting      nasal                  salient/present          wavy/undulating
dramatic/expressive    noisy                  serious/solemn           woody
focused/compact        open                   shrill/harsh/annoying
folk-like/pastoral     percussive             simple
full                   piercing/sharp         singing/voice-like

† Categories added after review.

Study 2: Rating Task

In any classification system, it may be possible to create a more parsimonious scheme by

combining or eliminating categories with high shared variance. Accordingly, we conducted a

second study in which participants judged each instrument according to all 77 categories. This

task provided data for subsequent Principal Component Analyses.

Method

As in Study 1, recruitment for Study 2 focused on professional orchestral musicians, conductors,

and composers. Participants (n = 460) were recruited in two ways. First, participants were

recruited via the Internet (n = 399) using email listservs and social media. This subset of

participants took the study in a self-determined location. Second, participants were recruited

from the Ohio State University School of Music subject pool (n = 61). Subject pool participants

were second-year undergraduate music students and were tested individually in an Industrial

Acoustic Corporation sound attenuation room. All participants took part in the study online using

the same Qualtrics survey. (For participant demographics, see Appendix A.3.)

For each instrument rated, participants were given a list of the 77 component ideas and

asked to rate how well each of the terms described the sound of the instrument as imagined by

the participant. Terms were rated on a scale of 1 (not …) to 7 (very …). Full instructions for this

task can be found in Appendix B.4.

Recall that we limited the number of instruments used in Study 1 in order to preempt the

problem of having too many judgments for participants in Study 2. However, Study 1 produced

an unexpectedly large number (77) of descriptive categories. Despite our efforts, given the 20

instruments used in Study 1 and 77 descriptors, a full set of ratings would require 1,540

judgments, which remains too many for a single participant. Consequently, in Study 2

participants judged a subset of the 20 instruments. Participants were asked to rate two randomly

selected instruments, resulting in 154 judgments. As noted in the instructions, having completed

two instruments, participants could further volunteer to rate additional instruments.

As in Study 1, participants were asked to imagine instrument sounds rather than listen to

recorded stimuli. Participants also rated their familiarity with the instrument and the vividness

with which they were able to imagine the sound. Five familiarity ratings were possible:

extremely familiar, very familiar, moderately familiar, slightly familiar, not familiar at all. Data

was retained only for those participants who reported moderate familiarity or better. With regard

to the reported quality of imagining the sound, no data were collected if a participant rated the

vividness as “not vividly at all.”



Data Quality

In order to better ensure data quality, we established further exclusion criteria to eliminate

responses that deviated significantly from responses by other participants. Specifically, for each

individual set of 77 ratings for a given instrument, we calculated the paired correlation with the

ratings from all other participants for that instrument. A priori we established an exclusion

criterion of r = .25; that is, if a given participant exhibited an average correlation less than r =

.25 with the other participants judging that instrument, then their data was excluded from further

analysis. In order to avoid a situation in which much or most data was discarded, we also a priori

established that no more than 25% of instrument-participant judgments would be eliminated and

that at least 20 judgments would be available for each instrument. If necessary, the r = .25

average correlation criterion would be weakened in order to satisfy either or both of these

conditions. However, neither of these retention conditions arose and so there was no need to

weaken the correlation criterion. In the end, 70 of 1,571 instrument judgments failed to achieve

the r = .25 average correlation, representing a total reduction of 4.5% of participant responses.

After this exclusion, a total of 1,501 judgments remained. Overall, the average inter-rater

correlation across all included participants and all instruments after exclusions was r = .50.
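As a rough illustration of the exclusion rule described above, the following Python sketch computes, for one instrument, each participant's average Pearson correlation with every other participant and flags those falling below r = .25. It is a minimal sketch under stated assumptions, not the analysis code used in the study: the function names are hypothetical, NumPy is assumed, and `ratings` is assumed to be an array with one row per participant and one column per rated category.

```python
import numpy as np


def mean_correlation_with_others(ratings: np.ndarray) -> np.ndarray:
    """Average Pearson r between each participant (row) and all other rows."""
    corr = np.corrcoef(ratings)  # rows are treated as variables
    n = corr.shape[0]
    # Subtract the diagonal (self-correlation of 1) before averaging.
    off_diagonal_sum = corr.sum(axis=1) - np.diag(corr)
    return off_diagonal_sum / (n - 1)


def retention_mask(ratings: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """True for participants whose average correlation meets the criterion."""
    # Note: a participant who gives the same rating to every category has an
    # undefined (NaN) correlation and is therefore not retained by this test.
    return mean_correlation_with_others(ratings) >= threshold
```

Applied separately to the ratings for each instrument, a mask of this kind would identify which instrument-participant judgments to retain.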

Results

Recall that participants were asked to rate two instruments but could opt to continue to rate as

many instruments as they wanted. Partial data in which a participant only completed the rating of

a single instrument was included. Seventy-nine (17%) participants rated only a single instrument;

148 (32%) participants rated two instruments; 165 (36%) rated between 3 and 5 instruments; 48

(10%) rated between 6 and 10 instruments; 20 (4%) especially eager participants rated between

11 and 20 instruments.

Since collection of data for a given instrument was avoided if participants gave low

familiarity or vividness ratings, more data was consequently collected for those instruments that

are generally more familiar to the participant pool. Instruments received between 51 and 104

ratings for all 77 categories (for more detail, see Appendix C).

Table 5 identifies the instruments with the highest and lowest mean ratings for each of

the 77 categories.

Table 5. Highest- and lowest-rated instruments.

Descriptive Category   Highest-rated Instrument   Mean   SD   Lowest-rated Instrument   Mean   SD

Aggressive Crash cymbals 5.88 1.39 Harp 1.36 0.76
Airy/Breathy Flute 5.45 1.65 Wood block 1.56 1.17
Beautiful Harp 6.53 0.85 Kazoo 1.60 0.97
Big Bass drum 6.49 0.80 Triangle 1.79 1.26
Brassy French horn 5.70 1.11 Harp 1.14 0.39
Bright Triangle 6.56 0.66 Bass drum 1.56 1.00
Brilliant Piccolo 5.68 1.56 Bass drum 2.21 1.66
Buzzy Kazoo 6.74 0.83 Wood block 1.36 0.88
Clear Piccolo 6.09 1.01 Kazoo 2.71 1.50
Colorful English horn 5.54 1.48 Bass drum 2.36 1.57
Commanding/Assertive Snare drum 6.19 0.95 Harp 2.18 1.48
Cool/Cold Triangle 4.12 2.08 Timpani 2.08 1.34
Cute/Innocent Triangle 4.92 1.78 Bass drum 1.24 0.59
Dark Bass clarinet 5.88 1.03 Triangle 1.31 0.76
Deep Bass drum 6.57 0.72 Triangle 1.29 0.63
Direct/Projecting Bagpipes 6.36 1.00 Harp 2.99 1.62
Dramatic/Expressive English horn 5.90 1.20 Wood block 2.22 1.55
Focused/Compact Wood block 6.08 1.20 Harp 3.01 1.80
Folk-like/Pastoral Banjo 5.85 1.55 Crash cymbals 1.49 1.04
Full Tuba 6.30 0.89 Triangle 2.55 1.58
Funny/Comical Kazoo 6.59 0.91 Harp 1.48 0.95
Gentle/Calm Harp 6.31 0.95 Crash cymbals 1.24 0.58
Grainy/Gravelly Kazoo 4.71 1.98 Harp 1.18 0.45
Happy/Joyful Piccolo 5.35 1.39 Bass drum 2.43 1.41
Heavy Bass drum 6.42 0.85 Triangle 1.28 0.59
Heroic/Noble French horn 6.25 1.01 Kazoo 1.16 0.52
High Piccolo 6.90 0.39 Tuba 1.28 0.63
Hollow Wood block 5.40 1.61 Tuba 2.08 1.45
Light (in weight) Triangle 6.05 1.34 Bass drum 1.44 0.90
Loud Bagpipes 6.74 0.60 Harp 2.36 1.39
Low Tuba 6.52 0.99 Piccolo 1.03 0.17
Metallic Crash cymbals 6.49 1.16 Wood block 1.29 0.65
Mournful/Wailing English horn 4.76 1.91 Wood block 1.22 0.53
Muted/Veiled French horn 3.55 1.90 Crash cymbals 1.24 0.62
Mysterious/Ethereal Harp 5.81 1.71 Kazoo 1.27 0.71
Nasal Kazoo 6.42 1.09 Bass drum 1.25 0.85
Noisy Crash cymbals 6.45 1.19 Harp 1.38 0.69
Open Flute 5.10 1.52 Kazoo 2.96 1.66
Percussive Snare drum 6.94 0.27 English horn 1.29 0.60
Piercing/Sharp Piccolo 6.39 1.18 Tuba 1.72 1.12
Pinched/Constrained Kazoo 5.12 1.70 Timpani 1.65 0.97
Ping/Ding/Ting Triangle 6.85 0.58 Bass clarinet 1.10 0.31
Powerful Bass drum 6.60 0.66 Triangle 2.33 1.41
Precise/Clean Wood block 6.08 1.18 Kazoo 2.08 1.28
Pure Harp 6.03 1.24 Kazoo 1.66 0.97
Quick decay Wood block 6.04 1.72 Bagpipes 2.26 1.74
Raspy/Guttural Kazoo 4.25 2.06 Harp 1.12 0.32
Reedy Oboe 6.18 1.34 Bass drum 1.07 0.26
Resonant/Vibrant Vibraphone 6.15 1.21 Kazoo 3.07 1.77
Rich/Complex English horn 5.63 1.39 Wood block 1.97 1.26
Ringing/Long decay Crash cymbals 5.94 1.52 Wood block 1.46 0.83
Rough Kazoo 4.99 1.71 Harp 1.16 0.51
Round Tuba 5.88 1.37 Kazoo 2.01 1.38
Rumbling/Booming Bass drum 6.62 0.90 Piccolo 1.14 0.49
Sad/Melancholy English horn 5.32 1.59 Kazoo 1.32 0.60
Salient/Present Bagpipes 5.83 1.51 Bass clarinet 3.92 1.40
Serious/Solemn English horn 5.27 1.52 Kazoo 1.10 0.41
Shrill/Harsh/Annoying Kazoo 5.90 1.53 Harp 1.27 0.70
Simple Wood block 6.33 1.11 Bagpipes 2.78 1.73
Singing/Voice-like Flute 5.25 1.46 Snare drum 1.14 0.73
Soaring/Floating Flute 5.76 1.34 Snare drum 1.57 1.07
Soft/Smooth Harp 5.47 1.46 Snare drum 1.37 0.73
Sparkling/Shimmering Triangle 5.95 1.39 Bass drum 1.26 0.58
Supportive/Foundational Tuba 6.34 0.97 Kazoo 1.58 1.21
Sustained/Even Oboe 5.48 1.41 Wood block 2.32 1.92
Sweet Harp 5.61 1.32 Snare drum 1.39 0.75
Thick/Fat Tuba 6.28 1.05 Triangle 1.22 0.52
Thin/Narrow Piccolo 4.90 1.76 Bass drum 1.25 0.62
Twangy Banjo 6.61 0.77 Bass drum 1.18 0.47
Unclear/Indistinct Kazoo 3.37 1.89 Wood block 1.54 1.16
Unique/Distinct Bagpipes 6.76 0.62 Bass drum 4.31 1.67
Versatile/Flexible Piano 4.99 2.02 Wood block 2.01 1.38
Visceral Bass drum 5.40 1.90 Triangle 2.64 1.82
Warm French horn 5.70 1.28 Triangle 2.17 1.35
Watery/Fluid Harp 4.61 2.17 Wood block 1.23 0.62
Wavy/Undulating Vibraphone 4.48 2.28 Wood block 1.15 0.46
Woody Wood block 6.54 1.05 Crash cymbals 1.04 0.20

Table 6 reports the most highly rated descriptive category for each instrument along with the

corresponding average rating value. Ratings were made on a scale of 1 (“not”) to 7 (“very”).

Table 6. Highest-rated category for each instrument.

Instrument   Highest-rated category   Mean   SD

Alto saxophone Direct/Projecting 5.70 1.32
Bagpipes Unique/Distinct 6.76 0.62
Banjo Twangy 6.61 0.77
Bass clarinet Low 6.14 1.04
Bass drum Rumbling/Booming 6.62 0.90
Crash cymbals Loud 6.66 0.63
English horn Reedy 6.07 1.25
Flute High 6.02 1.20
French horn Heroic/Noble 6.25 1.01
Harp Beautiful 6.53 0.85
Kazoo Buzzy 6.74 0.83
Oboe Unique/Distinct 6.35 0.76
Piano Clear 5.99 1.07
Piccolo High 6.90 0.39
Snare drum Percussive 6.94 0.27
Timpani Powerful 6.51 0.79
Triangle Ping/Ding/Ting 6.85 0.58
Tuba Low 6.52 0.99
Vibraphone Resonant/Vibrant 6.15 1.21
Wood block Percussive 6.77 0.56

PCA Optimization

PCA is almost always used in an effort to create the most parsimonious model, identifying the

smallest number of components that are able to account for the greatest variance. However, in

increasing parsimony, the full richness of the phenomenon is diminished. It is evident from the

results of the interviews that when given the opportunity, people can be very effusive in their

descriptions of timbre qualia, using various linguistic approaches and a diverse vocabulary. This

creativity and complexity suggest that a relatively large number of components may need to be

retained in order to explain sufficient variance; however, the ideal model must balance

explanatory power with practicality.

In PCA, the researchers are tasked with choosing the number of components to retain.

Selecting “too many” factors would mean that some of the factors have little or no utility or

relevance; selecting “too few” factors would mean that useful or relevant information has been

unwisely discarded. Given that the goals of the project include capturing as much of the semantic

richness of timbre perception as possible, one might argue that it is better to err on the side of

including “too many” factors rather than “too few.”

PCA is considered an exploratory statistical method rather than an inductive statistical

tool for testing hypotheses. Often, the number of components to be retained is based on one of a

number of possible rules of thumb, such as setting a cutoff based on some percent of the

cumulative variance explained, eigenvalues, and/or a scree plot, among other methods. These

common approaches are relatively informal, based primarily on their intuitive appeal and

practicality (Jolliffe, 2002). After examining the results and considering various common

heuristics for choosing a model, we determined that we were not satisfied by the models

suggested by these common rules of thumb. Thus, we began our model-building process instead

by exploring a wide range of PCA models (for full details on the PCA and interpretation process,

see Appendix F). To assess sampling adequacy, the Kaiser-Meyer-Olkin (KMO) statistic was

calculated. The overall measure of sampling adequacy (MSA) of .95 suggested that PCA was

appropriate for the data. We began by looking at models containing 10, 20, 30, 40, 50, and 60

components. All potential models were rotated using the promax rotation to aid the interpretation

of the resulting components.3
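For readers unfamiliar with the rules of thumb mentioned above, the following Python sketch shows how two of them (a cumulative-variance cutoff and an eigenvalue-greater-than-one rule) could be computed from the eigenvalues of a correlation matrix. This is only an illustration under stated assumptions: the analyses reported here were produced using R, the function names are hypothetical, NumPy is assumed, the 80% target is an arbitrary example value, and no rotation is applied.

```python
import numpy as np


def eigen_summary(data: np.ndarray):
    """Eigenvalues of the correlation matrix (descending) and cumulative variance."""
    corr = np.corrcoef(data, rowvar=False)  # columns are the rated categories
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    return eigenvalues, cumulative


def components_for_variance(cumulative: np.ndarray, target: float = 0.80) -> int:
    """Smallest number of components whose cumulative variance reaches target."""
    index = int(np.searchsorted(cumulative, target))
    return min(index + 1, len(cumulative))


def eigenvalue_rule_count(eigenvalues: np.ndarray) -> int:
    """Number of components with an eigenvalue greater than one."""
    return int((eigenvalues > 1.0).sum())
```

Plotting `eigenvalues` against component number would give the scree plot; as noted above, none of these heuristics was ultimately used to select the final model.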

In order to meaningfully compare models, care must be taken in assigning descriptive

labels to components. To label each component, we relied on an automated routine that ordered

the terms loading on a component from strongest to weakest. Up to five of the most strongly-

loading terms were included in the interpretive label for a given component. In order to simplify

interpretation and to guarantee that the terms included in the names of the components were

maximally representative of their components, we chose a relatively high threshold when

judging the strengths of the loadings; we agreed that ± .65 offered both a reasonable and

conservative cutoff. Consequently, in assigning a label to each principal component, we selected

only terms whose loadings had an absolute value of at least .65, and we included no more than five

terms in the name of a component.
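The labeling rule can be sketched in a few lines of Python. This is a hypothetical reconstruction for illustration, not the automated routine itself: the function name is an assumption, as is the representation of a component as a dictionary mapping each term to its loading.

```python
def label_component(loadings, threshold=0.65, max_terms=5):
    """Return up to max_terms terms with |loading| >= threshold, strongest first."""
    strong = [(term, value) for term, value in loadings.items() if abs(value) >= threshold]
    # Order by strength of loading regardless of sign.
    strong.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return [term for term, _ in strong[:max_terms]]
```

With hypothetical loadings of .81 for “sparkling/shimmering,” .74 for “brilliant,” and .62 for “bright,” the label would contain only the first two terms, because .62 falls below the cutoff.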

As might be expected, in the case of 50- and 60-component models, a majority of the

resulting components simply echoed one of the original 77 categories. In the case of the 10-

component model, many of the terms that loaded strongly (i.e., greater than or equal to ±.65) on

the same component seemed excessively heterogeneous. For example, one of the components in

the 10-component model was interpreted as containing 11 descriptors, including “brassy,”

3 Models using different rotations (promax, oblimin, varimax, simplimax, and quartimax) were produced using R.
These models were carefully compared in terms of their interpretability, and although they were highly similar, we
considered promax to have consistently produced somewhat simpler models. Our manual assessment was supported
by the MIC (mean item complexity) values for the models, which were slightly lower for promax rotations.

“reedy” and “funny/comical;” another component containing 21 descriptors included

“dramatic/expressive” along with “folk-like/pastoral” and “gentle/calm.”

Consequently, we continued the exploratory approach by comparing models containing

15, 20, 25, 30, 35, and 40 components and then continuing to narrow our exploration. The most

plausible or practical models appeared to range between 23 and 37 components. As might be

expected, the models within this range were highly similar. Serial neighbors were typically

distinguished by just one or two differences. Naturally, as the models grew in components, more

distinctions were made.

Common PCA practice has the experimenters choose a model and interpret the individual

components. In the current case, rather than leaving this task solely to the authors, one might

consider inviting independent musicians to choose an appropriate model. However,

systematically comparing several models, each containing two or three dozen components,

would be excessively time-consuming. Moreover, in comparing models, one often sees aspects

of one model that are appealing while other aspects seem deficient in some way. Accordingly,

rather than having musicians compare complete models, we created a superset of all of the

components evident in those timbre models containing between 23 and 37 components and asked

musicians to assess the pertinence (defined as “relevance” and “usefulness”) of each individual

component. During these assessment tasks, participants made their ratings based on the

descriptive labels that had been previously assigned to each component using the method

described above, which contained up to five of the words that loaded strongly (greater than or

equal to ± .65) onto the component. Among the 59 components of the superset, several groups of

components exhibited very similar labels. For these groups of highly similar components, ten

musician participants were recruited to identify which versions were most musically pertinent.

Their judgments reduced the number of components in the superset from 59 to 33.

Subsequently, another 12 musician participants rated the pertinence of these 33

components using a 100-point scale, where zero was defined as “not at all pertinent” and 100

was defined as “very pertinent.” The value 50 was explicitly defined as “moderately pertinent.”

Participants also selected the single best descriptive term for each component. Finally, freeform

comments were also collected. Pertinence rating results can be found in the online supplemental

material, Appendix F.

As noted earlier, in creating a timbre model, our preference is to err on the side of

including too many descriptive components rather than too few. However, at the same time, it is

important that a model be practical for the purposes of possible future music analyses. In

particular, we anticipate conducting future research where participants characterize each

instrument according to all of the dimensions or components in a timbre model. In the end, we

chose an average cut-off for pertinence ratings of 60, producing a model of 19 components—

which we suggest may offer a reasonable balance between inclusiveness and parsimony, given

our music analytic goal. As described below, we subsequently decided to include a twentieth

dimension based on participant comments.
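The cut-off step amounts to averaging the pertinence ratings for each component and retaining those at or above 60. A minimal Python sketch follows; the function name and the data layout (a dictionary of rating lists keyed by component label) are assumptions, and whether a mean of exactly 60 counts as retained is also an assumption.

```python
from statistics import mean


def retained_components(ratings_by_component, cutoff=60.0):
    """Keep components whose mean pertinence rating (0-100) meets the cutoff."""
    return [
        label
        for label, ratings in ratings_by_component.items()
        if mean(ratings) >= cutoff
    ]
```

For instance, with hypothetical ratings of 70, 80, and 65 for one component and 40, 55, and 62 for another, only the first would be retained.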

Participant Comments

In assessing the pertinence of individual components, we invited participant comments for each

term. In total, the participants provided 115 comments. Participants were notably dissatisfied

with the grouping of “watery/fluid” with the terms “soft/smooth,” “singing/voice-like,” “sweet,”

and “gentle/calm.” At the same time, participants indicated that they thought “watery/fluid” was

also pertinent. Consequently, this category was broken into two separate dimensions in the final

model: “watery, fluid,” and “soft/smooth, singing/voice-like, sweet, gentle/calm,” bringing the number

of dimensions in the final model up to 20.

Brightness

It should be noted that at this point, the model did not directly include the classic “brightness”

dimension that appears in virtually all studies of timbre. The word

“bright” was the sixth-most common descriptive term used in our interviews, attesting to its

importance. It was also one of the 77 intermediate categories that were distilled in our PCA

analyses. The category “bright” loaded most strongly on the component that included the highly

loading terms “sparkling/shimmering,” and “brilliant.” “Bright” consistently exhibited a high

loading on this dimension across a series of PCA models (.61–.63). Unfortunately, these values

were just below our a priori cut-off of ±.65. It is likely that “bright” did not break off into a

stronger independent category because of the high correlation between pitch height and

brightness. In the PCA models, “bright” generally shared most variance with the

sparkling/brilliant component while also exhibiting lower, but still considerable, negative

loadings on the rumbling/low component. Because of the importance of brightness in timbre

research, and because “bright” was a prominent descriptor in the interviews, we post-hoc

included this classic concept in the sparkling/brilliant dimension.

Timbre Qualia Model

The final dimensions of the model do not reflect the results of a single PCA model, but rather are

the products of a process involving close examination of two dozen PCA models guided by the

input of a group of independent musicians. The first word in each dimension description (Table

8, column 1) is the word that was chosen as most pertinent by the most participants in the

supplementary studies. The words that follow are the other categories that loaded strongly onto

the component, ordered by strength of loading.

In order to facilitate discussion of the timbre qualia dimensions of this model, we offer

shorthand labels for the dimensions, listed in the second column of Table 8. Each shorthand label

contains up to the first two terms listed in the full dimension description. Recall that many of the

descriptive terms during the component stage were compound, which we had indicated with a

slash (e.g., “rumbling/booming”). For the sake of brevity, only the first half of compound terms

was included in the shorthand label. When applicable, this was followed by the second term in

the full dimension description. When a shorthand label contained two terms, we joined these

terms together by using a slash to create a new compound term to serve as the shorthand label for

the dimension.
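The shorthand rule can be expressed compactly in Python. The sketch below is illustrative only: the function name is hypothetical, and the handling of dimensions that consist of a single term (which are kept unchanged, as with “watery/fluid” and “percussive”) is inferred from the shorthand labels listed in the table of final dimensions rather than stated in the rule itself.

```python
def shorthand_label(terms):
    """Build a shorthand label from an ordered list of component-stage terms."""
    if not terms:
        raise ValueError("a dimension needs at least one term")
    if len(terms) == 1:
        # Single-term dimensions keep their full (possibly compound) term.
        return terms[0]
    # Otherwise take the first half of each of the first two terms.
    first, second = (term.split("/")[0] for term in terms[:2])
    return f"{first}/{second}"
```

For example, `["rumbling/booming", "low", "deep", "thick/fat", "heavy"]` yields "rumbling/low", and `["raspy/guttural", "grainy/gravelly"]` yields "raspy/grainy".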

Table 7. Final model dimensions and shorthand labels.

Dimension description                                                        Shorthand label

1. rumbling, booming, low, deep, thick, fat, heavy                           rumbling/low
2. soft, smooth, singing, voice-like, sweet, gentle, calm                    soft/singing
3. watery, fluid                                                             watery/fluid
4. direct, projecting, loud, aggressive, commanding, assertive, powerful     direct/loud
5. nasal, reedy, buzzy, pinched, constrained                                 nasal/reedy
6. shrill, harsh, noisy                                                      shrill/noisy
7. percussive                                                                percussive
8. pure, clear, precise, clean                                               pure/clear
9. brassy, metallic                                                          brassy/metallic
10. raspy, guttural, grainy, gravelly                                        raspy/grainy
11. ringing, long decay                                                      ringing/long decay
12. sparkling, shimmering, brilliant, bright                                 sparkling/brilliant
13. airy, breathy                                                            airy/breathy
14. resonant, vibrant                                                        resonant/vibrant
15. hollow                                                                   hollow
16. woody                                                                    woody
17. muted, veiled                                                            muted/veiled
18. sustained, even                                                          sustained/even
19. open                                                                     open
20. focused, compact                                                         focused/compact

Discussion

Results from the open-ended transcriptions are consistent with previous research in timbre

linguistics suggesting that timbre description is largely metaphorical (Saitis & Weinzierl, 2019;

Wallmark & Kendall, 2018). Our participants used all of the strategies described by Porcello

(2004), who also analyzed spoken language: spoken/sung imitations, lexical onomatopoetic

metaphors, pure metaphors, association, and evaluation. Wallmark (2019a) proposes seven

conceptual categories of timbre description, resulting from a corpus study of orchestration

treatises, including Affect (emotional and aesthetic), Matter (physical features), Cross-modal

correspondence (descriptions borrowed from other senses), Mimesis (comparison to other

sounds), Action (physical actions or qualities of movement), Acoustics, and Onomatopoeia. In

general, interview transcriptions in the current study exhibited responses that fall into these

categories. More specifically, we can consider the final 20-dimensional model in relation to

Wallmark’s conceptual categories. Because our final dimensions are composed of up to five

terms, dimensions with multiple terms may belong to multiple conceptual categories.

While onomatopoeia was used frequently in interviews, these terms generally did not

make it through the dimension reduction process, largely because of the semantic variety of

onomatopoetic terms used. However, the word “buzzy,” which is part of the final dimension

nasal/reedy, does fall into Wallmark’s category of Onomatopoeia. Mimesis accounts for a few of

the terms, such as “nasal,” “reedy,” “singing,” and “voice-like.” Affective terms are also

relatively less common but include terms on the soft/singing and direct/loud dimensions. Action

terms occur with similar frequency and include “pinched,” “constrained,” and “open.” However,

the conceptual categories that are most common in our final dimension labels are Acoustics,

Cross-modal correspondence, and Matter, with Acoustics dominating in terms of overall

importance. Seven of the twenty dimensions can be categorized fully as Acoustic, and Acoustic

terms play roles in four other dimensions. A table of the dimension descriptions for the current

study and the conceptual categories into which they would likely be sorted on Wallmark’s

scheme can be found in Appendix G.

In a recent review of timbre semantics research, Saitis and Weinzierl (2019) write that the

most salient timbre dimensions have been found to be “brightness/sharpness (or luminance),

roughness/harshness (or texture), and fullness/richness (or mass)” (p. 135). Our 20-dimensional

model offers apparent parallels to these three dimensions: sparkling/brilliant maps onto

brightness/sharpness/luminance, raspy/grainy and/or shrill/noisy onto

roughness/harshness/texture, and rumbling/low onto fullness/richness/mass. One critical

observation, however, relates to the variance explained by each of our twenty dimensions. In

Appendix F, the dimensions are listed approximately by variance explained from

greatest to least (this ordering can only be approximate because the components ultimately were

drawn and chosen from the superset of components, rather than a single PCA model).4 However,

our model’s direct parallels to brightness and roughness are surprisingly low in terms of relative

variance explained.

In the interest of comparing our data to previous three-dimensional models, we ran a

three-dimensional PCA with the promax rotation; this model explains just 39% of the total

variance. The first component in the three-dimensional model from our data corresponds with the

first component of the 20-dimensional model, rumbling/low. The second component corresponds

generally with the second dimension of our final model, soft/singing, with the highest-loading

term, “beautiful,” loading at .83—at first glance, this dimension does not appear to be directly

mappable onto the brightness/roughness/fullness model. The third dimension includes top-

loading term “shrill/harsh/annoying” at .70; the next highest-loading terms are “piercing/sharp”

(.64), “noisy” (.64), and “buzzy” (.60). The presence of “piercing/sharp” suggests that this

dimension may be related to the conventional brightness/sharpness dimension, yet the top-

loading term includes the word “harsh,” suggesting it is instead parallel to the conventional

roughness/harshness dimension. Given all three components in the model, it seems logical to

consider this model’s second component (beautiful/soft/singing, etc.) to be the opposite of

harsh/rough and the third component to correspond to brightness/sharpness, despite the inclusion

of the word “harsh.” In sum, while our data can be recast as a three-dimensional model that

plausibly demonstrates correspondences with the conventional brightness-roughness-fullness

model, these correspondences are not completely straightforward, and the three-dimensional

model explains only 39% of the variance in our data.
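The "variance explained" figure quoted above can be illustrated with a minimal sketch. It assumes the conventional reading, in which the share of total variance captured by the retained components equals the sum of their eigenvalues divided by the sum of all eigenvalues of the correlation matrix. The function name and the eigenvalues below are hypothetical toy values, not results from our data, and the sketch describes the unrotated solution only (rotation redistributes variance among the retained components).

```python
def cumulative_variance_explained(eigenvalues, k):
    """Share of total variance captured by the k largest components."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical eigenvalues for a six-variable correlation matrix (they sum to 6).
toy_eigenvalues = [2.4, 1.5, 0.9, 0.6, 0.4, 0.2]

# Share of variance retained by a three-component solution for the toy values.
share = cumulative_variance_explained(toy_eigenvalues, 3)
print(f"{share:.0%}")
```

A low share for a three-component solution, as in our data, indicates that much of the structure in the ratings lies outside those three components.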

4
One exception to this is watery/fluid, which was separated from the soft/singing dimension at a late stage, based on
responses to the supplementary polls. Thus, the second and third dimensions, soft/singing and watery/fluid together
explain the second greatest amount of variance.

If we can reduce our dimensionality further as illustrated above, and previous studies

have found that semantic timbre space can be, in the words of Saitis and Weinzierl (2019, p.

120), “adequately explained” using just two or three dimensions, why propose a model with 20

dimensions? Our response to this question comes from the observation that adequate explanation

is entirely relative. Depending on the goal, three dimensions may be sufficient. However, for

musical purposes, there are few situations in which three timbral dimensions would be adequate.

A music instructor or ensemble conductor who only had three semantic dimensions of timbre

available would find their vocabulary not merely impoverished but wholly inadequate. For

some purposes of everyday musical life, even 20 dimensions may prove insufficient. We

emphasize that the amount of variance explained by a given dimension does not directly determine

its artistic value. Since our motivating purpose for this study is to contribute to the development

of a descriptive language for music theoretical endeavors, the number of useful dimensions

remains an open question. Our experience as music theorists suggests to us that a useful

descriptive language will likely involve more than three dimensions.

Two principal concerns come with such high dimensionality. First, is this unwieldy—are

20 dimensions simply too many to handle in future empirical work and in analysis? And second,

do we actually need this many—would fewer dimensions be sufficient? Ultimately, the answer to

both questions will be determined by practice. Our experience with the current study and other

ongoing studies making use of the 20-dimensional model suggests that participants do not have

difficulties working with the 20 dimensions in rating tasks. In terms of computation and analysis,

modern software is quite capable of working with high-dimensional data. We anticipate that

music theorists will be able to use the model for analysis via a computational program, currently

under development, that presents the resulting data to program users in easily interpretable ways,

including through visualizations. It may be the case that future work demonstrates that some of

our dimensions are less useful than others and could be removed from the model; however, it

would be easier to prune, rather than to add, dimensions, and so we prefer to err on the side of

too many rather than too few dimensions.

In describing musical timbres, a pertinent question is why certain qualia descriptors arise

but not others. Why do listeners tend to characterize sounds as open but not free? Why

brassy/metallic but not plastic/rubbery? Why watery/fluid but not oily/sticky? In the ensuing

discussion, we propose potential acoustical correlates for many of our dimensions, based on

previous research in acoustics and timbre. While the discussion that follows at present remains

speculative, we anticipate that the ideas presented in this section will stimulate useful conjectures

that may be tested in future hypothesis-driven experiments.

Acoustic Attributes

A number of the 20 timbre qualia dimensions can be directly related to acoustical properties of

the sound generator. Notably, these include descriptions of modes of activation and the materials

of the acoustic vibrator.

Mode of Activation

Percussive, Sustained/Even, and Ringing/Long Decay.

Three of the dimensions relate to contrasting sound envelopes. Two represent the extreme

envelope possibilities, namely percussive and sustained/even. The former is associated with

sounds whose mode of activation is struck, hit, or dropped. The latter is associated with sounds

whose mode of activation is blown or rubbed. The third dimension, ringing/long decay, suggests

a sort of intermediate envelope category in which a struck sound generator exhibits low internal

friction, resulting in a slow decay of the sound.



These three dimensions are consistent with evidence from MDS studies suggesting that

attack time is a salient factor in how listeners make dissimilarity judgments, in particular when

the stimulus set contains both sustained and impulsive sounds. Faure, McAdams,

and Nosulenko (1996) found that similar terms in French were correlates of attack time, including

“pas soufflé” (not blown), “pincé” (plucked), and “attaque rapide” (fast attack). The current

results imply that musicians find salient not only the attack quality but also the

duration of the decay.

Physical Material

Brassy/Metallic and Woody.

The terms included in the brassy/metallic and woody dimensions suggest that the material of the

acoustic vibrator is important; however, as we will see, the mode of activation plays a critical

role in whether the physical material of a sound generator is perceptually salient.

In the case of struck metal plates, Fletcher, Perrin, and Legge (1989) have drawn

attention to distinctive nonlinear acoustical features. Bending metal changes its stiffness, and so

striking a plate can provoke momentary nonlinearities that lead to telltale acoustic features,

notably dynamic frequency shifts. This is evident from the musical instruments rated highest on

“metallic,” which are generally struck instruments made of metal: cymbals (6.5), triangle (6.5),

and vibraphone (5.2). Similar nonlinear pitch shifts can be heard as the distinctive “twang” sound

that is produced when a loose metal string is plucked with a large displacement. Accordingly, the

banjo, which was rated most “twangy” (6.6), was also ranked as the fourth most “metallic”

instrument (4.0).

The problem with identifying material of construction from sounded air columns is

illustrated in the case of brass instruments. Brass instruments are technically “lip reed”

instruments. They employ cup- or funnel-shaped mouthpieces that are activated by buzzing lips

and produce a complicated acoustical phenomenon referred to as “regeneration” (Elliott &

Bowsher, 1982; Freour & Scavone, 2012). Regeneration is a form of feedback that leads to a

ringing dynamic filter effect. In the production of a single tone, the power of successive

harmonics increases over time. The overall result is a sort of “bwah” sound where successive

partials are emphasized in a quick upward and then slower downward frequency sweep.5

The perceptual importance of the acoustics of regeneration over the material of

construction in conveying “brassiness” is nicely illustrated by the success of plastic “brass”

instruments. A number of manufacturers market plastic trumpets whose sounds are uncannily

indistinguishable from metal instruments. Furthermore, early “brass” instruments like the

cornetto and serpent were made of wood.

In short, a “brassy” sound is not the result of an instrument’s material, but rather is a

consequence of the specific non-linear dynamics of buzzing lips in combination with a

mouthpiece interacting with a resonant tube (Myers et al., 2012). Since at least 1600, the vast

majority of lip-reed instruments have been made of brass, and it is this simple association that

explains the tendency for listeners to describe the characteristic regenerative acoustical

phenomenon as “brassy.” The acoustical—rather than physical—basis for these associations is

readily apparent in the case of the alto saxophone, which is a blown instrument made of brass,

yet receives low ratings for both “metallic” (3.2) and “brassy” (3.3).

5
Brass instruments with long narrow bores (trumpet and trombone, but not tuba or flugelhorn) are capable of
producing an especially intense brassy sound that some performers refer to as “sizzle.” The narrow bore permits
high air pressures that have been shown to produce a shock wave within the instrument—generating distinctive
inharmonic partials (Hirschberg, Gilbert, Msallam, & Wijnands, 1996). These sounds are particularly associated
with a “brassy” character. This mechanism also explains why some wide-bore lip-reed instruments, like the tuba and didgeridoo,
are commonly described as sounding less brassy. In the case of the didgeridoo, the absence of a bell has the further
effect of reducing the intensity of the higher harmonics and so diminishes the potential brassy character.

The top “woodiest” instruments were the wood block (6.5), bass clarinet (5.6), English

horn (5.4), and oboe (4.7). The wood block, a struck sound source, stands nearly a full point

higher on average than the next-woodiest instrument, which is a wind instrument, consistent with

the claim that struck sounds are better at conveying information about the physical material of

the source than are aerophones. As with brass instruments, woodwinds are also sometimes made

with plastic rather than wood. Manufacturers of top-of-the-line plastic oboes and English horns

continue to improve their imitation of the tone color of wood instruments; such plastic

instruments have recently started to be adopted by some professional players. As with the

characterization “brassy,” it seems as though the perception of “woody” timbres in woodwind

instruments is largely based on the fact that woodwinds have traditionally been crafted from

wood.

By way of summary, timbral qualities that listeners associate with physical material are

better understood as arising from characteristic acoustical patterns that are only indirectly related

to the material of construction. Terms like “woody” and “brassy” might be regarded as

misnomers. Instead, the distinguishing features arise from phenomena like acoustical

regeneration or dynamic frequency shifts, and these features provide the basis for learned

associations for certain classes of sound generators. Moreover, these acoustical patterns are

strongly dependent on the mode of activation, so struck materials are more likely to exhibit

distinctive acoustical features than blown forms of activation.

Intensity-related dimensions

Soft, smooth, singing, voice-like, sweet, gentle, calm.

The words “soft” and “smooth” are primarily related to touch, indicating generally pleasurable,

low-intensity tactile sensations. “Soft” is also a common synonym for “quiet.” “Soft,” “smooth,”

“gentle,” and “calm” suggest that a major feature of the soft/singing dimension is low intensity.

However, there is more to this dimension than simply a quieter dynamic, as is evident from the

terms “singing,” “voice-like,” and “sweet.” The soft/singing dimension appears to focus on an

especially light form of singing, based on the high loadings on terms such as “gentle” and

“calm.”

Singing-like sounds are presumably limited to pitches that fall within the range of human

voices. Hence instruments like the tuba and piccolo are less likely to produce sounds

characterized as “singing.” Of the participant-rated instruments, the flute was judged as

exhibiting the highest “singing/voice-like” character, followed by the English horn and oboe.

These instruments fall in a higher pitch range, indicating that this dimension may be suggestive

of a female voice.

Wallmark (2014) discusses the relationship between timbre and the voice, reviewing

evidence for timbral vocality in relation to an embodied theory of timbre. He reports evidence

from a related neuroimaging study that suggests that motor regions of the brain involved in

vocalization are selectively involved in the perception of positively-valenced timbres. The

component loadings on the soft/singing dimension of the current study, which include “singing”

and “voice-like,” are consistent with this finding, as the other terms included are generally

positively valenced (e.g. “sweet,” “gentle,” “calm”). Terms that also loaded onto this component

strongly but did not meet the criteria to be included with the dimension label are also quite

positively valenced and include the terms “beautiful,” “mysterious/ethereal,” “soaring/floating,”

and “warm.”

Acoustically, we might summarize a soft/singing timbre as a relatively moderate-to-high-

pitched harmonic non-percussive sound with relatively steady or slowly evolving spectral and

dynamic features.

Direct, projecting, loud, aggressive, commanding, assertive, powerful.

The direct/loud dimension implies more than simply a high intensity level—as evident with the

inclusion of the terms “projecting,” “aggressive,” “commanding,” “assertive,” and “powerful.”

Loud sounds arise from more energetic sound sources, so they are more “powerful” in the literal

sense of physical energy. Loud sounds are less susceptible to auditory masking, so they are also

more noticeable or salient. That is, they are capable of commanding greater attention; they are

sounds that project or stand out.

In ethology, loud sounds are associated with aggression and alarm (Morton, 1977, 1994).

Consequently, the contrast of direct/loud with soft/singing includes an affective connotation

beyond intensity. Whereas “sweet,” “gentle,” and “calm” load onto the soft/singing dimension,

“commanding,” “aggressive,” and “assertive” load onto the direct/loud dimension.

Hence, the contrast between soft/singing and direct/loud is not merely one of intensity,

but also of sweet versus aggressive, soft versus powerful, singing versus assertive, and gentle

versus commanding. Notice that the emphasis on the terms “singing” and “voice-like” in the

soft/singing dimension suggests a more human, prosocial, or friendly quality that is absent from

the direct/loud dimension.

One might ask why soft/singing and direct/loud did not collapse into a single dimension.

If these dimensions simply represented opposite levels of intensity, then a single dimension

would surely suffice. We might speculate that it is the affective connotations (aggression versus

gentleness, sweet singing versus commanding assertion, etc.) that account for the independence

of these two dimensions.

Frequency-related dimensions

Rumbling, booming, low, deep, thick, fat, heavy.

Large masses or large volumes vibrate at a lower frequency than small masses or small volumes.

Low frequency sounds are therefore readily associated with larger objects (Hinton, Nichols, &

Ohala, 1994)—consistent with the terms “fat,” “heavy,” and “thick.” Moreover, the term “low”

explicitly implies low frequency.

Although “deep” can evoke associations such as profound, heartfelt, and rapturous,

dictionary definitions first associate “deep” with synonyms like cavernous, gaping, or huge—

physical properties that are consistent with large volumes, and so associated with low

frequencies. For example, we generally expect someone with a “deep” voice to be a larger

person.

Finally, a characteristic of very low frequencies is that they are generally devoid of pitch.

For extremely low frequencies, individual cycles of vibration may be evident, such as when a

passing truck creates a trembling vibration. An instrument like the bass drum is quite capable of

producing such low frequencies. Such pitchless sounds may be described as “rumbling.” In short,

rumbling/low appears to represent a coherent qualia dimension strongly linked to low frequency

sounds.

Sparkling, shimmering, brilliant, bright.

Notice that all four descriptors in the sparkling/brilliant dimension originate as visual descriptors

rather than auditory descriptors. Among timbre researchers, brightness has been especially

valued since it is strongly correlated with spectral centroid, an easily calculated acoustical

measure. “Brilliant” and “bright” also imply an energy-related component, suggesting something

high and loud, as perhaps in the sound of a piccolo or piccolo trumpet. On the other hand,

“sparkling” and “shimmering” imply something dynamic. Definitions of “sparkling” suggest

synonyms including effervescent and animated; definitions for “shimmer” offer synonyms such

as twinkle, glimmer, and glitter. In vision, these terms are associated with the presence of small

points of intense light that are evident in reflected cut glass, gems, or the twinkling of stars. Due

to movement, these “glints” are typically momentary or transient.
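To make concrete why the spectral centroid mentioned above is regarded as easily calculated, the following is a minimal sketch. It assumes the common definition of the centroid as the magnitude-weighted mean frequency of a spectrum. The function name and the two toy spectra are hypothetical illustrations rather than measurements of any instrument in our study.

```python
def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of a spectrum, in Hz."""
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


# Two hypothetical tones on a 220 Hz fundamental with eight harmonics each.
harmonics = [220.0 * n for n in range(1, 9)]
dull = [1.0 / n ** 2 for n in range(1, 9)]      # upper partials fall off quickly
bright = [1.0 / n ** 0.5 for n in range(1, 9)]  # upper partials stay strong

print(spectral_centroid(harmonics, dull))
print(spectral_centroid(harmonics, bright))
```

The tone whose upper partials decay more slowly yields the higher centroid, which is the sense in which the measure tracks "brightness."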

Notice that it is difficult to strike a triangle without the instrument swinging or rotating.

This movement causes phase shifting that adds a dynamic aspect to the sound that might be

likened to twinkling or shimmering effects in vision. In the auditory domain, sparkling and

shimmering sounds may not simply be higher in frequency, but also perhaps involve pointillistic

performance manners such as staccato articulation or rapid tempo.

Harmonicity- and noise-related dimensions

Pure, clear, precise, clean.

What makes listeners judge a sound as pure/clear? The instrument rated most pure was the harp,

while the instrument rated most clear was the piccolo. In both cases, the kazoo was rated the

polar opposite. Dictionary synonyms for “pure” emphasize the concept of something being

uncontaminated, unmixed, or refined. “Clean” evokes an unpolluted or untainted character,

including affective connotations such as fresh, wholesome, sanitary or healthy.

There are several possible acoustical factors that might account for the descriptions of

sounds as pure or clean. Acoustically, “clean” might be interpreted as referring to sounds with a

high signal-to-noise ratio. For instruments that produce pitched sounds, it is possible to

distinguish noise components (like bow noise or breathiness) from periodic components. Such an

analysis can be made using the tone-to-noise ratio (TNR) (Sottek, Kamp, & Fiebig, 2013; Sottek,

2014; Qi & Hillman, 1997).

Apart from the ratio of periodic-to-noise components, another factor that might account

for the pure/clear dimension is the degree to which the periodic components of a tone conform to

a harmonic series. This conformity is reflected in the concept of harmonicity, which is

commonly measured by the harmonics-to-noise ratio (HNR) (Fernandes et al., 2018; Wayland,

Gargash, & Longman, 1995). The HNR can be calculated by comparing the aggregate energy of

all partials whose frequencies conform to the harmonic series with the aggregate energy for all

other spectral components. An example of low HNR is evident in “hoarse” speech.
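The verbal description of the HNR given above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the cited studies: it assumes that the fundamental frequency is already known, that the spectrum is supplied as a list of frequencies with corresponding power values, and that a component counts as harmonic if it lies within a fixed tolerance of an integer multiple of the fundamental. The function name, the tolerance, and the toy spectrum are hypothetical.

```python
import math


def harmonics_to_noise_ratio_db(freqs_hz, powers, f0_hz, tolerance_hz=5.0):
    """Ratio (in dB) of energy at harmonic frequencies to all other energy."""
    harmonic_energy = 0.0
    other_energy = 0.0
    for f, p in zip(freqs_hz, powers):
        # Nearest integer multiple of the fundamental (at least the 1st harmonic).
        nearest = max(1, round(f / f0_hz)) * f0_hz
        if abs(f - nearest) <= tolerance_hz:
            harmonic_energy += p
        else:
            other_energy += p
    if other_energy == 0:
        return math.inf   # perfectly harmonic spectrum
    if harmonic_energy == 0:
        return -math.inf  # no energy at harmonic frequencies
    return 10.0 * math.log10(harmonic_energy / other_energy)


# Hypothetical spectrum: three harmonics of 200 Hz plus two inharmonic components.
freqs = [200.0, 400.0, 600.0, 250.0, 470.0]
powers = [1.0, 0.5, 0.25, 0.1, 0.075]
print(harmonics_to_noise_ratio_db(freqs, powers, f0_hz=200.0))
```

On this reading, a hoarse voice would place relatively more energy in the second bin of the comparison and so yield a lower value.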

Yet another factor that might contribute to the pure/clear quale is Terhardt’s concept of

pitch weight or “toneness” (Huron, 2016; Parncutt, 1989; Terhardt, Stoll, & Seewan, 1982a,

1982b). Even for sounds that are highly harmonic, the clarity of evoked pitches is known to be

influenced by pitch height. Especially high or low tones are associated with weak or vague pitch.

When asked which sounds exhibit the clearest pitch, listeners identify complex tones in the

region between E2 and G5—a region centered near middle C and spanning the combined range

of the bass and treble staves.

Recall that the instruments rated most and least “pure” were the harp and the kazoo

respectively. Similarly, the instruments rated most and least “clear” were the piccolo and (again)

the kazoo. In contrast to the kazoo, both the harp and piccolo produce strictly harmonic complex

tones with low noise components, suggesting that tone-to-noise ratio and harmonicity may be

pertinent factors. However, the high average rating on “clear” for the high-pitched piccolo

suggests that pitch weight or toneness may not necessarily be critical to the pure/clear qualia

dimension.

Finally, we might consider the term “precise,” which also loaded highly on the pure/clear

dimension. The instrument rated most “precise/clean” was the wood block. Yet again, the kazoo

was rated the polar opposite. Although the wood block involves inharmonic partials, it

nevertheless provides a clear pitch—especially compared with other struck instruments. What

the harp and wood block have in common is abrupt onsets, which might contribute to a sense of

precision.

In summary, pure/clear timbres might be associated with sounds that have a high tone-to-

noise ratio, high harmonicity, and percussive onsets capable of conveying a sense of precision.

Focused/Compact.

Focused/compact may similarly be associated with sounds exhibiting high harmonicity. That is,

one may suppose that a focused/compact sound would include few or no inharmonic partials with

little or no noise components. A single plucked guitar string or oboe tone might qualify as highly

focused/compact, whereas a kazoo or rattle would receive low ratings on focused/compact. The term

“compact” might also imply sounds produced by smaller sound sources, such as the contrast

between a piccolo and a bass drum, or narrower bores, such as the contrast between a trumpet

and a tuba.

Airy/breathy.

The component terms that load onto the dimension airy/breathy refer to both a blown mode of

activation and to accompanying unpitched noise components arising from gross air movement.

In jazz styles, singers commonly perform with a notable degree of breathiness, which tends to

convey a sense of proximity or intimacy to the sound. This same breathiness is commonly

audible in the performance of wind instruments played in a jazz style, like the flute, trumpet, or

flugelhorn.

In speech and singing, breathiness arises when the vocal folds do not exhibit full closure

or adduction (Grillo & Verdolini, 2008). In brass instruments, breathiness similarly arises when

the vibrating lips of the performer are relaxed so as to avoid full closure. For instruments like the

saxophone, clarinet, oboe, and bassoon, breathiness can be produced by directing a proportion of

the air around the mouthpiece rather than into it.

Phoneticians measure the breathiness of the voice by computing the harmonics-to-noise

ratio (HNR). Recall that this is calculated by comparing the aggregate energy of all partials

whose frequencies conform to the harmonic series with the aggregate energy for all other

spectral components (Wayland et al., 1995).

Shrill, harsh, noisy.

Both “shrill” and “noisy” suggest an association with high energy, while the term “shrill” further

suggests an association with high frequency. When activated by high energy, many vibrators

produce non-linear (chaotic) oscillations leading to inharmonic partials or noise bands, which

can be measured through the harmonics-to-noise ratio. Non-linear oscillation is also

characteristic of vocal production for animals experiencing stress (Blumstein, Davitian, & Kaye,

2010), an observation that may be pertinent to the potential source of this dimension.

In a discussion of noisy timbres, Wallmark (2014) notes that timbral noise does not have

a single, consistently agreed-upon correlate. He identifies inharmonicity and spectral flatness, or



the relative smoothness/spikiness of a signal, as two of the likely correlates. Such a multiplicity

of sources of noise may be related to the fact that multiple dimensions in our final model seem to

be related to noise, including, of course, shrill/noisy, but also the previously-discussed

pure/clear, focused/compact, and airy/breathy, along with raspy/grainy (discussed in the section

below). Wallmark considers “noisy” timbres to have components of brightness, noise, and

roughness, each a marker of physical exertion; “noisy” timbres parallel the embodied experiences

of anger and fear and are accordingly consistent with negative appraisal. Of our potentially

noise-related dimensions in the current model, shrill/noisy is certainly the most negatively-

valenced and high-arousal: the dimension also includes the term “harsh,” and “annoying” was

also a related term. As noted earlier, pure/clear carries a positive connotation, and for musicians,

focused/compact is usually also a desirable tone quality. Airy/breathy seems to communicate a

kind of low-arousal noise in a sound that is related to intimacy and physical closeness rather than

physical exertion: this would suggest a positively-valenced, rather than negatively-valenced,

interpretation of noise that is distinct from Wallmark’s “noisy” timbre, even though airy/breathy

timbres may share some of the same acoustic properties as noisy ones.
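Spectral flatness, one of the candidate correlates of noisiness noted above, can be sketched briefly. The sketch assumes the widely used definition of flatness as the geometric mean of the power spectrum divided by its arithmetic mean, so that values near 1 indicate a smooth, noise-like spectrum and values near 0 indicate a spiky, tone-like one. The function name and the toy spectra are hypothetical.

```python
import math


def spectral_flatness(powers):
    """Geometric mean divided by arithmetic mean of a power spectrum."""
    if not powers or any(p <= 0 for p in powers):
        raise ValueError("expected a non-empty list of positive power values")
    log_mean = sum(math.log(p) for p in powers) / len(powers)
    return math.exp(log_mean) / (sum(powers) / len(powers))


# Hypothetical spectra: equal energy in every bin versus one dominant peak.
noise_like = [1.0, 1.0, 1.0, 1.0]
tone_like = [100.0, 0.01, 0.01, 0.01]

print(spectral_flatness(noise_like))
print(spectral_flatness(tone_like))
```

Because flatness ignores where in the spectrum the energy lies, it cannot by itself separate a shrill/noisy sound from an airy/breathy one, which is consistent with the point that noise has no single agreed-upon correlate.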

Raspy, guttural, grainy, gravelly.

Synonyms for “raspy” include scraping, grating, and grinding; synonyms for “guttural” include

throaty, husky, and gruff. “Grainy” and “gravelly” are linked to fragmentary or gritty particles.

Raspy/grainy appears to provide a contrast with pure/clean and a strong relationship with the

concept of timbral roughness, which has featured prominently in recent timbre research (e.g.

Wallmark, 2019b; Wallmark, Iacoboni, Deblieck, & Kendall, 2018).



Both “grainy” and “gravelly” imply some sort of rapid amplitude modulation that

interrupts the sound, producing a more pointillistic, quick series of sounds. Bowed instruments

are quite capable of producing a highly rough sound when a bow is slowly and forcefully drawn

across a string. The resulting sound might well be characterized as raspy or gravelly. Bowed

strings produce sound according to the so-called “stick-slip principle,” where the bow

momentarily grabs and displaces the string until the restoring pressure releases the string

(Casado, 2017). In regular bowing, this stick-slip cycle occurs hundreds of times per second.

However, with forceful bowing, the cycle is greatly slowed and so the sound descends into the

frequency region of roughness. Notice that this slow stick-slip mechanism characterizes all forms

of scraping. This mode of sound activation is consistent with common synonyms for “rasp,”

including scraping, grating, grinding, and scratching. In the case of the kazoo, the grainy or raspy

quality is likely a consequence of the rapid amplitude modulations associated with the vibrating

tissue paper.

The raspy/grainy qualia dimension also implies links to a number of acoustic phenomena

in vocal production, including pathological and non-pathological aspects of voice conditions.

Poyatos (1991) catalogued 40 unusual vocal qualities from adenoidal voice to velarized voice.

However, raspy and grainy are not among the voice qualities that he identified. Nevertheless,

Poyatos includes voice qualities that are commonly regarded as synonyms, especially among

linguists. These include creaky voice or vocal fry (Keating, Garellek, & Kreiman, 2015).

In the case of “guttural,” linguists identify a number of “guttural phonemes”—vocal

sounds that are mostly associated with Semitic languages like Arabic, Assyrian, and Hebrew.

Guttural consonants involve constrictions or closures in the lower vocal tract, such as at the root

of the tongue, with the velum, or via the glottis. These consonants combine low frequency

resonances with temporal interruptions, such as the glottal stop (Goldstein, 1994).

By way of summary, sounds that are characterized as raspy/grainy appear to be

associated with temporal roughness and/or low resonance associated with guttural phonemes.

Other acoustical features

Hollow.

Several qualia dimensions in the final model are suggestive of unique resonant features. The

dimension hollow is a notably clear example. A space is described as “hollow” when it is empty.

Spaces that are occupied with various objects lead to high acoustic dispersion and greater energy

absorption with correspondingly less prominent resonances. A major acoustical difference

between hollow and non-hollow cavities is filter Q. In general, low energy absorption produces

resonances whose filter shapes have steep slopes (high Q). By contrast, high energy absorption is

associated with less steep filter slopes (low Q) (Pyzdek, 2015; Chowning, 1973).
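The notion of filter Q invoked above can be made concrete with a short sketch. It assumes the standard engineering definition of Q as a resonance's center frequency divided by its bandwidth between the half-power (-3 dB) points. The function name and the two resonances are hypothetical and do not describe any measured instrument.

```python
def quality_factor(center_hz, lower_3db_hz, upper_3db_hz):
    """Q of a resonance: center frequency divided by its -3 dB bandwidth."""
    bandwidth = upper_3db_hz - lower_3db_hz
    if bandwidth <= 0:
        raise ValueError("upper -3 dB point must exceed the lower one")
    return center_hz / bandwidth


# Hypothetical resonances, both centered at 1000 Hz.
narrow_resonance = quality_factor(1000.0, 990.0, 1010.0)  # little absorption
broad_resonance = quality_factor(1000.0, 800.0, 1200.0)   # heavy absorption

print(narrow_resonance, broad_resonance)
```

The narrower resonance has the higher Q, corresponding to the steep filter slopes that we associate here with an empty, "hollow" cavity.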

For very large spaces, the main effect of a “hollow” (or empty) environment is audible

echoes, but for small spaces (like a Chinese wood block), the effect will be high filter Q.

Evidently, listeners hear the high Q as symptomatic of an unoccupied space and so tend to

describe the sound using a term such as “hollow.”

The timbre of the clarinet sound, which is sometimes described as “hollow,” has been

linked to a concentration of spectral energy near odd harmonics, or the odd-to-even harmonic

energy ratio (Caetano et al., 2019). While the B-flat clarinet was not rated in the current study,

the bass clarinet was ranked fourth on hollow with a rating of 3.71.
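As an illustration of the odd-to-even harmonic energy ratio mentioned above, the following minimal sketch takes a list of harmonic amplitudes and compares the summed energy of the odd-numbered harmonics with that of the even-numbered ones. It assumes that energy is the squared amplitude and that the fundamental counts as the first odd harmonic; published definitions differ on such details. The function name and the amplitudes are hypothetical.

```python
import math


def odd_to_even_ratio(harmonic_amplitudes):
    """Energy of odd harmonics over energy of even harmonics.

    harmonic_amplitudes[0] is harmonic 1 (the fundamental).
    """
    numbered = list(enumerate(harmonic_amplitudes, start=1))
    odd = sum(a * a for n, a in numbered if n % 2 == 1)
    even = sum(a * a for n, a in numbered if n % 2 == 0)
    if even == 0:
        return math.inf  # no even-harmonic energy at all
    return odd / even


# Hypothetical amplitudes for harmonics 1 to 4 of a tone with weak even harmonics.
weak_even = [1.0, 0.1, 0.5, 0.1]
balanced = [1.0, 1.0, 0.5, 0.5]

print(odd_to_even_ratio(weak_even))
print(odd_to_even_ratio(balanced))
```

A tone with weak even harmonics yields a large ratio, the spectral pattern that has been linked to descriptions of the clarinet as "hollow."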

Muted/Veiled.

The dimension muted/veiled similarly suggests a frequency-domain filtering effect—in this case,

the effect of a low-pass filter. Merely placing your hand in front of your mouth while speaking

will have the effect of attenuating high frequency partials, producing a “veiled” sonic

impression. The mutes used in brass and string instruments reduce the overall intensity of the

sound, but also attenuate the low frequency harmonics. Instrument mutes also tend to introduce a

resonance, often in the region of 1-3 kHz, corresponding to the region of many speech

formants—consequently rendering the timbre slightly more voice-like in character (Yoshikawa

& Nobara, 2017).
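The low-pass filtering effect described at the start of this section can be demonstrated with a minimal sketch. It uses a one-pole smoothing filter, chosen only because it is the simplest low-pass filter to write down; it is not offered as a model of a hand, a mute, or any instrument. The function names, the sample rate, the smoothing coefficient, and the test frequencies are all hypothetical choices.

```python
import math


def one_pole_lowpass(samples, alpha):
    """Simple low-pass filter: each output moves a fraction alpha toward the input."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, sample_rate=8000, n=8000):
    """One second of a unit-amplitude sine wave at the given frequency."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


# Compare how much of a low partial and a high partial survives the filter.
low_gain = rms(one_pole_lowpass(sine(100), 0.1)) / rms(sine(100))
high_gain = rms(one_pole_lowpass(sine(3000), 0.1)) / rms(sine(3000))

print(low_gain, high_gain)
```

The high-frequency component emerges far more attenuated than the low-frequency one, which is the kind of spectral change we associate with a "veiled" impression.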

Open.

The term open is commonly used in the musical training of both vocalists and instrumentalists.

The term implies something unobstructed, unconstrained, spacious, or free. In linguistic

phonetics, the term “open” has a quite specific meaning. It is used to refer to any articulation in

which the mouth opening is wide—as when the chin is lowered. Slawson (1981) offers a more

general interpretation, suggesting that openness refers to those sounds produced by any relatively

wide tube or a tube that has no significant narrowing. Slawson notes that non-open sounds are

produced by tubes that have at least one cross-sectional narrowing. In speech, the vowel [u] (as

in “food”) involves narrowing between the lips; the vowel [i] (as in “feed”) involves narrowing

between the tongue and the hard palate; open vowels, like [aw], [a], and [ae], exhibit no

significant narrowings (Slawson, 1981, pp. 135–136). Acoustically, Traunmüller (1981) found that

open speech sounds exhibit a higher first formant (F1) relative to the fundamental frequency

(F0). That is, openness is apparent when there is a relatively large gap between the pitch of a

sound and its first higher resonance.



It is open to debate whether what our participants meant by “open” is the same as what

speech researchers mean by open. However, wind instrumentalists are often taught to mimic

open vowel sounds as they play, which suggests a potential relationship between open vowels

and timbral openness. Furthermore, the relationship of the term “open” as applied to timbre and

vocality has been discussed in previous research. Traube, in a study of classical guitar timbre

(2004), notes that words such as “open” seem to refer to phonetic gestures. She details

similarities between speech and timbre perception, proposing a phonetic mode of timbre

perception. A related experiment in which participants associated consonants and vowels to

attacks and releases, respectively, of guitar tones, provided support for this phonetic mode of

timbre perception.

Resonant/vibrant.

The term “resonant” has a specific acoustic interpretation that may or may not accurately reflect

the meaning of this word as used by our participants. In acoustics, a vibrator is said to be

resonant in situations of especially efficient acoustic transduction: that is, where a small amount

of input mechanical energy generates a large amount of sound energy. This can occur, for

example, when a sound is produced in a highly reverberant room. It also occurs when a sound

continues long after being imparted with some mechanical energy, such as a bell continuing to

sound well after it has been struck. Due to the efficiency of the sound production, resonant

sounds are typically also loud. The joint qualifier “vibrant” seems to add an intentional,

deliberate, or willful character to the dimension. Dictionary definitions of “vibrant” include such

synonyms as energetic, enthusiastic, strong, full, and rich.



Nasal, reedy, buzzy, pinched, constrained.

The nasal/reedy dimension is also characterized by the terms pinched, buzzy, and constrained.

In speech, most vowels and consonants are produced with tight velopharyngeal closure, meaning

that there is little or no airflow through the nose. In many languages, exceptions are found in the

nasal consonants: [m], [n], and [ŋ]. Of these sounds, the [m] sound exhibits a “humming” quality,

whereas the [n] and [ŋ] sounds are more clearly “nasal” sounding. The [m] sound is produced

with closed lips, and thus sound is emitted only through the nose. (Ironically, this is the least

nasal sounding of the nasal consonants.) For [n] and [ŋ], the resulting sound issues from both the

mouth and the nose (Thompson, 1978).

Two identical (or near identical) sound sources lead to a pattern of constructive and

destructive interference that produces a spectrum characterized by alternating nodes and

antinodes. The result is a comb-shaped spectrum that tends to be perceived as “nasality.”

Although the acoustics is not clear, it may be that the nasality associated with the oboe, bassoon,

and bagpipes arises from the use of double reeds, where two coupled vibrators produce a comb

spectrum. This speculative account might also explain why single-reed instruments such as the

clarinet and saxophone exhibit lower nasality ratings.
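
The comb-shaped spectrum described above can be illustrated with an idealized model in which a source is summed with an identical copy delayed by a fixed time. This is only a sketch under that assumption; the 1 ms delay and the function name are hypothetical, and the behavior of real coupled reeds is considerably more complex.

```python
import cmath
import math


def comb_gain(freq_hz: float, delay_s: float) -> float:
    """Magnitude response of a source summed with an identical copy delayed by delay_s."""
    return abs(1 + cmath.exp(-2j * math.pi * freq_hz * delay_s))


delay = 0.001  # hypothetical 1 ms offset between the two sources

# Maxima fall at multiples of 1/delay (1000 Hz here); minima fall halfway between them.
for freq in (250, 500, 750, 1000, 1500, 2000):
    print(freq, round(comb_gain(freq, delay), 3))
```

In this idealized case the gain alternates between reinforcement (a factor of 2) and near-total cancellation at regular frequency intervals, which is the pattern of alternating maxima and minima that gives the comb spectrum its name.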

Especially when played at high intensity, muted brass instruments can also produce a

raspy, piercing, or nasal quality (Smith, 1980). As noted earlier, mutes act as high-pass filters,

attenuating low frequency components and effectively creating a brighter sound (Backus, 1976).

More importantly, mutes produce a series of prominent minima (Causse & Sluchin, 1982). That

is, mutes superimpose a comb-like spectrum akin to those characteristic of “nasal” sounds. The

acoustical effect is stronger for metal mutes than for cardboard mutes due to the higher reflection

and consequently higher filter Q.



Pinched or constrained sounds appear to be related to the relative absence of low

frequency content. The high-pass effect of mutes probably contributes to this pinched or

constrained quality. Compared with other instruments, the oboe exhibits relatively intense higher

partials—a high spectral centroid that likely contributes to the pinched sound. The choice of the

bass drum as the polar opposite instrument reinforces the suggestion that the absence of low

frequencies contributes to the pinched or constrained qualia.
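
The spectral centroid mentioned above is commonly computed as the amplitude-weighted mean frequency of the partials. The following sketch uses that common definition; the partial amplitudes are invented for illustration and are not measurements of the oboe or any other instrument.

```python
def spectral_centroid(freqs_hz, amplitudes):
    """Amplitude-weighted mean frequency of a set of partials."""
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * a for f, a in zip(freqs_hz, amplitudes)) / total


# First five harmonics of a 440 Hz tone.
harmonics = [440.0 * n for n in range(1, 6)]

# Hypothetical amplitude profiles: energy concentrated low versus concentrated high.
weak_upper_partials = [1.0, 0.5, 0.25, 0.12, 0.06]
strong_upper_partials = [0.3, 0.6, 1.0, 0.8, 0.6]

print(spectral_centroid(harmonics, weak_upper_partials))
print(spectral_centroid(harmonics, strong_upper_partials))
```

With these invented profiles, the spectrum with relatively intense higher partials has the higher centroid.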

The distinctive “reedy” sound is something of an enigma. Despite decades of research,

the acoustics of reed instruments remains obscure. Much of the difficulty lies in the nonlinear

behavior of the reed as well as the complex geometry of single-reed mouthpieces (Almeida,

George, Smith, & Wolfe, 2013; Chatziioannou & van Walstijn, 2012; Dalmont, Gilbert, &

Ollivier, 2003; Fabre, Gilbert, Hirschberg, & Pelorson, 2012).

Watery/Fluid.

Of all of the qualia dimensions in the 20-dimensional model, watery/fluid would seem to be the

most metaphorical. The term “watery” seems quite concrete—implying a liquid, wet, or

potentially teary state. Yet, with the exception of the orchestral “bird whistle,” no common

musical instrument makes use of water or fluid in its sound production. Instead, “watery” is more

likely to evoke the image of a river or stream. Flowing water is often quite quiet; at the same

time, waterfalls, fountains, or streams generate characteristic sounds arising from the frequent

bursting of bubbles.

The instrument with the highest average watery/fluid rating is the harp. Harp arpeggios

and glissandi, in particular, seem most acoustically analogous to the sequential percussive

bursting of bubbles. Although the solo harp literature is relatively small, the repertoire curiously

includes disproportionately many water-themed works. A sample includes Samuel Pratt’s Little

Fountain, John Charles Thomas’s Echoes of a Waterfall, Anthony Sidney’s From a Chinese

Waterfall, Jean-Michel Damase’s Pluie [Rain], Charles Oberthür’s Au bord de la mer [By the

Sea], Alan Hovhaness’s The World Beneath the Sea, and many others.

In comparison to the word “watery,” the term “fluid” is perhaps more abstract.

Definitions for “fluid” include synonyms such as flowing, flexible, effortless, graceful, and

elegant. In the case of music, the term fluid seems more closely tied to performance manner

rather than some static spectral feature per se. A fluid performance is likely fast-paced, yet

smooth, and relaxed or effortless—like a smoothly flowing river.

Cultural and historical influences on timbre semantics

In the above discussion, we have focused on possible acoustic origins of the timbre descriptors in

our final model. However, it should also be acknowledged that cultural and historical factors

have likely contributed to shaping the timbre vocabulary observed throughout these studies,

especially those terms that are not predominantly acoustic. In particular, such cultural factors

likely have a strong influence on the ways in which specific instruments are characterized. For

example, the harp, which has a strong Western cultural association with the image of an angel

playing atop a fluffy cloud, may be characterized as pure/clear both because of the harmonicity

of the sound itself and because of the association with the purity of heaven and angels. As is

often the case with understanding the development of meaning, however, we might ask to what

extent the harmonicity of the sound itself supported and encouraged the development of such an

association. Furthermore, while the harp is rated as most “pure” in our study, several other

instruments rated nearly as highly on “pure” do not have such explicitly ethereal connotations,

including piano, vibraphone, triangle, and French horn.

In the vocabulary of the final model writ large, we can identify potential cultural

influence manifesting in dimensions including soft/singing, direct/loud, pure/clear, and

watery/fluid. Specifically, the soft/singing dimension carries stereotypical Western “feminine”

overtones, with the inclusion of terms such as “sweet,” “gentle,” and “calm.” On the other hand,

the direct/loud dimension resonates with Western stereotypes of masculinity, including terms

such as “aggressive,” “commanding,” “assertive,” and “powerful.” Common gendered

perceptions of musical instruments are likely related to instrument timbre description. As

previously mentioned, the pure/clear descriptions may be related through association to an ideal

of moral purity, as evidenced by the symbolic import of the harp in Christian cultural history.

The repertoire of water-themed works for harp suggests a cultural association between the harp

and water. However, it is also notable that relatively few of the final dimensions carry obvious

cultural overtones: the majority of terms, like ringing/long decay and resonant/vibrant, seem to

relate more to acoustic or physical properties.

On the other hand, the earlier collection of 77 categories derived from the interviews

contains more terms likely to be reflective of cultural values and/or expressions, such as

“beautiful,” “sad/melancholy,” “heroic/noble,” and “dramatic/expressive.” The category “folk-

like/pastoral” provides an excellent example: for Western-enculturated listeners, these terms

largely derive from a Western classical—or more specifically, Romantic—musical mindset,

suggesting instrument timbres that evoke a certain culture or landscape. We can imagine that

these qualia would not be evoked for individuals from other cultures.

Both in the context of the current study and more generally, more research is needed to

develop an understanding of which aspects of timbre semantics are more or less culturally

determined. Future cross-cultural and cross-linguistic work will be especially valuable to

advancing our knowledge in this area.

Whence Timbre

Where does the experience of timbre come from? What is the function of timbre? In addressing

these questions, a number of possible accounts may be entertained. For example, one or more

timbre categories or dimensions might reflect innate evolved biological functions, represent

social constructions that arise through enculturation, and/or emerge from individual perceptual

learning apart from socialization.

First, we might consider timbre from an evolutionary perspective. Three other

perceptually salient aspects of hearing—localization, loudness, and pitch—seem stable and

useful sensory experiences. Localization provides crucial information about the position of sound

sources, including potential predators and prey (Heffner & Heffner, 1992). Loudness conveys

useful information regarding energy, power, and proximity of sound sources (Huron, Kinney, &

Precoda, 2006). Pitch is an integral aspect of parsing multi-source scenes into distinct acoustical

sources (Bregman, 1994). Plausible arguments might be advanced in support of the idea that

these three auditory phenomena represent adaptive traits that arose through evolution by natural

selection.

Might the timbre categories or dimensions encountered in the current study be susceptible

to similar evolutionary accounts? A handful of qualia dimensions offer potentially suitable

candidates. For example, “rumbling,” “low,” “shrill,” or “noisy” qualities might be linked to

some survival function, as they can provide ethological cues as to the size and energy of a

potentially threatening sound source.

Less convincingly, it is possible that recognizing “nasal” or “breathy” sounds might

confer some adaptive value such as alerting a listener to possible infectious disease; indeed, we

can often tell by a person’s voice whether they have a cold or sinus infection. However, if the

recognition of nasal timbres was intended to help avoid potential disease, one might expect

qualia categories such as “sneezy,” “retching,” or “phlegmatic” to be salient.

On the one hand, timbre-based sound source identification is plausibly important for

survival, helping to identify threats and friendly overtures. However, most of the dimensions in

our model do not appear to have an obvious evolutionary rationale. What is the survival value,

for example, of the ability to distinguish the sounds of wood and metal when they are struck?

Qualities like reedy, brassy, metallic, woody, percussive, ringing, sparkling, brilliant, resonant,

vibrant, open, focused, compact, and hollow seem poorly linked to survival. Most of the qualia

categories encountered in this study seem far from having any biologically necessary reason for

their existence.

In the case of social constructions, there are a number of timbre descriptions that have

clear cultural links. Earlier, we mentioned the association of the harp with purity as a possible

consequence of the cultural association of harps with celestial angels and a heavenly afterlife in

Western culture. Other examples include the association of the flute with pastoral settings, the

association of the trumpet with royalty or nobility, the association of the horn with hunting, or

the military connotations of the snare drum. While our interview participants did indeed mention

such associations in the open-ended descriptive task, and some of these associations were

included in the 77 initial timbre qualia categories, they were not sufficiently robust for inclusion

in the 20-dimensional model.

Timbre dimensions appear to be largely driven by the simple presence of discriminable

acoustic features. For example, vibrating metal plates generate different spectral patterns

depending on whether the plate is round, triangular, or rectangular. Kunkler-Peck and Turvey

(2000) showed that listeners can discriminate among these different geometries by sound alone.

However, no one has suggested how distinguishing round, triangular, or rectangular steel plates

might enhance survival. When contrasted with listener inability to discriminate metal, wood, or

plastic wind instruments, the research suggests that timbre categories arise not from specific

adaptations, but from the availability of discriminable acoustic features.

This “generalized learning” account gains credence when one considers research in the

field of ecological acoustics. A telling example is evident in the work of Li, Logan, and Pastore

(1991) on walking sounds. Li et al. observed that listeners are generally skilled in deciphering

whether a person is male or female on the basis of the sound of their footsteps. Using

standardized footwear, they showed that the key indicator is the time delay between the sound of

the heel contact and the sound of the sole slap. For a given gait, male walkers tend to have longer

time delays between these two sounds—a consequence of the fact that men generally exhibit

disproportionately longer feet than women. Although the motivation to decipher sex may be

innate, the specific mechanism discovered by Li et al. cannot be innate since the key acoustic

feature identified in their work can be heard only when individuals wear shoes and walk on hard

surfaces—conditions that would not have existed in the long period of human adaptiveness.

Experience has a demonstrable effect on timbre perception. For example, some listeners

might not be able to recognize the sound of an English horn as distinct from an oboe. But

professional oboe players not only make this distinction, they are readily able to distinguish

American and European oboe timbres and can detect subtle differences in timbre that result from

minuscule adjustments to a reed. As the ability to make and describe finer discriminations of

timbre appears to be related to exposure, more exposure and close attention to a certain range of

timbres might lead to the development of a more refined vocabulary for those timbres. We would

thus expect those individuals who have the greatest experience with a range of musical sounds to

develop more refined musical timbre vocabularies.

While we have proposed several plausible connections between timbre qualia and

acoustical features, such connections are currently largely theoretical, and future research is

called for to provide empirical evidence supporting such connections. However, if our account of

the timbre qualia dimensions and their connections to acoustical features is accurate, it is striking

that the dimensions that emerged in this study seem to have emerged from specific acoustical

patterns such as comb spectra, frequency-shifts due to changing stiffness, high filter Q,

regeneration, envelope features, or other acoustical patterns that permit general learning across a

class of sound sources. In short, the dimensions that emerge in this study are less suggestive of

evolved adaptations or socially constructed categories, and more suggestive of a general capacity

for auditory learning based on acoustical patterns encountered in the sonic environment.

It bears acknowledging that learning itself is an adaptive trait. When an environment is

highly stable over hundreds of thousands of years, natural selection can favor the development of

innate sensory and perceptual mechanisms that aid survival. However, when an environment is

highly variable, survival is better ensured through an individual’s capacity to learn rather than

innate or fixed response behaviors. In variable conditions, learning itself becomes a powerful

adaptive trait. Timbre has long been recognized as the “grab-bag” of auditory phenomena

(Bregman, 1994; Siedenburg & McAdams, 2017). If, as suggested here, timbre qualia traits tend

to reflect the idiosyncrasies of real-world acoustical features, then the apparent haphazard

assortment of timbre categories or dimensions might seem less enigmatic or peculiar.

It is possible that the plasticity of human timbre perception originates in our capacity for

learning speech sounds. Each generation must learn spoken language anew—a process requiring

timbre discrimination. Since the sounds used in language can vary considerably, the human

auditory system may be optimized for learning whatever acoustical features are salient in some

environment.

If timbre categories are learned from experience, there may be no fixed or optimum

number of timbre categories. This points to the possibility of a timbre-category hierarchy.

Depending on the context in which a timbre model is being applied, one may want either a more

refined or more broad timbre model. The 77 categories arising from our initial content analysis

offer a finer level of detail and explain more variance than the 20 categories in our distilled

model. On the other hand, an MDS study that distills timbre to two dimensions of “brightness”

and “percussiveness” offers an even more streamlined classification scheme than a five-

dimensional or 20-dimensional model, trading variance explained for succinctness.

Conclusion

The research described in this project was motivated by the question: what are the

phenomenological experiences associated with different musical timbres? Our aim was to

construct a model of the cognitive linguistic dimensions of Western musical instrument timbre

qualia that might ultimately prove useful in musical analysis, especially the analysis of classical

instrumental music. The motivating question was approached via two principal studies. In the

first study, open-ended interviews were conducted with musicians who were asked to describe a

variety of instrument sounds. The interview transcripts were subject to content analysis, leading

to 77 descriptive categories for timbre. In a subsequent study, musically experienced listeners

rated how appropriate each of the 77 descriptive categories was for characterizing a variety of

Western instrument sounds. Principal component analyses were then conducted on these results

in order to reduce the number of dimensions by collapsing those categories exhibiting high

shared variance. In two supplementary studies, musician participants rated the relevance of

various components. The results of several PCAs and the results of the follow-up rating

studies were combined to produce a model consisting of 20 dimensions.
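
To illustrate how categories with high shared variance can collapse into a single component, the sketch below computes the share of variance captured by the first principal component of two rating columns. It is a toy, covariance-based, two-variable case with invented ratings and hypothetical names; it does not reproduce the analyses reported here, which involved 77 categories.

```python
import math


def first_component_share(x, y):
    """Share of total variance captured by the first principal component of two variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / (n - 1)
    var_y = sum((v - mean_y) ** 2 for v in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    total = var_x + var_y
    if total == 0:
        raise ValueError("ratings have no variance")
    # Largest eigenvalue of the 2x2 covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    largest = total / 2 + math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    return largest / total


# Invented mean ratings of six instruments on two hypothetical categories.
sweet = [6.1, 5.4, 2.2, 1.8, 4.9, 3.0]
gentle = [5.8, 5.9, 2.5, 1.5, 4.4, 3.3]

print(round(first_component_share(sweet, gentle), 3))
```

When two categories rise and fall together across instruments, as in these invented data, a single component accounts for nearly all of their joint variance, which is the rationale for merging them into one dimension.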

In examining the 20 dimensions of the final model, we identified possible sources for the

qualia characterizations. In considering possible origins for the qualia dimensions, it was noted

that many of the dimensions can be related to distinctive acoustic patterns, suggesting that timbre

categories may originate in the availability of discriminable acoustic features rather than from

some innate perceptual dispositions that reflect possible evolutionary origins.

It is appropriate to bear in mind a number of methodological and conceptual caveats

when considering this research. First, all of the research was conducted in English. Possibly

pertinent descriptive categories common in languages other than English may not be represented.

Secondly, the research relied entirely on musicians and musically sophisticated listeners, whose

descriptions and experiences may not be representative of the general listening population. A

further caveat is that the research relied exclusively on imagined rather than heard sounds and

was focused on the description of prototypical rather than specific or instantiated sounds.

Consequently, our results describe cognitive representations of timbre, which may or may not be

applicable to the perceptual dimensions of timbre. More research is needed to determine how

these cognitive dimensions, derived from imagined sounds, relate to the perception of heard

timbre. Moreover, the imagined sounds were limited to a subset of 20, mostly Western classical

instruments, and not all instruments were equally familiar to the participants. Next, given the

open-ended nature of the descriptive task, participants used descriptive terms that may relate to

pitch and loudness, which have traditionally been definitionally separate from timbre. In fact,

two of the final dimensions, direct/loud and rumbling/low, include loudness- and pitch-related

terms, respectively. We chose not to eliminate such terms from our analyses, as they

represented features of the imagined sounds that were important to participants. However, this

feature of our model is important to keep in mind when comparing our results with other studies

that may take a stricter definition of timbre. During the initial rating task (Study 2), participants

were asked to rate each of their imagined instrument sounds on 77 dimensions; the length of this

list has potential negative implications for participant attention and concentration, and it is

possible that participants may have lost track of which instrument they were rating. Finally, it

must be acknowledged that some of the methodologies used, namely the content analysis and

PCAs, involved interpretive decisions by the researchers and are thus open to researcher

bias.

The results of the initial interviews confirm that the ways in which we describe timbre are

complex and rich, with musicians in the study making creative use not only of adjectives, but of

metaphors, sound comparisons, and onomatopoeia. Our proposed 20-dimensional model is

necessarily a reduction and may still represent a considerably impoverished characterization of

the world of timbre perception.

We offer the 20-dimensional model as a compromise between semantic richness and

practicality, but it is possible that in some contexts, the intermediate 77 categories resulting from

the initial content analysis may prove more useful. For example, a single timbre descriptor (such

as “ping/ding/ting”) may exhibit little utility for distinguishing certain instruments, such as

distinguishing the oboe from the trombone, but nevertheless have value in discriminating just

one or two sounds from many others, such as distinguishing the triangle from all other

instruments. Though evident in the interview data and the 77-category model, these useful but

narrowly specific descriptive categories are largely lost in the process of dimensionality

reduction, due to their inability to account for much general variance.

MDS is highly useful when the number of dimensions is small. However, when the

number of dimensions is high, MDS becomes impractical due to the high number of required

paired comparisons. Nevertheless, MDS may prove useful in confirmatory studies of models

involving a high number of dimensions. A high-dimensional model may itself be used to select a

dramatically reduced set of paired comparisons, increasing the tractability of gathering the

perceptual data, while still allowing the potential for a high number of dimensions to emerge in

an MDS solution.
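
The growth in the number of paired comparisons can be sketched as follows, assuming that every pair of distinct stimuli is compared exactly once (an assumption made for illustration; actual MDS designs vary). The function name and the stimulus counts are hypothetical.

```python
from math import comb


def paired_comparisons(n_stimuli: int) -> int:
    """Number of distinct stimulus pairs when each pair is judged once: n(n - 1) / 2."""
    return comb(n_stimuli, 2)


# The number of judgments grows roughly with the square of the number of stimuli.
for n in (10, 20, 40, 80):
    print(n, paired_comparisons(n))
```

Because the count grows roughly quadratically, doubling the number of stimuli approximately quadruples the number of judgments required of each participant.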

This paper has emphasized a cognitive linguistic approach to timbre by examining the

natural language of musicians’ real-time, conversational descriptions of timbre qualia. Data from

the rating study demonstrate consistencies in how timbre language is applied to the sounds of

musical instruments. These results imply the existence of a large number of timbre categories,

suggesting that future timbre research may need to consider further creative methodological

approaches that allow for models entailing many more qualities or dimensions. Finally, the

current study sets the stage for future research aimed at integrating linguistic characterizations of

timbre with the analysis of music.



References

Almeida, A., George, D., Smith, J., & Wolfe, J. (2013). The clarinet: how blowing pressure, lip
force, lip position and reed “hardness” affect pitch, sound level, and spectrum. Journal of
the Acoustical Society of America, 134(3), 2247–2255.

Arthur, C. (2006). When the Leading Tone Doesn’t Lead: Musical Qualia in Context. Doctoral
dissertation, The Ohio State University.

Backus, J. (1976). Input impedance curves for the brass instruments. The Journal of the
Acoustical Society of America, 60(2), 470–480.

Blumstein, D. T., Davitian, R., & Kaye, P. D. (2010). Do film soundtracks contain nonlinear
analogues to influence emotion? Biology Letters, 6(6), 751–754.

Bregman, A. S. (1994). Auditory scene analysis: The perceptual organization of sound. MIT
Press.

Caetano M., Saitis C., Siedenburg K. (2019) Audio Content Descriptors of Timbre. In:
Siedenburg K., Saitis C., McAdams S., Popper A., Fay R. (eds) Timbre: Acoustics,
Perception, and Cognition. Springer Handbook of Auditory Research, vol 69. Springer
International Publishing.

Casado, S. (2017). Studying friction while playing the violin: exploring the stick–slip
phenomenon. Beilstein Journal of Nanotechnology, 8(1), 159–166.

Causse, R., & Sluchin, B. (1982). Mutes of brass instruments—Experiments and
calculations. Journal of the Acoustical Society of America, 71(S1), S91–S92.

Chatziioannou, V., & van Walstijn, M. (2012). Estimation of clarinet reed parameters by inverse
modelling. Acta Acustica united with Acustica, 98(4), 629–639.

Chowning, J. M. (1973). The synthesis of complex audio spectra by means of frequency
modulation. Journal of the Audio Engineering Society, 21(7), 526–534.

Dalmont, J.P., Gilbert, J., & Ollivier, S. (2003). Nonlinear characteristics of single-reed
instruments: Quasistatic volume flow and reed opening measurements. Journal of the
Acoustical Society of America, 114(4), 2253–2262.

de Munck, V. C. (2009). Research design and methods for studying cultures. Rowman Altamira.

Elliott, S. J., & Bowsher, J. M. (1982). Regeneration in brass wind instruments. Journal of Sound
and Vibration, 83(2), 181–217.

Elliott, T. M., Hamilton, L. S., & Theunissen, F. E. (2013). Acoustic structure of the five
perceptual dimensions of timbre in orchestral instrument tones. The Journal of the
Acoustical Society of America, 133(1), 389–404. https://doi.org/10.1121/1.4770244

Fabre, B., Gilbert, J., Hirschberg, A., & Pelorson, X. (2012). Aeroacoustics of musical
instruments. Annual Review of Fluid Mechanics, 44, 1–25.

Faure, A., McAdams, S, & Nosulenko, V. (1996). Verbal correlates of perceptual dimensions of
timbre. In 4th International Conference on Music Perception and Cognition, Montréal,
Canada (pp. 79–84).

Fernandes, J., Teixeira, F., Guedes, V., Junior, A., & Teixeira, J. P. (2018). Harmonic to Noise
Ratio Measurement-Selection of Window and Length. Procedia Computer Science, 138,
280–285.

Fletcher, N. H., Perrin, R., & Legge, K. A. (1989). Nonlinearity and chaos in
acoustics. Acoustics Australia, 18(1), 9–13.

Freour, V., & Scavone, G. (2012). Investigation of the effect of upstream airways impedance on
regeneration of lip oscillations in trombone performance. Proceedings of the Acoustics
2012 Nantes Conference. Nantes, France.

Goldstein, L. (1994). Possible articulatory bases for the class of guttural consonants.
Phonological structure and phonetic form. Papers in Laboratory Phonology, 3, 234–241.

Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. The Journal of the
Acoustical Society of America, 61(5), 1270–1277.

Grillo, E. U., & Verdolini, K. (2008). Evidence for distinguishing pressed, normal, resonant, and
breathy voice qualities by laryngeal resistance and vocal efficiency in vocally trained
subjects. Journal of Voice, 22, 546–552.

Hailstone, J. C., Omar, R., Henley, S. M., Frost, C., Kenward, M. G., & Warren, J. D. (2009). It's
not what you play, it's how you play it: Timbre affects perception of emotion in music.
The Quarterly Journal of Experimental Psychology, 62(11), 2141–2155.

Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural
correlates of perceived and imagined musical timbre. Neuropsychologia, 42(9), 1281–1292.
https://doi.org/10.1016/j.neuropsychologia.2003.12.017

Heffner, R. S., & Heffner, H. E. (1992). Evolution of sound localization in mammals. In The
evolutionary biology of hearing (pp. 691–715). Springer, New York, NY.

Hinton, L., Nichols, J., & Ohala, J. J. (Eds.) (1994). Sound symbolism. Cambridge University
Press.

Hirschberg, A., Gilbert, J., Msallam, R., & Wijnands, A. P. J. (1996). Shock waves in
trombones. The Journal of the Acoustical Society of America, 99(3), 1754–1758.

Huron, D. (2006). Sweet Anticipation. MIT Press.

Huron, D. (2016). Voice leading: The science behind a musical art. MIT Press.

Jolliffe, I.T. (2002). Principal Components Analysis, Second Edition. Springer: New York.

Keating, P.A., Garellek, M., & Kreiman, J. (2015). Acoustic properties of different kinds of
creaky voice. Proceedings of the 18th International Congress of Phonetic Sciences.
Glasgow, UK: The Scottish Consortium for ICPhS 2015, University of Glasgow.

Kendall, R. A., & Carterette, E. C. (1991). Perceptual scaling of simultaneous wind instrument
timbres. Music Perception: An Interdisciplinary Journal, 8(4), 369–404.

Kendall, R. A., & Carterette, E. C. (1993a). Verbal attributes of simultaneous wind instrument
timbres: I. von Bismarck's adjectives. Music Perception: An Interdisciplinary Journal,
10(4), 445–467.

Kendall, R. A., & Carterette, E. C. (1993b). Verbal Attributes of Simultaneous Wind Instrument
Timbres: II. Adjectives Induced from Piston’s “Orchestration.” Music Perception: An
Interdisciplinary Journal, 10(4), 469–501.

Kendall, R. A., Carterette, E. C., & Hajda, J. M. (1999). Perceptual and acoustical features of
natural and synthetic orchestral instrument tones. Music Perception: An Interdisciplinary
Journal, 16(3), 327–363.

Kunkler-Peck, A. J., & Turvey, M. T. (2000). Hearing shape. Journal of Experimental
Psychology: Human Perception and Performance, 26(1), 279–294.

Li, X., Logan, R. J., & Pastore, R. E. (1991). Perception of acoustic source characteristics:
Walking sounds. The Journal of the Acoustical Society of America, 90(6), 3036–3049.

McAdams, S. (2019). The Perceptual Representation of Timbre. In Siedenburg, K., Saitis,
C., McAdams, S., Popper, A.N., Fay, R.R. (eds.), Timbre: Acoustics, Perception,
Cognition (pp. 23–57). Springer International Publishing.

McAdams, S., Winsberg, S., Donnadieu, S., De Soete, G., & Krimphoff, J. (1995). Perceptual
scaling of synthesized musical timbres: Common dimensions, specificities, and latent
subject classes. Psychological Research, 58(3), 177–192.

Morton, E.S. (1977). On the occurrence and significance of motivation-structural rules in some
bird and mammal sounds. American Naturalist, 111(981), 855–869.

Morton, E.S. (1994). Sound symbolism and its role in non-human vertebrate communication. In
L. Hinton, J. Nichols & J. Ohala (eds.), Sound Symbolism, Cambridge: Cambridge
University Press, pp. 348–365.

Myers, A., Pyle Jr., R.W., Gilbert, J., Campbell, D.M., Chick, J.P., & Logie, S. (2012). Effects of
nonlinear sound propagation on the characteristic timbres of brass instruments. Journal of
the Acoustical Society of America, 131(1), 678–688.

Nykänen, A., Johansson, Ö., Lundberg, J., & Berg, J. (2009). Modelling perceptual dimensions
of saxophone sounds. Acta Acustica United with Acustica, 95(3), 539–549.

Ollen, J. E. (2006). A criterion-related validity test of selected indicators of musical
sophistication using expert ratings. Doctoral dissertation, The Ohio State University.

Parncutt, R. (1989). Harmony: A Psychoacoustical Approach. Berlin: Springer-Verlag.

Plomp, R. (1970). Timbre as a multidimensional attribute of complex tones. In Plomp, R. &
Smoorenburg, G.F. (Eds.), Frequency analysis and periodicity detection in hearing.
Sijthoff, Leiden, 397–410.

Poyatos, F. (1991). Paralinguistic qualifiers: Our many voices. Language &
Communication, 11(3), 181–195.

Pratt, R. L., & Doak, P. E. (1976). A subjective rating scale for timbre. Journal of Sound and
Vibration, 45(3), 317–328.

Pyzdek, A. (2015). “The World Through Sound: Resonance.” Acoustics Today.
https://acousticstoday.org/8-the-world-through-sound-resonance/.

Qi, Y., & Hillman, R. E. (1997). Temporal and spectral estimations of harmonics-to-noise ratio
in human voice signals. The Journal of the Acoustical Society of America, 102(1), 537–
543.

Saitis, C. & Weinzierl, S. (2019). The Semantics of Timbre. In Siedenburg, K., Saitis,
C., McAdams, S., Popper, A.N., Fay, R.R. (eds.), Timbre: Acoustics, Perception,
Cognition (pp. 119–149). Springer International Publishing.

Siedenburg, K., & McAdams, S. (2017). Four Distinctions for the Auditory “Wastebasket” of
Timbre. Frontiers in psychology, 8, 1747.

Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory
Spectrum, 3, 132–141.

Smith, N. (1980). The Horn Mute: An Acoustical and Historical Study. DMA dissertation,
Eastman School of Music, University of Rochester.

Sottek, R. (2014). Progress in calculating tonality of technical sounds. In INTER-NOISE and
NOISE-CON Congress and Conference Proceedings (Vol. 249, No. 4, pp. 3319–3327).
Institute of Noise Control Engineering.

Sottek, R., Kamp, F., & Fiebig, A. (2013). A new hearing model approach to tonality. In INTER-
NOISE and NOISE-CON Congress and Conference Proceedings, Innsbruck.

Terhardt, E., Stoll, G., & Seewann, M. (1982a). Pitch of complex signals according to virtual-
pitch theory: Tests, examples, and predictions. The Journal of the Acoustical Society of
America, 71(3), 671–678.

Terhardt, E., Stoll, G., & Seewann, M. (1982b). Algorithm for extraction of pitch and pitch
salience from complex tonal signals. The Journal of the Acoustical Society of
America, 71(3), 679–688.

Thompson, A. E. (1978). Nasal air flow during normal speech production. Master’s thesis,
Department of Speech and Hearing Sciences, University of Arizona.

Traunmüller, H. (1981). Perceptual dimension of openness in vowels. The Journal of the
Acoustical Society of America, 69(5), 1465–1475.

Tużnik, P., Augustynowicz, P., & Francuz, P. (2018). Electrophysiological correlates of timbre
imagery and perception. International Journal of Psychophysiology, 129, 9–17.
https://doi.org/10.1016/j.ijpsycho.2018.05.004

von Bismarck, G. (1974). Timbre of steady sounds: A factorial investigation of its verbal
attributes. Acta Acustica united with Acustica, 30(3), 146–159.

von Helmholtz, Hermann. (1885). On the sensations of tone as a physiological basis for the
theory of music, 2nd English edition. Translated by Alexander J. Ellis. London:
Longmans, Green, and co.

Wallmark, Z. (2019a). A corpus analysis of timbre semantics in orchestration treatises.
Psychology of Music, 47(4), 585–605.

Wallmark, Z. (2019b). Semantic Crosstalk in Timbre Perception. Music & Science, 2, 1–18.

Wallmark, Z., Iacoboni, M., Deblieck, C., & Kendall, R. A. (2018). Embodied listening and
timbre: Perceptual, acoustical, and neural correlates. Music Perception: An
Interdisciplinary Journal, 35(3), 332–363.

Wallmark, Z., & Kendall, R. A. (2018). Describing sound: The cognitive linguistics of timbre.
The Oxford handbook of timbre. New York, NY: Oxford University Press.
http://dx.doi.org/10.1093/oxfordhb/9780190637224.013.14

Wayland, R., Gargash, S., & Longman, A. (1995). Acoustic and perceptual investigation of
breathy voice. The Journal of the Acoustical Society of America, 97(5), 3364–3364.

Wessel, D.L. (1973). Psychoacoustics and music: a report from Michigan State University.
PACE: Bulletin of the Computer Arts Society 30, 1–2.

Yoshikawa, S., & Nobara, Y. (2017). Acoustical Modeling of Mutes for Brass Instruments. In
Schneider, A. (ed.), Studies in Musical Acoustics and Psychoacoustics. Current Research
in Systematic Musicology, vol 4. Springer.

Zacharakis, A., Pastiadis, K., & Reiss, J. D. (2014). An Interlanguage Study of Musical Timbre
Semantic Dimensions and Their Acoustic Correlates. Music Perception: An
Interdisciplinary Journal, 31(4), 339–358. https://doi.org/10.1525/mp.2014.31.4.339

Zacharakis, A., Pastiadis, K., & Reiss, J. D. (2015). An Interlanguage Unification of Musical
Timbre: Bridging Semantic, Perceptual, and Acoustic Dimensions. Music Perception: An
Interdisciplinary Journal, 32(4), 394–412.

Zhang, J. D., & Schubert, E. (2019). A Single Item Measure for Identifying Musician and
Nonmusician Categories Based on Measures of Musical Sophistication. Music
Perception: An Interdisciplinary Journal, 36(5), 457–467.

Appendix A, Participant Data.

A.1. Preliminary Study

The average age of the 17 participants was 28.5 years (range 19–42, SD=6.3). As a group, the

participants reported an average of 17.1 years of regular music practice (range 7–30, SD=5.2).

Participants listed 11 different instruments as primary instruments, including flute, oboe, clarinet,

bassoon, French horn, trombone, percussion, piano, violin, viola, and cello. Five of these

participants identified primarily as composers, and two of those five identified secondarily as

conductors.

A.2. Study 1: Interviews

In recruiting participants for our interview study, we concentrated our efforts on professional

orchestral musicians, conductors, and composers. The average age of the 23 participants was

32.8 (range 19–69, SD=13.5). As a group, the 23 participants reported an average of 21.1 years

of regular musical practice (range 8–59, SD=12.5) and 18.9 years of large ensemble experience

(range 7–50, SD=11.9). The principal instruments of these musicians included flute, oboe, alto

saxophone, trumpet, violin, viola, cello, double bass, guitar, and percussion. Five participants

self-identified primarily as composers and four others self-identified primarily as conductors.

Participants were also asked to report secondary instruments on which they had practiced

regularly at any time and on which they considered themselves to be proficient. Each musician

reported at least one additional instrument. In total, 30 unique primary and secondary

instruments were reported.



A.3. Study 2: Rating Task

As in Study 1, recruitment for Study 2 focused on professional orchestral musicians, conductors,

and composers. Participants (n = 460) were recruited in two ways. First, participants were

recruited via the Internet (n = 399) using email listservs and social media. This subset of

participants took the study in a self-determined location. Second, participants were recruited

from the Ohio State University School of Music subject pool (n = 61). Subject pool participants

were second-year undergraduate music students and were tested individually in an Industrial

Acoustic Corporation sound attenuation room. All participants took part in the study using the

same Qualtrics survey.

The average age of the 460 participants was 26.1 years (range 18–69, SD=9.3). As a

group, the participants reported an average of 13.4 years of regular music practice (median=10.5,

range 1–60, SD=9.7). Recall that participants identified their musical backgrounds as one of six

possibilities according to the Ollen single-question index of musical sophistication. The

distribution of self-identified sophistication categories is reported in Table A.3.1.

Table A.3.1. Self-reported musical identity of participants

Ollen self-reported musical identity # participants


Non-musicians 0
Music-loving non-musicians 6
Amateur musicians 81
Serious amateur musicians 153
Semi-professional musicians 140
Professional musicians 80
Total 460

Although recruitment was aimed at musicians, we did not exclude data from the six participants

who self-identified as music-loving non-musicians for two reasons: all reported at least two years

of regular musical practice, and they all successfully provided sufficient ratings for familiarity

and vividness on the instruments that they rated.



Appendix B, Instructions.

B.1. Preliminary Study

“Below is a list of 44 musical instruments. From this list, we want you to select 20 instruments

that, as a group, you consider to exhibit the greatest contrast—that is, where each instrument

sound is relatively unique compared with the other 19 instruments you select. Do you have any

questions?”

B.2. Study 1: Interviews

“The purpose of this experiment is to gather information about how people experience the

sounds of different musical instruments. Rather than play any sounds to you, we simply want

you to imagine the sounds in your head. For example, we might ask you to imagine the sound of

a violin playing a single sustained tone. In imagining these sounds, you should imagine a sound

produced by a professional musician rather than a beginner or amateur. In addition, you should

imagine a typical or common sound rather than some unusual sound that an instrument might be

able to make.

I’ll mention the name of a specific instrument. Then you should do your best to imagine the

sound of that instrument. You can take your time doing that. When you are ready, I’ll ask you to

describe the sound. What does the sound sound like? Think of as many words or phrases,

adjectives or descriptions as you can to describe the sound you imagine.

I want you to say as much as you can about the sound and about how you experience the sound. I

will be transcribing your remarks, so I might ask you to slow down or repeat what you said.

I may ask you some questions, but the purpose of the questions is simply to get you to talk about

the sound you imagine. Ideally, I wouldn’t ask you any questions at all.

Feel free to talk about any aspect of the sound—whatever catches your attention, whatever you

think, whatever it reminds you of. My preference is for you to simply talk about what you

imagine without my prompting.

When I first mention an instrument, I’ll ask you to judge on a 10-point scale how familiar you

are with the sound. For example, you might give a “10” to an instrument that you play regularly

and so are very familiar with. Conversely, if you really don't know an instrument well, you might

give it a rating of 2 or 3.

After you finish imagining the sound, I’ll then again ask you to rate on a 10-point scale how

clearly or vividly you were able to imagine the sound. Then I’ll ask you to imagine the sound

again and we’ll continue with you describing the sound. Do you have any questions about this?”

B.3. Pile Sort Task (Analysis)

“In this task you are required to sort ideas into what you consider appropriate categories. There

are some 502 items requiring sorting. There is no prescribed number of categories. Use as many

categories as you feel necessary to form coherent groups. After sorting some of the items you

may find it helpful to revise some of your existing categories. If necessary, feel free to create a

“miscellaneous” category that might be used to group items that don’t seem to fit with anything

else.

After you have completed this task, provide an identifying label for each category. Write the

label on one of the colored paper slips provided.

If you use a “miscellaneous” category, after sorting all of the items, please return to the

miscellaneous category and determine whether any items might be reasonably added to one of

the existing categories.”

B.4. Study 2: Rating Task

“In this study, we are interested in how people describe the sound quality or character of

different musical instruments.

You will be asked to imagine the sounds produced by particular instruments, such as the trumpet

or oboe. For each instrument you will be presented with a list of terms (such as “metallic” or

“buzzy”) and asked to rate how well the term describes the instrument. For example, you might

respond that “heroic/noble” describes the trumpet well, but that “timid” is a poor descriptor of

the trumpet.

For each instrument, you will be asked to rate 77 descriptive terms. It is likely that it will take no

more than 5 to 10 minutes to complete the task for one instrument. However, the survey asks you

to complete the task for two instruments. Accordingly, it should take about 15 to 20 minutes to

rate both. After you rate two, you will be given the option to end the survey or to rate another

instrument (which would really help us out with this project). After each instrument you rate,

you will have the option to rate another instrument, or to end the survey.

Before making your ratings, you will be asked how familiar you are with the instrument. We

only want to collect data for instruments that you recognize pretty well. If you are not at all

familiar with the instrument, we’ll move on to another instrument.

We will also ask you to rate how clearly or vividly you were able to imagine the sound.

Do your best to continue imagining the sound of the instrument as you make your judgments.

Thanks for your participation.”



Appendix C, Number of participants providing ratings for each instrument.

Instrument # of participants providing ratings


Piano 104
Crash cymbals 94
Snare drum 91
Flute 89
Oboe 87
Triangle 87
Bass drum 84
Tuba 84
Alto saxophone 79
Harp 79
Wood block 78
Bagpipes 76
French horn 76
Banjo 75
Kazoo 75
Timpani 75
Piccolo 69
Vibraphone 65
Bass clarinet 53
English horn 51

Appendix D, Pile sort categories and reconciled results.

Reconciled Experimenter 1 Experimenter 2


aggressive (friendly/threatening) aggressive/serious
high (pitch height) high pitch/treble
rough (texture) roughness
airy/breathy airiness breathy/airy
hollow (sound source) hollow/open
round shape roundness
beautiful aesthetic beautiful/pleasant
valence
sexiness
light (in weight) weight (big/heavy)
rumbling/booming (onomatopoeia) rumbling/booming
big big (big/heavy)
loud loudness loudness
sad/melancholy (sad) sad/melancholy
brassy (sound source) brassy
low (pitch height) low pitch/deep
salient/present (salience) attention-grabbing
presence
blends/stands-out
bright brightness brightness/brilliance
metallic (sound source) metallic/tinny
serious/solemn serious/somber/solemn (aggressive/serious)
(majestic/noble)
brilliant (luster) (brightness/brilliance)
mournful/wailing (sad) plaintive/crying
shrill/harsh/annoying (arousal: relaxing-to-irritating) harsh/annoying
(salience)
buzzy (onomatopoeia) buzzy
muted/veiled covered muddy/unclear
simple (complexity) (miscellaneous)
clear (clarity) clear/precise/focused
mysterious/ethereal otherworldliness mysterious/magical
singing/voice-like vocality voice-like
colorful color (richness)
nasal nose nasal
soaring/floating floating effortless/floating

commanding/assertive boldness/shyness (bold/confident)
(aggressive/serious)
noisy noisiness noisy/chaotic
soft/smooth (texture) soft/smooth
cool/cold (temperature) cool/cold
open (density) (hollow/open)
sparkling/shimmering (luster) tinkling/sparkling
(brightness/brilliance)
cute/innocent aesthetic innocent/cute
(friendly/threatening)
percussive (articulation) plucked/percussive
supportive/foundational groundedness grounded/foundational
dark brightness darkness
piercing/sharp sharpness (attention-grabbing)
(salience) (plucked/percussive)
sustained/even consistency sustained/even
deep depth (low/deep)
pinched/constrained (resonance) pinched/strained
sweet olfactory/gustatory sweetness
direct/projecting sound movement (attention-grabbing)
ping/ding/ting (onomatopoeia) (tinkling/sparkling)
thick/fat (thick/thin) (thin/thick)
dramatic/expressive expressivity emotional/expressive
(character-like)
powerful power powerful/forceful
thin/narrow (thick/thin) (thin/thick)
focused/compact (stability) (clear/precise/focused)
(density)
precise/clean (articulation) (clear/precise/focused)
(clarity)
twangy (sound source) twangy
folk-like/pastoral associations traditional/rustic
pure purity pure
unclear/indistinct (clarity) (muddy/unclear)
full (density) full
quick decay (decay) (decay/taper)
unique/distinct uniqueness unique/exotic
funny/comical funny funny/playful
raspy/guttural throat guttural/throaty
versatile/flexible flexibility agile/flexible

gentle/calm (arousal: relaxing-to-irritating) peaceful/relaxing
speed
reedy sound source reedy
warm (temperature) warm
happy/joyful happy happy/joyful
resonant/vibrant (resonance) spaciousness/reverb/presence
watery/fluid fluidity liquid/watery
heavy weight (big/heavy)
rich/complex (complexity) (richness)
overtones
wavy/undulating (stability) vibrato/wavy
heroic/noble (character-like) (triumphant)
(majestic/noble)
ringing/long decay (resonance) (decay/taper)
(decay) (crash/clang)
woody sound source woody

N.B. Parentheses indicate that the words in the listed category were distributed among more than

one reconciled final category.



Appendix E, Pertinence Ratings

Figure F.1 identifies the 33 components, rank-ordered by average pertinence. In seeking a more

parsimonious model, one would hope to see evidence of some discontinuity in the ratings where

one group of components is rated considerably lower than another group. However, no obvious

discontinuity is evident in Figure F.1. The largest drop in pertinence ratings is between 15 and 16

components, from 74.7 (“loud, aggressive, commanding, assertive, powerful, direct, projecting”)

to 64.9 (“quick decay”). However, 64.9 is still a high pertinence rating, given that 50 was labeled

as “moderately pertinent.”

Figure F.1. Mean pertinence ratings for each of the 33 components. On the x-axis, each set
is represented by its initial descriptive term.

Appendix F, Principal component analysis and interpretation.

We began our analysis of PCA results with the typical approaches: looking at the Eigenvalues,

considering the scree plot/knee point, and examining the percent of cumulative variance

explained. Table D.1. below includes a summary of the unrotated PCA on the ratings dataset;

Figure D.1. displays the corresponding scree plot.

Table D.1. Principal component analysis, Study 2 results

Component  Eigenvalue  Proportion of Variance Explained  Cumulative Variance Explained
PC1 12.41 0.16 0.16
PC2 11.08 0.14 0.31
PC3 6.33 0.08 0.39
PC4 5.46 0.07 0.46
PC5 3.08 0.04 0.50
PC6 2.45 0.03 0.53
PC7 1.71 0.02 0.55
PC8 1.47 0.02 0.57
PC9 1.45 0.02 0.59
PC10 1.24 0.02 0.61
PC11 1.15 0.01 0.62
PC12 0.96 0.01 0.63
PC13 0.89 0.01 0.65
PC14 0.86 0.01 0.66
PC15 0.82 0.01 0.67
PC16 0.81 0.01 0.68
PC17 0.78 0.01 0.69
PC18 0.76 0.01 0.70
PC19 0.74 0.01 0.71
PC20 0.71 0.01 0.72
PC21 0.69 0.01 0.73
PC22 0.67 0.01 0.73
PC23 0.66 0.01 0.74
PC24 0.63 0.01 0.75
PC25 0.60 0.01 0.76
PC26 0.58 0.01 0.77
PC27 0.57 0.01 0.77
PC28 0.55 0.01 0.78
PC29 0.54 0.01 0.79
PC30 0.53 0.01 0.79
PC31 0.51 0.01 0.80
PC32 0.51 0.01 0.81
PC33 0.49 0.01 0.81
PC34 0.49 0.01 0.82
PC35 0.47 0.01 0.83
PC36 0.46 0.01 0.83
PC37 0.46 0.01 0.84
PC38 0.44 0.01 0.84
PC39 0.44 0.01 0.85
PC40 0.43 0.01 0.86
PC41 0.42 0.01 0.86
PC42 0.41 0.01 0.87
PC43 0.40 0.01 0.87
PC44 0.40 0.01 0.88
PC45 0.39 0.01 0.88
PC46 0.39 0.01 0.89
PC47 0.38 0.00 0.89
PC48 0.37 0.00 0.90
PC49 0.36 0.00 0.90
PC50 0.35 0.00 0.91
PC51 0.35 0.00 0.91
PC52 0.34 0.00 0.91
PC53 0.33 0.00 0.92
PC54 0.33 0.00 0.92
PC55 0.32 0.00 0.93
PC56 0.32 0.00 0.93
PC57 0.31 0.00 0.94
PC58 0.31 0.00 0.94
PC59 0.30 0.00 0.94
PC60 0.29 0.00 0.95
PC61 0.29 0.00 0.95
PC62 0.28 0.00 0.95
PC63 0.28 0.00 0.96
PC64 0.27 0.00 0.96
PC65 0.26 0.00 0.97
PC66 0.26 0.00 0.97
PC67 0.25 0.00 0.97
PC68 0.25 0.00 0.98
PC69 0.24 0.00 0.98
PC70 0.23 0.00 0.98
PC71 0.23 0.00 0.98
PC72 0.22 0.00 0.99
PC73 0.22 0.00 0.99
PC74 0.20 0.00 0.99
PC75 0.19 0.00 1.00
PC76 0.19 0.00 1.00
PC77 0.17 0.00 1.00

Figure D.1. Scree plot, Study 2 results.

As can be seen in both the Table and the Figure, the cutoff based on the arbitrary rule of

Eigenvalue > 1 would result in a model containing 11 dimensions. Indeed, in reporting

preliminary data from this study in a conference proceeding (Reymore & Huron, 2018), this is

how we approached the model, which contained 11 dimensions. The “knee point” method would

suggest around just seven components, depending on where exactly the “knee” was read.

However, after collecting more data, looking carefully at possible interpretations for rotated

components, and considering our research goals, which include the use of this model in music

theoretical scholarship, we were ultimately not satisfied with the 11- or 7-component models. To

start, the 11-component model suggested by the Eigenvalue method explains just 62% of the

variance. A 7-component model suggested by the knee point explains just 55% of the variance.

After considering the variances explained and interpreting the components of these models, we

felt, as professional musicians, that even the 11-dimension model was intuitively incomplete.

The process described below, of considering the available models and alternative methods of

interpretation, was the result of about six months of active consideration and continuous

discussion of this issue.
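
The arithmetic behind these figures can be sketched in a few lines of Python. The snippet below is illustrative only and is not the analysis code used in the study. It copies the first twelve eigenvalues from the table above, and it assumes that the total variance equals 77 (one unit per standardized descriptive term), an assumption consistent with the tabulated eigenvalues summing to approximately 77. The function names are hypothetical.

```python
import numpy as np

# First twelve eigenvalues (PC1..PC12) copied from the table above.
# They are sorted in descending order and PC12 is already below 1,
# so the remaining components cannot change the Eigenvalue > 1 count.
eigenvalues = np.array([12.41, 11.08, 6.33, 5.46, 3.08, 2.45,
                        1.71, 1.47, 1.45, 1.24, 1.15, 0.96])

# Assumption: the PCA was run on standardized ratings of 77 terms,
# so the total variance (the sum of all eigenvalues) is 77.
TOTAL_VARIANCE = 77.0


def kaiser_count(eigs):
    """Number of components with eigenvalue > 1."""
    return int(np.sum(eigs > 1.0))


def cumulative_proportion(eigs, total):
    """Cumulative proportion of variance explained by the leading components."""
    return np.cumsum(eigs) / total


n_kaiser = kaiser_count(eigenvalues)
cum = cumulative_proportion(eigenvalues, TOTAL_VARIANCE)

print(n_kaiser)                  # components retained by the Eigenvalue > 1 rule
print(round(float(cum[10]), 2))  # proportion explained by the first 11 components
print(round(float(cum[6]), 2))   # proportion explained by the first 7 components
```

Under these assumptions, the sketch reproduces the 11-component cutoff and the 62% and 55% figures cited above.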

To address our concerns about variance explained, we might have turned to another rule

of thumb proposed by statisticians for PCA interpretation: to choose a cutoff of dimensions

based on the acceptable amount of variance explained (Jolliffe, 2002). We considered variance

explained by models from previous research in timbre semantics: von Bismarck (1974) found 81%

variance explained with four factors for synthetic sounds, and Kendall and Carterette (1993a),

testing a restricted set of natural recorded tones, found nearly 98% of variance explained with

only two factors. Kendall and Carterette (1993b) also reported a different two-dimensional space

explaining 96% of variance in the data. In order to achieve even the lowest of these numbers,

81%, in the current study, we would need to include 32 dimensions. To achieve variance

explained on the order of 96–98%, as reported by Kendall and Carterette, our data would require

63–71 dimensions.
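
The variance-target rule can be sketched in the same way. The example below is a hypothetical illustration that uses two invented eigenvalue spectra (not data from this study): a steep spectrum, in which a few components dominate, and a flatter one, in which variance is spread across many components. The helper name components_for_target is likewise hypothetical.

```python
import numpy as np


def components_for_target(eigs, target):
    """Smallest number of leading components whose cumulative
    proportion of variance reaches `target` (e.g., 0.81)."""
    eigs = np.asarray(eigs, dtype=float)
    cum = np.cumsum(eigs) / eigs.sum()
    # Index of the first cumulative value >= target, converted to a count.
    return int(np.searchsorted(cum, target, side="left")) + 1


# Invented spectra for illustration only; each sums to 10.
steep = [6.0, 2.0, 1.0, 0.5, 0.3, 0.2]
flat = [2.5, 2.0, 1.8, 1.5, 1.2, 1.0]

n_steep = components_for_target(steep, 0.81)
n_flat = components_for_target(flat, 0.81)
print(n_steep, n_flat)
```

With the flatter spectrum, the same 81% target requires more components than with the steep one, which mirrors in miniature the situation described above.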

Thus, we found ourselves in a position where none of the typical rules of thumb for PCA

interpretation ended with a model that would sufficiently meet our goals. Eigenvalues and scree

plots suggested models that did not explain enough variance, but a model based simply on a

variance goal would include far too many dimensions to be practical. Accordingly, we needed

another way of choosing a model. Rather than inventing further arbitrary rules of thumb, we set

out on an exploratory expedition and dug deep into many possible models. One approach would

have simply been for us to pick the model we felt best represented our music theoretical

intuitions. However, such a choice would have been especially biased in that it would reflect

only our own timbral values. Ultimately, researcher bias is unavoidable in the interpretation of

PCA, and we knew we would at some point have to simply make a decision. However, to

mitigate that bias as much as possible, we computed the component superset and added further

rating tasks, as described in the main text of the manuscript.



Appendix G, Qualia dimensions and classifications after Wallmark (2019a).

The left column contains the twenty dimensions of the final timbre qualia model constructed in

the current paper. The column on the right contains what we have judged to be the corresponding

categories of timbre description as proposed by Wallmark (2019a). Because the dimensions

contain multiple terms, some can be sorted into multiple categories; the terms we have attributed

to each category are listed in parentheses when this is the case.

Table G.1.

Dimension                                   Classification after Wallmark, 2019a

1. rumbling, booming, low, deep, thick,     ACTION (rumbling, booming); ACOUSTICS (low);
   fat, heavy                               MATTER (deep, thick, fat, heavy)
2. soft, smooth, singing, voice-like,       CROSS-MODAL CORRESPONDENCE (soft, smooth, sweet);
   sweet, gentle, calm                      MIMESIS (singing, voice-like); AFFECT (gentle, calm)
3. watery, fluid                            MATTER
4. direct, projecting, loud, aggressive,    ACOUSTICS (direct, projecting, loud); AFFECT
   commanding, assertive, powerful          (aggressive, commanding, assertive, powerful)
5. nasal, reedy, buzzy, pinched,            MIMESIS (nasal, reedy); ONOMATOPOEIA (buzzy);
   constrained                              ACTION (pinched, constrained)
6. shrill, harsh, noisy                     ACOUSTICS
7. percussive                               ACOUSTICS
8. pure, clear, precise, clean              CROSS-MODAL CORRESPONDENCE (pure, clear, clean);
                                            ACOUSTICS (precise)
9. brassy, metallic                         MATTER
10. raspy, guttural, grainy, gravelly       MIMESIS (guttural); ACOUSTICS (raspy);
                                            CROSS-MODAL CORRESPONDENCE (grainy, gravelly)
11. ringing, long decay                     ACOUSTICS
12. sparkling, shimmering, brilliant,       CROSS-MODAL CORRESPONDENCE
    bright
13. airy, breathy                           MIMESIS
14. resonant, vibrant                       ACOUSTICS
15. hollow                                  ACOUSTICS
16. woody                                   MATTER
17. muted, veiled                           ACOUSTICS
18. sustained, even                         ACOUSTICS
19. open                                    ACTION
20. focused, compact                        ACTION (focused); MATTER (compact)
