The Communication of Musical Expression
Author(s): Roger A. Kendall and Edward C. Carterette
Source: Music Perception: An Interdisciplinary Journal, Winter 1990, Vol. 8, No. 2, pp. 129-163
Published by: University of California Press
Stable URL: https://www.jstor.org/stable/40285493
Music Perception, Winter 1990, Vol. 8, No. 2, 129-164. © 1990 by the Regents of the University of California
The Communication of Musical Expression
ROGER A. KENDALL & EDWARD C. CARTERETTE
University of California, Los Angeles
This study focuses on the performer-listener link of the chain of musical
communication. Using different perceptual methods (categorization,
matching, and rating), as well as acoustical analyses of timing and
amplitude, we found that both musicians and nonmusicians could dis-
cern among the levels of expressive intent of violin, trumpet, clarinet,
oboe, and piano performers. Time-contour profiles showed distinct sig-
natures between instruments and across expressive levels, which affords
a basis for perceptual discrimination. For example, for "appropriate"
expressive performances, a gradual lengthening of successive durations
leads to the cadence. Although synthesized versions based on perfor-
mance timings led to less response accuracy than did the complete nat-
ural performance, evidence suggests that timing may be more salient as
a perceptual cue than amplitude. We outline a metabolic communica-
tion theory of musical expression that is based on a system of sequences
of states, and changes of state, which fill gaps of inexorable time. We
assume that musical states have a flexible, topologically deformable
nature. Our conception allows for hierarchies and structure in active
music processing that static generative grammars do not. This theory
is supported by the data, in which patterns of timings and amplitudes
differed among and between instruments and levels of expression.
We explore the basic question, "How does the performer convey
ideas to the listener?" Other studies have investigated various as-
pects of expressive performance (e.g., Seashore, 1938; Bengtsson & Ga-
brielsson, 1983; Clarke, 1988; Clynes, 1983; Gabrielsson, 1988; Sund-
berg, 1988; Sundberg, Frydén, & Askenfelt, 1983), primarily focusing on
acoustic measurements of timing and amplitude. Indeed, musical expres-
sion has been commonly defined in terms of deviations from mechanical
performance of canonical notations. However, a distinction must be made
between random performance variability and that attributable to expres-
sive intent. Over 50 years ago, Seashore (1938) remarked:
Requests for reprints may be sent to Roger A. Kendall, Department of Ethnomusicology
and Systematic Musicology, or Edward C. Carterette, Department of Psychology, Uni-
versity of California, Los Angeles, Los Angeles, CA 90024.
As a fundamental proposition we may say that the artistic expression
of feeling in music consists in esthetic deviation from the regular-
from pure tone, true pitch, even dynamics, metronomic time, rigid
rhythms, etc. (p. 9)
In one study, Seashore's (1938, p. 247) pianist performed the first 25
measures of Chopin's "Nocturne," op. 27, no. 2 in "artistic" and "at-
tempted metronomic" time. Patterns of accumulated measure and phrase
durations were similar between the two renditions, and the dynamic range
of the metronomic version was restricted. Seashore (1938) also noted the
relative absence of intensity cues in accenting, saying ". . . time is always
a rival of intensity in giving accent" (p. 243).
Some explorations of the role of "deviations" in expressive performance
have come from measurement-qua-measurement and analysis-by-syn-
thesis studies. In these studies the relationship between the performer and
listener has been largely neglected. Musical communication is concerned
not merely with a single frame of reference, but includes the complex
relationships among composer, performer, and listener. Gabrielsson
(1988) notes that
To find truly general results in performance data is therefore difficult.
The generalities should rather be sought in the relations between per-
formance and (the listener's) experience, (italics his, p. 46)
There are a few relevant experimental studies on musical communi-
cation. Nakamura (1987) investigated the ability of the performer to com-
municate dynamics to the listener. In general, the performer's intentions
were communicated, particularly for crescendos and across the differing
dynamic ranges of violin, recorder, and oboe. Tro (1989) investigated
perceptual differences in performed dynamics by using entire pieces. Per-
ception varied with sex, between singers and instrumentalists, and between
performers and audience; perception of dynamic range depended on the
contrast level to the low-intensity passages.
The semantic differential was used by Senju and Ohgushi (1987) to
evaluate the ability of a violinist to convey her ideas to the audience. In
playing the first movement of Mendelssohn's Violin Concerto in e minor,
the performer tried to represent 10 musical feelings labeled with such
words as "weak," "sophisticated," "bright," "powerful," and "fashion-
able," which referred to playing style. Generally, the semantic differential
responses of musically trained listeners showed weak correspondence with
the intent. One limitation was the wide range of intended styles in relation
to the music. Another was the use of inappropriate verbal attributes to
define the performance instead of simply using commonly notated ex-
pressive markings.
Campbell and Heller (1979) investigated the ability of a cellist and a
violist to communicate detailed notational ch
fingerings, and articulations. They found th
79%, 65%, and 50%, respectively, of normal
synthesized notational interpretations. This use
sical communication relies on notational cu
interpretation rather than on behavioral m
manipulated notational cues. He moved all t
dundant composition (Satie's Vexations) and
"show consistent deviations from strictly metr
that produce a profile of partially periodic
states, ". . . expressive profile is generated at t
information specified in the musical structu
Clarke minimizes the distinction between an
expression. All deviations from mechanical p
pressive"; expression's domain is the mind of t
such as bar lines can be tied to the pitch/tim
(q.v. Ives, "The Cage"; Webern, "Variation
ment; Stravinsky, "Marche du soldat" from
question is, what defines the musical structu
subject? G. Houle (1987) says
It seems to be the belief of most seventeenth- an
theorists that musical meter is naturally and a
the listener and only secondarily heightened
techniques. Most performers today are aware
suggest that the measure is identified by regu
stress based on bar lines and time signature,
We shall return to the issue of the nature of musical structure in the
section on theory.
A Metabolic Communication Theory of Musical Expression
Any model of communication involves the sending and receiving
of messages. Our view of musical communication comprises three
components.1 The process of musical communication involves a notat-
ed musical message that is recoded from ideas to notation by the com-
poser, then recoded from notation to acoustical signal by the performer, and
finally recoded from acoustical signal to ideation by the listener.
1. For this paper, we deal only with traditional Weste
performer, and listener are involved. We do not consi
each person, or indeed, the absence of components al
indigenous music of India, some electronic musics, jazz,
In addition, feedback loops, both acoustical and gestur
Heller (1981) discuss a similar model, differing consi
Fig. 1. A model of musical communication. C = composer; P = performer; L = listener
We assume that a basic mental capability of the human is the grouping
and parsing of elementary thought units (metasymbols).2 In fact, this
capacity is one basis for the survival of the organism. Human language
systems are obvious examples of this capacity in respect of both referential
and areferential elements. Our view is that music is but another mani-
festation of this capacity wherein the referential is largely suppressed.
Abstract mathematics thus resembles music. From this perspective the
answer to the question, "Is music necessary for survival?", is yes, insofar
as human survival depends on the capacity for generating, synthesizing,
and analyzing metasymbolic structures.
We conceive of music as based on a system of sequences of states that
fill gaps of inexorable time. Depending on the frame of reference, these
states can be, for example, sound states (perception), acoustic states (vi-
brational signal), symbol states (notational signal), and mental states (cog-
nitive metasymbols). In fact, the transformation from one state sequence
to another is the very core of the difficulty in musical communication. A
sequence of states is not yet a pattern, not until some organization has
2. The mental representation, or form, of the idea that is manipulated by the creator
varies by field and by person. Musicians "hear" or "feel" some sound structures, but others
are vaguer or less immediate. Mathematicians say that they do not operate on symbols,
but on indefinite (metasymbolic) mental forms and kinetic feelings. We assume that group-
ing and parsing by the composer, performer, or listener is carried out on lexical items stored
as schémas in implicit and explicit long-term memories. Lexical items may be simple or
complex, but generally they are stored as complex forms, in a phrasal lexicon. "Phrasal"
is intended to convey the notion that the creator (composer) groups preformed structures,
gestures, or schémas, like those of speech, into larger schémas. A composer may not select,
order, group, and notate each individual note of a glissando or mordent.
been imposed on it by some mental process.
what defines a state is relative to a frame o
a frame of reference, various levels of s
macrostates) can exist at the same time
State changes are the basic building blocks
sequences (protean units, as few as two stat
ticularly in relation to the listener's schemata,
more than itself; this suggests expectation. I
state is defined relative to its context. We assu
a flexible, topologically deformable nature, a
frustration in grammatical or symbolic mod
Consider a glissando for two octaves start
protean sequence consists, in the notational fra
(Figure 2a). Another representation simply u
(Figure 2b). Of course, Figure 2b is less spe
pretations are possible: a chromatic series,
a portamento. Both examples rely on conven
the relation of the glissando to the quarter not
composer's ideation to notation depends o
knowledge of performer/instrument propertie
Protean sequences may be termed "gesture
gesture, motif, and pattern is open to operatio
investigation. The potential pattern implied
consists of structural states hung on inexora
of microstructural state changes, a gap-filling
is characterized by a degree of salience, this
frame of reference. It is possible to have a g
not only pitch, but temporal, dynamic, and
well. The dynamics of gap-fill stem from co
iological, physical, perceptual, and cognitive
processes reach a limit and must change mag
tinue. In this way, quasiperiodic state changes m
of the musical message.4
The recoding of musical messages depends
implicit and explicit knowledge (Figure 1).
phasizes implicit over explicit knowledge: W
3. The term "inexorable" is meant to emphasize the im
flow of time. The perceptual time window appears to
and attention, scanning events of the past and antici
motion of music can only be a forward reflection of
relations onto musical structures as in fractal music l
negative time.
We consider the "gap-fill" process to be a general architectonic principle. In this sense,
the term is quite different from that proposed by Meyer (1973).
4. Here we can only touch on the ramifications and possibilities of our position.
Fig. 2. Two ways of notating a glissando.
ecute micromuscular embouchure adjustments, achieve timbral balances
and fusion, and parse the acoustical signal into a musical message.
The relationship between implicit and explicit operations is represented
schematically in Figure 3 as an information processing model. In essence
the right-hand side of the figure deals with symbols; the left-hand side with
metasymbols. Conscious awareness has direct access only to working
memory and, through working memory, to explicit procedures and ex-
plicit long-term memory. Although conscious awareness can direct a prob-
lem or query to implicit procedures, it cannot access directly their content
or form. External world inputs are parsed (differentiated and categorized)
by implicit procedures, albeit under the potential direction of conscious
awareness, and are conditioned by schémas in implicit long-term memory.
The translator maps metasymbolic to symbolic units, and vice versa. What
is perceived at the conscious level via working memory is a translated
version of implicit knowledge; one never can know to what extent the
mapping is isomorphic. In a similar way, explicit knowledge is, through
rehearsal, mapped in a translated form to the implicit. In the figure, pairs
of arrows in opposite directions imply parallel processing; the double
headed arrow implies serial processing.
It would be erroneous to interpret the model in terms of transforma
tional grammars. Explicit and implicit procedures are not equivalent to
deep and surface structures and transformational rules. Hierarchies are
clearly a part of the musical message and its representation in any frame
of reference, as any redundancy implies structural organization. The na-
ture of hierarchies, whether strict, symmetrical, and unique or flexible,
asymmetric, and ambiguous, must be taken into account. The role of
hierarchy in our approach is flexible, asymmetric, and ambiguous. There-
fore generative and transformational procedures are neither necessary nor
sufficient in message recoding. Our processing concept is one of manifold
procedures and multiple strategies, not of grammars. Indeed, generative
theories have been found wanting in a number of domains. Roads (1985,
p. 429) notes that ". . . the rewrite rule by itself has been shown to be
insufficient as a representation in music." Minsky holds that grammars
are static representations that distract researchers from studying music as
a cognitive process (Roads, 1980). Further, Winograd (1972) found it
necessary to eschew transformational grammars in favor of case grammars
for the parsing of natural language.
Fig. 3. Information-processing model showing some details of the implicit
and explicit knowledge and procedures of the person.
One of the problems in studying musical e
domain lies in implicit procedures. Evidence
difficulty of verbalizing about the generatio
siveness. It is no accident that most performan
one-on-one in an interactive, modeling envir
covert nature, something is being imparted by
from the large number of multiple recordin
same music notation. In our study, expressiven
generated by the performer and directed at th
intended message is received, communication
er's message is a synchronous modulation of th
as the carrier. In the process of message parsin
may impose meanings unintended by either
Indeed, it is in the nature of the listener alw
the composer's message is unparsable, the pe
may be grasped by the listener. In this case,
as little more than a framework for the dial
listener.5
The work reported here is concerned with some of these issues of mu-
sical communication. We propose to integrate performer-generated levels
of expressive intent with perceptual and acoustical analyses in a series of
5. We admit the possibility of asynchronicity between expressive (performed) and com-
posed state structures, particularly for generating performer hallmarks or novelty. The
relationship between the two forms part of the dynamics of information content in the
message, as is the case for pitch and time in the cognitive coding of melodies (Monahan,
Kendall, & Carterette, 1987). Strict dodecaphonicism, pointillism, and use of sound struc-
tures à la Stockhausen may exceed cognitive limits, yet the performance may be parsed
for meaning.
studies, which investigate the communication of musical expression be
tween performer and listener as moderated by compositional style. In this,
we go beyond previous research by considering at the same time many
aspects of musical communication.
Experimental Rationale
As mentioned earlier, most previous investigations have focused on
analysis and manipulation of acoustical aspects of musical performance.
In contrast, we begin our study by conducting perceptual experiments
whose outcomes are then correlated with acoustical data. The initial re-
search questions are directed at the performer-listener part of the com-
munication model of Figure 3.
A number of methods were employed in order to converge on an answer
to the question "to what extent are the performer's expressive intentions
conveyed to the listener?" Different methods induce different listening
strategies, a fact often ignored. For this reason, we adapted or invented
techniques to explore the variety of listener responses. We used catego-
rization, matching, and rating methods and two distinct classes of lis-
teners, musicians and nonmusicians, and, in one experiment, we used both
natural and artificial music. Categorization reveals the ability to assign an
instance to a model when the whole range of possibilities can be reviewed.
In contrast, matching compels the listener to remember the possibilities
and respond only in the presence of a single model. Ratings require that
the listener provide a measure of the degree to which an attribute is
possessed by an instance. The use of artificial [Musical Instrument Digital
Interface (MIDI) synthesized] performances was meant to remove some
of the effects of the degrees of freedom at the disposal of the performer.
Thus, data produced using these methods allow comparison and contrast.
Materials and Methods
Musical materials were drawn from the vocal literature of four stylistic periods, ba-
roque, classical, romantic, and twentieth century, in order eventually to assess the influence
of compositional style on the generation and communication of expressiveness. Because
instruments were used, biases potentially present in instrumental music were reduced, we
hope, by the use of vocal music.6 Selection of phrases was guided by tessitura consid-
erations for the five instruments used: piano, clarinet, oboe, violin, and trumpet. The tonal
center for all selections was standardized by transposition to g minor; the meter was
likewise standardized. The melodies were "Thy Hand, Belinda" from Dido and Aeneas by Purcell (converted
in meter), "Der Wanderer" by Haydn, "Der Müller und der Bach" by Schubert, and "Weise
im Park" from Vier Lieder by Webern. This paper reports only on results obtained using
6. Whatever music might be chosen, the problem of appropriateness arises. Music
written specifically for each instrument would be ideal; however, the resultant confounding
of expression and materials would cloud comparisons. We have chosen to take a middle
ground, providing the same musical materials across instruments. We note the pervasive
transcription of vocal melodies to instruments and the relative rarity of the inverse.
"Thy Hand, Belinda" (Figure 4). The bracketed phrase
the entire phrase was performed and recorded, howev
Performances were recorded digitally (Sony PCM-
reverberant concert hall (Schoenberg Hall, reverber
coincident microphone (AKG Model 422) feeding a m
ciated Model MS 38) set for a crossed orthogonal figur
to the source. The height of the microphone was 1.6
piano center was 1.24 m, to wind instrument chair, 1
to trumpet chair, 1.75 m. These placements were estab
engineer, David Cloud, to optimize sound quality.
As a model, a professional concert pianist, Johana Harris, recorded
the four melodies in their entirety at three intended levels of expression: without
expression (senza espressione), with appropriate expression (con espressione), and with
exaggerated expression (con troppo espressione). We described playing
without expression as mechanical, with expression as appropriate, and with exaggerated
expression as "too much." Dynamic range was established as follows: Performers
were asked to imagine a space of dynamics with mezzo forte at its center, to play the
mechanical version statically at mezzo forte, and the expressive versions around that
level. The authors and recording engineer monitored the recording sessions; no
other instructions were given to the performers.
Four professional instrumentalists (oboe, clarinet, violin, and trumpet) listened over head-
phones to the recorded model performances of each of the four melodies.
They auditioned each model only once. The instrumentalists could consult the
scores ad libitum. Following this, the instrumentalist played an imitation
of the intended level of expression. The order of pieces was randomized.
Digital tape recordings were reproduced in analog for
hard disk for experimental use. Sampling was monoph
at 28,571.4 samples/sec, 16-bit, using Canetics' PC-DM
digital-to-analog interface. Krohn-Hite model 3202 filte
smoothing. Playback from hard-disk was through a S
heiser HD-222 headphones. All experiments were contro
collected by computer program.
Experiment 1
PROCEDURE
A computer-based categorization paradigm was used (Kendall, 1988). Model performances
(piano) were represented by three colored bars on the computer screen (Figure 5). The 12
selections to be categorized were displayed in a different region of the screen as undif-
ferentiated white bars. Both models and selections were randomly ordered. The subject
used a mouse to point at a bar and either select it for playing or for moving. The task
was to match a selection to an appropriate model by placing its bar under
Fig. 4. Notation of excerpt from "Thy Hand, Belinda" by Purcell. The bracketed phrase
was the musical material of the experiments.
Fig. 5. Schematic diagram of the computer screen seen by a subject during categorization.
the model. A practice session with different stimuli familiarized subjects with the procedure
and equipment. Neither here nor in the main experiment was feedback provided regarding
correctness of response.
Subjects were told that each piano model represented a different level of expression
and that the remaining 12 musical selections were four instruments each playing at three
levels of expression. Subjects were informed that each of the four instruments occurred
only once for a given level of expression.
Subjects were allowed to replay and move stimuli until they informed the experimenter
that they were finished. Therefore, at the end of the experiment, four different instruments
were associated with a given model on the screen (see Appendix). This final procedure
was the distillation of results obtained from 18 subjects, nine musicians and nine non-
musicians, who participated in various pilot studies.
SUBJECTS
There were 19 subjects in all, 10 nonmusicians and 9 musicians. Nonmusicians had
less than 1 year of formal music instruction; most of the musicians were graduate students
in musicology and had a minimum of 10 years of formal instruction. None had participated
in the pilot experiments.
RESULTS AND DISCUSSION
Tables 1A and 1B show a set of eight confusion matrices, four for
nonmusicians and four for musicians. The three columns of each matrix represent the
three expressive levels of the model performance (none, appropriate, and ex-
aggerated); each row represents the corresponding expressive level of the imitating
instrument (oboe, clarinet, violin, or trumpet). Each entry in a matrix is the
sum over all subjects who categorized the selection with the given piano
model. For example, musicians placed eight clarinet Level 1 selections
under the piano Level 1 model, one under piano Level 2, and none under the
exaggerated Level 3. (P1, P2, and P3 refer to piano models at the three levels of
expression. Likewise, E1, E2, and E3 refer to expressive levels of the performances
of the other instruments.) Therefore, the upper-left to lower-right diag-
onals of each matrix represent the "correct" (intended) matches of the levels
TABLE 1
Categorization Frequencies
A. Musicians
Clarinet Oboe Violin Trumpet
Expressive Level   P1 P2 P3   P1 P2 P3   P1 P2 P3   P1 P2 P3
E1   8 1 0    8 1 0    9 0 0    8 1 0
E2   1 6 2    1 6 2    0 2 7    1 6 2
E3   0 1 8    0 3 6    0 7 2    0 2 7
Mean hits (%) 81 74 48 78
B. Nonmusicians
Clarinet Oboe Violin Trumpet
Expressive Level   P1 P2 P3   P1 P2 P3   P1 P2 P3   P1 P2 P3
E1   6 3 1    6 2 2    8 1 1    8 0 2
E2   1 7 2    1 6 3    1 2 7    2 6 2
E3   2 2 6    2 2 6    1 6 3    2 3 5
Mean hits (%) 63 60 43 70
of expression. Figures 6a and 6b graph the data of Tables 1A and 1B.7
These data are collapsed over musicians and nonmusicians in Table 2. Chi-
square analyses of Table 2 rows were all significant (p < .025, df = 2).
Data were also summed over instruments in Table 3. Chi-square analyses
of Table 3 rows were likewise significant (p < .025, df = 2). Note that a
separate test was performed on each row and that the alpha level was
adjusted for multiple tests. A two-way analysis is inappropriate because of
the nonindependence of observations in columns.
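As a rough sketch of the row-wise tests described above (scipy is assumed to be available; the frequencies are the clarinet rows of Table 2, and the uniform-chance expectation is our assumption about the expected frequencies):

    from scipy.stats import chisquare

    # Clarinet rows of Table 2 (musicians and nonmusicians combined).
    # Each row is tested separately because only rows are independent;
    # expected frequencies default to chance (uniform over the three models).
    clarinet_rows = {
        "E1": [14, 4, 1],
        "E2": [2, 13, 4],
        "E3": [2, 3, 14],
    }

    for level, observed in clarinet_rows.items():
        stat, p = chisquare(observed)
        print(f"{level}: chi2 = {stat:.2f}, df = {len(observed) - 1}, p = {p:.4f}")
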
Note that the intended expression level matches the subject categori-
zation well beyond chance level. However, for violin the categorization
of expressive Levels 2 and 3 was poor, with the two categories being
reversed. We will return to these findings later.
In addition to subject categorization data, the computer recorded the
amount of time (in sec) the subject spent listening to a given selection, the
number of hearings, and the number of category changes. Analysis of
variance using multiple general linear hypotheses (ANOVA using MGLH)
on repeated measures indicated that the difference in the mean listening
durations for musicians (mean = 48.2 sec) and nonmusicians (mean =
72.1 sec) was not statistically significant [F(1,17) = 3.14, .05 < p < .082].
7. Because the expected frequencies were too low, and because of repeated measures,
chi-square analysis was not performed on the frequencies of Tables 1A and 1B. Tables
2, 3, and 4 had adequate minimum expected frequencies, and were analyzed by row only,
because each row (but not column) was independent.
Fig. 6. Graphs of the data of Tables 1A and 1B.
TABLE 2
Categorization Frequencies for Combined Musicians and Nonmusicians
Clarinet Oboe Violin Trumpet
Expressive Level   P1 P2 P3   P1 P2 P3   P1 P2 P3   P1 P2 P3
E1   14 4 1   14 3 2   17 1 1   16 1 2
E2 2 13 4 2 12 5 1 4 14 3 12 4
E3 2 3 14 2 5 12 1 13 5 2 5 12
Mean hits (%) 72 67 46 74
However, across all combinations of instruments and expressive levels
(Table 4), the listening time was significantly different [F(11,187),
p < .0009]. Figure 7, based on the means in Table 4,
TABLE 3
Categorization Frequencies Summed over Instruments
Musicians Nonmusicians
Expressive Level   P1 P2 P3   P1 P2 P3
E1   33 3 0   28 5 7
E2 3 20 13 6 21 13
E3 0 13 23 6 14 20
Mean hits (%) 70 64
TABLE 4
Mean Listening Durations (sec)
Expressive Level Clarinet Oboe Violin Trumpet
E1   49.74 53.11 36.37 45.05
E2 73.05 68.90 77.32 59.37
E3 71.00 67.63 70.37 57.90
Fig. 7. Mean listening durations during categorization across instruments and expressive
levels.
shows that decision-making required more listening time in the case of
nonmechanical expression. In addition, the appropriate level of expres-
siveness, Level 2, consistently had the highest mean listening duration.
It is notable that the violin Level 2 mean listening durations were the
highest in the group; this level of expressiveness was often interchanged
with Level 3 in the categorization task (Table 1). Musicians listened to the
stimuli fewer times (mean = 5.7) than did the nonmusicians (mean = 9.1)
[F(1,17) = 5.11, p < .037]. Across instruments and levels of expression the
mean number of times a stimulus was auditioned was significantly different
[F(11,187) = 2.85, p = .002] (Table 5, Figure 8). Except for the oboe, the
second level of expressiveness yielded a higher frequency of listening than
the other levels. Comparison of the mean durations across instruments
and levels of expression (Figure 7) with
TABLE 5
Mean Number of Times Auditioned
Expressive Level Clarinet Oboe Violin Trumpet
E1   7.05 7.58 5.68 6.32
E2 8.47 8.47 8.68 7.32
E3 7.74 9.11 7.68 7.00
Fig. 8. Mean number of times auditioned across instruments and expressive levels.
the means for times auditioned (Figure 8) shows similar patterns. The
outcome for category changes was also significant across instruments
and levels of expression [F(11,187)] (Table 6). A line graph of these means
is shown in Figure 9. In general, more category changes were made for
Level 2 and Level 3 performances. We suggest that these data indicate the
difficulty of discriminating between expressive Levels 2 and 3.
Experiment 2
In the categorization paradigm, subjects can review any and all stimuli.
In this way, the listener can build a perceptual space. In contrast, matching
TABLE 6
Mean Number of Category Changes
Expressive Level Clarinet Oboe Violin Trumpet
E1   1.32 1.16 1.11 1.37
E2 1.42 1.53 1.68 1.53
E3 2.05 1.63 1.42 1.37
Fig. 9. Mean number of category changes across instruments and expressive levels.
constrains the listener in developing a perceptual space; emphasis is placed
on the immediate model and choice set. As an independent measure of
a listener's ability to discern intended levels of expressiveness, we used a
contrasting, matching method.
PROCEDURE
The task was a three-alternative forced-choice paradigm with subject review. On each
trial, one of the 12 selections (four instruments × three levels of expression) was played,
followed by all three piano renditions. The subject then selected the piano rendition that
best fit the level of expressiveness of the initial example. The subject could replay
the example and the piano choices. Stimuli were randomized. The subject responded
by highlighting a letter label (A, B, or C) and pressing a mouse button. Two blocks of the
12 selections were presented.
SUBJECTS
Ten musicians, each paid for participation, served as subjects. In order to qualify, subjects
had to have 10 or more years of formal musical training, and could not have participated
in categorization Experiment 1.
RESULTS AND DISCUSSION
Table 7 is a confusion matrix like those shown in Tables 1A and 1B. Although
the hit rates are well above chance (33%), they are considerably lower than
those obtained in Experiment 1, except for the case of the violin, which
fared poorly there. Figure 10 is a bar graph of the Table 7 data. In order to
determine differences in correct responses across instruments, we used
all scores within the diagonals of Table 7. We scored subjects' responses as
correct when the model and choice were matched, averaged these scores,
and submitted the data to a 12-level repeated-measures ANOVA. Across
instruments, correct matches were statistically equivalent [F = 1.135,
p < .343], as the mean hits of Table 7 show. In order to compare
correct and incorrect matches for a given level of expressiveness, the data
were summed across the four instruments, creating a single 3 × 3 matrix.
Because row sums were equal (80), leading to singularities, the data were
perturbed by adding a noise value that randomly ranged between
+/-1.00 (Table 8).8
Three two-factor 3 × 3-level ANOVAs on repeated measures were per-
formed, one for each column of the resulting summed matrix. Alpha levels
were adjusted accordingly. In all cases, the correct matches (diagonal entries of
8. For a discussion of perturbation of data in singular matrices, see Press, Flannery,
Teukolsky, and Vetterling (1986). While not identical, our procedure extends
their ideas. In our case, the lack of proportional weighting of the perturbation relative to the
diagonal values works against the null hypothesis, a conservative approach.
Table 8) produced higher mean scores than the incorrect matches [without ex-
pression: F(2,6) = 34.55, p < .001; with appropriate expression:
F(2,6) = 7.97, p < .02; with exaggerated expression:
p < .002].
TABLE 7
Match Frequencies (Musicians)
Clarinet Oboe Violin Trumpet
Expressive Level   P1 P2 P3   P1 P2 P3   P1 P2 P3   P1 P2 P3
E1   11 7 2   10 6 4   14 2 4   14 2 4
E2 4 14 2 6 13 1 3 10 7 7 6 7
E3 2 9 9 3 5 12 1 10 9 5 2 13
Mean hits (%) 57 58 55 55
Fig. 10. Graphs of the data of Table 7.
TABLE 8
Perturbed Match Frequencies
Expressive Level   P1 P2 P3
E1   12.63 4.84 3.98
E2 4.99 10.72 4.12
E3 2.75 6.17 10.83
Musical Signal Analysis
We explored various signal correlates of musical expression, including
timing, amplitude rates of change, vibrato, and time-variant spectral char-
acteristics. In terms of the musical expression communication model (Fig-
ure 4), we investigated some aspects of performer and listener recoding
in the acoustical signal frame of reference.
TIMING
Computer algorithms aided in partitioning the acoustical signal. Sig-
nals were parsed in 100-sample windows (3.5 msec). First, the absolute
value of the average of minimal and maximal amplitudes within the win-
dow was obtained. A second-order forward-difference operator was ap-
plied to these values in sequence (Press et al., 1986). A threshold was used
for determining the onset of a note. Figure 11 illustrates the output of this
procedure for the piano expression Level 1 performance. Vertical lines
represent tracked note onsets, which were based on the filter function
output shown as a dotted line above the signal envelope. An arrow in-
dicates the tracking of a subtle legato transition.
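A minimal sketch of this onset-tracking procedure, under our reading of the description (the envelope measure, the threshold value, and the synthetic test signal are our assumptions, not the authors' actual program):

    import numpy as np

    def track_onsets(signal, sr=28571.4, win=100, threshold=0.05):
        # Parse the signal into consecutive 100-sample (~3.5-msec) windows and
        # take a simple peak-envelope value per window: here, the mean of the
        # absolute minimum and maximum amplitudes (one reading of the text).
        n = len(signal) // win
        frames = np.asarray(signal[: n * win]).reshape(n, win)
        env = 0.5 * (np.abs(frames.min(axis=1)) + np.abs(frames.max(axis=1)))

        # Second-order forward difference of the envelope sequence.
        d2 = env[2:] - 2.0 * env[1:-1] + env[:-2]

        # Windows whose difference exceeds the (arbitrary) threshold are
        # flagged as note onsets; times are returned in seconds.
        return np.where(d2 > threshold)[0] * win / sr

    # Toy usage: a soft tone that steps up in amplitude at t = 1.0 sec.
    sr = 28571.4
    t = np.arange(int(2 * sr)) / sr
    tone = np.sin(2 * np.pi * 196 * t) * np.where(t < 1.0, 0.3, 0.8)
    print(track_onsets(tone)[:3])
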
Figure 12 illustrates timings for clock-time performance of the rhythm
of "Thy hand, Belinda." The first row of timings (msec) is cumulative;
individual note durations are shown in the second row. All clock times
were based on an M.M. quarter note equal to 63 beats per minute. Figure
13, based on timings extracted by the computer algorithm, plots deviations
of performed durations (msec) from the clock times of Figure 12. Thus,
for note 4 of clarinet Level 2 (with appropriate expression), it can be seen
from Figure 13 that the clarinetist lengthened the note by almost 600 msec.
As this quarter note was 952 msec in clock time (Figure 12), the total
length was about 952 + 600 = 1552 msec.
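A small worked example of the deviation computation (the note values and performed durations below are illustrative only): at M.M. quarter = 63, one quarter note occupies 60/63 ≈ 0.952 sec, so a performed duration of 1552 msec deviates by roughly +600 msec.

    # Clock ("ontological") durations at M.M. quarter note = 63.
    BEAT_MS = 60000.0 / 63                      # about 952.4 msec per quarter

    note_values = [1.0, 1.0, 0.5, 0.5, 1.0]     # durations in beats (illustrative)
    performed_ms = [980, 1005, 430, 515, 1552]  # hypothetical measured durations

    clock_ms = [v * BEAT_MS for v in note_values]
    deviations = [p - c for p, c in zip(performed_ms, clock_ms)]
    print([round(d) for d in deviations])       # last note: about +600 msec
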
Fig. 11. Screen display shows the segmented acoustical signal of Piano 1. The arrow
indicates the tracking of a subtle legato transition.
Fig. 12. Clock-time durations, cumulative (top) and individual (bottom).
In Figure 13, the three levels of expressiveness are arranged from
left to right. It is notable that, for a given column, the contours
are similar; across expression levels, however, there are clear
contour differences. To quantify this relationship, we submitted the tim-
ings to a paired-correlation analysis (Table 9). The correlations were
then submitted to cluster analysis with complete linkage (Figure 14). A
few features deserve comment: Mechanical expression (Level 1) timings
TABLE 9
Correlations of Time Deviations
Piano 1 Oboe 1 Trumpet 1 Violin 1 Clarinet 1
Piano 1 1.000
Oboe 1 .658 1.000
Trumpet 1 .167 .443 1.000
Violin 1 .038 .374 .222 1.000
Clarinet 1 .007 .136 .371 .441 1.000
Piano 2 Oboe 2 Trumpet 2 Violin 2 Clarinet 2
Piano 2 1.000
Oboe 2 .831 1.000
Trumpet 2 .677 .824 1.000
Violin 2 .156 .389 .271 1.000
Clarinet 2 .628 .726 .475 .545 1.000
Piano 3 Oboe 3 Trumpet 3 Violin 3 Clarinet 3
Piano 3 1.000
Oboe 3 .421 1.000
Trumpet 3 .135 .621 1.000
Violin 3 .442 .661 .721 1.000
Clarinet 3 .523 .707 .803 .922 1.000
Fig. 13. Performance deviations from ontological time (msec) for three expressive levels
and five instruments.
Fig. 14. Cluster analysis (complete linkage) of correlations of time deviations at the three
expressive levels.
correlate less well (mean = .287, r = .007-.658) than do those for expressive
Level 2 (mean = .553, r = .156- .831) and expressive Level 3
(mean = .589, r = .135-.922). Cluster analyses with complete linkage
(Figure 14) show the close timing correlation of piano and oboe across
expressive levels. Similarly, clarinet and violin timing correlations form a
pair. Trumpet timings consistently appear as a separate branch.
Within instruments and across levels of expression, the piano perfor-
mances each have a distinctive timing profile. The timing correlation of
expressive Levels 1-3 for piano averaged only -.045. The four imitators
were much less able to create distinctive separations between expressive
level timings, with mean correlations of .213, .433, .412, and .560, re-
spectively for oboe, clarinet, violin, and trumpet. Indeed, a comparison
of timing correlations for the first six notes versus the last six notes (omit-
ting note 13) supports the above finding. In general, timing correlations
are greater for the second half of the phrase with increasing expressive
level (unfortunately, the data sets are too numerous to report here).
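The paired-correlation and complete-linkage steps can be sketched as follows (scipy is assumed, and the deviation profiles below are placeholders standing in for the measured per-note values of Figure 13):

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    # Placeholder per-note timing-deviation profiles (msec), one per instrument
    # at a single expressive level.
    profiles = {
        "piano":    [ 20, -15,  35, 600, -10,   5,  40, 120],
        "oboe":     [ 25, -10,  30, 550,  -5,  10,  45, 130],
        "trumpet":  [-30,  40, -20, 200,  60, -15,  10,  80],
        "violin":   [ 10,  90, -40, 100, -60,  70, -20, 200],
        "clarinet": [  5,  80, -35, 120, -55,  65, -10, 180],
    }

    names = list(profiles)
    X = np.array([profiles[n] for n in names], dtype=float)

    corr = np.corrcoef(X)                      # paired correlations (cf. Table 9)

    # Complete-linkage clustering on a correlation-derived distance (1 - r);
    # the condensed upper triangle is the form scipy's linkage expects.
    dist = 1.0 - corr[np.triu_indices(len(names), k=1)]
    tree = linkage(dist, method="complete")
    print(np.round(corr, 3))
    print(tree)                                # merge order, cf. the dendrograms of Figure 14
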
AMPLITUDE
For each of the five instruments at each of the three expression levels,
root-mean-square (RMS) amplitude values were calculated within con-
secutive 100-sample (3.5-msec) windows. Figure 15 presents the results of
this RMS analysis. Note that time vs. amplitude relations can be discerned
from the vertical position of note onsets (triangles). In general, RMS
values by note are smaller for the middle, appropriate expressive level than
for the other two; the dynamic range is greatest for Level 3, followed by
Level 1. One notes that, for instruments commonly played with vibrato
in this country (oboe, violin, and trumpet), amplitude vibrato in-
creases with each expressive level (although the dynamic range of the
vibrato is barely visible in Figure 15). Total duration of the performances is
shown by the position of the last plotted point in each of the graphs.
Without exception, the expressive performances are longer. Indeed, the
graphs of deviations from ontological time (Figure 13) show that, with
few exceptions, note values get longer with increasing expressive level.
The RMS values based on 100-sample segments were averaged for each
note, producing a single value. Correlations of RMS means within in-
struments and between expressive levels are high in the cases of piano,
clarinet, and trumpet, with mean correlations of .85 and .90 for piano and
clarinet, respectively, and a comparably high value for trumpet. The oboe
RMS means are high between expressive Levels 1 and 2 (.67) and 2 and
3 (-.43); however, almost no correlation was found between Levels 1 and
3 (-.05). Violin RMS correlations are consistently
low (mean = .12) across expressive levels.
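A minimal sketch of the per-window and per-note RMS computation (the windowing follows the description above; the test signal and onset positions are illustrative assumptions):

    import numpy as np

    def rms_per_window(signal, win=100):
        # RMS amplitude in consecutive 100-sample (~3.5-msec) windows.
        n = len(signal) // win
        frames = np.asarray(signal[: n * win]).reshape(n, win)
        return np.sqrt((frames ** 2).mean(axis=1))

    def rms_per_note(signal, onset_samples, win=100):
        # Average the windowed RMS values between successive onsets, giving a
        # single RMS value per note (the quantity correlated across levels above).
        rms = rms_per_window(signal, win)
        bounds = [s // win for s in onset_samples] + [len(rms)]
        return np.array([rms[a:b].mean() for a, b in zip(bounds[:-1], bounds[1:])])

    # Toy usage with two hypothetical "notes" of differing loudness.
    sr = 28571.4
    t = np.arange(int(sr)) / sr
    tone = np.sin(2 * np.pi * 196 * t) * np.where(t < 0.5, 0.3, 0.7)
    print(np.round(rms_per_note(tone, onset_samples=[0, int(0.5 * sr)]), 3))
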
INTERACTIONS OF TIMING AND AMPLITUDE
We correlated time deviations and mean RMS values for each note
within instruments and across expressive levels. Figure 16 plots z-scores
of mean RMS and time deviations for piano, clarinet, and trumpet. RMS
values (dotted lines) are provided for the three levels of expressiveness;
time deviations (solid lines) are only plotted for expressive Levels 2 and
3. Correlations of time deviations and RMS values are moderately high
and largely negative (Table 10). This indicates a general tendency for notes
longer than ontological time to be lower in RMS value, and vice versa.
Sixteenth notes in particular have a tendency to be higher in RMS and
shorter than ontological time; in other words, the sixteenth notes are
emphasized in time and loudness.
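The per-note time/RMS correlation can be sketched in the same spirit (the values are hypothetical; z-scoring matches the presentation of Figure 16 and Table 10 but does not change the correlation):

    import numpy as np

    def time_rms_correlation(time_dev_ms, note_rms):
        # Standardize each per-note series to z-scores and correlate them.
        z = lambda x: (np.asarray(x, float) - np.mean(x)) / np.std(x)
        return float(np.corrcoef(z(time_dev_ms), z(note_rms))[0, 1])

    # Hypothetical per-note values: a negative r means notes held longer than
    # clock time tend to be lower in RMS, the general tendency reported above.
    print(round(time_rms_correlation([600, -40, 20, -120, 300, -60],
                                     [0.21, 0.44, 0.38, 0.52, 0.25, 0.47]), 2))
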
Figure 16 shows that the overall patterns of RMS and time deviations
are, at the macrostate level, similar, sometimes with a displacement (slight
phase shift) on the time axis. However, at a microstate level, patterns
homogeneous within instruments can be quite different across them. A
general tendency at the microstate level is for the performer to alternate
Fig. 15. Relative root-mean-square (RMS) graphs for each instrument in order of intended
expressive level. Triangles mark note onsets. (a) Piano, (b) clarinet, (c) oboe, (d) violin,
(e) trumpet.
TABLE 10
Correlations of Time Deviations and RMS (Z-Scores)
Expressive Level Piano Clarinet Oboe Violin Trumpet
E1   .60 -.63 .04 .12 -.21
E2 -.35 .06 .31 -.17 -.67
E3 -.145 -.22 -.49 -.57 -.50
time deviations and RMS c
ample, the pattern short-lo
is pervasive. But rule-writin
the wide range of microst
the General Discussion to the time/RMS interaction.
Experiment 3
In an initial attempt to explore the relative contributions of variables
such as timing and RMS to expressive musical communication, we syn-
thesized a new set of signals. Experiment 3 deals only with timing de-
viations.
Fig. 16. Plots of root-mean-square (RMS) averages and time deviations as standardized
(Z) scores for (a) piano, (b) clarinet, and (c) trumpet.
PROCEDURE
We used a Kawai K1-M digital synthesizer module driven by MIDI programs written in
assembly code in order to achieve maximum timing accuracy (monophonic). The factory
internal signals, based on single periods of natural instruments, were used modified as
follows: The "sustain level" portions of the envelopes were adjusted to achieve
approximately equal loudnesses. No amplitude or frequency modulations were permitted;
the amplitudes were fixed across notes. The oboe signal was created de novo. Note-on
events alone were initiated at the clock times of the real performances for the three levels
of expressiveness. Data were collected under the same categorization
procedure as described in Experiment 1.
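A rough latter-day sketch of such timing-only resynthesis (the authors drove a Kawai K1-M from assembly code; here the mido package, an available default MIDI output port, and the onset times and pitches are all assumptions for illustration):

    import time
    import mido

    # Measured onset times (sec) and pitches are illustrative placeholders.
    onsets_sec = [0.00, 0.98, 1.99, 2.45, 2.93, 3.95]
    pitches    = [67, 67, 65, 63, 62, 62]

    with mido.open_output() as port:              # default MIDI output port
        start = time.monotonic()
        for onset, pitch in zip(onsets_sec, pitches):
            while time.monotonic() - start < onset:
                time.sleep(0.0005)                # wait for the performed onset time
            # Fixed velocity: amplitude is held constant, as in Experiment 3.
            port.send(mido.Message("note_on", note=pitch, velocity=80))
            # (Matching note_off messages are omitted in this sketch.)
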
SUBJECTS
Twelve nonmusicians served as subjects. None of these subjects participated in any
previous expression experiment.
RESULTS AND DISCUSSION
The categorization outcomes are presented in Table 11. A comparison
of these data with those of Table 1B shows notable similarity of patterns
of hits. Consider, for example, the violin matrices: The same confusion of
expressive Level 2 and 3 categorization is evident. Timing deviations, presented in
TABLE 11
Nonmusician Categorization Frequencies: Artificial Performances
Clarinet Oboe Violin Trumpet
Expressive Level   P1 P2 P3   P1 P2 P3   P1 P2 P3   P1 P2 P3
E1   7 3 2    7 0 5    5 5 2    8 2 2
E2   4 8 0    2 8 2    2 2 8    2 6 4
E3   1 1 10   3 4 5    5 5 2    2 4 6
Mean hits (%) 69 56 33 56
reasonably natural signals, seem to support categorization accu-
racy that is nearly equal to that obtained with the natural performances. This does
not imply that these synthetic signals were equivalent to the natural ones, only
that they contained some aspects of the expressive information sufficient for
reasonable categorization accuracy.
Experiment 4
In this experiment, we applied a third convergent method to examine
the communication of musical expression. It was intended to assess the
relationship between expressive level and the perceived magnitude of ex-
pression gauged by ratings.
PROCEDURE
Both natural and artificial sets of mechanical, appropriate, and exaggerated expressive
performances were used. Each set of stimuli, natural and artificial, was presented as a
group. A given subject received a randomly determined order within sets and was randomly
assigned to hear either natural or artificial signals first, followed by the remaining set.
The subject's task was to rate a stimulus on a scale from 0 (without) to 99 (maximal)
expressiveness. The subjects were instructed as follows:
We want to know how musicians communicate expression to listeners. Musical
expression can be likened to the expression of an actor in speaking his part. He might
speak in a monotone, in a manner appropriate to the idea, or he might exaggerate.
In this experiment, five instrumentalists played the same music with differing
levels of expression. I will now play some examples which are in order of increasing
expression. [Examples were played.]
You will hear a set of 15 interpretations. Rate each example along the scale
according to your judgement of the degree of expression.
A labeled scale was presented on the computer screen. Subjects responded by moving a
pointer on the computer screen by using a mouse. The position of the pointer, from 0 to
99, was read by the computer and stored as score data.
SUBJECTS
Eight musicians served as subjects. None had participated in previous experiments; only
musicians were used in order to parallel the matching Experiment 2.
RESULTS AND DISCUSSION
ANOVA on repeated measures (n = 8) indicated significant differences
between mean ratings of the natural performances [F(14,98),
p < .001] but not of the synthetic performances. Mean expressiveness ratings
as a function of instrument are plotted for both natural and artificial
performances in Figures 17a and 17b, respectively. Tukey
post-hoc paired comparisons revealed no significant differences between
any expressive Level 2 or 3 means. Post-hoc analyses (Tukey)
indicate that those means in Figure 17a that are 25.86 units apart are
statistically different. This value clearly differentiates the mechanical per-
formance ratings from those of expressive performances for the
natural renditions.
We combined values for expressive Levels 2 and 3 on the basis of the
post-hoc analysis. These data were subjected to a Group (Artificial vs.
Natural) by instrument/expression (mechanical/expressive) repeated-
measures ANOVA, with adjustment of alpha for multiple comparisons. The
results indicate that the profile of means (Figure 17) is different for natural
and synthetic (time-deviation-based) performances (p < .046).
Intersubject variability for the rating task is very high and clearly ac-
counts for the general lack of significant differences. We strongly suspect
that data from more subjects would differentiate the general trends shown
in Figures 17a and 17b, namely that appropriate expression (Level 2)
would be rated higher in expressiveness than either mechanical or ex-
aggerated performances.
Cluster analysis with single linkage (nearest neighbor) was performed
on natural performance ratings (Figure 18). There is a general tendency
for ratings to cluster according to expressive level. Violin Level 3 is a
conspicuous exception, being displaced into association with expressive
Level 2. You will recall that subjects tended to switch violin Level 3 with
violin Level 2 in categorizing.
General Discussion and Conclusions
We conclude that, in general, both musicians and nonmusicians can
discern expressive intent. Greater than chance responses were obtained for
all cases other than artificial violin performances (Table 11). Differing
response tasks, however, yielded somewhat different results.
In particular, matching yielded lower overall hit rates than did cate-
gorization. There was a tendency for adjacent intended expressive levels
to be confused by the listener in a matching task. However, statistical
analysis confirmed that, across instruments, intended expressive levels
were matched with the models.
Fig. 17. Mean expressiveness ratings across instruments and levels of expression for (a)
natural and (b) artificial stimuli.
Fig. 18. Cluster analysis (single linkage) of expressiveness ratings of the natural performances.
Ratings of expressiveness produced data par
Expressive Levels 2 and 3 produced nearly i
worth noting that for both natural and art
expressiveness ratings were highest for obo
pressive Level 2, appropriate expression. The
subject that higher expressiveness ratings we
deed, it is likely that our instructions work
Cluster analysis of the ratings of natural performances discriminates
well between levels of expressiveness (Figure 18). However, the violin Level 3
ratings were reversed in the cluster diagram. A similar reversal occurred
in the categorization task.
This reversal of expressiveness levels has it
task, which permits extensive review of the
choices, whereas matching does not. Extensive
task leads the listener further astray. Confirm
was found in acoustical signal cross-correlati
Level 3 amplitudes and time-deviations w
(r = .530, .542, respectively). Piano Level 2 an
were not correlated. Piano Level 3 and all violin time-deviations were
moderately correlated (.349, .306, .442, respectively). In all other cases,
piano models and intended choices had time-deviations that were posi-
tively correlated.
These data suggest the existence of conditions conducive to the con-
fusion of violin categories. Apparently, repeated listenings exposed salient
features for reversing categories; reversal was not as evident in matching
data. Such a feature may be the peculiar pattern of longer and shorter time
deviations around note 11 for violin Levels 2 and 3, which is not evident
in the cases of other instruments (Figure 13).
An interesting finding was the lack of timing correlation among ex-
pressive levels for the piano, in contrast to moderate, positive correlations
among levels for other instruments. This world-class pianist, given the task
of creating three distinctive expressive levels, was able to do so in terms
of timing-deviation profiles, yet the other, less-experienced instrumental-
ists did not do so. Correlations among RMS values within instruments
were high for piano, clarinet, and trumpet, and less homogeneous for oboe
and violin. This finding generally supports the proposition that RMS val-
ues were less useful as cues for categorizing than timing-deviation profiles.
This conclusion is further supported by the fact that, for nonmusicians
with artificial renditions based on timing-deviations, the categorization hit
levels were comparable with the natural performances.
The relation of "musical structure" to deviations from canonical no-
tation is emphasized by a number of researchers, including Todd (1985)
and Clarke (1988).9 Clynes (1983) suggests the idea of "composers'
pulses," which consist of periodic patterns of deviations, a generalization
strongly disputed by Repp (1989).
Our data fail to support something as strict and invariant as the musical
grammar, performer grammar, or listener grammar. Piano Levels 1 and
3 of Figure 15, for example, show the following pattern of relative RMS
values for the first six quarter notes (high = H, low = L): HLH-HLH.
Piano Level 2 (appropriate) is, on the other hand, LLH-LLH. Clarinet
RMS (Figure 15), across all levels, is in the pattern: LHL-LHL. In Figure
15e, trumpet, a symmetrical arch of RMS values is observed. Similar
patterns and variations of patterns likewise can be discerned in timing-
deviations.
It is clear that all performers signal salient structural points by temporal
and dynamic contrast. Examples are the sixth to seventh note transition,
signaling pitch-time contour direction change, and the final cadence. The
number of ways to signal these structural features is very large. Contrast
is a key operative principle, not merely "accent" in terms of increased
magnitude. Contrast patterns form the microstructural tissue that fills
structural gaps.
9. Relevant to our conceptual position is that of Clarke (1985), who acknowledges that
"... musical structures may be thought of as possessing a double aspect: A relatively fixed
canonical representation ... in a score and a more flexible and indeterminant represen-
tation that is evident in expressive performance." (p. 211). In addition he notes the vari-
ability in performance attributable to performer intent. Philosophically, he admits flex-
ibility, but chooses the more formal approach in his experimental work (Clarke, 1988),
focusing on "the structure."
There is a very large number of possibilities available to the
performer for solving the problems of musical communication. What dis-
tinguishes the great performer from the merely competent is the
invention of the solutions. In this sense, the composer and the performer have
similar roles: They both solve musical problems.
References
Bengtsson, I., & Gabrielsson, A. Analysis and synthesis of musical rhythm. In J. Sundberg
(Ed.), Studies of music performance. Stockholm: Royal Swedish Academy of Music,
1983, #39, pp. 27-60.
Campbell, W. C., & Heller, J. J. Judgements of interpretation. Paper
presented at the Research Symposium on the Psychology and Acoustics of Music,
University of Kansas, Lawrence, Kansas, February 1979.
Campbell, W. C, & Heller, J. J. Psychomusicology &
separate ways. Psychomusicology, 1981, 1(2), 3-14
Clarke, E. F. Structure and expression in rhythmic performance. In P. Howell, I. Cross,
& R. West (Eds.), Musical structure and cognition. London: Academic Press, 1985.
Clarke, E. Generative principles in music performance. In J. A. Sloboda (Ed.), Generative
processes in music. Oxford: Clarendon Press, 1988.
Clynes, M. Expressive microstructure in music, linked to living qualities. In J. Sundberg
(Ed.), Studies of music performance. Stockholm: Royal Swedish Academy of Music,
1983, #39, pp. 76-181.
Gabrielsson, A. Timing in music performance and its relations to music experience. In J. A.
Sloboda (Ed.), Generative processes in music. Oxford: Clarendon Press, 1988, pp.
27-51.
Houle, G. Meter in music, 1600-1800. Bloomington, IN: Indiana University Press, 1987.
Kendall, R. A sample-to-disk system for psychomusical research. Behavior Research Met
ods, Instruments, & Computers, 1988, 20(2), 129-136.
Meyer, L. B. Explaining music: Essays and explorations. Berkeley, CA: University
California Press, 1973.
Monahan, C., Kendall, R., & Carterette, E. The effect of melodic and temporal contour
on recognition memory for pitch change. Perception and Psychophysics, 1987, 41,
576-600.
Nakamura, T. The communication of dynamics between musicians and listeners through
musical performance. Perception and Psychophysics, 1987, 41, 525-533.
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. Numerical recipes.
Cambridge, England: Cambridge University Press, 1986, pp. 133-135.
10. A preliminary interpretation of some of the data reported here was presented
in invited papers to the First International Conference on Music Perception and
Cognition, Kyoto, Japan, 17-19 October 1989, and at Seoul National University,
Seoul, Korea, 24 October 1989.
11. We thank our many subjects, musicians and no
arranging and conducting the recording sessions; Johan
piano models of expressive levels, and the graduate mu
these models: Amanda Walker, clarinet; Margaret Gilin
pet; and Jacke Carrasco, violin. We are grateful to our
Cornett, Scott Lipscomb, Kathryn Vaughn, and Suk W
provided by the UCLA Academic Senate Committee
corporated, Pasadena, CA.
Repp, B. Perceptual evaluations of four composers' "pulses." In Proceedings of the First
International Conference on Music Perception and Cognition, Kyoto, Japan: The Jap
anese Society of Music Perception and Cognition, 1989, pp. 23-28.
Roads, C. Interview with Marvin Minsky. Computer Music Journal, 1980, 4, 25-39.
Roads, C. Grammars as representations for music. In C. Roads & J. Strawn (Eds.), Foun
dations of computer music. Cambridge, MA: MIT Press, 1985, pp. 401-442.
Seashore, C. E. Psychology of music. New York: McGraw-Hill, 1938. (Reprinted, New
York: Dover, 1967).
Senju, M., & Ohgushi, K. How are the player's ideas conveyed to the audience? Music
Perception, 1987, 4, 311-323.
Sundberg, J. Computer synthesis of music performance. In J. A. Sloboda (Ed.), Generative
processes in music. Oxford: Clarendon Press, 1988, pp. 52-69.
Sundberg, J., Frydén, L., & Askenfelt, A. What tells you the player is musical? An analysis-
by-synthesis study of music performance. In J. Sundberg (Ed.), Studies of music per-
formance. Stockholm: Royal Swedish Academy of Music, 1983, #39, pp. 61-67.
Todd, N. A model of expressive timing in music. Music Perception, 1985, 3 (1), 33-58.
Tro, J. How loud is music? Experience with the evaluations of musical strength. In Pro-
ceedings of the First International Conference on Music Perception and Cognition.
Kyoto, Japan: The Japanese Society of Music Perception and Cognition, 1989, pp.
353-358.
Winograd, T. Understanding natural language. New York: Academic Press, 1972.
Appendix
Subject Instructions for the Categorization Procedure
We are interested in musical communication; the ability of the performer to send a
message to the listener. None of the parts of our study is a test of musical talent or a test
of your ability to hear.
Before we run the main experiment, a practice session has been designed to acquaint
you with the procedure and equipment.
You will notice on the screen bars of various colors in different positions. The white
and yellow bars stand for musical selections.
Use the mouse to move the red pointer to a bar. Press the LEFT button. You will hear
the musical selection associated with that bar. Try another bar by moving the pointer and
playing the selection.
The screen is divided into two sections by a white line. The bars at the bottom of the
screen are to be moved to one of the columns headed by yellow and purple bars. The purple
bar defines a group. The yellow bar underneath the purple bar is the model (prime example)
for the group. The model can be played by pointing and pressing the LEFT button.
The task is to listen to the models, and assign choices which best fit within a group.
To move a white bar to a group, point to the bar and press the RIGHT button. The bar
will turn red. Now, point to the purple group bar and press a button. The white bar will
move from the bottom of the screen to the selected group.
Try it.
A bar can be moved from one group to another in exactly the same manner. Try it.
Also, you can replace a bar to the selection area at the bottom of the screen. Select the
bar and point anywhere below the line and press the LEFT button. The bar will return
to its original position.
Try it.
Now, as practice, listen to the models and then move each choice to the group it best fits.
You will have two in each group.
Are there any questions?
We want to know how musicians communicate expression to listeners. Musical ex-
pression can be likened to the expression of an actor in speaking his part. He might speak
in a monotone, in a manner appropriate to the idea, or he might exaggerate.
In this experiment, five instrumentalists played the same music with three levels
of expression: with little expression, with moderate expression, and with exaggerated
expression.
The yellow bar under each group is the model example. The order of the groups is
random as is the order of the choices. Listen to the model examples. Move choice examples
to the group they best fit.
NOTE that there are FOUR choices which fit under EACH model for a total of 12.
EACH CHOICE in EACH GROUP is a different instrument.
You may listen to any choice or model as often as you like. You may change your mind
and move choices from one group to another until you are satisfied.
Remember: group on the basis of similar level of musical expression, not instrument
type.
Are there any questions?