rhythmic class of mora-timed languages (in which moras are supposedly perceived as roughly equal in duration, e.g., Japanese, West Greenlandic) was added. Experimental studies, however,

$$\text{rPVI} = \frac{1}{m-1}\sum_{k=1}^{m-1}\left|d_k - d_{k+1}\right| \qquad (2)$$
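To make Equation (2) concrete, the following minimal sketch computes the raw pairwise variability index from a list of interval durations. The function name and the example durations are illustrative and not taken from the study.

```python
def rpvi(durations):
    """Raw pairwise variability index (Equation 2): mean absolute
    difference in duration between successive intervals."""
    m = len(durations)
    if m < 2:
        raise ValueError("need at least two intervals")
    return sum(abs(durations[k] - durations[k + 1]) for k in range(m - 1)) / (m - 1)

# Illustrative vocalic-interval durations in milliseconds (not real data).
print(rpvi([80, 120, 60, 150]))  # (40 + 60 + 90) / 3 = 63.33...
```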
Time-Intensity-Grid-Representation of incoming continuous speech is based on innate perception mechanisms that help infants to construct the first representation of their language. Ramus et al. (1999) observed that languages with similar rhythmic properties tend to share more typological characteristics of grammatical and phonological structure. This led to the hypothesis that, alongside constructing the first representation of their native language, babies use rhythmic patterns to bootstrap on the syntactic properties of the language and on the lexicon (Christophe and Dupoux, 1996; Mazuka, 1996; Mehler et al., 1996, 2004; Nespor et al., 1996). Rhythmic patterns are also used to develop strategies for segmentation of continuous speech and consequent word extraction and learning (Christophe et al., 2003; Thiessen and Saffran, 2007). In light of these considerations, we could suggest that the ability to recognize the durational cues pertaining to speech rhythm is of the utmost importance for language acquisition and speech processing (e.g., for development and implementation of language-specific segmentation strategies). Sensitivity to timing differences, which is already observed in infancy, therefore also persists in adulthood (Ramus and Mehler, 1999; White et al., 2012). Adults also use rhythmic cues to recognize a foreign accent in L2 speech and to detect the linguistic origin of the speaker (Kolly and Dellwo, 2014), to evaluate the degree of accentedness in L2 speech (Polyanskaya et al., 2013), and to extract discrete linguistic units from continuous speech (Christophe et al., 2003).
Rhythm Changes in Second Language Acquisition

Papers focussed on the acquisition of speech rhythm in L2 are rare. Most of these studies concentrate on comparing rhythm in L2 speech with the target represented by an adult native speaker. Examined L2 speech is usually produced by rather advanced learners. The results showed that the rhythm scores in L2 speech are intermediate between those in the native and the target language of the learners (White and Mattys, 2007). This is usually interpreted as the influence of the native language of the learner on his speech production in the L2. Low et al. (2000) showed that nPVI-V in L2 Singaporean English is influenced by the L1 Chinese language. Rhythm in L2 English was shown to be affected by L1 Chinese, French, Spanish, Romanian and Italian (White and Mattys, 2007; Gut, 2009; Mok, 2013, etc.).

The studies with an emphasis on the development of rhythmic patterns in the course of L2 acquisition are even rarer. One of the few exceptions is the study by Ordin and Polyanskaya (2014), who compared how speech rhythm develops in L1 and in L2 acquisition. They found that speech rhythm develops from more syllable-timed toward more stress-timed patterns both in child L1 and in adult L2 speech. The authors showed that both vocalic and consonantal variability in duration in L2 English increases as a function of the length of residence in the UK in adult speech when the target (English) and the native (Italian or Punjabi) languages of the learners are rhythmically contrastive.

Ordin et al. (2011) showed that durational variability in the speech of L2 learners also increases with proficiency growth when the target (English) and the native (German) languages of the learners exhibit similar rhythmic properties. English and German share phonological parameters that are known to affect the rhythm metrics. Both of these languages are classified as stress-timed in terms of the phonetic timing patterns captured by metric scores (Grabe and Low, 2002) and exhibit the phonological characteristics typical of stress-timed languages (Dauer, 1987; Schiering, 2007). Therefore, German learners of English do not have to acquire phonological characteristics like the production of complex syllables and complex consonantal clusters, the opposition of long and short vowels, etc. Table 1 provides the metric scores in monolingual adult speech delivered by adult native speakers of either German or English, as reported in various studies. No unambiguous tendency is evident as to which of these languages shows higher durational variability. %V seems a bit lower in German, which can be explained by a slightly higher syllabic complexity and a higher number of C clusters in German than in English (Delattre, 1965, cited in Gut, 2009). Comparison of the metric scores for German and English with those reported for traditional syllable-timed languages (Ramus et al., 1999; Grabe and Low, 2002; White and Mattys, 2007) shows that both German and English exhibit higher durational variability and lower %V.

Research Question

Previous studies have shown that rhythmic patterns change as language acquisition progresses even when the native and the target languages of the learners are rhythmically similar (Ordin et al., 2011). In our study, we were interested in whether these developmental changes in speech rhythm are perceptually relevant. It is already known that listeners are sensitive to the rhythmic differences between rhythmically contrastive languages (Ramus and Mehler, 1999) as well as between German and English, i.e., between rhythmically similar languages (Vicenik and Sundara, 2013). Listeners are also able to distinguish rhythmic patterns of utterances from the same language (White et al., 2012; Arvaniti and Rodriquez, 2013). Therefore, we think that the fine distinctions between rhythmic patterns typical of the L2 English of adult learners at different proficiency levels might be detected.

These important findings regarding the sensitivity of listeners to rhythmic differences were obtained using discrimination tests. We know that people can discriminate between utterances even with small differences in durational variability. However, certain functions attributed to speech rhythm are not based on discrimination, but rather on classification (segmentation, evaluation of accentedness, detection of the linguistic origin of the speaker, etc.). Classification is different from discrimination. The listener may be able to perceive some acoustic differences when attending to them, but nevertheless ignore these differences when attributing an acoustic signal to a certain group, or when making a decision whether an acoustic signal is a representative of a certain class. In this particular study we focused not merely on whether the differences in L2 rhythm between utterances delivered by learners at different proficiency levels are detected. We were rather interested in whether listeners are able to reliably classify the utterances of L2 learners into distinct classes based on timing differences between utterances, and if so, which timing patterns listeners use to form the classes.
TABLE 1 | Metric scores for German and English as reported in various studies.

English: 38.0, 64.0, 73.0, 70.0, 59.0 (White and Mattys, 2007)
42.0, 55.7 (Dellwo and Wagner, 2003)
41.1, 57.2, 64.1, 56.7 (Grabe and Low, 2002)
40.1, 53.5 (Ramus et al., 1999)
45.7, 54.8, 59.9, 68.9, 60.0 (Arvaniti, 2012: overall score)
41, 48, 55, 83, 68, 57 (Arvaniti, 2012: read sentences deliberately designed to enhance durational variability)
50, 46, 51, 57, 49, 53 (Arvaniti, 2012: read sentences deliberately designed to inhibit durational variability)
44, 50, 56, 61, 55, 55 (Arvaniti, 2012: sentences uncontrolled for phonotactics)
48, 66, 66, 77, 68, 59 (Arvaniti, 2012: spontaneous speech)
Speech Material

Participants
Piske et al. (2001) analyzed a range of factors that influence the pronunciation of L2 learners. These factors, among others, included the age and the length of exposure to the L2, amount of L2 use, language learning aptitude and motivation, and learning mode. In our study we controlled these factors by collecting the relevant information in a detailed language-background questionnaire (see Appendix 1 in online Supplementary Materials). Based on the questionnaire, we selected only those speakers who formed a homogeneous group and varied only in the degree of L2 mastery. The relevant information gleaned from the questionnaire was further verified in an informal interview during the recording sessions.

We recorded 51 German learners of L2 English (17–35 years old, M = 21; 27 females). We selected for participation only those people who grew up in or near the city of Bielefeld in North Rhine-Westphalia. The variety of German spoken in that region closely resembles what is understood as a Northern standard variety of German (Hochdeutsch). The selected participants did not exhibit features of regional varieties of German. All the participants were monolingual native speakers of German without speech or hearing disorders.

We also recorded 10 native speakers of English (southern British variety, 25–40 years, M = 30, 6 females) to compare the metric scores of the L2 learners of English with those of the L1 English speakers. The English speakers were residents in Germany at the time of the recordings. However, they reported having little to no command of German, lived in a close English-speaking community at the UK military bases in North Rhine-Westphalia, worked in an exclusively English-speaking environment, had English as their home and neighborhood language, came from monolingual English-speaking families and were raised in a monolingual environment.

Elicitation Procedure
The selected learners of English first underwent a pronunciation test so that we could assess the learners' mastery of pronunciation. The test was devised by the authors and consisted of two parts: perception and production. The perception part was compiled from Vaughan-Rees (2002) and included phoneme recognition, emotion recognition, and intention recognition tasks. The production part included sentence reading. The sentences for production were composed to evaluate segmental realizations and prosodic control of the participants in the second language. The test ran for approximately 20 min. The test and the details on the controlled pronunciation features and assessment criteria can be found in Appendix 2 in online Supplementary Materials.

Further on, a 5-min phonetic aptitude test (PAT) was administered. The authors devised this test based on the oral mimicry tests described by Pike (1959), Suter (1976), and Thompson (1991). The test aims to predict general phonetic ability by asking the participant to imitate novel sounds that do not exist in their native or target language and to mimic novel prosodic phenomena (e.g., lexical tones, tonal contours with accents which are
not aligned according to the convention of the learner's target or native language, etc.). The test and the details on the assessment criteria can be found in Appendix 3 in online Supplementary Materials. The sounds to imitate were presented by the holder of an IPA certificate confirming his proficiency in producing and perceiving sounds existing in the world's languages. The performance of participants in the PAT did not correlate with their L2 proficiency (we had both high and low proficiency learners with both high and low phonetic aptitude). Neither did the performance of the L2 learners in the PAT correlate with their performance in the English pronunciation test or with any of the metrics calculated on their speech. This shows that the ability to imitate rhythmic patterns of the target language is not related to general phonetic aptitude, and we can eliminate a potential alternative explanation that the differences in rhythmic patterns between learners at different proficiency levels pertain to phonetic aptitude rather than to overall proficiency.

At the next stage, an informal interview was conducted by the first author. General questions about preferences in reading and music, lifestyle, career choice, biography, and childhood were asked (Appendix 4 in online Supplementary Materials). The interviews were recorded and lasted approximately 12 min with each participant.

Following the interview, we ran a sentence elicitation task similar to the one used by Bunta and Ingram (2007). Thirty-three sentences were elicited from each speaker. We used 33 picture prompts for the elicitation procedure. The participants viewed picture slides in a PowerPoint presentation. Each slide was accompanied with a descriptive sentence. The participants were instructed to remember the sentences. The participants could move to the next image or go back to the previous slide at their own pace. When they had viewed all the slides, they were asked to look at the images again, without the accompanying text, and to recall and say the sentences that they had been asked to remember. In the very rare case (<5%) when the speaker could not remember the sentence or retrieved a modified sentence from memory, verbal prompts were used to help the speaker to produce the correct sentence. For example, the participant said "The dog is running after the cat," and the expected sentence was "The dog is chasing the cat." The researcher responded to the participant: "Yes, it is. You could also say chasing, which means running after. Can you say what you see at this picture once again?" One verbal prompt was sufficient to elicit the expected sentence when there was a mismatch in the first trial. The recording ran continuously throughout the sentence elicitation procedure. The list of elicited sentences and the examples of picture prompts can be found in Appendix 5 in online Supplementary Materials.

The tests and recordings were made individually with every participant in a sound-treated booth of the audio-visual studio at Bielefeld University in Germany. The recordings were made in WAV PCM at 44 kHz, 16 bit, mono.

Assessment of Learners' Proficiency
Three experienced teachers of English as a foreign language listened to the recorded interviews and evaluated learners' fluency, grammatical accuracy, and vocabulary resources. They used a 10-point scale for each parameter. To estimate the consistency of ratings between the teachers, we used Cronbach's alpha, which is 0.90 for vocabulary, 0.89 for fluency and 0.92 for grammatical accuracy. This shows high agreement between the raters and confirms the reliability of their assessments. We averaged the three ratings across the parameters for each rater and each interview, and thus got three mean ratings per learner.

The teachers' assessments and the results of the pronunciation tests were used to place each learner into one of the following proficiency groups: beginners (12 speakers with ratings between 4 and 6), intermediate (9 speakers with mean ratings between 6 and 8), and advanced learners (22 speakers with ratings above 8)¹. We used the results of the pronunciation test to assess the pronunciation skills of the learners. Eight speakers were not attributed to any group, either because the teachers did not agree with each other in their assessments (2 speakers were excluded for this reason) or because of a discrepancy between the results of the pronunciation tests and the teachers' assessment of accuracy, fluency and vocabulary resources. Pronunciation skills do not always agree with the general assessment of the learner's reading, writing, listening and speaking skills, vocabulary size, grammar accuracy, etc. That is why we deemed it necessary to combine the tutors' assessment of fluency, accuracy and vocabulary on the one hand and the mastery of pronunciation on the other hand. In cases when pronunciation lagged far behind the general L2 mastery or exceeded the expected level, the learner was not attributed to any of the proficiency groups.

¹According to the evaluators' opinion, we did not have true beginners, and our participants better correspond to the lower-intermediate (B1.1 according to the Common European Framework for Languages), upper-intermediate (B2.1) and advanced (C1.1 and C1.2) levels. The CEFL specifies the skills the learner should achieve at each of the six levels: A1, A2, B1, B2, C1, C2. However, as each level is usually covered in language schools during two intensive courses, teachers split each level into two sublevels, e.g., C1.1 and C1.2. Official CEFL guidelines do not split the six levels into sublevels, and the division into C1.1 and C1.2 is done, rather arbitrarily, by the teachers. We did not want to resort to commonly used placement tests to evaluate the learners' proficiency level because standard placement tests are designed to make an initial assessment to place the student into the course that fits his level. Placement tests are not designed to estimate the proficiency level for certification of the achieved proficiency level. Therefore, using several human evaluators (to avoid human bias) is the best methodological option, in our opinion.

Segmentation
Thirty-three elicited sentences per speaker were annotated in Praat (Boersma and Weenink, 2010). Annotation was performed by the second author. Each sentence was divided into V and C intervals. The segmentation was carried out manually by the second author based on the criteria outlined in Peterson and Lehiste (1960) and Stevens (2002) for V and C intervals.

The burst of energy corresponding to the release of the closure was taken as the starting point of a consonantal interval with an initial voiceless plosive after a pause and at the beginning of a sentence. Either the stop release, or the apparent beginning of a voice bar, or other cues indicating apparent vibration of the vocal folds (whichever came first) were considered as the beginning of a consonantal interval with an initial voiced plosive. The markers of turbulent noise were taken as the beginning of fricative consonants. The beginning of the first formant was taken as the beginning of a sonorant consonant. Consonantal intervals in the
middle of a sentence were considered to start after the vowel finishes and to stretch until the onset of the following vowel. The end of a consonantal interval in final position was marked at the end of the acoustic energy. The consonantal intervals in final positions were considered to start immediately after the vowel and finish at the end of the fricative noise (for obstruents) or at the end of the first formant (for sonorants). A conventional procedure based on the analysis of the waveform and the spectral characteristics of the speech signal was used to identify the boundaries of the vocalic intervals. The end of the vowel was identified by the abrupt change in the vowel formant structure or by the termination of the formants, and by the significant drop in the waveform amplitude. The onset of the vowel was marked at the beginning of voicing, identified as the start of the regular vertical stripes on the spectrogram in the region of the second and higher formants. The marker indicating the vowel onset or offset was placed at the point closest to the zero crossing on the waveform.

In difficult cases where it was necessary to place the boundary between a consonantal interval represented by a sonorant consonant with a clear formant structure and a vowel, the decision was based on the amplitude of the first formant. Such difficult cases were associated with the boundaries or categorization of allophones of /l/ (e.g., in the words girl, ball, table). We based our segmentation on purely phonetic criteria, therefore /l/ was sometimes marked as a vowel (in case of a vocalized [l]), and sometimes as a consonant. The decision was based on (1) auditory analysis by an experienced phonetician, and (2) the amplitude of the first formant. If the amplitude did not drop after the preceding vowel and the segment was perceived by a phonetician as a vocalized [l], then the segment was marked as a vocalic interval. We did not want to pre-define certain types of segments either as consonantal or vocalic. We adopted a phonetic approach to speech rhythm. Within the adopted framework, speech rhythm is represented by the surface timing patterns, which are purely phonetic, and phonetic properties are not discrete and cannot be pre-assigned to a certain phonological category a priori.

Pauses and hesitations were not included into V or C intervals and were discarded. If the same type of interval was annotated prior to and following a pause, we treated them as two separate intervals because they are likely to be perceived as such. Final syllables were included into the analysis.

Rhythm metrics were calculated for each sentence to account for possible developmental changes in speech tempo in the course of L2 acquisition, and for the interaction of speech rhythm and speech tempo.

Although some rhythm metrics were claimed to be better than others at quantifying rhythm, there is no consensus on which metrics have more discriminative power. White and Mattys (2007), for example, advocated for pairwise metrics, while Ramus et al. (1999) favored ΔC and %V. Loukina et al. (2011) performed an analysis of 15 rhythm metrics and, in experiments separating pairs of languages by rhythmic properties, showed that a rhythm measure that is successful at separating one pair often performs poorly at separating another pair. Considering the lack of consensus on the optimal set of metrics, we decided not to limit our investigation to the metrics which were found more useful in certain studies. Instead we tested all the metrics in order to see which ones better capture the differences in rhythm between sentences produced by L2 learners at different proficiency levels.

A series of by-sentence ANOVA tests (Table 3) with the values of the metrics as the dependent variables and proficiency level as the factor shows that the non-normalized rhythm metrics (ΔV, ΔC, rPVI-V, rPVI-C) and %V do not differ between the proficiency levels. As the raw metrics do not differ between the proficiency levels, we are not including them in further statistical tests.

TABLE 2 | Metrics used in this study.

%V: Percentage of vocalic intervals
ΔV: Standard deviation of vocalic interval durations
ΔC: Standard deviation of consonantal interval durations
nPVI-V: Average of the normalized differences in duration between successive vocalic intervals
nPVI-C: Average of the normalized differences in duration between successive consonantal intervals
rPVI-V: Average difference in duration between successive vocalic intervals
rPVI-C: Average difference in duration between successive consonantal intervals
VarcoV: Coefficient of variation of vocalic intervals, i.e., standard deviation divided by the mean
VarcoC: Coefficient of variation of consonantal intervals, i.e., standard deviation divided by the mean
MeanV: Mean duration of vocalic intervals
MeanC: Mean duration of consonantal intervals
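For readers who wish to reproduce metric scores of this kind, the sketch below computes the Table 2 measures from a list of labeled interval durations. It assumes the V/C durations have already been extracted from the Praat annotations (e.g., exported from the TextGrids) with pauses removed; the function name, the 'V'/'C' labels and the example values are illustrative, not taken from the authors' scripts, and whether Varco and nPVI are expressed as percentages varies across studies.

```python
from statistics import mean, pstdev

def rhythm_metrics(intervals):
    """intervals: list of (label, duration_ms) with label 'V' or 'C',
    in utterance order, pauses and hesitations already discarded."""
    v = [d for lab, d in intervals if lab == "V"]
    c = [d for lab, d in intervals if lab == "C"]

    def rpvi(seq):
        # mean absolute difference between successive intervals
        return mean(abs(a - b) for a, b in zip(seq, seq[1:]))

    def npvi(seq):
        # pairwise differences normalized by the local mean, expressed x100
        return 100 * mean(abs(a - b) / ((a + b) / 2) for a, b in zip(seq, seq[1:]))

    return {
        "%V": 100 * sum(v) / (sum(v) + sum(c)),
        "deltaV": pstdev(v), "deltaC": pstdev(c),
        "rPVI-V": rpvi(v), "rPVI-C": rpvi(c),
        "nPVI-V": npvi(v), "nPVI-C": npvi(c),
        "VarcoV": 100 * pstdev(v) / mean(v),
        "VarcoC": 100 * pstdev(c) / mean(c),
        "MeanV": mean(v), "MeanC": mean(c),
    }

# Illustrative durations in milliseconds (not real data).
example = [("C", 90), ("V", 120), ("C", 60), ("V", 180), ("C", 110), ("V", 70)]
print(rhythm_metrics(example))
```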
ANOVAs on the rate-normalized rhythm measures revealed a significant difference between proficiency levels at p < 0.0005 for each metric. These metrics were included in the multivariate model. The MANOVA test with the nPVI metrics, the Varco metrics and the mean durations of V and C intervals as the dependent variables and proficiency level as the factor revealed a significant effect of proficiency level on the rhythm measures, Wilks' Λ = 0.856, F(12, 2822) = 19.06, p < 0.0005, partial η² = 0.075. Figures 1–3 show that the metric scores increase as L2 acquisition progresses, which indicates that German learners of English deliver L2 speech at a higher rate and with a higher degree of stress-timing as their L2 mastery grows. The differences between the proficiency levels pairwise for each metric are mostly significant (significance values are given in Table 4).

The MANOVA was followed up with a discriminant analysis. We used only those metrics that were found to differ significantly between proficiency levels in our previous tests. The analysis revealed two discriminant functions. The first function explained 96.9% of the variance, canonical R² = 0.14, and the second explained only 3.1% of the variance, R² = 0.005. In combination these functions significantly differentiated the proficiency levels, Λ = 0.856, χ²(12) = 220.318, p < 0.005. The second function alone did not significantly differentiate between the proficiency levels, Λ = 0.995, χ²(5) = 7.232, p = 0.204. This can also be seen on the discriminant function plot (Figure 4). Classification results (Table 5) show that the model classifies 57% of cases correctly (chance is 33%).

The correlations between the outcomes and the discriminant functions revealed that the measures of local (pairwise) variability and of speech rate loaded on the first function, and global measures of variability loaded more highly on the second function (see Table 6). As the first function explains substantially more variance than the second function, we can conclude that pairwise durational variability and speech rate discriminate between the proficiency levels much better than utterance-wise variability.

We also wanted to see how close the advanced German learners of English are to their target in regard to the acquisition of rhythmic patterns. For this, we compared the metric scores calculated on the sentences produced by the advanced learners of English with those calculated on the sentences spoken by native English speakers. T-tests (Table 7) reveal that the metric scores do not differ significantly between the two groups for most of the metrics.

FIGURE 1 | VarcoV and VarcoC in the sentences produced by native English speakers and by German learners of English at beginning, intermediate and advanced proficiency levels. Error bars show 95% confidence intervals.

FIGURE 2 | nPVI-V and nPVI-C in the sentences produced by native English speakers and by German learners of English at beginning, intermediate and advanced proficiency levels. Error bars show 95% confidence intervals.

FIGURE 3 | meanV and meanC in the sentences produced by native English speakers and by German learners of English at beginning, intermediate and advanced proficiency levels. Error bars show 95% confidence intervals.
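As a rough illustration of the analysis pipeline described above (a MANOVA on the rate-normalized measures followed by a linear discriminant analysis), here is a minimal sketch using statsmodels and scikit-learn. The data frame, its column names and the file name are hypothetical; the authors do not specify their statistical software in this section, so this is one way the analysis could be set up, not a reconstruction of their scripts.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical per-sentence table: one row per elicited sentence.
df = pd.read_csv("sentence_metrics.csv")  # columns: level, nPVI_V, nPVI_C, VarcoV, VarcoC, MeanV, MeanC
predictors = ["nPVI_V", "nPVI_C", "VarcoV", "VarcoC", "MeanV", "MeanC"]

# MANOVA: do the rate-normalized measures differ across proficiency levels?
manova = MANOVA.from_formula(
    "nPVI_V + nPVI_C + VarcoV + VarcoC + MeanV + MeanC ~ level", data=df
)
print(manova.mv_test())  # reports Wilks' lambda, F and p for the 'level' effect

# Follow-up discriminant analysis with cross-validated classification accuracy.
lda = LinearDiscriminantAnalysis()
accuracy = cross_val_score(lda, df[predictors], df["level"], cv=5).mean()
print(f"classification accuracy: {accuracy:.2f} (chance for 3 balanced groups: 0.33)")
```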
TABLE 4 | Significance values for pairwise comparisons of rhythm metrics between proficiency levels (with Hochberg's correction).

TABLE 6 | Correlations of the metrics with discriminant functions 1 and 2 (structure matrix). The stars indicate the larger correlation between each variable and one of the discriminant functions.

TABLE 7 | t-tests comparing metric scores in English speech of native English speakers and advanced L2 learners of English.
t(1054) = 1.216, 3.371, −1.314, −0.064, 1.818, −1.433, 0.131, 1.014, 4.506, 1.412, 4.506
p = 0.224, 0.001, 0.189, 0.949, 0.069, 0.152, 0.896, 0.311, <0.0005, 0.158, <0.0005
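The degrees of freedom in Table 7, t(1054), are consistent with a by-sentence comparison: 22 advanced learners × 33 sentences = 726 sentences plus 10 native speakers × 33 sentences = 330 sentences, i.e., 1,056 observations and 1,054 degrees of freedom. A comparison of this kind can be run per metric with an independent-samples t-test, as in the sketch below; the metric values are placeholders drawn at random, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder per-sentence VarcoV values; the real values would come from
# the 726 advanced-learner sentences and the 330 native-speaker sentences.
advanced = rng.normal(loc=58.0, scale=10.0, size=726)
native = rng.normal(loc=59.0, scale=10.0, size=330)

t, p = stats.ttest_ind(advanced, native)  # df = 726 + 330 - 2 = 1054
print(f"t(1054) = {t:.3f}, p = {p:.3f}")
```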
less vowel shortening (e.g., French, Japanese). These factors increase the proportion of vocalic material in speech. Therefore, we assume that %V is a powerful predictor to discriminate between rhythmically contrastive languages. %V can also reflect the differences in lexical material, i.e., whether the utterances per se differ in phonotactic characteristics. In our study, we used the same set of sentences elicited from different speakers, thus the lexical differences that could potentially influence %V were eliminated. The target and the native languages of the L2 learners were similar in terms of phonotactic and phonological properties, and the learners did not have problems with producing the clusters of consonants in the English sentences. %V captures phonotactic and phonological differences, but the sentences spoken by learners at different proficiency levels in our study manifested only phonetic differences in timing patterns; phonotactics and phonological characteristics were the same. Therefore, it is not surprising that %V was not found to differ between sentences produced by L2 learners at different proficiency levels.

The discriminant analysis also reveals that the advanced learners are more consistent in the realization of timing patterns compared to lower-proficiency learners. Inspection of the discriminant function plot (Figure 7) reveals that the variate scores for the advanced learners are more compact, while the variate scores for the beginners are spread more evenly along the first discriminant function. The discriminant function plot also showed that the variate scores for different groups of acquirers overlap (see overlapping circles in Figure 7). This means that beginners produced sentences sometimes with a high degree of durational variability, and sometimes with a lower degree of durational variability. Advanced learners consistently produced the sentences with a high degree of durational variability. In other words, the productions of beginners varied greatly between stress-timed and syllable-timed rhythm patterns, but productions of advanced learners were more consistently stress-timed.

We can draw the same conclusion if we look at Table 5. Rhythm and tempo measures correctly predict the speaker's proficiency level for 57% of sentences. The overall accuracy is significantly above chance (33%), but the accuracy for the sentences produced by speakers at different proficiency levels varies substantially. Sentences produced by advanced speakers were classified correctly in 90.2% of cases, while sentences produced by beginners were classified correctly only in 38.4% of cases. This means that roughly half of the sentences spoken by beginners exhibit the higher degree of variability that is typical of stress-timed rhythm, as in the 90.2% of sentences spoken by advanced learners. On the other hand, only 9.8% of sentences spoken by advanced learners exhibit lower durational variability, overlapping with the 38.4% of sentences from beginners. The analysis of the discriminant function plot and the classification accuracy indicates that the timing patterns become more stable and consistent as a result of the acquisition progress.

To conclude, the analysis confirms significant differences in rhythmic patterns between proficiency levels in L2. Rhythm measures are more consistently stress-timed at higher proficiency levels. Raw metrics are influenced by conflicting tendencies to deliver speech at a faster rate and with higher durational variability at higher proficiency levels, and thus do not increase with proficiency. The developmental tendency to increase the degree of stress-timing in L2 speech has been observed even when both the native and the target languages of the learner are rhythmically similar. The main research question of our study was to investigate the perceptual relevance of the rhythmic differences between proficiency levels. Based on the literature review, we assumed that listeners are sufficiently sensitive to the durational variability of C and V intervals to discriminate timing patterns of L2 utterances delivered by learners at different proficiency levels. We wanted to find out whether the detected differences in timing patterns between proficiency levels are used to classify utterances into discrete categories. To address this question, we set up the perception experiment.

Experiment

Methods
Participants
We recruited 25 native English speakers to act as listeners in the perception study (age range 21–24 years, M = 22; 13 females). Care was taken to form a socially homogeneous group of listeners with the same language background. All participants were students of Ulster University, monolingual English speakers (see our criteria for monolinguality in the description of the participants for Experiment 1). All listeners grew up in or around Belfast and spoke the same regional variety of English (verified by a native speaker of English, a phonetician and Belfast resident). We ensured that the participants did not differ in age, educational level, social status, language background, or experience with foreign languages, and all had equal exposure to educated standard British English.

Stimuli
We selected sentences elicited from seven speakers per proficiency group in the first experiment to prepare the stimuli. The selected speakers from the advanced group had the highest mean ratings given by the evaluators (see the description of the first experiment, Section Procedure). The selected speakers from the beginners had the lowest mean ratings from the evaluators. We also randomly selected seven speakers from the group of intermediate learners.
Eighteen of the thirty-three elicited sentences per speaker were selected for stimuli preparation. Six sentences had three stressed syllables (e.g., the ˈdog is ˈeating the ˈbone), six sentences included two stressed syllables (e.g., the ˈbook is on the ˈtable) and six sentences had only one stressed syllable (e.g., it's ˈraining outside). The selected sentences produced by the selected speakers were listened to in order to make sure that the sentences were indeed pronounced with the expected number of stressed syllables. The selected sentences are marked with an asterisk in Appendix 5 in online Supplementary Materials. We selected 378 sentences in total for the perception experiment (21 speakers × 18 sentences).

We used the speech resynthesis technique (Ramus and Mehler, 1999) to prepare the stimuli. We replaced all consonantal intervals in the selected sentences with "s" and all vocalic intervals with "a" and resynthesized the sentences with constant fundamental frequency in MBROLA. The durations of the "s" and "a" intervals were equal to the durations of the C and V intervals in the original sentences. This technique removed segmental information and most of the prosodic information from the sentences. The only preserved differences between the identical sentences spoken by learners at different proficiency levels were the differences in the durational ratios of C and V intervals. Despite the recent criticism of this technique (Arvaniti and Rodriquez, 2013), its usefulness has been demonstrated in a number of studies (Ramus et al., 1999; Ramus and Mehler, 1999; Vicenik and Sundara, 2013; Kolly and Dellwo, 2014, etc.), and we found this delexicalization method to be optimal for the purposes of our study. A sketch of how such "sasasa" stimuli can be specified for the synthesizer is given below.
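As an illustration of this delexicalization step, the following sketch writes an MBROLA .pho input file in which every consonantal interval becomes [s] and every vocalic interval becomes [a], with the original durations and a flat pitch contour. The phoneme symbols, the 120 Hz target, the file names and the voice are assumptions made for the example and depend on the chosen MBROLA diphone database; they are not values reported by the authors.

```python
def write_sasasa_pho(intervals, path, f0_hz=120):
    """intervals: list of (label, duration_ms) with label 'V' or 'C', in order.
    Writes an MBROLA .pho file: one line per segment in the form
    '<phoneme> <duration_ms> <position_%> <f0_Hz>'."""
    with open(path, "w") as f:
        for label, dur_ms in intervals:
            phoneme = "a" if label == "V" else "s"
            # Pin F0 at the segment midpoint to keep the pitch contour flat.
            f.write(f"{phoneme} {round(dur_ms)} 50 {f0_hz}\n")
        f.write("_ 200\n")  # trailing silence

# Illustrative interval durations (ms), not real data.
write_sasasa_pho([("C", 90), ("V", 120), ("C", 60), ("V", 180)], "stimulus01.pho")
# The .pho file can then be rendered with an MBROLA voice, e.g.:
#   mbrola <voice> stimulus01.pho stimulus01.wav
# (the available phoneme symbols depend on the voice's phoneme inventory).
```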
Procedure
The experiment was carried out with each participant individually in the phonetic laboratory of Ulster University. The stimuli were presented to the listeners in two sessions: training and testing. The listeners were not informed that the stimuli were derived from L2 English speech because we did not want the listeners to use linguistic expectations regarding what stimuli in L2 English might sound like. This might have created a bias that would be difficult to control. Instead, the listeners were told that the stimuli were derived from three rare exotic African languages. We called these languages Burabah (sentences of the advanced L2 learners converted into "sasasa" stimuli), Losto (stimuli based on durations in sentences of intermediate learners of English), and Mahutu (resynthesized sentences of beginners).

We chose 108 stimuli for the training session (18 stimuli per speaker, 2 speakers per proficiency group). Before the session, each listener was exposed to nine stimuli, randomly selected from those used later in the training session, 3 stimuli per proficiency group, i.e., per "exotic language." The listener had 1 min to listen to these stimuli by clicking with a mouse on nine buttons on the computer screen. Each button had a caption with the "language" name. After the 1-min familiarization, the stimuli were presented to the listener one by one. The listener had to identify from which language (Mahutu, Losto, or Burabah) each stimulus originated. The listener was expected to click one of the three buttons on the computer screen with a mouse pointer. Each button had a caption with the "language" name. On response, the listener was provided with feedback on which "language" it really was, and the next stimulus was played. When all 108 stimuli had been presented, the participant had a 2-min break before the stimuli were played again. The training procedure was repeated three times. Supposedly, during the training session the participants formed new perception categories for further discrimination between the stimuli from different groups. Then the testing session began.

For the testing session, we prepared 270 stimuli (different from those used in the training session; 5 speakers per proficiency group, 18 sentences per speaker). The procedure was the same as in the training session, but the listeners received no feedback, and all the stimuli were played only once.

The duration of the experiment varied between participants and usually exceeded 90 min. The participants could take a short break and have a rest pause during the training session and between the training and the testing session, but not during the testing session. During the experiment the participants were offered hot and cold drinks and sweet snacks to help them cope with possible fatigue. The participants could have their drinks and snacks during the rest pauses as well as during the training session. The order of stimuli presentation was randomized using the internal Praat algorithm in an attempt to counterbalance possible fatigue effects.

Results
We calculated rhythm metrics on the stimuli that were classified by the majority of listeners as Burabah, Losto, and Mahutu. The metrics were calculated on V and C intervals. We performed a discriminant analysis to test whether the rhythm metrics statistically discriminate between the stimuli classified into the three groups. The analysis revealed two discriminant functions. The first function explained 94.8% of the variance, R² = 0.52, and the second function explained 5.2% of the variance, R² = 0.05. These functions in combination significantly differentiate between the groups, λ = 0.457, χ²(20) = 149.9, p < 0.0005. The second function alone is not significant, λ = 0.945, χ²(9) = 11.101, p = 0.282. The overall accuracy of the model is 69% (chance is 33.3%); the accuracy is 91% for Burabah, 52% for Losto, and 58.5% for Mahutu (chance level is 33.3% for each category). See Table 8 for the details on the classification accuracy.

The structure matrix (Table 9) reveals that the first function is loaded with the raw metrics and mean durations of V and C intervals, while the second function is loaded with the normalized metrics. This means that the normalized metrics cannot discriminate between the groups, but mean durations and raw metrics discriminate between the stimuli identified as Burabah, Losto, and Mahutu with probability significantly above chance.

TABLE 8 | Classification results (prior probabilities: all groups equal).

Original group | Predicted Mahutu (%) | Predicted Losto (%) | Predicted Burabah (%)
Mahutu | 58.5 | 26.4 | 15.1
Losto | 28.4 | 52.2 | 19.4
Burabah | 1.3 | 7.6 | 91.1
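Table 8 is a row-normalized confusion matrix. The sketch below shows how such a table, together with per-class and overall accuracies, could be derived from true and predicted group labels; the label arrays are short placeholders rather than the study's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["Mahutu", "Losto", "Burabah"]
# Placeholder label sequences; in the study there would be one entry per stimulus.
y_true = ["Mahutu", "Losto", "Burabah", "Burabah", "Losto", "Mahutu"]
y_pred = ["Losto", "Losto", "Burabah", "Burabah", "Mahutu", "Mahutu"]

cm = confusion_matrix(y_true, y_pred, labels=labels)      # rows = original group
row_pct = 100 * cm / cm.sum(axis=1, keepdims=True)        # row-normalized, as in Table 8
per_class_accuracy = np.diag(row_pct)                     # diagonal (91.1% for Burabah in Table 8)
overall_accuracy = 100 * np.trace(cm) / cm.sum()          # compare against the 33.3% chance level
print(row_pct, per_class_accuracy, overall_accuracy)
```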
TABLE 9 | Correlations of the metrics with discriminant functions I and II (structure matrix). The stars indicate the larger correlation between each variable and one of the discriminant functions.

FIGURE 6 | ΔV and ΔC for the stimuli identified as Mahutu, Losto, or Burabah. Error bars show 95% confidence intervals.

TABLE 10 | Coefficients and parameters of the regression model with Frequency_Burabah as the dependent variable.

TABLE 11 | Coefficients and parameters of the regression model with Frequency_Mahutu as the dependent variable.
added, and the model was significantly improved. Adding raw rhythm metrics as predictors did not improve the model further. Table 10 summarizes the main details of the regression model.

The results show that the most important predictors are the mean durations of V and C intervals, which are negatively correlated with the frequency of the "Burabah" response. This means that the shorter the speech intervals (i.e., the faster the tempo), the more likely the listener will classify the stimulus as Burabah.

We also performed stepwise multiple regressions with Frequency_Mahutu and with Frequency_Losto as the dependent variable (details of the regression models are in Tables 11, 12 respectively). The analyses show that the most influential predictors for both Frequency_Mahutu and Frequency_Losto are meanV and meanC. These predictors are positively correlated with the frequency of "Losto" and "Mahutu" responses, which means that stimuli with longer C and V intervals (i.e., a slower speech rate) are more likely to be identified as Losto or Mahutu. A regression of this kind is sketched below.
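A minimal sketch of such a multiple regression, using ordinary least squares in statsmodels: the data frame, its column names (Frequency_Burabah, meanV, meanC, and the rPVI columns) and the model-comparison step are assumptions for illustration, since the authors do not specify their software or the exact variable coding.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-stimulus table: how often each stimulus was labeled 'Burabah',
# together with its timing measures.
df = pd.read_csv("stimulus_responses.csv")  # columns: Frequency_Burabah, meanV, meanC, rPVI_V, rPVI_C, ...

# Regress response frequency on mean interval durations (speech-rate proxies).
model = smf.ols("Frequency_Burabah ~ meanV + meanC", data=df).fit()
print(model.summary())  # negative coefficients for meanV/meanC would match Table 10

# Add raw variability metrics and test whether the fit improves (cf. the text above).
extended = smf.ols("Frequency_Burabah ~ meanV + meanC + rPVI_V + rPVI_C", data=df).fit()
print(extended.compare_f_test(model))  # F-test for the added predictors
```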
Discussion

The results show that listeners classify the stimuli based on speech tempo and ignore the differences in durational variability between the "sasasa" sequences. Figures 5–7 also show that there is no difference between the stimuli identified as Losto and Mahutu for the ΔV, ΔC, rPVI-V, rPVI-C, meanC, and meanV measures. Faster stimuli with both low and high variation in the duration of V and C intervals were classified as Burabah, and slower stimuli were almost randomly attributed to either Losto or Mahutu. We conclude that the listeners formed only two categories: one for faster stimuli that were classified as Burabah, and the other for slower stimuli that were randomly identified either as Mahutu or Losto.

This result agrees with psychoacoustic data on tempo perception. Quene (2007) and Thomas (2007) studied just-noticeable differences in tempo and found that a 5–8% change in tempo (expressed as beats per minute for non-speech stimuli and syllables per minute for speech stimuli) is easily detected by subjects. We analyzed the tempo differences between the stimuli which were classified as Losto, Mahutu, and Burabah. Average tempo equals 5.62 syl/s for the stimuli identified as Burabah, 4.41 syl/s for the stimuli identified as Losto, and 4.4 syl/s for the stimuli identified as Mahutu. An ANOVA showed that the difference in tempo between the groups is significant, F(2, 196) = 64.077, p < 0.0005. Pairwise comparisons (with the Bonferroni correction) reveal that the difference lies between the "Losto" and "Burabah" stimuli, while the difference between the "Losto" and "Mahutu" groups is not significant. Speech tempo in the stimuli identified as Burabah is 25.7% higher than in the stimuli identified as Losto. This increase is above the threshold for a just noticeable tempo difference (Quene, 2007; Thomas, 2007). Speech tempo in the stimuli classified as Mahutu is 6.6% slower than in the stimuli identified as Losto, and this difference is below the just noticeable threshold.

Listeners' sensitivity to speech tempo can be explained by a number of studies in the physiology of hearing. Schreiner and Urbas (1986, 1988) showed that auditory neurons fire in response to a sharp increase in intensity that usually coincides with the vowel onset. Consequently, the rate at which "s" and "a" alternate in the stimuli determines the rate at which the neurons fire. Moreover, some studies suggest a direct relation between a syllable-length unit (the "sa" unit in our stimuli) and the neural response in the auditory cortex (Viemeister, 1988; Greenberg, 1997; Wong and Schreiner, 2003; Greenberg and Ainsworth, 2004). Besides, the auditory system imposes certain limitations on speech tempo. If these constraints on speech rate and on the length of syllable-like units are violated, speech processing and decoding of speech at the cortical level is compromised (Ghitza and Greenberg, 2009; Ghitza, 2011). Therefore, there is a physiological basis for discriminating fast and slow stimuli, or stimuli with longer and shorter syllable-like units.

We are not aware of any evidence of direct physiological correlates for the ability to differentiate fine distinctions in durational variability. Thus, we assume that differentiation of fine distinctions in rhythmic patterns involves cognitive processing. Peculiarities of the predominant rhythmic patterns in a certain language correlate with grammatical, morphological and other structural characteristics. Rhythmic patterns guide the way the language is acquired. They influence the strategies of segmentation of continuous speech. Rhythmic cues are exploited
TABLE 12 | Coefficients and parameters of the regression model with Frequency_Losto as the dependent variable.
FIGURE 8 | Splitting the “sasasa” stimuli into two categories based on durational variability of vocalic and consonantal intervals and speech rate.
patterns into separate groups, paid attention to the differences in speech rate and ignored the differences in speech rhythm between the utterances produced by the L2 learners at different proficiency levels. Faster utterances were grouped separately from slower utterances. Both groups included utterances with high and low durational variability of speech intervals. This trend is schematically illustrated in Figure 8. The sensitivity of the listeners to speech tempo is physiologically determined. The fact that listeners ignore rhythmic differences in classification can be explained by the non-linguistic nature of the stimuli. Processing of the "sasasa" stimuli in our experiment presumably does not involve the cognitive mechanisms that are employed in the processing of linguistic material, and listeners pay attention to those features of the acoustic signal that have direct physiological correlates. Further research is necessary to understand whether the cognitive filter is not applied to the processing of these stimuli because they are not perceived as speech, or because the differences in rhythm between the stimuli are not sufficiently large to be linguistically relevant.

Acknowledgments

We acknowledge the financial support of the German Research Foundation (DFG) and the Open Access Publication Fund of Bielefeld University for the article processing charge. This work was supported by the Alexander von Humboldt foundation. We are also thankful to Ferenc Bunta and David Ingram for sharing the picture prompts that we used for the sentence elicitation task.

Supplementary Material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fpsyg.2015.00316/abstract
References

Arvaniti, A. (2012). The usefulness of metrics in the quantification of speech rhythm. J. Phonet. 40, 351–373. doi: 10.1016/j.wocn.2012.02.003
Arvaniti, A., and Rodriquez, T. (2013). The role of rhythm class, speaking rate and F0 in language discrimination. Lab. Phonol. 4, 7–38. doi: 10.1515/lp-2013-0002
Boersma, P., and Weenink, D. (2010). Praat: Doing Phonetics by Computer (Version 5.1.22). Retrieved: December 15, 2010. Available online at: http://www.praat.org/
Bunta, F., and Ingram, D. (2007). The acquisition of speech rhythm by bilingual Spanish- and English-speaking four- and five-year-old children. J. Speech Lang. Hear. Res. 50, 999–1014. doi: 10.1044/1092-4388(2007/070)
Christophe, A., and Dupoux, E. (1996). Bootstrapping lexical acquisition: the role of prosodic structure. Linguist. Rev. 13, 383–412. doi: 10.1515/tlir.1996.13.3-4.383
Christophe, A., Gout, A., Peperkamp, S., and Morgan, J. L. (2003). Discovering words in the continuous speech stream: the role of prosody. J. Phonet. 31, 585–598. doi: 10.1016/S0095-4470(03)00040-8
Dauer, R. (1983). Stress-timing and syllable-timing reanalyzed. J. Phonet. 11, 51–62.
Dauer, R. (1987). "Phonetic and phonological components of language rhythm," in Proceedings of the 11th International Congress of Phonetic Sciences (Tallinn, Estonia), 447–450.
Dellwo, V. (2006). "Rhythm and speech rate: a variation coefficient for deltaC," in Language and Language-Processing, eds P. Karnowski and I. Szigeti (Frankfurt am Main: Peter Lang), 231–241.
Dellwo, V., and Wagner, P. (2003). "Relations between language rhythm and speech rate," in Proceedings of the 15th International Congress of Phonetic Sciences (Barcelona), 471–474.
Ghitza, O. (2011). Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm. Front. Psychol. 2:130. doi: 10.3389/fpsyg.2011.00130
Ghitza, O., and Greenberg, S. (2009). On the possible role of brain rhythms in speech perception: intelligibility of time-compressed speech with periodic and aperiodic insertions of silence. Phonetica 66, 113–126. doi: 10.1159/000208934
Grabe, E., and Low, L. (2002). "Durational variability in speech and the rhythm class hypothesis," in Papers in Laboratory Phonology, Vol. 7, eds C. Gussenhoven and N. Warner (New York, NY: Mouton de Gruyter), 515–546.
Greenberg, S. (1997). "Auditory function," in Encyclopedia of Acoustics, ed M. Crocker (New York, NY: Wiley), 1301–1323.
Greenberg, S., and Ainsworth, W. (2004). "Speech processing in the auditory system: an overview," in Speech Processing in the Auditory System, eds S. Greenberg, W. Ainsworth, A. Popper, and R. Fay (New York, NY: Springer-Verlag), 1–62.
Gut, U. (2009). Non-native Speech. A Corpus-Based Analysis of the Phonetic and Phonological Properties of L2 English and L2 German. Frankfurt: Peter Lang.
Kim, J., Davis, C., and Cutler, A. (2008). Perceptual tests of rhythmic similarity: II. Syllable rhythm. Lang. Speech 51, 343–359. doi: 10.1177/0023830908099069
Kolly, M.-J., and Dellwo, V. (2014). Cues to linguistic origin: the contribution of speech temporal information in foreign accent recognition. J. Phonet. 42, 12–23. doi: 10.1016/j.wocn.2013.11.004
Loukina, A., Kochanski, G., Rosner, B., Shih, C., and Keane, E. (2011). Rhythm measures and dimensions of durational variation in speech. J. Acoust. Soc. Am. 129, 3258–3270. doi: 10.1121/1.3559709
Low, L., Grabe, E., and Nolan, F. (2000). Quantitative characterizations of speech rhythm: syllable-timing in Singapore English. Lang. Speech 43, 377–401. doi: 10.1177/00238309000430040301
Mazuka, R. (1996). "How can a grammatical parameter be set before the first word?," in Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition, eds J. L. Morgan and K. Demuth (Mahwah, NJ: Lawrence Erlbaum Associates Inc), 313–330.
Mehler, J., Dupoux, E., Nazzi, T., and Dehaene-Lambertz, G. (1996). "Coping with linguistic diversity: the infant's viewpoint," in From Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition, eds J. Morgan and K. D. Demuth (Hillsdale, NJ: Erlbaum), 101–116.
Mehler, J., Sebastian-Galles, N., and Nespor, M. (2004). "Biological foundations of language: language acquisition, cues for parameter setting and the bilingual infant," in The New Cognitive Neuroscience, ed M. Gazzaniga (Cambridge, MA: MIT Press), 825–836.
Mok, P. (2013). Speech rhythm of monolingual and bilingual children at 2;06: Cantonese and English. Biling. Lang. Cogn. 16, 693–703. doi: 10.1017/S1366728910000453
Murty, L., Otake, T., and Cutler, A. (2007). Perceptual tests of rhythmic similarity: I. Mora rhythm. Lang. Speech 50, 77–99. doi: 10.1177/00238309070500010401
Nazzi, T., Bertoncini, J., and Mehler, J. (1998). Language discrimination by newborns: toward an understanding of the role of rhythm. J. Exp. Psychol. Hum. Percept. Perform. 24, 756–766. doi: 10.1037/0096-1523.24.3.756
Nazzi, T., and Ramus, F. (2003). Perception and acquisition of linguistic rhythm by infants. Speech Commun. 41, 233–243. doi: 10.1016/S0167-6393(02)00106-1
Nespor, M., Guasti, M., and Christophe, A. (1996). "Selecting word order: the rhythmic activation principle," in Interfaces in Phonology, ed U. Kleinhenz (Berlin: Akademie Verlag), 1–26.
Nolan, F., and Asu, E. (2009). The pairwise variability index and coexisting rhythms in language. Phonetica 66, 64–77. doi: 10.1159/000208931
Ordin, M., and Polyanskaya, L. (2014). Development of timing patterns in first and second languages. System 42, 244–257. doi: 10.1016/j.system.2013.12.004
Ordin, M., Polyanskaya, L., and Ulbrich, C. (2011). "Acquisition of timing patterns in second language," in Proceedings of Interspeech 2011 (Florence), 1129–1132.
Pamies Bertran, A. (1999). Prosodic typology: on the dichotomy between stress-timed and syllable-timed languages. Lang. Design 2, 103–130.
Peterson, G., and Lehiste, I. (1960). Duration of syllable nuclei in English. J. Acoust. Soc. Am. 32, 693–703. doi: 10.1121/1.1908183
Pike, E. V. (1959). A test for predicting phonetic ability. Lang. Learn. 9, 35–41. doi: 10.1111/j.1467-1770.1959.tb01127.x
Piske, T., McKay, I., and Flege, J. (2001). Factors affecting degree of foreign accent in an L2: a review. J. Phonet. 29, 191–215. doi: 10.1006/jpho.2001.0134
Polyanskaya, L., Ordin, M., and Ulbrich, C. (2013). "Contribution of timing patterns into perceived foreign accent," in Elektronische Sprachsignalverarbeitung 2013, ed P. Wagner (Dresden: TUDpress), 71–79.
Prieto, P., del Mar Vanrell, M., Astruc, L., Payne, E., and Post, B. (2012). Phonotactic and phrasal properties of speech rhythm. Evidence from Catalan, English, and Spanish. Speech Commun. 54, 681–702. doi: 10.1016/j.specom.2011.12.001
Quene, H. (2007). On the just noticeable difference for tempo in speech. J. Phonet. 35, 353–362. doi: 10.1016/j.wocn.2006.09.001
Ramus, F., Hauser, M., Miller, C., Morris, D., and Mehler, J. (2000). Language discrimination by human newborns and by cotton-top tamarin monkeys. Science 288, 349–351. doi: 10.1126/science.288.5464.349
Ramus, F., and Mehler, J. (1999). Language identification with suprasegmental cues: a study based on speech resynthesis. J. Acoust. Soc. Am. 105, 512–521. doi: 10.1121/1.424522
Ramus, F., Nespor, M., and Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition 73, 265–292. doi: 10.1016/S0010-0277(99)00058-X
Roach, P. (1982). "On the distinction between 'stress-timed' and 'syllable-timed' languages," in Linguistic Controversies, ed D. Crystal (London: Edward Arnold), 73–79.
Russo, R., and Barry, W. J. (2008). "Isochrony reconsidered. Objectifying relations between rhythm measures and speech tempo," in Proceedings of Speech Prosody 2008 (Campinas, Brazil), 419–422.
Schiering, R. (2007). The phonological basis of linguistic rhythm. Cross-linguistic data and diachronic interpretation. Sprachtypologie Universalienforschung 60, 337–359. doi: 10.1524/stuf.2007.60.4.337
Schreiner, C., and Urbas, J. (1986). Representation of amplitude modulation in the auditory cortex of the cat. I. The anterior auditory field (AAF). Hear. Res. 21, 227–241. doi: 10.1016/0378-5955(86)90221-2
Schreiner, C., and Urbas, J. (1988). Representation of amplitude modulation in the auditory cortex of the cat. II. Comparison between cortical fields. Hear. Res. 32, 49–64. doi: 10.1016/0378-5955(88)90146-3
Stevens, K. (2002). Toward a model for lexical access based on acoustic landmarks and distinctive features. J. Acoust. Soc. Am. 111, 1872–1891. doi: 10.1121/1.1458026
Suter, R. W. (1976). Predictors of pronunciation accuracy in second language learning. Lang. Learn. 26, 233–253. doi: 10.1111/j.1467-1770.1976.tb00275.x
Thiessen, E., and Saffran, J. (2007). Learning to learn: infants' acquisition of stress-based strategies for word segmentation. Lang. Learn. Dev. 3, 73–100. doi: 10.1080/15475440709337001
Thomas, K. (2007). Just noticeable differences and tempo change. J. Sci. Psychol. 14–20.
Thompson, I. (1991). Foreign accents revisited: the English pronunciation of Russian immigrants. Lang. Learn. 41, 177–204. doi: 10.1111/j.1467-1770.1991.tb00683.x
Vaughan-Rees, M. (2002). Test Your Pronunciation: Book With Audio CD. London: Longman.
Vicenik, C., and Sundara, M. (2013). The role of intonation in language and dialect discrimination by adults. J. Phonet. 41, 297–306. doi: 10.1016/j.wocn.2013.03.003
Viemeister, N. (1988). "Psychophysical aspects of auditory intensity coding," in Auditory Function, eds G. Edelman, W. Gall, and W. Cowan (New York, NY: Wiley), 213–241.
White, L., and Mattys, S. (2007). Calibrating rhythm: first language and second language studies. J. Phonet. 35, 501–522. doi: 10.1016/j.wocn.2007.02.003
White, L., Mattys, S., and Wiget, L. (2012). Language categorization by adults is based on sensitivity to durational cues, not rhythmic class. J. Mem. Lang. 66, 665–679. doi: 10.1016/j.jml.2011.12.010
Wiget, L., White, L., Schuppler, B., Grenon, I., Rauch, O., and Mattys, S. (2010). How stable are acoustic metrics of contrastive speech rhythm? J. Acoust. Soc. Am. 127, 1559–1569. doi: 10.1121/1.3293004
Wong, S., and Schreiner, C. (2003). Representation of stop-consonants in cat primary auditory cortex: intensity dependence. Speech Commun. 41, 93–106. doi: 10.1016/S0167-6393(02)00096-1

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Ordin and Polyanskaya. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.