Language Learning 49:2, June 1999, pp.
275–302
Global Foreign Accent and Voice Onset Time
Among Japanese EFL Speakers
Timothy J. Riney
International Christian University
Naoyuki Takagi
Tokyo University of Mercantile Marine
This study follows R. C. Major (1987) and J. E. Flege and
W. Eefting (1987a) in its investigation of the correlation
between global foreign accent (GFA) and voice onset time
(VOT). VOT values for /p/, /t/, and /k/ produced by 11 Japanese
speakers of English as a foreign language were measured at 2
times, separated by an interval of 42 months; 5 age-matched
native speakers of English served as the control group. The
GFA scores of the same 16 speakers are
taken from T. J. Riney and J. E. Flege (1998). One finding,
that VOT generally did not change over time, is attributed
to phonological similarity between Japanese and English
diaphones. A second finding, that of a GFA-VOT correla-
tion, links global and discrete measures of accent and
supports an earlier claim by R. C. Major (1987).
Timothy J. Riney, Language division; Naoyuki Takagi, Department of Inter-
national Cultural Studies.
We would like to thank three anonymous reviewers for their insights and
suggestions on an earlier draft of this paper. The VOT measurements for this
study and the computations related to Table 3 were done in 1996–97 by the
first author (T.J.R.) in consultation with Jim Flege in the Biocommunications
Laboratory at the University of Alabama at Birmingham. We would also like
to thank Tetsuya Jo for directing our attention to Shimizu (1996). Any errors
that may appear in this article, however, are entirely our own.
Correspondence concerning this article may be addressed to Timothy J.
Riney, Division of Languages, International Christian University, 3-10-2
Osawa Mitaka, Tokyo 181-8585, Japan. Internet:
[email protected]
14679922, 1999, 2, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/0023-8333.00089 by Shenzhen University, Wiley Online Library on [29/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
276 Language Learning Vol. 49, No. 2
Two previous studies have investigated the relationship be-
tween voice onset time (VOT) and global foreign accent (GFA) in
interlanguage (IL)1 development. In a study of 53 adult Brazilian
English as a foreign language (EFL) speakers, Major (1987) found
that “VOTs in English are significantly correlated with degree of
foreign accent and that some speakers achieved native-like VOT
production in English” (p. 197); in other words, “the higher the
accent score the closer the VOT conforms to the American English
norm” (p. 199). Flege and Eefting (1987a), in a study that involved
50 adult Dutch EFL speakers and was designed to examine to
what extent language “sets” influenced how subjects perceived
stimuli, also found a significant correlation between VOT and
foreign accent scores. They reported, however, that “the significance
of the correlation was due largely to the fact that a small
subset of speakers with low foreign accent scores (i.e., ‘strong’
accents) produced English /t/ with short lag VOT values” (p. 194).2
Thus, evidence in the literature is not entirely consistent. Major
found a positive correlation between GFA and VOT across the
entire range of GFA scores; Flege and Eefting (1987a) found that
speakers with low GFA scores produced short VOT values, but that
among speakers with moderate to high GFA scores there was little
correlation between GFA and VOT.
Major (1987) used 53 students and teachers at a Brazilian
university; Flege and Eefting (1987a) used 50 native Dutch speak-
ers aged 20 to 35, 40 of whom were majoring in English and 10 of
whom were majoring in engineering. Major used a word list
composed of four words (Pete, tell, cap, and cab) that was read once.
Flege and Eefting used mean VOT values based on three tokens
of tot (produced in a carrier phrase) in determining the correlation.
Whereas Major used rank orders, Flege and Eefting (like the
current study) used VOT mean values to calculate r. Major re-
ported that results were significant but did not report the value
of r.
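The difference between correlating rank orders and correlating mean values can be illustrated with a minimal sketch. It is not the computation used in either study: the function names are ours, the GFA scores and VOT means are hypothetical numbers invented for the example, and a Spearman-type coefficient is assumed as one common way of correlating rank orders.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from raw values (e.g., mean VOT in ms)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_order_rho(x, y):
    """Spearman-type coefficient: Pearson's r applied to the rank orders."""
    return pearson_r(average_ranks(x), average_ranks(y))

# HYPOTHETICAL data for five speakers (not taken from any study cited here).
gfa_scores = [2.0, 3.5, 4.0, 5.5, 7.0]      # 9-point accent ratings
vot_means = [30.0, 45.0, 40.0, 60.0, 75.0]  # mean VOT in ms

r_values = pearson_r(gfa_scores, vot_means)
rho_ranks = rank_order_rho(gfa_scores, vot_means)
```

The two coefficients generally differ for the same speakers, which is one reason results based on rank orders and results based on mean values are not directly comparable.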
To our knowledge neither study has been replicated, nor has
any subsequent study attempted to determine whether a signifi-
cant correlation existed between GFA and VOT. Furthermore, in
Riney and Takagi 277
a review of second language (L2) literature on Japanese pronun-
ciation of English, Riney and Anderson-Hsieh (1993) reported
finding no studies that had investigated VOT or aspiration among
Japanese EFL (JEFL) speakers. The present study addresses the
need for additional research in this important area. In a world in
which linguistic bias goes largely unacknowledged and public
attitudes are difficult to change, accents, both first language (L1)
and L2, carry great social significance for all speakers (Lambert,
1967; Long, 1990; Scovel, 1988), and it is of interest to attempt to
understand if and how global and discrete measures such as GFA
and VOT are related to one another and if they change over time.
Using data gathered from JEFL speakers, this study inves-
tigates the following two questions:
1. Do interlanguage VOT values approach the target language
(TL) norms3 over time?
2. Is there a correlation between interlanguage VOT values
and GFA scores?
Related Literature
In what is regarded as a classic VOT study, Lisker and
Abramson (1964) described aspiration and voicing in stops in
terms of VOT, or “lag,” defined as the time between the release of
a stop closure and the onset of vocal cord vibration. Based on their
study of VOT in word-initial, singleton, voiceless stops in 11
languages, Lisker and Abramson proposed that across languages
there were three general categories of stops that could be defined
by their VOT values: (a) voiceless unaspirated stops (0 to 25 ms),
(b) voiceless aspirated stops (60 to 100 ms), and (c) voiced stops,
in which the onset of vibration precedes the release of the closure,
resulting in a negative VOT value (e.g., –25 ms).
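As an illustration, the three categories can be expressed as a simple classification of a VOT value in milliseconds. This is a sketch under stated assumptions: the function name is ours, the boundaries are the approximate ranges cited above, and the label returned for values that fall in none of the listed ranges is our own convention rather than part of Lisker and Abramson's proposal.

```python
def classify_vot(vot_ms):
    """Assign a VOT value (ms) to one of the three categories described above.

    Boundaries follow the approximate ranges cited in the text; the handling
    of values outside those ranges is an assumption made for illustration.
    """
    if vot_ms < 0:
        # Voicing begins before the release of the closure.
        return "voiced"
    if vot_ms <= 25:
        return "voiceless unaspirated"
    if 60 <= vot_ms <= 100:
        return "voiceless aspirated"
    return "between or outside the listed ranges"

examples = {v: classify_vot(v) for v in (-25, 10, 45, 70)}
```

A value such as 45 ms falls in the gap between the unaspirated and aspirated ranges and is therefore not assigned to either category.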
Brazilian Portuguese, Dutch, Japanese, and American En-
glish all have two categories of stops. In Table 1, one may see that
the voiceless stops in English (/p/, /t/, and /k/) involve aspiration
and a relatively long lag, whereas the voiceless stops in Dutch and
Brazilian involve little or no aspiration and a relatively short lag
(Lisker & Abramson, 1964; Major, 1987). The corresponding set of
voiceless stops in Japanese, however, has been found to be
intermediate in value, with means ranging from 30 to 66 ms (Shimizu, 1996),
and it is not clear whether they should be called aspirated (Vance,
1987). Thus, the Japanese voiceless stops do not fit neatly into
either of the two categories of voiceless stops, aspirated and
unaspirated, proposed by Lisker and Abramson; Japanese VOT
values fall between the two categories.4 In Table 1, one notes that
the mean values across the three voiceless stops (/p/, /t/, and
/k/) for the two short lag languages, Dutch and Brazilian, are about
the same, 16.7 ms and 11.1 ms, respectively. The mean value for
English (69.3 ms) is much greater. The mean value for Japanese
(45.7 ms), however, is located close to the midpoint between the
VOT values of Brazilian and Dutch, on the one hand, and English,
on the other.
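The arithmetic behind these figures can be checked with a short sketch. The per-stop means are those reported in Table 1; the variable names, the unweighted averaging of the three stops, and the definition of the "midpoint" (halfway between the average of the two short lag languages and English) are our assumptions for illustration.

```python
# Mean VOT (ms) per stop as reported in Table 1.
table1 = {
    "Dutch": {"p": 10.0, "t": 15.0, "k": 25.0},
    "Brazilian Portuguese": {"p": 6.9, "t": 10.8, "k": 15.7},
    "Japanese": {"p": 41.0, "t": 30.0, "k": 66.0},
    "English": {"p": 58.0, "t": 70.0, "k": 80.0},
}

def mean_across_stops(stops):
    """Unweighted mean of the /p/, /t/, and /k/ values."""
    return sum(stops.values()) / len(stops)

language_means = {lang: mean_across_stops(s) for lang, s in table1.items()}
rounded = {lang: round(m, 1) for lang, m in language_means.items()}

# Assumed definition of the midpoint: halfway between the average of the
# two short lag languages and the English mean.
short_lag = (language_means["Dutch"] + language_means["Brazilian Portuguese"]) / 2
midpoint = (short_lag + language_means["English"]) / 2
distance_from_midpoint = language_means["Japanese"] - midpoint
```

Under these assumptions the midpoint is about 41.6 ms, so the Japanese mean of 45.7 ms lies roughly 4 ms above it.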
All of Shimizu’s (1996) instrumental work was done for his
Ph.D. in the Phonetics Laboratory of the Department of Linguis-
tics at the University of Edinburgh. Shimizu’s Japanese speakers
(aged 26 to 35) were all postgraduate students at Edinburgh and
“were considered to have a good command of English” (p. 22). The
comparison here is between the Japanese VOT values of Shimizu’s
bilingual Japanese speakers (who presumably acquired English
in adulthood) and the English VOT values of the JEFL speakers
(aged 18 to 22) of the current study. For a review of “observed
significant L2 effects on L1 production,” see Flege and Eefting
(1987a, pp. 197–199).
An earlier and more limited Japanese VOT study was con-
ducted by Homma (1980). Homma’s study involved only three
speakers and only one word with a /ta/ onset: /tada/ (“free”), which
is accented on the first mora (or syllable). Homma reported a mean
VOT of 25 ms for /t/ and interpreted Japanese VOT to fall between
the typical VOT means for voiceless unaspirated /t/ and voiceless
aspirated /t/ in other languages—posing counterevidence to Lisker
and Abramson’s (1971, p. 770) claim that “there is rough agree-
ment across languages in the placement of categorical boundaries
Table 1
Mean VOT values for /p/, /t/, and /k/ in Dutch, Brazilian
Portuguese, Japanese, and American English

      Dutch             Portuguese      Japanese                English
      (Lisker &         (Major, 1987)   (Shimizu, 1996)         (Lisker &
      Abramson, 1964)                                           Abramson, 1964)
      1 speaker         5 speakers      6 speakers              4 speakers
      VOT in ms,        VOT in ms       VOT in ms               VOT in ms,
      range             (SD)            (SD), range             range

/p/   10, 0–30          6.9 (2.6)       41 (17.1), 15–65        58, 20–120
/t/   15, 5–35          10.8 (2.8)      30 (12.7), 15–50        70, 30–105
/k/   25, 10–35         15.7 (3.8)      66 (12.1), 50–100       80, 50–135
M     16.7              11.1            45.7                    69.3

Note. Lisker and Abramson (1964) did not report the standard deviation, and
Major (1987) did not report the range.
along the dimension of voice onset timing, yielding 3 phonetic
types: voiced, voiceless unaspirated, and voiceless aspirated
stops.” (See also Keating, 1984.) Shimizu’s (1996) study involved
six speakers producing /pi/, /pa/, /po/, /ta/, /te/, /to/, /ka/, /ki/, /ke/,
/ko/, and /ku/. As a function of the following vowels, his findings for
/p/ were: 37, 48, and 50 ms before /a/, /i/, and /o/, respectively (data
were not obtained for /u/ and /e/); for /t/: 29, 29, and 31 ms before
/a/, /e/, and /o/, respectively (/ti/ and /tu/ involve affrication and
were not included); for /k/: 53, 87, 73, 55, and 63 ms before /a/, /i/,
/u/, /e/, and /o/, respectively. One notes that for /ta/ the VOT findings
of Shimizu (29 ms) and Homma (25 ms) were quite close, and this
could be used to support Homma’s claim that Japanese VOT is
different (although Shimizu does not refer to Homma’s work).
Vance (1987), based in part on considerations of stressed and
accented syllables, questioned Homma’s (1980) claim, concluding
that “there is no solid evidence that Japanese /p, t, and k/ fall
between the putatively universal voiceless unaspirated and voice-
less aspirated types in terms of VOT. . . . I therefore tentatively
conclude that Japanese /p, t, k/ are voiceless unaspirated, but VOT
in Japanese certainly deserves further study” (p. 19).5
One similarity between our study and those of Major (1987)
and Flege and Eefting (1987a) is that all three studies involved
an L1 that had voiceless stops with VOT values that were shorter
than those of the TL (English). One difference, however, is that
whereas the studies of Major (1987) and Flege and Eefting (1987a)
involved unequivocal short lag L1s (Brazilian and Dutch, respec-
tively), our study involves an L1 (Japanese) with an unusual type
of intermediate lag that to our knowledge has never been investi-
gated in any crosslinguistic study. It was of interest, therefore, to
learn whether the JEFL speakers could establish new long lag
VOT values separate from the intermediate lag values of their L1.
With its intermediate VOT values, Japanese provides an
interesting test case for investigating what phonetic distance
(measured as VOT in milliseconds) is required before phonetic
modification occurs in the IL. A number of studies have found that
groups of adult EFL speakers whose L1 contains short lag stops
(e.g., French) will realize English stops with VOT values that are
located somewhere between the L1 short lag VOT values and the
TL long lag English values. Conversely, groups of L1 English
speakers have been found to realize stops in TLs that contain short
lag stops with mean VOT values that are “too long” and positioned
between the English long lag norm and the TL short lag norm
(Flege & Eefting, 1987a, pp. 186–187; Flege & Eefting, 1987b,
p. 68; Flege & Hillenbrand, 1984). Both Major (1987) and Flege
and Eefting (1987a) found that some Brazilian and Dutch (respec-
tively) EFL speakers adjusted their VOT when speaking English
to a value that was intermediate between the VOT of their L1 and
the VOT of the TL, English.
One question of interest for the current study is whether
JEFL speakers, whose L1 has intermediate lag values and relatively
little VOT difference from the TL, do the same. If
they do not, this may be an indication that the VOT difference (or
phonetic distance) between the L1 and the TL is not great enough
to be perceived and adjusted to. If they do adjust, however, then
this may be an indication that the VOT difference (phonetic
distance) was sufficient to bring about the formation of new
phonetic categories (Flege, 1995, discussed below) with VOT val-
ues located between the L1 and the TL VOT values.
In a broad sense this investigation contributes to research in
how learning and acquisition are affected by degrees of similarity
and difference between the items a learner possesses and those
items the learner intends to acquire. In the 1940s and 1950s,
research (that was not crosslinguistic) began to investigate trans-
fer and interference effects of similar and dissimilar items in
semantic and cultural domains (e.g., Osgood, 1946; Osgood, Suci,
& Tannenbaum, 1957). In a review of more recent research that is
crosslinguistic and devoted to the phonological domain, Major and
Kim (1996) note that many researchers “have investigated the
relationship between phonological similarity/dissimilarity (vari-
ously called new or different phenomena) and difficulty or order
of acquisition” (pp. 467–468). According to Oller and Ziahosseiny
(1970), “wherever patterns are minimally distinct in form or
meaning in one or more systems, confusion may result” (p. 186).
According to Wode (1983, p. 180; as cited in Major & Kim, 1996),
L1-L2 transfer requires “crucial similarity measures” between the
L1 and L2 items. In a similar vein, Flege (1987) used “equivalence
classification” to describe the situation in which the speaker
perceives L2 sounds to be equivalent to those in the L1, so that these
equivalent sounds are more difficult to acquire than new, different,
or dissimilar sounds. More recently, Major and Kim (1996) have
proposed the Similarity Differential Rate Hypothesis (SDRH): “An
L2 phenomenon that is dissimilar to an L1 phenomenon is ac-
quired faster than an L2 phenomenon that is similar to this same
L1 phenomenon” (p. 474). Thus, beyond investigating a GFA-VOT
correlation, the present study contributes to an established and
growing literature that investigates the role of phonological simi-
larity in language learning.
Question and Hypothesis
To some extent our study may be viewed as a replication of
Major (1987) and Flege and Eefting (1987a), although we intro-
duce two variables into the experimental design: (a) Japanese as
the L1, with its voiceless stops that involve an intermediate lag
rather than a short lag, and (b) a 42-month longitudinal dimension
that will allow us to examine whether VOT changes over time
vis-à-vis GFA. Regarding changes over time, and based on the
literature summarized above, one question we posed was whether
between T1 and T2 JEFL speakers’ VOT values for /p/, /t/, and /k/
would become longer and more like native English (NE) VOT
values. Although we were aware of no previous longitudinal L2
studies that have investigated change in VOT over time, we had
four reasons for posing this question: (a) Major (1987) had inferred
on the basis of latitudinal data that improvement in GFA and VOT
were related and simultaneous developments among his Brazilian
EFL speakers. (b) The results of Riney and Flege (1998) showed
that their JEFL speakers (the same as ours) had made improve-
ment between T1 and T2 in several other areas. The GFA of three
individual JEFL speakers had improved, and the JEFL group as a
whole had improved in the identifiability and accuracy of their
productions of /l/ in clusters. Although Riney and Flege did not
report individual results for liquids, on the basis of their published
data we were able to determine that of 36 (9 speakers × 4 onset
types) possible areas of improvement between T1 and T2, there
were 8 cases of improvement, 4 cases of getting worse, and 24 cases
of no change.6 Given these signs of improvement, it seemed rea-
sonable to investigate whether the VOT values of some of these
11 speakers might improve as well. (c) Although the speakers
(students) had spent the majority of their time in Japan, their
particular university setting may have been one of the most
favorable environments for linguistic input and the acquisition of
English in Japan (see discussion of speakers below). (d) Although
all JEFL speakers are aware that English /r/ and /l/ pose a special
difficulty for them, probably very few have any idea of any VOT
differences between Japanese and English, or even know what
VOT is. It was consequently of interest to know whether this less
conspicuous difference between English and Japanese would also
improve for some speakers between T1 and T2, as GFA and liquids
had.
In addition to providing the rationale for posing the general
question above about change in VOT over time, the review of
related literature above motivated the testing of the following
hypothesis:
Hypothesis: For the JEFL speakers as a group, a significant
correlation will exist between GFA and VOT; that is, the
more English-like the JEFL speakers’ GFA scores, the
more closely their VOT values will correspond to English
VOT norms.
Method
Speakers
The 16 speakers (11 JEFL and 5 NE) and speech materials
used for this project, the same as those used and described in detail
by Riney and Flege (1998), will be described only briefly here. The
11 JEFL speakers (8 females, 3 males) were students at International
Christian University (ICU) in Tokyo, a university whose
faculty includes 15% to 30% non-Japanese members and that uses two
languages of instruction, Japanese and English, throughout the
university. Most of the JEFL speakers had begun their study of
English at about age 13 in more or less typical Japanese public
schools. At T1 (June 1992) the 11 JEFL speakers were college
freshmen aged 18 to 20 years, and at T2 (fall 1995) they were all
college seniors. The 11 JEFL speakers had similar TOEFL scores
freshman year (range: 437–497) and sophomore year (range:
490–567) but differed in other respects, including preuniversity
schooling. In addition, between T1 and T2, 4 speakers spent an
academic year at universities abroad (2 in California, 1 in Holland,
and 1 in Mexico); 4 others made one or more short trips abroad.
As mentioned above, Riney and Flege determined that the accents
of 3 of the 11 Japanese students had significantly improved
between T1 and T2; 2 of these 3 had spent a year between T1 and
T2 in California.
The control group of 5 NE speakers (3 females, 2 males; aged
20 to 23) had all been born and raised in California, where they
graduated from high school as monolingual speakers of English.
At the time of the data collection (May 1996), all NE speakers were
at ICU in Tokyo studying Japanese as year-abroad students from
universities in California. The T1 and T2 data collections of the
NE speakers (who were in Japan for only 1 year) were done at a
2-week interval; it was assumed that the phonetic productions of
NE speakers in their early 20s taken at an interval separated by
2 weeks would not differ significantly from productions separated
by an interval of 42 months, which was the interval for the JEFL
speakers. Except for the dates of T1 and T2, the NE and the JEFL
speakers were recorded under identical conditions in a soundproof
room at ICU.
Speech Materials and Procedures
All 32 speech samples (16 speakers × 2 times) were recorded
using a Sony TC-1290 monaural tape recorder. Speakers’ tasks
included the reading of a list of 84 words, 6 of which were used for
this VOT study, and the reading of a list of 15 sentences, 5 of which
were used for the GFA study described in Riney and Flege (1998).
The 6 words (part, time, tub, cab, can, come) used for VOT mea-
surements involved 1 with word-initial /p/, 2 with /t/, and 3 with
/k/.7 This involved a total of 192 tokens (6 words × 16 speakers ×
2 times). All words were digitized at 22.05 kHz with 16-bit resolu-
tion, then normalized for peak intensity.
It is known that context may affect VOT. Lisker and Abram-
son (1967) found that the VOT of a stop may vary depending on
whether the stop appears in a citation form or in running speech,
whether it appears in a stressed syllable or an unstressed one, and
where it appears in polysyllabic words (i.e., which syllable) and in
sentences (e.g., beginning or middle). To control for this variance
we used only word-initial singleton stops in monosyllabic words
that were read from a list in isolation as citation forms (and were
therefore stressed). Klatt (1975) has shown that VOT varies ac-
cording to the following vowel environment (i.e., high vowels are
related to longer VOTs). Although our study involved several types
of following vowels (in part, time, tub, cab, can, come), we controlled
the following vowels only insofar as we had all speakers say the
same words and vowels at T1 and T2. (Those who wish to compare
our vowel environments with Shimizu’s, provided above, may do
so. Our feeling, however, is that because VOT may range widely
and the number of tokens of each vowel-type in our study is small,
one could make tentative conclusions at best from such a comparison.)
To determine if there was a correlation between GFA and VOT
it was of course necessary to have measurements of both. From
spectrograms (using Cool Edit, 1995) of the six words, VOT measure-
ments were made by the first author (T.J.R.) from the beginning of
the release burst to the onset of periodicity of the following vowel.
The GFA scores of the 16 speakers were obtained from Riney
and Flege (1998). For the determination of the GFA scores in that
study, 5 NE listeners (raters) from different areas of the United
States each heard, in counterbalanced order, a different randomi-
zation of the stimuli and rated each of the five sentences on a
9-point scale. For each of the 5 listeners, the final three of four
judgments of each sentence were averaged; then
the ratings of all 5 listeners were averaged to obtain a robust
estimate of each speaker’s accent. The 11 JEFL speakers’ GFA
scores were then correlated with the VOT values in the current study.
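The averaging procedure just described can be sketched as follows. This is an illustration, not the script used by Riney and Flege (1998): the function name and data layout are ours, the ratings shown are hypothetical, and averaging across sentences within each listener before averaging across listeners is an assumption about the order of the steps.

```python
def speaker_gfa(ratings):
    """Compute one speaker's GFA score.

    ratings[listener][sentence] is the list of four judgments (9-point scale)
    that one listener gave to one sentence. The first judgment is discarded
    and the final three are averaged, as described in the text.
    """
    listener_means = []
    for sentences in ratings:
        sentence_means = [sum(j[1:]) / len(j[1:]) for j in sentences]
        # ASSUMPTION: sentences are averaged within each listener first.
        listener_means.append(sum(sentence_means) / len(sentence_means))
    return sum(listener_means) / len(listener_means)

# HYPOTHETICAL ratings: 5 listeners x 5 sentences x 4 judgments each.
hypothetical = [[[1, 4, 5, 6] for _sentence in range(5)] for _listener in range(5)]
score = speaker_gfa(hypothetical)  # final three judgments average to 5.0
```

Because every listener contributes the same number of sentences and judgments, the order in which the averages are taken does not affect the result in a balanced design such as this one.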
Results
Change in VOT Over Time
The results are shown in Tables 2, 3, 4, and 5, and Figures 1
and 2. Table 2 shows the mean VOT value, standard deviation, and
Table 2
Mean VOT values, standard deviation, and range for /p/, /t/, and
/k/ (separately) at T1 and T2 for native English (NE) speakers and
Japanese EFL (JEFL) speakers

                    JEFL (n = 11)                NE (n = 5)
              M       SD      Range        M       SD      Range

/p/ (n = 1)
  T1          41.3    13.4    19–59        65.8    17.3    40–88
  T2          38.6    13.8    15–59        82.6    18.2    64–111
  Both        40.0    13.4    15–59        74.2    19.0    40–111
/t/ (n = 2)
  T1          40.6    20.0    15–74        64.3    12.9    43–86
  T2          41.6    23.4    17–101       84.1    16.8    51–107
  Both        41.1    21.5    15–101       74.2    17.8    43–107
/k/ (n = 3)
  T1          69.9    16.2    37–102       84.9    19.8    44–110
  T2          65.2    17.0    34–95        85.5    19.0    59–114
  Both        67.6    16.6    34–102       85.2    19.1    44–114
range for /p/, /t/, and /k/ at T1 and T2 for the NE control group and
for the JEFL group. The number of measurements from which
each mean was calculated varied because (a) the JEFL group had
11 speakers and the NE control group had 5 speakers, and (b) for
each time (T1 and T2), /k/ involved three tokens, /t/ involved two,
and /p/ involved one. For the JEFL group, the mean VOT values
for /k/ were based on 66 measurements (3 words × 11 speakers ×
2 times); for /t/ 44 measurements (2 words × 11 speakers × 2 times);
and for /p/ 22 measurements (1 word × 11 speakers × 2 times). For
the NE group, the mean VOT values for /k/ were based on 30
measurements (3 words × 5 speakers × 2 times); for /t/ 20 mea-
surements (2 words × 5 speakers × 2 times); and for /p/ 10
measurements (1 word × 5 speakers × 2 times). The range of values
of the NE control group generally approximated those of Lisker
and Abramson (1964) shown in Table 1; in both studies the VOT
values range considerably, as is commonly found in VOT studies.
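The measurement counts given above follow directly from the design and can be reproduced with a short sketch. The figures (words per stop, speakers per group, two recording times) come from the text; the variable names are ours.

```python
# Design of the VOT data set as described in the text.
WORDS_PER_STOP = {"p": 1, "t": 2, "k": 3}   # part; time, tub; cab, can, come
SPEAKERS_PER_GROUP = {"JEFL": 11, "NE": 5}
TIMES = 2                                    # T1 and T2

# Number of measurements behind each "Both" mean: words x speakers x times.
measurements = {
    (group, stop): words * speakers * TIMES
    for group, speakers in SPEAKERS_PER_GROUP.items()
    for stop, words in WORDS_PER_STOP.items()
}

total_tokens = sum(measurements.values())
```

Summing the six cells gives 192, which agrees with the total of 6 words × 16 speakers × 2 times reported under Speech Materials and Procedures.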
The NE control group values for /p/ and /t/ were 16.8 ms and
19.8 ms higher (respectively) at T2 than at T1. One notes in Table
4 that one reason for this increase appears to be that 1 NE speaker,
E5, for some reason produced highly aspirated stops at T2. This
increase in VOT among NE speakers between T1 and T2 may have
arisen because this control group (n = 5) was too small or,
alternatively, because the number of words (six) and productions
of each word (one) elicited from each speaker at each time was too
small to yield a valid mean for a variable phenomenon such as VOT.
It is probably no coincidence that the two categories in which a VOT
increase was observed involved the smallest number of tokens of the
six categories: For the NE speaker group (n = 5), /p/, /t/, and /k/ involved
at each time only 5, 10, and 15 tokens respectively; for the JEFL
speaker group (n = 11), the numbers were 11, 22, and 33, respec-
tively. Thus, the greatest increase occurred in NE /p/ and
/t/—where the tokens were fewest—and possibly this increase was
random in a sample that was too small. Nonetheless, as is evident
in Table 3, the increase for /p/, /t/, and /k/ (combined) in VOT for
the NE group from T1 (74.9 ms) to T2 (79.7 ms) was less than 5
ms and not significant. Furthermore, the NE speakers’ VOT
means at both T1 and T2 fell within a range that was similar to
the range of VOT values for NE speakers found by Lisker and
Abramson (1964) shown in Table 1. For these reasons, we regard
our control group VOT values as reliable.
Table 2 also shows that the JEFL group’s VOT mean values
for /p/ (T1: 41.3 ms, T2: 38.6 ms) and /k/ (T1: 69.9 ms; T2: 65.2 ms)
were very close to those in Table 1 found by Shimizu (41 ms for /p/
and 66 ms for /k/) for native Japanese (NJ) speakers speaking Japanese.
The mean value for /t/ (T1: 40.6 ms, T2: 41.6 ms), however, was
located somewhere between the Japanese value reported by
Shimizu (30 ms) and the value of this study’s NE control group
(80.3 ms; cf. 70 ms of Lisker & Abramson, 1964).8 The JEFL group’s
mean VOT value for /t/ at T2 (41.6 ms) was 11.6 ms above the NJ speaker
mean (30 ms) established by Shimizu (1996). A t-test, however,
revealed that this difference was not significant: t(15) = 1.28, p < .22 (df = 11
+ 6 – 2).
Table 3 shows the group mean VOT values for /p/, /t/, and /k/
(combined) at T1 and T2 for NE speakers and JEFL speakers.
Recall that the general question posed earlier was whether JEFL
speakers would lengthen their VOT values for /p/, /t/, and /k/
between T1 and T2. The results that address this question are
discussed first for the JEFL speakers as a group and then for
individual speakers.
Group results. Recall that this study involved from each
speaker at each time one token of /p/, two of /t/, and three of /k/.
For each voiceless stop the English VOT norm was higher than the
Japanese norm (see Table 1). We combined the six voiceless stops
into one group and used the mean VOT of the six consonants to
compute the mean of four groups: (a) the NE speakers at T1, (b)
the NE speakers at T2, (c) the JEFL speakers at T1, and (d) the
JEFL speakers at T2. Table 3 shows these four means. A two-way
analysis of variance (ANOVA; 2 Groups × 2 Times) indicated
that for /p/, /t/, and /k/ collectively there was no significant change
in mean VOT values between T1 and T2 for either the NE or the
JEFL group: There was no main effect of Time, F(1, 14) =
1.335, p > .05, and no interaction of Time and Group, F(1, 14) = 3.771,
p > .05. There was, however, a main effect of Group, F(1, 14)
= 11.163, p < .05.
In sum, the answer to our question posed above is that the
JEFL speakers’ VOT values for /p/, /t/, and /k/ combined did not
change over time.
Individual results. Next, the T1 and T2 VOT values of the six
pairs of words that were produced by each speaker were compared.
We decided that in order for a speaker “to have learned to
produce a longer VOT for English at T2 than at T1,” that longer
VOT would have to appear in six of six comparisons. If the speaker
had shown no improvement, then the probability of T2 being
greater than T1 in any single comparison would have been .5. Thus
the probability of T2 being greater than T1 in all six comparisons
was .5 to the sixth power, or .0156. Only if T2 was greater than T1
in all six words could the speaker be said to have learned to
produce a longer VOT.

Table 3
Group mean VOT values (ms) for /p/, /t/, and /k/ (combined)
at T1 and T2 for native English (NE) speakers and
Japanese EFL (JEFL) speakers

          NE (n = 5)        JEFL (n = 11)
          T1      T2        T1      T2
M         74.9    79.7      55.4    54.1
SD        12.5    10.5      13.4    13.2
SE         5.6     4.7       4.0     4.0

On this basis we found that Speaker J1 produced a longer
VOT at T2 than at T1, Speaker J6 produced a shorter VOT at T2
than at T1, and the other 14 speakers (9 JEFL and 5 NE) showed
no change. Speaker J1 was 1 of the 3 speakers found by Riney and
Flege (1998) to improve in GFA; the other 2 speakers, J2 and J4,
had a higher VOT at T2 than at T1, but the difference was not
statistically significant. Speaker J4’s VOT was longer at T2 in five
of the six words; if this speaker had read more words, a difference
might have been detected. Furthermore, as can be seen in Table
4, of the 11 JEFL speakers, J1 and J4 had the largest mean VOT
at T2, with 76.2 and 77.0 ms respectively, larger than that of NE
Speakers E3 and E4. For the NE speakers, using the same bino-
mial technique discussed above, no T1-T2 difference was found,
although the mean VOT value of 1 NE speaker (E5, discussed
above) was much higher at T2 than at T1 (see Table 4).
In brief, based on the binomial technique described above,
between T1 and T2 the VOT values of 1 JEFL speaker improved
(increased in length), those of 1 JEFL speaker got worse (decreased
in length), and those of the other 9 JEFL speakers and all 5 NE
speakers showed no change.
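
The six-of-six criterion used for the individual results can be illustrated with a short Python sketch. It is not the computation originally run for this study; the function names and the token values are hypothetical, and treating six of six shorter values as a decrease is an assumption about how a speaker such as J6 was classified.

```python
def chance_probability(n_pairs=6, p_longer=0.5):
    """Probability that T2 exceeds T1 in every comparison if there is no real change."""
    return p_longer ** n_pairs


def classify_speaker(t1_values, t2_values):
    """Apply the all-or-nothing criterion to paired VOT values (ms)."""
    pairs = list(zip(t1_values, t2_values))
    if all(t2 > t1 for t1, t2 in pairs):
        return "longer"
    if all(t2 < t1 for t1, t2 in pairs):  # assumed mirror-image rule for a decrease
        return "shorter"
    return "no change"


# Hypothetical VOT values (ms) for one /p/, two /t/, and three /k/ tokens.
example_t1 = [40, 35, 42, 60, 66, 71]
example_t2 = [55, 48, 50, 75, 80, 79]

print(round(chance_probability(), 4))            # 0.0156
print(classify_speaker(example_t1, example_t2))  # longer
```

Under this rule a speaker with five of six longer values, such as Speaker J4, is still classified as showing no change, which is why the criterion is conservative with only six tokens.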
Table 4
Individual mean VOT values in ms (T1, T2, and mean of T1 and T2) for /p/, /t/, and /k/ (combined)
for native English (E) speakers and Japanese (J) EFL speakers (ranked by highest mean of T1 and T2;
range in parentheses)

Speaker   T1 (range)        T2 (range)         M
E1        92.2 (72–110)     91.0 (70–114)      91.6
E5        69.0 (40–107)     101.3 (74–112)     85.2
E2        83.8 (68–107)     85.3 (70–101)      84.6
J4        68.3 (48–88)      77.0 (52–101)      72.7
J3        75.5 (51–100)     68.7 (42–91)       72.1
E3        65.2 (44–89)      73.0 (51–99)       69.1
E4        64.2 (51–72)      72.0 (60–97)       68.1
J1        58.5 (32–84)      76.2 (51–95)       67.4
J8        72.0 (35–102)     61.5 (35–78)       66.8
J6        64.0 (43–79)      47.2 (30–63)       55.6
J7        52.3 (24–70)      42.3 (30–70)       47.3
J10       53.3 (25–80)      39.0 (18–72)       46.2
J5        43.8 (15–71)      48.2 (20–86)       46.0
J9        46.2 (21–69)      39.7 (23–66)       43.0
J2        37.0 (20–53)      43.5 (18–67)       40.3
J11       38.0 (16–67)      38.8 (15–68)       38.4
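
The ranking in Table 4 can be reproduced from the T1 and T2 means alone. The following Python sketch is offered only as an illustration; the T1 and T2 values are copied from Table 4, while the variable names are hypothetical.

```python
# (T1 mean, T2 mean) in ms for each speaker, copied from Table 4.
table4 = {
    "E1": (92.2, 91.0), "E5": (69.0, 101.3), "E2": (83.8, 85.3),
    "J4": (68.3, 77.0), "J3": (75.5, 68.7), "E3": (65.2, 73.0),
    "E4": (64.2, 72.0), "J1": (58.5, 76.2), "J8": (72.0, 61.5),
    "J6": (64.0, 47.2), "J7": (52.3, 42.3), "J10": (53.3, 39.0),
    "J5": (43.8, 48.2), "J9": (46.2, 39.7), "J2": (37.0, 43.5),
    "J11": (38.0, 38.8),
}

# Mean of T1 and T2 for each speaker, then rank from highest to lowest.
means = {speaker: (t1 + t2) / 2 for speaker, (t1, t2) in table4.items()}
ranking = sorted(means, key=means.get, reverse=True)

for rank, speaker in enumerate(ranking, start=1):
    print(rank, speaker, f"{means[speaker]:.2f}")
```

The means are printed to two decimals because rounding to one decimal can differ in the last digit from the values printed in the table.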
The GFA-VOT Correlation
Our hypothesis for this study was that a significant correla-
tion would exist between GFA and VOT; that is, the more English-
like the JEFL speakers’ GFA scores, the more closely the JEFL
speakers’ VOT values for /p/, /t/, and /k/ would resemble NE
speaker VOT norms. Table 5 shows the six correlation coefficients
between GFA scores and mean VOT values for /p/, /t/, and /k/ at
T1 and T2. Figure 1 and Figure 2 present corresponding scatter-
grams for T1 and T2, respectively. (Recall that each /p/, /t/, and /k/
entry at each time is based on 11, 22, and 33 observations,
respectively.) Three of the six correlations in Table 5 are
significant at the .05 level: /t/ at T1, /p/ at T2, and /t/ at T2.
In two of the remaining three cases the p value is very close to
.05: /p/ at T1 (.059) and /k/ at T1 (.055). Only in one of the six
cases is there clearly no significant correlation: /k/ at T2. Viewed
as a whole, these results support our hypothesis that there would
be a positive correlation between GFA and VOT: As the GFA scores
increase, so do the VOT values.
One might suggest that the data points of one JEFL speaker
(J11 at both T1 and T2) with the lowest GFA scores should be
treated as outliers. However, Speaker J11’s GFA scores were low
and so were his VOT values, which conforms to the general trend.
When we reanalyzed the data without Speaker J11, correlation
coefficients stayed above .5 except for /p/ (r = .44 at T1, and r = .32
at T2). Although none attained significance at the .05 level in this
analysis without Speaker J11’s data points, this may be due to the
lowered statistical power (due to a smaller sample of 10 JEFL
speakers and 10 data points rather than 11). If the correlations in
Table 5 were solely due to speaker J11’s data points, one would
expect much lower correlations without these points. The fact that
positive correlations were observed at both T1 and T2 for each of
the three stops, with or without Speaker J11’s data, indicates that
there was a positive correlation: The higher and more nativelike
the GFA score, the longer and more nativelike the VOT value.

Table 5
Correlation coefficients between GFA and VOT for /p/, /t/, and
/k/ for 11 Japanese EFL speakers

               Time 1                  Time 2
           /p/    /t/    /k/       /p/    /t/    /k/
r value    .58    .66    .59       .61    .66    .43
p value    .059   .029   .055      .049   .027   .192

Note. Any correlation exceeding .602 is significant at the .05 level (two-tailed)
given 9 degrees of freedom (11 – 2).

Thus, we found a significant correlation between GFA and VOT
and support for our hypothesis.
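
The .602 threshold given in the note to Table 5 follows from the usual relation between a Pearson correlation and the t distribution. The Python sketch below illustrates that relation under one stated assumption: the two-tailed .05 critical t for 9 degrees of freedom is taken as 2.262 from a standard t table rather than computed. The function names are hypothetical, and the r values are copied from Table 5.

```python
import math

# Two-tailed .05 critical value of t for 9 degrees of freedom (standard t table).
T_CRITICAL_DF9 = 2.262


def critical_r(t_critical, df):
    """Smallest |r| reaching significance, from t = r * sqrt(df) / sqrt(1 - r**2)."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


def t_statistic(r, df):
    """t value corresponding to a Pearson correlation r with df = n - 2."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


# r values copied from Table 5 (11 JEFL speakers, so df = 9).
table5_r = {
    "/p/ T1": 0.58, "/t/ T1": 0.66, "/k/ T1": 0.59,
    "/p/ T2": 0.61, "/t/ T2": 0.66, "/k/ T2": 0.43,
}
df = 11 - 2
threshold = critical_r(T_CRITICAL_DF9, df)
significant = [label for label, r in table5_r.items() if r > threshold]

print(round(threshold, 3))  # 0.602
print(significant)          # ['/t/ T1', '/p/ T2', '/t/ T2']
```

The three correlations that exceed the threshold are the same three identified as significant in the text.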
Discussion
One assumption that we make about VOT is that at the very
beginning point of adult (i.e., about age 12–13) L2 learning, L1
VOT values are substituted for L2 VOT values. For the speakers
in our study, who began their study of English at the beginning of
middle school (age 12–13), this refers to what we will call “T-zero.”
Additionally we assume that Shimizu’s (1996) Japanese VOT
values (taken from bilingual Japanese speakers in Scotland) re-
flect monolingual Japanese VOT values in Japan.9 Based on these
assumptions, we concluded that there had been no change between
T-zero (age 12–13) and T1 (age 18–19), or between T1 and
T2 (age 22–23). (Although /t/ did increase 11.6 ms between T-zero
and T1, this increase was not significant.) This finding of no
change in VOT for the JEFL group over a 10-year period is
surprising. Had the JEFL speakers spent most of their time in an
English as a second language (ESL) environment (e.g., in the
United Kingdom requiring heavy exposure to and daily use of the
TL) rather than in an EFL environment in Japan (requiring much
less use of the TL), the VOT values of the JEFL group might have
improved. The EFL/ESL distinction, however, does not account for
why VOT showed less improvement than GFA and liquids.

Figure 1. Time 1: GFA scores and VOT values for /p/, /t/, and /k/. 1 = heavy
foreign accent, 9 = no foreign accent; not shown is rating area of 6 to 9 in
which no speaker obtained a rating.

As was explained above, between T1 and T2, 3 of 11 JEFL
individuals’ GFA scores improved, in 8 individual cases liquids in
onsets improved (while 4 got worse), and for the JEFL group as a
whole productions of /l/ in clusters improved. Why was there not
parallel improvement in VOT among these same speakers? We
suggest two reasons. One reason for this lack of improvement may
be that aspiration is rarely if ever addressed by pedagogy. Riney
and Anderson-Hsieh (1993) reported that of all English consonants
/r/ and /l/ had received the most attention in the JEFL research
literature; they also reported finding inconsistent pedagogical
descriptions of aspiration in Japanese (perhaps related to potentially
confusing intermediate VOT values found in Japanese), and
did not include the teaching of aspiration among a list of eight
prioritized areas for pronunciation focus for JEFL speakers. If
aspiration was not addressed in JEFL pronunciation pedagogy,
and GFA and liquids were, this pedagogical neglect may have
contributed to the general lack of improvement observed in this
study for VOT vis-à-vis that of GFA and liquids.

Figure 2. Time 2: GFA scores and VOT values for /p/, /t/, and /k/. 1 = heavy
foreign accent, 9 = no foreign accent; not shown is rating area of 6 to 9 in which
no speaker obtained a rating.

A second reason for lack of JEFL improvement in VOT may
be phonological similarity. With stops, more than with any other
type of segmental, perception adheres less to the physical scale,
and categorical perception (insensitivity to differences within a
category) is greatest (Borden & Harris, 1980; Jusczyk, 1997). JEFL
listeners are less likely to attend to the differences between
Japanese and English stop diaphones and are much more likely
to assume that /p/, /t/, and /k/ are the same in English as they are
in Japanese. This interpretation generally supports the accounts
cited above that point to the difficulty that L1-TL phonological
similarity can pose in language learning: Flege (1987, 1995), Major
and Kim (1996), Oller and Ziahosseiny (1970), and Wode (1983).
Of these accounts, perhaps the most applicable to phonetics and
VOT is that of the Speech Learning Model of Flege (1995), of which
two component hypotheses seem relevant here. One is that “A new
phonetic category can be established for an L2 sound that differs
phonetically from the closest L1 sound if bilinguals discern at least
some of the phonetic distances between the L1 and L2 sounds.”
The second and related hypothesis is that “Category formation for
an L2 sound may be blocked by the mechanism of equivalence
classification. When this happens, a single phonetic category will
be used to process perceptually linked L1 and L2 sounds (dia-
phones). Eventually the diaphones will resemble one another in
production.” Unfortunately our study did not measure speakers’
Japanese productions or Japanese perceptions of English sounds;
it appears, however, that the JEFL speakers may have regarded
the /p/, /t/, and /k/ in the L1 and the TL as the same or equivalent
sounds (i.e., phonologically similar), producing an IL with mean
VOT values that were almost identical to the L1 Japanese values
based on Shimizu (1996).10
Flege and Eefting (1987a) speculated that perhaps “the dif-
ference between the L1 and L2 [TL] phones must, collectively,
exceed some ‘phonetic distance’ criterion before the processes
leading to establishment for them of a new category are triggered”
(p. 186). In the present study, the difference between the L1 and
TL /p/, /t/, and /k/ diaphones in terms of VOT values may have been
too small to be detected; perhaps for this reason the JEFL speaker
group failed to establish a new phonetic category with a longer
VOT value.11 With its intermediate VOT values Japanese thus
provided a test case for investigating what phonetic distance was
necessary before phonetic modification would be made in the IL.
As mentioned above, a number of studies had found that groups
of adult EFL speakers whose L1 contains short lag stops (0–25 ms,
e.g., French and Spanish) would realize English stops with VOT
values that are too short and lie between the normal L1 VOT
values and the target English values. Conversely, groups of L1
English speakers have been found to realize stops in TLs that
contain short lag stops (e.g., French or Spanish) with VOT values
that are too long and, again, lie between the L1 (English) VOT
norms and the TL (e.g., French or Spanish) VOT norms. (See Flege
& Eefting, 1987a, pp. 186–187; Flege & Eefting, 1987b, p. 68).
Unlike the above studies, however, the current study found no
change in VOT for the experimental group of L2 speakers; the VOT
values remained at the values assumed for Japanese L1 VOT, based
on Shimizu (1996).
The findings in this study of (a) no change in VOT and (b) a
GFA-VOT correlation, coupled with the finding in Riney and Flege
(1998) that 3 of these same 11 JEFL speakers improved their GFA
over time, may seem contradictory. After all, if some speakers had
GFA that improved over time, and VOT correlates with GFA, then
it follows that their VOT should also improve over time. Our
explanation for this not being the case is that the relationship
between GFA and VOT here (like the relationship between GFA
and liquids found by Riney & Flege, 1998) may vary considerably
from one speaker to the next: Improvement in GFA does not
necessarily entail improvement in all of GFA’s components (as-
suming that liquids and VOT are such components); nor does
improvement in one component necessarily entail improvement
in another. For purposes of illustrating this general point we may
consider the performances of JEFL Speakers J3 and J5. Riney and
Flege found that at T1 (and T2), Speakers J3 and J5, unlike the
other JEFL speakers, produced liquids that were as identifiable
as those produced by the 5 NE speakers. Both Speakers J3 and
J5, however, like all the JEFL speakers, had GFA ratings that
clearly marked them as “foreign-accented” and distinguished
them from all NE speakers at T1 and T2. Thus Speakers J3 and
J5 had both highly identifiable liquids and noticeably accented
speech, and this shows how GFA and liquid identifiability for some
speakers can be independent of each other. What about these same
two speakers’ VOT values? Speaker J3’s VOT values (T1: M = 75.5,
range = 51–100; T2: M = 68.7; range = 42–91) were comparable to
those of the NE speakers (see Table 4), and ranked 5th among the
16 speakers (based on a mean VOT value of T1 and T2; see Table
4) and ranked higher than 2 NE speakers. Speaker J5’s VOT
values (T1: M = 43.8, range = 15–71; T2: M = 48.2; range = 20–86),
however, were much lower and Speaker J5 ranked 13th of the 16
speakers. One sees that components of accent may pattern inde-
pendently of one another across individuals. The view one gets of
accent development, and the relation between the parts and the
whole, is that it is neither completely chaotic nor completely
systematic and structured. Speakers whose GFA improves are
more likely to have certain component accent features (such as
VOT) improve, but there are clearly exceptions.
Are some components of accent more likely to be acquired
than others, and can L2 VOT be acquired? In Table 4 one notes
that 2 JEFL speakers, J3 (T1: 75.5 ms, T2: 68.7 ms) and J4 (T1:
68.3 ms, T2: 77.0 ms), had VOT values (both means and ranges)
that were like those of the NE speakers. These 2 JEFL speakers
with high VOT values were exceptions (and their changes occurred
between T-zero and T1 and not between T1 and T2). Major (1987)
also found that some of his speakers “achieved native-like VOT
proficiency” (p. 201) and that “this aspect of second language
acquisition is within the grasp of learners” (p. 201). These findings
may constitute counterevidence to the claim by Flege and Hillen-
brand (1984) that “adult learners of a foreign language will never
succeed in producing L2 stops with complete accuracy when stops
in their native language differ substantially in VOT from those in
the L2” (p. 717; see also Flege & Eefting, 1987a, p. 187). On the
basis of Major (1987) and this study, we will make the counter-
claim that VOT is in fact one aspect of pronunciation that some
speakers age 12 or over can acquire with accuracy. How common
this ability to acquire TL VOT values is among non-native adults is
still unknown.
In this study we investigated the correlation between the
VOT of /p/, /t/, and /k/ and GFA at two times separated by an
interval of 42 months. We investigated this correlation among 11
JEFL speakers, whose L1, Japanese, has an unusual intermediate
voicing lag. There were two basic findings: One finding was that
VOT for the JEFL group did not change over time, which we
attributed to the constraints imposed by phonological similarity
and pedagogical neglect. A second finding was that VOT correlated
with GFA. Following the rather inconsistent findings of Major
(1987) and Flege and Eefting (1987a), our second finding supports
the claim by Major that in L2 pronunciation there is a basic
correlation between GFA and VOT. We are now seeking to identify
other discrete features of pronunciation that may correlate with
degree of GFA.
Revised version accepted 30 October 1998
Notes
1
We use the term “interlanguage” rather than “second language” or “L2” here
to make a distinction between the developing language of the learner (in this
case the English spoken by Japanese college students) and the target lan-
guage (in this case English spoken by native speakers of English). We would
not use the term “interlanguage,” however, with reference to mature non-na-
tive speakers of English as an additional language, such as in Singapore or
India, whose English phonological systems are much more stabilized and
influenced by local and regional norms than was the case with our students
in Japan (cf. Selinker, 1972, 1992).
2
Later, however, Flege and Eefting (1987a) described this significant corre-
lation as being “in close agreement” (p. 196) with Major (1987).
3
“VOT norm” is used here to refer to the mean value observed in a language
for a group of speakers (cf. Flege & Eefting, 1987a, p. 187).
4
Lisker and Abramson (1964) identified the range of voiceless aspirated stops
across languages as 60 to 100 ms, but found variation related to place of
articulation. In all 11 languages they examined, /k/ had longer VOT than /p/
and /t/, and the range of values for aspirated /k/ phonemes was from 80 to
126 ms. For this reason, the values for Japanese /k/ found by Shimizu (1996;
see Table 1) are considered here to be intermediate lag rather than long lag.
5
The first author is now replicating Shimizu’s study but doing it in Ja-
pan—using three groups (Japanese monolinguals, Japanese bilinguals, and
American bilinguals)—but that study is still a year from completion.
6
Riney and Flege (1998) assessed improvement in liquid identifiability and
accuracy between T1 and T2 for groups but not for individuals; also, in the
JEFL group they used only 9 of 11 speakers because 2 JEFL speakers had
nearly all their liquids correctly identified at both T1 and T2 (as did the NE
control group) and no improvement was possible. Based on the data in Table
2 in Riney and Flege (p. 226), we assessed individual improvement in liquid
identifiability. Let us explain how we did this: Suppose you have N identifi-
cation trials at both T1 and T2, and there are x correct identification trials
at T1 and y correct identification trials at T2. Further, we assume that on
each trial, the probability of correct identification is p at both T1 and T2 (this
is our null hypothesis; i.e., there is no change between T1 and T2). Then the
best estimate of p = (x + y) / (2 × N). Let b(N, p) be a binomial distribution
with N trials and the probability parameter expressed by the above equation.
If Pr(x <= b (N, p) <= y), or Pr (y <= b (N, p) <= x) depending on x and y, exceeds
.95, we can conclude that the difference between x and y is significant above
the .05 level.
We may take JEFL speaker J1 and singleton /r/ as an example. At T1, 11 of
15 of speaker J1’s singleton /r/ productions were correctly identified (11 of 15
= 0.733); at T2, 15 of 15 were correctly identified. If our null hypothesis is
true, the best estimate of p = (11 + 15)/30 = 0.866. We now compute the
probability that b(15, 0.866) is greater than or equal to 11 and smaller than or
equal to 15. If this probability is greater than .95, we conclude the difference
is significant. Pr(11 <= b(15, 0.866) <= 15) = .96, which is greater than .95.
Hence, we conclude this difference is significant.
In this fashion, the difference between T1 and T2 for each consonant and
each individual was tested. The following differences were significant at the
.05 level (* = negative improvement): /r/ singleton: J1, *J2, J4, J11; /r/ in
cluster: *J2, *J10; /l/ singleton: J9, J10; and /l/ in cluster: J1, J2, J10. In sum,
of 36 (9 speakers × 4 onsets) possible cases of improvements in liquids being
made, there were 8 cases of improvement, 4 cases of getting worse, and 24
cases of no change.
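
The procedure described in this note can be sketched in Python as follows. This is an illustration rather than the original computation; the function names are hypothetical, and the worked case is Speaker J1's singleton /r/ (11 of 15 correct at T1, 15 of 15 at T2).

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def interval_probability(x, y, n):
    """Pr(min(x, y) <= b(n, p) <= max(x, y)), with p pooled over T1 and T2."""
    p = (x + y) / (2 * n)  # best estimate of p under the no-change null hypothesis
    low, high = min(x, y), max(x, y)
    return sum(binomial_pmf(k, n, p) for k in range(low, high + 1))


def is_significant(x, y, n, criterion=0.95):
    """Decision rule from the note: significant if the interval probability exceeds .95."""
    return interval_probability(x, y, n) > criterion


prob = interval_probability(11, 15, 15)
print(round(prob, 2))              # 0.96
print(is_significant(11, 15, 15))  # True
```

The sketch uses the unrounded estimate p = 26/30 rather than 0.866, which does not change the result to two decimals.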
7
In retrospect, we wish we had collected equal numbers of tokens in identical
vocalic contexts, and collected Japanese words produced by the same set of
Japanese speakers, so that we could have compared on an individual basis
whether each speaker learned to produce longer VOT values in English than
in Japanese. The tokens here, however, are from a 42-month multitask and
multipurpose study. When we collected data at T1, we did not realize that
VOT would become one of our areas of inquiry.
8
Shimizu (1996), Homma (1980), and our study all involved nonhigh vowels
(including /a/ from the beginning of the diphthong in “time” in our study).
9
We have acknowledged that this is perhaps a risky assumption, but we had
no other Japanese VOT values to base our study on except the measurements
of Homma (1980) which, as reported above, were in line with the measure-
ments of Shimizu (1996).
10
If Japanese speakers substitute Japanese /p, t, k/ for English /p, t, k/, the
Japanese stops may be identified correctly by English listeners even if their
VOT values are somewhat off the norm. On the other hand, if Japanese
speakers substitute Japanese /r/ for both English /r/ and /l/, the distinction
will be lost and the Japanese /r/ may be mistaken for any one of several
English phonemes (Sekiyama & Tohkura, 1993), forcing the speaker to
modify liquid productions to make them identifiable as intended.
11
One new avenue for future research investigating phonological similarity
or equivalence might be the following: In Table 1 one notes that the Japanese
VOT values for bilabial /p/, alveolar /t/, and velar /k/ vary somewhat according
to place of articulation. Japanese has a /t/ with a shorter VOT than that of
Japanese /p/, which is related to another difference: /t/ involves a much
greater difference (40 ms) between Japanese and English diaphones than
do /p/ (17 ms) and /k/ (14 ms). Because /t/ involves a greater phonetic
distance than either /p/ or /k/, /t/ might be more likely to be perceived as
different and modified in production, and hence lead to the formation of a
new phonetic category of /t/ between the L1 norm and the TL norm. Given
this situation, and using a large sample of tokens in similar vowel environ-
ments, one might consider testing the following hypothesis: “Where the
phonetic distance (measured as VOT in milliseconds) is greatest between the
Japanese (L1) and English (TL) diaphones /p/, /t/, and /k/, change in the IL
VOT values in the direction of the TL English VOT values will also be
greatest.” Support for this hypothesis would be a finding that /k/ and /p/ do
not change from the L1 Japanese VOT norms, but /t/ does shift in the direction
of the TL English VOT norm. Change could occur either before T1 or between
T1 and T2. In our own study, the mean VOT of /t/ at T1 of the 11 JEFL speakers
(41.1 ms) fell between that of the NE group (80.3 ms; cf. 70 ms in Lisker &
Abramson, 1964) and that of the Japanese speakers in Shimizu (1996), which was 30
ms (cf. 25 ms in Homma, 1980). A t-test found this difference, however, to be
nonsignificant.
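
The phonetic distances cited in this note can be checked with a few lines of Python. The sketch assumes the Japanese norms reported in the text from Shimizu (1996) (41, 30, and 66 ms) and English norms of 58, 70, and 80 ms; only the 70 ms value for /t/ is stated in this section, and the /p/ and /k/ values are the ones implied by the 17 ms and 14 ms differences given above. The variable names are hypothetical.

```python
# VOT norms in ms. Japanese values are those attributed to Shimizu (1996) in the text.
japanese_norm = {"/p/": 41, "/t/": 30, "/k/": 66}
# English /t/ is stated in the text; /p/ and /k/ are assumed from the stated differences.
english_norm = {"/p/": 58, "/t/": 70, "/k/": 80}

# Phonetic distance between the L1 and TL diaphones for each stop.
distance = {stop: english_norm[stop] - japanese_norm[stop] for stop in japanese_norm}

# The hypothesis predicts the largest IL shift where the distance is greatest.
predicted_most_change = max(distance, key=distance.get)

print(distance)               # {'/p/': 17, '/t/': 40, '/k/': 14}
print(predicted_most_change)  # /t/
```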
References
Borden, G., & Harris, K. (1980). Speech science primer: Physiology, acoustics,
and perception of speech. Baltimore: Williams and Wilkins.
Cool edit. (1995). [Computer software]. Phoenix, AZ: Syntrillium Software
Corporation.
Flege, J. E. (1987). The production of new and similar phones in a foreign
language: Evidence for the effect of equivalence classification. Journal of
Phonetics, 15, 47–65.
Flege, J. E. (1995). Second language speech learning: Theory, findings, and
problems. In W. Strange (Ed.), Speech perception and linguistic experience
(pp. 233–277). Timonium, MD: York Press.
Flege, J. E., & Eefting, W. (1987a). Cross-language switching in stop consonant
perception and production by Dutch speakers of English. Speech Commu-
nication, 6, 185–202.
Flege, J. E., & Eefting, W. (1987b). Production and perception of English stops
by native Spanish speakers. Journal of Phonetics, 15, 67–83.
Flege, J. E., & Hillenbrand, J. (1984). Limits on phonetic accuracy in foreign
language production. Journal of the Acoustical Society of America, 76(3),
708–721.
Homma, Y. (1980). Voice onset time in Japanese stops. Onsei Gakkai Kaiho,
163, 7–9.
Jusczyk, P. (1997). The discovery of spoken language. Cambridge, MA: MIT
Press.
Keating, P. (1984). Phonetic and phonological representation of stop conso-
nant voicing. Language, 60, 286–319.
Klatt, D. H. (1975). Voice onset time, frication, and aspiration in word initial
consonant clusters. Journal of Speech and Hearing Research, 18, 686–706.
Lambert, W. (1967). A social psychology of bilingualism. Journal of Social
Issues, 23, 91–109.
Lisker, L., & Abramson, A. S. (1964). A cross-language study of voicing in
initial stops: Acoustical measurements. Word, 20, 384–422.
Lisker, L., & Abramson, A. S. (1967). Some effects of context on voice onset
time in English stops. Language and Speech, 10, 1–28.
Lisker, L., & Abramson, A. S. (1971). Distinctive features and laryngeal
control. Language, 47, 767–785.
Long, M. (1990). Maturational constraints on language development. Studies
in Second Language Acquisition, 12, 251–285.
Major, R. C. (1987). English voiceless stop production by speakers of Brazilian
Portuguese. Journal of Phonetics, 15, 197–202.
Major, R. C., & Kim, E. (1996). The similarity-differential hypothesis. Lan-
guage Learning, 46(3), 465–496.
Oller, J. W., & Ziahosseiny, S. M. (1970). The contrastive analysis hypothesis
and spelling errors. Language Learning, 20, 183–189.
Osgood, C. E. (1946). Meaningful similarity and interference in learning.
Journal of Experimental Psychology, 36, 277–301.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of
meaning. Chicago: University of Illinois Press.
Riney, T. J., & Anderson-Hsieh, J. (1993). Japanese pronunciation of English.
JALT Journal, 15(1), 21–36.
Riney, T. J., & Flege, J. E. (1998). Changes over time in global foreign accent
and liquid identifiability and accuracy. Studies in Second Language Ac-
quisition, 20, 213–243.
Scovel, T. (1988). A time to speak: A psycholinguistic inquiry into the critical
period for human speech. Cambridge, MA: Newbury House.
Sekiyama, K., & Tohkura, Y. (1993). Inter-language differences in the influ-
ence of visual cues in speech perception. Journal of Phonetics, 21, 427–444.
Selinker, L. (1972). Interlanguage. International Review of Applied Linguis-
tics, 10(3), 209–231.
Selinker, L. (1992). Rediscovering interlanguage. New York: Longman.
Shimizu, K. (1996). A cross-language study of voicing contrasts of stop conso-
nants in six Asian languages. Tokyo: Seibido.
Vance, T. (1987). An introduction to Japanese phonology. Albany: State Uni-
versity of New York Press.
Wode, H. (1983). Phonology in L2 acquisition. In H. Wode (Ed.), Papers on
language acquisition, language learning, and language teaching
(pp. 175–187). Heidelberg, Germany: Groos.