Recurrent word combinations in academic writing by native and non-native

speakers of English: A lexical bundles approach

Article in English for Specific Purposes · April 2012

DOI: 10.1016/j.esp.2011.08.004

English for Specific Purposes 31 (2012) 81–92

Recurrent word combinations in academic writing by native and non-native speakers of English: A lexical bundles approach
Annelie Ädel ⇑, Britt Erman
Department of English, Stockholm University, 10691 Stockholm, Sweden

Article info

Article history:
Available online 1 October 2011

Keywords:
Corpus-based research
Academic writing
Native and non-native speakers of English
Formulaic language
Lexical bundles

Abstract

In order for discourse to be considered idiomatic, it needs to exhibit features like fluency and pragmatically appropriate
language use. Advances in corpus linguistics make it possible to examine idiomaticity from the perspective of recurrent word
combinations. One approach to capture such word combinations is by the automatic retrieval of lexical bundles. We investigated
the use of English-language lexical bundles in advanced learner writing by L1 speakers of Swedish and in comparable
native-speaker writing, all produced by undergraduate university students in the discipline of linguistics. The material was culled
from a new corpus of university student writing, the Stockholm University Student English Corpus (SUSEC), amounting to over
one million words. The investigation involved a quantitative analysis of the use of four-word lexical bundles and a qualitative
analysis of the functions they serve. The results show that the native speakers have a larger number of types of lexical bundles,
which are also more varied, such as unattended ‘this’ bundles, existential ‘there’ bundles, and hedging bundles. Other lexical
bundles which were found to be more common and more varied in the native-speaker data involved negations. The findings are
shown to be largely similar to those of the phraseological research tradition in SLA.
© 2011 Elsevier Ltd. All rights reserved.

1. Formulaic language

It is notoriously difficult to achieve idiomaticity, that is, the knowledge of conventionalized combinations of words, in
academic discourse, whether from the perspective of the beginning university student coming to grips with formal academic
style or from the perspective of the advanced doctoral student being socialised into a specific discipline. One way in which
idiomaticity is realised is through the successful use of recurrent word combinations that are typical of a specific academic
register and discipline. Recurrent word combinations not only contribute to idiomaticity but also help to demonstrate
membership in a specific discourse community: “when we speak, we select particular turns of phrase that we perceive
to be associated with certain values, styles and groups” (Wray, 2006, p. 593). In the case of a student who is an
academic novice, this means learning to use specific word combinations as a ‘badge of identity’ (Wray, 2006). For any student
who is also a language learner in this context, yet another dimension of difficulty is naturally added.
Combinations of words that fulfill specific functions and that are called up more or less automatically by native speakers
have come to be known by the term ‘formulaic language’ (Schmitt & Carter, 2004). Research in second language acquisition
(SLA) shows that native speakers rely more on formulaic language, especially collocations, than non-native users. Furthermore,
the degree of proficiency correlates significantly with the proportion and/or types of formulaic language used
(Forsberg, 2008; Lewis, 2009; Wiktorsson, 2003). For example, formulaic markers of vagueness (e.g., sort of, kind of) are
underrepresented in non-native compared to native speech (de Cock, 2004). For a subgroup within the formulaic family,
collocations, there are mixed results, which can partly be explained by different studies applying different methodologies.
Although the majority of studies of collocations within SLA have found that non-native speakers underuse collocations
compared to native speakers in writing (Bolly, 2009; Erman, 2009; Granger, 1998; Howarth, 1998), some have shown that
non-native speakers use the same quantity of collocations as native speakers but overuse high-frequency collocations, which
makes type/token measures differ significantly between native and non-native writers (Durrant & Schmitt, 2009). Furthermore,
it has been shown that non-natives have poorer intuitions about (a)typical collocations, and take 30% longer to make
judgements regarding collocational frequencies (Siyanova & Schmitt, 2008). Collocations are an elusive group and constitute
an area within formulaic language that “many teachers can identify as a problem though cannot describe”, and that learners
“are only dimly aware of” (Howarth, 1998, pp. 161–162). Although non-native academic writers can handle demanding
grammatical structures, they may fail to use the appropriate verb with a specific noun, which may hamper their
communication (Howarth, 1998, p. 161).

⇑ Corresponding author. Tel.: +46 8 163613; fax: +46 8 159667.

Phenomena that fall under the rubric of formulaic language are not only difficult to acquire by language learners or novice
academics, but they are also somewhat difficult to identify and measure in naturally-occurring discourse. However, the last
couple of decades have seen an increasing use of corpora to explore large quantities of spoken and written language in search
of patterns (e.g., Hunston & Francis, 2000). Sinclair (1987) was among the first to demonstrate through corpus data that
words co-occur in specific patterns carrying specific meanings. This insight led him to formulate his ‘idiom principle’ to
the effect that words are co-selected rather than selected on an item-by-item basis.
The basic assumption of the present study is that the notion of idiomaticity must be expanded to encompass any native-
like selection of expression (cf. Pawley & Syder, 1983), be it a ‘phraseological unit’ or a ‘lexical bundle’. As the terms suggest,
these are approached and analyzed using different methodologies. Phraseological units coincide with traditional grammat-
ical units (e.g., verb + noun collocations, such as make a contribution), typically extracted from part-of-speech-tagged cor-
pora. A phraseological unit is composed of at least two words and is identified, usually manually, through at least one of
its members being used in a specialized, restricted sense, precluding the substitutability of a synonymous word. Collocations
are thus characterized by the restricted combinability of members. SLA-oriented phraseological research has used relatively
small corpora of native and non-native material (Granger, 1998; Howarth, 1998; Nesselhauf, 2003). Lexical bundles are ex-
tracted automatically from raw data—typically using sizable corpora—disregarding any pre-defined linguistic categories. De-
spite the frequency with which they occur, lexical bundles are ‘‘not idiomatic in meaning and not perceptually salient’’ (Biber
& Barbieri, 2007, p. 269). They typically do not coincide with traditional grammatical units, but instead represent clause or
phrase fragments, such as it is possible to or at the beginning of. In fact, Biber, Johansson, Leech, Conrad, and Finegan (1999)
report that less than 5% of the lexical bundles in academic prose represent complete structural units.

1.1. Lexical bundles

The analytical framework adopted for this study is captured in the notion of lexical bundles: multi-word sequences that
recur frequently and are distributed widely across different texts (Biber, 2010, p. 170). Specific, albeit varying, cut-off points
are used for the two criteria of frequency and dispersion. The frequency cut-off point used to identify lexical bundles is
“somewhat arbitrary”, but it tends to be called “conservative” or “relatively high” if set at 40 times per million words for
four-word bundles (Biber & Barbieri, 2007, p. 267). Especially in the case of spoken data, it may be useful to set the bar high,
since spoken data generally have been shown to rely more extensively on lexical bundles (Biber, Conrad, & Cortes, 2004;
Biber et al., 1999). Studies of written corpus data have used cut-offs of 25 times per million words (e.g., Chen & Baker,
2010) and 20 times per million words (e.g., Cortes, 2004). The dispersion criterion is also arbitrary, leading to varying
practices in the literature. A criterion of three to five texts is often used for four-word bundles (e.g., Biber & Barbieri, 2007; Biber
et al., 2004; Chen & Baker, 2010; Cortes, 2004), but percentages are also sometimes used (Hyland, 2008b). In order to avoid
overly context-dependent expressions, ‘content bundles’ are also sometimes excluded, for example, if they are present in an
essay question or if they incorporate proper nouns (e.g., Chen & Baker, 2010).
The framework is appealing in that it is essentially data-driven and that the retrieval can be fully automatized, if we dis-
regard some manual manipulation involving content-based bundles and overlaps. Note that lexical bundles refer to contig-
uous word relations only; the method does not capture syntagmatic relations that are variable in terms of the position of the
individual words included, such as cases in which modification occurs (to a /very/ large extent) or various types of word order
variation (a quantitative study of the two genitive constructions; this part of the study was mainly quantitative). This has been
pointed out as one of the main drawbacks of the framework (Durrant, 2009).
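
The two retrieval criteria described above can be illustrated with a minimal sketch. It is not the procedure of any particular study or tool: the function name extract_bundles, the whitespace tokenisation and the default thresholds are assumptions made purely for illustration. The sketch counts contiguous n-word sequences only, so it shares the limitation just noted regarding variable or discontinuous sequences.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, per_million=25, min_texts=3):
    """Return {bundle: (raw_frequency, number_of_texts)} for contiguous
    n-word sequences that meet both a normalised frequency cut-off
    (occurrences per million words) and a dispersion cut-off (texts)."""
    freq = Counter()             # raw frequency of each n-gram
    spread = defaultdict(set)    # ids of the texts in which each n-gram occurs
    total_words = 0
    for text_id, text in enumerate(texts):
        # Naive lower-casing and whitespace tokenisation (an assumption;
        # a real study would rely on the corpus tool's own tokeniser).
        tokens = text.lower().split()
        total_words += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            spread[gram].add(text_id)
    # Convert the per-million cut-off into a raw count for this corpus size.
    min_raw = per_million * total_words / 1_000_000
    return {
        " ".join(gram): (count, len(spread[gram]))
        for gram, count in freq.items()
        if count >= min_raw and len(spread[gram]) >= min_texts
    }


# Toy usage with three invented one-sentence "texts".
toy_texts = [
    "it is possible to see",
    "it is possible to go",
    "it is possible to do",
]
print(extract_bundles(toy_texts))
```

In the toy example only the sequence occurring in all three texts survives the dispersion criterion; sequences confined to a single text are dropped regardless of frequency.
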
Despite the fact that lexical bundles do not represent complete structural units, they are still seen as ‘‘important building
blocks in discourse’’ (Biber & Barbieri, 2007, p. 270) which serve important functions—not least in academic discourse, where
lexical bundles have been found to be pervasive (Biber & Barbieri, 2007). Starting with Biber et al. (2004), different attempts
have been made to pin down the functions of lexical bundles, typically resulting in three primary ones: expressions of stance
(e.g., I don’t know what the voltage is here), discourse organisers (e.g., What I want to do is quickly run through the exercise. . .)
and referential expressions (e.g., . . .students must define and constantly refine the nature of the problem. . .) (Biber & Barbieri,
2007, p. 270ff). The distribution of these functions has been shown to vary on the basis of register: for example, similar
to face-to-face conversation, classroom teaching draws heavily on stance bundles, but, like academic writing, it also draws
heavily on referential bundles (Biber et al., 2004).
Much of the original interest in lexical bundles stemmed from a concern with register, but recent work also considers
learner/expert production. Since its introduction in the Longman Grammar (Biber et al., 1999), the framework has been used
in a range of studies, including comparison of the characteristics of different registers, such as textbooks versus classroom
discourse (Biber et al., 2004; see also Biber & Barbieri, 2007), or the behaviour of different populations, such as native versus
non-native speakers (e.g., Chen & Baker, 2010; see also de Cock, 2000 and Römer, 2009 for similar approaches) and student
versus expert writers (e.g., Cortes, 2004; Hyland, 2008a). In addition, there is a small number of studies of lexical bundles
from the perspective of academic disciplines (e.g., Cortes, 2004; Hyland, 2008b), but it is as yet unclear to what extent
disciplinary variation plays a role in lexical bundles (cf. Hyland, 2008b).

1.2. Overview of the present study

If we compare the lexical bundle approach (represented by Chen & Baker, 2010) to the phraseological approach, we will
see that the fundamental findings concerning the use of formulaic language in native and non-native speaker populations
converge. Recurrent word combinations are more frequent overall in native than non-native production. Not only do native
speakers have a broader repertoire of types, but they also tend to display greater variety in form. Furthermore, certain groups
of recurrent word combinations are typically found to be underused by non-native speakers (e.g., conventionalized
adverb + adjective combinations, such as acutely aware/painfully clear, reported in Granger, 1998, p. 150 and bundles, such
as in the context of, reported in Chen & Baker, 2010, p. 30), while others are found to be overused (I/We framed constructions,
e.g., I claim that/we could say that, reported in Granger, 1998, pp. 154–155 and bundles, such as all over the world, reported in
Chen & Baker, 2010, p. 30).
The aim of the present study is to investigate the use of English-language lexical bundles in advanced learner writing by
L1 speakers of Swedish and in comparable native-speaker writing, all produced by undergraduate students of linguistics.
Comparing the two groups, we carry out a quantitative analysis of the use of lexical bundles and a qualitative analysis of
the functions served by lexical bundles. Furthermore, we relate our findings to similar data by comparing our results to those
of Chen and Baker (2010), which are based on writing from many different disciplines by native-speaker students and non-
native students with L1 Chinese. On the basis of previous research, it is hypothesized that the non-native students will pro-
duce fewer bundles overall (cf. Erman, 2009; Howarth, 1998), and less varied ones (e.g., Granger, 1998; Lewis, 2009). We
believe this to be the first systematic study of lexical bundles used by undergraduate EFL students in a European setting.
Previous studies exist for ESL contexts (Chen & Baker, 2010) and for EFL settings in Asia at master’s and doctoral levels
(Hyland, 2008a,b). Römer (2009) includes graduate EFL students who are L1 speakers of German, but considers only the
20 most frequent bundles (regardless of dispersion) and does not filter out content-related bundles (such as Quirk et al.,
1985 and the University of Michigan).

2. Material and method

The material used for the study is from a new corpus of essays by university students of linguistics, both native and non-
native speakers of English. The method follows that of Chen and Baker (2010), which is based on the pioneering work of Biber
and colleagues.

2.1. The corpus material

The material is from the Stockholm University Student English Corpus (SUSEC) and includes 325 essays, amounting to
over one million words. The writing represented in the corpus is by students of linguistics at different levels who are
non-native speakers (L1 Swedish) and native speakers of (British) English. The non-native material consists of writing by
students of English linguistics from the first to the fourth term of study in the Department of English at Stockholm University,
while the native-speaker material for comparison consists of writing by students of linguistics from the second and third
year of study at the Department of Linguistics at King’s College in London. Table 1 gives an overview of the data used for
the learner subcorpus, while Table 2 details the native-speaker subcorpus.

Table 1
Material for the non-native-speaker subcorpus.

Student level   Texts   Words     Average text length (words)
First term      83      81,873    986
Second term     81      171,014   2,111
Third term      62      417,772   6,738
Fourth term     17      192,548   11,326
Totals          243     863,207   3,552

Table 2
Material for the native-speaker subcorpus.

Student level   Texts   Words     Average text length (words)
Second year     29      80,750    2,784
Third year      53      166,685   3,145
Totals          82      247,435   3,018

The two subcorpora are largely comparable, although there are important differences between them, one of which concerns
the year of study of the two student groups. The learner material spans 2 years of study, from the first term through to
the fourth (and final) term of undergraduate study. The native-speaker material does not include any first-year data, but only
data from the second and third years of study. However, in both groups, around 70% of the material is produced by students
in the final stages of the program. The two subcorpora also differ with respect to revision and the number of students
represented. The third- and fourth-term essays of the non-native-speaker data have been revised by the student writers
themselves and, to some extent, by their supervisors, while the native-speaker essays have not undergone any revision.
The number of students represented is considerably larger in the non-native material, with almost all of the 243 essays having
different authors. Many of the native speakers, by contrast, have two or more of their texts represented (although written
for different courses), so the total number of individual students is just over 30, which is a drawback from the perspective of
dispersion. Although the native-speaker subcorpus is considerably smaller, it still offers a relatively large collection,
especially considering that it represents writing from one single discipline (topics covering phonetics, psycholinguistics,
semantics, sociolinguistics and discourse analysis). Further differences between the subcorpora have to do with text
length (varying from approximately 1,000 to 11,000 words) and also, to some extent, with differences in task, the native-speaker
material involving discussions of previous research to a far greater extent than reports of students’ own small-scale
research projects. See Section 3 for further discussion of the possible effects these differences could have on the use of lexical
bundles.

2.2. Analytical steps

Only four-word lexical bundles were considered for the study, in order to make the analysis more manageable and
comparable to that of Chen and Baker. The four-word scope is “the most researched length for writing studies, probably because
the number of 4-word bundles is often within a manageable size (around 100) for manual categorization and concordance
checks” (Chen & Baker, 2010, p. 32). We concede, however, that including lexical bundles of other lengths would
have made for a richer study. A word of caution is also called for in this context, in light of a recent study by
Simpson-Vlach and Ellis (2010, p. 509), which found that many important recurrent word combinations are actually
three-word bundles. That said, we can make the point (with Cortes, 2004, p. 401) that three-word bundles are often subsumed in
four-word bundles (e.g., as a result of contains as a result).
In order to retrieve the four-word lexical bundles, the WordList cluster function of WordSmith (Scott, 2007) was used. The
cut-off frequency was set at 25 times per million words¹; since the two subcorpora differ in size, the cut-off point was equivalent
to a raw frequency of 22 in the non-native material and 6 in the native material. Following Chen and Baker (2010), the dispersion
criterion was that a word combination had to occur in at least three texts in the native-speaker data in order to be included.
However, since the non-native material is considerably larger, the dispersion criterion was set at nine texts in an attempt to
make the results as comparable as possible. The purpose of the dispersion criterion is to guard against idiosyncrasies introduced
by individual writers (cf. Biber et al., 2004).
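
The conversion from a normalised cut-off to raw frequencies can be checked with a short sketch, using the subcorpus sizes from Tables 1 and 2. The function names are illustrative only. Note also that the text does not state how the figure of nine texts was derived; scaling the three-text criterion by the ratio of the number of texts (243 to 82) is merely one assumption that happens to reproduce it.

```python
def raw_cutoff(per_million, corpus_words):
    """Raw frequency equivalent to a per-million-words cut-off."""
    return round(per_million * corpus_words / 1_000_000)


def scaled_dispersion(base_min_texts, base_corpus_texts, other_corpus_texts):
    """Scale a dispersion criterion by the ratio of text counts.
    This scaling is an assumption, not a documented step of the study."""
    return round(base_min_texts * other_corpus_texts / base_corpus_texts)


NON_NATIVE_WORDS, NON_NATIVE_TEXTS = 863_207, 243  # Table 1 totals
NATIVE_WORDS, NATIVE_TEXTS = 247_435, 82           # Table 2 totals

print(raw_cutoff(25, NON_NATIVE_WORDS))   # raw cut-off, non-native subcorpus
print(raw_cutoff(25, NATIVE_WORDS))       # raw cut-off, native subcorpus
print(scaled_dispersion(3, NATIVE_TEXTS, NON_NATIVE_TEXTS))
# Normalising a raw frequency of 6 back to a per-million rate:
print(round(6 * 1_000_000 / NATIVE_WORDS))
```

The last line shows why rounding matters: a raw frequency of 6 in the smaller subcorpus normalises to slightly under 25 per million words.
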
The retrieved bundles were checked manually for context-dependent content bundles and overlapping bundles, following
Chen and Baker (2010). We excluded content bundles involving proper nouns (the New York Times, at King’s College London),
terms that were directly related to the topic (e.g., between men and women used in essays on the topic of language and gen-
der, and of the English language used in essays on specific linguistic phenomena in English) and, thus, terms and expressions
specific to the discipline (e.g., as a second language, in the present tense), but we included terms and expressions used in re-
search in general (e.g., the majority of the informants, the age of #). The omission of proper nouns resulted in three exclusions
from each list. The omission of topic- and discipline-specific terms resulted in nine exclusions from the non-native list and
28 from the native list. The reason for excluding topic-specific bundles is to guard against idiosyncrasies introduced by those
topics that happen to be represented in the material. Furthermore, the presence of content bundles specific to linguistics
would have affected the comparison of our results to the multidisciplinary material of Chen and Baker (2010). Nevertheless,
there were in fact not many discipline-specific terms to exclude, presumably because a range of subfields of linguistics were
represented and possibly also because specialist terms are often shorter than four words, or occur as acronyms. The proce-
dure for dealing with overlapping bundles involved merging examples such as due to the fact and to the fact that into due to
the fact + that, the justification being to guard against inflated numbers (cf. Chen & Baker, 2010, p. 33). These were checked
manually by concordancing. Table S1 represents extensions of a four-word bundle by a plus sign, which is put in parenthesis
if the four-word bundle does not co-occur with it in all cases.
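
The merging of overlapping bundles can be sketched as follows. This is only a candidate-finding step under simplifying assumptions: the function name merge_overlaps is hypothetical, and the sketch merges any two bundles that overlap in all but one word, whereas in the study the overlaps were verified manually by concordancing.

```python
def merge_overlaps(bundles):
    """Merge pairs of n-word bundles that overlap in n-1 words, e.g.
    'due to the fact' and 'to the fact that' -> 'due to the fact + that'.
    Only pairwise overlaps are handled; longer chains would need
    further treatment and manual checking."""
    split = [b.split() for b in bundles]
    absorbed = set()   # indices of bundles folded into a merged entry
    merged = []
    for i, left in enumerate(split):
        for j, right in enumerate(split):
            # The tail of the left bundle equals the head of the right one.
            if i != j and left[1:] == right[:-1]:
                merged.append(" ".join(left) + " + " + right[-1])
                absorbed.update((i, j))
    kept = [b for k, b in enumerate(bundles) if k not in absorbed]
    return kept + merged


print(merge_overlaps(["due to the fact", "to the fact that", "as a result of"]))
```

Counting the merged pair as one type rather than two is what guards against inflated numbers.
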

1 Note that, because of rounding used in the normalisation, many native bundles have a frequency of 24.

3. Results

The overall quantitative comparison between our two groups is presented first, followed by a discussion of those bundles
that are shared and not shared between the native and non-native speakers. The comparison to Chen and Baker’s results is
presented thereafter, followed by the functional classification of the lexical bundles found.

3.1. Overall comparison

Table S1 (see Supplementary material) provides a list of the lexical bundles, divided into those that are found in only one
group and those that are shared by both groups.² Note that the fact that a given lexical bundle made it onto the non-native
list does not necessarily mean that it was not used at all by the native group (or vice versa); it simply means that the frequency
and dispersion criteria were not met in the other group’s material. In the second column of the table, the first number refers to
the non-native group and the second number to the native group. The hash symbol (#) represents any number represented by
digits in the subcorpora (thus representing a minor solution to the variability problem); the actual frequencies of some of these
bundles would have been higher if numerals written out in full had also been included.
The native-speaker writing shows a considerably wider range of lexical bundles than the learner writing, with a total of
130, as compared to 60. This general pattern verifies our hypothesis.
The frequency differences across subcorpora were tested for statistical significance, using the log-likelihood statistic.³
Applying statistical tests goes against the tradition in the literature on lexical bundles, which is characterized primarily by
simple descriptive statistics. Some, however, such as Simpson-Vlach and Ellis (2010, p. 492), have argued that statistics such as
log-likelihood are “useful for comparing the relative frequency of words or phrases” across corpora. The bundles in Table S1 are
marked with a symbol if they occur in the list for only one subcorpus and if the difference in frequency between the two subcorpora
(not shown in the table) does not reach statistical significance. This symbol is also used below when a bundle is discussed for
which statistical significance was not found. The dispersion cut-offs have been taken into account in that only when a given
bundle meets the dispersion criterion has it been tested for statistical significance. The tests show that as many as 70% of
the lexical bundles occurring in only one list (43 types in the non-native data and 89 in the native data) do so with a significance
level of p < 0.01. While 70% is a large proportion, it is still the case that 30% of the bundle types do not reach statistical
significance, despite the initial frequency and dispersion cut-offs.⁴ While our study does not depart from the established procedure
for selecting bundles based on simple descriptive statistics (we merely mark those that are not significant), these results suggest
that future research should consider augmenting the procedures used for bundle selection with more sophisticated inferential
statistics.
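
For readers unfamiliar with the log-likelihood statistic, the following is a minimal sketch of the standard two-corpus calculation as commonly used in corpus linguistics. It is not the code of the online calculator used in this study, and the frequencies in the usage example are hypothetical rather than taken from Table S1. The threshold of 6.63 is the chi-square critical value for p < 0.01 with one degree of freedom.

```python
import math


def log_likelihood(freq1, size1, freq2, size2):
    """Log-likelihood (G2) for one item observed freq1 times in a corpus of
    size1 words and freq2 times in a corpus of size2 words."""
    total = freq1 + freq2
    # Expected frequencies if the item were equally likely in both corpora.
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


CRITICAL_P01 = 6.63  # chi-square critical value, 1 d.f., p < 0.01

# Hypothetical bundle: 22 occurrences in the non-native subcorpus and
# none in the native subcorpus (sizes from Tables 1 and 2).
ll = log_likelihood(22, 863_207, 0, 247_435)
print(ll > CRITICAL_P01)
```

Because the statistic takes corpus size into account, the same raw frequency can be significant in one comparison and not in another, which is why a bundle can pass the frequency cut-off in only one subcorpus without the difference being significant.
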

3.1.1. Bundles shared

A total of 22% of the bundles are shared by the two groups. It is difficult to say whether this constitutes a large or a small
proportion, without comparison to other data. If the Chen and Baker (2010) results are presented this way, we find that 16%
of the bundles are shared between the native and non-native students. Considering that the bundles of our study are based
on texts written in one single academic discipline, we would expect a larger proportion of overlap than for Chen and Baker,
whose material is spread across disciplines. Additional factors may also play a role (cf. Section 3.2).
Even if a given lexical bundle occurs in both materials, it does not necessarily mean that it is used equally frequently by
the two groups. Indeed, in the present data set we find major frequency differences between the groups for a number of
bundles. If we consider shared bundles that are used more than twice as often by one group compared to the other, we find a
total of seven types, three of which are used more frequently in the non-native-speaker data and four of which are used more
frequently in the native-speaker data. In the non-native types, we find as well as in, which is likely seen by the non-native
writers as a stylistic marker, adding to the formality of their texts in ways in which a simple and could not. This may be
influenced by the fact that L1 Swedish has a coordinator used in formal registers only (samt). The bundle the aim of this is
metadiscursive, referring mostly to the topic of the current text. The overuse of metadiscourse in learner data has been reported
elsewhere (Ädel, 2006). Recall that a normalised dispersion value has been applied in order to offset the imbalance in the
number of essays in the two subcorpora. The greater use of the results from the is genre-related in that the two groups used
somewhat different methodologies, the general pattern being that the non-natives’ essays more often involve analysis of
empirical data, while the natives’ essays involve discussion of published writing. In the bundles that are more frequent in
the native data, we find as a result of, which is used with a resultative meaning, and not with the intent to report on results
(unlike the results from the). This strikes us as a useful bundle for the learners to adopt to a greater extent. The bundle at the
beginning of is also used more frequently in the native-speaker data; in fact, the only preposition used by the native writers in
this context is at, whereas in also occurs in the non-native group. The bundle to look at the appears to be a genre-related way
of introducing the topic, considering that the native speakers’ essays mostly involve expository discussion. This suggests that
the non-native speakers’ overuse of this type of metadiscourse (such as the aim of this) is less strong, but also that they use
different wordings from the native speakers. Finally, it is unclear why the bundle can be used to is used more extensively by
the native speakers. Testing the statistical significance of these shared bundles, we find that, with the exception of the results
from the and can be used to, each is used more often by one of the groups at a significance level of p < 0.01.

2 Here we follow Erman and Lewis’ study of multiword structures in native and non-native speech (2011 – see my notes in the reference list).
3 The frequencies of all of the bundle types in either list were checked against the frequencies in the other subcorpus, using Paul Rayson’s online calculator
(http://ucrel.lancs.ac.uk/llwizard.html).
4 We also tested the frequency differences of the shared bundles for statistical significance (suggested by Stefan Gries, p.c.). This analysis showed that most of
the shared bundles (87%) were not used differently by the two groups, but that 13% of the shared bundles were significantly overused by either group at the
level of p < 0.01.

3.1.2. Bundles not shared

If we turn next to those lexical bundles that are not shared between the two groups, what differences emerge? Three gen-
eral patterns emerge with respect to the general vocabulary in the lexical bundles used by the non-native group only. One
pattern points to potential register difficulties, as evidenced by items such as ‘find out’, ‘easy’ and ‘hard’. A second pattern
reflects a concern with empirical data and results, as in the occurrence of ‘table’ (five different bundles), ‘percent’, ‘total’,
‘majority’ and ‘range’. The third pattern appears to reveal a greater awareness of the text as a text, as in ‘present study’ (five
different bundles) and ‘essay’ (two bundles). Some lexical bundles, such as the metadiscursive in this essay, enjoy a special
status in the individual texts in that they are likely to occur no more than once—if at all—per text. A topic is typically only
introduced once, which means that the number of texts per subcorpus becomes an important consideration. Since the learner
material consists of 243 texts and the native-speaker material of 82, we should expect more occurrences of metadiscursive
bundles, such as in this essay, in the former. In fact, this ought to be adjusted for through a ‘normalised’ dispersion value, or
the totals will be skewed precisely because of the imbalance between the total number of essays represented. Varying text
lengths could also be an issue in that, if one set of texts is considerably longer, it could boost references to other parts of the
text. We have not come across any discussions of this in the literature.
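One simple way to picture such an adjustment is to express dispersion as the share of texts in each subcorpus that contain the bundle, rather than as a raw text count. The sketch below is only an illustration: the subcorpus sizes (243 and 82 texts) come from the material described above, whereas the function name and the per-bundle text counts are hypothetical.

```python
def normalised_dispersion(texts_with_bundle, total_texts):
    """Share of texts in a subcorpus that contain the bundle at least once."""
    if total_texts <= 0:
        raise ValueError("total_texts must be positive")
    return texts_with_bundle / total_texts


LEARNER_TEXTS = 243  # number of learner texts, as reported above
NATIVE_TEXTS = 82    # number of native-speaker texts, as reported above

# Hypothetical counts: the bundle occurs in 60 learner texts and 20 native texts.
learner = normalised_dispersion(60, LEARNER_TEXTS)
native = normalised_dispersion(20, NATIVE_TEXTS)
print(f"learner: {learner:.1%}, native: {native:.1%}")
```

In this invented case the raw text counts differ by a factor of three, while the normalised values are nearly identical, which is the kind of skew the adjustment is meant to remove.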
The native-speaker group, having produced a greater range of lexical bundles, offers a more complex picture. The general
pattern is that the features that appear in the lists identify the native-speaker writers as more mature academic writers—for
example, the greater use of unattended ‘this’, existential ‘there’, hedges and passives. Although frequently used by both
groups, one feature that specifically marks the non-native writers as less mature academic writers is the greater use of antic-
ipatory ‘it’ constructions, coupled with relatively informal lexical choices, involving ‘hard’ and ‘easy’, for these constructions.
These patterns will be discussed next.
One striking difference between the groups is the large number (nine strings) of patterns in the native list involving unat-
tended ‘this’—that is, where there is no summarising head noun (as in ‘this essay’). These bundles all constitute explanations
or explications, expressed with varying degrees of speaker certainty:

(1) this can be seen

this is. . . seen in/because the/due to/supported by/the case
this may be because
this shows that the
this would suggest that

Previous research has shown that unattended ‘this’ is not a rare phenomenon in published academic writing, but rather is
even more common than attended ‘this’ in some disciplines (Swales, 2005, p. 10). We can note that attended ‘this’ bundles
are rare in our native-speaker list (in this essay I), but common in our non-native-speaker list, predominantly with the meta-
discursive head nouns ‘study’ and ‘essay’. There are no bundles including unattended ‘this’ in the non-native list.
Like unattended ‘this’, existential ‘there’ constructions display a striking pattern in that a large group of such bundles oc-
cur in the native list, while none appear in the non-native list. As many as seven types are included:

(2) there appears to be

there are. . . a number + (of) /many different
there has been a
there is. . . a lot/also a /evidence of

Some of these serve as ‘‘a springboard in developing the text’’ (Biber et al., 1999, p. 952), by, for example, introducing a series
of elements:

(3) However, there are a number of limitations to the data that may have affected my analysis of them and so my
account of Amaljeet’s language use. Firstly, it does not seem necessary to... [KC_2_024]

It should be noted that the shared list also includes four ‘there’ bundles (there is a difference; there seems to be; that there is a;
that there is no), so the overall picture is not that the non-native writers do not use these at all, but that the native writers
draw on such structures to a much greater extent, and with more variation.
Another way in which the two groups differ is in the use of hedges. While there are some shared types of hedging,
expressed by ‘seem’ (seems to be a; there seems to be) or ‘can’ (e.g., as can be seen), the native writers appear to use more
hedges than the non-native writers. Indeed, the native list includes over 20 hedges, while the non-native list has only four
(two involving ‘can’ and two involving ‘seem’).5 In addition to ‘can’ and ‘seem’, the native writers frequently use expressions
such as to a certain extent as well as hedges involving ‘may’, ‘appear’ and ‘could’. Thus, the native speakers not only use a
larger number of recurrent hedges, but also draw on a wider variety of lexical resources for expressing uncertainty and doubt.
This is in line with previous research; for example, Chen and Baker (2010, p. 43) found that their non-native group did not
demonstrate control of hedging ‘‘as diversely and robustly as native writers do’’. This appears to be the case also in spoken
discourse, where vagueness tags (e.g., or something, sort of, kind of) are underused by learners (de Cock, Granger, Leech, &
McEnery, 1998).
Five passive bundles appear in the shared list, showing that the students have adopted, to some extent, this highly char-
acteristic feature of academic writing (cf. Biber et al., 1999). Despite the existence of shared passive strings, however, the
distribution of passive constructions is another area of discrepancy between the groups, where, again, the native writers
make greater use of these complex structures than the non-native writers (25 versus 8 types).6 The verbs found in the native
list include see (nine bundles), refer to (five bundles), find (two bundles), say, note, attribute to, relate to, define, support, suggest,
assume and use. The list of verbs found in the non-native list is shorter, including only four of the verbs above, plus show (shown
in table #).
‘It’-clauses, especially those followed by an extraposed ‘to’-clause and, to a lesser extent, those followed by an extraposed
‘that’-clause, have been found to be unusually frequent in academic writing as opposed to other registers (Biber et al., 1999).
The fact that anticipatory ‘it’ structures are seen as useful devices by our student writers is evident from the list of shared
bundles, which contains no fewer than six of these structures:

(4) it is [difficult/important/interesting/necessary/possible] to
it would be interesting + to

It has been suggested that anticipatory ‘it’ patterns are sometimes exploited by learners and student writers to ‘‘state
propositions more forcefully than is appropriate’’ (Hewings & Hewings, 2002, p. 381), although there is no clear evidence
for this in the current data. However, the lists indicate that the learners make use of these structures to a greater extent,
in that it is easy to, it is hard to and it is clear that are unique to the non-native list. We have already noted the potentially
inappropriate use of ‘easy’ and ‘hard’, which are often avoided in formal registers. Nevertheless, we suggest that it is a useful
strategy for L1 Swedish students to use this structure, simply for the reason that it helps the writer to project a detached
writing persona; impersonalized evaluative statements have been found to be underused by learner writers (e.g., Milton,
1998).

3.2. Comparison to Chen and Baker (2010)

There are a few ways in which our material is different from that of Chen and Baker (2010): our corpus is discipline-spe-
cific rather than covering many disciplines; it is larger in size by approximately 70% than the BAWE subcorpora used by Chen
and Baker; the L1 of our non-native group is Swedish, not Chinese and our corpus material has been marked up for quoted
material, such that any bundles captured definitely represent the students’ own writing (cf. Ädel, 2010). Table S2 (see Sup-
plementary material) provides an alphabetically sorted list of the lexical bundles found by Chen and Baker (2010), which we
have divided into those that are found in either group and those that are shared by both groups.
Comparing our results (Table S1) to Chen and Baker’s, we find that the overall pattern is the same, but that there is greater
discrepancy within our data set. Chen and Baker (2010, p. 44) come to the conclusion that ‘‘the use of lexical bundles in
non-native and native student essays is surprisingly similar’’. They find 54 bundles unique to the non-native student list (compared
to 60 in our data), 78 bundles unique to the native student list (compared to 130), and 26 bundles that are shared between
the two groups (compared to 55). Thus, the range in their data is from 54 to 78, but from 60 to 130 in our data. Expressed in
proportions, the difference in the number of types is found in the non-native-speaker groups (34% in Chen & Baker versus
24% in SUSEC) and in the shared bundles (16% in Chen & Baker versus 22% in SUSEC), but not in the native-speaker groups,
which produce half of the types in both corpora (49% in Chen & Baker versus 53% in SUSEC).
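The proportions above follow directly from the type counts reported for the two studies. As a minimal sketch (the function name and dictionary labels are illustrative only), they can be recomputed as follows:

```python
def type_shares(non_native_only, native_only, shared):
    """Return each category's share of all bundle types, as a rounded percentage."""
    counts = {
        "non-native only": non_native_only,
        "native only": native_only,
        "shared": shared,
    }
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Type counts as reported in the text above.
chen_baker = type_shares(non_native_only=54, native_only=78, shared=26)
susec = type_shares(non_native_only=60, native_only=130, shared=55)

print("Chen & Baker:", chen_baker)
print("SUSEC:", susec)
```

Because each percentage is rounded separately, the three values for a study need not sum to exactly 100.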
This raises the question of why the difference is greater between our student groups. There are several possibilities.
One is that our non-native material represents a context in which English is a foreign language, while Chen and Baker’s
non-native material represents a context in which English is a second language, with the student writers being interna-
tional students in an English-speaking country. On the basis of EFL–ESL, we would expect greater differences within our
groups. Another possible source of discrepancy is in the method itself, as ‘‘larger corpora will generate fewer recurrent
word combinations with the same cut-off normalized frequency, when compared with smaller corpora, because large
corpora will elicit higher converted raw frequencies’’ (Chen & Baker, 2010, p. 43; see also Biber & Barbieri, 2007, p.
269fn). This could have resulted in lower frequencies in the non-native subcorpus, since it is larger than the
native-speaker subcorpus by two thirds, thus increasing the difference between our two groups. Yet another possible factor
is that only 30 different writers are represented in the native-speaker material. According to the principle that ‘the fewer
speakers represented, the greater the likelihood of uniform behaviour’, this could have boosted the number of bundles.

5 The majority of these occur with statistical significance in the native data, while three of four in the non-native list do not.
6 The majority of these occur with statistical significance in the native data, while three of eight in the non-native list do not.

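The corpus-size effect described in the quotation can be made concrete by converting a normalised cut-off into the raw count a bundle must reach in corpora of different sizes. The sketch below is an illustration only; the cut-off value, the corpus sizes and the function name are hypothetical and are not the figures used in either study.

```python
import math


def raw_threshold(cutoff_per_million, corpus_words):
    """Smallest raw count that meets a cut-off expressed per million words."""
    return math.ceil(cutoff_per_million * corpus_words / 1_000_000)


CUTOFF = 25  # hypothetical cut-off, in occurrences per million words

small = raw_threshold(CUTOFF, 100_000)  # hypothetical smaller corpus
large = raw_threshold(CUTOFF, 200_000)  # hypothetical larger corpus
print(small, large)
```

With the same normalised cut-off, a string has to recur more often in absolute terms to qualify in the larger corpus, so fewer strings are likely to pass.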
There are also factors likely to work toward closing the gap between the subcorpora. One is the fact that the L1 of our non-
native students (Swedish) is closely related to English, while the L1 of the Chen and Baker group (Chinese) is not. Another
factor is discipline-specificity; since our material is restricted to one single discipline, unlike Chen and Baker’s multidisciplin-
ary material, we would expect stronger convergence within our groups. That said, it seems that different disciplines may
vary considerably in the overall number of lexical bundles yielded; the results for research articles in history and biology
given in Cortes (2004, pp. 404, 407) show that history only has half as many (54) as biology (109). We eagerly await further
research into the distribution of bundles to account for the various factors that may have an effect on studies of this type.
Finally, we can note that the differences would have been greater had we not excluded content bundles (which was also
done by Chen and Baker).
Our results clearly converge with Chen and Baker’s, demonstrating that non-native speakers produce fewer and less
varied lexical bundles than native speakers. Even in the case of advanced learners, this has been a robust finding in previous
research in SLA. There are two studies of lexical bundles, however, in which it could be said that the opposite
pattern is found: the non-native groups in Römer (2009) and Hyland (2008a) produce a larger number of bundles than
the native groups. The results in the first case could have to do with study design. For example, the cut-off points are
highly restricted (to a top 20 list), which is likely to favour the learners who have a more restricted repertoire but tend
to use their favourite bundles unusually often (the use of ‘lexical teddy bears’ by learners has been noted in the liter-
ature, referring to the reliance on that which is familiar, by ‘‘choosing words and phrases closely resembling their first
language or those learnt early or widely used’’ (Hasselgren, 1994, p. 237)). The results in the second case showed dif-
ferences between three different registers produced by different populations: research articles by native-speaking profes-
sionals and Master’s and PhD theses by L1 Cantonese speakers that had been awarded high passes. The fact that the
research articles had the smallest number of lexical bundles, however, is explained by reference to register—nativeness
is not considered relevant.

3.2.1. Bundles shared and not shared

If we compare the bundles that are shared by our groups to those that are shared by Chen and Baker’s groups of native
and non-native student writers, we find as many as 15 shared bundles:

(5) at the same time

as a result of
as well as the
can be used to
in order to make
in the [case/form] of
[is/was] one of the
it is [difficult/important/necessary] to
on the other hand
one of the most
the rest of the
to be able to

A comparison of these two lists of shared bundles in student writing to Chen and Baker’s list of bundles in published writ-
ing from the FLOB corpus reveals that as many as 14 of the 15 shared bundles are also found in professional academic writ-
ing, the one exception being as well as the. These widely shared core bundles must be highly useful, hence their abundance.
Corpus-based investigations of native and non-native speaker production often reveal patterns of over- and underuse.
Chen and Baker (2010) found that the non-native student writers overused (over-)generalizing expressions such as all over
the world, which were rarely used by native-speaker academics. No similar expressions were found in our data. We do find
some patterns of underuse by our non-native students, one of which involves negations with not and no, which are smaller in
number in the non-native groups, as illustrated in Table 3.
This can be explained by the fact that such negated structures are rather complex, and thus likely to be learned later.
Another complex structure which is more common in the native groups is ‘fact’-headed bundles. The Swedish group shares
six of these with the native speakers (due to the fact + that, is the fact that, [to/of] the fact that, the fact that [the/they]), who also
have an additional three (by the fact that ; despite the fact that ; in the fact that). The Chinese group only has one (to the fact
that), while the native-speaker group has four ((due) + to the fact that, is the fact that, the fact that [the/they]). Both the
negation and the ‘fact’ patterns are also confirmed by Chen and Baker’s data from published writing.

Table 3
Bundles involving negation from the non-native and native lists.

               Non-native bundles      Native bundles

Our study      does not seem to        but not in the
               that it is not          it is not possible + (to)
               that there is no        it would not be
                                       may or may not
                                       not be able to
                                       not seem to be
                                       that it is not
                                       that there is no
               [3]                     [8]

Chen & Baker   is not only a           because it is not
               last but not least      is by no means
                                       not be able to
                                       there is no evidence
                                       there would be no
               [2]                     [5]

3.3. Functional classification

In an attempt to examine the functions served by the lexical bundles, Chen and Baker’s classification, based on Biber et al.
(2004), was applied. The purpose was to investigate the extent to which the natives and the non-natives use lexical bundles
for different functions.
Biber et al. (2004, p. 384) describe the three main categories in the functional classification of lexical bundles as follows:
Referential bundles make direct reference to physical or abstract entities, or to the textual context itself, either to identify the
entity or to single out some particular attribute of the entity. Stance bundles express attitudes or assessments of certainty that
frame some other proposition. Discourse organisers reflect relationships between prior and coming discourse. A category of
interactional bundles is sometimes also used if spoken data is considered (e.g., Biber et al., 1999). Within these main catego-
ries, which reflect the Hallidayan metafunctions of language, finer distinctions are made, involving three to four subcatego-
ries for each category. The subcategories used by Chen and Baker (2010, pp. 37–38) are shown in Fig. 1.
We have several reservations about this classification, despite the fact that it is presented as unproblematic in Chen and
Baker (2010). Here, the classification is introduced as fully established, even though it was called ‘‘preliminary’’ when first
presented in Biber et al. (2004, p. 383). The main problem is that no clear criteria are given for how to decide which (sub)cat-
egory a given bundle should belong to. While some of the subcategories are somewhat well-defined by previous research or
are intuitively clear (e.g., topic introduction, quantifying), others are vague (e.g., identification/focusing, framing). In fact, this
vagueness has led to several inconsistencies in previous research. For example, Focusing is labelled Discourse organising in
Chen and Baker, but Referential in Biber et al. (2004) and Simpson-Vlach and Ellis (2010). Framing is labelled Referential in
Chen and Baker and Biber et al. (2004), but Discourse organising in Cortes (2004). An additional problem is the multifunc-
tionality of many lexical bundles. When classifying a given type, as emphasized in Biber et al. (2004), it is therefore necessary
to consider the extended context to determine what the predominant function is.

Fig. 1. The subcategories of the functional classification.

Fig. 2. The distribution of discourse functions (types).

Despite considerable difficulty, we were able to improve the initial 66% agreement rate through extensive discussion,
checks of bundle classifications in the literature, and the contextual analysis of concordance lines, and to finalise the
classification with almost 100% agreement, albeit with some 10% of the labels marked with question marks. It is not clear, however, to
what extent our understanding of some of the categories matches that of other researchers. The results are shown in Fig. 2.
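An agreement rate of the kind mentioned above can be read as simple percent agreement between two coders, that is, the share of bundles given the same functional label by both. The sketch below assumes that reading; the function name, the labels and the six example classifications are hypothetical and do not reproduce the actual coding.

```python
def percent_agreement(labels_a, labels_b):
    """Share of items to which two coders assigned the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both coders must label the same items")
    if not labels_a:
        raise ValueError("at least one item is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical labels for six bundles from two coders.
coder_1 = ["referential", "stance", "discourse", "referential", "stance", "referential"]
coder_2 = ["referential", "stance", "referential", "referential", "discourse", "referential"]

rate = percent_agreement(coder_1, coder_2)
print(f"{rate:.0%}")
```

Percent agreement does not correct for agreement expected by chance, which is one reason to treat such figures with some caution.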
What we find are rather similar proportions for referential expressions in the two groups, but a greater proportion of
stance bundles and a smaller proportion of discourse organisers among the native speakers. This confirms a pattern already
spotted: the native speakers’ greater reliance on, and greater variation in, stance bundles. One possible explanation for the
differences between referential expressions and discourse organisers is found in the highly frequent PP-bundles, especially
the in-bundles. These are similar in both native-speaker student groups, notably with abstract nouns as complements (e.g.,
purpose, variety, attempt). In contrast, non-native in-bundles in our material have concrete nouns as complements (essay,
study, table), adding to the high numbers for discourse organisers in this group.
Given our reservations about the classification, we have not tested the differences for statistical significance. This is con-
trary to Chen and Baker (2010), where the statistical comparison is strongly emphasized. Our perspective, however, is that
clearer definitions and a more generally agreed-upon classification are needed before inferential statistics could really make
a contribution.

4. Conclusion

The results presented in this study confirm a general pattern found by research in both the phraseological tradition (e.g.,
Erman, 2009; Howarth, 1998) and the lexical bundles tradition (Chen & Baker, 2010), which is that non-native speakers ex-
hibit a more restricted repertoire of recurrent word combinations than native speakers. This was found to be the case despite
the fact that our specific learner group is highly advanced, studying English linguistics at university level in a country where
the general proficiency level of English is relatively high. The range of the number of types in our data was found to be con-
siderable, with 60 bundles unique to the non-native speakers, 55 bundles shared between both groups, and as many as 130
bundles unique to the native speakers. The hypothesis that the non-native student writers would produce not only fewer
types overall, but also less varied ones, was also verified. For example, more varied means of expression among the native
speakers were found in unattended ‘this’ constructions, existential ‘there’ constructions, hedges, and passive constructions.
Other complex structures which were found to be more common and more varied in the native-speaker data were negated
patterns (e.g., not be able to, there is no evidence) and ‘fact’-headed bundles (e.g., the fact that they).
Several issues in design and methodology emerged which ought to be considered in future work. One issue is to do with
the relationship between descriptive and inferential statistics in the consideration of lexical bundles. In testing the frequency
difference between bundles in the native and non-native lists for statistical significance, which is rarely (if ever) done in the
literature, we found that 70% of the bundles in each list occurred with statistical significance. While currently not applied in
the definition and/or selection of lexical bundles, some measurement of statistical significance should perhaps be considered
in future work. Another issue concerns corpus comparability. Even though the corpus material used was largely comparable,
certain aspects of the context of writing appeared to have had an effect on the use of lexical bundles. For example, a larger
number of bundles referring to results was evident in the non-native speaker data, which we took to be due to a difference in
genre or task, made manifest in the non-native students typically reporting on an empirical study and the native speakers
typically discussing different approaches to linguistic issues. This can be seen as a confirmation of previous research showing
that there is not just ‘‘one single pool of lexical bundles’’ that speakers or writers draw on, but that ‘‘each register employs a
distinct set of lexical bundles, associated with the typical communicative purposes of that register’’ (Biber & Barbieri, 2007,
p. 265).
It needs to be stressed that only four-word bundles have been considered in this study; had three- and two-word bundles
also been covered, a vastly larger number of recurrent patterns would have been retrieved—and a fuller picture of the use of
formulaic language among our populations could have been given. By way of evaluating the types of structures that were and
were not captured by the four-word scope, we can point out that a large number of PPs and NPs (involving many abstract
nouns) were retrieved. This is in line with previous research showing that lexical bundles in academic writing are predom-
inantly phrasal rather than clausal: Biber (2010, p. 172) calculates that ‘‘70% of the common bundles in academic prose con-
sist of a noun phrase with an embedded prepositional phrase fragment (e.g., the nature of the) or a sequence that bridges
across two prepositional phrases (e.g., as a result of).’’ What the four-word scope did not capture, however, was adjective
phrases (e.g., statistically significant, most important), adverbial phrases (e.g., not necessarily, more specifically) and verb
phrases other than passives (e.g., draw attention to).
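To make the notion of a four-word scope concrete, the sketch below retrieves recurrent four-word strings from a small set of texts and keeps those that meet a frequency threshold and a dispersion threshold. It is a simplified illustration and not the procedure used in the study: the whitespace tokenisation, the raw (non-normalised) thresholds, the function name and the sample sentences are all assumptions.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return n-word strings that recur at least min_freq times in min_texts texts."""
    freq = Counter()
    text_ids = defaultdict(set)
    for text_id, text in enumerate(texts):
        tokens = text.lower().split()  # naive whitespace tokenisation
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            text_ids[gram].add(text_id)  # record which texts contain the string
    return {
        gram: count
        for gram, count in freq.items()
        if count >= min_freq and len(text_ids[gram]) >= min_texts
    }


# Hypothetical sample texts.
sample = [
    "on the other hand the results differ",
    "the results on the other hand are clear",
    "it is hard to say",
]
bundles = extract_bundles(sample)
print(bundles)
```

Calling the same function with n=3 or n=2 would retrieve the shorter strings that fall outside the four-word scope, such as adjective and adverbial phrases.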
We compared our results to Chen and Baker’s (2010) specifically to be able to relate our results to similar data and better
evaluate our numbers. It would be a great step forward if future research were to develop clear criteria for comparing fre-
quencies across different studies, taking potential confounding factors into account. Ensuring the feasibility of comparison is
also important with respect to the functional classification of bundles. In fact, we found the functional classification to be the
most challenging part of the study, primarily because the various (sub)categories need to be better defined and more gen-
erally agreed upon in the literature.
If we were to single out only one future direction in which to continue this line of research, we would choose to focus on
qualitative analyses of context. By way of illustration, consider the expression in terms of. While in our material it is drawn on
equally frequently by the two groups, contextual analysis reveals rather different usage patterns. For example, the native
speakers’ uses co-occur with items such as define, vary/variable and similar, while the non-native speakers’ uses co-occur
with items such as differ/difference/different, describe and view. The non-natives appear to have identified the expression
as a marker of formal writing (and possibly as a useful information-structuring device), but they are not using it in a fully
target-like manner. A qualitative perspective would be likely to prove especially fruitful when comparing native and non-
native populations. It could prove quite revealing to examine those lexical bundles that are more or less equally frequent
in both groups, because even though a given lexical bundle is drawn on to the same extent, it does not necessarily follow
that it is used in the same way by native and non-native speakers.

Acknowledgements

We are grateful to two anonymous reviewers for their valuable comments on the manuscript. We also gratefully
acknowledge support from The Bank of Sweden Tercentenary Foundation for the compilation of SUSEC.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.esp.2011.08.004.

References

Ädel, A. (2006). Metadiscourse in L1 and L2 English. Amsterdam: John Benjamins.

Ädel, A. (2010). Using corpora to teach academic writing: Challenges for the direct approach. In M. C. Campoy-Cubillo, B. Belles-Fortuño, & L. Gea-Valor
(Eds.), Corpus-based approaches to ELT (pp. 39–55). London: Continuum.
Biber, D. (2010). Corpus-based and corpus-driven analyses of language variation and use. In B. Heine & H. Narrog (Eds.), The Oxford handbook of linguistic
analysis (pp. 159–191). Oxford: Oxford University Press.
Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for Specific Purposes, 26, 263–286.
Biber, D., Conrad, S., & Cortes, V. (2004). ‘If you look at. . .’: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25, 371–405.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. London: Longman.
Bolly, C. (2009). The acquisition of phraseological units by advanced learners of French as an L2: High frequency verbs and learner corpora. In E. Labeau & F.
Myles (Eds.), Revisiting advanced learner varieties: The case of French. Oxford: Peter Lang.
Chen, Y.-H., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language Learning and Technology, 14(2), 30–49.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 23(4),
397–423.
de Cock, S. (2000). Repetitive phrasal chunkiness and advanced EFL speech and writing. In C. Mair & M. Hundt (Eds.), Corpus linguistics and linguistic theory
(pp. 51–68). Amsterdam: Rodopi.
de Cock, S. (2004). Preferred sequences of words in NS and NNS speech. BELL – Belgian Journal of English Language and Literature, 225–246.
de Cock, S., Granger, S., Leech, G., & McEnery, T. (1998). An automated approach to the phrasicon of EFL learners. In S. Granger (Ed.), Learner English on
computer (pp. 67–79). London: Longman.
Durrant, P. (2009). Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes, 28, 157–169.
Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? IRAL, 47, 157–177.
Erman, B. (2009). Formulaic language from a learner perspective: What the learner needs to know. In B. Corrigan, H. Quali, E. Moravcsik, & K. Wheatley
(Eds.), Formulaic language (pp. 27–50). Amsterdam: John Benjamins.
Erman, B., & Lewis, M. (2011). Multiword structures in the speech of non-native and native speakers of English. Paper presented at EUROSLA 21, 21st annual
conference of the European second language association, 8–10 September.
Forsberg, F. (2008). Le langage préfabriqué: Formes, fonctions et frequences en français parlé L2 et L1. Bern, Switzerland: Peter Lang.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and
applications (pp. 145–160). Oxford: Clarendon Press.
Hasselgren, A. (1994). Lexical teddy bears and advanced learners: A study into the ways Norwegian students cope with English vocabulary. International
Journal of Applied Linguistics, 4, 237–260.
Hewings, M., & Hewings, A. (2002). ‘It is interesting to note that’: A comparative study of anticipatory ‘it’ in student and published writing. English for Specific
Purposes, 21(4), 367–383.
Howarth, P. (1998). Phraseology and second language proficiency. Applied Linguistics, 19(1), 24–44.
Hunston, S., & Francis, G. (2000). Pattern grammar: A corpus-driven approach to the lexical grammar of English. Amsterdam: John Benjamins.
Hyland, K. (2008a). Academic clusters: Text patterning in published and postgraduate writing. International Journal of Applied Linguistics, 18(1), 41–62.
Hyland, K. (2008b). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27, 4–21.
Lewis, M. (2009). The idiom principle in L2 English: Assessing elusive formulaic sequences as indicators of idiomaticity, fluency, and proficiency. Saarbrücken,
Germany: VDM Verlag.
Milton, J. (1998). Exploiting L1 and interlanguage corpora in the design of an electronic language learning and production environment. In S. Granger (Ed.),
Learner English on computer (pp. 186–198). London: Longman.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24(2), 223–242.
Pawley, A., & Syder, F. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. Richards & R. Schmidt (Eds.), Language and
communication (pp. 191–226). London: Longman.
Römer, U. (2009). English in academia: Does nativeness matter? Anglistik: International Journal of English Studies, 20(2), 89–100.
Schmitt, N., & Carter, R. (2004). Formulaic sequences in action: An introduction. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use.
Amsterdam: John Benjamins.
Scott, M. (2007). WordSmith tools (Version 4.0) [Computer software]. Oxford: Oxford University Press.
Simpson-Vlach, R., & Ellis, N. C. (2010). An academic formulas list (AFL). Applied Linguistics, 31, 487–512.
Sinclair, J. McH. (1987). The nature of the evidence. In J. McH. Sinclair (Ed.), Looking up: An account of the COBUILD project in lexical computing (pp. 150–159).
London: Collins.
Siyanova, A., & Schmitt, N. (2008). L2 learner production and processing of collocation: A multi-study perspective. The Canadian Modern Language Review,
64(3), 429–458.
Swales, J. (2005). Attended and unattended ‘‘this’’ in academic writing: A long and unfinished story. ESP Malaysia, 11, 1–15.
Wiktorsson, M. (2003). Learning idiomaticity: A corpus-based study of idiomatic expressions in learners’ written production. Lund Studies in English (Vol. 105).
Lund, Sweden: Lund University.
Wray, A. (2006). Formulaic language. In K. Brown (Ed.), Encyclopedia of language and linguistics (pp. 590–597). Oxford: Elsevier.

Annelie Ädel’s main research areas are discourse analysis, corpus linguistics and English for Academic Purposes. She has been affiliated with the University
of Michigan’s English Language Institute as a post-doctoral fellow and as Director of Applied Corpus Linguistics. She is currently a research fellow at
Stockholm University, Sweden.

Britt Erman’s main research fields are Conversation Analysis, Discourse Analysis, Grammaticalization and the Mental Lexicon. She is currently involved in a
large-scale project at the Centre of Bilingualism, Stockholm University, with a focus on ‘multiword expressions’ in L1 and L2 English, aimed at establishing
ultimate attainment in highly advanced and immersed L2 users.

