Data-Driven Learning: Reasonable Fears and Rational Reassurance
ABSTRACT
1. BACKGROUND
There has been continual interest in foreign language pedagogy since time
immemorial, but the last 50 years or so have seen particular creativity and
diversity as practitioners seek more efficient ways to go about it. Most
remarkable perhaps were the “designer methods” (Brown et al. 2007: 9) of
the 1970s, such as Suggestopedia, the Silent Way or Total Physical
Response. Their limited adoption world-wide is perhaps partly due to
dogmatic adherence to ideology which remains impervious to evidence or
experimentation, and insufficiently able to adapt to local cultures. Indeed,
their existence has left a certain wariness towards any claim of “revolution”
or “panacea” in the field. The most successful recent methodology globally
has undoubtedly been the very broad church of the communicative
approach (CA). While this implied a fundamental rethink of certain
underpinnings, it has remained highly eclectic, retaining or adapting many
existing tried and tested practices. This makes CA hard to pin down
(Hadley 2002), and many would be hard put to see the “communicative”
nature of many self-proclaimed teachers, materials and practices.
82 ALEX BOULTON
software merely replaces the teacher with an even more rigid guide; it
would not be out of place to replace the word teacher with the word
computer in the quotation from Willis above. The perpetual question
with new technologies is whether we are genuinely doing new things, or
merely rehashing old things in new ways (cf. Noss & Pachler 1999); or
as Higgins and Johns (1984: 10) put it: “the usual reaction from
language teachers is that [CALL materials] contain nothing which
cannot be done already with pencil and paper, and that the gains… do
not justify the expense and trouble.” Sadly, the observation remains
relevant 25 years later.
One particular use of ICT which claims to focus on learning rather
than teaching is data-driven learning (DDL), to use the expression
coined by Tim Johns. He summarises it as “the attempt to cut out the
middleman as far as possible and to give the learner direct access to the
data” (1991b: 30). The “middleman” refers of course to the teacher, but
the computer is not seen as “a surrogate teacher or tutor, but as a rather
special type of informant” (Johns 1991a: 1). DDL typically involves
exposing learners to large quantities of authentic data – the electronic
corpus – so that they can play an active role in exploring the language
and detecting patterns in it. They are at the centre of the process, taking
increased responsibility for their own learning rather than being taught
rules in a more passive mode. Although many of the basic concepts are
widespread in CA (learner-centred, discovery learning, autonomisation,
authentic language, etc.), DDL nonetheless strikes many as quite
revolutionary, and therefore to be treated with caution.
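To make the idea of "direct access to the data" concrete, the following is a minimal sketch of a keyword-in-context (KWIC) concordance, the display format most commonly associated with DDL. It is an illustration only: the function name `kwic`, the `width` parameter and the three-sentence sample text are assumptions introduced here, not part of any tool discussed in this article, and a real concordancer would work on a far larger corpus.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword.

    Hypothetical helper for illustration; `width` is the number of
    characters of context shown on each side of the keyword.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the keywords line up vertically.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


# Invented sample text standing in for a corpus.
sample = (
    "Learners depend on the teacher for rules. "
    "With a corpus they can depend on the data instead, "
    "and the results depend on the query."
)

for line in kwic(sample, "depend", width=20):
    print(line)
```

Aligning the keyword in a central column is what allows learners to scan the words immediately to its left and right and notice recurrent patterns (here, the preposition that follows the verb) for themselves.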
The aim of the present article is to demystify or demythologise
DDL, to examine a number of objections or fears that potentially
interested parties may have. Some are cited by hostile sceptics (e.g.
DATA-DRIVEN LEARNING 83
of DDL, most notably perhaps Kübler (in press), Hidalgo, Quereda and
Santana (2007), Sinclair (2004), Aston (2001), and Burnard and McEnery
(2000). There is as yet no general manual devoted to DDL (the absence in
itself highlights the recent and innovative nature of DDL, while the lack of
instant recipes reflects its responsiveness to local cultures); a number of books do however
include sections on DDL alongside other applications of corpora in
language teaching and learning not covered here (e.g. the use of learner
corpora, syllabus and materials design, etc.), recently including O’Keeffe,
McCarthy and Carter (2007), Adolphs (2006), Gavioli (2005) and
Hunston (2002). The bibliography of the present article also contains a
number of key references in the field.
2. LEARNING
a specific corpus can help learners to identify the parts of the language
which are relevant to them, to work on the forms frequently used in the
registers and text types they need.
Rule-based learning is extremely demanding – one reason perhaps
why it is so beloved in traditional educational environments as a serious
intellectual activity. Teachers find rules comforting and reassuring, easier
to present and to test, but this is a false comfort nonetheless. A large literature,
most recently in evolutionary psychology (e.g. Cosmides & Tooby 1992),
demonstrates how and why human beings have evolved to be good at
noticing regularities in nature, interpreting them and extrapolating to
other cases, the very processes which DDL brings to the fore (Scott &
Tribble 2006: 6). Learners can be surprisingly capable: they may not
always be accurate in their conclusions, but neither are rules generally
assimilated completely and accurately at first go – all learning is a
process of gradual approximation to the target (Aston 2001: 13).
Furthermore, it has frequently been observed that learners’ observations
are more accurate and complete than traditional grammar rules; at the
very least, their inferences are likely to be relevant and comprehensible to
them. After all, as Gaskell and Cobb (2004: 304) remind us, foreign
languages are mainly learned “through enormous amounts of brute
practice in mapping meanings and situations to words and structures.
These mappings… lead over a very large number of episodes… to the
slow extraction of patterns that are rarely articulated.” Such a picture of
massive exposure is virtually impossible for most students, whose main
contact with the target language is in the classroom in their L1
environment. And this is of course precisely the advantage of DDL, as it
provides opportunities for substantial amounts of targeted practice on
selected items which otherwise would only be met on occasion or through
invented and impoverished contexts.
3. LEARNERS
compatible with [DDL]” (Cook 1998: 60); but it would seem ethically
dubious to deny learners the opportunity even to try a potentially useful
set of tools and skills on the assumption that they will all adhere to the
precepts of that culture.
Very little is known about the types of learners who take most
readily to DDL or extract most benefit from it. One of the few to
venture an idea is Flowerdew (2008: 117), who notes that it:
It is important to bear in mind that learning styles are not static, but are
subject to change along with the various learning experiences.
Cresswell (2007: 279) takes this to suggest that learners who are
reticent may be won over by a gentle introduction via teacher-mediated
paper-based materials to check rules (what he calls “deductive DDL”)
rather than full-blown autonomous, hands-on “inductive DDL.”
Considerable research is needed before any definite conclusions
can be reached. This becomes particularly apparent as increasing
quantities of empirical research are starting to question the traditional
assumption that DDL is only useful for advanced, sophisticated, adult
learners. The vast majority of published research unsurprisingly
concentrates on learners of this type, as they are to be found in the
researchers’ own university environments, going right back to Johns
(1986: 161), who was working with:
But this does not preclude others: his following sentence points out that
“it remains to be seen how far the ‘research methodology’ outlined above
would be suitable for other learners.” It also depends greatly on the
activities assigned: DDL is not an all-or-nothing affair, and teachers
should not be put off if they feel their learners are not up to the hands-on
serendipitous learning reported in many papers (Mukherjee 2006: 14).
Teachers who are wary of losing too much control may find inspiration in
hal-00326990, version 2 - 19 Jun 2009
relevance “must depend on whether learners can make it real” (2000: 7).
This is not a new issue: Johns (1988: 10) argued that:
text… and the learner’s engagement with text should play a central role in
the learning process. In that engagement, a key concept is that of
authenticity, viewed from three points of view – authenticity of script,
authenticity of purpose, and authenticity of activity.
More recently, Braun (2005: 53) agrees that “real-language texts... are
only useful insofar as the learner is able to authenticate them, i.e. to
create a relationship to the texts,” but this can be achieved in several
ways. She herself suggests using multi-modal corpora; another
possibility is to use small corpora (e.g. Aston 1997), especially in ESP
contexts (Gavioli 2005), or corpora of learners’ textbooks (e.g.
4. TEACHERS
From the teacher’s point of view, if DDL has yet to make real inroads
into mainstream teaching practices and environments, the problem could
lie at any one of three stages: a) teachers might not know about DDL; b)
they might know but be unwilling or unable to put it into practice; c)
they might try it and then reject it. The major problem rests perhaps
with the very first stage: DDL has simply not yet penetrated the
consciousness of the teaching profession world-wide. For example, a
recent survey among nearly 250 high school teachers in Germany found
that approximately 80% were entirely unaware of corpus applications in
language learning (Mukherjee 2004). In Britain, questionnaires sent to
The instructor [plays] a more Socratic role, posing questions and guiding
the learning process, rather than taking an ecclesiastical approach,
5. RESOURCES
with dedicated software (Xaira), but such uses tend to favour research
rather than learning applications. As with the BoE, there is an official
website which allows a number of interesting queries[3]; more useful for
learning purposes perhaps is the interface[4] created by Davies at
Brigham Young University. Davies has also created the useful Time
corpus: the entire collection of Time magazine from 1923 to 2006,
searchable by date. A recent addition is the 360-million-word Corpus of
Contemporary American English, compiled directly from the Internet
and updated twice yearly. The disadvantage of such automatic
collection is that it tends to include more background noise than in
corpora such as the BNC, and may not be as representative of
spontaneous speech in particular.
Some of these large corpora have been marked up to help with part-
of-speech queries, and are searchable by genre or text type, comparing
for example speech and writing, or legal and journalistic English – all
highly desirable for teaching purposes. There are also a number of
specialised corpora, especially in the field of academic English; these
include the Michigan Corpus of Academic Spoken English (MICASE),
also with an on-line interface,[5] and corpora of British Academic Written
English (BAWE) and British Academic Spoken English (BASE).[6] Parallel
corpora, where texts exist alongside their translations in one or more
languages, may also prove useful. One commonly used is EuroParl,
contrasting 11 languages of the European Union; it is available for
download[7] or for on-line searches.[8]
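The kind of query that part-of-speech mark-up and genre labels make possible can be sketched in a few lines. The toy corpus below is invented for illustration, the tag names (`VERB`, `NOUN`, etc.) are simplified placeholders rather than those of any real tagset, and `words_by_tag` is a hypothetical helper; none of this reflects how the corpora or interfaces mentioned above are actually implemented.

```python
from collections import Counter

# Hypothetical toy corpus: each text carries a genre label and a list of
# (word, tag) pairs. Real corpora are vastly larger and use richer tagsets.
CORPUS = [
    {"genre": "spoken",
     "tokens": [("well", "ADV"), ("I", "PRON"), ("mean", "VERB"),
                ("it", "PRON"), ("works", "VERB")]},
    {"genre": "written",
     "tokens": [("the", "DET"), ("court", "NOUN"), ("ruled", "VERB"),
                ("the", "DET"), ("claim", "NOUN"), ("void", "ADJ")]},
    {"genre": "spoken",
     "tokens": [("you", "PRON"), ("know", "VERB"),
                ("it", "PRON"), ("works", "VERB")]},
]


def words_by_tag(corpus, genre, tag):
    """Count the words carrying `tag` in all texts labelled `genre`."""
    counts = Counter()
    for text in corpus:
        if text["genre"] != genre:
            continue
        for word, word_tag in text["tokens"]:
            if word_tag == tag:
                counts[word.lower()] += 1
    return counts


# Compare the verbs found in speech with those found in writing.
print(words_by_tag(CORPUS, "spoken", "VERB").most_common())
print(words_by_tag(CORPUS, "written", "VERB").most_common())
```

Restricting a search to one genre and one word class in this way is what lets a teacher or learner compare, for example, the verbs typical of speech with those typical of legal writing.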
British and American English unsurprisingly dominate, especially
in the public domain. Where other varieties (or indeed other languages)
are required, an alternative is to use the Internet as a corpus itself.
Search engines such as Google are not without their appeal, but are not
ideal as they are intended for content rather than form-based searches:
this limits the kind of query that can be formulated and, just as
importantly, the presentation of the results. Other tools have been
developed specifically to exploit the web as corpus. WebCorp[9] can
produce concordances as the output format, and restrict searches by
date, by textual domain, to British or American newspapers, and so on.
Rather faster is Fletcher’s WebConcordancer[10] software for direct
searches in 34 languages. He is also in the process of compiling the
very large (one billion word) Web Corpus of English from the Internet.
WebBootCat (available with SketchEngine[11] in a free 30-day trial)
allows the user to “seed” the Internet with specific search terms; it then
automatically trawls the web for documents which contain all of these
to create an “instant corpus.”
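The principle of building an "instant corpus" from seed terms can be illustrated with a short sketch. This is not WebBootCat's actual procedure: it assumes the candidate documents have already been collected into a list, treats each seed as a single word, and uses hypothetical names (`tokenize`, `instant_corpus`) and invented example documents throughout.

```python
import re


def tokenize(text):
    """Return the set of lower-cased alphabetic word forms in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def instant_corpus(documents, seeds):
    """Keep only the documents that contain every one of the seed terms.

    Assumption for illustration: `documents` is a list of plain-text
    strings already retrieved, and each seed is a single word.
    """
    wanted = {seed.lower() for seed in seeds}
    return [doc for doc in documents if wanted <= tokenize(doc)]


# Invented documents standing in for pages trawled from the web.
documents = [
    "The learner queries a corpus and studies the concordance lines.",
    "A corpus of newspapers was compiled last year.",
    "Learner autonomy matters in distance education.",
]

for doc in instant_corpus(documents, ["corpus", "learner"]):
    print(doc)
```

Requiring all seed terms at once is what keeps the resulting collection focused on a topic, at the cost of discarding documents that mention only some of them.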
The web-as-corpus approach is notoriously messy, and many prefer
home. DDL and corpora have even been used successfully in distance
education (e.g. Boulton [in press]; Collins 2000), although guidance is
essential. An in-class alternative is to have a single focal point. J. Willis
(1998) describes a series of activities using concordances on the
blackboard; an overhead projector or a slide presentation is probably
more practical in most cases (e.g. Estling Vannestål & Lindquist 2007).
The teacher may also use a single computer and projector to
demonstrate techniques and answer questions reactively (e.g. Tribble
2007). Where a small number of computers are available, students may
work in pairs or small groups; not only is the collaborative aspect
motivating for many, but pairing linguistically advanced learners with
more ICT-literate partners may prove particularly fruitful, ensuring
opportunities for each to contribute in their own way.
6. CONCLUSION
These are an investment for the future – our learners’ and our own: as
Conrad (1999: 3) puts it, “practising teachers and teachers-in-training...
owe it to their students” – and also, ultimately, to themselves.
NOTES
1. <http://eltadvantage.ed2go.com/cgi-bin/eltadvantage/oic/newcrsdes.cgi?course=3ce&name=eltadvantage&departmentnum=EL>.
2. <http://www.collins.co.uk/Corpus/CorpusSearch.aspx>.
3. <http://sara.natcorp.ox.ac.uk/lookup.html>.
4. <http://corpus.byu.edu/bnc/x.asp>.
5. <http://quod.lib.umich.edu/m/micase/>.
6. <http://www2.warwick.ac.uk/fac/soc/al/research/projects/resources/>.
7. <http://www.statmt.org/europarl/>.
8. <http://www.let.rug.nl/tiedeman/OPUS/lex.php>.
9. <http://www.webcorp.org.uk/wcadvanced.html>.
10. <http://webascorpus.org/>.
11. <http://www.sketchengine.co.uk/>.
12. <http://www.lexically.net/wordsmith>.
13. <http://www.antlab.sci.waseda.ac.jp/software.html>.
14. <http://www.lextutor.ca/>.
15. <http://www.eisu2.bham.ac.uk/johnstf/timconc.htm>.
16. <http://www.eisu2.bham.ac.uk/johnstf/timeap3.htm#revision>.
17. <http://www.corpuslab.com/>.
18. <http://www.geocities.com/tonypgnews/units_index_pilot.htm>.
19. <http://www.vxu.se/hum/utb/amnen/engelska/kig/>.
REFERENCES
Scott, M. & Tribble, C. 2006. Textual Patterns: Key Words and Corpus
Analysis in Language Education. Amsterdam: John Benjamins.
Sealey, A. & Thompson, P. 2004. What do you call the dull words? Primary
school children using corpus-based approaches to learn about language.
English in Education, 38/1, 80-91.
Seidlhofer, B. 2000. Operationalizing intertextuality: Using learner corpora for
learning. In L. Burnard & T. McEnery (Eds.), Rethinking Language
Pedagogy from a Corpus Perspective (pp. 207-223). Frankfurt: Peter Lang.
——. 2002. Pedagogy and local learner corpora: Working with learner-driven
data. In S. Granger, J. Hung & S. Petch-Tyson (Eds.), Computer Learner
Corpora, Second Language Acquisition and Foreign Language Teaching
(pp. 213-234). Amsterdam: John Benjamins.
Sinclair, J. 2003. Reading Concordances: An Introduction. Harlow: Longman.
——. (Ed.) 2004. How to Use Corpora in Language Teaching. Amsterdam:
John Benjamins.
Stevens, V. 1991. Concordance-based vocabulary exercises: A viable
alternative to gap-filling. In T. Johns & P. King (Eds.), Classroom
Concordancing. English Language Research Journal, 4, 47-61.
Stubbs, M. 2001. Texts, corpora, and problems of interpretation: A response to
Widdowson. Applied Linguistics, 22, 149-172.
Sun, Y-C. 2003. Learning process, strategies and web-based concordancers:
A case-study. British Journal of Educational Technology, 34/5, 601-613.
——. & Wang, L-Y. 2003. Concordancers in the EFL classroom: Cognitive
approaches and collocation difficulty. Computer Assisted Language
Learning, 16/1, 83-94.
Tan, M. 2003. Language corpora for language teachers. Journal of Language and
Learning, 1/2, 98-105. Available online:
<http://www.shakespeare.uk.net/journal/jllearn/1_2/tan1.html>.
Thomas, J. 2002. A Ten-Step Introduction to Concordancing through the
Collins COBUILD Corpus Concordance Sampler. Brno: Masaryk
——. & Jones, G. 1997. Concordances in the Classroom. 2nd ed. Houston:
Athelstan.
Tsui, A. 2005. ESL teachers’ questions and corpus evidence. International
Journal of Corpus Linguistics, 10/3, 335-356.
Whistle, J. 1999. Concordancing with students using an ‘off-the-web’ corpus.
ReCALL, 11/2, 74-80.
Wible, D., Chien, F., Kuo, C-H. & Wang, C. 2002. Toward automating a
personalized concordancer for data-driven learning: A lexical difficulty filter
for language learners. In B. Kettemann & G. Marko (Eds.), Teaching and
Learning by Doing Corpus Analysis (pp. 147-154). Amsterdam: Rodopi.
Widdowson, H. 2000. On the limitations of linguistics applied. Applied
Linguistics, 21/1, 3-25.
Willis, D. 2003. Rules, Patterns and Words. Cambridge: Cambridge University Press.
Willis, J. 1998. Concordances in the classroom without a computer. In
B. Tomlinson (Ed.), Materials Development in Language Teaching
(pp. 44-66). Cambridge: Cambridge University Press.
Yoon, H. & Hirvela, A. 2004. ESL student attitudes toward corpus use in L2.
Journal of Second Language Writing, 13/4, 257-283.
ALEX BOULTON
CRAPEL–ATILF/CNRS, NANCY UNIVERSITY, FRANCE.
E-MAIL: <[email protected]>