Corpora of different kinds can be used for different purposes in Translation
Studies. For example, parallel corpora are useful in exploring how an idea in
one language is conveyed in another language, thus providing indirect evidence
to the study of translation processes. Corpora of this kind are indispensable for
building statistical or example-based machine translation (EBMT) systems, and for
the development of bilingual lexicons and translation memories. Also, parallel
concordancing is a useful tool for translators.
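
As a rough illustration of what parallel concordancing involves, the following minimal Python sketch searches the source side of a sentence-aligned corpus and returns each hit together with its aligned translation. The function name `parallel_concordance` and the three English-French sentence pairs are hypothetical, invented for this example; real parallel concordancers also handle alignment, lemmatization and context display, none of which is attempted here.

```python
import re
from typing import List, Tuple

def parallel_concordance(pairs: List[Tuple[str, str]], query: str) -> List[Tuple[str, str]]:
    """Return every aligned (source, target) pair whose source contains the query word."""
    # Match the query as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [(source, target) for source, target in pairs if pattern.search(source)]

# Hypothetical sentence-aligned data (assumption: alignment was done beforehand).
aligned = [
    ("The contract was signed yesterday.", "Le contrat a été signé hier."),
    ("She signed the letter.", "Elle a signé la lettre."),
    ("The weather is fine.", "Il fait beau."),
]

hits = parallel_concordance(aligned, "signed")
for source, target in hits:
    print(source, "|", target)
```

Reading the two columns side by side lets a translator see how one source expression has been rendered in different contexts.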
Comparable corpora are useful in improving the translator's understanding of
the subject field and improving the quality of translation in terms of fluency,
correct choice of term and idiomatic expressions in the chosen field. They can also
be used to build terminology banks.
Translational corpora provide primary evidence in product-oriented Translation
Studies (see Section 14.3.2.1), and in studies of translation universals (see
Section 14.3.3). If corpora of this kind are encoded with sociolinguistic and
cultural parameters, they can also be used to study the socio-cultural environment
of translations (see Section 14.3.2.3).
Even monolingual corpora of source language and target language are of great
value in Translation Studies because they can raise the translator's linguistic and
cultural awareness in general and provide a useful and effective reference tool for translators
and trainees. They can also be used in combination with a parallel
corpus to form a so-called translation evaluation corpus that helps translator trainers
or critics to evaluate translations more effectively and objectively.
This section explores the state of the art of corpus-based Translation Studies on
the Holmes-Toury map, that is, applied TS, descriptive TS and theoretical TS. On the applied TS
front, three major contributions of corpora include corpus-assisted
translating, corpus-aided translation teaching and training, and development
of translation tools. An increasing number of studies have demonstrated
the value of corpora, corpus linguistic techniques and tools in assisting translation
production, translator training and translation evaluation. For example,
Bernardini (1997) suggests that 'large corpora concordancing' (LCC) can help
students to develop 'awareness', 'reflectiveness' and 'resourcefulness', which are
said to be the skills that distinguish a translator from unskilled amateurs.
Bowker (1998: 631) observes that 'corpus-assisted translations are of a higher quality
with respect to subject field understanding, correct term choice and idiomatic
expressions'. Zanettin (1998) shows that corpora help trainee translators become
aware of general patterns and preferred ways of expressing things in the target
language, get better comprehension of source language texts and improve production
skills; Aston (1999) demonstrates how the use of corpora can enable translators
to produce more native-like interpretations and strategies in source and
target texts respectively; according to Bowker (2001), an evaluation corpus, which
is composed of a parallel corpus and comparable corpora of source and target
languages, can help translator trainers to evaluate student translations and
provide more objective feedback; Bernardini (2002b), Hansen and Teich (2002)
and Tagnin (2002) show that the use of a multilingual concordancer in conjunction
with parallel corpora can help students with 'a range of translation-related
tasks, such as identifying more appropriate target language equivalents and collocations,
identifying the norms, stylistic preferences and discourse structures associated
with different text types, and uncovering important conceptual information'
(Bowker and Barlow 2004: 74); Bernardini and Zanettin (2004: 650) suggest that
corpora be used in order to 'provide a framework within which textual and linguistic
features of translation can be evaluated'. Finally, Vintar (2007) reports efforts
to build Slovene corpora for translator training and practice.
Corpora, and especially aligned parallel corpora, are essential for the development
of translation technology such as machine translation (MT) systems and
computer-aided translation (CAT) tools. An MT system is designed to translate
with no or minimal human intervention. MT systems have become more
reliable since the methodological shift in the 1990s from rule-based to text-based
algorithms which are enhanced by statistical models trained using corpus data.
Parallel corpora can be said to play an essential role in developing example-based
and statistical MT systems. Well-known MT systems include Systran,
Babelfish, World Lingo and Google Translation. MT systems like these are
mainly used in translation of domain-specific and controlled language, automated
'gisting' of online contents, translation of corporate communications, and locating
text or fragments requiring human translation. CAT tools are designed to
assist in human translation. There are three major types of CAT tools. The most
important type are translation memory and terminology management tools which
can be used to create, manage and access translation memories (TMs) and termbases.
They can also suggest translation candidates intelligently in the process
of translation. A second type are localization tools, which are able to distinguish
program codes or tags from the texts to be translated (e.g. menus, buttons, error
messages etc.), or even better, turn program codes or tags into what a program or
webpage really looks like. Another type of tool is used in audiovisual translation
(e.g. subtitling, dubbing and voice-over). Major CAT products include
SDL Trados, Deja Vu, Transit, and Wordfast for TM and terminology tools, Catalyst
for software localization, Trados TagEditor for webpage translation, and WinCap
for subtitling. CAT tools have brought translation into the industrial age, but they
are useless unless translated units and terminologies have been stored in translation
memories and termbases. This is where corpora come into the picture.
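
To make concrete how a translation memory can suggest candidates from stored units, here is a minimal Python sketch of fuzzy TM lookup. It is only an illustration under stated assumptions: the function `tm_lookup`, the 0.7 threshold and the sample segments are hypothetical, and similarity is computed with the standard library's `difflib.SequenceMatcher` rather than with the matching algorithm of any commercial CAT tool named above.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

def tm_lookup(memory: List[Tuple[str, str]], segment: str,
              threshold: float = 0.7) -> List[Tuple[float, str, str]]:
    """Return (score, stored source, stored target) for units similar to `segment`."""
    scored = []
    for source, target in memory:
        # Character-based similarity in [0, 1]; 1.0 is an exact match.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:  # threshold value is an assumption for this sketch
            scored.append((score, source, target))
    # Best match first.
    return sorted(scored, key=lambda item: item[0], reverse=True)

# Hypothetical translation memory harvested from a parallel corpus.
tm = [
    ("Click the Save button.", "Cliquez sur le bouton Enregistrer."),
    ("Click the Cancel button.", "Cliquez sur le bouton Annuler."),
    ("The file could not be found.", "Le fichier est introuvable."),
]

candidates = tm_lookup(tm, "Click the Save button now.")
for score, source, target in candidates:
    print(f"{score:.2f}  {source}  ->  {target}")
```

An empty memory returns no candidates at all, which is the sense in which such tools depend on previously stored translation units.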
of translation per se. It answers the question of 'why a translator translates in this
way' instead of dealing with the problem of 'how to translate' (Holmes 1972/1988)
harmony with DTS. Baker (1993: 243) predicted that the availability of large corpora
of both source and translated texts, together with the development of the
Corpus-based DTS has revealed its full vitality over the past decade, which will
be reviewed in this section in terms of its three foci: translation as a product, translation
as a process and the function of translation (Holmes 1972/1988).
texts in the target language, especially translated and native English. The majority
which was built by Mona Baker and colleagues at the University of Manchester.
The TEC corpus, which was designed specifically for the purposes of studying
full texts from four genres (fiction, biography, newspaper articles and in-flight
information about translators, source texts and publishing dates is annotated and
stored in the header section of each text. A subcorpus of original English was
specifically selected and is being modified from the BNC to match the TEC in
Presently, the TEC is perhaps the only publicly available corpus of translational
have been based on this corpus, which have so far focused on syntactic and lexical
features of translated and original texts of English. They have provided evidence
to native English (as represented by the BNC), finding that translational language
has four core patterns of lexical use: a relatively lower proportion of lexical words
(2000) comparison of concordances from the TEC and the BNC shows that the
that-connective with reporting verbs say and tell is far more frequent in t
English. These results provide strong evidence for syntactic explicitation in trans
information used to fill in knowledge gaps between source text and target text
quite, rather, pretty and fairly in translated versus native English fiction in an attempt
to uncover the relation between collocation and moderation, finding that pretty
and rather, and more marginally quite, are considerably less frequent in the TEC-fiction
subcorpus, but when they are used, there is usually more variation in usage,
tional language. For example, Kanter et al. (2006) identify new universals characterizing
the mutual overlaps between native English and translated English on the
basis of Zipf's Law (Zipf 1949). Øverås (1998) explores the relationship between
how a collocational clash in the source text is translated using a conventional combination
in the target language. Kenny (2001) studies the relationship between
Italian texts and translated articles from a geopolitics journal, concluding that:
guage. Yet the two differ in what they tend to repeat: translations show a ten
the more usual lexicalized collocations in the language. (Baroni and Bernardini
2003: 379)
ual styles of translators. One such corpus is the Hong Lou Meng Parallel Corpus,
classic Chinese novel Hong Lou Meng 'A Dream of Red Chamber'.
Process-oriented DTS aims at revealing the thought processes that take place in
the mind of the translator while she or he is translating. While it is difficult to study
those processes on-line, one possible way for corpus-based DTS to proceed is to
Think-Aloud Protocols (or TAPs, see Bernardini 2002c). However, the process
between corpus linguistics and geology, both assuming a relation between process
and product. A geologist is interested in geological processes, which are not directly
(2001a: 154) argues, 'By and large, the processes are invisible, and must be inferred
from the products.' The same can be said of translation: Translation as a product
studies can be approached on the basis of corpus data. Process-oriented studies are
typically based on parallel corpora by comparing source and target texts while
by comparing translated target language and native target language. For example,
draft versions of translation have allowed him not only to reject Toury's (1995)
land China, plus a comparable component of native Chinese texts as the reference
Chen compares translational and native Chinese texts to find out whether connectives
are significantly more common in the first type of texts in terms of parameters
such as frequency and type-token ratio, as well as statistically defined common con
examines whether syntactic patterning in the translated texts is different from native
texts via a case study of the five TDCs that are most statistically significant and
English source texts, through a study of the same five TDCs, in an attempt to determine
the extent to which connectives in translated Chinese texts are carried over
from the English source texts, or in other words, the extent to which connectives
are explicitated in translational Chinese. Both parts of his study support the hypo
cultural context of the target language, thus leading to the 'study of contexts rather
studies that are corpus-based, possibly because the marriage between corpora and
this type of research, just like corpus-based discourse analysis (e.g. Baker 2006), is
One such study is Laviosa (2000), which is concerned with the lexicogrammatical
analysis of five semantically related words (i.e. Europe, European, European Union,
Union and EU) in the TEC corpus. These words are frequently used in translated
newspaper articles and can be considered as what Stubbs (1996, 2001b) calls cul
translational English, Laviosa (2000) suggests that it is possible to carry out comparative
analyses between Europe and other lemmas of cultural keywords such as
Britain and British, France and French, and Italy and Italian, and so on, which may
translated texts.
the TEC corpus, three aspects of linguistic patterning in the works of two British
literary translators, that is, average sentence length, type/token ratio, and indirect
speech with the typical reporting verbs such as say. The results indicate that the
two translators differ in terms of their choices of source texts and intended readership
for the translated works. One translator is found to prefer works targeting a
highly educated readership with an elaborate narrative which creates a world of intellectually
sophisticated characters. In contrast, the other chooses to translate
texts for an ordinary readership, which are less elaborate in narrative and concerned
with emotions. These findings allow Baker (2000) to draw the conclusion
to elaborate the kind of text world that each translator has chosen to recreate in
and interaction between the characters than a 'page translation'. She used an
analytical tool that would not only enable her to quantify linguthe provide and and
English. This type of investigation allows her to validate her assumptions that
ogy, syntax, lexis and register of Zulu brought about by translation works. She
compares the 1959 and 1986 translations of the Book of Matthew into Zulu in a
translational corpus in order to research the role played by Bible translation in the
growth and development of written Zulu in the context of South Africa. She finds
that Toury's (1980) concept of the initial norm (i.e. the socio-cultural constraints)
'seems to have guided the translators of these translations in their selection of the
options at their disposal' (Masubelele 2004: 201). The study shows 'an inclination
towards the target norms and culture' - while translators of the 1959 version
adopted source text norms and culture, the translators of the 1986 version adopted
elaborates principles, theories and models to explain and predict what the process
lations cannot possibly avoid the effect of translationese (cf. Hartmann 1985; Baker
Wilson 2001: 71-2; McEnery and Xiao 2002, 2007). The concept of TUs was first
proposed by Baker (1993), who suggests that all translations are likely to show
certain linguistic characteristics simply by virtue of being translations, which are caused in and
by the process of translation. The effect of the source language on
best an unrepresentative special variant of the target language (McEnery and Xiao
comparing translations with comparable native texts, thus throwing new light on
the translation process and helping to uncover translation norms, or what Frawley
Over the past decade, TUs have been an important area of research as well as a
target of debate in DTS. Some scholars (e.g. Tymoczko 1998) argue that the very
(e.g. Toury 2004) advocate that the chief value of general laws of translation lies
in their explanatory power; still others (e.g. Chesterman 2004) accept universals
differentiates between two types of TUs: one relates to the process from the source
to the target text (what he calls 'S-universals'), while the other ('T-universals'
comprehensive review of TUs, suggests that the discussions of TUs follow the
elling out (or convergence). Other TUs that have been investigated include under
While individual studies have sometimes investigated more than one of these
features, they are discussed in the following sections separately for the purpose of
this presentation.
14.3.3.1 Explicitation
evidence from individual sample texts showing that translators tend to make
explicit optional cohesive markers in the target text even though they are absent
in the source text. It relates to the tendency in translations to 'spell things out
rather than leave them implicit' (Baker 1996: 180). Explicitation can be realized
in translated texts than in non-translated texts (see Section 14.4.2.3 for further
discussion), and additions providing extra information essential for a target culture
reader, and thus resulting in longer text than the non-translated text. Another
(Chesterman 2004), explicitation would seem to fall most naturally into the S-type.
grammar. While explicitation is found at various linguistic levels ranging from lexis
to syntax and textual organization, 'there is variation even in these results, which
has largely come from translational English and related European languages
14.3.3.2 Simplification
used in translation' (Baker 1996: 181-2), which means that translational language
Olohan and Baker (2000) have provided evidence for lexical and syntactic simpli
simplified stylistically. For example, Malmkjaer (1997) notes that in translations,
punctuation usually becomes stronger; for example, commas are often replaced
with semicolons or full stops while semicolons are replaced with full stops
complexity for easier reading. Nevertheless, as we will see in Section 14.4.2.1, this
tions, evidence produced in early studies that support the simplification hypothesis
is patchy and not always coherent. Such studies are based on different datasets
and are carried out to address different research questions, and thus cannot be
compared.
14.3.3.3 Normalization
Mauranen 2007), refers to the 'tendency to exaggerate features of the target language
and to conform to its typical patterns' (Baker 1996: 183). As a result, translational
language appears to be 'more normal' than the target language. Typical
the target language, and the treatment of the different dialects used by certain
Kenny (1998, 1999, 2000, 2001) presents a series of studies of how unusual and
marked compounds and collocations in German literary texts are translated into English, in an
attempt to assess whether they are normalized by means of more
conventional use. Her research suggests that certain translators may be more
to lexis in the source text. Nevalainen (2005; in Mauranen 2007: 41) suggests that
clusters.
and structures often occur which are rarely, or perhaps even never encountered in
translated texts, also shows that 'translations are not readily distinguishable from
translated texts are 'somewhat "sanitized" versions of the original' (Kenny 1998:
515). Another translational universal that has been proposed is the so-called feature
of 'leveling out', that is, 'the tendency of translated text to gravitate towards
texts with regard to their own scores on given measures of universal features' that
concerned with the unique items in translation (Mauranen 2007: 441-2). For exam
popular fiction, academic prose and popular science), finding that the average
frequency of kin in original Finnish is 6.1 instances per 1,000 words, whereas
its normalized frequency in translated Finnish is 4.6 instances per 1,000 words.
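
The per-1,000-word figures quoted here are normalized frequencies: a raw count is divided by corpus size and scaled to a common base so that corpora of different sizes can be compared. A minimal sketch of the arithmetic follows; the function name `per_thousand` and the raw counts and corpus sizes are hypothetical, chosen only so that the results reproduce the reported rates of 6.1 and 4.6, since the source gives the rates but not the underlying counts.

```python
def per_thousand(raw_count: int, corpus_tokens: int) -> float:
    """Normalize a raw frequency to occurrences per 1,000 running words."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count / corpus_tokens * 1000

# Hypothetical counts (assumption): 610 hits in 100,000 tokens of original text,
# 1,150 hits in 250,000 tokens of translated text.
original_rate = per_thousand(610, 100_000)
translated_rate = per_thousand(1_150, 250_000)

print(round(original_rate, 1), round(translated_rate, 1))
```

Although the translated corpus in this invented example contains more raw hits, its normalized rate is lower, which is why raw counts alone cannot support a claim of under-representation.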
in translated Finnish. Aijmer's (2007) study of the use of the English discourse
marker oh and its translation in Swedish shows that there is no single lexical equiva
language.
using the conventional term 'translation universal', the term is highly debatable
so far are identified on the basis of translational English - mostthe the proverstans and from
closely related European languages, there is a possibility that such linguistic features
are not 'universal' but rather specific to English and/or genetically related
guages (e.g. Mauranen and Kujamäki 2004). Clearly, if the features of translational
the language pairs involved must not be restricted to English and closely related
science books. Nevertheless, as Biber (1995: 278) observes, language may vary
across genres even more markedly than across languages. Xiao (2008) also demonstrates
that the genre of scientific writing is the least diversified of all genres
across various varieties of English. The implication is that the similarity reported in
Chen (2006) might be a result of similar genre instead of language pair. Ideally,
corpora of native Chinese and translated Chinese. This is what we are aiming
(LCMC, see McEnery and Xiao 2004) and its translational match in Chinese -
the newly built ZJU Corpus of Translational Chinese (ZCTC, see Xiao, He and Yue
2008).
analysis of the fiction categories in the LCMC corpus and a corpus of translated
Chinese fiction.
The corpus data used in this case study are the five categories of fiction (i.e. gen
and romantic fiction) in the LCMC corpus (LCMC-Fiction hereafter) for native
Chinese, amounting to approximately 200,000 running words in 117 text samples
taken from novels and stories published in China around 1991. The Contemporary
million words in 56 novels published over the past three decades, with most of
them translated and published in the 1980s and 1990s. These novels are mostly
translated from English while other source languages are also represented inclu
This section presents and discusses the results of data analysis. We will first discuss
the parameters used in Laviosa (1998b) in an attempt to find out whether the
core patterns of lexical use that Laviosa observes in translational English also apply
in translated Chinese fiction. We will also compare the frequency and use of con
defines lexical density as the ratio between the number of lexical words (i.e. content
words) and the total number of words. This approach is taken in Laviosa
The other approach commonly used in corpus linguistics is the type-token ratio
(TTR), that is, the ratio between the number of types (i.e. unique words) and the
number of tokens (i.e. running words). However, since the TTR is seriously affected
by text length, it is reliable only when texts of equal or similar length are com
WordSmith Tools goes through each text file in a corpus. The STTR is the average
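
The three measures discussed in this passage (lexical density, TTR and the standardized TTR) can be sketched in a few lines of Python. This is an illustrative approximation, not WordSmith's implementation: the function names, the tag set used to identify lexical words, and the chunk size are assumptions. The standardized version here averages the TTR over consecutive equal-sized chunks and ignores any incomplete final chunk; a chunk size of 1,000 words is assumed as the default.

```python
from typing import FrozenSet, List, Sequence, Tuple

def type_token_ratio(tokens: Sequence[str]) -> float:
    """TTR: number of unique words (types) divided by number of running words (tokens)."""
    return len(set(tokens)) / len(tokens)

def standardized_ttr(tokens: Sequence[str], chunk_size: int = 1000) -> float:
    """Average the TTR over consecutive full chunks of `chunk_size` tokens."""
    chunks: List[Sequence[str]] = [
        tokens[i:i + chunk_size]
        for i in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not chunks:
        raise ValueError("text is shorter than one chunk")
    return sum(type_token_ratio(chunk) for chunk in chunks) / len(chunks)

# Assumption: lexical (content) words are those tagged with these coarse POS labels.
LEXICAL_TAGS: FrozenSet[str] = frozenset({"NOUN", "VERB", "ADJ", "ADV"})

def lexical_density(tagged: Sequence[Tuple[str, str]]) -> float:
    """Lexical density: lexical words divided by all words, given (word, tag) pairs."""
    lexical = sum(1 for _, tag in tagged if tag in LEXICAL_TAGS)
    return lexical / len(tagged)

# Toy data (hypothetical) with a small chunk size so the arithmetic is easy to follow.
tokens = "the cat sat on the mat the end".split()
tagged = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
          ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")]

print(type_token_ratio(tokens))      # 6 types / 8 tokens
print(standardized_ttr(tokens, 4))   # mean of the TTRs of two 4-token chunks
print(lexical_density(tagged))       # 3 lexical words / 6 words
```

Because each chunk has the same length, the averaged figure is not inflated for short texts or deflated for long ones, which is the weakness of the plain TTR noted above.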
The present chapter has explored how corpora have helped to advance Transla
ogy in using corpora in translation and contrastive research, we reviewed the state
of the art of corpus-based Translation Studies. A case study was also presented that
universals, which has so far been confined largely to English and closely related
European languages. It is our hope that more empirical evidence for or against
parable corpora of translated and native Chinese when our project is completed.