What Is Corpus Linguistics
Corpus linguistics is the study of language, and a method of linguistic analysis, based on
collections of natural, real-world texts known as corpora (singular: corpus). It is used to
investigate a wide range of linguistic questions and offers a distinctive insight into the
dynamics of language, which has made it one of the most widely used linguistic
methodologies.
Because corpus linguistics works with large corpora of millions or sometimes even billions
of words, it relies heavily on computers to determine what rules govern the language and
what patterns (grammatical or lexical, for instance) occur. It is therefore not surprising
that corpus linguistics emerged in its modern form only after the computer revolution of
the 1980s. The Brown Corpus, the first modern, electronically readable corpus, was
nevertheless created by Henry Kucera and W. Nelson Francis as early as the 1960s.
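To give a concrete sense of the lexical patterns a computer can extract from a corpus, here is a minimal sketch in Python. It counts word frequencies and adjacent word pairs (bigrams) in a three-sentence toy corpus. The corpus, the function name tokenize and its simple regular-expression rule are all assumptions made for illustration. Real corpus tools use far more careful tokenization and much larger text collections.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract runs of letters/apostrophes as tokens.

    This is a deliberately simple, hypothetical rule for illustration only.
    """
    return re.findall(r"[a-z']+", text.lower())


# A tiny made-up corpus; a real corpus would hold millions of words.
toy_corpus = [
    "The cat sat on the mat.",
    "The dog sat on the rug.",
    "A cat and a dog sat together.",
]

word_counts = Counter()
bigram_counts = Counter()

for text in toy_corpus:
    tokens = tokenize(text)
    # Lexical pattern 1: how often each word occurs.
    word_counts.update(tokens)
    # Lexical pattern 2: how often each pair of adjacent words occurs.
    # Pairs are counted within one text, never across text boundaries.
    bigram_counts.update(zip(tokens, tokens[1:]))

print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

Counts like these describe what occurs in the texts and how often. Interpreting them remains the analyst's task.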
Does not explain why. The study of corpora tells us what happened and how, but it does
not tell us why, for instance, the frequency of a particular word has increased over
time.
Does not represent the entire language. Corpus linguistics studies language through
randomly or systematically selected corpora. These typically consist of a large
number of naturally occurring texts, but they remain a sample and do not represent
the entire language. The findings of linguistic analyses that use the methods and
tools of corpus linguistics therefore cannot be assumed to hold for the language as
a whole.