Corpus Linguistics


Simply put, Corpus Linguistics is the study of language using computer programs which analyze millions of lines of text held in a corpus (pl. corpora).

To begin with, large numbers of language samples are collected: from newspapers, books, transcripts of the spoken word, etc. These can then be marked up, that is, tagged to show the part of speech of each word.

Life + is + a + long + song.
[noun] + [verb] + [article] + [adjective] + [noun] *

* Note that this is just one way of marking up a sentence; other tagging schemes exist.
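As a rough illustration of what marking up involves, here is a minimal Python sketch. It assumes a tiny hand-written lexicon (the `TOY_LEXICON` table and the `tag_sentence` function are made up for this example); real taggers are trained on large annotated corpora and use context to decide between possible tags.

```python
# Toy lexicon: a hypothetical hand-written word-to-tag table, for illustration only.
TOY_LEXICON = {
    "life": "noun",
    "is": "verb",
    "a": "article",
    "long": "adjective",
    "song": "noun",
}


def tag_sentence(sentence, lexicon=TOY_LEXICON):
    """Return (word, tag) pairs; words missing from the lexicon get 'unknown'."""
    words = sentence.rstrip(".!?").split()
    return [(word, lexicon.get(word.lower(), "unknown")) for word in words]


tagged = tag_sentence("Life is a long song.")
# Show the tags in the same bracketed layout as the example above.
print(" + ".join(f"[{tag}]" for _, tag in tagged))
```

Run on the sample sentence, this prints the same sequence of tags as shown above. A lookup table like this cannot handle words with more than one possible tag, which is one reason different tagging schemes and tools can disagree.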

Special software programs called concordancers can then be used to search through the corpus to find patterns. These patterns are then used to describe the language.
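To give a feel for what a concordancer does, the sketch below lists every occurrence of a search word together with a few words of context on either side (often called keyword in context). The `concordance` function and the three-sentence corpus are invented for this example; real concordancers work on far larger corpora and offer many more search options.

```python
def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# A made-up three-sentence "corpus", split into tokens on whitespace.
corpus = "life is a long song . the song was long . she sang a long sad song".split()

for line in concordance(corpus, "long"):
    print(line)
```

Each printed line shows one hit for "long" in square brackets with its neighbours, so patterns in the surrounding words are easy to spot by eye.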

As a very simple example, a concordancer could search through a corpus for adjectives and note where they appear in relation to other parts of speech. It would soon find that they very often come directly before a noun:

[adjective] + [noun]

This, then, could be suggested as a rule of how language works… until an exception turns up, at which point the rule has to be tweaked to fit the new findings.

Of course, this kind of painstaking search through millions of lines of text is only practical with a computer. However, remember that corpus linguistics is not really about collecting data but about interpreting and analyzing that data and the searches made on it.
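The search for a rule and its exceptions can be sketched in a few lines of Python. The example assumes a small hand-tagged corpus (the sentences, the tags and the `tags_after` function are all made up for illustration) and simply counts which part of speech follows each adjective.

```python
from collections import Counter

# Hypothetical hand-tagged corpus: one list of (word, tag) pairs per sentence.
tagged_corpus = [
    [("life", "noun"), ("is", "verb"), ("a", "article"),
     ("long", "adjective"), ("song", "noun")],
    [("she", "pronoun"), ("sang", "verb"), ("a", "article"),
     ("sad", "adjective"), ("song", "noun")],
    [("the", "article"), ("song", "noun"), ("is", "verb"),
     ("long", "adjective")],
]


def tags_after(corpus, target_tag):
    """Count which tag follows each occurrence of target_tag ('END' if nothing follows)."""
    counts = Counter()
    for sentence in corpus:
        for i, (_, tag) in enumerate(sentence):
            if tag == target_tag:
                next_tag = sentence[i + 1][1] if i + 1 < len(sentence) else "END"
                counts[next_tag] += 1
    return counts


print(tags_after(tagged_corpus, "adjective"))
```

In this toy corpus two adjectives are followed by a noun and one ("the song is long") ends its sentence. That third sentence is exactly the kind of exception that forces the [adjective] + [noun] rule to be refined.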

Corpus Linguistics & TEFL

Corpus Linguistics has affected TEFL in a number of ways. Most notably, it has provided a set of real-life rules (although "rules" is perhaps too strong a word for something which is often contradicted) which tell us how language works. These rules then make their way into grammar books, dictionaries, TEFL coursebooks and so on.

Closer to home, software and corpora have become available online, and now anyone with an internet connection can search through millions of lines of text and come up with their own rules. This is incredibly useful for students of English, who can work out for themselves how language works (and, needless to say, an answer that someone finds for themselves rather than being told sticks far more firmly and is more useful to them).

Useful Links

Corpus (pl. Corpora) and Language Learning – about the corpora being used in corpus linguistics

CALL – Computer Assisted Language Learning – a general look at using computers to teach English

Concordancers – the software used to search the corpora

n-grams and TEFL – looking at corpora

