n-grams and TEFL

In the field of computational linguistics, an n-gram is a contiguous sequence of n items taken from a corpus of language.

An n-gram can be made up of letters, phonemes, syllables, words and so on. Looking at n-grams helps show how language works and how it is actually used in everyday situations.
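As a rough illustration of the idea, here is a minimal Python sketch that pulls word and letter n-grams out of a short piece of text. The function name ngrams and the sample sentence are made up for this example; they do not come from any particular library or corpus.

```python
from collections import Counter


def ngrams(items, n):
    """Return every contiguous run of n items as a tuple."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Hypothetical sample sentence, invented for illustration only.
sentence = "please fill out a form and fill out a survey"
words = sentence.split()

# Word bigrams (n = 2) and letter trigrams (n = 3).
word_bigrams = ngrams(words, 2)
letter_trigrams = ngrams(list("form"), 3)

# Count how often each word bigram occurs in the sample.
bigram_counts = Counter(word_bigrams)

print(bigram_counts[("fill", "out")])  # 2
print(letter_trigrams)  # [('f', 'o', 'r'), ('o', 'r', 'm')]
```

The same function works for any unit of language: pass it a list of words, letters, syllables or phonemes and choose n.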

Google Books offers an online n-gram search, which lets users see how a word or phrase is used. It offers searches through different corpora including:

  • American English (155 billion words)
  • British English (34 billion words)
  • Fiction (91 billion words)

Searches can also be restricted to books from particular decades or periods.

The results typically show the number of occurrences of a search string. For example, searching the American corpus for the following strings shows:

  • fill in a form – 1,012 examples
  • fill out a form – 5,298 examples

Thus we can say that, in this corpus, fill out a form is about 5 times as common as fill in a form.
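The arithmetic behind that comparison can be checked with a few lines of Python. This is only a sketch using the two counts quoted above; the variable names are made up for the example.

```python
# Occurrence counts quoted above for the American English corpus.
counts = {
    "fill in a form": 1012,
    "fill out a form": 5298,
}

# How many times more frequent one phrase is than the other.
ratio = counts["fill out a form"] / counts["fill in a form"]

# Share of all the examples that use "fill out".
share = counts["fill out a form"] / sum(counts.values())

print(f"{ratio:.1f}")  # 5.2
print(f"{share:.0%}")  # 84%
```

So "about 5 times" is a rounding of roughly 5.2, and around 84% of the examples found use fill out.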

Comparison Graphs

Google Books also allows you to create simple graphs that show how word usage has changed over time and compare different terms. For example, the following graph shows the difference in usage from 1950 to the present for TEFL, TESOL and TESL:


n-gram Etymology

In the expression n-gram, the n part (usually in italics) stands for a number, one or more; the gram part goes back to the Ancient Greek gramma, meaning letter.

Thus n-gram literally means one or more letters, although in practice the items can be any unit of language.

Useful Links

Google Books n-grams home page

Google Comparison Graph

Posted in Linguistics, Technology & TEFL.
