In the field of computational linguistics, an n-gram is a contiguous sequence of n items taken from a corpus of language.
An n-gram can be made up of letters, phonemes, syllables, words and so on. Looking at n-grams helps us work out how language works, and the technique is used in everyday situations.
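To make the idea concrete, here is a minimal Python sketch that pulls n-grams out of a sequence. The function name `ngrams` and the sample phrase are our own choices for illustration; this is not how Google Books or any particular tool does it.

```python
def ngrams(items, n):
    """Return every contiguous run of n items as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n along the sequence, one step at a time.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word n-grams: split the phrase into words first.
words = "fill out a form".split()
word_bigrams = ngrams(words, 2)

# Letter n-grams: a string is already a sequence of letters.
letter_trigrams = ngrams("form", 3)

print(word_bigrams)
print(letter_trigrams)
```

The same function works for words and letters because it only needs a sequence of items; what counts as an item is up to you.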
Google Books offers an n-gram search online. This allows users to see how a word or phrase is used. It offers searches through different corpora, including:
- American English (155 billion words)
- British English (34 billion words)
- Fiction (91 billion words)
Searches can also be restricted to books from particular decades and periods.
The results typically show the number of occurrences of a search string. For example, searching the American English corpus for two similar strings shows:
- fill in a form – 1,012 examples
- fill out a form – 5,298 examples
Thus we can say that, in this corpus, "fill out a form" is about five times as common as "fill in a form".
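The comparison is simple arithmetic on the two counts. As a small sketch in Python, using only the figures quoted above (the variable names are our own):

```python
# Occurrence counts quoted above for the American English corpus.
counts = {
    "fill in a form": 1012,
    "fill out a form": 5298,
}

# How many times more common is "fill out" than "fill in"?
ratio = counts["fill out a form"] / counts["fill in a form"]

# What share of the two phrases combined does each one take?
total = sum(counts.values())
shares = {phrase: count / total for phrase, count in counts.items()}

print(f"ratio: {ratio:.1f}")
for phrase, share in shares.items():
    print(f"{phrase}: {share:.0%}")
```

The ratio comes out at roughly 5.2, which is where "about five times" comes from.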
Google Books also allows you to create simple graphs that show how word usage has changed over time and compare different terms. For example, a graph can compare the usage of TEFL, TESOL and TESL from 1950 to the present.
In the expression n-gram, the n (usually written in italics) stands for a number, one or more; gram goes back to Ancient Greek and means letter.
Thus n-gram literally means one or more letters.
Google Books n-grams home page
Google Comparison Graph
IWeb TEFL Team