n-grams

LanguageTool can optionally use large n-gram data sets to detect errors involving words that are often confused, such as "their" and "there".

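To download the n-gram data set for a language:
  1. Open a Web Terminal and run the command below. The first argument is the target directory; the remaining arguments are the language codes to install (en and es in this example).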
/app/pkg/install-ngrams.sh /app/data/ngrams en es
==> Installing en ngram dataset from https://languagetool.org/download/ngram-data/ngrams-en-20150817.zip
/tmp/en.zip 100%[================================================================>]   8.35G   112MB/s    in 77s     
==> Unpacking en ngram dataset
Archive:  /tmp/en.zip
   creating: en/
   creating: en/3grams/
  inflating: en/3grams/_1e4.fdt      
  inflating: en/3grams/_1e4.si       
 extracting: en/3grams/_1e4.nvd      
...
  inflating: en/1grams/_1p.si        
==> en ngram dataset has been installed.
==> Done
Set NGRAM_DATASET_PATH=/app/data/ngrams in /app/data/env and restart the app.

  2. Open /app/data/env using the file manager and check that NGRAM_DATASET_PATH is set to the directory passed to the install command (/app/data/ngrams in the example above).
  3. Restart the app.
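
The check in the last steps can also be done from the Web Terminal. The following is a minimal sketch, not part of the package: the function names `ngram_path_from_env` and `check_ngrams` are hypothetical, and it assumes that /app/data/env contains plain `KEY=value` lines (optionally prefixed with `export`) and that each language is unpacked into its own subdirectory of the data set path, as the `en/` entries in the output above suggest.

```shell
#!/bin/sh
# Hypothetical helpers; they are not shipped with the app.

# Print the value of NGRAM_DATASET_PATH found in the given env file.
# Assumes KEY=value lines, optionally prefixed with "export".
# If the variable is assigned more than once, the last assignment wins.
ngram_path_from_env() {
    env_file="$1"
    sed -n 's/^[[:space:]]*\(export[[:space:]]\{1,\}\)\{0,1\}NGRAM_DATASET_PATH=//p' "$env_file" \
        | tail -n 1 \
        | tr -d "\"'"
}

# Check that the configured path contains one subdirectory per language code.
# Returns 0 only if the variable is set and every language directory exists.
check_ngrams() {
    env_file="$1"
    shift
    path=$(ngram_path_from_env "$env_file")
    if [ -z "$path" ]; then
        echo "NGRAM_DATASET_PATH is not set in $env_file"
        return 1
    fi
    status=0
    for lang in "$@"; do
        if [ -d "$path/$lang" ]; then
            echo "ok: $path/$lang"
        else
            echo "missing: $path/$lang"
            status=1
        fi
    done
    return "$status"
}

# Usage: check_ngrams /app/data/env en es
```

A non-zero exit status means that either the variable is missing from the env file or at least one language directory was not found under the configured path.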

Info: Large data set
n-gram data sets are large. For example, the en data set is around 14GB and the de data set is around 3GB.
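
Given these sizes, it can help to check free disk space before starting a download. The sketch below is an illustration under assumptions: `has_free_space` is a hypothetical helper, and the thresholds in the usage comment are examples derived from the sizes mentioned on this page (about 14GB unpacked for en, plus the 8.35G archive that the sample output shows being saved under /tmp).

```shell
#!/bin/sh
# Hypothetical helper; it is not shipped with the app.

# Succeed if the filesystem that holds DIR has at least REQUIRED_GB gigabytes free.
has_free_space() {
    dir="$1"
    required_gb="$2"
    # df -P selects the portable output format and -k reports 1024-byte blocks.
    # The second output line describes the filesystem; column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Usage (example thresholds, adjust per language):
#   has_free_space /app/data 14 && has_free_space /tmp 9 && echo "enough space for en"
```

If /tmp and /app/data are on the same filesystem, the two requirements add up, so a single check against the combined size is the safer choice.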
