n-grams

LanguageTool can optionally use large n-gram data sets to detect errors involving commonly confused words, such as their and there.

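To download and install n-gram data sets:
  1. Open a Web Terminal and run the install script with the target directory followed by one or more language codes. The example below installs English (en) and Spanish (es):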
/app/pkg/install-ngrams.sh /app/data/ngrams en es
==> Installing en ngram dataset from https://languagetool.org/download/ngram-data/ngrams-en-20150817.zip
/tmp/en.zip 100%[================================================================>]   8.35G   112MB/s    in 77s     
==> Unpacking en ngram dataset
Archive:  /tmp/en.zip
   creating: en/
   creating: en/3grams/
  inflating: en/3grams/_1e4.fdt      
  inflating: en/3grams/_1e4.si       
 extracting: en/3grams/_1e4.nvd      
...
  inflating: en/1grams/_1p.si        
==> en ngram dataset has been installed.
==> Done
Set NGRAM_DATASET_PATH=/app/data/ngrams in /app/data/env and restart the app.

  2. Open /app/data/env using the file manager and check that NGRAM_DATASET_PATH points to the directory the data sets were installed into (/app/data/ngrams in this example).
  3. Restart the app.
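If you prefer to check or set the variable from the Web Terminal instead of the file manager, the following is a minimal sketch of one way to do it. It assumes that /app/data/env holds plain KEY=value lines and that the data sets were installed to /app/data/ngrams as above. set_ngram_path is a hypothetical helper, not part of the app package, and it relies on GNU sed for in-place editing.

```shell
# Hypothetical helper (not shipped with the app): set NGRAM_DATASET_PATH=<dir>
# in an env file, updating an existing assignment or appending a new one.
# Assumes the env file uses plain KEY=value lines.
set_ngram_path() {
  env_file="$1"
  ngram_dir="$2"

  # Create the file if it does not exist yet
  touch "$env_file" || return 1

  if grep -q '^NGRAM_DATASET_PATH=' "$env_file"; then
    # Rewrite the existing assignment in place (GNU sed -i).
    # A directory containing "|" or "&" would need escaping here.
    sed -i "s|^NGRAM_DATASET_PATH=.*|NGRAM_DATASET_PATH=${ngram_dir}|" "$env_file"
  else
    # If the last line has no trailing newline, add one before appending
    if [ -s "$env_file" ] && [ -n "$(tail -c 1 "$env_file")" ]; then
      printf '\n' >> "$env_file"
    fi
    printf 'NGRAM_DATASET_PATH=%s\n' "$ngram_dir" >> "$env_file"
  fi
}
```

With the helper defined in the terminal session, running set_ngram_path /app/data/env /app/data/ngrams and then grep NGRAM_DATASET_PATH /app/data/env should show the expected line. The app still needs to be restarted afterwards for the change to take effect.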

Info: Large data set
n-gram data sets are large. For example, the English (en) data set is around 14GB and the German (de) data set is around 3GB.
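Because of these sizes, it may help to check free disk space before running the install script. The following is a hypothetical pre-flight check, not part of the app package. In the install log above, the archive is downloaded to /tmp/en.zip (8.35G for en) before being unpacked into the target directory, so both locations are worth checking; the thresholds in the usage note are rough assumptions based on the sizes quoted here.

```shell
# Hypothetical pre-flight check (not shipped with the app):
# succeed if the filesystem holding DIR has at least NEED_GIB GiB available.
has_free_space() {
  dir="$1"
  need_gib="$2"
  # df -P prints one line per filesystem; -k reports sizes in 1024-byte blocks.
  # Field 4 of the second output line is the available space in KiB.
  avail_kib=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  # No output means df failed (for example, the directory does not exist)
  [ -n "$avail_kib" ] || return 1
  [ "$avail_kib" -ge $((need_gib * 1024 * 1024)) ]
}
```

For example, has_free_space /tmp 9 && has_free_space /app/data 14 is a rough check before installing en. If /tmp and /app/data are on the same filesystem, the two requirements add up, so the combined total is what matters.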