LanguageTool can optionally make use of large n-gram data sets to detect errors with words that are often confused, like "their" and "there".
To download the n-gram data set of a language:
- Open a Web Terminal and run the command below.
/app/pkg/install-ngrams.sh /app/data/ngrams en es
==> Installing en ngram dataset from https://languagetool.org/download/ngram-data/ngrams-en-20150817.zip
/tmp/en.zip 100%[================================================================>] 8.35G 112MB/s in 77s
==> Unpacking en ngram dataset
Archive: /tmp/en.zip
creating: en/
creating: en/3grams/
inflating: en/3grams/_1e4.fdt
inflating: en/3grams/_1e4.si
extracting: en/3grams/_1e4.nvd
...
inflating: en/1grams/_1p.si
==> en ngram dataset has been installed.
==> Done
Set NGRAM_DATASET_PATH=/app/data/ngrams in /app/data/env and restart the app.
- Open /app/data/env using the file manager and check that NGRAM_DATASET_PATH is set to the directory passed to the install script (/app/data/ngrams in the example above).
- Restart the app.
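The check in the second step can also be done from the Web Terminal. The following is a minimal sketch, not part of the app package: `check_ngrams` is a hypothetical helper name, and it assumes the env file uses plain `NAME=value` lines (optionally prefixed with `export`) and that each installed language lives in its own subdirectory, as the unpack log above suggests.

```shell
#!/bin/sh
# Sketch: confirm that the env file points at an installed n-gram data set.
# check_ngrams is a hypothetical helper, not something shipped with the app.
check_ngrams() {
    env_file="$1"
    # Take the last NGRAM_DATASET_PATH assignment and strip any quotes.
    path=$(sed -n 's/^\(export \)\{0,1\}NGRAM_DATASET_PATH=//p' "$env_file" \
        | tail -n 1 | tr -d "\"'")
    if [ -z "$path" ]; then
        echo "NGRAM_DATASET_PATH is not set in $env_file"
        return 1
    fi
    # Each installed language is expected as a subdirectory such as en/ or es/.
    for dir in "$path"/*/; do
        if [ -d "$dir" ]; then
            echo "found: $dir"
        fi
    done
    return 0
}
```

After defining the function in the terminal, `check_ngrams /app/data/env` should list one `found:` line per installed language; no `found:` lines means the path is set but holds no data sets.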
Info: Large data set
n-gram data sets are large. For example, the en data set is around 14GB and the de data set is around 3GB.
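Given these sizes, it may be worth checking free disk space before starting a download. The sketch below is an illustration only: `has_space` is a hypothetical helper, and the threshold passed to it is whatever size you expect for the language in question.

```shell
#!/bin/sh
# Sketch: check free space before installing an n-gram data set.
# has_space is a hypothetical helper, not something shipped with the app.
has_space() {
    dir="$1"
    required_gb="$2"
    # df -Pk prints POSIX-format output in 1 KiB blocks; column 4 is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    required_kb=$((required_gb * 1024 * 1024))
    if [ "$avail_kb" -ge "$required_kb" ]; then
        echo "ok: enough space in $dir"
    else
        echo "not enough space in $dir: need ${required_gb} GB"
        return 1
    fi
}
```

For the en data set this could be called as `has_space /app/data 14`. The install log shows the archive being downloaded to /tmp first, so a similar check on /tmp (for example `has_space /tmp 9`, based on the 8.35G archive size in the log) may be useful as well.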