
ESTIMATE-NGRAM(1) User Commands ESTIMATE-NGRAM(1)

NAME

estimate-ngram - estimates an n-gram language model

SYNOPSIS

estimate-ngram [Options]

DESCRIPTION

Estimates an n-gram language model by accumulating n-gram count statistics, smoothing the observed counts, and building a backoff n-gram model. Parameters can optionally be tuned to optimize performance on a development set.
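The first step described above, accumulating n-gram counts, can be illustrated with a minimal Python sketch. It is not the estimate-ngram implementation: the function name count_ngrams and the <s>/</s> sentence padding are assumptions made for illustration, and smoothing and backoff are left out.

```python
from collections import Counter


def count_ngrams(sentences, order=3):
    """Accumulate counts of every n-gram of length 1..order.

    Hypothetical helper for illustration; the sentence-boundary
    padding with <s> and </s> is an assumption of this sketch.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            # Slide a window of width n across the padded sentence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


counts = count_ngrams(["the cat sat", "the cat ran"], order=3)
```

With these two sentences, the bigram ("the", "cat") is seen twice and the trigram ("the", "cat", "sat") once. A smoothing method would then discount such raw counts before the backoff model is built.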

A filename argument can be an ASCII file, a compressed file (ending in .Z or .gz), or '-' to indicate stdin/stdout.
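For scripts that prepare input for or consume output from the tool, the same filename convention can be mirrored in Python. The sketch below is an assumption-laden illustration, not part of MITLM: open_text is a hypothetical helper, UTF-8 is used as a superset of ASCII, and .Z files are rejected because the Python standard library has no decoder for that format.

```python
import gzip
import sys


def open_text(filename, mode="r"):
    """Open a text stream following the '-', .gz, plain-file convention.

    Hypothetical helper for illustration only.
    """
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if filename == "-":
        # '-' selects stdin for reading and stdout for writing.
        return sys.stdin if mode == "r" else sys.stdout
    if filename.endswith(".gz"):
        return gzip.open(filename, mode + "t", encoding="utf-8")
    if filename.endswith(".Z"):
        # Unix compress format is not supported by the standard library.
        raise NotImplementedError("decompress .Z files externally first")
    return open(filename, mode, encoding="utf-8")
```

A caller would write, for example, `with open_text("corpus.txt.gz") as f:` and iterate over lines exactly as with an uncompressed file.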

OPTIONS

Print a help message.
Set verbosity level. Default: 1
Set the n-gram order of the estimated LM. Default: 3
Fix the vocabulary to only words from the specified file.
Replace all out-of-vocabulary words with <unk>. Default: false
Add counts from text files.
Add counts from counts files.
Specify smoothing algorithms. Default: ModKN
Specify n-gram weighting features.
Set initial model parameters.
Specify optimization algorithm. Default: Powell
Tune parameters to minimize dev set perplexity.
Tune parameters to minimize lattice word error rate.
Tune parameters to minimize lattice margin.
Write LM/counts files in binary format. Default: false
Write tuned model parameters to file.
Write LM vocabulary to file.
Write n-gram counts to file.
Write effective n-gram counts to file.
Write left-branching n-gram counts to file.
Write right-branching n-gram counts to file.
Write ARPA backoff LM to file.
Compute test set perplexity.
Compute test set lattice word error rate.
Compute test set lattice margin.
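Several options above tune or report perplexity. As a reminder of what that quantity measures, the following Python sketch computes perplexity from per-token log probabilities. It shows the standard definition only; which tokens the tool counts (for example end-of-sentence or out-of-vocabulary tokens) is not stated here, so the input list is an assumption of the example.

```python
import math


def perplexity(log_probs):
    """Return exp of the negative mean natural-log probability per token.

    log_probs is assumed to hold one natural-log probability for each
    token the model was asked to predict.
    """
    if not log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(log_probs) / len(log_probs))


# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among four words.
uniform_four = perplexity([math.log(0.25)] * 4)
```

Lower perplexity on held-out text means the model assigns that text higher probability on average.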

SEE ALSO

evaluate-ngram(1), interpolate-ngram(1)

January 2013 MITLM