NAME

estimate-ngram - estimate an n-gram language model

SYNOPSIS

estimate-ngram [Options]

DESCRIPTION

Estimates an n-gram language model by accumulating n-gram count statistics, smoothing the observed counts, and building a backoff n-gram model. Parameters can optionally be tuned to optimize performance on a development set.
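The count, smooth, and build steps described above can be illustrated with a minimal sketch: a bigram model with interpolated Kneser-Ney smoothing and one constant discount. This is a textbook formulation, not mitlm's code. The `<s>`/`</s>` sentence padding, the discount value D = 0.75, and the name `train_kn_bigram` are assumptions for illustration; mitlm's KN-family options handle higher orders and estimate or tune their discounts rather than fixing them at 0.75.

```python
from collections import Counter, defaultdict


def train_kn_bigram(sentences, discount=0.75):
    """Return p(word, prev) for an interpolated Kneser-Ney bigram model (sketch).

    Sentences are whitespace-tokenized strings, padded with <s> and </s>
    (an assumption of this sketch).
    """
    # Step 1: accumulate bigram counts.
    bigrams = Counter()
    for line in sentences:
        tokens = ["<s>"] + line.split() + ["</s>"]
        bigrams.update(zip(tokens, tokens[1:]))

    history_count = Counter()         # c(prev) = sum over w of c(prev, w)
    followers = defaultdict(set)      # distinct words observed after prev
    predecessors = defaultdict(set)   # distinct words observed before word
    for (prev, word), c in bigrams.items():
        history_count[prev] += c
        followers[prev].add(word)
        predecessors[word].add(prev)
    num_bigram_types = len(bigrams)

    def continuation(word):
        # Kneser-Ney lower-order distribution: share of bigram types ending in word.
        return len(predecessors[word]) / num_bigram_types

    # Step 2: smooth by discounting each seen bigram and redistributing the mass.
    def prob(word, prev):
        c_prev = history_count[prev]
        if c_prev == 0:
            return continuation(word)  # unseen history: use the lower order only
        discounted = max(bigrams[(prev, word)] - discount, 0.0) / c_prev
        # Total mass removed from prev's bigrams, spread via the continuation distribution.
        lam = discount * len(followers[prev]) / c_prev
        return discounted + lam * continuation(word)

    return prob


p = train_kn_bigram(["the cat sat", "the dog sat"])
# The distribution over every possible next word after "the" sums to one.
print(round(sum(p(w, "the") for w in ["the", "cat", "dog", "sat", "</s>"]), 9))  # prints 1.0
```

An interpolated model like this one can be rewritten in backoff form, which is the representation an ARPA file stores.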

Each filename argument can be an ASCII file, a compressed file (ending in .Z or .gz), or '-' to indicate stdin/stdout.
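The filename rules above can be mirrored in a short helper when preparing input for or reading output from the tool. This is a sketch, not mitlm's code; the name `open_lm_file` is hypothetical. Python's standard library has no decoder for Unix `compress` (.Z) files, so this sketch rejects them instead of guessing at a decompressor.

```python
import gzip
import sys


def open_lm_file(name, mode="r"):
    """Open a filename argument following the man page's rules (hypothetical helper).

    '-' maps to stdin for reading or stdout for writing, names ending in .gz are
    gzip-compressed text, and any other name is opened as a plain text file.
    """
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if name == "-":
        return sys.stdin if mode == "r" else sys.stdout
    if name.endswith(".gz"):
        return gzip.open(name, mode + "t", encoding="utf-8")
    if name.endswith(".Z"):
        # LZW (.Z) is not supported by the Python standard library.
        raise NotImplementedError(".Z files are not handled in this sketch")
    return open(name, mode, encoding="utf-8")
```

When the returned handle is stdin or stdout, avoid wrapping it in a `with` block, because closing it would close the process stream.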

OPTIONS

-h, -help: Print this message.
-verbose <int>: Set verbosity level. Default: 1
-o, -order <int>: Set the n-gram order of the estimated LM. Default: 3
-v, -vocab <file>: Restrict the vocabulary to the words listed in the specified file.
-u, -unk <boolean>: Replace all out-of-vocabulary words with <unk>. Default: false
-t, -text <files>: Add counts from text files.
-c, -counts <files>: Add counts from counts files.
-s, -smoothing <ML, FixKN, FixModKN, FixKN#, KN, ModKN, KN#>: Specify smoothing algorithms. Default: ModKN
-wf, -weight-features <features-template>: Specify n-gram weighting features.
-p, -params <file>: Set initial model params.
-oa, -opt-alg <Powell, LBFGS, LBFGSB>: Specify optimization algorithm. Default: Powell
-op, -opt-perp <file>: Tune params to minimize dev set perplexity.
-ow, -opt-wer <file>: Tune params to minimize lattice word error rate.
-om, -opt-margin <file>: Tune params to minimize lattice margin.
-wb, -write-binary <boolean>: Write LM/counts files in binary format. Default: false
-wp, -write-params <file>: Write tuned model params to file.
-wv, -write-vocab <file>: Write LM vocab to file.
-wc, -write-counts <file>: Write n-gram counts to file.
-wec, -write-eff-counts <file>: Write effective n-gram counts to file.
-wlc, -write-left-counts <file>: Write left-branching n-gram counts to file.
-wrc, -write-right-counts <file>: Write right-branching n-gram counts to file.
-wl, -write-lm <file>: Write ARPA backoff LM to file.
-ep, -eval-perp <files>: Compute test set perplexity.
-ew, -eval-wer <files>: Compute test set lattice word error rate.
-em, -eval-margin <files>: Compute test set lattice margin.
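-wl writes an ARPA backoff LM and -ep reports test set perplexity. The following sketch shows how such a file can be read back and used to score text, assuming the standard ARPA layout (log10 probabilities with optional log10 backoff weights) and SRILM-style token accounting. Under that accounting, `</s>` is scored once per sentence, `<s>` serves only as context, and out-of-vocabulary words are mapped to `<unk>` and skipped if the model has no `<unk>` entry. mitlm's own -eval-perp may count tokens differently. The helpers `read_arpa`, `log10_prob`, and `perplexity` are hypothetical names.

```python
def read_arpa(lines):
    """Parse ARPA text into {ngram tuple: (log10 prob, log10 backoff weight)}."""
    model = {}
    order = 0  # order of the section being read; 0 while in the header
    for raw in lines:
        line = raw.strip()
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # e.g. "\2-grams:" gives 2
            continue
        if order == 0 or not line:
            continue  # header count lines or blank separators
        fields = line.split()
        words = tuple(fields[1:1 + order])
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        model[words] = (float(fields[0]), backoff)
    return model


def log10_prob(model, history, word):
    """log10 p(word | history), backing off to shorter histories as needed."""
    history = tuple(history)
    penalty = 0.0
    while True:
        if history + (word,) in model:
            return penalty + model[history + (word,)][0]
        if not history:
            raise KeyError(f"{word!r} is not in the unigram table")
        # Missing n-gram: add bow(history), which is log10(1) = 0 if history is absent.
        penalty += model.get(history, (0.0, 0.0))[1]
        history = history[1:]


def perplexity(model, sentences):
    """Perplexity of whitespace-tokenized sentences under the backoff model."""
    order = max(len(ngram) for ngram in model)
    total, count = 0.0, 0
    for line in sentences:
        words = [w if (w,) in model else "<unk>" for w in line.split()]
        context = ["<s>"]
        for word in words + ["</s>"]:
            if (word,) in model:  # skip OOVs when the model has no <unk> entry
                history = context[max(0, len(context) - order + 1):]
                total += log10_prob(model, history, word)
                count += 1
            context.append(word)
    if count == 0:
        raise ValueError("no scorable tokens")
    return 10 ** (-total / count)
```

Because ARPA files store log10 values, perplexity here is 10 raised to the negative mean log10 probability per scored token.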

Source file:	estimate-ngram.1.en.gz (from mitlm 0.4.1-2+b1)
Source last updated:	2019-02-11T13:09:51Z
Converted to HTML:	2022-09-07T21:03:26Z
