.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.3.
.TH HFST-TOKENIZE "1" "August 2018" "HFST" "User Commands"
.SH NAME
hfst-tokenize \- perform matching/lookup on text streams
.SH SYNOPSIS
.B hfst-tokenize
[\fI\,--segment | --xerox | --cg | --giella-cg\/\fR] [\fI\,OPTIONS\/\fR...] \fI\,RULESET\/\fR
.SH DESCRIPTION
Perform matching/lookup on text streams.
.SS "Common options:"
.TP
\fB\-h\fR, \fB\-\-help\fR
Print help message
.TP
\fB\-V\fR, \fB\-\-version\fR
Print version info
.TP
\fB\-v\fR, \fB\-\-verbose\fR
Print verbosely while processing
.TP
\fB\-q\fR, \fB\-\-quiet\fR
Only print fatal errors and requested output
.TP
\fB\-s\fR, \fB\-\-silent\fR
Alias of \fB\-\-quiet\fR
.TP
\fB\-n\fR, \fB\-\-newline\fR
Use newline as input separator (default is blank line)
.TP
\fB\-a\fR, \fB\-\-print\-all\fR
Print nonmatching text
.TP
\fB\-w\fR, \fB\-\-print\-weight\fR
Print weights (overrides an earlier \fB\-W\fR option)
.TP
\fB\-W\fR, \fB\-\-no\-weights\fR
Don't print weights (default; overrides an earlier \fB\-w\fR option, including the \fB\-w\fR implied by \fB\-g\fR)
.TP
\fB\-m\fR, \fB\-\-tokenize\-multichar\fR
Tokenize multicharacter symbols (by default only one UTF\-8 character is tokenized at a time, regardless of what is present in the alphabet)
.TP
\fB\-b\fR, \fB\-\-beam\fR=\fI\,B\/\fR
Output only analyses whose weight is within B of the best result
.TP
\fB\-tS\fR, \fB\-\-time\-cutoff\fR=\fI\,S\/\fR
Limit search to S seconds per input
.TP
\fB\-lN\fR, \fB\-\-weight\-classes\fR=\fI\,N\/\fR
Output no more than the N best weight classes (analyses with equal weight constitute a class)
.TP
\fB\-u\fR, \fB\-\-unique\fR
Remove duplicate analyses
.TP
\fB\-z\fR, \fB\-\-segment\fR
Segmenting / tokenization mode (default)
.TP
\fB\-i\fR, \fB\-\-space\-separated\fR
Tokenization with one sentence per line and space\-separated tokens
.TP
\fB\-x\fR, \fB\-\-xerox\fR
Xerox output
.TP
\fB\-c\fR, \fB\-\-cg\fR
Constraint Grammar output
.TP
\fB\-S\fR, \fB\-\-superblanks\fR
Ignore contents of unescaped [] (cf. apertium\-destxt); flush on NUL
.TP
\fB\-g\fR, \fB\-\-giella\-cg\fR
CG format used in Giella infrastructure (implies \fB\-w\fR and \fB\-l2\fR, treats @PMATCH_INPUT_MARK@ as subreading separator, expects tags to be Multichar_symbols, flush on NUL)
.TP
\fB\-C\fR, \fB\-\-conllu\fR
CoNLL\-U format
.TP
\fB\-f\fR, \fB\-\-finnpos\fR
FinnPos output
.TP
\fB\-L\fR, \fB\-\-visl\fR
VISL input and output (implies \fB\-W\fR, handles as blocks and