HFST-TOKENIZE(1) User Commands HFST-TOKENIZE(1)

NAME

hfst-tokenize - perform matching/lookup on text streams

SYNOPSIS

hfst-tokenize [--segment | --xerox | --cg | --giella-cg] [OPTIONS...] RULESET

DESCRIPTION

Perform matching/lookup on text streams.

Common options:

-h, --help
Print help message
-V, --version
Print version info
-v, --verbose
Print verbosely while processing
-q, --quiet
Only print fatal errors and requested output
-s, --silent
Alias of --quiet
-n, --newline
Newline as input separator (default is blank line)
-a, --print-all
Print nonmatching text
-w, --print-weight
Print weights (overrides earlier -W option)
-W, --no-weights
Don't print weights (default; overrides an earlier -w option, including the -w implied by -g)
-m, --tokenize-multichar
Tokenize multicharacter symbols (by default only one UTF-8 character is tokenized at a time, regardless of what is present in the alphabet)
-b, --beam=B
Output only analyses whose weight is within B from best result
-tS, --time-cutoff=S
Limit search after having used S seconds per input
-lN, --weight-classes=N
Output no more than N best weight classes (where analyses with equal weight constitute a class)
-u, --unique
Remove duplicate analyses
-z, --segment
Segmenting / tokenization mode (default)
-i, --space-separated
Tokenization with one sentence per line, space-separated tokens
-x, --xerox
Xerox output
-c, --cg
Constraint Grammar output
-S, --superblanks
Ignore contents of unescaped [] (cf. apertium-destxt); flushes on NUL
-g, --giella-cg
CG format used in Giella infrastructure (implies -w and -l2, treats @PMATCH_INPUT_MARK@ as subreading separator, expects tags to be Multichar_symbols, flushes on NUL)
-C, --conllu
CoNLL-U format
-f, --finnpos
FinnPos output
-L, --visl
VISL input and output (implies -W, handles <s> as blocks and <STYLE> inline)

Input is read from standard input and output is written to standard output (for now).
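
EXAMPLES

Because the tool works on standard streams, a typical invocation pipes text in and reads the result from standard output. The following is a minimal sketch, not an official example: the ruleset file name tokeniser.pmhfst and the helper function tokenize_cg are hypothetical, and only options listed above are used. The guard skips the run when the tool or the ruleset is not available.

```shell
# Hypothetical ruleset path (an assumption); point this at a real compiled ruleset.
RULESET="${RULESET:-tokeniser.pmhfst}"

# Hypothetical helper: read text from stdin, treat each newline as an
# input separator (--newline), and print Giella-style CG output (--giella-cg).
tokenize_cg() {
    hfst-tokenize --newline --giella-cg "$RULESET"
}

# Run only when both the tool and the ruleset exist; otherwise say so on stderr.
if command -v hfst-tokenize >/dev/null 2>&1 && [ -f "$RULESET" ]; then
    printf '%s\n' 'This is a test.' | tokenize_cg
else
    echo "hfst-tokenize or $RULESET not found; skipping" >&2
fi
```

Replacing --giella-cg with --xerox, --cg, or --conllu should select the corresponding output format described above; the exact output depends on the ruleset in use.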

REPORTING BUGS

Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at: <https://github.com/hfst/hfst/issues>

hfst-tokenize home page: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki//HfstTokenize>
General help using HFST software: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki//HfstHome>

COPYRIGHT

Copyright © 2017 University of Helsinki, License GPLv3: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
August 2018 HFST