NAME

hfst-tokenize - perform matching/lookup on text streams

SYNOPSIS

hfst-tokenize [--segment | --xerox | --cg] [OPTIONS...] RULESET
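The synopsis fixes the argument order: an optional output-mode flag, then other options, then RULESET last. The Python sketch below assembles such a command line from the flags listed on this page. The helper name `build_argv` and the ruleset file name `tokenizer.pmhfst` are hypothetical. `--finnpos` is included because it appears in the option list, even though the synopsis omits it.

```python
from typing import List, Optional

# Output-mode flags from the option list (--segment is the documented default).
MODE_FLAGS = {
    "segment": "--segment",
    "xerox": "--xerox",
    "cg": "--cg",
    "finnpos": "--finnpos",
}


def build_argv(ruleset: str, mode: str = "segment", *,
               newline: bool = False, print_all: bool = False,
               print_weight: bool = False, tokenize_multichar: bool = False,
               time_cutoff: Optional[float] = None) -> List[str]:
    """Hypothetical helper: build an hfst-tokenize argv in synopsis order."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    argv = ["hfst-tokenize", MODE_FLAGS[mode]]
    # Optional flags, using the long option names from the option list.
    if newline:
        argv.append("--newline")
    if print_all:
        argv.append("--print-all")
    if print_weight:
        argv.append("--print-weight")
    if tokenize_multichar:
        argv.append("--tokenize-multichar")
    if time_cutoff is not None:
        argv.append(f"--time-cutoff={time_cutoff}")
    # RULESET is the final positional argument.
    argv.append(ruleset)
    return argv
```

The resulting list can be passed directly to `subprocess.run`. Keeping it a list, rather than a shell string, avoids quoting problems with unusual ruleset paths.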

DESCRIPTION

Perform matching/lookup on text streams with the ruleset RULESET.

Common options:

-h, --help: Print help message
-V, --version: Print version info
-v, --verbose: Print verbosely while processing
-q, --quiet: Only print fatal errors and requested output
-s, --silent: Alias of --quiet
-n, --newline: Use newline as input separator (default is blank line)
-a, --print-all: Print nonmatching text
-w, --print-weight: Print weights
--tokenize-multichar: Tokenize multicharacter symbols (by default, only one UTF-8 character is tokenized at a time, regardless of what is present in the alphabet)
-t, --time-cutoff=S: Limit search to S seconds per input
--segment: Segmenting / tokenization mode (default)
--xerox: Xerox output
--cg: CG (Constraint Grammar) output
--finnpos: FinnPos output
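The difference behind --tokenize-multichar can be illustrated with a minimal Python sketch. It splits a string into symbols either one character at a time, which is the documented default, or by matching multicharacter symbols from an alphabet. The greedy longest-match strategy, the function name `tokenize_symbols` and the example alphabet are illustrative assumptions, not HFST's actual implementation.

```python
from typing import Iterable, List


def tokenize_symbols(text: str, alphabet: Iterable[str],
                     multichar: bool = False) -> List[str]:
    """Split text into symbols (illustrative sketch, not HFST code)."""
    if not multichar:
        # Default behaviour: one character (Unicode code point) per symbol.
        return list(text)
    # Assumption: greedy longest match, so try longer symbols first.
    multi = sorted((s for s in alphabet if len(s) > 1), key=len, reverse=True)
    symbols: List[str] = []
    i = 0
    while i < len(text):
        for sym in multi:
            if text.startswith(sym, i):
                symbols.append(sym)
                i += len(sym)
                break
        else:
            # No multicharacter symbol starts here; fall back to one character.
            symbols.append(text[i])
            i += 1
    return symbols
```

With the hypothetical alphabet `{"a", "b", "+N"}`, the input `ab+N` splits into four single characters by default. With `multichar=True` it splits into `a`, `b` and `+N`.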

Input is read from standard input and output is written to standard output (for now).
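Because hfst-tokenize reads standard input and writes standard output, another program can drive it by piping text through it. The Python sketch below assumes that hfst-tokenize is on PATH. It separates inputs with a blank line by default, or with a single newline when -n/--newline is used. The helper names `join_inputs` and `run_tokenize` and the file name `tokenizer.pmhfst` are hypothetical.

```python
import subprocess
from typing import List


def join_inputs(inputs: List[str], newline: bool = False) -> str:
    """Serialize inputs for stdin using the documented separator."""
    for item in inputs:
        # Reject inputs that contain the separator, since they would be split.
        if newline and "\n" in item:
            raise ValueError("with --newline, an input may not contain a newline")
        # Only detects truly empty lines, not whitespace-only ones.
        if not newline and "\n\n" in item:
            raise ValueError("an input may not contain a blank line")
    sep = "\n" if newline else "\n\n"
    return sep.join(inputs) + "\n"


def run_tokenize(ruleset: str, inputs: List[str], newline: bool = False) -> str:
    """Pipe inputs through hfst-tokenize and return its standard output."""
    argv = ["hfst-tokenize"] + (["--newline"] if newline else []) + [ruleset]
    result = subprocess.run(argv, input=join_inputs(inputs, newline),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_tokenize("tokenizer.pmhfst", ["First input.", "Second input."])` would send both inputs separated by a blank line. Rejecting inputs that contain the separator prevents one logical input from being split in two without warning.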

REPORTING BUGS

Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at: <https://sourceforge.net/tracker/?atid=1061990&group_id=224521&func=browse>

hfst-tokenize home page: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki//HfstTokenize>
General help using HFST software: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki//HfstHome>

COPYRIGHT

Copyright © 2010 University of Helsinki, License GPLv3: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

January 2016

HFST

Source file:	hfst-tokenize.1.en.gz (from hfst 3.10.0~r2798-3)
Source last updated:	2017-03-23T13:18:47Z
Converted to HTML:	2019-06-03T07:44:12Z