SIMSTRING(1) | General Commands Manual | SIMSTRING(1) |
NAME
simstring - build database and find similar words
SYNOPSIS
simstring [OPTIONS]
DESCRIPTION
This utility reads query strings from STDIN and retrieves the strings in the database (DB) whose similarity to each query, under the similarity measure (SIM), is no smaller than the threshold (TH). When the -b (--build) option is specified, it instead builds a database (DB) from strings read from STDIN.
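The two modes described above (build, then query) can be sketched as a small Python wrapper around the command line. This is only an illustration: the helper names `build_cmd`, `query_cmd`, and `run`, the file name `names.db`, and the sample strings are hypothetical, and the sketch assumes that simstring reads one string per line from STDIN. Only options listed in this manual page are used.

```python
import shutil
import subprocess


def build_cmd(db_path):
    """Argument list for build mode: simstring -b -d DB."""
    return ["simstring", "-b", "-d", db_path]


def query_cmd(db_path, similarity="cosine", threshold=0.7):
    """Argument list for query mode: simstring -d DB -s SIM -t TH."""
    return ["simstring", "-d", db_path, "-s", similarity, "-t", str(threshold)]


def run(cmd, lines):
    """Send the strings on STDIN (assumed: one per line) and return stdout."""
    result = subprocess.run(
        cmd,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Only runs when executed as a script and simstring is actually installed.
if __name__ == "__main__" and shutil.which("simstring"):
    # Hypothetical sample data, first indexed and then queried.
    run(build_cmd("names.db"), ["Barack Obama", "Barak Obama", "Bill Clinton"])
    print(run(query_cmd("names.db"), ["Barack Obama"]))
```

Keeping the argument lists in separate functions makes it easy to inspect the exact command before running it.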
OPTIONS
This program follows the usual GNU command line syntax, with long options starting with two dashes (`--'). A summary of options is included below. For a complete description, see the Info files.
- -b, --build
- build a database for strings read from STDIN
- -d, --database=DB
- specify a database file
- -u, --unicode
- use Unicode (wchar_t) for representing characters
- -n, --ngram=N
- specify the size N of the n-grams (DEFAULT=3)
- -m, --mark
- include marks for the beginnings and ends of strings
- -s, --similarity=SIM
- specify a similarity measure (DEFAULT='cosine'):
exact      exact match
dice       dice coefficient
cosine     cosine coefficient
jaccard    jaccard coefficient
overlap    overlap coefficient
- -t, --threshold=TH
- specify the threshold (DEFAULT=0.7)
- -e, --echo-back
- echo back query strings to the output
- -q, --quiet
- suppress supplemental information from the output
- -p, --benchmark
- show benchmark result (retrieved strings are suppressed)
- -v, --version
- show version information and exit
- -h, --help
- show summary of options and exit
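To make the interplay of --ngram, --mark, --similarity, and --threshold more concrete, the following Python sketch computes the listed measures over sets of character n-grams using their textbook definitions. It is an assumption-laden illustration, not simstring's actual implementation: the function names, the `#` padding character used for marks, and the treatment of n-grams as plain sets (ignoring repeated n-grams) are all hypothetical choices.

```python
import math


def ngrams(s, n=3, mark=False, pad="#"):
    """Return the set of character n-grams of s.

    mark=True imitates the idea behind --mark by padding both ends with
    n-1 copies of a pad character (the character itself is an assumption).
    """
    if mark:
        s = pad * (n - 1) + s + pad * (n - 1)
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similarity(x, y, measure="cosine"):
    """Textbook set-based similarity between two n-gram sets."""
    if measure == "exact":
        return 1.0 if x == y else 0.0
    if not x or not y:
        return 0.0
    common = len(x & y)
    if measure == "dice":
        return 2 * common / (len(x) + len(y))
    if measure == "cosine":
        return common / math.sqrt(len(x) * len(y))
    if measure == "jaccard":
        return common / len(x | y)
    if measure == "overlap":
        return common / min(len(x), len(y))
    raise ValueError("unknown measure: " + measure)


def matches(query, candidate, measure="cosine", threshold=0.7, n=3, mark=False):
    """True if the candidate's similarity to the query reaches the threshold."""
    score = similarity(ngrams(query, n, mark), ngrams(candidate, n, mark), measure)
    return score >= threshold
```

Under these definitions, "apple" and "apply" share two of their three trigrams, so the cosine coefficient is 2/3 (about 0.67) and the pair falls just below the default threshold of 0.7, while a threshold of 0.6 would accept it.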
SEE ALSO
/usr/share/doc/simstring-dev/examples
January 26, 2015 |