
SIMSTRING(1) General Commands Manual SIMSTRING(1)

NAME

simstring - build database and find similar words

SYNOPSIS

simstring [OPTIONS]

DESCRIPTION

This utility reads query strings from STDIN and, for each query, finds the strings in the database (DB) whose similarity to the query, under the similarity measure (SIM), is no smaller than the threshold (TH). When the -b (--build) option is specified, this utility instead builds a database (DB) from strings read from STDIN.
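The search described above can be pictured as a brute-force filter over the database. The following is a minimal Python sketch, not SimString's implementation: it assumes character n-gram sets with the default n-gram unit (3), the default measure (cosine) and the default threshold (0.7) from the options list, and simply scans every stored string. The functions ngrams, cosine and search are hypothetical names used only for illustration.

```python
import math


def ngrams(s: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of s (repeated n-grams collapse into one)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def cosine(x: set[str], y: set[str]) -> float:
    """Cosine coefficient |X & Y| / sqrt(|X| * |Y|); 0.0 if either set is empty."""
    if not x or not y:
        return 0.0
    return len(x & y) / math.sqrt(len(x) * len(y))


def search(db: list[str], query: str, n: int = 3, threshold: float = 0.7) -> list[str]:
    """Linear scan: keep every DB string whose similarity to the query is >= threshold."""
    q = ngrams(query, n)
    return [s for s in db if cosine(q, ngrams(s, n)) >= threshold]


if __name__ == "__main__":
    db = ["hello", "help", "yellow"]
    print(search(db, "hello"))                 # ['hello']
    print(search(db, "hello", threshold=0.5))  # ['hello', 'yellow']
```

Lowering the threshold admits "yellow" (cosine of about 0.577 with "hello") but still rejects "help" (about 0.408). A linear scan like this grows with the size of the database, which is the cost an indexed search tool is meant to avoid.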

OPTIONS

This program follows the usual GNU command line syntax, with long options starting with two dashes (`--'). A summary of options is included below. For a complete description, see the Info files.

build a database for strings read from STDIN
specify a database file
use Unicode (wchar_t) for representing characters
specify the unit of n-grams (DEFAULT=3)
include marks for begins and ends of strings
specify a similarity measure (DEFAULT='cosine'):

exact    exact match
dice     Dice coefficient
cosine   cosine coefficient
jaccard  Jaccard coefficient
overlap  overlap coefficient
specify the threshold (DEFAULT=0.7)
echo back query strings to the output
suppress supplemental information from the output
show benchmark result (retrieved strings are suppressed)
show version information and exit
show a summary of options and exit

SEE ALSO

/usr/share/doc/simstring-dev/examples

January 26, 2015