NAME
trietool-0.2 - trie manipulation tool
SYNOPSIS
trietool-0.2 [ options ] trie command arg ...
DESCRIPTION
trietool-0.2 is a command-line tool for manipulating double-array trie
data. It can be used to query, add, and remove words in a trie.
The Trie
The trie argument specifies the name of the trie to manipulate. A trie is
stored in a file with the `.tri' extension. To create a new trie, however, one
must first prepare a file with the `.abm' extension that describes the Unicode
ranges making up the alphabet set of the trie. The ABM defines a set of ranges
that map Unicode characters onto a continuous range of integers. The mapped
integers are used as the internal alphabet of the trie. This mapping can
improve space allocation within the trie data even when the character set in
use is not contiguous, because the mapped range always is.
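The idea of mapping scattered Unicode ranges onto a continuous range of integers can be illustrated with a small Python sketch. This is only an illustration of the concept: the function name build_alphabet_map is hypothetical, and the index values it assigns are an assumption, not necessarily the values libdatrie uses internally.

```python
def build_alphabet_map(ranges):
    """Map every code point in the inclusive ranges to a dense index.

    Indices are assigned in the order the ranges are given, starting at 0.
    The starting value is an assumption made for this illustration only.
    """
    mapping = {}
    next_index = 0
    for start, end in ranges:
        for code_point in range(start, end + 1):
            if code_point not in mapping:  # tolerate overlapping ranges
                mapping[code_point] = next_index
                next_index += 1
    return mapping


# Two non-adjacent ranges: A-Z and a-z.
ranges = [(0x0041, 0x005A), (0x0061, 0x007A)]
alphabet = build_alphabet_map(ranges)
```

Although the code points for `Z' (0x5a) and `a' (0x61) are not adjacent, their mapped indices are, so a table indexed by the mapped values has no unused gap between the two ranges.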
The ABM file is a plain text file. Each line lists a range of 32-bit
Unicode code points to be added to the alphabet set, in the format:
[0xSSSS,0xTTTT]
where `0xSSSS' and `0xTTTT' are the hexadecimal values of the starting and
ending character codes of the range, respectively.
For example, for a dictionary that contains only English words without any
punctuation, one may prepare `trie.abm' as:
[0x0041,0x005a]
[0x0061,0x007a]
The first line lists the ASCII codes for A-Z, and the second for a-z.
The alphabet set of a trie may contain no more than 255 characters.
The created `.tri' file incorporates the ABM data, so the `.abm' file is
needed only when the trie is first created. After that it is ignored.
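Before creating a trie, it can be useful to check an ABM file against the format and the 255-character limit described above. The following Python sketch does this for the example `trie.abm' contents. The names parse_abm and alphabet_size are hypothetical, and the tolerance for surrounding whitespace and blank lines is an assumption of this sketch, not a statement about how trietool itself parses the file.

```python
import re

# One range per line: [0xSSSS,0xTTTT] with hexadecimal bounds.
RANGE_RE = re.compile(r"^\[\s*0x([0-9a-fA-F]+)\s*,\s*0x([0-9a-fA-F]+)\s*\]$")

MAX_ALPHABET_SIZE = 255  # limit stated in this manual page


def parse_abm(text):
    """Return the inclusive (start, end) ranges listed in ABM text."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        match = RANGE_RE.match(line)
        if match is None:
            raise ValueError(f"not a range line: {line!r}")
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        if start > end:
            raise ValueError(f"range start exceeds end: {line!r}")
        ranges.append((start, end))
    return ranges


def alphabet_size(ranges):
    """Count the distinct code points covered by the ranges."""
    return len({cp for start, end in ranges for cp in range(start, end + 1)})


abm_text = "[0x0041,0x005a]\n[0x0061,0x007a]\n"
ranges = parse_abm(abm_text)
size = alphabet_size(ranges)
within_limit = size <= MAX_ALPHABET_SIZE
```

For the English-only example the two ranges cover 52 characters, well under the limit.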
COMMANDS
Available commands are:
- add word data ...
- Add word to trie, associated with integer data. An arbitrary
number of word-data pairs can be given. Arguments are read two at a
time: the first is treated as word, and the second as
data.
- add-list [ options ] list-file
- Add words with associated data listed in list-file to trie. The
list-file must be a text file listing one word per line. The
associated data can follow the word on the same line, separated by a
tab (`\t') character. If the data field is omitted, the default value -1
is used instead.
- Options are available for this command:
- -e, --encoding enc
- Specify character encoding of the list-file contents, such as
`UTF-8'. If omitted, current locale codeset is assumed.
- delete word ...
- Delete word from trie. An arbitrary number of words to delete can be
given.
- delete-list [ options ] list-file
- Delete words listed in list-file from trie. The list-file
must be a text file listing one word per line.
- Options are available for this command:
- -e, --encoding enc
- Specify character encoding of the list-file contents, such as
`UTF-8'. If omitted, current locale codeset is assumed.
- query word
- Search for word in trie. If word exists, its associated data
is printed to standard output. Otherwise, an error message is printed to
standard error, and nothing is printed to standard output.
- list
- List all words in trie to standard output. The output lists one word-data
pair per line, separated by a tab (`\t') character. This format is suitable
for use as the list-file of the add-list command.
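The list-file format shared by the add-list and list commands (one word per line, with optional tab-separated integer data) can be produced and read back with a few lines of Python. This is a minimal sketch: the names format_list_file and parse_list_file are hypothetical, and treating an empty data field like an omitted one is an assumption of the sketch.

```python
def format_list_file(entries):
    """Render (word, data) pairs as list-file text.

    A data value of None omits the data field for that word.
    """
    lines = []
    for word, data in entries:
        if data is None:
            lines.append(word)
        else:
            lines.append(f"{word}\t{data}")
    return "\n".join(lines) + "\n"


def parse_list_file(text, default=-1):
    """Read list-file text back into (word, data) pairs.

    Words without a data field get the default value, mirroring the
    default of -1 that this manual page describes for add-list.
    """
    pairs = []
    for line in text.splitlines():
        if not line:
            continue
        word, _sep, data = line.partition("\t")
        value = int(data) if data.strip() else default
        pairs.append((word, value))
    return pairs


list_text = format_list_file([("apple", 1), ("pear", None)])
pairs = parse_list_file(list_text)
```

When the text is written to disk, its encoding should match the one given with the -e option, or the current locale codeset if the option is omitted.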
OPTIONS
This program follows the usual GNU command line syntax, with long options
starting with two dashes (`--'). A summary of options is included below.
- -p, --path dir
- Set trie directory to dir [default=`.']
- -h, --help
- Show summary of options.
- -V, --version
- Show version of program.
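When trietool-0.2 is driven from a script, the argument order in the SYNOPSIS matters: global options come first, then the trie name, then the command and its own arguments. The Python sketch below only assembles such an argument list and does not run the program. The helper name trietool_argv and the trie name, directory, and list-file path used in the example are hypothetical.

```python
def trietool_argv(trie, command, *args, path=None, program="trietool-0.2"):
    """Build an argument list following the SYNOPSIS:

        trietool-0.2 [ options ] trie command arg ...
    """
    argv = [program]
    if path is not None:
        argv += ["-p", path]  # global option: trie directory
    argv += [trie, command, *args]
    return argv


# Hypothetical example: add words from words.txt (UTF-8) to the trie
# named "words" stored under /tmp/tries.
argv = trietool_argv(
    "words", "add-list", "-e", "UTF-8", "words.txt", path="/tmp/tries"
)
```

The resulting list can be handed to subprocess.run from the Python standard library. Passing a list rather than a single shell string avoids quoting problems with words that contain spaces or shell metacharacters.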
AUTHOR
libdatrie was written by Theppitak Karoonboonyanan.
This manual page was written by Theppitak Karoonboonyanan
<thep@linux.thai.net>.