.TH frog 1 "2017 may 1"
.SH NAME
frog \- Dutch Natural Language Toolkit
.SH SYNOPSIS
frog [options]
.br
frog \-t test\-file
.SH DESCRIPTION
frog is an integration of memory\(hybased natural language processing (NLP) modules developed for Dutch.
frog's current version will tokenize, tag, lemmatize, and morphologically segment word tokens in Dutch text files, add IOB chunks, and assign a dependency graph to each sentence.
.SH OPTIONS
.BR \-c " <file>"
.RS
set the configuration using 'file'
.RE
.BR \-\-debug =<module><level>,...
.RS
set the debug level per module: Tokenizer (t), Lemmatizer (l), Morphological Analyzer (a), Chunker (c), Multi\(hyWord Units (m), Named Entity Recognition (n), or Parser (p).
For example, \-\-debug=l5,n3 sets the level for the Lemmatizer to 5 and for the NER to 3.
.RE
.BR \-d " <level>"
.RS
set the global debug level (for all modules).
.RE
.BR \-\-deep\-morph
.RS
generate a deep morphological analysis and add it to the XML. This also includes compound information. The default tabbed output is also more detailed in the Morpheme field.
.RE
.BR \-e " <encoding>"
.RS
set the input encoding (default UTF8).
.RE
.BR \-h " or " \-\-help
.RS
give some help
.RE
.BR \-\-keep\-parser\-files =[yes|no]
.RS
keep the intermediate files from the parser. Last sentence only!
.RE
.BR \-\-language "='comma separated list of languages'"
.RS
Set the languages to work on. This parameter is also passed to the tokenizer. The strings are assumed to be ISO 639\-2 codes. The first language in the list will be the default; unspecified languages are assumed to be that default.
For example, \-\-language=nld,eng,por means: detect Dutch, English and Portuguese, with Dutch being the default.
.RE
.BR \-n
.RS
assume the input file holds one sentence per line. Very useful when running interactively; otherwise an empty line is needed to signal the end of input.
.RE
.BR \-\-nostdout
.RS
suppress the columned output to stdout
(when no output file is specified with \-o or \-\-outputdir).
Especially useful when XML output is specified with \-X or \-\-xmldir.
.RE
.BR \-o " <file>"
.RS
send output to 'file' instead of stdout. Defaults to the name of the input file with '.out' appended.
.RE
.BR \-\-outputdir " <dir>"
.RS
send all output to 'dir' instead of stdout. Creates file names from the input file name(s) with '.out' appended.
.RE
.BR \-\-retry
.RS
assume a re\-run on the same input file(s). Frog will only process those files that haven't been processed yet. This is accomplished by looking at the output file names, so this has no effect if none of \-o, \-\-outputdir, \-X or \-\-xmldir is used.
.RE
.BR \-\-skip =[aclmnpt]
.RS
skip parts of the process: Tokenizer (t), Chunker (c), Lemmatizer (l), Morphological Analyzer (a), Multi\(hyWord Units (m), Named\(hyEntity Recognizer (n) or Parser (p).
.RE
.BR \-Q
.RS
Enable quote detection in the tokenizer. May wreak havoc!
.RE
.BR \-S " <port>"
.RS
Run a server on 'port'.
.RE
.BR \-t " <file>"
.RS
process 'file'. When \-t is omitted, Frog will run in interactive mode.
.RE
.BR \-x " <xmlfile>"
.RS
process 'xmlfile', which is supposed to be in FoLiA format! If 'xmlfile' is empty, and
.BR \-\-testdir =<dir>
is provided, all '.xml' files in 'dir' will be processed as FoLiA XML.
.RE
.BR \-\-textclass "=<cls>"
.RS
When
.B \-x
is given, use 'cls' to find AND store text in the FoLiA document(s). Using \-\-inputclass and \-\-outputclass is in general a better choice.
.RE
.BR \-\-inputclass "=<cls>"
.RS
use 'cls' to find text in the FoLiA input document(s).
.RE
.BR \-\-outputclass "=<cls>"
.RS
use 'cls' to output text in the FoLiA input document(s). Preferably this is a different class than the inputclass.
.RE
.BR \-\-testdir =<dir>
.RS
process all files in 'dir'. When the input mode is XML, only '.xml' files are taken from 'dir'. See also
.BR \-\-outputdir .
.RE
.BR \-\-tmpdir =<dir>
.RS
location to store intermediate files. Default /tmp.
.RE
.BR \-\-uttmarker =<mark>
.RS
assume all utterances are separated by 'mark'
(the default is none).
.RE
.BR \-\-threads =<n>
.RS
use a maximum of 'n' threads. By default, as many threads as needed are used. In server mode we always run one thread per session.
.RE
.BR \-V " or " \-\-version
.RS
show version info
.RE
.BR \-\-xmldir =<dir>
.RS
generate FoLiA XML output and send it to 'dir'. Creates file names from the input file name with '.xml' appended (except when it already ends with '.xml').
.RE
.BR \-X " <file>"
.RS
generate FoLiA XML output and send it to 'file'. Defaults to the name of the input file(s) with '.xml' appended (except when it already ends with '.xml').
.RE
.BR \-\-id "=<id>"
.RS
When FoLiA output is requested with
.BR \-X ,
use 'id' as the ID of the document.
.RE
.SH BUGS
likely
.SH AUTHORS
Maarten van Gompel
.br
Ko van der Sloot
.br
Antal van den Bosch
.PP
e\-mail: lamasoftware@science.ru.nl
.SH SEE ALSO
.BR ucto (1)
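.SH EXAMPLES
.\" Illustrative only: every file, directory, FoLiA class and language choice below is a hypothetical placeholder.
The invocations below are illustrative sketches built only from the options described in OPTIONS; they are not taken from the Frog distribution, and all file, directory and FoLiA class names in them are hypothetical placeholders.
.PP
Process a plain text file that holds one sentence per line and write the columned output to a file:
.PP
.RS
.nf
frog \-n \-t sentences.txt \-o sentences.out
.fi
.RE
.PP
Process a FoLiA document, reading text of one class and storing the results under another class (both class names are assumptions), and write the resulting FoLiA XML to a new file:
.PP
.RS
.nf
frog \-x document.xml \-\-inputclass=OCR \-\-outputclass=current \-X result.xml
.fi
.RE
.PP
Process every file in a directory, write FoLiA XML for each file into another directory without the columned output on stdout and, when the same command is run again, skip files whose output already exists:
.PP
.RS
.nf
frog \-\-testdir=corpus \-\-xmldir=frogged \-\-nostdout \-\-retry
.fi
.RE
.PP
Detect Dutch, English and Portuguese text, with Dutch as the default, and skip the parser:
.PP
.RS
.nf
frog \-\-language=nld,eng,por \-\-skip=p \-t mixed.txt
.fi
.RE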