.TH lt-proc 1 2006-03-23 "" ""
.SH NAME
lt-proc \- This application is part of the lexical processing modules
and tools (\fBlttoolbox\fR)
.PP
This tool is part of the Apertium machine translation
architecture: \fBhttp://www.apertium.org\fR.
.SH SYNOPSIS
.B lt-proc
[
.B \-a \fR|
.B \-c \fR|
.B \-g \fR|
.B \-n \fR|
.B \-p \fR|
.B \-s \fR|
.B \-t \fR|
.B \-v \fR|
.B \-h
]
fst_file [input_file [output_file]]
.PP
.B lt-proc
[
.B \-\-analysis \fR|
.B \-\-case-sensitive \fR|
.B \-\-generation \fR|
.B \-\-non-marked-gen \fR|
.B \-\-post-generation \fR|
.B \-\-sao \fR|
.B \-\-transliteration \fR|
.B \-\-version \fR|
.B \-\-help
]
fst_file [input_file [output_file]]
.SH DESCRIPTION
.B lt-proc
is the application responsible for providing the four lexical
processing functionalities:
.RS
\(bu \fImorphological analyser\fR (option \fB\-a\fR)
.PP
\(bu \fIlexical transfer\fR (option \fB\-n\fR)
.PP
\(bu \fImorphological generator\fR (option \fB\-g\fR)
.PP
\(bu \fIpost-generator\fR (option \fB\-p\fR)
.RE
.PP
It accomplishes these tasks by reading binary files containing a
compact and efficient representation of dictionaries (a class of
finite-state transducers called augmented letter transducers). These
files are generated by \fBlt\-comp\fR(1).
.PP
Note that some characters (`\fB[\fR', `\fB]\fR', `\fB$\fR', `\fB^\fR',
`\fB/\fR', `\fB+\fR') are \fIspecial\fR characters used for format and
encapsulation, and must be escaped when they are to be used literally.
For instance, text enclosed in `\fB[\fR'...`\fB]\fR' is ignored, and a
lexical unit is delimited as `\fB^\fR...\fB$\fR'.
.SH OPTIONS
.TP
.B \-a, \-\-analysis
Tokenizes the text into surface forms (lexical units as they appear in
texts) and delivers, for each surface form, one or more lexical forms
consisting of lemma, lexical category and morphological inflection
information. Tokenization is not straightforward because of
contractions on the one hand and multi-word lexical units on the
other.
For contractions, the system reads in a single surface form and
delivers the corresponding sequence of lexical forms. Multi-word
surface forms are analysed in a left-to-right, longest-match fashion.
They may be invariable (such as a multi-word preposition or
conjunction) or inflected (for example, in Spanish,
\fI"echaban de menos"\fR, \(dqthey missed\(dq, is a form of the
imperfect indicative tense of the verb \fI"echar de menos"\fR,
\(dqto miss\(dq). Limited support for some kinds of discontinuous
multi-word units is also available. The analysis of single-word
surface forms produces output such as
\fI"cantar"\fR \-> `\fI^cantar/cantar$\fR' or
\fI"daba"\fR \-> `\fI^daba/dar/dar$\fR'.
.TP
.B \-c, \-\-case-sensitive
Use the literal case of the incoming characters.
.TP
.B \-g, \-\-generation
Delivers a target-language surface form for each target-language
lexical form, by suitably inflecting it.
.TP
.B \-n, \-\-non-marked-gen
Morphological generation (like \fB\-g\fR) but without unknown word
marks (asterisk `*').
.TP
.B \-p, \-\-post-generation
Performs orthographical operations such as contractions and
apostrophations. The post-generator is usually \fIdormant\fR (it just
copies the input to the output) until a special \fIalarm\fR symbol
contained in some target-language surface forms \fIwakes\fR it up to
perform a particular string transformation if necessary; then it goes
back to sleep.
.TP
.B \-s, \-\-sao
Processes input in the \fIorthoepikon\fR (previously `\fIsao\fR')
annotation system format: \fBhttp://orthoepikon.sf.net\fR.
.TP
.B \-t, \-\-transliteration
Apply a transliteration dictionary.
.TP
.B \-v, \-\-version
Display the version number.
.TP
.B \-h, \-\-help
Display this help.
.SH FILES
.TP
.B fst_file
The compiled dictionary, as generated by \fBlt\-comp\fR(1).
.TP
.B input_file
The text to process. If omitted, standard input is read.
.TP
.B output_file
The file to write the result to. If omitted, standard output is used.
.SH SEE ALSO
.I lt-expand\fR(1),
.I lt-comp\fR(1),
.I apertium-tagger\fR(1),
.I apertium-translator\fR(1).
.SH BUGS
Lots of...lurking in the dark and waiting for you!
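.SH EXAMPLES
The following invocations are a minimal sketch based on the synopsis
above, not taken from any particular language pair. The file names
\fIes.automorf.bin\fR, \fIes.autogen.bin\fR, \fIinput.txt\fR and
\fIoutput.txt\fR are hypothetical; substitute dictionaries compiled
with \fBlt\-comp\fR(1) and your own text files.
.PP
Analyse a surface form read from standard input and write the result
to standard output:
.PP
.RS
.nf
echo "cantar" | lt\-proc \-a es.automorf.bin
.fi
.RE
.PP
Generate surface forms from the lexical forms stored in one file and
write them to another:
.PP
.RS
.nf
lt\-proc \-g es.autogen.bin input.txt output.txt
.fi
.RE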
.SH AUTHOR
(c) 2005, 2006 Universitat d'Alacant / Universidad de Alicante. All
rights reserved.