.Dd March 23, 2006
.Dt LT-PROC 1
.Os Apertium
.Sh NAME
.Nm lt-proc
.Nd lexical processor for Apertium
.Sh SYNOPSIS
.Nm lt-proc
.Op Fl a | b | o | c | d | e | g | n | p | s | t | v | h | z | w
.Op Fl W
.Op Fl N Ar N
.Op Fl L Ar N
.Op Fl i Ar icx_file
.Ar fst_file
.Op Ar input_file Op Ar output_file
.Sh DESCRIPTION
.Nm lt-proc
is the application responsible for providing the four lexical processing functionalities:
.Bl -bullet
.It
morphological analyser
.Pq option Fl a
.It
lexical transfer
.Pq option Fl b
.It
morphological generator
.Pq option Fl g
.It
post-generator
.Pq option Fl p
.El
.Pp
It accomplishes these tasks by reading binary files containing a compact and efficient representation of dictionaries (a class of finite-state transducers called augmented letter transducers).
These files are generated by
.Xr lt-comp 1 .
.Pp
The characters
.Po
.Ql \&[ ,
.Ql \&] ,
.Ql $ ,
.Ql \(a^ ,
.Ql / ,
.Ql +
.Pc
are
.Em special characters
used for format and encapsulation.
They must be escaped when they are to be used literally.
Text enclosed in
.So \&[ Sc Ns Ar ... Ns So \&] Sc
is ignored, and a lexical unit is delimited as
.So \(a^ Ns Ar ... Ns $ Sc .
.Sh OPTIONS
.Bl -tag -width Ds
.It Fl a , Fl Fl analysis
Tokenizes the text into surface forms (lexical units as they appear in texts) and delivers, for each surface form, one or more lexical forms consisting of lemma, lexical category and morphological inflection information.
Tokenization is not straightforward due to the existence, on the one hand, of contractions, and, on the other hand, of multi-word lexical units.
For contractions, the system reads in a single surface form and delivers the corresponding sequence of lexical forms.
Multi-word surface forms are analysed in a left-to-right, longest-match fashion.
Multi-word surface forms may be invariable (such as a multi-word preposition or conjunction) or inflected (for example, in Spanish,
.Dq echaban de menos ,
.Dq they missed ,
is a form of the imperfect indicative tense of the verb
.Dq echar de menos ,
.Dq to miss ) .
Limited support for some kinds of discontinuous multi-word units is also available.
The analysis of single-word surface forms produces output like the following:
.Pp
.Dq cantar
\(->
.Dq \(a^cantar/cantar$
or
.Dq daba
\(->
.Dq \(a^daba/dar/dar$ .
.It Fl b , Fl Fl bilingual
Performs lexical transfer, attaching queues of morphological symbols not specified in the dictionaries.
Like the analysis mode, it supports multiple lexical forms in the target language for a given lexical form in the source language.
It typically works with the output of
.Xr apertium-pretransfer 1 .
.It Fl o , Fl Fl surf-bilingual
Like
.Fl b ,
but takes input with surface forms from
.Xr apertium-tagger 1
.Fl p ;
if the lexical form is not found in the bilingual dictionary, it outputs the surface form of the word.
.It Fl c , Fl Fl case-sensitive
Use the literal case of the incoming characters.
.It Fl d , Fl Fl debugged-gen
Morphological generation with all the stuff, for debugging.
.It Fl e , Fl Fl decompose-compounds
Try to treat unknown words as compounds, and decompose them.
.It Fl w , Fl Fl dictionary-case
Use the case information contained in the lexicon instead of the surface case (only applied in analysis mode).
.It Fl g , Fl Fl generation
Delivers a target-language surface form for each target-language lexical form, by suitably inflecting it.
.It Fl n , Fl Fl non-marked-gen
Morphological generation (like
.Fl g )
but without unknown word marks (asterisk
.Ql * ) .
.It Fl l , Fl Fl tagged-gen
Morphological generation (like
.Fl g )
but retaining part-of-speech tags.
.It Fl p , Fl Fl post-generation
Performs orthographical operations such as contractions and apostrophations.
The post-generator is usually
.Em dormant
(it just copies the input to the output) until a special
.Em alarm
symbol contained in some target-language surface forms
.Em wakes it up
to perform a particular string transformation if necessary; then it goes back to sleep.
.It Fl s , Fl Fl sao
Input processing is in
.Em orthoepikon
(previously
.Em sao )
annotation system format:
.Lk https://orthoepikon.sf.net .
.It Fl t , Fl Fl transliteration
Apply a transliteration dictionary.
.It Fl i Ar icx_file , Fl Fl ignored-chars Ar icx_file
Ignores the characters specified in the file
.Ar icx_file .
.It Fl z , Fl Fl null-flush
Flush output on the null character.
.It Fl C , Fl Fl careful-case
Use the dictionary case if present, otherwise the surface case.
.It Fl N Ar N , Fl Fl analyses Ar N
Output no more than
.Ar N
analyses (if the transducer is weighted, the
.Ar N
best analyses).
.It Fl L Ar N , Fl Fl weight-classes Ar N
Output no more than
.Ar N
best weight classes (where analyses with equal weight constitute a class).
.It Fl W , Fl Fl show-weights
Print final analysis weights (if any).
.It Fl v , Fl Fl version
Display the version number.
.It Fl h , Fl Fl help
Display this help.
.El
.Sh FILES
.Bl -tag -width Ds
.It Ar fst_file
The input compiled dictionary.
.El
.Sh SEE ALSO
.Xr apertium 1 ,
.Xr apertium-tagger 1 ,
.Xr lt-comp 1 ,
.Xr lt-expand 1
.Sh COPYRIGHT
Copyright \(co 2005, 2006 Universitat d'Alacant / Universidad de Alicante.
This is free software.
You may redistribute copies of it under the terms of
.Lk https://www.gnu.org/licenses/gpl.html the GNU General Public License .
.Sh BUGS
Many... lurking in the dark and waiting for you!
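.Sh EXAMPLES
The analysis output format described under
.Fl a
can be taken apart with a few lines of code.
The following Python sketch is illustrative only and is not part of this package.
The function names
.Ql parse_lexical_unit
and
.Ql lexical_units
are hypothetical, and the sketch assumes that the stream contains no escaped special characters and no bracketed text.
.Bd -literal
```python
def parse_lexical_unit(unit):
    """Split one analysed lexical unit such as "^daba/dar/dar$".

    Returns (surface_form, [lexical_form, ...]).
    Simplifying assumption: no escaped special characters occur.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: " + repr(unit))
    # Drop the "^" and "$" delimiters, then split on "/".
    # The first field is the surface form, the rest are lexical forms.
    surface, *analyses = unit[1:-1].split("/")
    return surface, analyses


def lexical_units(stream):
    """Yield each "^...$" unit found in a string of analysis output."""
    start = stream.find("^")
    while start != -1:
        end = stream.find("$", start)
        if end == -1:
            break  # unterminated unit: stop scanning
        yield stream[start:end + 1]
        start = stream.find("^", end)


if __name__ == "__main__":
    text = "^cantar/cantar$ ^daba/dar/dar$"
    for unit in lexical_units(text):
        print(parse_lexical_unit(unit))
```
.Ed
.Pp
A real consumer of the stream would also have to honour escaped special characters and skip bracketed text, which this sketch deliberately leaves out.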