FreeBSD Manual Pages

LT-PROC(1)		    General Commands Manual		    LT-PROC(1)

NAME
       lt-proc -- lexical processor for	Apertium

SYNOPSIS
       lt-proc
	       [-a|-b|-o|-c|-d|-e|-g|-n|-p|-s|-t|-v|-h|-z|-w]
	       [-W] [-N N] [-L N] [-i icx_file] fst_file
	       [input_file [output_file]]

DESCRIPTION
       lt-proc	is  the	application responsible	for providing the four lexical
       processing functionalities:

       •   morphological analyser (option -a)

       •   lexical transfer (option -b)

       •   morphological generator (option -g)

       •   post-generator (option -p)

       It accomplishes these tasks by reading binary files containing  a  com-
       pact  and  efficient representation of dictionaries (a class of finite-
       state transducers called	augmented letter  transducers).	  These	 files
       are generated by	lt-comp(1).

       Note that some characters (`[', `]', `$', `^', `/', `+') are special:
       they are used for formatting and encapsulation, and should be escaped
       if they have to be used literally.  For instance, text enclosed in
       `['...`]' is ignored, and each lexical unit is delimited as `^...$'.
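
       As a rough illustration of escaping, the following Python sketch
       prefixes each special character listed above with an escape character.
       It is not part of lt-proc; the helper name escape_reserved is
       hypothetical, and the use of a backslash as the escape character is an
       assumption that this page does not state.

```python
# Illustrative helper only; not part of lt-proc.
# Assumption: a backslash is the escape character (not stated on this page).
RESERVED = set("[]$^/+")  # the special characters listed above


def escape_reserved(text: str) -> str:
    """Prefix each special character, and the backslash itself, with a backslash."""
    out = []
    for ch in text:
        if ch in RESERVED or ch == "\\":
            out.append("\\")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_reserved("and/or [sic]"))
```

       Text prepared this way can then be passed through the pipeline without
       its brackets or slashes being read as format markers.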

OPTIONS
       -a, --analysis
	       Tokenizes the text in surface forms (lexical units as they  ap-
	       pear in texts) and delivers, for	each surface form, one or more
	       lexical forms consisting	of lemma, lexical category and morpho-
	       logical	inflection information.	 Tokenization is not straight-
	       forward due to the existence, on	the one	hand, of contractions,
	       and, on the other hand, of multi-word lexical units.  For  con-
	       tractions, the system reads in a	single surface form and	deliv-
	       ers  the	 corresponding	sequence of lexical forms.  Multi-word
	       surface forms are analysed in  a	 left-to-right,	 longest-match
	       fashion.	 Multi-word surface forms may be invariable (such as a
	       multi-word  preposition or conjunction) or inflected (for exam-
	       ple, in es, "echaban de menos", "they missed", is a form	of the
	       imperfect indicative tense of the verb "echar  de  menos",  "to
	       miss").	Limited	support	for some kinds of discontinuous	multi-
	       word units is also available.  Single-word surface forms	analy-
	       sis produces output like	the one	in these examples:

	       "cantar"	  ->   "^cantar/cantar<vblex><inf>$"   or   "daba"  ->
	       "^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$".
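
	       The output format shown above can be taken apart with a few
	       lines of Python.  This is only a sketch for reading such
	       output: the function name parse_lexical_unit is hypothetical,
	       and the code assumes that the unit contains no escaped special
	       characters.

```python
import re

# Matches one morphological tag such as <vblex> and captures its name.
TAG = re.compile(r"<([^<>]+)>")


def parse_lexical_unit(unit: str):
    """Split '^surface/analysis/...$' into the surface form and its analyses.

    Sketch only: escaped special characters are not handled.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit of the form ^...$")
    surface, *analyses = unit[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]  # text before the first tag
        tags = TAG.findall(analysis)       # tag names in order
        parsed.append((lemma, tags))
    return surface, parsed


if __name__ == "__main__":
    unit = "^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$"
    surface, readings = parse_lexical_unit(unit)
    for lemma, tags in readings:
        print(surface, lemma, tags)
```

	       For "daba" this yields two readings of the lemma "dar" that
	       differ only in the person tag.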

       -b, --bilingual
	       Performs lexical transfer, attaching queues of morphological
	       symbols not specified in the dictionaries.  Like the analysis
	       mode, it supports multiple target-language lexical forms for a
	       given source-language lexical form.  It typically works on the
	       output of apertium-pretransfer(1).

       -o, --surf-bilingual
	       As with -b, but takes input from	 apertium-tagger(1)  -p,  with
	       surface	forms,	and  if	 the  lexical form is not found	in the
	       bilingual dictionary, it	outputs	the surface form of the	word.

       -c, --case-sensitive
	       Use the literal case of the incoming characters

       -d, --debugged-gen
	       Morphological generation with debugging information kept in
	       the output.

       -e, --decompose-compounds
	       Try to treat unknown words as compounds,	and decompose them.

       -w, --dictionary-case
	       Use the case information	contained in the lexicon,  instead  of
	       the surface case	(only applied in analysis mode).

       -g, --generation
	       Delivers	 a  target-language  surface form for each target-lan-
	       guage lexical form, by suitably inflecting it.

       -n, --non-marked-gen
	       Morphological generation	(like -g)  but	without	 unknown  word
	       marks (asterisk `*').

       -l, --tagged-gen
	       Morphological generation	(like -g) but retaining	part-of-speech
	       tags.

       -p, --post-generation
	       Performs	 orthographical	 operations  such  as contractions and
	       apostrophations.	 The post-generator is usually	dormant	 (just
	       copies  the  input  to the output) until	a special alarm	symbol
	       contained in some target-language surface forms wakes it	up  to
	       perform	a  particular string transformation if necessary; then
	       it goes back to sleep.
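
	       The dormant/wake-up behaviour can be sketched in Python as
	       follows.  This is a toy model, not the lt-proc implementation:
	       the alarm symbol "~" and the two rewrite rules are assumptions
	       made up for illustration, whereas a real post-generator takes
	       its transformations from the compiled dictionary.

```python
# Toy model of a post-generator; not the lt-proc implementation.
ALARM = "~"                              # assumed alarm symbol
RULES = {"de el": "del", "a el": "al"}   # hypothetical toy rules


def post_generate(text: str, rules=RULES, alarm=ALARM) -> str:
    out = []
    i = 0
    while i < len(text):
        if text[i] != alarm:
            # Dormant: copy the input to the output unchanged.
            out.append(text[i])
            i += 1
            continue
        # Woken up by the alarm symbol: drop it and try the longest rule first.
        i += 1
        for source in sorted(rules, key=len, reverse=True):
            if text.startswith(source, i):
                out.append(rules[source])
                i += len(source)
                break
        # Whether or not a rule matched, go back to copying.
    return "".join(out)


if __name__ == "__main__":
    print(post_generate("voy ~a el cine"))
```

	       Input without the alarm symbol passes through untouched, which
	       mirrors the "just copies the input to the output" behaviour
	       described above.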

       -s, --sao
	       Input processing	is in orthoepikon (previously sao)  annotation
	       system format: https://orthoepikon.sf.net.

       -t, --transliteration
	       Apply a transliteration dictionary

       -i icx_file, --ignored-chars icx_file
	       Ignores characters specified in the file	icx_file

       -z, --null-flush
	       Flush output on the null	character

       -C, --careful-case
	       Use dictionary case if present, else surface

       -N N, --analyses N
	       Output  no more than N analyses (if the transducer is weighted,
	       the N best analyses)

       -L N, --weight-classes N
	       Output no more than N best weight classes (where	analyses  with
	       equal weight constitute a class)
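
	       The difference between limiting analyses and limiting weight
	       classes can be sketched in Python.  This is an illustration
	       only: the function names, the example weights and the third
	       reading are hypothetical, and the sketch assumes that a lower
	       weight means a better analysis, which this page does not state.

```python
from itertools import groupby


def n_best_analyses(analyses, n):
    """Keep at most n analyses, best first.

    analyses is a list of (analysis, weight) pairs.
    Assumption: a lower weight is better.
    """
    return sorted(analyses, key=lambda a: a[1])[:n]


def n_best_weight_classes(analyses, n):
    """Keep every analysis that falls into one of the n best weight classes.

    Analyses with equal weight form one class.
    """
    ranked = sorted(analyses, key=lambda a: a[1])
    kept = []
    for index, (_, group) in enumerate(groupby(ranked, key=lambda a: a[1])):
        if index >= n:
            break
        kept.extend(group)
    return kept


if __name__ == "__main__":
    # Hypothetical weights; the third reading is invented for the example.
    readings = [
        ("dar<vblex><pii><p1><sg>", 1.0),
        ("dar<vblex><pii><p3><sg>", 1.0),
        ("hypothetical-reading", 2.5),
    ]
    print(n_best_analyses(readings, 1))
    print(n_best_weight_classes(readings, 1))
```

	       With a limit of 1, the first function keeps a single analysis,
	       while the second keeps both analyses that share the best
	       weight.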

       -W, --show-weights
	       Print final analysis weights (if	any)

       -v, --version
	       Display the version number.

       -h, --help
	       Display this help.

FILES
       fst_file
	       The compiled dictionary, as generated by lt-comp(1).

SEE ALSO
       apertium(1), apertium-tagger(1),	lt-comp(1), lt-expand(1)

COPYRIGHT
       Copyright  (C)  2005,  2006 Universitat d'Alacant / Universidad de Ali-
       cante.  This is free software.  You may redistribute copies of it under
       the    terms	of     the     GNU     General	   Public     License:
       https://www.gnu.org/licenses/gpl.html.

BUGS
       Many... lurking in the dark and waiting for you!

Apertium			March 23, 2006			    LT-PROC(1)