Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
GENDICT(1)			ICU 76.1 Manual			    GENDICT(1)

NAME
       gendict - Compiles word list into ICU string trie dictionary

SYNOPSIS
       gendict [ --uchars | --bytes --transform	transform ] [ -h, -?, --help ]
       [  -V,  --version ] [ -c, --copyright ] [ -v, --verbose ] [ -i, --icud-
       atadir directory	]  input-file  output-file

DESCRIPTION
       gendict reads the word list from	dictionary-file	and creates  a	string
       trie dictionary file. Normally this data	file has the .dict extension.

       Words  begin at the beginning of	a line and are terminated by the first
       whitespace.  Lines that begin with whitespace are ignored.

OPTIONS
       -h, -?, --help
	      Print help about usage and exit.

       -V, --version
	      Print the	version	of gendict and exit.

       -c, --copyright
	      Embeds the standard ICU copyright	into the output-file.

       -v, --verbose
	      Display extra informative	messages during	execution.

       -i, --icudatadir	directory
	      Look for any necessary ICU data files in directory.   For	 exam-
	      ple,  the	file pnames.icu	must be	located	when ICU's data	is not
	      built as a shared	library.  The default ICU  data	 directory  is
	      specified	by the environment variable ICU_DATA.  Most configura-
	      tions of ICU do not require this argument.

       --uchars
	      Set  the	output	trie  type  to	UChar. Mutually	exclusive with
	      --bytes.

       --bytes
	      Set the output trie  type	 to  Bytes.  Mutually  exclusive  with
	      --uchars.

       --transform
	      Set  the	transform type.	Should only be specified with --bytes.
	      Currently	supported transforms are:  offset-<hex-number>,	 which
	      specifies	 an  offset to subtract	from all input characters.  It
	      should be	noted that the offset transform	also  maps  U+200D  to
	      0xFF and U+200C to 0xFE, in order	to offer compatibility to lan-
	      guages that require these	characters.  A transform must be spec-
	      ified  for a bytes trie, and when	applied	to the non-value char-
	      acters in	the input-file must produce output  between  0x00  and
	      0xFF.

	input-file
	      The source file to read.

	output-file
	      The file to write	the output dictionary to.

CAVEATS
       The  input-file is assumed to be	encoded	in UTF-8.  The integers	in the
       input-file that are used	as values must be made	up  of	ASCII  digits.
       They  may be specified either in	hex, by	using a	0x prefix, or in deci-
       mal.  Either --bytes or --uchars	must be	specified.

ENVIRONMENT
       ICU_DATA	 Specifies the directory  containing  ICU  data.  Defaults  to
		 ${prefix}/share/icu/76.1/.   Some  tools in ICU depend	on the
		 presence of the trailing slash. It is thus important to  make
		 sure that it is present if ICU_DATA is	set.

AUTHORS
       Maxime Serrano

VERSION
       1.0

COPYRIGHT
       Copyright (C) 2012 International	Business Machines Corporation and oth-
       ers

SEE ALSO
       http://www.icu-project.org/userguide/boundaryAnalysis.html

ICU MANPAGE			  1 June 2012			    GENDICT(1)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=gendict&sektion=1&manpath=FreeBSD+Ports+14.3.quarterly>

home | help