DICTFMT(1)							    DICTFMT(1)

NAME
       dictfmt - formats a DICT	protocol dictionary database

SYNOPSIS
       dictfmt	-c5|-t|-e|-f|-h|-j|-p [options]	 basename
       dictfmt	-i|-I [options]

DESCRIPTION
       dictfmt takes a file, FILE, on stdin, and creates a dictionary database
       named basename.dict that conforms to the DICT protocol.  It also
       creates an index file named basename.index.  By default, the index is
       sorted according to the C locale, and only alphanumeric characters and
       spaces are used in sorting; however, this may be changed with the
       --locale and --allchars options.  (basename is commonly chosen to
       correspond to the basename of FILE, but this is not mandatory.)

       Unless the database is extremely small, it is highly recommended that
       basename.dict be compressed with /usr/bin/dictzip to create
       basename.dict.dz.  (dictzip is included in the dictd source package.)

       FILE may	be in any of the several formats described by the  format  op-
       tions  -c5, -t, -e, -f, -h, -j, -p, -i or -I.  Exactly one of these op-
       tions must be given.

       dictfmt prepends several headers to the .dict file.  The
       00-database-url header gives the value of the -u option as the URL of
       the site from which the original database was obtained.  The
       00-database-short header gives the value of the -s option as the short
       name of the dictionary.  (This "short name" is the identifying name
       given by the "dict -D" option.)  If the -u and/or -s options are
       omitted, these values will be shown as "unknown", which is undesirable
       for a publicly distributed database.

       The  date  of  conversion (formatting) is given in the 00-database-info
       header.	All text in the	input file prior to the	first headword (as de-
       fined by	the appropriate	formatting option) is appended to this header.
       All text	in the input file following a headword,	up to the  next	 head-
       word, is	copied unchanged to the	.dict file.
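
       As a rough end-to-end sketch of the workflow described above, the
       following shell fragment builds a tiny input file in the -f layout,
       formats it with a URL and a short name, and compresses the result.
       The file name sample.txt, the basename sample, the URL and the short
       name are all hypothetical.  The dictfmt and dictzip calls run only if
       those programs are installed.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input in the -f layout: two preamble lines in column 0,
# then headwords in column 0 with tab-indented definitions.
printf 'Sample Glossary\nVersion 0.1\n' > sample.txt
printf 'apple\n\tA round fruit.\n' >> sample.txt
printf 'banana\n\tA long yellow fruit.\n' >> sample.txt

if command -v dictfmt >/dev/null 2>&1; then
    # -u and -s supply the values for the 00-database-url and
    # 00-database-short headers.
    dictfmt -f -u 'http://example.org/sample' -s 'Sample Glossary 0.1' \
        sample < sample.txt
    # Compress sample.dict to sample.dict.dz, as recommended above.
    if command -v dictzip >/dev/null 2>&1; then
        dictzip sample.dict
    fi
fi
```

       If both programs are present, the scratch directory should end up
       holding sample.index and sample.dict.dz.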

FORMATTING OPTIONS
       -c5    FILE  is	formatted  with	headwords preceded by 5	or more	under-
	      score characters (_) and a blank line.  All text until the  next
	      headword	is considered the definition.  Any leading `@' charac-
	      ters are stripped	out, but the file is otherwise unchanged. This
	      option was written to format the CIA WORLD FACTBOOK 1995.
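
	      As an illustration only, the fragment below writes a small file
	      in what appears to be the layout described above: a line of
	      underscores, a blank line, the headword, then the definition.
	      The file name factbook.txt, the basename factbook and the
	      entries themselves are made up, and the exact layout dictfmt
	      accepts is an assumption based on this description.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Each entry: a line of five or more underscores, a blank line, the headword,
# then definition text up to the next underscore line.  The leading '@' on
# one line shows the kind of character that -c5 is described as stripping.
cat > factbook.txt <<'EOF'
Hypothetical preamble text that precedes the first headword.
_____

Atlantis
@A fictional island used here only as sample data.
_____

Lilliput
Another made-up entry.
EOF

if command -v dictfmt >/dev/null 2>&1; then
    dictfmt -c5 -s 'Sample c5 database' factbook < factbook.txt
fi
```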

       -t     Implies the -c5, --without-info and --without-headword options.
	      Use this option if the input database comes from the
	      dictunformat utility.

       -e     FILE is in html  format,	with  the  headword  tagged  as	 bold.
	      (<B>headword - </B>)
	      This  option  was	 written to format EASTON'S 1897 BIBLE DICTIO-
	      NARY.  A typical entry from Easton is:

	      <A NAME="T0000005">
	      <B>Abagtha - </B>
	      one of the seven eunuchs	in  Ahasuerus's	 court	(Esther	 1:10;
	      2:21).

	      This is converted	to:
	      Abagtha
		 one  of  the seven eunuchs in Ahasuerus's court (Esther 1:10;
	      2:21).

	      The heading <A NAME="T0000005"> is omitted, and the headword
	      `Abagtha' is indexed.

	      NOTE:  This option should	be used	with caution.  It removes sev-
	      eral html	tags (enough to	format Easton properly), but not  all.
	      The  Makefile  that was originally written to format dict-easton
	      uses sed scripts to modify certain cross reference tags.	It may
	      be necessary to pipe the input file through  a  sed  script,  or
	      hack  the	 source	 of  dictfmt in	order to properly format other
	      html databases.

       -f     FILE is formatted with the headwords starting in column 0, with
	      the definition indented at least one space (or tab character) on
	      subsequent lines.  The third line starting in column 0 is taken
	      as the first headword, and the first two lines starting in
	      column 0 are treated as part of the 00-database-info header.
	      This option was written to format the F.O.L.D.O.C.

       -h     FILE is formatted with the headwords starting in column 0,
	      followed by a comma, with the definition continuing on the same
	      line.  All text before the first single-character line is
	      included in the 00-database-info header, and lines with only one
	      character are omitted from the .dict file.  The first headword
	      is on the line following the first single-character line.  The
	      headword is indexed; the text of the file is not changed.  This
	      option was written to format HITCHCOCK'S BIBLE NAMES DICTIONARY.

       -j     FILE is formatted with headwords starting in column 0, enclosed
	      in colons, followed by the definition.  The colons surrounding
	      the headword are removed, and the headword is indexed.  Lines
	      beginning with '*', '=', or '-' are also removed.  All text
	      before the first headword is included in the headers.  This
	      option was written to format the JARGON FILE.

	      NOTE: Some recent versions of the JARGON FILE had three blanks
	      inserted before the first colon at each headword.  These must be
	      removed before processing with dictfmt.  (sed scripts have been
	      used for this purpose; ed, awk, or perl scripts are also
	      possible.)
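
	      One possible sed invocation for that clean-up step is sketched
	      below.  The input lines, the file names jargon-raw.txt and
	      jargon.txt, and the basename jargon are hypothetical.  The
	      pattern assumes the three blanks sit at the very start of the
	      headword line.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical excerpt with three blanks in front of the headword's colon.
printf '   :foo: n.\n   A sample definition line.\n' > jargon-raw.txt

# Drop exactly three leading blanks when they are followed by a colon;
# indented definition lines are left untouched.
sed 's/^   :/:/' jargon-raw.txt > jargon.txt

if command -v dictfmt >/dev/null 2>&1; then
    dictfmt -j -s 'Sample jargon excerpt' jargon < jargon.txt
fi
```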

       -p     FILE is formatted with `%h' in column 0, followed by a blank,
	      followed by the headword, optionally followed by a line
	      containing `%d' in column 0.  The definition starts on the
	      following line.  The first line beginning `%h' and any lines
	      beginning `%d' are stripped from the .dict file, and `%h ' is
	      stripped from the front of the headword.  All text before the
	      first headword is included in the headers.  The second line
	      beginning `%h' is taken as the first headword.  This option was
	      written to format Jay Kominek's elements database.

       -i -I  These two options differ from all other formatting options.
	      They re-sort an .index file given on stdin according to dictd's
	      requirements.  No .dict file is generated; only the re-sorting
	      is performed.  The input is expected to look like an .index
	      file with three or four columns.  -i expects the offset and
	      length in decimal, while -I expects them in base64 format.
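
	      The difference between the two input flavours can be sketched
	      as follows.  Everything here is illustrative: the file names
	      and entries are made up, to_b64 is a hypothetical helper, and
	      the encoding it implements (a number written in base 64 with
	      the digits A-Z, a-z, 0-9, + and /, most significant digit
	      first, no padding) is an assumption about dictd-style indexes
	      that this page does not spell out.  Writing the re-sorted
	      index to standard output is likewise assumed.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical three-column index: headword, decimal offset, decimal length,
# separated by tabs and deliberately out of order.
printf 'banana\t1234\t56\napple\t0\t64\n' > unsorted.index

# to_b64 (hypothetical helper): print a non-negative integer as base-64
# digits, most significant digit first, without padding.
to_b64() {
    awk -v n="$1" 'BEGIN {
        digits = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
        out = ""
        do {
            out = substr(digits, n % 64 + 1, 1) out
            n = int(n / 64)
        } while (n > 0)
        print out
    }'
}

# The same index with the offset and length columns converted, which is the
# form -I would expect under the assumption stated above.
while IFS="$(printf '\t')" read -r word offset length; do
    printf '%s\t%s\t%s\n' "$word" "$(to_b64 "$offset")" "$(to_b64 "$length")"
done < unsorted.index > unsorted-b64.index

if command -v dictfmt >/dev/null 2>&1; then
    # Assumption: the re-sorted index is written to standard output.
    dictfmt -i < unsorted.index > sorted.index
    dictfmt -I < unsorted-b64.index > sorted-b64.index
fi
```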

OPTIONS
       -u url Specifies the URL of the site from which the raw database was
	      obtained.  If this option is specified, any 00-database-url
	      headword and its definition in the input will be ignored.

       -s name
	      Specifies the name and, optionally, the version and date, of the
	      database.  (If this contains spaces, it must be quoted.)  If
	      this option is specified, any 00-database-short headword and its
	      definition in the input will be ignored.

       -L     display license and copyright information

       -V     display version information

       -D     output debugging information

       --help display a	help message

       --locale locale
	      Specifies the locale used for sorting.  If no locale is
	      specified, the "C" locale is used.  UTF-8 mode additionally
	      requires the --utf8 option.

       --8bit Generates the database in 8-bit mode; see also the --locale
	      option.  Note: This option is deprecated.  Use it only for
	      creating 8-bit (non-UTF-8) dictionaries.  To create a UTF-8
	      dictionary, use the --utf8 option instead.

       --utf8 If specified, a UTF-8 database is created.

       --allchars
	      Specifies that all characters should be used for the search.  By
	      default, only alphabetic characters, numeric characters and
	      spaces are put into the .index file and are therefore used in
	      searches.  Creates the special entry 00-database-allchars.

       --case-sensitive
	      makes  the  search  case	sensitive.   Creates the special entry
	      00-database-case-sensitive.

       --headword-separator sep
	      sets the headword separator, which allows several words to have
	      the same definition.  For example, if '--headword-separator %%%'
	      is given, and the input file contains 'autumn%%%fall', both
	      'autumn' and 'fall' will be indexed as headwords, with the same
	      definition.
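
	      A minimal sketch of that example, using the -f input layout.
	      The file name seasons.txt, the basename seasons and the
	      preamble lines are hypothetical, and dictfmt is invoked only
	      if it is installed.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# -f layout: two preamble lines, then one headword line that names two
# words joined by the separator, followed by a tab-indented definition.
printf '%s\n' 'Seasons sample' 'Hypothetical data' 'autumn%%%fall' > seasons.txt
printf '\tThe season between summer and winter.\n' >> seasons.txt

if command -v dictfmt >/dev/null 2>&1; then
    # Quote the separator so the shell passes it through unchanged.
    dictfmt -f --headword-separator '%%%' -s 'Seasons sample' \
        seasons < seasons.txt
fi
```

	      If the option behaves as described, seasons.index should then
	      list both 'autumn' and 'fall'.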

       --index-data-separator sep
	      sets the index/data separator, which allows the first and
	      fourth columns of the .index file to be set independently.  That
	      is, the first column can be treated as an index column (which
	      the MATCH command searches) and the fourth column as a result
	      column (from which MATCH takes the text it returns); the two
	      columns are completely independent of each other.  The default
	      value for this separator is the ASCII character "\034".

       --break-headwords
	      multiple headwords will be written on separate lines in the
	      .dict file.  For use with --headword-separator.

       --index-keep-orig
	      When --utf8 is specified, headwords are lowercased and
	      non-alphanumeric characters are removed from them before they
	      are saved to the .index file, in order to simplify searching.
	      When the --index-keep-orig option is used, a fourth column is
	      created (if necessary) in the .index file; it contains the
	      original headword, which is returned by the MATCH command.  This
	      option may be useful to prevent converting "AT&T" to "ATT" or to
	      keep proper nouns with an uppercased first letter.

       --without-headword
	      headwords	will not be included in	.dict file

       --without-header
	      header will not be copied	to DB info entry

       --without-url
	      URL will not be copied to	DB info	entry

       --without-time
	      time of creation will not	be copied to DB	info entry

       --without-ver
	      By default, dictfmt creates a special entry
	      00-database-dictfmt-X.Y.Z whose definition (in the .dict file)
	      holds the dictfmt version in the format dictfmt-X.Y.Z.  This
	      option suppresses that entry.

       --without-info
	      DB info entry will not be created.  This may be useful if the
	      00-database-info headword is expected from stdin (dictunformat
	      outputs it).

       --columns columns
	      By default, dictfmt wraps strings read from stdin to 72 columns.
	      This option changes that default.  If it is set to zero or a
	      negative value, wrapping is turned off.

       --default-strategy strategy
	      Sets the default search strategy for the database.  It will be
	      used instead of strategy '.'.  The special entry
	      00-database-default-strategy is created for this purpose.  This
	      option may be useful, for example, for dictionaries containing
	      mainly phrases rather than single words.  In any case, use this
	      option only if you are absolutely sure what you are doing.

       --mime-header mime_header
	      When a client sends the OPTION MIME command to dictd,
	      definitions found in this database are prefixed with the
	      specified MIME header.  Creates the special entry
	      00-database-mime-header.

CREDITS
       dictfmt	was  written  by  Rik  Faith (faith@cs.unc.edu)	as part	of the
       dict-misc package.  dictfmt is distributed under	the terms of  the  GNU
       General	Public	License.  If you need to distribute under other	terms,
       write to	the author.

AUTHOR
       This manual page was written by Robert D. Hilliard
       <hilliard@debian.org>.

SEE ALSO
       dict(1),	 dictd(8),  dictzip(1),	 dictunformat(1), http://www.dict.org,
       RFC 2229

			       25 December 2000			    DICTFMT(1)
