FreeBSD Manual Pages

DICTZIP(1)							    DICTZIP(1)

NAME
       dictzip,	dictunzip - compress (or expand) files,	allowing random	access

SYNOPSIS
       dictzip [options] name
       dictunzip [options] name

DESCRIPTION
       dictzip compresses files	using the gzip(1) algorithm (LZ77) in a	manner
       which is	completely compatible with the gzip file format.  An extension
       to the gzip file	format (Extra Field, described in 2.3.1.1 of RFC 1952)
       allows  extra  data  to	be  stored in the header of a compressed file.
       Programs	like gzip and zcat will	 ignore	 this  extra  data.   However,
       dictd(8), the DICT protocol dictionary server, will make use of this
       data to perform pseudo-random access on the file.  Files	in the dictzip
       format should end in ".dz" so that they may be distinguished from  com-
       mon gzip	files that do not contain the special header information.

       From RFC	1952, the extra	field is specified as follows:

	      If the FLG.FEXTRA	bit is set, an "extra field" is	present	in the
	      header,  with  total length XLEN bytes.  It consists of a	series
	      of subfields, each of the	form:

	      +---+---+---+---+==================================+
	      |SI1|SI2|	 LEN  |... LEN bytes of	subfield data ...|
	      +---+---+---+---+==================================+

	      SI1 and SI2 provide a subfield ID, typically two	ASCII  letters
	      with	some	 mnemonic     value.	  Jean-Loup	Gailly
	      <gzip@prep.ai.mit.edu> is	maintaining  a	registry  of  subfield
	      IDs;  please send	him any	subfield ID you	wish to	use.  Subfield
	      IDs with SI2 = 0 are reserved for	future use.

	      LEN gives	the length of the subfield data, excluding the 4  ini-
	      tial bytes.

       The  dictzip  program  uses 'R' for SI1,	and 'A'	for SI2	(i.e., "Random
       Access").  After	the LEN	field, the data	is arranged as follows:

       +---+---+---+---+---+---+===============================+
       |  VER  | CHLEN | CHCNT |  ... CHCNT words of data ...  |
       +---+---+---+---+---+---+===============================+

       As per RFC 1952, all data is stored least-significant byte first.  For
       VER 1 of the data, all values are unsigned integers 16 bits (2 bytes)
       long.
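
       The layout above can be read back with a few lines of code.  The
       following is a minimal sketch, not the dictzip implementation: the
       function names parse_ra_subfield and build_example_header are
       hypothetical, and the example header is synthetic (it is not a
       complete .dz file).  It assumes the fixed 10-byte gzip header from
       RFC 1952 followed by XLEN and the subfields.

```python
import struct

def parse_ra_subfield(header: bytes) -> dict:
    """Return VER, CHLEN, CHCNT and the chunk sizes from an 'RA' subfield.

    Hypothetical helper for illustration; expects the first bytes of a file.
    """
    # Fixed gzip header: ID1 ID2 CM FLG MTIME(4) XFL OS = 10 bytes.
    id1, id2, _cm, flg = struct.unpack_from("<BBBB", header, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip file")
    if not flg & 0x04:  # FLG.FEXTRA bit
        raise ValueError("no extra field present")
    (xlen,) = struct.unpack_from("<H", header, 10)
    pos, end = 12, 12 + xlen
    # Walk the subfields: SI1, SI2, LEN, then LEN bytes of data.
    while pos + 4 <= end:
        si1, si2, length = struct.unpack_from("<BBH", header, pos)
        pos += 4
        if (si1, si2) == (ord("R"), ord("A")):
            ver, chlen, chcnt = struct.unpack_from("<HHH", header, pos)
            sizes = struct.unpack_from(f"<{chcnt}H", header, pos + 6)
            return {"ver": ver, "chlen": chlen, "chcnt": chcnt,
                    "sizes": list(sizes)}
        pos += length
    raise ValueError("no RA subfield found")

def build_example_header(chlen: int, sizes: list) -> bytes:
    """Build a synthetic header (illustration only, no compressed data)."""
    data = struct.pack(f"<HHH{len(sizes)}H", 1, chlen, len(sizes), *sizes)
    subfield = b"RA" + struct.pack("<H", len(data)) + data
    fixed = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, 0x04, 0, 0, 3)
    return fixed + struct.pack("<H", len(subfield)) + subfield

if __name__ == "__main__":
    # The chunk length and sizes here are made-up example values.
    print(parse_ra_subfield(build_example_header(50000, [1200, 950, 30])))
```

       On a real file, the same function could be given the first 64kB or
       so read from the ".dz" file, since XLEN cannot exceed 0xffff.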

       XLEN (which is specified	earlier	in the header) is a two	byte  integer,
       so  the extra field can be 0xffff bytes long, 2 bytes of	which are used
       for the subfield ID (SI1 and SI2), and 2 bytes of which are used for
       the subfield length (LEN).  This	leaves 0xfffb bytes (0x7ffd 2-byte en-
       tries or	0x3ffe 4-byte entries).	 Given that the	zip output buffer must
       be  10%	+  12  bytes  larger than the input buffer, we can store 58969
       bytes per entry,	or about 1.8GB if the 2-byte  entries  are  used.   If
       this  becomes a limiting	factor,	another	format version can be selected
       and defined for 4-byte entries.
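
       The arithmetic in the paragraph above can be checked directly.  This
       sketch follows the text's own counting: it takes the figure of 58969
       bytes per entry as given and, like the text, does not subtract the 6
       bytes used by VER, CHLEN and CHCNT.  The variable names are chosen
       for illustration only.

```python
XLEN_MAX = 0xFFFF          # largest value of the two-byte XLEN field
SUBFIELD_HEADER = 4        # SI1 + SI2 (2 bytes) and LEN (2 bytes)
BYTES_PER_ENTRY = 58969    # figure quoted in the text above

available = XLEN_MAX - SUBFIELD_HEADER
entries_2byte = available // 2   # number of 2-byte entries that fit
entries_4byte = available // 4   # number of 4-byte entries that fit
max_bytes = entries_2byte * BYTES_PER_ENTRY

print(hex(available), hex(entries_2byte), hex(entries_4byte))
print(max_bytes, "bytes, about", round(max_bytes / 2**30, 2), "GB")
```

       The product is 1932119285 bytes, which is the "about 1.8GB" quoted
       above when a GB is taken as 2^30 bytes.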

       For compression, the file is divided into "chunks" of data.  Each chunk
       is less than 64kB long and can be compressed into an area that is also
       less than 64kB long (taking incompressible data into account -- usually
       the data is compressed into a block that is much smaller than the
       original).  The CHLEN field specifies the length of an uncompressed
       "chunk" of data.  The CHCNT field specifies how many chunks are
       present, and the CHCNT words of data specify how long each chunk is
       after compression (i.e., in the current compressed file).

       To perform random access on the data, the offset and length of the
       desired data are provided to library routines.  These routines
       determine the chunk in which the desired data begins and decompress
       that chunk.  Consecutive chunks are decompressed as necessary.
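
       The chunk lookup described above amounts to integer division on the
       uncompressed offset plus a running sum of the compressed chunk
       sizes.  The following is a minimal sketch of that calculation only;
       it is not the library routine itself, the name locate_chunks is
       hypothetical, and the actual decompression step is left out.

```python
from itertools import accumulate

def locate_chunks(offset, size, chlen, sizes):
    """Plan which chunks cover `size` bytes starting at `offset`.

    offset, size: position and length in the uncompressed data.
    chlen: uncompressed chunk length (CHLEN).
    sizes: compressed length of each chunk (the CHCNT words of data).

    Returns (plan, skip). plan is a list of tuples
    (chunk index, start within the compressed data, compressed length);
    skip is the number of bytes to discard from the first decompressed
    chunk. The end of the uncompressed data inside the last chunk is
    not checked here.
    """
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    first = offset // chlen
    last = (offset + size - 1) // chlen
    if last >= len(sizes):
        raise ValueError("request extends past the last chunk")
    # starts[i] is where chunk i begins, relative to the first chunk.
    starts = [0, *accumulate(sizes)]
    plan = [(i, starts[i], sizes[i]) for i in range(first, last + 1)]
    return plan, offset - first * chlen

if __name__ == "__main__":
    # Made-up values: 100-byte chunks with four compressed sizes.
    print(locate_chunks(150, 120, 100, [40, 35, 50, 20]))
```

       With these example values the request starts in chunk 1 and ends in
       chunk 2, so two chunks are read and the first 50 decompressed bytes
       are skipped.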

TRADEOFFS
       Speed  True random file access is not realized, since any access,  even
	      for a single byte, requires that a 64kB chunk be read and	decom-
	      pressed.	This is	slower than accessing a	flat text file,	but is
	      much,  much faster than performing serial	access on a fully com-
	      pressed file.

       Space  For the textual dictionary databases we are  working  with,  the
	      use  of 64kB chunks and maximal LZ77 compression realizes	a file
	      which is only about 4% larger than the same file compressed  all
	      at once.

OPTIONS
       -d or --decompress
	      Decompress.   This  is  the  default if the executable is	called
	      dictunzip.

       -c or --stdout
	      Write output on standard output; keep original files  unchanged.
	      This  is only available when decompressing (because parts	of the
	      header must be updated after a write when	compressing).

       -f or --force
	      Force compression	or decompression even if the output  file  al-
	      ready exists.

       -h or --help
	      Display help.

       -k or --keep
	      Do not delete the	original file.

       -l or --list
	      For each compressed file,	list the following fields:

                  type: dzip, gzip, or text (includes files in unknown formats)
		  crc: CRC checksum
		  date and time: from header
		  chunks: number of chunks in file
		  size:	size of	each uncompressed chunk
		  compr.: compressed size
		  uncompr.: uncompressed size
		  ratio: compression ratio (0.0% if unknown)
		  name:	name of	uncompressed file

	      Unlike gzip, the compression method is not detected.

       -L or --license
	      Display the dictzip license and quit.

       -t or --test
	      Check the	compressed file	integrity.  This option	is not	imple-
	      mented.  Instead,	it will	list the header	information.

       -v or --verbose
	      Verbose. Display extra information during	compression.

       -V or --version
	      Version. Display the version number and compilation options then
	      quit.

       -s start	or --start start
              Specify the offset at which to start decompression, using decimal
              numbers.
	      The default is at	the beginning of the file.

       -e size or --size size
	      Specify the size of the portion of the file to decompress, using
	      decimal numbers.	The default is the whole file.

       -S start	or --Start start
              Specify the offset at which to start decompression, using base64
              numbers.
	      The default is at	the beginning of the file.

       -E size or --Size size
	      Specify the size of the portion of the file to decompress, using
	      base64 numbers.  The default is the whole	file.
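
              This page does not define "base64 numbers".  The sketch below
              is an assumption, not a statement of what dictzip accepts: it
              treats the value as a number written in base 64, most
              significant digit first, using the standard base64 alphabet
              as digits ("A" is 0, "B" is 1, "/" is 63), which is the
              convention the author understands dictd index files to use.
              The function names are hypothetical.

```python
# Assumed digit alphabet: the standard base64 characters, in order.
B64_DIGITS = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz"
              "0123456789+/")

def b64_number_decode(text: str) -> int:
    """Decode a base-64 positional number (assumed convention)."""
    value = 0
    for ch in text:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 64 + B64_DIGITS.index(ch)
    return value

def b64_number_encode(value: int) -> str:
    """Encode a non-negative integer as a base-64 positional number."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = ""
    while True:
        digits = B64_DIGITS[value % 64] + digits
        value //= 64
        if value == 0:
            return digits

if __name__ == "__main__":
    for n in (0, 1, 64, 58969):
        text = b64_number_encode(n)
        print(n, text, b64_number_decode(text))
```

              Under this assumption, a decimal offset given to -s and the
              encoded form given to -S would name the same position.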

       -p prefilter or --pre prefilter
	      Specify a	shell command to execute as a filter  before  compres-
	      sion or decompression of a chunk.	 The pre- and post-compression
	      filters  can be used to provide additional compression or	output
	      formatting.  The filters may not increase	the buffer  size  sig-
	      nificantly.  The pre- and	post-compression filters were designed
	      to provide the most general interface possible.

       -P postfilter or	--post postfilter
	      Specify a	shell command to execute as a filter after compression
	      or decompression.

CREDITS
       dictzip	was written by Rik Faith (faith@cs.unc.edu) and	is distributed
       under the terms of the GNU General Public License.  If you need to dis-
       tribute under other terms, write	to the author.

       The main libraries used by this program (zlib, regex, libmaa) are dis-
       tributed under different terms, so you may be able to use the libraries
       for applications	which are incompatible with the	GPL -- please see  the
       copyright  notices and license information that come with the libraries
       for more	information, and consult with your attorney to	resolve	 these
       issues.

SEE ALSO
       dict(1),	dictd(8), gzip(1), gunzip(1), zcat(1)

				  22 Jun 1997			    DICTZIP(1)
Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=dictzip&sektion=1&manpath=FreeBSD+Ports+15.0>