Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
BZIP(1)			    General Commands Manual		       BZIP(1)

NAME
       bzip, bunzip - a	block-sorting file compressor, v0.21

SYNOPSIS
       bzip [ -cdfkvVL123456789	] [ filenames ...  ]
       bunzip [	-kvVL ]	[ filenames ...	 ]

DESCRIPTION
       Bzip  compresses	 files using the Burrows-Wheeler-Fenwick block-sorting
       text compression	algorithm.  Compression	is generally considerably bet-
       ter than	that achieved by more  conventional  LZ77/LZ78-based  compres-
       sors,  and  competitive with all	but the	best of	the PPM	family of sta-
       tistical	compressors.

       The command-line	options	are deliberately very similar to those of  GNU
       Gzip, but they are not identical.

       Bzip  expects  a	 list  of file names to	follow the command-line	flags.
       Each file is replaced by	a compressed version of	itself,	with the  name
       "original_name.bz".   Each  compressed  file  has the same modification
       date and	permissions as the corresponding original, so that these prop-
       erties can be correctly restored	at decompression time.	File name han-
       dling is	naive in the sense that	there is no mechanism  for  preserving
       original	 file  names,  permissions and dates in	filesystems which lack
       these concepts, or have serious file name length	restrictions, such  as
       MS-DOS.

       Bzip  and bunzip	will not overwrite existing files; if you want this to
       happen, you should delete them first.

       If no file names	are specified, bzip compresses from standard input  to
       standard	 output.   In this case, bzip will decline to write compressed
       output to a terminal, as	this would be  entirely	 incomprehensible  and
       therefore pointless.

       Bunzip  (or  bzip  -d  )	 decompresses and restores all specified files
       whose names end in ".bz".   Files  without  this	 suffix	 are  ignored.
       Again,  supplying no filenames causes decompression from	standard input
       to standard output.

       You can also compress or	decompress exactly one named file to the stan-
       dard output by giving the -c flag.

       Compression is  always  performed,  even	 if  the  compressed  file  is
       slightly	 larger	 than  the  original.  The worst case expansion	is for
       files of	zero length, which expand to  seventeen	 bytes.	  Random  data
       (including  the	output of most file compressors) is coded at about 8.1
       bits per	byte, giving an	expansion of around 1%.

       As a self-check for your	protection, bzip uses 32-bit CRCs to make sure
       that the	decompressed version of	a file is identical to	the  original.
       This  guards against corruption of the compressed data, and against un-
       detected	bugs in	bzip (hopefully	very unlikely).	 The chances  of  data
       corruption  going  undetected  is microscopic, about one	chance in four
       billion for each	file processed.	 Be aware, though, that	the check  oc-
       curs upon decompression,	so it can only tell you	that that something is
       wrong.  It can't	help you recover the original uncompressed data.

       Return values: 1	for an abnormal	exit, otherwise	0.

MEMORY MANAGEMENT
       Bzip compresses large files in blocks.  The block size affects both the
       compression  ratio  achieved,  and the amount of	memory needed both for
       compression and decompression.  The flags -1  through  -9  specify  the
       block  size to be 100,000 bytes through 900,000 bytes (the default) re-
       spectively.  At decompression-time, the block size used for compression
       is read from the	header of the compressed file, and bunzip  then	 allo-
       cates  itself  just  enough memory to decompress	the file.  Since block
       sizes are stored	in compressed files, it	follows	that the flags	-1  to
       -9  are irrelevant to and so ignored during decompression.  Compression
       and decompression requirements, in bytes, can be	estimated as:

	     Compression:   300k + ( 8 x block size )

	     Decompression: 6 x	block size

       The 300k	constant is for	a frequency-count table, used in  the  sorting
       phase of	compression.

       Larger  block  sizes give rapidly diminishing marginal returns; most of
       the compression comes from the first two	or three hundred  k  of	 block
       size,  a	 fact worth bearing in mind when using bzip on small machines.
       It is also important to appreciate that the  decompression  memory  re-
       quirement  is set at compression-time by	the choice of block size.  So,
       for example, if you are compressing files which you think might	possi-
       bly be decompressed on a	4-megabyte machine, you	might want to select a
       block  size  of 200k or 300k, so	the decompressor will draw 1200	kbytes
       or 1800 kbytes respectively, which is probably the limit	of what's com-
       fortable	on a 4-meg machine.  In	general, though, you  should  try  and
       use  the	 largest block size memory constraints allow.  Compression and
       decompression speed is virtually	unaffected by block size.

       Another significant point applies to files which	fit in a single	 block
       -- that means most files	you'd encounter	using a	large block size.  The
       amount  of real memory touched is proportional to the size of the file,
       since the file is smaller than a	block.	 For  example,	compressing  a
       file  20,000  bytes  long with the flag -9 will cause the compressor to
       allocate	[by the	formula, in practice a little more] 7500k  of  memory,
       but only	touch 300k + 20000 * 8 = 460 kbytes of it.  Similarly, the de-
       compressor will allocate	5400k but only touch 20000 * 6 = 120 kbytes.

       Here is a table which summarises	the maximum memory usage for different
       block  sizes.   Also recorded is	the total compressed size for 14 files
       of the Calgary Text Compression Corpus totalling	3,141,622 bytes.  This
       column gives some feel for how  compression  varies  with  block	 size.
       These  figures  tend  to	understate the advantage of larger block sizes
       for larger files, since the Corpus is dominated by smaller files.

		       Compress	  Decompress   Corpus
		Flag	 usage	    usage	Size

		 -1	 1100k	     500k      905958
		 -2	 1900k	    1000k      870646
		 -3	 2700k	    1500k      853650
		 -4	 3500k	    2000k      840140
		 -5	 4300k	    2500k      838355
		 -6	 5100k	    3000k      831695
		 -7	 5900k	    3500k      827104
		 -8	 6700k	    4000k      821652
		 -9	 7500k	    4500k      821652

OPTIONS
       -c     Compress or decompress to	standard output.  -c requires  you  to
	      supply exactly one file name, and	this file is compressed	or de-
	      compressed to standard out.

       -d     Force  decompression.   Bzip and bunzip are really the same pro-
	      gram, and	the decision about whether to compress	or  decompress
	      is done on the basis of which name is used.  This	flag overrides
	      that mechanism, and forces bzip to decompress.

       -f     The  complement to -d: forces compression, regardless of the in-
	      vokation name.

       -k     Keep (don't delete) input	files during compression or decompres-
	      sion.

       -v     Verbose mode  --	show  the  compression	ratio  for  each  file
	      processed.

       -V     Be very verbose.	This spews out lots of information during com-
	      pression which is	primarily of interest for debugging purposes.

       -L     Display the software license terms and conditions.

       -1 to -9
	      Set  the	block  size to 100 k, 200 k .. 900 k when compressing.
	      Has no effect when decompressing.	 See MEMORY MANAGEMENT above.

PERFORMANCE NOTES
       The sorting phase of compression	gathers	together  similar  strings  in
       the file.  Because of this, files containing very long runs of repeated
       symbols,	 like  "aabaabaabaab ..." (repeated several hundred times) may
       compress	extraordinarily	slowly.	 You can use the -V option to  monitor
       progress	 in  great  detail, if you want.  Decompression	speed is unaf-
       fected.	Such pathological cases	seem rare in practice.

       Incompressible or virtually-incompressible data may  decompress	rather
       more  slowly  than one would hope.  This	is due to naive	implementation
       of the move-to-front coder, and of the frequency	tables for the	arith-
       metic coder.

       Decompression  on  Sun  Sparc  1's  (and	other low-range	Sparcs)	can be
       slow, because of	the lack of hardware implementations of	integer	multi-
       ply and divide in the SPARC v7 instruction set.	The situation is  much
       exacerbated  if	bzip  is compiled for a	full SPARC v8 instruction set,
       since this causes the machine to	trap on	each multiply and  divide  in-
       struction.  These traps take control to the relevant software emulation
       of  the	offending instruction, but it is much quicker for the compiler
       simply to plant a call to the emulation routine.	 Moral:	be careful how
       you compile bzip	for a Sparc.  If you use GNU C,	 investigate  the  ef-
       fects of	the -msupersparc and -mcypress flags.

       Wildcard	expansion for Windows 95 and NT	loses leading directory	infor-
       mation.	 For example, the pathspec "sources\*.c" is searched correctly
       for matching files, but the "sources\" bit is ignored  when  the	 files
       come  to	 be  processed,	 which means bzip won't	be able	to find	any of
       them.  This is easy to fix; perhaps some	enterprising soul will send me
       a patch?

CAVEATS
       I/O error messages are not as helpful as	they  could  be.   Bzip	 tries
       hard to detect I/O errors and exit cleanly, but the details of what the
       problem is sometimes seem rather	misleading.

       There is	no -t option to	test the integrity of a	compressed file.  How-
       ever, Unix folks	can do the following:

	  bzip -dcV file.bz > /dev/null

       which causes bzip to do a trial decompression of	file.bz, throwing away
       the  result.   You'll  be shown the computed and	stored CRCs.  If these
       are identical, the file is almost certainly OK --  see  the  discussion
       above  on CRCs for a definition of "almost certainly".  If they're not,
       bzip will complain loudly.  Note	that file.bz is	left unchanged regard-
       less of the outcome.  Win95/NT folks can	do  the	 same,	but  /dev/null
       will have to be replaced	with something suitable, perhaps NUL.

       This  manual page pertains to version 0.21 of bzip.  It may well	happen
       that some future	version	will use a different compressed	 file  format.
       If  you try to decompress, using	0.21, a	.bz file created with some fu-
       ture version which uses a different compressed file format,  0.21  will
       complain	 that  your  file  "is not a BZIP file".  If that happens, you
       should obtain a more recent version of bzip and use that	to  decompress
       the file.

AUTHOR
       Julian Seward, sewardj@cs.man.ac.uk.

       The  ideas embodied in bzip are due to (at least) the following people:
       Michael Burrows and David Wheeler (for the  block  sorting  transforma-
       tion), Peter Fenwick (for the structured	coding model, and many refine-
       ments),	and  Alistair  Moffat,	Radford	 Neal  and Ian Witten (for the
       arithmetic coder).  I am	much indebted for their	help, support and  ad-
       vice.   See the file ALGORITHMS in the source distribution for pointers
       to sources of documentation.  Christian von  Roques  encouraged	me  to
       look  for  faster  sorting  algorithms,	so as to speed up compression.
       Many people sent	patches, helped	with portability  problems,  lent  ma-
       chines, gave advice and were generally helpful.

				     local			       BZIP(1)

NAME | SYNOPSIS | DESCRIPTION | MEMORY MANAGEMENT | OPTIONS | PERFORMANCE NOTES | CAVEATS | AUTHOR

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=bzip&manpath=FreeBSD+14.2-RELEASE+and+Ports>

home | help