bgzip(1)		     Bioinformatics tools		      bgzip(1)

NAME
       bgzip - Block compression/decompression utility

SYNOPSIS
       bgzip [-cdfghikrt] [-b virtualOffset] [-I index_name]
             [-l compression_level] [-o outfile] [-s size] [-@ threads]
             [file ...]

DESCRIPTION
       Bgzip compresses files in a manner similar to, and compatible with,
       gzip(1).  The file is compressed into a series of small (no larger
       than 64K) 'BGZF' blocks.  This allows indexes to be built against the
       compressed file and used to retrieve portions of the data without
       having to decompress the entire file.

       If no files are specified on the command line, bgzip compresses (or,
       if the -d option is used, decompresses) standard input to standard
       output.  If a file is specified, it is compressed (or decompressed
       with -d).  If the -c option is used, the result is written to standard
       output.  Otherwise, when compressing, bgzip writes to a new file with
       a .gz suffix and removes the original.  When decompressing, the input
       file must have a .gz suffix, which is removed to make the output name;
       the input file is likewise removed once decompression completes.  When
       multiple files are given as input, the operation is performed on each
       of them.  The access and modification times of the input file are
       copied to the output file.  Note that the system may still update the
       access time afterwards when it deems appropriate.

OPTIONS
       --binary  By default, bgzip attempts to ensure that BGZF blocks end on
                 a newline when the input is a text file.  The exception is
                 where a single line is larger than a BGZF block (64 KiB).
                 This can aid tools that use the index to perform random
                 access on the compressed stream, as the start of a block is
                 likely to also be the start of a text record.

                 This option processes text files as if they were binary
                 content, ignoring the location of newlines.  This also
                 restores the behaviour for text files of bgzip version 1.15
                 and earlier.

       -b, --offset INT
		 Decompress to standard	 output	 from  virtual	file  position
		 (0-based uncompressed offset).	 Implies -c and	-d.

       -c, --stdout
		 Write to standard output, keep	original files unchanged.

       -d, --decompress
		 Decompress.

       -f, --force
		 Overwrite  files  without  asking,  or	 decompress files that
		 don't have a known compression	filename extension (e.g., .gz)
		 without asking.  Use --force twice to do both without asking.

       -g, --rebgzip
                 Try to use an existing index to create a compressed file
                 with matching block offsets.  The index must be specified
                 using the -I file.gzi option.  Note that this assumes that
                 the same compression library and level are in use as when
                 making the original file.  Don't use it unless you know what
                 you're doing.

       -h, --help
		 Displays a help message.

       -i, --index
		 Create	 a BGZF	index while compressing.  Unless the -I	option
		 is used, this will have the name of the compressed file  with
		 .gzi appended to it.

       -I, --index-name	FILE
		 Index file name.

       -k, --keep
		 Do not	delete input file during operation.

       -l, --compress-level INT
		 Compression  level  to	use when compressing.  From 0 to 9, or
		 -1 for	the default level set by the compression library. [-1]

       -o, --output FILE
                 Write to a file and keep the original files unchanged.  An
                 existing output file will be overwritten.

       -r, --reindex
		 Rebuild the index on an existing compressed file.

       -s, --size INT
		 Decompress  INT bytes (uncompressed size) to standard output.
		 Implies -c.

       -t, --test
		 Test the integrity of the compressed file.

       -@, --threads INT
		 Number	of threads to use [1].
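
       The default text handling described under --binary can be pictured
       with the following Python sketch.  It is an illustration only, not
       the htslib implementation: the function name split_on_newlines and
       the max_block default of 0xFF00 bytes are assumptions made for this
       example.  Each chunk is cut at the last newline that fits, and a
       line longer than a block is simply cut at the block limit.

```python
def split_on_newlines(data: bytes, max_block: int = 0xFF00) -> list[bytes]:
    """Split data into chunks of at most max_block bytes, ending on a newline
    where possible (hypothetical helper, not part of bgzip or htslib)."""
    blocks = []
    pos = 0
    while pos < len(data):
        end = min(pos + max_block, len(data))
        if end < len(data):
            # Look for the last newline inside the candidate chunk.
            newline = data.rfind(b"\n", pos, end)
            if newline != -1:
                end = newline + 1  # keep the newline with its line
            # Otherwise the line is longer than a block: cut at the limit.
        blocks.append(data[pos:end])
        pos = end
    return blocks


if __name__ == "__main__":
    for chunk in split_on_newlines(b"aaaa\nbbbb\ncc\n", max_block=7):
        print(chunk)
```

       With --binary the newline search would be skipped and every chunk
       cut at the block limit.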

BGZF FORMAT
       The BGZF format written by bgzip is described in the SAM format
       specification, available from
       http://samtools.github.io/hts-specs/SAMv1.pdf.

       It makes use of a gzip feature which allows compressed files to be
       concatenated.  The input data is divided into blocks which are no
       larger than 64 kilobytes both before and after compression (including
       compression headers).  Each block is compressed into a gzip file.  The
       gzip header includes an extra sub-field with identifier 'BC' and the
       length of the compressed block, including all headers.
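
       As a rough illustration of the layout above, the following Python
       sketch builds two such blocks by hand and then walks the stream
       using only the 'BC' sub-field.  The helper names make_bgzf_block and
       iter_blocks are hypothetical.  The field layout follows the SAM
       specification, where the stored value is the total block size minus
       one; the compression level and header values chosen here are
       arbitrary assumptions, and the sketch does not write the empty
       end-of-file block that real BGZF writers append.

```python
import gzip
import struct
import zlib


def make_bgzf_block(data: bytes) -> bytes:
    """Wrap data in one gzip member carrying a 'BC' extra sub-field."""
    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    cdata = deflater.compress(data) + deflater.flush()
    total = 12 + 6 + len(cdata) + 8  # gzip header + extra field + data + trailer
    if total > 0x10000:
        raise ValueError("block would exceed 64 KiB")
    # ID1, ID2, CM=deflate, FLG=FEXTRA, MTIME, XFL, OS=unknown, XLEN=6
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)
    # Sub-field 'B','C', payload length 2, value = total block size minus one
    extra = struct.pack("<BBHH", ord("B"), ord("C"), 2, total - 1)
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + cdata + trailer


def iter_blocks(buf: bytes):
    """Yield (offset, size) for each block, hopping by the 'BC' length."""
    pos = 0
    while pos < len(buf):
        id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from(
            "<BBBBIBBH", buf, pos)
        if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
            raise ValueError(f"no gzip header with extra field at {pos}")
        extra = buf[pos + 12:pos + 12 + xlen]
        size = None
        off = 0
        while off + 4 <= len(extra):  # scan the extra sub-fields
            si1, si2, slen = struct.unpack_from("<BBH", extra, off)
            if (si1, si2) == (ord("B"), ord("C")) and slen == 2:
                size = struct.unpack_from("<H", extra, off + 4)[0] + 1
            off += 4 + slen
        if size is None:
            raise ValueError(f"no 'BC' sub-field at {pos}")
        yield pos, size
        pos += size


if __name__ == "__main__":
    stream = make_bgzf_block(b"hello\n" * 100) + make_bgzf_block(b"world\n" * 100)
    print(list(iter_blocks(stream)))
    # Concatenated gzip members are still readable by an ordinary gzip reader.
    print(len(gzip.decompress(stream)))
```

       Because each block records its own length, a reader can locate block
       boundaries without inflating any data, which is what makes indexing
       by block offset practical.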

GZI FORMAT
       The index format is a binary file listing pairs of compressed and
       uncompressed offsets in a BGZF file.  Each compressed offset points to
       the start of a BGZF block.  The uncompressed offset is the
       corresponding location in the uncompressed data stream.

       All values are stored as	little-endian 64-bit unsigned integers.

       The file	contents are:

	   uint64_t number_entries

       followed	by number_entries pairs	of:

	   uint64_t compressed_offset
	   uint64_t uncompressed_offset
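
       The layout above can be read with a few lines of Python.  This is a
       minimal sketch: parse_gzi and find_block are hypothetical names, and
       find_block assumes that a block starting at offset 0 in both streams
       exists even when the index does not list it, which is not stated
       above.

```python
import bisect
import struct


def parse_gzi(raw: bytes) -> list[tuple[int, int]]:
    """Return the (compressed_offset, uncompressed_offset) pairs in raw."""
    (number_entries,) = struct.unpack_from("<Q", raw, 0)
    if len(raw) < 8 + 16 * number_entries:
        raise ValueError("index is truncated")
    return [struct.unpack_from("<QQ", raw, 8 + 16 * i)
            for i in range(number_entries)]


def find_block(entries: list[tuple[int, int]], uoffset: int) -> tuple[int, int]:
    """Map an uncompressed offset to (block start in file, offset in block)."""
    # Assumption: the first block starts at (0, 0) even if it is not listed.
    table = entries if entries and entries[0] == (0, 0) else [(0, 0)] + entries
    starts = [uncompressed for _, uncompressed in table]
    i = bisect.bisect_right(starts, uoffset) - 1  # last block starting <= uoffset
    compressed, uncompressed = table[i]
    return compressed, uoffset - uncompressed


if __name__ == "__main__":
    # Two made-up entries, packed the way the format describes.
    raw = struct.pack("<QQQQQ", 2, 100, 65280, 250, 130560)
    entries = parse_gzi(raw)
    print(entries)
    print(find_block(entries, 70000))
```

       A reader would seek to the returned block start, decompress that
       block, and skip the returned number of bytes within it.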

EXAMPLES
	   # Compress stdin to stdout
	   bgzip < /usr/share/dict/words > /tmp/words.gz

	   # Make a .gzi index
	   bgzip -r /tmp/words.gz

	   # Extract part of the data using the	index
	   bgzip -b 367635 -s 4	/tmp/words.gz

	   # Uncompress	the whole file,	removing the compressed	copy
	   bgzip -d /tmp/words.gz

AUTHOR
       The BGZF library was originally implemented by Bob Handsaker and
       modified by Heng Li for remote file access and in-memory caching.

SEE ALSO
       gzip(1),	tabix(1)

htslib-1.21		       12 September 2024		      bgzip(1)
