bgzip(1)                  Bioinformatics tools                  bgzip(1)

NAME
       bgzip - Block compression/decompression utility

SYNOPSIS
       bgzip [-cdfhikrt] [-b virtualOffset] [-I index_name]
             [-l compression_level] [-o outfile] [-s size] [-@ threads]
             [file ...]

DESCRIPTION
       Bgzip compresses files in a similar manner to, and compatible with,
       gzip(1). The file is compressed into a series of small (less than
       64K) 'BGZF' blocks. This allows indexes to be built against the
       compressed file and used to retrieve portions of the data without
       having to decompress the entire file.

       If no files are specified on the command line, bgzip will compress
       (or decompress if the -d option is used) standard input to standard
       output. If a file is specified, it will be compressed (or
       decompressed with -d). If the -c option is used, the result will be
       written to standard output; otherwise, when compressing, bgzip will
       write to a new file with a .gz suffix and remove the original. When
       decompressing, the input file must have a .gz suffix, which will be
       removed to make the output name. Again, after decompression
       completes, the input file will be removed. When multiple files are
       given as input, the operation is performed on all of them. The
       access and modification times of the input file are copied to the
       output file. Note that the system may later update the access time
       when it deems appropriate.

OPTIONS
       --binary
              By default, bgzip attempts to ensure BGZF blocks end on a
              newline when the input is a text file. The exception is a
              single line larger than a BGZF block (64 KB). This can aid
              tools that use the index to perform random access on the
              compressed stream, as the start of a block is likely to also
              be the start of a text record.

              This option processes text files as if they were binary
              content, ignoring the location of newlines. This also
              restores the behaviour of bgzip version 1.15 and earlier for
              text files.

       -b, --offset INT
              Decompress to standard output from virtual file position
              (0-based uncompressed offset).
              Implies -c and -d.

       -c, --stdout
              Write to standard output, keep original files unchanged.

       -d, --decompress
              Decompress.

       -f, --force
              Overwrite files without asking, or decompress files that
              don't have a known compression filename extension (e.g.,
              .gz) without asking. Use --force twice to do both without
              asking.

       -g, --rebgzip
              Try to use an existing index to create a compressed file
              with matching block offsets. The index must be specified
              using the -I file.gzi option. Note that this assumes that
              the same compression library and level are in use as when
              making the original file. Don't use it unless you know what
              you're doing.

       -h, --help
              Displays a help message.

       -i, --index
              Create a BGZF index while compressing. Unless the -I option
              is used, the index will have the name of the compressed file
              with .gzi appended to it.

       -I, --index-name FILE
              Index file name.

       -k, --keep
              Do not delete input file during operation.

       -l, --compress-level INT
              Compression level to use when compressing. From 0 to 9, or
              -1 for the default level set by the compression library.
              [-1]

       -o, --output FILE
              Write to a file, keep original files unchanged; an existing
              file will be overwritten.

       -r, --reindex
              Rebuild the index on an existing compressed file.

       -s, --size INT
              Decompress INT bytes (uncompressed size) to standard output.
              Implies -c.

       -t, --test
              Test the integrity of the compressed file.

       -@, --threads INT
              Number of threads to use. [1]

BGZF FORMAT
       The BGZF format written by bgzip is described in the SAM format
       specification available from
       http://samtools.github.io/hts-specs/SAMv1.pdf. It makes use of a
       gzip feature which allows compressed files to be concatenated. The
       input data is divided into blocks which are no larger than 64
       kilobytes both before and after compression (including compression
       headers). Each block is compressed into a separate gzip member. The
       gzip header includes an extra sub-field with identifier 'BC' and
       the length of the compressed block, including all headers (the
       specification stores this as the total block size minus 1).
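       The block layout above, together with the newline-aligned splitting
       described under --binary, can be sketched in Python as follows. This
       is an illustrative sketch, not bgzip's implementation: the helper
       names make_block, split_chunks, bgzf_compress and block_offsets, the
       0xFF00-byte chunk size, and the fixed header fields (MTIME 0, OS 255)
       are assumptions. The exact rule bgzip uses to pick block boundaries
       in text files may differ. BSIZE is written as the total block size
       minus 1, and the output ends with the empty 28-byte block that the
       SAM specification defines as an end-of-file marker.

```python
import struct
import zlib

# Empty BGZF block that the SAM specification defines as the end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)

# Assumed per-block input size; small enough that even incompressible
# input keeps the whole block within the 65536-byte limit.
MAX_BLOCK_INPUT = 0xFF00


def make_block(data: bytes, level: int = -1) -> bytes:
    """Compress one chunk into a single BGZF block (a gzip member with a BC subfield)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)  # raw deflate stream
    cdata = comp.compress(data) + comp.flush()
    total = 18 + len(cdata) + 8  # 18-byte header, payload, CRC32 + ISIZE
    if total > 65536:
        raise ValueError("chunk too large for one BGZF block")
    header = struct.pack(
        "<BBBBIBBHBBHH",
        0x1F, 0x8B,          # gzip magic bytes
        8,                   # CM: deflate
        4,                   # FLG: FEXTRA set
        0,                   # MTIME (assumed 0)
        0,                   # XFL
        0xFF,                # OS: unknown
        6,                   # XLEN: one 6-byte extra subfield follows
        ord("B"), ord("C"),  # subfield identifier 'BC'
        2,                   # SLEN: subfield payload is 2 bytes
        total - 1,           # BSIZE: total block size minus 1
    )
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + cdata + trailer


def split_chunks(data: bytes, limit: int = MAX_BLOCK_INPUT, text: bool = True):
    """Yield chunks of at most `limit` bytes, ending each at a newline when possible."""
    pos = 0
    while pos < len(data):
        end = min(pos + limit, len(data))
        if text and end < len(data):
            nl = data.rfind(b"\n", pos, end)
            if nl != -1:
                end = nl + 1  # cut just after the last newline in the window
        yield data[pos:end]
        pos = end


def bgzf_compress(data: bytes, level: int = -1, text: bool = True) -> bytes:
    """Concatenate BGZF blocks for `data` and append the EOF marker block."""
    blocks = [make_block(chunk, level) for chunk in split_chunks(data, text=text)]
    return b"".join(blocks) + BGZF_EOF


def block_offsets(bgzf: bytes):
    """List block start offsets by following each block's BSIZE field."""
    offsets, pos = [], 0
    while pos < len(bgzf):
        offsets.append(pos)
        (bsize,) = struct.unpack_from("<H", bgzf, pos + 16)  # BSIZE sits at byte 16
        pos += bsize + 1
    return offsets
```

       Because every block is a complete gzip member, the output of
       bgzf_compress can be read by any multi-member gzip reader, while
       block_offsets shows how the BSIZE field lets a reader hop from
       block to block without inflating anything. Passing text=False
       cuts blocks at fixed sizes, which corresponds to the --binary
       behaviour.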
GZI FORMAT
       The index format is a binary file listing pairs of compressed and
       uncompressed offsets in a BGZF file. Each compressed offset points
       to the start of a BGZF block. The uncompressed offset is the
       corresponding location in the uncompressed data stream.

       All values are stored as little-endian 64-bit unsigned integers.

       The file contents are:

           uint64_t number_entries

       followed by number_entries pairs of:

           uint64_t compressed_offset
           uint64_t uncompressed_offset

EXAMPLES
           # Compress stdin to stdout
           bgzip < /usr/share/dict/words > /tmp/words.gz

           # Make a .gzi index
           bgzip -r /tmp/words.gz

           # Extract part of the data using the index
           bgzip -b 367635 -s 4 /tmp/words.gz

           # Uncompress the whole file, removing the compressed copy
           bgzip -d /tmp/words.gz

AUTHOR
       The BGZF library was originally implemented by Bob Handsaker and
       modified by Heng Li for remote file access and in-memory caching.

SEE ALSO
       gzip(1), tabix(1)

htslib-1.21                 12 September 2024                  bgzip(1)
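       The GZI layout described under GZI FORMAT, and the kind of indexed
       range read performed by "bgzip -b 367635 -s 4", can be sketched in
       Python as below. This is a hedged illustration, not necessarily how
       bgzip seeks internally: the helper names write_gzi, read_gzi and
       read_range are hypothetical, and the sketch assumes the first block
       starts at (0, 0) whether or not the index lists that entry. It finds
       the last indexed block starting at or before the requested
       uncompressed offset, inflates gzip members from there, and slices
       out the requested bytes.

```python
import bisect
import struct
import zlib


def write_gzi(entries):
    """Serialize (compressed_offset, uncompressed_offset) pairs in the GZI layout."""
    out = bytearray(struct.pack("<Q", len(entries)))  # uint64 number_entries
    for coff, uoff in entries:
        out += struct.pack("<QQ", coff, uoff)  # two little-endian uint64 values
    return bytes(out)


def read_gzi(buf):
    """Parse a GZI buffer back into a list of (compressed, uncompressed) tuples."""
    (n,) = struct.unpack_from("<Q", buf, 0)
    if len(buf) != 8 + 16 * n:
        raise ValueError("GZI size does not match number_entries")
    return [struct.unpack_from("<QQ", buf, 8 + 16 * i) for i in range(n)]


def read_range(data, entries, start, size):
    """Return up to `size` uncompressed bytes starting at uncompressed offset `start`."""
    # Assumption: (0, 0) is always a valid block start, listed or not.
    points = sorted(set([(0, 0)] + list(entries)), key=lambda e: (e[1], e[0]))
    i = bisect.bisect_right([u for _, u in points], start) - 1
    coff, uoff = points[i]
    skip = start - uoff  # bytes to discard inside the first inflated block
    out = bytearray()
    pos = coff
    while len(out) < skip + size and pos < len(data):
        d = zlib.decompressobj(31)  # wbits=31: expect one gzip member
        out += d.decompress(data[pos:])
        if not d.eof:
            raise ValueError("truncated gzip member")
        pos = len(data) - len(d.unused_data)  # start of the next member
    return bytes(out[skip:skip + size])
```

       On the files from EXAMPLES, something like
       read_range(open("/tmp/words.gz", "rb").read(),
       read_gzi(open("/tmp/words.gz.gzi", "rb").read()), 367635, 4)
       should return the same four bytes as the bgzip -b/-s command, since
       -i names the index after the compressed file with .gzi appended.
       Reading the whole file into memory keeps the sketch short; a real
       reader would seek to the compressed offset instead.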
