samtools-cram-size(1)	     Bioinformatics tools	 samtools-cram-size(1)

NAME
       samtools cram-size - list a breakdown of data types in a CRAM file

SYNOPSIS
       samtools cram-size [-ve] [-o file] in.cram

DESCRIPTION
       Produces a summary of CRAM block Content ID numbers and the Data Series
       stored within each block.  Optionally, a more detailed breakdown of how
       each Data Series is encoded per container may also be listed using the
       -e or --encodings option.

       CRAM permits mixing multiple Data Series into a single block.  In this
       case it is not possible to tell what proportion of that block each
       Data Series consumes.  CRAM also permits different encodings and block
       Content ID assignments per container, although this would be highly
       unusual.  Htslib always assigns the same Data Series to a block with a
       consistent Content ID, although the CRAM Encoding may change.

       Each CRAM block has a compression method.  These may not be consistent
       between successive blocks with the same Content ID.  Htslib learns
       which compression methods work best, so a single Content ID may have
       multiple compression methods associated with it.  The methods used are
       listed on each line as single-character codes; the size breakdown per
       method and a more verbose description can be shown using the -v
       option.  The compression codecs used in CRAM may have a variety of
       parameters, such as compression levels, inbuilt transformations, and
       choices of entropy encoding.  An attempt is made to distinguish between
       these different method parameterisations.
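       To make the relationship between the summary line and the -v lines
       concrete, the sketch below collapses the per-method rows for Content
       IDs 13 and 14 from the verbose example in EXAMPLES into one row per
       Content ID, summing sizes and concatenating method codes.  It is a
       minimal illustration of the arithmetic, not samtools' own code; the
       function names are hypothetical, and the rule of listing each code
       once in first-seen order is an assumption.

```python
# Per-method rows from the verbose (-v) example in EXAMPLES:
# (content_id, uncompressed_size, compressed_size, short_method_code)
VERBOSE_ROWS = [
    ("13", 275033, 64343, "_"),       # gzip-min
    ("13", 43327, 15412, "r"),        # r4x8-o0
    ("13", 2452, 2452, "."),          # raw
    ("13", 9253, 1988, "g"),          # gzip
    ("14", 23106404, 5903351, "R"),   # r4x8-o1
    ("14", 1951616, 513722, "r"),     # r4x8-o0
    ("14", 1567582, 386857, "g"),     # gzip
]


def ratio(uncomp, comp):
    """Compressed size as a percentage of the uncompressed size.

    An empty block (0/0) is reported as 100%, matching the CORE line of
    the example output; this edge-case rule is an assumption.
    """
    return 100.0 if uncomp == 0 else 100.0 * comp / uncomp


def summarise(rows):
    """Hypothetical helper: collapse per-method rows into one row per Content ID."""
    totals = {}  # dicts keep insertion order, so Content IDs stay in input order
    for cid, uncomp, comp, code in rows:
        u, c, codes = totals.get(cid, (0, 0, ""))
        if code not in codes:  # assumption: list each method code once, first-seen order
            codes += code
        totals[cid] = (u + uncomp, c + comp, codes)
    return [(cid, u, c, ratio(u, c), codes)
            for cid, (u, c, codes) in totals.items()]


for cid, u, c, pct, codes in summarise(VERBOSE_ROWS):
    print(f"BLOCK {cid:>8} {u:>12} {c:>12} {pct:6.2f}% {codes}")
```

       The totals reproduce the summary lines for Content IDs 13 and 14 in the
       first example (330065/84195 at 25.51% with methods _r.g, and
       26625602/6803930 at 25.55% with methods Rrg).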

       The compression methods and their short and long (verbose) names are
       listed below:

       Short   Long                 Description
       -------------------------------------------------------------------------
       .       raw                  Uncompressed
       g       gzip                 Gzip
       _       gzip-min             Gzip -1
       G       gzip-max             Gzip -9
       b       bzip2                Bzip2
       b       bzip2-1 to bzip2-8   Explicit bzip2 compression levels
       B       bzip2-9              Bzip2 -9
       l       lzma                 LZMA
       r       r4x8-o0              rANS 4x8 Order-0
       R       r4x8-o1              rANS 4x8 Order-1
       0       r4x16-o0             rANS 4x16 Order-0
       0       r4x16-o0R            rANS 4x16 Order-0 with RLE
       0       r4x16-o0P            rANS 4x16 Order-0 with PACK
       0       r4x16-o0PR           rANS 4x16 Order-0 with PACK and RLE
       1       r4x16-o1             rANS 4x16 Order-1
       1       r4x16-o1R            rANS 4x16 Order-1 with RLE
       1       r4x16-o1P            rANS 4x16 Order-1 with PACK
       1       r4x16-o1PR           rANS 4x16 Order-1 with PACK and RLE
       4       r32x16-o0            rANS 32x16 Order-0
       4       r32x16-o0R           rANS 32x16 Order-0 with RLE
       4       r32x16-o0P           rANS 32x16 Order-0 with PACK
       4       r32x16-o0PR          rANS 32x16 Order-0 with PACK and RLE
       5       r32x16-o1            rANS 32x16 Order-1
       5       r32x16-o1R           rANS 32x16 Order-1 with RLE
       5       r32x16-o1P           rANS 32x16 Order-1 with PACK
       5       r32x16-o1PR          rANS 32x16 Order-1 with PACK and RLE
       8       rNx16-xo0            rANS Nx16 STRIPED mode
       2       rNx16-cat            rANS Nx16 CAT mode
       a       arith-o0             Arithmetic coding Order-0
       a       arith-o0R            Arithmetic coding Order-0 with RLE
       a       arith-o0P            Arithmetic coding Order-0 with PACK
       a       arith-o0PR           Arithmetic coding Order-0 with PACK and RLE
       A       arith-o1             Arithmetic coding Order-1
       A       arith-o1R            Arithmetic coding Order-1 with RLE
       A       arith-o1P            Arithmetic coding Order-1 with PACK
       A       arith-o1PR           Arithmetic coding Order-1 with PACK and RLE
       a       arith-xo0            Arithmetic coding STRIPED mode
       a       arith-cat            Arithmetic coding CAT mode
       f       fqzcomp              FQZComp quality codec
       n       tok3-rans            Name tokeniser with rANS encoding
       n       tok3-arith           Name tokeniser with Arithmetic encoding
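       Because several long names share one short code, the table can be read
       as a many-to-one mapping.  The sketch below encodes it as a lookup from
       a -v method name to its summary code.  It is a hypothetical helper
       written from the table above, not part of samtools; the "raw" to "."
       entry is inferred from the example output, where the CORE block shows
       "." in the summary and "raw" in verbose mode.

```python
import re

# Exact long names from the table; "raw" -> "." is inferred from the
# example output rather than from the table itself.
SHORT_CODES = {
    "raw": ".",
    "gzip": "g", "gzip-min": "_", "gzip-max": "G",
    "bzip2": "b", "bzip2-9": "B",
    "lzma": "l",
    "r4x8-o0": "r", "r4x8-o1": "R",
    "rNx16-xo0": "8", "rNx16-cat": "2",
    "arith-xo0": "a", "arith-cat": "a",
    "fqzcomp": "f",
    "tok3-rans": "n", "tok3-arith": "n",
}

# Families where the optional P (PACK) and R (RLE) suffixes, or an explicit
# bzip2 level 1-8, all share one short code.
FAMILY_PATTERNS = [
    (re.compile(r"r4x16-o0P?R?"), "0"),
    (re.compile(r"r4x16-o1P?R?"), "1"),
    (re.compile(r"r32x16-o0P?R?"), "4"),
    (re.compile(r"r32x16-o1P?R?"), "5"),
    (re.compile(r"arith-o0P?R?"), "a"),
    (re.compile(r"arith-o1P?R?"), "A"),
    (re.compile(r"bzip2-[1-8]"), "b"),
]


def short_code(long_name):
    """Map a verbose (-v) method name to its one-character summary code."""
    if long_name in SHORT_CODES:
        return SHORT_CODES[long_name]
    for pattern, code in FAMILY_PATTERNS:
        if pattern.fullmatch(long_name):
            return code
    raise KeyError(f"unknown compression method: {long_name}")


# The four methods listed for Content ID 13 in the verbose example.
print("".join(short_code(m) for m in ["gzip-min", "r4x8-o0", "raw", "gzip"]))  # prints _r.g
```

       Exact names are checked first so that bzip2-9 maps to "B" before the
       bzip2-[1-8] pattern is consulted.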

OPTIONS
       -o FILE
              Output size information to FILE.

       -v     Verbose mode.  This shows one line per combination of Content
              ID and compression method.

       -e, --encodings
              CRAM uses an Encoding, which describes how the data is
              serialised into a data block.  This is distinct from the CRAM
              compression method, which is then applied to the block after
              encoding.  The encoding methods are stored per CRAM Container.

              This option lists the CRAM record encoding map and tag encoding
              map.  It shows each Data Series, the associated CRAM encoding
              method (such as HUFFMAN, BETA or EXTERNAL), and any parameters
              associated with that encoding.  The output may be large, as this
              information is reported per container rather than as a single
              set of summary statistics at the end of processing.
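       The id= parameters in the -e output name the block Content IDs that
       each Data Series writes to, which lets the encoding listing be
       cross-referenced with the block size summary.  The sketch below pulls
       those IDs out of a few descriptor lines copied from the -e example
       (with the BB line continuation joined).  It assumes the
       "series  descriptor" line layout shown there; the helper names are
       hypothetical.

```python
import re

# Encoding descriptors as printed by "samtools cram-size -e" in EXAMPLES.
SAMPLE = [
    "RN      BYTE_ARRAY_STOP(stop=0,id=11)",
    "QS      EXTERNAL(id=12)",
    "BB      BYTE_ARRAY_LEN(len_codec={EXTERNAL(id=42)}, val_codec={EXTERNAL(id=37)}",
]

# Each "id=<number>" parameter refers to an external block Content ID.
ID_RE = re.compile(r"\bid=(\d+)")


def ids_by_series(lines):
    """Map each Data Series (first column) to the Content IDs it refers to."""
    result = {}
    for line in lines:
        series, descriptor = line.split(None, 1)  # split off the first column
        result[series] = [int(x) for x in ID_RE.findall(descriptor)]
    return result


for series, ids in ids_by_series(SAMPLE).items():
    print(series, ids)
```

       Nested codecs such as BYTE_ARRAY_LEN can reference more than one
       block, which is why each series maps to a list of IDs.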

EXAMPLES
       -      The basic summary of block Content ID sizes for a CRAM file:
                $ samtools cram-size in.cram
                #   Content_ID  Uncomp.size    Comp.size   Ratio Method  Data_series
                BLOCK     CORE            0            0 100.00% .
                BLOCK       11    394734019     51023626  12.93% g       RN
                BLOCK       12   1504781763     99158495   6.59% R       QS
                BLOCK       13       330065        84195  25.51% _r.g    IN
                BLOCK       14     26625602      6803930  25.55% Rrg     SC
                ...

       -      Show the same file as above in verbose mode.  Here we see the
              distinct compression methods that have been used for each
              block Content ID.
                $ samtools cram-size -v in.cram
                #   Content_ID  Uncomp.size    Comp.size   Ratio Method      Data_series
                BLOCK     CORE            0            0 100.00% raw
                BLOCK       11    394734019     51023626  12.93% gzip        RN
                BLOCK       12   1504781763     99158495   6.59% r4x8-o1     QS
                BLOCK       13       275033        64343  23.39% gzip-min    IN
                BLOCK       13        43327        15412  35.57% r4x8-o0     IN
                BLOCK       13         2452         2452 100.00% raw         IN
                BLOCK       13         9253         1988  21.49% gzip        IN
                BLOCK       14     23106404      5903351  25.55% r4x8-o1     SC
                BLOCK       14      1951616       513722  26.32% r4x8-o0     SC
                BLOCK       14      1567582       386857  24.68% gzip        SC
                ...

       -      List the encoding methods per CRAM Data Series.  The two-letter
              series are the standard CRAM Data Series and the three-letter
              ones are the optional auxiliary tags, with the tag name and type
              combined.

                $ samtools cram-size -e in.cram
		Container encodings
		    RN	    BYTE_ARRAY_STOP(stop=0,id=11)
		    QS	    EXTERNAL(id=12)
		    IN	    BYTE_ARRAY_STOP(stop=0,id=13)
		    SC	    BYTE_ARRAY_STOP(stop=0,id=14)
		    BB	    BYTE_ARRAY_LEN(len_codec={EXTERNAL(id=42)},	\
					   val_codec={EXTERNAL(id=37)}
		    ...
		    XAZ	    BYTE_ARRAY_STOP(stop=9,id=5783898)
		    MDZ	    BYTE_ARRAY_STOP(stop=9,id=5063770)
		    ASC	    BYTE_ARRAY_LEN(len_codec={HUFFMAN(codes={1},lengths={0})}, \
					   val_codec={EXTERNAL(id=4281155)}
		    ...
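       In the -e example, the Content IDs of the auxiliary tag series equal
       the three ASCII bytes of the tag name and type packed into one integer,
       first character most significant: X=0x58, A=0x41, Z=0x5A gives
       0x58415A = 5783898 for XAZ, and the same holds for MDZ and ASC.  This
       is an observation from those three lines, not a documented guarantee;
       the sketch below (hypothetical helper names) converts in both
       directions so tag blocks can be recognised in the size summary.

```python
def tag_content_id(tag_and_type):
    """Pack a three-letter tag+type key (e.g. "XAZ") into one integer,
    first character in the most significant byte."""
    if len(tag_and_type) != 3:
        raise ValueError("expected two tag letters plus one type letter")
    a, b, t = (ord(ch) for ch in tag_and_type)
    return (a << 16) | (b << 8) | t


def tag_from_content_id(cid):
    """Inverse of tag_content_id: unpack the three bytes back into text."""
    return "".join(chr((cid >> shift) & 0xFF) for shift in (16, 8, 0))


for key in ("XAZ", "MDZ", "ASC"):
    print(key, tag_content_id(key))
```

       The packed values match the id= parameters shown above for XAZ
       (5783898), MDZ (5063770) and ASC (4281155).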

AUTHOR
       Written by James Bonfield from the Sanger Institute.

SEE ALSO
       samtools(1)

       Samtools website: <http://www.htslib.org/>

samtools-1.21		       12 September 2024	 samtools-cram-size(1)
