samtools-collate(1)	     Bioinformatics tools	   samtools-collate(1)

       samtools	collate	- shuffles and groups reads together by	their names

       samtools collate [options] in.sam|in.bam|in.cram [<prefix>]

       Shuffles	 and  groups reads together by their names.  A faster alterna-
       tive to a full query name sort, collate ensures that reads of the  same
       name  are  grouped  together in contiguous groups, but doesn't make any
       guarantees about	the order of read names	between	groups.

       The output from this command should be suitable for any operation  that
       requires	all reads from the same	template to be grouped together.
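
       The grouping guarantee described above can be illustrated with a
       small sketch.  This is not the samtools implementation: the function
       names, the use of CRC32 to pick a bucket, and the in-memory lists
       standing in for temporary files are all assumptions made for the
       example.  It only shows the property that matters, namely that reads
       sharing a name end up adjacent while the order of the groups
       themselves is arbitrary.

```python
import zlib
from collections import defaultdict


def collate_sketch(reads, n_buckets=64):
    """Group (qname, record) tuples so equal qnames are adjacent.

    Hypothetical illustration only: buckets stand in for temporary files.
    """
    buckets = [[] for _ in range(n_buckets)]
    for qname, record in reads:
        # All reads with the same name hash to the same bucket.
        idx = zlib.crc32(qname.encode()) % n_buckets
        buckets[idx].append((qname, record))

    out = []
    for bucket in buckets:
        # Within one bucket, gather the reads of each name together.
        by_name = defaultdict(list)
        for qname, record in bucket:
            by_name[qname].append((qname, record))
        for group in by_name.values():
            out.extend(group)
    return out


def is_collated(reads):
    """Return True if every qname occupies one contiguous run."""
    seen = set()
    prev = None
    for qname, _ in reads:
        if qname != prev:
            if qname in seen:
                return False
            seen.add(qname)
            prev = qname
    return True
```

       A downstream tool that needs both mates of a template side by side
       can rely on is_collated() holding for the output, but should not
       assume the groups arrive in name order.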

       If  present,  <prefix> is used to name the temporary files that collate
       uses when sorting the data.  If neither the '-O'	nor '-o'  options  are
       used,  <prefix> must be present and collate will	use it to make an out-
       put file	name by	appending a suffix depending  on  the  format  written
       (.bam by	default).

       If  either the -O or -o option is used, <prefix>	is optional.  If <pre-
       fix> is absent, collate will write the temporary	files to a  system-de-
       pendent location	(/tmp on UNIX).
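
       The rules for <prefix>, -O and -o can be summarised as a small
       argument builder.  The helper collate_argv() is hypothetical and is
       not part of samtools; it only assembles a command line from the
       options documented on this page and does not run anything.

```python
def collate_argv(in_path, prefix=None, out_file=None, to_stdout=False):
    """Build an argument list for 'samtools collate' (hypothetical helper)."""
    if to_stdout and out_file is not None:
        # -O and -o are mutually exclusive.
        raise ValueError("-O and -o cannot be used together")
    if not to_stdout and out_file is None and prefix is None:
        # Without -O or -o, the prefix is needed to name the output file.
        raise ValueError("<prefix> is required when neither -O nor -o is given")

    argv = ["samtools", "collate"]
    if to_stdout:
        argv.append("-O")
    if out_file is not None:
        argv += ["-o", out_file]
    argv.append(in_path)
    if prefix is not None:
        # Optional when -O or -o is given; names the temporary files.
        argv.append(prefix)
    return argv
```

       The resulting list could be passed to subprocess.run() on a system
       where samtools is installed.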

       Using -f for fast mode will output only primary alignments that have
       either the READ1 or READ2 flag set (but not both).  Any other alignment
       records will be filtered out.  The collation will only work correctly
       if there are no more than two reads for any given QNAME after
       filtering.
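
       The fast-mode filter can be written as a predicate on the SAM FLAG
       field.  The bit values below come from the SAM specification.  The
       function name is hypothetical, and treating "primary" as "neither
       secondary nor supplementary" is an assumption that follows the usual
       SAM convention rather than something stated on this page.

```python
# FLAG bits as defined by the SAM specification.
FLAG_READ1 = 0x40
FLAG_READ2 = 0x80
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def kept_in_fast_mode(flag):
    """Return True if a record with this FLAG would pass the -f filter.

    Hypothetical sketch: primary alignment with exactly one of
    READ1/READ2 set.
    """
    is_primary = not (flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY))
    is_read1 = bool(flag & FLAG_READ1)
    is_read2 = bool(flag & FLAG_READ2)
    # Exactly one of the two mate flags must be set.
    return is_primary and (is_read1 != is_read2)
```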

       Fast mode keeps a buffer of alignments in memory so that it can write
       out most pairs as soon as they are found instead of storing them in
       temporary files.  This allows collate to avoid some work and so finish
       more quickly than the standard mode.  The number of alignments held
       can be changed using -r.  Storing more alignments uses more memory but
       increases the number of pairs that can be written early.
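
       The effect of the in-memory buffer can be sketched as follows.  This
       is a simplified model and not the samtools implementation: the
       function name, the choice to evict the oldest held read when the
       buffer is full, and the list standing in for temporary files are all
       assumptions.  It shows why a larger buffer lets more pairs be written
       early.

```python
def fast_pairing_sketch(reads, max_held=10000):
    """Pair up (qname, record) tuples using a bounded in-memory buffer.

    Returns (early, spilled): pairs that could be written immediately,
    and records that would fall back to temporary files.
    """
    pending = {}   # qname -> first record seen; dicts keep insertion order
    early = []     # (qname, first, second) written as soon as the mate shows up
    spilled = []   # (qname, record) evicted from the buffer or left unpaired

    for qname, record in reads:
        if qname in pending:
            # Mate found while its partner is still buffered.
            early.append((qname, pending.pop(qname), record))
            continue
        if len(pending) >= max_held:
            # Buffer full: evict the oldest held read (assumed policy).
            oldest = next(iter(pending))
            spilled.append((oldest, pending.pop(oldest)))
        pending[qname] = record

    # Anything still unpaired at the end also goes to the fallback path.
    spilled.extend(pending.items())
    return early, spilled
```

       In this model, reads whose mates lie far apart in the input are the
       ones that miss the buffer, so raising max_held (the analogue of -r)
       trades memory for fewer spilled records.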

       While collate normally randomises the ordering of read pairs, fast mode
       does  not.   Position-dependent biases that would normally be broken up
       can remain in the fast collate output.  It is therefore not a good idea
       to  use fast mode when preparing	data for programs that expect randomly
       ordered paired reads.  For example, using fast collate instead of the
       standard	mode may lead to significantly different results from aligners
       that estimate library insert sizes on batches of	reads.

       -O      Output to stdout.  This option cannot be	used with '-o'.

       -o FILE Write output to FILE.  This option cannot be used with '-O'.

       -u      Write uncompressed BAM output.

       -l INT  Compression level.  [1]

       -n INT  Number of temporary files to use.  [64]

       -f      Fast mode (primary alignments only).

       -r INT  Number of reads to store	in memory (for use with	-f).  [10000]

       --no-PG Do not add a @PG	line to	the header of the output file.

       -@, --threads INT
	       Number of input/output compression threads to use  in  addition
	       to main thread [0].

       Written	by  Heng  Li  from the Sanger Institute	and extended by	Andrew

       samtools(1), samtools-sort(1)

       Samtools	website: <>

samtools-1.11		       22 September 2020	   samtools-collate(1)

