samtools-sort(1)	     Bioinformatics tools	      samtools-sort(1)

NAME
       samtools	sort - sorts SAM/BAM/CRAM files

SYNOPSIS
       samtools	sort [options] [in.sam|in.bam|in.cram]

DESCRIPTION
       Sort alignments by leftmost coordinates, by read name when -n or -N is
       used, by tag contents with -t, or in a minimiser-based collation order
       with -M.  An appropriate @HD SO sort order header tag will be added, or
       an existing one updated if necessary, along with the @HD SS sub-sort
       header tag where appropriate.

       The sorted output is written to standard	output by default, or  to  the
       specified  file (out.bam) when -o is used.  This	command	will also cre-
       ate temporary files tmpprefix.%d.bam as needed when the	entire	align-
       ment data cannot	fit into memory	(as controlled via the -m option).

       Consider	 using samtools	collate	instead	if you need name collated data
       without a full lexicographical sort.

       Note that if the	sorted output file is to be indexed with samtools  in-
       dex,  the default coordinate sort must be used.	Thus the -n, -N	and -t
       options are incompatible	with samtools index.

       When sorting by minimiser (-M), the sort order for unplaced data is
       defined by the whole-read minimiser value and the offset into the read
       at which this minimiser was observed.  This produces small clusters
       (contig-like, but unaligned) and helps to improve compression with LZ
       algorithms.  Clustering can be improved further by supplying a known
       reference from which to build a minimiser index (-I and -w options).
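
       The idea of a strand-aware whole-read minimiser can be sketched in a
       few lines of Python.  This is only an illustration under stated
       assumptions: plain lexicographic comparison of k-mers stands in for
       whatever ordering samtools applies internally, the function names are
       hypothetical, and a toy k of 3 is used instead of the default of 20.

```python
# Illustrative sketch only; not the samtools implementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def minimiser(seq, k):
    """Return (kmer, offset, strand) for the smallest k-mer on either strand.

    Assumption: "smallest" means lexicographically smallest.
    """
    best = None
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - k + 1):
            candidate = (s[i:i + k], i, strand)
            if best is None or candidate < best:
                best = candidate
    return best


reads = ["GGTACA", "CCCGGG", "TGTACC"]  # the third is the revcomp of the first
# Sort by minimiser value, then by the offset at which it was observed.
clustered = sorted(reads, key=lambda r: minimiser(r, 3)[:2])
print(clustered)
```

       In this toy input the first and third reads are reverse complements
       of each other, so they share a minimiser and end up adjacent after
       sorting.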

OPTIONS
       -l INT     Set the desired compression level for the final output file,
                  from 0 (uncompressed) or 1 (fastest but minimal
                  compression) to 9 (best compression but slowest to write),
                  similar to gzip(1)'s compression level setting.

		  If -l	is not used, the default compression level will	apply.

       -u	  Set the compression level to	0,  for	 uncompressed  output.
		  This is a synonym for	-l 0.

       -m INT	  Approximately	the maximum required memory per	thread,	speci-
		  fied either in bytes or with a K, M, or G suffix.  [768 MiB]

		  To  prevent  sort  from  creating a huge number of temporary
		  files, it enforces a minimum value of	1M for this setting.

       -n	  Sort by read names (i.e., the	QNAME field) using  an	alpha-
		  numeric  ordering,  rather  than by chromosomal coordinates.
		  The alpha-numeric or "natural" sort order  detects  runs  of
		  digits  in  the  strings and sorts these numerically.	 Hence
		  "a7b"	appears	before "a12b".	 Note  this  is	 not  suitable
		  where	 hexadecimal  values are in use.  Sets the header sub-
		  sort (@HD SS)	tag to queryname:natural.

       -N	  Sort by read names (i.e., the	QNAME field) using the lexico-
		  graphical ordering, rather than by chromosomal  coordinates.
		  Unlike  -n  no  detection of numeric components is used, in-
		  stead	relying	purely on the ASCII value of  each  character.
		  Hence	"x12" comes before "x7"	as "1" is before "7" in	ASCII.
		  This	is a more appropriate name sort	order where all	digits
		  in names are already zero-padded and/or  hexadecimal	values
		  are  being  used.   Sets the header sub-sort (@HD SS)	tag to
		  queryname:lexicographical.

       -t TAG	  Sort first by	the value in the alignment tag	TAG,  then  by
		  position or name (if also using -n or	-N).

       -M	  Sort	unmapped  reads	(those in chromosome "*") by their se-
		  quence minimiser (Schleimer et al., 2003;  Roberts  et  al.,
		  2004),  also reverse complementing as	appropriate.  This has
		  the effect of	collating some similar data together,  improv-
		  ing  the compressibility of the unmapped sequence.  The min-
		  imiser kmer size is adjusted using the -K option.  Note data
		  compressed in	this manner may	need to	be name	collated prior
		  to conversion	back to	fastq.

		  Mapped sequences are sorted by chromosome and	position.

		  Files	with at	least one aligned record (being	 placed	 at  a
		  position  on	a  chromosome) use the sort order "coordinate"
		  with a sub-sort  of  "coordinate:minhash".   Files  entirely
		  consisting  of unaligned data	use sort order "unsorted" with
		  sub-sort "unsorted:minhash".

       -R	  Do not use reverse strand with minimiser sort	(only compati-
		  ble with -M).

       -K INT	  Sets the kmer	size to	be used	in the -M option. [20]

       -I FILE	  Build	a minimiser index over FILE.  The per-read  minimisers
		  produced  by -M are no longer	sorted by their	numeric	value,
		  but by the reference coordinate this minimiser was found  to
		  come	from  (if  found in the	index).	 This further improves
		  compression due to improved sequence similarity between  se-
		  quences, albeit with a small CPU cost	of building and	query-
		  ing the index.  Specifying -I	automatically implies -M.

       -w INT	  Specifies  the  window size for building the minimiser index
		  on the file specified	in -I.	This defaults to 100.  It  may
		  be  better to	set this closer	to 50 for short-read data sets
		  (at a	higher CPU and memory cost), or	for more speed	up  to
		  1000 for long-read data sets.

       -H         Squashes homopolymer runs down to a single base before
                  constructing the minimiser.  This is useful for instruments
                  where the primary source of error is homopolymer length.

       -o FILE	  Write	 the final sorted output to FILE, rather than to stan-
		  dard output.

       -O FORMAT  Write	the final output as sam, bam, or cram.

		  By default, samtools tries to	select a format	based  on  the
		  -o filename extension; if output is to standard output or no
		  format can be	deduced, bam is	selected.

       -T PREFIX  Write temporary files to PREFIX.nnnn.bam, or if the
                  specified PREFIX is an existing directory, to
                  PREFIX/samtools.mmm.mmm.tmp.nnnn.bam, where mmm is unique
                  to this invocation of the sort command.

                  By default, any temporary files are written alongside the
                  output file, as out.bam.tmp.nnnn.bam, or if output is to
                  standard output, in the current directory as
                  samtools.mmm.mmm.tmp.nnnn.bam.

       -@ INT	  Set  number of sorting and compression threads.  By default,
		  operation is single-threaded.

       --no-PG	  Do not add a @PG line	to the header of the output file.

       --template-coordinate
		  Sorts	by template-coordinate,	whereby	the  sort  order  (@HD
		  SO) is unsorted, the group order (GO)	is query, and the sub-
		  sort (SS) is template-coordinate.
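
       Because -m is a per-thread figure, the overall memory footprint grows
       with -@.  The following Python sketch estimates that footprint.  It
       assumes that the K, M and G suffixes are binary multiples (suggested
       by the 768 MiB default) and that the total is simply the per-thread
       value times the thread count; the helper names are hypothetical, and
       how samtools itself reacts to a value below the 1M minimum is not
       modelled beyond raising an error here.

```python
# Illustrative sketch only; helper names are hypothetical.
SUFFIX_SHIFT = {"K": 10, "M": 20, "G": 30}  # assumed binary multiples
MIN_PER_THREAD = 1 << 20                    # the documented 1M floor


def parse_size(text):
    """Convert '768M', '2G' or a plain byte count to a number of bytes."""
    suffix = text[-1].upper()
    if suffix in SUFFIX_SHIFT:
        return int(text[:-1]) << SUFFIX_SHIFT[suffix]
    return int(text)


def approx_total_memory(per_thread, threads=1):
    """Rough estimate: per-thread memory (-m) times thread count (-@)."""
    size = parse_size(per_thread)
    if size < MIN_PER_THREAD:
        raise ValueError("per-thread memory is below the documented 1M minimum")
    return size * max(threads, 1)


# For example, -m 2G with -@ 4 suggests roughly 8 GiB for sorting buffers.
print(approx_total_memory("2G", 4) / (1 << 30), "GiB")
```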

       Ordering	Rules

       The following rules are used for	ordering records.

       If  option  -t  is in use, records are first sorted by the value	of the
       given alignment tag, and	then by	position or name (if using -n or  -N).
       For  example,  "-t  RG" will make read group the	primary	sort key.  The
       rules for ordering by tag are:

       *  Records that do not have the tag are sorted before ones that do.

       *  If the types of the tags are different, they will be sorted so that
          single character tags (type A) come before array tags (type B),
          then string tags (types H and Z), then numeric tags (types f and
          i).

       *  Numeric tags (types f and i) are compared by value.  Note that
          comparisons of floating-point values are subject to issues of
          rounding and precision.

       *  String tags (types H and Z) are compared based on the binary
          contents of the tag using the C strcmp(3) function.

       *  Character tags (type A) are compared by binary character value.

       *  No attempt is made to compare tags of other types -- notably type B
          array values will not be compared.
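
       The tag ordering rules above can be expressed as a sort key.  The
       Python sketch below is a model of those rules rather than the
       samtools code: a tag is represented as None (absent) or as a
       hypothetical (type, value) pair, and the rank table is simply a
       transcription of the list above.

```python
# Illustrative model of the documented tag ordering; not samtools code.
TYPE_RANK = {"A": 0, "B": 1, "H": 2, "Z": 2, "f": 3, "i": 3}


def tag_key(tag):
    """Build a sort key for a tag given as None or a (type, value) pair."""
    if tag is None:
        return (0,)                    # records without the tag come first
    typ, value = tag
    rank = TYPE_RANK[typ]
    if typ == "B":
        return (1, rank)               # array contents are not compared
    if typ == "A":
        return (1, rank, ord(value))   # binary character value
    if typ in ("H", "Z"):
        return (1, rank, value.encode())  # byte-wise, like strcmp(3)
    return (1, rank, value)            # numeric comparison for f and i


tags = [("Z", "grp2"), None, ("i", 5), ("A", "x"),
        ("f", 2.5), ("Z", "grp1"), ("B", [1, 2])]
for tag in sorted(tags, key=tag_key):
    print(tag)
```

       Python's sorted() is stable, so tags that compare equal under this
       key (for example two type B arrays) keep their input order.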

       When the -n or -N option is present, records are sorted by name.
       Historically samtools has used a "natural" ordering, i.e. sections
       consisting of digits are compared numerically while all other sections
       are compared based on their binary representation.  This means "a1"
       will come before "b1" and "a9" will come before "a10".  However, this
       alpha-numeric sort can be confused by runs of hexadecimal digits.  The
       newer -N option adds a simpler lexicographical name collation which
       does not attempt any numeric comparisons and may be more appropriate
       for some data sets.  Note that care must be taken when using samtools
       merge to ensure all files are using the same collation order.

       Records with the same name will be ordered according to the values of
       the READ1 and READ2 flags (see samtools flags).  When those flags are
       also equal, ties are resolved with primary alignments first, then
       SUPPLEMENTARY, SECONDARY, and finally SUPPLEMENTARY plus SECONDARY.
       Any remaining ties are reported in the same order as the input data.
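
       The difference between the two name orderings, and the tie-breaking
       on flags, can be modelled with a short Python sketch.  The flag
       values are the standard SAM FLAG bits; everything else is an
       assumption made for illustration: records are hypothetical (name,
       flag) pairs, the READ1/READ2 comparison is taken to be on the numeric
       value of those two bits, and digit runs are ranked ahead of text runs
       when the two kinds meet.

```python
# Illustrative model of the documented name ordering; not samtools code.
import re

READ1, READ2, SECONDARY, SUPPLEMENTARY = 0x40, 0x80, 0x100, 0x800


def natural_key(name):
    """-n style key: digit runs compare as integers, other runs as text."""
    runs = re.findall(r"[0-9]+|[^0-9]+", name)
    return [(0, int(r), "") if r[0] in "0123456789" else (1, 0, r)
            for r in runs]


def tie_break(flag):
    """Order equal names by READ1/READ2 bits, then by alignment kind."""
    kind = {0: 0,                           # primary
            SUPPLEMENTARY: 1,
            SECONDARY: 2,
            SUPPLEMENTARY | SECONDARY: 3}[flag & (SECONDARY | SUPPLEMENTARY)]
    return (flag & (READ1 | READ2), kind)


def name_sort(records, natural=True):
    """Sort (name, flag) pairs like -n (natural=True) or -N (natural=False)."""
    name_key = natural_key if natural else (lambda name: name)
    # sorted() is stable, so remaining ties keep their input order.
    return sorted(records, key=lambda r: (name_key(r[0]), tie_break(r[1])))


records = [("a10", READ2), ("a9", READ1 | SUPPLEMENTARY),
           ("a9", READ1), ("a10", READ1)]
print(name_sort(records, natural=True))   # "a9" records first
print(name_sort(records, natural=False))  # "a10" records first

# Hexadecimal names show why -n can mislead: 0x0a is larger than 0x09,
# yet the natural key compares the digit runs 0 and 9 and puts "r0a" first.
print(sorted(["r09", "r0a"], key=natural_key))
```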

       When the	--template-coordinate option is	in use,	the reads  are	sorted
       by:

       1. The earlier unclipped	5' coordinate of the template.

       2. The higher unclipped 5' coordinate of	the template.

       3. The library (from the	read group).

       4. The molecular	identifier (MI tag if present).

       5. The read name.

       6. If unpaired, or if R1	has the	lower coordinates of the pair.

       When  none  of the above	options	are in use, reads are sorted by	refer-
       ence (according to the order of the @SQ header records),	then by	 posi-
       tion in the reference, and then by the REVERSE flag.
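
       The default ordering can likewise be written as a sort key.  In this
       Python sketch the records are hypothetical (reference, position,
       flag) triples and the @SQ order is an invented example; it also
       assumes that forward-strand records precede reverse-strand ones when
       reference and position are equal, and it leaves unmapped records out
       of the picture entirely.

```python
# Illustrative model of the default coordinate ordering; not samtools code.
REVERSE = 0x10  # standard SAM FLAG bit for reverse-strand alignments


def coordinate_key(record, ref_order):
    """Key: @SQ index of the reference, then position, then REVERSE bit."""
    ref, pos, flag = record
    return (ref_order[ref], pos, 1 if flag & REVERSE else 0)


# Order of the @SQ lines in a made-up header: chr2 is listed before chr1.
sq_names = ["chr2", "chr1"]
ref_order = {name: index for index, name in enumerate(sq_names)}

records = [("chr1", 100, 0), ("chr2", 500, REVERSE),
           ("chr2", 500, 0), ("chr2", 40, 0)]
ordered = sorted(records, key=lambda r: coordinate_key(r, ref_order))
for record in ordered:
    print(record)
```

       Note that chr2 records come first here because the header lists chr2
       first; reference names themselves are never compared.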

       Note

       Historically  samtools sort also	accepted a less	flexible way of	speci-
       fying the final and temporary output filenames:

	      samtools sort [-f] [-o] in.bam out.prefix

       This has	now been removed.  The previous	out.prefix  argument  (and  -f
       option,	if  any) should	be changed to an appropriate combination of -T
       PREFIX and -o FILE.  The	previous -o option should be removed, as  out-
       put defaults to standard	output.

AUTHOR
       Written	by  Heng Li from the Sanger Institute with numerous subsequent
       modifications.

SEE ALSO
       samtools(1), samtools-collate(1), samtools-merge(1)

       Samtools	website: <http://www.htslib.org/>

samtools-1.22			  30 May 2025		      samtools-sort(1)
