samtools-split(1)	     Bioinformatics tools	     samtools-split(1)

NAME
       samtools	split -	splits a file by read group.

SYNOPSIS
       samtools	split [options]	merged.sam|merged.bam|merged.cram

DESCRIPTION
       Splits  a file by read group, or	a specified tag, producing one or more
       output files matching a common prefix (by default based	on  the	 input
       filename).

       Unless the -d option is used, the file is split according to the @RG
       lines listed in the header.  A record with no RG tag, or with an RG
       tag that is not defined in the header, causes the program to exit with
       an error unless the -u option is used.

       An RG value that is defined in the header but has no records produces
       an output file containing only a header.

       If the -d TAG option is used, the file will be split on	the  value  in
       the  given  aux	tag.  Only string (type	Z) and integer (type i in SAM,
       plus equivalents	in BAM/CRAM) tags are currently	supported.  Unless the
       -u option is used, the program will exit	with an	error if  it  finds  a
       record without the given	tag.

       Note that attempting to split on	a tag with high	cardinality may	result
       in  the	creation  of a large number of output files.  To prevent this,
       the -M option can be used to set	a limit	on the number of splits	made.
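
       The effect of the limit can be modelled with a short Python sketch.
       This is an illustration only, not samtools code: the function name
       assign_splits and the tuples it returns are hypothetical.  It follows
       the behaviour described for -M under OPTIONS, where tag values first
       seen after the limit is reached are treated as unmatched.

```python
def assign_splits(tag_values, max_split=100, unmatched_file=None):
    """Model how -d TAG with -M NUM routes records (illustrative only).

    Returns one ("split", value) or ("unmatched", filename) pair per record.
    """
    seen = set()      # tag values that already have an output file
    routes = []
    for value in tag_values:
        if value in seen:
            routes.append(("split", value))
        elif max_split == -1 or len(seen) < max_split:
            seen.add(value)               # a new output file is opened
            routes.append(("split", value))
        elif unmatched_file is not None:  # -u given: divert the record
            routes.append(("unmatched", unmatched_file))
        else:                             # no -u: the program exits with an error
            raise RuntimeError(
                f"split limit {max_split} reached at tag value {value!r}")
    return routes
```

       With max_split=2 and an unmatched file, a third distinct value is
       diverted rather than opening a third output; with max_split=-1 every
       distinct value gets its own output.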

       Using -d RG behaves similarly to the default (without -d), opening an
       output file for each @RG line in the header.  Unlike the default,
       however, new output files are also opened for any RG tags found in the
       alignment records, regardless of whether they have a matching @RG
       line in the header.
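
       The difference between the two modes can be sketched as a routing
       decision per record.  The Python below is a simplified model under
       stated assumptions, not the samtools implementation: route_record and
       its return values are hypothetical names, and the -M limit is ignored.

```python
def route_record(rg, header_rgs, split_on_tag=False, unmatched_file=None):
    """Decide where one record goes (illustrative only).

    rg            -- the record's RG tag value, or None if it has no RG tag
    header_rgs    -- set of @RG IDs defined in the header
    split_on_tag  -- True models "-d RG", False models the default mode
    """
    if rg is not None and rg in header_rgs:
        return ("rg", rg)                 # both modes: known read group
    if rg is not None and split_on_tag:
        return ("rg", rg)                 # -d RG: new output for an unlisted RG
    if unmatched_file is not None:
        return ("unmatched", unmatched_file)   # -u catches everything else
    raise ValueError("record has a missing or unrecognised RG tag")
```

       A record tagged with a read group absent from the header is an error
       (or goes to the -u file) in the default mode, but gets its own output
       when split_on_tag is true.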

       The  -u	option	may  be	 used  to  specify the output filename for any
       records with a missing or unrecognised tag.  This  option  will	always
       write out a file	even if	there are no records.
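
       When the -h option (see OPTIONS) supplies a replacement header for
       the -u file, that header must be compatible with the input header.  A
       minimal Python sketch of the stated rule follows.  The helper names
       are hypothetical, the headers are assumed to be available as
       tab-separated SAM header text, and only the @SQ count, order and LN
       lengths are compared because those are the conditions listed.

```python
def sq_lengths(header_text):
    """Return the LN value of each @SQ line, in header order."""
    lengths = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields after the record type look like "SN:chr1" or "LN:1000".
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        lengths.append(int(fields["LN"]))
    return lengths


def headers_compatible(input_header, replacement_header):
    """True if both headers list the same reference lengths in the same order."""
    return sq_lengths(input_header) == sq_lengths(replacement_header)
```

       Comparing the two lists checks the number of references, their order
       and their lengths in a single step.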

       Output format defaults to BAM.  To write SAM or CRAM instead, either
       set the format with --output-fmt or use -f to set the file extension,
       e.g. -f %*_%#.sam.
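
       The two ways of selecting a non-BAM output format can be sketched as
       argument lists built from Python.  The helper split_command is a
       hypothetical name, the command is only constructed and not run, and
       the lowercase format name passed to --output-fmt is an assumption
       rather than something stated on this page.

```python
def split_command(input_path, output_fmt=None, filename_format=None):
    """Build an argument list for samtools split (illustrative only)."""
    cmd = ["samtools", "split"]
    if output_fmt is not None:
        cmd += ["--output-fmt", output_fmt]   # choose the format explicitly
    if filename_format is not None:
        cmd += ["-f", filename_format]        # or imply it via the extension
    cmd.append(input_path)
    return cmd


# Two alternative invocations for an input named merged.bam:
by_format = split_command("merged.bam", output_fmt="cram")
by_extension = split_command("merged.bam", filename_format="%*_%#.sam")
```

       Either list could be passed to subprocess.run on a system where
       samtools is installed.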

OPTIONS
       -u FILE1	     Put reads with no tag or an unrecognised tag into FILE1

       -h FILE2	     Use the header from FILE2 when writing the file given in
		     the -u option.  This header completely replaces the one
		     from the input file.  It must be compatible with the
		     input file header: it must list the same number of
		     references in its @SQ lines, in the same order and with
		     the same lengths.

       -f STRING     Output filename format string (see	below) ["%*_%#.%."]

       -d TAG	     Split reads by TAG value into distinct files.  Supply
		     only the tag key with the option, not a value.  The tag
		     value must be a string (i.e. key:Z:value) or an integer
		     (key:i:value).

		     Using this	option changes	the  default  filename	format
		     string  to	 "%*_%!.%.",  so that tag values appear	in the
		     output file names.	 This can be overridden	by  using  the
		     -f	option.

       -p NUMBER     Pad numeric values	in %# and %! format expansions to this
		     many  digits  using  leading zeros.  For %!, only integer
		     tag values	will be	padded.	 String	 tag  values  will  be
		     left unchanged, even if the value only includes digits.

       -M,--max-split NUM
		     Limit the number of files created by the -d option	to NUM
		     (default  100).   This prevents accidents where trying to
		     split on a	tag with high cardinality could	result in  the
		     creation  of  a  very large number	of output files.  Once
		     the file limit is reached,	any  tag  values  not  already
		     seen  will	 be  treated as	unmatched and the program will
		     exit with an error	unless the -u option is	in use.

		     If desired, the limit can be removed using -M -1,
		     although in practice the number of outputs will still be
		     restricted by system limits on the number of files that
		     can be open at once.

		     If	 splitting  by read group, and the read	group count in
		     the header	is higher than the requested  limit  then  the
		     limit will	be raised to match.

       -v	     Verbose output

       --no-PG	     Do	not add	a @PG line to the header of the	output file.

       -@, --threads INT
		     Number of input/output compression threads to use in
		     addition to the main thread [0].

       Format string expansions:
	%%   %
	%*   basename
	%#   index (of @RG in the header, or count of TAG values seen so far)
	%!   @RG ID or TAG value
	%.   output format filename extension
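
       The format string expansions and the -p padding rule can be
       illustrated with a small Python sketch.  It is a model of the
       documented behaviour, not the samtools source: expand_format and its
       parameters are hypothetical, the basename and index are passed in
       rather than derived, and raising an error on an unknown or dangling
       % sequence is an assumption made for the sketch.

```python
def expand_format(fmt, basename, index, value, extension, pad=0):
    """Expand a samtools-split style filename format string.

    value is a str (@RG ID or string tag) or an int (integer tag).
    pad models -p: it applies to %# and to integer %! values only.
    """
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] != "%":
            out.append(fmt[i])
            i += 1
            continue
        if i + 1 >= len(fmt):
            raise ValueError("dangling % at end of format string")
        code = fmt[i + 1]
        if code == "%":
            out.append("%")
        elif code == "*":
            out.append(basename)
        elif code == "#":
            out.append(str(index).zfill(pad))
        elif code == "!":
            # String values are left unchanged, even if they are all digits.
            out.append(str(value).zfill(pad) if isinstance(value, int) else value)
        elif code == ".":
            out.append(extension)
        else:
            raise ValueError(f"unknown expansion %{code}")
        i += 2
    return "".join(out)
```

       For example, expand_format("%*_%!.%.", "merged", 0, 7, "cram", pad=4)
       pads the integer value, while the same call with the string "7"
       leaves it unpadded.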

AUTHOR
       Written by Martin Pollard from the Sanger Institute.

SEE ALSO
       samtools(1), samtools-addreplacerg(1)

       Samtools	website: <http://www.htslib.org/>

samtools-1.21		       12 September 2024	     samtools-split(1)
