FreeBSD Manual Pages

samtools-merge(1)	     Bioinformatics tools	     samtools-merge(1)

NAME
       samtools merge - merges multiple sorted files into a single file

SYNOPSIS
       samtools merge [options] -o out.bam [options] in1.bam ... inN.bam

       samtools merge [options] out.bam in1.bam ... inN.bam

DESCRIPTION
       Merge multiple sorted alignment files, producing a single sorted output
       file that contains all the input records and maintains the existing
       sort order.
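       Conceptually this is a k-way merge: because every input is already
       sorted, the output can be produced by repeatedly taking the smallest
       pending record across all inputs.  The minimal Python sketch below
       illustrates the idea with the standard heapq.merge; it is not how
       samtools implements merging, and the (reference index, position,
       name) tuples are a hypothetical stand-in for alignment records.

```python
import heapq

def merge_sorted(*inputs, key=None):
    """Lazily merge already-sorted iterables into one sorted stream.

    heapq.merge holds only the next pending item from each input, so the
    inputs are streamed rather than loaded into memory all at once.
    """
    return heapq.merge(*inputs, key=key)

# Hypothetical coordinate-sorted records: (reference index, position, name).
in1 = [(0, 100, "r1"), (0, 250, "r4"), (1, 50, "r6")]
in2 = [(0, 120, "r2"), (0, 200, "r3"), (1, 10, "r5")]

merged = list(merge_sorted(in1, in2, key=lambda rec: (rec[0], rec[1])))
print([name for _, _, name in merged])
# ['r1', 'r2', 'r3', 'r4', 'r5', 'r6']
```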

       The output file can be specified via -o as shown in the first synopsis.
       Otherwise the first non-option filename argument is taken to be out.bam
       rather than an input file, as in the second synopsis.  There is no
       default; to write to standard output (or to a pipe), use either "-o -"
       or the equivalent using "-" as the first filename argument.
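       The filename convention can be summarized by a small hypothetical
       helper (not part of samtools) that takes the non-option filename
       arguments and the value given to -o, if any, and returns the output
       and the list of inputs:

```python
def split_output_and_inputs(filenames, o=None):
    """Return (output, inputs) following the samtools merge convention.

    `filenames` are the non-option filename arguments in order; `o` is the
    argument given to -o, or None when -o was not used.
    """
    if o is not None:
        # With -o, every filename argument is an input file.
        return o, list(filenames)
    if not filenames:
        raise ValueError("no output given: use -o FILE or a first filename")
    # Without -o, the first filename is the output ("-" means stdout).
    return filenames[0], list(filenames[1:])

print(split_output_and_inputs(["in1.bam", "in2.bam"], o="out.bam"))
# ('out.bam', ['in1.bam', 'in2.bam'])
print(split_output_and_inputs(["-", "in1.bam", "in2.bam"]))
# ('-', ['in1.bam', 'in2.bam'])
```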

       If -h is specified, the @SQ headers of the input files will be merged
       into the specified header; otherwise they will be merged into a
       composite header created from the input headers.  If, in the process of
       merging @SQ lines for coordinate-sorted input files, a conflict arises
       as to the order (for example, input1.bam has @SQ lines for a,b,c and
       input2.bam has b,a,c), then the resulting output file will need to be
       re-sorted back into coordinate order.
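       One way to detect the kind of conflict described above is to treat
       each file's @SQ list as a chain of "comes before" constraints and look
       for a cycle.  The Python sketch below does this with the standard
       graphlib module (Python 3.9+).  It only tests whether any single
       combined order is consistent with every input; it is not the check
       samtools itself performs, and samtools' own header merging may be
       stricter.

```python
from graphlib import CycleError, TopologicalSorter

def sq_order_conflict(sq_orders):
    """Return True if no single @SQ order is consistent with every input.

    `sq_orders` holds one list of @SQ names per input file, in header order.
    """
    ts = TopologicalSorter()
    for names in sq_orders:
        for name in names:
            ts.add(name)
        # Each file requires every @SQ name to come after the one before it.
        for before, after in zip(names, names[1:]):
            ts.add(after, before)
    try:
        ts.prepare()  # raises CycleError if the constraints contradict
    except CycleError:
        return True
    return False

print(sq_order_conflict([["a", "b", "c"], ["b", "a", "c"]]))  # True
print(sq_order_conflict([["a", "b", "c"], ["a", "c"]]))       # False
```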

       When @RG and @PG records are merged into the output header, any ID that
       duplicates an ID already present in the output header has a suffix
       appended to it to distinguish it from similar header records from other
       files, and the read records are updated to match.  The -c flag (for
       @RG) and the -p flag (for @PG) disable this renaming and keep only the
       first header line found for each ID instead.
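       A minimal sketch of this renaming, assuming each input's @RG IDs are
       given as a list of strings.  The "-1", "-2", ... suffixes are a
       hypothetical scheme chosen for illustration; samtools uses its own
       suffix format.  The combine parameter mimics the effect of -c (or -p
       for @PG IDs): the first header line wins and no suffix is added.

```python
def merge_rg_ids(files, combine=False):
    """Merge @RG IDs from several files into one output header.

    `files` is a list of lists of @RG IDs, one list per input file.
    Returns (output_ids, remaps): remaps[i] maps file i's original IDs to
    the IDs its read records should carry in the merged output.
    """
    output_ids, seen, remaps = [], set(), []
    for ids in files:
        remap = {}
        for rg_id in ids:
            new_id = rg_id
            if rg_id in seen and not combine:
                # Hypothetical suffix scheme: first unused "-<n>".
                n = 1
                while f"{rg_id}-{n}" in seen:
                    n += 1
                new_id = f"{rg_id}-{n}"
            if new_id not in seen:
                seen.add(new_id)
                output_ids.append(new_id)
            remap[rg_id] = new_id
        remaps.append(remap)
    return output_ids, remaps

out_ids, remaps = merge_rg_ids([["grp1"], ["grp1", "grp2"]])
print(out_ids)     # ['grp1', 'grp1-1', 'grp2']
print(remaps[1])   # {'grp1': 'grp1-1', 'grp2': 'grp2'}
```

       With combine=True the same inputs give ['grp1', 'grp2'], and reads
       from the second file keep the ID grp1.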

       The ordering of the records in the input files must match the usage of
       the -n, -N and -t command-line options.  If they do not, the output
       order will be undefined.  Note that this also rules out mixing
       "queryname" files that use a combination of natural and
       lexicographical sort orders.  See samtools-sort(1) for information
       about record ordering.
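       To see why mixing sort orders is a problem, note that names in natural
       (-n style) order are generally not in lexicographical (-N style) order.
       The hypothetical helper below reports the first position where a list
       of read names breaks byte-wise order:

```python
def first_unsorted(names):
    """Return the index of the first name that breaks byte-wise (-N style)
    order, or None if the whole list is in that order."""
    for i in range(1, len(names)):
        if names[i].encode() < names[i - 1].encode():
            return i
    return None

# Hypothetical read names from a file sorted in natural (-n style) order.
natural_order = ["r1", "r2", "r10"]
print(first_unsorted(natural_order))  # 2, because "r10" < "r2" byte-wise
```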

       Problems may arise when attempting to merge thousands of files
       together.  The operating system may impose a limit on the maximum
       number of simultaneously open files.  See ulimit -n for more
       information.  Additionally, reading from many files simultaneously may
       cause a certain amount of "disk thrashing".  To partially alleviate
       this, the merge command loads 1MB of data at a time from each file, but
       this in turn adds to the overall memory usage of the merge.  Please
       take this into account when setting memory limits.
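       A rough pre-flight check in Python, assuming a Unix system (the
       resource module is not available on Windows) and taking 1MB as 1 MiB:
       compare the number of inputs against the soft open-file limit reported
       by getrlimit (the same limit ulimit -n shows) and estimate the read
       buffers alone as 1MB per input.  The reserve of 16 spare descriptors is
       an arbitrary assumption, and real memory use will be higher than the
       buffer estimate.

```python
import resource  # Unix-only standard library module

BUFFER_BYTES = 1024 * 1024  # assumed size of the 1MB per-input read buffer

def check_merge_resources(n_inputs, reserve=16):
    """Return (fits_fd_limit, buffer_bytes) for merging n_inputs files.

    `reserve` is a hypothetical allowance for descriptors already in use
    (stdin/stdout/stderr, the output file, index files, ...).
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    fits = soft == resource.RLIM_INFINITY or n_inputs + reserve <= soft
    return fits, n_inputs * BUFFER_BYTES

fits, buf = check_merge_resources(5000)
print(fits, round(buf / 2**30, 2))  # second value: 4.88 (GiB of buffers)
```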

       In extreme cases, it may be necessary to reduce the problem to fewer
       files by successively merging subsets before a second round of merging.
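       The two-round approach can be planned programmatically.  This
       hypothetical Python helper only builds the samtools merge command lines
       (using -o, as in the first synopsis); the intermediate names
       batch0.bam, batch1.bam, ... are illustrative, and the commands could
       then be run with, for example, subprocess.run(argv, check=True).  A
       lone leftover file is passed straight to the final round rather than
       merged on its own.

```python
def plan_two_round_merge(inputs, batch_size, output, prefix="batch"):
    """Build samtools merge argv lists: merge `inputs` in batches of at most
    `batch_size` files, then merge the batch results into `output`.

    Intermediate names "<prefix><n>.bam" are hypothetical placeholders.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    commands, final_inputs = [], []
    for n, start in enumerate(range(0, len(inputs), batch_size)):
        chunk = inputs[start:start + batch_size]
        if len(chunk) == 1:
            # A lone leftover file goes straight into the final round.
            final_inputs.append(chunk[0])
            continue
        part = f"{prefix}{n}.bam"
        commands.append(["samtools", "merge", "-o", part, *chunk])
        final_inputs.append(part)
    commands.append(["samtools", "merge", "-o", output, *final_inputs])
    return commands

files = [f"in{i}.bam" for i in range(1, 6)]
for argv in plan_two_round_merge(files, batch_size=2, output="out.bam"):
    print(" ".join(argv))
# samtools merge -o batch0.bam in1.bam in2.bam
# samtools merge -o batch1.bam in3.bam in4.bam
# samtools merge -o out.bam batch0.bam batch1.bam in5.bam
```

       Each intermediate file is itself sorted, so the final round is an
       ordinary merge of sorted inputs.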

       -1      Use Deflate compression level 1 to compress the output.

       -b FILE List of input BAM files, one file per line.

       -f      Force overwriting of the output file if it is already present.

       -h FILE Use the lines of FILE as `@' headers to be copied to out.bam,
               replacing any header lines that would otherwise be copied from
               in1.bam.  (FILE is actually in SAM format, though any alignment
               records it may contain are ignored.)

       -n      The input alignments are sorted by read names using an
               alpha-numeric ordering, rather than by chromosomal coordinates.
               The alpha-numeric or "natural" sort order detects runs of
               digits in the strings and sorts these numerically.  Hence "a7b"
               appears before "a12b".  Note this is not suitable where
               hexadecimal values are in use.

       -N      The input alignments are sorted by read names using a
               lexicographical ordering, rather than by chromosomal
               coordinates.  Unlike -n, no detection of numeric components is
               used; instead the ordering relies purely on the ASCII value of
               each character.  Hence "x12" comes before "x7", as "1" is
               before "7" in ASCII.  This is a more appropriate name sort
               order where all digits in names are already zero-padded and/or
               hexadecimal values are being used.

       -o FILE Write merged output to FILE, specifying the filename via an
               option rather than as the first filename argument.  When -o is
               used, all non-option filename arguments specify input files to
               be merged.

       -t TAG  The input alignments have been sorted by the value of TAG, then
               by either position or name (if -n is given).

       -R STR  Merge files in the specified region indicated by STR [null].

       -r      Attach an RG tag to each alignment.  The tag value is inferred
               from file names.

       -u      Uncompressed BAM output.

       -c      When several input files contain @RG headers with the same ID,
               emit only one of them (namely, the header line from the first
               file in which that ID is found) to the merged output file.
               Combining these similar headers is usually the right thing to
               do when the files being merged originated from the same file.

               Without -c, all @RG headers appear in the output file, with
               random suffixes added to their IDs where necessary to
               differentiate them.

       -p      Similarly, for each @PG ID in the set of files to merge, use the
               @PG line of the first file in which that ID is found rather than
               adding a suffix to differentiate similar IDs.

       -X      Allow the user to specify customized index file location(s) if
               the data folder does not contain any index file.  See the
               EXAMPLES section for a sample of usage.

       -L FILE BED file specifying multiple regions on which the merge will be
               performed.  This option extends the usage of the -R option and
               cannot be used concurrently with it.

       --no-PG Do not add a @PG line to the header of the output file.

       -@, --threads INT
               Number of input/output compression threads to use in addition
               to the main thread [0].
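       The difference between the -n and -N orders can be approximated with
       two Python sort keys.  These are simplified sketches, not samtools'
       comparison functions (which may handle details such as leading zeros
       differently), and the names are made up for illustration.  Note that
       the hexadecimal-style names "09" and "0a" come out in numeric hex order
       only under the lexicographical key.

```python
import re

def natural_key(name):
    """Approximate -n order: runs of digits compare as integers.

    re.split with a capturing group puts text at even indexes and digit
    runs at odd indexes, so two keys always compare str with str and int
    with int.
    """
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def lexicographic_key(name):
    """Approximate -N order: plain byte-wise (ASCII) comparison."""
    return name.encode("ascii")

names = ["a12b", "a7b", "x12", "x7", "09", "0a"]
print(sorted(names, key=natural_key))
# ['0a', '09', 'a7b', 'a12b', 'x7', 'x12']
print(sorted(names, key=lexicographic_key))
# ['09', '0a', 'a12b', 'a7b', 'x12', 'x7']
```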

EXAMPLES
       o Attach the RG tag while merging sorted alignments:

           printf '@RG\tID:ga\tSM:hs\tLB:ga\tPL:ILLUMINA\n@RG\tID:454\tSM:hs\tLB:454\tPL:LS454\n' > rg.txt
           samtools merge -rh rg.txt merged.bam ga.bam 454.bam

         The value of the RG tag is determined by the name of the file the
         read comes from.  In this example, in merged.bam, reads from ga.bam
         will be given RG:Z:ga, while reads from 454.bam will be given
         RG:Z:454.

       o Include customized index file as a part of arguments:

           samtools merge [options] -X <out.bam> </data_folder/in1.bam> [</data_folder/in2.bam> ... </data_folder/inN.bam>] </index_folder/index1.bai> [</index_folder/index2.bai> ... </index_folder/indexN.bai>]
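       The RG value used by -r can be previewed with a short Python sketch.
       It assumes the value is the file name with its directory and final
       extension removed, which matches the ga.bam and 454.bam example above
       but is not taken from the samtools source.  It also reports any input
       whose inferred ID has no matching @RG line in a header file such as
       rg.txt.

```python
from pathlib import PurePath

def inferred_rg(path):
    """Assumed -r value: file name without directory or final extension."""
    return PurePath(path).stem

def missing_rg_ids(header_text, bam_paths):
    """Return inputs whose inferred RG value has no @RG ID in header_text."""
    ids = set()
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            for field in line.split("\t")[1:]:
                if field.startswith("ID:"):
                    ids.add(field[3:])
    return [p for p in bam_paths if inferred_rg(p) not in ids]

# Same content as rg.txt in the example above.
rg_txt = ("@RG\tID:ga\tSM:hs\tLB:ga\tPL:ILLUMINA\n"
          "@RG\tID:454\tSM:hs\tLB:454\tPL:LS454\n")
print([f"RG:Z:{inferred_rg(p)}" for p in ["ga.bam", "454.bam"]])
# ['RG:Z:ga', 'RG:Z:454']
print(missing_rg_ids(rg_txt, ["ga.bam", "454.bam", "pb.bam"]))
# ['pb.bam']
```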

AUTHOR
       Written by Heng Li from the Sanger Institute.

SEE ALSO
       samtools(1), samtools-sort(1), sam(5)

       Samtools website: <http://www.htslib.org/>

samtools-1.21		       12 September 2024	     samtools-merge(1)
