samtools-merge(1)	     Bioinformatics tools	     samtools-merge(1)

NAME
       samtools merge - merges multiple sorted files into a single file

SYNOPSIS
       samtools merge [options] -o out.bam [options] in1.bam ... inN.bam

       samtools merge [options] out.bam in1.bam ... inN.bam

DESCRIPTION
       Merge multiple sorted alignment files, producing a single sorted output
       file that contains all the input records and maintains the existing
       sort order.

       The output file can be specified via -o, as shown in the first
       synopsis. Otherwise the first non-option filename argument is taken to
       be out.bam rather than an input file, as in the second synopsis. There
       is no default; to write to standard output (or to a pipe), use either
       "-o -" or the equivalent using "-" as the first filename argument.

       If -h is specified, the @SQ headers of the input files are merged into
       the specified header; otherwise they are merged into a composite header
       created from the input headers. If, while merging @SQ lines for
       coordinate-sorted input files, a conflict arises over their order (for
       example, input1.bam has @SQ lines for a, b, c and input2.bam has b, a,
       c), the resulting output file will need to be re-sorted into coordinate
       order.
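       A minimal sketch for spotting this situation before merging, assuming
       the headers have first been saved as text with "samtools view -H". The
       shell functions sq_names and same_sq_order, the file names and the LN
       values are illustrative only. The check is deliberately strict and also
       flags differences that would not cause a conflict.

```shell
# Print the SN: value of each @SQ line in a SAM header file, in header order.
# A header file can be produced with: samtools view -H in.bam > in.hdr
sq_names() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if (substr($i, 1, 3) == "SN:") print substr($i, 4)
    }' "$1"
}

# Conservative check: succeed only if both headers list the same @SQ names
# in the same order. It also reports differences (such as a contig present
# in only one file) that do not necessarily cause a conflict.
same_sq_order() {
    [ "$(sq_names "$1")" = "$(sq_names "$2")" ]
}

# The situation described above (hypothetical lengths): a,b,c versus b,a,c.
printf '@SQ\tSN:a\tLN:100\n@SQ\tSN:b\tLN:200\n@SQ\tSN:c\tLN:300\n' > input1.hdr
printf '@SQ\tSN:b\tLN:200\n@SQ\tSN:a\tLN:100\n@SQ\tSN:c\tLN:300\n' > input2.hdr

if same_sq_order input1.hdr input2.hdr; then
    echo "@SQ order matches"
else
    echo "@SQ order differs; merged output may need re-sorting"
fi
```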

       When @RG and @PG records are merged into the output header, any ID
       found to duplicate an ID already in the output header has a suffix
       appended to distinguish it from similar header records from other
       files, and the affected read records are updated to match. The -c
       option disables this for @RG records, and the -p option for @PG
       records.
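       To see in advance which IDs are duplicated across inputs (and would
       therefore receive a suffix), the header text of each input can be
       compared. The sketch below assumes headers saved with "samtools view
       -H"; the function dup_header_ids and the sample headers are
       hypothetical.

```shell
# List IDs of the given header record type (RG or PG) that occur in more
# than one SAM header file. Without -c (for RG) or -p (for PG), samtools
# merge appends a suffix to such duplicate IDs in the later files.
dup_header_ids() {
    type=$1
    shift
    for hdr in "$@"; do
        # Print each file's IDs once, so an ID repeated within one file
        # is not mistaken for a cross-file duplicate.
        awk -F '\t' -v tag="@$type" '$1 == tag {
            for (i = 2; i <= NF; i++)
                if (substr($i, 1, 3) == "ID:") print substr($i, 4)
        }' "$hdr" | sort -u
    done | sort | uniq -d
}

# Hypothetical headers: both files use the read group ID "lane1".
printf '@RG\tID:lane1\tSM:x\n@RG\tID:lane2\tSM:x\n' > a.hdr
printf '@RG\tID:lane1\tSM:y\n' > b.hdr
dup_header_ids RG a.hdr b.hdr    # prints: lane1
```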

       The ordering of the records in the input files must match the -n, -N,
       -t and --template-coordinate options given on the command line. If it
       does not, the output order is undefined. This also means that
       "queryname"-sorted files cannot be mixed when some use natural and
       others lexicographical sort order. See samtools-sort(1) for information
       about record ordering.
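       The difference between the two queryname orders can be illustrated
       with sort(1). This is an approximation, not samtools' own comparison:
       sort -V (a version sort supported by GNU and FreeBSD sort, but not
       required by POSIX) stands in for the natural order used with -n, and
       LC_ALL=C sort for the ASCII order used with -N. The function names are
       hypothetical.

```shell
# Illustration only: approximate the two queryname orders with sort(1).
# sort -V compares runs of digits numerically, approximating the natural
# order expected by -n; it is not guaranteed to match samtools in every case.
# LC_ALL=C sort compares byte values, i.e. the ASCII order expected by -N.
natural_order() { sort -V; }
lexical_order() { LC_ALL=C sort; }

printf 'a12b\na7b\n' | natural_order    # prints a7b, then a12b
printf 'x7\nx12\n' | lexical_order      # prints x12, then x7
```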

       Problems may arise when attempting to merge thousands of files
       together. The operating system may impose a limit on the maximum number
       of simultaneously open files; see ulimit -n for more information.
       Additionally, reading from many files simultaneously may cause a
       certain amount of "disk thrashing". To partially alleviate this, the
       merge command loads 1MB of data at a time from each file, but this in
       turn adds to the overall memory usage of the merge program. Please take
       this into account when setting memory limits.
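       The limits above can be checked roughly before starting. The sketch
       below (a hypothetical helper, not part of samtools) counts the inputs
       in a list file of the kind accepted by -b, compares the count with
       ulimit -n, and estimates the read buffer memory at about 1MB per input.

```shell
# Rough pre-flight check for a large merge (a sketch, not a samtools feature).
# The argument is a hypothetical list file with one input per line, as for -b.
merge_preflight() {
    n=$(grep -c . "$1" || true)    # non-empty lines = number of inputs
    limit=$(ulimit -n)
    # Each input needs an open file, plus at least stdin, stdout, stderr
    # and the output file.
    needed=$((n + 4))
    echo "inputs: $n"
    # About 1MB is read at a time from each input, so read buffers alone
    # need roughly n MB on top of the rest of the program's memory use.
    echo "approx. read buffer memory: $n MB"
    if [ "$limit" != unlimited ] && [ "$needed" -gt "$limit" ]; then
        echo "open-file limit $limit is too low; raise it or merge in batches" >&2
        return 1
    fi
}

printf '%s\n' in1.bam in2.bam in3.bam > inputs.txt
merge_preflight inputs.txt
```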

       In extreme cases, it may be necessary to reduce the problem to fewer
       files by successively merging subsets before a second round of merging.
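       A sketch of such a two-round merge in sh, assuming the inputs are
       listed one per line in a file and using split(1) to form the subsets.
       The function names, the default batch size and the DRY_RUN switch are
       illustrative assumptions; with DRY_RUN=1 the samtools commands are only
       printed, not run.

```shell
# Two-round merge sketch. Pass the same sort-order options (-n, -N, -t ...)
# in both rounds if the inputs are not coordinate sorted.
# With DRY_RUN=1, commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# usage: batched_merge LIST_FILE OUTPUT [BATCH_SIZE]
batched_merge() {
    list=$1 out=$2 size=${3:-500}
    tmp=$(mktemp -d) || return 1
    # split(1) writes batch_aa, batch_ab, ... each listing at most $size inputs.
    split -l "$size" "$list" "$tmp/batch_"
    for b in "$tmp"/batch_*; do
        run samtools merge -o "$b.bam" -b "$b" || return 1    # round 1
        echo "$b.bam" >> "$tmp/intermediates.txt"
    done
    run samtools merge -o "$out" -b "$tmp/intermediates.txt"  # round 2
    # Intermediate files are left in $tmp for the caller to remove.
}

printf '%s\n' in1.bam in2.bam in3.bam > inputs.txt
DRY_RUN=1 batched_merge inputs.txt merged.bam 2
```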

       -1      Use Deflate compression level 1 to compress the output.

       -b FILE List of input BAM files, one file per line.

       -f      Force overwriting of the output file if it already exists.

       -h FILE Use the lines of FILE as `@' headers to be copied to out.bam,
               replacing any header lines that would otherwise be copied from
               in1.bam. (FILE is actually in SAM format, though any alignment
               records it may contain are ignored.)

       -n      The input alignments are sorted by read names using an
               alpha-numeric ordering, rather than by chromosomal coordinates.
               The alpha-numeric or "natural" sort order detects runs of digits
               in the strings and sorts these numerically. Hence "a7b" appears
               before "a12b". Note this is not suitable where hexadecimal
               values are in use.

       -N      The input alignments are sorted by read names using a
               lexicographical ordering, rather than by chromosomal
               coordinates. Unlike -n, no detection of numeric components is
               used; the order relies purely on the ASCII value of each
               character. Hence "x12" comes before "x7", as "1" is before "7"
               in ASCII. This is a more appropriate name sort order where all
               digits in names are already zero-padded and/or hexadecimal
               values are in use.

       -o FILE Write merged output to FILE, specifying the filename via an
               option rather than as the first filename argument. When -o is
               used, all non-option filename arguments specify input files to
               be merged.

       -t TAG  The input alignments have been sorted by the value of TAG, then
               by either position or name (if -n is given).

       --template-coordinate
	       Input files are sorted by template-coordinate.

       -R STR  Merge files in the specified region indicated by STR [null]

       -r      Attach an RG tag to each alignment. The tag value is inferred
               from file names.

       -u      Uncompressed BAM output.

       -c      When several input files contain @RG headers with the same ID,
               emit only one of them (namely, the header line from the first
               file we find that ID in) to the merged output file. Combining
               these similar headers is usually the right thing to do when the
               files being merged originated from the same file.

               Without -c, all @RG headers appear in the output file, with
               random suffixes added to their IDs where necessary to
               differentiate them.

       -p      Similarly, for each @PG ID in the set of files to merge, use
               the @PG line of the first file we find that ID in rather than
               adding a suffix to differentiate similar IDs.

       -X      If this option is set, the user can specify custom index file
               location(s) when the data folder does not contain any index
               files. See the EXAMPLES section for sample usage.

       -L FILE BED file for specifying multiple regions on which the merge
               will be performed. This option extends the usage of the -R
               option and cannot be used concurrently with it.

       --no-PG Do not add a @PG line to the header of the output file.

       -@, --threads INT
               Number of input/output compression threads to use in addition
               to the main thread [0].
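       A sketch of the input files expected by -b and -L, with hypothetical
       file names and regions. The coordinate convention noted in the
       comments is that of the BED format itself; the samtools command is
       shown as a comment rather than run.

```shell
# Hypothetical inputs for -b and -L.
# -b FILE: plain text, one input file path per line.
printf '%s\n' sample1.bam sample2.bam sample3.bam > inputs.txt

# -L FILE: BED, tab-separated chrom, start, end. BED starts are 0-based and
# ends are exclusive, so "chr1 0 1000" covers bases 1-1000 of chr1.
printf 'chr1\t0\t1000\nchr2\t5000\t6000\n' > regions.bed

# The merge itself (not run here; region-restricted merging typically
# needs indexed inputs, see -X):
#   samtools merge -o merged.bam -b inputs.txt -L regions.bed
```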

EXAMPLES
       o Attach the RG tag while merging sorted alignments:

           printf '@RG\tID:ga\tSM:hs\tLB:ga\tPL:ILLUMINA\n@RG\tID:454\tSM:hs\tLB:454\tPL:LS454\n' > rg.txt
           samtools merge -rh rg.txt merged.bam ga.bam 454.bam

         The value of the RG tag is determined by the name of the file the
         read comes from. In this example, reads from ga.bam are tagged
         RG:Z:ga in merged.bam, while reads from 454.bam are tagged RG:Z:454.

       o Include custom index files as part of the arguments:

           samtools merge [options] -X <out.bam> </data_folder/in1.bam> [</data_folder/in2.bam> ... </data_folder/inN.bam>] </index_folder/index1.bai> [</index_folder/index2.bai> ... </index_folder/indexN.bai>]
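       Before running the RG example above, it can be worth checking that
       every input has a matching @RG line in rg.txt. The sketch below
       assumes, as that example suggests, that the -r value is the file name
       with its directory and .bam suffix removed; has_rg_id is a
       hypothetical helper.

```shell
# Recreate rg.txt from the RG example.
printf '@RG\tID:ga\tSM:hs\tLB:ga\tPL:ILLUMINA\n@RG\tID:454\tSM:hs\tLB:454\tPL:LS454\n' > rg.txt

# Succeed if the SAM header file $2 has an @RG line whose ID is $1.
has_rg_id() {
    awk -F '\t' -v id="ID:$1" '
        $1 == "@RG" { for (i = 2; i <= NF; i++) if ($i == id) found = 1 }
        END { exit !found }' "$2"
}

# Assumption: the -r value is the file name without directory and .bam suffix.
for f in ga.bam 454.bam; do
    rg=$(basename "$f" .bam)
    if has_rg_id "$rg" rg.txt; then
        echo "$f -> RG:Z:$rg (defined in rg.txt)"
    else
        echo "$f -> RG:Z:$rg (no matching @RG line in rg.txt)"
    fi
done
```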

AUTHOR
       Written by Heng Li from the Sanger Institute.

SEE ALSO
       samtools(1), samtools-sort(1), sam(5)

       Samtools website: <http://www.htslib.org/>

samtools-1.22			  30 May 2025		     samtools-merge(1)
