samtools-view(1)	     Bioinformatics tools	      samtools-view(1)

NAME
       samtools view - views and converts SAM/BAM/CRAM files

SYNOPSIS
       samtools view [options] in.sam|in.bam|in.cram [region...]

DESCRIPTION
       With no options or regions specified, prints all alignments in the
       specified input alignment file (in SAM, BAM, or CRAM format) to
       standard output in SAM format (with no header).

       You may specify one or more space-separated region specifications
       after the input filename to restrict output to only those alignments
       which overlap the specified region(s). Use of region specifications
       requires a coordinate-sorted and indexed input file (in BAM or CRAM
       format).

       The -b, -C, -1, -u, -h, -H, and -c options change the output format
       from the default of headerless SAM, and the -o and -U options set the
       output file name(s).

       The -t and -T options provide additional reference data. One of these
       two options is required when SAM input does not contain @SQ headers,
       and the -T option is required whenever writing CRAM output.

       The -L, -M, -N, -r, -R, -d, -D, -s, -q, -l, -m, -e, -f, -F, -G, and
       --rf options filter the alignments that will be included in the
       output to only those alignments that match certain criteria.

       The -p option sets the UNMAP flag on alignments that are filtered out
       and then writes them to the output file.

       The -x, -B, --add-flags, and --remove-flags options modify the data
       which is contained in each alignment.

       The -X option can be used to allow the user to specify customized
       index file location(s) if the data folder does not contain any index
       file. See the EXAMPLES section for sample usage.

       Finally, the -@ option can be used to allocate additional threads to
       be used for compression, and the -? option requests a long help
       message.

       REGIONS:
              Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]] and all
              position coordinates are 1-based.

              Important note: when multiple regions are given, some
              alignments may be output multiple times if they overlap more
              than one of the specified regions.

              Examples of region specifications:

              chr1      Output all alignments mapped to the reference
                        sequence named `chr1' (i.e. @SQ SN:chr1).

              chr2:1000000
                        The region on chr2 beginning at base position
                        1,000,000 and ending at the end of the chromosome.

              chr3:1000-2000
                        The 1001bp region on chr3 beginning at base position
                        1,000 and ending at base position 2,000 (including
                        both end positions).

              '*'       Output the unmapped reads at the end of the file.
                        (This does not include any unmapped reads placed on
                        a reference sequence alongside their mapped mates.)

              .         Output all alignments. (Mostly unnecessary as not
                        specifying a region at all has the same effect.)
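The 1-based, inclusive region syntax is easy to get wrong by one when mapping it onto 0-based, half-open intervals. The Python sketch below is a model of the rules above, not samtools' own parser; `Region`, `parse_region` and `overlaps` are hypothetical names, and the special '*' and '.' regions are not handled. It shows why chr3:1000-2000 covers 1001 bases and why an alignment overlapping two requested regions is reported twice.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """A parsed region in 0-based, half-open coordinates: [start, end)."""
    rname: str
    start: int
    end: Optional[int]  # None means "to the end of the reference sequence"


def parse_region(spec: str) -> Region:
    """Parse the plain RNAME[:STARTPOS[-ENDPOS]] form (1-based, inclusive).

    Simplified: the special '*' and '.' regions, names containing ':', and
    any other syntax samtools may accept are not handled.
    """
    rname, colon, coords = spec.partition(":")
    if not colon:
        return Region(rname, 0, None)
    start_text, dash, end_text = coords.partition("-")
    start = int(start_text) - 1               # 1-based inclusive -> 0-based
    end = int(end_text) if dash else None     # inclusive 1-based end == exclusive 0-based end
    return Region(rname, start, end)


def overlaps(region: Region, rname: str, aln_start: int, aln_end: int) -> bool:
    """True if the 0-based, half-open alignment span [aln_start, aln_end) overlaps."""
    if rname != region.rname:
        return False
    region_end = float("inf") if region.end is None else region.end
    return aln_start < region_end and region.start < aln_end


region = parse_region("chr3:1000-2000")
print(region, region.end - region.start)  # 1001 bases, both end positions included

# An alignment overlapping two requested regions is output once per region.
requested = [parse_region("chr1:100-200"), parse_region("chr1:150-300")]
print(sum(overlaps(r, "chr1", 149, 160) for r in requested))
```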

OPTIONS
       -b, --bam
              Output in the BAM format.

       -C, --cram
              Output in the CRAM format (requires -T).

       -1, --fast
              Enable fast compression. This also changes the default output
              format to BAM, but this can be overridden by the explicit
              format options or using a filename with a known suffix.

       -u, --uncompressed
              Output uncompressed data. This also changes the default output
              format to BAM, but this can be overridden by the explicit
              format options or using a filename with a known suffix.

              This option saves time spent on compression/decompression and
              is thus preferred when the output is piped to another samtools
              command.

       -h, --with-header
              Include the header in the output.

       -H, --header-only
              Output the header only.

       --no-header
              When producing SAM format, output alignment records but not
              headers. This is the default; the option can be used to reset
              the effect of -h/-H.

       -c, --count
              Instead of printing the alignments, only count them and print
              the total number. All filter options, such as -f, -F, and -q,
              are taken into account. The -p option is ignored in this mode.

       -?, --help
              Output long help and exit immediately.

       -o FILE, --output FILE
              Output to FILE [stdout].

       -U FILE, --unoutput FILE, --output-unselected FILE
              Write alignments that are not selected by the various filter
              options to FILE. When this option is used, all alignments (or
              all alignments intersecting the regions specified) are written
              to either the output file or this file, but never both.

       -p, --unmap
              Set the UNMAP flag on alignments that are not selected by the
              filter options. These alignments are then written to the
              normal output. This is not compatible with -U.
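The difference between -U and -p is where rejected records go. This Python sketch models only what is described above (it is not samtools code); `Record`, `route_with_U`, `route_with_p` and the `not_duplicate` selector are hypothetical names, and the selector merely stands in for any filter such as -F DUP.

```python
from typing import Callable, List, NamedTuple, Tuple

UNMAP = 0x4  # SAM FLAG bit: segment unmapped
DUP = 0x400  # SAM FLAG bit: PCR or optical duplicate


class Record(NamedTuple):
    qname: str
    flag: int


Selector = Callable[[Record], bool]


def route_with_U(records: List[Record], selected: Selector) -> Tuple[List[Record], List[Record]]:
    """-U: each record goes to exactly one of (main output, unselected file)."""
    main, unselected = [], []
    for rec in records:
        (main if selected(rec) else unselected).append(rec)
    return main, unselected


def route_with_p(records: List[Record], selected: Selector) -> List[Record]:
    """-p: rejected records stay in the main output, with UNMAP added to FLAG."""
    return [rec if selected(rec) else rec._replace(flag=rec.flag | UNMAP)
            for rec in records]


def not_duplicate(rec: Record) -> bool:
    """Example selector standing in for a filter such as -F DUP."""
    return not (rec.flag & DUP)
```

With -U the two outputs always add up to the input; with -p the single output has the same length as the input.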

       -t FILE, --fai-reference FILE
              A tab-delimited FILE. Each line must contain the reference name
              in the first column and the length of the reference in the
              second column, with one line for each distinct reference. Any
              additional fields beyond the second column are ignored. This
              file also defines the order of the reference sequences in
              sorting. If you run `samtools faidx <ref.fa>', the resulting
              index file <ref.fa>.fai can be used as this FILE.
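Reading a -t file amounts to taking the first two tab-separated columns of each line and keeping file order. A minimal Python sketch (the function name `read_reference_lengths` is hypothetical); it also accepts a five-column .fai index, whose extra columns are ignored as described above. Skipping blank lines is a tolerance of this sketch, not documented behaviour.

```python
from typing import List, Tuple


def read_reference_lengths(text: str) -> List[Tuple[str, int]]:
    """Return (name, length) pairs in file order from -t style text.

    Only the first two tab-separated columns are used; any further columns
    (such as the offset and line-length columns of a .fai index) are ignored.
    """
    refs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in this sketch
        fields = line.split("\t")
        refs.append((fields[0], int(fields[1])))
    return refs
```

The order of the returned list is the reference order that the file defines for sorting.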

       -T FILE, --reference FILE
              A FASTA format reference FILE, optionally compressed by bgzip
              and ideally indexed by samtools faidx. If an index is not
              present, one will be generated for you, if the reference file
              is local.

              If the reference file is not local, but is accessed instead via
              an https://, s3:// or other URL, the index file will need to be
              supplied by the server alongside the reference. It is possible
              to have the reference and index files in different locations by
              supplying both to this option separated by the string
              "##idx##", for example:

              -T ftp://x.com/ref.fa##idx##ftp://y.com/index.fa.fai

              However, note that only the location of the reference will be
              stored in the output file header. If this method is used to
              make CRAM files, the CRAM reader may not be able to find the
              index, and may not be able to decode the file unless it can
              get the references it needs using a different method.
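The "##idx##" convention is a plain string split. The Python sketch below uses a hypothetical helper `split_reference_arg`. Returning None for the index is this sketch's way of saying that no explicit index was given; it makes no claim about how htslib then locates one.

```python
from typing import Optional, Tuple

IDX_SEPARATOR = "##idx##"


def split_reference_arg(arg: str) -> Tuple[str, Optional[str]]:
    """Split a -T value into (reference location, explicit index or None).

    Only the reference part is what ends up recorded in the output header,
    which is why a CRAM reader may later be unable to find the index.
    """
    reference, sep, index = arg.partition(IDX_SEPARATOR)
    return (reference, index) if sep else (reference, None)


print(split_reference_arg("ftp://x.com/ref.fa##idx##ftp://y.com/index.fa.fai"))
print(split_reference_arg("ref.fa"))
```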

       -L FILE, --target-file FILE, --targets-file FILE
              Only output alignments overlapping the input BED FILE [null].

       -M, --use-index
              Use the multi-region iterator on the union of a BED file and
              command-line region arguments. This avoids re-reading the same
              regions of files, so can sometimes be much faster. Note that
              this also removes duplicates: without this option, an alignment
              that overlaps multiple regions specified on the command line
              will be reported multiple times. The use of a BED file is
              optional, and its path has to be preceded by the -L option.

       --region-file FILE, --regions-file FILE
              Use an index and multi-region iterator to only output
              alignments overlapping the input BED FILE. Equivalent to -M -L
              FILE or --use-index --target-file FILE.
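The duplicate-reporting difference that -M removes can be modelled on a single reference with in-memory spans. This Python sketch illustrates the described output behaviour only, not the index-based iterator itself; `Span`, `query_each_region` and `query_region_union` are hypothetical names.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Span, b: Span) -> bool:
    return a.start < b.end and b.start < a.end


def query_each_region(alignments: List[Span], regions: List[Span]) -> List[str]:
    """Without -M: each region is queried in turn, so repeats are possible."""
    return [a.name for r in regions for a in alignments if overlaps(a, r)]


def query_region_union(alignments: List[Span], regions: List[Span]) -> List[str]:
    """With -M: each alignment is emitted at most once, in file order."""
    return [a.name for a in alignments if any(overlaps(a, r) for r in regions)]


alns = [Span("a1", 100, 150), Span("a2", 180, 220), Span("a3", 400, 450)]
regs = [Span("r1", 90, 200), Span("r2", 190, 300)]
print(query_each_region(alns, regs))   # a2 overlaps both regions
print(query_region_union(alns, regs))
```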

       -N FILE, --qname-file FILE
              Output only alignments with read names listed in FILE. If FILE
              starts with ^ then the operation is negated and only outputs
              alignments with read names not listed in FILE. It is not
              permissible to mix both the filter-in and filter-out style
              syntax in the same command.

       -r STR, --read-group STR
              Output alignments in read group STR [null]. Note that records
              with no RG tag will also be output when using this option. This
              behaviour may change in a future release.

       -R FILE, --read-group-file FILE
              Output alignments in read groups listed in FILE [null]. If FILE
              starts with ^ then the operation is negated and only outputs
              alignments with read groups not listed in FILE. It is not
              permissible to mix both the filter-in and filter-out style
              syntax in the same command. Note that records with no RG tag
              will also be output when using this option. This behaviour may
              change in a future release.
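The leading-^ negation shared by -N and -R can be sketched as a predicate builder. In this Python model, `make_list_filter` and its `read_lines` loader are hypothetical (the loader keeps the sketch runnable without files), one name per line is an assumed file layout, and `keep_missing=True` mimics the documented -r/-R behaviour of also outputting records with no RG tag.

```python
from typing import Callable, Iterable, Optional


def make_list_filter(arg: str,
                     read_lines: Callable[[str], Iterable[str]],
                     keep_missing: bool = False) -> Callable[[Optional[str]], bool]:
    """Build a keep/drop predicate for an -N or -R style FILE argument.

    A leading '^' negates the filter. `read_lines` maps a path to its lines;
    one name per line is assumed.
    """
    negate = arg.startswith("^")
    path = arg[1:] if negate else arg
    names = {line.strip() for line in read_lines(path) if line.strip()}

    def keep(value: Optional[str]) -> bool:
        if value is None:  # e.g. a record that has no RG tag
            return keep_missing
        return (value not in names) if negate else (value in names)

    return keep


files = {"groups.txt": ["grp1\n", "grp2\n"]}
keep = make_list_filter("groups.txt", files.__getitem__, keep_missing=True)
print(keep("grp2"), keep("grp3"), keep(None))
```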

       -d STR1[:STR2], --tag STR1[:STR2]
              Only output alignments with tag STR1 and associated value STR2,
              which can be a string or an integer [null]. The value can be
              omitted, in which case only the tag is considered.

              Note that this option does not specify a tag type. For example,
              use -d XX:42 to select alignments with an XX:i:42 field, not
              -d XX:i:42.

       -D STR:FILE, --tag-file STR:FILE
              Only output alignments with tag STR and associated values
              listed in FILE [null].
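Why -d XX:i:42 does not match can be seen by splitting the argument on its first colon only. In this Python sketch, `tag_matches` and `tag_in_set` are hypothetical names, tags are a plain dict, and comparing values through their string form is a simplification, not a statement about how samtools compares integers.

```python
from typing import Dict, Set, Union

TagValue = Union[str, int]


def tag_matches(tags: Dict[str, TagValue], spec: str) -> bool:
    """-d STR1[:STR2] model: split on the FIRST colon only; there is no type field."""
    tag, sep, wanted = spec.partition(":")
    if tag not in tags:
        return False
    if not sep:
        return True  # value omitted: presence of the tag is enough
    return str(tags[tag]) == wanted


def tag_in_set(tags: Dict[str, TagValue], tag: str, allowed: Set[str]) -> bool:
    """-D STR:FILE model, with the FILE contents already loaded into a set."""
    return tag in tags and str(tags[tag]) in allowed


record_tags = {"XX": 42}
print(tag_matches(record_tags, "XX:42"))    # wanted value "42"
print(tag_matches(record_tags, "XX:i:42"))  # wanted value "i:42", so no match
```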

       -q INT, --min-MQ INT
              Skip alignments with MAPQ smaller than INT [0].

       -l STR, --library STR
              Only output alignments in library STR [null].

       -m INT, --min-qlen INT
              Only output alignments with number of CIGAR bases consuming
              query sequence >= INT [0].
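The quantity -m compares against is the total length of CIGAR operations that consume the query, which per the SAM specification are M, I, S, = and X (not D, N, H or P). A Python sketch with hypothetical function names; it does not validate malformed CIGAR strings, and treating "*" as length 0 is a choice made for this sketch.

```python
import re

# CIGAR operations that consume query bases, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def query_consuming_length(cigar: str) -> int:
    """Sum the lengths of CIGAR operations that consume the query sequence."""
    if cigar == "*":
        return 0  # no CIGAR available; treated as 0 in this sketch
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_CONSUMING)


def passes_min_qlen(cigar: str, min_qlen: int) -> bool:
    """-m INT model: keep the alignment if the query-consuming length >= INT."""
    return query_consuming_length(cigar) >= min_qlen


print(query_consuming_length("10M2I5S"))    # M, I and S all count
print(query_consuming_length("2H10M3D4N"))  # only the 10M counts
```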

       -e STR, --expr STR
              Only include alignments that match the filter expression STR.
              The syntax for these expressions is described in the main
              samtools(1) man page under the FILTER EXPRESSIONS heading.

       -f FLAG, --require-flags FLAG
              Only output alignments with all bits set in FLAG present in the
              FLAG field. FLAG can be specified in hex by beginning with `0x'
              (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e.
              /^0[0-7]+/), as a decimal number not beginning with '0' or as a
              comma-separated list of flag names.

              For a list of flag names see samtools-flags(1).
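The four accepted FLAG spellings can be told apart by their prefix. This Python sketch (hypothetical `parse_flag`) follows the rules above; the name table uses the names and bit values listed by samtools-flags(1). Whether samtools also accepts other spellings, such as lower-case names, is not modelled here.

```python
# Flag names and bit values as listed by samtools-flags(1).
FLAG_BITS = {
    "PAIRED": 0x1, "PROPER_PAIR": 0x2, "UNMAP": 0x4, "MUNMAP": 0x8,
    "REVERSE": 0x10, "MREVERSE": 0x20, "READ1": 0x40, "READ2": 0x80,
    "SECONDARY": 0x100, "QCFAIL": 0x200, "DUP": 0x400, "SUPPLEMENTARY": 0x800,
}


def parse_flag(text: str) -> int:
    """Interpret a FLAG argument as hex, octal, decimal or a list of names."""
    if text.startswith("0x"):
        return int(text[2:], 16)
    if text.isdigit():
        # A leading '0' selects octal ("012" is 10); "0" on its own is zero.
        return int(text, 8) if text.startswith("0") else int(text)
    value = 0
    for name in text.split(","):
        value |= FLAG_BITS[name]  # raises KeyError for an unknown name
    return value


print(parse_flag("12"), parse_flag("0xC"), parse_flag("014"), parse_flag("UNMAP,MUNMAP"))
```

All four spellings in the last line denote the same value, 12.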

       -F FLAG, --excl-flags FLAG, --exclude-flags FLAG
              Do not output alignments with any bits set in FLAG present in
              the FLAG field. FLAG can be specified in hex by beginning with
              `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0'
              (i.e. /^0[0-7]+/), as a decimal number not beginning with '0'
              or as a comma-separated list of flag names.

       --rf FLAG, --incl-flags FLAG, --include-flags FLAG
              Only output alignments with any bit set in FLAG present in the
              FLAG field. FLAG can be specified in hex by beginning with `0x'
              (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e.
              /^0[0-7]+/), as a decimal number not beginning with '0' or as a
              comma-separated list of flag names.

       -G FLAG
              Do not output alignments with all bits set in FLAG present in
              the FLAG field. This is the opposite of -f, such that -f12 and
              -G12, used separately, select complementary sets of alignments:
              every alignment is kept by exactly one of the two. FLAG can be
              specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/),
              in octal by beginning with `0' (i.e. /^0[0-7]+/), as a decimal
              number not beginning with '0' or as a comma-separated list of
              flag names.
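Read literally, the -f, -F, --rf and -G descriptions are four different bit tests on FLAG. This Python sketch is a model of those descriptions, not samtools' source; the function name `passes_flag_filters` is hypothetical, and 0 means "option not given".

```python
def passes_flag_filters(flag: int, require: int = 0, exclude: int = 0,
                        include_any: int = 0, exclude_all: int = 0) -> bool:
    """Model of -f (require), -F (exclude), --rf (include_any), -G (exclude_all)."""
    if (flag & require) != require:            # -f: all of these bits must be set
        return False
    if flag & exclude:                         # -F: none of these bits may be set
        return False
    if include_any and not (flag & include_any):            # --rf: at least one bit set
        return False
    if exclude_all and (flag & exclude_all) == exclude_all:  # -G: not all bits set
        return False
    return True


# 0x4 (UNMAP only) has just one of the two bits in 12, so -f12 rejects it and -G12 keeps it.
print(passes_flag_filters(0x4, require=12), passes_flag_filters(0x4, exclude_all=12))
```

In this model every FLAG value passes exactly one of -f12 and -G12 when they are applied separately, and giving both in the same command keeps nothing.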

       -x STR, --remove-tag STR
              Read tag(s) to exclude from output (repeatable) [null]. This
              can be a single tag or a comma-separated list. Alternatively,
              the option itself can be repeated multiple times.

              If the list starts with a `^' then it is negated and treated as
              a request to remove all tags except those in STR. The list may
              be empty, so -x ^ will remove all tags.

              Note that tags will only be removed from reads that pass
              filtering.

       --keep-tag STR
              This keeps only tags listed in STR and is directly equivalent
              to --remove-tag ^STR. Specifying an empty list will remove all
              tags. If both --keep-tag and --remove-tag are specified then
              --keep-tag has precedence.

              Note that tags will only be removed from reads that pass
              filtering.
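The interplay of -x, its ^ negation and --keep-tag can be modelled on a dict of tags. In this Python sketch, `filter_tags` and `parse_tag_list` are hypothetical names, and only a single -x value is modelled; repeating the option is not.

```python
from typing import Dict, Optional, Set


def parse_tag_list(spec: str) -> Set[str]:
    """Split a comma-separated tag list; an empty string yields an empty set."""
    return {t for t in spec.split(",") if t}


def filter_tags(tags: Dict[str, object], remove_tag: Optional[str] = None,
                keep_tag: Optional[str] = None) -> Dict[str, object]:
    """Apply one -x value and/or --keep-tag to a record's tags."""
    if keep_tag is not None:                 # --keep-tag takes precedence over -x
        keep = parse_tag_list(keep_tag)
        return {k: v for k, v in tags.items() if k in keep}
    if remove_tag is None:
        return dict(tags)
    if remove_tag.startswith("^"):           # -x ^LIST keeps only LIST
        keep = parse_tag_list(remove_tag[1:])
        return {k: v for k, v in tags.items() if k in keep}
    drop = parse_tag_list(remove_tag)
    return {k: v for k, v in tags.items() if k not in drop}


print(filter_tags({"NM": 1, "MD": "10", "RG": "a"}, remove_tag="NM,MD"))
```

Because `parse_tag_list("")` is empty, both `-x ^` and an empty --keep-tag remove every tag.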

       -B, --remove-B
              Collapse the backward CIGAR operation.

       --add-flags FLAG
              Add flag(s) to reads. FLAG can be specified in hex by beginning
              with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0'
              (i.e. /^0[0-7]+/), as a decimal number not beginning with '0'
              or as a comma-separated list of flag names.

       --remove-flags FLAG
              Remove flag(s) from reads. FLAG is specified in the same way as
              with the --add-flags option.
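Adding and removing flags are a bitwise OR and an AND with the complement. A minimal Python sketch (hypothetical function names) using the DUP bit, as in the markdup example in EXAMPLES:

```python
DUP = 0x400  # PCR or optical duplicate (samtools-flags name: DUP)


def add_flags(flag: int, bits: int) -> int:
    """--add-flags: set every bit in `bits`."""
    return flag | bits


def remove_flags(flag: int, bits: int) -> int:
    """--remove-flags: clear every bit in `bits`, leaving the others untouched."""
    return flag & ~bits


# 0x443 = PAIRED | PROPER_PAIR | READ1 | DUP
print(hex(remove_flags(0x443, DUP)))
```

Removing a bit that is not set leaves FLAG unchanged, so both operations are safe to apply to every record.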

       --subsample FLOAT
              Output only a proportion of the input alignments, as specified
              by 0.0 <= FLOAT <= 1.0, which gives the fraction of
              templates/pairs to be kept. This subsampling acts in the same
              way on all of the alignment records in the same template or
              read pair, so it never keeps a read but not its mate.

       --subsample-seed INT
              Subsampling seed used to influence which subset of reads is
              kept. When subsampling data that has previously been
              subsampled, be sure to use a different seed value from those
              used previously; otherwise more reads will be retained than
              expected. [0]

       -s FLOAT
              Subsampling shorthand option: -s INT.FRAC is equivalent to
              --subsample-seed INT --subsample 0.FRAC.
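Both subsampling properties above follow from making the keep decision a deterministic function of the read name and seed: mates share a QNAME, so they share a fate, and re-running with the same seed repeats the same decisions. In this Python sketch, SHA-256 is a stand-in chosen for illustration, not the hash samtools uses, and `parse_s_option` and `keep_template` are hypothetical names.

```python
import hashlib
from typing import Tuple


def parse_s_option(value: str) -> Tuple[int, float]:
    """-s INT.FRAC  ->  (seed INT, fraction 0.FRAC)."""
    seed_part, _, frac_part = value.partition(".")
    return int(seed_part or 0), float("0." + (frac_part or "0"))


def keep_template(qname: str, fraction: float, seed: int) -> bool:
    """Deterministic keep/drop decision keyed only on QNAME and seed.

    Mates carry the same QNAME, so they are always kept or dropped together.
    """
    digest = hashlib.sha256(f"{seed}:{qname}".encode()).digest()
    u = int.from_bytes(digest[:3], "big") / 0x1000000  # roughly uniform in [0, 1)
    return u < fraction


seed, fraction = parse_s_option("1.5")  # seed 1, fraction 0.5
names = [f"read{i}" for i in range(10000)]
first = [n for n in names if keep_template(n, fraction, seed)]
again = [n for n in first if keep_template(n, fraction, seed)]
print(len(first), len(again))  # same seed again: nothing further is removed
```

Subsampling the already-subsampled set with the same seed keeps about 50% of the original rather than the expected 25%, which is the pitfall the --subsample-seed entry warns about.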

       -@ INT, --threads INT
              Number of BAM compression threads to use in addition to the
              main thread [0].

       -P, --fetch-pairs
              Retrieve pairs even when the mate is outside of the requested
              region. Enabling this option also turns on the multi-region
              iterator (-M). A region to search must be specified, either on
              the command-line, or using the -L option. The input file must
              be an indexed regular file.

              This option first scans the requested region, using the RNEXT
              and PNEXT fields of the records that have the PAIRED flag set
              and pass other filtering options to find where paired reads are
              located. These locations are used to build an expanded region
              list, and a set of QNAMEs to allow from the new regions. It
              will then make a second pass, collecting all reads from the
              originally-specified region list together with reads from
              additional locations that match the allowed set of QNAMEs. Any
              other filtering options used will be applied to all reads found
              during this second pass.

              As this option links reads using RNEXT and PNEXT, it is
              important that these fields are set accurately. Use 'samtools
              fixmate' to correct them if necessary.

              Note that this option does not work with the -c, --count; -U,
              --output-unselected; or -p, --unmap options.
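The two-pass procedure described for -P can be sketched over an in-memory list of records. This Python model is simplified in ways samtools is not: mate locations are matched as exact (RNAME, POS) pairs rather than expanded regions, region membership uses the start position only, and other filters are ignored. `Rec`, `in_region` and `fetch_pairs` are hypothetical names.

```python
from typing import List, NamedTuple, Set, Tuple

PAIRED = 0x1
RegionT = Tuple[str, int, int]  # (rname, start, end), 1-based inclusive


class Rec(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int     # 1-based leftmost position
    rnext: str   # "=" means the same reference as rname
    pnext: int


def in_region(rec: Rec, region: RegionT) -> bool:
    """Simplified test on the start position only."""
    name, start, end = region
    return rec.rname == name and start <= rec.pos <= end


def fetch_pairs(records: List[Rec], region: RegionT) -> List[Rec]:
    # Pass 1: scan the requested region and note where each paired mate lives.
    mate_sites: Set[Tuple[str, int]] = set()
    allowed: Set[str] = set()
    for rec in records:
        if in_region(rec, region) and rec.flag & PAIRED:
            mate_ref = rec.rname if rec.rnext == "=" else rec.rnext
            mate_sites.add((mate_ref, rec.pnext))
            allowed.add(rec.qname)
    # Pass 2: the original region plus allowed QNAMEs found at mate sites.
    return [rec for rec in records
            if in_region(rec, region)
            or (rec.qname in allowed and (rec.rname, rec.pos) in mate_sites)]
```

A record that merely sits at a mate site but has a different QNAME is not pulled in, which is why accurate RNEXT and PNEXT fields matter.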

       -S
              Ignored for compatibility with previous samtools versions.
              Previously this option was required if input was in SAM
              format, but now the correct format is automatically detected
              by examining the first few characters of input.

       -X, --customized-index
              Include a customized index file as part of the arguments. See
              the EXAMPLES section for sample usage.

       -z FLAGs, --sanitize FLAGs
              Perform some sanity checks on the state of SAM record fields,
              fixing up common mistakes made by aligners. These include
              soft-clipping alignments when they extend beyond the end of the
              reference, marking records as unmapped when they have reference
              * or position 0, and ensuring unmapped alignments have no CIGAR
              or mapping quality and no MD, NM, CG or SM tags.

              FLAGs is a comma-separated list of keywords chosen from the
              following list.

              unmap     The UNMAPPED BAM flag. This is set for reads with
                        position <= 0, reference name "*" or reads starting
                        beyond the end of the reference. Note CIGAR "*" is
                        permitted for mapped data so does not trigger this.

              pos       Position and reference name fields. These may be
                        cleared when a sequence is unmapped due to the
                        coordinates being beyond the end of the reference.
                        Selecting this may change the sort order of the file,
                        so it is not part of the "on" compound argument.

              mqual     Mapping quality. This is set to zero for unmapped
                        reads.

              cigar     Modifies CIGAR fields, either by adding soft-clips
                        for reads that overlap the end of the reference or by
                        clearing it for unmapped reads.

              aux       For unmapped data, some auxiliary fields are
                        meaningless and will be removed. These include NM,
                        MD, CG and SM.

              off       Perform no sanity fixing. This is the default.

              on        Sanitize data in a way that guarantees the same sort
                        order. This is everything except for pos.

              all       All sanitizing options, including pos.
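The compound -z keywords expand to sets of the individual fixes. This Python sketch models the keyword list above; `parse_sanitize` and `needs_unmap` are hypothetical names, positions are 1-based, and treating "off" as contributing nothing when combined with other keywords is an assumption of the sketch.

```python
from typing import Set

INDIVIDUAL = {"unmap", "pos", "mqual", "cigar", "aux"}


def parse_sanitize(spec: str) -> Set[str]:
    """Expand a -z keyword list into the individual fixes it enables."""
    enabled: Set[str] = set()
    for word in spec.split(","):
        if word == "off":
            continue  # assumption: "off" simply adds nothing here
        elif word == "on":
            enabled |= INDIVIDUAL - {"pos"}  # everything that keeps the sort order
        elif word == "all":
            enabled |= INDIVIDUAL
        elif word in INDIVIDUAL:
            enabled.add(word)
        else:
            raise ValueError(f"unknown sanitize keyword: {word}")
    return enabled


def needs_unmap(pos: int, rname: str, ref_length: int) -> bool:
    """The 'unmap' rule: position <= 0, reference '*', or a start past the end."""
    return pos <= 0 or rname == "*" or pos > ref_length


print(sorted(parse_sanitize("on")))
```

"on,pos" expands to the same set as "all".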

       --no-PG
              Do not add a @PG line to the header of the output file.

EXAMPLES
       o Import SAM to BAM when @SQ lines are present in the header:

           samtools view -bo aln.bam aln.sam

         If @SQ lines are absent:

           samtools faidx ref.fa
           samtools view -bt ref.fa.fai -o aln.bam aln.sam

         where ref.fa.fai is generated automatically by the faidx command.

       o Convert a BAM file to a CRAM file using a local reference sequence.

           samtools view -C -T ref.fa -o aln.cram aln.bam

       o Convert a BAM file to a CRAM with NM and MD tags stored verbatim
         rather than calculating them on the fly during CRAM decode, so that
         mixed data sets with MD/NM only on some records, or NM calculated
         using different definitions of mismatch, can be decoded without
         change. The second command demonstrates how to decode such a file.
         Requesting not to decode MD here turns off auto-generation of both
         MD and NM; it will still emit the MD/NM tags on records that had
         these stored verbatim.

           samtools view -C --output-fmt-option store_md=1 --output-fmt-option store_nm=1 -o aln.cram aln.bam
           samtools view --input-fmt-option decode_md=0 -o aln.new.bam aln.cram

       o An alternative way of achieving the above is listing multiple
         options after the --output-fmt or -O option. The commands below are
         equivalent to the two above.

           samtools view -O cram,store_md=1,store_nm=1 -o aln.cram aln.bam
           samtools view --input-fmt cram,decode_md=0 -o aln.new.bam aln.cram

       o Include a customized index file as part of the arguments.

           samtools view [options] -X /data_folder/data.bam /index_folder/data.bai chrM:1-10

       o Output alignments in read group grp2 (records with no RG tag will
         also be in the output).

           samtools view -r grp2 -o /data_folder/data.rg2.bam /data_folder/data.bam

       o Only keep reads with tag BC and where the barcode matches the
         barcodes listed in the barcode file.

           samtools view -D BC:barcodes.txt -o /data_folder/data.barcodes.bam /data_folder/data.bam

       o Only keep reads with tag RG and read group grp2. This does almost
         the same as -r grp2 but will not keep records without the RG tag.

           samtools view -d RG:grp2 -o /data_folder/data.rg2_only.bam /data_folder/data.bam

       o Remove the actions of samtools markdup. Clear the duplicate flag and
         remove the dt tag, keep the header.

           samtools view -h --remove-flags DUP -x dt -o /data_folder/data.no_dup_markings.bam /data_folder/data.bam

AUTHOR
       Written by Heng Li from the Sanger Institute.

SEE ALSO
       samtools(1), samtools-tview(1), sam(5)

       Samtools website: <http://www.htslib.org/>

samtools-1.21		       12 September 2024	      samtools-view(1)
