samtools-import(1)	     Bioinformatics tools	    samtools-import(1)

NAME
       samtools import - converts FASTQ files to unmapped SAM/BAM/CRAM

SYNOPSIS
       samtools import [options] [ fastq_file ... ]

DESCRIPTION
       Reads one or more FASTQ files and converts them to unmapped SAM, BAM or
       CRAM.  The input files may be automatically decompressed if they have a
       .gz extension.

       The  simplest usage in the absence of any other command line options is
       to provide one or two input files.

       If a single file is given, it is treated as single-ended sequencing
       data unless the read names end with /1 or /2, in which case the reads
       are labelled as PAIRED with the READ1 or READ2 BAM flag set.  If a
       pair of filenames is given, they are read alternately to produce an
       interleaved output file, also setting the PAIRED and READ1 / READ2
       flags.
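
       As an illustration of the single-file case, the following sketch
       builds a tiny interleaved FASTQ whose read names end in /1 and /2.
       The read names, sequences and temporary path are made up for the
       example.  If samtools is installed, the last command prints each read
       name with its flag value so the PAIRED, READ1 and READ2 bits can be
       inspected; it is skipped otherwise.

```shell
# Hypothetical two-record interleaved FASTQ with /1 and /2 name suffixes.
tmpdir=$(mktemp -d)
fq="$tmpdir/interleaved.fq"
printf '%s\n' \
    '@frag1/1' 'ACGTACGT' '+' 'IIIIIIII' \
    '@frag1/2' 'TGCATGCA' '+' 'IIIIIIII' > "$fq"

# Count the header lines carrying each suffix.
n_read1=$(grep -c '^@.*/1$' "$fq")
n_read2=$(grep -c '^@.*/2$' "$fq")
echo "READ1 candidates: $n_read1, READ2 candidates: $n_read2"

# Only attempt the conversion when samtools is available.
if command -v samtools >/dev/null 2>&1; then
    # Print read name and FLAG for each record, skipping SAM header lines.
    samtools import "$fq" | grep -v '^@' | cut -f 1,2 || true
fi
```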

       The filenames may be explicitly labelled using -1 and -2 for READ1 and
       READ2 data files, -s for an interleaved paired file (or one half of a
       paired-end run), and -0 for unpaired data.  Explicit index files are
       specified with --i1 and --i2.  These correspond to typical output
       produced by Illumina bcl2fastq and match the output from samtools
       fastq.  The index files set both the BC barcode tag and its associated
       QT quality tag.

       The Illumina CASAVA identifiers may also be processed when the -i
       option is given.  The identifier is used to determine READ1 / READ2,
       whether or not the read failed processing (QCFAIL flag), and the
       barcode sequence, which is added to the BC tag.  This can be an
       alternative to explicitly specifying the index files, although note
       that doing so will not fill out the barcode quality tag.
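
       To show which parts of a read header carry this information, the
       sketch below splits a CASAVA 1.8 style identifier into its fields.
       It assumes the common layout <read>:<is filtered>:<control
       number>:<barcode> after the first space; the header itself is a
       made-up example, and this is not how samtools parses it internally.

```shell
# Hypothetical read header in the Illumina CASAVA 1.8 style.
header='@M001:55:000-ABCDE:1:1101:15589:1332 2:Y:0:ATCACG'

# The CASAVA identifier is the text after the first space.
comment=${header#* }

# Split the identifier on colons into its four fields.
IFS=: read -r read_no filtered control barcode <<EOF
$comment
EOF

# "Y" in the second field means the read was filtered (failed QC).
case $filtered in
    Y) qc=fail ;;
    *) qc=pass ;;
esac

echo "read=$read_no qc=$qc barcode=$barcode"
```

       Here the read number corresponds to READ1 / READ2, the filter field
       to QCFAIL, and the last field to the barcode sequence.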

OPTIONS
       -s FILE Import paired interleaved data from FILE.

       -0 FILE Import single-ended (unpaired) data from	FILE.

               Operationally there is no difference between the -s and -0
               options: given an interleaved file with /1 and /2 read name
               endings, both will correctly set the PAIRED, READ1 and READ2
               flags, and given data with no suffixes and no CASAVA
               identifiers being processed, both will leave the data as
               unpaired.  They are both provided to allow more descriptive
               command lines and to improve the header comment describing
               the samtools fastq decode command.

       -1 FILE,	-2 FILE
	       Import  paired  data from a pair	of FILEs.  The BAM flag	PAIRED
	       will be set, but	not PROPER_PAIR	as it has  not	been  aligned.
	       READ1  and  READ2  will	be stored in their original, unmapped,
	       orientation.

       --i1 FILE, --i2 FILE
	       Specifies index barcodes	associated with	the -1 and  -2	files.
	       These  will  be appended	to READ1 and READ2 records in the bar-
	       code (BC) and quality (QT) tags.

       -i      Specifies  that	the  Illumina  CASAVA  identifiers  should  be
	       processed.   This may set the READ1, READ2 and QCFAIL flags and
	       add a barcode tag.

       -N, --name2
	       Assume the read names are encoded in the	SRA  and  ENA  formats
	       where  the  first  word is an automatically generated name with
	       the second field	being the original name.  This option extracts
	       that second field instead.

       --barcode-tag TAG
	       Changes the auxiliary tag used for barcode sequence.   Defaults
	       to BC.

       --quality-tag TAG
	       Changes	the  auxiliary tag used	for barcode quality.  Defaults
	       to QT.

       -o FILE Output to FILE.  By default output is written to stdout.

       --order TAG
	       When outputting a SAM record, also output an integer  tag  con-
	       taining	the Nth	record number.	This may be useful if the data
	       is to be	sorted or collated in some manner and we wish this  to
	       be  reversible.	In this	case the tag may be used with samtools
	       sort -t TAG to regenerate the original input order.

               Note integer tags can only hold up to 2^32 record numbers
               (approximately 4 billion).  Data sets with more records can
               switch to using a fixed-width string tag instead, with leading
               0s to ensure sort works.  To do this specify TAG:LENGTH.  E.g.
               --order rn:12 will be able to sort up to 1 trillion records.
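
               The reason for the leading 0s can be seen with a small sketch
               that uses only printf and sort, not samtools: strings sort
               character by character, so unpadded numbers come out in the
               wrong order while fixed-width ones do not.  The width of 12
               mirrors the rn:12 example; the record numbers are arbitrary.

```shell
# Unpadded record numbers sort as strings: 10 and 100 come before 9.
unpadded=$(printf '%s\n' 9 10 100 | LC_ALL=C sort | tr '\n' ' ')

# Zero-padded to a fixed width of 12, string order equals numeric order.
padded=$(printf '%012d\n' 9 10 100 | LC_ALL=C sort | tr '\n' ' ')

echo "unpadded: $unpadded"
echo "padded:   $padded"
```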

       -r RG_line, --rg-line RG_line
               A complete @RG header line may be specified, with or without
               the initial "@RG" component.  If specified this will also use
               the ID field from RG_line in each SAM record's RG auxiliary
               tag.

	       If  specified multiple times this appends to the	RG line, auto-
	       matically adding	tabs between invocations.
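
               Because the fields of RG_line are tab-separated, the line
               has to be built with real tab characters.  The sketch below
               does this with printf, which interprets \t in its format
               string in any POSIX shell.  The ID, PL and SM values are
               hypothetical, and the samtools command is shown only as a
               comment.

```shell
# Hypothetical read group values, used for illustration only.
rg_id=xyz
rg_platform=ILLUMINA
rg_sample=sample1

# printf expands \t in the format string to a literal tab.
rg_line=$(printf 'ID:%s\tPL:%s\tSM:%s' "$rg_id" "$rg_platform" "$rg_sample")

# Check that the line really has three tab-separated fields.
n_fields=$(printf '%s\n' "$rg_line" | awk -F '\t' '{ print NF }')
echo "fields in RG line: $n_fields"

# The line could then be passed as:
#   samtools import -r "$rg_line" in.fq
```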

       -R RG_ID, --rg RG_ID
               This is a shorter form of the option above, equivalent to
               --rg-line ID:RG_ID.  If both are specified then this option is
               ignored.

       -u      Output BAM or CRAM as uncompressed data.

       -T TAGLIST
               This looks for any SAM-format auxiliary tags in the comment
               field of a fastq read name.  These must match the
               <alpha-num><alpha-num>:<type>:<data> pattern as specified in
               the SAM specification.  TAGLIST can be blank or * to indicate
               all tags should be copied to the output, otherwise it is a
               comma-separated list of tag types to include with all others
               being discarded.
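
               As a rough illustration of the pattern, the sketch below
               checks which words of a comment field look like SAM
               auxiliary tags.  It assumes the words are tab-separated and
               that the type is one of the letters A, i, f, Z, H or B from
               the SAM specification; the comment text is made up, and the
               regular expression only approximates what samtools accepts.

```shell
# Hypothetical FASTQ comment field: three tag-like words and one that is not.
comment=$(printf 'BC:Z:ACGT\tnotatag\tRG:Z:grp1\tNM:i:0')

# Two alphanumeric characters, a type letter, then non-empty data.
tag_re='^[A-Za-z0-9]{2}:[AifZHB]:.+$'

# Put one word per line and keep only those matching the tag pattern.
matches=$(printf '%s\n' "$comment" | tr '\t' '\n' | grep -E "$tag_re")
n_tags=$(printf '%s\n' "$comment" | tr '\t' '\n' | grep -cE "$tag_re")

echo "$matches"
echo "tag-like words: $n_tags"
```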

EXAMPLES
       Convert	a  single-ended	fastq file to an unmapped CRAM.	 Both of these
       commands	perform	the same action.

	   samtools import -0 in.fastq -o out.cram
	   samtools import in.fastq > out.cram

       Convert a pair of Illumina fastqs containing CASAVA identifiers to BAM,
       adding the barcode information to the BC	auxiliary tag.

	   samtools import -i -1 in_1.fastq -2 in_2.fastq -o out.bam
	   samtools import -i in_[12].fastq > out.bam

       Specify the read group.  These commands are equivalent.

	   samtools import -r "$(echo -e 'ID:xyz\tPL:ILLUMINA')" in.fq
	   samtools import -r "$(echo -e '@RG\tID:xyz\tPL:ILLUMINA')" in.fq
           samtools import -r ID:xyz -r PL:ILLUMINA in.fq

       Create an unmapped BAM file from a set of 4 Illumina fastqs from
       bcl2fastq, consisting of two read and two index files.  The CASAVA
       identifier is used only for setting QC pass / failure status.

           samtools import -i -1 R1.fq -2 R2.fq --i1 I1.fq --i2 I2.fq -o out.bam

       Convert a pair of CASAVA barcoded fastq files to unmapped CRAM with an
       incremental record counter, then sort this by minimiser in order to
       reduce file space.  The reversal process is also shown using samtools
       sort and samtools fastq.

           samtools import -i in_1.fq in_2.fq --order ro -O bam,level=0 | \
               samtools sort -@4 -M -o out.srt.cram -

           samtools sort -@4 -O bam -u -t ro out.srt.cram | \
               samtools fastq -1 out_1.fq -2 out_2.fq -i --index-format "i*i*"

AUTHOR
       Written by James	Bonfield of the	Wellcome Sanger	Institute.

SEE ALSO
       samtools(1), samtools-fastq(1)

       Samtools	website: <http://www.htslib.org/>

samtools-1.21		       12 September 2024	    samtools-import(1)
