fastq-trim(1)		    General Commands Manual		 fastq-trim(1)

NAME
       fastq-trim - Trim adapters and low-quality bases	from FASTQ files

SYNOPSIS
       fastq-trim [--help]
       fastq-trim [--verbose] [--exact-match]
	   [--3p-adapter1 seq] [--3p-adapter2 seq]
	   [--min-match	N] [--max-mismatch-percent N]
	   [--min-qual N] [--min-length	N] [--polya-min-length N]
	   [--phred-base N]
	   [infile1.fastq[.gz|.bz2|.xz]] [outfile1.fastq[.gz|.bz2|.xz]]
	   [infile2.fastq[.gz|.bz2|.xz]	outfile2.fastq[.gz|.bz2|.xz]]

OPTIONS	AND ARGUMENTS
       --help Print a summary of usage and exit.

       --verbose
	      Print  some  intermediate	 results during	trimming for debugging
	      purposes.

       --exact-match
	      Use exact	matching to find adapters.  Adapters will  be  matched
	      if  a  segment matches the entire	adapter	sequence, or a portion
	      of it at the end of the read matches to a	minimum	 of  min_match
	      (default=3,  controlled by --min-match) characters.  The default
	      matching algorithm used when --exact-match is not	 specified  is
	      described	below.

       --3p-adapter1 seq
	      Specify the 3' adapter for single-end mode and for read1 in
	      paired-end mode.  Default is the Illumina universal adapter,
	      AGATCGGAAGAG.  Use fastq-scum(1) to identify other standard
	      adapters in the input.

       --3p-adapter2 seq
	      Specify the 3' adapter for read2 in paired-end mode.  Default is
	      the Illumina universal adapter, AGATCGGAAGAG.  Use fastq-scum(1)
	      to identify other	standard adapters in the input.

       --min-match N
	      Minimum  number of bases in the read that	must match the adapter
	      in order to report a match.  This	applies	to both	exact matching
	      and smart	matching.  Default is 3.

       --max-mismatch-percent N
	      Maximum percentage of bases in the read that can differ from
	      the adapter while still being considered a match.  This applies
	      only to smart matching.  Increasing the value from the default
	      of 10% slows down processing slightly, reduces the number of
	      missed adapters, and increases the risk of removing real data
	      that resembles adapters.

       --min-qual N
	      Minimum  quality	of bases to keep for end-trimming.  Default is
	      20.

       --min-length N
	      Minimum length of	reads to keep after trimming.  Default is 30.

       --polya-min-length N
	      Minimum length of	poly-A tails to	be removed (after other	 trim-
	      ming).  Default is 0, meaning no poly-A trimming is done.

       --phred-base N
	      Offset  used  for	 characters in quality string.	Default	is 33,
	      which should be the correct value	for virtually all  modern  se-
	      quence data.

       File Arguments:
	      Fastq-trim accepts 0, 1, 2, or 4 filenames.

	      If  no filenames are provided, single-read mode is selected with
	      input read from the standard input and  output  written  to  the
	      standard output.

	      If  one  filename	is provided, single-read mode is selected with
	      input read from the given	filename and output is written to  the
	      standard output.

	      If two filenames are provided, single-read mode is selected with
	      input  read  from	 the  first filename and output	written	to the
	      second.

	      If four filenames	are provided, paired mode  is  selected.   The
	      first filename is	the forward read input,	the second the forward
	      read output, the third the reverse read input and	the fourth the
	      reverse read output.
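
	      The filename rules above can be sketched as a small lookup on
	      the argument count.  This is only an illustration in Python, not
	      fastq-trim's own code; the function name and the use of "-" to
	      stand for the standard streams are assumptions of the sketch.

```python
def select_mode(filenames):
    """Map the filename arguments to (mode, inputs, outputs).

    "-" stands for stdin/stdout here; that placeholder is a convention
    of this sketch, not something the manual specifies.
    """
    n = len(filenames)
    if n == 0:
        return ("single", ["-"], ["-"])
    if n == 1:
        return ("single", [filenames[0]], ["-"])
    if n == 2:
        return ("single", [filenames[0]], [filenames[1]])
    if n == 4:
        # Order is: forward in, forward out, reverse in, reverse out.
        return ("paired",
                [filenames[0], filenames[2]],
                [filenames[1], filenames[3]])
    raise ValueError("expected 0, 1, 2, or 4 filenames, got %d" % n)
```

	      Note that in paired mode the inputs and outputs alternate, so
	      the second argument is an output file, not the reverse read.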

DESCRIPTION
       Fastq-trim removes adapters and low-quality bases from the ends of each
       read  in	 a FASTQ file.	Reads with a length less than the specified or
       default minimum are not output.

       Note that adapter matching (A.K.A. alignment) is	not an exact  science.
       Most  bioinformatics  data contain errors and hence the processing must
       be probabilistic.   It is possible that an adapter sequence occurs nat-
       urally in a given sample.  The longer the adapter, the less often  this
       will  occur, which is why adapters are typically	12 or more bases long.
       Also, read errors can occur in adapters as well as in the  insert  (the
       real DNA/RNA sequence between the adapters).  This is rare and using an
       exact  match algorithm like memcmp(3) will generally find more than 99%
       of adapters.
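
       To give a feel for why longer adapters occur naturally less often, the
       following Python sketch estimates the expected number of chance
       full-length matches in one read.  It assumes independent, uniformly
       distributed bases, which real genomes do not strictly follow, and the
       read length of 150 is only an example value.

```python
def expected_chance_matches(adapter_len, read_len):
    """Expected number of positions in one read where random sequence
    matches the full adapter, assuming independent, uniform bases."""
    positions = max(read_len - adapter_len + 1, 0)
    return positions / 4 ** adapter_len


for n in (6, 12):
    print(n, expected_chance_matches(n, 150))
```

       Under these assumptions a 6-base adapter would match by chance in
       roughly 3.5% of reads, while a 12-base adapter would do so in fewer
       than one read in 100,000.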

       Tolerating some slop in adapter matching	will result in fewer  adapters
       left in the data	and a higher risk of false positives (removing natural
       sequences resembling adapters).	Neither	situation is catastrophic.  If
       a  fraction  of	a percent of reads will	not align to a genome properly
       because of adapter contamination, the end  results  of  the  downstream
       analysis	 won't tell a different	story.	The same is true of reads that
       were shortened by removing a falsely identified adapter.

       The default tolerance is 10% of the adapter length (or of the remaining
       bases at the end of the read if that is shorter).  This can be
       increased using --max-mismatch-percent.

       The  default  "smart"  adapter  matching	 allows	for roughly 10%	of the
       bases to	be substituted.	 No insertions or deletions are	currently han-
       dled for	adapter	matching.
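
       A substitution-only match of this kind might look like the following
       Python sketch.  It is not the program's actual implementation: the
       function name is hypothetical, and rounding the tolerance down to a
       whole number of bases is an assumption.

```python
def find_adapter_smart(read, adapter, min_match=3, max_mismatch_percent=10):
    """Return the index where the adapter starts, or len(read) if none.

    Substitutions only: no insertions or deletions are considered.
    Rounding the tolerance down is an assumption of this sketch.
    """
    for start in range(len(read)):
        # Compare against the whole adapter, or what fits before the end.
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_match:
            break  # too few bases remain to call a match
        allowed = overlap * max_mismatch_percent // 100
        mismatches = 0
        for i in range(overlap):
            if read[start + i] != adapter[i]:
                mismatches += 1
                if mismatches > allowed:
                    break  # give up on this start position early
        if mismatches <= allowed:
            return start
    return len(read)
```

       With the default 12-base adapter and 10% tolerance, this sketch accepts
       one substituted base in a full-length match and none in a short partial
       match at the end of the read.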

       Exact matching can be selected using --exact-match.  In either case,  a
       minimum	of min_match (default 3, controlled by --min-match) bases must
       be matched in order to decide that the sequence is indeed an adapter.
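
       Exact matching, as described for --exact-match, can be illustrated with
       the Python sketch below: either the whole adapter appears in the read,
       or a leading portion of at least min_match bases sits at the end of the
       read.  The function name is hypothetical and the code is not taken from
       fastq-trim.

```python
def find_adapter_exact(read, adapter, min_match=3):
    """Return the index where the adapter starts, or len(read) if none."""
    # Case 1: the whole adapter appears somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return pos
    # Case 2: a prefix of the adapter sits at the very end of the read.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_match - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return len(read)
```

       A read ending in only two adapter bases is left untouched, since two is
       below the default min_match of 3.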

       Low-quality 3' end removal uses the same	algorithm as BWA and Cutadapt,
       namely scanning backward	from the 3' end	until the sum of (score	- min-
       qual) becomes > 0, then trimming	at the location	of  the	 minimum  sum.
       Quality trimming	is done	before adapter matching	so that	likely misread
       bases do	not contribute to the adapter match scoring.
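
       The backward scan described above can be sketched in Python as follows.
       The function name is hypothetical; the defaults mirror --min-qual 20
       and --phred-base 33, and the sign convention follows the wording of
       this manual (running sum of score minus min-qual).

```python
def quality_trim_index(quality, min_qual=20, phred_base=33):
    """Return the number of leading bases to keep after 3' trimming."""
    running = 0
    lowest = 0
    cut = len(quality)  # default: keep everything
    for i in range(len(quality) - 1, -1, -1):
        score = ord(quality[i]) - phred_base  # decode one quality char
        running += score - min_qual
        if running > 0:
            break  # good bases now outweigh the bad ones seen so far
        if running < lowest:
            lowest = running
            cut = i  # best trim point seen so far
    return cut
```

       For the quality string "IIIII#####" (five bases of quality 40 followed
       by five of quality 2), the sketch returns 5, so the low-quality tail is
       removed and the first five bases are kept.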

       Input  and  output  files may be	compressed using gzip(1), bzip2(1), or
       xz(1).  Support for this	is provided by	xt_fopen(3),  which  automati-
       cally  determines  the  file type from the filename extension and pipes
       input or	output as needed.  Compression level of	output files and other
       options can be passed to gzip(1), bzip2(1), and xz(1) via the
       environment variables GZIP, BZIP2, BZIP, and XZ_OPT.  This is often useful
       for adjusting the compression level of output files in  order  to  tune
       performance.
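
       Choosing a filter from the filename extension can be pictured with the
       Python sketch below.  It does not reproduce xt_fopen(3): the function
       name and the exact command lines are assumptions, chosen only because
       -d and -c are flags all three tools accept.

```python
import os

# Commands are an assumption of this sketch; xt_fopen(3) may invoke the
# tools differently.
FILTERS = {".gz": "gzip", ".bz2": "bzip2", ".xz": "xz"}


def filter_command(filename, mode):
    """Return the argv list for the pipe filter, or None for plain files."""
    ext = os.path.splitext(filename)[1]
    tool = FILTERS.get(ext)
    if tool is None:
        return None  # uncompressed: open the file directly
    if mode == "r":
        return [tool, "-dc", filename]  # decompress to stdout
    return [tool, "-c"]  # compress stdin to stdout
```

       Because the external tools read their own environment variables,
       setting for example XZ_OPT=-1 before running fastq-trim would ask
       xz(1) for a faster, lighter compression preset on .xz output.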

ENVIRONMENT
       GZIP, BZIP2, BZIP, XZ_OPT: Fine-tune output compression.

SEE ALSO
       fastq-scum(1), biolibc(3)

BUGS
       Please report bugs to the author and send patches in unified diff
       format (see diff(1) for more information).

AUTHOR
       J. Bacon

								 fastq-trim(1)
