FreeBSD Manual Pages
fastq-trim(1) General Commands Manual fastq-trim(1) NAME fastq-trim - Trim adapters and low-quality bases from FASTQ files SYNOPSIS fastq-trim [--help] fastq-trim [--verbose] [--exact-match] [--3p-adapter1 seq] [--3p-adapter2 seq] [--min-match N] [--max-mismatch-percent N] [--min-qual N] [--min-length N] [--polya-min-length N] [--phred-base N] [infile1.fastq[.gz|.bz2|.xz]] [outfile1.fastq[.gz|.bz2|.xz]] [infile2.fastq[.gz|.bz2|.xz] outfile2.fastq[.gz|.bz2|.xz]] OPTIONS AND ARGUMENTS --help Print a summary of usage and exit. --verbose Print some intermediate results during trimming for debugging purposes. --exact-match Use exact matching to find adapters. Adapters will be matched if a segment matches the entire adapter sequence, or a portion of it at the end of the read matches to a minimum of min_match (default=3, controlled by --min-match) characters. The default matching algorithm used when --exact-match is not specified is described below. --3p-adapter1 seq Specify the 3' adapter for single-end mode and read1 in paired- end mode. Default is the Illumina universal adapter, AGATCGGAA- GAG. Use fastq-scum(1) to identify other standard adapters in the input. --3p-adapter1 seq Specify the 3' adapter for read2 in paired-end mode. Default is the Illumina universal adapter, AGATCGGAAGAG. Use fastq-scum(1) to identify other standard adapters in the input. --min-match N Minimum number of bases in the read that must match the adapter in order to report a match. This applies to both exact matching and smart matching. Default is 3. --max-mismatch-percent N Maximum percentage of bases in the read can differ from the adapter and still consider it a match. This applies only to smart matching. Increasing the value from the default 10% slows down processing slightly, reduces the number of missed adapters, and increases the risk of removing real data resembling adapters. --min-qual N Minimum quality of bases to keep for end-trimming. Default is 20. --min-length N Minimum length of reads to keep after trimming. Default is 30. --polya-min-length N Minimum length of poly-A tails to be removed (after other trim- ming). Default is 0, meaning no poly-A trimming is done. --phred-base N Offset used for characters in quality string. Default is 33, which should be the correct value for virtually all modern se- quence data. File Arguments: Fastq-trim optionally accepts 1, 2, or 4 filenames. If no filenames are provided, single-read mode is selected with input read from the standard input and output written to the standard output. If one filename is provided, single-read mode is selected with input read from the given filename and output is written to the standard output. If two filenames are provided, single-read mode is selected with input read from the first filename and output written to the second. If four filenames are provided, paired mode is selected. The first filename is the forward read input, the second the forward read output, the third the reverse read input and the fourth the reverse read output. DESCRIPTION Fastq-trim removes adapters and low-quality bases from the ends of each read in a FASTQ file. Reads with a length less than the specified or default minimum are not output. Note that adapter matching (A.K.A. alignment) is not an exact science. Most bioinformatics data contain errors and hence the processing must be probabilistic. It is possible that an adapter sequence occurs nat- urally in a given sample. The longer the adapter, the less often this will occur, which is why adapters are typically 12 or more bases long. Also, read errors can occur in adapters as well as in the insert (the real DNA/RNA sequence between the adapters). This is rare and using an exact match algorithm like memcmp(3) will generally find more than 99% of adapters. Tolerating some slop in adapter matching will result in fewer adapters left in the data and a higher risk of false positives (removing natural sequences resembling adapters). Neither situation is catastrophic. If a fraction of a percent of reads will not align to a genome properly because of adapter contamination, the end results of the downstream analysis won't tell a different story. The same is true of reads that were shortened by removing a falsely identified adapter. The default tolerance is 10% of the adapter length (or the remaining bases at the end of the read if that's shorter). This can be increased using --max-mismatch_percent. The default "smart" adapter matching allows for roughly 10% of the bases to be substituted. No insertions or deletions are currently han- dled for adapter matching. Exact matching can be selected using --exact-match. In either case, a minimum of min_match (default 3, controlled by --min-match) bases must be matched in order to decide that the sequence is indeed an adapter. Low-quality 3' end removal uses the same algorithm as BWA and Cutadapt, namely scanning backward from the 3' end until the sum of (score - min- qual) becomes > 0, then trimming at the location of the minimum sum. Quality trimming is done before adapter matching so that likely misread bases do not contribute to the adapter match scoring. Input and output files may be compressed using gzip(1), bzip2(1), or xz(1). Support for this is provided by xt_fopen(3), which automati- cally determines the file type from the filename extension and pipes input or output as needed. Compression level of output files and other options can be be passed to gzip(1), bzip2(1), and xz(1) via the envi- ronment variables GZIP, BZIP2, BZIP, and XZ_OPT. This is often useful for adjusting the compression level of output files in order to tune performance. ENVIRONMENT GZIP, BZIP2, BZIP, XZ_OPT: Fine-tune output compression. SEE ALSO fastq-scum(1), biolibc(3) BUGS Please report bugs to the author and send patches in unified diff for- mat. (man diff for more information) AUTHOR J. Bacon fastq-trim(1)
NAME | SYNOPSIS | OPTIONS AND ARGUMENTS | DESCRIPTION | ENVIRONMENT | SEE ALSO | BUGS | AUTHOR
Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=fastq-trim&sektion=1&manpath=FreeBSD+Ports+14.3.quarterly>
