Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
abyss-sealer(1)			     ABySS		       abyss-sealer(1)

Name
       abyss-sealer - Close gaps within	scaffolds

Synopsis
       abyss-sealer -b <Bloom filter size> -k <kmer size> -k <kmer size>... -o
       <output_prefix>	-S  <path  to  scaffold	 file>	[options]...  <reads1>
       [reads2]...

       For example:

       abyss-sealer -b20G -k64 -k80 -k96 -k112 -k128 -o	 test  -S  scaffold.fa
       read1.fa	read2.fa

Description
       Sealer  is an application of Konnector that closes intra-scaffold gaps.
       It performs three sequential functions.	First,	regions	 with  Ns  are
       identified from an input	scaffold.  Flanking nucleotues (2 x 100bp) are
       extracted from those regions while respecting the strand	(5' to 3') di-
       rection	on  the	 sequence  immediately downstream of each gap.	In the
       second step, flanking sequence pairs are	used  as  input	 to  Konnector
       along  with  a  set  of reads with a high level of coverage redundancy.
       Ideally,	the reads should represent the original	dataset	from which the
       draft assembly is generated, or further whole genome shotgun (WGS)  se-
       quencing	 data  generated  from the same	sample.	 Within	Konnector, the
       input WGS reads are used	to populate a Bloom filter, tiling  the	 reads
       with a sliding window of	length k, thus generating a probabilistic rep-
       resentation  of all the k-mers in the reads.  Konnector also uses crude
       error  removal  and  correctional  algorithms,  eliminating  singletons
       (k-mers	that are observed only once) and fixing	base mismatches	in the
       flanking	sequence pairs.	 Sealer	launches Konnector processes  using  a
       user-input  range  of k-mer lengths.  In	the third and final operation,
       successfully merged sequences are inserted into the gaps	of the	origi-
       nal scaffolds, and Sealer outputs a new gap-filled scaffold file.

Installation
       See ABySS installation instructions.

How to run as stand-alone application
       abyss-sealer [-b	bloom filter size][-k values...] [-o outputprefix] [-S
       assembly	file] [options...] [reads...]

       Sealer  requires	 the  following	information to run: - draft assembly -
       user-supplied k values (>0) - output prefix - WGS reads	(for  building
       Bloom Filters)

Sample commands
       Without pre-built bloom filters:

       abyss-sealer -b20G -k64 -k96 -o run1 -S test.fa read1.fq.gz read2.fq.gz

       With pre-built bloom filters:

       abyss-sealer  -k64  -k96	 -o  run1 -S test.fa -i	k64.bloom -i k96.bloom
       read1.fq.gz read2.fq.gz

       Reusable	Bloom filters can be pre-built with abyss-bloom	build, e.g.:

       abyss-bloom  build  -vv	-k64  -j12  -b20G  -l2	k64.bloom  read1.fq.gz
       read2.fq.gz

       Note:  when  using  pre-built  bloom  filters  generated	by abyss-bloom
       build,  Sealer  must  be	 compiled  with	 the  same  maxk  value	  that
       abyss-bloom  was	 compiled  with.   For	example, if a Bloom filter was
       built with a maxk of 64,	Sealer must be compiled	with a maxk of	64  as
       well.   If different values are used between the	pre-built bloom	filter
       and Sealer, any sequences generated will	be nonsensical and incorrect.

Output files
        prefix_log.txt

        prefix_scaffold.fa

        prefix_merged.fa

        prefix_flanks_1.fq -> if --print-flanks option	used

        prefix_flanks_2.fq -> if --print-flanks option	used

       The log file contains results of	each Konnector run.  The structure  of
       one run is as follows:

        ## unique gaps	closed for k##

        No start/goal kmer: ###

        No path: ###

        Unique	path: ###

        Multiple paths: ###

        Too many paths: ###

        Too many branches: ###

        Too many path/path mismatches:	###

        Too many path/read mismatches:	###

        Contains cycle: ###

        Exceeded mem limit: ###

        Skipped: ###

        ### flanks left

        k## run complete

        Total gaps closed so far = ###

       The  scaffold.fa	file is	a gap-filled version of	the draft assembly in-
       serted into Sealer.  The	merged.fa file contains	every newly  generated
       sequence	 that  were  inserted  into  gaps,  including the flanking se-
       quences.	 Negative sizes	of new sequences indicate Konnector  collapsed
       the pair	of flanking sequences.	For example:

       >[scaffold  ID]_[original  start	 position of gap on scaffold]_[size of
       new sequence] ACGCGACGAGCAGCGAGCACGAGCAGCGACGAGCGACGACGAGCAGCGACGAGCG

       If --print-flanks option	is enabled, Sealer outputs  the	 flanking  se-
       quences used to insert into Konnector.  This may	be useful should users
       which  to double	check if this tool is extracting the correct sequences
       surrounding gaps.  The structure	of these files are as follows:

       >[scaffold ID]_[original	start position of gap  on  scaffold]_[size  of
       gap]/[1	or  2 indicating whether left or right flank] GCTAGCTAGCTAGCT-
       GATCGATCGTAGCTAGCTAGCTGACTAGCTGATCAGTCGA

How to optimize	for gap	closure
       To optimize Sealer, users can observe the log files generated  after  a
       run and adjust parameters accordingly.  If k runs are showing gaps hav-
       ing  too	 many  paths or	branches, consider increasing -P or -B parame-
       ters, respectively.

       Also consider increasing	the number of k	values used.  Generally, large
       k-mers are better able to address highly	 repetitive  genomic  regions,
       while smaller k-mers are	better able to resolve areas of	low coverage.

Runtime	and memory usage
       More  k values mean more	bloom filters will be required,	which will in-
       crease runtime as it takes time to build/load each bloom	filter at  the
       beginning  of  each  k run.  Memory usage is not	affected by using more
       bloom filters.

       The larger value	used for parameters such as -P,	-B or -F will increase
       runtime.

Options
       Parameters of abyss-sealer

        --print-flanks: outputs flank files

        -S,--input-scaffold=FILE: load	scaffold from FILE

        -L,--flank-length=N: length of	flanks to be used as pseudoreads [100]

        -j,--threads=N: use N parallel	threads	[1]

        -k,--kmer=N: the size of a k-mer

        -b,--bloom-size=N: size of bloom filter.   Required  when  not	 using
	 pre-built Bloom filter(s).

        -B,--max-branches=N:  max  branches in	de Bruijn graph	traversal; use
	 `nolimit' for no limit	[1000]

        -d,--dot-file=FILE: write graph traversals to a DOT file

        -e,--fix-errors: find and fix single-base errors when reads  have  no
	 kmers in bloom	filter [disabled]

        -f,--min-frag=N: min fragment size in base pairs [0]

        -F,--max-frag=N: max fragment size in base pairs [1000]

        -i,--input-bloom=FILE:	load bloom filter from FILE

        --mask: mask new and changed bases as lower case

        --no-mask: do not mask	bases [default]

        --chastity: discard unchaste reads [default]

        --no-chastity:	do not discard unchaste	reads

        --trim-masked:	trim masked bases from the ends	of reads

        --no-trim-masked:  do	not  trim  masked bases	from the ends of reads
	 [default]

        -l,--long-search: start path search as	close as possible to  the  be-
	 ginnings  of  reads.  Takes more time but improves results when bloom
	 filter	false positive rate is high [disabled]

        -m,-flank-mismatches=N`: max mismatches between paths and flanks; use
	 `nolimit' for no limit	[nolimit]

        -M,-max-mismatches=N`:	max mismatches between	all  alternate	paths;
	 use `nolimit' for no limit [nolimit]

        -n-no-limits`:	 disable  all limits; equivalent to `-B	nolimit	-m no-
	 limit -M nolimit -P nolimit'

        -o,-output-prefix=FILE`: prefix of output FASTA files [required]

        -P,-max-paths=N`: merge at most N alternate paths; use	`nolimit'  for
	 no limit [2]

        -q,-trim-quality=N`:  trim bases from the ends	of reads whose quality
	 is less than the threshold

        --standard-quality: zero quality is `!' (33) default  for  FASTQ  and
	 SAM files

        --illumina-quality: zero quality is `@' (64) default for qseq and ex-
	 port files

        -r,-read-name=STR`: only process reads	with names that	contain	STR

        -s,-search-mem=N`: mem	limit for graph	searches; multiply by the num-
	 ber  of  threads  (-j)	 to get	the total mem used for graph traversal
	 [500M]

        -t,-trace-file=FILE`: write graph search stats	to FILE

        -v,-verbose`: display verbose output

        --help: display this help and exit

        --version: output version information and exit

       k is the	size of	k-mer for the de Bruijn	graph.	You may	specify	multi-
       ple values of k,	which will increase the	number of gaps closed  at  the
       cost of increased run time.  Multiple values of k ought to be specified
       in  increasing order, as	lower values of	k have fewer coverage gaps and
       are less	likely to misassemble.

       P is the	threshold for number of	paths allowed to be  traversed.	  When
       set  to 10, Konnector will attempt to close gaps	even when there	are 10
       different paths found.  It would	attempt	to create a consensus sequence
       between these paths.  The default setting is 2.

AUTHORS
       Daniel Paulino.

ABySS				  2014-11-13		       abyss-sealer(1)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=abyss-sealer&sektion=1&manpath=FreeBSD+Ports+15.0>

home | help