bl_align_map_seq_exact(3)  Library Functions Manual  bl_align_map_seq_exact(3)

NAME
       bl_align_map_seq_exact()	- Locate little	sequence in big	sequence

LIBRARY
       #include	<biolibc/align.h>
       -lbiolibc -lxtend

SYNOPSIS
       size_t  bl_align_map_seq_exact(const bl_align_t *params,
       const char *big,	size_t big_len,
       const char *little, size_t little_len)

ARGUMENTS
       params      bl_align_t parameters.  Only min_match is used.
       big         Sequence to be searched for matches to little
       big_len     Length of big
       little      Sequence to be located within big
       little_len  Length of little

DESCRIPTION
       Locate  the leftmost (farthest 5') match	for sequence little within se-
       quence big, using exact matching	only.

       The content of little is	assumed	to be all upper	case.	This  improves
       speed  by avoiding numerous redundant toupper() conversions on the same
       string, assuming	multiple big strings will be searched for  little,  as
       in  adapter  removal and	read mapping.  Use strlupper(3)	or strupper(3)
       before calling this function if necessary.
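
       As an illustration of the one-time conversion described above, the
       following is a minimal, portable sketch.  seq_to_upper() is a
       hypothetical helper written for this example, not part of biolibc or
       libxtend; in practice strlupper(3) or strupper(3) would serve the same
       purpose.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a biolibc or libxtend function):
 * convert a null-terminated sequence to upper case in place.
 * Call it once on the adapter, then search any number of reads.
 */
void    seq_to_upper(char *seq)
{
    size_t  c;

    for (c = 0; seq[c] != '\0'; ++c)
        seq[c] = (char)toupper((unsigned char)seq[c]);
}
```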

       A minimum of min_match bases must match between little and  big.	  This
       mainly  matters	near  the  end of big, where remaining bases are fewer
       than the	length of little.
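
       The behavior described above can be sketched roughly as follows.
       This is not the biolibc source: map_seq_exact_sketch() is a
       hypothetical name, min_match is passed directly rather than through a
       bl_align_t, and the handling of a little shorter than min_match (never
       matched here) is an assumption made for the example.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not the library implementation.
 * Return the index of the leftmost exact match of little in big,
 * or big_len if there is none.  little is assumed to be upper case;
 * each base of big is upper-cased as it is compared.
 */
size_t  map_seq_exact_sketch(size_t min_match,
                             const char *big, size_t big_len,
                             const char *little, size_t little_len)
{
    size_t  start, c, overlap;

    for (start = 0; start < big_len; ++start)
    {
        /* Near the end of big, only a prefix of little can overlap */
        overlap = big_len - start;
        if ( overlap > little_len )
            overlap = little_len;

        /* Too few bases left to satisfy min_match; later starts have fewer */
        if ( overlap < min_match )
            break;

        for (c = 0; c < overlap; ++c)
            if ( toupper((unsigned char)big[start + c]) != little[c] )
                break;

        /* Every overlapping base matched */
        if ( c == overlap )
            return start;
    }
    return big_len;     /* Not found */
}
```

       With min_match set to 3, the adapter TTGA is reported at index 6 of
       ACGTACTTG, where only its first 3 bases fit before the end of the
       read.  With min_match set to 4, the same call reports no match.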

       Note that alignment is not an exact science.  We cannot detect every
       true little sequence without falsely detecting other sequences, since
       it is impossible to know whether any given sequence really comes from
       the source of interest (e.g. an adapter) or occurs naturally in
       another source.  The best we can do is estimate which settings will
       yield the most true positives (best statistical power) and the fewest
       false positives.

       In the case of adapter removal, it is also not usually important to
       remove every adapter, but only to minimize adapter contamination.
       Failing to align a small percentage of sequences due to adapter
       contamination will not change the story told by the downstream
       analysis.  Nor will erroneously trimming the 3' end of a small
       percentage of reads that contain natural sequences resembling
       adapters.  Trimming only exact matches of the adapter sequence will
       generally remove 99% or more of the adapter contamination while
       minimizing false positives.  Tolerating 1 or 2 differences has been
       shown to do slightly better overall.  Modern read mapping software is
       also tolerant of adapter contamination and can clip adapters as
       needed.
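
       For comparison, tolerating a small number of differences might look
       like the sketch below.  It is only an illustration of the idea:
       map_seq_mismatch_sketch() and its max_mismatch argument are
       hypothetical, and it does not reproduce bl_align_map_seq_sub(3).  It
       also applies the same mismatch limit to short overlaps at the end of
       big, which a practical implementation may want to scale down.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical sketch: like an exact search, but accept a window
 * if at most max_mismatch overlapping bases differ.
 * Returns big_len if no acceptable window is found.
 */
size_t  map_seq_mismatch_sketch(size_t min_match, size_t max_mismatch,
                                const char *big, size_t big_len,
                                const char *little, size_t little_len)
{
    size_t  start, c, overlap, mismatch;

    for (start = 0; start < big_len; ++start)
    {
        /* Only a prefix of little fits near the end of big */
        overlap = big_len - start;
        if ( overlap > little_len )
            overlap = little_len;
        if ( overlap < min_match )
            break;

        /* Count differing bases, giving up once the limit is exceeded */
        mismatch = 0;
        for (c = 0; (c < overlap) && (mismatch <= max_mismatch); ++c)
            if ( toupper((unsigned char)big[start + c]) != little[c] )
                ++mismatch;

        if ( mismatch <= max_mismatch )
            return start;
    }
    return big_len;     /* Not found */
}
```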

RETURN VALUES
       Index of little within big if found.  Otherwise, the index of the null
       terminator of big, which equals big_len when big_len is the string
       length of big.

EXAMPLES
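       bl_align_t  params;
       bl_fastq_t  read;
       char        *adapter;   /* Must point to an upper case sequence */
       size_t      index;

       /* Require at least 3 matching bases, e.g. at the 3' end of the read */
       bl_align_set_min_match(&params, 3);
       index = bl_align_map_seq_exact(&params,
           BL_FASTQ_SEQ(&read), BL_FASTQ_SEQ_LEN(&read),
           adapter, strlen(adapter));
       /* A return value equal to the read length means no match */
       if ( index != BL_FASTQ_SEQ_LEN(&read) )
           bl_fastq_3p_trim(&read, index);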

SEE ALSO
       bl_align_map_seq_sub(3),	bl_align_set_min_match(3), bl_fastq_3p_trim(3)

						     bl_align_map_seq_exact(3)
