alimask(1)			 HMMER Manual			    alimask(1)

NAME
       alimask - calculate and add column mask to a multiple sequence alignment

SYNOPSIS
       alimask [options] msafile postmsafile

DESCRIPTION
       alimask applies a mask line to a multiple sequence alignment, based on
       mask ranges given in either alignment or model coordinates. When
       hmmbuild receives a masked alignment as input, it produces a profile
       model in which the emission probabilities at masked positions are set
       to match the background frequencies, rather than being estimated from
       the observed frequencies in the alignment. Position-specific insertion
       and deletion rates are not altered, even in masked regions. alimask
       autodetects the input format and, by default, writes the masked
       alignment in Stockholm format (see --outformat). msafile must contain
       only one sequence alignment.
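       As a rough illustration of the masking effect described above, the
       following Python sketch replaces the emission distribution of each
       masked match position with a background distribution. It is not
       HMMER code: the function name, the dictionary representation, and the
       example background frequencies are assumptions, and the mask is
       assumed to have already been mapped from alignment columns to match
       positions.

```python
# Illustrative sketch only, not HMMER's implementation. Emission
# distributions are plain dicts; the mask is assumed to be indexed by
# match position (alignment columns already mapped to match states).
from typing import Dict, List, Sequence

Dist = Dict[str, float]


def apply_mask_to_emissions(
    emissions: Sequence[Dist], mask: Sequence[bool], background: Dist
) -> List[Dist]:
    """Return new emission distributions in which every masked match
    position uses the background distribution instead of its estimate
    from observed frequencies. Insert/delete (transition) parameters are
    not represented here and are left untouched."""
    if len(emissions) != len(mask):
        raise ValueError("need exactly one mask flag per match position")
    return [
        dict(background) if masked else dict(dist)
        for dist, masked in zip(emissions, mask)
    ]


background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
observed = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Mask only the second match position.
print(apply_mask_to_emissions(observed, [False, True], background))
```

       Copies are returned so that editing one masked position never
       mutates the shared background dictionary.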

       A common motivation for masking a region in an alignment is that the
       region contains a simple tandem repeat that is observed to cause an
       unacceptably high rate of false positive hits.

       In the simplest case, a mask range is given in coordinates relative to
       the input alignment, using --alirange <s>. More often, however, the
       region to be masked has been identified in coordinates relative to the
       profile model (e.g. by recognizing a simple repeat pattern in false hit
       alignments or in the HMM logo). Not all alignment columns are converted
       to match state positions in the profile (see the --symfrac flag of
       hmmbuild for discussion), so model positions do not necessarily line
       up with alignment column positions. To remove the burden of converting
       model positions to alignment positions, alimask also accepts mask
       ranges in model coordinates, using --modelrange <s>. With this flag,
       alimask determines which alignment positions hmmbuild would identify
       as match states, which requires that every hmmbuild flag affecting
       that decision also be supplied to alimask. For this reason, many
       hmmbuild flags are also accepted by alimask.

OPTIONS
       -h     Help; print a brief reminder of command line usage and all
              available options.

       -o <f> Direct the summary output to file <f>, rather than to stdout.

OPTIONS FOR SPECIFYING MASK RANGE
       A single mask range is given as a dash-separated pair, like
       --modelrange 10-20, and multiple ranges may be submitted as a
       comma-separated list, like --modelrange 10-20,30-42.
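       The range syntax above can be parsed as in the following Python
       sketch. parse_ranges is a hypothetical helper, not part of HMMER; it
       assumes 1-based, inclusive coordinates, and its validation rules
       (start >= 1, end >= start) are assumptions for illustration.

```python
# Hypothetical parser for the "start-end[,start-end...]" range syntax.
from typing import List, Tuple


def parse_ranges(spec: str) -> List[Tuple[int, int]]:
    """Split a comma-separated list of dash-separated pairs into
    (start, end) integer tuples (assumed 1-based and inclusive)."""
    ranges = []
    for part in spec.split(","):
        start_str, sep, end_str = part.strip().partition("-")
        if not sep:
            raise ValueError(f"range {part!r} is not a dash-separated pair")
        start, end = int(start_str), int(end_str)
        if start < 1 or end < start:
            raise ValueError(f"invalid range {part!r}")
        ranges.append((start, end))
    return ranges


print(parse_ranges("10-20,30-42"))  # [(10, 20), (30, 42)]
```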

       --modelrange <s>
              Supply the given range(s) in model coordinates.

       --alirange <s>
              Supply the given range(s) in alignment coordinates.

       --appendmask
              Add to the existing mask found with the alignment. The default
              is to overwrite any existing mask.
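       The difference between appending and overwriting can be pictured with
       the Python sketch below, which represents a mask as one boolean per
       alignment column (True = masked). This representation and the
       build_mask helper are assumptions for illustration; the sketch does
       not show how alimask actually writes the mask line in Stockholm
       output.

```python
# Hypothetical sketch of overwrite vs. append (--appendmask) behavior.
from typing import List, Optional, Sequence, Tuple


def build_mask(
    alen: int,
    ranges: Sequence[Tuple[int, int]],
    existing: Optional[List[bool]] = None,
    append: bool = False,
) -> List[bool]:
    """Return one flag per alignment column (True = masked).

    With append=True the existing mask is kept and the new ranges are
    added to it; otherwise any existing mask is discarded. Ranges are
    1-based, inclusive alignment coordinates."""
    if append and existing is not None:
        if len(existing) != alen:
            raise ValueError("existing mask length must equal alignment length")
        mask = list(existing)
    else:
        mask = [False] * alen
    for start, end in ranges:
        if not 1 <= start <= end <= alen:
            raise ValueError(f"range {start}-{end} outside 1..{alen}")
        for col in range(start - 1, end):
            mask[col] = True
    return mask


old = [True, False, False, False, False, False]
print(build_mask(6, [(4, 5)], old, append=True))  # keeps column 1 masked
print(build_mask(6, [(4, 5)], old))               # column 1 is unmasked
```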

       --model2ali <s>
              Print model range(s) and the corresponding alignment range(s).
              No masked alignment is produced. The output is a single line
              for each input range, of the form
                 i..j -> m..n
              where i and j are the model range bounds, and m and n are the
              corresponding alignment range bounds.
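       Conceptually, model position k corresponds to the k-th alignment
       column that becomes a match state. The Python sketch below assumes a
       per-column list of match-state flags is already available (alimask
       would derive it from the hmmbuild-style options described below);
       model_to_ali and format_model2ali are hypothetical names, not HMMER
       functions.

```python
# Hypothetical sketch of mapping a model range to alignment columns.
from typing import List, Tuple


def model_to_ali(consensus: List[bool], start: int, end: int) -> Tuple[int, int]:
    """Map a 1-based inclusive model range start..end to alignment columns.
    consensus[c] is True when alignment column c+1 becomes a match state."""
    # model_to_col[k-1] is the 1-based alignment column of model position k.
    model_to_col = [c + 1 for c, is_match in enumerate(consensus) if is_match]
    if not 1 <= start <= end <= len(model_to_col):
        raise ValueError(f"model range {start}..{end} outside 1..{len(model_to_col)}")
    return model_to_col[start - 1], model_to_col[end - 1]


def format_model2ali(consensus: List[bool], start: int, end: int) -> str:
    m, n = model_to_ali(consensus, start, end)
    return f"{start}..{end} -> {m}..{n}"


# Columns 1, 3, 4 and 6 are match columns, so the model has 4 positions.
flags = [True, False, True, True, False, True]
print(format_model2ali(flags, 2, 4))  # 2..4 -> 3..6
```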

       --ali2model <s>
              Print alignment range(s) and the corresponding model range(s).
              No masked alignment is produced. Because some alignment
              positions may not map to model positions, for an input range
              <from>-<to> the range produced begins with the first alignment
              position between <from> and <to> (inclusive) that maps to a
              model position, and ends with the final alignment position in
              that range that maps to a model position. The output is a
              single line for each input range, of the form
                 i..j -> m..n
              where i and j are the alignment range bounds, and m and n are
              the corresponding model range bounds. If no alignment position
              in the range <from>..<to> maps to a model position, the output
              shows the input <from> and <to> mapping to nothing, in the
              format
                 i..j -> -..- (no map)
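       The reverse lookup can be sketched in Python as follows, again
       assuming a per-column list of match-state flags is given. The helper
       names are hypothetical, and the sketch assumes the left-hand side of
       each output line echoes the input <from> and <to>.

```python
# Hypothetical sketch of mapping an alignment range to model positions.
from typing import List, Optional, Tuple


def ali_to_model(consensus: List[bool], start: int, end: int) -> Optional[Tuple[int, int]]:
    """Map a 1-based inclusive alignment range to model positions.
    Returns None when no column in the range is a match column."""
    if not 1 <= start <= end <= len(consensus):
        raise ValueError(f"alignment range {start}..{end} outside 1..{len(consensus)}")
    # col_to_model[c] is the model position of column c+1, or 0 if none.
    col_to_model = []
    k = 0
    for is_match in consensus:
        if is_match:
            k += 1
            col_to_model.append(k)
        else:
            col_to_model.append(0)
    mapped = [col_to_model[c] for c in range(start - 1, end) if col_to_model[c] > 0]
    if not mapped:
        return None
    # First and last mapped positions within the range.
    return mapped[0], mapped[-1]


def format_ali2model(consensus: List[bool], start: int, end: int) -> str:
    result = ali_to_model(consensus, start, end)
    if result is None:
        return f"{start}..{end} -> -..- (no map)"
    return f"{start}..{end} -> {result[0]}..{result[1]}"


flags = [True, False, True, True, False, True]
print(format_ali2model(flags, 2, 5))  # 2..5 -> 2..3
print(format_ali2model(flags, 5, 5))  # 5..5 -> -..- (no map)
```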

OPTIONS FOR SPECIFYING THE ALPHABET
       --amino
              Assert that sequences in msafile are protein, bypassing alphabet
              autodetection.

       --dna  Assert that sequences in msafile are DNA, bypassing alphabet
              autodetection.

       --rna  Assert that sequences in msafile are RNA, bypassing alphabet
              autodetection.

OPTIONS CONTROLLING PROFILE CONSTRUCTION
       These options control how consensus columns are defined in an
       alignment.

       --fast Define consensus columns as those that have a fraction >=
              symfrac of residues as opposed to gaps. (See below for the
              --symfrac option.) This is the default.

       --hand Define consensus columns using the reference annotation of the
              multiple alignment. This allows you to define any consensus
              columns you like.
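       A minimal Python sketch of reference-based consensus assignment,
       assuming the reference annotation is a Stockholm #=GC RF line with
       one character per column, in which any non-gap character marks a
       consensus column. The set of gap characters used here is an
       assumption, not HMMER's own table.

```python
# Sketch of --hand style consensus assignment from a reference line.
from typing import List

RF_GAP_CHARS = frozenset(".-_~")  # assumed gap-like characters


def consensus_from_rf(rf_line: str) -> List[bool]:
    """Return one flag per alignment column: True where the reference
    annotation has a non-gap character (a consensus/match column)."""
    return [ch not in RF_GAP_CHARS for ch in rf_line]


flags = consensus_from_rf("xx..x.x")
print(flags)
print("model length:", sum(flags))  # model length: 4
```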

       --symfrac <x>
              Define the residue fraction threshold necessary to define a
              consensus column when using the --fast option. The default is
              0.5. The symbol fraction in each column is calculated after
              taking relative sequence weighting into account, and ignoring
              gap characters corresponding to ends of sequence fragments (as
              opposed to internal insertions/deletions). Setting this to 0.0
              means that every alignment column will be assigned as
              consensus, which may be useful in some cases. Setting it to 1.0
              means that only columns that contain no gaps (internal
              insertions/deletions) will be assigned as consensus.
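       The --fast/--symfrac rule can be sketched in Python as below. This is
       a simplified illustration, not HMMER's implementation: gap
       characters, sequence weights, and fragment flags are supplied by the
       caller, and treating a column with no counted symbols as fraction 0.0
       is an assumption.

```python
# Simplified sketch of --fast/--symfrac consensus assignment.
from typing import List, Optional, Sequence

GAP_CHARS = frozenset("-.")  # assumed gap characters for this sketch


def consensus_by_symfrac(
    seqs: Sequence[str],
    symfrac: float = 0.5,
    weights: Optional[Sequence[float]] = None,
    is_fragment: Optional[Sequence[bool]] = None,
) -> List[bool]:
    """Return one flag per column: True if the weighted residue fraction
    is >= symfrac. Terminal gaps of fragments are ignored; internal gaps
    (and terminal gaps of full-length sequences) count against a column."""
    nseq = len(seqs)
    alen = len(seqs[0])
    if any(len(s) != alen for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    weights = [1.0] * nseq if weights is None else list(weights)
    is_fragment = [False] * nseq if is_fragment is None else list(is_fragment)

    # First and last residue column of each sequence (0-based).
    spans = []
    for s in seqs:
        res_cols = [c for c, ch in enumerate(s) if ch not in GAP_CHARS]
        spans.append((res_cols[0], res_cols[-1]) if res_cols else (alen, -1))

    consensus = []
    for col in range(alen):
        res_w = total_w = 0.0
        for s, w, frag, (first, last) in zip(seqs, weights, is_fragment, spans):
            if s[col] not in GAP_CHARS:
                res_w += w
                total_w += w
            elif frag and not first <= col <= last:
                continue  # terminal gap of a fragment: not counted at all
            else:
                total_w += w  # gap that counts against the column
        frac = res_w / total_w if total_w > 0 else 0.0
        consensus.append(frac >= symfrac)
    return consensus


seqs = ["AC-GT", "ACTG-", "--TGT"]
print(consensus_by_symfrac(seqs, 0.7, is_fragment=[False, False, True]))
print(consensus_by_symfrac(seqs, 0.7))
```

       Marking the third sequence as a fragment makes its leading gaps
       disappear from the first two columns, which then reach the 0.7
       threshold.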

       --fragthresh <x>
              We only want to count terminal gaps as deletions if the aligned
              sequence is known to be full-length, not if it is a fragment
              (for instance, because only part of it was sequenced). HMMER
              uses a simple rule to infer fragments: if the sequence length L
              is less than or equal to a fraction <x> times the alignment
              length in columns, then the sequence is handled as a fragment.
              The default is 0.5. Setting --fragthresh 0 will define no
              (non-empty) sequence as a fragment; you might want to do this if
              you know you've got a carefully curated alignment of full-length
              sequences. Setting --fragthresh 1 will define all sequences as
              fragments; you might want to do this if you know your alignment
              is entirely composed of fragments, such as translated short
              reads in metagenomic shotgun data.
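       The fragment rule as stated above amounts to a single comparison,
       sketched here in Python. The helper name and the set of gap
       characters are assumptions; L is taken to be the number of residues
       in the aligned sequence.

```python
# Sketch of the fragment rule: L <= fragthresh * alignment length.
GAP_CHARS = frozenset("-.")  # assumed gap characters


def is_fragment(aligned_seq: str, fragthresh: float = 0.5) -> bool:
    """True if the unaligned length L (residues, gaps excluded) is <=
    fragthresh times the alignment length in columns."""
    alen = len(aligned_seq)
    seq_len = sum(1 for ch in aligned_seq if ch not in GAP_CHARS)
    return seq_len <= fragthresh * alen


for s in ["ACGT------", "ACGTA-----", "ACGTACG---"]:
    print(s, is_fragment(s))
```

       With fragthresh 0 only an empty sequence satisfies L <= 0, and with
       fragthresh 1 every sequence satisfies L <= alen, matching the two
       extremes described above.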

OPTIONS CONTROLLING RELATIVE WEIGHTS
       HMMER uses an ad hoc sequence weighting algorithm to downweight
       closely related sequences and upweight distantly related ones. This
       has the effect of making models less biased by uneven phylogenetic
       representation. For example, two identical sequences would typically
       each receive half the weight that one sequence would. These options
       control which algorithm gets used.

       --wpb  Use the Henikoff position-based sequence weighting scheme
              [Henikoff and Henikoff, J. Mol. Biol. 243:574, 1994]. This is
              the default.
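       For orientation, the published position-based scheme gives each
       residue in a column the score 1/(r * n), where r is the number of
       distinct residue types in the column and n is how many sequences
       share that residue; a sequence's weight is the sum over columns. The
       Python sketch below follows that outline only. Skipping gaps and
       normalizing the weights to sum to the number of sequences are
       assumptions here, not a description of HMMER's exact implementation.

```python
# Outline of Henikoff & Henikoff position-based weighting (simplified).
from collections import Counter
from typing import List, Sequence

GAP_CHARS = frozenset("-.")  # assumed gap characters


def position_based_weights(seqs: Sequence[str]) -> List[float]:
    nseq = len(seqs)
    alen = len(seqs[0])
    raw = [0.0] * nseq
    for col in range(alen):
        column = [s[col] for s in seqs]
        counts = Counter(ch for ch in column if ch not in GAP_CHARS)
        r = len(counts)  # number of distinct residue types in this column
        if r == 0:
            continue  # all-gap column contributes nothing
        for i, ch in enumerate(column):
            if ch not in GAP_CHARS:
                raw[i] += 1.0 / (r * counts[ch])
    total = sum(raw)
    if total == 0:
        return [1.0] * nseq
    # Normalize so the weights sum to the number of sequences (assumed).
    return [w * nseq / total for w in raw]


print(position_based_weights(["AC", "AC", "GT"]))
```

       In this example the two identical sequences each end up with half
       the weight of the distinct one, as in the example given at the start
       of this section.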

       --wgsc Use the Gerstein/Sonnhammer/Chothia weighting algorithm
              [Gerstein et al, J. Mol. Biol. 235:1067, 1994].

       --wblosum
              Use the same clustering scheme that was used to weight data in
              calculating BLOSUM substitution matrices [Henikoff and Henikoff,
              Proc. Natl. Acad. Sci. 89:10915, 1992]. Sequences are
              single-linkage clustered at an identity threshold (default 0.62;
              see --wid), and within each cluster of c sequences, each
              sequence gets relative weight 1/c.
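       The clustering-based weights can be sketched in Python with a
       union-find structure: any pair of sequences at or above the identity
       threshold joins their clusters (single linkage), and each member of a
       cluster of size c gets weight 1/c. The pairwise identity definition
       used here (identical residue pairs over columns where both sequences
       have a residue) and the ">=" comparison are assumptions.

```python
# Sketch of BLOSUM-style weighting via single-linkage clustering.
from collections import Counter
from typing import List, Sequence

GAP_CHARS = frozenset("-.")  # assumed gap characters


def pairwise_identity(a: str, b: str) -> float:
    """Identical residue pairs / columns where both have a residue (assumed)."""
    aligned = [(x, y) for x, y in zip(a, b)
               if x not in GAP_CHARS and y not in GAP_CHARS]
    if not aligned:
        return 0.0
    return sum(1 for x, y in aligned if x == y) / len(aligned)


def blosum_weights(seqs: Sequence[str], wid: float = 0.62) -> List[float]:
    n = len(seqs)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single linkage: one sufficiently similar pair merges two clusters.
    for i in range(n):
        for j in range(i + 1, n):
            if pairwise_identity(seqs[i], seqs[j]) >= wid:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    sizes = Counter(roots)
    return [1.0 / sizes[r] for r in roots]


print(blosum_weights(["ACGTACGT", "ACGTACGA", "TTTTGGGG"]))  # [0.5, 0.5, 1.0]
```

       Because linkage is transitive, two sequences below the threshold can
       still share a cluster if each is close enough to a third sequence.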

       --wnone
              No relative weights. All sequences are assigned uniform weight.

       --wid <x>
              Sets the identity threshold used by single-linkage clustering
              when using --wblosum. Invalid with any other weighting scheme.
              Default is 0.62.

OTHER OPTIONS
       --informat <s>
              Assert that input msafile is in alignment format <s>, bypassing
              format autodetection. Common choices for <s> include:
              stockholm, a2m, afa, psiblast, clustal, phylip. For more
              information, and for codes for some less common formats, see
              the main documentation. The string <s> is case-insensitive (a2m
              or A2M both work).

       --outformat <s>
              Write the output postmsafile in alignment format <s>. Common
              choices for <s> include: stockholm, a2m, afa, psiblast, clustal,
              phylip. The string <s> is case-insensitive (a2m or A2M both
              work). Default is stockholm.

       --seed <n>
              Seed the random number generator with <n>, an integer >= 0. If
              <n> is nonzero, any stochastic simulations will be reproducible;
              the same command will give the same results. If <n> is 0, the
              random number generator is seeded arbitrarily, and stochastic
              simulations will vary from run to run of the same command. The
              default seed is 42.

SEE ALSO
       See hmmer(1) for a master man page with a list of all the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (http://hmmer.org/).

COPYRIGHT
       Copyright (C) 2023 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page (http://hmmer.org/).

AUTHOR
       http://eddylab.org

HMMER 3.4			   Aug 2023			    alimask(1)
