nhmmscan(1)			 HMMER Manual			   nhmmscan(1)

NAME
       nhmmscan	- search DNA sequence(s) against a DNA profile database

SYNOPSIS
       nhmmscan	[options] hmmdb	seqfile

DESCRIPTION
       nhmmscan	 is used to search nucleotide sequences	against	collections of
       nucleotide profiles. For	each sequence in seqfile, use that  query  se-
       quence  to  search the target database of profiles in hmmdb, and	output
       ranked lists of the profiles with the most significant matches  to  the
       sequence.

       The  seqfile  may  contain  more	 than one query	sequence. It can be in
       FASTA format, or	several	other common sequence file  formats  (genbank,
       embl,  and uniprot, among others), or in	alignment file formats (stock-
       holm, aligned fasta, and	others). See the --qformat option for  a  com-
       plete list.

       The hmmdb must be pressed with hmmpress before it can be searched with
       nhmmscan.  This creates four binary files, suffixed .h3f, .h3i, .h3m,
       and .h3p.
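
       As a minimal sketch (not part of HMMER), the following Python function
       checks whether the four pressed files sit next to a profile database
       before nhmmscan is launched.  The function name missing_press_files and
       the example path are hypothetical; the suffixes are the .h3{fimp} set
       named above, written out one by one.

```python
from pathlib import Path

# Suffixes of the four binary files that hmmpress writes for a database.
PRESS_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_press_files(hmmdb):
    """Return the pressed-file paths that do not exist for `hmmdb`."""
    # Each suffix is appended to the full database file name,
    # e.g. db.hmm -> db.hmm.h3f.
    return [hmmdb + s for s in PRESS_SUFFIXES if not Path(hmmdb + s).exists()]


if __name__ == "__main__":
    gaps = missing_press_files("example.hmm")  # hypothetical database path
    print("run hmmpress first" if gaps else "database is pressed", gaps)
```

       An empty list means all four files were found; a non-empty list names
       the files that hmmpress still needs to create.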

       The  query  seqfile  may	 be  '-' (a dash character), in	which case the
       query sequences are read	from a stdin pipe instead of from a file.  The
       hmmdb cannot be read from a stdin stream, because it needs to have  the
       four auxiliary binary files generated by	hmmpress.

       The output format is designed to	be human-readable, but is often	so vo-
       luminous	 that reading it is impractical, and parsing it	is a pain. The
       --tblout	option saves output in a simple	tabular	format that is concise
       and easier to parse.  The -o option allows redirecting the main output,
       including throwing it away in /dev/null.
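
       The points above (tabular output, discarding the main output, and
       reading queries from stdin) can be combined in a small wrapper.  This
       is a sketch under stated assumptions, not an official interface: the
       helper names, the default file name hits.tbl, and the example database
       path are hypothetical.  Only the -o and --tblout options and the '-'
       seqfile come from this page.

```python
import shutil
import subprocess


def build_nhmmscan_cmd(hmmdb, seqfile="-", tblout="hits.tbl"):
    """Assemble an nhmmscan argument list that keeps only the tabular output."""
    return [
        "nhmmscan",
        "-o", "/dev/null",    # discard the voluminous main output
        "--tblout", tblout,   # save the concise per-hit table
        hmmdb,
        seqfile,              # "-" means: read query sequences from stdin
    ]


def scan_fasta_text(hmmdb, fasta_text):
    """Pipe FASTA text to nhmmscan on stdin and return its exit status."""
    if shutil.which("nhmmscan") is None:
        raise FileNotFoundError("nhmmscan not found on PATH")
    cmd = build_nhmmscan_cmd(hmmdb)
    return subprocess.run(cmd, input=fasta_text, text=True).returncode


if __name__ == "__main__":
    # Only prints the command; nothing is executed here.
    print(" ".join(build_nhmmscan_cmd("example.hmm")))  # hypothetical path
```

       Keeping command construction separate from execution makes the
       argument list easy to inspect or log before a long run.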

OPTIONS
       -h     Help; print a brief reminder  of	command	 line  usage  and  all
	      available	options.

OPTIONS	FOR CONTROLLING	OUTPUT
       -o <f> Direct  the  main	human-readable output to a file	<f> instead of
	      the default stdout.

       --tblout	<f>
	      Save a simple tabular  (space-delimited)	file  summarizing  the
	      per-hit  output,	with one data line per homologous target model
	      hit found.

       --dfamtblout <f>
	      Save a tabular (space-delimited) file  summarizing  the  per-hit
	      output, similar to --tblout but more succinct.

       --aliscoresout <f>
	      Save  to	file a list of per-position scores for each hit.  This
	      is useful, for example, in identifying  regions  of  high	 score
	      density  for  use	 in  resolving overlapping hits	from different
	      models.

       --acc  Use accessions instead of	names in the main output, where	avail-
	      able for profiles	and/or sequences.

       --noali
	      Omit the alignment  section  from	 the  main  output.  This  can
	      greatly reduce the output	volume.

       --notextw
              Remove the limit on the length of each line in the main output.
              The default is a limit of 120 characters per line, which helps
              in displaying the output cleanly on terminals and in editors,
              but can truncate target profile description lines.

       --textw <n>
	      Set the main output's line length	limit to  <n>  characters  per
	      line. The	default	is 120.

OPTIONS	FOR REPORTING THRESHOLDS
       Reporting  thresholds  control  which hits are reported in output files
       (the main output, --tblout, and --dfamtblout).  Hits are	ranked by sta-
       tistical	significance (E-value).

       -E <x> Report target profiles with an E-value of	<= <x>.	  The  default
	      is  10.0,	meaning	that on	average, about 10 false	positives will
	      be reported per query, so	you can	see the	top of the  noise  and
	      decide for yourself if it's really noise.

       -T <x> Instead of thresholding output on E-value, report target
              profiles with a bit score of >= <x>.
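
       The reporting rule can be restated as a short filter.  The sketch below
       is illustrative only and does not reproduce nhmmscan's internals: the
       function name reported_hits and the (name, E-value, score) tuples are
       made up for the example.  It applies either an E-value ceiling (as with
       -E) or a bit score floor (as with -T), then ranks by E-value.

```python
def reported_hits(hits, max_evalue=10.0, min_score=None):
    """Apply a reporting threshold to (name, evalue, score) tuples.

    With min_score set, mimic -T (bit score >= x); otherwise mimic
    -E (E-value <= x). Hits are returned ranked by E-value.
    """
    if min_score is not None:
        kept = [h for h in hits if h[2] >= min_score]
    else:
        kept = [h for h in hits if h[1] <= max_evalue]
    return sorted(kept, key=lambda h: h[1])


if __name__ == "__main__":
    # Made-up hits, for illustration only.
    demo = [("famA", 3.0, 12.5), ("famB", 1e-20, 80.1), ("famC", 25.0, 5.2)]
    print(reported_hits(demo))                  # -E 10 style threshold
    print(reported_hits(demo, min_score=50.0))  # -T 50 style threshold
```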

OPTIONS	FOR INCLUSION THRESHOLDS
       Inclusion thresholds are	stricter than reporting	thresholds.  Inclusion
       thresholds  control  which hits are considered to be reliable enough to
       be included in an output alignment or a subsequent search round.
       nhmmscan, unlike nhmmer, does not produce any alignment output, so
       inclusion thresholds have little effect. They only affect which hits
       get marked as significant (!) or questionable (?) in hit output.

       --incE <x>
	      Use  an  E-value	of <= <x> as the inclusion threshold.  The de-
	      fault is 0.01, meaning that on average, about 1  false  positive
	      would be expected	in every 100 searches with different query se-
	      quences.

       --incT <x>
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= <x> as the inclusion threshold.  It would
              be unusual to use bit score thresholds with nhmmscan, because
              you don't expect a single score threshold to work for different
              profiles; different profiles have slightly different expected
              score distributions.
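
       To make the inclusion marks and the "1 in 100 searches" figure
       concrete, here is a small sketch.  The function names are hypothetical
       and the logic is a simplified restatement of the text above, not
       HMMER's code.

```python
def inclusion_mark(evalue, score, inc_e=0.01, inc_t=None):
    """Return "!" for a hit that meets the inclusion threshold, else "?"."""
    if inc_t is not None:
        included = score >= inc_t   # --incT style: bit score threshold
    else:
        included = evalue <= inc_e  # --incE style: E-value threshold
    return "!" if included else "?"


def expected_false_positives(n_searches, inc_e=0.01):
    """Expected false positives at the inclusion threshold over many queries."""
    # An E-value threshold is an expected count per search, so the
    # expectation over independent searches is a simple product.
    return n_searches * inc_e


if __name__ == "__main__":
    print(inclusion_mark(1e-5, 40.0), inclusion_mark(0.5, 8.0))
    print(expected_false_positives(100))
```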

OPTIONS	FOR MODEL-SPECIFIC SCORE THRESHOLDING
       Curated	profile	databases may define specific bit score	thresholds for
       each profile, superseding any thresholding based	on statistical signif-
       icance alone.

       To use these options, the profile must contain the appropriate (GA, TC,
       and/or NC) optional score threshold annotation; this is	picked	up  by
       hmmbuild from Stockholm format alignment files. For a nucleotide model,
       each thresholding option has a single per-hit threshold <x>. This acts
       as if -T <x> --incT <x> has been applied specifically using each
       model's curated thresholds.

       --cut_ga
	      Use  the	GA (gathering) bit score threshold in the model	to set
	      per-hit reporting	and inclusion thresholds.  GA  thresholds  are
	      generally	 considered  to	 be  the  reliable  curated thresholds
	      defining family membership; for example, in Dfam,	these  thresh-
	      olds are applied when annotating a genome	with a model of	a fam-
	      ily known	to be found in that organism. They may allow for mini-
	      mal expected false discovery rate.

       --cut_nc
	      Use  the	NC  (noise cutoff) bit score threshold in the model to
	      set per-hit reporting and	inclusion  thresholds.	NC  thresholds
	      are  less	 stringent  than  GA; in the context of	Pfam, they are
	      generally	used to	store the score	of the	highest-scoring	 known
	      false positive.

       --cut_tc
	      Use  the TC (trusted cutoff) bit score threshold in the model to
	      set per-hit reporting and	inclusion  thresholds.	TC  thresholds
	      are  more	 stringent than	GA, and	are generally considered to be
	      the score	of the lowest-scoring  known  true  positive  that  is
	      above  all  known	 false	positives; for example,	in Dfam, these
	      thresholds are applied when annotating a genome with a model  of
	      a	family not known to be found in	that organism.

CONTROL	OF THE ACCELERATION PIPELINE
       HMMER3  searches	 are  accelerated in a three-step filter pipeline: the
       scanning-SSV filter, the	Viterbi	filter,	and the	 Forward  filter.  The
       first  filter is	the fastest and	most approximate; the last is the full
       Forward scoring algorithm. There	is also	a bias filter step between SSV
       and Viterbi. Targets that  pass	all  the  steps	 in  the  acceleration
       pipeline	 are then subjected to postprocessing -- domain	identification
       and scoring using the Forward/Backward algorithm.

       Changing	filter thresholds only removes or includes targets  from  con-
       sideration;  changing  filter  thresholds does not alter	bit scores, E-
       values, or alignments, all of which are determined solely  in  postpro-
       cessing.

       --max  Turn  off	 (nearly)  all filters,	including the bias filter, and
	      run full Forward/Backward	postprocessing on most of  the	target
	      sequence.	  In  contrast to hmmscan, where this flag really does
	      turn off the filters entirely, the --max flag in	nhmmscan  sets
	      the  scanning-SSV	 filter	threshold to 0.4, not 1.0. Use of this
	      flag increases sensitivity somewhat, at a	large cost in speed.

       --F1 <x>
              Set the P-value threshold for the SSV filter step.  The default
              is 0.02, meaning that roughly 2% of the highest scoring
              nonhomologous targets are expected to pass the filter.

       --F2 <x>
	      Set the P-value threshold	for the	Viterbi	filter step.  The  de-
	      fault is 0.001.

       --F3 <x>
	      Set  the P-value threshold for the Forward filter	step.  The de-
	      fault is 1e-5.

       --nobias
	      Turn off the bias	filter.	This increases	sensitivity  somewhat,
	      but  can	come  at a high	cost in	speed, especially if the query
	      has biased residue composition (such as  a  repetitive  sequence
	      region, or if it is a membrane protein with large	regions	of hy-
	      drophobicity).  Without  the bias	filter,	too many sequences may
	      pass the filter with biased queries, leading to slower than  ex-
	      pected   performance   as	 the  computationally  intensive  For-
	      ward/Backward algorithms shoulder	an abnormally heavy load.
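
       The effect of the filter thresholds can be estimated with simple
       arithmetic.  The sketch below is idealized: it assumes perfectly
       calibrated P-values, so that a filter with threshold p passes a
       fraction p of all nonhomologous targets.  The function name and the
       target count are hypothetical; the default thresholds are the ones
       listed for --F1, --F2, and --F3.

```python
# Default P-value thresholds quoted for --F1, --F2, and --F3.
DEFAULT_FILTERS = (("F1", 0.02), ("F2", 0.001), ("F3", 1e-5))


def expected_survivors(n_nonhomologous, filters=DEFAULT_FILTERS):
    """Expected number of nonhomologous targets passing each filter.

    Idealized model: each threshold p lets through a fraction p of the
    nonhomologous targets, measured against the full starting set.
    """
    return {name: n_nonhomologous * p for name, p in filters}


if __name__ == "__main__":
    # One million nonhomologous targets is an arbitrary illustration.
    for name, count in expected_survivors(1_000_000).items():
        print(name, round(count))
```

       Loosening a threshold raises the number of targets that reach the
       expensive later stages, which is why options such as --max and --nobias
       trade speed for sensitivity.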

OTHER OPTIONS
       --nonull2
	      Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
	      for the purposes of per-sequence	E-value	 calculations,	rather
	      than the actual number of	targets	seen.

       --seed <n>
	      Set the random number seed to <n>.  Some steps in	postprocessing
	      require  Monte  Carlo simulation.	 The default is	to use a fixed
	      seed (42), so that results are exactly reproducible.  Any	 other
	      positive integer will give different (but	also reproducible) re-
	      sults. A choice of 0 uses	an arbitrarily chosen seed.

       --qformat <s>
              Assert that input query seqfile is in format <s>, bypassing
              format autodetection.  Common choices for <s> include: fasta,
              embl, genbank.  Alignment formats also work; common choices
              include: stockholm, a2m, afa, psiblast, clustal, phylip.  For
              more information, and for codes for some less common formats,
              see the main documentation.  The string <s> is case-insensitive
              (fasta or FASTA both work).

       --w_beta	<x>
              Window length tail mass.  The upper bound, W, on the length at
              which nhmmscan expects to find an instance of the model is set
              such that the fraction of all sequences generated by the model
              with length >= W is less than <x>.  The default is 1e-7.  This
              flag may be used to override the value of W established for the
              model by hmmbuild.

       --w_length <n>
	      Override the model instance length upper bound, W, which is oth-
	      erwise  controlled  by  --w_beta.	  It should be larger than the
	      model length. The	value of  W is used deep in  the  acceleration
	      pipeline,	 and modest changes are	not expected to	impact results
	      (though larger values of W do lead to longer  run	 time).	  This
	      flag  may	be used	to override the	value of W established for the
	      model by hmmbuild.

       --watson
	      Only search the top strand. By default both the  query  sequence
	      and its reverse-complement are searched.

       --crick
	      Only  search  the	bottom (reverse-complement) strand. By default
	      both the query sequence and its reverse-complement are searched.

       --cpu <n>
              Set the number of parallel worker threads to <n>.  The default
              is 0, meaning off (no thread-level parallelization), because
              nhmmscan is typically I/O bound and the extra overhead of our
              current multithreaded implementation isn't worthwhile.  You can
              also control this number by setting an environment variable,
              HMMER_NCPU.  There is also a master thread, so the actual number
              of threads that HMMER spawns is at least <n>+1.

	      This  option  is	not available if HMMER was compiled with POSIX
	      threads support turned off.

       --stall
	      For debugging the	MPI master/worker version: pause after	start,
	      to  enable the developer to attach debuggers to the running mas-
	      ter and worker(s)	processes. Send	SIGCONT	signal to release  the
	      pause.  (Under gdb: (gdb)	signal SIGCONT)

	      (Only  available if optional MPI support was enabled at compile-
	      time.)

       --mpi  Run under	MPI control with master/worker parallelization	(using
	      mpirun,  for example, or equivalent). Only available if optional
	      MPI support was enabled at compile-time.
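
       To make the strand terminology of --watson and --crick concrete, the
       following sketch computes the reverse complement of a DNA string, which
       is the sequence searched on the bottom strand.  It is a simplified
       illustration with a hypothetical function name; it handles only A, C,
       G, T, and N and leaves other IUPAC codes unchanged, which may differ
       from how nhmmscan treats them.

```python
# Complement table for unambiguous DNA plus N; other characters are
# passed through unchanged in this simplified sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of `seq` (the bottom strand, 5'->3')."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    top = "ATGCCGTA"  # made-up query fragment
    print("top   :", top)
    print("bottom:", reverse_complement(top))
```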

SEE ALSO
       See hmmer(1) for	a master man page with a list of  all  the  individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (http://hmmer.org/).

COPYRIGHT
       Copyright (C) 2023 Howard Hughes	Medical	Institute.
       Freely distributed under	the BSD	open source license.

       For additional information on copyright and  licensing,	see  the  file
       called  COPYRIGHT  in  your HMMER source	distribution, or see the HMMER
       web page	(http://hmmer.org/).

AUTHOR
       http://eddylab.org

HMMER 3.4			   Aug 2023			   nhmmscan(1)
