hmmscan(1)			 HMMER Manual			    hmmscan(1)

NAME
       hmmscan - search sequence(s) against a profile database

SYNOPSIS
       hmmscan [options] hmmdb seqfile

DESCRIPTION
       hmmscan is used to search protein sequences against collections of
       protein profiles. For each sequence in seqfile, hmmscan uses that query
       sequence to search the target database of profiles in hmmdb, and
       outputs ranked lists of the profiles with the most significant matches
       to the sequence.

       The seqfile may contain more than one query sequence. Each will be
       searched in turn against hmmdb.

       The hmmdb must first be pressed with hmmpress before it can be
       searched with hmmscan. This creates four auxiliary binary files,
       suffixed .h3{fimp} (that is, .h3f, .h3i, .h3m, and .h3p).
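
       As a sketch of this prerequisite, the Python helper below checks
       whether the four hmmpress files exist before a search is attempted.
       The function name missing_press_files is hypothetical, and the code
       assumes each file is named by appending .h3f, .h3i, .h3m, or .h3p to
       the hmmdb path, as the .h3{fimp} notation suggests.

```python
from pathlib import Path
from typing import List

# Suffixes of the four binary files hmmpress writes (.h3{fimp}).
PRESS_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_press_files(hmmdb: str) -> List[str]:
    """Return the auxiliary files that do not exist for `hmmdb`.

    Hypothetical helper: assumes each file is named hmmdb + suffix,
    e.g. db.hmm -> db.hmm.h3f.
    """
    return [hmmdb + sfx for sfx in PRESS_SUFFIXES
            if not Path(hmmdb + sfx).exists()]
```

       An empty list suggests the database has been pressed; otherwise, run
       hmmpress on hmmdb before calling hmmscan.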

       The query seqfile may be '-' (a dash character), in which case the
       query sequences are read from a stdin pipe instead of from a file. The
       hmmdb cannot be read from a stdin stream, because it requires the four
       auxiliary binary files generated by hmmpress.

       The output format is designed to be human-readable, but it is often so
       voluminous that reading it is impractical, and parsing it is a pain.
       The --tblout and --domtblout options save output in simple tabular
       formats that are concise and easier to parse. The -o option redirects
       the main output to a file, including discarding it to /dev/null.
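
       For scripted use, stdin input and the tabular outputs combine
       naturally. The Python sketch below builds an hmmscan command line that
       reads queries from '-' (stdin), sends the main output to /dev/null,
       and optionally writes --tblout and --domtblout files; it uses only
       options described on this page. The helper names are hypothetical,
       /dev/null assumes a POSIX system, and run_hmmscan works only if
       hmmscan is on PATH and hmmdb has already been pressed.

```python
import subprocess
from typing import List, Optional


def hmmscan_command(hmmdb: str, tblout: Optional[str] = None,
                    domtblout: Optional[str] = None,
                    evalue: Optional[float] = None) -> List[str]:
    """Build an hmmscan argument list that reads query sequences from stdin."""
    cmd = ["hmmscan", "-o", "/dev/null"]   # discard the voluminous main output
    if tblout:
        cmd += ["--tblout", tblout]        # per-target table
    if domtblout:
        cmd += ["--domtblout", domtblout]  # per-domain table
    if evalue is not None:
        cmd += ["-E", str(evalue)]         # per-target reporting threshold
    return cmd + [hmmdb, "-"]              # '-' means: read seqfile from stdin


def run_hmmscan(hmmdb: str, fasta_text: str,
                **opts) -> subprocess.CompletedProcess:
    """Pipe FASTA text to hmmscan; raises CalledProcessError on failure."""
    return subprocess.run(hmmscan_command(hmmdb, **opts), input=fasta_text,
                          text=True, capture_output=True, check=True)
```

       Options are placed before the two positional arguments, matching the
       SYNOPSIS.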

OPTIONS
       -h     Help; print a brief reminder of command line usage and all
              available options.

OPTIONS FOR CONTROLLING OUTPUT
       -o <f> Direct the main human-readable output to a file <f> instead of
              the default stdout.

       --tblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-target output, with one data line per homologous target
              model found.
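
       The --tblout file is meant to be parsed. The Python sketch below reads
       it, skipping '#' comment lines. It assumes the HMMER3 per-target
       layout of 18 whitespace-separated fields (target name and accession,
       query name and accession, then full-sequence E-value, score, and bias,
       and so on) followed by a free-text target description; the class and
       function names are hypothetical, and the column positions should be
       checked against the header lines of your own output.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class TblHit:
    target: str
    target_acc: str
    query: str
    query_acc: str
    evalue: float      # full-sequence E-value (assumed column 5)
    score: float       # full-sequence bit score (assumed column 6)
    bias: float        # full-sequence bias (assumed column 7)
    description: str   # free text; may contain spaces


def parse_tblout(lines: Iterable[str]) -> Iterator[TblHit]:
    """Yield one TblHit per data line of a --tblout file."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                       # skip blank and comment/header lines
        fields = line.split(maxsplit=18)   # keep the description in one piece
        description = fields[18].rstrip() if len(fields) > 18 else ""
        yield TblHit(fields[0], fields[1], fields[2], fields[3],
                     float(fields[4]), float(fields[5]), float(fields[6]),
                     description)
```

       Splitting with maxsplit=18 keeps the description intact even though
       it may contain spaces, which is the main trap when parsing this
       format.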

       --domtblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a query sequence for each homologous model.

       --pfamtblout <f>
              Save an especially succinct tabular (space-delimited) file
              summarizing the per-target output, with one data line per
              homologous target model found.

       --acc  Use accessions instead of names in the main output, where
              available for profiles and/or sequences.

       --noali
              Omit the alignment section from the main output. This can
              greatly reduce the output volume.

       --notextw
              Remove the limit on the length of each line in the main output.
              The default is a limit of 120 characters per line, which helps
              in displaying the output cleanly on terminals and in editors,
              but can truncate target profile description lines.

       --textw <n>
              Set the main output's line length limit to <n> characters per
              line. The default is 120.

OPTIONS FOR REPORTING THRESHOLDS
       Reporting thresholds control which hits are reported in output files
       (the main output, --tblout, and --domtblout).

       -E <x> In the per-target output, report target profiles with an E-value
              of <= <x>. The default is 10.0, meaning that on average, about
              10 false positives will be reported per query, so you can see
              the top of the noise and decide for yourself if it's really
              noise.

       -T <x> Instead of thresholding per-profile output on E-value, report
              target profiles with a bit score of >= <x>.
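
       The choice between -E and -T amounts to a simple per-target
       predicate. The Python function below is a hypothetical illustration
       of that either/or rule, not HMMER's own code.

```python
from typing import Optional


def report_target(evalue: float, score: float, E: float = 10.0,
                  T: Optional[float] = None) -> bool:
    """Return True if a target profile belongs in the per-target output."""
    if T is not None:
        return score >= T   # -T <x>: bit-score test replaces the E-value test
    return evalue <= E      # -E <x>: default reporting rule (E = 10.0)
```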

       --domE <x>
              In the per-domain output, for target profiles that have already
              satisfied the per-profile reporting threshold, report individual
              domains with a conditional E-value of <= <x>. The default is
              10.0. A conditional E-value is the expected number of
              additional false positive domains in the smaller search space
              of those comparisons that already satisfied the per-profile
              reporting threshold (and thus must already have at least one
              homologous domain).
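
       To make the smaller search space concrete, the Python sketch below
       converts per-domain P-values into conditional E-values. It assumes an
       E-value is a P-value multiplied by the size of the search space, here
       the number of targets that passed the per-target reporting threshold
       (or the --domZ value, if given); the function names are hypothetical.

```python
from typing import List, Optional


def conditional_evalues(domain_pvalues: List[float], n_reported_targets: int,
                        domZ: Optional[float] = None) -> List[float]:
    """Scale each domain P-value by the conditional search-space size."""
    space = domZ if domZ is not None else n_reported_targets
    return [p * space for p in domain_pvalues]


def report_domains(domain_pvalues: List[float], n_reported_targets: int,
                   domE: float = 10.0,
                   domZ: Optional[float] = None) -> List[int]:
    """Return indices of domains whose conditional E-value is <= domE."""
    cevals = conditional_evalues(domain_pvalues, n_reported_targets, domZ)
    return [i for i, e in enumerate(cevals) if e <= domE]
```

       Because the multiplier is usually a handful of reported targets rather
       than the whole database, a conditional E-value is typically much
       smaller than an E-value computed over all targets for the same
       P-value.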

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.

OPTIONS FOR INCLUSION THRESHOLDS
       Inclusion thresholds are stricter than reporting thresholds. Inclusion
       thresholds control which hits are considered reliable enough to be
       included in an output alignment or a subsequent search round. Because
       hmmscan has neither alignment output (as hmmsearch and phmmer do) nor
       iterative search steps (as jackhmmer does), inclusion thresholds have
       little effect in hmmscan. They only affect which domains are marked as
       significant (!) or questionable (?) in the domain output.
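
       In other words, every reported domain gets one of two marks. The
       hypothetical Python function below sketches that rule using the
       --incdomE default of 0.01: '!' when the domain meets the per-domain
       inclusion threshold inside a target that met the per-target inclusion
       threshold, '?' otherwise.

```python
def domain_mark(c_evalue: float, target_included: bool,
                incdomE: float = 0.01) -> str:
    """Return '!' for an included domain, '?' for one that is only reported."""
    if target_included and c_evalue <= incdomE:
        return "!"   # significant: satisfies both inclusion thresholds
    return "?"       # questionable: reported, but not included
```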

       --incE <x>
              Use an E-value of <= <x> as the per-target inclusion threshold.
              The default is 0.01, meaning that on average, about 1 false
              positive would be expected in every 100 searches with different
              query sequences.

       --incT <x>
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= <x> as the per-target inclusion
              threshold. It would be unusual to use bit score thresholds with
              hmmscan, because you don't expect a single score threshold to
              work for different profiles; different profiles have slightly
              different expected score distributions.

       --incdomE <x>
              Use a conditional E-value of <= <x> as the per-domain inclusion
              threshold, in targets that have already satisfied the overall
              per-target inclusion threshold. The default is 0.01.

       --incdomT <x>
              Instead of using E-values, use a bit score of >= <x> as the
              per-domain inclusion threshold. As with --incT above, it would
              be unusual to use a single bit score threshold in hmmscan.

OPTIONS FOR MODEL-SPECIFIC SCORE THRESHOLDING
       Curated profile databases may define specific bit score thresholds for
       each profile, superseding any thresholding based on statistical
       significance alone.

       To use these options, the profile must contain the appropriate (GA,
       TC, and/or NC) optional score threshold annotation; this is picked up
       by hmmbuild from Stockholm format alignment files. Each thresholding
       option has two scores: the per-sequence threshold <x1> and the
       per-domain threshold <x2>. These act as if -T <x1> --incT <x1> --domT
       <x2> --incdomT <x2> had been applied specifically using each model's
       curated thresholds.
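
       Unlike a single global -T, these cutoffs change from profile to
       profile. The Python sketch below applies one model's curated pair of
       bit scores the way the equivalence above describes; the Cutoffs class
       and apply_cutoff function are hypothetical, and reading the GA, TC, or
       NC values out of an HMM file is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cutoffs:
    """A model's curated bit-score pair, e.g. GA1/GA2."""
    seq: float  # per-sequence threshold <x1>
    dom: float  # per-domain threshold <x2>


def apply_cutoff(cut: Cutoffs, seq_score: float,
                 dom_scores: List[float]) -> Tuple[bool, List[int]]:
    """Act like -T x1 --incT x1 --domT x2 --incdomT x2 for one model.

    Returns whether the target passes and the indices of passing domains.
    """
    if seq_score < cut.seq:
        return False, []
    return True, [i for i, s in enumerate(dom_scores) if s >= cut.dom]
```

       Calling it with each model's GA, TC, or NC pair mimics --cut_ga,
       --cut_tc, or --cut_nc respectively. Because reporting and inclusion
       share the same values, a hit that is reported is also included.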

       --cut_ga
              Use the GA (gathering) bit scores in the model to set
              per-sequence (GA1) and per-domain (GA2) reporting and inclusion
              thresholds. GA thresholds are generally considered to be the
              reliable curated thresholds defining family membership; for
              example, in Pfam, these thresholds define what gets included in
              Pfam Full alignments based on searches with Pfam Seed models.

       --cut_nc
              Use the NC (noise cutoff) bit score thresholds in the model to
              set per-sequence (NC1) and per-domain (NC2) reporting and
              inclusion thresholds. NC thresholds are generally considered to
              be the score of the highest-scoring known false positive.

       --cut_tc
              Use the TC (trusted cutoff) bit score thresholds in the model to
              set per-sequence (TC1) and per-domain (TC2) reporting and
              inclusion thresholds. TC thresholds are generally considered to
              be the score of the lowest-scoring known true positive that is
              above all known false positives.

CONTROL OF THE ACCELERATION PIPELINE
       HMMER3 searches are accelerated in a three-step filter pipeline: the
       MSV filter, the Viterbi filter, and the Forward filter. The first
       filter is the fastest and most approximate; the last is the full
       Forward scoring algorithm. There is also a bias filter step between
       MSV and Viterbi. Targets that pass all the steps in the acceleration
       pipeline are then subjected to postprocessing: domain identification
       and scoring using the Forward/Backward algorithm.

       Changing filter thresholds only removes targets from consideration or
       admits them to it; it does not alter bit scores, E-values, or
       alignments, all of which are determined solely in postprocessing.
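
       The cascade can be sketched as a loop in which cheap tests run first,
       so that expensive ones see fewer targets. In the Python sketch below,
       the four P-value functions are hypothetical stand-ins for HMMER's
       internal computations, and applying F1 to the bias filter is an
       assumption. The sketch decides only which targets advance to
       postprocessing, consistent with filter thresholds never changing
       scores.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in: maps a target name to one filter's P-value.
PValueFn = Callable[[str], float]


def accelerate(targets: Sequence[str], msv: PValueFn, bias: PValueFn,
               viterbi: PValueFn, forward: PValueFn,
               F1: float = 0.02, F2: float = 1e-3, F3: float = 1e-5,
               use_bias: bool = True, use_max: bool = False) -> List[str]:
    """Return the targets that would go on to Forward/Backward postprocessing."""
    if use_max:                        # --max: no filtering at all
        return list(targets)
    survivors = []
    for t in targets:
        if msv(t) > F1:                # MSV filter: fastest, most approximate
            continue
        if use_bias and bias(t) > F1:  # bias filter (assumed to reuse F1)
            continue
        if viterbi(t) > F2:            # Viterbi filter
            continue
        if forward(t) > F3:            # full Forward score
            continue
        survivors.append(t)
    return survivors
```

       Here use_max=True plays the role of --max and use_bias=False the role
       of --nobias; the Forward P-value is only computed for targets that
       already passed Viterbi.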

       --max  Turn off all filters, including the bias filter, and run full
              Forward/Backward postprocessing on every target. This increases
              sensitivity somewhat, at a large cost in speed.

       --F1 <x>
              Set the P-value threshold for the MSV filter step. The default
              is 0.02, meaning that roughly 2% of the highest-scoring
              nonhomologous targets are expected to pass the filter.

       --F2 <x>
              Set the P-value threshold for the Viterbi filter step. The
              default is 0.001.

       --F3 <x>
              Set the P-value threshold for the Forward filter step. The
              default is 1e-5.
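
       Because each threshold is a P-value, multiplying it by the number of
       profiles in hmmdb gives a rough count of nonhomologous targets expected
       to reach each stage per query. The Python sketch below assumes the
       filter P-values are well calibrated, so that each threshold
       approximates the fraction of all nonhomologous targets that get that
       far; the function name is hypothetical and the bias filter is ignored.

```python
from typing import Dict


def expected_nonhomolog_survivors(n_targets: int, F1: float = 0.02,
                                  F2: float = 1e-3,
                                  F3: float = 1e-5) -> Dict[str, float]:
    """Rough expected number of nonhomologous targets reaching each stage."""
    return {
        "MSV": n_targets * F1,      # about 2% with the default F1
        "Viterbi": n_targets * F2,  # about 0.1% with the default F2
        "Forward": n_targets * F3,  # about 0.001% with the default F3
    }
```

       For an illustrative database of 20,000 profiles, the defaults give
       about 400 MSV survivors, 20 Viterbi survivors, and 0.2 Forward
       survivors per query.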

       --nobias
              Turn off the bias filter. This increases sensitivity somewhat,
              but can come at a high cost in speed, especially if the query
              has biased residue composition (such as a repetitive sequence
              region, or if it is a membrane protein with large regions of
              hydrophobicity). Without the bias filter, too many targets may
              pass the filter with biased queries, leading to slower than
              expected performance as the computationally intensive
              Forward/Backward algorithms shoulder an abnormally heavy load.

OTHER OPTIONS
       --nonull2
	      Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
              for the purposes of per-sequence E-value calculations, rather
              than the actual number of targets seen.
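
       As an illustration of what -Z changes, the Python sketch below
       assumes the per-sequence E-value is the P-value multiplied by the
       number of targets, Z; passing Z replaces the count of targets actually
       seen. The function name is hypothetical.

```python
from typing import Optional


def per_sequence_evalue(pvalue: float, n_targets_seen: int,
                        Z: Optional[float] = None) -> float:
    """E-value = P-value * Z, where -Z <x> overrides the observed target count."""
    return pvalue * (Z if Z is not None else n_targets_seen)
```

       One common reason to set -Z is to keep E-values comparable when a
       large database is split into pieces that are searched separately.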

       --domZ <x>
              Assert that the total number of targets in your searches is <x>,
              for the purposes of per-domain conditional E-value calculations,
              rather than the number of targets that passed the reporting
              thresholds.

       --seed <n>
              Set the random number seed to <n>. Some steps in postprocessing
              require Monte Carlo simulation. The default is to use a fixed
              seed (42), so that results are exactly reproducible. Any other
              positive integer will give different (but also reproducible)
              results. A choice of 0 uses an arbitrarily chosen seed.
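
       The seed semantics can be mirrored with Python's random module, used
       here purely as a stand-in; it does not reproduce HMMER's own random
       number stream. The hypothetical make_rng below treats any positive
       seed as reproducible and 0 as a request for an arbitrary seed, drawn
       from system entropy.

```python
import random


def make_rng(seed: int = 42) -> random.Random:
    """Mimic --seed: 42 by default, any positive seed reproducible, 0 arbitrary."""
    if seed < 0:
        raise ValueError("seed must be 0 or a positive integer")
    if seed == 0:
        return random.Random()   # seeded from system entropy: not reproducible
    return random.Random(seed)   # same seed, same stream, run after run
```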

       --qformat <s>
              Assert that input seqfile is in format <s>, bypassing format
              autodetection. Common choices for <s> include: fasta, embl,
              genbank. Alignment formats also work; common choices include:
              stockholm, a2m, afa, psiblast, clustal, phylip. For more
              information, and for codes for some less common formats, see
              the main documentation. The string <s> is case-insensitive
              (fasta or FASTA both work).

       --cpu <n>
              Set the number of parallel worker threads to <n>. The default
              is 0, meaning off (no thread-level parallelization), because
              hmmscan is typically I/O-bound and the extra overhead of our
              current multithreaded implementation isn't worthwhile. You can
              also control this number by setting an environment variable,
              HMMER_NCPU. There is also a master thread, so the actual number
              of threads that HMMER spawns is at least <n>+1.

              This option is not available if HMMER was compiled with POSIX
              threads support turned off.
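
       The thread arithmetic above can be summarized in a small Python
       sketch. The precedence it uses (an explicit --cpu value wins over
       HMMER_NCPU, which wins over the default of 0) is an assumption,
       resolve_threads is a hypothetical name, and the total it returns is
       the lower bound of <n>+1 stated above.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_threads(cpu_option: Optional[int] = None,
                    env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Return (worker threads, minimum total threads including the master)."""
    if cpu_option is not None:
        workers = cpu_option                       # explicit --cpu <n>
    else:
        workers = int(env.get("HMMER_NCPU", "0"))  # fall back to environment
    return workers, workers + 1                    # +1 for the master thread
```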

       --stall
              For debugging the MPI master/worker version: pause after start,
              to enable the developer to attach debuggers to the running
              master and worker processes. Send a SIGCONT signal to release
              the pause. (Under gdb: (gdb) signal SIGCONT)

              (Only available if optional MPI support was enabled at
              compile time.)

       --mpi  Run under MPI control with master/worker parallelization (using
              mpirun, for example, or equivalent). Only available if optional
              MPI support was enabled at compile time.

SEE ALSO
       See hmmer(1) for a master man page with a list of all the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (http://hmmer.org/).

COPYRIGHT
       Copyright (C) 2023 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page (http://hmmer.org/).

AUTHOR
       http://eddylab.org

HMMER 3.4			   Aug 2023			    hmmscan(1)
