hmmsearch(1)			 HMMER Manual			  hmmsearch(1)

NAME
       hmmsearch - search profile(s) against a sequence	database

SYNOPSIS
       hmmsearch [options] hmmfile seqdb

DESCRIPTION
       hmmsearch  is  used  to	search one or more profiles against a sequence
       database.  For each profile in  hmmfile,	 use  that  query  profile  to
       search  the  target  database  of sequences in seqdb, and output	ranked
       lists of	the sequences with the most significant	matches	 to  the  pro-
       file.  To build profiles	from multiple alignments, see hmmbuild.

       Either the query	hmmfile	or the target seqdb may	be '-' (a dash charac-
       ter),  in which case the	query profile or target	database input will be
       read from a stdin pipe instead of from a	file. Only  one	 input	source
       can  come through stdin,	not both.  An exception	is that	if the hmmfile
       contains	more than one profile  query,  then  seqdb  cannot  come  from
       stdin,  because we can't	rewind the streaming target database to	search
       it with another profile.
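
       The stdin rules above can be sketched as a small pre-flight check.
       This is only an illustration: the function name and the n_profiles
       parameter are hypothetical, and the caller is assumed to already know
       how many profiles hmmfile contains.

```python
from typing import Optional


def stdin_usage_error(hmmfile: str, seqdb: str, n_profiles: int) -> Optional[str]:
    """Return a message if the '-' arguments break the rules, else None.

    n_profiles is a hypothetical parameter: the number of profiles the
    caller expects hmmfile to contain.
    """
    if hmmfile == "-" and seqdb == "-":
        return "only one of hmmfile and seqdb may be read from stdin"
    if seqdb == "-" and n_profiles > 1:
        # A streamed target database cannot be rewound for a second profile.
        return "seqdb cannot be stdin when hmmfile holds more than one profile"
    return None
```

       A None result means the combination is allowed; any string describes
       the rule that was violated.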

       The output format is designed to	be human-readable, but is often	so vo-
       luminous	that reading it	is impractical,	and parsing it is a pain.  The
       --tblout	 and --domtblout options save output in	simple tabular formats
       that are	concise	and easier to parse.  The -o option allows redirecting
       the main	output,	including throwing it away in /dev/null.
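
       As a rough sketch of the output options just described, the helper
       below assembles an hmmsearch argument list that discards the main
       output and keeps only a tabular file. The helper name and the file
       names (query.hmm, targets.fasta, hits.tbl) are made up for
       illustration; only the option names come from this page.

```python
from typing import List, Optional


def build_hmmsearch_cmd(hmmfile: str, seqdb: str,
                        main_output: Optional[str] = None,
                        tblout: Optional[str] = None,
                        domtblout: Optional[str] = None) -> List[str]:
    """Build an argv list following 'hmmsearch [options] hmmfile seqdb'."""
    cmd = ["hmmsearch"]
    if main_output is not None:
        cmd += ["-o", main_output]        # redirect the main output
    if tblout is not None:
        cmd += ["--tblout", tblout]       # per-target table
    if domtblout is not None:
        cmd += ["--domtblout", domtblout]  # per-domain table
    cmd += [hmmfile, seqdb]
    return cmd


# Hypothetical file names; the main output is thrown away in /dev/null.
cmd = build_hmmsearch_cmd("query.hmm", "targets.fasta",
                          main_output="/dev/null", tblout="hits.tbl")
```

       On a machine where hmmsearch is installed, the list could be passed
       to subprocess.run(cmd, check=True).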

OPTIONS
       -h     Help; print a brief reminder  of	command	 line  usage  and  all
	      available	options.

OPTIONS	FOR CONTROLLING	OUTPUT
       -o <f> Direct  the  main	human-readable output to a file	<f> instead of
	      the default stdout.

       -A <f> Save a multiple alignment	of all significant hits	(those	satis-
	      fying inclusion thresholds) to the file <f>.

       --tblout	<f>
	      Save  a  simple  tabular	(space-delimited) file summarizing the
	      per-target output, with one data line per	homologous target  se-
	      quence found.

       --domtblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a target sequence for each query profile.

       --acc  Use accessions instead of	names in the main output, where	avail-
	      able for profiles	and/or sequences.

       --noali
	      Omit  the	 alignment  section  from  the	main  output. This can
	      greatly reduce the output	volume.

       --notextw
              Remove the limit on the length of each line in the main output.
              The default is a limit of 120 characters per line, which helps
              in displaying the output cleanly on terminals and in editors,
              but can truncate target sequence description lines.

       --textw <n>
	      Set  the	main  output's line length limit to <n>	characters per
	      line. The	default	is 120.
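
       A minimal sketch of reading a --tblout file follows. The column
       layout is an assumption that this page does not state: 18
       whitespace-delimited fields (target name, accession, query name,
       accession, full-sequence E-value, score, bias, and so on) followed by
       a free-text description, with comment lines starting with '#'. Check
       the user guide before relying on it. The sample line is invented.

```python
from typing import Iterable, Iterator, NamedTuple


class TargetHit(NamedTuple):
    target: str
    query: str
    evalue: float
    score: float
    description: str


def parse_tblout(lines: Iterable[str]) -> Iterator[TargetHit]:
    """Yield one TargetHit per data line, skipping comments and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Assumed layout: 18 fixed fields, then the description (may be absent).
        fields = line.split(maxsplit=18)
        if len(fields) < 18:
            continue  # not a data line under the assumed layout
        description = fields[18].strip() if len(fields) > 18 else ""
        yield TargetHit(target=fields[0], query=fields[2],
                        evalue=float(fields[4]), score=float(fields[5]),
                        description=description)


# Invented example input, not real hmmsearch output.
sample = [
    "# target name  accession  query name ...",
    "seq1 - query1 - 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 1.0 1 0 0 1 1 1 1 made-up example protein",
]
hits = list(parse_tblout(sample))
```

       Keeping the description as a single trailing field matters because
       it may itself contain spaces.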

OPTIONS	CONTROLLING REPORTING THRESHOLDS
       Reporting thresholds control which hits are reported  in	 output	 files
       (the main output, --tblout, and --domtblout).  Sequence hits and	domain
       hits  are  ranked  by  statistical significance (E-value) and output is
       generated in two	sections called	per-target and per-domain  output.  In
       per-target  output, by default, all sequence hits with an E-value <= 10
       are reported. In	the per-domain output, for each	target that has	passed
       per-target reporting thresholds,	all domains satisfying per-domain  re-
       porting	thresholds  are	 reported.  By default,	these are domains with
       conditional E-values of <= 10.  The  following  options	allow  you  to
       change  the  default  E-value reporting thresholds, or to use bit score
       thresholds instead.

       -E <x> In the per-target	output,	report target  sequences  with	an  E-
	      value  of	<= <x>.	 The default is	10.0, meaning that on average,
	      about 10 false positives will be reported	per query, so you  can
	      see  the top of the noise	and decide for yourself	if it's	really
	      noise.

       -T <x> Instead of thresholding per-target output on E-value, report
              target sequences with a bit score of >= <x>.

       --domE <x>
              In the per-domain output, for target sequences that have already
              satisfied the per-target reporting threshold, report individual
              domains with a conditional E-value of <= <x>. The default is
              10.0. A conditional E-value is the expected number of additional
              false positive domains in the smaller search space of those
              comparisons that already satisfied the per-target reporting
              threshold (and thus must have at least one homologous domain
              already).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.
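
       The two-level reporting logic described in this section can be
       sketched as follows. This is a simplified model, not HMMER's code:
       the function names are hypothetical, hits are represented as
       (E-value, bit score) pairs, and a bit score threshold, when given,
       is assumed to replace the E-value test entirely.

```python
from typing import List, Optional, Tuple

Hit = Tuple[float, float]  # (E-value or conditional E-value, bit score)


def passes(hit: Hit, max_evalue: float = 10.0,
           min_score: Optional[float] = None) -> bool:
    """Apply a bit score threshold if one is set, else the E-value threshold."""
    evalue, score = hit
    if min_score is not None:
        return score >= min_score
    return evalue <= max_evalue


def reported_domains(target: Hit, domains: List[Hit],
                     E: float = 10.0, T: Optional[float] = None,
                     domE: float = 10.0,
                     domT: Optional[float] = None) -> List[Hit]:
    """Return the domains that would be reported for one target."""
    # Domains are only considered once the target itself is reported.
    if not passes(target, E, T):
        return []
    return [d for d in domains if passes(d, domE, domT)]
```

       For instance, a target with E-value 50 reports no domains under the
       defaults, but the same target passes when T is set at or below its
       bit score.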

OPTIONS	FOR INCLUSION THRESHOLDS
       Inclusion thresholds are	stricter than reporting	thresholds.  Inclusion
       thresholds control which	hits are considered to be reliable  enough  to
       be  included  in	 an  output alignment or a subsequent search round, or
       marked as significant ("!") as opposed to questionable ("?")  in	domain
       output.

       --incE <x>
              Use an E-value of <= <x> as the per-target inclusion threshold.
              The default is 0.01, meaning that on average, about 1 false
              positive would be expected in every 100 searches with different
              query profiles.

       --incT <x>
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= <x> as the per-target inclusion
              threshold. By default this option is unset.

       --incdomE <x>
	      Use a conditional	E-value	of <= <x> as the per-domain  inclusion
	      threshold,  in  targets  that have already satisfied the overall
	      per-target inclusion threshold.  The default is 0.01.

       --incdomT <x>
	      Instead of using E-values, use a bit score of >= <x> as the per-
	      domain inclusion threshold.
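
       As an illustration of how the inclusion thresholds relate to the "!"
       and "?" marks, here is a simplified sketch using only the E-value
       options. The function name is hypothetical, the bit score variants
       are left out, and the domain is assumed to have already passed the
       reporting thresholds.

```python
def domain_mark(target_evalue: float, domain_cevalue: float,
                incE: float = 0.01, incdomE: float = 0.01) -> str:
    """Return "!" for a domain that satisfies inclusion, "?" otherwise.

    The per-domain threshold is only consulted for targets that already
    satisfy the per-target inclusion threshold.
    """
    if target_evalue <= incE and domain_cevalue <= incdomE:
        return "!"
    return "?"
```

       With the defaults, a domain with conditional E-value 1e-4 is marked
       "!" only when its target also has an E-value of at most 0.01.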

OPTIONS	FOR MODEL-SPECIFIC SCORE THRESHOLDING
       Curated profile databases may define specific bit score thresholds  for
       each profile, superseding any thresholding based	on statistical signif-
       icance alone.

       To use these options, the profile must contain the appropriate (GA, TC,
       and/or NC) optional score threshold annotation; this is picked up by
       hmmbuild from Stockholm format alignment files. Each thresholding
       option has two scores: the per-sequence threshold <x1> and the
       per-domain threshold <x2>. These act as if -T <x1> --incT <x1>
       --domT <x2> --incdomT <x2> had been applied specifically using each
       model's curated thresholds.

       --cut_ga
	      Use  the	GA  (gathering)	bit scores in the model	to set per-se-
	      quence  (GA1)  and  per-domain  (GA2)  reporting	and  inclusion
	      thresholds. GA thresholds	are generally considered to be the re-
	      liable  curated thresholds defining family membership; for exam-
	      ple, in Pfam, these thresholds define what gets included in Pfam
	      Full alignments based on searches	with Pfam Seed models.

       --cut_nc
	      Use the NC (noise	cutoff)	bit score thresholds in	the  model  to
	      set per-sequence (NC1) and per-domain (NC2) reporting and	inclu-
	      sion  thresholds.	 NC  thresholds	are generally considered to be
	      the score	of the highest-scoring known false positive.

       --cut_tc
	      Use the TC (trusted cutoff) bit score thresholds in the model to
	      set per-sequence (TC1) and per-domain (TC2) reporting and	inclu-
	      sion thresholds. TC thresholds are generally  considered	to  be
	      the  score  of  the  lowest-scoring  known true positive that is
	      above all	known false positives.
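
       The equivalence stated above (a cutoff pair acting like -T, --incT,
       --domT, and --incdomT) can be sketched as below. The header line
       shape ("GA x1 x2", optionally ending in ';') is an assumption about
       the profile file format that this page does not document, and the
       sample values are invented.

```python
from typing import Dict, Iterable, List, Tuple


def parse_cutoffs(header_lines: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Collect GA/TC/NC pairs as {tag: (per_sequence, per_domain)}."""
    cutoffs = {}
    for line in header_lines:
        parts = line.replace(";", " ").split()
        if len(parts) == 3 and parts[0] in ("GA", "TC", "NC"):
            cutoffs[parts[0]] = (float(parts[1]), float(parts[2]))
    return cutoffs


def equivalent_options(x1: float, x2: float) -> List[str]:
    """Spell out the explicit options that a cutoff pair stands for."""
    return ["-T", str(x1), "--incT", str(x1),
            "--domT", str(x2), "--incdomT", str(x2)]


# Invented header lines, shaped like the assumed annotation format.
header = ["NAME  example", "GA    25.00 22.00;", "NC    24.90 21.90;"]
cutoffs = parse_cutoffs(header)
ga_options = equivalent_options(*cutoffs["GA"])
```

       In practice --cut_ga, --cut_nc, or --cut_tc would be used directly;
       the expansion only makes the per-sequence and per-domain roles of
       the two scores explicit.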

OPTIONS	CONTROLLING THE	ACCELERATION PIPELINE
       HMMER3 searches are accelerated in a three-step	filter	pipeline:  the
       MSV  filter, the	Viterbi	filter,	and the	Forward	filter.	The first fil-
       ter is the fastest and most approximate;	the last is the	 full  Forward
       scoring	algorithm.  There  is  also a bias filter step between MSV and
       Viterbi.	Targets	that pass all the steps	in the	acceleration  pipeline
       are then	subjected to postprocessing -- domain identification and scor-
       ing using the Forward/Backward algorithm.

       Changing	 filter	 thresholds only removes or includes targets from con-
       sideration; changing filter thresholds does not alter  bit  scores,  E-
       values,	or  alignments,	all of which are determined solely in postpro-
       cessing.

       --max  Turn off all filters, including the bias filter,	and  run  full
	      Forward/Backward	postprocessing on every	target.	This increases
	      sensitivity somewhat, at a large cost in speed.

       --F1 <x>
	      Set the P-value threshold	for the	MSV filter step.  The  default
	      is  0.02,	 meaning that roughly 2% of the	highest	scoring	nonho-
	      mologous targets are expected to pass the	filter.

       --F2 <x>
	      Set the P-value threshold	for the	Viterbi	filter step.  The  de-
	      fault is 0.001.

       --F3 <x>
	      Set  the P-value threshold for the Forward filter	step.  The de-
	      fault is 1e-5.

       --nobias
	      Turn off the bias	filter.	This increases	sensitivity  somewhat,
	      but  can	come  at a high	cost in	speed, especially if the query
	      has biased residue composition (such as  a  repetitive  sequence
	      region, or if it is a membrane protein with large	regions	of hy-
	      drophobicity).  Without  the bias	filter,	too many sequences may
	      pass the filter with biased queries, leading to slower than  ex-
	      pected   performance   as	 the  computationally  intensive  For-
	      ward/Backward algorithms shoulder	an abnormally heavy load.
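
       The filter cascade in this section can be modeled in a few lines.
       This is a toy sketch, not HMMER's implementation: it takes the
       P-values of the three stages as given, ignores the bias filter, and
       assumes a target passes a stage when its P-value is at or below the
       threshold. All names are hypothetical.

```python
DEFAULT_F1 = 0.02    # MSV filter
DEFAULT_F2 = 0.001   # Viterbi filter
DEFAULT_F3 = 1e-5    # Forward filter


def passes_filters(p_msv: float, p_vit: float, p_fwd: float,
                   f1: float = DEFAULT_F1, f2: float = DEFAULT_F2,
                   f3: float = DEFAULT_F3, use_max: bool = False) -> bool:
    """True if a target would reach postprocessing in this toy model."""
    if use_max:
        return True  # --max: every target is postprocessed
    return p_msv <= f1 and p_vit <= f2 and p_fwd <= f3


def expected_survivors(n_nonhomologous: int, threshold: float) -> float:
    """Expected number of nonhomologous targets passing one filter stage."""
    return n_nonhomologous * threshold
```

       Under the defaults, about 2% of a million nonhomologous targets
       would be expected to pass the MSV stage, which is why loosening the
       thresholds mainly costs speed.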

OTHER OPTIONS
       --nonull2
	      Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
	      for the purposes of per-sequence	E-value	 calculations,	rather
	      than the actual number of	targets	seen.

       --domZ <x>
	      Assert that the total number of targets in your searches is <x>,
	      for the purposes of per-domain conditional E-value calculations,
	      rather  than  the	 number	 of  targets that passed the reporting
	      thresholds.

       --seed <n>
	      Set the random number seed to <n>.  Some steps in	postprocessing
	      require Monte Carlo simulation.  The default is to use  a	 fixed
	      seed  (42),  so that results are exactly reproducible. Any other
	      positive integer will give different (but	also reproducible) re-
	      sults. A choice of 0 uses	a randomly chosen seed.

       --tformat <s>
              Assert that the target sequence file seqdb is in format <s>,
              bypassing format autodetection. Common choices for <s> include:
              fasta, embl, genbank. Alignment formats also work; common
              choices include: stockholm, a2m, afa, psiblast, clustal, phylip.
              For more information, and for codes for some less common
              formats, see the main documentation. The string <s> is
              case-insensitive (fasta or FASTA both work).

       --cpu <n>
	      Set  the number of parallel worker threads to <n>.  On multicore
	      machines,	the default is 2.  You can also	control	this number by
	      setting an environment variable, HMMER_NCPU.  There  is  also  a
	      master thread, so	the actual number of threads that HMMER	spawns
	      is <n>+1.

	      This  option  is	not available if HMMER was compiled with POSIX
	      threads support turned off.

       --stall
	      For debugging the	MPI master/worker version: pause after	start,
	      to  enable the developer to attach debuggers to the running mas-
	      ter and worker(s)	processes. Send	SIGCONT	signal to release  the
	      pause.  (Under gdb: (gdb)	signal SIGCONT)	(Only available	if op-
	      tional MPI support was enabled at	compile-time.)

       --mpi  Run  under MPI control with master/worker	parallelization	(using
	      mpirun, for example, or equivalent). Only	available if  optional
	      MPI support was enabled at compile-time.
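
       The effect of the -Z option described earlier in this section can be
       illustrated with a little arithmetic. The sketch assumes that an
       E-value is the P-value multiplied by the number of targets, a
       relation this page does not state explicitly; the function name is
       hypothetical.

```python
def rescale_evalue(evalue: float, z_seen: float, z_asserted: float) -> float:
    """Convert an E-value computed for z_seen targets to z_asserted targets.

    Assumes E-value = P-value * Z, so the P-value is recovered first.
    """
    if z_seen <= 0 or z_asserted <= 0:
        raise ValueError("search space sizes must be positive")
    p_value = evalue / z_seen
    return p_value * z_asserted
```

       Under this assumption, a hit with E-value 0.001 in a search of 1000
       targets would have an E-value of about 1 if -Z 1000000 had been
       asserted, which is why searches of database subsets are often
       compared using a common -Z.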

SEE ALSO
       See  hmmer(1)  for  a master man	page with a list of all	the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (http://hmmer.org/).

COPYRIGHT
       Copyright (C) 2023 Howard Hughes	Medical	Institute.
       Freely distributed under	the BSD	open source license.

       For  additional	information  on	 copyright and licensing, see the file
       called COPYRIGHT	in your	HMMER source distribution, or  see  the	 HMMER
       web page	(http://hmmer.org/).

AUTHOR
       http://eddylab.org

HMMER 3.4			   Aug 2023			  hmmsearch(1)
