phmmer(1)			 HMMER Manual			     phmmer(1)

NAME
       phmmer -	search protein sequence(s) against a protein sequence database

SYNOPSIS
       phmmer [options]	seqfile	seqdb

DESCRIPTION
       phmmer  is used to search one or	more query protein sequences against a
       protein sequence	database.  For each query  sequence  in	 seqfile,  use
       that  sequence to search	the target database of sequences in seqdb, and
       output ranked lists of the sequences with the most significant  matches
       to the query.

       Either the query	seqfile	or the target seqdb may	be '-' (a dash charac-
       ter),  in  which	case the query sequences or target database input will
       be read from a <stdin> pipe instead of from  a  file.  Only  one	 input
       source can come through <stdin>,	not both.  An exception	is that	if the
       seqfile	contains  more than one	query sequence,	then seqdb cannot come
       from <stdin>, because we	can't rewind the streaming target database  to
       search it with another query.
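
       The <stdin> rules above can be checked before launching a search.
       The following is a minimal Python sketch, not part of HMMER; the
       function name check_stdin_usage and its n_queries argument are
       hypothetical, and the caller is assumed to know how many query
       sequences seqfile contains.

```python
def check_stdin_usage(seqfile: str, seqdb: str, n_queries: int = 1) -> None:
    """Raise ValueError if '-' is used in a way the rules above forbid."""
    # Only one input source may come through stdin.
    if seqfile == "-" and seqdb == "-":
        raise ValueError("only one of seqfile and seqdb may be read from stdin")
    # A streamed target database cannot be rewound for a second query.
    if seqdb == "-" and n_queries > 1:
        raise ValueError("seqdb cannot be stdin when seqfile holds several queries")


# Allowed: a single query searched against a streamed database.
check_stdin_usage("query.fa", "-", n_queries=1)
# Allowed: several queries streamed in, database read from a file.
check_stdin_usage("-", "targets.fa", n_queries=3)
```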

       The output format is designed to	be human-readable, but is often	so vo-
       luminous	 that reading it is impractical, and parsing it	is a pain. The
       --tblout	and --domtblout	options	save output in simple tabular  formats
       that are	concise	and easier to parse.  The -o option allows redirecting
       the main	output,	including throwing it away in /dev/null.
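
       As an illustration of combining these options, the Python sketch
       below assembles a command line that discards the main output and
       keeps only the tabular files.  The helper build_phmmer_cmd and the
       file names are hypothetical; only the flags -o, --tblout, and
       --domtblout come from this page.  The command is printed, not run.

```python
import shlex


def build_phmmer_cmd(seqfile, seqdb, tblout=None, domtblout=None,
                     discard_main=False):
    """Return a phmmer argument list using only flags documented here."""
    cmd = ["phmmer"]
    if discard_main:
        cmd += ["-o", "/dev/null"]      # throw away the human-readable output
    if tblout:
        cmd += ["--tblout", tblout]     # per-target table
    if domtblout:
        cmd += ["--domtblout", domtblout]  # per-domain table
    cmd += [seqfile, seqdb]             # positional arguments come last
    return cmd


cmd = build_phmmer_cmd("query.fa", "targets.fa",
                       tblout="hits.tbl", discard_main=True)
print(shlex.join(cmd))
# prints: phmmer -o /dev/null --tblout hits.tbl query.fa targets.fa
```

       The resulting list could be passed to subprocess.run() on a system
       where phmmer is installed.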

OPTIONS
       -h     Help;  print  a  brief  reminder	of  command line usage and all
	      available	options.

OPTIONS	FOR CONTROLLING	OUTPUT
       -o <f> Direct the main human-readable output to a file <f>  instead  of
	      the default stdout.

       -A <f> Save  a multiple alignment of all	significant hits (those	satis-
	      fying inclusion thresholds) to the file <f> in Stockholm format.

       --tblout	<f>
	      Save a simple tabular  (space-delimited)	file  summarizing  the
	      per-target  output, with one data	line per homologous target se-
	      quence found.

       --domtblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a target sequence for each query sequence.

       --acc  Use accessions instead of	names in the main output, where	avail-
	      able for profiles	and/or sequences.

       --noali
	      Omit the alignment  section  from	 the  main  output.  This  can
	      greatly reduce the output	volume.

       --notextw
              Remove the limit on the length of each line in the main output.
              The default is a limit of 120 characters per line, which helps
              in displaying the output cleanly on terminals and in editors,
              but can truncate target sequence description lines.

       --textw <n>
	      Set the main output's line length	limit to  <n>  characters  per
	      line. The	default	is 120.
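
       A file written by --tblout can be read with a few lines of code.
       The Python sketch below is an illustration only.  This page does
       not specify the column layout; the sketch assumes the layout given
       in the HMMER user guide (target name, accession, query name,
       accession, full-sequence E-value, full-sequence score, ...) and
       that comment lines start with '#'.  The sample data line is made up
       for the example.

```python
def parse_tblout(lines):
    """Yield (target, query, evalue, score) tuples from --tblout lines."""
    for line in lines:
        # Skip blank lines and '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Assumed layout: 0 target name, 1 accession, 2 query name,
        # 3 accession, 4 full-sequence E-value, 5 full-sequence score.
        yield fields[0], fields[2], float(fields[4]), float(fields[5])


# Made-up sample input, for illustration only.
sample = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "seqA - query1 - 1.2e-30 105.3 0.1 2.0e-30 104.6 0.1 1.0 1 0 0 1 1 1 1 made-up description",
]
hits = list(parse_tblout(sample))
```

       Splitting on whitespace is safe for the leading numeric columns;
       the free-text description at the end of each line may itself
       contain spaces and is ignored here.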

OPTIONS	CONTROLLING SCORING SYSTEM
       The  probability	 model	in  phmmer is constructed by inferring residue
       probabilities from a standard 20x20 substitution	score matrix, plus two
       additional parameters for position-independent gap open and gap	extend
       probabilities.

       --popen <x>
	      Set  the	gap open probability for a single sequence query model
	      to <x>.  The default is 0.02.  <x> must be >= 0 and < 0.5.

       --pextend <x>
	      Set the gap extend probability for a single sequence query model
	      to <x>.  The default is 0.4.  <x>	must be	>= 0 and < 1.0.

       --mx <s>
	      Obtain residue alignment probabilities from the built-in substi-
	      tution matrix named <s>.	Several	standard matrices  are	built-
	      in,  and do not need to be read from files.  The matrix name <s>
	      can be PAM30, PAM70, PAM120, PAM240,  BLOSUM45,  BLOSUM50,  BLO-
	      SUM62, BLOSUM80, or BLOSUM90.  Only one of the --mx and --mxfile
	      options may be used.

       --mxfile	mxfile
	      Obtain residue alignment probabilities from the substitution ma-
	      trix in file mxfile.  The	default	score matrix is	BLOSUM62 (this
	      matrix is	internal to HMMER and does not have to be available as
	      a	 file).	  The  format  of  a substitution matrix mxfile	is the
	      standard format accepted by BLAST,  FASTA,  and  other  sequence
	      analysis software.  See ftp.ncbi.nlm.nih.gov/blast/matrices/ for
	      example  files.  (The  only exception: we	require	matrices to be
	      square, so for DNA, use files like NCBI's	NUC.4.4, not NUC.4.2.)

OPTIONS	CONTROLLING REPORTING THRESHOLDS
       Reporting thresholds control which hits are reported  in	 output	 files
       (the main output, --tblout, and --domtblout).  Sequence hits and	domain
       hits  are  ranked  by  statistical significance (E-value) and output is
       generated in two	sections called	per-target and per-domain  output.  In
       per-target  output, by default, all sequence hits with an E-value <= 10
       are reported. In	the per-domain output, for each	target that has	passed
       per-target reporting thresholds,	all domains satisfying per-domain  re-
       porting	thresholds  are	 reported.  By default,	these are domains with
       conditional E-values of <= 10.  The  following  options	allow  you  to
       change  the  default  E-value reporting thresholds, or to use bit score
       thresholds instead.

       -E <x> In the per-target	output,	report target  sequences  with	an  E-
	      value  of	<= <x>.	 The default is	10.0, meaning that on average,
	      about 10 false positives will be reported	per query, so you  can
	      see  the top of the noise	and decide for yourself	if it's	really
	      noise.

       -T <x> Instead of thresholding per-target output on E-value, report
              target sequences with a bit score of >= <x>.

       --domE <x>
              In the per-domain output, for target sequences that have already
              satisfied the per-target reporting threshold, report individual
              domains with a conditional E-value of <= <x>.  The default is
	      10.0.  A conditional E-value means the expected number of	 addi-
	      tional  false  positive  domains	in the smaller search space of
	      those comparisons	that already satisfied the per-target  report-
	      ing threshold (and thus must have	at least one homologous	domain
	      already).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.

OPTIONS	CONTROLLING INCLUSION THRESHOLDS
       Inclusion thresholds are	stricter than reporting	thresholds. They  con-
       trol  which  hits are included in any output multiple alignment (the -A
       option) and which domains are marked as significant ("!") as opposed to
       questionable ("?")  in domain output.

       --incE <x>
	      Use an E-value of	<= <x> as the per-target inclusion  threshold.
	      The default is 0.01, meaning that	on average, about 1 false pos-
	      itive  would  be	expected  in every 100 searches	with different
	      query sequences.

       --incT <x>
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= <x> as the per-target inclusion
              threshold.  By default this option is unset.

       --incdomE <x>
	      Use a conditional	E-value	of <= <x> as the per-domain  inclusion
	      threshold,  in  targets  that have already satisfied the overall
	      per-target inclusion threshold.  The default is 0.01.

       --incdomT <x>
	      Instead of using E-values, use a bit score of >= <x> as the per-
	      domain inclusion threshold.  By default this option is unset.
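
       How reporting and inclusion thresholds interact can be sketched in
       a few lines.  The Python function below is a simplified
       illustration, not HMMER's implementation: domain_mark is a
       hypothetical name, only the E-value thresholds and their defaults
       from this page are modeled, and the bit score alternatives are
       left out.

```python
def domain_mark(target_evalue, domain_cevalue,
                E=10.0, domE=10.0, incE=0.01, incdomE=0.01):
    """Return None (not reported), '?' (reported only) or '!' (included)."""
    # A domain is reported only if its target passes the per-target
    # reporting threshold and the domain passes the per-domain one.
    if target_evalue > E or domain_cevalue > domE:
        return None
    # Inclusion additionally requires both stricter thresholds.
    if target_evalue <= incE and domain_cevalue <= incdomE:
        return "!"
    return "?"


strong = domain_mark(1e-20, 1e-18)   # '!'
weak = domain_mark(0.5, 0.3)         # '?'
noise = domain_mark(50.0, 1e-5)      # None
```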

OPTIONS	CONTROLLING THE	ACCELERATION PIPELINE
       HMMER3 searches are accelerated in a three-step	filter	pipeline:  the
       MSV  filter, the	Viterbi	filter,	and the	Forward	filter.	The first fil-
       ter is the fastest and most approximate;	the last is the	 full  Forward
       scoring algorithm, slowest but most accurate. There is also a bias fil-
       ter  step  between  MSV and Viterbi. Targets that pass all the steps in
       the acceleration	pipeline are then subjected to postprocessing  --  do-
       main identification and scoring using the Forward/Backward algorithm.

       Essentially  the	 only  free  parameters	that control HMMER's heuristic
       filters are the P-value thresholds controlling the expected fraction of
       nonhomologous sequences that pass the filters.  Setting the thresholds
       higher than the defaults will pass a higher proportion of nonhomologous
       sequences, increasing sensitivity at the expense of speed; conversely,
       setting lower P-value thresholds will pass a smaller proportion,
       decreasing sensitivity and increasing speed.  Setting a filter's P-value
       threshold to 1.0 means it will pass all sequences, which effectively
       disables the filter.

       Changing	filter thresholds only removes or includes targets  from  con-
       sideration;  changing  filter  thresholds does not alter	bit scores, E-
       values, or alignments, all of which are determined solely  in  postpro-
       cessing.

       --max  Maximum  sensitivity.   Turn off all filters, including the bias
	      filter, and run full Forward/Backward  postprocessing  on	 every
	      target.  This increases sensitivity slightly, at a large cost in
	      speed.

       --F1 <x>
	      First filter threshold; set the P-value threshold	 for  the  MSV
	      filter  step.   The  default is 0.02, meaning that roughly 2% of
	      the highest scoring nonhomologous	targets	are expected  to  pass
	      the filter.

       --F2 <x>
	      Second  filter  threshold;  set  the  P-value  threshold for the
	      Viterbi filter step.  The	default	is 0.001.

       --F3 <x>
	      Third filter threshold; set the P-value threshold	for  the  For-
	      ward filter step.	 The default is	1e-5.

       --nobias
	      Turn  off	 the bias filter. This increases sensitivity somewhat,
	      but can come at a	high cost in speed, especially	if  the	 query
	      has  biased  residue  composition	(such as a repetitive sequence
	      region, or if it is a membrane protein with large	regions	of hy-
	      drophobicity). Without the bias filter, too many	sequences  may
	      pass  the	filter with biased queries, leading to slower than ex-
	      pected  performance  as  the  computationally   intensive	  For-
	      ward/Backward algorithms shoulder	an abnormally heavy load.
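
       To get a feel for the filter thresholds, the Python sketch below
       estimates how many nonhomologous targets would be expected to pass
       each step.  It is a back-of-envelope illustration that reads each
       P-value threshold as the fraction of nonhomologous targets passing
       that step; it ignores the bias filter and any true homologs, and
       the function name is hypothetical.

```python
def expected_nonhomolog_passes(n_targets, f1=0.02, f2=0.001, f3=1e-5):
    """Rough expected count of nonhomologous targets passing each filter."""
    return {
        "MSV": n_targets * f1,       # --F1, default 0.02
        "Viterbi": n_targets * f2,   # --F2, default 0.001
        "Forward": n_targets * f3,   # --F3, default 1e-5
    }


# For a database of one million nonhomologous sequences with defaults,
# roughly 20000 pass MSV, 1000 pass Viterbi, and 10 pass Forward.
for step, count in expected_nonhomolog_passes(1_000_000).items():
    print(f"{step}: about {round(count)} targets")
```

       Under this reading, only a small number of unrelated sequences per
       million reach the postprocessing step with the default thresholds.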

OPTIONS	CONTROLLING E-VALUE CALIBRATION
       Estimating the location parameters for the expected score distributions
       for  MSV	 filter	 scores, Viterbi filter	scores,	and Forward scores re-
       quires three short random sequence simulations.

       --EmL <n>
	      Sets the sequence	length in simulation that estimates the	 loca-
	      tion parameter mu	for MSV	filter E-values. Default is 200.

       --EmN <n>
	      Sets  the	 number	 of sequences in simulation that estimates the
	      location parameter mu for	MSV filter E-values. Default is	200.

       --EvL <n>
	      Sets the sequence	length in simulation that estimates the	 loca-
	      tion parameter mu	for Viterbi filter E-values. Default is	200.

       --EvN <n>
	      Sets  the	 number	 of sequences in simulation that estimates the
	      location parameter mu for	Viterbi	filter	E-values.  Default  is
	      200.

       --EfL <n>
	      Sets  the	sequence length	in simulation that estimates the loca-
	      tion parameter tau for Forward E-values. Default is 100.

       --EfN <n>
	      Sets the number of sequences in simulation  that	estimates  the
	      location parameter tau for Forward E-values. Default is 200.

       --Eft <x>
              Sets the tail mass fraction to fit in the simulation that
              estimates the location parameter tau for Forward E-values.
              Default is 0.04.

OTHER OPTIONS
       --nonull2
	      Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
	      for the purposes of per-sequence	E-value	 calculations,	rather
	      than the actual number of	targets	seen.

       --domZ <x>
	      Assert that the total number of targets in your searches is <x>,
	      for the purposes of per-domain conditional E-value calculations,
	      rather  than  the	 number	 of  targets that passed the reporting
	      thresholds.
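
       The effect of -Z can be illustrated with simple arithmetic.  The
       Python sketch below assumes that an E-value is a P-value multiplied
       by the number of targets searched, as described in the HMMER user
       guide; this page does not state the formula.  The function name
       and the numbers are hypothetical.

```python
import math


def rescale_evalue(evalue, z_actual, z_asserted):
    """Convert an E-value computed for z_actual targets to z_asserted."""
    # Assumption: E = P * Z, so changing Z scales E proportionally.
    return evalue * z_asserted / z_actual


# A hit with E-value 0.001 in a search of 10,000 targets corresponds to
# an E-value of about 0.1 if the search space is asserted to be 1,000,000.
rescaled = rescale_evalue(0.001, 10_000, 1_000_000)
assert math.isclose(rescaled, 0.1)
```

       This is why asserting a fixed <x> makes E-values comparable across
       searches of databases with different sizes.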

       --seed <n>
	      Seed the random number generator with <n>, an integer >= 0.   If
	      <n>  is >0, any stochastic simulations will be reproducible; the
	      same command will	give the same results.	If <n> is 0, the  ran-
	      dom number generator is seeded arbitrarily, and stochastic simu-
	      lations  will vary from run to run of the	same command.  The de-
	      fault seed is 42.

       --qformat <s>
	      Assert that input	seqfile	is in format <s>, bypassing format au-
	      todetection.  Common choices for <s> include: fasta, embl,  gen-
	      bank.   Alignment	 formats  also	work;  common choices include:
	      stockholm, a2m, afa, psiblast, clustal, phylip.	phmmer	always
	      uses  a  single  sequence	query to start its search, so when the
	      input seqfile is an alignment, phmmer  reads  it	one  unaligned
              query sequence at a time, not as an alignment.  For more
              information, and for codes for some less common formats, see
              the main documentation.  The string <s> is case-insensitive
              (fasta or FASTA both work).

       --tformat <s>
              Assert that target sequence database seqdb is in format <s>,
              bypassing format autodetection.  See --qformat above for a
              list of accepted format codes for <s>.

       --cpu <n>
	      Set  the number of parallel worker threads to <n>.  On multicore
	      machines,	the default is 2.  You can also	control	this number by
	      setting an environment variable, HMMER_NCPU.  There  is  also  a
	      master thread, so	the actual number of threads that HMMER	spawns
	      is <n>+1.

	      This  option  is	not available if HMMER was compiled with POSIX
	      threads support turned off.
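
       The two ways of setting the worker count can be sketched as
       follows.  This Python helper is hypothetical and does not run
       phmmer; it only prepares either a --cpu argument or an environment
       containing HMMER_NCPU, and reports the total thread count (<n>+1)
       described above.  Which setting wins if both are given is not
       stated on this page, so the sketch sets only one of them.

```python
import os


def threading_setup(n_workers, use_env=False):
    """Return (extra_args, env, total_threads) for n_workers workers."""
    env = dict(os.environ)          # copy, so the caller's env is untouched
    args = []
    if use_env:
        env["HMMER_NCPU"] = str(n_workers)
    else:
        args = ["--cpu", str(n_workers)]
    # One master thread runs in addition to the workers.
    return args, env, n_workers + 1


args, env, total = threading_setup(4)
env_args, env_with_var, env_total = threading_setup(4, use_env=True)
```

       The returned args and env could be passed to subprocess.run()
       together with the rest of the command line.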

       --stall
	      For debugging the	MPI master/worker version: pause after	start,
	      to  enable the developer to attach debuggers to the running mas-
	      ter and worker(s)	processes. Send	SIGCONT	signal to release  the
	      pause.  (Under gdb: (gdb)	signal SIGCONT)	(Only available	if op-
	      tional MPI support was enabled at	compile-time.)

       --mpi  Run  under MPI control with master/worker	parallelization	(using
	      mpirun, for example, or equivalent). Only	available if  optional
	      MPI support was enabled at compile-time.

SEE ALSO
       See  hmmer(1)  for  a master man	page with a list of all	the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (http://hmmer.org/).

COPYRIGHT
       Copyright (C) 2023 Howard Hughes	Medical	Institute.
       Freely distributed under	the BSD	open source license.

       For  additional	information  on	 copyright and licensing, see the file
       called COPYRIGHT	in your	HMMER source distribution, or  see  the	 HMMER
       web page	(http://hmmer.org/).

AUTHOR
       http://eddylab.org

HMMER 3.4			   Aug 2023			     phmmer(1)
