cmemit(1)			Infernal Manual			     cmemit(1)

NAME
       cmemit - sample sequences from a covariance model

SYNOPSIS
       cmemit [options] <cmfile>

DESCRIPTION
       The cmemit program samples (emits) sequences from the covariance
       model(s) in <cmfile> and writes them to stdout (or to a file; see -o).
       Sampling sequences may be useful for a variety of purposes, including
       creating synthetic true positives for benchmarks or tests.

       The default is to sample ten unaligned sequences from each CM.
       Alternatively, with the -c option, you can emit a single majority-rule
       consensus sequence; or, with the -a option, you can emit an alignment.

       The <cmfile> may contain a library of CMs, in which case each CM will
       be used in turn.

       <cmfile> may be '-' (dash), in which case the model(s) are read from
       stdin rather than from a file.

       For models with zero basepairs, sequences are sampled from the profile
       HMM filter instead of the CM. Because the filter HMM and the CM of such
       a model are nearly identical (unless special options were used in
       cmbuild to prevent this), sampling from the HMM instead of the CM does
       not change the output in a significant way, unless the -l option is
       used. With -l, the HMM is configured for equiprobable model begin and
       end positions, while the CM is not. You can force cmemit to always
       sample from the CM with the --nohmmonly option.
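       The rule above, combined with the --hmmonly and --nohmmonly options
       described under OTHER OPTIONS, can be summarized as a small decision
       function. This is a hypothetical Python sketch, not Infernal's code:
       the function and parameter names are invented for illustration, and
       treating --hmmonly together with --nohmmonly as an error is an
       assumption of the sketch.

```python
def emission_source(n_basepairs: int, hmmonly: bool = False,
                    nohmmonly: bool = False) -> str:
    """Return "HMM" or "CM": which model this sketch would sample from."""
    if hmmonly and nohmmonly:
        # Assumption: this sketch treats the combination as contradictory.
        raise ValueError("--hmmonly and --nohmmonly conflict")
    if hmmonly:
        return "HMM"  # --hmmonly: emit from the filter profile HMM
    if nohmmonly:
        return "CM"   # --nohmmonly: always use the CM
    # Default: models with zero basepairs are sampled from the filter HMM.
    return "HMM" if n_basepairs == 0 else "CM"


print(emission_source(0))                  # HMM
print(emission_source(0, nohmmonly=True))  # CM
print(emission_source(12))                 # CM
```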

OPTIONS
       -h     Help; print a brief reminder of command line usage and available
	      options.

       -o <f> Save the synthetic sequences to file <f> rather than writing
              them to stdout.

       -N <n> Generate <n> sequences from each CM. The default value for <n>
              is 10.

       -u     Write the generated sequences in unaligned format (FASTA). This
              is the default behavior.

       -a     Write the generated sequences in an aligned format (Stockholm)
              with consensus structure annotation rather than FASTA. Other
              output formats are possible with the --outformat option.

       -c     Predict a single majority-rule consensus sequence instead of
              sampling sequences from the CM's probability distribution.
              Highly conserved residues (base-paired residues that score
              higher than 3.0 bits, or single-stranded residues that score
              higher than 1.0 bits) are shown in upper case; others are shown
              in lower case.
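              As an illustration, the case rule for -c can be written as a
              per-residue threshold test. This Python sketch is hypothetical:
              it assumes a per-residue score in bits is already available and
              uses made-up example scores; it does not compute consensus
              scores the way cmemit does.

```python
def consensus_case(residue: str, score_bits: float, base_paired: bool) -> str:
    """Apply the -c upper/lower-case rule to one consensus residue."""
    threshold = 3.0 if base_paired else 1.0  # bits, per the -c description
    # "Score higher than" the threshold is read as a strict comparison.
    return residue.upper() if score_bits > threshold else residue.lower()


# Made-up example scores: (residue, score in bits, base-paired?)
columns = [("g", 3.5, True), ("c", 2.9, True), ("a", 1.2, False), ("u", 0.4, False)]
print("".join(consensus_case(r, s, p) for r, s, p in columns))  # GcAu
```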

       -e <n> Embed each CM-emitted sequence in a larger random sequence of
              length <n>, generated from an HMM that was trained on real
              genomic sequences with various GC contents (the same HMM used
              by cmcalibrate). You can use the --iid option to generate the
              larger sequence with 25% each of A, C, G, and U instead. The
              CM-emitted sequence begins at a random position within the
              larger sequence and is included in its entirety unless the
              --u5p or --u3p options are used. When -e is combined with
              --u5p, the CM-emitted sequence always begins at position 1 of
              the larger sequence and is truncated at its 5' end. When -e is
              combined with --u3p, the CM-emitted sequence always ends at
              position <n> of the larger sequence and is truncated at its 3'
              end.
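              The placement rules for -e can be sketched in Python as follows.
              This is a hypothetical illustration, not cmemit's code: it uses
              the --iid style background (25% each of A, C, G, and U) instead
              of the trained HMM, assumes the input was already truncated when
              --u5p or --u3p applies, and assumes a uniform choice of start
              position.

```python
import random
from typing import Optional


def iid_background(n: int, rng: random.Random) -> str:
    """Random sequence with 25% each of A, C, G and U, as with --iid."""
    return "".join(rng.choice("ACGU") for _ in range(n))


def embed(emitted: str, total_len: int, anchor: str = "random",
          rng: Optional[random.Random] = None) -> str:
    """Place an emitted (possibly pre-truncated) sequence in random background.

    anchor="random": begin at a random position (plain -e)
    anchor="start" : begin at position 1 (-e with --u5p)
    anchor="end"   : end at position total_len (-e with --u3p)
    """
    if rng is None:
        rng = random.Random()
    if len(emitted) > total_len:
        raise ValueError("total_len must be at least len(emitted)")
    pad = total_len - len(emitted)
    if anchor == "start":
        left = 0
    elif anchor == "end":
        left = pad
    elif anchor == "random":
        left = rng.randint(0, pad)  # uniform offset (assumption)
    else:
        raise ValueError(f"unknown anchor: {anchor!r}")
    return iid_background(left, rng) + emitted + iid_background(pad - left, rng)


rng = random.Random(7)
print(embed("GGCUAGCC", 20, rng=rng))
print(embed("CUAGCC", 20, anchor="start", rng=rng))  # a 5'-truncated emission
```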

       -l     Configure the CMs into local mode before emitting sequences. By
              default, the model is in global mode. In local mode, large
              insertions and deletions are more common than in global mode.

OPTIONS FOR TRUNCATING EMITTED SEQUENCES
       --u5p  Truncate each emitted sequence at a randomly chosen start
              position <n> by outputting only the residues from position <n>
              onward. A different start point is randomly chosen for each
              sequence.

       --u3p  Truncate each emitted sequence at a randomly chosen end position
              <n> by outputting only the residues up to position <n>. A
              different end point is randomly chosen for each sequence.
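              A minimal Python sketch of the two truncations, using 1-based
              positions as in the text. The function names are hypothetical,
              and drawing <n> uniformly from the whole sequence is an
              assumption; this man page does not state how <n> is chosen.

```python
import random


def truncate_5p(seq: str, rng: random.Random) -> str:
    """--u5p sketch: keep residues from a random start position n onward."""
    n = rng.randint(1, len(seq))  # 1-based, uniform (assumption)
    return seq[n - 1:]


def truncate_3p(seq: str, rng: random.Random) -> str:
    """--u3p sketch: keep residues up to a random end position n."""
    n = rng.randint(1, len(seq))  # 1-based, uniform (assumption)
    return seq[:n]


rng = random.Random(11)
for seq in ["GGCUAGCCAU", "GGCUAGCCAU", "GGCUAGCCAU"]:
    # A fresh random point is drawn for every sequence.
    print(truncate_5p(seq, rng), truncate_3p(seq, rng))
```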

       --a5p <n>
              In combination with the -a option, truncate the emitted
              alignment at start match position <n> by outputting only the
              alignment columns for positions after match state <n> - 1. <n>
              must be an integer between 0 and the consensus length of the
              model (which can be determined using the cmstat program). As a
              special case, using 0 as <n> results in a randomly chosen start
              position.

       --a3p <n>
              In combination with the -a option, truncate the emitted
              alignment at end match position <n> by outputting only the
              alignment columns for positions before match state <n> + 1. <n>
              must be an integer between 1 and the consensus length of the
              model (which can be determined using the cmstat program). As a
              special case, using 0 as <n> results in a randomly chosen end
              position.
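              To make the column selection concrete, here is a hypothetical
              Python sketch based on a literal reading of the --a5p and --a3p
              descriptions. It assumes an invented encoding in which match
              column k is labeled k and an insert column after match k is
              labeled k + 0.5; the uniform random choice for <n> = 0 is also
              an assumption, not documented cmemit behavior.

```python
import random
from typing import List, Optional


def kept_columns(col_pos: List[float], clen: int, a5p: Optional[int] = None,
                 a3p: Optional[int] = None,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Indices of alignment columns kept by --a5p/--a3p (literal-reading sketch).

    Hypothetical encoding: match column k -> k (1..clen); an insert column
    after match k -> k + 0.5 (0.5 for inserts before match 1).
    """
    if rng is None:
        rng = random.Random()
    if a5p == 0:
        a5p = rng.randint(1, clen)  # assumed: uniform random start
    if a3p == 0:
        a3p = rng.randint(1, clen)  # assumed: uniform random end
    kept = []
    for i, pos in enumerate(col_pos):
        if a5p is not None and pos <= a5p - 1:
            continue  # not after match state a5p - 1
        if a3p is not None and pos >= a3p + 1:
            continue  # not before match state a3p + 1
        kept.append(i)
    return kept


# Hypothetical 5-position model with one insert column after match 2.
cols = [1, 2, 2.5, 3, 4, 5]
print(kept_columns(cols, clen=5, a5p=2, a3p=4))  # [1, 2, 3, 4]
```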

OTHER OPTIONS
       --seed <n>
              Seed the random number generator with <n>, an integer >= 0. If
              <n> is nonzero, stochastic sampling of sequences will be
              reproducible; the same command will give the same results. If
              <n> is 0, the random number generator is seeded arbitrarily, and
              stochastic samplings will vary from run to run of the same
              command. The default seed is 0.
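              These seeding semantics are common to pseudo-random generators.
              This Python sketch mirrors them with Python's own generator,
              which is not the one cmemit uses, so the values it produces will
              not match cmemit output; the helper name is hypothetical.

```python
import random
from typing import List


def sample_values(n: int, seed: int) -> List[int]:
    """Draw n pseudo-random values using --seed style semantics."""
    if seed < 0:
        raise ValueError("seed must be an integer >= 0")
    # Nonzero seed: reproducible. Zero: seeded arbitrarily (OS entropy or time).
    rng = random.Random(seed) if seed != 0 else random.Random()
    return [rng.randrange(1000) for _ in range(n)]


assert sample_values(5, seed=42) == sample_values(5, seed=42)  # same results
print(sample_values(5, seed=0))  # differs from run to run
```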

       --iid  With -e, generate the larger sequences as 25% each A, C, G, and
              U.

       --rna  Specify that the emitted sequences be output as RNA sequences.
              This is true by default.

       --dna  Specify that the emitted sequences be output as DNA sequences.
              By default, the output alphabet is RNA.

       --idx <n>
              Specify that the emitted sequences be named starting with
              <modelname>.<n>. By default, <n> is 1.
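              For example, the naming scheme can be sketched as below. The
              model name "tRNA" is a made-up example, and numbering later
              sequences consecutively from <n> is inferred from the phrase
              "starting with" rather than stated outright.

```python
from typing import List


def emitted_names(model_name: str, count: int, idx: int = 1) -> List[str]:
    """Names <modelname>.<n>, numbered consecutively from idx (assumed)."""
    return [f"{model_name}.{n}" for n in range(idx, idx + count)]


print(emitted_names("tRNA", 3))         # ['tRNA.1', 'tRNA.2', 'tRNA.3']
print(emitted_names("tRNA", 3, idx=5))  # ['tRNA.5', 'tRNA.6', 'tRNA.7']
```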

       --outformat <s>
              With -a, specify the output alignment format as <s>. Besides the
              default Stockholm format, acceptable formats are: Pfam, AFA,
              A2M, Clustal, and Phylip. AFA is aligned FASTA. Only the Pfam
              and Stockholm alignment formats include consensus structure
              annotation.

       --tfile <f>
              Dump tabular sequence parsetrees (tracebacks) for each emitted
              sequence to file <f>. Primarily useful for debugging.

       --exp <x>
              Exponentiate the emission and transition probabilities of the
              CM by <x>, and then renormalize those distributions before
              emitting sequences. This option changes the CM's probability
              distribution of parsetrees relative to the default. With <x>
              less than 1.0, the emitted sequences will tend to have lower bit
              scores upon alignment to the CM. With <x> greater than 1.0, the
              emitted sequences will tend to have higher bit scores upon
              alignment to the CM. This bit score difference increases as <x>
              moves further away from 1.0 in either direction. If <x> equals
              1.0, this option has no effect relative to the default. This
              option is useful for generating sequences that are either more
              difficult (<x> < 1.0) or easier (<x> > 1.0) for the CM to
              distinguish as homologous from background, random sequence.
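              The exponentiate-and-renormalize step can be sketched for a
              single probability distribution, such as one emission
              distribution. This Python sketch is illustrative only: it does
              not read a CM file, and restricting <x> to positive values is an
              assumption of the sketch.

```python
from typing import List


def exponentiate(probs: List[float], x: float) -> List[float]:
    """Raise each probability to the power x, then renormalize to sum to 1."""
    if x <= 0:
        raise ValueError("this sketch assumes x > 0")
    powered = [p ** x for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


dist = [0.5, 0.25, 0.25]
print(exponentiate(dist, 1.0))  # [0.5, 0.25, 0.25]: no change
print(exponentiate(dist, 2.0))  # about [0.667, 0.167, 0.167]: sharper
print(exponentiate(dist, 0.5))  # about [0.414, 0.293, 0.293]: flatter
```

              Sharpening with <x> > 1.0 concentrates probability on the most
              likely emissions and transitions, which is consistent with the
              higher bit scores described above; flattening with <x> < 1.0
              does the opposite.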

       --hmmonly
              Emit from the filter profile HMM instead of the CM.

       --nohmmonly
              Never emit from the filter profile HMM; always use the CM, even
              for models with zero basepairs.

SEE ALSO
       See infernal(1) for a master man page with a list of all the individual
       man pages for programs in the Infernal package.

       For complete documentation, see the user guide that came with your
       Infernal distribution (Userguide.pdf); or see the Infernal web page
       (http://eddylab.org/infernal/).

COPYRIGHT
       Copyright (C) 2023 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your Infernal source distribution, or see the
       Infernal web page (http://eddylab.org/infernal/).

AUTHOR
       http://eddylab.org

Infernal 1.1.5			   Sep 2023			     cmemit(1)
