RNAALIFOLD(1)			 User Commands			 RNAALIFOLD(1)

NAME
       RNAalifold - manual page	for RNAalifold 2.7.0

SYNOPSIS
       RNAalifold [options] [<input0.aln>] [<input1.aln>]...

DESCRIPTION
       RNAalifold 2.7.0

       calculate secondary structures for a set	of aligned RNAs

       Read aligned RNA sequences from stdin or file.aln and calculate their
       minimum free energy (mfe) structure, partition function (pf) and base
       pairing probability matrix. Currently, input alignments have to be in
       CLUSTAL, Stockholm, FASTA, or MAF format. In interactive mode the
       input format must be set manually (default is Clustal); for input
       files it is determined automatically unless explicitly set. It
       returns the mfe structure in bracket notation, its energy, the free
       energy of the thermodynamic ensemble and the frequency of the mfe
       structure in the ensemble to stdout. It also produces PostScript files
       with plots of the resulting secondary structure graph ("alirna.ps")
       and a "dot plot" of the base pairing matrix ("alidot.ps"). The file
       "alifold.out" will contain a list of likely pairs sorted by
       credibility, suitable for viewing with "AliDot.pl". Be warned that
       output files will overwrite any existing files of the same name.

       -h, --help
	      Print help and exit

       --detailed-help
	      Print help, including all	details	and hidden options, and	exit

       --full-help
	      Print help, including hidden options, and	exit

       -V, --version
	      Print version and	exit

       -v, --verbose
	      Be verbose.  (default=off)

	      Lower  the  log  level  setting such that	even INFO messages are
	      passed through.

       -q, --quiet
	      Be quiet.	 (default=off)

	      This option can be used to minimize the output of	additional in-
	      formation	and non-severe warnings	 which	otherwise  might  spam
	      stdout/stderr.

   I/O Options:
	      Command line options for input and output	(pre-)processing

       -f, --input-format=C|S|F|M
	      File format of the input multiple	sequence alignment (MSA).

              If this parameter is set, the input is considered to be in a
              particular file format. Otherwise, the program tries to
              determine the file format automatically, if an input file was
              provided in the set of parameters. In case the input MSA is
              provided in interactive mode, or from a terminal (TTY), the
              program's default is to assume CLUSTALW format. Currently, the
              following formats are available: ClustalW ('C'), Stockholm 1.0
              ('S'), FASTA/Pearson ('F'), and MAF ('M').

       --mis  Output  "most informative	sequence" instead of simple consensus:
	      For each column of the alignment output the set  of  nucleotides
	      with frequency greater than average in IUPAC notation.

	      (default=off)

       -j, --jobs[=number]
	      Split batch input	into jobs and start processing in parallel us-
	      ing multiple threads. A value of 0 indicates to use as many par-
	      allel threads as computation cores are available.

	      (default=`0')

	      Default  processing of input data	is performed in	a serial fash-
	      ion, i.e.	one alignment at a time. Using this switch, a user can
	      instead start the	computation for	many alignments	in  the	 input
              in parallel. RNAalifold creates as many parallel computation
              slots as specified and assigns input alignments of the input
              file(s) to the available slots. Note that this increases memory
	      consumption since	input alignments have to be kept in memory un-
	      til an empty compute slot	is available and each running job  re-
	      quires its own dynamic programming matrices.

       --unordered
	      Do  not  try  to	keep output in order with input	while parallel
	      processing is in place.

	      (default=off)

              When parallel input processing (--jobs flag) is enabled, the
              order in which input is processed depends on the host machine's
              job scheduler. Therefore, any output to stdout or files
              generated by this program will most likely not follow the order
              of the corresponding input data set. The default of RNAalifold
              is to use a specialized data structure to still keep the
              results output in order with the input data. However, this
              comes with a trade-off in terms of memory consumption, since
              all output must be kept in memory for as long as no chunks of
              consecutive, ordered output are available. By setting this
              flag, RNAalifold will not buffer individual results but print
              them as soon as they have been computed.
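
              The ordered output described above can be pictured as a small
              reorder buffer. The following Python sketch is an illustration
              only and not RNAalifold's actual data structure; the class name
              OrderedOutput and its methods are hypothetical.

```python
class OrderedOutput:
    """Release results in input order even if they arrive out of order."""

    def __init__(self, emit):
        self.emit = emit        # callback that prints or stores one result
        self.next_index = 0     # index of the next result to release
        self.pending = {}       # finished results waiting for predecessors

    def put(self, index, result):
        self.pending[index] = result
        # Flush the longest run of consecutive results starting at next_index.
        while self.next_index in self.pending:
            self.emit(self.pending.pop(self.next_index))
            self.next_index += 1


released = []
buf = OrderedOutput(released.append)
# Jobs finish out of order: third alignment first, then first, then second.
for idx, res in [(2, "aln3"), (0, "aln1"), (1, "aln2")]:
    buf.put(idx, res)
```

              The pending dictionary is what grows in memory while an early
              job is still running; printing results immediately, as
              --unordered does, avoids that buffer.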

       --noconv
	      Do not automatically substitute nucleotide "T" with "U".

	      (default=off)

       -n, --continuous-ids
	      Use  continuous  alignment ID numbering when no alignment	ID can
	      be retrieved from	input data.

	      (default=off)

              For historical reasons, RNAalifold produces a specific set of
              output file names for the first input alignment, "alirna.ps",
              "alidot.ps", etc. But for all further alignments in the input,
              it usually adopts a naming scheme based on IDs, which may be
              retrieved from the input alignment's meta-data, or generated by
              a prefix followed by an increasing counter. Setting this flag
              instructs RNAalifold to use the ID naming scheme also for the
              first alignment.

       --auto-id
	      Automatically generate an	ID for each alignment.

	      (default=off)

              The default mode of RNAalifold is to automatically determine an
              ID from the input alignment if the input file format provides
              one. Alignment IDs are, for instance, usually given in
              Stockholm 1.0 formatted input. If this flag is active,
              RNAalifold ignores any IDs retrieved from the input and
              automatically generates an ID for each alignment.

       --id-prefix=STRING
	      Prefix for automatically generated IDs (as used in  output  file
	      names).

	      (default=`alignment')

	      If  this	parameter is set, each alignment will be prefixed with
	      the provided string. Hence, the output files will	obey the  fol-
	      lowing  naming  scheme: "prefix_xxxx_ss.ps" (secondary structure
	      plot), "prefix_xxxx_dp.ps" (dot-plot), "prefix_xxxx_aln.ps" (an-
	      notated alignment), etc. where xxxx is the alignment number  be-
	      ginning with the second alignment	in the input. Use this setting
	      in  conjunction with the --continuous-ids	flag to	assign IDs be-
	      ginning with the first input alignment.

       --id-delim=CHAR
	      Change the delimiter between prefix and  increasing  number  for
	      automatically generated IDs (as used in output file names).

	      (default=`_')

	      This  parameter  can be used to change the default delimiter '_'
	      between the prefix string	and the	increasing number for automat-
	      ically generated ID.

       --id-digits=INT
	      Specify the number of digits of  the  counter  in	 automatically
	      generated	alignment IDs.

	      (default=`4')

              When alignment IDs are automatically generated, they receive an
              increasing number, starting with 1. This number will always be
              left-padded by leading zeros, such that the number takes up a
              certain width. Using this parameter, the width can be specified
              to the user's needs. We allow numbers in the range [1:18].

       --id-start=LONG
	      Specify the first	number in  automatically  generated  alignment
	      IDs.

	      (default=`1')

              When alignment IDs are automatically generated, they receive an
              increasing number, usually starting with 1. Using this
              parameter, the first number can be specified to the user's
              requirements. Note: negative numbers are not allowed. Note:
              Setting this parameter implies continuous alignment IDs, i.e.
              it activates the --continuous-ids flag.
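
              How --id-prefix, --id-delim, --id-digits and --id-start combine
              can be sketched in a few lines of Python. This is an
              illustration of the naming scheme described above, not code
              from RNAalifold; the function names are hypothetical, and the
              underscore before the file suffix is assumed from the
              "prefix_xxxx_ss.ps" pattern.

```python
def alignment_id(number, prefix="alignment", delim="_", digits=4):
    """Build an ID such as 'alignment_0001' from the documented defaults."""
    if not 1 <= digits <= 18:
        raise ValueError("--id-digits accepts values in the range [1:18]")
    if number < 0:
        raise ValueError("negative numbers are not allowed")
    # Left-pad the counter with zeros up to the requested width.
    return f"{prefix}{delim}{number:0{digits}d}"


def output_names(aln_id):
    """File names following the 'prefix_xxxx_ss.ps' scheme (assumed suffixes)."""
    return [f"{aln_id}_{suffix}" for suffix in ("ss.ps", "dp.ps", "aln.ps")]


first = alignment_id(1)
custom = alignment_id(12, prefix="fam", delim="-", digits=3)
names = output_names(alignment_id(2))
```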

       --filename-delim=CHAR
	      Change the delimiting character used in sanitized	filenames.

	      (default=`ID-delimiter')

              This parameter can be used to change the delimiting character
              used while sanitizing filenames, i.e. replacing invalid
              characters. Note that the default delimiter ALWAYS is the first
              character of the "ID delimiter" as supplied through the
              --id-delim option. If the delimiter is a whitespace character
              or empty, invalid characters will be simply removed rather than
              substituted. Currently, we regard the following characters as
              illegal for use in filenames: backslash '\', slash '/',
              question mark '?', percent sign '%', asterisk '*', colon ':',
              pipe symbol '|', double quote '"', angle brackets '<' and '>'.
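
              The sanitizing rule can be written down as a short Python
              sketch. It follows the description above (replace each illegal
              character by the delimiter, or drop it if the delimiter is
              empty or whitespace); the function name sanitize_filename is
              hypothetical and the real implementation may differ in detail.

```python
# Characters listed above as illegal in file names.
ILLEGAL = set('\\/?%*:|"<>')


def sanitize_filename(name, delim="_"):
    """Replace illegal characters by delim, or remove them if delim is blank."""
    d = delim[:1]               # only a single delimiter character is used
    if d.isspace():
        d = ""                  # whitespace delimiter: remove instead of replace
    return "".join(d if ch in ILLEGAL else ch for ch in name)


replaced = sanitize_filename("AL031296.1/85969-86120")
removed = sanitize_filename("a:b|c", " ")
```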

       --log-level=level
	      Set log level threshold.	(default=`2')

              By default, any log messages are filtered such that only
              warnings (level 2) or errors (level 3) are printed. This
              setting allows for specifying the log level threshold, where
              higher values result in less information. Log-level 5 turns off
              all messages, even errors and other critical information.

       --log-file[=filename]
	      Print  log  messages  to	a  file	 instead  of   stderr.	  (de-
	      fault=`RNAalifold.log')

       --log-time
	      Include time stamp in log	messages.

	      (default=off)

       --log-call
	      Include file and line of log calling function.

	      (default=off)

   Algorithms:
	      Select  additional  algorithms  which  should be included	in the
	      calculations.

       -p, --partfunc[=INT]
	      Calculate	the partition function and  base  pairing  probability
	      matrix  in addition to the mfe structure.	Default	is calculation
	      of mfe structure only.

	      (default=`1')

	      In addition to the MFE structure we print	a  coarse  representa-
	      tion of the pair probabilities in	form of	a pseudo bracket nota-
	      tion,  followed by the ensemble free energy, as well as the cen-
	      troid structure derived from  the	 pair  probabilities  together
	      with  its	 free energy and distance to the ensemble.  Finally it
	      prints the frequency of the mfe structure.

	      An additionally passed value to this option changes the behavior
	      of partition function calculation: -p0 deactivates the  calcula-
	      tion  of	the  pair  probabilities, saving about 50% in runtime.
	      This prints the ensemble free energy 'dG=-kT ln(Z)'.

       --betaScale=DOUBLE
	      Set the scaling of the Boltzmann factors.	 (default=`1.')

              The argument provided with this option is used to scale the
              thermodynamic temperature in the Boltzmann factors
              independently from the temperature of the individual loop
              energy contributions. The Boltzmann factors then become
              'exp(-dG/(kTn*betaScale))' where 'k' is the Boltzmann constant,
              'dG' the free energy contribution of the state, 'T' the
              absolute temperature and 'n' the number of sequences.
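
              As a numerical illustration of the formula above, the following
              Python sketch evaluates the scaled Boltzmann factor. It assumes
              dG is given in kcal/mol and uses the molar gas constant of
              about 1.987e-3 kcal/(mol K) for 'k'; the function name is
              hypothetical and the constant may differ slightly from the one
              used inside the program.

```python
import math

GAS_CONSTANT = 1.987e-3   # kcal/(mol*K), approximate value (assumption)
KELVIN_OFFSET = 273.15    # converts degrees Celsius to Kelvin


def boltzmann_factor(dG, n_seq, temperature_c=37.0, beta_scale=1.0):
    """exp(-dG / (k * T * n * betaScale)) with T in Kelvin."""
    kT = GAS_CONSTANT * (temperature_c + KELVIN_OFFSET)
    return math.exp(-dG / (kT * n_seq * beta_scale))


default_weight = boltzmann_factor(-3.0, n_seq=3)
flattened_weight = boltzmann_factor(-3.0, n_seq=3, beta_scale=2.0)
```

              A betaScale above 1 behaves like a higher ensemble temperature:
              the weight of a stabilizing contribution shrinks towards 1, so
              the ensemble becomes more diverse.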

       -S, --pfScale=DOUBLE
	      In the calculation of the	pf use scale*mfe as  an	 estimate  for
	      the ensemble free	energy (used to	avoid overflows).

	      (default=`1.07')

	      The  default is 1.07, useful values are 1.0 to 1.2. Occasionally
	      needed for long sequences.

       --MEA[=gamma]
	      Compute MEA (maximum expected accuracy) structure.

	      (default=`1.')

	      The expected accuracy is computed	from the  pair	probabilities:
	      each  base  pair '(i,j)' receives	a score	'2*gamma*p_ij' and the
	      score of an unpaired base	is given by  the  probability  of  not
	      forming a	pair. The parameter gamma tunes	the importance of cor-
	      rectly  predicted	 pairs	versus unpaired	bases. Thus, for small
	      values of	gamma the MEA structure	will contain only  pairs  with
	      very  high probability. Using --MEA implies -p for computing the
	      pair probabilities.
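
              The scoring rule can be made concrete with a small Python
              sketch that evaluates the expected accuracy of a given
              structure. It only scores a structure and does not perform the
              MEA optimization itself; the function names and the
              dictionary-based probability input are hypothetical.

```python
def pair_list(structure):
    """Return 0-based (i, j) pairs of a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def expected_accuracy(structure, probs, gamma=1.0):
    """probs maps 0-based (i, j) with i < j to the pair probability p_ij."""
    n = len(structure)
    paired_prob = [0.0] * n            # probability that a base is paired at all
    for (i, j), p in probs.items():
        paired_prob[i] += p
        paired_prob[j] += p
    pairs = pair_list(structure)
    paired = {k for pair in pairs for k in pair}
    # Each pair scores 2*gamma*p_ij, each unpaired base its unpaired probability.
    score = sum(2.0 * gamma * probs.get(pair, 0.0) for pair in pairs)
    score += sum(1.0 - paired_prob[k] for k in range(n) if k not in paired)
    return score


probs = {(0, 3): 0.9}
with_pair = expected_accuracy("(..)", probs)
without_pair = expected_accuracy("....", probs)
```

              With gamma=1 the paired structure scores higher in this toy
              case, whereas with gamma=0.1 the open structure wins, which
              mirrors the statement that small gamma keeps only very probable
              pairs.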

       --sci  Compute the structure conservation index (SCI) for the MFE  con-
	      sensus structure of the alignment.

	      (default=off)

       -c, --circ
	      Assume a circular	(instead of linear) RNA	molecule.

	      (default=off)

       --bppmThreshold=cutoff
	      Set the threshold/cutoff for base	pair probabilities included in
	      the postscript output.

	      (default=`1e-6')

	      By  setting  the	threshold the base pair	probabilities that are
	      included in the output can be varied. By default only those  ex-
	      ceeding  '1e-6'  in  probability will be shown as	squares	in the
	      dot plot.	Changing the threshold to any other value  allows  for
	      increase or decrease of data.

       -g, --gquad
              Incorporate G-Quadruplex formation into the structure
              prediction algorithm.

	      (default=off)

       -s, --stochBT=INT
              Stochastic backtrack. Compute a certain number of random
              structures with a probability dependent on the partition
              function. See -p option in RNAsubopt.

       --stochBT_en=INT
              Same as the -s option, but also print out the energies and
              probabilities of the backtraced structures.

       -N, --nonRedundant
	      Enable non-redundant sampling strategy.

	      (default=off)

       --random-seed=INT
	      Set the seed for the random number generator

   Structure Constraints:
	      Command line options to interact with the	structure  constraints
	      feature of this program

       --maxBPspan=INT
	      Set the maximum base pair	span.

	      (default=`-1')

       -C, --constraint[=filename]
	      Calculate	 structures  subject to	constraints.  The constraining
	      structure	will be	read from 'stdin', the	alignment  has	to  be
	      given as a file name on the command line.

	      (default=`')

	      The  program  reads first	the sequence, then a string containing
	      constraints on the structure encoded with	the symbols:

	      '.' (no constraint for this base)

              '|' (the corresponding base has to be paired)

	      'x' (the base is unpaired)

	      '<' (base	i is paired with a base	j>i)

	      '>' (base	i is paired with a base	j<i)

	      and matching brackets '('	')' (base i pairs base j)

	      With the exception of '|', constraints will disallow  all	 pairs
	      conflicting  with	 the constraint. This is usually sufficient to
	      enforce the constraint, but occasionally a  base	may  stay  un-
	      paired  in  spite	of constraints.	PF folding ignores constraints
	      of type '|'.
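
              Before passing a constraint string to the program it can be
              useful to check it. The Python sketch below accepts only the
              symbols listed above and extracts the pairs requested by round
              brackets. It assumes the string must cover every alignment
              column, which may be stricter than what the program itself
              requires; the function name is hypothetical.

```python
ALLOWED = set(".|x<>()")


def check_constraint(constraint, alignment_length):
    """Validate a constraint string and return 0-based (i, j) bracket pairs."""
    if len(constraint) != alignment_length:
        raise ValueError("constraint and alignment differ in length")
    bad = set(constraint) - ALLOWED
    if bad:
        raise ValueError(f"unsupported constraint symbols: {sorted(bad)}")
    stack, pairs = [], []
    for pos, ch in enumerate(constraint):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


pairs = check_constraint("((..)).x|", 9)
```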

       --batch
	      Use constraints for all alignment	records.  (default=off)

	      Usually, constraints provided from input file are	 only  applied
	      to  a single sequence alignment. Therefore, RNAalifold will stop
	      its computation and quit after the  first	 input	alignment  was
	      processed.  Using	this switch, RNAalifold	processes all sequence
	      alignments in the	input  and  applies  the  same	provided  con-
	      straints to each of them.

       --enforceConstraint
	      Enforce  base pairs given	by round brackets '(' ')' in structure
	      constraint.

	      (default=off)

       --SS_cons
              Use consensus structures from Stockholm file ('#=GC SS_cons') as
              constraint.

	      (default=off)

              Stockholm formatted alignment files have the possibility to
              store a secondary structure string in one of its ('#=GC')
              column annotation meta tags. The corresponding tag name is
              usually 'SS_cons', a consensus secondary structure. Activating
              this flag allows one to use this consensus secondary structure
              from the input file as structure constraint. Currently, only
              the following characters are interpreted:

              '(' ')' [matching parentheses: column i pairs with column j]

              '<' '>' [matching angular brackets: column i pairs with column
              j]

              All other characters are not interpreted (yet). Note:
              Activating this flag implies --constraint.
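
              To see which column pairs an SS_cons line would contribute, the
              annotation can be parsed as in the following Python sketch. It
              treats round and angular brackets as two independent bracket
              types and ignores every other character; whether the program
              matches the two types independently is an assumption, and the
              function name is hypothetical.

```python
def ss_cons_pairs(ss_cons):
    """Return sorted 1-based (i, j) column pairs from an SS_cons string."""
    openers = {")": "(", ">": "<"}
    stacks = {"(": [], "<": []}
    pairs = []
    for col, ch in enumerate(ss_cons, start=1):
        if ch in stacks:
            stacks[ch].append(col)
        elif ch in openers:
            # Pair this closing bracket with the latest opener of its type.
            pairs.append((stacks[openers[ch]].pop(), col))
        # any other character (',', '_', ':', '-', ...) is not interpreted
    return sorted(pairs)


toy_pairs = ss_cons_pairs("((,<<__>>,))")
```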

       --shape=file1,file2
	      Use SHAPE	reactivity data	to guide structure predictions.

              Multiple SHAPE files for the individual sequences in the
              alignment may be specified as a comma separated list. An
              optional association of particular SHAPE files to a specific
              sequence in the alignment can be expressed by prepending the
              sequence number to the filename, e.g.
              "5=seq5.shape,3=seq3.shape" will assign the reactivity values
              from file seq5.shape to the fifth sequence in the alignment,
              and the values from file seq3.shape to sequence 3. If no
              assignment is specified, the reactivity values are assigned to
              corresponding sequences in the order they are given.
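
              The argument syntax can be illustrated with a short Python
              sketch that turns the comma separated list into a mapping from
              sequence number to file name. The function name is
              hypothetical, and how the program treats lists that mix
              numbered and unnumbered entries is not covered by the text
              above, so the fallback to list position is an assumption.

```python
def parse_shape_files(spec):
    """Map 1-based sequence numbers to SHAPE file names."""
    assignment = {}
    for position, item in enumerate(spec.split(","), start=1):
        number, sep, filename = item.partition("=")
        if sep and number.isdigit():
            assignment[int(number)] = filename   # explicit "N=file" entry
        else:
            assignment[position] = item          # fall back to list order
    return assignment


explicit = parse_shape_files("5=seq5.shape,3=seq3.shape")
by_order = parse_shape_files("a.shape,b.shape")
```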

       --shapeMethod=D[mX][bY]
	      Specify the method how  to  convert  SHAPE  reactivity  data  to
	      pseudo energy contributions.

	      (default=`D')

              Currently, the only data conversion method available is that of
              Deigan et al. (2009). This method is the default and is
              recognized by a capital 'D' in the provided parameter, i.e.:
              --shapeMethod="D" is the default setting. The slope 'm' and the
              intercept 'b' can be set to a non-default value if necessary.
              Otherwise m=1.8 and b=-0.6 as stated in the paper mentioned
              before. To alter these parameters, e.g. m=1.9 and b=-0.7, use a
              parameter string like this: --shapeMethod="Dm1.9b-0.7". You may
              also provide only one of the two parameters like:
              --shapeMethod="Dm1.9" or --shapeMethod="Db-0.7".
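
              The parameter string and the resulting pseudo energy can be
              sketched in Python as follows. The parser covers only the
              forms shown above, and the energy function uses the Deigan et
              al. relation dG = m * ln(reactivity + 1) + b; both function
              names are hypothetical and the sketch is not taken from the
              program's source.

```python
import math
import re

NUMBER = r"-?\d+(?:\.\d+)?"
METHOD = re.compile(rf"D(?:m({NUMBER}))?(?:b({NUMBER}))?")


def parse_shape_method(spec="D"):
    """Return (m, b) for strings such as 'D', 'Dm1.9', 'Db-0.7', 'Dm1.9b-0.7'."""
    match = METHOD.fullmatch(spec)
    if match is None:
        raise ValueError(f"unsupported method string: {spec!r}")
    m = float(match.group(1)) if match.group(1) else 1.8   # default slope
    b = float(match.group(2)) if match.group(2) else -0.6  # default intercept
    return m, b


def deigan_pseudo_energy(reactivity, m=1.8, b=-0.6):
    """Pseudo energy in kcal/mol for one SHAPE reactivity value."""
    return m * math.log(reactivity + 1.0) + b


m, b = parse_shape_method("Dm1.9b-0.7")
energy = deigan_pseudo_energy(0.5, m, b)
```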

   Energy Parameters:
	      Energy  parameter	 sets  can be adapted or loaded	from user-pro-
	      vided input files

       -T, --temp=DOUBLE
	      Rescale energy parameters	to a temperature of temp C. Default is
	      37C.

	      (default=`37.0')

       -P, --paramFile=paramfile
	      Read energy parameters from paramfile, instead of	using the  de-
	      fault parameter set.

	      Different	 sets  of energy parameters for	RNA and	DNA should ac-
	      company your distribution.  See the RNAlib documentation for de-
	      tails on the file	format.	The placeholder	file name 'DNA'	can be
	      used to load DNA parameters without the need to actually specify
	      any input	file.

       -4, --noTetra
	      Do not include special tabulated stabilizing energies for	 tri-,
	      tetra- and hexaloop hairpins.

	      (default=off)

	      Mostly for testing.

       --salt=DOUBLE
	      Set salt concentration in	molar (M). Default is 1.021M.

   Model Details:
	      Tweak  the energy	model and pairing rules	additionally using the
	      following	parameters

       -d, --dangles=INT
	      How to treat "dangling end" energies for bases adjacent  to  he-
	      lices in free ends and multi-loops.

	      (default=`2')

              With -d2 dangling energies will be added for the bases adjacent
              to a helix on both sides in any case.

	      The option -d0 ignores dangling ends altogether (mostly for  de-
	      bugging).

       --noLP Produce structures without lonely	pairs (helices of length 1).

	      (default=off)

	      For  partition  function	folding	this only disallows pairs that
	      can only occur isolated. Other pairs may still occasionally  oc-
	      cur as helices of	length 1.

       --noGU Do not allow GU pairs.

	      (default=off)

       --noClosingGU
	      Do not allow GU pairs at the end of helices.

	      (default=off)

       --cfactor=DOUBLE
	      Set the weight of	the covariance term in the energy function

	      (default=`1.0')

       --nfactor=DOUBLE
	      Set  the	penalty	for non-compatible sequences in	the covariance
	      term of the energy function

	      (default=`1.0')

       -E, --endgaps
	      Score pairs with endgaps same as gap-gap pairs.

	      (default=off)

       -R, --ribosum_file=ribosumfile
              Use the specified Ribosum matrix instead of the normal energy
              model.

              Matrices to use should be 6x6 matrices, the order of the terms
              is 'AU', 'CG', 'GC', 'GU', 'UA', 'UG'.

       -r, --ribosum_scoring
	      use ribosum scoring matrix.  (default=off)

	      The  matrix is chosen according to the minimal and maximal pair-
	      wise identities of the sequences in the file.

       --old  use old energy evaluation, treating gaps as characters.

	      (default=off)

       --nsp=STRING
              Allow other pairs in addition to the usual AU, GC, and GU pairs.

	      Its argument is a	comma separated	list of	 additionally  allowed
	      pairs.  If  the first character is a "-" then AB will imply that
	      AB and BA	are allowed pairs, e.g.	--nsp="-GA"  will allow	GA and
	      AG pairs.	Nonstandard pairs are given 0 stacking energy.

       --energyModel=INT
	      Set energy model.

	      Rarely used option to fold sequences from	the artificial ABCD...
	      alphabet,	where A	pairs B, C-D etc.  Use the  energy  parameters
	      for GC (--energyModel 1) or AU (--energyModel 2) pairs.

       --helical-rise=FLOAT
	      Set the helical rise of the helix	in units of Angstrom.

	      (default=`2.8')

	      Use with caution!	This value will	be re-set automatically	to 3.4
	      in  case	DNA  parameters	 are  loaded via -P DNA	and no further
	      value is provided.

       --backbone-length=FLOAT
	      Set the average backbone length for looped regions in  units  of
	      Angstrom.

	      (default=`6.0')

	      Use  with	 caution!  This	 value will be re-set automatically to
	      6.76 in case DNA parameters are loaded via -P DNA	and no further
	      value is provided.

   Plotting:
	      Command line options for changing	the default behavior of	struc-
	      ture layout and pairing probability plots

       --color
	      Produce a	 colored  version  of  the  consensus  structure  plot
	      "alirna.ps" (default b&w only)

	      (default=off)

       --color-threshold=FLOAT
	      Set  the threshold of maximum counter examples for coloring con-
	      sensus structure plot.

	      (default=`2')

              Floating point numbers between 0 and 1 are treated as
              frequencies among all sequences in the alignment. All other
              values are truncated to an integer and used as the absolute
              number of counter examples.

       --color-min-sat=FLOAT
	      Set  the	minimum	 saturation  for  coloring consensus structure
	      plot.

	      (default=`0.2')

	      Floating point number >= 0 and smaller than 1.

       --aln  Produce a	colored	and structure  annotated  alignment  in	 Post-
	      Script format in the file	"aln.ps" in the	current	directory.

	      (default=off)

       --aln-EPS-cols=INT
	      Number of	columns	in colored EPS alignment output.

	      (default=`60')

	      A	 value	less  than  1  indicates that the output should	not be
	      wrapped at all.

       --aln-stk[=prefix]
	      Create  a	  multi-Stockholm   formatted	output	 file.	  (de-
	      fault=`RNAalifold_results')

              The default file name used for the output is
              "RNAalifold_results.stk". Users may change the filename to
              "prefix.stk" by specifying the prefix as optional argument. The
              file will be created in the current directory if it does not
              already exist. In case the file already exists, output will be
              appended to it. Note: Any special characters in the filename
              will be replaced by the filename delimiter, hence there is no
              way to pass an entire directory path through this option yet.
              (See also the "--filename-delim" parameter)

       --noPS Do not produce postscript	drawing	of the mfe structure.

	      (default=off)

       --noDP Do not produce dot-plot postscript file containing base pair or
              stack probabilities.

	      (default=off)

              In combination with the -p option, this flag turns off creation
              of individual dot-plot files. Consequently, computed base pair
              probability output is omitted but centroid and MEA structure
              prediction is still performed.

       -t, --layout-type=INT
	      Choose the layout	algorithm.  (default=`1')

	      Select the layout	algorithm that computes	the nucleotide coordi-
	      nates.  Currently, the following algorithms are available:

	      '0': simple radial layout

	      '1': Naview layout (Bruccoleri et	al. 1988)

	      '2': circular layout

	      '3': RNAturtle (Wiegreffe	et al. 2018)

	      '4': RNApuzzler (Wiegreffe et al.	2018)

       Caveats:

       Sequences  are  not  weighted. If possible, do not mix very similar and
       dissimilar sequences. Duplicate sequences, for example, can distort the
       prediction.

REFERENCES
       If you use this program in your work you	might want to cite:

       R. Lorenz, S.H. Bernhart, C.  Hoener  zu	 Siederdissen,	H.  Tafer,  C.
       Flamm,  P.F. Stadler and	I.L. Hofacker (2011), "ViennaRNA Package 2.0",
       Algorithms for Molecular	Biology: 6:26

       I.L. Hofacker, W. Fontana, P.F. Stadler,	S. Bonhoeffer, M.  Tacker,  P.
       Schuster	 (1994),  "Fast	Folding	and Comparison of RNA Secondary	Struc-
       tures", Monatshefte f. Chemie: 125, pp 167-188

       R. Lorenz, I.L. Hofacker, P.F. Stadler (2016), "RNA folding  with  hard
       and soft	constraints", Algorithms for Molecular Biology 11:1 pp 1-13

       The  algorithm is a variant of the dynamic programming algorithms of M.
       Zuker and P. Stiegler (mfe) and J.S. McCaskill (pf) adapted for sets of
       aligned sequences with covariance information.

       Ivo L. Hofacker,	Martin Fekete, and Peter F. Stadler (2002), "Secondary
       Structure Prediction for	Aligned	RNA Sequences",	J.Mol.Biol.:  319,  pp
       1059-1066.

       Stephan	H.  Bernhart, Ivo L. Hofacker, Sebastian Will, Andreas R. Gru-
       ber, and	Peter  F.  Stadler  (2008),  "RNAalifold:  Improved  consensus
       structure prediction for	RNA alignments", BMC Bioinformatics: 9,	pp 474

       The energy parameters are taken from:

       D.H. Mathews, M.D. Disney, J.L. Childs, S.J. Schroeder, M. Zuker,
       D.H. Turner (2004), "Incorporating chemical modification constraints
       into a dynamic programming algorithm for prediction of RNA secondary
       structure", Proc. Natl. Acad. Sci. USA: 101, pp 7287-7292

       D.H. Turner, D.H. Mathews (2009), "NNDB: The nearest neighbor parameter
       database	for predicting stability of nucleic acid secondary structure",
       Nucleic Acids Research: 38, pp 280-282

EXAMPLES
       A simple	call to	compute	consensus MFE structure, ensemble free energy,
       base pair probabilities,	centroid structure, and	MEA  structure	for  a
       multiple	 sequence alignment (MSA) provided as Stockholm	formatted file
       alignment.stk might look	like:

	 $ RNAalifold -p --MEA alignment.stk

       Consider	the following MSA file for three sequences

	 # STOCKHOLM 1.0

	 #=GF AC   RF01293
	 #=GF ID   ACA59
	 #=GF DE   Small nucleolar RNA ACA59
	 #=GF AU   Wilkinson A
	 #=GF SE   Predicted; WAR; Wilkinson A
	 #=GF SS   Predicted; WAR; Wilkinson A
	 #=GF GA   43.00
	 #=GF TC   44.90
	 #=GF NC   40.30
	 #=GF TP   Gene; snRNA;	snoRNA;	HACA-box;
	 #=GF BM   cmbuild -F CM SEED
	 #=GF CB   cmcalibrate --mpi CM
	 #=GF SM   cmsearch --cpu 4 --verbose --nohmmonly -E 1000 -Z 549862.597050 CM SEQDB
	 #=GF DR   snoRNABase; ACA59;
	 #=GF DR   SO; 0001263;	ncRNA_gene;
	 #=GF DR   GO; 0006396;	RNA processing;
	 #=GF DR   GO; 0005730;	nucleolus;
	 #=GF RN   [1]
	 #=GF RM   15199136
	 #=GF RT   Human box H/ACA pseudouridylation guide RNA machinery.
	 #=GF RA   Kiss	AM, Jady BE, Bertrand E, Kiss T
	 #=GF RL   Mol Cell Biol. 2004;24:5797-5807.
	 #=GF WK   Small_nucleolar_RNA
	 #=GF SQ   3

	 AL031296.1/85969-86120	    CUGCCUCACAACGUUUGUGCCUCAGUUACCCGUAGAUGUAGUGAGGGUAACAAUACUUACUCUCGUUGGUGAUAAGGAACAGCU
	 AANU01225121.1/438-603	    CUGCCUCACAACAUUUGUGCCUCAGUUACUCAUAGAUGUAGUGAGGGUGACAAUACUUACUCUCGUUGGUGAUAAGGAACAGCU
	 AAWR02037329.1/29294-29150 ---CUCGACACCACU---GCCUCGGUUACCCAUCGGUGCAGUGCGGGUAGUAGUACCAAUGCUAAUUAGUUGUGAGGACCAACU
	 #=GC SS_cons		    -----((((,<<<<<<<<<___________>>>>>>>>>,,,,<<<<<<<______>>>>>>>,,,,,))))::::::::::::
	 #=GC RF		    CUGCcccaCAaCacuuguGCCUCaGUUACcCauagguGuAGUGaGgGuggcAaUACccaCcCucgUUgGuggUaAGGAaCAgCU
	 //

       Then, the above program call will produce this output:

	 3 sequences; length of	alignment 84.
	 >ACA59
	 CUGCCUCACAACAUUUGUGCCUCAGUUACCCAUAGAUGUAGUGAGGGUAACAAUACUUACUCUCGUUGGUGAUAAGGAACAGCU
	 ...((((((.(((((((((...........))))))))).))))))..........(((((......)))))............ (-12.54 =	-12.77 +   0.23)
	 ...((((((.(((((((((...........))))))))).)))))){{,.......{{{{,......}))))............ [-14.38]
	 ...((((((.(((((((((...........))))))))).))))))..........((((........))))............ {-12.44 =	-12.33 +  -0.10	d=10.94}
	 ...((((((.(((((((((...........))))))))).))))))..........((((........))))............ {-12.44 =	-12.33 +  -0.10	MEA=66.65}
	  frequency of mfe structure in	ensemble 0.368739; ensemble diversity 17.77

       Here, the first line is written to stderr and simply states the	number
       of  sequences  and  the	length of the alignment. This line can be sup-
       pressed using the --quiet option.  The main output then consists	 of  7
       lines,  where  the  first  two resemble the FASTA header	with the ID as
       read from the input data	set, followed by the consensus sequence	in the
       second line. The	third line consists of the consensus secondary	struc-
       ture  in	dot-bracket notation followed by the averaged minimum free en-
       ergy in parenthesis. This energy	is composed  of	 two  major  contribu-
       tions,  the  actual  free  energies  derived  from the Nearest Neighbor
       model, and the covariance pseudo-energy term, which are both  displayed
       after the equal sign. The fourth	line shows the base pair propensity in
       pseudo  dot-bracket  notation followed by the ensemble free energy dG =
       -kT ln(Z) in square brackets.  Similarly, the next two lines state  the
       centroid and the MEA structure in dot-bracket notation, followed by
       their corresponding free	energy contributions, the mean distance	(d) to
       the ensemble as well as the maximum expected accuracy (MEA). Again, the
       free energies are split into Nearest Neighbor contribution and the  co-
       variance	pseudo-energy term.
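
       For downstream scripts, the MFE result line described above can be
       split into its parts. The following Python sketch assumes the line
       layout shown in the example output (structure, then total = energy +
       covariance term in parentheses); the pattern and the function name are
       hypothetical and may need adjusting for other versions or options.

```python
import re

MFE_LINE = re.compile(
    r"^(?P<structure>\S+)\s+\(\s*(?P<total>-?\d+\.\d+)\s*=\s*"
    r"(?P<energy>-?\d+\.\d+)\s*\+\s*(?P<covariance>-?\d+\.\d+)\s*\)$"
)


def parse_mfe_line(line):
    """Return (structure, total, nearest-neighbor energy, covariance term)."""
    match = MFE_LINE.match(line.strip())
    if match is None:
        raise ValueError("not an MFE result line")
    return (
        match["structure"],
        float(match["total"]),
        float(match["energy"]),
        float(match["covariance"]),
    )


# Shortened structure; the energy part is copied from the example output.
structure, total, energy, covariance = parse_mfe_line(
    "((((....)))) (-12.54 =\t-12.77 +   0.23)"
)
```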

       Furthermore,  RNAalifold	 will produce three output files: ACA59_ss.ps,
       ACA59_dp.ps, and	ACA59_ali.out that  contain  the  secondary  structure
       drawing,	 the  base  pair probability dot-plot, and a detailed table of
       base pair probabilities,	respectively.

THE ALIOUT FILE
       When computing base pair	probabilities (--partfunc option),  RNAalifold
       will  produce  a	file with the suffix `ali.out`.	This file contains the
       base pairing probabilities between different alignment columns together
       with some detailed statistics for the individual	sequences  within  the
       alignment.  The	file is	a simple text file with	a two line header that
       states the number of sequences and length of the	alignment.  The	 first
       couple of lines of this file may	look like:

	 3 sequence; length of alignment 84
	 alifold output
	    14	  36  0	 92.7%	 0.212 CG:1    UA:2
	    13	  37  0	 92.7%	 0.218 GU:1    AU:2
	    12	  38  0	 92.7%	 0.231 CG:3
	    15	  35  0	 91.9%	 0.239 UG:3
	    16	  34  0	 85.2%	 0.434 UA:2    --:1
	     8	  42  0	 80.7%	 0.526 AU:3   +
	     9	  41  0	 80.4%	 0.542 CG:3   +
	     7	  43  1	 80.1%	 0.541 CG:2   +

       Starting	 with  the  third  row,	 there are at least six	and at most 13
       columns separated by whitespaces	stating: (1) the  i-position  and  (2)
       the  j-position	of  a  potential base pair (i, j), followed by (3) the
       number of counter examples, i.e. the number of sequences in the
       alignment that can't form a canonical base pair with their respective
       sequence positions. Next is (4) the base pair probability in percent,
       (5)  a  pseudo  entropy measure S_ij = S_i + S_j	- p_ij ln(p_ij), where
       S_i and S_j are the positional entropies	for the	two alignment  columns
       i  and  j,  and	p_ij  is  the base pair	probability. Finally, the last
       columns (6-12) state the	number of particular base pairs	for the	 indi-
       vidual  sequences in the	alignment. Here, we distinguish	the base pairs
       "GC","CG","AU","UA","GU","UG", and the special case  "--"  that	repre-
       sents gaps at both positions i and j.  Finally, base pairs that are not
       part  of	 the MFE structure are marked by an additional "+" sign	in the
       last column.
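
       A row of this table can be parsed along the lines of the following
       Python sketch, which follows the column description above. The
       function name and the dictionary keys are hypothetical; the two sample
       rows are taken from the listing shown earlier in this section.

```python
def parse_aliout_row(row):
    """Parse one data row (third line onward) of an ali.out file."""
    fields = row.split()
    in_mfe = fields[-1] != "+"      # a trailing "+" marks pairs not in the MFE
    if not in_mfe:
        fields = fields[:-1]
    counts = {}
    for item in fields[5:]:         # entries such as "CG:1" or "--:1"
        pair, count = item.split(":")
        counts[pair] = int(count)
    return {
        "i": int(fields[0]),
        "j": int(fields[1]),
        "counter_examples": int(fields[2]),
        "probability": float(fields[3].rstrip("%")) / 100.0,
        "pseudo_entropy": float(fields[4]),
        "pair_counts": counts,
        "in_mfe": in_mfe,
    }


first = parse_aliout_row("    14\t  36  0\t 92.7%\t 0.212 CG:1    UA:2")
marked = parse_aliout_row("     7\t  43  1\t 80.1%\t 0.541 CG:2   +")
```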

AUTHOR
       Ivo L Hofacker, Stephan Bernhart, Ronny Lorenz

REPORTING BUGS
       If in doubt our program is right, nature	is at fault.  Comments	should
       be sent to rna@tbi.univie.ac.at.

SEE ALSO
       The ALIDOT package http://www.tbi.univie.ac.at/RNA/Alidot/

RNAalifold 2.7.0		 October 2024			 RNAALIFOLD(1)
