ANALYSESEQS(1)		    General Commands Manual		ANALYSESEQS(1)

NAME
       AnalyseSeqs - Analyse a set of sequences of common length

SYNOPSIS
       AnalyseSeqs [-X[bswn]] [-Q] [-M{mask}[+|!]] [-D{H|A|G}] [-d{S|H|D|B}]

DESCRIPTION
       AnalyseSeqs reads a set of sequences from stdin and applies a variety
       of sequence analysis methods to them. Currently available are:
       statistical geometry for quadruples of sequences (THIS IS PRELIMINARY
       AND NOT YET WELL TESTED); split decomposition; and neighbour joining
       and Ward's variance method for reconstructing phylogenies using
       various distance measures. PostScript output is available for
       statistical geometry and for the cluster methods.
       The program reads input until it encounters one of the separator
       characters '@' or '%'. Only sequences consisting of alphabetic
       characters, or of the alphabet specified with -M, are processed; all
       other lines are ignored. The program stops reading when it either
       reaches EOF or finds no valid sequence data between two lines
       beginning with separator characters.
       A list of taxa names can be specified in the input stream. The list
       starts with a line beginning with '*'; optionally, a file name prefix
       [fn] for the PostScript output can be given on this line. The entries
       have the form 'x : Taxon', where x is the number of the taxon, i.e.,
       the position of the corresponding sequence in the input. The taxa list
       need not be complete, but it must end with a line beginning with '*'
       or with one of the separator characters. The taxa list is printed at
       the top of the output, and the taxa names are used as labels in the
       PostScript output.

OPTIONS
       -X[bswn]
              specifies the analysis methods to be used.

       [b]    Statistical geometry. A PostScript file named '[fn_]box.ps'
              giving a graphical representation of the statistical geometry
              is created. The resulting box provides a measure of the
              'tree-likeness' of the data set. This is the default.

       [s]    Split decomposition.

       [w]    Cluster analysis using Ward's method. A PostScript file named
              '[fn_]wards.ps' is created containing a drawing of the tree.

       [n]    Cluster analysis using Saitou's neighbour joining method. A
              PostScript file named '[fn_]nj.ps' is created containing a
              drawing of the tree.
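       For orientation, the neighbour joining step can be sketched as
       follows. This is a minimal Python sketch of the standard formulation
       of Saitou and Nei's method (repeatedly joining the pair that
       minimises the Q-criterion), not the program's own code;
       neighbor_joining is a hypothetical name, the input is assumed to be
       a symmetric distance matrix such as one computed with -D, and the
       tree is returned as a Newick string instead of PostScript.

```python
from typing import List, Sequence


def neighbor_joining(dist: Sequence[Sequence[float]], names: List[str]) -> str:
    """Return an unrooted Newick tree for a symmetric distance matrix."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # distances keyed by node label; an internal node is labelled by its
    # Newick subtree string, which is unique
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
        # join the pair minimising Q(a,b) = (n-2) d(a,b) - r(a) - r(b);
        # ties go to the first pair in list order
        a, b = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))  # branch to a
        lb = d[a][b] - la                                  # branch to b
        u = f"({a}:{la:g},{b}:{lb:g})"
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # resolve the last three nodes as a star
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"
```

       For an additive distance matrix the method recovers the generating
       tree. Taxa names containing Newick metacharacters such as ':' or ','
       would need quoting in this sketch.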

       -Q     performs a statistical geometry analysis comparing four data
              sets, for instance to confirm the significance of a proposed
              phylogeny. Since this is only meaningful for statistical
              geometry, the -X option is ignored. Each of the four data sets
              must have the form
              * [filename_prefix]
              # number
              [list of taxa names]
              *
              list of sequences
              %
              where number is 1, 2, 3 or 4, identifying the four groups to
              be compared.
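       Writing the four groups by hand is error-prone, so a Python sketch
       that emits one group in the layout above may help. q_group is a
       hypothetical helper, and numbering the taxa from 1 in the order of
       the sequences is an assumption based on the 'x : Taxon' entries
       described under DESCRIPTION.

```python
from typing import Optional, Sequence


def q_group(number: int, sequences: Sequence[str],
            taxa: Optional[Sequence[str]] = None, prefix: str = "") -> str:
    """Format one of the four data sets read with -Q."""
    if number not in (1, 2, 3, 4):
        raise ValueError("group number must be 1, 2, 3 or 4")
    lines = [f"* {prefix}".rstrip(), f"# {number}"]
    # assumption: taxa are numbered from 1 in the order of the sequences
    lines += [f"{i} : {name}" for i, name in enumerate(taxa or [], start=1)]
    lines.append("*")
    lines += list(sequences)
    lines.append("%")
    return "\n".join(lines) + "\n"
```

       Writing the output of q_group(1, ...) through q_group(4, ...) one
       after another gives the complete input for a -Q run.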

       -M{mask}[+|!]
              specifies a mask for the input. '{mask}' is either one of the
              letters below, selecting a predefined alphabet, or the %-sign
              followed by all characters to be accepted. A + sign at the very
              end of the mask makes the input handling case sensitive; by
              default, the input is converted to upper case. A ! sign at the
              end of the mask converts the input data to RY code:
              GgAaXx -> R, UuCcKkTt -> Y; all other letters are converted
              to *.

       -Ma    all letters A-Z and a-z.

       -Mu    uppercase letters.

       -Ml    lowercase letters.

       -Mc    digits [0-9].

       -Mn    all alphanumeric characters.

       -MR    RNA alphabet (GCAUgcau).

       -MD    DNA alphabet (GCATgcat).

       -MA    amino acids in one-letter code.

       -MS    secondary structures coded as '^.()'.

       -M%alphabet
              use the specified alphabet.
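       The mask rules can be summarised in a short Python sketch.
       parse_mask and apply_mask are hypothetical names; the amino-acid set
       (the 20 standard one-letter codes) and the order of operations
       (checking the raw line against the alphabet before any case or RY
       conversion) are assumptions, not documented behaviour.

```python
import string
from typing import Optional, Set, Tuple

# Predefined alphabets selected by -M<letter>. The amino-acid set is an
# assumption; the program may accept additional letters.
MASKS = {
    "a": set(string.ascii_letters),
    "u": set(string.ascii_uppercase),
    "l": set(string.ascii_lowercase),
    "c": set(string.digits),
    "n": set(string.ascii_letters + string.digits),
    "R": set("GCAUgcau"),
    "D": set("GCATgcat"),
    "A": set("ACDEFGHIKLMNPQRSTVWY"),
    "S": set("^.()"),
}
PURINES = set("GgAaXx")        # mapped to R by the ! modifier
PYRIMIDINES = set("UuCcKkTt")  # mapped to Y by the ! modifier


def parse_mask(spec: str) -> Tuple[Set[str], bool, bool]:
    """Split the text after -M (e.g. 'R!', 'a+', '%ACGT') into
    (alphabet, case_sensitive, ry_code)."""
    ry = spec.endswith("!")
    case_sensitive = spec.endswith("+")
    body = spec[:-1] if (ry or case_sensitive) else spec
    alphabet = set(body[1:]) if body.startswith("%") else MASKS[body]
    return alphabet, case_sensitive, ry


def apply_mask(line: str, alphabet: Set[str],
               case_sensitive: bool = False, ry: bool = False) -> Optional[str]:
    """Return the processed sequence, or None if the line is ignored."""
    seq = line.strip()
    if not seq or any(ch not in alphabet for ch in seq):
        return None
    if ry:
        # other letters become '*'; non-letters are kept (assumption)
        return "".join("R" if ch in PURINES else "Y" if ch in PYRIMIDINES
                       else "*" if ch.isalpha() else ch for ch in seq)
    return seq if case_sensitive else seq.upper()
```

       For example, parse_mask("R!") accepts only RNA letters and turns
       "GCAUgcau" into "RYRYRYRY".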

       -D     specifies the algorithm used to calculate the distance matrix
              of the input data set. Available algorithms are:

       -DH    Hamming distance.

       -DA[,cost]
              Simple alignment distance according to Needleman and Wunsch.
              A gap cost other than 1.0 can be given after the comma.

       -DG[,cost1,cost2]
              Gotoh's distance with gap cost function
              g(k) = cost2 + cost1*(k-1) for a gap of length k; cost2 <= cost1
              must hold. The defaults cost1 = 1.0 and cost2 = 1.0 yield the
              same distance as option -DA.
              ONLY THE HAMMING DISTANCE IS WELL TESTED SO FAR!!!
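       A minimal Python sketch of these distances, for orientation only:
       hamming, gotoh and unit_cost are hypothetical names, substitutions
       are charged with unit cost (as with the default -dS), and the
       alignment distance is a textbook three-state dynamic programme, not
       the program's implementation.

```python
import math
from typing import Callable


def unit_cost(x: str, y: str) -> float:
    """Substitution cost: 0 for identical characters, 1 otherwise."""
    return 0.0 if x == y else 1.0


def hamming(a: str, b: str) -> int:
    """-DH: number of positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have common length")
    return sum(x != y for x, y in zip(a, b))


def gotoh(a: str, b: str, cost1: float = 1.0, cost2: float = 1.0,
          sub: Callable[[str, str], float] = unit_cost) -> float:
    """Alignment distance with gap cost g(k) = cost2 + cost1*(k-1).

    M ends in a (mis)match column, P in a gap opposite a[i-1], Q in a gap
    opposite b[j-1]. A gap run is only opened from another state, so each
    maximal run of length k is charged exactly g(k).
    """
    inf = math.inf
    n, m = len(a), len(b)
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    P = [[inf] * (m + 1) for _ in range(n + 1)]
    Q = [[inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                M[i][j] = min(M[i-1][j-1], P[i-1][j-1], Q[i-1][j-1]) + sub(a[i-1], b[j-1])
            if i > 0:
                P[i][j] = min(M[i-1][j] + cost2, Q[i-1][j] + cost2, P[i-1][j] + cost1)
            if j > 0:
                Q[i][j] = min(M[i][j-1] + cost2, P[i][j-1] + cost2, Q[i][j-1] + cost1)
    return min(M[n][m], P[n][m], Q[n][m])
```

       With cost1 == cost2 == c the gap cost is linear (c per gap
       position), so gotoh(a, b, c, c) corresponds to the simple -DA
       distance with gap cost c. The strict three-state form is used
       because cost2 < cost1 is allowed: a recursion that may reopen a gap
       immediately after another would charge a long gap as several
       cheaper one-position gaps.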

       -d     specifies the edit cost matrix to be used. Available matrices
              are:

       -dS    simple distance. Indels and substitutions of different
              characters all have cost 1. The indel cost can be changed via
              the gap costs of the algorithm options -DA and -DG. This is the
              default.

       -dH    a distance matrix for RNA secondary structures, inspired by
              Hogeweg's similarity measure (J Mol Biol, 1988). The gap
              function is set automatically.

       -dD    Dayhoff's matrix for amino acid distances.

       -dB    distinguish purines and pyrimidines only. CAUTION: this option
              affects only the calculation of distances. It does NOT affect
              the statistical geometry, which is computed directly from the
              sequences. To do statistical geometry on RY sequences, use the
              ! sign with the -M option, for instance -MR!.
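       To illustrate what -dS and -dB change, here is a Python sketch of
       the two simplest cost functions. All names are hypothetical; treating
       A, G as purines and C, T, U as pyrimidines is the usual convention
       and an assumption here (the ! modifier of -M uses a slightly
       different letter set). Summing costs column by column, as for
       Hamming distance, is a simplification; in the program the cost
       matrix presumably enters the alignment distances -DA and -DG.

```python
from typing import Callable, List, Sequence

Cost = Callable[[str, str], float]
PURINES = {"A", "G"}          # assumption: standard convention
PYRIMIDINES = {"C", "T", "U"}


def simple_cost(x: str, y: str) -> float:
    """-dS: substitution of different characters costs 1."""
    return 0.0 if x == y else 1.0


def ry_class(c: str) -> str:
    """Purine (R) / pyrimidine (Y) class; other characters are their own class."""
    c = c.upper()
    if c in PURINES:
        return "R"
    if c in PYRIMIDINES:
        return "Y"
    return c


def ry_cost(x: str, y: str) -> float:
    """-dB (sketch): only purine/pyrimidine differences cost 1."""
    return 0.0 if ry_class(x) == ry_class(y) else 1.0


def distance_matrix(seqs: Sequence[str], cost: Cost = simple_cost) -> List[List[float]]:
    """Column-by-column distances between sequences of common length."""
    return [[sum(cost(x, y) for x, y in zip(s, t)) for t in seqs] for s in seqs]
```

       Under ry_cost, "GCAU" and "ACGU" are at distance 0, since both
       differences are purine-for-purine substitutions; under simple_cost
       they are at distance 2.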

REFERENCES
       The method of statistical geometry was introduced by M. Eigen, R.
       Winkler-Oswatitsch and A.W.M. Dress (Proc Natl Acad Sci, 85:1988,5912).
       The method of split decomposition was proposed by H.J. Bandelt and
       A.W.M. Dress (Adv Math, 92:1992,47). The variance method for cluster
       analysis is due to J.H. Ward (J Amer Stat Ass, 58:1963,236). The
       neighbour joining method was published by N. Saitou and M. Nei (Mol
       Biol Evol, 4:1987,406).

       This program is part of the Vienna RNA Package.

WARNING
       This is the beta test version. Some options or combinations of
       options may still produce nonsense. Please send bug reports to
       ivo@tbi.univie.ac.at.

VERSION
       This man page is part of the Vienna RNA Package version 1.2.

AUTHOR
       Peter F. Stadler, Ivo L. Hofacker.

BUGS
       Comments should be sent to ivo@itc.univie.ac.at.

								ANALYSESEQS(1)
