PRSS3(1)		    General Commands Manual		      PRSS3(1)

NAME
       prss34, prfx34 - test a sequence similarity score for significance

SYNOPSIS
       prss34 [-Q -A -f # -g # -H -O file -s SMATRIX -w # -Z # -k # -v #]
              sequence-file-1 sequence-file-2 [#-of-shuffles]

       prfx34 [-Q -A -f # -g # -H -O file -s SMATRIX -w # -z 1,3 -Z # -k # -v #]
              sequence-file-1 sequence-file-2 [ktup] [#-of-shuffles]

       prss34(_t)/prfx34(_t) [-AfghksvwzZ] - interactive mode

DESCRIPTION
       prss34 and prfx34 evaluate the significance of a protein:protein or
       DNA:DNA (prss34), or translated-DNA:protein (prfx34) sequence
       similarity score.  Each program first compares the two sequences and
       calculates their optimal similarity score, then repeatedly shuffles
       the second sequence and calculates the optimal similarity score of the
       first sequence against each shuffled copy.  An extreme value
       distribution is then fit to the shuffled-sequence scores.  The
       characteristic parameters of the extreme value distribution are used
       to estimate the probability that each unshuffled sequence score would
       be obtained by chance in one sequence, or in a number of sequences
       equal to the number of shuffles.  This program is derived from rdf2,
       described by Pearson and Lipman, PNAS (1988) 85:2444-2448, and Pearson
       (Meth. Enz. 183:63-98).  Use of the extreme value distribution for
       estimating the probabilities of similarity scores was described by
       Karlin and Altschul, PNAS (1990) 87:2264-2268.  The and expectations
       calculated by prdf.  prss34 calculates optimal scores using the same
       rigorous Smith-Waterman algorithm (Smith and Waterman, J. Mol. Biol.
       (1981) 147:195-197) used by the ssearch34 program.  prfx34 calculates
       scores using the FASTX algorithm (Pearson et al. (1997) Genomics
       46:24-36).
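       The fitting step can be sketched as follows.  This is a minimal
       illustration, not the code used by prss34 or prfx34.  It assumes the
       shuffled scores follow a type I (Gumbel) extreme value distribution
       with P(S < x) = exp(-exp(-lambda (x - u))), estimates lambda and u by
       the method of moments (mean = u + 0.5772/lambda, variance =
       pi^2/(6 lambda^2)), and uses the fact that the best of n independent
       scores then has distribution exp(-n exp(-lambda (x - u))).  The
       function names, and the assumption that -Z scales the expectation as
       n times the single-sequence probability, are illustrative only.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_evd_moments(scores):
    """Method-of-moments fit of a Gumbel (extreme value) distribution.

    Returns (lambda_, u) with P(S < x) = exp(-exp(-lambda_ * (x - u))).
    Hypothetical helper; needs at least two scores.
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)              # sample standard deviation
    lambda_ = math.pi / (sd * math.sqrt(6.0))  # from variance = pi^2/(6 lambda^2)
    u = mean - EULER_GAMMA / lambda_           # from mean = u + gamma/lambda
    return lambda_, u


def p_value(score, lambda_, u, n=1):
    """Probability that the best of n independent chance scores is >= score."""
    # 1 - exp(-n * exp(-lambda (x - u))), using expm1 for accuracy near 0
    return -math.expm1(-n * math.exp(-lambda_ * (score - u)))


def expectation(score, lambda_, u, n):
    """Expected number of chance scores >= score among n comparisons.

    Assumption: this is how a -Z style database size would be applied.
    """
    return n * p_value(score, lambda_, u, 1)


# Usage with made-up shuffled scores (illustrative data, not program output)
shuffled = [21, 24, 19, 27, 23, 25, 22, 30, 20, 26]
lam, u = fit_evd_moments(shuffled)
print(f"lambda={lam:.3f} u={u:.2f}")
print("P, one sequence:  ", p_value(45, lam, u))
print("P, 200 sequences: ", p_value(45, lam, u, n=200))
print("E, 50000 entries: ", expectation(45, lam, u, 50000))
```

       The -z option described below also offers maximum likelihood
       estimates of lambda and K; the moments fit shown here is only the
       simpler of the two approaches.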

       prss34 and prfx34 also allow a more sophisticated shuffling method:
       residues can be shuffled within a local window, so that the order of
       residues 1-10, 11-20, etc., is destroyed, but a residue in the first
       ten is never swapped with a residue outside the first ten, and so on
       for each local window.
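       The local-window shuffle can be sketched as follows.  The sketch
       assumes non-overlapping windows that start at the first residue, as
       in the 1-10, 11-20 example above, and that a final partial window is
       shuffled on its own; the handling of that last window is an
       assumption, and window_shuffle is a hypothetical name, not part of
       prss34.

```python
import random


def window_shuffle(seq, window, rng=None):
    """Shuffle residues only within consecutive, non-overlapping windows.

    Residues 1..window, window+1..2*window, ... are permuted among
    themselves; a final partial window is shuffled on its own (assumption).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if rng is None:
        rng = random.Random()
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)          # permute only inside this window
        out.extend(block)
    return "".join(out)


seq = "MKLLVAGSTRQDEEHHKWWY"
print(window_shuffle(seq, 10, random.Random(1)))
```

       Because each window keeps its own residues, local composition (for
       example, a hydrophobic stretch) survives the shuffle, which makes the
       shuffled scores a more conservative background than a full shuffle.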

EXAMPLES
       (1)    prss34 -v 10 musplfm.aa lcbo.aa

       Compare the amino acid sequence in the file musplfm.aa with that in
       lcbo.aa, then shuffle lcbo.aa 200 times using a local shuffle with a
       window of 10.  Report the significance of the unshuffled musplfm/lcbo
       comparison scores with respect to the shuffled scores.

       (2)    prss34 musplfm.aa lcbo.aa 1000

       Compare the amino acid sequence in the file musplfm.aa with the
       sequences in the file lcbo.aa, shuffling lcbo.aa 1000 times.  The
       number of shuffles can also be specified with the -k # option.

       (3)    prfx34 mgstm1.esq xurt8c.aa 2 1000

       Translate the DNA sequence in the file mgstm1.esq in all six frames
       and compare it to the amino acid sequence in the file xurt8c.aa, using
       ktup=2 and shuffling xurt8c.aa 1000 times.  Each comparison considers
       the best forward or reverse alignment with frameshifts, using the
       FASTX algorithm (Pearson et al. (1997) Genomics 46:24-36).

       (4)    prss34/prfx34

       Run prss34 or prfx34 in interactive mode.  The program will prompt for
       the file names of the two sequence files and the number of shuffles to
       be used.

OPTIONS
       prss34/prfx34 can be directed to change the scoring matrix, gap
       penalties, and shuffle parameters by entering options on the command
       line (preceded by a `-').  All of the options should precede the file
       names and the number of shuffles.

       -A     Show unshuffled alignment.

       -f #   Penalty for opening a gap (-10 by default for proteins).

       -g #   Penalty for additional residues in a gap (-2 by default for
              proteins).

       -H     Do not display histogram of similarity scores.

       -k #   Number of shuffles (200 is the default).

       -Q -q  Quiet mode: do not prompt for file names.

       -O filename
              Send a copy of the results to filename.

       -s str Specify the scoring matrix.  BLOSUM50 is used by default for
              proteins; +5/-4 is used by default for DNA.  prss34 recognizes
              the same scoring matrices as fasta34, ssearch34, fastx34, etc.;
              e.g., BL50, P250, BL62, BL80, MD10, MD20, and other matrices in
              BLAST1.4 matrix format.

       -v #   Use a local window shuffle with a	window size of #.

       -z #   Calculate statistical significance using the mean/variance
              (moments) approach used by fasta34/ssearch34, or from maximum
              likelihood estimates of lambda and K.

       -Z #   Present statistical significance as if a #-entry database had
              been searched (e.g., "-Z 50000" presents statistical
              significance as if 50,000 sequences had been compared).
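       The effect of the -f and -g gap penalties on a Smith-Waterman score
       can be illustrated with the sketch below.  It is a minimal Gotoh-style
       local alignment score, not the ssearch34/prss34 code.  It assumes that
       a gap of length k costs f + (k - 1) g (f for the first residue, g for
       each additional residue, with both penalties written as positive
       numbers), and it uses the +5/-4 DNA scoring named under -s in place of
       a full matrix; sw_score and dna_score are hypothetical names.

```python
def dna_score(x, y):
    """+5 for a match, -4 for a mismatch (the DNA default named under -s)."""
    return 5 if x == y else -4


def sw_score(a, b, score=dna_score, gap_open=10, gap_extend=2):
    """Best local (Smith-Waterman) alignment score with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend (assumed model).
    Uses Gotoh's three-state recurrence, keeping only the previous row.
    """
    neg = float("-inf")
    m = len(b)
    h_prev = [0] * (m + 1)    # best local score ending at (i-1, j)
    f_prev = [neg] * (m + 1)  # best score ending in a gap in b at (i-1, j)
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg] * (m + 1)
        e = neg               # best score ending in a gap in a at (i, j-1)
        for j in range(1, m + 1):
            # open a gap (first residue) or extend an existing one
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            # local alignment: never let the score fall below zero
            h_cur[j] = max(0, h_prev[j - 1] + score(a[i - 1], b[j - 1]),
                           e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


a = "AAAAACCCCC"
b = "AAAAAGCCCCC"                  # b carries one extra residue
print(sw_score(a, b))              # -> 41 (ungapped alignment with one mismatch wins)
print(sw_score(a, b, gap_open=5))  # -> 45 (a cheaper gap allows all ten matches)
```

       The two calls show the trade-off the gap penalties control: with a
       10-point opening penalty one mismatch is cheaper than a gap, while a
       5-point penalty makes the gapped alignment score higher.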

ENVIRONMENT VARIABLES
       (SMATRIX) The filename of an alternative scoring matrix file.  For
       protein sequences, BLOSUM50 is used by default; PAM250 can be used
       with the command line option -s P250 (or with -s pam250.mat).
       BLOSUM62 (-s BL62) and PAM120 (-s P120) can also be used.
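       A matrix file in BLAST1.4 format is a square table: optional comment
       lines beginning with #, a header line listing the residue codes, then
       one row per residue that starts with the residue code followed by
       integer scores.  The sketch below, which is not the fasta34 matrix
       reader, parses that layout into a lookup table; parse_blast_matrix is
       a hypothetical name, and the two-residue matrix in the usage lines is
       made up for illustration.

```python
def parse_blast_matrix(text):
    """Parse a BLAST-format scoring matrix into {(row, col): score}."""
    # drop blank lines and '#' comment lines, then split on whitespace
    lines = [ln.split() for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    if not lines:
        raise ValueError("no matrix rows found")
    header = lines[0]                 # column residue codes
    matrix = {}
    for fields in lines[1:]:
        residue, values = fields[0], fields[1:]
        if len(values) != len(header):
            raise ValueError(f"row {residue!r} has {len(values)} scores, "
                             f"expected {len(header)}")
        for col, value in zip(header, values):
            matrix[(residue, col)] = int(value)
    return matrix


toy = """\
# toy matrix (made-up values)
   A  C
A  5 -4
C -4  5
"""
scores = parse_blast_matrix(toy)
print(scores[("A", "C")])  # -> -4
```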

SEE ALSO
       ssearch3(1), fasta3(1).

AUTHOR
       Bill Pearson
       wrp@virginia.EDU

				     local			      PRSS3(1)
