FreeBSD Manual Pages

SSEARCH(1)		    General Commands Manual		    SSEARCH(1)

NAME
       ssearch - scan a	protein	or DNA sequence	library	for similar sequences

SYNOPSIS
       ssearch	[-a -b # -d # -E # -f #	-g # -h	-i -l FASTLIBS	-L -r STATFILE
       -m # -O filename	-Q -s SMATRIX -w # -z ]	 query-sequence-file  library-
       file

       ssearch [-QabdEfghilmOrswz] query-file @library-name-file

       ssearch [-QabdEfghilmOrswz] query-file "%PRMVI"

       ssearch [-aEfghilmrsw] -	interactive mode

DESCRIPTION
       ssearch compares a protein or DNA sequence to all of the entries in a
       sequence library using the rigorous Smith-Waterman algorithm (Smith and
       Waterman, J. Mol. Biol. (1981) 147:195-197).  For example, ssearch can
       compare a protein sequence to all of the sequences in the NBRF PIR
       protein sequence database.

       ssearch automatically decides whether the query sequence is DNA or
       protein by reading the query sequence as protein and determining
       whether the `amino-acid composition' is more than 85% A+C+G+T.  The
       program can be invoked either with command line arguments or in
       interactive mode.

       ssearch compares a query sequence to a sequence library, which consists
       of sequence data interspersed with comments.  The fasta programs,
       including ssearch, use a standard text format sequence file.  Lines
       beginning with '>' are treated as comments; sequences can be upper or
       lower case; blanks, tabs and unrecognizable characters are ignored.
       ssearch expects sequences to use the single letter amino acid codes,
       see protcodes(5).  Library files for ssearch should have the form
       shown under EXAMPLES.
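
       The DNA-or-protein decision described above can be sketched in Python.
       This is only an illustration of the 85% A+C+G+T rule, not the code
       ssearch uses; the function name looks_like_dna and the handling of
       non-letter characters are assumptions made for the example.

```python
def looks_like_dna(sequence, threshold=0.85):
    """Guess whether a sequence is DNA from its A+C+G+T fraction.

    Hypothetical helper: it mirrors the rule stated in the text (more than
    85% A+C+G+T means DNA) but is not taken from the ssearch source.
    """
    # Assumption: only letters count as residues; case is ignored.
    residues = [ch for ch in sequence.upper() if ch.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    acgt = sum(1 for ch in residues if ch in "ACGT")
    return acgt / len(residues) > threshold


print(looks_like_dna("ACGTTGCAAGCTNACGT"))   # mostly A, C, G and T
print(looks_like_dna("MKWVTFISLLLLFSSAYS"))  # typical protein residues
```

       A protein that happens to be rich in alanine, cysteine, glycine and
       threonine could in principle be misclassified by a rule of this kind.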

OPTIONS
       ssearch can be directed to change the scoring matrix, search
       parameters, output format, and default search directories by entering
       options on the command line (preceded by a `-').  All of the options
       should precede the file name and ktup arguments.  Alternatively, these
       options can be changed by setting environment variables.  The options
       and environment variables are:

       -a     (SHOWALL) Modifies the display of the two sequences in
              alignments.  Normally, both sequences are shown only where they
              overlap (SHOWALL=0); if -a is given or the environment variable
              SHOWALL=1 is set, both sequences are shown in their entirety.

       -b #   The  number  of similarity scores	to be shown when the -Q	option
	      is used.	This value is usually calculated based on  the	actual
	      scores.

       -d #   The  number  of alignments to be shown.  Normally, ssearch shows
	      the same number of alignments as similarity  scores.   By	 using
	      ssearch  -Q  -b 200 -d 50, one would see the top scoring 200 se-
	      quences and alignments for the 50	best scores.

       -E #   The expectation value threshold for displaying similarity scores
              and sequence alignments.  ssearch -Q -E 2.0 would show all
              library sequences with scores expected to occur no more than 2
              times by chance in a search of the library.

       -f #   Penalty for the first residue in a gap (-12 by default).

       -g #   Penalty for additional residues in a gap (-2 by default).

       -h     Do not display histogram of similarity scores.

       -l file
	      (FASTLIBS)  The  name  of	 the library menu file.	 Normally this
	      will be determined by the	environment variable  FASTLIBS.	  How-
	      ever, a library menu file	can also be specified with -l.

       -L     Display more information about the library sequence in the
              alignment.

       -m #   (MARKX) =0,1,2,3.  Alternate display of matches and mismatches
              in alignments.  MARKX=0 uses ":", "." and " " for identities,
              conservative replacements, and non-conservative replacements,
              respectively.  MARKX=1 uses " ", "x", and "X".  MARKX=2 does not
              show the second sequence, but uses the second alignment line to
              display matches with a "." for identity, or with the mismatched
              residue for mismatches.  MARKX=2 is useful for aligning large
              numbers of similar sequences.  MARKX=3 writes out a file of
              library sequences in FASTA format.  MARKX=3 should always be
              used with the "SHOWALL" (-a) option, but this does not
              completely ensure that all of the sequences output will be
              aligned.

       -O filename
	      Sends copy of results to "filename".

       -Q     Quiet option.  This allows ssearch to search a database and
              report the results without asking any questions.  ssearch -Q
              file library > output can be put in the background or run at a
              later time with the unix 'at' command.  The number of similarity
              scores and alignments displayed with the -Q option can be
              modified with the -b (scores) and -d (alignments) options.

       -r STATFILE
              Causes ssearch to write out the sequence identifier,
              superfamily number (if available), and similarity scores to
              STATFILE for every sequence in the library.  These results are
              not sorted.

       -s str (SMATRIX) The filename of an alternative scoring matrix file.
              For protein sequences, BLOSUM50 is used by default; PAM250 can
              be used with the command line option -s 250.

       -w #   (LINLEN) Output line length for sequence alignments (normally
              60; can be set up to 200).

       -z     Do not do	statistical significance calculation.
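
       How the -f and -g penalties enter a Smith-Waterman score can be
       sketched in Python.  This is a minimal illustration, not the ssearch
       implementation: it assumes a gap of k residues costs the -f value once
       plus the -g value for each of the remaining k-1 residues, and it
       replaces the BLOSUM50 matrix with a hypothetical fixed match/mismatch
       score to stay short.  The function and parameter names are invented
       for the example.

```python
def smith_waterman_score(query, target, match=5, mismatch=-4,
                         gap_first=-12, gap_next=-2):
    """Best local alignment score with affine gap penalties.

    gap_first and gap_next play the roles of the -f and -g options.
    match/mismatch are assumed stand-ins for a real scoring matrix.
    """
    neg_inf = float("-inf")
    cols = len(target) + 1
    h_prev = [0] * cols        # best score ending at (previous row, column j)
    f_prev = [neg_inf] * cols  # same, but ending in a gap in the target
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf            # best score ending in a gap in the query
        for j in range(1, cols):
            # Open a new gap (first residue) or extend an existing one.
            e = max(h_cur[j - 1] + gap_first, e + gap_next)
            f_cur[j] = max(h_prev[j] + gap_first, f_prev[j] + gap_next)
            s = match if query[i - 1] == target[j - 1] else mismatch
            # Local alignment: a score never drops below zero.
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


print(smith_waterman_score("AAAAACCCCC", "AAAAAGGGCCCCC"))
print(smith_waterman_score("AAAAACCCCC", "AAAAAGGGCCCCC", gap_first=-30))
```

       With a harsher first-residue penalty the gapped alignment no longer
       pays off, so the second call reports a shorter ungapped match.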

EXAMPLES
       (1)    ssearch musplfm.aa $AABANK

       Compare the amino acid sequence in the file musplfm.aa  with  the  com-
       plete  PIR protein sequence library.  This is extremely slow and	should
       almost never be done.  ssearch is designed to  search  very  small  li-
       braries of sequences.

	    >LCBO bovine preprolactin
	    WILLLSQ ...
	    >LCHU human	...
	    ...

       (2)    ssearch -a -w 80 musplfm.aa lcbo.aa

       Compare	the  amino  acid  sequence in the file musplfm.aa with the se-
       quences in the file lcbo.aa using ktup =	1.   Show  both	 sequences  in
       their entirety, with 80 residues	on each	output line.

       (3)    ssearch

       Run the ssearch program in interactive mode.  The program will prompt
       for the file name for the query sequence, list alternative libraries to
       be searched (if FASTLIBS is set), and prompt for the ktup.

       You can use your own sequence files for ssearch; just be certain to put
       a '>' and comment as the first line before the sequence.
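
       To check that a file you prepared follows this layout, a small reader
       can be sketched in Python.  It is an illustration only: read_fasta is
       a hypothetical helper, the entry names are made up, and dropping every
       non-letter character is an assumption about what "unrecognizable"
       means.

```python
def read_fasta(lines):
    """Yield (comment, sequence) pairs from lines of a FASTA-style file."""
    comment, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new '>' line closes the previous entry, if there was one.
            if comment is not None:
                yield comment, "".join(chunks)
            comment, chunks = line[1:].strip(), []
        elif comment is not None:
            # Assumption: keep letters only, so blanks and tabs are dropped;
            # lower case is folded to upper case.
            chunks.append("".join(ch for ch in line if ch.isalpha()).upper())
    if comment is not None:
        yield comment, "".join(chunks)


library = """>SEQ1 first hypothetical entry
MKWVT FISLL
LLFSS
>SEQ2 second hypothetical entry
acdefghik
"""
for comment, sequence in read_fasta(library.splitlines()):
    print(comment, len(sequence))
```

       Sequence lines that appear before the first '>' line are skipped by
       this sketch, which is one way to notice a missing comment line.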

SEE ALSO
       rss(1), align(1), fasta(1), rdf2(1), protcodes(5), dnacodes(5)

AUTHOR
       Bill Pearson
       wrp@virginia.EDU

				     local			    SSEARCH(1)
Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=ssearch&sektion=1&manpath=FreeBSD+Ports+15.0>