FA2HTGS(1)		   NCBI	Tools User's Manual		    FA2HTGS(1)

NAME
       fa2htgs	- formatter for	high throughput	genome sequencing project sub-
       missions

SYNOPSIS
       fa2htgs [-] [-6 str] [-7	str] [-A filename] [-C str] [-D] [-L filename]
       [-M str]	[-N] [-O filename] [-P str] [-Q	filename]  [-S str]  [-T file-
       name]  [-X] [-a str] [-b	N] [-c str] [-d	str] [-e filename] [-f]	-g str
       [-h str]	[-i filename]  [-k str]	 [-l N]	 [-m]  [-n str]	 [-o filename]
       [-p N] [-q] [-r str] -s str [-t filename] [-u] [-v] [-w]	[-x str]

DESCRIPTION
       fa2htgs is a program used to generate Seq-submits (ASN.1 sequence
       submission files) for high throughput genome sequencing projects.

       fa2htgs reads a FASTA file (or an Ace Contig file with Phrap sequence
       quality values), a Sequin submission template file (to get contact and
       citation information for the submission), and a series of command line
       arguments (see below).  The program then combines this information into
       a submission suitable for GenBank.  Once you have generated your
       submission file, you need to follow the submission protocol (see the
       README present on your FTP account or mailed out to your Center).

       fa2htgs is intended for use in scripts that automate bulk submission of
       unannotated genome sequence.  It can easily be extended from its current
       simple form to allow more complicated processing.  A submission prepared
       with fa2htgs can also be read into Psequin(1) and then annotated more
       extensively.
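
       A bulk run of this kind might be driven by a small script.  The Python
       sketch below only builds the argument lists and prints them; nothing is
       executed.  The center tag "mycenter", the clone names, file names, and
       lengths are hypothetical placeholders, and only flags documented under
       OPTIONS are used.

```python
from typing import List


def build_fa2htgs_args(center: str, seq_name: str, fasta: str, out: str,
                       phase: int = 1, length: int = 0,
                       template: str = "template.sub") -> List[str]:
    """Return an argv list for one fa2htgs run (nothing is executed here)."""
    if phase not in (1, 2, 3):
        raise ValueError("HTGS phase must be 1, 2, or 3")
    return [
        "fa2htgs",
        "-g", center,       # Genome Center tag (required)
        "-s", seq_name,     # sequence name, unique within the center (required)
        "-i", fasta,        # FASTA input
        "-t", template,     # Seq-submit template
        "-o", out,          # ASN.1 output
        "-p", str(phase),   # HTGS phase
        "-l", str(length),  # putative full length in bp
    ]


# Hypothetical batch: (sequence name, FASTA file, putative clone length).
batch = [("cloneA", "cloneA.fa", 150000), ("cloneB", "cloneB.fa", 162000)]
commands = [
    build_fa2htgs_args("mycenter", name, fasta, name + ".asn",
                       phase=1, length=length)
    for name, fasta, length in batch
]
for cmd in commands:
    print(" ".join(cmd))
```

       Each list could then be handed to subprocess.run(cmd, check=True) once
       fa2htgs is installed and on the PATH.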

       Questions and concerns about this processing protocol, or about how to
       use this tool, should be forwarded to <htgs@ncbi.nlm.nih.gov>.

OPTIONS
       A summary of options is included	below.

       -      Print usage message

       -6 str SP6 clone	(e.g., Contig1,left)

       -7 str T7 clone (e.g., Contig2,right)

       -A filename
	      Filename	for  accession	list input (mutually exclusive with -T
	      and -i).	The input file contains	 a  tab-delimited  table  with
	      three  to	 five columns, which are accession number, start posi-
	      tion, stop position, and (optionally)  length  and  strand.   If
	      start  >	stop,  the minus strand	on the referenced accession is
	      used.  A gap is indicated	by the word "gap" instead of an	acces-
	      sion, 0 for the start and	stop positions,	and a number  for  the
	      length.
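
	      As an illustration, an accession list in this layout could be
	      written by a short Python helper.  This is a sketch based only
	      on the column description above; the accession numbers and
	      coordinates are placeholders, and the strand value is passed
	      through unchanged because its exact spelling is not specified
	      here.

```python
from typing import Optional


def accession_row(accession: str, start: int, stop: int,
                  length: Optional[int] = None,
                  strand: Optional[str] = None) -> str:
    """Format one accession line: three required columns, then optional
    length and strand."""
    if strand is not None and length is None:
        raise ValueError("strand is the fifth column, so length must be given too")
    fields = [accession, str(start), str(stop)]
    if length is not None:
        fields.append(str(length))
    if strand is not None:
        fields.append(strand)
    return "\t".join(fields)


def gap_row(length: int) -> str:
    """Format a gap line: the word "gap", 0 for start and stop, then the length."""
    return "\t".join(["gap", "0", "0", str(length)])


rows = [
    accession_row("U10000", 1, 5000),
    gap_row(100),
    accession_row("L11000", 8000, 2001),  # start > stop selects the minus strand
]
table = "\n".join(rows) + "\n"
print(table, end="")
```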

       -C str Clone  library  name  (will  appear  as  /clone-lib="str"	on the
	      source feature)

       -D     HTGS_DRAFT sequence

       -L filename
	      Read phrap contig	order from filename.  This is a	 tab-delimited
	      file  that  can  be used to drive	the order of contigs (normally
	      specified	by -P),	as well	as indicating the SP6 and T7 ends.  It
	      can also be used when contigs are	known to be in opposite	orien-
	      tation.  For example:

		  Contig2     +	      1	      SP6     left
		  Contig3     +	      1
		  Contig1     -		      T7      right

	      The first	column is the contig name, the second is the  orienta-
	      tion,  the third is the fragment_group, the fourth indicates the
	      SP6 or T7	end, and the fifth says	which side of SP6  or  T7  end
	      had vector removed.
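
	      A file in this layout could be produced with a few lines of
	      Python.  The sketch below reproduces the three example rows; it
	      assumes that trailing empty columns may be omitted and that an
	      empty fragment_group is written as an empty field, as the
	      example suggests.

```python
from typing import Optional


def contig_order_line(name: str, orientation: str,
                      fragment_group: Optional[int] = None,
                      end: Optional[str] = None,
                      side: Optional[str] = None) -> str:
    """Format one tab-delimited contig order line."""
    if orientation not in ("+", "-"):
        raise ValueError("orientation must be '+' or '-'")
    if end not in (None, "SP6", "T7"):
        raise ValueError("end must be 'SP6', 'T7', or omitted")
    fields = [
        name,
        orientation,
        "" if fragment_group is None else str(fragment_group),
        end or "",
        side or "",
    ]
    # Drop trailing empty columns but keep empty columns in the middle.
    while fields and fields[-1] == "":
        fields.pop()
    return "\t".join(fields)


lines = [
    contig_order_line("Contig2", "+", 1, "SP6", "left"),
    contig_order_line("Contig3", "+", 1),
    contig_order_line("Contig1", "-", None, "T7", "right"),
]
order_file = "\n".join(lines) + "\n"
print(order_file, end="")
```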

       -M str Map name (will appear as /map="str" on the source	feature)

       -N     Annotate assembly_fragments

       -O filename
	      Read comment from filename.  Lines may be at most 100 characters
	      long; ~ is a linebreak and `~ is a literal ~.  You can check the
	      format with PSequin(1).

       -P str Contigs to use, separated by commas.  If -P is not given with
	      the -T option, the fragments are added in the order in which
	      they appear in the ace file (which is appropriate for a phase 1
	      record, but not for a phase 2 or 3).  To set the order of the
	      segments of the ace file, use the -P flag, like this:
	      -P "Contig1,Contig4,Contig3,Contig2,Contig5"
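
	      When the contig order is computed by a script, the -P value can
	      be assembled from a list.  A minimal Python sketch, in which the
	      ace file name "clone.ace" is a hypothetical placeholder:

```python
from typing import List


def phrap_order_args(ace_file: str, contigs: List[str]) -> List[str]:
    """Return the -T and -P arguments for an explicitly ordered phrap input."""
    for name in contigs:
        # Commas separate contigs, so a name must not contain one.
        if "," in name or not name.strip():
            raise ValueError("invalid contig name: %r" % name)
    return ["-T", ace_file, "-P", ",".join(contigs)]


order = ["Contig1", "Contig4", "Contig3", "Contig2", "Contig5"]
args = phrap_order_args("clone.ace", order)
print(args)
```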

       -Q filename
	      Read quality scores from filename

       -S str Strain name

       -T filename
	      Filename for phrap input (mutually exclusive with	-A and -i)

       -X     The coordinates in the input file	are on the resulting segmented
	      sequence.	 (Bases	1 through n of each accession are used.)  Oth-
	      erwise,  the coordinates are on the individual accessions, which
	      need not start at	base 1 of the record.

       -a str GenBank accession; use if	and only if updating a sequence.

       -b N   Gap length (default = 100; anything from 0 to 1000000000 is  le-
	      gal)

       -c str Clone  name (will	appear as /clone in the	source feature;	can be
	      the same as -s)

       -d str Title for	sequence (will appear in GenBank DEFINITION line)

       -e filename
	      Log errors to filename

       -f     htgs_fulltop keyword

       -g str Genome Center tag	(probably the same as your login name  on  the
	      NCBI FTP server)

       -h str Chromosome (will appear as /chromosome in	the source feature)

       -i filename
	      Filename	for  fasta input (default is stdin; mutually exclusive
	      with -A and -T)

       -k str Add the supplied string as a keyword.

       -l N   Length of sequence in bp (default = 0).  The length is checked
	      against the actual number of bases we get.  For phase 1 and 2
	      sequence it is also used to estimate gap lengths.  For phase 1
	      and 2 records, it is important to use a number GREATER than the
	      amount of nucleotide provided; otherwise this will generate
	      false `gaps'.  It is assumed here that the putative full length
	      of the BAC or cosmid will be used.  There should be at least 20
	      to 30 `n' between the segments (you can check for these in
	      Sequin), as this ensures proper behavior when the sequence is
	      used with BLAST.  Otherwise `artifactual' unrelated segment
	      neighbors may be brought into proximity of each other.
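
	      A script can check the phase 1 and 2 requirement before calling
	      fa2htgs.  The Python sketch below counts every residue character
	      outside the FASTA header lines and compares that count with the
	      intended -l value; how fa2htgs itself counts bases is not
	      described here, so this counting rule is an assumption.

```python
def count_bases(fasta_text: str) -> int:
    """Count residue characters on all lines that are not FASTA headers."""
    total = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue
        total += len("".join(line.split()))  # ignore embedded whitespace
    return total


def length_leaves_room_for_gaps(declared_length: int, provided_bases: int) -> bool:
    """For phase 1 and 2 records the -l value should exceed the bases provided."""
    return declared_length > provided_bases


fasta = ">Contig1\nACGTACGT\nACGT\n>Contig2\nGGCC\n"
provided = count_bases(fasta)
print(provided, length_leaves_room_for_gaps(20, provided))
```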

       -m     Take comment from	template

       -n str Organism name (default = Homo sapiens)

       -o filename
	      Filename for asn.1 output	(default = stdout)

       -p N   HTGS phase:
	      1	     A collection of unordered contigs with  gaps  of  unknown
		     length.  A	Phase 1	record must at the very	least have two
		     segments with one gap.  (default)
	      2	     A	series	of  ordered  contigs,  possibly	with known gap
		     lengths.  This could be a single sequence	without	 gaps,
		     if	the sequence has ambiguities to	resolve.
	      3	     A single contiguous sequence.  This sequence is finished,
		     but not necessarily annotated.

       -q     htgs_cancelled keyword

       -r str Remark  for  update  (brief comment describing the nature	of the
	      update, such as "new sequence", "new citation", or "updated fea-
	      tures")

       -s str Sequence name.  The sequence must	have a	name  that  is	unique
	      within  the  genome center. We use the combination of the	genome
	      center name (-g argument)	and the	sequence name  (-s)  to	 track
	      this  sequence  and  to talk to you about	it.  The name can have
	      any form you like	but must be unique within your center.

       -t filename
	      Filename for Seq-submit template (default	= template.sub)

       -u     Take biosource from template

       -v     htgs_activefin keyword

       -w     Whole Genome Shotgun flag

       -x str Secondary accession numbers, separated by commas, e.g.,
	      U10000,L11000.

	      In some cases a large segment will supersede another accession
	      number (record) or a group of them.  Records that are no longer
	      wanted in GenBank should be made secondary.  Using the -x
	      argument you can list the accession numbers you want to make
	      secondary.  This instructs us to remove the accession number(s)
	      from GenBank; they will no longer be part of the GenBank
	      release.  They will nonetheless be available from Entrez.

	      GREAT CARE should	be taken when using this argument!!!  Improper
	      use of accession numbers here will result	in  the	 inappropriate
	      withdrawal  of  GenBank records from GenBank, EMBL and DDBJ.  We
	      provide this parameter as	a convenience to  submitting  centers,
	      but this may need	to be removed if it is not used	carefully.

AUTHOR
       The National Center for Biotechnology Information.

SEE ALSO
       Psequin(1), fa2htgs/README

NCBI				  2006-05-29			    FA2HTGS(1)
