SGMLS(1)		    General Commands Manual		      SGMLS(1)

NAME
       sgmls - a validating SGML parser

       An SGML System Conforming to
       International Standard ISO 8879 --
       Standard Generalized Markup Language

SYNOPSIS
       sgmls [ -deglprsuv ] [ -cfile ] [ -iname ] [ -mfile ] [ filename... ]

DESCRIPTION
       Sgmls parses and validates the document entity in filename... and
       prints on the standard output a simple ASCII representation of its
       Element Structure Information Set (ESIS). (This is the information set
       which a structure-controlled conforming application should act upon.)
       Note that the document entity may be spread amongst several files; for
       example, the SGML declaration, document type declaration and document
       instance set could each be in a separate file. If no filenames are
       specified, then sgmls will read the document entity from the standard
       input. A filename of - can also be used to refer to the standard input.

       The following options are available:

       -cfile Report any capacity limits that are exceeded and write a report
              of capacity usage to file. The report is in the format of a RACT
              result. RACT is the Reference Application for Capacity Testing
              defined in the Proposed American National Standard Conformance
              Testing for Standard Generalized Markup Language (SGML) Systems
              (X3.190-199X), Draft July 1991.

       -d     Warn about duplicate entity declarations.

       -e     Describe open entities in error messages. Error messages always
              include the position of the most recently opened external
              entity.

       -g     Show the GIs of open elements in error messages.

       -iname Pretend that

                     <!ENTITY % name "INCLUDE">

              occurs at the start of the document type declaration subset in
              the document entity. Since repeated definitions of an entity are
              ignored, this definition will take precedence over any other
              definitions of this entity in the document type declaration.
              Multiple -i options are allowed. If the SGML declaration
              replaces the reserved name INCLUDE, then the new reserved name
              will be the replacement text of the entity. Typically the
              document type declaration will contain

                     <!ENTITY % name "IGNORE">

              and will use %name; in the status keyword specification of a
              marked section declaration. In this case the effect of the
              option will be to cause the marked section not to be ignored.

       -l     Output L commands giving the current line number and filename.

       -mfile Map public identifiers and entity names to system identifiers
              using the catalog entry file file. Multiple -m options are
              allowed. Catalog entry files specified with the -m option will be
              searched before the defaults.

       -p     Parse only the prolog. Sgmls will exit after parsing the
              document type declaration. Implies -s.

       -r     Warn about defaulted references.

       -s     Suppress output. Error messages will still be printed.

       -u     Warn about undefined elements: elements used in the DTD but not
              defined.

       -v     Print the version number.
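       The precedence rule that -i relies on ("the first definition of an
       entity wins") can be modelled in a few lines. The following Python
       sketch is illustrative only: the function name and the representation
       of the DTD as (name, replacement text) pairs are assumptions, and it
       ignores reserved-name replacement and the rest of the parse.

```python
from typing import Dict, List, Tuple


def effective_parameter_entities(
    dtd_decls: List[Tuple[str, str]], i_options: List[str]
) -> Dict[str, str]:
    """Model how -iname interacts with parameter entity declarations.

    Each -iname behaves like <!ENTITY % name "INCLUDE"> placed before
    everything else in the DTD subset; because only the first declaration
    of an entity counts, later declarations of the same name are ignored.
    (Hypothetical helper, not part of sgmls.)
    """
    table: Dict[str, str] = {}
    # Declarations implied by -i options come first.
    for name in i_options:
        table.setdefault(name, "INCLUDE")
    # DTD declarations are ignored if the name is already defined.
    for name, text in dtd_decls:
        table.setdefault(name, text)
    return table


# A DTD that normally ignores a "draft" marked section.
dtd = [("draft", "IGNORE"), ("final", "INCLUDE")]
print(effective_parameter_entities(dtd, []))         # draft stays IGNORE
print(effective_parameter_entities(dtd, ["draft"]))  # draft becomes INCLUDE
```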

   Entity Manager
       An external entity resides in one or more files. The entity manager
       component of sgmls maps a sequence of files into an entity in three
       sequential stages:

       1.     each carriage return character is turned into a non-SGML
              character;

       2.     each newline character is turned into a record end character,
              and at the same time a record start character is inserted at
              the beginning of each line;

       3.     the files are concatenated.
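       The three stages can be sketched in Python as follows. This is not
       sgmls's own code: RS, RE and NONSGML are hypothetical marker strings
       standing in for the real characters, and the sketch assumes that a
       final newline does not start a new, empty record, which this page does
       not specify.

```python
from typing import List

# Hypothetical marker tokens; sgmls uses real character codes internally.
RS = "<RS>"       # record start
RE = "<RE>"       # record end
NONSGML = "<NS>"  # a non-SGML character


def map_file(text: str) -> List[str]:
    """Apply stages 1 and 2 to the contents of a single file."""
    tokens: List[str] = []
    at_line_start = True
    for ch in text:
        if at_line_start:
            # Stage 2: a record start begins each line.
            # Assumption: no RS is emitted after a final newline.
            tokens.append(RS)
            at_line_start = False
        if ch == "\r":
            tokens.append(NONSGML)  # stage 1: carriage return -> non-SGML
        elif ch == "\n":
            tokens.append(RE)       # stage 2: newline -> record end
            at_line_start = True
        else:
            tokens.append(ch)
    return tokens


def map_entity(files: List[str]) -> List[str]:
    """Stage 3: concatenate the mapped files into one entity."""
    entity: List[str] = []
    for text in files:
        entity.extend(map_file(text))
    return entity


print(map_entity(["a\r\n", "b\n"]))
# ['<RS>', 'a', '<NS>', '<RE>', '<RS>', 'b', '<RE>']
```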

       A system identifier is interpreted as a list of filenames separated by
       colons. A filename of - can be used to refer to the standard input.
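       A minimal sketch of splitting and reading a system identifier in
       Python, assuming 8-bit input decoded as Latin-1 (an assumption made
       here, not something this page states) and newline translation disabled
       so that carriage returns reach stage 1 of the entity manager. The
       function names are hypothetical.

```python
import sys
from typing import List


def split_sysid(sysid: str) -> List[str]:
    """A system identifier is a colon-separated list of filenames."""
    return sysid.split(":")


def read_parts(sysid: str) -> List[str]:
    """Read every file named by sysid, in order; "-" means standard input."""
    texts: List[str] = []
    for name in split_sysid(sysid):
        if name == "-":
            # Read raw bytes so that carriage returns are not translated away.
            texts.append(sys.stdin.buffer.read().decode("latin-1"))
        else:
            # Assumption: Latin-1; newline="" keeps "\r" characters intact.
            with open(name, encoding="latin-1", newline="") as f:
                texts.append(f.read())
    return texts


print(split_sysid("decl.sgm:dtd.sgm:-"))  # ['decl.sgm', 'dtd.sgm', '-']
```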

       If a system identifier is not specified, then the entity manager can
       generate one using catalog entry files in the format defined in the
       SGML Open Draft Technical Resolution on Entity Management. A catalog
       entry file contains a sequence of entries in one of the following four
       forms:

       PUBLIC pubid sysid
              This specifies that sysid should be used as the system
              identifier if the public identifier is pubid. Sysid is a system
              identifier as defined in ISO 8879 and pubid is a public
              identifier as defined in ISO 8879.

       ENTITY name sysid
              This specifies that sysid should be used as the system
              identifier if the entity is a general entity whose name is name.

       ENTITY %name sysid
              This specifies that sysid should be used as the system
              identifier if the entity is a parameter entity whose name is
              name. Note that there is no space between the % and the name.

       DOCTYPE name sysid
              This specifies that sysid should be used as the system
              identifier if the entity is an entity declared in a document
              type declaration whose document type name is name.

       The last two forms are extensions to the SGML Open format. The
       delimiters can be omitted from the sysid provided it does not contain
       any white space. Comments are allowed between parameters delimited by
       -- as in SGML. The environment variable SGML_CATALOG_FILES contains a
       colon-separated list of catalog entry files. These will be searched
       after any catalog entry files specified using the -m option. If this
       environment variable is not set, then a system dependent list of
       catalog entry files will be used. A match in a catalog entry file for
       a PUBLIC entry will take precedence over a match in the same file for
       an ENTITY or DOCTYPE entry. A filename in a system identifier in a
       catalog entry file is interpreted relative to the directory containing
       the catalog entry file.
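       The catalog rules above can be sketched as a small Python parser and
       lookup. This is an illustration under stated assumptions, not the
       sgmls implementation: keywords are matched case-insensitively, the
       first entry for a key in a file wins, the first catalog file with any
       match wins, and an ENTITY match is tried before a DOCTYPE match. The
       class and function names, and the catalog path and public identifier
       in the usage lines, are hypothetical.

```python
import os
from typing import Dict, List, Mapping, Optional


def tokenize(text: str) -> List[str]:
    """Split catalog text into parameters, dropping -- comments --."""
    tokens: List[str] = []
    i, n = 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1
        elif text.startswith("--", i):
            end = text.find("--", i + 2)  # a comment runs to the next "--"
            i = n if end < 0 else end + 2
        elif text[i] in "\"'":
            end = text.find(text[i], i + 1)  # delimited literal
            if end < 0:
                raise ValueError("unterminated literal")
            tokens.append(text[i + 1:end])
            i = end + 1
        else:
            j = i  # undelimited parameter: runs to the next white space
            while j < n and not text[j].isspace():
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


class Catalog:
    """One catalog entry file with PUBLIC, ENTITY and DOCTYPE entries."""

    def __init__(self, path: str, text: str) -> None:
        self.base = os.path.dirname(path)
        self.public: Dict[str, str] = {}
        self.entity: Dict[str, str] = {}  # parameter entities keyed "%name"
        self.doctype: Dict[str, str] = {}
        tables = {"PUBLIC": self.public, "ENTITY": self.entity,
                  "DOCTYPE": self.doctype}
        toks = tokenize(text)
        if len(toks) % 3:
            raise ValueError("every entry needs a keyword and two parameters")
        for i in range(0, len(toks), 3):
            table = tables.get(toks[i].upper())
            if table is None:
                raise ValueError(f"unknown entry type {toks[i]!r}")
            table.setdefault(toks[i + 1], toks[i + 2])  # assumption: first wins

    def _resolve(self, sysid: str) -> str:
        # Filenames are relative to the directory containing the catalog.
        parts = [p if p == "-" or os.path.isabs(p) else os.path.join(self.base, p)
                 for p in sysid.split(":")]
        return ":".join(parts)

    def lookup(self, pubid: Optional[str] = None, name: Optional[str] = None,
               doctype: Optional[str] = None) -> Optional[str]:
        # A PUBLIC match beats an ENTITY or DOCTYPE match in the same file.
        if pubid is not None and pubid in self.public:
            return self._resolve(self.public[pubid])
        if name is not None and name in self.entity:
            return self._resolve(self.entity[name])
        if doctype is not None and doctype in self.doctype:
            return self._resolve(self.doctype[doctype])
        return None


def search_order(m_files: List[str], environ: Mapping[str, str],
                 default: List[str]) -> List[str]:
    """-m files first, then SGML_CATALOG_FILES (or a system default list)."""
    value = environ.get("SGML_CATALOG_FILES")
    rest = default if value is None else [p for p in value.split(":") if p]
    return m_files + rest


def resolve(catalogs: List[Catalog], **query: Optional[str]) -> Optional[str]:
    """Search the catalogs in order; assumption: the first match wins."""
    for catalog in catalogs:
        hit = catalog.lookup(**query)
        if hit is not None:
            return hit
    return None


catalog = Catalog("/usr/local/share/sgml/catalog", """
  -- a hypothetical catalog entry file --
  PUBLIC "-//Example//DTD Memo//EN" memo.dtd
  ENTITY %ISOlat1 "entities/ISOlat1"
  DOCTYPE memo memo.dtd
""")
print(resolve([catalog], pubid="-//Example//DTD Memo//EN"))
```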

       If no match can be found in a catalog entry file, then the entity
       manager will attempt to generate a filename using the public
       identifier (if there is one) and other information available to it.
       Notation identifiers are not subject to this treatment. This process
       is controlled by the environment variable SGML_PATH; this contains a
       colon-separated list of filename templates. A filename template is a
       filename that may contain substitution fields; a substitution field is
       a % character followed by a single letter that indicates the value of
       the substitution. The value of a substitution can either be a string
       or it can be null. The entity manager transforms the list of filename
       templates into a list of filenames by substituting for each
       substitution field and discarding any template that contained a
       substitution field whose value was null. It then uses the first
       resulting filename that exists and is readable. Substitution values
       are transformed before being used for substitution: firstly, any names
       that were subject to upper case substitution are folded to lower case;
       secondly, space characters are mapped to underscores and slashes are
       mapped to percents. The value of the %S field is not transformed. The
       values of substitution fields are as follows:

       %%     A single %.

       %D     The entity's data content notation. This substitution will
              succeed only for external data entities.

       %N     The entity, notation or document type name.

       %P     The public identifier if there was a public identifier,
              otherwise null.

       %S     The system identifier if there was a system identifier,
              otherwise null.

       %X     (This is provided mainly for compatibility with ARCSGML.) A
              three-letter string chosen as follows:

                                         |            |  With public identifier
                                         | No public  +-------------+-----------
                                         | identifier |   Device    |  Device
                                         |            | independent | dependent
              ---------------------------+------------+-------------+-----------
              Data or subdocument entity | nsd        | pns         | vns
              General SGML text entity   | gml        | pge         | vge
              Parameter entity           | spe        | ppe         | vpe
              Document type definition   | dtd        | pdt         | vdt
              Link process definition    | lpd        | plp         | vlp

              The device dependent version is selected if the public text
              class allows a public text display version but no public text
              display version was specified.

       %Y     The type of thing for which the filename is being generated:
              SGML subdocument entity    sgml
              Data entity                data
              General text entity        text
              Parameter entity           parm
              Document type definition   dtd
              Link process definition    lpd
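       Template expansion for SGML_PATH can be sketched in Python as follows.
       The caller supplies the field values (None for null) and the set of
       field letters whose values are names subject to upper case
       substitution. Which fields those are, how an unknown letter or a
       trailing % is treated, and all names in the code are assumptions made
       for illustration.

```python
import os
from typing import Dict, Optional, Set


def transform(value: str, fold: bool) -> str:
    """Apply the value transformations described above."""
    if fold:
        value = value.lower()  # names subject to upper case substitution
    return value.replace(" ", "_").replace("/", "%")


def expand(template: str, fields: Dict[str, Optional[str]],
           folded: Set[str]) -> Optional[str]:
    """Substitute every %X field; return None if any value is null."""
    out = []
    i = 0
    while i < len(template):
        if template[i] != "%" or i + 1 == len(template):
            out.append(template[i])  # ordinary character (or a trailing %)
            i += 1
            continue
        letter = template[i + 1]
        i += 2
        if letter == "%":
            out.append("%")
            continue
        value = fields.get(letter)  # assumption: unknown letters are null
        if value is None:
            return None  # discard this template
        out.append(value if letter == "S" else transform(value, letter in folded))
    return "".join(out)


def generate_filename(sgml_path: str, fields: Dict[str, Optional[str]],
                      folded: Set[str]) -> Optional[str]:
    """Return the first expanded template that names a readable file."""
    for template in sgml_path.split(":"):
        name = expand(template, fields, folded)
        if name is not None and os.path.isfile(name) and os.access(name, os.R_OK):
            return name
    return None


# Hypothetical field values for a DTD entity without a public identifier.
fields = {"N": "MEMO", "Y": "dtd", "P": None, "S": None}
print(expand("/usr/share/sgml/%N.%Y", fields, {"N"}))  # /usr/share/sgml/memo.dtd
print(expand("/usr/share/sgml/%P", fields, {"N"}))     # None (%P is null)
```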

       The  value  of  the following substitution fields will be null unless a
       valid formal public identifier was supplied.

       %A     Null if the text identifier in the formal public identifier
              contains an unavailable text indicator, otherwise the empty
              string.

       %C     The public text class, mapped to lower case.

       %E     The public text designating sequence (escape sequence) if the
              public text class is CHARSET, otherwise null.

       %I     The empty string if the owner identifier in the formal public
              identifier is an ISO owner identifier, otherwise null.

       %L     The public text language, mapped to lower case, unless the
              public text class is CHARSET, in which case null.

       %O     The owner identifier (with the +// or -// prefix stripped).

       %R     The empty string if the owner identifier in the formal public
              identifier is a registered owner identifier, otherwise null.

       %T     The public text description.

       %U     The empty string if the owner identifier in the formal public
              identifier is an unregistered owner identifier, otherwise null.

       %V     The public text display version. This substitution will be null
              if the public text class does not allow a display version or if
              no version was specified. If an empty version was specified, a
              value of default will be used.
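       One way to derive the %A to %V values from a formal public identifier
       is sketched below in Python. It splits on // and assumes that any
       owner identifier not starting with +// or -// is an ISO owner
       identifier; it does not validate ISO publication numbers or check
       which public text classes allow a display version. The function name
       is hypothetical.

```python
from typing import Dict, Optional


def fpi_fields(pubid: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a formal public identifier into %A..%V substitution values.

    Returns None if pubid does not look like a formal public identifier.
    """
    parts = pubid.split("//")
    if parts[0] in ("+", "-"):
        if len(parts) < 2:
            return None
        prefix, owner, rest = parts[0], parts[1], parts[2:]
    else:
        # Assumption: anything without a +// or -// prefix is an ISO owner.
        prefix, owner, rest = "", parts[0], parts[1:]
    unavailable = bool(rest) and rest[0] == "-"  # unavailable text indicator
    if unavailable:
        rest = rest[1:]
    if len(rest) not in (2, 3):
        return None
    class_and_description = rest[0].split(None, 1)
    if len(class_and_description) != 2:
        return None
    text_class, description = class_and_description
    language = rest[1]
    version = rest[2] if len(rest) == 3 else None
    is_charset = text_class == "CHARSET"
    return {
        "A": None if unavailable else "",
        "C": text_class.lower(),
        "E": language if is_charset else None,
        "I": "" if prefix == "" else None,
        "L": None if is_charset else language.lower(),
        "O": owner,
        "R": "" if prefix == "+" else None,
        "T": description,
        "U": "" if prefix == "-" else None,
        # Empty version -> "default"; class-based checks are not modelled.
        "V": None if version is None else (version or "default"),
    }


print(fpi_fields("-//Example//DTD Memo//EN"))
```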

       Normally if the external identifier for an entity includes a system
       identifier, the entity manager will use the specified system
       identifier and not attempt to generate one. If, however, SGML_PATH uses
       the %S field, then the entity manager will first search for a matching
       entry in the catalog entry files. If a match is found, then this will
       be used instead of the specified system identifier. Otherwise, if the
       specified system identifier does not contain any colons, the entity
       manager will use SGML_PATH to generate a filename. Otherwise the
       entity manager will use the specified system identifier.
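       The decision described in the preceding paragraph, together with the
       no-system-identifier case, can be written out as a Python sketch.
       catalog_lookup and generate are hypothetical callbacks standing in for
       the catalog search and SGML_PATH expansion; what happens when
       generation fails is not specified on this page, so the sketch simply
       returns None.

```python
from typing import Callable, Optional


def uses_field(sgml_path: str, letter: str) -> bool:
    """True if some template in SGML_PATH contains the field %<letter>."""
    i = 0
    while i < len(sgml_path) - 1:
        if sgml_path[i] == "%":
            if sgml_path[i + 1] == letter:
                return True
            i += 2  # skip the whole field, so "%%S" is not mistaken for %S
        else:
            i += 1
    return False


def choose_sysid(sysid: Optional[str], sgml_path: str,
                 catalog_lookup: Callable[[], Optional[str]],
                 generate: Callable[[], Optional[str]]) -> Optional[str]:
    """Decide which system identifier the entity manager ends up using."""
    if sysid is not None and not uses_field(sgml_path, "S"):
        return sysid  # the normal case: use the declared system identifier
    hit = catalog_lookup()
    if hit is not None:
        return hit
    if sysid is None or ":" not in sysid:
        return generate()  # assumption: None if no template matches
    return sysid
```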

   System declaration
       The system declaration for sgmls is as follows:

                          SYSTEM "ISO 8879:1986"
                                  CHARSET
       BASESET  "ISO 646-1983//CHARSET
                 International Reference Version (IRV)//ESC 2/5 4/0"
       DESCSET  0 128 0
       CAPACITY PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
                                 FEATURES
       MINIMIZE DATATAG NO  OMITTAG  YES    RANK     NO   SHORTTAG YES
       LINK     SIMPLE  NO  IMPLICIT NO     EXPLICIT NO
       OTHER    CONCUR  NO  SUBDOC   YES 1  FORMAL   YES
       SCOPE    DOCUMENT
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Core//EN"
                                 VALIDATE
                GENERAL YES MODEL    YES    EXCLUDE  YES  CAPACITY YES
                NONSGML YES SGML     YES    FORMAL   YES
                                   SDIF
                PACK    NO  UNPACK   NO

       Exceeding a capacity limit will be ignored unless the -c option is
       given.

       The memory usage of sgmls is not a function of the capacity points
       used by a document; however, sgmls can handle capacities significantly
       greater than the reference capacity set.

       In some environments, higher values may be supported for the SUBDOC
       parameter.

       Documents that do not use optional features are also supported. For
       example, if FORMAL NO is specified in the SGML declaration, public
       identifiers will not be required to be valid formal public
       identifiers.

       Certain parts of the concrete syntax may be changed:

              The shunned character numbers can be changed.

              Eight bit characters can be assigned to LCNMSTRT, UCNMSTRT,
              LCNMCHAR and UCNMCHAR.

              Uppercase substitution can be performed or not performed both
              for entity names and for other names.

              Either short reference delimiters assigned by the reference
              delimiter set or no short reference delimiters are supported.

              The reserved names can be changed.

              The quantity set can be increased within certain limits subject
              to there being sufficient memory available. The upper limit on
              NAMELEN is 239. The upper limits on ATTCNT, ATTSPLEN, BSEQLEN,
              ENTLVL, LITLEN, PILEN, TAGLEN, and TAGLVL are more than thirty
              times greater than the reference limits. The upper limit on
              GRPCNT, GRPGTCNT, and GRPLVL is 253. NORMSEP cannot be changed.
              DTAGLEN and DTEMPLEN are irrelevant since sgmls does not support
              the DATATAG feature.

   SGML declaration
       The SGML declaration may be omitted, in which case the following
       declaration will be implied:
                             <!SGML "ISO 8879:1986"
                                     CHARSET
       BASESET  "ISO 646-1983//CHARSET
                 International Reference Version (IRV)//ESC 2/5 4/0"
       DESCSET    0  9 UNUSED
                  9  2 9
                 11  2 UNUSED
                 13  1 13
                 14 18 UNUSED
                 32 95 32
                127  1 UNUSED
       CAPACITY PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
       SCOPE    DOCUMENT
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
                                    FEATURES
       MINIMIZE DATATAG NO  OMITTAG  YES          RANK     NO  SHORTTAG YES
       LINK     SIMPLE  NO  IMPLICIT NO           EXPLICIT NO
       OTHER    CONCUR  NO  SUBDOC   YES 99999999 FORMAL   YES
                                  APPINFO NONE>
       with the exception that characters 128 through 254 will be assigned to
       DATACHAR.
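       DESCSET is a list of triples: a document character number, a count,
       and either a base character number or UNUSED. The Python sketch below
       expands the implied DESCSET into a per-character table and applies the
       DATACHAR exception; the names are hypothetical and the table models
       the declaration only, not sgmls internals.

```python
from typing import Dict, List, Optional, Tuple

# (document character number, number of characters, base character number);
# None stands for UNUSED. Copied from the implied declaration above.
IMPLIED_DESCSET: List[Tuple[int, int, Optional[int]]] = [
    (0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13),
    (14, 18, None), (32, 95, 32), (127, 1, None),
]


def build_charset(
    descset: List[Tuple[int, int, Optional[int]]]
) -> Dict[int, Optional[int]]:
    """Expand DESCSET ranges into a per-character table."""
    table: Dict[int, Optional[int]] = {}
    for start, count, base in descset:
        for k in range(count):
            if start + k in table:
                raise ValueError(f"character {start + k} described twice")
            table[start + k] = None if base is None else base + k
    return table


def classify(code: int, table: Dict[int, Optional[int]]) -> str:
    """Describe how a character number is treated by the implied declaration."""
    if 128 <= code <= 254:
        return "DATACHAR"  # the exception stated after the declaration
    if code not in table:
        return "not described"
    base = table[code]
    return "UNUSED" if base is None else f"base character {base}"


table = build_charset(IMPLIED_DESCSET)
print(len(table))           # 128
print(classify(10, table))  # base character 10
print(classify(11, table))  # UNUSED
```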

       Sgmls identifies base character sets using the designating sequence in
       the public identifier. The following designating sequences are
       recognized:

       Designating        ISO            Minimum     Number of
       escape sequence    registration   character   characters   Description
                          number         number
       -----------------------------------------------------------------------------------
       ESC 2/5 4/0        -              0           128          full set of ISO 646 IRV
       ESC 2/8 4/0        2              33          94           G0 set of ISO 646 IRV
       ESC 2/8 4/2        6              33          94           G0 set of ASCII
       ESC 2/13 4/1       100            32          96           G1 set of ISO 8859-1
       ESC 2/1 4/0        1              0           32           C0 set of ISO 646
       ESC 2/2 4/3        77             0           32           C1 set of ISO 6429
       ESC 2/5 2/15 3/0   -              0           256          the system character set

       When one of the G0 sets is used as a base set, the characters SPACE and
       DELETE are treated as occurring at positions 32 and 127 respectively;
       although these characters are not part of the character sets
       designated by the escape sequences, this mimics the behaviour of ISO
       2022 with respect to these code positions.
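       The designating-sequence table and the G0 rule for SPACE and DELETE
       can be expressed as a lookup. In this Python sketch, taking the
       designating sequence from the last //-separated field of the public
       identifier and normalizing its white space are assumptions, as are the
       function names.

```python
from typing import Dict, Tuple

# Designating escape sequence -> (minimum character number, number of characters)
DESIGNATING: Dict[str, Tuple[int, int]] = {
    "ESC 2/5 4/0": (0, 128),       # full set of ISO 646 IRV
    "ESC 2/8 4/0": (33, 94),       # G0 set of ISO 646 IRV
    "ESC 2/8 4/2": (33, 94),       # G0 set of ASCII
    "ESC 2/13 4/1": (32, 96),      # G1 set of ISO 8859-1
    "ESC 2/1 4/0": (0, 32),        # C0 set of ISO 646
    "ESC 2/2 4/3": (0, 32),        # C1 set of ISO 6429
    "ESC 2/5 2/15 3/0": (0, 256),  # the system character set
}
G0_SETS = {"ESC 2/8 4/0", "ESC 2/8 4/2"}


def designating_sequence(pubid: str) -> str:
    """Assumption: the sequence is the last //-separated field of pubid."""
    return " ".join(pubid.rsplit("//", 1)[-1].split())


def in_base_set(pubid: str, code: int) -> bool:
    """True if character number code exists in the designated base set."""
    seq = designating_sequence(pubid)
    if seq not in DESIGNATING:
        raise ValueError(f"unrecognized designating sequence: {seq!r}")
    if seq in G0_SETS and code in (32, 127):
        return True  # SPACE and DELETE, mimicking ISO 2022
    low, count = DESIGNATING[seq]
    return low <= code < low + count
```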

   Output format
       The output is a series of lines. Lines can be arbitrarily long. Each
       line consists of an initial command character and zero or more
       arguments. Arguments are separated by a single space, but when a
       command takes a fixed number of arguments the last argument can
       contain spaces. There is no space between the command character and
       the first argument. Arguments can contain the following escape
       sequences:

       \\     A \.

       \n     A record end character.

       \|     Internal SDATA entities are bracketed by these.

       \nnn   The character whose code is nnn octal.

       A record start character will be represented by \012. Most
       applications will need to ignore \012 and translate \n into newline.
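       A decoder for these escape sequences might look like the Python sketch
       below. It follows the advice above by dropping \012 and turning \n
       into a newline; assuming that \nnn always has exactly three octal
       digits, and letting the caller choose what replaces \|, are choices
       made for this sketch. The function name is hypothetical.

```python
OCTAL = "01234567"


def unescape(arg: str, sdata_bracket: str = "") -> str:
    r"""Decode one sgmls output argument.

    \\ becomes a backslash, \n (record end) becomes a newline, \| (an SDATA
    bracket) becomes sdata_bracket, and \nnn becomes the character with that
    octal code, except that \012 (record start) is dropped.
    """
    out = []
    i = 0
    while i < len(arg):
        ch = arg[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = arg[i + 1:i + 2]
        if nxt == "\\":
            out.append("\\")
            i += 2
        elif nxt == "n":
            out.append("\n")  # record end -> newline
            i += 2
        elif nxt == "|":
            out.append(sdata_bracket)
            i += 2
        else:
            digits = arg[i + 1:i + 4]  # assumption: exactly three digits
            if len(digits) != 3 or any(d not in OCTAL for d in digits):
                raise ValueError(f"bad escape at offset {i} in {arg!r}")
            code = int(digits, 8)
            if code != 0o12:  # drop record start characters
                out.append(chr(code))
            i += 4
    return "".join(out)
```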

       The possible command characters and arguments are as follows:

       (gi    The start of an element whose generic identifier is gi. Any
              attributes for this element will have been specified with A
              commands.

       )gi    The end of an element whose generic identifier is gi.

       -data  Data.

       &name  A reference to an external data entity name; name will have been
              defined using an E command.

       ?pi    A processing instruction with data pi.

       Aname val
              The next element to start has an attribute name with value val
              which takes one of the following forms:

              IMPLIED
                     The value of the attribute is implied.

              CDATA data
                     The attribute is character data. This is used for
                     attributes whose declared value is CDATA.

              NOTATION nname
                     The attribute is a notation name; nname will have been
                     defined using an N command. This is used for attributes
                     whose declared value is NOTATION.

              ENTITY name...
                     The attribute is a list of general entity names. Each
                     entity name will have been defined using an I, E or S
                     command. This is used for attributes whose declared
                     value is ENTITY or ENTITIES.

              TOKEN token...
                     The attribute is a list of tokens. This is used for
                     attributes whose declared value is anything else.

       Dename name val
	      This  is	the  same as the A command, except that	it specifies a
	      data attribute for an external entity named ename.  Any  D  com-
	      mands  will  come	after the E command that defines the entity to
	      which they apply,	but before any & or A commands that  reference
	      the entity.

       Nnname Define a notation nname. This command will be preceded by a p
              command if the notation was declared with a public identifier,
              and by an s command if the notation was declared with a system
              identifier. A notation will only be defined if it is to be
              referenced in an E command or in an A command for an attribute
              with a declared value of NOTATION.

       Eename typ nname
              Define an external data entity named ename with type typ
              (CDATA, NDATA or SDATA) and notation nname. This command will be
              preceded by one or more f commands giving the filenames
              generated by the entity manager from the system and public
              identifiers, by a p command if a public identifier was declared
              for the entity, and by an s command if a system identifier was
              declared for the entity. nname will have been defined using an
              N command. Data attributes may be specified for the entity
              using D commands. An external data entity will only be defined
              if it is to be referenced in an & command or in an A command for
              an attribute whose declared value is ENTITY or ENTITIES.

       Iename typ text
              Define an internal data entity named ename with type typ (CDATA
              or SDATA) and entity text text. An internal data entity will
              only be defined if it is referenced in an A command for an
              attribute whose declared value is ENTITY or ENTITIES.

       Sename Define a subdocument entity named ename. This command will be
              preceded by one or more f commands giving the filenames
              generated by the entity manager from the system and public
              identifiers, by a p command if a public identifier was declared
              for the entity, and by an s command if a system identifier was
              declared for the entity. A subdocument entity will only be
              defined if it is referenced in a { command or in an A command for
              an attribute whose declared value is ENTITY or ENTITIES.

       ssysid This command applies to the next E, S or N command and specifies
              the associated system identifier.

       ppubid This command applies to the next E, S or N command and specifies
              the associated public identifier.

       ffilename
              This command applies to the next E or S command and specifies an
              associated filename. There will be more than one f command for
              a single E or S command if the system identifier used a colon.

       {ename The start of the subdocument entity ename; ename will have been
              defined using an S command.

       }ename The end of the subdocument entity ename.

       Llineno file
       Llineno
	      Set the current line number and filename.	 The filename argument
	      will  be omitted if only the line	number has changed.  This will
	      be output	only if	the -l option has been given.

       #text  An APPINFO parameter of text was specified in the SGML
              declaration. This is not strictly part of the ESIS, but a
              structure-controlled application is permitted to act on it. No
              # command will be output if APPINFO NONE was specified. A #
              command will occur at most once, and may be preceded only by a
              single L command.

       C      This command indicates that the document was a conforming
              document. If this command is output, it will be the last
              command. An SGML document is not conforming if it references a
              subdocument entity that is not conforming.
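       As an illustration of consuming this format, the Python sketch below
       builds an element tree from the (, ), -, A and C commands and ignores
       the rest. The Element class, the #ROOT container and the compact
       escape decoder are hypothetical helpers, not part of sgmls; a real
       application would also handle entities, processing instructions and
       subdocuments.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple, Union


def unescape(arg: str) -> str:
    """Compact escape decoder: record ends become newlines; record starts
    (octal 012) and SDATA brackets are dropped."""
    out: List[str] = []
    i = 0
    while i < len(arg):
        if arg[i] != "\\":
            out.append(arg[i])
            i += 1
        elif arg[i + 1:i + 2] in ("\\", "n", "|"):
            out.append({"\\": "\\", "n": "\n", "|": ""}[arg[i + 1]])
            i += 2
        else:
            code = int(arg[i + 1:i + 4], 8)  # assumes three octal digits
            if code != 0o12:
                out.append(chr(code))
            i += 4
    return "".join(out)


@dataclass
class Element:
    gi: str
    attributes: Dict[str, Optional[str]] = field(default_factory=dict)
    children: List[Union["Element", str]] = field(default_factory=list)


def read_esis(lines: Iterable[str]) -> Tuple[Element, bool]:
    """Build an element tree from sgmls output; return (root, conforming)."""
    root = Element("#ROOT")  # hypothetical container for the document element
    stack = [root]
    pending: Dict[str, Optional[str]] = {}  # A commands for the next element
    conforming = False
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cmd, arg = line[0], line[1:]
        if cmd == "A":
            name, _, val = arg.partition(" ")
            kind, _, value = val.partition(" ")
            pending[name] = None if kind == "IMPLIED" else unescape(value)
        elif cmd == "(":
            element = Element(arg, pending)
            pending = {}
            stack[-1].children.append(element)
            stack.append(element)
        elif cmd == ")":
            if stack[-1].gi != arg:
                raise ValueError(f"unexpected end of {arg}")
            stack.pop()
        elif cmd == "-":
            stack[-1].children.append(unescape(arg))
        elif cmd == "C":
            conforming = True
        # Other commands (&, ?, D, N, E, I, S, s, p, f, {, }, L, #) are
        # ignored in this sketch.
    return root, conforming


root, ok = read_esis(["AID IMPLIED", "ATYPE TOKEN MEMO", "(MEMO",
                      "-Hello,\\nworld", ")MEMO", "C"])
memo = root.children[0]
print(memo.gi, memo.attributes, ok)  # MEMO {'ID': None, 'TYPE': 'MEMO'} True
```

       The input lines could come, for example, from the standard output of
       sgmls split into lines.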

BUGS
       Some non-SGML characters in literals are counted as two characters for
       the purposes of quantity and capacity calculations.

SEE ALSO
       The SGML Handbook, Charles F. Goldfarb
       ISO 8879 (Standard Generalized Markup Language), International
       Organization for Standardization

ORIGIN
       ARCSGML was written by Charles F. Goldfarb.

       Sgmls was derived from ARCSGML by James Clark (jjc@jclark.com), to whom
       bugs should be reported.

								      SGMLS(1)
