HXPIPE(1)			HTML-XML-utils			     HXPIPE(1)

NAME
       hxpipe - convert HTML or XML file to a format easier to parse with
       Perl or AWK

SYNOPSIS
       hxpipe [	-l ] [ -H ] [ -- ] [ file-or-URL ]

DESCRIPTION
       hxpipe parses an	HTML or	XML file and outputs a line-oriented represen-
       tation of it that is well suited	to further processing with AWK or sim-
       ilar tools. The format is similar to the	ESIS (Element Structure	Infor-
       mation Set) that	is output by nsgmls/onsgmls.

       The  reverse operation, converting back to mark-up, is performed	by the
       hxunpipe	program.

       The output format is as follows:

       <!--comment-->
		 Comments are output as

		     *comment

		 I.e., a single	line starting with "*" followed	by the text of
		 the comment. Line feeds, carriage returns  and	 tabs  in  the
		 text  are  written as "\n", "\r" and "\t", respectively. Text
		 that looks like a numerical character entity is written  with
		 the "&" replaced by "\".  The line ends with a	line feed.

		 Note  that  onsgmls  outputs comments starting	with a "_" in-
		 stead of a "*"	and doesn't replace the	"&" of numerical char-
		 acter entities	by "\" (and by default it omits	comments alto-
		 gether).
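
		 The escaping rules above can be sketched in Python as
		 follows. This is an illustration, not hxpipe's own code: the
		 pattern for "looks like a numerical character entity" is an
		 assumption, and literal backslashes in the input, which the
		 text above does not cover, are left untouched.

```python
import re

# Assumed shape of a numerical character entity: &#123; or &#x7B;
_NUMERIC_ENTITY = re.compile(r"&(#(?:[0-9]+|[xX][0-9A-Fa-f]+);)")


def escape_text(text: str) -> str:
    """Escape text as described for comments: entities first, then controls."""
    # Replace the "&" of a numerical character entity by a backslash.
    text = _NUMERIC_ENTITY.sub(r"\\\1", text)
    # Write line feed, carriage return and tab as two-character sequences.
    return text.replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t")


def unescape_text(line: str) -> str:
    """Reverse escape_text for the four documented escapes only."""
    mapping = {"n": "\n", "r": "\r", "t": "\t", "#": "&#"}
    return re.sub(r"\\([nrt#])", lambda m: mapping[m.group(1)], line)
```

		 The same two helpers apply to processing instructions,
		 attribute values and text, which are all escaped as for
		 comments.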

       <?processing instruction>
		 Processing instructions are output as

		     ?processing instruction

		 I.e., a single	line starting with a "?" followed by the  text
		 of  the  processing  instruction.  The	text is	escaped	as for
		 comments (see above).

       <!DOCTYPE root PUBLIC "-//foo//DTD bar//EN" "http://example.org/dtd">
		 DOCTYPEs are output as	one of the following:

		     !root "-//foo//DTD bar//EN" http://example.org/dtd
		     !root "-//foo//DTD bar//EN"
		     !root "" http://example.org/dtd
		     !root ""

		 for, respectively, a DOCTYPE with (1) both a public and a
		 system identifier, (2) only a public identifier, (3) only a
		 system identifier, or (4) neither of the two. I.e., a single
		 line starting with "!" and the name of the root element,
		 followed by a space and a possibly empty quoted string,
		 optionally followed by a space and arbitrary text. Note the
		 quotes for the public identifier and the absence of quotes
		 for the system identifier.
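
		 A consumer could split such a line roughly as in the Python
		 sketch below. The names Doctype and parse_doctype are
		 hypothetical, and the pattern assumes that the public
		 identifier itself contains no double quote.

```python
import re
from typing import NamedTuple, Optional


class Doctype(NamedTuple):
    root: str
    public_id: str             # empty string when there is no public identifier
    system_id: Optional[str]   # None when there is no system identifier


# "!" + root name, a space, a quoted string, then optionally a space and text.
_DOCTYPE_LINE = re.compile(r'^!(\S+) "([^"]*)"(?: (.*))?$')


def parse_doctype(line: str) -> Doctype:
    """Split one "!" line into root name, public id and system id."""
    match = _DOCTYPE_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a DOCTYPE line: %r" % line)
    return Doctype(match.group(1), match.group(2), match.group(3))
```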

       <elt att1="value1" att2="value2">
		 A start tag is	output as

		     Aatt1 CDATA value1
		     Aatt2 CDATA value2
		     (elt

		 I.e., as zero or more lines for the attributes	and  one  line
		 for  the element type.	Each line for an attribute starts with
		 "A" followed by the name of the attribute, a space, the  lit-
		 eral  string "CDATA", another space, and the attribute	value.
		 The text of the attribute value is escaped  as	 for  comments
		 (see  above).	The  line for the element type starts with "("
		 followed by the element type.

		 hxpipe	does not read DTDs and assumes that attributes are al-
		 ways CDATA. It	never generates	other types  (IMPLIED,	TOKEN,
		 ID, etc.), unlike onsgmls.
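
		 Because the attribute lines come before the "(" line, a
		 reader has to buffer them until the element type arrives. A
		 minimal Python sketch, with the hypothetical helper name
		 read_start_tag; attribute values are returned still escaped:

```python
from typing import Dict, Iterable, Tuple


def read_start_tag(lines: Iterable[str]) -> Tuple[str, Dict[str, str]]:
    """Collect "A" lines up to the first "(" line.

    Returns (element type, {attribute name: still-escaped value}).
    """
    attrs: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("A"):
            # "A" + name, a space, the type (always CDATA), a space, the value.
            # Split at most twice so that spaces inside the value survive.
            parts = line[1:].split(" ", 2)
            if len(parts) < 2:
                raise ValueError("malformed attribute line: %r" % line)
            attrs[parts[0]] = parts[2] if len(parts) == 3 else ""
        elif line.startswith("("):
            return line[1:], attrs
        else:
            raise ValueError("unexpected line before start tag: %r" % line)
    raise ValueError("no start tag line found")
```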

       </elt>	 End tags are output as

		     )elt

		 I.e.,	as  a  line  starting with ")" followed	by the element
		 type.

       <empty att1="val1" att2="val2"/>
		 Empty elements	are output as

		     Aatt1 CDATA val1
		     Aatt2 CDATA val2
		     |empty

		 I.e., as zero or more	lines  for  attributes	and  one  line
		 starting with "|" followed by the element type.

		 Note that onsgmls never outputs "|". (However,	it can option-
		 ally output a line consisting of a single "e" just before the
		 "(" line, to indicate that the	element	is empty.)

       text	 Text is output	as

		     -text

		 I.e.,	as  a single line starting with	a "-". The text	is es-
		 caped as for comments (see above).

       line numbers
		 When the -l option is in effect, hxpipe will intersperse  the
		 output	with lines of the form

		     L12

		 where	"12"  is  replaced  with the line number in the	source
		 where the next	output came from.
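
		 Taken together, the line types above can be read with a
		 small dispatcher on the first character of each line. The
		 Python sketch below is one possible reader, not part of
		 HTML-XML-utils. It assumes that an "L" line applies to all
		 output until the next "L" line, and it leaves the payload
		 escaped.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (kind, payload, attributes, most recent source line number or None)
Event = Tuple[str, str, Dict[str, str], Optional[int]]

KINDS = {
    "*": "comment",
    "?": "pi",
    "!": "doctype",
    "(": "start",
    ")": "end",
    "|": "empty",
    "-": "text",
}


def events(lines: Iterable[str]) -> Iterator[Event]:
    """Turn hxpipe output lines into events, attaching buffered attributes."""
    attrs: Dict[str, str] = {}
    lineno: Optional[int] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        marker, rest = line[0], line[1:]
        if marker == "A":
            # Attribute lines precede the "(" or "|" line they belong to.
            parts = rest.split(" ", 2)
            attrs[parts[0]] = parts[2] if len(parts) == 3 else ""
        elif marker == "L":
            lineno = int(rest)
        elif marker in KINDS:
            yield KINDS[marker], rest, attrs, lineno
            attrs = {}
        else:
            raise ValueError("unknown line type: %r" % line)
```

		 For example, list(events(open("out.txt"))) yields one tuple
		 per comment, processing instruction, DOCTYPE, tag or text
		 line, where "out.txt" is a hypothetical file holding hxpipe
		 output.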

       hxpipe normalizes the input only	in the sense that it  outputs  attrib-
       utes in a fixed order (alphabetical, but	not locale-dependent). It does
       not  read a DTD and thus	cannot remove redundant	white space and	cannot
       add implied attributes. It does not expand character entities. (But you
       can pipe	the input through hxunent beforehand.) It also	does  not  add
       implied tags. (But see the -H option.)

OPTIONS
       The following options are supported:

       -l	 Add "L" lines to the output to indicate the line numbers in
		 the source. Currently does not work together with the -H
		 option.

       -H	 Apply special rules for HTML. Normally, hxpipe	assumes	 well-
		 formed	XML. With this option, hxpipe will assume the input is
		 HTML  and will	add implied tags, recognize empty elements and
		 treat the contents of <script>	and <style> elements  as  lit-
		 eral text.

OPERANDS
       The following operand is	supported:

       file-or-URL
		 The name or URL of an HTML or XML file. If absent, standard
		 input is read instead.
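
		 When hxpipe is called from a script, the command line from
		 the SYNOPSIS can be assembled as in this Python sketch. The
		 helper name hxpipe_argv is hypothetical, and rejecting the
		 combination of -l and -H is a choice made here because the
		 two currently do not work together; hxpipe itself may behave
		 differently.

```python
from typing import List, Optional


def hxpipe_argv(source: Optional[str] = None, *,
                line_numbers: bool = False, html: bool = False) -> List[str]:
    """Build an argument list following: hxpipe [-l] [-H] [--] [file-or-URL]."""
    if line_numbers and html:
        raise ValueError("-l does not currently work together with -H")
    argv = ["hxpipe"]
    if line_numbers:
        argv.append("-l")
    if html:
        argv.append("-H")
    if source is not None:
        # "--" ends the options, so a name starting with "-" is safe.
        argv += ["--", source]
    return argv
```

		 With no source given, the list is just ["hxpipe"] and the
		 caller is expected to supply the document on standard input.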

EXIT STATUS
       The following exit values are returned:

       0	 Successful completion.

       > 0	 An  error  occurred  in the parsing of	the HTML file.	hxpipe
		 will try to correct the error and produce output anyway.
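
		 Since output is still produced after a parse error, a caller
		 may prefer to keep it and only report the exit status. A
		 generic Python sketch, with the hypothetical helper name
		 run_keep_output; it works for any command, not only hxpipe:

```python
import subprocess
import sys
from typing import List, Tuple


def run_keep_output(argv: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit status, stdout), even on failure."""
    # check=False: a non-zero exit status must not discard the output.
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          universal_newlines=True, check=False)
    if proc.returncode != 0:
        print("warning: %s exited with status %d"
              % (argv[0], proc.returncode), file=sys.stderr)
    return proc.returncode, proc.stdout
```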

ENVIRONMENT
       To use a	proxy to retrieve remote files,	set the	environment  variables
       http_proxy and ftp_proxy.  E.g.,	http_proxy="http://localhost:8080/"

BUGS
       The error recovery for incorrect	HTML is	primitive.

       hxpipe  can  currently only retrieve remote files over HTTP. It doesn't
       handle password-protected files,	nor files  whose  content  depends  on
       HTTP "cookies."

       Option -l ought to work with HTML input (option -H) as well.

SEE ALSO
       hxunpipe(1), hxunent(1),	onsgmls(1).

8.x				  10 Feb 2022			     HXPIPE(1)
