Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
html2xhtml(1)		    General Commands Manual		 html2xhtml(1)

NAME
       html2xhtml - Converts HTML files	to XHTML

SYNTAX
       html2xhtml [ filename ] [ options ]

DESCRIPTION
       Html2xhtml  is  a  command-line	tool that converts HTML	files to XHTML
       files. The path of the HTML input file can be provided  as  a  command-
       line argument. If not, it is read from stdin.

       Xhtml2xhtml  tries always to generate valid XHTML files.	 It is able to
       correct many common errors in input HTML	files without loose of	infor-
       mation.	However,  for some errors, html2xhtml may decide to loose some
       information in order to generate	a valid	XHTML  output.	 This  can  be
       avoided	with  the  -e option, which allows html2xhtml to generate non-
       valid output in these cases.

       Html2xhtml can generate the XHTML output	compliant to one of  the  fol-
       lowing  document	 types:	XHTML 1.0 (Transitional, Strict	and Frameset),
       XHTML 1.1, XHTML	Basic and XHTML	Mobile Profile.

OPTIONS
       The command line	options/arguments are:

       filename		   Read	the HTML input from filename  (optional	 argu-
			   ment).  If  this argument is	not provided, the HTML
			   input is read from standard input.

       -o filename	   Output XHTML	file. The file is  overwritten	if  it
			   exists.  If	not provided, the output is written to
			   standard output.

       -e		   Instructs the program to propagate input chunks  to
			   the	output	even  if it is unable to adapt them to
			   the output XHTML doctype. Using  this  option,  the
			   XHTML  output  may be non-valid. Not	using this op-
			   tion, some input data could	be  removed  from  the
			   output in some [rare] cases.

       -t output-doctype   Doctype of the output XHTML file. If	not specified,
			   the	program	selects	automatically either XHTML 1.0
			   Transitional	or XHTML 1.0 Frameset depending	on the
			   input. Current available doctypes are:
			    o transitional XHTML 1.0 Transitional
			    o frameset XHTML 1.0 Frameset
			    o strict XHTML 1.0 Strict
			    o 1.1 XHTML	1.1
			    o basic-1.0	XHTML Basic 1.0
			    o basic-1.1	XHTML Basic 1.1
			    o mp XHTML Mobile Profile
			    o print-1.0	XHTML Print 1.0

       --ics input_charset Character set of the	input  document.  This	option
			   overrides the default input character set detection
			   mechanism.

       --ocs output_charset
			   Character  set  for	the  output XHTML document. If
			   this	option is not present, the  character  set  of
			   the input is	used as	default.

       --lcs		   Dump	 the  list  of available character set aliases
			   and exit html2xhtml.	 No  conversion	 is  performed
			   when	this option is present.

       -l line_length	   Number of characters	per line. The default value is
			   80.	 It  must be greater or	equal to 40, otherwise
			   the parameter is ignored.

       -b tab_length	   Tab length in number	of characters. It  must	 be  a
			   number between 0 and	16, otherwise the parameter is
			   ignored.  Use 0 to avoid indentation	in the output.

       --preserve-space-comments
			   Use	this option to preserve	white spaces, tabs and
			   ends	of lines in HTML comments. The default,	if not
			   provided, is	to rearrange spacing.

       --no-protect-cdata  Enclose CDATA sections in "script" and "style" fol-
			   lowing   the	  XHTML	  1.0	specification	(using
			   "<!CDATA[["	and  "]]>").  It might be incompatible
			   with	some browsers.	The default in this version is
			   to enclose CDATA sections using  "//<!CDATA[["  and
			   "//]]>", because major browsers handle it properly.

       --compact-block-elements
			   No  white spaces or line breaks are written between
			   the start tag of a block element and	the start  tag
			   of  its first enclosed inline element (or character
			   data) and between the end tag of its	last  enclosed
			   inline  element (or character data) and the end tag
			   of the block	element. By default, if	this option is
			   not set, a new line character  and  indentation  is
			   written between them.

       --compact-empty-elm-tags
			   Do  not  write  a  whitespace  before the slash for
			   empty element tags (i.e. write "<br/>"  instead  of
			   the default "<br />").  Note	that although both no-
			   tations  are	correct	in XML,	the XHTML 1.0 standard
			   recommends the latter to improve compatibility with
			   old browsers.

       --empty-elm-tags-always
			   By default, empty element tags are written only for
			   elements declared as	empty in the DTD. This	option
			   makes  any element not having content to be written
			   with	the empty element tag, even if it is  not  de-
			   clared  as  empty in	the DTD. This option may cause
			   problems when  the  XHTML  document	is  opened  by
			   browsers in HTML (tag soup) mode.

       --dos-eol	   Write  the output XHTML file	with DOS--style	(CRLF)
			   end of line,	instead	of the default UNIX--style end
			   of line.  Both end of line styles  are  allowed  by
			   the XML recommendation.

       --generate-snippet  Treat  the  input  as an HTML fragment instead of a
			   full	document.  The output will also	be  a  snippet
			   and will not	contain	either the XML and doctype de-
			   clarations or the html, head	and body elements.

       --help		   Show	a brief	help message and exit.

       --version	   Show	the version number and exit.

NOTE ON	CHARACTER SETS
       Since  version  1.1.2,  html2xhtml  is able to parse and	write HTML and
       XHTML documents using the most popular character	sets / encodings.   It
       is also able to read the	input document using a given character set and
       generate	 an  output that uses a	different character set. The iconv im-
       plementation in the GNU C library is used with that purpose.

       Any IANA-registered character set that is supported by  the  iconv  li-
       brary  may  be  used.  When  naming a character set, any	IANA--approved
       alias for it may	be used.  The  full  list  of  aliases	recognised  by
       html2xhtml can be obtained with the --lcs command-line option.

       If the character	set of the input document is not specified, html2xhtml
       tries  to  guess	 it automatically.  If the character set of the	output
       document	is not specified, html2xhtml writes the	output using the  same
       character set as	the input document.

NOTE ON	END OF LINE CHARACTES
       By  default,  the  UNIX-style  one-byte	end of line is used. It	can be
       changed to DOS-style CRLF end of	line with the  --dos-eol  command-line
       option.

       However,	 when the program is compiled in the MinGW environment and the
       output is sent to standard output, the  output  is  automatically  con-
       verted  by the environment to CRLF by default. Do not use the --dos-eol
       command-line option in that situation.  When the	output is  sent	 to  a
       file  with the -o command-line option, the output is as expected	(UNIX-
       style by	default), and the --dos-eol option may be used.

ACKNOWLEDGMENTS
       Program developer up to current version:
       Jesus Arias Fisteus <jaf@it.uc3m.es>

       The first working version of this program has been developed as
       a Master	Thesis at the University of Vigo (Spain) [http://www.uvigo.es],
       advised by:

       Rebeca Diaz Redondo
       Ana Fernandez Vilas

       Copyright 2000-2001 by Jesus Arias Fisteus, Rebeca Diaz Redondo,	Ana
       Fernandez Vilas.
       Copyright 2002-2009 by Jesus Arias Fisteus

								 html2xhtml(1)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=html2xhtml&sektion=1&manpath=FreeBSD+Ports+14.3.quarterly>

home | help