html2text(1)		    General Commands Manual		  html2text(1)

NAME
       html2text - an advanced HTML-to-text converter

SYNOPSIS
       html2text -help
       html2text -version
       html2text [ -check ] [ -debug-scanner ] [ -debug-parser ] [ -rcfile
       path ] [ -width width ] [ -o output-file ] [ -nobs ] [ -links ]
       [ -from_encoding encoding ] [ -to_encoding encoding ] [ -ascii ]
       [ -utf8 ] [ input-file ... ]

DESCRIPTION
       html2text reads HTML documents from the input-files,  formats  each  of
       them  into  a stream of UTF-8 encoded characters, and writes the	result
       to standard output (or into output-file,	if the -o command line	option
       is used).

       If  no  input-files  are	specified on the command line, html2text reads
       from standard input. A dash as the input-file is	an  alternate  way  to
       specify standard	input.

       html2text understands all HTML 3.2 constructs, but can render only part
       of  them	due to the limitations of the text output format. However, the
       program attempts	to provide good	substitutes for	the elements it	cannot
       render. html2text parses HTML 4 input, too, though not always as
       successfully as other HTML processors. It also accepts syntactically
       incorrect input and attempts to interpret it "reasonably".

       The  way	 html2text formats the HTML documents is controlled by format-
       ting properties read from an  RC	 file.	 html2text  attempts  to  read
       $HOME/.html2textrc  (or	the file specified by the -rcfile command line
       option);	if that	file  cannot  be  read,	 html2text  attempts  to  read
       /etc/html2textrc.   If  no  RC file can be read (or if the RC file does
       not override all	formatting properties),	then "reasonable" defaults are
       assumed.	The RC file format is described	in the	html2textrc(5)	manual
       page.
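
       The lookup order described above can be sketched in Python. This is
       only an illustration of the documented order, not the program's own
       code; the function names rc_candidates and pick_rc_file are
       hypothetical, and "can be read" is assumed to mean "is a readable
       regular file".

```python
import os
from pathlib import Path
from typing import List, Optional

SYSTEM_RC = Path("/etc/html2textrc")  # system-wide file named in the text


def rc_candidates(rcfile: Optional[str] = None,
                  home: Optional[str] = None) -> List[Path]:
    """List RC file paths in the order the text says they are tried."""
    if rcfile is not None:
        # A path given with -rcfile takes the place of $HOME/.html2textrc.
        first = Path(rcfile)
    else:
        if home is None:
            home = os.environ.get("HOME", "")
        first = Path(home) / ".html2textrc"
    return [first, SYSTEM_RC]


def pick_rc_file(candidates: List[Path]) -> Optional[Path]:
    """Return the first readable regular file, or None (defaults apply)."""
    for path in candidates:
        # Assumption: "can be read" means a readable regular file.
        if path.is_file() and os.access(path, os.R_OK):
            return path
    return None


if __name__ == "__main__":
    chosen = pick_rc_file(rc_candidates())
    print(chosen if chosen is not None else "no RC file; defaults apply")
```

       A result of None corresponds to the case in which no RC file can be
       read and the built-in defaults are used for every formatting property.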

OPTIONS
       -ascii By default, html2text uses UTF-8 for its output. With this
	      option, plain ASCII is used instead, and non-ASCII characters
	      are rendered via iconv's transliteration feature. This option
	      is therefore an alias for -to_encoding ASCII//TRANSLIT.

       -utf8  Assume both terminal and input stream are	in UTF-8 mode. This is
	      an alias for -from_encoding UTF-8	-to_encoding UTF-8.

       -from_encoding encoding
	      Sets  the	 encoding of the input file or stream to the given en-
	      coding.  By default, html2text tries to obtain the encoding from
	      the input	file and uses ISO-8859-1 encoding  as  fallback.   You
	      might  want to override the detection or fallback	using this op-
	      tion.  See iconv -l for a	list of	supported encodings.

       -to_encoding encoding
	      Use the given encoding while writing output, instead  of	UTF-8.
	      The  same	 set of	encodings as for the -from_encoding option can
	      be used.

       -check This option is for diagnostic purposes:  The  HTML  document  is
	      only  parsed and not processed otherwise.	In this	mode of	opera-
	      tion, html2text will report on parse  errors  and	 scan  errors,
	      which  it	 does not in other modes of operation. Note that parse
	      and scan errors are not fatal for	html2text, but may cause  mis-
	      interpretation  of the HTML code and/or portions of the document
	      being swallowed.

       -debug-parser
	      Let html2text report on the tokens being shifted, rules being
	      applied, etc., while parsing the HTML document. This option is
	      for diagnostic purposes.

       -debug-scanner
	      Let html2text report on each lexical token scanned, while	 scan-
	      ning the HTML document. This option is for diagnostic purposes.

       -help  Print command line summary and exit.

       -nobs  By  default, html2text renders underlined	letters	with sequences
	      like "underscore-backspace-character" and	boldface letters  like
	      "character-backspace-character",	which works fine when the out-
	      put is piped into	more(1), less(1), or similar. For other	appli-
	      cations, or when redirecting the output into a file, it  may  be
	      desirable	not to render character	attributes with	such backspace
	      sequences,  which	can be accomplished with this command line op-
	      tion.

       -o output-file
	      Write the	output to output-file instead of  standard  output.  A
	      dash as the output-file is an alternate way to specify the stan-
	      dard output.

       -rcfile path
	      Attempt to read the file specified by path as the RC file.

       -links Tag all external links with a number in brackets ([num]) and
	      produce a numbered list of all link targets at the end of the
	      document.

       -version
	      Print program version and	exit.

       -width width
	      By default, html2text formats the	HTML documents	for  a	screen
	      width  of	 79 characters.	If redirecting the output into a file,
	      or if your terminal has a	width other than 80 characters,	or  if
	      you  just	want to	get an idea how	html2text deals	with large ta-
	      bles and different terminal widths, you may want	to  specify  a
	      different	width.
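
       The backspace sequences described under -nobs can be illustrated with
       a short Python sketch. It only shows what the byte sequences look like
       and how a consumer could remove them afterwards; it is not how
       html2text produces its output, and the helper names underline, bold
       and strip_overstrike are hypothetical.

```python
import re

BACKSPACE = "\b"


def underline(text: str) -> str:
    """Encode each character as underscore-backspace-character."""
    return "".join("_" + BACKSPACE + ch for ch in text)


def bold(text: str) -> str:
    """Encode each character as character-backspace-character."""
    return "".join(ch + BACKSPACE + ch for ch in text)


def strip_overstrike(text: str) -> str:
    """Drop every "any character + backspace" pair, leaving plain text."""
    return re.sub(".\b", "", text)


if __name__ == "__main__":
    sample = underline("Hi") + " " + bold("ok")
    print(repr(sample))                    # shows the raw \x08 bytes
    print(repr(strip_overstrike(sample)))  # plain text without attributes
```

       A pager such as less(1) interprets these pairs as underline and
       boldface, whereas a file or another program sees the raw backspace
       characters, which is why -nobs may be preferable in those cases.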

FILES
       /etc/html2textrc
	      System-wide parser configuration file.

       $HOME/.html2textrc
	      Personal parser configuration file; overrides the system-wide
	      values.

CONFORMING TO
       HTML 3.2	(HTML 3.2 Reference Specification -  http://www.w3.org/TR/REC-
       html32).

RESTRICTIONS
       html2text was written to	convert	HTML 3.2 documents. When using it with
       HTML 4 or even XHTML 1 documents, some constructs present only in these
       HTML versions might not be rendered.

AUTHOR
       html2text was written up to version 1.2.2 by Arno Unkrig
       <arno@unkrig.de> for GMRS Software GmbH, Unterschleissheim.

       Up to version 1.3.2a, the maintainer was Martin Bayer
       <mbayer@zedat.fu-berlin.de>.

       This version of html2text comes from
       https://github.com/grobian/html2text. Please use the GitHub page to
       file issues and improvements.

SEE ALSO
       html2textrc(5), less(1),	more(1)

				  2020-04-15			  html2text(1)
