rdrview(1)		    General Commands Manual		    rdrview(1)

NAME
       rdrview - extract readable content from a webpage

SYNOPSIS
       rdrview	[-v] [-u base-url] [-E encoding] [-A user-agent] [-T template]
       [-P] [-c|-H|-M|-B browser] [path|url]

DESCRIPTION
       rdrview attempts to extract the meaningful content from a webpage, as
       done by the "Reader View" feature of most modern browsers.  It's
       intended to be used with terminal RSS readers, to clean up articles
       for display in text-mode web browsers such as lynx.

       If  no url or path is provided, the HTML	will be	read from standard in-
       put.  By	default, rdrview will check mailcap for	a way to  display  the
       content	as text.  If preferred,	a browser can be specified with	the -B
       option, or with the RDRVIEW_BROWSER environment variable.

EXAMPLES
       If you have a text mode browser,	you can	extract	content	with just:
	  rdrview 'https://en.wikipedia.org/wiki/World_wide_web'

       To see the same article in a browser:
	  rdrview -B firefox 'https://en.wikipedia.org/wiki/World_wide_web'

       To clean	up local HTML files:
	  rdrview -H -u	'http://fakehost.com' <	source.html > result.html

       To mediate between the newsboat(1) feed reader and lynx(1):
	  BROWSER='rdrview -B lynx' newsboat
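
       The -T option described under OPTIONS can be wrapped in a small shell
       function so that every article starts with its title and byline.  This
       is only a sketch: the function name rdrview_full is hypothetical, and
       the field list is one possible choice.

```shell
# Hypothetical wrapper: prepend the title and byline to the extracted body.
# Any further arguments (options, path or URL) are passed through unchanged.
rdrview_full() {
    rdrview -T 'title,byline,body' "$@"
}
```

       It would then be called like rdrview itself, for example
       rdrview_full -B lynx 'https://en.wikipedia.org/wiki/World_wide_web'.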

OPTIONS
       -c, --check
	      Don't extract content, just run a	quick check to see if the doc-
	      ument appears to have any.  Exit status is 0 in that case, or  1
	      otherwise.

       -u base-url, --base=base-url
	      Specify  the base	to be used for all relative URLs.  This	option
	      is most useful for local files and  standard  input,  where  the
	      document's URL may be unknown.

       -v, --version
	      Print the	version	number of rdrview and exit.

       -A user-agent, --agent=user-agent
	      Specify  the user-agent string.  The default should work fine in
	      most situations.

       -B browser, --browser=browser
	      Specify a	browser	to display the result.

       -E encoding, --encoding=encoding
	      Specify the character encoding of	the source.  By	 default,  the
	      meta tags	will be	checked.

       -H, --html
	      Output  the  raw	HTML  for the extracted	article.  WARNING: the
	      markup may still contain some scripts so,	if you plan to open it
	      with a modern browser at some point, first check how  it	imple-
	      ments the	same-origin policy for local files.

       -M, --meta
	      Output only the metadata for the article.

       -P, --preserve-classes
	      Don't remove HTML class attributes.

       -T template, --template=template
	      Pick the metadata	to include in the extracted article.  The tem-
	      plate is a comma-separated list of some of the following:	title,
	      body,  byline,  excerpt,	sitename, url.	The order matters, and
	      metadata fields can be repeated.	By default, only the  body  is
	      included.

       --disable-sandbox
	      Disable  the  security sandbox.  This option is potentially dan-
	      gerous, so don't use it unless you know what you are doing.

EXIT STATUS
       The exit	status is 0 on success,	1 on failure.
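
       Because the exit status follows the usual shell convention, the -c
       check can guard the actual extraction in a script.  A minimal sketch,
       assuming a hypothetical helper named read_if_article:

```shell
# Hypothetical helper: show the article only if rdrview finds readable content.
read_if_article() {
    url=$1
    # -c runs the quick check only; exit status 0 means content was detected.
    if rdrview -c "$url"; then
        rdrview "$url"
    else
        printf 'no readable content: %s\n' "$url" >&2
        return 1
    fi
}
```

       The helper returns 1 when the check fails, so callers can chain it
       with && or || in the same way as rdrview itself.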

ENVIRONMENT
       Any environment variable understood by curl(1) can be used here.
       TMPDIR is respected as well.

       RDRVIEW_BROWSER
	  Default  browser  to	display	 the extracted articles. The -B	option
	  overrides this.

       RDRVIEW_TEMPLATE
	  Default template for article content.	The -T option overrides	 this;
	  see that option for details.

       RDRVIEW_USER_AGENT
	  Default user-agent string, overridden	by the -A option.
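
       These variables would typically be set once in a shell startup file.
       The values below are illustrative assumptions, not defaults shipped
       with rdrview:

```shell
# Hypothetical values; pick whatever browser and fields suit your setup.
export RDRVIEW_BROWSER='lynx'          # used unless -B is given
export RDRVIEW_TEMPLATE='title,body'   # used unless -T is given

# curl(1) honors proxy variables such as https_proxy. The address below is
# a placeholder, so the line is left commented out.
# export https_proxy='http://proxy.example:3128'
```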

BUGS
       The  markup produced by the -H option is	a huge mess.  If you intend to
       work with it you	may want to pipe it to something like tidy(1) first.
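
       One way to do that is sketched below.  The function name clean_article
       is hypothetical, and the tidy(1) flags shown (-q for quiet, -i for
       indentation) are just one reasonable choice:

```shell
# Hypothetical helper: extract the article as HTML and tidy up the markup.
clean_article() {
    # -H makes rdrview emit raw HTML; tidy reads it from standard input.
    rdrview -H "$1" | tidy -q -i
}
```

       The pipeline's exit status is that of tidy(1), which may be non-zero
       when it only emits warnings, so it should not be read as rdrview's
       own result.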

       If you have a version of	the libraries that hasn't been tested, the se-
       curity sandbox might not	allow the code to run.	 Please	 report	 this,
       but  in	the  meantime,	an  option is provided to disable the sandbox.
       Don't use it unless you have other security measures in place.

AUTHOR
       Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>

       Please report bugs via email or, if preferred, file a GitHub issue at
       https://github.com/eafer/rdrview/issues.

       Credit goes to Readability.js by Mozilla; this tool is mostly a
       transpilation of their code, done by hand.

SEE ALSO
       lynx(1),	newsboat(1)

0.1.4				   May 2025			    rdrview(1)
