FreeBSD Manual Pages
rdrview(1)                  General Commands Manual                 rdrview(1)

NAME
     rdrview - extract readable content from a webpage

SYNOPSIS
     rdrview [-v] [-u base-url] [-E encoding] [-A user-agent] [-T template]
             [-P] [-c|-H|-M|-B browser] [path|url]

DESCRIPTION
     rdrview attempts to extract the meaningful content from a webpage, as
     done by the "Reader View" feature of most modern browsers.  It's
     intended to be used with terminal RSS readers, to clean up the articles
     for display on web browsers such as lynx.

     If no url or path is provided, the HTML will be read from standard
     input.

     By default, rdrview will check mailcap for a way to display the content
     as text.  If preferred, a browser can be specified with the -B option,
     or with the RDRVIEW_BROWSER environment variable.

EXAMPLES
     If you have a text mode browser, you can extract content with just:

           rdrview 'https://en.wikipedia.org/wiki/World_wide_web'

     To see the same article in a browser:

           rdrview -B firefox 'https://en.wikipedia.org/wiki/World_wide_web'

     To clean up local HTML files:

           rdrview -H -u 'http://fakehost.com' < source.html > result.html

     To mediate between the newsboat(1) feed reader and lynx(1):

           BROWSER='rdrview -B lynx' newsboat

OPTIONS
     -c, --check
             Don't extract content, just run a quick check to see if the
             document appears to have any.  Exit status is 0 in that case,
             or 1 otherwise.

     -u base-url, --base=base-url
             Specify the base to be used for all relative URLs.  This option
             is most useful for local files and standard input, where the
             document's URL may be unknown.

     -v, --version
             Print the version number of rdrview and exit.

     -A user-agent, --agent=user-agent
             Specify the user-agent string.  The default should work fine in
             most situations.

     -B browser, --browser=browser
             Specify a browser to display the result.

     -E encoding, --encoding=encoding
             Specify the character encoding of the source.  By default, the
             meta tags will be checked.

     -H, --html
             Output the raw HTML for the extracted article.

             WARNING: the markup may still contain some scripts so, if you
             plan to open it with a modern browser at some point, first
             check how it implements the same-origin policy for local files.

     -M, --meta
             Output only the metadata for the article.

     -P, --preserve-classes
             Don't remove HTML class attributes.

     -T template, --template=template
             Pick the metadata to include in the extracted article.  The
             template is a comma-separated list of some of the following:
             title, body, byline, excerpt, sitename, url.  The order
             matters, and metadata fields can be repeated.  By default, only
             the body is included.

     --disable-sandbox
             Disable the security sandbox.  This option is potentially
             dangerous, so don't use it unless you know what you are doing.

EXIT STATUS
     The exit status is 0 on success, 1 on failure.

ENVIRONMENT
     Any environment variable understood by curl(1) can be used here.
     TMPDIR is respected as well.

     RDRVIEW_BROWSER
             Default browser to display the extracted articles.  The -B
             option overrides this.

     RDRVIEW_TEMPLATE
             Default template for article content.  The -T option overrides
             this; see that option for details.

     RDRVIEW_USER_AGENT
             Default user-agent string, overridden by the -A option.

BUGS
     The markup produced by the -H option is a huge mess.  If you intend to
     work with it you may want to pipe it to something like tidy(1) first.

     If you have a version of the libraries that hasn't been tested, the
     security sandbox might not allow the code to run.  Please report this,
     but in the meantime, an option is provided to disable the sandbox.
     Don't use it unless you have other security measures in place.

AUTHOR
     Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>

     Please report bugs via email or, if preferred, file a GitHub issue at
     https://github.com/eafer/rdrview/issues.

     Credits to Readability.js by Mozilla; this tool is mostly a
     transpilation of their code done by hand.

SEE ALSO
     lynx(1), newsboat(1)

0.1.4                              May 2025                         rdrview(1)
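
The -c check and the -T template described under OPTIONS can plausibly be combined in a script. The following is a minimal sketch, not taken from the rdrview documentation: it first asks rdrview whether the page appears to have readable content, and only then extracts it as HTML with the title and byline placed before the body. The function name read_if_readable is a hypothetical helper introduced here for illustration; the flags are the ones listed in this manual.

```shell
#!/bin/sh
# read_if_readable is a hypothetical helper, not part of rdrview.
# It relies only on the documented exit status of -c (0 if the document
# appears to have content, 1 otherwise) and on the -H and -T options.
read_if_readable() {
    url=$1
    if rdrview -c "$url"; then
        # Template order matters: title first, then byline, then body.
        rdrview -H -T title,byline,body "$url"
    else
        echo "no readable content: $url" >&2
        return 1
    fi
}
```

It could then be called as read_if_readable 'https://en.wikipedia.org/wiki/World_wide_web' > article.html. Given the BUGS note about the -H markup, piping the result through tidy(1) before further processing may be worthwhile.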
