Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
NETSTIFF(1)			   netstiff			   NETSTIFF(1)

NAME
       netstiff	- powerful and easy tool to check for Web and FTP updates

SYNOPSIS
       netstiff	[options] [command]

DESCRIPTION
       Netstiff	(formerly known	as webdiff) is a powerful and easy-to-use tool
       which checks for	Web page and/or	FTP site updates.

       For  the	Web, updates are recognized using several test criteria	(diff,
       html, size, date, md5sum, regexp).  The FTP update checker is only able
       to diff on directory listings and files and to compare size and date of
       files.

       Without a given command,	netstiff will check for	updates	of the	speci-
       fied URIs and then print	the changes.  If no configuration file exists,
       the configurator	is launched instead.

       Netstiff	 exits	after all configured URIs are checked.	Occuring warn-
       ings and	errors leave a message in the log  file	 (~/.netstiff/lastlog)
       and on stderr.  Use it with cron	if you want to check for updates regu-
       larly.

COMMANDS
       You  can	only pass one command to netstiff. It has to be	the last argu-
       ment in the argument list.

       Commands	may be shortened down to one character (e.g. c instead of con-
       figure).	Leading	dashes are ignored.

       If you start netstiff without command, the full command will be used.

       configure
	      Use this command if you want to start the	configurator, the  in-
	      teractive	 configuration	tool  of  netstiff. Of course, you may
	      also edit	the configuration file in ~/.netstiff/config by	 hand.
	      Using  the configurator is recommended if	you are	a new netstiff
	      user, because it explains	the possible test  methods,  validates
	      your  regexps, etc.  Nevertheless, the configuration file	format
	      is very easy.  See section CONFIGURATION FILE.
	      The configurator will not	 initialize  the  netstiff  cache  for
	      added  URIs,  i.e. it will not download anything.	 To do so, you
	      have to run netstiff update first.  This is a feature.
	      If the config file does not  exit,  the  configuration  tool  is
	      started automatically.

       diff   Use  this	command	if you want to see the differences between two
	      versions of  saved  content  (Web	 pages	or  meta  data).   See
	      diff(1).

	      The  version  after  the last reset (or the initial version) and
	      the version of the last update will be compared.

       full   Use this command if you simply want netstiff to  check  for  up-
	      dates and	print the diff.

	      full is a	simple replacement for the following sequence:
	      netstiff update >	/dev/null
	      netstiff diff
	      netstiff reset

       help   Use  this	command	to get usage information about netstiff. To be
	      honest, this manual page in conjunction with the configurator is
	      a	better documentation.

       reset  Use this command after you noticed all differences with the diff
	      command (see above), so that diff	will not  show	you  the  same
	      changes again and	again.

       update Use this command if you want netstiff to fetch the data from the
	      specified	 URIs  and  show  you only those - one per line	- that
	      have changed since your last update.

       version
	      This command will	display	version	number and copyright.

OPTIONS
       You may pass the	following options.

       --no-stderr, -S
	      Use this option  to  suppress  warning  and  error  messages  on
	      stderr.  Thus the	messages can only be seen in the log file.

       --workdir DIR, -W DIR
	      Use  this	 option	 if you	want to	specify	another	working	direc-
	      tory. The	working	directory  is  the  directory  where  netstiff
	      reads  the  configuration	 file,	stores the downloaded data and
	      writes it	logs.  It defaults to ~/.netstiff.  See	 also  section
	      BUGS.

RESTRICTIONS
       There  is  no  special  case  to	handle status codes other than 200. In
       practice, netstiff will neither follow redirections nor will it	notice
       any  4xx	 or  5xx  error	code. The resulting error pages	are treated as
       usual Web pages.	No logged message. Please check	on your	own.

USAGE EXAMPLE
       You want	to add a new URI netstiff should check for updates.
	       netstiff	conf
       The configurator	is not described here. I know some weaknesses  in  us-
       ability,	but you	can get	along with it.

       When  you  are  seeing  your shell prompt again,	you know that netstiff
       should retrieve an initial version of the Web page you specified.
	       netstiff	update
       After some weeks	in the sun you want to see if something	 has  changed.
       So you let netstiff check for updates.
	       netstiff
       It is printing an URI! Let's see	the changes!
	       netstiff	diff
       Oh, it is so much, that it does not fit on a screen!
	       netstiff	d | pager
       Now  you	are satisfied because you read all the changes.	So you finally
       do
	       netstiff	reset
       and netstiff forgets about the changes.

CONFIGURATION FILE
       There is	no need	to manually edit the configuration file	WORKDIR/config
       (usually	~/.netstiff/config), because netstiff configure	should do  the
       job.   But  sometimes it	is easier to edit a simple file	than to	browse
       through menus, or you are writing another application that changes net-
       stiff settings.	So it is useful	to describe the	file format here.

   RULES
	 Whitespace at	the begin and end of each line is ignored.

	 Empty	lines are ignored.

	 A line beginning with	# is regarded as comment.

	 A line beginning with	+ is regarded as option.  The +	is followed by
	  the option name, some	whitespace and the option value.

	 A line neither beginning with	# nor +	 is  regarded  as  URI.	  URIs
	  without  scheme  (https://,  http://,	ftp://)	are recognized as HTTP
	  URIs.

	 The configurator interprets a	comment	right above an URI as the  ti-
	  tle of the URI.

	 Options  always  apply  to the first URI above.  Options without URI
	  line above are global	options	and affect every  URI  that  does  not
	  override these specific options.

   CONFIGURATION OPTIONS
       The following options are generally available:

       test   sets the test method (or test criteria).
	      See section TEST METHODS for a description.  Defaults to diff.

       timeout
	      sets the timeout (in seconds) for	TCP connections.
	      Defaults to 20.

       The following options only affect HTTP URIs:

       client set the user-agent string.
	      Some  web	 sites check the HTTP header field User-Agent and dis-
	      play different content for different agents.   By	 setting  this
	      field  you can pretend to	use Mozilla Firefox, for example.  Be-
	      cause many log analyzer tools for	webmasters display  statistics
	      about that field,	you may	spread the word	about netstiff by set-
	      ting this	variable to the	truth: netstiff. ;-)
	      Example:	+  client  Mozilla/5.0	(X11;  U;  Linux  i686;	en-US;
	      rv:1.8.1.12) Gecko/20080208 Galeon/2.0.4
	      This option is not set by	default.

       lang   sets the accepted	languages.
	      Internationalized	web sites offer	there  contents	 in  different
	      languages	 and  may check	the HTTP header	field Accept-Language.
	      It contains a list of languages (and sometimes extra information
	      like associated countries) sorted	by priority.  The best way  to
	      get a good value is to copy and paste it from the	preferences of
	      your web browser.
	      Example: de,en;q=0.9
	      This option is not set by	default.

       proxy  sets  HTTP  proxy	host and port.	Must be	in the form host:port.
	      Will fail	if no port is given.

       range  sets the range (in bytes)	to get from a server.
	      Use this option if you are only interested in the	changes	within
	      a	small region of	a big file on a	 HTTP  server.	 Examples  are
	      12000-12500 or 13000- (till the end).
	      The  Range  feature  is  not supported by	all web	servers	or for
	      every content. That means, that some web servers send the	 whole
	      content instead of only the given	range.
	      This option is not set by	default.

       referer
	      sets the referrer.
	      Some web sites check the HTTP header field Referer and refuse to
	      display  the  wished  contents  if  it is	not appropriately set.
	      When clicking on a link in an ordinary web browser, the referrer
	      is set to	the URI, where you clicked on the  link.   By  setting
	      this option to an	URI, you can pretend clicking on a link	on the
	      web  page	of this	URI.  Please do	not use	this option to `adver-
	      tise' your own homepage (so-called referer spamming).
	      This option is not set by	default.

       The following options only affect the test method html:

       htmlcmd
	      sets the command that is used to produce non-HTML	human-readable
	      output. The command will be run on a temporary file.
	      Doing many experiments I got my best  results  using  +  htmlcmd
	      lynx  -nolist -dump.  Other dumpers had features,	like justified
	      text or well-formatted tables, that turned out to	 be  disadvan-
	      tages when looking at the	diffs.
	      This  option  is	not  set  by default. If you use the html test
	      method then, a very simple mechanism will	hide HTML tags.	 It is
	      possible to get good results doing that, but it  is  not	likely
	      and thus not recommended to leave	this option unset.

       The following options only affect the test methods diff and html:

       start, end
	      Motivation:  Many	 modern	 or CMS-generated web pages have a dy-
	      namic and	a static part. For example, at the beginning of	a  web
	      page  there  is  always  a  randomly  chosen citation the	author
	      liked. At	the end	there is a calendar showing the	current	 date,
	      a	 weather  analysis  for	 the next days,	and some other useless
	      stuff.  The information you want to  monitor  for	 changes  (the
	      static  part)  is	 situated  between those dynamic parts.	 It is
	      very often possible to figure out	textual	anchors, that indicate
	      the start	or the end of the static part.
	      Using this options you can set regular expressions to  that  an-
	      chors.   For example, if the last	entry of the navigation	bar is
	      Imprint and thereafter comes the static part, set	+  start  /Im-
	      print/.	I hope,	you can	imagine	analogous examples for the end
	      option.
	      Note, that the regular expressions act on	the unprocessed	 input
	      (e.g. HTML source	code), also when using the html	test method.
	      These options are	not set	by default.

       The following options only affect FTP URIs:

       passive
	      is  a  boolean  option  (value true or false, case-insensitive).
	      Passive mode (PASV) will not be used on FTP connections iff  set
	      to false.
	      Defaults to true.

   EXAMPLE
       # this is my netstiff config file
       + test	   html
       + htmlcmd   lynx	-nolist	-dump
       + client	   netstiff
       + lang	   de_DE
       + timeout   6

       # local usage statistics
       http://localhost/stats.php
	 + start   /Statistics/
	 + end	   /Generating page took/

       # sbeyer's homepage
       http://pkqs.net/~sbeyer/

       # buggy scripts test
       http://localhost/buggyscripts/test.cgi
	 + test	/Internal Server Error/

       # muetze's funny	videos
       ftp://foo:duff23@muetze.localnet/funnyvideos/
	 + passive false

TEST METHODS
       The following test methods can be used:

       date   On  HTTP	URIs,  this  method downloads the HTTP header to check
	      when the file has	last been  modified.   To  make	 this  feature
	      work,  the  server  should response the Last-Modified header en-
	      tity.  This behaviour can	become useless when fetching some  dy-
	      namic web	sites.
	      On  FTP URIs, this method	requests the last modification date of
	      the file on the FTP site to check	when the file  has  last  been
	      modified.

       diff   This  method  downloads the HTTP content,	FTP file or FTP	direc-
	      tory listing and saves the two last versions.  Later you can use
	      netstiff diff to take a look at a	diff of	these versions.

       html   This method acts like diff, but assumes to get  HTML  input  and
	      preprocesses it to it more human-readable.
	      See  also	 the description of the	htmlcmd	option in section CON-
	      FIGURATION FILE /	CONFIGURATION OPTIONS.
	      This method is not available on FTP URIs.

       md5sum This method downloads the	HTTP header to check if	 the  MD5  sum
	      has  changed.  The server	should response	the Content-MD5	header
	      entity to	make this method work.
	      Use this method on big binary files on HTTP sites	 and  only  if
	      the server supports it. (netstiff	will tell you.)
	      This method is not available on FTP URIs.

       size   On  HTTP URIs, this method downloads the HTTP header to check if
	      the file size has	changed.  This feature needs the server	to re-
	      sponse the Content-Length	header entity.
	      On FTP URIs, this	method requests	the size of the	 file  on  the
	      FTP site to check	if it has changed.

       /regexp/
	      This  method  downloads the HTTP content and checks if the given
	      regular expression matches or not.  The URI  is  prompted	 (when
	      using update) iff	this match status has changed.
	      This method is not available on FTP URIs.

RETURN VALUE
       The number of errors are	returned. So exit code 0 is success.

BUGS
       The  regular  expression	stuff is using the eval	function of Ruby. This
       means that you are able to do  non-regex-related	 stuff	using  special
       strings as `regular expressions'. This is a big security	issue when us-
       ing  netstiff  as  a backend for	e.g. Web applications. So do NOT do it
       and NEVER start netstiff	on foreign, unchecked configurations  (-W  can
       be dangerous).

       Feel free to send feedback, bug reports,	etc.

AUTHOR AND COPYRIGHT
       (C) 2004, 2007-2008 Stephan Beyer <s-beyer@gmx.net>, GNU	GPL

sbeyer				   20080331			   NETSTIFF(1)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=netstiff&sektion=1&manpath=FreeBSD+Ports+15.0.quarterly>

home | help