DIABLO(8)		    System Manager's Manual		     DIABLO(8)

NAME
       diablo -	NetNews	daemon for backbone article transit

SYNOPSIS
       diablo [	-A newsadminname ] [ -B	ip/hostname[:port] ] [ -c  commonpath-
       name  ] [ -d[n] ] [ -e pctimeout	] [ -F filterpath ] [ -h reportedhost-
       name ] [	-M maxforkper ]	[ -P port ] -p newspathname/0 [	-R rxbufsize ]
       [ -S[Bn[sn]][Nn[sn]] ] [	 -s  argv-buffer-space-for-ps-status  ]	 [  -T
       txbufsize ] [ -X	 xrefhost ] [ -x ] server

DESCRIPTION
       Diablo is an internet news backbone storage and transit server.  Diablo
       sits on the NNTP port of the machine and accepts inbound news articles
       from Innd or Diablo based servers, or really anything that can run
       innxmit or newslink.  Diablo stores the articles and handles the
       queueing for outbound feeds.  Queue files are in a dnewslink-compatible
       format, and dnewslink is supplied with the distribution.  Diablo is
       about 10-20 times as efficient as Innd when dealing with inbound
       traffic, mainly because it is a forking server.  Diablo's memory
       footprint of less than a megabyte is tiny compared to Innd.  Diablo
       was initially written by Matt Dillon over a weekend and has grown from
       there.

       Many of the options below can be	configured in diablo.config.

       -A newsadminname

       The news	administrator email address that is  reported  in  the	banner
       message for new connections. This defaults to ``news@hostname''.

       -B ip/hostname[:port]

       Specify	the IP address or hostname for an interface that diablo	should
       sit on.	The port can be	specified after	a ':'. The default is all  in-
       terfaces.

       -c commonpathname sets the common path name.  The path is prepended to
       the Path: header only if it does not already exist in the Path: header.
       Usually both -p and -c options are used.  The newspathname is placed in
       front of the commonpathname in this case (assuming it does not exist
       elsewhere in the path).  Path elements are added (along with -p) in
       the order specified on the command line.

       -d[n] turns on debugging.  Specifying  a	 number	 increases  the	 debug
       level.

       -e pctimeout sets the precommit cache timeout.  The default is 30
       seconds.  Setting it to 0 disables the precommit cache.  The precommit
       cache is a check/ihave message-id lockout used to prevent simultaneous
       reception of the same article.  The first client to send a check for a
       message-id wins.  Other clients will get a dup or reject return code
       for that message-id for 30 seconds.
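
       The lockout described above can be pictured with a small sketch.  This
       is only an illustration of the idea, not Diablo's actual data
       structure: the class name, the claim() method and the injectable clock
       are all assumptions made for the example.

```python
import time


class PrecommitCache:
    """Toy message-id lockout; NOT Diablo's real implementation."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout   # seconds; 0 disables the cache (like -e 0)
        self.clock = clock       # injectable so the sketch can be tested
        self.claimed = {}        # message-id -> time of the winning check

    def claim(self, message_id):
        """Return True if the caller may go ahead and send the article."""
        if self.timeout <= 0:
            return True          # cache disabled: nobody is locked out
        now = self.clock()
        first = self.claimed.get(message_id)
        if first is not None and now - first < self.timeout:
            return False         # another client checked this id recently
        self.claimed[message_id] = now
        return True
```

       The first caller to claim a message-id gets True; later callers get
       False until the timeout has passed.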

       -F filterpath

       Specifies the path to the external spamfilter. The path must  be	 fully
       qualified.

       -h reportedhostname

       Diablo calls gethostbyname() to set the hostname	it reports on connect.
       On  some	 systems this will not necessarily be what you want so you can
       override	it with	the -h option.

       -M maxforkper

       Set the maximum number of simultaneous connections from any given
       remote host.  For example, if you set this to 10, each of your feeds
       will be allowed to make up to 10 simultaneous connections to you.  The
       default is 0 (unlimited).

       -P port

       Specify the port that diablo should sit on.  The default is 119.  This
       is commonly used to run a server on a different port (say, 434) so you
       can run a reader on the main port.

       -p newspathname

       Sets the domain name to prepend to the Path: header.  This option is
       required.  If you specify -p0, diablo will not insert anything into
       the Path: header, e.g. when you use Diablo as a bridge rather than a
       full router.  The use of -p0 is NOT RECOMMENDED.  Also note that
       ipaddress.MISMATCH Path: elements will still be added in either case
       if the first element of the Path: on the incoming article does not
       match an alias in the appropriate dnewsfeeds file entry.  Multiple -p
       options may be used and will be added (along with -c options) in the
       order specified on the command line.  The last -p option is used for
       Xref: generation.
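
       As a rough illustration of how -p and -c elements combine, the sketch
       below builds a new Path: value.  It assumes that -p names are always
       prepended and that -c names are skipped when already present; the
       function name and the host names are hypothetical, and the
       ipaddress.MISMATCH handling is deliberately left out.

```python
def prepend_path(path_header, elements):
    """Return a new Path: value.

    elements is a list of (kind, name) pairs in command-line order,
    where kind is "p" (as with -p) or "c" (as with -c).
    """
    existing = path_header.split("!")
    added = []
    for kind, name in elements:
        if kind == "p" and name == "0":
            continue                     # -p0: insert nothing
        if kind == "c" and (name in existing or name in added):
            continue                     # common name already in the path
        added.append(name)
    return "!".join(added + existing)


# Hypothetical host names, for illustration only.
opts = [("p", "news.example.net"), ("c", "common.example.net")]
print(prepend_path("peer.example!not-for-mail", opts))
```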

       -R rxbufsize

       Set the TCP receive buffer size.

       -S[Bn[sn]][Nn[sn]]

       This option enables the internal spamfilter.  Note that an ISPAM entry
       in dnewsfeeds is required before any articles will reach the filter,
       even if it is enabled here.  The ISPAM entry determines which articles
       are sent to the internal filter.  There are two internal filters
       available:

       Duplicate body detection
       NNTP-Posting-Host rate detection

       Each  type is enabled with a different option, which also sets the trip
       value for that type.

       Bn - enables duplicate body detection and sets the  number  of  allowed
       duplicates before further articles are rejected.

       Nn - enables the	NNTP-Posting-Host rate detection and the number	speci-
       fies how	many duplicate hosts are allowed in an hour before extra arti-
       cles from that host are rejected.

       en - set the expire time (in seconds) for the previous 'B' or 'N'
       option.

       sn - set the number of entries in the filter hash table to n for the
       previous 'B' or 'N' option.  The size must be a power of 2.  Default
       is 65536 entries.

       Both  types  of	filters	also make a note of the	number of lines	in the
       body of the article to reduce the possibility of	false duplicates.

       Use of this option causes the creation of 2 files in path_db  that  are
       used to store the filter	hash tables.

       For example, B6s32768 N16 would set the body filter trip to 6 with a
       hash table size of 32768 entries, and the NNTP-Posting-Host filter
       trip to 16 with the default hash table size of 65536.

       The default is disabled (B0 N0).

       The  spam filter	utilizes a fixed-size hash table cache and rate-limits
       postings	with the same number of	lines from the same NNTP-Posting-Host:
       source or with the same body hash.  If the rate exceeds n articles over
       a period	of e seconds, all further matching articles will be rejected.
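
       A minimal sketch of such a fixed-size, rate-limiting table follows.
       It is not the filter Diablo ships: the class name, the use of CRC32 as
       the slot hash, and the rule that the trip value is the number of
       articles still allowed are assumptions made for illustration.

```python
import zlib


class RateFilter:
    """Fixed-size hash table that rate-limits repeated keys (sketch only)."""

    def __init__(self, trip, expire, size=65536):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of 2")
        self.trip = trip              # articles allowed per window (assumed)
        self.expire = expire          # window length in seconds
        self.mask = size - 1
        self.slots = [None] * size    # each slot: (key, first_seen, count)

    def allow(self, source, lines, now):
        """Return False once (source, lines) exceeds the trip value."""
        key = (source, lines)         # line count reduces false duplicates
        i = zlib.crc32(repr(key).encode()) & self.mask
        slot = self.slots[i]
        if slot is None or slot[0] != key or now - slot[1] >= self.expire:
            self.slots[i] = (key, now, 1)   # new, evicted or expired entry
            return True
        count = slot[2] + 1
        self.slots[i] = (key, slot[1], count)
        return count <= self.trip
```

       Because the table has a fixed size, a colliding key simply evicts the
       previous occupant of its slot, so memory use stays bounded at the
       cost of occasionally forgetting an entry.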

       -s argvbufferspace is generally used to reserve buffer space so diablo
       can generate a real-time status in its argv that the system's ps
       command can read.  This does not work with all operating systems.

       -T txbufsize

       Set the TCP transmit buffer size.  A minimum size of 4K is imposed to
       guarantee lockup-free operation in streaming mode.

       -X xrefhost sets the Xref: hostname used when generating Xref: lines.
       The default is to use the newspathname if Xref: generation is enabled.

       -x

       If the active file is enabled and we are not an Xref: slave, then use
       the Xref: line to update the NX field in the active file.  This is
       useful as a backup to the Xref: generator in large installations.

       Diablo  understands  a subset of	the NNTP protocol.  The	basic commands
       it understands are ihave, check,	and takethis.  Diablo also understands
       stat, head, mode	stream,	and quit.  Diablo also implements a number  of
       commands	 to  support remotely configured newsfeeds files on a site-by-
       site basis.  These are feedrset,	feedadd, feeddel, and feedcommit.  Re-
       mote sites may query the	state of outgoing feeds	directed to them  with
       the outq	command.

       Diablo is strictly a news holding and transit server.  It does not
       maintain a newsgroups or active file, and it does not store articles
       in a hierarchy based on the group name.  Diablo stores files in a
       hierarchy based on the time received and a randomly generated
       iteration number.  A new directory is created every 10 minutes and
       each incoming connection creates its own file.  Multiple articles may
       be stored in each file.  Connections that last more than 10 minutes
       will close their current file and open a new one in the new
       directory.  Diablo also maintains a history database, called dhistory,
       which references articles based on their hash code and stores
       reception date and expiration information.  The history database is
       headed by a four-million-entry hash table, followed by linked lists
       of History structures in a machine-readable (but not human-readable)
       format.  It should be noted that the Message-ID is not stored anywhere
       but in the article and in the outbound feed queue files.  If two
       different Message-IDs wind up with the same hash code, one of the
       articles will be lost.  Given (as of this writing) a full feed of
       250,000 articles a day, a maximum lifetime of 16 days, and 62
       significant bits in the hash code, collisions will statistically occur
       only once every 4 billion articles or so.  This is the price for using
       Diablo, and I consider it a minor one.
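
       For a rough feel of the numbers, the sketch below applies the standard
       birthday-bound approximation to a 62-bit hash.  This is a generic
       estimate, not necessarily the calculation behind the figure quoted
       above; it ignores expiration (which only makes collisions rarer), and
       the function names are made up for the example.

```python
import math


def history_entries(articles_per_day, days):
    """Number of live history entries for a given feed and lifetime."""
    return articles_per_day * days


def expected_articles_before_collision(hash_bits):
    """Birthday-bound estimate sqrt(pi/2 * N), with N = 2**hash_bits."""
    return math.sqrt(math.pi / 2 * 2 ** hash_bits)


print(history_entries(250_000, 16))   # 4000000
print(f"{expected_articles_before_collision(62):.2e}")
```

       The estimate lands in the low billions, the same order of magnitude
       as the figure given in the text.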

       The  file names also have an iteration tagged onto the end.  The	itera-
       tion is used to group files within a 10 minute-span directory.	If  an
       article	collision on input occurs, whichever diablo process missed the
       history commit will remove the data associated with  the	 article  from
       its spool file.

       Critical-path  operations  in diablo are	extremely efficient due	to the
       time-locality for most of its operations.  From a time-local  point  of
       view,  files  are  created in the same reasonably-sized directory.  The
       diablo expiration program, dexpire, does not rewrite the history file
       (see didump and diload for that).  Instead it simply scans it, removes
       expired files, and updates the history file in place to indicate the
       fact.   There  are  no softlinks, because the spool is not based	on the
       group name(s).	Cleaning  the  spool  directory	 is  trivial  because,
       frankly,	 there aren't many files in it.	 In a very heavily loaded sys-
       tem, approximately 80 files are created every  10  minutes.   The  only
       real random access is the history file itself.  Due to the fixed-length
       records,	dhistory is around 1/2 the size	of a typical INN history file.
       Since there is no active	or newsgroups file to maintain,	no renumbering
       mechanism is required.  Diablo forks for	each inbound connection	allow-
       ing  history  file lookups and file creates to run in parallel.	Diablo
       uses true record	locking	for history database updates and none  at  all
       for lookups.

       Finally, being strictly transit in nature, Diablo does not attempt to
       act on the contents of the message.  For example, control messages
       are ignored, and Diablo makes no header modifications except to the
       Path: header and to remove the Xref: header, if it exists.  The source
       of the feed is expected to generate a properly formatted article, and
       very little article checking is done until after the article has
       propagated to a newsreader site (beyond Diablo).  Any content-specific
       action which you wish to support must be dealt with through an
       external medium using the outbound feed mechanism.

       Diablo's	dexpire	program	uses a dynamic	expiration  mechanism  whereby
       you give	it a free-space	goal and it scales the dexpire.ctl expirations
       accordingly to reach that goal.	It should be noted that	the expiration
       is  stored in the history file at the time of article reception and NOT
       calculated when dexpire is run.	The dexpire.ctl	file has a  number  of
       features	 that allow you	to scale the expiration	based on the number of
       cross posts and message-size, and to reject messages for	certain	groups
       that are	too large.

       Diablo maintains a pipe between each forked child and the master
       acceptor server and has a mechanism which may be used to issue
       commands to the running system.  The master acceptor server handles
       all outbound feed file queueing, which makes feed file flushing a very
       simple command to issue.  You may also request the master server to
       exit, which propagates to the forked slaves and guarantees that all
       outbound feed files have been flushed.  The program used to issue
       commands to Diablo is called dicmd, and is generally run with flush or
       exit as an argument.  Queue file flushing works in a manner similar to
       Innd in that you are supposed to rename the server-created queue file
       and then flush it with dicmd, but unlike Innd, the file is not wiped
       out if you do not rename it.  It is instead reopened for append.
       Diablo includes two separate programs, dspoolout and dnewslink, which
       do queue file sequencing, management, outbound feeds, and trimming.
       The dnewslink program can run the non-streaming ihave protocol, or can
       run the streaming check/takethis protocol.  It dynamically figures out
       what the remote end can handle.  It should be noted that Diablo can
       run all of its commands fully streamed, not just the check/takethis
       protocol.

CRON JOBS
       Typically  you set up a number of cron jobs to support the running Dia-
       blo server.

       dspoolout -s 9 should generally be run every 5 minutes.  The -s
       argument should generally be twice the cron interval minus one (2x-1);
       see the manual page for dspoolout for more information.

       dexpire -r2000 should generally be run every 4 hours.  -rFREESPACE
       tells dexpire to remove files until the free-space target, in
       megabytes, is reached.  In this example, we have a 2GB free-space
       target.  Once your system has stabilized, you can reduce this to 1GB
       safely, and less if you are not taking a full feed.  It should be
       roughly equivalent to 5% of your available news spool space.  You may
       have to run dexpire more often with tighter free-space margins.
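
       The two rules of thumb above (2x-1 for dspoolout -s, and roughly 5% of
       the spool for dexpire -r) can be turned into a small helper when
       generating crontab entries.  The function names and the 40000 MB
       spool size are assumptions for the example; only the -s and -r
       options themselves come from this page.

```python
def dspoolout_s_arg(cron_interval_minutes):
    """Twice the cron interval minus one, as suggested for dspoolout -s."""
    return 2 * cron_interval_minutes - 1


def dexpire_args(spool_mb, fraction=0.05):
    """Build a dexpire command line with a free-space target in megabytes."""
    target_mb = round(spool_mb * fraction)
    return ["dexpire", f"-r{target_mb}"]


print(dspoolout_s_arg(5))     # 9
print(dexpire_args(40000))    # ['dexpire', '-r2000']
```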

       The adm/biweekly.atrim script should generally be run twice a week.
       The script shuts down the diablo server, then renames and rewrites the
       dhistory file using a combination of didump and diload to remove
       expired entries over 16 days old.  The dhistory file is typically
       about 1/2 the size of an INN history file for a full feed, so it is
       not necessary to run this script more than once a week.  Diablo must
       be shut down during this procedure to prevent appends to the older
       version of the history file from occurring.

       adm/daily.atrim rotates the log files in the log/ directory.  If you
       are using syslog to generate /var/log/news or other log files, you
       need to have appropriate crontab entries to rotate them as well.

LOGGING
       Diablo logs through syslog to the NEWS facility.  It typically
       generates both per-connection statistics and global statistics.

       The per-connection statistics are made up of two	lines.	Each line con-
       tains key=value pairs as	described below.

       secs - elapsed time of connection

       ihave - number of IHAVE nntp commands received

       chk - number of CHECK nntp commands received

       rec - number of articles	received from remote

       rej - of	the received articles, the number rejected

       predup - number of articles sent via takethis that were determined to
       be duplicates prior to the first byte of the article being received.

       posdup -	(meaningless)

       pcoll - pre-commit cache collision.  Typically indicates that either a
       history collision occurred against some other article simultaneously
       in transit or that a history collision occurred with recently received
       message-ids.

       spam - number of	articles determined to be spam by the spam filter

       err - number of errors that occurred.  Typically protocol errors

       added - of the received articles, the number committed to the spool

       bytes - number of bytes committed to the	spool

       The second statistics line contains key-value pairs as shown below.

       acc - number of articles	accepted

       ctl - of	the accepted articles, how many	were control messages

       failsafe	- rejected due to failsafe, typically means that the spool di-
       rectory structure got messed up.

       misshdrs	 -  rejected due to missing required headers.	Can also occur
       when the	feeder sends an	empty article  (  typically  occurs  when  the
       feeder cannot find the article in its spool ).

       tooold -	rejected for being too old.

       grpfilt	-  rejected  due to the	incoming group filter for this feed in
       dnewsfeeds.

       spamfilt	- rejected due to the spam filter

       earlyexp	- of the articles received, the	number that have been accepted
       but will	be expired early, usually due to dexpire.ctl.

       instantexp - rejected because dexpire.ctl indicated  that  the  article
       would expire instantly.

       notinactv  -  rejected because none of the newsgroups are in the	active
       file ( if you have 'activedrop' set in diablo.config ).

       ioerr - rejected	due to an I/O or other abnormal	error

       The global statistics are logged	by the master diablo process  and  in-
       clude the key-value pairs shown below.

       uptime -	total uptime in	hours and minutes.

       arts - total number of articles accepted

       bytes - total number of bytes accepted

       fed - aggregate number of articles queued to outgoing feeds
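
       Since the statistics are logged as key=value pairs, they are easy to
       pick apart in a script.  The sketch below assumes the pairs are
       separated by whitespace; the sample line is invented for illustration
       and is not copied from a real Diablo log.

```python
def parse_stats(line):
    """Turn 'key=value key=value ...' into a dict, converting integers."""
    stats = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:                                   # skip tokens without '='
            stats[key] = int(value) if value.isdigit() else value
    return stats


# Hypothetical per-connection line using keys documented above.
sample = "secs=120 ihave=0 chk=950 rec=400 rej=12 added=388 bytes=1048576"
stats = parse_stats(sample)
print(stats["rec"], stats["rej"], stats["added"])   # 400 12 388
```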

CONCEPTS
       The  Diablo system employs a number of concepts to attain high through-
       put and efficiency.  Some, like	the  fork()ing	server,	 are  obvious.
       Others are not so obvious.

       The  history file consists of a chained hash table with a four-million-
       entry base array.  History entries form a linked	list relative to their
       base index, which is itself  calculated	through	 a  hashing  function.
       When new	history	entries	are added, they	are physically appended	to the
       file but	logically inserted at the base of the appropriate linked list,
       NOT at the end.  What this means is that certain programs such as
       dexpire, which scan the history file linearly rather than follow the
       chains, generally wind up accessing files grouped by directory.  This
       is very efficient.  Searches, however, run through the chains and  thus
       scan  the  chain	 in  reverse-time  order, with the most	recent entries
       scanned first.  While this hops	through	 the  history  file  (you  hop
       through	it anyway), it is well optimized by the	fact that (a) the hash
       table array is so large,	and (b)	it is likely to	be looking up more re-
       cently received articles	and thus likely	to hit them  first.   Searches
       for  which  failures are	expected only have the advantage of (a), but I
       had to compromise somewhere.
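
       The append-then-link-at-head layout can be sketched in a few lines.
       This is a simplified in-memory model, not the on-disk dhistory format:
       the class, its field names, and the choice of 4 * 1024 * 1024 for
       "four million" are assumptions for illustration.

```python
class History:
    """In-memory model of a chained hash table with an append-only log."""

    def __init__(self, buckets=4 * 1024 * 1024):
        self.buckets = buckets
        self.heads = {}      # sparse stand-in for the base array
        self.records = []    # stand-in for the file: (hash, next, payload)

    def add(self, h, payload):
        b = h % self.buckets
        # Physically appended, but linked in at the head of its chain.
        self.records.append((h, self.heads.get(b), payload))
        self.heads[b] = len(self.records) - 1

    def chain(self, bucket):
        """Payloads in one chain, most recently added first."""
        out, i = [], self.heads.get(bucket)
        while i is not None:
            out.append(self.records[i][2])
            i = self.records[i][1]
        return out

    def lookup(self, h):
        i = self.heads.get(h % self.buckets)
        while i is not None:
            rec_hash, nxt, payload = self.records[i]
            if rec_hash == h:
                return payload
            i = nxt
        return None

    def scan(self):
        """Linear scan in arrival order, as a dexpire-style pass would do."""
        return [payload for _, _, payload in self.records]
```

       A lookup walks its chain newest-first, while scan() visits records in
       the order they were received, which is what keeps a linear pass
       grouped by spool directory.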

       The spool directory itself is organized by time-received.   It  is  ex-
       plicitly	NOT organized by the Date: field or by group.  A new directory
       is  created  every  10 minutes, and in a	heavily	loaded system does not
       generally contain more than 80 or so spool files, each containing
       multiple articles.  Inbound articles have the advantage of being appended
       to open descriptors as well as being readily  cacheable	and  in	 time-
       proximity  localized  directories,  and outbound	articles have the same
       advantage.  Even	when some of your feeds	get  behind,  per-process  ac-
       cesses  are  readily cacheable and the kernel can generally survive the
       partitioning effect.  This is quite unlike standard INN	spool  manage-
       ment which bounces files	all over the group hierarchy and makes article
       adds and	accesses almost	random.

       Direct access to	the articles is	supported by looking the article up in
       the history file.  The history file contains the	time-received and that
       combined	 with  the iteration id, a byte	offset,	and byte count,	allows
       you to access the physical article.
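
       A sketch of that lookup path is shown below.  The HistoryEntry fields
       mirror the items named above, but the directory and file naming in
       spool_location() is purely hypothetical and should not be taken as
       Diablo's real spool layout.

```python
import io
from collections import namedtuple

HistoryEntry = namedtuple("HistoryEntry", "time_received iteration offset size")


def spool_location(entry, span=600):
    """Map an entry to a 10-minute directory and a per-iteration file."""
    slot = entry.time_received // span               # 10-minute bucket
    return f"D.{slot:08x}/B.{entry.iteration:04x}"   # hypothetical naming


def read_article(spool_file, entry):
    """Read one article out of a multi-article spool file."""
    spool_file.seek(entry.offset)
    return spool_file.read(entry.size)


entry = HistoryEntry(time_received=1200, iteration=3, offset=6, size=5)
fake_spool = io.BytesIO(b"first\nhello\nrest")       # stand-in for a file
print(spool_location(entry))                         # D.00000002/B.0003
print(read_article(fake_spool, entry))               # b'hello'
```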

SIGNALS
       Sending a USR1 signal to	Diablo will  enable  debugging.	  Diablo  will
       output  debug info for each received article and	will indicate the rea-
       son for any rejection.  More USR1's bump	up the debug level.  A	single
       USR2  signal  will set the debug	level back to 0.  It is	suggested that
       signals only be sent to child processes and never the parent Diablo.
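
       The USR1/USR2 behaviour amounts to a counter that is bumped and
       reset.  The sketch below models it with Python signal handlers; it is
       an analogy for the behaviour described here, not Diablo's code, and
       the names state, on_usr1, on_usr2 and install_handlers are invented.

```python
import signal

state = {"debug_level": 0}


def on_usr1(signum, frame):
    """Each USR1 raises the debug level by one."""
    state["debug_level"] += 1


def on_usr2(signum, frame):
    """A single USR2 sets the debug level back to 0."""
    state["debug_level"] = 0


def install_handlers():
    # SIGUSR1/SIGUSR2 exist on Unix-like systems only.
    signal.signal(signal.SIGUSR1, on_usr1)
    signal.signal(signal.SIGUSR2, on_usr2)
```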

TYPICAL	PERFORMANCE, TUNING SUGGESTIONS
       See the KERNEL_NOTES file for tuning suggestions	 and  machine-specific
       configurations.

SEE ALSO
       diablo(8), dicmd(8), didump(8), diload(8), dnewslink(8),	doutq(8), dex-
       pire(8),	  dexpireover(8),  diconvhist(8),  dilookup(8),	 dspoolout(8),
       dkp(8), dpath(8), diablo-kp(5), diablo-files(5)

								     DIABLO(8)
