Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
STORAGE.CONF(5)		  InterNetNews Documentation	       STORAGE.CONF(5)

NAME
       storage.conf - Configuration file for storage manager

DESCRIPTION
       The file	pathetc/storage.conf contains the rules	to be used in
       assigning articles to different storage methods.	 These rules determine
       where incoming articles will be stored.

       The storage manager is a	unified	interface between INN and a variety of
       different storage methods, allowing the news administrator to choose
       between different storage methods with different	trade-offs (or even
       use several at the same time for	different newsgroups, or articles of
       different sizes).  The rest of INN need not care	what type of storage
       method was used for a given article; the	storage	manager	will figure
       this out	automatically when that	article	is retrieved via the storage
       API.  Note that you may also want to see	the options provided in
       inn.conf(5) regarding article storage.

       The storage.conf	file consists of a series of storage method entries.
       Blank lines and lines beginning with a number sign ("#")	are ignored.
       The maximum number of characters	in each	line is	255.  The order	of
       entries in this file is important, see below.

       Each entry specifies a storage method and a set of rules.  Articles
       which match all of the rules of a storage method	entry will be stored
       using that storage method; if an	article	matches	multiple storage
       method entries, the first one will be used.  Each entry is formatted as
       follows:

	   method <methodname> {
	       newsgroups: <wildmat>
	       class: <storage_class>
	       size: <minsize>[,<maxsize>]
	       expires:	<mintime>[,<maxtime>]
	       options:	<options>
	       exactmatch: <bool>
	       filtered: <bool>
	       path: <wildmat>
	   }

       If spaces or tabs are included in a value, that value must be enclosed
       in double quotes	("").  If either a number sign ("#") or	a double quote
       are meant to be included	verbatim in a value, they should be escaped
       with "\".

       <methodname> is the name	of a storage method to use for articles	which
       match the rules of this entry.  The currently available storage methods
       are:

	   cnfs
	   timecaf
	   timehash
	   tradspool
	   trash

       See the "STORAGE	METHODS" section below for more	details.

       The meanings of the keys	in each	storage	method entry are as follows:

       newsgroups: <wildmat>
	   What	newsgroups are stored using this storage method.  <wildmat> is
	   a  uwildmat	pattern	 which	is  matched  against the newsgroups an
	   article is posted to.  If storeonxref in  inn.conf  is  true,  this
	   pattern  will  be  matched  against the newsgroup names in the Xref
	   header field	body;  otherwise,  it  will  be	 matched  against  the
	   newsgroup   names   in   the	 Newsgroups  header  field  body  (see
	   inn.conf(5)	for  discussion	 of  the  differences  between	 these
	   possibilities).   Poison  wildmat expressions (expressions starting
	   with	"@") are allowed and can be  used  to  exclude	certain	 group
	   patterns:  articles	crossposted to poisoned	newsgroups will	not be
	   stored using	this storage method.  The <wildmat> pattern is matched
	   in order.

	   There is no default newsgroups pattern; if an  entry	 should	 match
	   all newsgroups, use an explicit "newsgroups:	*".

       class: <storage_class>
	   An  identifier  for	this  storage  method  entry.  <storage_class>
	   should be a number between 0	and 255.  It should be	unique	across
	   all	of the entries in this file.  It is mainly used	for specifying
	   expiration times by storage class as	 described  in	expire.ctl(5);
	   "timehash"  and  "timecaf" will also	set the	top-level directory in
	   which articles accepted by this storage class are stored.   Storage
	   classes can be for instance numbered	sequentially in	storage.conf.

	   The	assignment  of	a  particular  number  to  a  storage class is
	   arbitrary but permanent (since it is	used in	storage	tokens).  As a
	   matter of fact, an article is assigned a storage class depending on
	   the storage rules in	effect at  the	time  of  its  arrival.	  This
	   identifier  will  not  change  even	if  storage.conf  is  modified
	   afterwards and the same article would have  been  assigned  another
	   storage class, had it been received after that change.  The article
	   is still perfectly valid and	retrievable.  The only difference will
	   be  for  expiration	with groupbaseexpiry set to false in inn.conf:
	   the rules in	expire.ctl apply to  the  storage  class  assigned  to
	   articles at their arrival.

       size: <minsize>[,<maxsize>]
	   A  range  of	 article sizes (in bytes) which	should be stored using
	   this	storage	method.	 If <maxsize> is 0 or  not  given,  the	 upper
	   size	 of  articles  is limited only by maxartsize in	inn.conf.  The
	   size: field is optional and may be omitted  entirely	 if  you  want
	   articles  of	 any  size to be stored	in this	storage	method (if, of
	   course, these articles fulfill all the other	requirements  of  this
	   storage method entry).  By default, <minsize> is set	to 0.

       expires:	<mintime>[,<maxtime>]
	   A  range  of	 article expiration times which	should be stored using
	   this	storage	method.	 Be careful; this is less useful than  it  may
	   appear at first.  This is based only	on the Expires header field of
	   the	article,  not  on any local expiration policies	or anything in
	   expire.ctl!	If <mintime> is	non-zero, then	this  entry  will  not
	   match  any  article	without	 an Expires header field.  This	key is
	   therefore only really useful	for assigning articles with  requested
	   longer  expire  times  to a separate	storage	method.	 Articles only
	   match if the	time until expiration (that is to say, the  amount  of
	   time	 into  the future that the Expires header field	of the article
	   requests that it remain around) falls in the	interval specified  by
	   <mintime> and <maxtime>.

	   The format of these parameters is "0d0h0m0s"	(days, hours, minutes,
	   and	seconds	 into  the  future).   If  <maxtime> is	"0s" or	is not
	   specified, there is no upper	bound on  expire  times	 falling  into
	   this	 entry	(note  that this key has no effect on when the article
	   will	actually be expired, but only on whether or  not  the  article
	   will	 be  stored  using  this  storage method).  This field is also
	   optional and	may be omitted entirely	if you do not  want  to	 store
	   articles according to their Expires header field, if	any.

	   A  <mintime>	 value	greater	 than  "0s"  implies that this storage
	   method won't	match any article without an Expires header field.

       options:	<options>
	   This	key is for passing special options  to	storage	 methods  that
	   require  them  (currently  only "cnfs").  See the "STORAGE METHODS"
	   section below for a description of its use.

       exactmatch: <bool>
	   If this key is set to true, all the newsgroups  in  the  Newsgroups
	   header  field  body (or Xref	if storeonxref in inn.conf is true) of
	   incoming articles will be examined to see if	they match  newsgroups
	   patterns.  (Normally, any non-zero number of	matching newsgroups is
	   sufficient,	provided  no  newsgroup	 matches  a  poison wildmat as
	   described above.)  This is a	boolean	value; "true", "yes" and  "on"
	   are	usable	to  enable  this key.  The case	of these values	is not
	   significant.	 The default is	false.

       filtered: <bool>
	   If this key is set to true, the article must	have been rejected  by
	   any	enabled	 article filters (Perl or Python) for innd.  This also
	   requires that  dontrejectfiltered  is  set  to  true	 in  inn.conf.
	   Filtered  articles  are  usually  stored in a small CNFS buffer, or
	   another storage method with a rather	tight expiration policy.  This
	   is a	boolean	value; "true", "yes" and "on"  are  usable  to	enable
	   this	 key.	The  case  of  these  values  is not significant.  The
	   default is false.

	   If all the storage classes have this	key  set  to  false,  filtered
	   articles are	stored in the same storage class as accepted articles.
	   It is only when at least one	storage	class has this key set to true
	   than	 filtered  articles and	accepted articles are no longer	stored
	   mixed together in any storage class.

       path: <wildmat>
	   What	articles by their Path header  field  are  stored  using  this
	   storage  method.   <wildmat>	is a uwildmat pattern which is matched
	   against the Path header field body of articles,  which  corresponds
	   to  where  articles have passed, or were posted at.	Poison wildmat
	   expressions (expressions starting with "@") are allowed and can  be
	   used	 to  exclude  certain  path  patterns  (in lieu	of expressions
	   starting with "!" that are not considered negated because "!" has a
	   special meaning in Path header fields).  The	<wildmat>  pattern  is
	   matched in order.

	   A typical use case might be to store	articles from a	spammy site in
	   a small CNFS	buffer to avoid	overall	retention impacts:

	       path: "*!spam-site.example.com!not-for-mail"

	   The default is to match all articles.

       If  an article matches all of the constraints of	an entry, it is	stored
       via that	storage	method and is associated  with	that  <storage_class>.
       This  file  is scanned in order and the first matching entry is used to
       store the article.

       If an article does not match any	entry, either by  being	 posted	 to  a
       newsgroup  which	 does  not  match  any of the <wildmat>	patterns or by
       being outside  the  size	 and  expires  ranges  of  all	entries	 whose
       newsgroups  pattern  it	does  match,  the article is not stored	and is
       rejected	by innd.  When this happens, the error message:

	   cant	store article: no matching entry in storage.conf

       is logged to syslog.  If	you want to silently  drop  articles  matching
       certain	newsgroup  patterns  or	size or	expires	ranges,	assign them to
       the "trash" storage method  rather  than	 having	 them  not  match  any
       storage method entry.

STORAGE	METHODS
       Currently,  there  are five storage methods available.  Each method has
       its pros	and cons; you can choose any mixture of	them  as  is  suitable
       for   your  environment.	  Note	that  each  method  has	 an  attribute
       EXPENSIVESTAT which indicates whether  checking	the  existence	of  an
       article is expensive or not.  This is used to run expireover(8).

       cnfs
	   The	"cnfs"	storage	method stores articles in large	cyclic buffers
	   (CNFS stands	for Cyclic News	File System).  Articles	are stored  in
	   CNFS	 buffers in arrival order, and when the	buffer fills, it wraps
	   around to the beginning and stores new articles over	the top	of the
	   oldest articles in the buffer.  The expire time of articles	stored
	   in  CNFS  buffers  is  therefore entirely determined	by how long it
	   takes the buffer to wrap around, which depends on how quickly  data
	   is  being  stored  in  it.	(This method is	therefore said to have
	   self-expire functionality.  It also means that when an  article  is
	   cancelled, the cycbuff doesn't go back and use space	until it rolls
	   over	 and the whole cycbuff starts being reused.)  EXPENSIVESTAT is
	   false for this method.

	   CNFS	has its	own configuration file,	cycbuff.conf, which  describes
	   some	 subtleties  to	 the  basic  description given above.  Storage
	   method entries for the "cnfs" storage method	must have an  options:
	   field  specifying the metacycbuff into which	articles matching that
	   entry  should  be  stored;  see  cycbuff.conf(5)  for  details   on
	   metacycbuffs.

	   Advantages:	By  far	the fastest of all storage methods (except for
	   "trash"), since it eliminates the overhead of dealing with  a  file
	   system  and	creating new files.  Unlike all	other storage methods,
	   it does not require manual  article	expiration.   With  CNFS,  the
	   server  will	 never	throttle  itself due to	a full spool disk, and
	   groups are restricted to just the buffer files given	so  that  they
	   can never use more than the amount of disk space allocated to them.

	   Disadvantages:  Article  retention  times  are  more	 difficult  to
	   control  because  old  articles  are	  overwritten	automatically.
	   Attacks on Usenet, such as flooding or massive amounts of spam, can
	   result  in wanted articles expiring much faster than	intended (with
	   no warning).

       timecaf
	   This	method stores multiple articles	in one	file,  whose  name  is
	   based  on  the  article's  arrival time and the storage class.  The
	   file	name will be:

	       <patharticles>/timecaf-nn/bb/aacc.CF

	   where "nn" is the hexadecimal value of  <storage_class>,  "bb"  and
	   "aacc" are the hexadecimal components of the	arrival	time, and "CF"
	   is  a hardcoded extension.  (The arrival time, in seconds since the
	   epoch, is converted to hexadecimal and interpreted  as  0xaabbccdd,
	   with	 "aa",	"bb",  and  "cc" used to build the path.)  This	method
	   does	not have self-expire functionality (meaning expire has to  run
	   periodically	 to delete old articles, as well as cancelled articles
	   if immediatecancel is not set to true in inn.conf).	 EXPENSIVESTAT
	   is false for	this method.

	   A  given  CAF file contains all the articles	received during	a time
	   frame of 4 minutes or so (256 seconds), and is limited  to  262,144
	   articles  and  about	 3,5  GB.  It is enough	for normal operations.
	   The only caveat is when you're feeding at  high  speed  bunches  of
	   articles  between two servers; you'll then want to limit it to that
	   amount of articles during the time frame when  a  CAF  file	stores
	   newly arrived articles.

	   Advantages:	It  is	roughly	 four times faster than	"timehash" for
	   article writes, since much of the file system overhead is bypassed,
	   while still retaining the same fine control over article  retention
	   time.

	   Disadvantages:  Using  this method means giving up all but the most
	   careful manually fiddling with the article spool; in	 this  aspect,
	   it  looks  like  "cnfs".  As	one of the newer and least widely used
	   storage types, "timecaf" has	not been as thoroughly tested  as  the
	   other  methods.   It	 requires  running a nightly expire program to
	   delete old articles by either compacting CAF	files  if  they	 still
	   contain available articles, or removing them.

       timehash
	   This	 method	 is very similar to "timecaf" except that each article
	   is stored in	a separate file.  The name of the  file	 for  a	 given
	   article will	be:

	       <patharticles>/time-nn/bb/cc/yyyy-aadd

	   where "nn" is the hexadecimal value of <storage_class>, "yyyy" is a
	   hexadecimal	sequence  number,  and	"bb",  "cc",  and  "aadd"  are
	   components of the arrival time in hexadecimal (the arrival time  is
	   interpreted as documented above under "timecaf").  This method does
	   not have self-expire	functionality.	Cancelled articles are removed
	   immediately.	 EXPENSIVESTAT is true for this	method.

	   Advantages:	Heavy  traffic	groups do not cause bottlenecks, and a
	   fine	control	of article retention time is still possible.

	   Disadvantages: The ability to easily	find all articles in  a	 given
	   newsgroup  and  manually fiddle with	the article spool is lost, and
	   INN still  suffers  from  speed  degradation	 due  to  file	system
	   overhead   (creating	 and  deleting	individual  files  is  a  slow
	   operation) and from a higher	 inode	usage.	 It  also  requires  a
	   nightly  expire  program  to	 delete	 old  articles out of the news
	   spool.

       tradspool
	   Traditional spool, or "tradspool", is the traditional news  article
	   storage  format.  Each article is stored in an individual text file
	   named:

	       <patharticles>/news/group/name/nnnnn

	   where "news/group/name" is the name of the newsgroup	to  which  the
	   article was posted with each	period changed to a slash, and "nnnnn"
	   is  the  sequence  number  of  the  article in that newsgroup.  For
	   crossposted articles, the article is	linked into each newsgroup  to
	   which  it  is  crossposted  (using  either hard or symbolic links).
	   This	is the way versions of INN prior to 2.0	stored	all  articles,
	   as  well  as	 being	the  article storage format used by C News and
	   earlier news	 systems.   This  method  does	not  have  self-expire
	   functionality.    Cancelled	 articles   are	 removed  immediately.
	   EXPENSIVESTAT is true for this method.

	   Advantages: It is widely used  and  well-understood;	 it  can  read
	   article  spools  written  by	 older	versions  of  INN  and	it  is
	   compatible  with  all  third-party  INN  add-ons.	This   storage
	   mechanism provides easy and direct access to	the articles stored on
	   the	server,	makes writing programs that fiddle with	the news spool
	   very	easy, gives fine control over  article	retention  times,  and
	   comes with the scanspool support utility to perform sanity checks.

	   Disadvantages:  It  needs  a	faster file system and I/O system than
	   the cnfs and	timecaf	storage	methods	due to file  system  overhead.
	   Groups  with	 heavy	traffic	tend to	create a bottleneck because of
	   inefficiencies in storing large  numbers  of	 article  files	 in  a
	   single  directory.	It consumes more inodes	and requires a nightly
	   expire program to delete old	articles out of	the news spool.

       trash
	   This	method silently	discards all articles stored in	it.  Its  only
	   real	 uses  are  for	 testing  and for silently discarding articles
	   matching a particular storage method	entry (for  whatever  reason).
	   Articles  stored in this method take	up no disk space and can never
	   be retrieved, so this method	has  self-expire  functionality	 of  a
	   sort.  EXPENSIVESTAT	is false for this method.

EXAMPLES
       The  following sample storage.conf file would store all articles	posted
       to alt.binaries.* in the	"BINARIES" CNFS	metacycbuff, all articles over
       roughly 50 KB in	any other hierarchy in the "LARGE"  CNFS  metacycbuff,
       all  other  articles  in	 alt.*	in  one	 timehash class, and all other
       articles	in any newsgroups in a second timehash class, except  for  the
       internal.* hierarchy which is stored in traditional spool format.

	   method tradspool {
	       class: 1
	       newsgroups: internal.*
	   }
	   method cnfs {
	       class: 2
	       newsgroups: alt.binaries.*
	       options:	BINARIES
	   }
	   method cnfs {
	       class: 3
	       newsgroups: *
	       size: 50000
	       options:	LARGE
	   }
	   method timehash {
	       class: 4
	       newsgroups: alt.*
	   }
	   method timehash {
	       class: 5
	       newsgroups: *
	   }

       Notice  that the	last storage method entry will catch everything.  This
       is a good habit to get into; make sure  that  you  have	at  least  one
       catch-all entry just in case something you did not expect falls through
       the  cracks.   Notice  also  that  the  special rule for	the internal.*
       hierarchy is first, so it  will	catch  even  articles  crossposted  to
       alt.binaries.* or over 50 KB in size.

       As  for poison wildmat expressions, if you have for instance an article
       crossposted between misc.foo and	misc.bar, the pattern:

	   misc.*,!misc.bar

       will match that article whereas the pattern:

	   misc.*,@misc.bar

       will not	match that article.  An	article	posted only to	misc.bar  will
       fail to match either pattern.

       Usually,	high-volume groups and groups whose articles do	not need to be
       kept  around  very  long	(binaries groups, *.jobs*, news.lists.filters,
       etc.) are stored	in CNFS	buffers.   Use	the  other  methods  (or  CNFS
       buffers	again)	for  everything	 else.	However, it is as often	as not
       most convenient to keep in "tradspool" special hierarchies  like	 local
       hierarchies  and	 hierarchies  that  should never expire	or through the
       spool of	which you need to go manually.

HISTORY
       Written	by  Katsuhiro  Kondou  <kondou@nec.co.jp>  for	 InterNetNews.
       Rewritten into POD by Julien Elie.

SEE ALSO
       cycbuff.conf(5),	 expire.ctl(5),	 expireover(8),	 inn.conf(5), innd(8),
       libinn_uwildmat(3), scanspool(8).

INN 2.8.0			  2025-03-29		       STORAGE.CONF(5)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=storage.conf&sektion=5&manpath=FreeBSD+Ports+14.3.quarterly>

home | help