libinn_dbz(3)		  InterNetNews Documentation		 libinn_dbz(3)

NAME
       dbz - Database routines for InterNetNews

SYNOPSIS
	   #include <inn/dbz.h>

	   #define DBZMAXKEY		  ...
	   #define DBZ_INTERNAL_HASH_SIZE ...

	   typedef enum
	   {
	       DBZSTORE_OK,
	       DBZSTORE_EXISTS,
	       DBZSTORE_ERROR
	   } DBZSTORE_RESULT;

	   typedef enum
	   {
	       INCORE_NO,
	       INCORE_MEM,
	       INCORE_MMAP
	   } dbz_incore_val;

	   typedef struct {
	       bool writethrough;
	       dbz_incore_val pag_incore;
	       dbz_incore_val exists_incore;
	       bool nonblock;
	   } dbzoptions;

	   typedef struct {
	       char hash[DBZ_INTERNAL_HASH_SIZE];
	   } __attribute__((__packed__)) erec;

	   extern bool dbzinit(const char *name);
	   extern bool dbzclose(void);

	   extern bool dbzfresh(const char *name, off_t	size);
	   extern bool dbzagain(const char *name, const	char *oldname);
	   extern bool dbzexists(const HASH key);
	   extern bool dbzfetch(const HASH key,	off_t *value);
	   extern DBZSTORE_RESULT dbzstore(const HASH key, off_t data);
	   extern bool dbzsync(void);
	   extern long dbzsize(off_t contents);
	   extern void dbzsetoptions(const dbzoptions options);
	   extern void dbzgetoptions(dbzoptions	*options);

DESCRIPTION
       These functions provide an indexing system for rapid random access to a
       text file, hereafter named the base file.

       dbz stores offsets into the base	file for rapid retrieval.  All
       retrievals are keyed on a hash value that is generated by the
       HashMessageID function in libinn(3).

       dbzinit opens a database, an index into the base	file name, consisting
       of files	name.dir, name.index, and name.hash which must already exist.
       (If the database	is new,	they should be zero-length files.)  Subsequent
       accesses	go to that database until dbzclose is called to	close the
       database.  When tagged hash format is used (if --enable-tagged-hash was
       given at	configure time), a name.pag file is used instead of .index and
       .hash.

       dbzfetch searches the database for the specified key.  If the key is
       found, the corresponding offset into the base file is assigned to value.

       dbzstore	stores the key-data pair in the	database.  It will return
       "DBZSTORE_EXISTS" for duplicates	(already existing entries), and
       "DBZSTORE_OK" for success.  It will fail	with "DBZSTORE_ERROR" if the
       database	files are not writable or not opened, or if any	other error
       occurs.

       dbzexists checks whether the given hash exists in the database.  dbz is
       optimized for this operation, which may be significantly faster than
       dbzfetch.

       dbzfresh	is a variant of	dbzinit	for creating a new database with more
       control over details.  The size parameter specifies the size of the
       first hash table	within the database, in	number of key-value pairs.
       Performance will	be best	if the number of key-value pairs stored	in the
       database	does not exceed	about 2/3 of size, or 1/2 of size when the
       tagged hash format is used.  (The dbzsize function, given the expected
       number of key-value pairs, will suggest a database size that meets
       these criteria.)  Assuming that an fseek offset is 4 bytes, the .index
       file will be 4 * size bytes and the .hash file will be
       "DBZ_INTERNAL_HASH_SIZE" * size bytes until the number of key-value
       pairs exceeds about 80% of size; the .dir file is tiny and roughly
       constant in size.  (Nothing awful will happen if the database grows
       beyond 100% of size, but accesses will slow down quite a bit and the
       .index and .hash files will grow somewhat.)
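
       The sizing rules above can be sketched as plain arithmetic.  This is
       only an illustration of the 2/3 (or 1/2) fill guideline and the file
       size estimates; the helper names are hypothetical and are not part of
       libinn, and the real dbzsize may round or adjust its result
       differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not libinn): smallest table size such that
 * `entries` fills at most 2/3 of it, or 1/2 with tagged hash. */
static uint64_t sketch_table_size(uint64_t entries, int tagged_hash)
{
    if (tagged_hash)
        return entries * 2;
    return (entries * 3 + 1) / 2; /* ceil(entries * 3 / 2) */
}

/* Estimated .index size: one offset per table entry. */
static uint64_t sketch_index_bytes(uint64_t size, uint64_t offset_bytes)
{
    assert(offset_bytes > 0);
    return size * offset_bytes;
}

/* Estimated .hash size: stored hash bytes per table entry. */
static uint64_t sketch_hash_bytes(uint64_t size, uint64_t hash_bytes)
{
    assert(hash_bytes > 0);
    return size * hash_bytes;
}
```

       Under these assumptions, 6,000,000 expected pairs without tagged hash
       give a table of 9,000,000 entries, a .index file of 36,000,000 bytes
       with 4-byte offsets, and a .hash file of 54,000,000 bytes with 6
       stored hash bytes.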

       dbz stores up to	"DBZ_INTERNAL_HASH_SIZE" bytes (by default, 4 bytes if
       tagged hash format is used, 6 otherwise)	of the Message-ID's hash in
       the .hash file to confirm a hit.	 This eliminates the need to read the
       base file to handle collisions.
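
       The idea of confirming a hit from stored hash bytes can be illustrated
       with a toy in-memory table.  This sketch is not dbz's actual file
       layout or probing scheme; every toy_ name, the table size, and the
       linear probing are assumptions made for the example.  It only shows
       that comparing stored hash bytes resolves collisions without
       consulting the base file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_HASH_BYTES 6 /* mirrors the non-tagged default quoted above */
#define TOY_SLOTS 8      /* arbitrary small table for illustration */

typedef struct {
    bool used;
    long offset;                        /* offset into the base file */
    unsigned char hash[TOY_HASH_BYTES]; /* stored bytes of the key's hash */
} toy_slot;

typedef struct {
    toy_slot slots[TOY_SLOTS];
} toy_table;

/* Toy choice: pick the starting slot from the first hash byte. */
static size_t toy_start(const unsigned char *hash)
{
    return hash[0] % TOY_SLOTS;
}

/* Store with linear probing; returns false if the table is full. */
static bool toy_store(toy_table *t, const unsigned char *hash, long offset)
{
    size_t start = toy_start(hash);
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        toy_slot *s = &t->slots[(start + i) % TOY_SLOTS];
        if (!s->used) {
            s->used = true;
            s->offset = offset;
            memcpy(s->hash, hash, TOY_HASH_BYTES);
            return true;
        }
    }
    return false;
}

/* Fetch: a slot is a hit only if its stored hash bytes match the key. */
static bool toy_fetch(const toy_table *t, const unsigned char *hash,
                      long *offset)
{
    assert(offset != NULL);
    size_t start = toy_start(hash);
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        const toy_slot *s = &t->slots[(start + i) % TOY_SLOTS];
        if (!s->used)
            return false; /* an empty slot ends the probe sequence */
        if (memcmp(s->hash, hash, TOY_HASH_BYTES) == 0) {
            *offset = s->offset;
            return true;
        }
    }
    return false;
}
```

       Two keys that land on the same starting slot are told apart by the
       memcmp on the stored bytes alone, which is the property the paragraph
       above describes.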

       A size of 0 given to dbzfresh is	synonymous with	the local default; the
       normal default is suitable for tables of	about 6,000,000	key-value
       pairs (or 500,000 key-value pairs when the tagged hash format is	used).
       That default value is used by dbzinit.

       When databases are regenerated periodically, as is the case for the
       history file, it is simplest to pick the parameters for a new database
       based on	the old	one.  This also	permits	some memory of past sizes of
       the old database, so that a new database	size can be chosen to cover
       expected	fluctuations.  dbzagain	is a variant of	dbzinit	for creating a
       new database as a new generation	of an old database.  The database
       files for oldname must exist.  dbzagain is equivalent to	calling
       dbzfresh	with a size equal to the result	of applying dbzsize to the
       largest number of entries in the	oldname	database and its previous 10
       generations.
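
       The size selection described for dbzagain can be sketched as follows.
       This is a minimal illustration, assuming the entry counts of the old
       database and its previous generations are available as an array; the
       sketch_ names are hypothetical, and sketch_suggest_size is only a
       stand-in for dbzsize using the 2/3 fill guideline, not its actual
       implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_GENERATIONS 11 /* the old database plus its previous 10 */

/* Stand-in for dbzsize: smallest size keeping the table at most 2/3 full. */
static long sketch_suggest_size(long entries)
{
    return (entries * 3 + 1) / 2;
}

/* Pick the largest recorded entry count, then size the new table for it. */
static long sketch_again_size(const long *used, size_t n)
{
    assert(n <= SKETCH_GENERATIONS);
    long largest = 0;
    for (size_t i = 0; i < n; i++) {
        if (used[i] > largest)
            largest = used[i];
    }
    return sketch_suggest_size(largest);
}
```

       Sizing for the largest past generation rather than the most recent
       one is what lets the new database absorb expected fluctuations.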

       When many accesses are being done by the	same program, dbz is massively
       faster if its first hash	table is in memory.  If	the pag_incore flag is
       set to "INCORE_MEM", an attempt is made to read the table in when the
       database	is opened, and dbzclose	writes it out to disk again (if	it was
       read successfully and has been modified).  dbzsetoptions	can be used to
       set the pag_incore and exists_incore flags to different values which
       should be "INCORE_NO" (read from	disk), "INCORE_MEM" (read from memory)
       or "INCORE_MMAP"	(read from a mmap'ed file) for the .hash and .index
       files separately; this does not affect the status of a database that
       has already been	opened.	 The default is	"INCORE_NO" for	the .index
       file and	"INCORE_MMAP" for the .hash file.  The attempt to read the
       table in	may fail due to	memory shortage; in this case dbz fails	with
       an error.  Stores to an in-memory database are not (in general) written
       out to the file until dbzclose or dbzsync, so if	robustness in the
       presence	of crashes or concurrent accesses is crucial, in-memory
       databases should probably be avoided or the writethrough option should
       be set to true (telling dbz to write to the filesystem on every store,
       in addition to updating the in-memory database).
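
       The difference between deferred and write-through stores can be shown
       with a toy model.  This sketch does not use or reproduce dbz
       internals: the toy_ names are hypothetical, and two arrays stand in
       for the in-memory table and the file on disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_ENTRIES 4

typedef struct {
    bool writethrough;        /* same meaning as the option described above */
    bool dirty;               /* memory holds changes not yet on "disk" */
    long memory[TOY_ENTRIES]; /* stands in for the in-memory table */
    long disk[TOY_ENTRIES];   /* stands in for the file on disk */
} toy_db;

/* Store a value: always update memory, update "disk" only if writethrough. */
static void toy_put(toy_db *db, int slot, long value)
{
    assert(slot >= 0 && slot < TOY_ENTRIES);
    db->memory[slot] = value;
    if (db->writethrough)
        db->disk[slot] = value; /* written immediately */
    else
        db->dirty = true;       /* deferred until toy_sync */
}

/* Flush pending changes, as a sync or close would. */
static void toy_sync(toy_db *db)
{
    if (db->dirty) {
        memcpy(db->disk, db->memory, sizeof db->disk);
        db->dirty = false;
    }
}
```

       Without writethrough, a crash between toy_put and toy_sync loses the
       stored value; with it, each store pays for an extra write but the
       "disk" copy is never behind.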

       If the nonblock option is true, then writes to the .hash	and .index
       files will be done using	non-blocking I/O.  This	can be significantly
       faster if your platform supports	non-blocking I/O with files.  It is
       only applicable if you're not mmap'ing the database.

       dbzsync causes all buffers etc. to be flushed out to the	files.	It is
       typically used as a precaution against crashes or concurrent accesses
       when a dbz-using	process	will be	running	for a long time.  It is	a
       somewhat	expensive operation, especially	for an in-memory database.

       Concurrent reading of databases is fairly safe, but there is no
       (inter)locking, so concurrent updating is not.

       An open database occupies three stdio streams and two file descriptors.
       Memory consumption is negligible except for in-memory databases (and
       stdio buffers).

DIAGNOSTICS
       Functions returning bool	values return true for success,	false for
       failure.

       dbzinit attempts	to have	errno set plausibly on return, but otherwise
       this is not guaranteed.	An errno of "EDOM" from	dbzinit	indicates that
       the database did	not appear to be in dbz	format.

       If "DBZTEST" is defined at compile-time,	then a main() function will be
       included.  This will run performance and integrity tests.

BUGS
       Unlike dbm, dbz will refuse to dbzstore with a key already in the
       database.  The user is responsible for avoiding this.

       The RFC5322 case	mapper implements only a first approximation to	the
       hideously-complex RFC5322 case rules.

       dbz no longer tries to be call-compatible with dbm in any way.

HISTORY
       The original dbz	was written by Jon Zeeff
       <zeeff@b-tech.ann-arbor.mi.us>.	Later contributions by David Butler
       and Mark	Moraes.	 Extensive reworking, including	this documentation, by
       Henry Spencer <henry@zoo.toronto.edu> as	part of	the C News project.
       MD5 code	borrowed from RSA.  Extensive reworking	to remove backwards
       compatibility and to add	hashes into dbz	files by Clayton O'Neill
       <coneill@oneill.net>.  Rewritten	into POD by Julien Elie.

SEE ALSO
       dbm(3), history(5), libinn(3).

INN 2.8.0			  2023-08-05			 libinn_dbz(3)
