FreeBSD Manual Pages

home | help
QDBM(3)			    Quick Database Manager		       QDBM(3)

NAME
       QDBM - quick database manager

OVERVIEW
       QDBM is a library of routines for managing a database.  The database is
       a  simple  data	file containing	records, each is a pair	of a key and a
       value.  Every key and value is serial bytes with	variable length.  Both
       binary data and character string	can be used as	a  key	and  a	value.
       There  is  neither  concept of data tables nor data types.  Records are
       organized in hash table or B+ tree.

       As for database of hash table, each key must be unique within  a	 data-
       base, so	it is impossible to store two or more records with a key over-
       laps.  The following access methods are provided	to the database: stor-
       ing  a  record  with a key and a	value, deleting	a record by a key, re-
       trieving	a record by a key.  Moreover, traversal	access	to  every  key
       are  provided,  although	 the order is arbitrary.  These	access methods
       are similar to ones of DBM (or its followers: NDBM  and	GDBM)  library
       defined	in  the	UNIX standard.	QDBM is	an alternative for DBM because
       of its higher performance.

       As for database of B+ tree, records whose keys are  duplicated  can  be
       stored.	 Access	 methods of storing, deleting, and retrieving are pro-
       vided as	with the database of hash table.  Records are stored in	 order
       by  a  comparing	function assigned by a user.  It is possible to	access
       each record with	the cursor in ascending	or descending order.   Accord-
       ing  to	this  mechanism, forward matching search for strings and range
       search for integers are realized.  Moreover, transaction	 is  available
       in database of B+ tree.

EFFECTIVE IMPLEMENTATION OF HASH DATABASE
       QDBM  is	 developed  referring to GDBM for the purpose of the following
       three points: higher processing speed, smaller size of a	database file,
       and simpler API.	 They have been	 achieved.   Moreover,	the  following
       three  restrictions  of	traditional DBM: a process can handle only one
       database, the size of a key and a value is bounded, a database file  is
       sparse, are cleared.

       QDBM  uses  hash	 algorithm to retrieve records.	 If a bucket array has
       sufficient number of elements, the  time	 complexity  of	 retrieval  is
       `O(1)'.	 That  is,  time required for retrieving a record is constant,
       regardless of the scale of a database.  It is also the same about stor-
       ing and deleting.  Collision of hash  values  is	 managed  by  separate
       chaining.  Data structure of the	chains is binary search	tree.  Even if
       a  bucket  array	 has unusually scarce elements,	the time complexity of
       retrieval is `O(log n)'.

       QDBM attains improvement	in retrieval by	loading	RAM with the whole  of
       a  bucket array.	 If a bucket array is on RAM, it is possible to	access
       a region	of a target record by about one	path of	 file  operations.   A
       bucket  array saved in a	file is	not read into RAM with the `read' call
       but directly mapped to RAM with the `mmap' call.	  Therefore,  prepara-
       tion  time  on  connecting to a database	is very	short, and two or more
       processes can share the same memory map.

       If the number of	elements of a bucket array is about  half  of  records
       stored  within a	database, although it depends on characteristic	of the
       input, the probability of collision  of	hash  values  is  about	 56.7%
       (36.8%  if the same, 21.3% if twice, 11.5% if four times, 6.0% if eight
       times).	In such	case, it is possible to	retrieve a record  by  two  or
       less paths of file operations.  If it is	made into a performance	index,
       in  order  to  handle  a	 database containing one million of records, a
       bucket array with half a	million	of elements is needed.	 The  size  of
       each  element  is 4 bytes.  That	is, if 2M bytes	of RAM is available, a
       database	containing one million records can be handled.

       QDBM provides  two  modes  to  connect  to  a  database:	 `reader'  and
       `writer'.   A  reader  can  perform  retrieving but neither storing nor
       deleting.  A writer can perform all access methods.  Exclusion  control
       between	processes  is  performed when connecting to a database by file
       locking.	 While a writer	is connected to	a  database,  neither  readers
       nor  writers  can be connected.	While a	reader is connected to a data-
       base, other readers can be connect, but writers can not.	 According  to
       this  mechanism,	 data consistency is guaranteed	with simultaneous con-
       nections	in multitasking	environment.

       Traditional DBM provides	two modes of the storing operations:  `insert'
       and  `replace'.	In the case a key overlaps an existing record, the in-
       sert mode keeps the existing value, while the replace  mode  transposes
       it to the specified value.  In addition to the two modes, QDBM provides
       `concatenate'  mode.   In the mode, the specified value is concatenated
       at the end of the existing value	and stored.  This  feature  is	useful
       when  adding  a element to a value as an	array.	Moreover, although DBM
       has a method to fetch out a value from a	database only by  reading  the
       whole of	a region of a record, QDBM has a method	to fetch out a part of
       a region	of a value.  When a value is treated as	an array, this feature
       is also useful.

       Generally  speaking,  while  succession	of  updating, fragmentation of
       available regions occurs, and the size of  a  database  grows  rapidly.
       QDBM  deal  with	this problem by	coalescence of dispensable regions and
       reuse of	them, and featuring of optimization of a database.  When over-
       writing a record	with a value whose size	is greater than	 the  existing
       one,  it	 is  necessary to remove the region to another position	of the
       file.  Because the time complexity of the operation depends on the size
       of the region of	a record, extending  values  successively  is  ineffi-
       cient.	However,  QDBM deal with this problem by alignment.  If	incre-
       ment can	be put in padding, it is not necessary to remove the region.

       As for many file	systems, it is impossible to handle a file whose  size
       is more than 2GB.  To deal with this problem, QDBM provides a directory
       database	 containing  multiple database files.  Due to this feature, it
       is possible to handle a database	whose total size is up to 1TB in  the-
       ory.   Moreover,	 because  database  files  can be deployed on multiple
       disks, the speed	of updating operations can be improved as with	RAID-0
       (striping).   It	 is  also possible for the database files to deploy on
       multiple	file servers using NFS and so on.

USEFUL IMPLEMENTATION OF B+ TREE DATABASE
       Although	B+ tree	database is slower than	hash database, it features or-
       dering access to	each record.  The order	 can  be  assigned  by	users.
       Records	of  B+	tree are sorted	and arranged in	logical	pages.	Sparse
       index organized in B tree that is multiway balanced tree	are maintained
       for each	page.  Thus, the time complexity of retrieval  and  so	on  is
       `O(log  n)'.   Cursor  is provided to access each record	in order.  The
       cursor can jump to a position specified by a key	and can	 step  forward
       or  backward  from the current position.	 Because each page is arranged
       as double linked	list,  the  time  complexity  of  stepping  cursor  is
       `O(1)'.

       B+ tree database	is implemented,	based on above hash database.  Because
       each page of B+ tree is stored as each record of	hash database, B+ tree
       database	 inherits  efficiency  of storage management of	hash database.
       Because the header of each record is smaller and	alignment of each page
       is calculated statistically, in most cases, the size of	database  file
       is cut by half compared to one of hash database.	 Although operation of
       many  pages  are	required to update B+ tree, QDBM expedites the process
       by caching pages	and reducing file operations.  In most cases,  because
       whole  of  the  sparse index is cached on memory, it is possible	to re-
       trieve a	record by one or less path of file operations.

       B+ tree database	features transaction mechanism.	  It  is  possible  to
       commit  a series	of operations between the beginning and	the end	of the
       transaction in a	lump, or to abort the transaction and perform rollback
       to the state before the transaction.  Even if the process of an	appli-
       cation  is crushed while	the transaction, the database file is not bro-
       ken.

       In case that QDBM is built with ZLIB, LZO, or BZIP2 enabled, a lossless
       data-compression	library, the content of	each page of B+	tree  is  com-
       pressed	and stored in a	file.  Because each record in a	page has simi-
       lar patterns, high efficiency of	compression is	expected  due  to  the
       Lempel-Ziv  algorithm  and  the	like.  In case handling	text data, the
       size of a database is reduced to	about 25%.  If the scale of a database
       is large	and disk I/O is	the bottleneck,	 featuring  compression	 makes
       the processing speed improved to	a large	extent.

SIMPLE BUT VARIOUS INTERFACES
       QDBM  provides very simple APIs.	 You can perform database I/O as usual
       file I/O	with `FILE' pointer defined in ANSI C.	In the	basic  API  of
       QDBM,  entity  of  a database is	recorded as one	file.  In the extended
       API, entity of a	database is recorded as	several	files  in  one	direc-
       tory.   Because	the two	APIs are very similar with each	other, porting
       an application from one to the other is easy.

       APIs which are compatible with NDBM and GDBM  are  also	provided.   As
       there  are a lot	of applications	using NDBM or GDBM, it is easy to port
       them onto QDBM.	In most	cases, it is completed only by replacement  of
       header  including  (#include)  and re-compiling.	 However, QDBM can not
       handle database files made by the original NDBM or GDBM.

       In order	to handle records on memory easily, the	utility	 API  is  pro-
       vided.	It  implements memory allocating functions, sorting functions,
       extensible datum, array list, hash map, and so on.  Using them, you can
       handle records in C language cheaply as in  such	 script	 languages  as
       Perl or Ruby.

       B+  tree	 database  is used with	the advanced API.  The advanced	API is
       implemented using the basic API and the utility API.  Because  the  ad-
       vanced API is also similar to the basic API and the extended API, it is
       easy to learn how to use	it.

       In  order to handle an inverted index which is used by full-text	search
       systems,	the inverted API is provided.  If it is	easy to	handle an  in-
       verted  index of	documents, an application can focus on text processing
       and natural language processing.	 Because this API does not  depend  on
       character  codes	nor languages, it is possible to implement a full-text
       search system which can respond to various requests from	users.

       Along with APIs for C, QDBM provides APIs  for  C++,  Java,  Perl,  and
       Ruby.   APIs  for C are composed	of seven kinds:	the basic API, the ex-
       tended API, the NDBM-compatible API, the	GDBM-compatible	API, the util-
       ity API,	the advanced API, and the inverted API.	 Command  line	inter-
       faces corresponding to each API are also	provided.  They	are useful for
       prototyping,  testing,  debugging, and so on.  The C++ API encapsulates
       database	handling functions of the basic	API, the extended API, and the
       advanced	API with class mechanism of C++.   The	Java  API  has	native
       methods	calling	 the basic API,	the extended API, and the advanced API
       with Java Native	Interface.  The	Perl API has methods calling the basic
       API, the	extended API, and the advanced API with	XS language.  The Ruby
       API has method calling the basic	API, the extended  API,	 and  the  ad-
       vanced  API  as modules of Ruby.	 Moreover, CGI scripts for administra-
       tion of databases and full-text search are provided.

WIDE PORTABILITY
       QDBM is implemented being based on syntax of ANSI  C  (C89)  and	 using
       only  APIs  defined  in ANSI C or POSIX.	 Thus, QDBM works on most UNIX
       and its compatible OSs.	As for C API, checking	operations  have  been
       done  at	least on Linux 2.2, Linux 2.4, FreeBSD 4.8, FreeBSD 5.0, SunOS
       5.7, SunOS 5.8, SunOS 5.9, HP-UX	11.00, Cygwin 1.3.10, Mac OS  X	 10.2,
       and  RISC OS 5.03.  Although a database file created by QDBM depends on
       byte order of the processor, to do with it, utilities to	dump  data  in
       format which is independent to byte orders are provided.

BUILDING
       For  building a program using QDBM, the program should be linked	with a
       library file `libqdbm.a'	or `libqdbm.so'.  For example,	the  following
       command is executed to build `sample' from `sample.c'.

	      gcc  -I/usr/local/include	 -o  sample  sample.c -L/usr/local/lib
	      -lqdbm

AUTHOR
       QDBM is written by Mikio	Hirabayashi.  You can contact  the  author  by
       e-mail to <mikio@fallabs.com>.  Any suggestion or bug report is welcome
       to the author.

COPYRIGHT
       Copyright(c) 2000-2003 Mikio Hirabayashi

       QDBM  is	 free software;	you can	redistribute it	and/or modify it under
       the terms of the	GNU Lesser General Public License as published by  the
       Free Software Foundation; either	version	2 of the License, or any later
       version.

       QDBM is distributed in the hope that it will be useful, but WITHOUT ANY
       WARRANTY;  without even the implied warranty of MERCHANTABILITY or FIT-
       NESS FOR	A PARTICULAR PURPOSE.  See the GNU Lesser General  Public  Li-
       cense for more details.

       You  should  have  received a copy of the GNU Lesser General Public Li-
       cense along with	QDBM; if not, write to the Free	 Software  Foundation,
       Inc., 59	Temple Place, Suite 330, Boston, MA 02111-1307 USA.

SEE ALSO
       depot(3),  curia(3),  relic(3), hovel(3), cabin(3), villa(3), odeum(3),
       ndbm(3),	gdbm(3)

Man Page			  2004-04-22			       QDBM(3)
Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=qdbm&sektion=3&manpath=FreeBSD+Ports+15.0>
home | help
Header And Logo

Peripheral Links

Site Navigation

FreeBSD Manual Pages

Header And Logo

Peripheral Links

Search

Site Navigation

FreeBSD Manual Pages