Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
TOKYOCABINET(3)			 Tokyo Cabinet		       TOKYOCABINET(3)

       tokyocabinet - a	modern implementation of DBM

       Tokyo  Cabinet  is  a library of	routines for managing a	database.  The
       database	is a simple data file containing records, each is a pair of  a
       key  and	 a  value.   Every key and value is serial bytes with variable
       length.	Both binary data and character string can be used as a key and
       a  value.   There  is  neither  concept	of data	tables nor data	types.
       Records are organized in	hash table, B+ tree, or	fixed-length array.

       As for database of hash table, each key must be unique within  a	 data-
       base, so	it is impossible to store two or more records with a key over-
       laps.  The following access methods are provided	to the database: stor-
       ing  a  record  with a key and a	value, deleting	a record by a key, re-
       trieving	a record by a key.  Moreover, traversal	access	to  every  key
       are  provided,  although	 the order is arbitrary.  These	access methods
       are similar to ones of DBM (or its followers: NDBM  and	GDBM)  library
       defined	in the UNIX standard.  Tokyo Cabinet is	an alternative for DBM
       because of its higher performance.

       As for database of B+ tree, records whose keys are  duplicated  can  be
       stored.	 Access	 methods of storing, deleting, and retrieving are pro-
       vided as	with the database of hash table.  Records are stored in	 order
       by  a comparison	function assigned by a user.  It is possible to	access
       each record with	the cursor in ascending	or descending order.   Accord-
       ing  to	this  mechanism, forward matching search for strings and range
       search for integers are realized.

       As for database of fixed-length array, records are stored  with	unique
       natural	numbers.  It is	impossible to store two	or more	records	with a
       key overlaps.  Moreover,	the length of each record is  limited  by  the
       specified  length.   Provided  operations  are the same as ones of hash

       Table database is also provided as a variant of	hash  database.	  Each
       record is identified by the primary key and has a set of	named columns.
       Although	there is no concept of data schema, it is possible  to	search
       for records with	complex	conditions efficiently by using	indices	of ar-
       bitrary columns.

       Tokyo Cabinet is	written	in the C language, and provided	as API	of  C,
       Perl,  Ruby,  Java,  and	 Lua.  Tokyo Cabinet is	available on platforms
       which have API conforming to C99	and POSIX.  Tokyo Cabinet  is  a  free
       software	licensed under the GNU Lesser General Public License.

       Tokyo  Cabinet  is  developed  as the successor of GDBM and QDBM	on the
       following purposes.  They are achieved and Tokyo	Cabinet	replaces  con-
       ventional DBM products.

	      improves space efficiency	: smaller size of database file.
	      improves time efficiency : faster	processing speed.
	      improves	parallelism : higher performance in multi-thread envi-
	      improves usability : simplified API.
	      improves robustness : database file is not corrupted even	 under
	      catastrophic situation.
	      supports	64-bit	architecture : enormous	memory space and data-
	      base file	are available.

       As with QDBM, the following three restrictions of  traditional  DBM:  a
       process	can handle only	one database, the size of a key	and a value is
       bounded,	a database file	is sparse, are cleared.	 Moreover, the follow-
       ing  three restrictions of QDBM:	the size of a database file is limited
       to 2GB, environments with different byte	orders can not share  a	 data-
       base  file, only	one thread can search a	database at the	same time, are

       Tokyo Cabinet runs very fast.  For example, elapsed  time  to  store  1
       million	records	 is 0.7	seconds	for hash database, and 1.6 seconds for
       B+ tree database.  Moreover, the	size of	database of Tokyo  Cabinet  is
       very  small.   For  example, overhead for a record is 16	bytes for hash
       database, and 5 bytes for B+ tree database.   Furthermore,  scalability
       of Tokyo	Cabinet	is great.  The database	size can be up to 8EB (9.22e18

       Tokyo Cabinet uses hash algorithm to retrieve records.  If a bucket ar-
       ray has sufficient number of elements, the time complexity of retrieval
       is "O(1)".  That	is, time required for retrieving a record is constant,
       regardless of the scale of a database.  It is also the same about stor-
       ing and deleting.  Collision of hash  values  is	 managed  by  separate
       chaining.  Data structure of the	chains is binary search	tree.  Even if
       a bucket	array has unusually scarce elements, the  time	complexity  of
       retrieval is "O(log n)".

       Tokyo  Cabinet attains improvement in retrieval by loading RAM with the
       whole of	a bucket array.	 If a bucket array is on RAM, it  is  possible
       to  access a region of a	target record by about one path	of file	opera-
       tions.  A bucket	array saved in a file is not read into	RAM  with  the
       `read'  call  but  directly mapped to RAM with the `mmap' call.	There-
       fore, preparation time on connecting to a database is very  short,  and
       two or more processes can share the same	memory map.

       If  the	number	of elements of a bucket	array is about half of records
       stored within a database, although it depends on	characteristic of  the
       input,  the  probability	 of  collision	of  hash values	is about 56.7%
       (36.8% if the same, 21.3% if twice, 11.5% if four times,	6.0% if	 eight
       times).	 In  such  case, it is possible	to retrieve a record by	two or
       less paths of file operations.  If it is	made into a performance	index,
       in  order  to  handle  a	 database containing one million of records, a
       bucket array with half a	million	of elements is needed.	 The  size  of
       each  element  is 4 bytes.  That	is, if 2M bytes	of RAM is available, a
       database	containing one million records can be handled.

       Traditional DBM provides	two modes of the storing operations:  "insert"
       and  "replace".	In the case a key overlaps an existing record, the in-
       sert mode keeps the existing value, while the replace  mode  transposes
       it to the specified value.  In addition to the two modes, Tokyo Cabinet
       provides	"concatenate" mode.  In	the mode, the specified	value is  con-
       catenated at the	end of the existing value and stored.  This feature is
       useful when adding an element to	a value	as an array.

       Generally speaking, while  succession  of  updating,  fragmentation  of
       available  regions  occurs,  and	 the size of a database	grows rapidly.
       Tokyo Cabinet deal with this problem by coalescence of dispensable  re-
       gions  and reuse	of them.  When overwriting a record with a value whose
       size is greater than the	existing one, it is necessary  to  remove  the
       region to another position of the file.	Because	the time complexity of
       the operation depends on	the size of the	region of a record,  extending
       values  successively  is	inefficient.  However, Tokyo Cabinet deal with
       this problem by alignment.  If increment	can be put in padding,	it  is
       not necessary to	remove the region.

       The  "free block	pool" to reuse dispensable regions efficiently is also
       implemented.  It	keeps a	list of	 dispensable  regions  and  reuse  the
       "best  fit" region, that	is the smallest	region in the list, when a new
       block is	requested.  Because fragmentation is inevitable	even then, two
       kinds  of  optimization	(defragmentation)  mechanisms are implemented.
       The first is called static optimization which deploys all records  into
       another	file  and  then	writes them back to the	original file at once.
       The second is called dynamic optimization which gathers up  dispensable
       regions	by  replacing the locations of records and dispensable regions

       Although	B+ tree	database is slower than	hash database, it features or-
       dering  access  to  each	 record.   The order can be assigned by	users.
       Records of B+ tree are sorted and arranged in  logical  pages.	Sparse
       index organized in B tree that is multiway balanced tree	are maintained
       for each	page.  Thus, the time complexity of retrieval  and  so	on  is
       "O(log  n)".   Cursor  is provided to access each record	in order.  The
       cursor can jump to a position specified by a key	and can	 step  forward
       or  backward  from the current position.	 Because each page is arranged
       as double linked	list,  the  time  complexity  of  stepping  cursor  is

       B+ tree database	is implemented,	based on above hash database.  Because
       each page of B+ tree is stored as each record of	hash database, B+ tree
       database	 inherits  efficiency  of storage management of	hash database.
       Because the header of each record is smaller and	alignment of each page
       is  adjusted  according	to  the	 page size, in most cases, the size of
       database	file is	cut by half compared to	one  of	 hash  database.   Al-
       though operation	of many	pages are required to update B+	tree, QDBM ex-
       pedites the process by caching pages and	reducing file operations.   In
       most  cases,  because whole of the sparse index is cached on memory, it
       is possible to retrieve a record	by one or less	path  of  file	opera-

       Each  pages  of B+ tree can be stored with compressed.  Two compression
       method; Deflate of ZLIB and Block Sorting of BZIP2, are supported.  Be-
       cause  each  record  in a page has similar patterns, high efficiency of
       compression is expected due to the Lempel-Ziv or	 the  BWT  algorithms.
       In  case	handling text data, the	size of	a database is reduced to about
       25%.  If	the scale of a database	is large and disk I/O is  the  bottle-
       neck,  featuring	 compression  makes the	processing speed improved to a
       large extent.

       Fixed-length database has restrictions that each	key should be a	 natu-
       ral number and that the length of each value is limited.	 However, time
       efficiency and space efficiency are higher than the other  data	struc-
       tures as	long as	the use	case is	within the restriction.

       Because	the  whole  region  of the database is mapped on memory	by the
       `mmap' call and referred	as a multidimensional array, the overhead  re-
       lated  to  the  file  I/O  is minimized.	 Due to	this simple structure,
       fixed-length database works faster than hash database, and its  concur-
       rency in	multi-thread environment is prominent.

       The  size  of the database is proportional to the range of keys and the
       limit size of each value.  That is, the smaller the range of keys is or
       the  smaller  the  length  of each value	is, the	higher the space effi-
       ciency is.  For example,	if the maximum key is 1000000  and  the	 limit
       size  of	the value is 100 bytes,	the size of the	database will be about
       100MB.  Because regions around referred records are only	loaded on  the
       RAM,  you can increase the size of the database to the size of the vir-
       tual memory.

       Table database does not express	simple	key/value  structure  but  ex-
       presses	a  structure like a table of relational	database.  Each	record
       is identified by	the primary key	and has	 a  set	 of  multiple  columns
       named with arbitrary strings.  For example, a stuff in your company can
       be expressed by a record	identified by the primary key of the  employee
       ID  number and structured by columns of his name, division, salary, and
       so on.  Unlike relational database, table database does not need	to de-
       fine any	data schema and	can contain records of various structures dif-
       ferent from each	other.

       Table database supports query functions with not	only the  primary  key
       but  also  with conditions about	arbitrary columns.  Each column	condi-
       tion is composed	of the name of a column	and  a	condition  expression.
       Operators of full matching, forward matching, regular expression	match-
       ing, and	so on are provided for the string  type.   Operators  of  full
       matching,  range	 matching  and so on are provided for the number type.
       Operators for tag search	and full-text search  are  also	 provided.   A
       query can contain multiple conditions for logical intersection.	Search
       by multiple queries for logical union is	also available.	 The order  of
       the result set can be specified as the ascending	or descending order of
       strings or numbers.

       You can create indices for arbitrary columns to improve performance  of
       search  and  sorting.  Although columns do not have data	types, indices
       have types for strings or numbers.  Inverted indices  for  space	 sepa-
       rated tokens and	character N-gram tokens	are also supported.  The query
       optimizer uses indices in suitable way according	to  each  query.   In-
       dices are implemented as	different files	of B+ tree database.

       Databases on the	filesystem feature transaction mechanisms.  It is pos-
       sible to	commit a series	of operations between the  beginning  and  the
       end  of the transaction in a lump, or to	abort the transaction and per-
       form rollback to	the state before the transaction.  Two isolation  lev-
       els  are	 supported;  serializable and read uncommitted.	 Durability is
       secured by write	ahead logging and shadow paging.

       Tokyo Cabinet provides two modes	to connect to a	database: "reader" and
       "writer".   A  reader  can  perform  retrieving but neither storing nor
       deleting.  A writer can perform all access methods.  Exclusion  control
       between	processes  is  performed when connecting to a database by file
       locking.	 While a writer	is connected to	a  database,  neither  readers
       nor  writers  can be connected.	While a	reader is connected to a data-
       base, other readers can be connect, but writers can not.	 According  to
       this  mechanism,	 data consistency is guaranteed	with simultaneous con-
       nections	in multitasking	environment.

       Functions of API	of  Tokyo  cabinet  are	 reentrant  and	 available  in
       multi-thread  environment.  Discrete database object can	be operated in
       parallel	entirely.  For simultaneous operations of  the	same  database
       object,	read-write lock	is used	for exclusion control.	That is, while
       a writing thread	is operating the database, other reading  threads  and
       writing	threads	are blocked.  However, while a reading thread is oper-
       ating the database, reading threads are not blocked.  The locking gran-
       ularity	of  hash database and fixed-length database is per record, and
       that of the other databases is per file.

       Tokyo Cabinet provides simple API based on the object oriented  design.
       Every  operation	 for  database	is encapsulated	and published as lucid
       methods as `open'  (connect),  `close'  (disconnect),  `put'  (insert),
       `out'  (remove),	 `get'	(retrieve),  and  so on.  Because the three of
       hash, B+	tree, and fixed-length array database APIs  are	 very  similar
       with  each other, porting an application	from one to the	other is easy.
       Moreover, the abstract API is provided to handle	these  databases  with
       the same	interface.  Applications of the	abstract API can determine the
       type of the database in runtime.

       The utility API is also provided.  Such fundamental data	 structure  as
       list  and  map  are  included.  And, some useful	features; memory pool,
       string processing, encoding, are	also included.

       Six kinds of API; the utility API, the hash database API, the  B+  tree
       database	 API,  the  fixed-length database API, the table database API,
       and the abstract	database API, are provided for the C  language.	  Com-
       mand line interfaces are	also provided corresponding to each API.  They
       are useful for prototyping, test, and debugging.	 Except	for  C,	 Tokyo
       Cabinet	provides  APIs	for Perl, Ruby,	Java, and Lua.	APIs for other
       languages will hopefully	be provided by third party.

       In cases	that multiple processes	access a database at the same time  or
       some  processes	access a database on a remote host, the	remote service
       is useful.  The remote service is composed of a database	server and its
       access  library.	  Applications can access the database server by using
       the remote database API.	 The server implements HTTP and	the  memcached
       protocol	partly so that client programs on almost all platforms can ac-
       cess the	server easily.

       Tokyo Cabinet provides API of the C language and	 it  is	 available  by
       programs	 conforming  to	the C89	(ANSI C) standard or the C99 standard.
       As the header files  of	Tokyo  Cabinet	are  provided  as  `tcutil.h',
       `tchdb.h',  and	`tcbdb.h',  applications should	include	one or more of
       them accordingly	to use	the  API.   As	the  library  is  provided  as
       `libtokyocabinet.a'   and   `'	  and	they   depends
       `',  `',	`',  `',  and   `',
       linker  options	`-ltokyocabinet', `-lz', `-lbz2', `-lrt', `-lpthread',
       `-lm', and `-lc'	are required for build command.	 A typical build  com-
       mand is the following.

	      gcc -I/usr/local/include tc_example.c -o tc_example \
		-L/usr/local/lib  -ltokyocabinet  -lz -lbz2 -lrt -lpthread -lm

       You can also use	Tokyo Cabinet in programs  written  in	C++.   Because
       each  header is wrapped in C linkage (`extern "C"' block), you can sim-
       ply include them	into your C++ programs.

       Tokyo Cabinet is	free software; you can redistribute it	and/or	modify
       it  under  the  terms  of the GNU Lesser	General	Public License as pub-
       lished by the Free Software Foundation; either version 2.1 of  the  Li-
       cense or	any later version.

       Tokyo  Cabinet  is  distributed in the hope that	it will	be useful, but
       WITHOUT ANY  WARRANTY;  without	even  the  implied  warranty  of  MER-
       General Public License for more details.

       You should have received	a copy of the GNU Lesser  General  Public  Li-
       cense  along with Tokyo Cabinet (See the	file `COPYING'); if not, write
       to the Free Software Foundation,	Inc., 59 Temple	Place, Suite 330, Bos-
       ton, MA 02111-1307 USA.

       Tokyo  Cabinet  was written by FAL Labs.	 You can contact the author by
       e-mail to `'.

       tcutil(3), tchdb(3), tcbdb(3), tcfdb(3),	tctdb(3), tcadb(3)

       Please see for detail.

Man Page			  2012-08-18		       TOKYOCABINET(3)


Want to link to this manual page? Use this URL:

home | help