Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
TOKYOCABINET(3)			 Tokyo Cabinet		       TOKYOCABINET(3)

NAME
       tokyocabinet - a	modern implementation of DBM

INTRODUCTION
       Tokyo  Cabinet  is  a library of	routines for managing a	database.  The
       database	is a simple data file containing records, each is a pair of  a
       key  and	 a  value.   Every key and value is serial bytes with variable
       length.	Both binary data and character string can be used as a key and
       a value.	 There is neither concept  of  data  tables  nor  data	types.
       Records are organized in	hash table, B+ tree, or	fixed-length array.

       As  for	database of hash table,	each key must be unique	within a data-
       base, so	it is impossible to store two or more records with a key over-
       laps.  The following access methods are provided	to the database: stor-
       ing a record with a key and a value, deleting a record by  a  key,  re-
       trieving	 a  record  by a key.  Moreover, traversal access to every key
       are provided, although the order	is arbitrary.	These  access  methods
       are  similar  to	 ones of DBM (or its followers:	NDBM and GDBM) library
       defined in the UNIX standard.  Tokyo Cabinet is an alternative for  DBM
       because of its higher performance.

       As  for	database  of B+	tree, records whose keys are duplicated	can be
       stored.	Access methods of storing, deleting, and retrieving  are  pro-
       vided  as with the database of hash table.  Records are stored in order
       by a comparison function	assigned by a user.  It	is possible to	access
       each  record with the cursor in ascending or descending order.  Accord-
       ing to this mechanism, forward matching search for  strings  and	 range
       search for integers are realized.

       As  for	database of fixed-length array,	records	are stored with	unique
       natural numbers.	 It is impossible to store two or more records with  a
       key  overlaps.	Moreover,  the length of each record is	limited	by the
       specified length.  Provided operations are the same  as	ones  of  hash
       database.

       Table  database	is  also provided as a variant of hash database.  Each
       record is identified by the primary key and has a set of	named columns.
       Although	there is no concept of data schema, it is possible  to	search
       for records with	complex	conditions efficiently by using	indices	of ar-
       bitrary columns.

       Tokyo  Cabinet  is written in the C language, and provided as API of C,
       Perl, Ruby, Java, and Lua.  Tokyo Cabinet  is  available	 on  platforms
       which  have  API	 conforming to C99 and POSIX.  Tokyo Cabinet is	a free
       software	licensed under the GNU Lesser General Public License.

THE DINOSAUR WING OF THE DBM FORKS
       Tokyo Cabinet is	developed as the successor of GDBM  and	 QDBM  on  the
       following  purposes.  They are achieved and Tokyo Cabinet replaces con-
       ventional DBM products.

	      improves space efficiency	: smaller size of database file.
	      improves time efficiency : faster	processing speed.
	      improves parallelism : higher performance	in multi-thread	 envi-
	      ronment.
	      improves usability : simplified API.
	      improves	robustness : database file is not corrupted even under
	      catastrophic situation.
	      supports 64-bit architecture : enormous memory space  and	 data-
	      base file	are available.

       As  with	 QDBM,	the following three restrictions of traditional	DBM: a
       process can handle only one database, the size of a key and a value  is
       bounded,	a database file	is sparse, are cleared.	 Moreover, the follow-
       ing  three restrictions of QDBM:	the size of a database file is limited
       to 2GB, environments with different byte	orders can not share  a	 data-
       base  file, only	one thread can search a	database at the	same time, are
       cleared.

       Tokyo Cabinet runs very fast.  For example, elapsed  time  to  store  1
       million	records	 is 0.7	seconds	for hash database, and 1.6 seconds for
       B+ tree database.  Moreover, the	size of	database of Tokyo  Cabinet  is
       very  small.   For  example, overhead for a record is 16	bytes for hash
       database, and 5 bytes for B+ tree database.   Furthermore,  scalability
       of Tokyo	Cabinet	is great.  The database	size can be up to 8EB (9.22e18
       bytes).

EFFECTIVE IMPLEMENTATION OF HASH DATABASE
       Tokyo Cabinet uses hash algorithm to retrieve records.  If a bucket ar-
       ray has sufficient number of elements, the time complexity of retrieval
       is "O(1)".  That	is, time required for retrieving a record is constant,
       regardless of the scale of a database.  It is also the same about stor-
       ing  and	 deleting.   Collision	of  hash values	is managed by separate
       chaining.  Data structure of the	chains is binary search	tree.  Even if
       a bucket	array has unusually scarce elements, the  time	complexity  of
       retrieval is "O(log n)".

       Tokyo  Cabinet attains improvement in retrieval by loading RAM with the
       whole of	a bucket array.	 If a bucket array is on RAM, it  is  possible
       to  access a region of a	target record by about one path	of file	opera-
       tions.  A bucket	array saved in a file is not read into	RAM  with  the
       `read'  call  but  directly mapped to RAM with the `mmap' call.	There-
       fore, preparation time on connecting to a database is very  short,  and
       two or more processes can share the same	memory map.

       If  the	number	of elements of a bucket	array is about half of records
       stored within a database, although it depends on	characteristic of  the
       input,  the  probability	 of  collision	of  hash values	is about 56.7%
       (36.8% if the same, 21.3% if twice, 11.5% if four times,	6.0% if	 eight
       times).	 In  such  case, it is possible	to retrieve a record by	two or
       less paths of file operations.  If it is	made into a performance	index,
       in order	to handle a database containing	 one  million  of  records,  a
       bucket  array  with  half a million of elements is needed.  The size of
       each element is 4 bytes.	 That is, if 2M	bytes of RAM is	 available,  a
       database	containing one million records can be handled.

       Traditional  DBM	provides two modes of the storing operations: "insert"
       and "replace".  In the case a key overlaps an existing record, the  in-
       sert  mode  keeps the existing value, while the replace mode transposes
       it to the specified value.  In addition to the two modes, Tokyo Cabinet
       provides	"concatenate" mode.  In	the mode, the specified	value is  con-
       catenated at the	end of the existing value and stored.  This feature is
       useful when adding an element to	a value	as an array.

       Generally  speaking,  while  succession	of  updating, fragmentation of
       available regions occurs, and the size of  a  database  grows  rapidly.
       Tokyo  Cabinet deal with	this problem by	coalescence of dispensable re-
       gions and reuse of them.	 When overwriting a record with	a value	 whose
       size  is	 greater  than the existing one, it is necessary to remove the
       region to another position of the file.	Because	the time complexity of
       the operation depends on	the size of the	region of a record,  extending
       values  successively  is	inefficient.  However, Tokyo Cabinet deal with
       this problem by alignment.  If increment	can be put in padding,	it  is
       not necessary to	remove the region.

       The  "free block	pool" to reuse dispensable regions efficiently is also
       implemented.  It	keeps a	list of	 dispensable  regions  and  reuse  the
       "best  fit" region, that	is the smallest	region in the list, when a new
       block is	requested.  Because fragmentation is inevitable	even then, two
       kinds of	optimization  (defragmentation)	 mechanisms  are  implemented.
       The  first is called static optimization	which deploys all records into
       another file and	then writes them back to the original  file  at	 once.
       The  second is called dynamic optimization which	gathers	up dispensable
       regions by replacing the	locations of records and  dispensable  regions
       gradually.

USEFUL IMPLEMENTATION OF B+ TREE DATABASE
       Although	B+ tree	database is slower than	hash database, it features or-
       dering  access  to  each	 record.   The order can be assigned by	users.
       Records of B+ tree are sorted and arranged in  logical  pages.	Sparse
       index organized in B tree that is multiway balanced tree	are maintained
       for  each  page.	  Thus,	 the time complexity of	retrieval and so on is
       "O(log n)".  Cursor is provided to access each record  in  order.   The
       cursor  can  jump to a position specified by a key and can step forward
       or backward from	the current position.  Because each page  is  arranged
       as  double  linked  list,  the  time  complexity	 of stepping cursor is
       "O(1)".

       B+ tree database	is implemented,	based on above hash database.  Because
       each page of B+ tree is stored as each record of	hash database, B+ tree
       database	inherits efficiency of storage management  of  hash  database.
       Because the header of each record is smaller and	alignment of each page
       is  adjusted  according	to  the	 page size, in most cases, the size of
       database	file is	cut by half compared to	one  of	 hash  database.   Al-
       though operation	of many	pages are required to update B+	tree, QDBM ex-
       pedites	the process by caching pages and reducing file operations.  In
       most cases, because whole of the	sparse index is	cached on  memory,  it
       is  possible  to	 retrieve  a record by one or less path	of file	opera-
       tions.

       Each pages of B+	tree can be stored with	compressed.   Two  compression
       method; Deflate of ZLIB and Block Sorting of BZIP2, are supported.  Be-
       cause  each  record  in a page has similar patterns, high efficiency of
       compression is expected due to the Lempel-Ziv or	 the  BWT  algorithms.
       In  case	handling text data, the	size of	a database is reduced to about
       25%.  If	the scale of a database	is large and disk I/O is  the  bottle-
       neck,  featuring	 compression  makes the	processing speed improved to a
       large extent.

NAIVE IMPLEMENTATION OF	FIXED-LENGTH DATABASE
       Fixed-length database has restrictions that each	key should be  a  nat-
       ural  number  and  that	the length of each value is limited.  However,
       time efficiency and space efficiency are	higher	than  the  other  data
       structures as long as the use case is within the	restriction.

       Because	the  whole  region  of the database is mapped on memory	by the
       `mmap' call and referred	as a multidimensional array, the overhead  re-
       lated  to  the  file  I/O  is minimized.	 Due to	this simple structure,
       fixed-length database works faster than hash database, and its  concur-
       rency in	multi-thread environment is prominent.

       The  size  of the database is proportional to the range of keys and the
       limit size of each value.  That is, the smaller the range of keys is or
       the smaller the length of each value is,	the  higher  the  space	 effi-
       ciency  is.   For  example, if the maximum key is 1000000 and the limit
       size of the value is 100	bytes, the size	of the database	will be	 about
       100MB.	Because	regions	around referred	records	are only loaded	on the
       RAM, you	can increase the size of the database to the size of the  vir-
       tual memory.

FLEXIBLE IMPLEMENTATION	OF TABLE DATABASE
       Table  database	does  not  express  simple key/value structure but ex-
       presses a structure like	a table	of relational database.	  Each	record
       is  identified  by  the	primary	 key and has a set of multiple columns
       named with arbitrary strings.  For example, a stuff in your company can
       be expressed by a record	identified by the primary key of the  employee
       ID  number and structured by columns of his name, division, salary, and
       so on.  Unlike relational database, table database does not need	to de-
       fine any	data schema and	can contain records of various structures dif-
       ferent from each	other.

       Table database supports query functions with not	only the  primary  key
       but  also  with conditions about	arbitrary columns.  Each column	condi-
       tion is composed	of the name of a column	and  a	condition  expression.
       Operators of full matching, forward matching, regular expression	match-
       ing,  and  so  on  are provided for the string type.  Operators of full
       matching, range matching	and so on are provided for  the	 number	 type.
       Operators  for  tag  search  and	full-text search are also provided.  A
       query can contain multiple conditions for logical intersection.	Search
       by multiple queries for logical union is	also available.	 The order  of
       the result set can be specified as the ascending	or descending order of
       strings or numbers.

       You  can	create indices for arbitrary columns to	improve	performance of
       search and sorting.  Although columns do	not have data  types,  indices
       have  types  for	 strings or numbers.  Inverted indices for space sepa-
       rated tokens and	character N-gram tokens	are also supported.  The query
       optimizer uses indices in suitable way according	to  each  query.   In-
       dices are implemented as	different files	of B+ tree database.

PRACTICAL FUNCTIONALITY
       Databases on the	filesystem feature transaction mechanisms.  It is pos-
       sible  to  commit  a series of operations between the beginning and the
       end of the transaction in a lump, or to abort the transaction and  per-
       form  rollback to the state before the transaction.  Two	isolation lev-
       els are supported; serializable and read	 uncommitted.	Durability  is
       secured by write	ahead logging and shadow paging.

       Tokyo Cabinet provides two modes	to connect to a	database: "reader" and
       "writer".   A  reader  can  perform  retrieving but neither storing nor
       deleting.  A writer can perform all access methods.  Exclusion  control
       between	processes  is  performed when connecting to a database by file
       locking.	 While a writer	is connected to	a  database,  neither  readers
       nor  writers  can be connected.	While a	reader is connected to a data-
       base, other readers can be connect, but writers can not.	 According  to
       this  mechanism,	 data consistency is guaranteed	with simultaneous con-
       nections	in multitasking	environment.

       Functions of API	of  Tokyo  cabinet  are	 reentrant  and	 available  in
       multi-thread  environment.  Discrete database object can	be operated in
       parallel	entirely.  For simultaneous operations of  the	same  database
       object,	read-write lock	is used	for exclusion control.	That is, while
       a writing thread	is operating the database, other reading  threads  and
       writing	threads	are blocked.  However, while a reading thread is oper-
       ating the database, reading threads are not blocked.  The locking gran-
       ularity of hash database	and fixed-length database is per  record,  and
       that of the other databases is per file.

SIMPLE BUT VARIOUS INTERFACES
       Tokyo  Cabinet provides simple API based	on the object oriented design.
       Every operation for database is encapsulated  and  published  as	 lucid
       methods	as  `open'  (connect),	`close'	 (disconnect), `put' (insert),
       `out' (remove), `get' (retrieve), and so	 on.   Because	the  three  of
       hash,  B+  tree,	 and fixed-length array	database APIs are very similar
       with each other,	porting	an application from one	to the other is	 easy.
       Moreover,  the  abstract	API is provided	to handle these	databases with
       the same	interface.  Applications of the	abstract API can determine the
       type of the database in runtime.

       The utility API is also provided.  Such fundamental data	 structure  as
       list  and  map  are  included.  And, some useful	features; memory pool,
       string processing, encoding, are	also included.

       Six kinds of API; the utility API, the hash database API, the  B+  tree
       database	 API,  the  fixed-length database API, the table database API,
       and the abstract	database API, are provided for the C  language.	  Com-
       mand line interfaces are	also provided corresponding to each API.  They
       are  useful  for	prototyping, test, and debugging.  Except for C, Tokyo
       Cabinet provides	APIs for Perl, Ruby, Java, and Lua.   APIs  for	 other
       languages will hopefully	be provided by third party.

       In  cases that multiple processes access	a database at the same time or
       some processes access a database	on a remote host, the  remote  service
       is useful.  The remote service is composed of a database	server and its
       access  library.	  Applications can access the database server by using
       the remote database API.	 The server implements HTTP and	the  memcached
       protocol	partly so that client programs on almost all platforms can ac-
       cess the	server easily.

HOW TO USE THE LIBRARY
       Tokyo  Cabinet  provides	 API  of the C language	and it is available by
       programs	conforming to the C89 (ANSI C) standard	or the	C99  standard.
       As  the	header	files  of  Tokyo  Cabinet  are provided	as `tcutil.h',
       `tchdb.h', and `tcbdb.h', applications should include one  or  more  of
       them  accordingly  to  use  the	API.   As  the	library	is provided as
       `libtokyocabinet.a'   and   `libtokyocabinet.so'	  and	they   depends
       `libz.so',   `librt.so',	 `libpthread.so',  `libm.so',  and  `libc.so',
       linker options `-ltokyocabinet',	`-lz', `-lbz2',	 `-lrt',  `-lpthread',
       `-lm',  and `-lc' are required for build	command.  A typical build com-
       mand is the following.

	      gcc -I/usr/local/include tc_example.c -o tc_example \
		-L/usr/local/lib -ltokyocabinet	-lz -lbz2 -lrt	-lpthread  -lm
	      -lc

       You  can	 also  use  Tokyo Cabinet in programs written in C++.  Because
       each header is wrapped in C linkage (`extern "C"' block), you can  sim-
       ply include them	into your C++ programs.

LICENSE
       Tokyo  Cabinet  is free software; you can redistribute it and/or	modify
       it under	the terms of the GNU Lesser General  Public  License  as  pub-
       lished  by  the Free Software Foundation; either	version	2.1 of the Li-
       cense or	any later version.

       Tokyo Cabinet is	distributed in the hope	that it	will  be  useful,  but
       WITHOUT	ANY  WARRANTY;	without	 even  the  implied  warranty  of MER-
       CHANTABILITY or FITNESS FOR A PARTICULAR	PURPOSE.  See the  GNU	Lesser
       General Public License for more details.

       You  should  have  received a copy of the GNU Lesser General Public Li-
       cense along with	Tokyo Cabinet (See the file `COPYING');	if not,	 write
       to  the	Free  Software	Foundation,  Inc., 59 Temple Place, Suite 330,
       Boston, MA 02111-1307 USA.

       Tokyo Cabinet was written by FAL	Labs.  You can contact the  author  by
       e-mail to `info@fallabs.com'.

SEE ALSO
       tcutil(3), tchdb(3), tcbdb(3), tcfdb(3),	tctdb(3), tcadb(3)

       Please see http://1978th.net/tokyocabinet/ for detail.

Man Page			  2012-08-18		       TOKYOCABINET(3)

Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=tokyocabinet&sektion=3&manpath=FreeBSD+Ports+14.3.quarterly>

home | help