ODEUM(3)		    Quick Database Manager		      ODEUM(3)

NAME
       Odeum - the inverted API	of QDBM

SYNOPSIS
       #include	<depot.h>
       #include	<cabin.h>
       #include	<odeum.h>
       #include	<stdlib.h>

       typedef struct {	int id;	int score; } ODPAIR;

       ODEUM *odopen(const char	*name, int omode);

       int odclose(ODEUM *odeum);

       int odput(ODEUM *odeum, const ODDOC *doc, int wmax, int over);

       int odout(ODEUM *odeum, const char *uri);

       int odoutbyid(ODEUM *odeum, int id);

       ODDOC *odget(ODEUM *odeum, const	char *uri);

       ODDOC *odgetbyid(ODEUM *odeum, int id);

       int odgetidbyuri(ODEUM *odeum, const char *uri);

       int odcheck(ODEUM *odeum, int id);

       ODPAIR *odsearch(ODEUM *odeum, const char *word,	int max, int *np);

       int odsearchdnum(ODEUM *odeum, const char *word);

       int oditerinit(ODEUM *odeum);

       ODDOC *oditernext(ODEUM *odeum);

       int odsync(ODEUM	*odeum);

       int odoptimize(ODEUM *odeum);

       char *odname(ODEUM *odeum);

       double odfsiz(ODEUM *odeum);

       int odbnum(ODEUM	*odeum);

       int odbusenum(ODEUM *odeum);

       int oddnum(ODEUM	*odeum);

       int odwnum(ODEUM	*odeum);

       int odwritable(ODEUM *odeum);

       int odfatalerror(ODEUM *odeum);

       int odinode(ODEUM *odeum);

       time_t odmtime(ODEUM *odeum);

       int odmerge(const char *name, const CBLIST *elemnames);

       int odremove(const char *name);

       ODDOC *oddocopen(const char *uri);

       void oddocclose(ODDOC *doc);

       void oddocaddattr(ODDOC *doc, const char	*name, const char *value);

       void oddocaddword(ODDOC *doc, const char	*normal, const char *asis);

       int oddocid(const ODDOC *doc);

       const char *oddocuri(const ODDOC	*doc);

       const char *oddocgetattr(const ODDOC *doc, const	char *name);

       const CBLIST *oddocnwords(const ODDOC *doc);

       const CBLIST *oddocawords(const ODDOC *doc);

       CBMAP *oddocscores(const	ODDOC *doc, int	max, ODEUM *odeum);

       CBLIST *odbreaktext(const char *text);

       char *odnormalizeword(const char	*asis);

       ODPAIR  *odpairsand(ODPAIR *apairs, int anum, ODPAIR *bpairs, int bnum,
       int *np);

       ODPAIR *odpairsor(ODPAIR	*apairs, int anum, ODPAIR *bpairs,  int	 bnum,
       int *np);

       ODPAIR  *odpairsnotand(ODPAIR  *apairs,	int  anum, ODPAIR *bpairs, int
       bnum, int *np);

       void odpairssort(ODPAIR *pairs, int pnum);

       double odlogarithm(double x);

       double odvectorcosine(const int *avec, const int	*bvec, int vnum);

       void odsettuning(int ibnum, int idnum, int cbnum, int csiz);

       void odanalyzetext(ODEUM *odeum, const char *text, CBLIST *awords,
       CBLIST *nwords);

       void  odsetcharclass(ODEUM  *odeum,  const char *spacechars, const char
       *delimchars, const char *gluechars);

       ODPAIR *odquery(ODEUM *odeum, const char *query, int *np,
       CBLIST *errors);

DESCRIPTION
       Odeum is the API which handles an inverted index.  An inverted index is
       a data structure for retrieving the list of documents that include a
       given word, where the words were extracted from a population of
       documents.  It makes a full-text search system easy to implement.
       Odeum provides an abstract data structure which consists of the words
       and attributes of a document.  An application uses it both when storing
       a document into a database and when retrieving documents from a
       database.

       Odeum does not provide methods to extract the text from the original
       data of a document.  That should be implemented by applications.
       Although Odeum provides utilities to extract words from a text, they
       are oriented to languages whose words are separated with space
       characters, such as English.  If an application handles languages which
       need morphological analysis or N-gram analysis, such as Japanese, or if
       it performs more specialized analysis of natural language, such as
       stemming, it can adopt its own analyzing method.  The result of a
       search is expressed as an array whose elements are structures composed
       of the ID number of a document and its score.  In order to search with
       two or more words, Odeum provides utilities for set operations.

       Odeum is implemented on top of Curia, Cabin, and Villa.  Odeum creates
       a database with a directory name.  Some databases of Curia and Villa
       are placed in the specified directory.  For example, `casket/docs',
       `casket/index', and `casket/rdocs' are created when the database
       directory is named `casket'.  `docs' is a database directory of Curia.
       The key of each record is the ID number of a document, and the value
       is its attributes, such as the URI.  `index' is a database directory
       of Curia.  The key of each record is the normalized form of a word, and
       the value is an array whose elements are pairs of the ID number of a
       document including the word and its score.  `rdocs' is a database file
       of Villa.  The key of each record is the URI of a document, and the
       value is its ID number.

       In order	 to  use  Odeum,  you  should  include	`depot.h',  `cabin.h',
       `odeum.h'  and  `stdlib.h' in the source	files.	Usually, the following
       description will	be near	the beginning of a source file.

	      #include <depot.h>
	      #include <cabin.h>
	      #include <odeum.h>
	      #include <stdlib.h>

       A pointer to `ODEUM' is used as a database handle.  A database handle
       is opened with the function `odopen' and closed with `odclose'.  You
       should not refer directly to any member of the handle.  If a fatal
       error occurs in a database, any access method via the handle except
       `odclose' will not work and will return error status.  Although a
       process is allowed to use multiple database handles at the same time,
       it should not use more than one handle of the same database.

       A pointer to `ODDOC' is used as a document handle.  A  document	handle
       is  opened  with	the function `oddocopen' and closed with `oddocclose'.
       You should not refer directly to	any member of the handle.  A  document
       consists of attributes and words.  Each word is expressed as a pair of
       a normalized form and an appearance form.

       Odeum also assigns the error code to the external variable `dpecode'.
       The function `dperrmsg' is used in order to get the message of the
       error code.

       Structures of `ODPAIR' type are used in order to handle the results of
       a search.

       typedef struct {	int id;	int score; } ODPAIR;
              `id' specifies the ID number of a document.  `score' specifies
              the score calculated from the number of occurrences of the
              search words in the document.

       The function `odopen' is	used in	order to get a database	handle.

       ODEUM *odopen(const char	*name, int omode);
              `name' specifies the name of a database directory.  `omode'
              specifies the connection mode: `OD_OWRITER' as a writer,
              `OD_OREADER' as a reader.  If the mode is `OD_OWRITER', the
              following may be added by bitwise or: `OD_OCREAT', which creates
              a new database if none exists, and `OD_OTRUNC', which creates a
              new database regardless of whether one exists.  Either of the
              following may be added to both `OD_OREADER' and `OD_OWRITER' by
              bitwise or: `OD_ONOLCK', which opens a database directory
              without file locking, or `OD_OLCKNB', which performs locking
              without blocking.  The return value is the database handle, or
              `NULL' if it is not successful.  While connecting as a writer,
              an exclusive lock is applied to the database directory.  While
              connecting as a reader, a shared lock is applied to the database
              directory.  The thread blocks until the lock is acquired.  If
              `OD_ONOLCK' is used, the application is responsible for
              exclusion control.

       The function `odclose' is used in order to close	a database handle.

       int odclose(ODEUM *odeum);
              `odeum' specifies a database handle.  If successful, the return
              value is true, else, it is false.  Because the region of a
              closed handle is released, the handle can no longer be used.
              Updates to a database are assured to be written when the handle
              is closed.  If a writer opens a database but does not close it
              appropriately, the database will be broken.

       The function `odput' is used in order to	store a	document.

       int odput(ODEUM *odeum, const ODDOC *doc, int wmax, int over);
              `odeum' specifies a database handle connected as a writer.
              `doc' specifies a document handle.  `wmax' specifies the maximum
              number of words to be stored in the document database.  If it is
              negative, the number is unlimited.  `over' specifies whether the
              data of a duplicated document is overwritten or not.  If it is
              false and the URI of the document is duplicated, the function
              returns an error.  If successful, the return value is true,
              else, it is false.

       The function `odout' is used in order to	delete a document specified by
       a URI.

       int odout(ODEUM *odeum, const char *uri);
	      `odeum' specifies	a  database  handle  connected	as  a  writer.
	      `uri'  specifies	the  string of the URI of a document.  If suc-
	      cessful, the return value	is true, else, it is false.  False  is
	      returned when no document	corresponds to the specified URI.

       The  function  `odoutbyid' is used in order to delete a document	speci-
       fied by an ID number.

       int odoutbyid(ODEUM *odeum, int id);
	      `odeum' specifies	a database handle connected as a writer.  `id'
	      specifies	the ID number of a document.  If successful,  the  re-
	      turn  value  is true, else, it is	false.	False is returned when
	      no document corresponds to the specified ID number.

       The function `odget' is used in order to	retrieve a document  specified
       by a URI.

       ODDOC *odget(ODEUM *odeum, const	char *uri);
	      `odeum' specifies	a database handle.  `uri' specifies the	string
	      of  the  URI  of a document.  If successful, the return value is
	      the handle of the	corresponding document,	else,  it  is  `NULL'.
	      `NULL' is	returned when no document corresponds to the specified
	      URI.   Because the handle	of the return value is opened with the
	      function `oddocopen', it should  be  closed  with	 the  function
	      `oddocclose'.

       The  function `odgetbyid' is used in order to retrieve a	document by an
       ID number.

       ODDOC *odgetbyid(ODEUM *odeum, int id);
	      `odeum' specifies	a database handle.  `id' specifies the ID num-
	      ber of a document.  If successful, the return value is the  han-
	      dle  of  the corresponding document, else, it is `NULL'.	`NULL'
	      is returned when no document corresponds	to  the	 specified  ID
	      number.	Because	 the handle of the return value	is opened with
	      the function `oddocopen',	it should be closed with the  function
	      `oddocclose'.

       The  function `odgetidbyuri' is used in order to	retrieve the ID	of the
       document	specified by a URI.

       int odgetidbyuri(ODEUM *odeum, const char *uri);
              `odeum' specifies a database handle.  `uri' specifies the string
              of the URI of a document.  If successful, the return value is
              the ID number of the document, else, it is -1.  -1 is returned
              when no document corresponds to the specified URI.

       The function `odcheck' is used in order to check	whether	 the  document
       specified by an ID number exists.

       int odcheck(ODEUM *odeum, int id);
	      `odeum' specifies	a database handle.  `id' specifies the ID num-
	      ber of a document.  The return value is true if the document ex-
	      ists, else, it is	false.

       The  function  `odsearch' is used in order to search the	inverted index
       for documents including a particular word.

       ODPAIR *odsearch(ODEUM *odeum, const char *word,	int max, int *np);
              `odeum' specifies a database handle.  `word' specifies a search
              word.  `max' specifies the maximum number of documents to be
              retrieved.  `np' specifies the pointer to a variable to which
              the number of the elements of the return value is assigned.  If
              successful, the return value is the pointer to an array, else,
              it is `NULL'.  Each element of the array is a pair of the ID
              number and the score of a document, and the elements are sorted
              in descending order of their scores.  Even if no document
              corresponds to the specified word, it is not an error, and a
              dummy array is returned.  Because the region of the return value
              is allocated with the `malloc' call, it should be released with
              the `free' call if it is no longer in use.  Note that an element
              of the returned array may refer to a deleted document.

       The function `odsearchdnum' is used in order to get the number of
       documents including a word.

       int odsearchdnum(ODEUM *odeum, const char *word);
              `odeum' specifies a database handle.  `word' specifies a search
              word.  If successful, the return value is the number of
              documents including the word, else, it is -1.  Because this
              function does not read the actual records of the inverted index,
              it is faster than `odsearch'.

       The function `oditerinit' is used in order to initialize	 the  iterator
       of a database handle.

       int oditerinit(ODEUM *odeum);
	      `odeum'  specifies a database handle.  If	successful, the	return
	      value is true, else, it is false.	 The iterator is used in order
	      to access	every document stored in a database.

       The function `oditernext' is used in order to get the next document of
       the iterator.

       ODDOC *oditernext(ODEUM *odeum);
              `odeum' specifies a database handle.  If successful, the return
              value is the handle of the next document, else, it is `NULL'.
              `NULL' is returned when no document remains to be fetched from
              the iterator.  It is possible to access every document by
              calling this function repeatedly.  However, that is not assured
              if the database is updated during the iteration.  Besides, the
              order of this traversal is arbitrary, so it is not assured that
              the order of storing matches the order of the traversal.
              Because the handle of the return value is opened with the
              function `oddocopen', it should be closed with the function
              `oddocclose'.

       The function `odsync' is	used in	order to synchronize updating contents
       with the	files and the devices.

       int odsync(ODEUM	*odeum);
	      `odeum' specifies	a database handle connected as a  writer.   If
	      successful,  the	return value is	true, else, it is false.  This
	      function is useful when another process uses the connected data-
	      base directory.

       The function `odoptimize' is used in order to optimize a	database.

       int odoptimize(ODEUM *odeum);
	      `odeum' specifies	a database handle connected as a  writer.   If
	      successful,  the	return value is	true, else, it is false.  Ele-
	      ments of the deleted documents in	the inverted index are purged.

       The function `odname' is	used in	order to get the name of a database.

       char *odname(ODEUM *odeum);
	      `odeum' specifies	a database handle.  If successful, the	return
	      value  is	the pointer to the region of the name of the database,
	      else, it is `NULL'.  Because the region of the return  value  is
	      allocated	with the `malloc' call,	it should be released with the
	      `free' call if it	is no longer in	use.

       The  function  `odfsiz' is used in order	to get the total size of data-
       base files.

       double odfsiz(ODEUM *odeum);
	      `odeum' specifies	a database handle.  If successful, the	return
	      value is the total size of the database files, else, it is -1.0.

       The  function  `odbnum' is used in order	to get the total number	of the
       elements	of the bucket arrays in	the inverted index.

       int odbnum(ODEUM	*odeum);
	      `odeum' specifies	a database handle.  If successful, the	return
	      value  is	the total number of the	elements of the	bucket arrays,
	      else, it is -1.

       The function `odbusenum'	is used	in order to get	the  total  number  of
       the used	elements of the	bucket arrays in the inverted index.

       int odbusenum(ODEUM *odeum);
	      `odeum'  specifies a database handle.  If	successful, the	return
	      value is the total number	of the used elements of	the bucket ar-
	      rays, else, it is	-1.

       The function `oddnum' is	used in	order to get the number	of  the	 docu-
       ments stored in a database.

       int oddnum(ODEUM	*odeum);
	      `odeum'  specifies a database handle.  If	successful, the	return
	      value is the number of the documents  stored  in	the  database,
	      else, it is -1.

       The  function  `odwnum' is used in order	to get the number of the words
       stored in a database.

       int odwnum(ODEUM	*odeum);
	      `odeum' specifies	a database handle.  If successful, the	return
	      value  is	 the number of the words stored	in the database, else,
              it is -1.  Because of the I/O buffer, the return value may be
              less than the actual number.

       The  function `odwritable' is used in order to check whether a database
       handle is a writer or not.

       int odwritable(ODEUM *odeum);
	      `odeum' specifies	a database handle.  The	return value  is  true
	      if the handle is a writer, false if not.

       The  function  `odfatalerror' is	used in	order to check whether a data-
       base has	a fatal	error or not.

       int odfatalerror(ODEUM *odeum);
	      `odeum' specifies	a database handle.  The	return value  is  true
	      if the database has a fatal error, false if not.

       The  function  `odinode'	 is used in order to get the inode number of a
       database	directory.

       int odinode(ODEUM *odeum);
	      `odeum' specifies	a database handle.  The	return	value  is  the
	      inode number of the database directory.

       The  function  `odmtime'	is used	in order to get	the last modified time
       of a database.

       time_t odmtime(ODEUM *odeum);
	      `odeum' specifies	a database handle.  The	return	value  is  the
	      last modified time of the	database.

       The function `odmerge' is used in order to merge multiple database
       directories.

       int odmerge(const char *name, const CBLIST *elemnames);
	      `name'  specifies	 the  name  of a database directory to create.
	      `elemnames' specifies a list of names of element databases.   If
              successful, the return value is true, else, it is false.  If two
              or more documents which have the same URI come in, the first one
              is adopted and the others are ignored.

       The  function  `odremove'  is used in order to remove a database	direc-
       tory.

       int odremove(const char *name);
              `name' specifies the name of a database directory.  If
              successful, the return value is true, else, it is false.  A
              database directory can contain databases of other APIs of QDBM;
              they are also removed by this function.

       The function `oddocopen'	is used	in order to get	a document handle.

       ODDOC *oddocopen(const char *uri);
	      `uri'  specifies	the  URI of a document.	 The return value is a
	      document handle.	The ID number of a new	document  is  not  de-
	      fined.  It is defined when the document is stored	in a database.

       The function `oddocclose' is used in order to close a document handle.

       void oddocclose(ODDOC *doc);
	      `doc'  specifies	a  document  handle.   Because the region of a
	      closed handle is released, it becomes impossible to use the han-
	      dle.

       The function `oddocaddattr' is used in order to add an attribute	 to  a
       document.

       void oddocaddattr(ODDOC *doc, const char	*name, const char *value);
	      `doc'  specifies a document handle.  `name' specifies the	string
	      of the name of an	attribute.  `value' specifies  the  string  of
	      the value	of the attribute.

       The  function  `oddocaddword' is	used in	order to add a word to a docu-
       ment.

       void oddocaddword(ODDOC *doc, const char	*normal, const char *asis);
	      `doc' specifies  a  document  handle.   `normal'	specifies  the
	      string  of  the normalized form of a word.  Normalized forms are
	      treated as keys of the inverted index.  If the  normalized  form
	      of  a  word is an	empty string, the word is not reflected	in the
	      inverted index.  `asis' specifies	the string of  the  appearance
	      form  of the word.  Appearance forms are used after the document
	      is retrieved by an application.

       The function `oddocid' is used in order to get the ID number of a docu-
       ment.

       int oddocid(const ODDOC *doc);
	      `doc' specifies a	document handle.  The return value is  the  ID
	      number of	a document.

       The function `oddocuri' is used in order	to get the URI of a document.

       const char *oddocuri(const ODDOC	*doc);
	      `doc'  specifies	a  document  handle.   The return value	is the
	      string of	the URI	of a document.

       The function `oddocgetattr' is used in order to get the value of	an at-
       tribute of a document.

       const char *oddocgetattr(const ODDOC *doc, const	char *name);
	      `doc' specifies a	document handle.  `name' specifies the	string
	      of  the name of an attribute.  The return	value is the string of
	      the value	of the attribute, or `NULL'  if	 no  attribute	corre-
	      sponds.

       The function `oddocnwords' is used in order to get the list handle
       containing the words of a document in normalized form.

       const CBLIST *oddocnwords(const ODDOC *doc);
              `doc' specifies a document handle.  The return value is the list
              handle containing the words in normalized form.

       The function `oddocawords' is used in order to get the list handle
       containing the words of a document in appearance form.

       const CBLIST *oddocawords(const ODDOC *doc);
              `doc' specifies a document handle.  The return value is the list
              handle containing the words in appearance form.

       The function `oddocscores' is used in order to get the map handle
       containing keywords in normalized form and their scores.

       CBMAP *oddocscores(const ODDOC *doc, int max, ODEUM *odeum);
              `doc' specifies a document handle.  `max' specifies the maximum
              number of keywords to get.  `odeum' specifies a database handle
              with which the IDF for weighting is calculated.  If it is
              `NULL', it is not used.  The return value is the map handle
              containing keywords and their scores.  Scores are expressed as
              decimal strings.  Because the handle of the return value is
              opened with the function `cbmapopen', it should be closed with
              the function `cbmapclose' if it is no longer in use.

       The  function `odbreaktext' is used in order to break a text into words
       in appearance form.

       CBLIST *odbreaktext(const char *text);
              `text' specifies the string of a text.  The return value is the
              list handle containing the words in appearance form.  Words are
              separated with space characters and such delimiters as period,
              comma and so on.  Because the handle of the return value is
              opened with the function `cblistopen', it should be closed with
              the function `cblistclose' if it is no longer in use.

       The function `odnormalizeword' is used in order to make the  normalized
       form of a word.

       char *odnormalizeword(const char	*asis);
              `asis' specifies the string of the appearance form of a word.
              The return value is the string of the normalized form of the
              word.  ASCII alphabetic characters are converted to lower case.
              Words composed of only delimiters are treated as empty strings.
              Because the region of the return value is allocated with the
              `malloc' call, it should be released with the `free' call if it
              is no longer in use.
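
              The normalization rule described above can be sketched in plain
              C as follows.  This is only an illustration and not the code of
              the library: the name `sketch_normalize' and the delimiter set
              `SKETCH_DELIMS' are hypothetical, and the set of delimiters that
              Odeum really uses is not listed in this manual.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical delimiter set, chosen for illustration only. */
static const char *SKETCH_DELIMS = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

/* Return a malloc'ed normalized copy of `asis', or NULL if allocation fails.
   The caller releases the result with free, as with odnormalizeword. */
char *sketch_normalize(const char *asis)
{
    size_t len = strlen(asis);
    char *out = malloc(len + 1);
    size_t i;

    if (out == NULL)
        return NULL;
    /* A word composed of only delimiters becomes the empty string. */
    if (strspn(asis, SKETCH_DELIMS) == len) {
        out[0] = '\0';
        return out;
    }
    /* Convert ASCII upper-case letters to lower case; copy other bytes. */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)asis[i];
        out[i] = (c >= 'A' && c <= 'Z') ? (char)(c + ('a' - 'A')) : (char)c;
    }
    out[len] = '\0';
    return out;
}
```

              Only bytes in the range `A' to `Z' are changed, so multibyte
              text passes through this sketch untouched.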

       The function `odpairsand' is used in order to get the  common  elements
       of two sets of documents.

       ODPAIR *odpairsand(ODPAIR *apairs, int anum, ODPAIR *bpairs, int	bnum,
       int *np);
	      `apairs'	specifies  the	pointer	 to the	former document	array.
	      `anum' specifies the number of the elements of the former	 docu-
	      ment  array.  `bpairs' specifies the pointer to the latter docu-
	      ment array.  `bnum' specifies the	number of the elements of  the
	      latter document array.  `np' specifies the pointer to a variable
	      to  which	 the number of the elements of the return value	is as-
	      signed.  The return value	is the pointer to a new	document array
	      whose elements commonly belong to	the specified two sets.	  Ele-
	      ments  of	 the  array  are  sorted  in descending	order of their
	      scores.  Because the region of the  return  value	 is  allocated
	      with  the	 `malloc'  call, it should be released with the	`free'
	      call if it is no longer in use.
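
              The intersection described above can be illustrated with a
              self-contained sketch.  `SketchPair' is a local stand-in with
              the same members as `ODPAIR', and `sketch_pairs_and' is a
              hypothetical name.  How the library combines the two scores of a
              common document is not stated in this manual; the sketch assumes
              that they are added.

```c
#include <stdlib.h>

/* Local stand-in with the same members as ODPAIR. */
typedef struct { int id; int score; } SketchPair;

/* qsort comparator: the higher score comes first. */
static int sketch_by_score_desc(const void *a, const void *b)
{
    const SketchPair *pa = a, *pb = b;
    return (pa->score < pb->score) - (pa->score > pb->score);
}

/* Return a malloc'ed array of the pairs whose ID occurs in both inputs.
   Assumption: the score of a common document is the sum of both scores.
   Each input is expected to hold every ID at most once. */
SketchPair *sketch_pairs_and(const SketchPair *ap, int anum,
                             const SketchPair *bp, int bnum, int *np)
{
    SketchPair *out = malloc(sizeof(*out) * (size_t)(anum > 0 ? anum : 1));
    int i, j, n = 0;

    *np = 0;
    if (out == NULL)
        return NULL;
    for (i = 0; i < anum; i++) {
        for (j = 0; j < bnum; j++) {
            if (ap[i].id == bp[j].id) {
                out[n].id = ap[i].id;
                out[n].score = ap[i].score + bp[j].score;
                n++;
                break;
            }
        }
    }
    qsort(out, (size_t)n, sizeof(*out), sketch_by_score_desc);
    *np = n;
    return out;
}
```

              The nested loop keeps the sketch short at the cost of quadratic
              time; it says nothing about how `odpairsand' is implemented.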

       The function `odpairsor' is used in order to get the union of two sets
       of documents.

       ODPAIR *odpairsor(ODPAIR	*apairs, int anum, ODPAIR *bpairs, int bnum,
       int *np);
	      `apairs' specifies the pointer to	 the  former  document	array.
	      `anum'  specifies	the number of the elements of the former docu-
	      ment array.  `bpairs' specifies the pointer to the latter	 docu-
	      ment  array.  `bnum' specifies the number	of the elements	of the
	      latter document array.  `np' specifies the pointer to a variable
	      to which the number of the elements of the return	value  is  as-
	      signed.  The return value	is the pointer to a new	document array
	      whose  elements  belong  to  both	or either of the specified two
	      sets.  Elements of the array are sorted in descending  order  of
	      their  scores.   Because the region of the return	value is allo-
	      cated with the `malloc' call, it should  be  released  with  the
	      `free' call if it	is no longer in	use.
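
              A union of two result arrays can be sketched in the same way.
              `SketchPair' and `sketch_pairs_or' are hypothetical names, and
              adding the scores of a document found in both sets is an
              assumption of this sketch, not a documented property of
              `odpairsor'.

```c
#include <stdlib.h>

/* Local stand-in with the same members as ODPAIR. */
typedef struct { int id; int score; } SketchPair;

/* qsort comparator: the higher score comes first. */
static int sketch_by_score_desc(const void *a, const void *b)
{
    const SketchPair *pa = a, *pb = b;
    return (pa->score < pb->score) - (pa->score > pb->score);
}

/* Return a malloc'ed array of the pairs whose ID occurs in either input.
   Assumption: a document found in both inputs gets the sum of both scores. */
SketchPair *sketch_pairs_or(const SketchPair *ap, int anum,
                            const SketchPair *bp, int bnum, int *np)
{
    size_t cap = (size_t)anum + (size_t)bnum;
    SketchPair *out = malloc(sizeof(*out) * (cap > 0 ? cap : 1));
    int i, j, n = 0;

    *np = 0;
    if (out == NULL)
        return NULL;
    for (i = 0; i < anum; i++)          /* start with the former set */
        out[n++] = ap[i];
    for (j = 0; j < bnum; j++) {
        for (i = 0; i < anum; i++) {
            if (out[i].id == bp[j].id) {
                out[i].score += bp[j].score;   /* in both sets */
                break;
            }
        }
        if (i == anum)                  /* only in the latter set */
            out[n++] = bp[j];
    }
    qsort(out, (size_t)n, sizeof(*out), sketch_by_score_desc);
    *np = n;
    return out;
}
```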

       The function `odpairsnotand' is used in order to	get the	difference set
       of documents.

       ODPAIR *odpairsnotand(ODPAIR *apairs, int anum, ODPAIR *bpairs, int
       bnum, int *np);
	      `apairs'	specifies  the	pointer	 to the	former document	array.
              `anum' specifies the number of the elements of the former
              document array.  `bpairs' specifies the pointer to the latter
              document array.  `bnum' specifies the number
	      of  the  elements	 of the	latter document	array.	`np' specifies
	      the pointer to a variable	to which the number of the elements of
	      the return value is assigned.  The return	value is  the  pointer
	      to  a new	document array whose elements belong to	the former set
	      but not to the latter set.  Elements of the array	are sorted  in
	      descending order of their	scores.	 Because the region of the re-
	      turn value is allocated with the `malloc'	call, it should	be re-
	      leased with the `free' call if it	is no longer in	use.
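
              The difference operation can be pictured with the following
              sketch, which keeps the pairs of the former array whose ID is
              absent from the latter array.  `SketchPair' and
              `sketch_pairs_notand' are hypothetical names used only for this
              illustration.

```c
#include <stdlib.h>

/* Local stand-in with the same members as ODPAIR. */
typedef struct { int id; int score; } SketchPair;

/* Return a malloc'ed array of the pairs of `ap' whose ID does not occur in
   `bp'.  The relative order of `ap' is preserved, so a former array that is
   sorted by descending score yields a result sorted the same way. */
SketchPair *sketch_pairs_notand(const SketchPair *ap, int anum,
                                const SketchPair *bp, int bnum, int *np)
{
    SketchPair *out = malloc(sizeof(*out) * (size_t)(anum > 0 ? anum : 1));
    int i, j, n = 0;

    *np = 0;
    if (out == NULL)
        return NULL;
    for (i = 0; i < anum; i++) {
        for (j = 0; j < bnum; j++) {
            if (ap[i].id == bp[j].id)
                break;
        }
        if (j == bnum)      /* no match in the latter set: keep the pair */
            out[n++] = ap[i];
    }
    *np = n;
    return out;
}
```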

       The  function `odpairssort' is used in order to sort a set of documents
       in descending order of scores.

       void odpairssort(ODPAIR *pairs, int pnum);
	      `pairs' specifies	the pointer to a document array.  `pnum' spec-
	      ifies the	number of the elements of the document array.
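
              Sorting by descending score can be expressed with the standard
              `qsort' call.  The sketch below uses a local stand-in for
              `ODPAIR' and the hypothetical name `sketch_pairs_sort'.  The
              order of elements with equal scores is left unspecified here,
              because this manual does not define it.

```c
#include <stdlib.h>

/* Local stand-in with the same members as ODPAIR. */
typedef struct { int id; int score; } SketchPair;

/* qsort comparator: the higher score comes first. */
static int sketch_by_score_desc(const void *a, const void *b)
{
    const SketchPair *pa = a, *pb = b;
    return (pa->score < pb->score) - (pa->score > pb->score);
}

/* Sort `pnum' pairs in place, in descending order of their scores. */
void sketch_pairs_sort(SketchPair *pairs, int pnum)
{
    if (pnum > 1)
        qsort(pairs, (size_t)pnum, sizeof(*pairs), sketch_by_score_desc);
}
```

              The comparator avoids subtracting the two scores, which could
              overflow for scores of opposite sign.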

       The function `odlogarithm' is used in order to get  the	natural	 loga-
       rithm of	a number.

       double odlogarithm(double x);
	      `x'  specifies  a	number.	 The return value is the natural loga-
	      rithm of the number.  If the number is equal  to	or  less  than
	      1.0,  the	 return	value is 0.0.  This function is	useful when an
	      application calculates the IDF of	search results.

       The function `odvectorcosine' is	used in	order to get the cosine	of the
       angle of	two vectors.

       double odvectorcosine(const int *avec, const int	*bvec, int vnum);
	      `avec' specifies the pointer to one array	 of  numbers.	`bvec'
	      specifies	 the  pointer  to  the other array of numbers.	`vnum'
	      specifies	the number of elements	of  each  array.   The	return
	      value  is	the cosine of the angle	of two vectors.	 This function
	      is useful	when an	application  calculates	 similarity  of	 docu-
	      ments.

       The  function  `odsettuning'  is	used in	order to set the global	tuning
       parameters.

       void odsettuning(int ibnum, int idnum, int cbnum, int csiz);
              `ibnum' specifies the number of buckets for inverted indexes.
              `idnum' specifies the division number of the inverted index.
              `cbnum' specifies the number of buckets for dirty buffers.
              `csiz' specifies the maximum number of bytes of memory to use
              for dirty buffers.  The default setting is equivalent to
              `odsettuning(32749, 7, 262139, 8388608)'.  This function should
              be called before opening a handle.

       The function `odanalyzetext' is used in order to break a text into
       words and store the appearance forms and normalized forms into lists.

       void odanalyzetext(ODEUM *odeum, const char *text, CBLIST *awords,
       CBLIST *nwords);
              `odeum' specifies a database handle.  `text' specifies the
              string of a text.  `awords' specifies a list handle into which
              appearance forms are stored.  `nwords' specifies a list handle
              into which normalized forms are stored.  If it is `NULL', it is
              ignored.  Words are separated with space characters and such
              delimiters as period, comma and so on.

       The  function  `odsetcharclass'	is used	in order to set	the classes of
       characters used by `odanalyzetext'.

       void odsetcharclass(ODEUM *odeum, const char *spacechars, const char
       *delimchars, const char *gluechars);
              `odeum' specifies a database handle.  `spacechars' specifies a
              string containing space characters.  `delimchars' specifies a
              string containing delimiter characters.  `gluechars' specifies a
              string containing glue characters.

       The  function  `odquery'	 is  used in order to query a database using a
       small boolean query language.

       ODPAIR *odquery(ODEUM *odeum, const char *query, int *np,
       CBLIST *errors);
              `odeum' specifies a database handle.  `query' specifies the text
              of the query.  `np' specifies the pointer to a variable to which
              the number of the elements of the return value is assigned.
              `errors' specifies a list handle into which error messages are
              stored.  If it is `NULL', it is ignored.  If successful, the
              return value is the pointer to an array, else, it is `NULL'.
              Each element of the array is a pair of the ID number and the
              score of a document, and the elements are sorted in descending
              order of their scores.  Even if no document corresponds to the
              specified condition, it is not an error, and a dummy array is
              returned.  Because the region of the return value is allocated
              with the `malloc' call, it should be released with the `free'
              call if it is no longer in use.  Note that an element of the
              returned array may refer to a deleted document.

       If QDBM was built  with	POSIX  thread  enabled,	 the  global  variable
       `dpecode'  is  treated  as thread specific data,	and functions of Odeum
       are reentrant.  In that case, they are thread-safe as long as a	handle
       is  not	accessed  by  threads at the same time,	on the assumption that
       `errno',	`malloc', and so on are	thread-safe.

       If QDBM was built with ZLIB enabled, records in the database for
       document attributes are compressed.  In that case, the size of the
       database is reduced to 30% or less.  Thus, you should enable ZLIB if
       you use Odeum.  A database of Odeum created without ZLIB enabled is
       not available in an environment with ZLIB enabled, and vice versa.  If
       ZLIB was not enabled but LZO was, LZO is used instead.

       The  query  language of the function `odquery' is a basic language fol-
       lowing this grammar:

	      expr ::= subexpr ( op subexpr )*
	      subexpr ::= WORD
	      subexpr ::= LPAREN expr RPAREN

       Operators are "&" (AND), "|" (OR), and "!" (NOTAND).  You can use
       parentheses to group sub-expressions together in order to change the
       order of operations.  The given query is broken up using the function
       `odanalyzetext', so if you want to specify different text breaking
       rules, make sure that you at least set "&", "|", "!", "(", and ")" to
       be delimiter characters.  Consecutive words are treated as having an
       implicit "&" operator between them, so "zed shaw" is actually "zed &
       shaw".
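
       The grammar above can be read as a small recursive-descent evaluator.
       The following sketch is not the parser of `odquery'.  It models each
       set of documents as a bit mask, looks words up in a hypothetical
       hard-coded table (`sketch_lookup'), and assumes that operators are
       applied strictly from left to right, because the grammar as written
       gives no operator precedence.  All `sketch_' names are illustrative.

```c
#include <string.h>

/* Hypothetical index: each word maps to a bit mask of document IDs. */
static unsigned sketch_lookup(const char *word, size_t len)
{
    static const struct { const char *word; unsigned docs; } table[] = {
        { "zed",  0x3u },   /* documents 0 and 1 */
        { "shaw", 0x6u },   /* documents 1 and 2 */
        { "qdbm", 0x8u }    /* document 3 */
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strlen(table[i].word) == len &&
            strncmp(table[i].word, word, len) == 0)
            return table[i].docs;
    }
    return 0u;              /* unknown word: no document */
}

static unsigned sketch_expr(const char **pp);

static void sketch_skip_spaces(const char **pp)
{
    while (**pp == ' ')
        (*pp)++;
}

/* subexpr ::= WORD | LPAREN expr RPAREN */
static unsigned sketch_subexpr(const char **pp)
{
    const char *start;
    unsigned docs;

    sketch_skip_spaces(pp);
    if (**pp == '(') {
        (*pp)++;
        docs = sketch_expr(pp);
        sketch_skip_spaces(pp);
        if (**pp == ')')
            (*pp)++;
        return docs;
    }
    start = *pp;
    while (**pp != '\0' && strchr(" &|!()", **pp) == NULL)
        (*pp)++;
    return sketch_lookup(start, (size_t)(*pp - start));
}

/* expr ::= subexpr ( op subexpr )*, evaluated from left to right */
static unsigned sketch_expr(const char **pp)
{
    unsigned docs = sketch_subexpr(pp);

    for (;;) {
        char op;
        unsigned rhs;

        sketch_skip_spaces(pp);
        if (**pp == '\0' || **pp == ')')
            return docs;
        if (**pp == '&' || **pp == '|' || **pp == '!')
            op = *(*pp)++;
        else
            op = '&';       /* consecutive words: implicit AND */
        rhs = sketch_subexpr(pp);
        if (op == '&')
            docs &= rhs;
        else if (op == '|')
            docs |= rhs;
        else
            docs &= ~rhs;   /* NOTAND: in the left set but not the right */
    }
}

/* Evaluate a whole query and return the mask of matching documents. */
unsigned sketch_query(const char *query)
{
    return sketch_expr(&query);
}
```

       With the table above, `sketch_query("zed shaw")' and
       `sketch_query("zed & shaw")' both yield the mask 0x2, that is, only
       document 1, which matches the implicit "&" rule.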

       The encoding of the query text should be the same as the encoding of
       the target documents.  Moreover, each of the space characters,
       delimiter characters, and glue characters should be a single byte.

SEE ALSO
       qdbm(3),	depot(3), curia(3), relic(3),  hovel(3),  cabin(3),  villa(3),
       ndbm(3),	gdbm(3)

Man Page			  2004-04-22			      ODEUM(3)
