mifluz(3)		   Library Functions Manual		     mifluz(3)

NAME
       mifluz -	C++ library to use and manage inverted indexes

SYNOPSIS
       #include	<mifluz.h>

       int main()
       {
	  Configuration* config	= WordContext::Initialize();

	  WordList* words = new	WordList(*config);

	  ...

	  delete words;

	  WordContext::Finish();
       }

DESCRIPTION
       The  purpose of mifluz is to provide a C++ library to build and query a
       full text inverted index. It is dynamically updatable, scalable (up  to
       1Tb  indexes),  uses  a controlled amount of memory, shares index files
       and memory cache	among processes	or threads and compresses index	 files
       to  50%	of the raw data. The structure of the index is configurable at
       runtime and allows inclusion  of	 relevance  ranking  information.  The
       query  functions	 do  not  require  loading  all	 the  occurrences of a
       searched	term.  They consume very few resources and many	 searches  can
       be run in parallel.

       The  file  management  library used in mifluz is	a modified Berkeley DB
       (www.sleepycat.com) version 3.1.14.

CLASSES	AND COMMANDS
       Configuration

	      reads the	configuration file and manages it in memory.

       WordContext

	      reads the configuration and sets up the mifluz context.

       WordCursor

	      abstract class to	search and retrieve entries in a WordList  ob-
	      ject.

       WordCursorOne

	      search and retrieve entries in a WordListOne object.

       WordDBInfo
	      inverted index usage environment.

       WordDict

	      manage and use an	inverted index dictionary.

       WordKey
	      inverted index key.

       WordKeyInfo
	      information on the key structure of the inverted index.

       WordList

	      abstract class to	manage and use an inverted index file.

       WordListOne

	      manage and use an	inverted index file.

       WordMonitor
	      monitoring classes activity.

       WordRecord
	      inverted index record.

       WordRecordInfo
	      information on the record	structure of the inverted index.

       WordReference
	      inverted index occurrence.

       WordType
	      defines a word in terms of allowed characters, length, etc.

       htdb_dump

	      dump the content of an inverted index in Berkeley	DB fashion

       htdb_load

	      load the content of an inverted index in Berkeley DB fashion.

       htdb_stat

	      displays statistics for Berkeley DB environments.

       mifluzdict

	      dump the dictionary of an inverted index.

       mifluzdump

	      dump the content of an inverted index.

       mifluzload

	      load the content of an inverted index.

       mifluzsearch
	      search the content of an inverted	index.

CONFIGURATION
       The  format  of	the configuration file read by WordContext::Initialize
       is:
       keyword:	value
       Comments may be added on lines starting with a #. The default
       configuration file is read from the file pointed to by the
       MIFLUZ_CONFIG environment variable, or ~/.mifluz, or /etc/mifluz.conf,
       in this order. If no configuration file is available, built-in
       defaults are used. Here is an example configuration file:
       wordlist_extend:	true
       wordlist_cache_size: 10485760
       wordlist_page_size: 32768
       wordlist_compress: 1
       wordlist_wordrecord_description:	NONE
       wordlist_wordkey_description: Word/DocID	32/Flags 8/Location 16
       wordlist_monitor: true
       wordlist_monitor_period:	30
       wordlist_monitor_output:	monitor.out,rrd
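
       The keyword: value format described above can be read with a few
       lines of C++. The sketch below is only an illustration and is not the
       mifluz Configuration class: parse_config and trim are hypothetical
       names, and it assumes that a comment marker may be preceded by
       whitespace and that lines without a colon are ignored.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Trim leading and trailing whitespace (space, tab, CR, LF).
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper, not part of mifluz: parse "keyword: value" lines.
// Lines whose first non-blank character is '#' are comments. Blank lines
// and lines without a ':' are skipped (an assumption of this sketch).
std::map<std::string, std::string> parse_config(std::istream& in) {
    std::map<std::string, std::string> out;
    std::string line;
    while (std::getline(in, line)) {
        const std::string t = trim(line);
        if (t.empty() || t[0] == '#') continue;
        const std::string::size_type colon = t.find(':');
        if (colon == std::string::npos) continue;
        // Split on the first colon only, so values may contain ':' or '/'.
        out[trim(t.substr(0, colon))] = trim(t.substr(colon + 1));
    }
    return out;
}
```

       Feeding the example file above through parse_config would yield one
       map entry per keyword, with values kept as uninterpreted strings.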

       wordlist_allow_numbers {true|false} (default false)
	      A digit is considered a valid character within a word if this
	      configuration parameter is set to true; otherwise it is an
	      error to insert a word containing digits. See the Normalize
	      method for more information.

       wordlist_cache_inserts {true|false} (default false)
	      If true, all Insert calls are cached in memory. When the
	      WordList object is closed or a different access method is
	      called, the cached entries are flushed to the inverted index.

       wordlist_cache_max <bytes> (default 0)
	      Maximum cumulative size of the cache files generated when doing
	      bulk insertion with the BatchStart() function. When this limit
	      is reached, the cache files are all merged into the inverted
	      index. The value 0 means no size limit. See WordList(3) for the
	      rationale behind cache file handling.

       wordlist_cache_size <bytes> (default 500K)
	      Berkeley DB cache size (see the Berkeley DB documentation). The
	      cache makes a huge difference in performance. It must be at
	      least 2% of the expected total data size. Note that if
	      compression is activated, the data size is eight times larger
	      than the actual file size. In this case the cache must be
	      scaled to 2% of the data size, not 2% of the file size. See
	      Cache tuning in the mifluz guide for more hints. See WordList(3)
	      for the rationale behind cache file handling.
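
	      The sizing rule above is simple arithmetic. The function below
	      is a hypothetical helper (min_cache_bytes is not a mifluz
	      call) that applies the 2% rule and the factor of eight stated
	      for compressed indexes, rounding up to a whole byte.

```cpp
#include <cstdint>

// Hypothetical helper, not part of mifluz: smallest wordlist_cache_size
// (in bytes) that satisfies the 2% rule quoted in this manual page.
std::uint64_t min_cache_bytes(std::uint64_t index_file_bytes, bool compressed) {
    // With compression on, the page states the data size is eight times
    // the file size, so scale before applying the 2% rule.
    const std::uint64_t data_bytes =
        compressed ? index_file_bytes * 8 : index_file_bytes;
    // 2% of the data size, rounded up.
    return (data_bytes * 2 + 99) / 100;
}
```

	      For instance, a 1,000,000,000 byte compressed index file would
	      call for a cache of at least 160,000,000 bytes under this rule.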

       wordlist_compress {true|false} (default false)
	      Activate compression of the index. The resulting index is	 eight
	      times smaller than the uncompressed index.

       wordlist_env_dir	<directory> (default .)
	      Only valid if wordlist_env_share is set to true. Specifies the
	      directory in which the sharable environment will be created.
	      All inverted indexes specified with a non-absolute pathname
	      will be created relative to this directory.

       wordlist_env_share {true|false} (default false)
	      If true, a sharable environment is opened, or created if none
	      exists.

       wordlist_env_skip {true|false} (default false)
	      If true, no environment is created at all. This must never be
	      used if a WordList object is created. It may be useful if only
	      WordKey objects are used, for instance.

       wordlist_extend {true|false} (default false)
	      If true, maintain a reference count of unique words. The
	      Noccurrence method gives access to this count.

       wordlist_locale <locale>	(default C)
	      Set the locale of the program to locale. See setlocale(3) for
	      more information.

       wordlist_lowercase {true|false} (default true)
	      If a word contains uppercase letters, it is converted to
	      lowercase if this configuration parameter is true; otherwise
	      it is left untouched.

       wordlist_maximum_word_length <number> (default 25)
	      The maximum length of a word.  See the Normalize method for more
	      information.

       wordlist_minimum_word_length <number> (default 3)
	      The minimum length of a word.  See the Normalize method for more
	      information.

       wordlist_monitor	{true|false} (default false)
	      If  true	create a WordMonitor instance to gather	statistics and
	      build reports.

       wordlist_monitor_output <file>[,{rrd|readable}] (default stderr)
	      Print reports to file instead of the default stderr. If type is
	      set to rrd, the output is fit for the benchmark-report script.
	      Otherwise it is a (hardly :-) readable string.

       wordlist_monitor_period <sec> (default 0)
	      If the value sec is a positive integer, set a timer to print re-
	      ports every sec seconds. The timer is set	using the ALRM	signal
	      and  will	 fail if the calling application already has a handler
	      on that signal.
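
	      Because the timer relies on the ALRM signal, an application may
	      want to check for an existing handler before enabling periodic
	      reports. The sketch below uses the POSIX sigaction call to
	      query the current disposition without changing it. The function
	      name alarm_handler_installed is hypothetical, and the sketch
	      treats any non-default disposition (including an ignored
	      signal) as "already in use", which is an assumption rather
	      than documented mifluz behavior.

```cpp
#include <csignal>
#include <signal.h>

// Hypothetical pre-flight check, not part of mifluz (POSIX only).
// Returns true if SIGALRM currently has anything other than the default
// disposition, i.e. a handler is installed or the signal is ignored.
bool alarm_handler_installed() {
    struct sigaction current;
    // Passing a null new action only queries the current disposition.
    if (sigaction(SIGALRM, nullptr, &current) != 0) {
        return false;  // query failed; this sketch reports "none found"
    }
    return current.sa_handler != SIG_DFL;
}
```

	      An application could call this before setting
	      wordlist_monitor_period to a positive value and fall back to a
	      period of 0 when it returns true.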

       wordlist_page_size <bytes> (default 8192)
	      Berkeley DB page size (see the Berkeley DB documentation).

       wordlist_truncate {true|false} (default true)
	      If a word is too long according to wordlist_maximum_word_length,
	      it is truncated if this configuration parameter is true;
	      otherwise it is considered an invalid word.

       wordlist_valid_punctuation [characters] (default	none)
	      A	list of	punctuation characters that  may  appear  in  a	 word.
	      These  characters	will be	removed	from the word before insertion
	      in the index.
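
	      The word-shaping parameters described in this section (numbers,
	      case, length limits, truncation and punctuation) interact, so a
	      small model may help. The sketch below is not the mifluz
	      Normalize method. NormalizeOptions and normalize_sketch are
	      hypothetical names, the defaults are copied from this page, and
	      the order in which the rules are applied is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical mirror of the word-related parameters, with the defaults
// given in this manual page.
struct NormalizeOptions {
    bool allow_numbers = false;
    bool lowercase = true;
    std::size_t maximum_word_length = 25;
    std::size_t minimum_word_length = 3;
    bool truncate = true;
    std::string valid_punctuation;  // empty: no punctuation is accepted
};

// Illustrative only. Returns true and stores the normalized word in `out`
// when the word is acceptable. On false, `out` is unspecified.
bool normalize_sketch(const std::string& in, const NormalizeOptions& opt,
                      std::string& out) {
    out.clear();
    for (unsigned char c : in) {
        // Listed punctuation is removed from the word before insertion.
        if (opt.valid_punctuation.find(static_cast<char>(c)) != std::string::npos)
            continue;
        // Digits make the word invalid unless numbers are allowed.
        if (std::isdigit(c) && !opt.allow_numbers) return false;
        out += static_cast<char>(opt.lowercase ? std::tolower(c) : c);
    }
    if (out.size() > opt.maximum_word_length) {
        if (!opt.truncate) return false;  // too long and truncation is off
        out.resize(opt.maximum_word_length);
    }
    return out.size() >= opt.minimum_word_length;
}
```

	      With the defaults, "Hello" would become "hello", while "ab"
	      (too short) and "abc123" (contains digits) would be rejected.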

       wordlist_verbose	<number> (default 0)
	      Set the verbosity	level of the WordList class.

	      1	walk logic

	      2	walk logic details

	      3	walk logic lots	of details

       wordlist_wordkey_description <desc> (no default)
	      Describe the structure of	the inverted index key.	 In  the  fol-
	      lowing  explanation of the <desc>	format,	mandatory words	are in
	      bold and values that must	be replaced in italic.

	      Word bits/name bits [/...]

	      The name is an alphanumerical symbolic name for the  key	field.
	      The  bits	 is  the  number of bits required to store this	field.
	      Note that	all values are stored in unsigned  integers  (unsigned
	      int).  Example:
	      Word 8/Document 16/Location 8
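
	      To make the <desc> format concrete, the sketch below splits a
	      description into named fields and bit counts. KeyField and
	      parse_key_description are hypothetical names and this is not
	      how WordKeyInfo is implemented. A field written without a bit
	      count, such as the leading Word in the example configuration
	      file, is recorded here with 0 bits.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one key field.
struct KeyField {
    std::string name;
    int bits;  // 0 when the description gives no bit count
};

// Illustrative parser for "Word bits/name bits [/...]" descriptions.
std::vector<KeyField> parse_key_description(const std::string& desc) {
    std::vector<KeyField> fields;
    std::istringstream parts(desc);
    std::string part;
    // Fields are separated by '/'; name and bits by whitespace.
    while (std::getline(parts, part, '/')) {
        std::istringstream item(part);
        KeyField f;
        f.bits = 0;
        if (!(item >> f.name)) continue;  // skip empty segments
        if (!(item >> f.bits)) f.bits = 0;  // bit count is optional here
        fields.push_back(f);
    }
    return fields;
}
```

	      Parsing the example above would give three fields: Word with 8
	      bits, Document with 16 bits and Location with 8 bits.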

       wordlist_wordkey_document [field	...] (default none)
	      A	white space separated list of field numbers that define	a doc-
	      ument.   The  field  number  list	must not contain gaps. For in-
	      stance 1 2 3 is valid but	1 3 4 is not valid.   This  configura-
	      tion parameter is	not used by the	mifluz library but may be used
	      by a query application to define the semantics of a document. In
	      response to a query, the application will	return a list  of  re-
	      sults in which only distinct documents will be shown.
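
	      Since the library itself does not use this parameter, a query
	      application has to validate the list on its own. The sketch
	      below checks the "no gaps" rule. The name
	      fields_are_contiguous is hypothetical, and the sketch assumes
	      the numbers must be listed in ascending order.

```cpp
#include <sstream>
#include <string>

// Hypothetical validator, not part of mifluz: true if `list` is a
// non-empty, whitespace separated list of integers in which each number
// is exactly one greater than the previous one.
bool fields_are_contiguous(const std::string& list) {
    std::istringstream in(list);
    int prev = 0;
    int cur = 0;
    bool first = true;
    while (in >> cur) {
        if (!first && cur != prev + 1) return false;  // gap or wrong order
        prev = cur;
        first = false;
    }
    // Reject empty lists and lists that stop at a non-numeric token.
    return !first && in.eof();
}
```

	      With this check, "1 2 3" is accepted and "1 3 4" is rejected,
	      matching the two cases given above.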

       wordlist_wordkey_location <field> (default none)
	      A	 single	field number that contains the position	of a word in a
	      given document.  This configuration parameter is not used	by the
	      mifluz library but may be	used by	a query	application.

       wordlist_wordrecord_description {NONE|DATA|STR} (no default)
	      NONE: the	record is empty

	      DATA: the	record contains	an integer (unsigned int)

	      STR: the record contains a string	(String)

ENVIRONMENT
       MIFLUZ_CONFIG file name of configuration file read by WordContext(3).
       Defaults to ~/.mifluz or /usr/etc/mifluz.conf.
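
       The lookup order (environment variable first, then the per-user file,
       then the system-wide file) can be sketched as follows. The function
       config_candidates is hypothetical and not part of mifluz. The
       system-wide path is passed in by the caller because this page names
       both /etc/mifluz.conf and /usr/etc/mifluz.conf, and the sketch does
       not assume which one a given installation uses.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of mifluz: list candidate configuration
// files in the order they would be tried. Null or empty inputs are skipped.
std::vector<std::string> config_candidates(const char* mifluz_config_env,
                                           const char* home,
                                           const std::string& system_path) {
    std::vector<std::string> out;
    // 1. File named by the MIFLUZ_CONFIG environment variable.
    if (mifluz_config_env && *mifluz_config_env)
        out.push_back(mifluz_config_env);
    // 2. Per-user file, ~/.mifluz.
    if (home && *home)
        out.push_back(std::string(home) + "/.mifluz");
    // 3. System-wide file, as chosen by the caller.
    out.push_back(system_path);
    return out;
}
```

       A caller would typically pass std::getenv("MIFLUZ_CONFIG") and
       std::getenv("HOME") and then open the first candidate that exists.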

AUTHORS
       Loic Dachary loic@gnu.org

       The Ht://Dig group http://dev.htdig.org/

SEE ALSO
       htdb_dump(1), htdb_stat(1), htdb_load(1), mifluzdump(1),	mifluzload(1),
       mifluzsearch(1),	  mifluzdict(1),  WordContext(3),  WordList(3),	 Word-
       Dict(3),	WordListOne(3),	WordKey(3), WordKeyInfo(3), WordType(3), Word-
       DBInfo(3), WordRecordInfo(3), WordRecord(3), WordReference(3), WordCur-
       sor(3), WordCursorOne(3), WordMonitor(3), Configuration(3)

				     local			     mifluz(3)
