WordList(3)		   Library Functions Manual		   WordList(3)

NAME
       WordList - abstract class to manage and use an inverted index file.

SYNOPSIS
       #include	<mifluz.h>

       WordContext context;

       WordList* words = context->List();

       delete words;

DESCRIPTION
       WordList	 is the	mifluz equivalent of a database	handler. Each WordList
       object is bound to an inverted index file and implements	the operations
       to create it, fill it with word occurrences and	search	for  an	 entry
       matching	a given	criterion.

       WordList is an abstract class and cannot be instantiated. The List
       method of the class WordContext creates an instance using the
       appropriate derived class, either WordListOne or WordListMulti. Refer
       to the corresponding manual pages for more information on their
       specific semantics.

       When doing bulk insertions, mifluz creates temporary files that contain
       the entries to be inserted in the index. Those files are typically
       named indexC00000000. The maximum size of a temporary file is
       wordlist_cache_size / 2. When the maximum size of the temporary file is
       reached, mifluz creates another temporary file named indexC00000001.
       The process continues until mifluz has created 50 temporary files. At
       this point it merges all temporary files into one that replaces the
       first, indexC00000000. It then starts creating temporary files again
       and keeps following this algorithm until the bulk insertion is
       finished. When the bulk insertion is finished, mifluz has one big file
       named indexC00000000 that contains all the entries to be inserted in
       the index. mifluz inserts all the entries from indexC00000000 into the
       index and deletes the temporary file when done. The insertion is fast
       because all the entries in indexC00000000 are already sorted.
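
       The rotation described above can be pictured with a small in-memory
       model. This is only a sketch under stated assumptions, not mifluz
       code: RunBuffer and its members are hypothetical names, each sorted
       vector stands in for one indexC* file, and the limits are counted in
       entries rather than bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory model of the indexC* temporary files.
// Each "run" stands for one sorted temporary file.
class RunBuffer {
public:
    RunBuffer(std::size_t max_run_entries, std::size_t max_runs)
        : max_run_entries_(max_run_entries), max_runs_(max_runs) {}

    // Queue one entry; seal the current run when it is full.
    void Add(const std::string& entry) {
        current_.push_back(entry);
        if (current_.size() >= max_run_entries_) Seal();
    }

    // Finish the batch: return every queued entry in sorted order.
    std::vector<std::string> Finish() {
        if (!current_.empty()) Seal();
        MergeAll();
        std::vector<std::string> out;
        if (!runs_.empty()) out = std::move(runs_.front());
        runs_.clear();
        return out;
    }

    // Number of sealed runs currently held.
    std::size_t RunCount() const { return runs_.size(); }

private:
    // Sort the current run and store it, like closing one temporary file.
    void Seal() {
        std::sort(current_.begin(), current_.end());
        runs_.push_back(std::move(current_));
        current_.clear();
        if (runs_.size() >= max_runs_) MergeAll();
    }

    // Merge all sorted runs into a single sorted run that replaces them.
    void MergeAll() {
        std::vector<std::string> merged;
        for (const std::vector<std::string>& run : runs_) {
            std::vector<std::string> next;
            std::merge(merged.begin(), merged.end(), run.begin(), run.end(),
                       std::back_inserter(next));
            merged.swap(next);
        }
        runs_.clear();
        if (!merged.empty()) runs_.push_back(std::move(merged));
    }

    std::size_t max_run_entries_;
    std::size_t max_runs_;
    std::vector<std::string> current_;
    std::vector<std::vector<std::string>> runs_;
};
```

       With max_runs set to 50, the model merges after every 50th sealed run,
       and Finish() yields one sorted sequence that could be inserted in
       order.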

       The parameter wordlist_cache_max can be used to prevent the temporary
       files from growing indefinitely. If the total cumulative size of the
       indexC* files grows beyond this parameter, they are merged into the
       main index and deleted. For instance, setting this parameter to 500MB
       guarantees that the total size of the indexC* files will not grow above
       500MB.

CONFIGURATION
       For more	information on the configuration  attributes  and  a  complete
       list of attributes, see the mifluz(3) manual page.

       wordlist_extend {true|false} (default false)
	      If true, maintain a reference count of unique words. The
	      Noccurrence method gives access to this count.

       wordlist_verbose	<number> (default 0)
	      Set the verbosity	level of the WordList class.

	      1	walk logic

	      2	walk logic details

	      3	walk logic lots	of details

       wordlist_page_size <bytes> (default 8192)
	      Berkeley DB page size (see Berkeley DB documentation)

       wordlist_cache_size <bytes> (default 500K)
	      Berkeley DB cache size (see Berkeley DB documentation). The cache
	      makes a huge difference in performance. It must be at least 2% of
	      the expected total data size. Note that if compression is
	      activated the data size is eight times larger than the actual
	      file size. In this case the cache must be scaled to 2% of the
	      data size, not 2% of the file size. See Cache tuning in the
	      mifluz guide for more hints. See the DESCRIPTION section for the
	      rationale behind cache file handling.
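
	      The sizing rule above is simple arithmetic. A minimal sketch,
	      assuming the factor of eight and the 2% floor stated here;
	      MinCacheBytes is a hypothetical helper, not a mifluz function.

```cpp
#include <cstdint>

// Hypothetical helper, not part of mifluz: smallest wordlist_cache_size,
// in bytes, that satisfies the 2% rule for an index file of the given size.
std::uint64_t MinCacheBytes(std::uint64_t index_file_bytes, bool compressed) {
    // With compression the data size is taken to be eight times the file size.
    const std::uint64_t data_bytes =
        compressed ? index_file_bytes * 8 : index_file_bytes;
    // 2% of the data size, rounded up to a whole byte.
    return (data_bytes * 2 + 99) / 100;
}
```

	      For a 1,000,000 byte index file the helper gives 20,000 bytes
	      without compression and 160,000 bytes with compression.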

       wordlist_cache_max <bytes> (default 0)
	      Maximum cumulative size of the cache files generated when doing
	      bulk insertion with the BatchStart() function. When this limit
	      is reached, the cache files are all merged into the inverted
	      index. The value 0 means no limit. See the DESCRIPTION section
	      for the rationale behind cache file handling.
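
	      The decision this attribute controls can be sketched as a single
	      predicate. ShouldMergeCacheFiles is a hypothetical name, and
	      reading "reached" as "greater than or equal to" is an
	      assumption.

```cpp
#include <cstdint>

// Hypothetical helper, not part of mifluz: decide whether the cache files
// should be merged into the inverted index.
bool ShouldMergeCacheFiles(std::uint64_t total_cache_bytes,
                           std::uint64_t cache_max_bytes) {
    // A limit of 0 means the cache files may grow without bound.
    if (cache_max_bytes == 0) return false;
    // Assumption: the limit counts as reached once the total equals it.
    return total_cache_bytes >= cache_max_bytes;
}
```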

       wordlist_cache_inserts {true|false} (default false)
	      If true, all Insert calls are cached in memory. When the WordList
	      object is closed or a different access method is called, the
	      cached entries are flushed to the inverted index.

       wordlist_compress {true|false} (default false)
	      Activate	compression of the index. The resulting	index is eight
	      times smaller than the uncompressed index.

METHODS
       inline WordContext* GetContext()
	      Return a pointer to the WordContext object used to  create  this
	      instance.

       inline const WordContext* GetContext() const
	      Return  a	 pointer to the	WordContext object used	to create this
	      instance as a const.

       virtual inline int Override(const WordReference&	wordRef)
	      Insert wordRef in	index. If the Key() part of the	wordRef	exists
	      in the index, override it.  Returns OK on	success, NOTOK on  er-
	      ror.

       virtual int Exists(const	WordReference& wordRef)
	      Returns OK if wordRef exists in the index, NOTOK otherwise.

       inline int Exists(const String& word)
	      Returns OK if word exists	in the index, NOTOK otherwise.

       virtual int WalkDelete(const WordReference& wordRef)
	      Delete all entries in the index whose key matches the Key() part
	      of wordRef, using the Walk method. Returns the number of entries
	      successfully deleted.

       virtual int Delete(const	WordReference& wordRef)
	      Delete the entry in the index that exactly matches the Key()
	      part of wordRef. Returns OK if deletion is successful, NOTOK
	      otherwise.

       virtual int Open(const String& filename,	int mode)
	      Open inverted index filename.  mode may be O_RDONLY  or  O_RDWR.
	      If mode is O_RDWR	it can be or'ed	with O_TRUNC to	reset the con-
	      tent of an existing inverted index.  Return OK on	success, NOTOK
	      otherwise.

       virtual int Close()
	      Close inverted index.  Return OK on success, NOTOK otherwise.

       virtual unsigned	int Size() const
	      Return the size of the index in pages.

       virtual int Pagesize() const
	      Return the page size.

       virtual WordDict	*Dict()
	      Return a pointer to the inverted index dictionary.

       const String& Filename()	const
	      Return the filename given	to the last call to Open.

       int Flags() const
	      Return the mode given to the last	call to	Open.

       inline List *Find(const WordReference& wordRef)
	      Returns  the list	of word	occurrences exactly matching the Key()
	      part of wordRef.	The List returned contains  pointers  to  Wor-
	      dReference  objects.  It	is the responsibility of the caller to
	      free the list. See List.h	header for usage.

       inline List *FindWord(const String& word)
	      Returns the list of word occurrences exactly matching the	 word.
	      The List returned	contains pointers to WordReference objects. It
	      is the responsibility of the caller to free the list. See	List.h
	      header for usage.

       virtual List *operator [] (const	WordReference& wordRef)
	      Alias to the Find	method.

       inline List *operator []	(const String& word)
	      Alias to the FindWord method.

       virtual List *Prefix (const WordReference& prefix)
	      Returns the list of word occurrences matching the Key() part of
	      prefix. In the Key(), the string (accessed with GetWord())
	      matches any string that begins with it. The List returned
	      contains pointers to WordReference objects. It is the
	      responsibility of the caller to free the list.

       inline List *Prefix (const String& prefix)
	      Returns the list of word occurrences matching the word prefix.
	      In the Key(), the string (accessed with GetWord()) matches any
	      string that begins with it. The List returned contains pointers
	      to WordReference objects. It is the responsibility of the caller
	      to free the list.
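
	      Prefix matching over sorted keys can be pictured as a range
	      scan. A minimal sketch, assuming a std::set of words in place of
	      the inverted index; PrefixMatches is a hypothetical function and
	      does not reflect how mifluz implements Prefix.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model, not mifluz code: collect every word in a sorted set
// that begins with `prefix`.
std::vector<std::string> PrefixMatches(const std::set<std::string>& words,
                                       const std::string& prefix) {
    std::vector<std::string> matches;
    // lower_bound finds the first word not less than the prefix; all words
    // sharing the prefix follow it contiguously in sorted order.
    for (std::set<std::string>::const_iterator it = words.lower_bound(prefix);
         it != words.end() && it->compare(0, prefix.size(), prefix) == 0;
         ++it) {
        matches.push_back(*it);
    }
    return matches;
}
```

	      Because the keys are sorted, the scan stops at the first word
	      that no longer begins with the prefix.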

       virtual List *Words()
	      Returns a	list of	all unique words contained in the inverted in-
	      dex.  The	 List returned contains	pointers to String objects. It
	      is the responsibility of the caller to free the list. See	List.h
	      header for usage.

       virtual List *WordRefs()
	      Returns a	list of	all entries contained in the  inverted	index.
	      The List returned	contains pointers to WordReference objects. It
	      is the responsibility of the caller to free the list. See	List.h
	      header for usage.

       virtual WordCursor *Cursor(wordlist_walk_callback_t callback, Object
       *callback_data)
	      Create a cursor that searches all the occurrences in the inverted
	      index and calls callback with callback_data for every match.

       virtual WordCursor *Cursor(const WordKey &searchKey, int action =
       HTDIG_WORDLIST_WALKER)
	      Create a cursor that searches all the occurrences in the inverted
	      index that match searchKey. If action is set to
	      HTDIG_WORDLIST_WALKER, it calls searchKey.callback with
	      searchKey.callback_data for every match. If action is set to
	      HTDIG_WORDLIST_COLLECT, it pushes each match onto the
	      searchKey.collectRes data member as a WordReference object. It
	      is the responsibility of the caller to free the
	      searchKey.collectRes list.

       virtual WordCursor *Cursor(const	WordKey	&searchKey,
       wordlist_walk_callback_t	callback, Object * callback_data)
	      Create a cursor that searches all the occurrences in the inverted
	      index that match searchKey and calls callback with callback_data
	      for every match.
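
	      The callback-per-match pattern can be illustrated without the
	      library. This is a sketch only: the WalkCallback type, Walk and
	      CollectDoc are hypothetical, the multimap stands in for the
	      inverted index, and the real wordlist_walk_callback_t signature
	      is not shown on this page.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback shape: receives one match plus caller-owned data.
typedef void (*WalkCallback)(const std::string& word, unsigned doc,
                             void* callback_data);

typedef std::multimap<std::string, unsigned> ToyIndex;

// Call `callback` with `callback_data` for every entry whose word equals
// `word`; return the number of matches visited.
int Walk(const ToyIndex& index, const std::string& word,
         WalkCallback callback, void* callback_data) {
    int visited = 0;
    std::pair<ToyIndex::const_iterator, ToyIndex::const_iterator> range =
        index.equal_range(word);
    for (ToyIndex::const_iterator it = range.first; it != range.second; ++it) {
        callback(it->first, it->second, callback_data);
        ++visited;
    }
    return visited;
}

// Example callback: append each matching document number to a vector.
void CollectDoc(const std::string&, unsigned doc, void* callback_data) {
    static_cast<std::vector<unsigned>*>(callback_data)->push_back(doc);
}
```

	      Passing the results vector through callback_data keeps the
	      callback free of global state.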

       virtual WordKey Key(const String& bufferin)
	      Create  a	WordKey	object and return it. The bufferin argument is
	      used to initialize the key, as in	the WordKey::Set method.   The
	      first component of bufferin must be a word that is translated to
	      the   corresponding  numerical  id  using	 the  WordDict::Serial
	      method.

       virtual WordReference Word(const	String&	bufferin, int exists = 0)
	      Create a WordReference object and	return it. The bufferin	 argu-
	      ment  is	used to	initialize the structure, as in	the WordRefer-
	      ence::Set	method.	 The first component of	 bufferin  must	 be  a
	      word  that is translated to the corresponding numerical id using
	      the WordDict::Serial method.  If the exists argument is  set  to
	      1, the method WordDict::SerialExists is used instead, that is no
	      serial  is assigned to the word if it does not already have one.
	      Before translation  the  word  is	 normalized  using  the	 Word-
	      Type::Normalize  method.	The word is saved using	the WordRefer-
	      ence::SetWord method.

       virtual WordReference WordExists(const String& bufferin)
	      Alias for	Word(bufferin, 1).

       virtual void BatchStart()
	      Accelerate bulk insertions in the inverted index. All insertions
	      done with the Override method are batched instead of updating
	      the inverted index immediately. The inverted index file is not
	      updated until the BatchEnd method is called.

       virtual void BatchEnd()
	      Terminate	a bulk insertion started with a	call to	the BatchStart
	      method. When all insertions are done the AllRef method is	called
	      to restore statistics.

       virtual int Noccurrence(const String& key, unsigned int&	noccurrence)
       const
	      Return  in  noccurrence  the number of occurrences of the	string
	      contained	in the GetWord() part of key.  Returns OK on  success,
	      NOTOK otherwise.

       virtual int Write(FILE* f)
	      Write an ASCII description of the index to the stream f. Each
	      line of the file contains a WordReference ASCII description.
	      Return OK on success, NOTOK otherwise.

       virtual int WriteDict(FILE* f)
	      Write the complete dictionary with statistics to the stream f.
	      Return OK on success, NOTOK otherwise.

       virtual int Read(FILE* f)
	      Read WordReference ASCII descriptions from f. Returns the number
	      of inserted WordReference objects, or a value < 0 if an error
	      occurs. Invalid descriptions and empty lines are ignored.
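
	      The return convention can be sketched with a toy reader. The
	      "word number" line format used here is purely hypothetical (the
	      real WordReference ASCII format is not described on this page),
	      and ReadEntries is not a mifluz function.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model, not mifluz code: read "word number" lines, skip empty
// and invalid lines, and return the number of entries stored or -1 on error.
int ReadEntries(std::istream& in,
                std::vector<std::pair<std::string, unsigned long> >& out) {
    if (!in) return -1;  // unusable stream: report an error
    int inserted = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // empty lines are ignored
        std::istringstream fields(line);
        std::string word;
        unsigned long number = 0;
        if (!(fields >> word >> number)) continue;  // invalid lines are ignored
        out.push_back(std::make_pair(word, number));
        ++inserted;
    }
    // A hard I/O failure is an error; reaching end of input is not.
    return in.bad() ? -1 : inserted;
}
```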

AUTHORS
       Loic Dachary loic@gnu.org

       The Ht://Dig group http://dev.htdig.org/

SEE ALSO
       htdb_dump(1), htdb_stat(1), htdb_load(1), mifluzdump(1), mifluzload(1),
       mifluzsearch(1), mifluzdict(1), WordContext(3), WordDict(3),
       WordListOne(3), WordKey(3), WordKeyInfo(3), WordType(3), WordDBInfo(3),
       WordRecordInfo(3), WordRecord(3), WordReference(3), WordCursor(3),
       WordCursorOne(3), WordMonitor(3), Configuration(3), mifluz(3)

				     local			   WordList(3)
