FreeBSD Manual Pages

home | help
RECOLL.CONF(5)		      File Formats Manual		RECOLL.CONF(5)

NAME
       recoll.conf - main personal configuration file for Recoll

DESCRIPTION
       This  file  defines  the	 index	configuration for the Recoll full-text
       search system.

       The system-wide configuration file is normally located inside /usr/[lo-
       cal]/share/recoll/examples. Any parameter set in	the common file	may be
       overridden by setting it	in the personal	 configuration	file,  by  de-
       fault: $HOME/.recoll/recoll.conf

       Please note while I try to keep this manual page	reasonably up to date,
       it  will	 frequently  lag  the  current state of	the software. The best
       source of information about the configuration are the comments  in  the
       system-wide  configuration file or the user manual which	you can	access
       from the	recoll GUI help	menu or	on the recoll web site.

       A short extract of the file might look as follows:

	      #	Space-separated	list of	directories to index.
	      topdirs =	 ~/docs	/usr/share/doc

	      [~/somedirectory-with-utf8-txt-files]
	      defaultcharset = utf-8

       There are three kinds of	lines:

	      •	     Comment or	empty

	      •	     Parameter affectation

	      •	     Section definition

       Empty lines or lines beginning with # are ignored.

       Affectation lines are in	the form 'name = value'.

       Section lines allow redefining a	parameter  for	a  directory  subtree.
       Some  of	 the parameters	used for indexing are looked up	hierarchically
       from the	more to	the less specific. Not all parameters can be  meaning-
       fully redefined,	this is	specified for each in the next section.

       The  tilde  character  (~) is expanded in file names to the name	of the
       user's home directory.

       Where values are	lists, white space is used for	separation,  and  ele-
       ments with embedded spaces can be quoted	with double-quotes.

OPTIONS
       topdirs = string
	      Space-separated  list of files or	directories to recursively in-
	      dex. Default to ~	(indexes $HOME). You can use symbolic links in
	      the list,	they will be followed, independently of	the  value  of
	      the followLinks variable.

       monitordirs = string
	      Space-separated  list of files or	directories to monitor for up-
	      dates. When running the real-time	indexer, this allows  monitor-
	      ing  only	 a subset of the whole indexed area. The elements must
	      be included in the tree defined by the 'topdirs' members.

       skippedNames = string
	      Files and	directories which should be ignored.  White space sep-
	      arated list of wildcard patterns (simple ones, not  paths,  must
	      contain  no / ), which will be tested against file and directory
	      names.  The list in the default configuration does  not  exclude
	      hidden  directories  (names  beginning  with a dot), which means
	      that it may index	quite a	few things that	you do	not  want.  On
	      the other	hand, email user agents	like Thunderbird usually store
	      messages	in  hidden directories,	and you	probably want this in-
	      dexed. One possible solution is to have ".*" in  "skippedNames",
	      and   add	  things   like	  "~/.thunderbird"  "~/.evolution"  to
	      "topdirs".  Not even the file names are indexed for patterns  in
	      this  list, see the "noContentSuffixes" variable for an alterna-
	      tive approach which indexes the file names. Can be redefined for
	      any subtree.

       skippedNames- = string
	      List of name endings to remove  from  the	 default  skippedNames
	      list.

       skippedNames+ = string
	      List of name endings to add to the default skippedNames list.

       onlyNames = string
	      Regular  file name filter	patterns If this is set, only the file
	      names not	in skippedNames	and matching one of the	patterns  will
	      be  considered  for indexing. Can	be redefined per subtree. Does
	      not apply	to directories.

       noContentSuffixes = string
	      List of name endings (not	 necessarily  dot-separated  suffixes)
	      for  which  we don't try MIME type identification, and don't un-
	      compress or index	content. Only the names	will be	indexed.  This
	      complements  the	now  obsoleted	recoll_noindex	list  from the
	      mimemap file, which will go away in a future release  (the  move
	      from  mimemap to recoll.conf allows editing the list through the
	      GUI). This is different from skippedNames	because	these are name
	      ending matches only (not wildcard	patterns), and the  file  name
	      itself  gets  indexed normally. This can be redefined for	subdi-
	      rectories.

       noContentSuffixes- = string
	      List of name endings to remove from  the	default	 noContentSuf-
	      fixes list.

       noContentSuffixes+ = string
	      List  of	name  endings  to add to the default noContentSuffixes
	      list.

       skippedPaths = string
	      Absolute paths we	should not go into.  Space-separated  list  of
	      wildcard	expressions for	absolute filesystem paths. Must	be de-
	      fined at the top level of	the configuration file,	not in a  sub-
	      section.	Can  contain  files  and directories. The database and
	      configuration directories	will automatically be added.  The  ex-
	      pressions	 are  matched using 'fnmatch(3)' with the FNM_PATHNAME
	      flag set by default. This	means  that  '/'  characters  must  be
	      matched  explicitly.  You	can set	'skippedPathsFnmPathname' to 0
	      to disable the use of FNM_PATHNAME (meaning that '/*/dir3'  will
	      match  '/dir1/dir2/dir3').  The default value contains the usual
	      mount point for removable	media to remind	you that it is	a  bad
	      idea  to have Recoll work	on these (esp. with the	monitor: media
	      gets indexed on mount, all data gets erased on unmount). Explic-
	      itly adding '/media/xxx' to the 'topdirs'	variable will override
	      this.

       skippedPathsFnmPathname = bool
	      Set to 0 to override use of FNM_PATHNAME	for  matching  skipped
	      paths.

       nowalkfn	= string
	      File  name  which	will cause its parent directory	to be skipped.
	      Any directory containing a file with this	name will  be  skipped
	      as if it was part	of the skippedPaths list. Ex: .recoll-noindex

       daemSkippedPaths	= string
	      skippedPaths equivalent specific to real time indexing. This en-
	      ables  having  parts of the tree which are initially indexed but
	      not monitored. If	daemSkippedPaths is not	set, the  daemon  uses
	      skippedPaths.

       zipUseSkippedNames = bool
	      Use  skippedNames	 inside	 Zip archives. Fetched directly	by the
	      rclzip.py	handler. Skip the patterns defined by skippedNames in-
	      side Zip archives. Can be	 redefined  for	 subdirectories.   See
	      https://www.lesbonscomptes.com/recoll/faqsandhowtos/Fil-
	      teringOutZipArchiveMembers.html

       zipSkippedNames = string
	      Space-separated  list  of	 wildcard  expressions	for names that
	      should be	ignored	inside zip archives. This is used directly  by
	      the  zip	handler. If zipUseSkippedNames is not set, zipSkipped-
	      Names defines the	patterns to be skipped inside archives.	If zi-
	      pUseSkippedNames is set, the  two	 lists	are  concatenated  and
	      used. Can	be redefined for subdirectories.  See https://www.les-
	      bonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMem-
	      bers.html

       followLinks = bool
	      Follow  symbolic links during indexing. The default is to	ignore
	      symbolic links to	avoid multiple indexing	of  linked  files.  No
	      effort  is  made to avoid	duplication when this option is	set to
	      true. This option	can  be	 set  individually  for	 each  of  the
	      'topdirs'	members	by using sections. It can not be changed below
	      the  'topdirs' level. Links in the 'topdirs' list	itself are al-
	      ways followed.

       indexedmimetypes	= string
	      Restrictive list of indexed mime types.  Normally	 not  set  (in
	      which  case all supported	types are indexed). If it is set, only
	      the types	from the list will have	their  contents	 indexed.  The
	      names  will  be  indexed anyway if indexallfilenames is set (de-
	      fault). MIME type	names should be	taken from  the	 mimemap  file
	      (the  values may be different from xdg-mime or file -i output in
	      some cases). Can be redefined for	subtrees.

       excludedmimetypes = string
	      List of excluded MIME types. Lets	you exclude  some  types  from
	      indexing.	 MIME type names should	be taken from the mimemap file
	      (the values may be different from	xdg-mime or file -i output  in
	      some cases) Can be redefined for subtrees.

       nomd5types = string
	      Don't  compute  md5 for these types. md5 checksums are used only
	      for deduplicating	results, and can be very expensive to  compute
	      on  multimedia  or  other	big files. This	list lets you turn off
	      md5 computation for selected types. It is	global	(no  redefini-
	      tion for subtrees). At the moment, it only has an	effect for ex-
	      ternal  handlers	(exec and execm). The file types can be	speci-
	      fied by listing either MIME types	(e.g. audio/mpeg)  or  handler
	      names (e.g. rclaudio.py).

       compressedfilemaxkbs = int
	      Size  limit for compressed files.	We need	to decompress these in
	      a	temporary directory for	identification,	which can be  wasteful
	      in  some	cases. Limit the waste.	Negative means no limit. 0 re-
	      sults in no processing of	any compressed file. Default 50	MB.

       textfilemaxmbs =	int
	      Size limit for text files. Mostly	for skipping monster logs. De-
	      fault 20 MB.

       indexallfilenames = bool
	      Index the	file names of unprocessed files	 Index	the  names  of
	      files  the  contents  of	which we don't index because of	an ex-
	      cluded or	unsupported MIME type.

       usesystemfilecommand = bool
	      Use a system command for file MIME type guessing as a final step
	      in file type identification This is generally useful,  but  will
	      usually cause the	indexing of many bogus 'text' files. See 'sys-
	      temfilecommand' for the command used.

       systemfilecommand = string
	      Command  used  to	guess MIME types if the	internal methods fails
	      This should be a "file -i" workalike.  The  file	path  will  be
	      added  as	a last parameter to the	command	line. "xdg-mime" works
	      better than the traditional "file" command, and is now the  con-
	      figured default (with a hard-coded fallback to "file")

       processwebqueue = bool
	      Decide  if  we  process  the Web queue. The queue	is a directory
	      where the	Recoll Web browser plugins create the copies  of  vis-
	      ited pages.

       textfilepagekbs = int
	      Page  size for text files. If this is set, text/plain files will
	      be divided into documents	of approximately this size.  Will  re-
	      duce  memory  usage  at index time and help with loading data in
	      the preview window at query time.	Particularly useful with  very
	      big  files,  such	 as  application  or  system  logs.  Also  see
	      textfilemaxmbs and compressedfilemaxkbs.

       membermaxkbs = int
	      Size limit for archive members. This is passed to	the filters in
	      the environment as RECOLL_FILTER_MAXMEMBERKB.

       indexStripChars = bool
	      Decide if	we store character case	and diacritics in  the	index.
	      If  we do, searches sensitive to case and	diacritics can be per-
	      formed, but the index will be bigger, and	some  marginal	weird-
	      ness  may	sometimes occur. The default is	a stripped index. When
	      using multiple indexes for a search, this	parameter must be  de-
	      fined  identically  for all. Changing the	value implies an index
	      reset.

       indexStoreDocText = bool
	      Decide if	we store the documents'	text  content  in  the	index.
	      Storing  the  text  allows  extracting snippets from it at query
	      time, instead of building	them from index	position data.	 Newer
	      Xapian index formats have	rendered our use of positions list un-
	      acceptably slow in some cases. The last Xapian index format with
	      good  performance	 for the old method is Chert, which is default
	      for 1.2, still supported but not default	in  1.4	 and  will  be
	      dropped in 1.6.  The stored document text	is translated from its
	      original	format to UTF-8	plain text, but	not stripped of	upper-
	      case, diacritics,	or punctuation signs. Storing it increases the
	      index size by 10-20% typically, but also allows for nicer	 snip-
	      pets, so it may be worth enabling	it even	if not strictly	needed
	      for  performance if you can afford the space.  The variable only
	      has an effect when creating an index, meaning that the  xapiandb
	      directory	 must  not  exist yet. Its exact effect	depends	on the
	      Xapian version.  For Xapian 1.4, if the variable is  set	to  0,
	      the  Chert format	will be	used, and the text will	not be stored.
	      If the variable is 1, Glass will be used,	and the	 text  stored.
	      For  Xapian 1.2, and for versions	after 1.5 and newer, the index
	      format is	always the default, but	the variable controls  if  the
	      text  is stored or not, and the abstract generation method. With
	      Xapian 1.5 and later, and	the variable set to 0, abstract	gener-
	      ation may	be very	slow, but this setting may still be useful  to
	      save space if you	do not use abstract generation at all.

       nonumbers = bool
	      Decides  if  terms  will	be  generated for numbers. For example
	      "123", "1.5e6", 192.168.1.4, would not be	indexed	 if  nonumbers
	      is  set ("value123" would	still be). Numbers are often quite in-
	      teresting	to search for, and this	should probably	not be set ex-
	      cept for special situations, ie, scientific documents with  huge
	      amounts  of numbers in them, where setting nonumbers will	reduce
	      the index	size. This can only be set for a whole index, not  for
	      a	subtree.

       dehyphenate = bool
	      Determines  if  we  index	'coworker' also	when the input is 'co-
	      worker'. This is new in version 1.22, and	on by default. Setting
	      the variable to off allows restoring the previous	behaviour.

       backslashasletter = bool
	      Process backslash	as normal letter. This may make	sense for peo-
	      ple wanting to index TeX commands	as such	but  is	 not  of  much
	      general use.

       underscoreasletter = bool
	      Process underscore as normal letter. This	makes sense in so many
	      cases that one wonders if	it should not be the default.

       maxtermlength = int
	      Maximum  term  length. Words longer than this will be discarded.
	      The default is 40	and used to be hard-coded, but it can  now  be
	      adjusted.	You need an index reset	if you change the value.

       nocjk = bool
	      Decides if specific East Asian (Chinese Korean Japanese) charac-
	      ters/word	splitting is turned off. This will save	a small	amount
	      of  CPU if you have no CJK documents. If your document base does
	      include such text	but you	are not	interested  in	searching  it,
	      setting nocjk may	be a significant time and space	saver.

       cjkngramlen = int
	      This  lets  you adjust the size of n-grams used for indexing CJK
	      text. The	default	value of 2 is  probably	 appropriate  in  most
	      cases. A value of	3 would	allow more precision and efficiency on
	      longer  words,  but  the	index  will  be	approximately twice as
	      large.

       indexstemminglanguages =	string
	      Languages	for which to create stemming expansion	data.  Stemmer
	      names  can  be  found by executing 'recollindex -l', or this can
	      also be set from a list in the GUI. The values are full language
	      names, e.g. english, french...

       defaultcharset =	string
	      Default character	set. This is used for files which do not  con-
	      tain a character set definition (e.g.: text/plain). Values found
	      inside files, e.g. a 'charset' tag in HTML documents, will over-
	      ride  it.	 If  this is not set, the default character set	is the
	      one defined by the NLS environment ($LC_ALL, $LC_CTYPE,  $LANG),
	      or  ultimately iso-8859-1	(cp-1252 in fact).  If for some	reason
	      you want a general default which does not	match your LANG	and is
	      not 8859-1, use this variable. This can  be  redefined  for  any
	      sub-directory.

       unac_except_trans = string
	      A	 list of characters, encoded in	UTF-8, which should be handled
	      specially	when converting	text to	unaccented lowercase. For  ex-
	      ample, in	Swedish, the letter a with diaeresis has full alphabet
	      citizenship and should not be turned into	an a.  Each element in
	      the space-separated list has the special character as first ele-
	      ment  and	 the  translation  following. The handling of both the
	      lowercase	and upper-case versions	of a character should be spec-
	      ified, as	appartenance to	the list will turn-off	both  standard
	      accent and case processing. The value is global and affects both
	      indexing and querying.  Examples:
	      Swedish:
	      unac_except_trans	=	ss oe oe ae ae ff fi fl
	      German:
	      unac_except_trans	=	ss oe oe ae ae ff fi fl
	      French:  you  probably  want  to	decompose oe and ae and	nobody
	      would type a German
	      unac_except_trans	= ss oe	oe ae ae ff fi fl
	      The default for all until	someone	protests follows. These	decom-
	      positions	are not	performed by unac, but	it  is	unlikely  that
	      someone would type the composed forms in a search.
	      unac_except_trans	= ss oe	oe ae ae ff fi fl

       maildefcharset =	string
	      Overrides	 the  default  character  set for email	messages which
	      don't specify one. This is mainly	useful	for  readpst  (libpst)
	      dumps, which are utf-8 but do not	say so.

       localfields = string
	      Set  fields on all files (usually	of a specific fs area).	Syntax
	      is the usual: name = value ; attr1 =  val1  ;  [...]   value  is
	      empty so this needs an initial semi-colon. This is useful, e.g.,
	      for  setting  the	rclaptg	field for application selection	inside
	      mimeview.

       testmodifusemtime = bool
	      Use mtime	instead	of ctime to test if a file has been  modified.
	      The  time	is used	in addition to the size, which is always used.
	      Setting this can reduce re-indexing on  systems  where  extended
	      attributes  are  used  (by  some other application), but not in-
	      dexed, because changing extended attributes only affects	ctime.
	      Notes:  -	 This may prevent detection of change in some marginal
	      file rename cases	(the target would need to have the  same  size
	      and  mtime).   - You should probably also	set noxattrfields to 1
	      in this case, except if you still	prefer to perform xattr	index-
	      ing, for example if the local file update	pattern	 makes	it  of
	      value  (as in general, there is a	risk for pure extended attrib-
	      utes updates without file	modification to	go  undetected).  Per-
	      form a full index	reset after changing this.

       noxattrfields = bool
	      Disable  extended	attributes conversion to metadata fields. This
	      probably needs to	be set if testmodifusemtime is set.

       metadatacmds = string
	      Define commands to gather	external  metadata,  e.g.  tmsu	 tags.
	      There  can  be  several  entries,	separated by semi-colons, each
	      defining which field name	the data goes into and the command  to
	      use.  Don't  forget  the initial semi-colon. All the field names
	      must be different. You can use aliases in	the  "field"  file  if
	      necessary.   As  a  not too pretty hack conceded to convenience,
	      any field	name beginning with "rclmulti" will be taken as	an in-
	      dication that the	command	returns	multiple field values inside a
	      text blob	formatted as a recoll configuration file ("fieldname =
	      fieldvalue" lines). The rclmultixx name  will  be	 ignored,  and
	      field  names  and	values will be parsed from the data.  Example:
	      metadatacmds = ; tags = tmsu tags	%f; rclmulti1 =	cmdOutputsConf
	      %f

       cachedir	= dfn
	      Top directory for	Recoll data. Recoll data directories are  nor-
	      mally  located  relative	to  the	 configuration directory (e.g.
	      ~/.recoll/xapiandb, ~/.recoll/mboxcache).	If 'cachedir' is  set,
	      the  directories	are  stored  under the specified value instead
	      (e.g. if cachedir	is ~/.cache/recoll, the	default	dbdir would be
	      ~/.cache/recoll/xapiandb).   This	 affects  dbdir,  webcachedir,
	      mboxcachedir,  aspellDicDir,  which  can	still  be individually
	      specified	to override cachedir.  Note that if you	have  multiple
	      configurations, each must	have a different cachedir, there is no
	      automatic	computation of a subpath under cachedir.

       maxfsoccuppc = int
	      Maximum  file system occupation over which we stop indexing. The
	      value is a percentage, corresponding to what the	"Capacity"  df
	      output  column  shows. The default value is 0, meaning no	check-
	      ing.

       dbdir = dfn
	      Xapian database directory	location.  This	 will  be  created  on
	      first indexing. If the value is not an absolute path, it will be
	      interpreted as relative to cachedir if set, or the configuration
	      directory	(-c argument or	$RECOLL_CONFDIR).  If nothing is spec-
	      ified, the default is then ~/.recoll/xapiandb/

       idxstatusfile = fn
	      Name  of	the scratch file where the indexer process updates its
	      status. Default: idxstatus.txt inside the	 configuration	direc-
	      tory.

       mboxcachedir = dfn
	      Directory	location for storing mbox message offsets cache	files.
	      This  is normally	'mboxcache' under cachedir if set, or else un-
	      der the configuration directory, but it may be useful to share a
	      directory	between	different configurations.

       mboxcacheminmbs = int
	      Minimum mbox file	size over which	we cache the offsets. There is
	      really no	sense in caching offsets for small files. The  default
	      is 5 MB.

       mboxmaxmsgmbs = int
	      Maximum  mbox  member message size in megabytes. Size over which
	      we assume	that the mbox format is	bad or we  misinterpreted  it,
	      at which point we	just stop processing the file.

       webcachedir = dfn
	      Directory	 where	we  store the archived web pages. This is only
	      used by the web history indexing code Default: cachedir/webcache
	      if cachedir is set, else $RECOLL_CONFDIR/webcache

       webcachemaxmbs =	int
	      Maximum size in MB of the	Web archive. This is only used by  the
	      web  history  indexing code.  Default: 40	MB.  Reducing the size
	      will not physically truncate the file.

       webqueuedir = fn
	      The path to the Web indexing queue. This used to	be  hard-coded
	      in  the  old plugin as ~/.recollweb/ToIndex so there would be no
	      need or possibility to change it,	but the	 WebExtensions	plugin
	      now  downloads  the files	to the user Downloads directory, and a
	      script moves them	to webqueuedir.	The script  reads  this	 value
	      from the config so it has	become possible	to change it.

       webdownloadsdir = fn
	      The  path	 to browser downloads directory. This is where the new
	      browser add-on extension has to create the files.	They are  then
	      moved by a script	to webqueuedir.

       aspellDicDir = dfn
	      Aspell dictionary	storage	directory location. The	aspell dictio-
	      nary  (aspdict.(lang).rws)  is  normally stored in the directory
	      specified	by cachedir if set, or under the configuration	direc-
	      tory.

       filtersdir = dfn
	      Directory	location for executable	input handlers.	If RECOLL_FIL-
	      TERSDIR  is  set in the environment, we use it instead. Defaults
	      to $prefix/share/recoll/filters. Can be redefined	for  subdirec-
	      tories.

       iconsdir	= dfn
	      Directory	 location  for	icons.	The only reason	to change this
	      would be if you want to change the icons displayed in the	result
	      list. Defaults to	$prefix/share/recoll/images

       idxflushmb = int
	      Threshold	(megabytes of new data)	where we flush from memory  to
	      disk  index.  Setting this allows	some control over memory usage
	      by the indexer process. A	value of 0 means no explicit flushing,
	      which lets Xapian	perform	its own	thing, meaning flushing	 every
	      $XAPIAN_FLUSH_THRESHOLD  documents created, modified or deleted:
	      as memory	usage depends on average document size,	not only docu-
	      ment count, the Xapian approach is is not	very useful,  and  you
	      should let Recoll	manage the flushes. The	program	compiled value
	      is  0.  The  configured default value (from this file) is	now 50
	      MB, and should be	ok in many cases.  You can set it as low as 10
	      to conserve memory, but if you are looking  for  maximum	speed,
	      you may want to experiment with values between 20	and 200. In my
	      experience,  values beyond this are always counterproductive. If
	      you find otherwise, please drop me a note.

       filtermaxseconds	= int
	      Maximum external filter execution	time in	seconds. Default  1200
	      (20mn).  Set to 0	for no limit. This is mainly to	avoid infinite
	      loops in postscript files	(loop.ps)

       filtermaxmbytes = int
	      Maximum  virtual	memory	space  for  filter  processes	(setr-
	      limit(RLIMIT_AS)),  in  megabytes.  Note	that this includes any
	      mapped libs (there is no reliable	Linux way to  limit  the  data
	      space only), so we need to be a bit generous here. Anything over
	      2000  will  be ignored on	32 bits	machines. The previous default
	      value of 2000 would prevent java pdftk  to  work	when  executed
	      from Python rclpdf.py.

       thrQSizes = string
	      Stage  input  queues  configuration.  There  are	three internal
	      queues in	the indexing pipeline stages  (file  data  extraction,
	      terms  generation,  index	 update).  This	 parameter defines the
	      queue depths for each stage (three integer values). If  a	 value
	      of  -1  is  given	 for  a	given stage, no	queue is used, and the
	      thread will go on	performing the next stage. In  practise,  deep
	      queues  have  not	been shown to increase performance. Default: a
	      value of 0 for the first queue tells Recoll to perform  autocon-
	      figuration based on the detected number of CPUs (no need for the
	      two  other  values  in  this case).  Use thrQSizes = -1 -1 -1 to
	      disable multithreading entirely.

       thrTCounts = string
	      Number of	threads	used for each indexing stage. The three	stages
	      are: file	data extraction, terms generation, index update).  The
	      use  of  the counts is also controlled by	some special values in
	      thrQSizes: if the	first queue depth is 0,	all counts are ignored
	      (autoconfigured);	if a value of -1 is used for  a	 queue	depth,
	      the  corresponding thread	count is ignored. It makes no sense to
	      use a value other	than 1 for the last stage because updating the
	      Xapian index is necessarily single-threaded (and protected by  a
	      mutex).

       loglevel	= int
	      Log  file	verbosity 1-6. A value of 2 will print only errors and
	      warnings.	3 will print information like document updates,	 4  is
	      quite verbose and	6 very verbose.

       logfilename = fn
	      Log  file	 destination.  Use  'stderr' (default) to write	to the
	      console.

       idxloglevel = int
	      Override loglevel	for the	indexer.

       idxlogfilename =	fn
	      Override logfilename for the indexer.

       daemloglevel = int
	      Override loglevel	for the	indexer	in real	time mode. The default
	      is to use	the idx... values if set, else the log... values.

       daemlogfilename = fn
	      Override logfilename for the indexer in real time	mode. The  de-
	      fault  is	 to use	the idx... values if set, else the log... val-
	      ues.

       pyloglevel = int
	      Override loglevel	for the	python module.

       pylogfilename = fn
	      Override logfilename for the python module.

       orgidxconfdir = dfn
	      Original location	of the configuration directory.	This  is  used
	      exclusively for movable datasets.	Locating the configuration di-
	      rectory  inside  the directory tree makes	it possible to provide
	      automatic	query time path	translations once  the	data  set  has
	      moved (for example, because it has been mounted on another loca-
	      tion).

       curidxconfdir = dfn
	      Current  location	 of  the  configuration	 directory. Complement
	      orgidxconfdir for	movable	datasets. This should be used  if  the
	      configuration  directory has been	copied from the	dataset	to an-
	      other location, either because the dataset is  readonly  and  an
	      r/w  copy	 is  desired, or for performance reasons. This records
	      the original moved location before copy, to allow	path  transla-
	      tion  computations.  For example if a dataset originally indexed
	      as '/home/me/mydata/config' has been mounted  to	'/media/me/my-
	      data',  and  the	GUI  is	 running  from a copied	configuration,
	      orgidxconfdir would  be  '/home/me/mydata/config',  and  curidx-
	      confdir  (as  set	 in  the  copied configuration)	would be '/me-
	      dia/me/mydata/config'.

       idxrundir = dfn
	      Indexing process current directory. The input handlers sometimes
	      leave temporary files in the  current  directory,	 so  it	 makes
	      sense  to	have recollindex chdir to some temporary directory. If
	      the value	is empty, the current directory	is not changed.	If the
	      value is (literal) tmp, we use the temporary directory as	set by
	      the environment (RECOLL_TMPDIR else TMPDIR else  /tmp).  If  the
	      value is an absolute path	to a directory,	we go there.

       checkneedretryindexscript = fn
	      Script  used to heuristically check if we	need to	retry indexing
	      files which previously failed.  The default  script  checks  the
	      modified	dates  on /usr/bin and /usr/local/bin. A relative path
	      will be looked up	in the filters dirs, then in the path. Use  an
	      absolute path to do otherwise.

       recollhelperpath	= string
	      Additional places	to search for helper executables. This is only
	      used on Windows for now.

       idxabsmlen = int
	      Length  of  abstracts  we	store while indexing. Recoll stores an
	      abstract for each	indexed	file.  The text	can come from  an  ac-
	      tual  'abstract' section in the document or will just be the be-
	      ginning of the document. It is stored in the index  so  that  it
	      can  be  displayed  inside the result lists without decoding the
	      original file. The idxabsmlen parameter defines the size of  the
	      stored  abstract.	The default value is 250 bytes.	The search in-
	      terface gives you	the choice to display this stored  text	 or  a
	      synthetic	 abstract  built  by extracting	text around the	search
	      terms. If	you always prefer the synthetic	abstract, you can  re-
	      duce this	value and save a little	space.

       idxmetastoredlen	= int
	      Truncation  length  of stored metadata fields. This does not af-
	      fect indexing (the whole field is	processed  anyway),  just  the
	      amount of	data stored in the index for the purpose of displaying
	      fields inside result lists or previews. The default value	is 150
	      bytes which may be too low if you	have custom fields.

       idxtexttruncatelen = int
	      Truncation  length for all document texts. Only index the	begin-
	      ning of documents. This is not recommended  except  if  you  are
	      sure  that  the interesting keywords are at the top and have se-
	      vere disk	space issues.

       aspellLanguage =	string
	      Language definitions to use when creating	the aspell dictionary.
	      The value	must match a set of aspell language definition	files.
	      You can type "aspell dicts" to see a list	The default if this is
	      not  set	is  to use the NLS environment to guess	the value. The
	      values are the 2-letter language codes (e.g. 'en', 'fr'...)

       aspellAddCreateParam = string
	      Additional option	and parameter to  aspell  dictionary  creation
	      command.	Some  aspell  packages	may  need an additional	option
	      (e.g. on Debian Jessie:  --local-data-dir=/usr/lib/aspell).  See
	      Debian bug 772415.

       aspellKeepStderr	= bool
	      Set  this	 to  have a look at aspell dictionary creation errors.
	      There are	always many, so	this is	mostly for debugging.

       noaspell	= bool
	      Disable aspell use. The aspell dictionary	generation takes time,
	      and some combinations of aspell  version,	 language,  and	 local
	      terms, result in aspell crashing,	so it sometimes	makes sense to
	      just disable the thing.

       monauxinterval =	int
	      Auxiliary	 database  update interval. The	real time indexer only
	      updates the auxiliary databases (stemdb,	aspell)	 periodically,
	      because  it  would  be  too  costly  to do it for	every document
	      change. The default period is one	hour.

       monixinterval = int
	      Minimum interval (seconds) between processings of	 the  indexing
	      queue. The real time indexer does	not process each event when it
	      comes  in,  but  lets the	queue accumulate, to diminish overhead
	      and to aggregate multiple	events affecting the  same  file.  De-
	      fault 30 S.

       mondelaypatterns	= string
	      Timing  parameters  for  the real	time indexing. Definitions for
	      files which get a	longer delay  before  reindexing  is  allowed.
	      This  is	for fast-changing files, that should only be reindexed
	      once in a	while. A list of  wildcardPattern:seconds  pairs.  The
	      patterns	are  matched  with  fnmatch(pattern,  path, 0) You can
	      quote entries containing white space with	double	quotes	(quote
	      the  whole entry,	not the	pattern). The default is empty.	 Exam-
	      ple: mondelaypatterns = *.log:20 "*with spaces.*:30"

       idxniceprio = int
	      "nice" process priority for the indexing processes. Default:  19
	      (lowest) Appeared	with 1.26.5. Prior versions were fixed at 19.

       monioniceclass =	int
	      ionice  class  for  the indexing process.	Despite	the misleading
	      name, and	on platforms where this	is supported, this affects all
	      indexing processes, not only the real time/monitoring ones.  The
	      default value is 3 (use lowest "Idle" priority).

       monioniceclassdata = string
	      ionice  class  level parameter if	the class supports it. The de-
	      fault is empty, as the default "Idle" class has no levels.

       autodiacsens = bool
	      auto-trigger diacritics sensitivity (raw index only). IF the in-
	      dex is not stripped, decide if we	automatically trigger diacrit-
	      ics sensitivity if the search term has accented characters  (not
	      in  unac_except_trans).  Else you	need to	use the	query language
	      and the "D" modifier to specify diacritics sensitivity.  Default
	      is no.

       autocasesens = bool
	      auto-trigger  case sensitivity (raw index	only). IF the index is
	      not stripped (see	indexStripChars), decide if  we	 automatically
	      trigger character	case sensitivity if the	search term has	upper-
	      case  characters in any but the first position. Else you need to
	      use the query language and the "C" modifier to  specify  charac-
	      ter-case sensitivity. Default is yes.

       maxTermExpand = int
	      Maximum  query expansion count for a single term (e.g.: when us-
	      ing wildcards). This only	affects	queries, not indexing. We used
	      to not limit this	at all (except for filenames where  the	 limit
	      was  too	low at 1000), but it is	unreasonable with a big	index.
	      Default 10000.

       maxXapianClauses	= int
	      Maximum number of	clauses	we add to a single Xapian query.  This
	      only affects queries, not	indexing. In some cases, the result of
	      term  expansion can be multiplicative, and we want to avoid eat-
	      ing all the memory. Default 50000.

       snippetMaxPosWalk = int
	      Maximum number of	positions we walk while	populating  a  snippet
	      for  the	result	list. The default of 1,000,000 may be insuffi-
	      cient for	very big documents, the	consequence would be  snippets
	      with possibly meaning-altering missing words.

       pdfocr =	bool
	      Attempt  OCR  of PDF files with no text content. This can	be de-
	      fined in subdirectories. The default is off because  OCR	is  so
	      very slow.

       pdfattach = bool
	      Enable  PDF  attachment extraction by executing pdftk (if	avail-
	      able). This is normally disabled,	because	it does	slow down  PDF
	      indexing a bit even if not one attachment	is ever	found.

       pdfextrameta = string
	      Extract  text  from selected XMP metadata	tags. This is a	space-
	      separated	list of	qualified XMP tag names. Each element can also
	      include a	translation to a Recoll	field name, separated by a '|'
	      character. If the	second element is absent, the tag name is used
	      as the Recoll field names. You will also need to add  specifica-
	      tions to the "fields" file to direct processing of the extracted
	      data.

       pdfextrametafix = fn
	      Define  name  of XMP field editing script. This defines the name
	      of a script to be	loaded	for  editing  XMP  field  values.  The
	      script should define a 'MetaFixer' class with a metafix()	method
	      which  will  be  called with the qualified tag name and value of
	      each selected field, for editing or erasing. A new  instance  is
	      created  for  each  document,  so	that the object	can keep state
	      for, e.g.	eliminating duplicate values.

       ocrprogs	= string
	      OCR modules to try. The top OCR script will try to load the cor-
	      responding modules in order and use the first which reports  be-
	      ing  capable  of	performing  OCR	on the input file. Modules for
	      tesseract	(tesseract) and	ABBYY FineReader (abbyy)  are  present
	      in  the standard distribution. For compatibility with the	previ-
	      ous version, if this is not defined at all, the default value is
	      "tesseract". Use an explicit empty value if needed. A  value  of
	      "abbyy tesseract"	will try everything.

       ocrcachedir = dfn
	      Location	for  caching OCR data. The default if this is empty or
	      undefined	 is  to	 store	the  cached  OCR  data	 under	 $REC-
	      OLL_CONFDIR/ocrcache.

       tesseractlang = string
	      Language	to  assume  for	tesseract OCR. Important for improving
	      the OCR accuracy.	This can also be set through the contents of a
	      file in the currently processed directory. See the rclocrtesser-
	      act.py script. Example values: eng,  fra...  See	the  tesseract
	      documentation.

       tesseractcmd = fn
	      Path  for	 the  tesseract	 command. Do not quote.	This is	mostly
	      useful on	Windows, or for	 specifying  a	non-default  tesseract
	      command.	  E.g.	  on	Windows.    tesseractcmd   =   C:/Pro-
	      gram&nbsp;Files&nbsp;(x86)/Tesseract-OCR/tesseract.exe

       abbyylang = string
	      Language to assume for abbyy OCR.	Important  for	improving  the
	      OCR  accuracy.  This  can	 also be set through the contents of a
	      file in the currently processed  directory.  See	the  rclocrab-
	      byy.py  script. Typical values: English, French... See the ABBYY
	      documentation.

       abbyycmd	= fn
	      Path for the abbyy command The ABBY directory is usually not  in
	      the path,	so you should set this.

       mhmboxquirks = string
	      Enable thunderbird/mozilla-seamonkey mbox	format quirks Set this
	      for the directory	where the email	mbox files are stored.

SEE ALSO
       recollindex(1) recoll(1)

			       14 November 2012			RECOLL.CONF(5)
NAME | DESCRIPTION | OPTIONS | SEE ALSO
Want to link to this manual page? Use this URL:
<https://man.freebsd.org/cgi/man.cgi?query=recoll.conf&sektion=5&manpath=FreeBSD+Ports+15.0>
home | help
Header And Logo

Peripheral Links

Site Navigation

FreeBSD Manual Pages

Header And Logo

Peripheral Links

Search

Site Navigation

FreeBSD Manual Pages