MU-INDEX(1)		    General Commands Manual		   MU-INDEX(1)

NAME
       mu index	- index	e-mail messages	stored in Maildirs

SYNOPSIS
       mu index	[options]

DESCRIPTION
       mu  index is the	mu command for scanning	the contents of	Maildir	direc-
       tories and storing the results in a Xapian database. The	data can  then
       be queried using	mu-find(1).

       Before  the  first  time	you run	mu index, you must run mu init to ini-
       tialize the database.
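
       As a minimal sketch of that order of operations, the shell function
       below initializes the database and then runs the first index. The
       function name first_index and the ~/Maildir default are illustrative
       assumptions, not part of mu; the --maildir option of mu init is
       described in mu-init(1).

```shell
# Hypothetical helper: one-time setup followed by the first full index.
# Assumes mu is installed; the default path is only an example.
first_index() {
    maildir="${1:-$HOME/Maildir}"
    # mu init creates the database for this maildir
    mu init --maildir="$maildir" || return 1
    # the first mu index run then scans the whole maildir
    mu index
}
```

       It would be called once, for example as: first_index ~/Maildir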

       mu index understands Maildirs as defined by Daniel Bernstein for
       qmail(7). In addition, it understands recursive Maildirs (Maildirs
       within Maildirs) and Maildir++. It can also deal with VFAT-based
       Maildirs, which use '!' or ';' as the separator instead of ':'.

       E-mail messages which are not stored in something resembling a  maildir
       leaf-directory  (cur and	new) are ignored, as are the cache directories
       for notmuch and gnus, and any dot-directory.

       Starting with mu 1.5.x, symlinks are followed, and can be spread over
       multiple filesystems; however, note that moving files around is much
       faster when multiple filesystems are not involved.

       If there	is a file called .noindex in a directory, the contents of that
       directory and all of its	subdirectories will be ignored.	 This  can  be
       useful  to  exclude  certain directories	from the indexing process, for
       example directories with	spam-messages.

       If there is a file called .noupdate in a directory, the contents of
       that directory and all of its subdirectories will be ignored, unless we
       do a full rebuild (with mu init). This can be useful to speed things up
       if you have some maildirs that never change. Note that you can still
       search for these messages; this only affects updating the database.
       .noupdate is ignored when you start indexing with an empty database
       (such as directly after mu init).
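
       Both marker files are ordinary files, so creating them needs nothing
       beyond the shell. The helper below is a sketch; the name mark_maildir
       and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: drop a marker file into a maildir directory.
#   mark_maildir DIR noindex   -> DIR and its subdirectories are never indexed
#   mark_maildir DIR noupdate  -> DIR is skipped on updates, but still indexed
#                                 when starting from an empty database
mark_maildir() {
    dir="$1"
    kind="$2"
    case "$kind" in
        noindex|noupdate) ;;
        *) echo "usage: mark_maildir DIR noindex|noupdate" >&2; return 2 ;;
    esac
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    # the text above only requires that the file exists, so touch is enough
    touch "$dir/.$kind"
}
```

       For example, mark_maildir ~/Maildir/spam noindex would exclude a spam
       folder (the path is only an example).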

       There is also the --lazy-check option, which can greatly speed up
       indexing; see below for details.

       The first run of mu index may take a few minutes if you have a lot of
       mail (tens of thousands of messages). Fortunately, such a full scan
       needs to be done only once; after that it suffices to index the
       changes, which goes much faster. See the sections 'A note on
       performance' (i, ii, iii, iv) below for more information.

       The optional 'phase two' of the indexing process is the removal of
       messages from the database for which there is no longer a corresponding
       file in the Maildir. If you do not want this, you can use -n,
       --nocleanup.

       When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM
       (e.g., when you press Ctrl-C during the indexing process), it tries to
       shut down gracefully; it tries to save and commit data, close the
       database, and so on. If it receives another signal (e.g., when
       pressing Ctrl-C once more), mu index will terminate immediately.

OPTIONS
       Some of the general options are described in the	mu(1) man-page and not
       here, as	they apply to multiple mu commands.

       --lazy-check
              In lazy-check mode, mu does not consider messages for which the
              time-stamp (ctime) of the directory they reside in has not
              changed since the previous indexing run. This is much faster
              than the non-lazy check, but won't update messages that have
              changed (rather than having been added or removed), since merely
              editing a message does not update the directory time-stamp. Of
              course, you can run mu index occasionally without --lazy-check,
              to pick up such messages.

       --nocleanup
	      disables	the database cleanup that mu does by default after in-
	      dexing.
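
       One way to combine the two options is a frequent quick pass plus an
       occasional full pass. The wrapper below is only a sketch of that
       routine; the name index_mail and the quick/full modes are assumptions,
       while --lazy-check and --nocleanup are the options described above.

```shell
# Hypothetical wrapper around the two options described above.
#   index_mail quick  -> fast pass: skip unchanged directories, skip cleanup
#   index_mail full   -> complete pass: check every message, then clean up
index_mail() {
    case "${1:-quick}" in
        quick) mu index --lazy-check --nocleanup ;;
        full)  mu index ;;
        *)     echo "usage: index_mail [quick|full]" >&2; return 2 ;;
    esac
}
```

       The quick mode suits runs triggered whenever new mail arrives; the
       full mode picks up edited messages and removes stale entries.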

   A note on performance (i)
       As a non-scientific benchmark, a	simple test on the author's machine (a
       Thinkpad	X61s laptop using Linux	2.6.35 and an ext3 file	 system)  with
       no existing database, and a maildir with	27273 messages:

	$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
	$ time mu index	--quiet
	66,65s user 6,05s system 27% cpu 4:24,20 total
       (about 103 messages per second)

       A  second run, which is the more	typical	use case when there is a data-
       base already, goes much faster:

	$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
	$ time mu index	--quiet
	0,48s user 0,76s system	10% cpu	11,796 total
       (more than 56818	messages per second)

       Note that each test flushes the caches first; a more  common  use  case
       might  be to run	mu index when new mail has arrived; the	cache may stay
       quite 'warm' in that case:

	$ time mu index	--quiet
	0,33s user 0,40s system	80% cpu	0,905 total
       which is	more than 30000	messages per second.
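
       The throughput figures in this note can be rechecked with a little
       arithmetic. The helper below is a sketch (msgs_per_sec is a made-up
       name); it divides a message count by a time in seconds and truncates
       the result. The timings above use a decimal comma, so they are
       rewritten with a decimal point here.

```shell
# Hypothetical helper: messages per second, truncated to a whole number.
# Times must use a decimal point (4:24,20 total is 264.20 seconds).
msgs_per_sec() {
    LC_ALL=C awk -v n="$1" -v s="$2" 'BEGIN { printf "%d\n", int(n / s) }'
}

msgs_per_sec 27273 264.20   # first run, elapsed time
msgs_per_sec 27273 0.48     # second run, user CPU time
msgs_per_sec 27273 11.796   # second run, elapsed time
msgs_per_sec 27273 0.905    # warm cache, elapsed time
```

       The four calls print 103, 56818, 2312 and 30135. The first two match
       the figures quoted above, which suggests that the second-run figure
       was computed from user CPU time; measured by elapsed time, that run
       handles roughly 2300 messages per second.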

   A note on performance (ii)
       As of June 2012, we did the same non-scientific benchmark, this time
       with an Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir
       with 22589 messages. We start without an existing database.

	$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
	$ time mu index	--quiet
	27,79s user 2,17s system 48% cpu 1:01,47 total
       (about 813 messages per second)

       A second	run, which is the more typical use case	when there is a	 data-
       base already, goes much faster:

	$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
	$ time mu index	--quiet
	0,13s user 0,30s system	19% cpu	2,162 total
       (more than 173000 messages per second)

   A note on performance (iii)
       As of July 2016, we did the same non-scientific benchmark, again with
       the Intel i5-2500 CPU @ 3.30GHz and an ext4 file system. This time, the
       maildir contains 72525 messages.

	$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
	$ time mu index	--quiet
	40,34s user 2,56s system 64% cpu 1:06,17 total
       (about 1099 messages per	second).

   A note on performance (iv)
       A few years later and it's June 2022. There's a lot more happening
       during indexing, but indexing became multi-threaded and machines are
       faster; e.g. this is with an AMD Ryzen Threadripper 1950X (32) @
       3.399GHz.

       The instructions	are a little different since we	have a proper  repeat-
       able benchmark now. After building,

	$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
	$ THREAD_NUM=4 build/lib/tests/bench-indexer -m perf
       # random	seed: R02Sf5c50e4851ec51adaf301e0e054bd52b
       1..1
       # Start of bench	tests
       # Start of indexer tests
       indexed 5000 messages in 20 maildirs in 3763ms; 752 μs/message; 1328 messages/s (4 thread(s))
       ok 1 /bench/indexer/4-cores
       # End of	indexer	tests
       # End of	bench tests

       Things are again a little faster, even though the index does a lot more
       now (text-normalization, and pre-generating message-sexps). A faster
       machine helps, too!

RETURN VALUE
       mu index returns 0 upon successful completion; any other number signals
       an error.
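
       A script can rely on that contract without knowing the individual
       error codes. The wrapper below is a sketch; run_index is a
       hypothetical name, and --quiet is the option used in the benchmarks
       above.

```shell
# Hypothetical wrapper: run the indexer quietly and report failures.
# Relies only on the documented contract: 0 is success, anything else
# is an error.
run_index() {
    mu index --quiet || {
        status=$?
        echo "mu index failed with exit status $status" >&2
        return "$status"
    }
}
```

       The wrapper passes the original exit status through, so callers such
       as cron jobs or mail-fetch hooks can still act on it.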

BUGS
       Please report bugs if you find any: https://github.com/djcb/mu/issues

AUTHOR
       Dirk-Jan	C. Binnema <djcb@djcbsoftware.nl>

SEE ALSO
       maildir(5), mu(1), mu-init(1), mu-find(1), mu-cfind(1)

User Manuals			   June	2022			   MU-INDEX(1)
