INDEXER(1)			 Sphinxsearch			    INDEXER(1)

       indexer - Sphinxsearch fulltext index generator

       indexer [--config CONFIGFILE] [--rotate] [--noprogress | --quiet]
	       [--all | INDEX | ...]

       indexer --buildstops OUTPUTFILE COUNT [--config CONFIGFILE]
	       [--noprogress | --quiet] [--all | INDEX | ...]

       indexer --merge MAIN_INDEX DELTA_INDEX [--config CONFIGFILE] [--rotate]
	       [--noprogress | --quiet]

       Sphinx is a collection of programs that aim to provide high quality
       fulltext search.

       indexer is the first of the two principal tools in Sphinx; the other
       is searchd. Invoked either directly from the command line or as part
       of a larger script, indexer is solely responsible for gathering the
       data that will be searchable.

       The calling syntax for indexer is as follows:

	   $ indexer [OPTIONS] [indexname1 [indexname2 [...]]]

       Essentially, you list the indexes (which you will later make
       available for searching) in sphinx.conf, and when calling indexer you
       must at minimum tell it which index (or indexes) to build.

       If sphinx.conf contained details on two indexes, mybigindex and
       mysmallindex, you could do the following:

	   $ indexer mybigindex
	   $ indexer mysmallindex mybigindex

       As part of the configuration file, sphinx.conf, you specify one or
       more indexes for your data. You might call indexer to reindex one of
       them ad hoc, or tell it to process all of them. You are not limited to
       one index or to all of them at once: you can pick any combination of
       the available indexes.

       Most of the settings for indexer are given in the configuration file;
       however, some options need to be specified on the command line as
       well, as they affect how the indexing operation is performed. These
       options are:

       --all
	   Tells indexer to update every index listed in sphinx.conf, instead
	   of listing individual indexes. This would be useful in small
	   configurations, or in cron-type or maintenance jobs where the entire
	   index set will get rebuilt each day, or week, or whatever period is
	   best.

	   Example usage:

	       $ indexer --config /home/myuser/sphinx.conf --all
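
	   A cron wrapper usually just assembles one of these command lines.
	   The Python sketch below builds the argument list for indexer from
	   options documented on this page; the helper names indexer_argv and
	   run_indexer are hypothetical, and run_indexer assumes the indexer
	   binary is on PATH.

```python
import subprocess
from typing import Optional, Sequence


def indexer_argv(config: str,
                 indexes: Optional[Sequence[str]] = None,
                 rotate: bool = False,
                 quiet: bool = False) -> list[str]:
    """Build an indexer command line (hypothetical helper).

    When no index names are given, fall back to --all so that every
    index listed in the configuration file is rebuilt.
    """
    argv = ["indexer", "--config", config]
    if rotate:
        argv.append("--rotate")  # build .new files, then signal searchd
    if quiet:
        argv.append("--quiet")   # print nothing unless there is an error
    if indexes:
        argv.extend(indexes)     # any combination of configured indexes
    else:
        argv.append("--all")
    return argv


def run_indexer(argv: list[str]) -> None:
    """Run indexer; assumes the binary is on PATH.

    check=True raises subprocess.CalledProcessError on a non-zero exit,
    so the calling script sees the failure.
    """
    subprocess.run(argv, check=True)
```

	   For instance, indexer_argv("/home/myuser/sphinx.conf", rotate=True,
	   quiet=True) yields the arguments of "indexer --config
	   /home/myuser/sphinx.conf --rotate --quiet --all". Keeping the
	   argument list separate from the subprocess call makes it easy to
	   log or check without running indexer.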

       --buildstops outputfile.txt NUM
	   Reviews the index source, as if it were indexing the data, and
	   produces a list of the terms that are being indexed. In other
	   words, it produces a list of all the searchable terms that are
	   becoming part of the index. Note: it does not update the index in
	   question; it simply processes the data 'as if' it were indexing,
	   including running queries defined with sql_query_pre or
	   sql_query_post. outputfile.txt will contain the list of words, one
	   per line, sorted by frequency with the most frequent first, and NUM
	   specifies the maximum number of words that will be listed; if NUM
	   exceeds the number of distinct words in the index, every word is
	   listed. Such a dictionary list could be used for client application
	   features around "Did you mean..." functionality, usually in
	   conjunction with --buildfreqs, below.


	       $ indexer myindex --buildstops word_freq.txt 1000

	   This would produce a document in the current directory,
	   word_freq.txt, with the 1,000 most common words in 'myindex',
	   ordered by most common first. Note that when used with multiple
	   indexes or --all, the file will pertain to the last index indexed
	   (i.e. the last one listed in the configuration file).
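
	   As a rough illustration of the "Did you mean..." use mentioned
	   above, the Python sketch below loads a --buildstops output file and
	   suggests the closest listed word for a misspelled query term,
	   preferring more frequent words on ties. The helpers load_wordlist
	   and did_you_mean are hypothetical; difflib's SequenceMatcher ratio
	   is simply a convenient stand-in similarity measure, and a UTF-8
	   file encoding is assumed.

```python
from difflib import SequenceMatcher
from typing import Optional


def load_wordlist(path: str) -> list[str]:
    """Read a --buildstops file: one word per line, most frequent first."""
    with open(path, encoding="utf-8") as fh:
        # Keep only the first field, which also tolerates the count
        # column that --buildfreqs appends after each word.
        return [line.split()[0] for line in fh if line.strip()]


def did_you_mean(query: str, words: list[str],
                 cutoff: float = 0.6) -> Optional[str]:
    """Suggest the listed word most similar to query, or None.

    Ties in similarity go to the word that appears earlier in the list,
    i.e. the more frequent one.
    """
    best: Optional[str] = None
    best_key: Optional[tuple[float, int]] = None
    for rank, word in enumerate(words):
        score = SequenceMatcher(None, query, word).ratio()
        if score < cutoff:
            continue
        key = (-score, rank)  # higher similarity first, then lower rank
        if best_key is None or key < best_key:
            best, best_key = word, key
    return best
```

	   A client would typically load the list once, e.g. with
	   load_wordlist("word_freq.txt"), and call did_you_mean only when a
	   query returns no matches.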

       --buildfreqs
	   Used in pair with --buildstops (and is ignored if --buildstops is
	   not specified). As --buildstops provides the list of words used
	   within the index, --buildfreqs adds the quantity present in the
	   index, which would be useful in establishing whether certain words
	   should be considered stopwords if they are too prevalent. It will
	   also help with developing "Did you mean..." features where you can
	   see how much more common a given word is compared to another,
	   similar one.


	       $ indexer myindex --buildstops word_freq.txt 1000 --buildfreqs

	   This would produce word_freq.txt as above; however, each word
	   would be followed by the number of times it occurred in the index
	   in question.
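
	   The --buildfreqs output can be screened for stopword candidates.
	   The Python sketch below assumes each line holds a word followed by
	   its count, separated by whitespace, and flags words whose share of
	   the listed occurrences reaches a chosen threshold; parse_freqs,
	   stopword_candidates and the threshold value are hypothetical.

```python
from typing import Iterable


def parse_freqs(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Parse --buildstops/--buildfreqs output into (word, count) pairs.

    Assumes each line is a word followed by its count, separated by
    whitespace; lines without a count are skipped.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], int(fields[1])))
    return pairs


def stopword_candidates(pairs: list[tuple[str, int]],
                        min_share: float) -> list[str]:
    """Words whose share of all listed occurrences is at least min_share.

    The total covers only the NUM words that were written out, so the
    share is relative to that list rather than to the whole index.
    """
    total = sum(count for _, count in pairs)
    if total == 0:
        return []
    return [word for word, count in pairs if count / total >= min_share]
```

	   The flagged words could then be written, one per line, to a file
	   referenced by the stopwords directive of the index in sphinx.conf
	   before reindexing.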

       --config CONFIGFILE, -c CONFIGFILE
	   Use the given file as configuration. Normally, indexer will look
	   for sphinx.conf in the installation directory
	   (e.g. /usr/local/sphinx/etc/sphinx.conf if installed into
	   /usr/local/sphinx), followed by the current directory you are in
	   when calling indexer from the shell. This is most useful in shared
	   environments where the binary files are installed somewhere like
	   /usr/local/sphinx/ but you want to provide users with the ability
	   to make their own custom Sphinx set-ups, or if you want to run
	   multiple instances on a single server. In cases like those, you
	   could allow them to create their own sphinx.conf files and pass
	   them to indexer with this option.

	   For example:

	       $ indexer --config /home/myuser/sphinx.conf myindex
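
	   The lookup order described above can be sketched in Python as
	   follows. find_config is a hypothetical helper that mirrors only
	   this description (an explicit --config first, then the
	   installation directory, then the current directory); it is not
	   indexer's own code.

```python
import os
from typing import Optional


def find_config(explicit: Optional[str], install_etc: str,
                cwd: str) -> Optional[str]:
    """Choose a configuration file following the order described above.

    install_etc stands for the installation's etc directory, e.g.
    /usr/local/sphinx/etc for an install into /usr/local/sphinx.
    """
    if explicit is not None:
        return explicit  # --config / -c bypasses the search entirely
    for directory in (install_etc, cwd):
        candidate = os.path.join(directory, "sphinx.conf")
        if os.path.isfile(candidate):
            return candidate  # first match wins
    return None  # no configuration file found
```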

       --dump-rows FILE
	   Dumps rows fetched by SQL source(s) into the specified file, in a
	   MySQL compatible syntax. The resulting dumps are an exact
	   representation of the data as received by indexer and help to
	   reproduce indexing-time issues.

       --merge DST-INDEX SRC-INDEX
	   Physically merge together two indexes. For example, if you have a
	   main+delta scheme, where the main index rarely changes but the
	   delta index is rebuilt frequently, --merge would be used to combine
	   the two. The operation moves from right to left: the contents of
	   SRC-INDEX get examined and physically combined with the contents of
	   DST-INDEX, and the result is left in DST-INDEX. In pseudo-code, it
	   might be expressed as: DST-INDEX += SRC-INDEX

	   An example:

	       $ indexer --merge main delta --rotate

	   In the above example, where main is the master, rarely modified
	   index, and delta is the more frequently modified one, you might use
	   the above to call indexer to combine the contents of the delta into
	   the main index and rotate the indexes.

       --merge-dst-range ATTR MIN MAX
	   Apply the given range filter while merging. Specifically, as the
	   merge is applied to the destination index (as part of --merge; this
	   option is ignored if --merge is not specified), indexer will also
	   filter the documents ending up in the destination index, and only
	   documents that pass the given filter will end up in the final
	   index. This could be used, for example, in an index where there is
	   a 'deleted' attribute, where 0 means 'not deleted'. Such an index
	   could be merged with:

	       $ indexer --merge main delta --merge-dst-range deleted 0 0

	   Any documents marked as deleted (value 1) would be removed from the
	   newly-merged destination index. The option can be given several
	   times on the command line to add successive filters to the merge,
	   all of which must be met for a document to become part of the
	   final index.
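
	   The Python sketch below shows how repeated --merge-dst-range
	   filters combine: merge_argv builds the command line with one switch
	   per filter, and keeps decides whether a destination document
	   survives. Both helpers are hypothetical illustrations of the rules
	   stated above; bounds are treated as inclusive (since "deleted 0 0"
	   keeps only the value 0) and every filter must pass.

```python
from typing import Mapping, Sequence

# (ATTR, MIN, MAX), as passed to --merge-dst-range
Range = tuple[str, int, int]


def merge_argv(dst: str, src: str, ranges: Sequence[Range] = (),
               rotate: bool = False) -> list[str]:
    """Build an indexer --merge command line (hypothetical helper)."""
    argv = ["indexer", "--merge", dst, src]
    for attr, lo, hi in ranges:
        # one --merge-dst-range switch per filter
        argv += ["--merge-dst-range", attr, str(lo), str(hi)]
    if rotate:
        argv.append("--rotate")
    return argv


def keeps(doc_attrs: Mapping[str, int], ranges: Sequence[Range]) -> bool:
    """True if a destination document passes every filter (inclusive bounds)."""
    return all(lo <= doc_attrs[attr] <= hi for attr, lo, hi in ranges)
```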

       --merge-killlists, --merge-klists
	   Used in pair with --merge. Usually, when merging, indexer uses the
	   kill-list of the source index (i.e., the one being merged into the
	   destination) as a filter to wipe out the matching docs from the
	   destination index, while the kill-list of the destination itself
	   isn't touched at all. When using --merge-killlists (or its shorter
	   form, --merge-klists), indexer will not filter the dst-index docs
	   with the src-index kill-list; instead, it merges their kill-lists
	   together, so the final result index will have a kill-list that
	   combines the kill-lists of both indexes.
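
	   To make the two modes concrete, here is a toy Python model that
	   treats each index as a set of document IDs plus a kill-list.
	   merge_docs is hypothetical, ignores --merge-dst-range and attribute
	   data, and encodes only the behavior described in this paragraph;
	   it is not how indexer is implemented.

```python
def merge_docs(dst_ids: set[int], dst_kl: set[int],
               src_ids: set[int], src_kl: set[int],
               merge_killlists: bool = False) -> tuple[set[int], set[int]]:
    """Toy model of indexer --merge DST SRC on bare document IDs.

    Returns (IDs in the merged index, kill-list of the merged index).
    """
    if merge_killlists:
        # --merge-killlists: dst docs are not filtered by src's kill-list;
        # instead the two kill-lists are combined.
        return dst_ids | src_ids, dst_kl | src_kl
    # Default: src's kill-list wipes matching docs out of dst, and dst's
    # own kill-list is carried over untouched.
    return (dst_ids - src_kl) | src_ids, set(dst_kl)
```

	   With dst = {1, 2, 3}, dst kill-list {10}, src = {3, 4} and src
	   kill-list {2}, the default mode yields documents {1, 3, 4} with
	   kill-list {10}, while --merge-killlists yields documents
	   {1, 2, 3, 4} with kill-list {2, 10}.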

       --noprogress
	   Don't display progress details as they occur; instead, the final
	   status details (such as documents indexed, speed of indexing and so
	   on) are only reported at completion of indexing. In instances where
	   the script is not being run on a console (or 'tty'), this will be
	   on by default.

	   Example usage:

	       $ indexer --rotate --all --noprogress

       --print-queries
	   Prints out SQL queries that indexer sends to the database, along
	   with SQL connection and disconnection events. This is useful to
	   diagnose and fix problems with SQL sources.

       --quiet
	   Tells indexer not to output anything, unless there is an error.
	   Again, this is mostly used for cron-type or other script jobs where
	   the output is irrelevant or unnecessary, except in the event of
	   some kind of error.

	   Example usage:

	       $ indexer --rotate --all --quiet

       --rotate
	   Used for rotating indexes. Unless you can take the search function
	   offline without troubling users, you will almost certainly need to
	   keep search running whilst indexing new documents. --rotate creates
	   a second index, parallel to the first (in the same place, simply
	   including .new in the filenames). Once complete, indexer notifies
	   searchd by sending the SIGHUP signal, and searchd will attempt to
	   rename the indexes (renaming the existing ones to include .old and
	   renaming the .new ones to replace them), and then start serving
	   from the newer files. Depending on the setting of seamless_rotate,
	   there may be a slight delay in being able to search the newer
	   indexes.

	   Example usage:

	       $ indexer --rotate --all
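
	   The rename sequence described above can be sketched as a plan of
	   (from, to) pairs. rotation_renames is a hypothetical helper; it
	   assumes the .new and .old markers sit just before each file
	   extension (e.g. main.new.spa), and any extension list passed in is
	   only an example subset of an index's files.

```python
from typing import Sequence


def rotation_renames(path: str, exts: Sequence[str]) -> list[tuple[str, str]]:
    """Ordered (from, to) renames for rotating one index, per the text above.

    path is the index path from sphinx.conf and exts the index file
    extensions; both are illustrative assumptions.
    """
    steps = []
    # First move the files currently being served out of the way ...
    for ext in exts:
        steps.append((f"{path}.{ext}", f"{path}.old.{ext}"))
    # ... then let the freshly built .new files take their place.
    for ext in exts:
        steps.append((f"{path}.new.{ext}", f"{path}.{ext}"))
    return steps
```

	   The function only computes a plan and touches no files; in
	   practice searchd carries out the renames itself after receiving
	   SIGHUP.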

       --sighup-each
	   Useful when you are rebuilding many big indexes and want each one
	   rotated into searchd as soon as possible. With --sighup-each,
	   indexer will send a SIGHUP signal to searchd after successfully
	   completing the work on each index. (The default behavior is to send
	   a single SIGHUP after all the indexes have been built.)

       --verbose
	   Guarantees that every row that caused problems indexing (duplicate,
	   zero, or missing document ID; or file field IO issues; etc) will be
	   reported. By default, this option is off, and problem summaries may
	   be reported instead.

       Sphinx was written by Andrey Aksenoff. This manual page was written
       by Alexey Vinogradov, using the one written by Christian Hofstaedtler
       for the Debian system (but may be used by others). Permission is
       granted to copy, distribute and/or modify this document under the
       terms of the GNU General Public License, Version 2 or any later
       version published by the Free Software Foundation.

       On Debian systems, the complete text of the GNU General Public License
       can be found in /usr/share/common-licenses/GPL.

       searchd(1), search(1), indextool(1), spelldump(1)

       Sphinx and its programs are documented fully by the Sphinx reference
       manual available in /usr/share/doc/sphinxsearch.

2.2.11-release			  07/19/2016			    INDEXER(1)

