.TH "MU INDEX" "1" 

.SH "NAME"
.PP
\fBmu-index\fP \- index e-mail messages stored in Maildirs

.SH "SYNOPSIS"
.PP
\fBmu [common-options] index [options]\fP

.SH "DESCRIPTION"
.PP
\fBmu index\fP is the \fBmu\fP command for scanning the contents of Maildir directories and
storing the results in a Xapian database. The data can then be queried using
\fBmu-find(1)\fP.

.PP
Before the first time you run \fBmu index\fP, you must run \fBmu init\fP to initialize the
database.
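
.PP
A minimal first-time setup might look as follows. The location \fI~/Maildir\fP is
only an assumption; substitute wherever your mail lives, and see \fBmu-init(1)\fP
for the other options \fBmu init\fP accepts.

.RS
.nf
# create the database, telling mu where the Maildir lives (path is an example)
mu init --maildir="$HOME/Maildir"
# first, full indexing run
mu index

.fi
.RE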

.PP
\fBmu index\fP understands Maildirs as defined by Daniel Bernstein for \fBqmail(7)\fP.
In addition, it understands recursive Maildirs (Maildirs within Maildirs) and
Maildir++. It also supports VFAT-based Maildirs, which use '!' or ';' instead of
the ':' separator.

.PP
E-mail messages which are not stored in something resembling a maildir
leaf-directory (\fIcur\fP and \fInew\fP) are ignored, as are the cache directories for
\fInotmuch\fP and \fIgnus\fP, and any dot-directory.
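
.PP
As an illustration, with hypothetical paths, the commands below create a small
recursive Maildir: a top-level Maildir with another Maildir nested inside it.
Messages in the \fIcur\fP or \fInew\fP directories of either one would be indexed,
while a message file dropped directly into \fI~/Maildir/archive\fP itself, outside
its \fIcur\fP and \fInew\fP, would be ignored.

.RS
.nf
# the top-level Maildir, with the standard cur, new and tmp subdirectories
mkdir -p ~/Maildir/cur ~/Maildir/new ~/Maildir/tmp
# a Maildir nested inside it
mkdir -p ~/Maildir/archive/cur ~/Maildir/archive/new ~/Maildir/archive/tmp

.fi
.RE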

.PP
Starting with mu 1.5.x, symlinks are followed, and can be spread over multiple
filesystems; however note that moving files around is much faster when multiple
filesystems are not involved.

.PP
If there is a file called \fI.noindex\fP in a directory, the contents of that
directory and all of its subdirectories will be ignored. This can be useful to
exclude certain directories from the indexing process, for example directories
holding spam messages.
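
.PP
For example, assuming a hypothetical spam folder at \fI~/Maildir/spam\fP:

.RS
.nf
# mu index skips this directory and everything below it
touch ~/Maildir/spam/.noindex

.fi
.RE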

.PP
If there is a file called \fI.noupdate\fP in a directory, the contents of that
directory and all of its subdirectories will be ignored, unless you do a full
rebuild (with \fBmu init\fP). This can speed things up if you have some maildirs
that never change. Note that you can still search for these messages; this only
affects updating the database. \fI.noupdate\fP is ignored when you start indexing
with an empty database (such as directly after \fBmu init\fP).
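
.PP
A sketch with a hypothetical archive folder that never changes; the \fBfind(1)\fP
call simply lists the marker files of both kinds, which can help when you no
longer remember where you placed them:

.RS
.nf
# mark an archive that never changes
touch ~/Maildir/archive/.noupdate
# list all .noindex and .noupdate marker files below ~/Maildir
find ~/Maildir -name .noindex -o -name .noupdate

.fi
.RE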

.PP
There is also the \fB--lazy-check\fP option, which can greatly speed up indexing;
see below for details.

.PP
The first run of \fBmu index\fP may take a few minutes if you have a lot of mail (tens
of thousands of messages). Fortunately, such a full scan needs to be done only
once; after that it suffices to index the changes, which goes much faster. See
the \fBPERFORMANCE\fP section below for more information.

.PP
The second phase of the indexing process, which runs by default, removes
messages from the database for which there is no longer a corresponding file in
the Maildir. If you do not want this, use \fB-n\fP, \fB--nocleanup\fP.

.PP
When \fBmu index\fP catches one of the signals \fBSIGINT\fP, \fBSIGHUP\fP or \fBSIGTERM\fP (e.g., when
you press Ctrl-C during the indexing process), it attempts to shut down
gracefully: it tries to save and commit data, close the database, and so on. If it
receives another signal (e.g., when pressing Ctrl-C once more), \fBmu index\fP will
terminate immediately.
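
.PP
This allows putting a time limit on indexing while still giving \fBmu\fP the
chance to commit what it has indexed so far. A sketch, assuming GNU coreutils
\fBtimeout(1)\fP is available: after ten minutes it sends \fBSIGINT\fP, so
\fBmu index\fP shuts down gracefully as described above. When the limit is hit,
\fBtimeout\fP itself exits with status 124.

.RS
.nf
# send SIGINT after 600 seconds; mu index then shuts down gracefully
timeout --signal=INT 600 mu index --quiet

.fi
.RE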

.SH "INDEX OPTIONS"
.SS "--lazy-check"
.PP
in lazy-check mode, \fBmu\fP does not consider messages for which the time-stamp
(ctime) of the directory they reside in has not changed since the previous
indexing run. This is much faster than the non-lazy check, but won't update
messages that have changed (rather than having been added or removed), since
merely editing a message does not update the directory time-stamp. Of course,
you can run \fBmu index\fP occasionally without \fB--lazy-check\fP, to pick up such
messages.
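
.PP
One possible routine (a suggestion, not a requirement) is to use the fast lazy
check whenever new mail arrives, for instance from a mail-fetching hook, and to
run a full check now and then:

.RS
.nf
# frequent: only consider directories whose ctime changed
mu index --lazy-check
# occasional: full check, which also picks up edited messages
mu index

.fi
.RE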

.SS "--nocleanup"
.PP
disable the database cleanup that \fBmu\fP does by default after indexing.

.SS "--muhome"
.PP
use a non-default directory to store and read the database, write the logs, etc.
By default, \fBmu\fP uses the XDG Base Directory Specification (e.g. on GNU/Linux this
defaults to \fI~/.cache/mu\fP and \fI~/.config/mu\fP). Earlier versions of \fBmu\fP defaulted to
\fI~/.mu\fP, which now requires \fI\-\-muhome=~/.mu\fP.

.PP
The environment variable \fBMUHOME\fP can be used as an alternative to \fB--muhome\fP. The
latter has precedence.
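
.PP
For example, to keep using a database in the pre-XDG location \fI~/.mu\fP, either
of the following should work; if both are given, \fB--muhome\fP takes precedence:

.RS
.nf
# via the command-line option
mu index --muhome="$HOME/.mu"
# via the environment variable
MUHOME="$HOME/.mu" mu index

.fi
.RE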

.SH "COMMON OPTIONS"
.SS "-d, --debug"
.PP
makes \fBmu\fP generate extra debug information, useful for debugging the program
itself. By default, debug information goes to the log file, \fI~/.cache/mu/mu.log\fP.
It can safely be deleted when \fBmu\fP is not running. With the \fB--debug\fP
option, the log file can grow rather quickly.

.SS "-q, --quiet"
.PP
causes \fBmu\fP not to output informational messages and progress information to
standard output, but only to the log file. Error messages are still sent to
standard error. Note that \fBmu index\fP is much faster with \fB--quiet\fP, so this
option is recommended when running \fBmu\fP from scripts.

.SS "--log-stderr"
.PP
causes \fBmu\fP to output log messages to standard error, in addition to sending
them to the log file.

.SS "--nocolor"
.PP
do not use ANSI colors. The environment variable \fBNO_COLOR\fP can be used as an
alternative to \fB--nocolor\fP.

.SS "-V, --version"
.PP
prints mu version and copyright information.

.SS "-h, --help"
.PP
lists the various command line options.

.SH "PERFORMANCE"
.SS "indexing in ancient times (2009?)"
.PP
As a non-scientific benchmark, a simple test on the author's machine (a Thinkpad
X61s laptop using Linux 2.6.35 and an ext3 file system) with no existing
database, and a maildir with 27273 messages:

.RS
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total

.fi
.RE
.PP
(about 103 messages per second)
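
.PP
The rate is simply the number of messages divided by the elapsed time; for the
run above, the wall-clock total of 4:24,20 is 264.20 seconds. A quick check
with \fBawk(1)\fP, which prints 103.2:

.RS
.nf
# 27273 messages / 264.20 s, printed with one decimal via OFMT
awk 'BEGIN { OFMT = "%.1f"; print 27273 / 264.20 }'

.fi
.RE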

.PP
A second run, which is the more typical use case when there is a database
already, goes much faster:

.RS
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total

.fi
.RE
.PP
(about 2300 messages per second by wall-clock time, or more than 56818 messages
per second of user CPU time)

.PP
Note that each test flushes the caches first. A more common use case is to run
\fBmu index\fP when new mail has arrived, in which case the cache may stay
quite 'warm':

.RS
.nf
$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total

.fi
.RE
.PP
which is more than 30000 messages per second.

.SS "indexing in 2012"
.PP
As of June 2012, we did the same non-scientific benchmark, this time with an
Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir with 22589
messages. We start without an existing database.

.RS
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
27,79s user 2,17s system 48% cpu 1:01,47 total

.fi
.RE
.PP
(about 367 messages per second by wall-clock time, or about 813 messages per
second of user CPU time)

.PP
A second run, which is the more typical use case when there is a database
already, goes much faster:

.RS
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,13s user 0,30s system 19% cpu 2,162 total

.fi
.RE
.PP
(about 10400 messages per second by wall-clock time, or more than 173000
messages per second of user CPU time)

.SS "indexing in 2016"
.PP
As of July 2016, we did the same non-scientific benchmark, again with the Intel
i5-2500 CPU @ 3.30GHz and an ext4 file system. This time, the maildir contains
72525 messages.

.RS
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
40,34s user 2,56s system 64% cpu 1:06,17 total

.fi
.RE
.PP
(about 1099 messages per second).

.SS "indexing in 2022"
.PP
A few years later, it is June 2022. Indexing now does a lot more work, but it has
also become multi-threaded, and machines are faster; e.g. this benchmark was run
on an AMD Ryzen Threadripper 1950X (16 cores) @ 3.399GHz.

.PP
The instructions are a little different, since there is now a proper repeatable
benchmark. After building \fBmu\fP from source, run:

.RS
.nf
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ THREAD_NUM=4 build/lib/tests/bench-indexer -m perf
# random seed: R02Sf5c50e4851ec51adaf301e0e054bd52b
1..1
# Start of bench tests
# Start of indexer tests
indexed 5000 messages in 20 maildirs in 3763ms; 752 μs/message; 1328 messages/s (4 thread(s))
ok 1 /bench/indexer/4-cores
# End of indexer tests
# End of bench tests

.fi
.RE

.PP
Things are again a little faster, even though the indexer does a lot more now
(text normalization and pre-generating message sexps). A faster machine helps,
too!

.SH "EXIT CODE"
.PP
This command returns 0 upon successful completion, or a non-zero exit code
otherwise. Typical non-zero values are listed below.

.SS "no matches found (2)"
.PP
No matching messages were found; try a different query.

.SS "database schema mismatch (11)"
.PP
You need to re-initialize \fBmu\fP; see \fBmu-init(1)\fP.

.SS "failed to acquire lock (19)"
.PP
Some other program has exclusive access to the mu (Xapian) database.
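
.PP
A sketch of how a script might react to these exit codes; the messages are only
examples, and the numbers are the ones given in the subsection headings above:

.RS
.nf
#!/bin/sh
# run the indexer quietly and remember its exit code
mu index --quiet
rc=$?
case $rc in
    0)  ;;  # success, nothing to report
    11) echo "mu: database schema mismatch; see mu-init(1)" >&2 ;;
    19) echo "mu: database locked by another program" >&2 ;;
    *)  echo "mu index failed (exit code $rc)" >&2 ;;
esac
exit "$rc"

.fi
.RE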

.SH "REPORTING BUGS"
.PP
Please report bugs at \fIhttps://github.com/djcb/mu/issues\fP.

.SH "AUTHOR"
.PP
Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>

.SH "COPYRIGHT"
.PP
This manpage is part of \fBmu\fP 1.10.8.

.PP
Copyright © 2022-2023 Dirk-Jan C. Binnema. License GPLv3+: GNU GPL version 3
or later \fIhttps://gnu.org/licenses/gpl.html\fP. This is free software: you are
free to change and redistribute it. There is NO WARRANTY, to the extent
permitted by law.

.SH "SEE ALSO"
.PP
\fBmaildir(5)\fP, \fBmu(1)\fP, \fBmu-init(1)\fP, \fBmu-find(1)\fP, \fBmu-cfind(1)\fP