NAME
recoll.conf - main personal configuration file for Recoll
DESCRIPTION
This file defines the indexing configuration for the Recoll full-text search
system.
The system-wide configuration file is normally located inside
/usr/[local]/share/recoll/examples. Any parameter set in the common file may
be overridden by setting it in the personal configuration file, by default:
$HOME/.recoll/recoll.conf
Please note that while we try to keep this manual page reasonably up to date,
it will frequently lag behind the current state of the software. The best
source of information about the configuration is the comments in the
configuration file.
A short extract of the file might look as follows:
# Space-separated list of directories to index.
topdirs = ~/docs /usr/share/doc
[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8
There are three kinds of lines:
- Comment or empty
- Parameter assignment
- Section definition
Empty lines or lines beginning with # are ignored.
Assignment lines are in the form 'name = value'.
Section lines allow redefining a parameter for a directory subtree. Some of the
parameters used for indexing are looked up hierarchically, from the most to
the least specific. Not all parameters can be meaningfully redefined; this is
specified for each parameter in the next section.
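As an illustration of the hierarchical lookup, the following sketch sets a
parameter globally and then redefines it for two nested subtrees. The directory
names are hypothetical and only serve the example:
# Value used wherever no more specific section applies.
defaultcharset = iso-8859-1
# Hypothetical subtree holding UTF-8 text files.
[~/docs/notes]
defaultcharset = utf-8
# Hypothetical, more specific subtree below the previous one.
[~/docs/notes/legacy]
defaultcharset = iso-8859-15
With such a file, a document below ~/docs/notes/legacy would be expected to use
iso-8859-15, other documents below ~/docs/notes would use utf-8, and everything
else would fall back to iso-8859-1.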
The tilde character (~) is expanded in file names to the name of the user's home
directory.
Where values are lists, white space is used for separation, and elements with
embedded spaces can be quoted with double-quotes.
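For instance, a list value mixing plain and quoted elements might be written as
follows. Both directory names are hypothetical and only illustrate the quoting:
# Two directories; the second one contains a space and is therefore quoted.
topdirs = ~/docs "/home/me/My Papers"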
OPTIONS
- topdirs = directories
- Specifies the list of directories to index
(recursively).
- dbdir = directory
- The name of the Xapian database directory. It will be
created if needed when the database is initialized. If this is not an
absolute pathname, it will be taken relative to the configuration
directory.
- skippedNames = patterns
- A space-separated list of patterns for names of files or
directories that should be completely ignored. The list defined in the
default file is:
*~ #* bin CVS Cache caughtspam tmp
The list can be redefined for subdirectories, but is only actually changed
for the top level ones in topdirs.
- skippedPaths = patterns
- A space-separated list of patterns for paths the indexer
should not descend into. Together with topdirs, this allows pruning the
indexed tree to one's liking. daemSkippedPaths can be used to define a
specific value for the real time indexing monitor.
- followLinks = boolean
- Specifies if the indexer should follow symbolic links while
walking the file tree. The default is to ignore symbolic links to avoid
multiple indexing of linked files. No effort is made to avoid duplication
when this option is set to true. This option can be set individually for
each of the topdirs members by using sections. It cannot be
changed below the topdirs level.
- loglevel = value
- Verbosity level for recoll and recollindex. A value of 4
lists quite a lot of debug/information messages. 3 lists only errors.
daemloglevel can be used to specify a different value for the
real-time indexing daemon.
- logfilename = file
- Where the messages should go. 'stderr' can be used as a
special value. daemlogfilename can be used to specify a different
value for the real-time indexing daemon.
- indexstemminglanguages = languages
- A list of languages for which the stem expansion databases
will be built. See recollindex(1) for possible values.
- defaultcharset = charset
- The name of the character set used for files that do not
contain a character set definition (e.g. plain text files). This can be
redefined for any subdirectory.
- maxfsoccuppc = percentnumber
- Maximum file system occupation before we stop indexing. The
value is a percentage, corresponding to what the "Capacity" df
output column shows. The default value is 0, meaning no checking.
- idxflushmb = megabytes
- Threshold (in megabytes of new text data) at which the
in-memory index data is flushed to the disk index. Setting this can help
control memory usage. A value of 0 means no explicit flushing, letting
Xapian use its own default, which is flushing every 10000 documents (or
XAPIAN_FLUSH_THRESHOLD), meaning that memory usage depends on average
document size. The default value is 10.
- filtersdir = directory
- A directory to search for the external filter scripts used
to index some types of files. The value should not be changed, except if
you want to modify one of the default scripts. The value can be redefined
for any subdirectory.
- iconsdir = directory
- The name of the directory where recoll result list
icons are stored. You can change this if you want different images.
- guesscharset = boolean
- Try to guess the character set of files if no internal
value is available (e.g. for plain text files). This does not work well in
general, and should probably not be used.
- usesystemfilecommand = boolean
- Decide if we use the file -i system command as a
final step for determining the mime type for a file (the main procedure
uses suffix associations as defined in the mimemap file). This can
be useful for files with suffixless names, but it will also cause the
indexing of many bogus "text" files.
- indexedmimetypes = list
- Recoll normally indexes any file which it knows how to
read. This list lets you restrict the indexed mime types to what you
specify. If the variable is unspecified or the list empty (the default),
all supported types are processed.
- compressedfilemaxkbs = value
- Size limit for compressed (.gz or .bz2) files. These need
to be decompressed in a temporary directory for identification, which can
be very wasteful if 'uninteresting' big compressed files are present.
Negative means no limit, 0 means no processing of any compressed file.
Defaults to -1.
- indexallfilenames = boolean
- Recoll indexes file names into a special section of the
database to allow file name searches using wild cards. This
parameter decides if file name indexing is performed only for files with
mime types that would qualify them for full text indexing, or for all
files inside the selected subtrees, independent of mime type.
- idxabsmlen = value
- Recoll stores an abstract for each indexed file inside the
database. The text can come from an actual 'abstract' section in the
document or will just be the beginning of the document. It is stored in
the index so that it can be displayed inside the result lists without
decoding the original file. The idxabsmlen parameter defines the
size of the stored abstract. The default value is 250 bytes. The search
interface gives you the choice to display this stored text or a synthetic
abstract built by extracting text around the search terms. If you always
prefer the synthetic abstract, you can reduce this value and save a little
space.
- aspellLanguage = lang
- Language definitions to use when creating the aspell
dictionary. The value must match a set of aspell language definition
files. You can type "aspell config" to see where these are
installed (look for data-dir). The default if the variable is not set is
to use your desktop national language environment to guess the value.
- noaspell = boolean
- If this is set, the aspell dictionary generation is turned
off. Useful for cases where you don't need the functionality or when it is
unusable because aspell crashes during dictionary generation.
- nocjk = boolean
- If this is set to true, specific East Asian (Chinese, Japanese,
Korean) character/word splitting is turned off. This will save a small
amount of CPU time if you have no CJK documents. If your document base does
include such text but you are not interested in searching it, setting
nocjk may be a significant time and space saver.
- cjkngramlen = value
- This lets you adjust the size of n-grams used for indexing
CJK text. The default value of 2 is probably appropriate in most cases. A
value of 3 would allow more precision and efficiency on longer words, but
the index will be approximately twice as large.
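Taken together, several of the parameters above could be combined as in the
following sketch. It is not a recommended setup: the paths and numeric values
are hypothetical, and writing boolean values as 1 is an assumption about the
accepted syntax that should be checked against the comments in the sample
configuration file.
# Index two trees, but do not descend into one subdirectory (hypothetical paths).
topdirs = ~/docs ~/projects
skippedPaths = ~/projects/archive
# Log errors only, and send them to stderr.
loglevel = 3
logfilename = stderr
# Stop indexing once the file system is 90% full.
maxfsoccuppc = 90
# Flush to the disk index after every 50 megabytes of new text data.
idxflushmb = 50
# Restrict indexing to these mime types.
indexedmimetypes = text/plain text/html application/pdf
# Size limit for compressed files (arbitrary illustrative value).
compressedfilemaxkbs = 20000
# Turn off CJK-specific splitting (boolean written as 1 by assumption).
nocjk = 1
# Follow symbolic links below one topdirs member only.
[~/projects]
followLinks = 1
The followLinks setting is placed in a section for a topdirs member because,
as described above, it cannot be changed below the topdirs level.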
SEE ALSO
recollindex(1), recoll(1)