OMINDEX(1) User Commands OMINDEX(1)

NAME

omindex - Index static website data via the filesystem

SYNOPSIS

omindex [OPTIONS] --db DATABASE [BASEDIR] DIRECTORY

DESCRIPTION

omindex indexes static website data by reading the files directly from the filesystem.

DIRECTORY is the directory to start indexing from.

BASEDIR is the directory that the base URL corresponds to (default: DIRECTORY).
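
As a rough illustration of how BASEDIR and the base URL relate, the sketch below maps a file path under BASEDIR to the URL it would correspond to. The helper name path_to_url and the paths are hypothetical, and the sketch does no URL-encoding, so it should not be read as omindex's exact behaviour.

```shell
# Hypothetical helper (not part of omindex): map a file under BASEDIR to
# a URL under the base URL. Illustration only; no URL-encoding is done.
path_to_url() {
    basedir=${1%/}   # strip one trailing slash, if present
    baseurl=${2%/}   # "/" becomes "", so the result starts with a single "/"
    file=$3
    case $file in
        "$basedir"/*)
            # Keep the part of the path below BASEDIR and prefix the base URL.
            printf '%s/%s\n' "$baseurl" "${file#"$basedir"/}"
            ;;
        *)
            echo "path_to_url: $file is not under $basedir" >&2
            return 1
            ;;
    esac
}

# Assumed layout: BASEDIR=/var/www/html, base URL "/"
path_to_url /var/www/html / /var/www/html/docs/intro.html
```

With these assumed values, indexing DIRECTORY /var/www/html/docs with BASEDIR /var/www/html would associate files with URLs under /docs/ rather than under /.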

OPTIONS

-d, --duplicates=ARG
    set duplicate handling: ARG can be 'ignore' or 'replace' (default: replace)

-p, --no-delete
    skip the deletion of documents corresponding to deleted files (--preserve-nonduplicates is a deprecated alias for --no-delete)

-e, --empty-docs=ARG
    how to handle documents we extract no text from: ARG can be 'index', 'warn' (issue a diagnostic and index), or 'skip' (default: warn)

-D, --db=DATABASE
    path to database to use

-U, --url=URL
    base URL that BASEDIR corresponds to (default: /)

-M, --mime-type=EXT:TYPE
    assume any file with extension EXT has MIME Content-Type TYPE, instead of using libmagic (empty TYPE removes any existing mapping for EXT; other special TYPE values: 'ignore' and 'skip')

--mime-type-glob=GLOB:TYPE
    assume any file with leaf name matching shell wildcard pattern GLOB has MIME Content-Type TYPE (special TYPE values: 'ignore' and 'skip')

-F, --filter=M[,[T][,C]]:CMD
    process files with MIME Content-Type M using command CMD, which produces output (on stdout or in a temporary file) with format T (Content-Type or file extension; currently txt (default), html or svg) in character encoding C (default: UTF-8). E.g. -Fapplication/octet-stream:'strings -n8' or -Ftext/x-foo,,utf-16:'foo2utf16 %f %t'

--read-filters=FILE
    bulk-load --filter arguments from FILE, which should contain one such argument per line (e.g. text/x-bar:bar2txt --utf8). Lines starting with # are treated as comments and ignored.

-l, --depth-limit=LIMIT
    set recursion limit (0 = unlimited)

-f, --follow
    follow symbolic links

-i, --ignore-exclusions
    ignore meta robots tags and similar exclusions

-S, --spelling
    index data for spelling correction

-m, --max-size=SIZE
    maximum size of file to index (in bytes or with a suffix of 'K'/'k', 'M'/'m', 'G'/'g') (default: unlimited)

--sample=SOURCE
    what to use for the stored sample of text for HTML documents: SOURCE can be 'body' or 'description' (default: 'body')

--sample-size=SIZE
    maximum size for the document text sample (supports the same formats as --max-size) (default: 512)

--title-size=SIZE
    maximum size for the document title (supports the same formats as --max-size) (default: 128)

-R, --retry-failed
    retry files which omindex failed to extract text from on a previous run

--opendir-sleep=SECS
    sleep for SECS seconds before opening each directory; sleeping for 2 seconds seems to reliably work around problems with indexing files on Microsoft DFS shares

--track-ctime
    track each file's ctime so we can detect changes to ownership or permissions

-v, --verbose
    show more information about what is happening

--overwrite
    create the database anew (the default is to update if the database already exists)

-s, --stemmer=STEMMER
    set the stemming language (default: english). Possible values: arabic armenian basque catalan danish dutch earlyenglish english finnish french german german2 hungarian indonesian irish italian kraaij_pohlmann lithuanian lovins nepali norwegian porter portuguese romanian russian spanish swedish tamil turkish (pass 'none' to disable stemming)

-h, --help
    display this help and exit

-V, --version
    output version information and exit
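
The filter file format described above (one --filter argument per line, with # starting a comment line) can be sketched as follows. The database path, the document directory and the foo2utf16 command are placeholders for illustration. The option that bulk-loads the file is taken to be --read-filters, as in omindex 1.4.x. The omindex command is printed rather than run, so the sketch works on a machine without omindex installed.

```shell
# Write a small filter file to a temporary location (paths are assumptions).
filters=$(mktemp)
cat > "$filters" <<'EOF'
# One --filter argument per line; lines starting with # are comments.
application/octet-stream:strings -n8
text/x-foo,,utf-16:foo2utf16 %f %t
EOF

# Count the lines that are not comments, i.e. the filter arguments.
active=$(grep -vc '^#' "$filters")
echo "$active filter(s) defined"

# Dry run: show the command line instead of executing it.
echo omindex --db /var/lib/omega/data/default --read-filters "$filters" /var/www/html

rm -f "$filters"
```

Note that no shell quoting is used inside the file: each line is taken as a whole argument, which matches the unquoted example text/x-bar:bar2txt --utf8 given for this option.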

Please report bugs at: https://xapian.org/bugs

July 2022 xapian-omega 1.4.20