.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "PUBLIC-INBOX-INDEX 1"
.TH PUBLIC-INBOX-INDEX 1 "1993-10-02" "public-inbox.git" "public-inbox user manual"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
public\-inbox\-index \- create and update search indices
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
public-inbox-index [\s-1OPTIONS\s0] \s-1INBOX_DIR...\s0
.PP
public-inbox-index [\s-1OPTIONS\s0] \-\-all
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
public-inbox-index creates and updates the search, overview and
\&\s-1NNTP\s0 article number database used by the read-only public-inbox
\&\s-1HTTP\s0 and \s-1NNTP\s0 interfaces. Currently, this requires
DBD::SQLite and \s-1DBI\s0 Perl modules. Search::Xapian
is optional, only to support the \s-1PSGI\s0 search interface.
.PP
Once the initial indices are created by public-inbox-index,
\&\fBpublic\-inbox\-mda\fR\|(1) and
\&\fBpublic\-inbox\-watch\fR\|(1) will automatically maintain them.
.PP
Running this manually to update indices is only required if
relying on \fBgit\-fetch\fR\|(1) to mirror an existing public-inbox;
or if upgrading to a new version of public-inbox using
the \f(CW\*(C`\-\-reindex\*(C'\fR option.
.PP
Having the overview and article number database is essential to
running the \s-1NNTP\s0 interface, and strongly recommended for the
\&\s-1HTTP\s0 interface as it provides thread grouping in addition
to normal search functionality.
.SH "OPTIONS"
.IX Header "OPTIONS"
.IP "\-j \s-1JOBS\s0" 4
.IX Item "-j JOBS"
.PD 0
.IP "\-\-jobs=JOBS" 4
.IX Item "--jobs=JOBS"
.PD
Influences the number of Xapian indexing shards in a
(\fBpublic\-inbox\-v2\-format\fR\|(5)) inbox.
.Sp
See \*(L"\-\-jobs\*(R" in \fBpublic\-inbox\-init\fR\|(1) for a full description
of sharding.
.Sp
\&\f(CW\*(C`\-\-jobs=0\*(C'\fR is accepted as of public-inbox 1.6.0
to disable parallel indexing regardless of the number of
pre-existing shards.
.Sp
If the inbox has not been indexed or initialized, \f(CW\*(C`JOBS \- 1\*(C'\fR shards
will be created (one job is always needed for indexing the overview
and article number mapping).
.Sp
Default: the number of existing Xapian shards
.IP "\-c" 4
.IX Item "-c"
.PD 0
.IP "\-\-compact" 4
.IX Item "--compact"
.PD
Compacts the Xapian DBs after indexing. This is recommended
when using \f(CW\*(C`\-\-reindex\*(C'\fR to avoid running out of disk space
while indexing multiple inboxes.
.Sp
While this option takes a negligible amount of time compared to
\&\f(CW\*(C`\-\-reindex\*(C'\fR, it requires temporarily duplicating the entire
contents of the Xapian \s-1DB.\s0
.Sp
This switch may be specified twice, in which case compaction
happens both before and after indexing to minimize the temporal
footprint of the (re)indexing operation.
.Sp
Available since public-inbox 1.4.0.
.IP "\-\-reindex" 4
.IX Item "--reindex"
Forces a re-index of all messages in the inbox.
This can be used for in-place upgrades and bugfixes while
\&\s-1NNTP/HTTP\s0 server processes are utilizing the index. Keep in
mind this roughly doubles the size of the already-large
Xapian database. Using this with \f(CW\*(C`\-\-compact\*(C'\fR or running
\&\fBpublic\-inbox\-compact\fR\|(1) afterwards is recommended to
release free space.
.Sp
public-inbox protects writes to various indices with
\&\fBflock\fR\|(2), so it is safe to reindex (and rethread) while
\&\fBpublic\-inbox\-watch\fR\|(1), \fBpublic\-inbox\-mda\fR\|(1) or
\&\fBpublic\-inbox\-learn\fR\|(1) run.
.Sp
This does not touch the \s-1NNTP\s0 article number database.
It does not affect threading unless \f(CW\*(C`\-\-rethread\*(C'\fR is
used.
.IP "\-\-all" 4
.IX Item "--all"
Index all inboxes configured in ~/.public\-inbox/config.
This is an alternative to specifying individual inbox directories
on the command-line.
.IP "\-\-rethread" 4
.IX Item "--rethread"
Regenerate internal \s-1THREADID\s0 and message thread associations
when reindexing.
.Sp
This fixes some bugs in older versions of public-inbox. While
it is possible to use this without \f(CW\*(C`\-\-reindex\*(C'\fR, it makes little
sense to do so.
.Sp
Available in public-inbox 1.6.0+.
.IP "\-\-prune" 4
.IX Item "--prune"
Run \fBgit\-gc\fR\|(1) to prune and expire reflogs if discontiguous history
is detected. This is intended to be used in mirrors after running
\&\fBpublic\-inbox\-edit\fR\|(1) or \fBpublic\-inbox\-purge\fR\|(1) to ensure data
is expunged from mirrors.
.Sp
Available since public-inbox 1.2.0.
.IP "\-\-max\-size \s-1SIZE\s0" 4
.IX Item "--max-size SIZE"
Sets or overrides \*(L"publicinbox.indexMaxSize\*(R" on a
per-invocation basis. See \*(L"publicinbox.indexMaxSize\*(R"
below.
.Sp
Available since public-inbox 1.5.0.
.IP "\-\-batch\-size \s-1SIZE\s0" 4
.IX Item "--batch-size SIZE"
Sets or overrides \*(L"publicinbox.indexBatchSize\*(R" on a
per-invocation basis. See \*(L"publicinbox.indexBatchSize\*(R"
below.
.Sp
When using rotational storage but abundant \s-1RAM,\s0 using a large
value (e.g. \f(CW\*(C`500m\*(C'\fR) with \f(CW\*(C`\-\-sequential\-shard\*(C'\fR can
significantly speed up indexing and reduce fragmentation during the
initial index and full \f(CW\*(C`\-\-reindex\*(C'\fR invocations (but not
incremental updates).
.Sp
Available in public-inbox 1.6.0+.
.IP "\-\-no\-fsync" 4
.IX Item "--no-fsync"
Disables \fBfsync\fR\|(2) and \fBfdatasync\fR\|(2) operations on SQLite
and Xapian. This is only effective with Xapian 1.4+. This is
primarily intended for systems with low \s-1RAM\s0 and the small
(default) \f(CW\*(C`\-\-batch\-size=1m\*(C'\fR. Users of large \f(CW\*(C`\-\-batch\-size\*(C'\fR
may even find disabling \fBfdatasync\fR\|(2) causes too much dirty data
to accumulate, resulting in latency spikes from writeback.
.Sp
Available in public-inbox 1.6.0+.
.IP "\-\-dangerous" 4
.IX Item "--dangerous"
Speed up initial index by using in-place updates and denying support for
concurrent readers. This is only effective with Xapian 1.4+.
.Sp
Available in public-inbox 1.8.0+.
.IP "\-\-sequential\-shard" 4
.IX Item "--sequential-shard"
Sets or overrides \*(L"publicinbox.indexSequentialShard\*(R" on a
per-invocation basis.
See \*(L"publicinbox.indexSequentialShard\*(R" below.
.Sp
Available in public-inbox 1.6.0+.
.IP "\-\-skip\-docdata" 4
.IX Item "--skip-docdata"
Stop storing document data in Xapian on an existing inbox.
.Sp
See \*(L"\-\-skip\-docdata\*(R" in \fBpublic\-inbox\-init\fR\|(1) for
description and caveats.
.Sp
Available in public-inbox 1.6.0+.
.IP "\-E \s-1EXTINDEX\s0" 4
.IX Item "-E EXTINDEX"
.PD 0
.IP "\-\-update\-extindex=EXTINDEX" 4
.IX Item "--update-extindex=EXTINDEX"
.PD
Update the given external index (\fBpublic\-inbox\-extindex\-format\fR\|(5)).
Either the configured section name (e.g. \f(CW\*(C`all\*(C'\fR) or a directory name
may be specified.
.Sp
Defaults to \f(CW\*(C`all\*(C'\fR if \f(CW\*(C`[extindex "all"]\*(C'\fR is configured,
otherwise no external indices are updated.
.Sp
May be specified multiple times in rare cases where multiple
external indices are configured.
.IP "\-\-no\-update\-extindex" 4
.IX Item "--no-update-extindex"
Do not update the \f(CW\*(C`all\*(C'\fR external index by default. This negates
all uses of \f(CW\*(C`\-E\*(C'\fR / \f(CW\*(C`\-\-update\-extindex=\*(C'\fR on the command-line.
.IP "\-\-since=DATESTRING" 4
.IX Item "--since=DATESTRING"
.PD 0
.IP "\-\-after=DATESTRING" 4
.IX Item "--after=DATESTRING"
.IP "\-\-until=DATESTRING" 4
.IX Item "--until=DATESTRING"
.IP "\-\-before=DATESTRING" 4
.IX Item "--before=DATESTRING"
.PD
Passed directly to \fBgit\-log\fR\|(1) to limit changes for \f(CW\*(C`\-\-reindex\*(C'\fR.
.SH "FILES"
.IX Header "FILES"
For v1 (ssoma) repositories described in \fBpublic\-inbox\-v1\-format\fR\|(5),
all public-inbox-specific files are contained within the
\&\f(CW\*(C`$GIT_DIR/public\-inbox/\*(C'\fR directory.
.PP
v2 inboxes are described in \fBpublic\-inbox\-v2\-format\fR\|(5).
.SH "CONFIGURATION"
.IX Header "CONFIGURATION"
.IP "publicinbox.indexMaxSize" 8
.IX Item "publicinbox.indexMaxSize"
Prevents indexing of messages larger than the specified size
value.
A single suffix modifier of \f(CW\*(C`k\*(C'\fR, \f(CW\*(C`m\*(C'\fR or \f(CW\*(C`g\*(C'\fR is
supported; thus a value of \f(CW\*(C`1m\*(C'\fR prevents indexing of
messages larger than one megabyte.
.Sp
This is useful for avoiding memory exhaustion in mirrors
via git. It does not prevent \fBpublic\-inbox\-mda\fR\|(1) or
\&\fBpublic\-inbox\-watch\fR\|(1) from importing (and indexing)
a message.
.Sp
This option is only available in public-inbox 1.5 or later.
.Sp
Default: none
.IP "publicinbox.indexBatchSize" 8
.IX Item "publicinbox.indexBatchSize"
Flushes changes to the filesystem and releases locks
after indexing the given number of bytes. The default
value of \f(CW\*(C`1m\*(C'\fR (one megabyte) is low to minimize memory
use and reduce contention with parallel invocations of
\&\fBpublic\-inbox\-mda\fR\|(1), \fBpublic\-inbox\-learn\fR\|(1), and
\&\fBpublic\-inbox\-watch\fR\|(1).
.Sp
Increase this value on powerful systems to improve throughput at
the expense of memory use. The reduction of lock granularity
may not be noticeable on fast systems. With SSDs, values above
\&\f(CW\*(C`4m\*(C'\fR have little benefit.
.Sp
For \fBpublic\-inbox\-v2\-format\fR\|(5) inboxes, this value is
multiplied by the number of Xapian shards. Thus a typical v2
inbox with 3 shards will flush every 3 megabytes by default
unless parallelism is disabled via \f(CW\*(C`\-\-sequential\-shard\*(C'\fR
or \f(CW\*(C`\-\-jobs=0\*(C'\fR.
.Sp
This influences memory usage of Xapian, but it is not exact.
The actual memory used by Xapian and Perl has been observed
in excess of 10x this value.
.Sp
This option is available in public-inbox 1.6 or later.
public-inbox 1.5 and earlier used the current default, \f(CW\*(C`1m\*(C'\fR.
.Sp
Default: 1m (one megabyte)
.IP "publicinbox.indexSequentialShard" 8
.IX Item "publicinbox.indexSequentialShard"
For \fBpublic\-inbox\-v2\-format\fR\|(5) inboxes, setting this to \f(CW\*(C`true\*(C'\fR
allows indexing Xapian shards in multiple passes.
This speeds up indexing on rotational storage with high seek latency
by allowing individual shards to fit into the kernel page cache.
.Sp
Using a higher-than-normal number of \f(CW\*(C`\-\-jobs\*(C'\fR with
\&\fBpublic\-inbox\-init\fR\|(1) may be required to ensure individual
shards are small enough to fit into cache.
.Sp
Warning: interrupting \f(CW\*(C`public\-inbox\-index(1)\*(C'\fR while this option
is in use may leave the search indices out-of-date with respect
to SQLite databases. \s-1WWW\s0 and \s-1IMAP\s0 users may notice incomplete
search results, but it is otherwise non-fatal. Using \f(CW\*(C`\-\-reindex\*(C'\fR
will bring everything back up-to-date.
.Sp
Available in public-inbox 1.6.0+.
.Sp
This is ignored on \fBpublic\-inbox\-v1\-format\fR\|(5) inboxes.
.Sp
Default: false, shards are indexed in parallel
.IP "publicinbox.<name>.indexSequentialShard" 8
.IX Item "publicinbox.<name>.indexSequentialShard"
Identical to \*(L"publicinbox.indexSequentialShard\*(R",
but only affects the inbox matching <name>.
.SH "ENVIRONMENT"
.IX Header "ENVIRONMENT"
.IP "\s-1PI_CONFIG\s0" 8
.IX Item "PI_CONFIG"
Used to override the default \*(L"~/.public\-inbox/config\*(R" value.
.IP "\s-1XAPIAN_FLUSH_THRESHOLD\s0" 8
.IX Item "XAPIAN_FLUSH_THRESHOLD"
The number of documents to update before committing changes to
disk. This environment variable is handled directly by Xapian;
refer to the Xapian \s-1API\s0 documentation for more details.
.Sp
For public-inbox 1.6 and later, use \f(CW\*(C`publicinbox.indexBatchSize\*(C'\fR
instead.
.Sp
Setting \f(CW\*(C`XAPIAN_FLUSH_THRESHOLD\*(C'\fR or
\&\f(CW\*(C`publicinbox.indexBatchSize\*(C'\fR for a large \f(CW\*(C`\-\-reindex\*(C'\fR may cause
\&\fBpublic\-inbox\-mda\fR\|(1), \fBpublic\-inbox\-learn\fR\|(1) and
\&\fBpublic\-inbox\-watch\fR\|(1) tasks to wait long and unpredictable
periods of time during \f(CW\*(C`\-\-reindex\*(C'\fR.
.Sp
Default: none, uses \f(CW\*(C`publicinbox.indexBatchSize\*(C'\fR
.SH "UPGRADING"
.IX Header "UPGRADING"
Occasionally, public-inbox will update its schema version and
require a full index by running this command.
.SH "CONTACT"
.IX Header "CONTACT"
Feedback welcome via plain-text mail to
.PP
The mail archives are hosted at and
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright all contributors
.PP
License: \s-1AGPL\-3.0+\s0
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Search::Xapian, DBD::SQLite, \fBpublic\-inbox\-extindex\-format\fR\|(5)
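.SH "EXAMPLES"
.IX Header "EXAMPLES"
The fragment below is a minimal sketch, not a recommended setup, of how
the settings described under \s-1CONFIGURATION\s0 might be combined on a host
with rotational storage and abundant \s-1RAM.\s0 It assumes the config file
uses the same section syntax as the \f(CW\*(C`[extindex "all"]\*(C'\fR example in
\&\s-1OPTIONS.\s0 The values are illustrative only; they are taken from the
sizes mentioned in this manual and are not tuned for any particular system.
.PP
.Vb 5
\&    # Illustrative values only (an assumption, not a recommendation).
\&    [publicinbox]
\&        indexMaxSize = 1m
\&        indexBatchSize = 500m
\&        indexSequentialShard = true
.Ve
.PP
The same values can be tried for a single run with \f(CW\*(C`\-\-max\-size\*(C'\fR,
\&\f(CW\*(C`\-\-batch\-size\*(C'\fR and \f(CW\*(C`\-\-sequential\-shard\*(C'\fR before they are
written to the config file, since those options override the
corresponding settings on a per-invocation basis.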