.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.40)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "PUBLIC-INBOX-EXTINDEX-FORMAT 5"
.TH PUBLIC-INBOX-EXTINDEX-FORMAT 5 "1993-10-02" "public-inbox.git" "public-inbox user manual"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
public\-inbox\-extindex\-format \- external index format description
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
The extindex is an index-only evolution of the per-inbox
SQLite and Xapian indices used by \fBpublic\-inbox\-v2\-format\fR\|(5)
and \fBpublic\-inbox\-v1\-format\fR\|(5).  It exists to facilitate
searches across multiple inboxes and to reduce index
space when messages are cross-posted to several existing
inboxes.
.PP
It transparently indexes messages across any combination of v1 and v2
inboxes, as well as data about the inboxes themselves.
.SH "DIRECTORY LAYOUT"
.IX Header "DIRECTORY LAYOUT"
While inspired by v2, there is neither git blob storage nor a
\&\f(CW\*(C`msgmap.sqlite3\*(C'\fR \s-1DB.\s0
.PP
Instead, there is an \f(CW\*(C`ALL.git\*(C'\fR (all caps) git repo which treats
every indexed v1 inbox or v2 epoch as a git alternate.
.PP
As with v2 inboxes, it uses \f(CW\*(C`over.sqlite3\*(C'\fR and Xapian \*(L"shards\*(R"
for \s-1WWW\s0 and \s-1IMAP\s0 use.  Several new tables exclusive to extindex
are added to deal with \*(L"\s-1XREF3 DEDUPLICATION\*(R"\s0 and metadata.
.PP
Unlike v1 and v2 inboxes, it is \s-1NOT\s0 designed to map to an \s-1NNTP\s0
newsgroup.  Thus it lacks \f(CW\*(C`msgmap.sqlite3\*(C'\fR to enforce the unique
Message-ID requirement of \s-1NNTP.\s0
.SS "\s-1INDEX OVERVIEW AND DEFINITIONS\s0"
.IX Subsection "INDEX OVERVIEW AND DEFINITIONS"
.Vb 2
\&  $SCHEMA_VERSION \- DB schema version (for Xapian)
\&  $SHARD \- Integer starting with 0 based on parallelism
\&
\&  foo/                             # "foo" is the name of the index
\&  \- ei.lock                        # lock file to protect global state
\&  \- ALL.git                        # empty, alternates for inboxes
\&  \- ei$SCHEMA_VERSION/$SHARD       # per\-shard Xapian DB
\&  \- ei$SCHEMA_VERSION/over.sqlite3 # overview DB for WWW, IMAP
\&  \- ei$SCHEMA_VERSION/misc         # misc Xapian DB
.Ve
.PP
File and directory names are intentionally different from
analogous v2 names so that extindex and v2 inboxes can
easily be distinguished from each other.
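.PP
As a minimal sketch of how the names above might be used to tell an
extindex apart from a v2 inbox, the following Python helpers check for
the extindex-specific entries.  The function names
\&\f(CWlooks_like_extindex\fR and \f(CWshard_numbers\fR are hypothetical and
not part of public-inbox, and the sketch assumes that
\&\f(CW$SCHEMA_VERSION\fR is written as a run of decimal digits.
.PP
.Vb 4
```python
import os
import re

# Assumption: $SCHEMA_VERSION is a run of decimal digits (e.g. "ei15").
EI_DIR = re.compile("ei[0-9]+")


def looks_like_extindex(path):
    """Return True if path has the extindex layout described above."""
    if not os.path.isfile(os.path.join(path, "ei.lock")):
        return False
    if not os.path.isdir(os.path.join(path, "ALL.git")):
        return False
    # At least one ei$SCHEMA_VERSION directory must hold over.sqlite3.
    for name in os.listdir(path):
        if EI_DIR.fullmatch(name) and os.path.isfile(
            os.path.join(path, name, "over.sqlite3")
        ):
            return True
    return False


def shard_numbers(ei_dir):
    """List the numeric $SHARD directories inside one ei$SCHEMA_VERSION."""
    return sorted(
        int(name)
        for name in os.listdir(ei_dir)
        if name.isdecimal() and os.path.isdir(os.path.join(ei_dir, name))
    )
```
.Ve
.PP
The non-numeric \f(CWmisc\fR directory is skipped by \f(CWshard_numbers\fR,
since only the integer-named directories are message shards.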
.SS "\s-1XREF3 DEDUPLICATION\s0"
.IX Subsection "XREF3 DEDUPLICATION"
Because cross-posted messages are the norm in the large Linux kernel
development community and Xapian indices are the primary consumer
of storage, it makes sense to deduplicate indexing as much as
possible.
.PP
The internal storage format is based on the \s-1NNTP\s0 \*(L"Xref\*(R" tuple,
but with the addition of a third element: the git blob \s-1OID.\s0
Thus the triple is expressed in string form as:
.PP
.Vb 1
\&        $NEWSGROUP_NAME:$ARTICLE_NUM:$OID
.Ve
.PP
If no \f(CW\*(C`newsgroup\*(C'\fR is configured for an inbox, the \f(CW\*(C`inboxdir\*(C'\fR
of the inbox is used instead.
.PP
This data is stored in the \f(CW\*(C`xref3\*(C'\fR table of over.sqlite3.
.SS "misc \s-1XAPIAN DB\s0"
.IX Subsection "misc XAPIAN DB"
In addition to the numeric Xapian shards for indexing messages,
there is a new, in-development Xapian index for storing data
about inboxes themselves and other non-message data.  This
index speeds up operations involving hundreds or thousands
of inboxes.
.SH "BENEFITS"
.IX Header "BENEFITS"
In addition to providing cross-inbox search capabilities, it can
also replace per-inbox Xapian shards (but not per-inbox
over.sqlite3).  This reduces disk space, open file handles,
and associated memory use.
.SH "CAVEATS"
.IX Header "CAVEATS"
Relocating v1 and v2 inboxes on the filesystem will require
extindex to be garbage-collected and/or reindexed.
.PP
Configuring stable \f(CW\*(C`newsgroup\*(C'\fR names for every inbox before
any of its messages are indexed, and keeping those names unchanged,
lets extindex avoid expensive reindexing and rely exclusively on \s-1GC.\s0
.SH "LOCKING"
.IX Header "LOCKING"
An exclusive \fBflock\fR\|(2) lock is held on the empty ei.lock file
for all non-atomic operations.
.SH "THANKS"
.IX Header "THANKS"
Thanks to the Linux Foundation for sponsoring the development
and testing.
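.SH "EXAMPLES"
.IX Header "EXAMPLES"
The string form of the triple described under
\&\*(L"\s-1XREF3 DEDUPLICATION\*(R"\s0 can be built and split back into its
parts along the following lines.  This is an illustrative Python
sketch, not code from public-inbox: the names \f(CWXref3\fR,
\&\f(CWformat_xref3\fR and \f(CWparse_xref3\fR are hypothetical.  It
assumes the article number is a decimal integer and the \s-1OID\s0 is a
lowercase hexadecimal string, and it splits from the right because an
\&\f(CWinboxdir\fR used in place of a newsgroup name may itself contain
a colon.
.PP
.Vb 4
```python
from typing import NamedTuple

HEX_DIGITS = "0123456789abcdef"


class Xref3(NamedTuple):
    newsgroup: str    # newsgroup name, or inboxdir if none is configured
    article_num: int
    oid: str          # git blob OID, assumed to be lowercase hexadecimal


def format_xref3(x):
    """Render the triple as $NEWSGROUP_NAME:$ARTICLE_NUM:$OID."""
    return "{}:{}:{}".format(x.newsgroup, x.article_num, x.oid)


def parse_xref3(s):
    """Split from the right so a colon in the first field is preserved."""
    parts = s.rsplit(":", 2)
    if len(parts) != 3:
        raise ValueError("not an xref3 triple: " + repr(s))
    newsgroup, num, oid = parts
    if not newsgroup or not num.isdecimal():
        raise ValueError("not an xref3 triple: " + repr(s))
    if not oid or any(c not in HEX_DIGITS for c in oid):
        raise ValueError("OID is not hexadecimal: " + repr(oid))
    return Xref3(newsgroup, int(num), oid)
```
.Ve
.PP
Parsing a string and formatting the result again yields the original
string as long as the article number has no leading zeros.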
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright 2020\-2021 all contributors
.PP
License: \s-1AGPL\-3.0+\s0
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBpublic\-inbox\-v2\-format\fR\|(5)