.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "libinn_dbz 3"
.TH libinn_dbz 3 2024-04-01 "INN 2.7.2" "InterNetNews Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
dbz \- Database routines for InterNetNews
.SH SYNOPSIS
.IX Header "SYNOPSIS"
.Vb 1
\&    #include <inn/dbz.h>
\&
\&    #define DBZMAXKEY              ...
\&    #define DBZ_INTERNAL_HASH_SIZE ...
\&
\&    typedef enum
\&    {
\&        DBZSTORE_OK,
\&        DBZSTORE_EXISTS,
\&        DBZSTORE_ERROR
\&    } DBZSTORE_RESULT;
\&
\&    typedef enum
\&    {
\&        INCORE_NO,
\&        INCORE_MEM,
\&        INCORE_MMAP
\&    } dbz_incore_val;
\&
\&    typedef struct {
\&        bool writethrough;
\&        dbz_incore_val pag_incore;
\&        dbz_incore_val exists_incore;
\&        bool nonblock;
\&    } dbzoptions;
\&
\&    typedef struct {
\&        char hash[DBZ_INTERNAL_HASH_SIZE];
\&    } _\|_attribute_\|_((_\|_packed_\|_)) erec;
\&
\&    extern bool dbzinit(const char *name);
\&    extern bool dbzclose(void);
\&
\&    extern bool dbzfresh(const char *name, off_t size);
\&    extern bool dbzagain(const char *name, const char *oldname);
\&    extern bool dbzexists(const HASH key);
\&    extern bool dbzfetch(const HASH key, off_t *value);
\&    extern DBZSTORE_RESULT dbzstore(const HASH key, off_t data);
\&    extern bool dbzsync(void);
\&    extern long dbzsize(off_t contents);
\&    extern void dbzsetoptions(const dbzoptions options);
\&    extern void dbzgetoptions(dbzoptions *options);
.Ve
.SH DESCRIPTION
.IX Header "DESCRIPTION"
These functions provide an indexing system for rapid random access to a text
file, hereafter named the \fIbase\fR file.
.PP
\&\fIdbz\fR stores offsets into the base file for rapid retrieval.  All retrievals
are keyed on a hash value that is generated by the \fBHashMessageID\fR function
in libinn(3).
.PP
\&\fBdbzinit\fR opens a database, an index into the base file \fIname\fR, consisting
of files \fIname\fR\fI.dir\fR, \fIname\fR\fI.index\fR, and \fIname\fR\fI.hash\fR which must
already exist.  (If the database is new, they should be zero-length files.)
Subsequent accesses go to that database until \fBdbzclose\fR is called to close
the database.  When tagged hash format is used (if \fB\-\-enable\-tagged\-hash\fR was
given at configure time), a \fIname\fR\fI.pag\fR file is used instead of \fI.index\fR
and \fI.hash\fR.
.PP
\&\fBdbzfetch\fR searches the database for the specified \fIkey\fR.  If the key
is found, the corresponding offset into the base file is stored in \fIvalue\fR.
.PP
\&\fBdbzstore\fR stores the \fIkey\fR\-\fIdata\fR pair in the database.  It will
return \f(CW\*(C`DBZSTORE_EXISTS\*(C'\fR for duplicates (already existing entries), and
\&\f(CW\*(C`DBZSTORE_OK\*(C'\fR for success.  It will fail with \f(CW\*(C`DBZSTORE_ERROR\*(C'\fR if the
database files are not writable or not opened, or if any other error occurs.
.PP
\&\fBdbzexists\fR checks whether the given hash exists in the database.  \fIdbz\fR
is optimized for this operation, which may be significantly faster than
\&\fBdbzfetch\fR.
.PP
\&\fBdbzfresh\fR is a variant of \fBdbzinit\fR for creating a new database with
more control over details.  The \fIsize\fR parameter specifies the size of
the first hash table within the database, in number of key-value pairs.
Performance will be best if the number of key-value pairs stored in the
database does not exceed about 2/3 of \fIsize\fR, or 1/2 of \fIsize\fR when
the tagged hash format is used.  (The \fBdbzsize\fR function,
given the expected number of key-value pairs, will suggest a database size
that meets these criteria.)  Assuming that an \fIfseek\fR offset is 4 bytes,
the \fI.index\fR file will be 4 * \fIsize\fR bytes.  The \fI.hash\fR file will be
\&\f(CW\*(C`DBZ_INTERNAL_HASH_SIZE\*(C'\fR * \fIsize\fR bytes (the \fI.dir\fR file is tiny and
roughly constant in size) until the number of key-value pairs exceeds about
80% of \fIsize\fR.  (Nothing awful will happen if the database grows beyond 100%
of \fIsize\fR, but accesses will slow down quite a bit and the \fI.index\fR and
\&\fI.hash\fR files will grow somewhat.)
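.PP
As a rough illustration of the arithmetic above, the following sketch estimates
the sizes of the \fI.index\fR and \fI.hash\fR files and the number of entries
that keeps the table within the recommended load.  It is not part of \fIdbz\fR:
the helper names are hypothetical, and the offset and hash widths are
assumptions supplied by the caller rather than values read from \fIinn/dbz.h\fR.
.PP
.Vb 1
```c
#include <stdbool.h>

/* Hypothetical helper: bytes used by the .index file, given the table
   size and the assumed width in bytes of one stored offset. */
unsigned long long
index_file_bytes(unsigned long long size, unsigned int offset_width)
{
    return size * offset_width;
}

/* Hypothetical helper: bytes used by the .hash file, given the table
   size and the assumed value of DBZ_INTERNAL_HASH_SIZE. */
unsigned long long
hash_file_bytes(unsigned long long size, unsigned int hash_width)
{
    return size * hash_width;
}

/* Hypothetical helper: largest entry count that stays within about 2/3
   of size, or 1/2 of size when the tagged hash format is used. */
unsigned long long
recommended_max_entries(unsigned long long size, bool tagged)
{
    return tagged ? size / 2 : size * 2 / 3;
}
```
.Ve
.PP
With an arbitrary \fIsize\fR of 9,000,000, a 4-byte offset and a 6-byte hash,
these helpers give 36,000,000 bytes for the \fI.index\fR file, 54,000,000 bytes
for the \fI.hash\fR file, and a recommended ceiling of 6,000,000 entries.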
.PP
\&\fIdbz\fR stores up to \f(CW\*(C`DBZ_INTERNAL_HASH_SIZE\*(C'\fR bytes (by default, 4 bytes
if tagged hash format is used, 6 otherwise) of the Message-ID's hash in the
\&\fI.hash\fR file to confirm a hit.  This eliminates the need to read the base
file to handle collisions.
.PP
A \fIsize\fR of \f(CW\*(C`0\*(C'\fR given to \fBdbzfresh\fR is synonymous with the local default;
the normal default is suitable for tables of about 6,000,000 key-value
pairs (or 500,000 key-value pairs when the tagged hash format is used).
That default value is used by \fBdbzinit\fR.
.PP
When databases are regenerated periodically, as is the case for the
\&\fIhistory\fR file, it is simplest to pick the parameters for a new database
based on the old one.  This also permits some memory of past sizes of the
old database, so that a new database size can be chosen to cover expected
fluctuations.  \fBdbzagain\fR is a variant of \fBdbzinit\fR for creating a new
database as a new generation of an old database.  The database files for
\&\fIoldname\fR must exist.  \fBdbzagain\fR is equivalent to calling \fBdbzfresh\fR with
a \fIsize\fR equal to the result of applying \fBdbzsize\fR to the largest number of
entries in the \fIoldname\fR database and its previous 10 generations.
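.PP
The selection described above can be pictured as taking the maximum over the
recorded entry counts.  The sketch below only illustrates that idea and is
not the \fIdbz\fR implementation: the helper name and the array of counts are
hypothetical.
.PP
.Vb 1
```c
#include <stddef.h>

/* Hypothetical helper: return the largest entry count among the recorded
   generations (the old database and up to 10 earlier ones).
   Returns 0 when no counts are recorded. */
long
largest_generation(const long counts[], size_t n)
{
    long max = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (counts[i] > max)
            max = counts[i];
    }
    return max;
}
```
.Ve
.PP
The value returned by such a helper is what would be passed to \fBdbzsize\fR
to obtain the \fIsize\fR of the new database.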
.PP
When many accesses are being done by the same program, \fIdbz\fR is massively
faster if its first hash table is in memory.  If the \fIpag_incore\fR flag
is set to \f(CW\*(C`INCORE_MEM\*(C'\fR, an attempt is made to read the table in when the
database is opened, and \fBdbzclose\fR writes it out to disk again (if it was
read successfully and has been modified).  \fBdbzsetoptions\fR can be used to
set the \fIpag_incore\fR and \fIexists_incore\fR flags to different values which
should be \f(CW\*(C`INCORE_NO\*(C'\fR (read from disk), \f(CW\*(C`INCORE_MEM\*(C'\fR (read from memory)
or \f(CW\*(C`INCORE_MMAP\*(C'\fR (read from an mmap'ed file) for the \fI.hash\fR and \fI.index\fR
files separately; this does not affect the status of a database that has
already been opened.  The default is \f(CW\*(C`INCORE_NO\*(C'\fR for the \fI.index\fR file and
\&\f(CW\*(C`INCORE_MMAP\*(C'\fR for the \fI.hash\fR file.  The attempt to read the table in may
fail due to memory shortage; in this case \fIdbz\fR fails with an error.  Stores
to an in-memory database are not (in general) written out to the file until
\&\fBdbzclose\fR or \fBdbzsync\fR, so if robustness in the presence of crashes or
concurrent accesses is crucial, in-memory databases should probably be avoided
or the \fIwritethrough\fR option should be set to true (which makes \fIdbz\fR write
each store to the filesystem in addition to updating the in-memory database).
.PP
If the \fInonblock\fR option is true, then writes to the \fI.hash\fR and \fI.index\fR
files will be done using non-blocking I/O.  This can be significantly faster
if your platform supports non-blocking I/O with files.  It is only applicable
if you're not mmap'ing the database.
.PP
\&\fBdbzsync\fR causes all buffers etc. to be flushed out to the files.  It is
typically used as a precaution against crashes or concurrent accesses when
a \fIdbz\fR\-using process will be running for a long time.  It is a somewhat
expensive operation, especially for an in-memory database.
.PP
Concurrent reading of databases is fairly safe, but there is no
(inter)locking, so concurrent updating is not.
.PP
An open database occupies three \fIstdio\fR streams and two file descriptors.
Memory consumption is negligible except for in-memory databases (and \fIstdio\fR
buffers).
.SH DIAGNOSTICS
.IX Header "DIAGNOSTICS"
Functions returning \fIbool\fR values return true for success, false for failure.
.PP
\&\fBdbzinit\fR attempts to have \fIerrno\fR set plausibly on return, but otherwise
this is not guaranteed.  An \fIerrno\fR of \f(CW\*(C`EDOM\*(C'\fR from \fBdbzinit\fR indicates that
the database did not appear to be in \fIdbz\fR format.
.PP
If \f(CW\*(C`DBZTEST\*(C'\fR is defined at compile-time, then a \fBmain()\fR function will be
included.  This will run performance and integrity tests.
.SH BUGS
.IX Header "BUGS"
Unlike \fIdbm\fR, \fIdbz\fR will refuse to \fBdbzstore\fR with a key already in the
database.  The user is responsible for avoiding this.
.PP
The RFC5322 case mapper implements only a first approximation to the
hideously-complex RFC5322 case rules.
.PP
\&\fIdbz\fR no longer tries to be call-compatible with \fIdbm\fR in any way.
.SH HISTORY
.IX Header "HISTORY"
The original \fIdbz\fR was written by Jon Zeeff <zeeff@b\-tech.ann\-arbor.mi.us>.
Later contributions by David Butler and Mark Moraes.  Extensive reworking,
including this documentation, by Henry Spencer <henry@zoo.toronto.edu> as part
of the C News project.  MD5 code borrowed from RSA.  Extensive reworking to
remove backwards compatibility and to add hashes into \fIdbz\fR files by Clayton
O'Neill <coneill@oneill.net>.  Rewritten into POD by Julien Elie.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
dbm(3), history(5), libinn(3).