NAME
dbzinit, dbzfresh, dbzagain, dbzclose, dbzexists, dbzfetch, dbzstore, dbzsync,
dbzsize, dbzgetoptions, dbzsetoptions, dbzdebug - database routines
SYNOPSIS
#include <inn/dbz.h>
bool dbzinit(const char *base)
bool dbzclose(void)
bool dbzfresh(const char *base, long size)
bool dbzagain(const char *base, const char *oldbase)
bool dbzexists(const HASH key)
off_t dbzfetch(const HASH key)
bool dbzfetch(const HASH key, void *ivalue)
DBZSTORE_RESULT dbzstore(const HASH key, off_t offset)
DBZSTORE_RESULT dbzstore(const HASH key, void *ivalue)
bool dbzsync(void)
long dbzsize(long nentries)
void dbzgetoptions(dbzoptions *opt)
void dbzsetoptions(const dbzoptions opt)
DESCRIPTION
These functions provide an indexing system for rapid random access to a text
file (the
base file).
Dbz stores offsets into the base text file for rapid retrieval. All
retrievals are keyed on a hash value that is generated by the
HashMessageID() function.
Dbzinit opens a database, an index into the base file base, consisting of
files base.dir, base.index, and base.hash, which must already exist. (If the
database is new, they should be zero-length files.) Subsequent accesses go to
that database until dbzclose is called to close the database.
Dbzfetch searches the database for the specified key. If
--enable-tagged-hash was specified at configure time, it returns the
corresponding value, if any. If --enable-tagged-hash was not specified, it
returns true and sets the content of ivalue.
Dbzstore stores the key-value pair in the database if --enable-tagged-hash
was specified at configure time. If --enable-tagged-hash was not specified,
it stores the content of ivalue. Dbzstore will fail unless the database files
are writable.
Dbzexists verifies whether the given hash exists in the database. Dbz is
optimized for this operation, and it may be significantly faster than
dbzfetch().
Dbzfresh is a variant of
dbzinit for creating a new database with
more control over details.
Dbzfresh's
size parameter specifies the size of the first hash
table within the database, in key-value pairs. Performance will be best if the
number of key-value pairs stored in the database does not exceed about 2/3 of
size. (The
dbzsize function, given the expected number of
key-value pairs, will suggest a database size that meets these criteria.)
Assuming that an
fseek offset is 4 bytes, the
.index file will
be
4 * size bytes. The
.hash file will be
DBZ_INTERNAL_HASH_SIZE * size bytes (the
.dir file is tiny and
roughly constant in size) until the number of key-value pairs exceeds about
80% of
size. (Nothing awful will happen if the database grows beyond
100% of
size, but accesses will slow down quite a bit and the
.index and
.hash files will grow somewhat.)
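The sizing guidance above can be restated as a little arithmetic. The sketch below is only an illustration: min_table_size() is a hypothetical helper that applies the 2/3 guideline directly and is not the algorithm used by dbzsize, the 4-byte offset width is the assumption made in the text, and the value 6 used for the per-entry hash width is an assumed stand-in for DBZ_INTERNAL_HASH_SIZE, which should be taken from <inn/dbz.h> in a real build.

```c
#include <stdio.h>

/* Assumed per-entry hash width, for illustration only.  The real value is
 * DBZ_INTERNAL_HASH_SIZE from <inn/dbz.h> and may differ. */
#define EXAMPLE_HASH_SIZE 6

/* Hypothetical helper: smallest table size that keeps nentries at or below
 * 2/3 of the table.  This is NOT dbzsize(); it only restates the guideline. */
long min_table_size(long nentries)
{
    if (nentries <= 0)
        return 1;
    return (nentries * 3 + 1) / 2;      /* ceil(nentries * 3 / 2) */
}

/* Expected .index size in bytes for a given stored-offset width. */
long index_file_bytes(long size, long offset_bytes)
{
    return offset_bytes * size;
}

/* Expected .hash size in bytes for a given per-entry hash width. */
long hash_file_bytes(long size, long hash_bytes)
{
    return hash_bytes * size;
}

/* Print a rough estimate for a database expected to hold nentries pairs. */
void print_estimate(long nentries)
{
    long size = min_table_size(nentries);

    printf("entries: %ld, table size: %ld\n", nentries, size);
    printf(".index: %ld bytes\n", index_file_bytes(size, 4));
    printf(".hash:  %ld bytes\n", hash_file_bytes(size, EXAMPLE_HASH_SIZE));
}
```

Calling print_estimate(1000) from a main() function would report a table size of 1500 under these assumptions. In real code, the size passed to dbzfresh should come from dbzsize rather than from this helper.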
Dbz stores up to
DBZ_INTERNAL_HASH_SIZE bytes of the
message-id's hash in the
.hash file to confirm a hit. This eliminates
the need to read the base file to handle collisions. This replaces the tagmask
feature in previous dbz releases.
A
size of ``0'' given to
dbzfresh is synonymous with the local
default; the normal default is suitable for tables of 5,000,000 key-value
pairs. Calling
dbzinit(name) with the empty name is equivalent to
calling
dbzfresh(name, 0).
When databases are regenerated periodically, as in news, it is simplest to pick
the parameters for a new database based on the old one. This also permits some
memory of past sizes of the old database, so that a new database size can be
chosen to cover expected fluctuations.
Dbzagain is a variant of
dbzinit for creating a new database as a new generation of an old
database. The database files for
oldbase must exist.
Dbzagain is
equivalent to calling
dbzfresh with a
size equal to the result
of applying
dbzsize to the largest number of entries in the
oldbase database and its previous 10 generations.
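The selection rule described for dbzagain can be sketched as follows. This is a minimal illustration and not the library's code: the counts array is a hypothetical record of entry counts for the old database and its previous generations (how dbz actually records them is not shown here), and example_size_for() is a stand-in for dbzsize that simply applies the 2/3 guideline mentioned earlier.

```c
#include <stddef.h>

/* Largest value in counts[0..n-1]; returns 0 for an empty history. */
long largest_entry_count(const long *counts, size_t n)
{
    long best = 0;

    for (size_t i = 0; i < n; i++) {
        if (counts[i] > best)
            best = counts[i];
    }
    return best;
}

/* Stand-in for dbzsize(), assumed for illustration: a table size that keeps
 * nentries at or below 2/3 of the table. */
long example_size_for(long nentries)
{
    if (nentries <= 0)
        return 1;
    return (nentries * 3 + 1) / 2;
}

/* Size for the next generation: apply the size function to the largest
 * entry count seen across the recorded generations. */
long next_generation_size(const long *counts, size_t n)
{
    return example_size_for(largest_entry_count(counts, n));
}
```

Basing the size on the largest recent count, rather than only on the most recent one, is what lets the new database absorb the expected fluctuations without being rebuilt.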
When many accesses are being done by the same program,
dbz is massively
faster if its first hash table is in memory. If the ``pag_incore'' flag is set
to INCORE_MEM, an attempt is made to read the table in when the database is
opened, and
dbzclose writes it out to disk again (if it was read
successfully and has been modified).
Dbzsetoptions can be used to set the pag_incore and exists_incore flags to
new values, each of which should be ``INCORE_NO'', ``INCORE_MEM'', or
``INCORE_MMAP''; the flags control the .hash and .index files separately.
Setting them does not affect the status of a database that has already been
opened. The default is ``INCORE_NO'' for the .index file and ``INCORE_MMAP''
for the .hash file. The attempt to read the table in may fail due to memory
shortage; in this case dbz fails with an error.
Stores to an in-memory database are not (in
general) written out to the file until
dbzclose or
dbzsync, so
if robustness in the presence of crashes or concurrent accesses is crucial,
in-memory databases should probably be avoided or the
writethrough
option should be set to ``true''.
If the
nonblock option is ``true'', then writes to the
.hash and
.index files will be done using non-blocking I/O. This can be
significantly faster if your platform supports non-blocking I/O with files.
Dbzsync causes all buffers etc. to be flushed out to the files. It is
typically used as a precaution against crashes or concurrent accesses when a
dbz-using process will be running for a long time. It is a somewhat
expensive operation, especially for an in-memory database.
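Because dbzsync is expensive, a long-running process might flush only after every so many stores. The sketch below shows one such policy under stated assumptions: struct sync_policy and note_store() are hypothetical names invented for this example, and the flush routine is passed in as a function pointer whose type matches the bool dbzsync(void) prototype in the SYNOPSIS. A counting stub, fake_sync(), stands in for dbzsync so that the helper can be tried without linking against libinn.

```c
#include <stdbool.h>

/* Same shape as dbzsync() in the SYNOPSIS: bool f(void). */
typedef bool (*sync_fn)(void);

/* Hypothetical bookkeeping for "flush every N stores". */
struct sync_policy {
    long interval;      /* flush after this many stores; <= 0 disables */
    long pending;       /* stores since the last successful flush */
};

/* Record one successful store and flush when the interval is reached.
 * Returns false only if a flush was attempted and failed. */
bool note_store(struct sync_policy *p, sync_fn do_sync)
{
    p->pending++;
    if (p->interval <= 0 || p->pending < p->interval)
        return true;
    if (!do_sync())
        return false;
    p->pending = 0;
    return true;
}

/* Stub used in place of dbzsync() for experimentation. */
int fake_sync_calls = 0;

bool fake_sync(void)
{
    fake_sync_calls++;
    return true;
}
```

In a program that uses dbz, note_store(&policy, dbzsync) would be called after each successful dbzstore. A smaller interval narrows the window in which a crash can lose stores, at the cost of more frequent flushes.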
Concurrent reading of databases is fairly safe, but there is no (inter)locking,
so concurrent updating is not.
An open database occupies three stdio streams and two file descriptors.
Apart from stdio buffers, memory consumption is negligible except for
in-memory databases.
SEE ALSO
dbm(3),
history(5),
libinn(3)
DIAGNOSTICS
Functions returning bool values return ``true'' for success and ``false''
for failure. Functions returning off_t values return -1 for failure.
Dbzinit attempts to have
errno set
plausibly on return, but otherwise this is not guaranteed. An
errno of
EDOM from
dbzinit indicates that the database did not appear to
be in
dbz format.
If
DBZTEST is defined at compile-time then a
main()
function will be included. It runs performance and integrity tests.
HISTORY
The original
dbz was written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us).
Later contributions by David Butler and Mark Moraes. Extensive reworking,
including this documentation, by Henry Spencer (henry@zoo.toronto.edu) as part
of the C News project. MD5 code borrowed from RSA. Extensive reworking to
remove backwards compatibility and to add hashes into dbz files by Clayton
O'Neill (coneill@oneill.net).
BUGS
Unlike
dbm,
dbz will refuse to
dbzstore with a key already
in the database. The user is responsible for avoiding this.
The RFC 5322 case mapper implements only a first approximation to the
hideously complex RFC 5322 case rules.
Dbz no longer tries to be call-compatible with
dbm in any
way.