NAME¶
ovdb - Overview storage method for INN
DESCRIPTION¶
Ovdb is a storage method that uses the Berkeley DB library to store
overview data. It requires version 4.4 or later of the Berkeley DB
library (4.7+ is recommended because older versions suffer from various
issues).
Ovdb makes use of the full transaction/logging/locking functionality of the
Berkeley DB environment. Berkeley DB may be downloaded from
<http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index.html>
and is needed to build the ovdb backend.
UPGRADING¶
This is version 2 of ovdb. If you have a database created with a previous
version of ovdb (such as the one shipped with INN 2.3.0) your database
will need to be upgraded using ovdb_init(8); see its man page for upgrade
instructions.
INSTALLATION¶
To build ovdb support into INN, specify the option
--with-berkeleydb when
running the configure script. By default, configure will search for
Berkeley DB in standard paths; there will be a message in the configure
output indicating the pathname that will be used.
You can override this pathname by adding a path to the option, e.g.,
--with-berkeleydb=/usr/BerkeleyDB.4.4. This directory is expected to
have subdirectories include and lib, containing db.h and the library
itself, respectively.
The ovdb database may take up more disk space for a given spool than the other
overview methods. Plan on needing at least 1.1 KB for every article in
your spool (not counting crossposts). So, if you have 5 million articles,
you'll need at least 5.5 GB of disk space for ovdb. With compression
enabled, this estimate changes to 0.7 KB per article. See the
COMPRESSION section below. Plus, you'll need additional space for transaction
logs: at least 100 MB. By default the transaction logs go in the same
directory as the database. To improve performance, they can be placed on a
different disk -- see the DB_CONFIG section.
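The sizing rule above is simple arithmetic. As a rough sketch (the function name and the use of decimal units, where 1 GB = 1,000,000 KB, are assumptions made for illustration), it could be computed like this:

```python
def estimate_ovdb_disk_gb(articles, compress=False, log_mb=100):
    """Rough lower bound on ovdb disk usage, in GB (decimal units).

    Uses the per-article figures quoted above: 1.1 KB uncompressed,
    0.7 KB with compression, plus at least 100 MB of transaction logs.
    """
    kb_per_article = 0.7 if compress else 1.1
    data_gb = articles * kb_per_article / 1_000_000  # KB -> GB
    logs_gb = log_mb / 1_000                         # MB -> GB
    return data_gb + logs_gb
```

For the 5 million article example, this gives about 5.5 GB of overview data plus 0.1 GB of transaction logs. Treat the result as a minimum rather than a prediction.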
CONFIGURATION¶
To enable ovdb, set the ovmethod parameter in inn.conf to "ovdb". The ovdb
database is stored in the directory specified by the pathoverview parameter
in inn.conf. This is the "DB_HOME" directory. To start out, this directory
should be empty (other than an optional DB_CONFIG file; see DB_CONFIG for
details) and innd (or makehistory) will create the files as necessary in
that directory. Make sure the directory is owned by the news user.
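Before the first start, the two preconditions above (an empty directory apart from DB_CONFIG, and ownership by the news user) can be checked by hand or with a small script. The following is only a sketch; the helper names are hypothetical and are not part of INN:

```python
import os
import pwd


def unexpected_db_home_entries(db_home):
    """Return the names in db_home other than the optional DB_CONFIG file."""
    return sorted(name for name in os.listdir(db_home) if name != "DB_CONFIG")


def is_owned_by(path, username):
    """Return True if path is owned by the given user (Unix only)."""
    return os.stat(path).st_uid == pwd.getpwnam(username).pw_uid
```

A fresh pathoverview directory should give an empty list from the first function, and the second should return True when called with the news user's name.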
Other parameters for configuring ovdb are in the
ovdb.conf(5)
configuration file. See also the sample
ovdb.conf.
- cachesize
- Size of the memory pool cache, in kilobytes. The cache will have a backing
store file in the DB directory which will be at least as big. In general,
the bigger the cache, the better. Use "ovdb_stat -m" to see
cache hit percentages. To make a change of this parameter take effect,
shut down and restart INN (be sure to kill all of the nnrpds when shutting
down). Default is 8000, which is adequate for small to medium sized
servers. Large servers will probably need at least 20000.
- compress
- If INN was compiled with zlib, and this compress parameter is true, OVDB
will compress overview records that are longer than 600 bytes. See the
COMPRESSION section below.
- numdbfiles
- Overview data is split between this many files. Currently, innd
will keep all of the files open, so don't set this too high or innd
may run out of file descriptors. nnrpd only opens one at a time,
regardless. It may be set to one, or just a few, but only if your OS
supports large (>2 GB) files. Changing this parameter has no effect on an
already-established database. Default is 32.
- txn_nosync
- If txn_nosync is set to false, Berkeley DB flushes the log after
every transaction. This minimizes the number of transactions that may be
lost in the event of a crash, but results in significantly degraded
performance. Default is true.
- useshm
- If useshm is set to true, Berkeley DB will use shared memory
instead of mmap for its environment regions (cache, lock, etc). With some
platforms, this may improve performance. Default is false.
- shmkey
- Sets the shared memory key used by Berkeley DB when 'useshm' is
true. Berkeley DB will create several (usually 5) shared memory
segments, using sequentially numbered keys starting with 'shmkey'. Choose
a key that does not conflict with any existing shared memory segments on
your system. Default is 6400.
- pagesize
- Sets the page size for the DB files (in bytes). Must be a power of 2. Best
choices are 4096 or 8192. The default is 8192. Changing this parameter has
no effect on an already-established database.
- minkey
- Sets the minimum number of keys per page. See the Berkeley DB
documentation for more info. Default is based on page size and whether
compression is enabled:
default_minkey = MAX(2, pagesize / 2600) if compress is false
default_minkey = MAX(2, pagesize / 1500) if compress is true
The lowest allowed minkey is 2. Setting minkey higher than the default is
not recommended, as it will cause the databases to have a lot of overflow
pages. Changing this parameter has no effect on an already-established
database.
- maxlocks
- Sets the Berkeley DB "lk_max" parameter, which is the
maximum number of locks that can exist in the database at the same time.
Default is 4000.
- nocompact
- The nocompact parameter affects expireover's behavior. The expireover
function in ovdb can do its job in one of two ways: by simply deleting
expired records from the database, or by re-writing the overview records
into a different location leaving out the expired records. The first
method is faster, but it leaves 'holes' that result in space that cannot
immediately be reused. The second method 'compacts' the records by
rewriting them.
If this parameter is set to 0, expireover will compact all newsgroups; if
set to 1, expireover will not compact any newsgroups; and if set to a
value greater than one, expireover will only compact groups that have fewer
than that number of articles.
Experience has shown that compacting has minimal effect (other than making
expireover take longer) so the default is now 1. This parameter will
probably be removed in the future.
- readserver
- Normally, each nnrpd process directly accesses the Berkeley DB
environment. The process of attaching to the database (and detaching when
finished) is fairly expensive, and can result in high loads in situations
when there are lots of reader connections of relatively short duration.
When the readserver parameter is true, the nnrpds will access
overview via a helper server (ovdb_server -- which is
started by ovdb_init). This can also result in cleaner shutdowns
for the database, improving stability and avoiding deadlocks and corrupted
databases. If you are experiencing any instability in ovdb, try setting
this parameter to true. Default is false.
- numrsprocs
- This parameter is only used when readserver is true. It sets the
number of ovdb_server processes. As each ovdb_server can process only one
transaction at a time, running more servers can improve reader response
times. Default is 5.
- maxrsconn
- This parameter is only used when readserver is true. It sets a
maximum number of readers that a given ovdb_server process will serve at
one time. This means the maximum number of readers for all of the
ovdb_server processes is (numrsprocs * maxrsconn). This does not
limit the actual number of readers, since nnrpd will fall back to opening
the database directly if it can't connect to a readserver. Default is 0,
which means an unlimited number of connections is allowed.
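Three of the parameters above (minkey, nocompact, and the numrsprocs/maxrsconn pair) follow small arithmetic rules. The sketch below restates those rules in Python; the function names are hypothetical, and the use of integer division in the minkey formula is an assumption, since the text above does not say how the quotient is rounded:

```python
def default_minkey(pagesize, compress):
    """Default minkey per the formula above (integer division is assumed)."""
    divisor = 1500 if compress else 2600
    return max(2, pagesize // divisor)


def should_compact(nocompact, article_count):
    """Whether expireover would compact a group with this many articles."""
    if nocompact == 0:
        return True   # compact every newsgroup
    if nocompact == 1:
        return False  # never compact
    # Values above 1: compact only groups with fewer articles than that.
    return article_count < nocompact


def max_readserver_clients(numrsprocs, maxrsconn):
    """Readers accepted by all ovdb_server processes; None means unlimited."""
    if maxrsconn == 0:
        return None
    return numrsprocs * maxrsconn
```

Under the integer-division assumption, the default page size of 8192 yields a default minkey of 3 without compression and 5 with it. Note that max_readserver_clients describes only what the ovdb_server processes accept; as stated above, nnrpd falls back to opening the database directly when it cannot connect.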
COMPRESSION¶
New in this version of OVDB is the ability to compress overview data before it
is stored into the database. In addition to consuming less disk space,
compression keeps the average size of the database keys smaller. This in turn
increases the average number of keys per page, which can significantly improve
performance and also helps keep the database more compact. This feature
requires that INN be built with zlib. Only records larger than 600 bytes get
compressed, because that is the point at which compression starts to become
significant.
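To illustrate the 600-byte threshold, here is a minimal sketch using Python's zlib module. It does not reproduce ovdb's actual record format; the (flag, payload) pair and the function names are assumptions chosen only to show the idea of compressing records above a size cutoff:

```python
import zlib

COMPRESS_THRESHOLD = 600  # bytes; the cutoff quoted above


def maybe_compress(record: bytes):
    """Return (compressed, payload); only records over the cutoff are compressed."""
    if len(record) > COMPRESS_THRESHOLD:
        return True, zlib.compress(record)
    return False, record


def restore(compressed: bool, payload: bytes) -> bytes:
    """Undo maybe_compress()."""
    return zlib.decompress(payload) if compressed else payload
```

Short records pass through unchanged, which avoids paying the compression overhead where it would save little.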
If compression is not enabled (either because the "compress" option in
ovdb.conf is off or because INN was not built with zlib), the database will be
backward compatible with older versions of OVDB. However, if compression is
enabled, the database is marked with a newer version that will prevent older
versions of OVDB from opening the database.
You can upgrade an existing database to use compression simply by setting
compress to true in
ovdb.conf. Note that existing records in the
database will remain uncompressed; only new records added after enabling
compression will be compressed.
If you disable compression on a database that previously had it enabled, new
records will be stored uncompressed, but the database will still be
incompatible with older versions of OVDB (and will also be incompatible with
this version of OVDB if it was not built with zlib). So to downgrade to a
completely uncompressed database you will have to rebuild the database using
makehistory.
DB_CONFIG¶
A file called
DB_CONFIG may be placed in the database directory to
customize where the various database files and transaction logs are written.
By default, all of the files are written in the "DB_HOME" directory.
One way to improve performance is to put the transaction logs on a different
disk. To do this, put:
DB_LOG_DIR /path/to/logs
in the
DB_CONFIG file. If the pathname you give starts with a /, it is
treated as an absolute path; otherwise, it is relative to the
"DB_HOME" directory. Make sure that any directories you specify
exist and have proper ownership/mode before starting INN, because they won't
be created automatically. Also, don't change the DB_CONFIG file while anything
that uses ovdb is running.
Another thing that you can do with this file is to split the overview database
across multiple disks. In the
DB_CONFIG file, you can list directories
that Berkeley DB will search when it goes to open a database.
For example, let's say that you have
pathoverview set to
/mnt/overview and you have four additional file systems created on
/mnt/ov?. You would create a file "/mnt/overview/DB_CONFIG"
containing the following lines:
set_data_dir /mnt/overview
set_data_dir /mnt/ov1
set_data_dir /mnt/ov2
set_data_dir /mnt/ov3
set_data_dir /mnt/ov4
Distribute your ovNNNNN files into the four file systems (say, 8 each). When
called upon to open a database file, the db library will look for it in each
of the specified directories (in order). If said file is not found, one will
be created in the first of those directories.
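The lookup order described above can be modelled in a few lines. This is only an illustration of the search rule as stated here, not Berkeley DB code; the function name is hypothetical, and the function reports a path without creating anything:

```python
import os


def locate_db_file(name, data_dirs):
    """Return the path where a database file would be opened or created.

    Directories are searched in the order listed; if the file exists in
    none of them, the path in the first directory is returned.
    """
    for directory in data_dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return os.path.join(data_dirs[0], name)
```

With the example layout, data_dirs would be the five directories in the order given in DB_CONFIG, so a file that has not been placed anywhere ends up in /mnt/overview.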
Whenever you change DB_CONFIG or move database files around, make sure all news
processes that use the database are shut down first (including nnrpds).
The DB_CONFIG functionality is part of Berkeley DB itself, rather than
something provided by ovdb. See the Berkeley DB documentation for
complete details for the version of Berkeley DB that you're running.
RUNNING¶
When starting the news system, rc.news will invoke ovdb_init. ovdb_init
must be run before using the database. It performs the following tasks:
- Creates the database environment, if necessary.
- If the database is idle, it performs a normal recovery. The recovery will
remove stale locks, recreate the memory pool cache, and repair any damage
caused by a system crash or improper shutdown.
- Starts the DB housekeeping processes (ovdb_monitor) if they're not
already running.
And when stopping INN,
rc.news kills the ovdb_monitor processes after the
other INN processes have been shut down.
DIAGNOSTICS¶
Problems relating to ovdb are logged to news.err with "OVDB" in the
error message.
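Since the relevant entries are marked with "OVDB", they are easy to pick out of the log. A minimal sketch (the function name is hypothetical, and the location of news.err depends on your installation):

```python
def ovdb_log_lines(lines):
    """Yield only the log lines that mention OVDB, without trailing newlines."""
    for line in lines:
        if "OVDB" in line:
            yield line.rstrip("\n")
```

It accepts any iterable of lines, so an open file object for news.err can be passed directly.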
INN programs that use overview will fail to start up if the ovdb_monitor
processes aren't running. Be sure to run
ovdb_init before running
anything that accesses overview.
Also, INN programs that use overview will fail to start up if the user running
them is not the "news" user.
If a program accessing the database crashes, or otherwise exits uncleanly, it
might leave a stale lock in the database. This lock could cause other
processes to deadlock on that stale lock. To fix this, shut down all news
processes (using "kill -9" if necessary) and then restart.
ovdb_init should perform a recovery operation which will remove the
locks and repair damage caused by killing the deadlocked processes.
FILES¶
- inn.conf
- The ovmethod and pathoverview parameters are relevant to
ovdb.
- ovdb.conf
- Optional configuration file for tuning. See CONFIGURATION above.
- pathoverview
- Directory where the database goes. Berkeley DB calls it the
'DB_HOME' directory.
- pathoverview/DB_CONFIG
- Optional file to configure the layout of the database files.
- pathrun/ovdb.sem
- A file that gets locked by every process that is accessing the database.
This is used by ovdb_init to determine whether the database is
active or quiescent.
- pathrun/ovdb_monitor.pid
- Contains the process ID of ovdb_monitor.
TO DO¶
Implement a way to limit how many databases can be open at once (to reduce file
descriptor usage), perhaps using something similar to the cache code in ov3.c.
HISTORY¶
Written by Heath Kehoe <hakehoe@avalon.net> for InterNetNews.
$Id: ovdb.pod 9577 2013-12-06 03:54:44Z eagle $
SEE ALSO¶
inn.conf(5),
innd(8),
nnrpd(8),
ovdb_init(8),
ovdb_monitor(8),
ovdb_stat(8)
Berkeley DB documentation: in the
docs directory of the
Berkeley DB source distribution, or on the Oracle Berkeley DB
web page:
<http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index.html>.