BULK_LOADER(1)
NAME
bulk_loader - PgQ consumer that loads urlencoded records to slow databases
SYNOPSIS
bulk_loader.py [switches] config.ini
DESCRIPTION
bulk_loader is a PgQ consumer that reads URL-encoded records from a source queue and writes them into tables according to its configuration file. It is targeted at slow databases that cannot handle applying each row as a separate statement. It was originally written for BizgresMPP/GreenplumDB, which have very high per-statement overhead, but it can also be used to load a regular PostgreSQL database that cannot manage regular replication.
QUICK-START
Basic bulk_loader setup and usage can be summarized by the following steps:
1. pgq and logutriga must be installed in the source databases. See the pgqadm man page for details. The target database must also have the pgq_ext schema.
2. Edit a bulk_loader configuration file, say bulk_loader_sample.ini.
3. Create the source queue:
$ pgqadm.py ticker.ini create <queue>
4. Tune the source queue to have big batches:
$ pgqadm.py ticker.ini config <queue> ticker_max_count="10000" ticker_max_lag="10 minutes" ticker_idle_period="10 minutes"
5. Create the target database and the tables in it.
6. Launch bulk_loader in daemon mode:
$ bulk_loader.py -d bulk_loader_sample.ini
7. Start producing events (create logutriga triggers on tables):
CREATE TRIGGER trig_bulk_replica AFTER INSERT OR UPDATE ON some_table FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('<queue>');
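Step 7 has to be repeated for every table that should be replicated. The following is a minimal Python sketch that generates that trigger DDL for a list of tables. The helper names logutriga_ddl and quote_literal are hypothetical, and table names are assumed to already be valid (optionally schema-qualified) SQL identifiers, because they are inserted verbatim.

```python
def quote_literal(value: str) -> str:
    """Quote a string as an SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def logutriga_ddl(tables, queue, trigger_name="trig_bulk_replica"):
    """Return one CREATE TRIGGER statement per table (hypothetical helper).

    Table names are inserted verbatim, so they must already be valid,
    properly quoted SQL identifiers.
    """
    return [
        f"CREATE TRIGGER {trigger_name} AFTER INSERT OR UPDATE ON {table} "
        f"FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga({quote_literal(queue)});"
        for table in tables
    ]


if __name__ == "__main__":
    # Print the DDL for two example tables; pipe the output into psql.
    for stmt in logutriga_ddl(["public.orders", "public.customers"], "bulk_queue"):
        print(stmt)
```

Trigger names are scoped per table in PostgreSQL, so the same trigger name can be reused on every table.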
CONFIG
Common configuration parameters
job_name
Name for the particular job the script does. The script will log under this name to logdb/logserver. The name is also used as the default for the PgQ consumer name. It should be unique.
pidfile
Location for the pid file. If not given, the script is not allowed to daemonize.
logfile
Location for log file.
loop_delay
If running as a continuous process, how long to sleep after each work loop, in seconds. Default: 1.
connection_lifetime
Close and reconnect database connections that are older than this many seconds.
log_count
Number of log files to keep. Default: 3
log_size
Maximum size of one log file, in bytes. The file is rotated when this size is reached. Default: 10485760 (10M)
use_skylog
If set, search for ./skylog.ini, ~/.skylog.ini and /etc/skylog.ini. If one of them is found, it is used as the config file for Python's logging module, which allows a fully customizable logging setup.
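The logging parameters above could be wired up roughly as follows. This is a minimal sketch using Python's standard logging module, not the skytools implementation. It assumes that log_size and log_count map onto RotatingFileHandler's maxBytes and backupCount, and the function names find_skylog_config and setup_logging are hypothetical.

```python
import logging
import logging.config
import logging.handlers
import os

# Search order for use_skylog, as listed in the option description.
SKYLOG_CANDIDATES = ["./skylog.ini", "~/.skylog.ini", "/etc/skylog.ini"]


def find_skylog_config():
    """Return the first existing skylog.ini candidate, or None."""
    for candidate in SKYLOG_CANDIDATES:
        path = os.path.expanduser(candidate)
        if os.path.isfile(path):
            return path
    return None


def setup_logging(job_name, logfile, use_skylog=False,
                  log_size=10485760, log_count=3):
    """Configure logging for a job (hypothetical helper)."""
    if use_skylog:
        path = find_skylog_config()
        if path is not None:
            # Hand the whole logging setup over to the config file.
            logging.config.fileConfig(path)
            return logging.getLogger(job_name)
    # Assumed mapping: rotate once the file reaches log_size bytes
    # and keep log_count rotated files.
    logger = logging.getLogger(job_name)
    handler = logging.handlers.RotatingFileHandler(
        logfile, maxBytes=log_size, backupCount=log_count)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```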
Common PgQ consumer parameters
pgq_queue_name
Queue name to attach to. No default.
pgq_consumer_id
Consumer ID to use when registering. Default: %(job_name)s
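The %(job_name)s default uses the same interpolation syntax as Python's configparser, so a configuration file can be inspected with the standard library. This is a minimal sketch. The section name [bulk_loader], the option values and the helper load_job_config are illustrative assumptions. The documented defaults are supplied through configparser's DEFAULT mechanism, which interpolates them inside the section, so %(job_name)s resolves to that section's job_name.

```python
import configparser

# Illustrative config; the section name and values are assumptions.
SAMPLE_INI = """
[bulk_loader]
job_name = bulk_loader_sample
src_db = dbname=sourcedb
dst_db = dbname=targetdb
pgq_queue_name = bulk_queue
logfile = /tmp/%(job_name)s.log
pidfile = /tmp/%(job_name)s.pid
"""

# Documented defaults; values given in the section override them.
DEFAULTS = {
    "pgq_consumer_id": "%(job_name)s",
    "loop_delay": "1",
    "log_count": "3",
    "log_size": "10485760",
}


def load_job_config(text, section="bulk_loader"):
    """Parse an ini string and apply the documented defaults (hypothetical helper)."""
    parser = configparser.ConfigParser(defaults=DEFAULTS)
    parser.read_string(text)
    cf = parser[section]
    return {
        "job_name": cf["job_name"],
        "pgq_queue_name": cf["pgq_queue_name"],  # no default: KeyError if missing
        "pgq_consumer_id": cf["pgq_consumer_id"],
        "loop_delay": cf.getfloat("loop_delay"),
        "log_count": cf.getint("log_count"),
        "log_size": cf.getint("log_size"),
        "logfile": cf["logfile"],
    }
```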
Config options specific to bulk_loader
src_db
Connect string for the source database where the queue resides.
dst_db
Connect string for the target database where the tables should be created.
remap_tables
Optional parameter for table redirection. Contains a comma-separated list of <oldname>:<newname> pairs, e.g. oldtable1:newtable1, oldtable2:newtable2.
load_method
Optional parameter for load method selection.
Available options:
0
UPDATE as UPDATE from temp table. This is the default.
1
UPDATE as DELETE+COPY from temp table.
2
Merge INSERTs with UPDATEs, then do DELETE+COPY from temp table.
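The following Python sketch makes remap_tables and the three load_method strategies more concrete. It resolves the target table through a remap_tables string and builds set-based SQL for rows that have already been staged in a temporary table. The helper names, the temp table name and the exact statements are illustrative assumptions, not the statements bulk_loader itself issues. The "COPY from temp table" step is written here as INSERT ... SELECT.

```python
def parse_remap(value):
    """Turn 'old1:new1, old2:new2' into {'old1': 'new1', 'old2': 'new2'}."""
    mapping = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        old, new = pair.split(":", 1)
        mapping[old.strip()] = new.strip()
    return mapping


def split_batch(load_method, inserts, updates):
    """Return (rows COPYed straight into the table, rows staged in the temp table)."""
    if load_method == 2:
        # Method 2 merges INSERTs with UPDATEs: everything goes via the temp table.
        return [], inserts + updates
    return inserts, updates


def apply_staged_sql(load_method, table, temp, pkey, cols):
    """Hypothetical SQL applying the rows staged in `temp` to `table`."""
    key_match = " AND ".join(f"{table}.{k} = {temp}.{k}" for k in pkey)
    if load_method == 0:
        # UPDATE as UPDATE from the temp table.
        sets = ", ".join(f"{c} = {temp}.{c}" for c in cols if c not in pkey)
        if not sets:
            return []  # only key columns: nothing to update
        return [f"UPDATE {table} SET {sets} FROM {temp} WHERE {key_match}"]
    if load_method in (1, 2):
        # Delete the old row versions, then re-insert the new ones from temp.
        col_list = ", ".join(cols)
        return [
            f"DELETE FROM {table} USING {temp} WHERE {key_match}",
            f"INSERT INTO {table} ({col_list}) SELECT {col_list} FROM {temp}",
        ]
    raise ValueError(f"unknown load_method: {load_method}")


if __name__ == "__main__":
    remap = parse_remap("oldtable1:newtable1, oldtable2:newtable2")
    target = remap.get("oldtable1", "oldtable1")  # resolves to newtable1
    for stmt in apply_staged_sql(1, target, "tmp_load", ["id"], ["id", "name"]):
        print(stmt)
```

Whatever the method, each table receives a few set-based statements per batch instead of one statement per row. This is what makes the approach suitable for databases with high per-statement overhead.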
LOGUTRIGA EVENT FORMAT
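The PgQ trigger function pgq.logutriga() sends a table change event into the queue in the following format:
ev_type
Operation type (I, U or D for insert, update or delete), followed by a colon and a comma-separated list of the table's primary key columns. Example: I:id
ev_data
Urlencoded record of data. It uses db-specific urlencoding where the existence of = is meaningful: a missing = means NULL, a present = means a literal value. Example:
id=3&name=str&nullvalue&emptyvalue=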
ev_extra1
Fully qualified table name.
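A consumer has to decode the urlencoded record while keeping the NULL rule described above. This is a minimal Python sketch. It assumes that values are escaped like standard form encoding (%XX and +), which urllib.parse.unquote_plus reverses. The helper names decode_record and encode_record are hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus


def decode_record(data):
    """Decode 'a=1&b&c=' into {'a': '1', 'b': None, 'c': ''}."""
    row = {}
    if not data:
        return row
    for item in data.split("&"):
        if "=" in item:
            # '=' present: literal value (possibly the empty string).
            key, value = item.split("=", 1)
            row[unquote_plus(key)] = unquote_plus(value)
        else:
            # No '=': the column is SQL NULL.
            row[unquote_plus(item)] = None
    return row


def encode_record(row):
    """Inverse of decode_record: NULL values are written without '='."""
    parts = []
    for key, value in row.items():
        if value is None:
            parts.append(quote_plus(key))
        else:
            parts.append(quote_plus(key) + "=" + quote_plus(value))
    return "&".join(parts)
```

Keeping None and "" distinct is the point of the rule: both an SQL NULL and an empty string survive the round trip.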
COMMAND LINE SWITCHES
The following switches are common to all skytools.DBScript-based Python programs.
-h, --help
show help message and exit
-q, --quiet
make program silent
-v, --verbose
make program more verbose
-d, --daemon
make program go to background
-r, --reload
reload config (send SIGHUP)
-s, --stop
stop program safely (send SIGINT)
-k, --kill
kill program immediately (send SIGTERM)
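The reload, stop and kill switches act on the already running daemon by sending it a signal. This is a minimal POSIX-only sketch of the same idea. It assumes that the daemon's process ID is the only content of the file configured as pidfile. The helper name signal_daemon and the action labels are hypothetical.

```python
import os
import signal

# Signal sent for each control switch, as listed above.
CONTROL_SIGNALS = {
    "reload": signal.SIGHUP,  # reload config
    "stop": signal.SIGINT,    # stop safely
    "kill": signal.SIGTERM,   # kill immediately
}


def signal_daemon(pidfile, action):
    """Send the signal for `action` to the PID stored in `pidfile`; return the PID."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, CONTROL_SIGNALS[action])
    return pid
```

This is also why daemon mode requires a pidfile: without it, there is no way to find the process to signal.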
03/13/2012