'\" t .\" Title: bulk_loader .\" Author: [FIXME: author] [see http://docbook.sf.net/el/author] .\" Generator: DocBook XSL Stylesheets v1.75.2 .\" Date: 03/13/2012 .\" Manual: \ \& .\" Source: \ \& .\" Language: English .\" .TH "BULK_LOADER" "1" "03/13/2012" "\ \&" "\ \&" .\" ----------------------------------------------------------------- .\" * Define some portability stuff .\" ----------------------------------------------------------------- .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .\" http://bugs.debian.org/507673 .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" ----------------------------------------------------------------- .\" * set default formatting .\" ----------------------------------------------------------------- .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .\" ----------------------------------------------------------------- .\" * MAIN CONTENT STARTS HERE * .\" ----------------------------------------------------------------- .SH "NAME" bulk_loader \- PgQ consumer that loads urlencoded records to slow databases .SH "SYNOPSIS" .sp .nf bulk_loader\&.py [switches] config\&.ini .fi .SH "DESCRIPTION" .sp bulk_loader is PgQ consumer that reads url encoded records from source queue and writes them into tables according to configuration file\&. It is targeted to slow databases that cannot handle applying each row as separate statement\&. Originally written for BizgresMPP/greenplumDB which have very high per\-statement overhead, but can also be used to load regular PostgreSQL database that cannot manage regular replication\&. .sp Behaviour properties: \- reads urlencoded "logutriga" records\&. \- does not do partitioning, but allows optionally redirect table events\&. \- does not keep event order\&. \- always loads data with COPY, either directly to main table (INSERTs) or to temp tables (UPDATE/COPY) then applies from there\&. .sp Events are usually procuded by pgq\&.logutriga()\&. Logutriga adds all the data of the record into the event (also in case of updates and deletes)\&. .SH "QUICK-START" .sp Basic bulk_loader setup and usage can be summarized by the following steps: .sp .RS 4 .ie n \{\ \h'-04' 1.\h'+01'\c .\} .el \{\ .sp -1 .IP " 1." 4.2 .\} pgq and logutriga must be installed in source databases\&. See pgqadm man page for details\&. target database must also have pgq_ext schema\&. .RE .sp .RS 4 .ie n \{\ \h'-04' 2.\h'+01'\c .\} .el \{\ .sp -1 .IP " 2." 4.2 .\} edit a bulk_loader configuration file, say bulk_loader_sample\&.ini .RE .sp .RS 4 .ie n \{\ \h'-04' 3.\h'+01'\c .\} .el \{\ .sp -1 .IP " 3." 4.2 .\} create source queue .sp .if n \{\ .RS 4 .\} .nf $ pgqadm\&.py ticker\&.ini create .fi .if n \{\ .RE .\} .RE .sp .RS 4 .ie n \{\ \h'-04' 4.\h'+01'\c .\} .el \{\ .sp -1 .IP " 4." 4.2 .\} Tune source queue to have big batches: .sp .if n \{\ .RS 4 .\} .nf $ pgqadm\&.py ticker\&.ini config ticker_max_count="10000" ticker_max_lag="10 minutes" ticker_idle_period="10 minutes" .fi .if n \{\ .RE .\} .RE .sp .RS 4 .ie n \{\ \h'-04' 5.\h'+01'\c .\} .el \{\ .sp -1 .IP " 5." 4.2 .\} create target database and tables in it\&. .RE .sp .RS 4 .ie n \{\ \h'-04' 6.\h'+01'\c .\} .el \{\ .sp -1 .IP " 6." 4.2 .\} launch bulk_loader in daemon mode .sp .if n \{\ .RS 4 .\} .nf $ bulk_loader\&.py \-d bulk_loader_sample\&.ini .fi .if n \{\ .RE .\} .RE .sp .RS 4 .ie n \{\ \h'-04' 7.\h'+01'\c .\} .el \{\ .sp -1 .IP " 7." 4.2 .\} start producing events (create logutriga trggers on tables) CREATE OR REPLACE TRIGGER trig_bulk_replica AFTER INSERT OR UPDATE ON some_table FOR EACH ROW EXECUTE PROCEDURE pgq\&.logutriga(\fI\fR) .RE .SH "CONFIG" .SS "Common configuration parameters" .PP job_name .RS 4 Name for particulat job the script does\&. Script will log under this name to logdb/logserver\&. The name is also used as default for PgQ consumer name\&. It should be unique\&. .RE .PP pidfile .RS 4 Location for pid file\&. If not given, script is disallowed to daemonize\&. .RE .PP logfile .RS 4 Location for log file\&. .RE .PP loop_delay .RS 4 If continuisly running process, how long to sleep after each work loop, in seconds\&. Default: 1\&. .RE .PP connection_lifetime .RS 4 Close and reconnect older database connections\&. .RE .PP log_count .RS 4 Number of log files to keep\&. Default: 3 .RE .PP log_size .RS 4 Max size for one log file\&. File is rotated if max size is reached\&. Default: 10485760 (10M) .RE .PP use_skylog .RS 4 If set, search for [\&./skylog\&.ini, ~/\&.skylog\&.ini, /etc/skylog\&.ini]\&. If found then the file is used as config file for Pythons logging module\&. It allows setting up fully customizable logging setup\&. .RE .SS "Common PgQ consumer parameters" .PP pgq_queue_name .RS 4 Queue name to attach to\&. No default\&. .RE .PP pgq_consumer_id .RS 4 Consumers ID to use when registering\&. Default: %(job_name)s .RE .SS "Config options specific to bulk_loader" .PP src_db .RS 4 Connect string for source database where the queue resides\&. .RE .PP dst_db .RS 4 Connect string for target database where the tables should be created\&. .RE .PP remap_tables .RS 4 Optional parameter for table redirection\&. Contains comma\-separated list of : pairs\&. Eg: oldtable1:newtable1, oldtable2:newtable2\&. .RE .PP load_method .RS 4 Optional parameter for load method selection\&. Available options: .RE .PP 0 .RS 4 UPDATE as UPDATE from temp table\&. This is default\&. .RE .PP 1 .RS 4 UPDATE as DELETE+COPY from temp table\&. .RE .PP 2 .RS 4 merge INSERTs with UPDATEs, then do DELETE+COPY from temp table\&. .RE .SH "LOGUTRIGA EVENT FORMAT" .sp PgQ trigger function pgq\&.logutriga() sends table change event into queue in following format: .PP ev_type .RS 4 (op || ":" || pkey_fields)\&. Where op is either "I", "U" or "D", corresponging to insert, update or delete\&. And pkey_fields is comma\-separated list of primary key fields for table\&. Operation type is always present but pkey_fields list can be empty, if table has no primary keys\&. Example: I:col1,col2 .RE .PP ev_data .RS 4 Urlencoded record of data\&. It uses db\-specific urlecoding where existence of \fI=\fR is meaningful \- missing \fI=\fR means NULL, present \fI=\fR means literal value\&. Example: id=3&name=str&nullvalue&emptyvalue= .RE .PP ev_extra1 .RS 4 Fully qualified table name\&. .RE .SH "COMMAND LINE SWITCHES" .sp Following switches are common to all skytools\&.DBScript\-based Python programs\&. .PP \-h, \-\-help .RS 4 show help message and exit .RE .PP \-q, \-\-quiet .RS 4 make program silent .RE .PP \-v, \-\-verbose .RS 4 make program more verbose .RE .PP \-d, \-\-daemon .RS 4 make program go background .RE .sp Following switches are used to control already running process\&. The pidfile is read from config then signal is sent to process id specified there\&. .PP \-r, \-\-reload .RS 4 reload config (send SIGHUP) .RE .PP \-s, \-\-stop .RS 4 stop program safely (send SIGINT) .RE .PP \-k, \-\-kill .RS 4 kill program immidiately (send SIGTERM) .RE