'\" t
.\" Title: cube_dispatcher
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\" Date: 03/13/2012
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "CUBE_DISPATCHER" "1" "03/13/2012" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
cube_dispatcher \- PgQ consumer that is used to write source records into partitoned tables
.SH "SYNOPSIS"
.sp
.nf
cube_dispatcher\&.py [switches] config\&.ini
.fi
.SH "DESCRIPTION"
.sp
cube_dispatcher is PgQ consumer that reads url encoded records from source queue and writes them into partitioned tables according to configuration file\&. Used to prepare data for business intelligence\&. Name of the table is read from producer field in event\&. Batch creation time is used for partitioning\&. All records created in same day will go into same table partion\&. If partiton does not exist cube dispatcer will create it according to template\&.
.sp
Events are usually procuded by pgq\&.logutriga()\&. Logutriga adds all the data of the record into the event (also in case of updates and deletes)\&.
.sp
cube_dispatcher can be used in to modes:
.PP
keep_all
.RS 4
keeps all the data that comes in\&. If record is updated several times during one day then table partiton for that day will contain several instances of that record\&.
.RE
.PP
keep_latest
.RS 4
only last instance of each record is kept for each day\&. That also means that all tables must have primary keys so cube dispatcher can delete previous versions of records before inserting new data\&.
.RE
.SH "QUICK-START"
.sp
Basic cube_dispatcher setup and usage can be summarized by the following steps:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
pgq and logutriga must be installed in source databases\&. See pgqadm man page for details\&. target database must also have pgq_ext schema\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
edit a cube_dispatcher configuration file, say cube_dispatcher_sample\&.ini
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
create source queue
.sp
.if n \{\
.RS 4
.\}
.nf
$ pgqadm\&.py ticker\&.ini create
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
create target database and parent tables in it\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
launch cube dispatcher in daemon mode
.sp
.if n \{\
.RS 4
.\}
.nf
$ cube_dispatcher\&.py cube_dispatcher_sample\&.ini \-d
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 6." 4.2
.\}
start producing events (create logutriga trggers on tables) CREATE OR REPLACE TRIGGER trig_cube_replica AFTER INSERT OR UPDATE ON some_table FOR EACH ROW EXECUTE PROCEDURE pgq\&.logutriga(\fI\fR)
.RE
.SH "CONFIG"
.SS "Common configuration parameters"
.PP
job_name
.RS 4
Name for particulat job the script does\&. Script will log under this name to logdb/logserver\&. The name is also used as default for PgQ consumer name\&. It should be unique\&.
.RE
.PP
pidfile
.RS 4
Location for pid file\&. If not given, script is disallowed to daemonize\&.
.RE
.PP
logfile
.RS 4
Location for log file\&.
.RE
.PP
loop_delay
.RS 4
If continuisly running process, how long to sleep after each work loop, in seconds\&. Default: 1\&.
.RE
.PP
connection_lifetime
.RS 4
Close and reconnect older database connections\&.
.RE
.PP
log_count
.RS 4
Number of log files to keep\&. Default: 3
.RE
.PP
log_size
.RS 4
Max size for one log file\&. File is rotated if max size is reached\&. Default: 10485760 (10M)
.RE
.PP
use_skylog
.RS 4
If set, search for
[\&./skylog\&.ini, ~/\&.skylog\&.ini, /etc/skylog\&.ini]\&. If found then the file is used as config file for Pythons
logging
module\&. It allows setting up fully customizable logging setup\&.
.RE
.SS "Common PgQ consumer parameters"
.PP
pgq_queue_name
.RS 4
Queue name to attach to\&. No default\&.
.RE
.PP
pgq_consumer_id
.RS 4
Consumers ID to use when registering\&. Default: %(job_name)s
.RE
.SS "Config options specific to cube_dispatcher"
.PP
src_db
.RS 4
Connect string for source database where the queue resides\&.
.RE
.PP
dst_db
.RS 4
Connect string for target database where the tables should be created\&.
.RE
.PP
mode
.RS 4
Operation mode for cube_dispatcher\&. Either
keep_all
or
keep_latest\&.
.RE
.PP
dateformat
.RS 4
Optional parameter to specify how to suffix data tables\&. Default is
YYYY_MM_DD
which creates per\-day tables\&. With
YYYY_MM
per\-month tables can be created\&. If explicitly set empty, partitioning is disabled\&.
.RE
.PP
part_template
.RS 4
SQL fragment for table creation\&. Various magic replacements are done there:
.RE
.PP
_PKEY
.RS 4
comma separated list of primery key columns\&.
.RE
.PP
_PARENT
.RS 4
schema\-qualified parent table name\&.
.RE
.PP
_DEST_TABLE
.RS 4
schema\-qualified partition table\&.
.RE
.PP
_SCHEMA_TABLE
.RS 4
same as
\fIDEST_TABLE but dots replaced with "_\fR", to allow use as index names\&.
.RE
.SS "Example config file"
.sp
.if n \{\
.RS 4
.\}
.nf
[cube_dispatcher]
job_name = some_queue_to_cube
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
src_db = dbname=sourcedb_test
dst_db = dbname=dataminedb_test
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
pgq_queue_name = udata\&.some_queue
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
logfile = ~/log/%(job_name)s\&.log
pidfile = ~/pid/%(job_name)s\&.pid
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
# how many rows are kept: keep_latest, keep_all
mode = keep_latest
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
# to_char() fmt for table suffix
#dateformat = YYYY_MM_DD
# following disables table suffixes:
#dateformat =
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
part_template =
create table _DEST_TABLE (like _PARENT);
alter table only _DEST_TABLE add primary key (_PKEY);
.fi
.if n \{\
.RE
.\}
.SH "LOGUTRIGA EVENT FORMAT"
.sp
PgQ trigger function pgq\&.logutriga() sends table change event into queue in following format:
.PP
ev_type
.RS 4
(op || ":" || pkey_fields)\&. Where op is either "I", "U" or "D", corresponging to insert, update or delete\&. And
pkey_fields
is comma\-separated list of primary key fields for table\&. Operation type is always present but pkey_fields list can be empty, if table has no primary keys\&. Example:
I:col1,col2
.RE
.PP
ev_data
.RS 4
Urlencoded record of data\&. It uses db\-specific urlecoding where existence of
\fI=\fR
is meaningful \- missing
\fI=\fR
means NULL, present
\fI=\fR
means literal value\&. Example:
id=3&name=str&nullvalue&emptyvalue=
.RE
.PP
ev_extra1
.RS 4
Fully qualified table name\&.
.RE
.SH "COMMAND LINE SWITCHES"
.sp
Following switches are common to all skytools\&.DBScript\-based Python programs\&.
.PP
\-h, \-\-help
.RS 4
show help message and exit
.RE
.PP
\-q, \-\-quiet
.RS 4
make program silent
.RE
.PP
\-v, \-\-verbose
.RS 4
make program more verbose
.RE
.PP
\-d, \-\-daemon
.RS 4
make program go background
.RE
.sp
Following switches are used to control already running process\&. The pidfile is read from config then signal is sent to process id specified there\&.
.PP
\-r, \-\-reload
.RS 4
reload config (send SIGHUP)
.RE
.PP
\-s, \-\-stop
.RS 4
stop program safely (send SIGINT)
.RE
.PP
\-k, \-\-kill
.RS 4
kill program immidiately (send SIGTERM)
.RE