'\" t
.\"     Title: cube_dispatcher
.\"    Author: [FIXME: author] [see http://docbook.sf.net/el/author]
.\" Generator: DocBook XSL Stylesheets v1.75.2
.\"      Date: 03/13/2012
.\"    Manual: \ \&
.\"    Source: \ \&
.\"  Language: English
.\"
.TH "CUBE_DISPATCHER" "1" "03/13/2012" "\ \&" "\ \&"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
cube_dispatcher \- PgQ consumer that is used to write source records into partitioned tables
.SH "SYNOPSIS"
.sp
.nf
cube_dispatcher\&.py [switches] config\&.ini
.fi
.SH "DESCRIPTION"
.sp
cube_dispatcher is a PgQ consumer that reads URL\-encoded records from the source queue and writes them into partitioned tables according to the configuration file\&. It is used to prepare data for business intelligence\&. The name of the table is read from the producer field in the event\&. Batch creation time is used for partitioning\&. All records created on the same day go into the same table partition\&. If the partition does not exist, cube_dispatcher creates it according to the template\&.
.sp
Events are usually produced by pgq\&.logutriga()\&. Logutriga adds all the data of the record into the event (also in case of updates and deletes)\&.
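.sp
The routing described above can be sketched in Python\&. This is a minimal illustration under stated assumptions, not the cube_dispatcher implementation: the helper names decode_record and partition_name are hypothetical, the record encoding follows the rules given under LOGUTRIGA EVENT FORMAT, and the partition name is assumed to be the parent table name joined to the date suffix with an underscore\&. The to_char() pattern YYYY_MM_DD is written here as its strftime equivalent\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
from datetime import datetime
from urllib.parse import unquote_plus


def decode_record(ev_data):
    """Decode a logutriga-style record; a field without "=" is NULL."""
    record = {}
    for field in ev_data.split("&"):
        if not field:
            continue
        if "=" in field:
            # "=" present: literal value (possibly the empty string)
            key, value = field.split("=", 1)
            record[unquote_plus(key)] = unquote_plus(value)
        else:
            # "=" missing: the column is NULL
            record[unquote_plus(field)] = None
    return record


def partition_name(parent_table, batch_time, suffix_format="%Y_%m_%d"):
    """Assumed naming rule: parent table name plus "_" plus date suffix."""
    if not suffix_format:
        # An empty format disables partitioning
        return parent_table
    return parent_table + "_" + batch_time.strftime(suffix_format)


# Example record taken from the LOGUTRIGA EVENT FORMAT section
row = decode_record("id=3&name=str&nullvalue&emptyvalue=")
# Hypothetical parent table and batch creation time
table = partition_name("public.some_table", datetime(2012, 3, 13, 10, 30))
```
.fi
.if n \{\
.RE
.\}
.sp
With an empty suffix format the sketch returns the parent table name unchanged, mirroring the documented way to disable partitioning\&.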
.sp
cube_dispatcher can be used in two modes:
.PP
keep_all
.RS 4
keeps all the data that comes in\&. If a record is updated several times during one day, the table partition for that day will contain several instances of that record\&.
.RE
.PP
keep_latest
.RS 4
only the last instance of each record is kept for each day\&. That also means that all tables must have primary keys so cube_dispatcher can delete previous versions of records before inserting new data\&.
.RE
.SH "QUICK-START"
.sp
Basic cube_dispatcher setup and usage can be summarized by the following steps:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
pgq and logutriga must be installed in the source databases\&. See the pgqadm man page for details\&. The target database must also have the pgq_ext schema\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
edit a cube_dispatcher configuration file, say cube_dispatcher_sample\&.ini
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
create source queue
.sp
.if n \{\
.RS 4
.\}
.nf
$ pgqadm\&.py ticker\&.ini create
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
create target database and parent tables in it\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 5.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 5." 4.2
.\}
launch cube_dispatcher in daemon mode
.sp
.if n \{\
.RS 4
.\}
.nf
$ cube_dispatcher\&.py cube_dispatcher_sample\&.ini \-d
.fi
.if n \{\
.RE
.\}
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 6.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 6." 4.2
.\}
start producing events (create logutriga triggers on tables)
CREATE OR REPLACE TRIGGER trig_cube_replica AFTER INSERT OR UPDATE ON some_table FOR EACH ROW EXECUTE PROCEDURE pgq\&.logutriga(\fI\fR)
.RE
.SH "CONFIG"
.SS "Common configuration parameters"
.PP
job_name
.RS 4
Name for the particular job the script does\&. The script will log under this name to logdb/logserver\&. The name is also used as the default for the PgQ consumer name\&.
It should be unique\&.
.RE
.PP
pidfile
.RS 4
Location for pid file\&. If not given, the script is not allowed to daemonize\&.
.RE
.PP
logfile
.RS 4
Location for log file\&.
.RE
.PP
loop_delay
.RS 4
For a continuously running process, how long to sleep after each work loop, in seconds\&. Default: 1\&.
.RE
.PP
connection_lifetime
.RS 4
Close and reconnect older database connections\&.
.RE
.PP
log_count
.RS 4
Number of log files to keep\&. Default: 3
.RE
.PP
log_size
.RS 4
Max size for one log file\&. The file is rotated if max size is reached\&. Default: 10485760 (10M)
.RE
.PP
use_skylog
.RS 4
If set, search for [\&./skylog\&.ini, ~/\&.skylog\&.ini, /etc/skylog\&.ini]\&. If found, the file is used as the config file for Python\*(Aqs logging module\&. It allows a fully customizable logging setup\&.
.RE
.SS "Common PgQ consumer parameters"
.PP
pgq_queue_name
.RS 4
Queue name to attach to\&. No default\&.
.RE
.PP
pgq_consumer_id
.RS 4
Consumer ID to use when registering\&. Default: %(job_name)s
.RE
.SS "Config options specific to cube_dispatcher"
.PP
src_db
.RS 4
Connect string for source database where the queue resides\&.
.RE
.PP
dst_db
.RS 4
Connect string for target database where the tables should be created\&.
.RE
.PP
mode
.RS 4
Operation mode for cube_dispatcher\&. Either keep_all or keep_latest\&.
.RE
.PP
dateformat
.RS 4
Optional parameter to specify how to suffix data tables\&. Default is YYYY_MM_DD, which creates per\-day tables\&. With YYYY_MM, per\-month tables can be created\&. If explicitly set empty, partitioning is disabled\&.
.RE
.PP
part_template
.RS 4
SQL fragment for table creation\&. Various magic replacements are done there:
.RE
.PP
_PKEY
.RS 4
comma\-separated list of primary key columns\&.
.RE
.PP
_PARENT
.RS 4
schema\-qualified parent table name\&.
.RE
.PP
_DEST_TABLE
.RS 4
schema\-qualified partition table\&.
.RE
.PP
_SCHEMA_TABLE
.RS 4
same as _DEST_TABLE but with dots replaced by "_", to allow use as index names\&.
.RE
.SS "Example config file"
.sp
.if n \{\
.RS 4
.\}
.nf
[cube_dispatcher]
job_name = some_queue_to_cube
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
src_db = dbname=sourcedb_test
dst_db = dbname=dataminedb_test
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
pgq_queue_name = udata\&.some_queue
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
logfile = ~/log/%(job_name)s\&.log
pidfile = ~/pid/%(job_name)s\&.pid
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
# how many rows are kept: keep_latest, keep_all
mode = keep_latest
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
# to_char() fmt for table suffix
#dateformat = YYYY_MM_DD
# following disables table suffixes:
#dateformat =
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
part_template =
    create table _DEST_TABLE (like _PARENT);
    alter table only _DEST_TABLE add primary key (_PKEY);
.fi
.if n \{\
.RE
.\}
.SH "LOGUTRIGA EVENT FORMAT"
.sp
The PgQ trigger function pgq\&.logutriga() sends a table change event into the queue in the following format:
.PP
ev_type
.RS 4
(op || ":" || pkey_fields), where op is either "I", "U" or "D", corresponding to insert, update or delete, and pkey_fields is a comma\-separated list of primary key fields for the table\&. The operation type is always present, but the pkey_fields list can be empty if the table has no primary key\&. Example:
I:col1,col2
.RE
.PP
ev_data
.RS 4
URL\-encoded record of data\&. It uses db\-specific urlencoding where the existence of
\fI=\fR
is meaningful: a missing
\fI=\fR
means NULL, a present
\fI=\fR
means a literal value\&. Example:
id=3&name=str&nullvalue&emptyvalue=
.RE
.PP
ev_extra1
.RS 4
Fully qualified table name\&.
.RE
.SH "COMMAND LINE SWITCHES"
.sp
The following switches are common to all skytools\&.DBScript\-based Python programs\&.
.PP
\-h, \-\-help
.RS 4
show help message and exit
.RE
.PP
\-q, \-\-quiet
.RS 4
make program silent
.RE
.PP
\-v, \-\-verbose
.RS 4
make program more verbose
.RE
.PP
\-d, \-\-daemon
.RS 4
make program go to the background
.RE
.sp
The following switches are used to control an already running process\&. The pidfile is read from the config, then the signal is sent to the process id specified there\&.
.PP
\-r, \-\-reload
.RS 4
reload config (send SIGHUP)
.RE
.PP
\-s, \-\-stop
.RS 4
stop program safely (send SIGINT)
.RE
.PP
\-k, \-\-kill
.RS 4
kill program immediately (send SIGTERM)
.RE
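.sp
How the control switches map to signals can be sketched in Python\&. This is an illustration only, not the skytools code: the function names, the injectable send hook and the assumption that the pidfile holds a single integer process id are all hypothetical\&. Only the switch\-to\-signal pairs come from the list above\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import os
import signal

# Switch-to-signal pairs as documented for --reload, --stop and --kill
SWITCH_SIGNALS = {
    "reload": signal.SIGHUP,
    "stop": signal.SIGINT,
    "kill": signal.SIGTERM,
}


def read_pid(pidfile):
    """Return the process id stored in the pidfile (assumed: one integer)."""
    with open(os.path.expanduser(pidfile)) as f:
        return int(f.read().strip())


def signal_running_process(pidfile, switch, send=os.kill):
    """Send the signal that belongs to the switch to the pid in the pidfile.

    The send argument is a hypothetical hook so the function can be
    exercised without signalling a real process.
    """
    pid = read_pid(pidfile)
    sig = SWITCH_SIGNALS[switch]
    send(pid, sig)
    return pid, sig
```
.fi
.if n \{\
.RE
.\}
.sp
Called with the pidfile path from the config and the name "stop", the sketch would deliver SIGINT to the recorded process id\&.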