'\" -*- coding: UTF-8 -*-
.if \n(.g .ds T< \\FC
.if \n(.g .ds T> \\F[\n[.fam]]
.de URL
\\$2 \(la\\$1\(ra\\$3
..
.if \n(.g .mso www.tmac
.TH slon 1 "22 September 2023" Application "Slony-I 2.2.11 Documentation"
.SH NAME
slon \- Slony-I daemon 
.SH SYNOPSIS
'nh
.fi
.ad l
\*(T<\fBslon\fR\*(T> \kx
.if (\nx>(\n(.l/2)) .nr x (\n(.l/5)
'in \n(.iu+\nxu
[\fIoption\fR]\&... [\fIclustername\fR] [\fIconninfo\fR]
'in \n(.iu-\nxu
.ad b
'hy
.SH DESCRIPTION
slon is the daemon application that
\(oqruns\(cq Slony-I replication. A
slon instance must be run for each node
in a Slony-I cluster.
.SH OPTIONS
.TP 
\*(T<\fB\-d\fR\*(T>\fI log_level\fR
The \fBlog_level\fR specifies which levels of debugging messages
slon should display when logging its
activity.

The nine levels of logging are:
.RS 
.TP 0.2i
\(bu
Fatal
.TP 0.2i
\(bu
Error
.TP 0.2i
\(bu
Warn
.TP 0.2i
\(bu
Config
.TP 0.2i
\(bu
Info
.TP 0.2i
\(bu
Debug1
.TP 0.2i
\(bu
Debug2
.TP 0.2i
\(bu
Debug3
.TP 0.2i
\(bu
Debug4
.RE

The first five non-debugging log levels (from Fatal to
Info) are \fIalways\fR displayed in the logs. In
early versions of Slony-I, the \(oqsuggested\(cq
\fBlog_level\fR value was 2, which would list output at
all levels down to debugging level 2. In Slony-I version 2, it
is recommended to set \fBlog_level\fR to 0; most of the
consistently interesting log information is generated at levels
higher than that. 
.TP 
\*(T<\fB\-s\fR\*(T>\fI SYNC check interval\fR
The \fBsync_interval\fR, measured in milliseconds,
indicates how often slon should check
to see if a \fBSYNC\fR should be introduced.
Default is 2000 ms. The main loop in
\*(T<\fBsync_Thread_main()\fR\*(T> sleeps for intervals of
\fBsync_interval\fR milliseconds between iterations.

Short sync check intervals keep the origin on a \(oqshort
leash\(cq, updating its subscribers more frequently. If you
have replicated sequences that are frequently updated
\fIwithout\fR there being tables that are
affected, this keeps there from being times when only sequences
are updated, and therefore \fIno\fR syncs take
place

If the node is not an origin for any replication set, so no
updates are coming in, it is somewhat wasteful for this value to
be much less the \fBsync_interval_timeout\fR value.
.TP 
\*(T<\fB\-t\fR\*(T>\fI SYNC interval timeout\fR
At the end of each \fBsync_interval_timeout\fR timeout
period, a \fBSYNC\fR will be generated on the
\(oqlocal\(cq node even if there has been no replicable
data updated that would have caused a
\fBSYNC\fR to be generated. 

If application activity ceases, whether because the
application is shut down, or because human users have gone home
and stopped introducing updates, the \fBslon\fR(1) will iterate away,
waking up every \fBsync_interval\fR milliseconds, and,
as no updates are being made, no \fBSYNC\fR events
would be generated. Without this timeout parameter,
\fIno\fR \fBSYNC\fR events would be
generated, and it would appear that replication was falling
behind. 

The \fBsync_interval_timeout\fR value will lead
to eventually generating a \fBSYNC\fR, even though
there was no real replication work to be done. The lower that
this parameter is set, the more frequently \fBslon\fR(1) will generate
\fBSYNC\fR events when the application is not
generating replicable activity; this will have two effects:
.RS 
.TP 0.2i
\(bu
The system will do more replication work.

(Of course, since there is no application load on the
database, and no data to replicate, this load will be very easy
to handle. 
.TP 0.2i
\(bu
Replication will appear to be kept more
\(oqup to date.\(cq

(Of course, since there is no replicable activity going
on, being \(oqmore up to date\(cq is something of a
mirage.) 
.RE

Default is 10000 ms and maximum is 120000 ms. By default, you
can expect each node to \(oqreport in\(cq with a
\fBSYNC\fR every 10 seconds.

Note that \fBSYNC\fR events are also generated on
subscriber nodes. Since they are not actually generating any
data to replicate to other nodes, these \fBSYNC\fR
events are of not terribly much value.
.TP 
\*(T<\fB\-g\fR\*(T>\fI group size\fR
This controls the maximum \fBSYNC\fR group size,
\fBsync_group_maxsize\fR; defaults to 20. Thus, if a
particular node is behind by 200 \fBSYNC\fRs, it
will try to group them together into groups of a maximum size of
\fBsync_group_maxsize\fR. This can be expected to
reduce transaction overhead due to having fewer transactions to
\fBCOMMIT\fR.

The default of 20 is probably suitable for small systems that can
devote only very limited bits of memory to
slon. If you have plenty of memory,
it would be reasonable to increase this, as it will increase the
amount of work done in each transaction, and will allow a
subscriber that is behind by a lot to catch up more quickly.

Slon processes usually stay pretty small; even with large value
for this option, slon would be
expected to only grow to a few MB in size.

The big advantage in increasing this parameter comes from
cutting down on the number of transaction
\fBCOMMIT\fRs; moving from 1 to 2 will provide
considerable benefit, but the benefits will progressively fall
off once the transactions being processed get to be reasonably
large. There isn't likely to be a material difference in
performance between 80 and 90; at that point, whether
\(oqbigger is better\(cq will depend on whether the
bigger set of \fBSYNC\fRs makes the
\fBLOG\fR cursor behave badly due to consuming more
memory and requiring more time to sortt.

In Slony-I version 1.0, slon will
always attempt to group \fBSYNC\fRs together to
this maximum, which \fIwon't\fR be ideal if
replication has been somewhat destabilized by there being very
large updates (\fIe.g.\fR - a single transaction
that updates hundreds of thousands of rows) or by
\fBSYNC\fRs being disrupted on an origin node with
the result that there are a few \fBSYNC\fRs that
are very large. You might run into the problem that grouping
together some very large \fBSYNC\fRs knocks over a
slon process. When it picks up
again, it will try to process the same large grouped set of
\fBSYNC\fRs, and run into the same problem over and
over until an administrator interrupts this and changes the
\*(T<\fB\-g\fR\*(T> value to break this \(oqdeadlock.\(cq

In Slony-I version 1.1 and later versions, the slon
instead adaptively \(oqramps up\(cq from doing 1
\fBSYNC\fR at a time towards the maximum group
size. As a result, if there are a couple of
\fBSYNC\fRs that cause problems, the
slon will (with any relevant watchdog
assistance) always be able to get to the point where it
processes the troublesome \fBSYNC\fRs one by one,
hopefully making operator assistance unnecessary.
.TP 
\*(T<\fB\-o\fR\*(T>\fI desired sync time\fR
A \(oqmaximum\(cq time planned for grouped \fBSYNC\fRs.

If replication is running behind, slon will gradually
increase the numbers of \fBSYNC\fRs grouped
together, targeting that (based on the time taken for the
\fIlast\fR group of \fBSYNC\fRs) they
shouldn't take more than the specified
\fBdesired_sync_time\fR value.

The default value for \fBdesired_sync_time\fR is
60000ms, equal to one minute. 

That way, you can expect (or at least hope!) that you'll
get a \fBCOMMIT\fR roughly once per minute. 

It isn't \fItotally\fR predictable, as it
is entirely possible for someone to request a \fIvery
large update,\fR all as one transaction, that can
\(oqblow up\(cq the length of the resulting
\fBSYNC\fR to be nearly arbitrarily long. In such a
case, the heuristic will back off for the
\fInext\fR group.

The overall effect is to improve
Slony-I's ability to cope with
variations in traffic. By starting with 1 \fBSYNC\fR, and gradually
moving to more, even if there turn out to be variations large
enough to cause PostgreSQL backends to
crash, Slony-I will back off down to
start with one sync at a time, if need be, so that if it is at
all possible for replication to progress, it will.
.TP 
\*(T<\fB\-c\fR\*(T>\fI cleanup cycles\fR
The value \fBvac_frequency\fR indicates how often to
\fBVACUUM\fR in cleanup cycles.

Set this to zero to disable
slon-initiated vacuuming. If you are
using something like pg_autovacuum to
initiate vacuums, you may not need for slon to initiate vacuums
itself. If you are not, there are some tables
Slony-I uses that collect a
\fIlot\fR of dead tuples that should be vacuumed
frequently, notably \fBpg_listener\fR.

In Slony-I version 1.1, this changes a little; the
cleanup thread tracks, from iteration to iteration, the earliest
transaction ID still active in the system. If this doesn't
change, from one iteration to the next, then an old transaction
is still active, and therefore a \fBVACUUM\fR will
do no good. The cleanup thread instead merely does an
\fBANALYZE\fR on these tables to update the
statistics in \fBpg_statistics\fR.
.TP 
\*(T<\fB\-p\fR\*(T>\fI PID filename\fR
\fBpid_file\fR contains the filename in which the PID
(process ID) of the slon is stored.

This may make it easier to construct scripts to monitor multiple
slon processes running on a single
host.
.TP 
\*(T<\fB\-f\fR\*(T>\fI config file\fR
File from which to read slon configuration.

This configuration is discussed further in Slon Run-time Configuration [\(lqRun-time Configuration\(rq [not available as a man page]]. If there are to be a complex set of
configuration parameters, or if there are parameters you do not
wish to be visible in the process environment variables (such as
passwords), it may be convenient to draw many or all parameters
from a configuration file. You might either put common
parameters for all slon processes in a commonly-used
configuration file, allowing the command line to specify little
other than the connection info. Alternatively, you might create
a configuration file for each node.
.TP 
\*(T<\fB\-a\fR\*(T>\fI archive directory\fR
\fBarchive_dir\fR indicates a directory in which to
place a sequence of \fBSYNC\fR archive files for
use in log shipping [\(lqLog Shipping - Slony-I with Files\(rq [not available as a man page]] mode.
.TP 
\*(T<\fB\-x\fR\*(T>\fI command to run on log archive\fR
\fBcommand_on_logarchive\fR indicates a command to be run 
each time a SYNC file is successfully generated.

See more details on \(lqslon_conf_command_on_log_archive\(rq [not available as a man page].
.TP 
\*(T<\fB\-q\fR\*(T>\fI quit based on SYNC provider \fR
\fBquit_sync_provider\fR indicates which provider's
worker thread should be watched in order to terminate after a
certain event. This must be used in conjunction with the
\*(T<\fB\-r\fR\*(T> option below...

This allows you to have a slon
stop replicating after a certain point. 
.TP 
\*(T<\fB\-r\fR\*(T>\fI quit at event number \fR
\fBquit_sync_finalsync\fR indicates the event number
after which the remote worker thread for the provider above
should terminate. This must be used in conjunction with the
\*(T<\fB\-q\fR\*(T> option above...
.TP 
\*(T<\fB\-l\fR\*(T>\fI lag interval \fR
\fBlag_interval\fR indicates an interval value such as
\fB3 minutes \fRor \fB4 hours \fR
or \fB2 days \fRthat indicates that this node is
to lag its providers by the specified interval of time. This
causes events to be ignored until they reach the age
corresponding to the interval.
.RS 
\fBWarning\fR

There is a concommittant downside to this lag;
events that require all nodes to synchronize, as typically
happens with \fBSLONIK FAILOVER\fR(7) and \fBSLONIK MOVE SET\fR(7), will have to wait for this lagging
node. 

That might not be ideal behaviour at failover time, or at
the time when you want to run \fBSLONIK EXECUTE SCRIPT\fR(7). 
.RE
.SH "EXIT STATUS"
slon returns 0 to the shell if it
finished normally. It returns via \*(T<\fBexit(\-1)\fR\*(T>
(which will likely provide a return value of either 127 or 255,
depending on your system) if it encounters any fatal error.