NAME¶
FAILOVER - Fail a broken replication set over to a backup node
SYNOPSIS¶
FAILOVER (options);
DESCRIPTION¶
The FAILOVER command causes the backup node to take over all sets that
currently originate on the failed node. slonik will contact all other direct
subscribers of the failed node to determine which node has the highest sync
status for each set. If another node has a higher sync status than the backup
node, the replication will first be redirected so that the backup node
replicates against that other node, before assuming the origin role and
allowing update activity.
After successful failover, all former direct subscribers of the failed node
become direct subscribers of the backup node. The failed node is abandoned,
and can and should be removed from the configuration with
SLONIK DROP NODE(7).
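The sequence described above, failing over and then dropping the abandoned node, might look like the following sketch. The cluster name, the conninfo strings, and the three-node layout (node 1 failed, node 2 the backup, node 3 another subscriber) are assumptions made for illustration only and need to be replaced with the values of the actual cluster.

```
# Hypothetical preamble: cluster name and conninfo values are placeholders.
cluster name = example_cluster;
node 1 admin conninfo = 'dbname=exampledb host=node1.example.com user=slony';
node 2 admin conninfo = 'dbname=exampledb host=node2.example.com user=slony';
node 3 admin conninfo = 'dbname=exampledb host=node3.example.com user=slony';

# Node 1 has failed; node 2 takes over every set that originated on it.
failover (id = 1, backup node = 2);

# Once the failover has completed, remove the abandoned node from the
# configuration, submitting the event on the surviving node 2.
drop node (id = 1, event node = 2);
```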
If multiple set origin nodes have failed, then you should tell FAILOVER about
all of them in one request. This is done by passing a list like
NODE=(ID=val,BACKUP NODE=val), NODE=(ID=val2, BACKUP NODE=val2) to FAILOVER.
Nodes that are forwarding providers can also be passed to the failover command
as a failed node. The failover process will redirect the subscriptions from
these nodes to the backup node.
- ID = ival
- ID of the failed node
- BACKUP NODE = ival
- Node ID of the node that will take over all sets originating on the failed
node
This uses “schemadocfailednode(p_failed_nodes integer, p_backup_node
integer, p_failed_node integer[])” [not available as a man page].
EXAMPLE¶
FAILOVER (
ID = 1,
BACKUP NODE = 2
);
# Example of failing over multiple nodes in one request
FAILOVER(
NODE=(ID=1, BACKUP NODE=2),
NODE=(ID=3, BACKUP NODE=4)
);
LOCKING BEHAVIOUR ¶
Exclusive locks on each replicated table will be taken out on the new
origin node as replication triggers are changed. If the new origin was not
completely up to date, and replication data must be drawn from some other node
that is more up to date, the new origin will not become usable until those
updates are complete.
DANGEROUS/UNINTUITIVE BEHAVIOUR ¶
This command will abandon the status of the failed node. There is no possibility
to let the failed node join the cluster again without rebuilding it from
scratch as a slave. If at all possible, you would likely prefer to use
SLONIK MOVE SET(7) instead, as that does not abandon the failed node.
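When the current origin is still reachable, a controlled switchover with MOVE SET could be sketched as follows. The set ID and node IDs are hypothetical, and the script assumes the usual preamble (cluster name and admin conninfo for each node) has already been given.

```
# Hypothetical IDs: set 1 currently originates on node 1 and moves to node 2.
# Lock the set on the current origin so no further updates are accepted.
lock set (id = 1, origin = 1);

# Hand the origin role to node 2; node 1 remains in the cluster as a subscriber.
move set (id = 1, old origin = 1, new origin = 2);
```

Unlike FAILOVER, this path keeps the old origin in the cluster, so it does not have to be rebuilt afterwards.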
If a second failure occurs in the middle of a FAILOVER operation then recovery
might be complicated.
SLONIK EVENT CONFIRMATION BEHAVIOUR ¶
Slonik submits the FAILOVER_EVENT without waiting, but then waits until the
most-ahead node has received confirmations of the FAILOVER_EVENT from all
nodes before completing.
This command was introduced in Slony-I 1.0.
In version 2.0, the default BACKUP NODE value of 1 was removed, so it is
mandatory to provide a value for this parameter.
In version 2.2, support was added for passing multiple nodes to a single
FAILOVER command.