boothd - The Booth Cluster Ticket Manager.
boothd daemon [-SD] [-c config] [-l lockfile]
booth list [-s site] [-c config]
booth grant [-s site] [-c config] [-FCw] ticket
booth revoke [-s site] [-c config] [-w] ticket
booth peers [-s site] [-c config]
booth status [-D] [-c config]
Booth manages tickets which authorize one of the geographically dispersed cluster sites to run certain resources. It is designed to extend Pacemaker to support geographically distributed clustering.
It is based on the Raft protocol; see e.g. <https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf> for details.
# boothd daemon -D
# booth list
# booth grant ticket-nfs
# booth revoke ticket-nfs
Can be a full path to a configuration file, or a short name; in the latter case, the directory /etc/booth and the suffix .conf are added. By default booth is used, which results in the path /etc/booth/booth.conf.
The configuration name also determines the name of the PID file - for the defaults, /var/run/booth/booth.pid.
The special value 'other' can be used to specify the other site. Obviously, in that case, the booth configuration must have exactly two sites defined.
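For instance, in a two-site configuration the current owner could hand the ticket over without naming the peer's address explicitly (the ticket name here is taken from the example configuration in this manual):

# booth grant -s other ticket-db8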
This option may be DANGEROUS. It makes booth grant the ticket even though it cannot ascertain that unreachable sites don't hold the same ticket. It is up to the user to make sure that unreachable sites don't have this ticket as granted.
Whether the binary is called as boothd or booth doesn’t matter; the first argument determines the mode of operation.
The grant and, under certain circumstances, revoke operations may take a while to return a definite outcome. The client waits up to the network timeout value (by default 5 seconds) for the result, unless the -w option was given, in which case the client waits indefinitely.
In this mode the configuration file is searched for an IP address that is locally reachable, i.e. matches a configured subnet. This allows one to run the client commands on another node in the same cluster, as long as the configuration file and the service IP are locally reachable.
For instance, if the booth service IP is 192.168.55.200, and the local node has 192.168.55.15 configured on one of its network interfaces, it knows which site it belongs to.
Use -s to direct the client to connect to a different site.
In addition to the type, name (IP address), and the last time the server was heard from, network statistics are also printed. The statistics are split into two rows: the first consists of counters for sent packets, the second of counters for received packets. The first counter is the total number of packets; descriptions of the other counters follow:
The configuration file must be identical on all sites and arbitrators.
A minimal file may look like this:
site="192.168.201.100"
site="192.168.202.100"
arbitrator="192.168.203.100"
ticket="ticket-db8"
Comments start with a hash-sign ('#'). Whitespace at the start and end of the line, and around the '=', are ignored.
The following key/value pairs are defined:
Clients use TCP to communicate with a daemon; Booth will always bind and listen to both UDP and TCP ports.
Booth needs at least three members for normal operation. An odd number of members provides more redundancy.
site-user, site-group, arbitrator-user, arbitrator-group
On a (Pacemaker) site the booth process has to call crm_ticket, so the default is to use hacluster:'haclient'; on an arbitrator this user and group might not exist, so there we default to nobody:'nobody'.
Use the special ticket name defaults to modify the defaults. The defaults stanza must precede all the other ticket specifications.
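As a sketch, a defaults stanza (using the special ticket name as described above; the parameter values shown are simply the documented defaults) could look like this:

ticket="defaults"
expire = 600
timeout = 5
retries = 10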
All times are in seconds.
The default is 600.
This is to allow for the site that lost the ticket to relinquish the resources, by either stopping them or fencing a node.
A typical delay might be 60 seconds, but ultimately it depends on the protected resources and the fencing configuration.
The default is 0.
If the network reliability is often reduced over prolonged periods, it is advisable to try to renew more often.
Before every renewal, the command or commands specified in before-acquire-handler (if defined) are run. In that case the renewal-freq parameter is effectively also the local cluster monitoring interval.
The default is 5.
Default is 10. Values lower than 3 are illegal.
Ticket renewals should allow for this number of retries. Hence, the total retry time must be shorter than the renewal time (either half the expire time or renewal-freq):
timeout*(retries+1) < renewal
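With the documented defaults (timeout 5, retries 10, expire 600, hence a renewal time of half the expire time, 300 seconds), the inequality holds with ample margin. A quick shell check of the arithmetic, as a sketch:

```shell
# Check timeout*(retries+1) < renewal for the documented defaults:
# timeout=5, retries=10, expire=600 => renewal = expire/2 = 300.
timeout=5
retries=10
renewal=$((600 / 2))
budget=$(( timeout * (retries + 1) ))   # 5 * 11 = 55
if [ "$budget" -lt "$renewal" ]; then
    echo "ok: retry budget $budget < renewal $renewal"
else
    echo "too tight: lower timeout/retries or renew less often"
fi
```

If renewal-freq is set, substitute it for half the expire time in the comparison.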
Default is 0 for all; this means that the order in the configuration file defines priority for conflicting requests.
This makes it possible to check whether the services protected by the ticket, and their dependencies, are in good shape at this site. For instance, if a service in the dependency chain has a failcount of INFINITY on all available nodes, the service cannot run, and it is then of no use to claim the ticket.
One or more arguments may follow the program or directory location. Typically, there is at least the name of one of the resources which depend on this ticket.
See below for details about booth specific environment variables. The distributed service-runnable script is an example which may be used to test whether a Pacemaker resource can be started.
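As an illustration of the interface only (this is a hypothetical handler, not the shipped service-runnable script), a handler might refuse ticket renewal while a maintenance flag file exists for the resource passed as a positional argument, assuming, as the surrounding text implies, that a non-zero exit status tells booth not to acquire or renew the ticket:

```shell
#!/bin/sh
# Hypothetical before-acquire-handler sketch (not the shipped
# service-runnable script). The resource name arrives as a
# positional argument, as described above.

# allow_ticket returns 0 (acquire/renew) unless a maintenance flag
# file exists for the resource named in $1. FLAG_DIR is overridable
# for testing; it defaults to /run.
allow_ticket() {
    rsc="$1"
    flag="${FLAG_DIR:-/run}/booth-maintenance-$rsc"
    if [ -e "$flag" ]; then
        echo "maintenance flag $flag present; refusing ticket for $rsc" >&2
        return 1
    fi
    return 0
}

# As a handler script, the exit status is that of the check itself:
allow_ticket "${1:-db8}"
```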
Attributes are typically used to convey extra information about resources, for instance database replication status. The attributes are commonly updated by resource agents.
Attribute values are referenced in expressions and may be tested for equality with the eq binary operator or inequality with the ne operator. The usage is as follows:
attr-prereq = <grant_type> <name> <op> <value>
<grant_type>: "auto" | "manual"
<name>: attribute name
<op>: "eq" | "ne"
<value>: attribute value
The two grant types are auto, for ticket failover, and manual, for grants using the booth client. The ticket can be granted only if the expression evaluates to true.
It is not clear whether the manual grant type has any practical use, because that operation is in any case controlled by a human.
Note that there can be no guarantee on whether an attribute value is up to date, i.e. if it actually reflects the current state.
By default all tickets are automatic (that is, fully controlled by the Raft algorithm). Assign the string "manual" or "MANUAL" to define a ticket as manually controlled.
One example of a booth configuration file:
transport = udp
port = 9930
# D-85774
site="192.168.201.100"
# D-90409
site="::ffff:192.168.202.100"
# A-1120
arbitrator="192.168.203.100"
ticket="ticket-db8"
expire = 600
acquire-after = 60
timeout = 10
retries = 5
renewal-freq = 60
before-acquire-handler = /usr/share/booth/service-runnable db8
attr-prereq = auto repl_state eq ACTIVE
BOOTH TICKET MANAGEMENT
The booth cluster guarantees that every ticket is owned by only one site at a time.
Tickets must initially be granted with the booth client grant command. Once granted, a ticket is managed by the booth cluster; hence, only granted tickets are managed by booth.
If the ticket gets lost, i.e. the other members of the booth cluster do not hear from the ticket owner for a sufficiently long time, one of the remaining sites will acquire the ticket. This is called ticket failover.
If the remaining members cannot form a majority, then the ticket cannot fail over.
A ticket may be revoked at any time with the booth client revoke command. For revoke to succeed, the site holding the ticket must be reachable.
Once the ticket is administratively revoked, it is not managed by the booth cluster anymore. For the booth cluster to start managing the ticket again, it must be again granted to a site.
If not all sites are reachable, the grant operation may be delayed by up to the ticket expire time (plus, if defined, the acquire-after time). The reason is that the other booth members may not know whether the ticket is currently granted at the unreachable site.
This delay may be disabled with the -F option. In that case, it is up to the administrator to make sure that the unreachable site is not holding the ticket.
When the ticket is managed by booth, it is dangerous to modify it manually using either the crm_ticket command or crm site ticket. Neither of these tools is aware of booth and, consequently, booth itself may not be aware of any ticket status changes. A notable exception is setting the ticket to standby, which is typically done before a planned failover.
Tickets are not meant to be moved around quickly; the default expire time is 600 seconds (10 minutes).
booth works with both IPv4 and IPv6 addresses.
booth renews a ticket before it expires, to account for possible transmission delays. The renewal time, unless explicitly set, is set to half the expire time.
Currently, there’s only one external handler defined (see the before-acquire-handler configuration item above).
The following environment variables are exported to the handler:
The handler is invoked with positional arguments specified after it.
In essence, every ticket corresponds to a separate Raft cluster.
A ticket is granted to the Raft Leader which then owns (or keeps) the ticket.
The booth daemon for an arbitrator, which typically doesn't run the cluster stack, may be started through systemd or via /etc/init.d/booth-arbitrator, depending on which init system the platform supports.
The SysV init script starts a booth arbitrator for every configuration file found in /etc/booth.
Platforms running systemd can enable and start every configuration separately using systemctl:
# systemctl enable booth@<configurationname>
# systemctl start booth@<configurationname>
systemctl requires the configuration name, even for the default name booth.
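So even for the default configuration, the instance name is spelled out explicitly:

# systemctl enable booth@booth
# systemctl start booth@booth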
Manual tickets allow users to create and manage tickets which are subsequently handled by booth without using the Raft algorithm. Granting and revoking manual tickets is fully controlled by the administrator. It is possible to define a number of manual and normal tickets in one GEO cluster.
Automatic ticket management as provided by the Raft algorithm is not applied to manually controlled tickets. In particular, there are no elections, no automatic failover procedures, and no term expiration.
However, booth does check whether a ticket is currently granted to any site and warns the user appropriately.
Tickets which were manually granted to a site will remain there until they are manually revoked. Even if a site goes offline, the ticket will not be moved to another site. This behavior allows administrators to make sure that certain services remain at a particular site and are not moved to another site, possibly located in a different geographical location.
Also, configuring only manual tickets in a GEO cluster allows one to have just two sites, without the need for an arbitrator. This is possible because no automatic elections or voting are performed for manual tickets.
Manual tickets are defined in the configuration file by adding the mode ticket parameter and setting it to manual or MANUAL:
mode = manual
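A complete manual ticket stanza might then look like this (the ticket name is illustrative):

ticket="ticket-manual"
mode = manual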
Manual tickets can be granted and revoked using the normal grant and revoke commands, with the usual flags and parameters. The only difference is that specifying the -F flag with the grant command forces a site to become the leader for the specified ticket, even if the ticket is currently granted to another site.
Booth is tested regularly. See the README-testing file for more information.
Please report any bugs either at GitHub: <https://github.com/ClusterLabs/booth/issues>
Or, if you prefer bugzilla, at openSUSE bugzilla (component "High Availability"): <https://bugzilla.opensuse.org/enter_bug.cgi?product=openSUSE%20Factory>
boothd was originally written (mostly) by Jiaju Zhang.
In 2013 and 2014 Philipp Marek took over maintainership.
Since April 2014 it has been mainly developed by Dejan Muhamedagic.
Many people contributed (see the AUTHORS file).
Copyright © 2011 Jiaju Zhang <firstname.lastname@example.org>
Copyright © 2013-2014 Philipp Marek <email@example.com>
Copyright © 2014 Dejan Muhamedagic <firstname.lastname@example.org>
Free use of this software is granted under the terms of the GNU General Public License (GPL) as of version 2 (see COPYING file) or later.