.\" Automatically generated by Pod::Man 4.07 (Pod::Simple 3.32)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.if !\nF .nr F 0
.if \nF>0 \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    if !\nF==2 \{\
.        nr % 0
.        nr F 2
.    \}
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "SBD 8"
.TH SBD 8 "2017-01-17" "SBD" "STONITH Block Device"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
sbd \- STONITH Block Device daemon
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
sbd <\-d \fI/dev/...\fR> [options] \f(CW\*(C`command\*(C'\fR
.SH "SUMMARY"
.IX Header "SUMMARY"
\&\s-1SBD\s0 provides a node fencing mechanism (Shoot the other node in the head,
\&\s-1STONITH\s0) for Pacemaker-based clusters through the exchange of messages
via shared block storage such as a \s-1SAN,\s0 iSCSI, or FCoE. This
isolates the fencing mechanism from changes in firmware version or
dependencies on specific firmware controllers, and it can be used as a
\&\s-1STONITH\s0 mechanism in all configurations that have reliable shared
storage.
.PP
The \fIsbd\fR binary implements both the daemon that watches the message
slots and the management tool for interacting with the block
storage device(s). This mode of operation is specified via the
\&\f(CW\*(C`command\*(C'\fR parameter; some of these modes take additional parameters.
.PP
To use \s-1SBD\s0, you must first \f(CW\*(C`create\*(C'\fR the messaging layout on one to three
block devices. Second, configure \fI/etc/sysconfig/sbd\fR to list those
devices (and possibly adjust other options), and restart the cluster
stack on each node to ensure that \f(CW\*(C`sbd\*(C'\fR is started. Third, configure
the \f(CW\*(C`external/sbd\*(C'\fR fencing resource in the Pacemaker \s-1CIB.\s0
.PP
Each of these steps is documented in more detail below the description
of the command options.
.PP
\&\f(CW\*(C`sbd\*(C'\fR can only be used as root.
.SS "\s-1GENERAL OPTIONS\s0"
.IX Subsection "GENERAL OPTIONS"
.IP "\fB\-d\fR \fI/dev/...\fR" 4
.IX Item "-d /dev/..."
Specify the block device(s) to be used. If you have more than one,
specify this option up to three times. This parameter is mandatory for
all modes, since \s-1SBD\s0 always needs a block device to interact with.
.Sp
This man page uses \fI/dev/sda1\fR, \fI/dev/sdb1\fR, and \fI/dev/sdc1\fR as
example device names for brevity. However, in your production
environment, you should instead always refer to them by using the long,
stable device name (e.g.,
\&\fI/dev/disk/by\-id/dm\-uuid\-part1\-mpath\-3600508b400105b5a0001500000250000\fR).
.IP "\fB\-v\fR" 4
.IX Item "-v"
Enable some verbose debug logging.
.IP "\fB\-h\fR" 4
.IX Item "-h"
Display a concise summary of \f(CW\*(C`sbd\*(C'\fR options.
.IP "\fB\-c\fR \fInode\fR" 4
.IX Item "-c node"
Set local node name; defaults to \f(CW\*(C`uname \-n\*(C'\fR. This should not need to be
set.
.IP "\fB\-R\fR" 4
.IX Item "-R"
Do \fBnot\fR enable realtime priority. By default, \f(CW\*(C`sbd\*(C'\fR runs at realtime
priority, locks itself into memory, and also acquires highest \s-1IO\s0
priority to protect itself against interference from other processes on
the system. This is a debugging-only option.
.IP "\fB\-I\fR \fIN\fR" 4
.IX Item "-I N"
Async \s-1IO\s0 timeout (defaults to 3 seconds, optional). You should not need
to adjust this unless your \s-1IO\s0 setup is very slow.
.Sp
(In daemon mode, the watchdog is refreshed when the majority of devices
could be read within this time.)
.SS "create"
.IX Subsection "create"
Example usage:
.PP
.Vb 1
\&        sbd \-d /dev/sdc2 \-d /dev/sdd3 create
.Ve
.PP
If you specify the \fIcreate\fR command, sbd will write a metadata header
to the device(s) specified and also initialize the messaging slots for
up to 255 nodes.
.PP
\&\fBWarning\fR: This command will not prompt for confirmation. Roughly the
first megabyte of the specified block device(s) will be overwritten
immediately and without backup.
.PP
This command accepts a few options to adjust the default timings that
are written to the metadata (to ensure they are identical across all
nodes accessing the device).
.IP "\fB\-1\fR \fIN\fR" 4
.IX Item "-1 N"
Set watchdog timeout to N seconds. This depends mostly on your storage
latency; the majority of devices must be successfully read within this
time, or else the node will self-fence.
.Sp
If your sbd device(s) reside on a multipath setup or iSCSI, this should
be the time required to detect a path failure. You may be able to reduce
this if your device outages are independent, or if you are using the
Pacemaker integration.
.IP "\fB\-2\fR \fIN\fR" 4
.IX Item "-2 N"
Set slot allocation timeout to N seconds. You should not need to tune
this.
.IP "\fB\-3\fR \fIN\fR" 4
.IX Item "-3 N"
Set daemon loop timeout to N seconds. You should not need to tune this.
.IP "\fB\-4\fR \fIN\fR" 4
.IX Item "-4 N"
Set \fImsgwait\fR timeout to N seconds. This should be twice the \fIwatchdog\fR
timeout. This is the time after which a message written to a node's slot
will be considered delivered. (Or long enough for the node to detect
that it needed to self-fence.)
.Sp
This also affects the \fIstonith-timeout\fR in Pacemaker's \s-1CIB\s0; see below.
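.PP
As an illustrative sketch, the following call initializes a single device
with the timings shown in the \fIdump\fR example below, following the rule
that \fImsgwait\fR should be twice the \fIwatchdog\fR timeout. The device
name and the values are examples only; choose timings that match your
storage latency.
.PP
.Vb 3
\&        # Example values only: watchdog 15s, msgwait 2 x 15s = 30s.
\&        # WARNING: overwrites the start of /dev/sda1 without confirmation.
\&        sbd \-d /dev/sda1 \-1 15 \-4 30 create
.Ve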
.SS "list"
.IX Subsection "list"
Example usage:
.PP
.Vb 4
\&        # sbd \-d /dev/sda1 list
\&        0       hex\-0   clear   
\&        1       hex\-7   clear   
\&        2       hex\-9   clear
.Ve
.PP
List all allocated slots on the device, along with any pending messages. You should see all
cluster nodes that have ever been started against this device. Nodes
that are currently running should have a \fIclear\fR state; nodes that have
been fenced, but not yet restarted, will show the appropriate fencing
message.
.SS "dump"
.IX Subsection "dump"
Example usage:
.PP
.Vb 10
\&        # sbd \-d /dev/sda1 dump
\&        ==Dumping header on disk /dev/sda1
\&        Header version     : 2
\&        Number of slots    : 255
\&        Sector size        : 512
\&        Timeout (watchdog) : 15
\&        Timeout (allocate) : 2
\&        Timeout (loop)     : 1
\&        Timeout (msgwait)  : 30
\&        ==Header on disk /dev/sda1 is dumped
.Ve
.PP
Dump meta-data header from device.
.SS "watch"
.IX Subsection "watch"
Example usage:
.PP
.Vb 1
\&        sbd \-d /dev/sdc2 \-d /dev/sdd3 \-W \-P watch
.Ve
.PP
This command will make \f(CW\*(C`sbd\*(C'\fR start in daemon mode. It will constantly monitor
the message slot of the local node for incoming messages, check that the
device remains reachable, and optionally take Pacemaker's state into account.
.PP
\&\f(CW\*(C`sbd\*(C'\fR \fBmust\fR be started on boot before the cluster stack! See below
for enabling this according to your boot environment.
.PP
The options for this mode are rarely specified directly on the
command line, but most frequently set via \fI/etc/sysconfig/sbd\fR.
.PP
It also constantly monitors connectivity to the storage device, and
self-fences in case the partition becomes unreachable, guaranteeing that it
does not disconnect from fencing messages.
.PP
A node slot is automatically allocated on the device(s) the first time
the daemon starts watching the device; hence, manual allocation is not
usually required.
.PP
If a watchdog is used together with \f(CW\*(C`sbd\*(C'\fR, as is strongly
recommended, the watchdog is activated at initial start of the sbd
daemon. The watchdog is refreshed every time the majority of \s-1SBD\s0 devices
has been successfully read. Using a watchdog provides additional
protection against \f(CW\*(C`sbd\*(C'\fR crashing.
.PP
If the Pacemaker integration is activated, \f(CW\*(C`sbd\*(C'\fR will \fBnot\fR self-fence
when device majority is lost, provided that:
.IP "1." 4
The partition the node is in is still quorate according to the \s-1CIB\s0;
.IP "2." 4
it is still quorate according to Corosync's node count;
.IP "3." 4
the node itself is considered online and healthy by Pacemaker.
.PP
This allows \f(CW\*(C`sbd\*(C'\fR to survive temporary outages of the majority of
devices. However, while the cluster is in such a degraded state, it can
neither successfully fence nor be shut down cleanly (as taking the
cluster below the quorum threshold will immediately cause all remaining
nodes to self-fence). In short, it will not tolerate any further faults.
Please repair the system before continuing.
.PP
There is one \f(CW\*(C`sbd\*(C'\fR process that acts as a master to which all watchers
report; one per device to monitor the node's slot; and, optionally, one
that handles the Pacemaker integration.
.IP "\fB\-W\fR" 4
.IX Item "-W"
Enable or disable use of the system watchdog to protect against the sbd
processes failing and the node being left in an undefined state. Specify
this once to enable, twice to disable.
.Sp
Defaults to \fIenabled\fR.
.IP "\fB\-w\fR \fI/dev/watchdog\fR" 4
.IX Item "-w /dev/watchdog"
This can be used to override the default watchdog device used and should not
usually be necessary.
.IP "\fB\-p\fR \fI/var/run/sbd.pid\fR" 4
.IX Item "-p /var/run/sbd.pid"
This option can be used to specify a pidfile for the main sbd process.
.IP "\fB\-F\fR \fIN\fR" 4
.IX Item "-F N"
Number of failures after which a failing servant process is no longer
restarted immediately, but only once the dampening delay has expired. If
set to zero, servants will be restarted immediately and indefinitely. If
set to one, a failed servant will be restarted once every \fB\-t\fR seconds.
If set to a different value, the servant will be restarted that many times
within the dampening period, and further restarts are then delayed.
.Sp
Defaults to \fI1\fR.
.IP "\fB\-t\fR \fIN\fR" 4
.IX Item "-t N"
Dampening delay before faulty servants are restarted. Combined with \f(CW\*(C`\-F 1\*(C'\fR,
this is the most logical way to tune the restart frequency of servant processes.
Default is 5 seconds.
.Sp
If set to zero, processes will be restarted indefinitely and immediately.
.IP "\fB\-P\fR" 4
.IX Item "-P"
Check Pacemaker quorum and node health.
.IP "\fB\-S\fR \fIN\fR" 4
.IX Item "-S N"
Set the start mode. (Defaults to \fI0\fR.)
.Sp
If this is set to zero, sbd will always start up unconditionally,
regardless of whether the node was previously fenced or not.
.Sp
If set to one, sbd will only start if the node was previously shut down
cleanly (as indicated by an exit request message in the slot), or if the
slot is empty. A reset, crashdump, or power-off request in any slot will
halt the start up.
.Sp
This is useful to prevent nodes from rejoining if they were faulty. The
node must be manually \*(L"unfenced\*(R" by sending an empty message to it:
.Sp
.Vb 1
\&        sbd \-d /dev/sda1 message node1 clear
.Ve
.IP "\fB\-s\fR \fIN\fR" 4
.IX Item "-s N"
Set the start-up wait time for devices. (Defaults to \fI120\fR.)
.Sp
Dynamic block devices such as iSCSI might not be fully initialized and
present yet. This option sets a timeout for waiting for devices to
appear on start-up. If set to 0, start-up will be aborted immediately if
no devices are available.
.IP "\fB\-Z\fR" 4
.IX Item "-Z"
Enable trace mode. \fBWarning: this is unsafe for production, use at your
own risk!\fR Specifying this once will turn all reboots or power-offs, be
they caused by self-fence decisions or messages, into a crashdump.
Specifying this twice will just log them but not continue running.
.IP "\fB\-T\fR" 4
.IX Item "-T"
By default, the daemon will set the watchdog timeout as specified in the
device metadata. However, this does not work for every watchdog device.
In this case, you must manually ensure that the watchdog timeout used by
the system correctly matches the \s-1SBD\s0 settings, and then specify this
option to allow \f(CW\*(C`sbd\*(C'\fR to continue with start-up.
.IP "\fB\-5\fR \fIN\fR" 4
.IX Item "-5 N"
Warn if the time interval for tickling the watchdog exceeds this many seconds.
Since the node is unable to log the watchdog expiry (it reboots immediately
without a chance to write its logs to disk), this is very useful for getting
an indication that the watchdog timeout is too short for the \s-1IO\s0 load of the
system.
.Sp
Default is 3 seconds. Set to zero to disable.
.IP "\fB\-C\fR \fIN\fR" 4
.IX Item "-C N"
Watchdog timeout to set before crashdumping. If \s-1SBD\s0 is set to crashdump
instead of reboot \- either via the trace mode settings or the \fIexternal/sbd\fR
fencing agent's parameter \-, \s-1SBD\s0 will adjust the watchdog timeout to this
setting before triggering the dump. Otherwise, the watchdog might trigger and
prevent a successful crashdump from ever being written.
.Sp
Defaults to 240 seconds. Set to zero to disable.
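.PP
For illustration only, a manual invocation that combines several of the
options above might look as follows. The device names are examples, and in
practice these settings are normally placed in \fI/etc/sysconfig/sbd\fR
rather than typed on the command line.
.PP
.Vb 3
\&        # Sketch: three devices, Pacemaker integration, and start mode 1
\&        # (only start after a clean shutdown or with an empty slot).
\&        sbd \-d /dev/sda1 \-d /dev/sdb1 \-d /dev/sdc1 \-P \-S 1 watch
.Ve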
.SS "allocate"
.IX Subsection "allocate"
Example usage:
.PP
.Vb 1
\&        sbd \-d /dev/sda1 allocate node1
.Ve
.PP
Explicitly allocates a slot for the specified node name. This should
rarely be necessary, as every node will automatically allocate itself a
slot the first time it starts up in watch mode.
.SS "message"
.IX Subsection "message"
Example usage:
.PP
.Vb 1
\&        sbd \-d /dev/sda1 message node1 test
.Ve
.PP
Writes the specified message to the node's slot. This is rarely done
directly, but rather abstracted via the \f(CW\*(C`external/sbd\*(C'\fR fencing agent
configured as a cluster resource.
.PP
Supported message types are:
.IP "test" 4
.IX Item "test"
This only generates a log message on the receiving node and can be used
to check if \s-1SBD\s0 is seeing the device. Note that this could overwrite a
fencing request sent by the cluster, so it should not be used during
production.
.IP "reset" 4
.IX Item "reset"
Reset the target upon receipt of this message.
.IP "off" 4
.IX Item "off"
Power-off the target.
.IP "crashdump" 4
.IX Item "crashdump"
Cause the target node to crashdump.
.IP "exit" 4
.IX Item "exit"
This will make the \f(CW\*(C`sbd\*(C'\fR daemon exit cleanly on the target. You should
\&\fBnot\fR send this message manually; this is handled properly during
shutdown of the cluster stack. Manually stopping the daemon means the
node is unprotected!
.IP "clear" 4
.IX Item "clear"
This message indicates that no real message has been sent to the node.
You should not set this manually; \f(CW\*(C`sbd\*(C'\fR will clear the message slot
automatically during start-up, and setting this manually could overwrite
a fencing message by the cluster.
.SH "Base system configuration"
.IX Header "Base system configuration"
.SS "Configure a watchdog"
.IX Subsection "Configure a watchdog"
It is highly recommended that you configure your Linux system to load a
watchdog driver with hardware assistance (as is available on most modern
systems), such as \fIhpwdt\fR, \fIiTCO_wdt\fR, or others. As a fall-back, you
can use the \fIsoftdog\fR module.
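.PP
As a minimal sketch of the fall-back case, the \fIsoftdog\fR module can be
loaded by hand and the resulting device node checked. This assumes that the
module is available for your kernel and that the watchdog appears as
\fI/dev/watchdog\fR; making the module load persistent across reboots is
distribution-specific and not shown here.
.PP
.Vb 3
\&        # Fall\-back only; prefer a hardware watchdog driver where available.
\&        modprobe softdog
\&        ls \-l /dev/watchdog
.Ve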
.PP
No other software may access the watchdog timer; it can only be
accessed by one process at any given time. Some hardware vendors ship
systems management software that uses the watchdog for system resets
(e.g., the \s-1HP ASR\s0 daemon). Such software has to be disabled if the watchdog
is to be used by \s-1SBD.\s0
.SS "Choosing and initializing the block device(s)"
.IX Subsection "Choosing and initializing the block device(s)"
First, you have to decide if you want to use one, two, or three devices.
.PP
If you use more than one device, they should reside on independent
storage setups. Putting all three of them on the same logical unit, for
example, would not provide any additional redundancy.
.PP
The \s-1SBD\s0 device can be connected via Fibre Channel, Fibre Channel over
Ethernet, or even iSCSI. Thus, an iSCSI target can become a sort-of
network-based quorum server; the advantage is that it does not require
a smart host at your third location, just block storage.
.PP
The \s-1SBD\s0 partitions themselves \fBmust not\fR be mirrored (via \s-1MD,
DRBD,\s0 or the storage layer itself), since this could result in a
split-mirror scenario. Nor can they reside on cLVM2 volume groups, since
they must be accessed by the cluster stack before it has started the
cLVM2 daemons; hence, these should be either raw partitions or logical
units on (multipath) storage.
.PP
The block device(s) must be accessible from all nodes. (While it is not
necessary that they share the same path name on all nodes, this is
considered a very good idea.)
.PP
\&\s-1SBD\s0 will only use about one megabyte per device, so you can easily
create a small partition, or very small logical units.  (The size of the
\&\s-1SBD\s0 device depends on the block size of the underlying device. Thus, 1MB
is fine on plain \s-1SCSI\s0 devices and \s-1SAN\s0 storage with 512 byte blocks. On
the \s-1IBM\s0 s390x architecture in particular, disks default to 4k blocks,
and thus require roughly 4MB.)
.PP
The number of devices will affect the operation of \s-1SBD\s0 as follows:
.IP "One device" 4
.IX Item "One device"
In its simplest implementation, you use one device only. This is
appropriate for clusters where all your data is on the same shared
storage (with internal redundancy) anyway; the \s-1SBD\s0 device does not
introduce an additional single point of failure then.
.Sp
If the \s-1SBD\s0 device is not accessible, the daemon will fail to start and
inhibit openais startup.
.IP "Two devices" 4
.IX Item "Two devices"
This configuration is a trade-off, primarily aimed at environments where
host-based mirroring is used, but no third storage device is available.
.Sp
\&\s-1SBD\s0 will not self-fence if it loses access to one mirror leg; this
allows the cluster to continue to function even in the face of one outage.
.Sp
However, \s-1SBD\s0 will not fence the other side while only one mirror leg is
available, since it does not have enough knowledge to detect an asymmetric
split of the storage. So it will not be able to automatically tolerate a
second failure while one of the storage arrays is down. (Though you
can use the appropriate crm command to acknowledge the fence manually.)
.Sp
It will not start unless both devices are accessible on boot.
.IP "Three devices" 4
.IX Item "Three devices"
In this most reliable and recommended configuration, \s-1SBD\s0 will only
self-fence if more than one device is lost; hence, this configuration is
resilient against temporary single device outages (be it due to failures
or maintenance).  Fencing messages can still be successfully relayed if
at least two devices remain accessible.
.Sp
This configuration is appropriate for more complex scenarios where
storage is not confined to a single array. For example, host-based
mirroring solutions could have one \s-1SBD\s0 per mirror leg (not mirrored
itself), and an additional tie-breaker on iSCSI.
.Sp
It will only start if at least two devices are accessible on boot.
.PP
After you have chosen the devices and created the appropriate partitions
and perhaps multipath alias names to ease management, use the \f(CW\*(C`sbd create\*(C'\fR
command described above to initialize the \s-1SBD\s0 metadata on them.
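.PP
A minimal sketch of that step for the three-device layout, followed by a
\fIdump\fR to confirm that the header was written. The device names are the
short examples used throughout this page, not a recommendation.
.PP
.Vb 4
\&        # Device names are examples; use stable /dev/disk/by\-id names in production.
\&        # WARNING: create overwrites the start of each device without confirmation.
\&        sbd \-d /dev/sda1 \-d /dev/sdb1 \-d /dev/sdc1 create
\&        sbd \-d /dev/sda1 \-d /dev/sdb1 \-d /dev/sdc1 dump
.Ve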
.PP
\fISharing the block device(s) between multiple clusters\fR
.IX Subsection "Sharing the block device(s) between multiple clusters"
.PP
It is possible to share the block devices between multiple clusters,
provided that the total number of nodes accessing them does not exceed
\fI255\fR and that all of them use the same \s-1SBD\s0 timeouts (since these are
part of the metadata).
.PP
If you are using multiple devices, this can reduce the setup overhead
required. However, you should \fBnot\fR share devices between clusters in
different security domains.
.SS "Configure \s-1SBD\s0 to start on boot"
.IX Subsection "Configure SBD to start on boot"
On systems using \f(CW\*(C`sysvinit\*(C'\fR, the \f(CW\*(C`openais\*(C'\fR or \f(CW\*(C`corosync\*(C'\fR system
start-up scripts must handle starting or stopping \f(CW\*(C`sbd\*(C'\fR as required
before starting the rest of the cluster stack.
.PP
For \f(CW\*(C`systemd\*(C'\fR, sbd simply has to be enabled using
.PP
.Vb 1
\&        systemctl enable sbd.service
.Ve
.PP
The daemon is brought online on each node before corosync and Pacemaker
are started, and terminated only after all other cluster components have
been shut down \- ensuring that cluster resources are never activated
without \s-1SBD\s0 supervision.
.SS "Configuration via sysconfig"
.IX Subsection "Configuration via sysconfig"
The system instance of \f(CW\*(C`sbd\*(C'\fR is configured via \fI/etc/sysconfig/sbd\fR.
In this file, you must specify the device(s) used, as well as any
options to pass to the daemon:
.PP
.Vb 2
\&        SBD_DEVICE="/dev/sda1;/dev/sdb1;/dev/sdc1"
\&        SBD_PACEMAKER="true"
.Ve
.PP
\&\f(CW\*(C`sbd\*(C'\fR will fail to start if no \f(CW\*(C`SBD_DEVICE\*(C'\fR is specified. See the
installed template for more options that can be configured here.
.SS "Testing the sbd installation"
.IX Subsection "Testing the sbd installation"
After a restart of the cluster stack on this node, you can now try
sending a test message to it as root, from this or any other node:
.PP
.Vb 1
\&        sbd \-d /dev/sda1 message node1 test
.Ve
.PP
The node will acknowledge the receipt of the message in the system logs:
.PP
.Vb 1
\&        Aug 29 14:10:00 node1 sbd: [13412]: info: Received command test from node2
.Ve
.PP
This confirms that \s-1SBD\s0 is indeed up and running on the node, and that it
is ready to receive messages.
.PP
Make \fBsure\fR that \fI/etc/sysconfig/sbd\fR is identical on all cluster
nodes, and that all cluster nodes are running the daemon.
.SH "Pacemaker CIB integration"
.IX Header "Pacemaker CIB integration"
.SS "Fencing resource"
.IX Subsection "Fencing resource"
Pacemaker can only interact with \s-1SBD\s0 to issue a node fence if there is a
configured fencing resource. This should be a primitive, not a clone, as
follows:
.PP
.Vb 2
\&        primitive fencing\-sbd stonith:external/sbd \e
\&                op start start\-delay="15"
.Ve
.PP
This will automatically use the same devices as configured in
\&\fI/etc/sysconfig/sbd\fR.
.PP
While you should not configure this as a clone (as Pacemaker will start
a fencing agent in each partition automatically), the \fIstart-delay\fR
setting ensures that, if a split-brain does occur in a two node cluster,
the node that still needs to instantiate a fencing agent is slightly
disadvantaged, which avoids fencing loops.
.PP
\&\s-1SBD\s0 also supports turning the reset request into a crash request, which
may be helpful for debugging if you have kernel crashdumping configured;
then, every fence request will cause the node to dump core. You can
enable this via the \f(CW\*(C`crashdump="true"\*(C'\fR parameter on the fencing
resource. This is \fBnot\fR recommended for production use, but only for
debugging phases.
.SS "General cluster properties"
.IX Subsection "General cluster properties"
You must also enable \s-1STONITH\s0 in general, and set the \s-1STONITH\s0 timeout to
be at least twice the \fImsgwait\fR timeout you have configured, to allow
enough time for the fencing message to be delivered. If your \fImsgwait\fR
timeout is 60 seconds, this is a possible configuration:
.PP
.Vb 2
\&        property stonith\-enabled="true"
\&        property stonith\-timeout="120s"
.Ve
.PP
\&\fBCaution\fR: if \fIstonith-timeout\fR is too low for \fImsgwait\fR and the
system overhead, sbd will never be able to successfully complete a fence
request. This will create a fencing loop.
.PP
Note that the sbd fencing agent will try to detect this and
automatically extend the \fIstonith-timeout\fR setting to a reasonable
value, on the assumption that sbd modifying your configuration is
preferable to not fencing.
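.PP
The timing rules in this page can be chained into a short worked example.
The values below are taken from the \fIdump\fR output shown earlier and are
assumed purely for illustration; they are not a recommendation.
.PP
.Vb 4
\&        # watchdog timeout (\-1)            = 15s
\&        # msgwait timeout  (\-4) = 2 x 15s  = 30s
\&        # stonith\-timeout       >= 2 x 30s = 60s
\&        property stonith\-timeout="60s"
.Ve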
.SH "Management tasks"
.IX Header "Management tasks"
.SS "Recovering from temporary \s-1SBD\s0 device outage"
.IX Subsection "Recovering from temporary SBD device outage"
If you have multiple devices, failure of a single device is not immediately
fatal. \f(CW\*(C`sbd\*(C'\fR will try to restart the monitor for the device every 5
seconds by default. However, you can tune this via the options to the
\&\fIwatch\fR command.
.PP
If you wish to immediately force a restart of all currently
disabled monitor processes, you can send a \fI\s-1SIGUSR1\s0\fR to the \s-1SBD
\&\s0\fIinquisitor\fR process.
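.PP
A minimal sketch of sending that signal, assuming that the daemon was
started with \fB\-p\fR \fI/var/run/sbd.pid\fR and that this pidfile records
the \fIinquisitor\fR (the main \f(CW\*(C`sbd\*(C'\fR process). Both are assumptions, so
verify the process \s-1ID\s0 on your system first.
.PP
.Vb 2
\&        # Assumes /var/run/sbd.pid holds the PID of the inquisitor process.
\&        kill \-USR1 "$(cat /var/run/sbd.pid)"
.Ve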
.SH "LICENSE"
.IX Header "LICENSE"
Copyright (C) 2008\-2013 Lars Marowsky-Bree
.PP
This program is free software; you can redistribute it and/or
modify it under the terms of the \s-1GNU\s0 General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.
.PP
This software is distributed in the hope that it will be useful,
but \s-1WITHOUT ANY WARRANTY\s0; without even the implied warranty of
\&\s-1MERCHANTABILITY\s0 or \s-1FITNESS FOR A PARTICULAR PURPOSE. \s0 See the \s-1GNU\s0
General Public License for more details.
.PP
For details see the \s-1GNU\s0 General Public License at
http://www.gnu.org/licenses/gpl\-2.0.html (version 2) and/or
http://www.gnu.org/licenses/gpl.html (the newest as per \*(L"any later\*(R").