OCF_LINBIT_DRBD(7) | OCF resource agents | OCF_LINBIT_DRBD(7) |
NAME
ocf_linbit_drbd - Manages a DRBD device as a Master/Slave resource
SYNOPSIS
drbd [start | stop | monitor | promote | demote | meta-data | validate-all]
DESCRIPTION
This resource agent manages a DRBD resource as a master/slave resource. DRBD is a shared-nothing replicated storage device.
NOTE: To avoid data divergence, you should either enable DRBD "quorum" together with "on-no-quorum io-error" (recommended), or configure proper fencing policies in both DRBD *and* Pacemaker (fencing resource-and-stonith). This cannot be done from this resource agent alone.
See the DRBD User's Guide for more information. https://docs.linbit.com/
SUPPORTED PARAMETERS
drbd_resource
(unique, required, string, no default)
drbdconf
(optional, string, default "/etc/drbd.conf")
adjust_master_score
Numeric values are expected to be non-decreasing.
The first value is 0 by default to prevent pacemaker from trying to promote while it is unclear whether the data is really the most recent copy. (DRBD knows it is "consistent", but is unsure about "uptodate"ness). Please configure proper fencing methods both in DRBD (fencing resource-and-stonith; appropriate (un)fence-peer handlers) AND in Pacemaker to make this work reliably.
Advanced use: Adjust the other values to better fit into complex dependency score calculations.
Intentionally diskless nodes ("Diskless Clients") with access to good data via some (or all) of their peers will use the 3rd value (minus one) when they are Secondary and not all peers are up-to-date, and the 4th value (minus one) when ALL peers are up-to-date or they are Primary themselves. This may need to change if this should become a frequent use case.
Special considerations:
If a Secondary DRBD is connected to a peer in Primary role, but Pacemaker does not know about any Primary (using crm_resource --locate), we conclude that there likely is a cluster-split-brain, and may try to "help" Pacemaker by removing the master-score. Also see "remove_master_score_if_peer_primary".
(optional, string, default "0 10 1000 10000")
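The format constraints above (four values, non-decreasing) can be checked before the value is handed to the cluster. The following is only a sketch: check_master_scores is a hypothetical helper, not part of the resource agent, and it assumes the values are plain integers.

```shell
# Hypothetical helper (not part of the agent): check that a candidate
# adjust_master_score value holds four non-decreasing integers.
check_master_scores() {
    # Split the single argument on whitespace into positional parameters.
    set -- $1
    if [ "$#" -ne 4 ]; then
        echo "expected 4 values, got $#"
        return 1
    fi
    prev=""
    for score in "$@"; do
        # Assumption: scores are integers, with an optional leading minus.
        case ${score#-} in
            ''|*[!0-9]*) echo "not an integer: $score"; return 1 ;;
        esac
        if [ -n "$prev" ] && [ "$score" -lt "$prev" ]; then
            echo "values decrease: $prev > $score"
            return 1
        fi
        prev=$score
    done
    echo "ok"
}

check_master_scores "0 10 1000 10000"
```

The last line checks the documented default and prints "ok".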
stop_outdates_secondary
Note that this feature depends on the information passed in OCF_RESKEY_CRM_meta_notify_master_uname being correct, which unfortunately is not reliable for Pacemaker versions up to at least 1.0.10 / 1.1.4.
If a Secondary is stopped (unconfigured), it may be marked as outdated in the drbd meta data, if we know there is still a Primary running in the cluster. Note that this does not affect fencing policies set in drbd config, but is an additional safety feature of this resource agent only. You can enable this behaviour by setting the parameter to true.
If this feature seems to not do what you expect, make sure you have defined fencing policies in the drbd configuration as well.
(optional, boolean, default false)
ignore_missing_notifications
(optional, boolean, default false)
wfc_timeout
(optional, integer, default 5)
remove_master_score_if_peer_primary
To prevent a potentially failed promotion attempt in case of cluster split-brain (Pacemaker communication loss) while DRBD is still connected to a Primary, you can request to remove any master score while DRBD is connected to a Primary (and that Primary peer looks like it has all disks up-to-date).
This may delay legitimate failovers after a Primary crash by up to some TCP timeout (until DRBD realizes that the Primary is gone) plus one monitoring interval.
This parameter is interpreted almost as an "ocf boolean", with the exception of a literal "unexpected", that is:
- (yes|true|1) [actually, according to the OCF spec, also (YES|TRUE|True|ja|ON), but please don't go there]: is "true": remove (or never assign) master scores, if DRBD appears to see a (healthy) Primary
- "unexpected": assign master scores as described under "adjust_master_score", while removing it if DRBD appears to see a (healthy) Primary that Pacemaker does not know about (as determined by crm_resource --locate).
- everything else is "false": ignore the peer role while assigning master scores.
(optional, string, default "false")
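The three-way interpretation described above can be written as a single case statement. This is a sketch only: peer_primary_mode is a hypothetical name, and the agent's real parsing may differ in detail. Only the spellings listed in the text are treated as true.

```shell
# Hypothetical helper: classify a remove_master_score_if_peer_primary value
# into one of the three documented modes.
peer_primary_mode() {
    case $1 in
        # Spellings named in the parameter description.
        yes|true|1|YES|TRUE|True|ja|ON) echo "true" ;;
        # The one literal that is not an OCF boolean.
        unexpected) echo "unexpected" ;;
        # Everything else, including the empty string, counts as false.
        *) echo "false" ;;
    esac
}

peer_primary_mode unexpected
```

Note that under this reading a misspelled value such as "unexpcted" silently falls back to "false", so the value is worth double-checking when it is set.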
fail_promote_early_if_peer_primary
To avoid a useless retry loop during promotion attempts in case of cluster split-brain (Pacemaker communication loss) while DRBD is still connected to a Primary, you can choose to give up after the first try if this situation is detected.
If a Primary "vanishes", TCP may not immediately detect this, and an idle DRBD may take some time until it does in-DRBD-protocol "pings". Pacemaker may well detect Primary loss earlier than DRBD, and try to promote while DRBD thinks it can still see a Primary. This means that, in general, trying to promote at least once is necessary, as that implies an in-DRBD-protocol "peer alive" check.
But if that does not succeed, re-trying until we hit the operation timeout may not be desired, so you can disable the retries.
(optional, boolean, default false)
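The policy described above can be sketched as a loop that always makes one attempt and then decides whether to keep going. Everything here is illustrative: try_promote and peer_is_primary are hypothetical stand-ins rather than agent functions, and a fixed attempt count replaces the real operation timeout.

```shell
# Hypothetical stand-ins so the sketch runs on its own.
try_promote()     { return 1; }   # pretend every promotion attempt fails
peer_is_primary() { return 0; }   # pretend DRBD still sees a Primary peer

# promote_with_policy FAIL_EARLY MAX_TRIES
# MAX_TRIES stands in for "until the operation timeout".
promote_with_policy() {
    fail_early=$1
    max_tries=$2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        tries=$((tries + 1))
        if try_promote; then
            echo "promoted after $tries"
            return 0
        fi
        # The first attempt always happens, since it implies a "peer alive"
        # check. Only afterwards may we give up early.
        if [ "$fail_early" = true ] && peer_is_primary; then
            echo "gave up after $tries"
            return 1
        fi
    done
    echo "gave up after $tries"
    return 1
}

promote_with_policy true 5 || true
```

With the stand-ins above, the early-failure variant stops after a single attempt, while passing "false" exhausts all five.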
unfence_if_all_uptodate
- With DRBD utils version <= 8.9.4, this is hardcoded to /usr/lib/drbd/crm-unfence-peer.sh -r $DRBD_RESOURCE
- With DRBD utils version >= 8.9.5, this is dispatched to $DRBDADM unfence-peer $DRBD_RESOURCE
In any case, the hook itself is responsible for fetching $OCF_RESKEY_unfence_extra_args from its environment.
(optional, boolean, default false)
unfence_extra_args
(optional, string, default "--quiet --flock-required --flock-timeout 0 --unfence-only-if-owner-match")
require_drbd_module_version_ge
Example: use require_drbd_module_version_ge=9.0.16 to fail unless DRBD module version >= 9.0.16 is available (effectively requires DRBD 9).
The intention of this is to give a more useful failure message after accidentally downgrading the DRBD version by installing/upgrading a new kernel.
Note: "ge", "greater-or-equal", inclusive. Required format: x.y.z
Set empty to skip this check.
(optional, string, default "8.0.0")
require_drbd_module_version_lt
Example: use require_drbd_module_version_lt=9.0.0 to fail unless DRBD module version < 9.0 is available (effectively requires DRBD 8.4).
Note: "lt", "less-than", exclusive. Required format: x.y.z
Set empty to skip this check.
(optional, string, default "10.0.0")
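Taken together, the two bounds describe a half-open version range: inclusive at the lower end, exclusive at the upper end. The sketch below shows one way to evaluate such a range for x.y.z strings. Both function names are hypothetical, and the packing into one integer assumes each component is below 1000 and has no leading zeros.

```shell
# Hypothetical helper: map "x.y.z" to one comparable integer.
version_to_int() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    echo $(( major * 1000000 + minor * 1000 + patch ))
}

# version_in_range VERSION GE LT
# Succeeds if GE <= VERSION < LT. An empty bound skips that check.
version_in_range() {
    v=$(version_to_int "$1")
    if [ -n "$2" ] && [ "$v" -lt "$(version_to_int "$2")" ]; then
        return 1
    fi
    if [ -n "$3" ] && [ "$v" -ge "$(version_to_int "$3")" ]; then
        return 1
    fi
    return 0
}

# With the defaults "8.0.0" and "10.0.0", a 9.1.7 module is accepted.
version_in_range 9.1.7 8.0.0 10.0.0 && echo "accepted"
```

Comparing the components numerically rather than as strings matters here: a plain string comparison would rank 9.0.9 above 9.0.16.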
connect_only_after_promote
Keep this DRBD instance disconnected until it is promoted. After promotion we issue an additional "adjust", which is supposed to initiate the connection attempts.
This causes a new data generation identifier ("current uuid") to be generated after the failover of a "healthy" DRBD.
(optional, boolean, default false)
SUPPORTED ACTIONS
This resource agent supports the following actions (operations):
start
reload
promote
demote
notify
stop
monitor (Slave role)
monitor (Master role)
meta-data
validate-all
EXAMPLE CRM SHELL
The following is an example configuration for a drbd resource using the crm(8) shell:
primitive p_drbd ocf:linbit:drbd \
  params \
    drbd_resource=string \
  op monitor timeout="20" interval="20" role="Slave" \
  op monitor timeout="20" interval="10" role="Master"
ms ms_drbd p_drbd \
  meta notify="true" interleave="true"
EXAMPLE PCS
The following is an example configuration for a drbd resource using pcs(8):
pcs resource create p_drbd ocf:linbit:drbd \
  drbd_resource=string \
  op monitor timeout="20" interval="20" role="Slave" \
  op monitor timeout="20" interval="10" role="Master" --master
SEE ALSO
https://docs.linbit.com/, https://clusterlabs.org/, https://www.linbit.com/drbd-community/
AUTHORS
LINBIT HA Solutions GmbH
01/09/2023 | drbd-pacemaker 9.22.0 |