DRBDSETUP(8)                    System Administration                    DRBDSETUP(8)
NAME
drbdsetup - Configure the DRBD kernel module.

SYNOPSIS
drbdsetup command {argument...} [option...]
DESCRIPTION
The drbdsetup utility serves to configure the DRBD kernel module and to show its current configuration. Users usually interact with the drbdadm utility, which provides a more high-level interface to DRBD than drbdsetup. (See drbdadm's --dry-run option to see how drbdadm uses drbdsetup.)

Some option arguments have a default scale which applies when a plain number is specified (for example Kilo, or 1024 times the numeric value). Such default scales can be overridden by using a suffix (for example, M for Mega). The common suffixes K = 2^10 = 1024, M = 1024 K, and G = 1024 M are supported.
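For example, a rate can be given as a plain number or with a suffix (the minor number and values below are purely illustrative; the disk-options command is described under COMMANDS):
drbdsetup disk-options 0 --resync-rate=10240   # plain number, default scale Kilo: 10240 KiB/s
drbdsetup disk-options 0 --resync-rate=10M     # the same rate, written with the M suffix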
COMMANDS
drbdsetup attach minor lower_dev meta_data_dev meta_data_index,
drbdsetup disk-options minor
The attach command attaches a
lower-level device to an existing replicated device. The disk-options
command changes the disk options of an attached lower-level device. In either
case, the replicated device must have been created with drbdsetup
new-minor.
Both commands refer to the replicated device by its minor number.
lower_dev is the name of the lower-level device. meta_data_dev
is the name of the device containing the metadata, and may be the same as
lower_dev. meta_data_index is either a numeric metadata index,
or the keyword internal for internal metadata, or the keyword
flexible for variable-size external metadata. Available options:
--al-extents extents
--c-fill-target fill_target,
--c-max-rate max_rate,
--c-plan-ahead plan_time
--disk-flushes,
--disk-drain
Fencing is a preventive measure to avoid situations where both nodes are
primary and disconnected. This is also known as a split-brain situation. DRBD
supports the following fencing policies:
dont-care
--md-flushes
drbdsetup check-resize minor
DRBD automatically maintains a "hot"
or "active" disk area likely to be written to again soon based on
the recent write activity. The "active" disk area can be written to
immediately, while "inactive" disk areas must be
"activated" first, which requires a meta-data write. We also refer
to this active disk area as the "activity log".
The activity log saves meta-data writes, but the whole log must be resynced upon
recovery of a failed node. The size of the activity log is a major factor of
how long a resync will take and how fast a replicated disk will become
consistent after a crash.
The activity log consists of a number of 4-Megabyte segments; the
al-extents parameter determines how many of those segments can be
active at the same time. The default value for al-extents is 1237, with
a minimum of 7 and a maximum of 65536.
Note that the effective maximum may be smaller, depending on how you created the
device meta data; see also drbdmeta(8). The effective maximum is 919 *
(available on-disk activity-log ring-buffer area/4kB - 1); the default 32kB
ring-buffer effects a maximum of 6433 (which covers more than 25 GiB of data). We
recommend keeping this well within the amount your backend storage and
replication link are able to resync inside of about 5 minutes.
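As a rough sketch (minor number and value are illustrative, not a recommendation), increasing the number of active extents also increases the area covered by the activity log:
# 3389 extents x 4 MiB per extent = about 13.2 GiB of "active" area
drbdsetup disk-options 0 --al-extents=3389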
--al-updates {yes | no}
With this parameter, the activity log can be
turned off entirely (see the al-extents parameter). This will speed up
writes because fewer meta-data writes will be necessary, but the entire device
needs to be resynchronized upon recovery of a failed primary node. The default
value for al-updates is yes.
--c-delay-target delay_target,
Dynamically control the resync speed. This
mechanism is enabled by setting the c-plan-ahead parameter to a
positive value. The goal is to either fill the buffers along the data path
with a defined amount of data if c-fill-target is defined, or to have a
defined delay along the path if c-delay-target is defined. The maximum
bandwidth is limited by the c-max-rate parameter.
The c-plan-ahead parameter defines how fast drbd adapts to changes in the
resync speed. It should be set to five times the network round-trip time or
more. Common values for c-fill-target for "normal" data paths
range from 4K to 100K. If drbd-proxy is used, it is advised to use
c-delay-target instead of c-fill-target. The
c-delay-target parameter is used if the c-fill-target parameter
is undefined or set to 0. The c-delay-target parameter should be set to
five times the network round-trip time or more. The c-max-rate option
should be set to either the bandwidth available between the DRBD-hosts and the
machines hosting DRBD-proxy, or to the available disk bandwidth.
The default values of these parameters are: c-plan-ahead = 20 (in units
of 0.1 seconds), c-fill-target = 0 (in units of sectors),
c-delay-target = 1 (in units of 0.1 seconds), and c-max-rate =
102400 (in units of KiB/s).
Dynamic resync speed control is available since DRBD 8.3.9.
--c-min-rate min_rate
A node which is primary and sync-source has to
schedule application I/O requests and resync I/O requests. The
c-min-rate parameter limits how much bandwidth is available for resync
I/O; the remaining bandwidth is used for application I/O.
A c-min-rate value of 0 means that there is no limit on the resync I/O
bandwidth. This can slow down application I/O significantly. Use a value of 1
(1 KiB/s) for the lowest possible resync rate.
The default value of c-min-rate is 4096, in units of KiB/s.
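An illustrative setting of the dynamic resync controller (minor number and values are examples only, not recommendations; see the unit notes above) could be:
drbdsetup disk-options 0 --c-plan-ahead=20 --c-fill-target=100K --c-max-rate=100M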
--disk-barrier,
DRBD has three methods of handling the
ordering of dependent write requests:
disk-barrier
From these three methods, drbd will use the first that is enabled and supported
by the backing storage device. If all three of these options are turned off,
DRBD will submit write requests without bothering about dependencies.
Depending on the I/O stack, write requests can be reordered, and they can be
submitted in a different order on different cluster nodes. This can result in
data loss or corruption. Therefore, turning off all three methods of
controlling write ordering is strongly discouraged.
A general guideline for configuring write ordering is to use disk barriers or
disk flushes when using ordinary disks (or an ordinary disk array) with a
volatile write cache. On storage without cache or with a battery backed write
cache, disk draining can be a reasonable choice.
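For example, on a backend with a battery-backed (non-volatile) write cache, one might rely on draining only. This is a sketch, not a recommendation; the minor number is illustrative, and you should verify first that the cache really is non-volatile:
drbdsetup disk-options 0 --disk-flushes=no --md-flushes=no --disk-drain=yes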
--disk-timeout
Use disk barriers to make sure that requests
are written to disk in the right order. Barriers ensure that all requests
submitted before a barrier make it to the disk before any requests submitted
after the barrier. This is implemented using 'tagged command queuing' on SCSI
devices and 'native command queuing' on SATA devices. Only some devices and
device stacks support this method. The device mapper (LVM) only supports
barriers in some configurations.
Note that on systems which do not support disk barriers, enabling this option
can lead to data loss or corruption. Until DRBD 8.4.1, disk-barrier was
turned on if the I/O stack below DRBD did support barriers. Kernels since
linux-2.6.36 (or 2.6.32 RHEL6) no longer allow detecting whether barriers are
supported. Since drbd-8.4.2, this option is off by default and needs to be
enabled explicitly.
disk-flushes
Use disk flushes between dependent write
requests, also referred to as 'force unit access' by drive vendors. This
forces all data to disk. This option is enabled by default.
disk-drain
Wait for the request queue to
"drain" (that is, wait for the requests to finish) before submitting
a dependent write request. This method requires that requests are stable on
disk when they finish. Before DRBD 8.0.9, this was the only method
implemented. This option is enabled by default. Do not disable in production
environments.
If the lower-level device on which a DRBD
device stores its data does not finish an I/O request within the defined
disk-timeout, DRBD treats this as a failure. The lower-level device is
detached, and the device's disk state advances to Diskless. If DRBD is
connected to one or more peers, the failed request is passed on to one of
them.
This option is dangerous and may lead to kernel panic!
"Aborting" requests, or force-detaching the disk, is intended for
completely blocked/hung local backing devices which do no longer complete
requests at all, not even do error completions. In this situation, usually a
hard-reset and failover is the only way out.
By "aborting", basically faking a local error-completion, we allow for
a more graceful swichover by cleanly migrating services. Still the affected
node has to be rebooted "soon".
By completing these requests, we allow the upper layers to re-use the associated
data pages.
If the local backing device later "recovers", and then DMAs some data
from disk into the original request pages, in the best case it will just put
random data into unused pages; but typically it will corrupt data that is by then
completely unrelated, causing all sorts of damage.
That means delayed successful completion, especially for READ requests, is a
reason to panic(). We assume that a delayed *error* completion is OK, though
we still will complain noisily about it.
The default value of disk-timeout is 0, which stands for an infinite
timeout. Timeouts are specified in units of 0.1 seconds. This option is
available since DRBD 8.3.12.
--fencing fencing_policy
No fencing actions are taken. This is the
default policy.
resource-only
If a node becomes a disconnected primary, it
tries to fence the peer. This is done by calling the fence-peer
handler. The handler is supposed to reach the peer over an alternative
communication path and call ' drbdadm outdate minor' there.
resource-and-stonith
If a node becomes a disconnected primary, it
freezes all its IO operations and calls its fence-peer handler. The fence-peer
handler is supposed to reach the peer over an alternative communication path
and call ' drbdadm outdate minor' there. In case it cannot do that, it
should stonith the peer. IO is resumed as soon as the situation is resolved.
In case the fence-peer handler fails, I/O can be resumed manually with '
drbdadm resume-io'.
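A sketch of enabling a fencing policy as a disk option (minor number illustrative; the fence-peer handler itself is configured in drbd.conf):
drbdsetup disk-options 0 --fencing=resource-and-stonith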
Enable disk flushes and disk barriers on the
meta-data device. This option is enabled by default. See the
disk-flushes parameter.
--on-io-error handler
Configure how DRBD reacts to I/O errors on a
lower-level device. The following policies are defined:
pass_on
--read-balancing policy
Change the disk status to Inconsistent, mark
the failed block as inconsistent in the bitmap, and retry the I/O operation on
a remote cluster node.
call-local-io-error
Call the local-io-error handler (see
the handlers section).
detach
Detach the lower-level device and continue in
diskless mode.
Distribute read requests among cluster nodes
as defined by policy. The supported policies are prefer-local
(the default), prefer-remote, round-robin, least-pending,
when-congested-remote, 32K-striping, 64K-striping,
128K-striping, 256K-striping, 512K-striping and
1M-striping.
This option is available since DRBD 8.4.1.
resync-after minor
Define that a device should only resynchronize
after the specified other device. By default, no order between devices is
defined, and all devices will resynchronize in parallel. Depending on the
configuration of the lower-level devices, and the available network and disk
bandwidth, this can slow down the overall resync process. This option can be
used to form a chain or tree of dependencies among devices.
--resync-rate rate
Define how much bandwidth DRBD may use for
resynchronizing. DRBD allows "normal" application I/O even during a
resync. If the resync takes up too much bandwidth, application I/O can become
very slow. This parameter allows you to avoid that. Please note that this option
only works when the dynamic resync controller is disabled.
--size size
Specify the size of the lower-level device
explicitly instead of determining it automatically. The device size must be
determined once and is remembered for the lifetime of the device. In order to
determine it automatically, all the lower-level devices on all nodes must be
attached, and all nodes must be connected. If the size is specified
explicitly, this is not necessary. The size value is assumed to be in
units of sectors (512 bytes) by default.
Remember the current size of the lower-level
device of the specified replicated device. Used by drbdadm. The size
information is stored in the file /var/lib/drbd/drbd-minor-minor.lkbd.
drbdsetup connect resource local_addr remote_addr,
The connect command connects a
resource to a peer host. The resource must have been created with
drbdsetup new-resource. The net-options command changes the
network options of an existing connection. In both commands, local_addr
and remote_addr refer to the local and remote protocol, network
address, and port in the format [
address-family:]address[:port]. The address families
ipv4, ipv6, ssocks (Dolphin Interconnect Solutions'
"super sockets"), sdp (Infiniband Sockets Direct Protocol),
and sci are supported ( sci is an alias for ssocks). If
no address family is specified, ipv4 is assumed. For all address
families except ipv6, the address uses IPv4 address notation
(for example, 1.2.3.4). For ipv6, the address is enclosed in brackets
and uses IPv6 address notation (for example, [fd01:2345:6789:abcd::1]). The
port defaults to 7788. An illustrative invocation is shown after the option list below. Available options:
--after-sb-0pri policy
--congestion-fill threshold,
--congestion-extents threshold
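An illustrative connect invocation (resource name, addresses, and port are examples):
drbdsetup connect r0 ipv4:192.168.123.4:7788 ipv4:192.168.123.2:7788 --protocol=C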
drbdsetup cstate local_addr remote_addr
Define how to react if a split-brain scenario
is detected and none of the two nodes is in primary role. (We detect
split-brain scenarios when two nodes connect; split-brain decisions are always
between two nodes.) The defined policies are:
disconnect
discard-older-primary
--after-sb-1pri policy
No automatic resynchronization; simply
disconnect.
discard-younger-primary,
Resynchronize from the node which became
primary first ( discard-younger-primary) or last
(discard-older-primary). If both nodes became primary independently,
the discard-least-changes policy is used.
discard-zero-changes
If only one of the nodes wrote data since the
split brain situation was detected, resynchronize from this node to the other.
If both nodes wrote data, disconnect.
discard-least-changes
Resynchronize from the node with more modified
blocks.
discard-node-nodename
Always resynchronize to the named node.
Define how to react if a split-brain scenario
is detected, with one node in primary role and one node in secondary role. (We
detect split-brain scenarios when two nodes connect, so split-brain decisions
are always among two nodes.) The defined policies are:
disconnect
--after-sb-2pri policy
No automatic resynchronization, simply
disconnect.
consensus
Discard the data on the secondary node if the
after-sb-0pri algorithm would also discard the data on the secondary
node. Otherwise, disconnect.
violently-as0p
Always take the decision of the
after-sb-0pri algorithm, even if it causes an erratic change of the
primary's view of the data. This is only useful if a single-node file system
(i.e., not OCFS2 or GFS) with the allow-two-primaries flag is used.
This option can cause the primary node to crash, and should not be used.
discard-secondary
Discard the data on the secondary node.
call-pri-lost-after-sb
Always take the decision of the
after-sb-0pri algorithm. If the decision is to discard the data on the
primary node, call the pri-lost-after-sb handler on the primary
node.
Define how to react if a split-brain scenario
is detected and both nodes are in primary role. (We detect split-brain
scenarios when two nodes connect, so split-brain decisions are always among
two nodes.) The defined policies are:
disconnect
--allow-two-primaries
No automatic resynchronization, simply
disconnect.
violently-as0p
See the violently-as0p policy for
after-sb-1pri.
call-pri-lost-after-sb
Call the pri-lost-after-sb handler on
one of the machines. The handler is expected to reboot the machine, which
brings the node into secondary role.
The most common way to configure DRBD devices
is to allow only one node to be primary (and thus writable) at a time.
In some scenarios it is preferable to allow two nodes to be primary at once; a
mechanism outside of DRBD then must make sure that writes to the shared,
replicated device happen in a coordinated way. This can be done with a
shared-storage cluster file system like OCFS2 and GFS, or with virtual machine
images and a virtual machine manager that can migrate virtual machines between
physical machines.
The allow-two-primaries parameter tells DRBD to allow two nodes to be
primary at the same time. Never enable this option when using a
non-distributed file system; otherwise, data corruption and node crashes will
result!
--always-asbp
Normally the automatic after-split-brain
policies are only used if current states of the UUIDs do not indicate the
presence of a third node.
With this option you request that the automatic after-split-brain policies are
used as long as the data sets of the nodes are somehow related. This might
cause a full sync if the UUIDs indicate the presence of a third node, or if
double faults have led to strange UUID sets.
--connect-int time
As soon as a connection between two nodes is
configured with drbdsetup connect, DRBD immediately tries to establish
the connection. If this fails, DRBD waits for connect-int seconds and
then repeats. The default value of connect-int is 10 seconds.
--cram-hmac-alg hash-algorithm
Configure the hash-based message
authentication code (HMAC) or secure hash algorithm to use for peer
authentication. The kernel supports a number of different algorithms, some of
which may be loadable as kernel modules. See the shash algorithms listed in
/proc/crypto. By default, cram-hmac-alg is unset. Peer authentication
also requires a shared-secret to be configured.
--csums-alg hash-algorithm
Normally, when two nodes resynchronize, the
sync target requests a piece of out-of-sync data from the sync source, and the
sync source sends the data. With many usage patterns, a significant number of
those blocks will actually be identical.
When a csums-alg algorithm is specified, when requesting a piece of
out-of-sync data, the sync target also sends along a hash of the data it
currently has. The sync source compares this hash with its own version of the
data. It sends the sync target the new data if the hashes differ, and tells it
that the data are the same otherwise. This reduces the network bandwidth
required, at the cost of higher cpu utilization and possibly increased I/O on
the sync target.
The csums-alg can be set to one of the secure hash algorithms supported
by the kernel; see the shash algorithms listed in /proc/crypto. By default,
csums-alg is unset.
--csums-after-crash-only
Enabling this option (and csums-alg, above)
makes it possible to use the checksum based resync only for the first resync
after primary crash, but not for later "network hiccups".
In most cases, blocks that are marked as need-to-be-resynced are in fact changed,
so calculating checksums, and both reading and writing the blocks on the
resync target, is all effective overhead.
The advantage of checksum based resync is mostly after primary crash recovery,
where the recovery marked larger areas (those covered by the activity log) as
need-to-be-resynced, just in case. Introduced in 8.4.5.
--data-integrity-alg alg
DRBD normally relies on the data integrity
checks built into the TCP/IP protocol, but if a data integrity algorithm is
configured, it will additionally use this algorithm to make sure that the data
received over the network match what the sender has sent. If a data integrity
error is detected, DRBD will close the network connection and reconnect, which
will trigger a resync.
The data-integrity-alg can be set to one of the secure hash algorithms
supported by the kernel; see the shash algorithms listed in /proc/crypto. By
default, this mechanism is turned off.
Because of the CPU overhead involved, we recommend not to use this option in
production environments. Also see the notes on data integrity below.
--ko-count number
If a secondary node fails to complete a write
request in ko-count times the timeout parameter, it is excluded
from the cluster. The primary node then sets the connection to this secondary
node to Standalone. The default value of ko-count is 0, which disables
this feature.
--max-buffers number
Limits the memory usage per DRBD minor device
on the receiving side, or for internal buffers during resync or online-verify.
Unit is PAGE_SIZE, which is 4 KiB on most systems. The minimum possible
setting is hard coded to 32 (=128 KiB). These buffers are used to hold data
blocks while they are written to/read from disk. To avoid possible distributed
deadlocks on congestion, this setting is used as a throttle threshold rather
than a hard limit. Once more than max-buffers pages are in use, further
allocation from this pool is throttled. You want to increase max-buffers if
you cannot saturate the IO backend on the receiving side.
--max-epoch-size number
Define the maximum number of write requests
DRBD may issue before issuing a write barrier. The default value is 2048, with
a minimum of 1 and a maximum of 20000. Setting this parameter to a value below
10 is likely to decrease performance.
--on-congestion policy,
By default, DRBD blocks when the TCP send
queue is full. This prevents applications from generating further write
requests until more buffer space becomes available again.
When DRBD is used together with DRBD-proxy, it can be better to use the
pull-ahead on-congestion policy, which can switch DRBD into
ahead/behind mode before the send queue is full. DRBD then records the
differences between itself and the peer in its bitmap, but it no longer
replicates them to the peer. When enough buffer space becomes available again,
the node resynchronizes with the peer and switches back to normal replication.
This has the advantage of not blocking application I/O even when the queues fill
up, and the disadvantage that peer nodes can fall behind much further. Also,
while resynchronizing, peer nodes will become inconsistent.
The available congestion policies are block (the default) and
pull-ahead. The congestion-fill parameter defines how much data
is allowed to be "in flight" in this connection. The default value
is 0, which disables this mechanism of congestion control, with a maximum of
10 GiBytes. The congestion-extents parameter defines how many bitmap
extents may be active before switching into ahead/behind mode, with the same
default and limits as the al-extents parameter. The
congestion-extents parameter is effective only when set to a value
smaller than al-extents.
Ahead/behind mode is available since DRBD 8.3.10.
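A sketch for a DRBD-proxy setup (endpoints and values are illustrative):
drbdsetup net-options ipv4:10.0.0.1:7788 ipv4:10.0.0.2:7788 --on-congestion=pull-ahead --congestion-fill=1G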
--ping-int interval
When the TCP/IP connection to a peer is idle
for more than ping-int seconds, DRBD will send a keep-alive packet to
make sure that a failed peer or network connection is detected reasonably
soon. The default value is 10 seconds, with a minimum of 1 and a maximum of
120 seconds. The unit is seconds.
--ping-timeout timeout
Define the timeout for replies to keep-alive
packets. If the peer does not reply within ping-timeout, DRBD will
close and try to reestablish the connection. The default value is 0.5 seconds,
with a minimum of 0.1 seconds and a maximum of 3 seconds. The unit is tenths
of a second.
--socket-check-timeout timeout
In setups involving a DRBD-proxy and
connections that experience a lot of buffer-bloat, it might be necessary to set
ping-timeout to an unusually high value. By default, DRBD uses the same
value to wait until a newly established TCP connection is considered stable. Since the
DRBD-proxy is usually located in the same data center, such a long wait time
may hinder DRBD's connect process.
In such setups, socket-check-timeout should be set to at least the
round-trip time between DRBD and DRBD-proxy; in most cases that is 1 (0.1 seconds).
The default unit is tenths of a second, the default value is 0 (which causes
DRBD to use the value of ping-timeout instead). Introduced in
8.4.5.
--protocol name
Use the specified protocol on this connection.
The supported protocols are:
A
--rcvbuf-size size
Writes to the DRBD device complete as soon as
they have reached the local disk and the TCP/IP send buffer.
B
Writes to the DRBD device complete as soon as
they have reached the local disk, and all peers have acknowledged the receipt
of the write requests.
C
Writes to the DRBD device complete as soon as
they have reached the local and all remote disks.
Configure the size of the TCP/IP receive
buffer. A value of 0 (the default) causes the buffer size to adjust
dynamically. This parameter usually does not need to be set, but it can be set
to a value up to 10 MiB. The default unit is bytes.
--rr-conflict policy
This option helps resolve cases in which the
outcome of the resync decision is incompatible with the current role
assignment in the cluster. The defined policies are:
disconnect
--shared-secret secret
No automatic resynchronization, simply
disconnect.
violently
Resync to the primary node is allowed,
violating the assumption that data on a block device are stable for one of the
nodes. Do not use this option, it is dangerous.
call-pri-lost
Call the pri-lost handler on one of the
machines. The handler is expected to reboot the machine, which puts it into
secondary role.
Configure the shared secret used for peer
authentication. The secret is a string of up to 64 characters. Peer
authentication also requires the cram-hmac-alg parameter to be
set.
--sndbuf-size size
Configure the size of the TCP/IP send buffer.
Since DRBD 8.0.13 / 8.2.7, a value of 0 (the default) causes the buffer size
to adjust dynamically. Values below 32 KiB are harmful to the throughput on
this connection. Large buffer sizes can be useful especially when protocol A
is used over high-latency networks; the maximum value supported is 10
MiB.
--tcp-cork
By default, DRBD uses the TCP_CORK socket
option to prevent the kernel from sending partial messages; this results in
fewer and bigger packets on the network. Some network stacks can perform worse
with this optimization. On these, the tcp-cork parameter can be used to
turn this optimization off.
--timeout time
Define the timeout for replies over the
network: if a peer node does not send an expected reply within the specified
timeout, it is considered dead and the TCP/IP connection is closed. The
timeout value must be lower than connect-int and lower than
ping-int. The default is 6 seconds; the value is specified in tenths of
a second.
--unplug-watermark number
Mainline kernels before version 2.6.39-rc1 use
an explicit plug / unplug mechanism to control when a block device starts
processing queued requests. On those kernels, the unplug-watermark
parameter defines how many requests must be queued until a secondary node
starts processing them. Some storage controllers perform best when
unplug-watermark is set to the same value as max-buffers; others
are more efficient with smaller values. The default value for
unplug-watermark is 128, with a minimum of 16 and a maximum of 131072.
More recent kernels handle plugging and unplugging implicitly; on those kernels,
this parameter has no effect. Note that some distributions have backported
this feature to older kernel versions.
--use-rle
Each replicated device on a cluster node has a
separate bitmap for each of its peer devices. The bitmaps are used for
tracking the differences between the local and peer device: depending on the
cluster state, a disk range can be marked as different from the peer in the
device's bitmap, in the peer device's bitmap, or in both bitmaps. When two
cluster nodes connect, they exchange each other's bitmaps, and they each
compute the union of the local and peer bitmap to determine the overall
differences.
Bitmaps of very large devices are also relatively large, but they usually
compress very well using run-length encoding. This can save time and bandwidth
for the bitmap transfers.
The use-rle parameter determines if run-length encoding should be used.
It is on by default since DRBD 8.4.0.
--verify-alg hash-algorithm
Online verification (drbdadm verify)
computes and compares checksums of disk blocks (i.e., hash values) in order to
detect if they differ. The verify-alg parameter determines which
algorithm to use for these checksums. It must be set to one of the secure hash
algorithms supported by the kernel before online verify can be used; see the
shash algorithms listed in /proc/crypto.
We recommend scheduling online verifications regularly during low-load periods,
for example once a month. Also see the notes on data integrity below.
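A sketch of enabling and starting online verification (endpoints, volume number, and algorithm are illustrative):
drbdsetup net-options ipv4:10.0.0.1:7788 ipv4:10.0.0.2:7788 --verify-alg=md5
drbdsetup verify 0 ipv4:10.0.0.1:7788 ipv4:10.0.0.2:7788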
--discard-my-data
Discard the local data and resynchronize with
the peer that has the most up-to-date data. Use this option to manually
recover from a split-brain situation.
--tentative
Only determine if a connection to the peer can
be established and if a resync is necessary (and in which direction) without
actually establishing the connection or starting the resync. Check the system
log to see what DRBD would do without the --tentative option.
Show the current state of a connection. The
connection is identified by its endpoints; see the drbdsetup connect
command.
drbdsetup del-minor minor
Remove a replicated device. No lower-level
device may be attached; see drbdsetup detach.
drbdsetup del-resource resource
Remove a resource. All volumes and connections
must be removed first ( drbdsetup del-minor, drbdsetup
disconnect). Alternatively, drbdsetup down can be used to remove a
resource together with all its volumes and connections.
drbdsetup detach minor
Detach the lower-level device of a replicated
device. Available options:
--force
drbdsetup disconnect local_addr remote_addr
Force the detach and return immediately. This
puts the lower-level device into failed state until all pending I/O has
completed, and then detaches the device. Any I/O not yet submitted to the
lower-level device (for example, because I/O on the device was suspended) is
assumed to have failed.
Remove a connection to a peer host. The
connection is identified by its endpoints; see the drbdsetup connect
command.
drbdsetup down {resource | all}
Take a resource down by removing all volumes,
connections, and the resource itself.
drbdsetup dstate minor
Show the current disk state of a lower-level
device.
drbdsetup events2 {resource | all}
Show the current state of all configured DRBD
objects, followed by all changes to the state.
The output format is meant to be human as well as machine readable. Each line
starts with the event number, which is followed by an asterisk if the event
continues in the next line. The second word in each line indicates the kind of
event: exists for an existing object; create, destroy,
and change if an object is created, destroyed, or changed; or
call or response if an event handler is called or it returns.
The third word indicates the object the event applies to: resource,
device, connection, peer-device, helper, or a dash
( -) to indicate that the current state has been dumped completely.
The remaining words identify the object and describe the state that the object is
in. Available options:
--now
drbdsetup get-gi volume local_addr remote_addr
Terminate after reporting the current state.
The default is to continuously listen and report state changes.
--statistics
Include statistics in the output.
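For example, to dump the current state of all resources once, including statistics, and exit:
drbdsetup events2 all --now --statistics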
Show the data generation identifiers for a
device on a particular connection. The device is identified by its volume
number. The connection is identified by its endpoints; see the drbdsetup
connect command.
The output consists of the current UUID, bitmap UUID, and the first two history
UUIDs, followed by a set of flags. The current UUID and history UUIDs are
device specific; the bitmap UUID and flags are peer device specific. This
command only shows the first two history UUIDs. Internally, DRBD maintains one
history UUID for each possible peer device.
drbdsetup invalidate minor
Replace the local data of a device with that
of a peer. All the local data will be marked out-of-sync, and a resync with
the specified peer device will be initiated.
drbdsetup invalidate-remote volume local_addr
remote_addr
Replace a peer device's data of a resource
with the local data. The peer device's data will be marked out-of-sync, and a
resync from the local node to the specified peer will be initiated.
drbdsetup new-current-uuid minor
Generate a new current UUID and rotate all
other UUID values. This has at least two use cases, namely to skip the initial
sync, and to reduce network bandwidth when starting in a single node
configuration and then later (re-)integrating a remote site.
Available option:
--clear-bitmap
This can be used to skip the initial sync, if you want to start from scratch.
This use case only works on "Just Created" meta data. Necessary
steps:
One obvious side-effect is that the replica is full of old garbage (unless you
made them identical using other means), so any online-verify is expected to
find any number of out-of-sync blocks.
You must not use this on pre-existing data! Even though it may appear to
work at first glance, once you switch to the other node, your data is toast,
as it never got replicated. So do not leave out the mkfs (or
equivalent).
This can also be used to shorten the initial resync of a cluster where the
second node is added after the first node has gone into production, by means of
disk shipping. This use case works on disconnected devices only; the device
may be in primary or secondary role.
The necessary steps on the current active server are:
Now add the disk to the new secondary node, and join it to the cluster. You will
get a resync of the parts that were changed since the first call to
drbdsetup in step 1.
drbdsetup new-minor resource minor volume
Clears the sync bitmap in addition to
generating a new current UUID.
1.On both nodes, initialize meta data
and configure the device.
drbdadm create-md --force res
2.They need to do the initial handshake, so
they know their sizes.
drbdadm up res
3.They are now Connected Secondary/Secondary
Inconsistent/Inconsistent. Generate a new current-uuid and clear the dirty
bitmap.
drbdadm --clear-bitmap new-current-uuid res
4.They are now Connected Secondary/Secondary
UpToDate/UpToDate. Make one side primary and create a file system.
drbdadm primary res
mkfs -t fs-type $(drbdadm sh-dev
res)
1.drbdsetup new-current-uuid
--clear-bitmap minor
2.Take the copy of the current active server.
E.g. by pulling a disk out of the RAID1 controller, or by copying with dd. You
need to copy the actual data, and the meta data.
3.drbdsetup new-current-uuid
minor
Create a new replicated device within a
resource. The command creates a block device inode for the replicated device
(by default, /dev/drbd minor). The volume number identifies the
device within the resource.
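As an illustrative sketch, drbdadm normally issues roughly the following sequence of drbdsetup calls to bring a device up (resource name, node ID, minor and volume numbers, devices, and addresses are examples):
drbdsetup new-resource r0 0
drbdsetup new-minor r0 7 0                        # creates /dev/drbd7 as volume 0 of r0
drbdsetup attach 7 /dev/sdb1 /dev/sdb1 internal
drbdsetup connect r0 ipv4:10.0.0.1:7788 ipv4:10.0.0.2:7788 --protocol=C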
drbdsetup new-resource resource node_id,
The new-resource command creates a new
resource. The resource-options command changes the resource options of
an existing resource. Available options:
--auto-promote bool-value
drbdsetup outdate minor
A resource must be promoted to primary role
before any of its devices can be mounted or opened for writing.
Before DRBD 9, this could only be done explicitly ("drbdadm primary").
Since DRBD 9, the auto-promote parameter allows DRBD to automatically
promote a resource to primary role when one of its devices is mounted or
opened for writing. As soon as all devices are unmounted or closed with no
more remaining users, the role of the resource changes back to secondary.
Automatic promotion only succeeds if the cluster state allows it (that is, if an
explicit drbdadm primary command would succeed). Otherwise, mounting or
opening the device fails as it already did before DRBD 9: the mount(2)
system call fails with errno set to EROFS (Read-only file system); the
open(2) system call fails with errno set to EMEDIUMTYPE (wrong medium
type).
Irrespective of the auto-promote parameter, if a device is promoted
explicitly ( drbdadm primary), it also needs to be demoted explicitly
(drbdadm secondary).
The auto-promote parameter is available since DRBD 9.0.0, and defaults to
yes.
--cpu-mask cpu-mask
Set the cpu affinity mask for DRBD kernel
threads. The cpu mask is specified as a hexadecimal number. The default value
is 0, which lets the scheduler decide which kernel threads run on which CPUs.
CPU numbers in cpu-mask which do not exist in the system are
ignored.
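For example (resource name illustrative), a cpu-mask of 3 (binary 11) restricts the resource's kernel threads to CPUs 0 and 1:
drbdsetup resource-options r0 --cpu-mask=3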
--on-no-data-accessible policy
Determine how to deal with I/O requests when
the requested data is not available locally or remotely (for example, when all
disks have failed). The defined policies are:
io-error
This setting is available since DRBD 8.3.9; the default policy is
io-error.
--peer-ack-window value
System calls fail with errno set to EIO.
suspend-io
The resource suspends I/O. I/O can be resumed
by (re)attaching the lower-level device, by connecting to a peer which has
access to the data, or by forcing DRBD to resume I/O with drbdadm resume-io
res. When no data is available, forcing I/O to resume will
result in the same behavior as the io-error policy.
On each node and for each device, DRBD
maintains a bitmap of the differences between the local and remote data for
each peer device. For example, in a three-node setup (nodes A, B, C) each with
a single device, every node maintains one bitmap for each of its peers.
When nodes receive write requests, they know how to update the bitmaps for the
writing node, but not how to update the bitmaps between themselves. In this
example, when a write request propagates from node A to B and C, nodes B and C
know that they have the same data as node A, but not whether or not they both
have the same data.
As a remedy, the writing node occasionally sends peer-ack packets to its peers
which tell them which state they are in relative to each other.
The peer-ack-window parameter specifies how much data a primary node may
send before sending a peer-ack packet. A low value causes increased network
traffic; a high value causes less network traffic but higher memory
consumption on secondary nodes and higher resync times between the secondary
nodes after primary node failures. (Note: peer-ack packets may be sent due to
other reasons as well, e.g. membership changes or expiry of the
peer-ack-delay timer.)
The default value for peer-ack-window is 2 MiB, the default unit is
sectors. This option is available since 9.0.0.
--peer-ack-delay expiry-time
If after the last finished write request no
new write request gets issued for expiry-time, then a peer-ack packet
is sent. If a new write request is issued before the timer expires, the timer
gets reset to expiry-time. (Note: peer-ack packets may be sent due to
other reasons as well, e.g. membership changes or the peer-ack-window
option.)
This parameter may influence resync behavior on remote nodes. Peer nodes need to
wait until they receive a peer-ack for releasing a lock on an AL-extent.
Resync operations between peers may need to wait for these locks.
The default value for peer-ack-delay is 100 milliseconds, the default
unit is milliseconds. This option is available since 9.0.0.
Mark the data on a lower-level device as
outdated. This is used for fencing, and prevents the resource the device is
part of from becoming primary in the future. See the --fencing disk
option.
drbdsetup pause-sync volume local_addr remote_addr
Stop resynchronizing between a local and a
peer device by setting the local pause flag. The resync can only resume if the
pause flags on both sides of a connection are cleared.
drbdsetup primary resource
Change the role of a node in a resource to
primary. This allows the replicated devices in this resource to be mounted or
opened for writing. Available options:
--overwrite-data-of-peer
Note that DRBD usually only allows one node in a cluster to be in primary role
at any time; this allows DRBD to coordinate access to the devices in a
resource across nodes. The --allow-two-primaries network option changes
this; in that case, a mechanism outside of DRBD needs to coordinate device
access.
drbdsetup resize minor
This option is an alias for the --force
option.
--force
Force the resource to become primary even if
some devices are not guaranteed to have up-to-date data. This option is used
to turn one of the nodes in a newly created cluster into the primary node, or
when manually recovering from a disaster.
Note that this can lead to split-brain scenarios. Also, when forcefully turning
an inconsistent device into an up-to-date device, it is highly recommended to
use any integrity checks available (such as a filesystem check) to make sure
that the device can at least be used without crashing the system.
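For example, to turn one node of a freshly created cluster into the primary node (resource name illustrative):
drbdsetup primary r0 --force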
Reexamine the size of the lower-level devices
of a replicated device on all nodes. This command is called after the
lower-level devices on all nodes have been grown to adjust the size of the
replicated device. Available options:
--assume-peer-has-space
drbdsetup resume-io minor
Resize the device even if some of the peer
devices are not connected at the moment. DRBD will try to resize the peer
devices when they next connect. It will refuse to connect to a peer device
which is too small.
--assume-clean
Do not resynchronize the added disk space;
instead, assume that it is identical on all nodes. This option can be used
when the disk space is uninitialized and differences do not matter, or when it
is known to be identical on all nodes. See the drbdsetup verify
command.
--size val
This option can be used to online shrink the
usable size of a drbd device. It is the user's responsibility to make sure that
a file system on the device is not truncated by that operation.
--al-stripes val --al-stripe-size val
These options may be used to change the layout
of the activity log online. In case of internal meta data this may involve
shrinking the user-visible size at the same time (using the --size option) or
increasing the available space on the backing devices.
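After growing the lower-level devices on all nodes, one of the following illustrative calls could be used (minor number is an example):
drbdsetup resize 7                  # re-examine the lower-level devices and grow the replicated device
drbdsetup resize 7 --assume-clean   # as above, but skip resynchronizing the added space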
Resume I/O on a replicated device. See the
--fencing disk option.
drbdsetup resume-sync volume local_addr remote_addr
Allow resynchronization to resume by clearing
the local sync pause flag.
drbdsetup role resource
Show the current role of a resource.
drbdsetup secondary resource
Change the role of a node in a resource to
secondary. This command fails if the replicated device is in use.
drbdsetup show {resource | all}
Show the current configuration of a resource,
or of all resources. Available options:
--show-defaults
drbdsetup show-gi volume local_addr remote_addr
Show all configuration parameters, even the
ones with default values. Normally, parameters with default values are not
shown.
Show the data generation identifiers for a
device on a particular connection. In addition, explain the output. The output
otherwise is the same as in the drbdsetup get-gi command.
drbdsetup state
This is an alias for drbdsetup role.
Deprecated.
drbdsetup status {resource | all}
Show the status of a resource, or of all
resources. The output consists of one paragraph for each configured resource.
Each paragraph contains one line for each resource, followed by one line for
each device, and one line for each connection. The device and connection lines
are indented. The connection lines are followed by one line for each peer
device; these lines are indented against the connection line.
Long lines are wrapped around at terminal width, and indented to indicate how
the lines belong together. Available options:
--verbose
For example, the non-verbose output for a resource with only one connection and
only one volume could look like this:
With the --verbose option, the same resource could be reported as:
drbdsetup suspend-io minor
Include more information in the output even
when it is likely redundant or irrelevant.
--statistics
Include data transfer statistics in the
output.
--color={always | auto | never}
Colorize the output. With --color=auto,
drbdsetup emits color codes only when standard output is connected to a
terminal.
drbd0 role:Primary disk:UpToDate
    host2.example.com role:Secondary disk:UpToDate

drbd0 node-id:1 role:Primary suspended:no
    volume:0 minor:1 disk:UpToDate blocked:no
    host2.example.com local:ipv4:192.168.123.4:7788
        peer:ipv4:192.168.123.2:7788 node-id:0 connection:WFReportParams
        role:Secondary congested:no
        volume:0 replication:Connected disk:UpToDate resync-suspended:no
Suspend I/O on a replicated device. It is not
usually necessary to use this command.
drbdsetup verify volume local_addr remote_addr
Start online verification, change which part
of the device will be verified, or stop online verification. The command
requires the specified peer to be connected.
Online verification compares each disk block on the local and peer node. Blocks
which differ between the nodes are marked as out-of-sync, but they are
not automatically brought back into sync. To bring them into sync, the
resource must be disconnected and reconnected. Progress can be monitored in
the output of drbdsetup status --statistics. Available options:
--start position
Also see the notes on data integrity in the drbd.conf(5) manual
page.
drbdsetup wait-connect-volume volume local_addr
remote_addr,
Define where online verification should start.
This parameter is ignored if online verification is already in progress. If
the start parameter is not specified, online verification will continue where
it was interrupted (if the connection to the peer was lost while verifying),
after the previous stop sector (if the previous online verification has
finished), or at the beginning of the device (if the end of the device was
reached, or online verify has not run before).
The position on disk is specified in disk sectors (512 bytes) by default.
--stop position
Define where online verification should stop.
If online verification is already in progress, the stop position of the active
online verification process is changed. Use this to stop online verification.
The position on disk is specified in disk sectors (512 bytes) by default.
The wait-connect-* commands wait until
a device on a peer is visible. The wait-sync-* commands wait until a
device on a peer is up to date. Available options for both commands:
--degr-wfc-timeout timeout
drbdsetup forget-peer resource peer_node_id
Define how long to wait until all peers are
connected in case the cluster consisted of a single node only when the system
went down. This parameter is usually set to a value smaller than
wfc-timeout. The assumption here is that peers which were unreachable
before a reboot are less likely to be reachable after the reboot, so
waiting is less likely to help.
The timeout is specified in seconds. The default value is 0, which stands for an
infinite timeout. Also see the wfc-timeout parameter.
--outdated-wfc-timeout timeout
Define how long to wait until all peers are
connected if all peers were outdated when the system went down. This parameter
is usually set to a value smaller than wfc-timeout. The assumption here
is that an outdated peer cannot have become primary in the meantime, so we
don't need to wait for it as long as for a node which was alive before.
The timeout is specified in seconds. The default value is 0, which stands for an
infinite timeout. Also see the wfc-timeout parameter.
--wait-after-sb
This parameter causes DRBD to continue waiting
in the init script even when a split-brain situation has been detected, and
the nodes therefore refuse to connect to each other.
--wfc-timeout timeout
Define how long the init script waits until
all peers are connected. This can be useful in combination with a cluster
manager which cannot manage DRBD resources: when the cluster manager starts,
the DRBD resources will already be up and running. With a more capable cluster
manager such as Pacemaker, it makes more sense to let the cluster manager
control DRBD resources. The timeout is specified in seconds. The default value
is 0, which stands for an infinite timeout. Also see the
degr-wfc-timeout parameter.
The forget-peer command removes all
traces of a peer node from the meta-data. It frees a bitmap slot in the
meta-data and makes it available for further bitmap slot allocation in case a
so-far never seen node connects.
The connection must be taken down before this command may be used. In case the
peer re-connects at a later point, a bitmap-based resync will be turned into a
full sync.
EXAMPLES
Please see the DRBD User's Guide[1] for examples.

VERSION
This document was revised for version 9.0.0 of the DRBD distribution.

AUTHOR
Written by Philipp Reisner philipp.reisner@linbit.com and Lars Ellenberg lars.ellenberg@linbit.com.

REPORTING BUGS
Report bugs to drbd-user@lists.linbit.com.

COPYRIGHT
Copyright 2001-2012 LINBIT Information Technologies, Philipp Reisner, Lars Ellenberg. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO
drbd.conf(5), drbd(8), drbddisk(8), drbdadm(8), DRBD User's Guide[1], DRBD Web Site[2]

NOTES
1. DRBD User's Guide
2. DRBD Web Site
3 December 2011                                        DRBD 9.0.0