HA.CF(5) | Configuration Files | HA.CF(5) |
NAME¶
ha.cf - Configuration file for the Heartbeat cluster messaging layerDESCRIPTION¶
/etc/ha.d/ha.cf is read by heartbeat(8) upon node start-up. It lists the communication facilities enabled between nodes, enables or disables certain features, and optionally lists the cluster nodes by host name. This file can safely be made world readable, but should be writable only by root.GLOBAL DIRECTIVES¶
Some directives in ha.cf are global in nature. The order of these global options is important in configuring the ha.cf file, since each directive is interpreted as it is encountered in ha.cf. These directives are use_logd and udpport. It is recommended that these be placed first in the ha.cf file when they are entered. Other directives in this category are baud, logfacility, logfile, and debugfile, but those directives are deprecated and should no longer be used.SUPPORTED DIRECTIVES¶
The following directives are supported in ha.cf (listed here in alphabetical order): apiauthThis directive specifies what users and/or groups are
allowed to connect to a specific API group name. The syntax is simple:
You can specify either a uid list, or a gid list, or both. However you must
specify either a uid list or a gid list. If you include both a uid list and a
gid list, then a process is authorized to connect to that API group if if it
is either in the uid-list or it is in the gid-list.
The API group name default has special meaning. If it is specified, it will be
used for authorizing clients without any API group name, and all client groups
not identified by any other apiauth directive.
Unless you specify otherwise in the ha.cf file, certain services will be
provided default authorizations as follows:
Table 1. Default service authorizations
autojoin
apiauth apigroupname [uid=uid1,uid2 ...] [gid=gid1,gid2 ...]
Service | Default apiauth |
ipfail | uid=hacluster |
ccm | gid=haclient |
ping | gid=haclient |
cl_status | gid=haclient |
lha-snmpagent | uid=root |
crmd | uid=hacluster |
The autojoin directive enables nodes to join
automatically just by communicating with the cluster, hence not requiring node
directives in the ha.cf file. Since our communication is normally strongly
authenticated, only nodes which know the cluster key can join (automatically
or otherwise).
The values you can give for the autojoin directive have the following meanings:
bcast
•none: disables automatic joining.
•other: allows nodes other than ourself who are
not listed in ha.cf to join automatically. In other words, our node has to be
listed in ha.cf, but other nodes do not.
•any: allows any node to join automatically
without being listed in ha.cf, even the current node.
Note that the set of nodes currently considered part of the cluster is kept in
the hostcache file. With autojoin enabled, the node directive is no longer
authoritative - the hostcache file is.The bcast directive is used to configure which interfaces
Heartbeat sends UDP broadcast traffic on. More than one interface can be
specified on the line. The udpport directive is used to configure which port
is used for these broadcast communications if the udpport directive is
specified before the bcast directive, otherwise the default port will be used.
A couple of sample bcast lines are shown below.
Note
Broadcast links are not supported in Pacemaker clusters on BSD systems.
compression
bcast eth0 eth1 # on Linux systems bcast le0 # for Solaris systems
The compression directive sets which compression method
will be used when a message is big and compression is needed.
It could be either zlib or bz2, depending on whether you have the corresponding
library in the system. You can check /usr/lib/heartbeat/plugins/HBcompress,
or, with recent (>= 1.0.10) cluster-glue packages,
/usr/lib/heartbeat/plugins/compress, to see what compression module is
available.
If this directive is not set, there will be no compression.
compression_threshold
The compression_threshold directive sets the threshold to
compress a message, e.g. if the threshold is 1, then any message with size
greater than 1 KB will be compressed. The default is 2 (KB). This directive
only makes sense if you have set the compression directive.
conn_logd_time
The conn_logd_time directive specifies the time Heartbeat
will reconnect to the logging daemon if the connection between Heartbeat and
the logging daemon is broken. The conn_logd_time is specified according to the
Heartbeat time syntax, for example:
The default is 60 seconds.
Note
Heartbeat will not automatically reconnect to the logging daemon. It only tries
to reconnect when it needs to log a message and conn_logd_time have passed
since the last attempt to connect.
coredumps
conn_logd_time 60 #60 seconds
The coredumps directive tells Heartbeat to do things to
enable making core dumps - should it need to dump core.
The allowed values are true and false.
crm
historical, for Cluster Resource Manager, now an alias to
pacemaker
pacemaker
Enables the Pacemaker cluster manager. For historical
reasons, the default for this option is off; however, it should always
be set to respawn.
When set to respawn, the directive automatically implies:
deadtime
apiauth stonithd uid=root apiauth stonithd-ng uid=root apiauth attrd uid=hacluster apiauth crmd uid=hacluster apiauth cib uid=hacluster respawn hacluster ccm respawn hacluster cib respawn hacluster attrd respawn root stonithd respawn root lrmd respawn hacluster crmd
The deadtime directive is used to specify how quickly
Heartbeat should decide that a node in a cluster is dead. Setting this value
too low will cause the system to falsely declare itself dead. Setting it too
high will delay takeover after the failure of a node in the cluster.
debug
The debug directive is used to set the level of debugging
in effect in the system. Production systems should have their debug level set
to zero (i.e., turned off). This is the default. Legal values of the debug
option are between 0-255. The most useful values are between 0 (off) and 3.
Setting the debug level greater than 1 can have an adverse effect on the size
of your log files, and on the system's ability to send heartbeats at rapid
rates, thus affecting the cluster reliability.
The debug level of the system can also be specified on the command line using
the -d option. Additionally, the debug level of the system can be dynamically
changed by sending the heartbeat process SIGUSR1 and SIGUSR2 signals. SIGUSR1
raises the debug level, and SIGUSR2 lowers it.
hbgenmethod time|file
The hbgenmethod directive specifies how Heartbeat should
compute its current generation number for communications. This is a
specialized and obscure directive, used mainly in firewalls which have no
local disk, and other devices which do not have a method of storing data
persistently across reboots. It defaults to storing the Heartbeat generations
in a file. Generation numbers are used by Heartbeat for replay attack
protection.
Warning
If one specifies the time method, there are certain possible cases where
troubles can arise. If a machine restarts Heartbeat and its local time of day
clock is less than or equal to than the value of the time of day clock when
Heartbeat last started, then that node will be unable to join the cluster.
initdead
The initdead parameter is used to set the time that it
takes to declare a cluster node dead when Heartbeat is first started. This
parameter generally needs to be set to a higher value, because experience
suggests that it sometimes takes operating systems many seconds for their
communication systems before they operate correctly. initdead is specified
according to the Heartbeat time syntax. A sample initdead value is shown
below:
In some switched network environments, switches engage in a spanning tree
algorithm whenever a NIC connects to a port. This can take a long time to
complete, and it is only necessary if the NIC being connected is another
switch. If this is the case, you may be able to configure certain NICs as not
being switches and shrink the connection delay significantly. If not, you'll
need to raise initdead to make this problem go away.
If this is set too low, you'll see one node declare the other as dead.
keepalive
initdead 30
The keepalive directive sets the interval between
heartbeat packets. It is specified according to the Heartbeat time
syntax.
logfacility
The logfacility is used to tell Heartbeat which syslog
logging facility it should use for logging its messages.
The possible values for logfacility vary by operating system, but some of the
most common ones are {auth, authpriv, daemon, syslog, user, local0, local1,
local2, local3, local4, local5, local6, local7}.
A sample logfacility directive is shown below:
If you want to disable logging to syslog:
mcast
logfacility local7
logfacility none
The mcast directive is used to configure a multicast
communication path. The syntax of an mcast directive is:
mcast6
mcast dev mcast-group udp-port ttl 0
•dev - IP device to send/rcv heartbeats on
•mcast-group - multicast group to join (class D
multicast address 224.0.0.0 - 239.255.255.255). For most Heartbeat uses, the
first byte should be 239.
•port - UDP port to sendto/rcvfrom (set this to
the same value as udpport)
•ttl - the ttl value for outbound heartbeats. This
affects how far the multicast packet will propagate. (0-255). Set to 1 for the
current subnet. Must be greater than zero.
A sample mcast directive is shown below:
mcast eth0 239.0.0.1 694 1 0
The mcast6 directive is to configure an IPv6 multicast
communication path. The syntax of an mcast directive is:
For example, using link-local scope with some "transient" group:
msgfmt classic|netstring
mcast6 [device] [mcast6 group] [port] [mcast6 hops] [mcast6 loop]
mcast6 eth0 ff12::1:2:3:4 694 1 0
•device - IP device to send/rcv heartbeats
on
•mcast6 group - multicast group to join. Refer to
http://tools.ietf.org/html/rfc3513#section-2.7 for valid and reserved IPv6
multicast addresses.
For most heartbeat uses, addresses should be taken from:
Plausibility checking code during config file parsing will reject some, but will
probably not be able to catch all unsuitable addresses. Please understand the
IPv6 multicast addressing scheme first.
Do not use reserved or well known multicast addresses.
You likely would seriously confuse a lot of network devices.
ff12::/16
•port - UDP port to sendto/rcvfrom
•mcast6 hops - affects how far the multicast
packet will propagate (sockopt: IPV6_MULTICAST_HOPS). (0-4). Set to 1 for
link-local.
•loop - sockopt IPV6_MULTICAST_LOOP; always set to
0
The msgfmt directive specifies the format Heartbeat uses
in wire.
node
•classic - Heartbeat will convert a message into a
string and transmit in wire. Binary values are converted with a base64
library.
•netstring - Binary messages will be transmitted
directly. This is more efficient since it avoids conversion between string and
binary values.
When in doubt, leave the default (classic).The node directive tells what machines are in the
cluster. The syntax of the node directive is simple:
Node names in the directive must match the "uname -n" of that machine.
You can declare multiple node names in one directive. You can also use the
directive multiple times. Normally every node in the cluster must be listed in
the ha.cf file, including the current node, unless the autojoin directive is
enabled.
The node directive is not completely authoritative with regard to nodes
heartbeat will communicate with. If a node has ever been added in the past, it
will tend to remain in the hostcache file more until it's manually
removed.
realtime on|off
node nodename1 nodename2 ...
The realtime directive specifies whether or not Heartbeat
should try and take advantage of the operating system's realtime scheduling
features. When enabled, Heartbeat will lock itself into memory, and raise its
priority to a realtime priority (as set by the rtprio directive). This feature
is mainly used for debugging various kinds of loops which might otherwise
cripple the system and impair debugging them.
The default is on.
rtprio
The rtprio directive is used to specify the priority at
which Heartbeat runs. It does not need to be specified unless other realtime
priority programs are also running on the system. The minimum and maximum
values for this field can be determined from the
sched_get_priority_min(SCHED_FIFO) and sched_get_priority_max(SCHED_FIFO)
calls respectively. The default value for rtprio is halfway between the
minimum and maximum values.
A sample rtprio directive is shown below:
ucast
rtprio 5
The ucast directive configures Heartbeat to communicate
over a UDP unicast communications link. The udpport directive is used to
configure which port is used for these unicast communications if the udpport
directive is specified before the ucast directive, otherwise the default port
will be used.
The general syntax of a ucast directive is:
Where dev is the device to use when talking to the peer, and peer-ip-address is
the IP address we will send packets to.
A sample ucast directive is shown below:
This directive will cause us to send packets to 10.10.10.133 over interface
eth0.
Note that ucast directives which go to the local machine are effectively
ignored. This allows the ha.cf directives on all machines to be
identical.
udpport
ucast dev peer-ip-address
ucast eth0 10.10.10.133
The udpport directive specifies which port Heartbeat will
use for its UDP intra-cluster communication. There are two common reasons for
overriding this value: there are multiple bcast clusters on the same subnet,
or this port is already in use in accordance with some locally-established
policy.
The default value for this parameter is the the port ha-cluster in /etc/services
(if present), or 694 if port ha-cluster is not in /etc/services. 694 is the
IANA registered port number for Heartbeat (a.k.a. ha-cluster).
A sample udpport directive is shown below.
You have to configure udpport (in ha.cf) before you configure ucast or
bcast, if not heartbeat will use the default port (694).
Note
Due to a specification error in the syntax of the mcast directive, this
directive does not apply to mcast communications.
use_logd on|off
udpport 694
The use_logd directive specifies whether Heartbeat logs
its messages through logging daemon or not.
If the logging daemon is used, all log messages will be sent through IPC to the
logging daemon, which then writes them into log files. In case the logging
daemon dies (for whatever reason), a warning message will be logged and all
messages will be written to log files directly.
If the logging daemon is used, logfile/debugfile/logfacility in this file are
not meaningful any longer. You should check the config file for logging daemon
(the default is /etc/logd.cf).
If use_logd is not used, all log messages will be written to log files directly.
The logging daemon is started/stopped in heartbeat script.
Setting use_logd to "on" is recommended.
uuidfrom
In the normal case, heartbeat generates a UUID for each
node in the system as a way of uniquely identifying a node - even if it should
change nodenames. This UUID is typically stored in the file
/var/lib/heartbeat/hb_uuid.
For certain kinds of installations (those booting from CDs or other read-only
media), it is impossible for heartbeat to save a generated to disk as it
normally does. In these cases, one can use the uuidfrom directive to instruct
heartbeat to use the nodename as though it were a UUID, by specifying uuidfrom
nodename.
All possible legal uuidfrom directives are shown below.
warntime
uuidfrom file uuidfrom nodename
The warntime directive is used to specify how quickly
Heartbeat should issue a "late heartbeat" warning.
The warntime value is specified according to the HeartbeatTimeSyntax. A sample
warntime specification is shown below.
The warntime directive is important for tuning deadtime
warntime 10 # 10 seconds
DEPRECATED DIRECTIVES¶
The following directives are interpreted by the configuration file parser for historical reasons, but should be considered deprecated and should no longer be used. auto_failbackIn legacy Heartbeat clusters, the auto_failback option
would determine whether a resource would automatically fail back to its
"primary" node, or remain on whatever node is serving it until that
node fails, or an administrator intervenes. The possible values for
auto_failback were:
baud
•on - enable automatic failbacks
•off - disable automatic failback
•legacy - enable automatic failbacks in systems
where all nodes in the cluster do not yet support the auto_failback
option.
This option has been replaced the configurable failback policies in Pacemaker,
and should no longer be used.The baud directive is used to set the speed for serial
communications. Any of the following speeds can be specified, provided they
are supported by your operating system: 9600, 19200, 38400, 57600, 115200,
230400, 460800. The default speed is 19200.
This option is obsolete as serial links should not be used in Pacemaker
clusters.
deadping
The deadping directive is used to specify how quickly
Heartbeat should decide that a ping node in a cluster is dead. Setting this
value too low will cause the system to falsely declare the ping node dead.
Setting it too high will delay detection of communication failure.
This feature has been replaced by the more flexible pingd resource agent in
Pacemaker, and should no longer be used.
debugfile
The debugfile directive specifies the file Heartbeat will
write debug messages to.
This directive is ignored when use_logd is specified. Enabling
use_logd is the recommended approach.
hbaping
Hbaping directives are given to declare fiber channel
devices as ping nodes.
This directive was never fully supported in Heartbeat (requiring manual
modifications to the code base) and should not be used.
hopfudge
The hopfudge directive controls how many nodes a packet
can be forwarded through before it is thrown away in the worst case. However,
the hopfudge value is added to the number of nodes in the system. It defaults
to 1.
This option applies to serial links only, which are deprecated.
logfile
The logfile directive configures a log file. All
non-debug messages from Heartbeat will go into this file.
This directive is ignored when use_logd is specified. Enabling
use_logd is the recommended approach.
ping
Ping directives are given to declare ping nodes to
Heartbeat. The syntax of the ping directive is simple:
Each IP address listed in a ping directive is considered to be independent. That
is, connectivity to each node is considered to be equally important.
In order to declare that a group of nodes are equally qualified for a particular
function, and that the presence of any of them indicates successful
communication, use the ping_group directive.
This feature has been replaced by the more flexible pingd resource agent in
Pacemaker, and should no longer be used.
ping_group
ping ip-address ...
Ping group directives are given to declare a group ping
node to Heartbeat. syntax of the ping_group directive is as follows:
Each IP address listed in a ping_group directive is considered to be related,
and connectivity to any one node is considered to be connectivity to the
group.
A ping group is considered by Heartbeat to be a single cluster node
(group-name). The ability to communicate with any of the group members means
that the group-name member is reachable. This is useful when (for example) two
different routers may be used to contact the internet, depending on which is
up, or when finding an appropriate reliable single ping node is difficult.
This feature has been replaced by the more flexible pingd resource agent in
Pacemaker, and should no longer be used.
respawn
ping_group group-name ip-address ...
The respawn directive is used to specify a program to run
and monitor while it runs. If this program exits with anything other than exit
code 100, it will be automatically restarted. The first parameter is the user
id to run the program under, and the second parameter is the program to run.
Subsequent parameters will be given to the program as arguments.
This functionality was primarily designed for the legacy ipfail program, which
has been replaced by the more flexible pingd resource agent in Pacemaker.
Thus, this directive should no longer be used, except when it is implicitly
generated by pacemaker yes.
serial
The serial directive tells Heartbeat to use the specified
serial port(s) for its communication. The parameters to the serial directive
are the names of tty devices suitable for opening without waiting for carrier
first. On Linux, those ports are typically named /dev/ttySX.
A few sample serial directives are shown below:
The baud directive is used to configure the baud rate for the port(s) if the
baud directive is specified before the serial directive, otherwise the default
baud rate will be used.
Using this option is strongly discouraged in Pacemaker clusters, as its
CIB updates can easily hit practical message size limits for serial links,
with undefined results.
stonith
serial /dev/ttyS0 /dev/ttyS1 # Linux serial /dev/cuaa0 # FreeBSD serial /dev/cua/a # Solaris
The stonith directive is used to configure Heartbeat's
legacy STONITH configuration. It assumes you're going to put in a STONITH
configuration file on each machine in the cluster to configure the (single)
STONITH device that this node will use to reset the other node in the cluster.
This functionality has been replaced by STONITH agents in Pacemaker.
stonith_host
The stonith_host directive is used to configure
Heartbeat's (release 1 only), STONITH configuration. With this directive, you
put all the STONITH configuration information for the devices in your cluster
in the ha.cf file, rather than in a separate file.
This functionality has been replaced by STONITH agents in Pacemaker.
traditional_compression on|off
This directive enables traditional compression. It is
highly recommended that this be set to off (the default); otherwise heartbeat
performance can be significantly negatively impacted.
watchdog
The watchdog directive configures Heartbeat to use a
watchdog device. In some circumstances, a watchdog device can be used in place
of a STONITH device. In any case, it is a reasonable thing to configure if you
don't have a STONITH device, or if you wish, in addition to your STONITH
device.
It is the purpose of a watchdog device to shut the machine down if Heartbeat
does not hear its own heartbeats as often as it thinks it should. This keeps
things like scheduler bugs from becoming split-brain configurations.
The general syntax of a watchdog directive is:
A sample watchdog directive is shown below:
The most common watchdog device currently used with general Linux systems is the
softdog device. The softdog device is a software-based watchdog device and is
usually referred to as /dev/watchdog - although like most UNIX devices, this
is a convention not a rule.
This functionality has been replaced by cluster self-monitoring and STONITH
resource agents in Pacemaker. This directive should no longer be used.
watchdog watchdog-device-name
watchdog /dev/watchdog
REQUIRED DIRECTIVES¶
The following directives must always be present in ha.cf:•At least one communication topology directive
(bcast, mcast, or ucast);
•Either one or more node directives, or
autojoin any.
EXAMPLE¶
Below is an example ha.cf for a 2-node Pacemaker cluster with redundant network communication paths:use_logd on mcast eth0 239.0.0.42 694 1 0 bcast eth1 node alice node bob pacemaker respawn
AUTHORS¶
Alan Robertson <alanr@unix.sh>heartbeat, original Wiki page
Florian Haas <florian.haas@linbit.com>
man page
24 Nov 2009 | Heartbeat 3.0.5 |