.TH heart 3erl "kernel 8.5.4.2" "Ericsson AB" "Erlang Module Definition" .SH NAME heart \- Heartbeat monitoring of an Erlang runtime system. .SH DESCRIPTION .LP This modules contains the interface to the \fIheart\fR\& process\&. \fIheart\fR\& sends periodic heartbeats to an external port program, which is also named \fIheart\fR\&\&. The purpose of the \fIheart\fR\& port program is to check that the Erlang runtime system it is supervising is still running\&. If the port program has not received any heartbeats within \fIHEART_BEAT_TIMEOUT\fR\& seconds (defaults to 60 seconds), the system can be rebooted\&. .LP An Erlang runtime system to be monitored by a heart program is to be started with command-line flag \fI-heart\fR\& (see also \fIerl(1)\fR\&)\&. The \fIheart\fR\& process is then started automatically: .LP .nf % erl -heart \&.\&.\&. .fi .LP If the system is to be rebooted because of missing heartbeats, or a terminated Erlang runtime system, environment variable \fIHEART_COMMAND\fR\& must be set before the system is started\&. If this variable is not set, a warning text is printed but the system does not reboot\&. .LP To reboot on Windows, \fIHEART_COMMAND\fR\& can be set to \fIheart -shutdown\fR\& (included in the Erlang delivery) or to any other suitable program that can activate a reboot\&. .LP The environment variable \fIHEART_BEAT_TIMEOUT\fR\& can be used to configure the heart time-outs; it can be set in the operating system shell before Erlang is started or be specified at the command line: .LP .nf % erl -heart -env HEART_BEAT_TIMEOUT 30 \&.\&.\&. .fi .LP The value (in seconds) must be in the range 10 < X <= 65535\&. .LP When running on OSs lacking support for monotonic time, \fIheart\fR\& is susceptible to system clock adjustments of more than \fIHEART_BEAT_TIMEOUT\fR\& seconds\&. When this happens, \fIheart\fR\& times out and tries to reboot the system\&. This can occur, for example, if the system clock is adjusted automatically by use of the Network Time Protocol (NTP)\&. .LP If a crash occurs, an \fIerl_crash\&.dump\fR\& is \fInot\fR\& written unless environment variable \fIERL_CRASH_DUMP_SECONDS\fR\& is set: .LP .nf % erl -heart -env ERL_CRASH_DUMP_SECONDS 10 \&.\&.\&. .fi .LP If a regular core dump is wanted, let \fIheart\fR\& know by setting the kill signal to abort using environment variable \fIHEART_KILL_SIGNAL=SIGABRT\fR\&\&. If unset, or not set to \fISIGABRT\fR\&, the default behavior is a kill signal using \fISIGKILL\fR\&: .LP .nf % erl -heart -env HEART_KILL_SIGNAL SIGABRT \&.\&.\&. .fi .LP If heart should \fInot\fR\& kill the Erlang runtime system, this can be indicated using the environment variable \fIHEART_NO_KILL=TRUE\fR\&\&. This can be useful if the command executed by heart takes care of this, for example as part of a specific cleanup sequence\&. If unset, or not set to \fITRUE\fR\&, the default behaviour will be to kill as described above\&. .LP .nf % erl -heart -env HEART_NO_KILL 1 \&.\&.\&. .fi .LP Furthermore, \fIERL_CRASH_DUMP_SECONDS\fR\& has the following behavior on \fIheart\fR\&: .RS 2 .TP 2 .B \fIERL_CRASH_DUMP_SECONDS=0\fR\&: Suppresses the writing of a crash dump file entirely, thus rebooting the runtime system immediately\&. This is the same as not setting the environment variable\&. .TP 2 .B \fIERL_CRASH_DUMP_SECONDS=-1\fR\&: Setting the environment variable to a negative value does not reboot the runtime system until the crash dump file is completely written\&. .TP 2 .B \fIERL_CRASH_DUMP_SECONDS=S\fR\&: \fIheart\fR\& waits for \fIS\fR\& seconds to let the crash dump file be written\&. After \fIS\fR\& seconds, \fIheart\fR\& reboots the runtime system, whether the crash dump file is written or not\&. .RE .LP In the following descriptions, all functions fail with reason \fIbadarg\fR\& if \fIheart\fR\& is not started\&. .SH DATA TYPES .nf \fBheart_option()\fR\& = check_schedulers .br .fi .SH EXPORTS .LP .nf .B set_cmd(Cmd) -> ok | {error, {bad_cmd, Cmd}} .br .fi .br .RS .LP Types: .RS 3 Cmd = string() .br .RE .RE .RS .LP Sets a temporary reboot command\&. This command is used if a \fIHEART_COMMAND\fR\& other than the one specified with the environment variable is to be used to reboot the system\&. The new Erlang runtime system uses (if it misbehaves) environment variable \fIHEART_COMMAND\fR\& to reboot\&. .LP Limitations: Command string \fICmd\fR\& is sent to the \fIheart\fR\& program as an ISO Latin-1 or UTF-8 encoded binary, depending on the filename encoding mode of the emulator (see \fIfile:native_name_encoding/0\fR\&)\&. The size of the encoded binary must be less than 2047 bytes\&. .RE .LP .nf .B clear_cmd() -> ok .br .fi .br .RS .LP Clears the temporary boot command\&. If the system terminates, the normal \fIHEART_COMMAND\fR\& is used to reboot\&. .RE .LP .nf .B get_cmd() -> {ok, Cmd} .br .fi .br .RS .LP Types: .RS 3 Cmd = string() .br .RE .RE .RS .LP Gets the temporary reboot command\&. If the command is cleared, the empty string is returned\&. .RE .LP .nf .B set_callback(Module, Function) -> .B ok | {error, {bad_callback, {Module, Function}}} .br .fi .br .RS .LP Types: .RS 3 Module = Function = atom() .br .RE .RE .RS .LP This validation callback will be executed before any heartbeat is sent to the port program\&. For the validation to succeed it needs to return with the value \fIok\fR\&\&. .LP An exception within the callback will be treated as a validation failure\&. .LP The callback will be removed if the system reboots\&. .RE .LP .nf .B clear_callback() -> ok .br .fi .br .RS .LP Removes the validation callback call before heartbeats\&. .RE .LP .nf .B get_callback() -> {ok, {Module, Function}} | none .br .fi .br .RS .LP Types: .RS 3 Module = Function = atom() .br .RE .RE .RS .LP Get the validation callback\&. If the callback is cleared, \fInone\fR\& will be returned\&. .RE .LP .nf .B set_options(Options) -> ok | {error, {bad_options, Options}} .br .fi .br .RS .LP Types: .RS 3 Options = [heart_option()] .br .RE .RE .RS .LP Valid options \fIset_options\fR\& are: .RS 2 .TP 2 .B \fIcheck_schedulers\fR\&: If enabled, a signal will be sent to each scheduler to check its responsiveness\&. The system check occurs before any heartbeat sent to the port program\&. If any scheduler is not responsive enough the heart program will not receive its heartbeat and thus eventually terminate the node\&. .RE .LP Returns with the value \fIok\fR\& if the options are valid\&. .RE .LP .nf .B get_options() -> {ok, Options} | none .br .fi .br .RS .LP Types: .RS 3 Options = [atom()] .br .RE .RE .RS .LP Returns \fI{ok, Options}\fR\& where \fIOptions\fR\& is a list of current options enabled for heart\&. If the callback is cleared, \fInone\fR\& will be returned\&. .RE