NAME¶
gmond.conf - configuration file for ganglia monitoring daemon (gmond)
DESCRIPTION¶
The gmond.conf file is used to configure the ganglia monitoring daemon (gmond)
which is part of the
Ganglia Distributed Monitoring System.
SECTIONS AND ATTRIBUTES¶
All sections and attributes are case-insensitive. For example,
name or
NAME or
Name or
NaMe are all equivalent.
Some sections can be included in the configuration file multiple times and some
sections are singular. For example, you can have only one
cluster
section to define the attributes of the cluster being monitored; however, you
can have multiple
udp_recv_channel sections to allow gmond to receive
message on multiple UDP channels.
cluster¶
There should only be one
cluster section defined. This section controls
how gmond reports the attributes of the cluster that it is part of.
The
cluster section has four attributes:
name,
owner,
latlong and
url.
For example,
cluster {
name = "Millennium Cluster"
owner = "UC Berkeley CS Dept."
latlong = "N37.37 W122.23"
url = "http://www.millennium.berkeley.edu/"
}
The
name attributes specifies the name of the cluster of machines. The
owner tag specifies the administrators of the cluster. The pair
name/
owner should be unique to all clusters in the world.
The
latlong attribute is the latitude and longitude GPS coordinates of
this cluster on earth. Specified to 1 mile accuracy with two decimal places
per axis in decimal.
The
url for more information on the
cluster. Intended to give
purpose, owner, administration, and account details for this cluster.
There directives directly control the XML output of gmond. For example, the
cluster configuration example above would translate into the following XML.
<CLUSTER NAME="Millennium Cluster" OWNER="UC Berkeley CS Dept."
LATLONG="N37.37 W122.23" URL="http://www.millennium.berkeley.edu/">
...
</CLUSTER>
host¶
The
host section provides information about the host running this
instance of
gmond. Currently only the
location string attribute
is supported. Example:
host {
location = "1,2,3"
}
The numbers represent Rack, Rank and Plane respectively.
globals¶
The
globals section controls general characteristics of gmond such as
whether is should daemonize, what user it should run as, whether is should
send/receive date and such. The
globals section has the following
attributes:
daemonize,
setuid,
user,
debug_level,
mute,
deaf,
allow_extra_data,
host_dmax,
host_tmax,
cleanup_threshold,
gexec,
send_metadata_interval and
module_dir.
For example,
globals {
daemonize = true
setuid = true
user = nobody
host_dmax = 3600
host_tmax = 40
}
The
daemonize attribute is a boolean. When true,
gmond will
daemonize. When false,
gmond will run in the foreground.
The
setuid attribute is a boolean. When true,
gmond will set its
effective UID to the uid of the user specified by the
user attribute.
When false,
gmond will not change its effective user.
The
debug_level is an integer value. When set to zero (0),
gmond
will run normally. A
debug_level greater than zero will result in
gmond running in the foreground and outputting debugging information.
The higher the
debug_level the more verbose the output.
The
mute attribute is a boolean. When true,
gmond will not send
data regardless of any other configuration directives.
The
deaf attribute is a boolean. When true,
gmond will not receive
data regardless of any other configuration directives.
The
allow_extra_data attribute is a boolean. When false,
gmond
will not send out the EXTRA_ELEMENT and EXTRA_DATA parts of the XML. This
might be useful if you are using your own frontend to the metric data and will
like to save some bandwith.
The
host_dmax value is an integer with units in seconds. When set to zero
(0),
gmond will never delete a host from its list even when a remote
host has stopped reporting. If
host_dmax is set to a positive number
then
gmond will flush a host after it has not heard from it for
host_dmax seconds. By the way, dmax means "delete max".
The
host_tmax value is an integer with units in seconds. This value
represents the maximum amount of time that
gmond should wait between
updates from a host. As messages may get lost in the network,
gmond
will consider the host as being down if it has not received any messages from
it after 4 times this value. For example, if
host_tmax is set to 20,
the host will appear as down after 80 seconds with no messages from it. By the
way, tmax means "timeout max".
The
cleanup_threshold is the minimum amount of time before gmond will
cleanup any hosts or metrics where
tn >
dmax a.k.a. expired
data.
The
gexec boolean allows you to specify whether gmond will announce the
hosts availability to run gexec jobs.
Note: this requires that
gexecd is running on the host and the proper keys have been installed.
The
send_metadata_interval establishes an interval in which gmond will
send or resend the metadata packets that describe each enabled metric. This
directive by default is set to 0 which means that gmond will only send the
metadata packets at startup and upon request from other gmond nodes running
remotely. If a new machine running gmond is added to a cluster, it needs to
announce itself and inform all other nodes of the metrics that it currently
supports. In multicast mode, this isn't a problem because any node can request
the metadata of all other nodes in the cluster. However in unicast mode, a
resend interval must be established. The interval value is the minimum number
of seconds between resends.
The
override_hostname and
override_ip parameters allow an
arbitrary hostname and/or IP (hostname can be optionally specified without IP)
to use when identifying metrics coming from this host.
The
module_dir is an optional parameter indicating the directory where
the DSO modules are to be located. If absent, the value to use is set at
configure time with the --with-moduledir option which will default if omitted
to the a subdirectory named "ganglia" in the directory where
libganglia will be installed.
For example, in a 32-bit Intel compatible Linux host that is usually:
/usr/lib/ganglia
udp_send_channel¶
You can define as many
udp_send_channel sections as you like within the
limitations of memory and file descriptors. If
gmond is configured as
mute this section will be ignored.
The
udp_send_channel has a total of seven attributes:
mcast_join,
mcast_if,
host,
port,
ttl,
bind and
bind_hostname.
bind and
bind_hostname are mutually
exclusive.
For example, the 2.5.x version gmond would send on the following single channel
by default...
udp_send_channel {
mcast_join = 239.2.11.71
port = 8649
}
The
mcast_join and
mcast_if attributes are optional. When
specified
gmond will create the UDP socket and join the
mcast_join multicast group and send data out the interface specified by
mcast_if.
You can use the
bind attribute to bind to a particular local address to
be used as the source for the multicast packets sent or let gmond resolve the
default hostname if
bind_hostname = yes.
If only a
host and
port are specified then
gmond will send
unicast UDP messages to the hosts specified.
You could specify multiple unicast hosts for redundancy as
gmond will
send UDP messages to all UDP channels.
Be careful though not to mix multicast and unicast attributes in the same
udp_send_channel definition.
For example...
udp_send_channel {
host = host.foo.com
port = 2389
}
udp_send_channel {
host = 192.168.3.4
port = 2344
}
would configure gmond to send messages to two hosts. The
host
specification can be an IPv4/IPv6 address or a resolvable hostname.
The
ttl attribute lets you modify the Time-To-Live (TTL) of outgoing
messages (unicast or multicast).
udp_recv_channel¶
You can specify as many
udp_recv_channel sections as you like within the
limits of memory and file descriptors. If
gmond is configured
deaf this attribute will be ignored.
The
udp_recv_channel section has following attributes:
mcast_join,
bind,
port,
mcast_if,
family,
retry_bind
and
buffer. The
udp_recv_channel can also have an
acl
definition (see ACCESS CONTROL LISTS below).
For example, the 2.5.x gmond ran with a single udp receive channel...
udp_recv_channel {
mcast_join = 239.2.11.71
bind = 239.2.11.71
port = 8649
}
The
mcast_join and
mcast_if should only be used if you want to
have this UDP channel receive multicast packets the multicast group
mcast_join on interface
mcast_if. If you do not specify
multicast attributes then
gmond will simply create a UDP server on the
specified
port.
You can use the
bind attribute to bind to a particular local address.
The family address is set to
inet4 by default. If you want to bind the
port to an
inet6 port, you need to specify that in the family
attribute. Ganglia will not allow IPV6=>IPV4 mapping (for portability and
security reasons). If you want to listen on both
inet4 and
inet6
for a particular port, explicitly state it with the following:
udp_recv_channel {
port = 8666
family = inet4
}
udp_recv_channel {
port = 8666
family = inet6
}
If you specify a bind address, the family of that address takes precedence. f
your IPv6 stack doesn't support IPV6_V6ONLY, a warning will be issued but
gmond will continue working (this should rarely happen).
Multicast Note: for multicast, specifying a
bind address with the same
value used for
mcast_join will prevent unicast UDP messages to the same
port from being processed.
The sFlow protocol (see
http://www.sflow.org) can be used to collect a standard
set of performance metrics from servers. For servers that don't include
embedded sFlow agents, an open source sFlow agent is available on SourceForge
(see
http://host-sflow.sourceforge.net).
To configure
gmond to receive sFlow datagrams, simply add a
udp_recv_channel with the
port set to 6343 (the IANA registered
port for sFlow):
udp_recv_channel {
port = 6343
}
Note: sFlow is unicast protocol, so don't include
mcast_join join. Note:
To use some other port for sFlow, set it here and then specify the port in an
sflow section (see below).
gmond will fail to run if it can't bind to all defined
udp_recv_channels. Sometimes, on machines configured by DHCP, for
example, the
gmond daemon starts before a network address is assigned
to the interface. Consequently, the bind fails and the
gmond daemon
does not run. To assist in this situation, the boolean parameter
retry_bind can be set to the value
true and then the daemon will
not abort on failure, it will enter a loop and repeat the bind attempt every
60 seconds:
udp_recv_channel {
port = 6343
retry_bind = true
}
If you have a large system with lots of metrics, you might experience UDP drops.
This happens when
gmond is not able to process the UDP fast enough from
the network. In this case you might consider changing your setup into a more
distributed setup using aggregator
gmond hosts. Alternatively you can
choose to create a bigger receive
buffer:
udp_recv_channel {
port = 6343
buffer = 10485760
}
B<buffer> is specified in bytes, i.e.: 10485760 will allow 10MB UDP
to be buffered in memory.
Note: increasing buffer size will increase memory usage by
gmond
tcp_accept_channel¶
You can specify as many
tcp_accept_channel sections as you like within
the limitations of memory and file descriptors. If
gmond is configured
to be
mute, then these sections are ignored.
The
tcp_accept_channel has the following attributes:
bind,
port,
interface,
family and
timeout. A
tcp_accept_channel may also have an
acl section specified (see
ACCESS CONTROL LISTS below).
For example, 2.5.x gmond would accept connections on a single TCP channel.
tcp_accept_channel {
port = 8649
}
The
bind address is optional and allows you to specify which local
address
gmond will bind to for this channel.
The
port is an integer than specifies which port to answer requests for
data.
The
family address is set to
inet4 by default. If you want to bind
the port to an
inet6 port, you need to specify that in the family
attribute. Ganglia will not allow IPV6=>IPV4 mapping (for portability and
security reasons). If you want to listen on both
inet4 and
inet6
for a particular port, explicitly state it with the following:
tcp_accept_channel {
port = 8666
family = inet4
}
tcp_accept_channel {
port = 8666
family = inet6
}
If you specify a bind address, the family of that address takes precedence. If
your IPv6 stack doesn't support IPV6_V6ONLY, a warning will be issued but
gmond will continue working (this should rarely happen).
The
timeout attribute allows you to specify how many microseconds to
block before closing a connection to a client. The default is set to -1
(blocking IO) and will never abort a connection regardless of how slow the
client is in fetching the report data.
The
interface is not implemented at this time (use
bind).
collection_group¶
You can specify as many
collection_group section as you like within the
limitations of memory. A
collection_group has the following attributes:
collect_once,
collect_every and
time_threshold. A
collection_group must also contain one or more
metric sections.
The
metric section has the following attributes: (one of
name or
name_match;
name_match is only permitted if pcre support is
compiled in),
value_threshold and
title. For a list of available
metric names, run the following command:
% gmond -m
Here is an example of a collection group for a static metric...
collection_group {
collect_once = yes
time_threshold = 1800
metric {
name = "cpu_num"
title = "Number of CPUs"
}
}
This
collection_group entry would cause gmond to collect the
cpu_num metric once at startup (since the number of CPUs will not
change between reboots). The metric
cpu_num would be send every 1/2
hour (1800 seconds). The default value for the
time_threshold is 3600
seconds if no
time_threshold is specified.
The
time_threshold is the maximum amount of time that can pass before
gmond sends all
metrics specified in the
collection_group to all
configured
udp_send_channels. A
metric may be sent before this
time_threshold is met if during collection the value surpasses the
value_threshold (explained below).
Here is an example of a collection group for a volatile metric...
collection_group {
collect_every = 60
time_threshold = 300
metric {
name = "cpu_user"
value_threshold = 5.0
title = "CPU User"
}
metric {
name = "cpu_idle"
value_threshold = 10.0
title = "CPU Idle"
}
}
This collection group would collect the
cpu_user and
cpu_idle
metrics every 60 seconds (specified in
collect_every). If
cpu_user varies by 5.0% or
cpu_idle varies by 10.0%, then the
entire
collection_group is sent. If no
value_threshold is
triggered within
time_threshold seconds (in this case 300), the entire
collection_group is sent.
Each time the metric value is collected the new value is compared with the old
value collected. If the difference between the last value and the current
value is greater than the
value_threshold, the entire collection group
is send to the
udp_send_channels defined.
It's important to note that all metrics in a collection group are sent even when
only a single
value_threshold is surpassed.
In addition a user friendly title can be substituted for the metric name by
including a
title within the
metric section.
By using the
name_match parameter instead of
name, it is possible
to use a single definition to configure multiple metrics that match a regular
expression. The perl compatible regular expression (pcre) syntax is used. This
approach is particularly useful for a series of metrics that may vary in
number between reboots (e.g. metric names that are generated for each
individual NIC or CPU core).
Here is an example of using the
name_match directive to enable the
multicpu metrics:
metric {
name_match = "multicpu_([a-z]+)([0-9]+)"
value_threshold = 1.0
title = "CPU-\\2 \\1"
}
Note that in the example above, there are two matches: the alphabetical match
matches the variations of the metric name (e.g.
idle,
system)
while the numeric match matches the CPU core number. The second thing to note
is the use of substitutions within the argument to
title.
If both
name and
name_match are specified, then
name is
ignored.
Modules¶
A
modules section contains the parameters that are necessary to load a
metric module. A metric module is a dynamically loadable module that extends
the available metrics that gmond is able to collect. Each
modules
section contains at least one
module section. Within a
module
section are the directives
name,
language,
enabled,
path and
params. The module
name is the name of the
module as determined by the module structure if the module was developed in
C/C++. Alternatively, the
name can be the name of the source file if
the module has been implemented in a interpreted language such as python. A
language designation must be specified as a string value for each
module. The
language directive must correspond to the source code
language in which the module was implemented (ex. language =
"python"). If a
language directive does not exist for the
module, the assumed language will be "C/C++". The
enabled
directive allows a metric module to be easily enabled or disabled through the
configuration file. If the
enabled directive is not included in the
module configuration, the enabled state will default to "yes". One
thing to note is that if a module has been disabled yet the metric which that
module implements is still listed as part of a collection group, gmond will
produce a warning message. However gmond will continue to function normally by
simply ignoring the metric. The
path is the path from which gmond is
expected to load the module (C/C++ compiled dynamically loadable module only).
The
params directive can be used to pass a single string parameter
directly to the module initialization function (C/C++ module only). Multiple
parameters can be passed to the module's initialization function by including
one or more
param sections. Each
param section must be named and
contain a
value directive. Once a module has been loaded, the
additional metrics can be discovered by invoking
gmond -m.
modules {
module {
name = "example_module"
language = "C/C++"
enabled = yes
path = "modexample.so"
params = "An extra raw parameter"
param RandomMax {
value = 75
}
param ConstantValue {
value = 25
}
}
}
sFlow¶
The
sflow group is optional and has the following optional attributes:
udp_port,
accept_vm_metrics,
accept_http_metrics,
accept_memcache_metrics,
accept_jvm_metrics,
multiple_http_instances,
multiple_memcache_instances,
multiple_jvm_instances. By default, a
udp_recv_channel on port
6343 (the IANA registered port for sFlow) is all that is required to accept
and process sFlow datagrams. To receive sFlow on some other port requires both
a
udp_recv_channel for the other port and a
udp_port setting
here. For example:
udp_recv_channel {
port = 7343
}
sflow {
udp_port = 7343
}
An sFlow agent running on a hypervisor may also be sending metrics for its local
virtual machines. By default these metrics are ignored, but the
accept_vm_metrics flag can be used to accept those metrics too, and
prefix them with an identifier for each virtual machine.
sflow {
accept_vm_metrics = yes
}
The sFlow feed may also contain metrics sent from HTTP or memcached servers, or
from Java VMs. Extra options can be used to ignore or accept these metrics,
and to indicate that there may be multiple instances per host. For example:
sflow {
accept_http_metrics = yes
multiple_http_instances = yes
}
will allow the HTTP metrics, and also mark them with a distinguishing identifier
so that each instance can be trended separately. (If multiple instances are
reporting and this flag is not set, the results are likely to be garbled.)
Include¶
This directive allows the user to include additional configuration files rather
than having to add all gmond configuration directives to the gmond.conf file.
The following example includes any file with the extension of .conf contained
in the directory conf.d as if the contents of the included configuration files
were part of the original gmond.conf file. This allows the user to modularize
their configuration file. One usage example might be to load individual metric
modules by including module specific .conf files.
include ('/etc/ganglia/conf.d/*.conf')
ACCESS CONTROL¶
The
udp_recv_channel and
tcp_accept_channel directives can contain
an Access Control List (ACL). This ACL allows you to specify exactly which
hosts gmond process data from.
An example of an
acl entry looks like
acl {
default = "deny"
access {
ip = 192.168.0.4
mask = 32
action = "allow"
}
}
This ACL will by default reject all traffic that is not specifically from host
192.168.0.4 (the mask size for an IPv4 address is 32, the mask size for an
IPv6 address is 128 to represent a single host).
Here is another example
acl {
default = "allow"
access {
ip = 192.168.0.0
mask = 24
action = "deny"
}
access {
ip = ::ff:1.2.3.0
mask = 120
action = "deny"
}
}
This ACL will by default allow all traffic unless it comes from the two subnets
specified with action = "deny".
EXAMPLE¶
The default behavior for a 2.5.x gmond would be specified as...
udp_recv_channel {
mcast_join = 239.2.11.71
bind = 239.2.11.71
port = 8649
}
udp_send_channel {
mcast_join = 239.2.11.71
port = 8649
}
tcp_accept_channel {
port = 8649
}
To see the complete default configuration for gmond simply run:
% gmond -t
gmond will print out its default behavior in a configuration file and then exit.
Capturing this output to a file can serve as a useful starting point for
creating your own custom configuration.
% gmond -t > custom.conf
edit
custom.conf to taste and then
% gmond -c ./custom.conf
SEE ALSO¶
gmond(1).
NOTES¶
The ganglia web site is at
http://ganglia.info/.
COPYRIGHT¶
Copyright (c) 2005 The University of California, Berkeley