table of contents
- NAME
- SYNOPSIS
- DESCRIPTION
- FILE FORMAT
- CPU STATUS COLUMN SETTINGS
- DISK STATUS COLUMN SETTINGS
- MEMORY STATUS COLUMN SETTINGS
- PROCS STATUS COLUMN SETTINGS
- MSGS STATUS COLUMN SETTINGS
- FILES STATUS COLUMN SETTINGS
- PORTS STATUS COLUMN SETTINGS
- SVCS status (Microsoft Windows clients)
- DS - RRD based status override
- MQ Series SETTINGS
- CHANGING THE DEFAULT SETTINGS
- RULES TO SELECT HOSTS
- DIRECTING ALERTS TO GROUPS
- RULES: APPLYING SETTINGS TO SELECTED HOSTS
- NOTES
- SEE ALSO
ANALYSIS.CFG(5) | File Formats Manual | ANALYSIS.CFG(5)
NAME
analysis.cfg - Configuration file for the xymond_client module
SYNOPSIS
~Xymon/server/etc/analysis.cfg
DESCRIPTION
The analysis.cfg file controls what color is assigned to the status messages that are generated from the Xymon client data - typically the cpu, disk, memory, procs and msgs columns. The color is determined by settings defined in this file; these settings are applied to specific hosts through a set of rules.
Note: This file is only used on the Xymon server - it is not used by the Xymon client, so there is no need to distribute it to your client systems.
FILE FORMAT
Blank lines and lines starting with a hash mark (#) are treated as comments and ignored.
CPU STATUS COLUMN SETTINGS
LOAD warnlevel paniclevel
If the system load exceeds "warnlevel" or "paniclevel", the "cpu" status will go yellow or red, respectively. These are decimal numbers. Defaults: warnlevel=5.0, paniclevel=10.0
UP bootlimit toolonglimit [color]
The cpu status goes yellow/red if the system has been up for less than "bootlimit" time, or longer than "toolonglimit". The time is in minutes, or you can add h/d/w for hours/days/weeks - e.g. "2h" for two hours, or "4w" for 4 weeks. Defaults: bootlimit=1h, toolonglimit=-1 (infinite), color=yellow.
CLOCK max.offset [color]
The cpu status goes yellow/red if the system clock on the client differs by more than "max.offset" seconds from that of the Xymon server. Note that this is not a particularly accurate test, since it is affected by network delays between the client and the server, and by the load on both systems. You should therefore not rely on it being accurate to better than +/- 5 seconds, but it will let you catch a client clock that goes completely wrong. The default is NOT to check the system clock.
Example:
LOAD 5 10
UP 10m 4w yellow
CLOCK 15 red
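To make the LOAD and UP thresholds concrete, here is a minimal Python sketch of how the settings above could be evaluated. It is not Xymon's implementation. It assumes that "exceeds" means strictly greater than, that a bare number or an "m" suffix means minutes, and that any negative limit means infinite; the function names parse_duration, load_color and uptime_color are hypothetical.

```python
import re

# Minutes per unit for the optional suffixes (assumed); a bare number means minutes.
UNIT_MINUTES = {"": 1, "m": 1, "h": 60, "d": 60 * 24, "w": 60 * 24 * 7}


def parse_duration(text: str) -> int:
    """Convert "10m", "2h", "4w" etc. to minutes; a negative value -> -1 (infinite)."""
    m = re.fullmatch(r"(-?\d+)([mhdw]?)", text.strip().lower())
    if m is None:
        raise ValueError(f"bad duration: {text!r}")
    value, unit = int(m.group(1)), m.group(2)
    if value < 0:
        return -1
    return value * UNIT_MINUTES[unit]


def load_color(load: float, warn: float = 5.0, panic: float = 10.0) -> str:
    """LOAD check; "exceeds" is taken to mean strictly greater than (assumption)."""
    if load > panic:
        return "red"
    if load > warn:
        return "yellow"
    return "green"


def uptime_color(uptime_min: int, bootlimit: str = "1h",
                 toolonglimit: str = "-1", color: str = "yellow") -> str:
    """UP check: "color" if up less than bootlimit or longer than toolonglimit."""
    if uptime_min < parse_duration(bootlimit):
        return color
    toolong = parse_duration(toolonglimit)
    if toolong != -1 and uptime_min > toolong:
        return color
    return "green"
```

With the example settings, uptime_color(5, "10m", "4w") gives "yellow" shortly after a reboot, while the defaults (bootlimit=1h, toolonglimit=-1) never flag a long uptime.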
DISK STATUS COLUMN SETTINGS
DISK filesystem warnlevel paniclevel
MEMORY STATUS COLUMN SETTINGS
MEMPHYS warnlevel paniclevel
E.g.:
MEMSWAP 20 40
MEMACT 90 90
MEMPHYS 101 101
Defaults:
MEMPHYS warnlevel=100 paniclevel=101 (i.e. it will never go red).
MEMSWAP warnlevel=50 paniclevel=80
MEMACT warnlevel=90 paniclevel=97
PROCS STATUS COLUMN SETTINGS
PROC processname minimumcount maximumcount color [TRACK=id] [TEXT=text]
The "ps" listing sent by the client will be scanned for how many processes containing "processname" are running, and this is then matched against the min/max settings defined here. If the running count is outside the thresholds, the color of the "procs" status changes to "color".
To check for a process that must NOT be running: Set minimum and maximum to 0.
"processname" can be a simple string, in which case this string must show up in the "ps" listing as a command. The scanner will find a ps-listing of e.g. "/usr/sbin/cron" if you only specify "processname" as "cron".
"processname" can also be a Perl-compatible regular expression, e.g. "%java.*inst[0123]" can be used to find entries in the ps-listing for "java -Xmx512m inst2" and "java -Xmx256 inst3". In that case, "processname" must begin with "%" followed by the regular expression. Note that Xymon defaults to case-insensitive pattern matching; if that is not what you want, put "(?-i)" between the "%" and the regular expression to turn this off. E.g. "%(?-i)HTTPD" will match the word HTTPD only when it is upper-case.
E.g.:
PROC "%xymond_channel --channel=data.*xymond_rrd" 1 1 yellow
or
PROC "java -DCLASSPATH=/opt/java/lib" 2 5
You can have multiple "PROC" entries for the same host; all of the checks are merged into the "procs" status, and the most severe check defines the color of the status.
The optional TRACK=id setting causes Xymon to track the number of processes found in an RRD file, and put this into a graph which is shown on the "procs" status display. The id setting is a simple text string which will be used as the legend for the graph, and also as part of the RRD filename. It is recommended that you use only letters and digits for the ID.
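The matching rules for "processname" can be illustrated with a small Python sketch that counts matching ps lines and compares the count with the min/max thresholds. This is an approximation, not Xymon's code: it treats a plain string as a substring match, uses Python's re module in place of PCRE, and assumes that maxcount=None means "no upper limit". The names compile_matcher and proc_color are hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple


def compile_matcher(processname: str) -> Callable[[str], bool]:
    """Build a predicate that tests one ps-listing line against "processname"."""
    if not processname.startswith("%"):
        # Plain string: treated here as a simple substring match (assumption).
        return lambda line: processname in line
    pattern = processname[1:]
    flags = re.IGNORECASE  # case-insensitive by default, as documented
    if pattern.startswith("(?-i)"):
        # Python's re module rejects a bare global "(?-i)", so strip it
        # and compile without IGNORECASE instead.
        pattern = pattern[len("(?-i)"):]
        flags = 0
    rx = re.compile(pattern, flags)
    return lambda line: rx.search(line) is not None


def proc_color(ps_lines: List[str], processname: str, mincount: int,
               maxcount: Optional[int], color: str) -> Tuple[str, int]:
    """Count matching processes and compare the count with min/max.

    maxcount=None meaning "no upper limit" is an assumption of this sketch.
    """
    matches = compile_matcher(processname)
    count = sum(1 for line in ps_lines if matches(line))
    too_few = count < mincount
    too_many = maxcount is not None and count > maxcount
    return (color if too_few or too_many else "green"), count
```

For example, with a listing containing "/usr/sbin/httpd -k start", "%HTTPD" finds one process, while "%(?-i)HTTPD" finds none and so would report the configured color.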
MSGS STATUS COLUMN SETTINGS
LOG logfilename pattern [COLOR=color] [IGNORE=excludepattern] [OPTIONAL]
The Xymon client extracts interesting lines from one or more logfiles - see the client-local.cfg(5) man-page for information about how to configure which logs a client should look at. The LOG setting determines how these extracts of log entries are processed, and what warnings or alerts trigger as a result.
"logfilename" is the name of the logfile. Only log entries from this file will be matched against this rule. Note that "logfilename" can be a regular expression (if prefixed with a '%' character).
"pattern" is a string or regular expression. If the logfile data matches "pattern", it will trigger the "msgs" column to change color. If no "color" parameter is present, the default is to go "red" when the pattern is matched. To match against a regular expression, "pattern" must begin with a '%' sign - e.g. "%WARNING|NOTICE" will match any lines containing either of these two words. Note that Xymon defaults to case-insensitive pattern matching; if that is not what you want, put "(?-i)" between the "%" and the regular expression to turn this off. E.g. "%(?-i)WARNING" will match the word WARNING only when it is upper-case.
"excludepattern" is a string or regular expression that can be used to filter out any unwanted strings that happen to match "pattern".
The OPTIONAL keyword causes the check to be skipped if the logfile does not exist.
Example: Trigger a red alert when the string "ERROR" appears in the "/var/adm/syslog" file:
LOG /var/adm/syslog ERROR
FILES STATUS COLUMN SETTINGS
FILE filename [color] [things to check] [OPTIONAL] [TRACK]
DIR directoryname [color] [size<MAXSIZE] [size>MINSIZE] [TRACK]
These entries control the status of the "files" column. They allow you to check on various data for files and directories.
"filename" and "directoryname" are names of files or directories, with a full path. You can use a regular expression to match the names of files and directories reported by the client, if you prefix the expression with a '%' character.
"color" is the color that triggers when one or more of the checks fail.
The OPTIONAL keyword causes this check to be skipped if the file does not exist. E.g. you can use this to check that files which should be temporary really are deleted, by checking that they are not older than the maximum time you would expect them to stick around, and then using OPTIONAL to ignore the state where no files exist.
The TRACK keyword causes the size of the file or directory to be tracked in an RRD file, and presented in a graph on the "files" status display.
For files, you can check one or more of the following:
- noexist
- triggers a warning if the file exists. By default, a warning is triggered for files that have a FILE entry, but which do not exist.
- ifexist
- only checks the file if it exists. If the file is reported as missing by the client, it is ignored.
- type=TYPE
- where TYPE is one of "file", "dir", "char", "block", "fifo", or "socket". Triggers a warning if the file is not of the specified type.
- ownerid=OWNER
- triggers a warning if the owner does not match what is listed here. OWNER is specified either with the numeric uid, or the user name.
- groupid=GROUP
- triggers a warning if the group does not match what is listed here. GROUP is specified either with the numeric gid, or the group name.
- mode=MODE
- triggers a warning if the file permissions are not as listed. MODE is written in the standard octal notation, e.g. "644" for the rw-r--r-- permissions.
- size<MAX.SIZE and size>MIN.SIZE
- triggers a warning if the file size is greater than MAX.SIZE or less than MIN.SIZE, respectively. For file sizes, you can use the letters "K", "M", "G" or "T" to indicate that the size is in Kilobytes, Megabytes, Gigabytes or Terabytes, respectively. If there is no such modifier, Kilobytes is assumed. E.g. to warn if a file grows larger than 1MB, use size<1M (or size<1024, since Kilobytes is the default unit).
- mtime>MIN.MTIME mtime<MAX.MTIME
- checks how long ago the file was last modified (in seconds). E.g. to check if a file was updated within the past 10 minutes (600 seconds): mtime<600. Or to check that a file has NOT been updated in the past 24 hours: mtime>86400.
- mtime=TIMESTAMP
- checks if a file was last modified at TIMESTAMP. TIMESTAMP is a unix epoch time (seconds since midnight Jan 1 1970 UTC).
- ctime>MIN.CTIME, ctime<MAX.CTIME, ctime=TIMESTAMP
- act like the mtime checks, but for the ctime timestamp (when the file's status information was last changed, e.g. by chown, chgrp or chmod).
- md5=MD5SUM, sha1=SHA1SUM, rmd160=RMD160SUM
- trigger a warning if the file checksum, computed with the MD5, SHA1 or RMD160 message digest algorithm, does not match the one configured here. Note: The "file" entry in the client-local.cfg(5) file must specify which algorithm to use.
For directories, you can check the following:
- size<MAX.SIZE and size>MIN.SIZE
- triggers a warning if the directory size is greater than MAX.SIZE or less than MIN.SIZE, respectively. Directory sizes are reported in whatever unit the du command on the client uses - often KB or disk blocks - so MAX.SIZE and MIN.SIZE must be given in the same unit.
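The size and mtime checks for files can be sketched in Python as follows. This only illustrates the rules described above and is not how Xymon evaluates them: it assumes 1024-based units for the K/M/G/T suffixes, treats a value exactly at the limit as passing, and covers only size and mtime. The names parse_size_kb and check_file are hypothetical.

```python
import re
from typing import List

# Multipliers to Kilobytes for the K/M/G/T suffixes; 1024-based units are
# an assumption of this sketch.
SIZE_FACTOR_KB = {"": 1, "K": 1, "M": 1024, "G": 1024 ** 2, "T": 1024 ** 3}


def parse_size_kb(text: str) -> int:
    """Convert a FILE size limit such as "512", "1M" or "2G" to Kilobytes."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if m is None:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * SIZE_FACTOR_KB[m.group(2)]


def check_file(size_kb: int, age_sec: int, checks: List[str]) -> List[str]:
    """Return the size/mtime checks that fail for one file.

    size_kb is the file size in Kilobytes; age_sec is the number of seconds
    since the file was last modified.
    """
    failed = []
    for check in checks:
        m = re.fullmatch(r"(size|mtime)([<>])(\d+[KMGT]?)", check, re.IGNORECASE)
        if m is None:
            raise ValueError(f"unsupported check: {check!r}")
        what, op, limit_text = m.groups()
        if what.lower() == "size":
            actual, limit = size_kb, parse_size_kb(limit_text)
        else:
            actual, limit = age_sec, int(limit_text)
        # "x<LIMIT" fails when x is above LIMIT, "x>LIMIT" fails when x is
        # below it; a value exactly at the limit passes (assumption).
        if (op == "<" and actual > limit) or (op == ">" and actual < limit):
            failed.append(check)
    return failed
```

For instance, a 2 MB file modified 30 seconds ago fails "size<1M" but passes "mtime<600".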
PORTS STATUS COLUMN SETTINGS
PORT criteria [MIN=mincount] [MAX=maxcount] [COLOR=color] [TRACK=id] [TEXT=displaytext]
The "netstat" listing sent by the client will be scanned for how many sockets match the criteria listed. The criteria you can use are:
- LOCAL=addr
- "addr" is a (partial) local address specification in the format used on the output from netstat.
- EXLOCAL=addr
- Exclude certain local addresses from the rule.
- REMOTE=addr
- "addr" is a (partial) remote address specification in the format used on the output from netstat.
- EXREMOTE=addr
- Exclude certain remote addresses from the rule.
- STATE=state
- Causes only the sockets in the specified state to be included; "state" is usually LISTEN or ESTABLISHED but can be any socket state reported by the client's "netstat" command.
- EXSTATE=state
- Exclude certain states from the rule.
PORT LOCAL=%[.:]22$ STATE=LISTEN "TEXT=SSH listener"
PORT LOCAL=%[.:]22$ STATE=ESTABLISHED MAX=5 TRACK=ssh TEXT=SSH
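The way the PORT criteria combine can be sketched in Python: a socket is counted only if it satisfies every criterion given, and the count is then compared with MIN and MAX. This is a simplified model, not Xymon's parser. It assumes that "%" marks a regular expression (as for PROC and LOG) and that anything else is a substring ("partial") match. Because the defaults for MIN, MAX and COLOR are not described here, the sketch takes them as explicit arguments. The names Socket, addr_matches, port_count and port_color are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Socket:
    """One line of netstat output, reduced to the fields PORT looks at."""
    local: str
    remote: str
    state: str


def addr_matches(spec: str, addr: str) -> bool:
    # "%" prefix = regular expression (assumed, as for PROC and LOG);
    # otherwise a partial (substring) match.
    if spec.startswith("%"):
        return re.search(spec[1:], addr) is not None
    return spec in addr


def port_count(sockets: Iterable[Socket], local: Optional[str] = None,
               exlocal: Optional[str] = None, remote: Optional[str] = None,
               exremote: Optional[str] = None, state: Optional[str] = None,
               exstate: Optional[str] = None) -> int:
    """Count the sockets that satisfy every criterion that was given."""
    count = 0
    for s in sockets:
        if local is not None and not addr_matches(local, s.local):
            continue
        if exlocal is not None and addr_matches(exlocal, s.local):
            continue
        if remote is not None and not addr_matches(remote, s.remote):
            continue
        if exremote is not None and addr_matches(exremote, s.remote):
            continue
        if state is not None and s.state.upper() != state.upper():
            continue
        if exstate is not None and s.state.upper() == exstate.upper():
            continue
        count += 1
    return count


def port_color(count: int, color: str, mincount: Optional[int] = None,
               maxcount: Optional[int] = None) -> str:
    """Return "color" if the count is outside [mincount, maxcount]."""
    if mincount is not None and count < mincount:
        return color
    if maxcount is not None and count > maxcount:
        return color
    return "green"
```

The regular expression "[.:]22$" from the examples accepts both "0.0.0.0:22" and BSD-style "127.0.0.1.22" local addresses, but not port 2222.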
SVCS status (Microsoft Windows clients)
SVC servicename status=(started|stopped) [startup=automatic|disabled|manual]
DS - RRD based status override
DS column filename:dataset rules COLOR=colorname TEXT=explanation
"column" is the status column that will be modified. "filename" is the name of the RRD file holding the data you use for comparison. "dataset" is the name of the dataset in the RRD file - the "rrdtool info" command is useful when determining these. "rules" determine when to apply the override. You can use ">", ">=", "<" or "<=" to compare the current measurement value against one or more thresholds. "explanation" is a text that will be shown to explain the override - you can use some placeholders in the text: "&N" is replaced with the name of the dataset, "&V" is replaced with the current value, "&L" is replaced by the low threshold, "&U" is replaced with the upper threshold.
NOTE: This rule uses the raw data value from a client to examine the rules, so this type of test is only really suitable for datasets of the "GAUGE" type. It cannot be used meaningfully for datasets that use "COUNTER" or "DERIVE" - e.g. the datasets that are used to capture network packet traffic - because the data stored in the RRD for COUNTER-based datasets undergoes a transformation (calculation) when going into the RRD. Xymon does not have direct access to the calculated data.
Example: Flag the "conn" status as yellow if the response time exceeds 100 msec.
MQ Series SETTINGS
MQ_QUEUE queuename [age-warning=N] [age-critical=N] [depth-warning=N] [depth-critical=N]
CHANGING THE DEFAULT SETTINGS
If you would like to use different defaults for the settings described above, then you can define the new defaults after a DEFAULT line. E.g. this would explicitly define all of the default settings:
DEFAULT
    UP 1h
    LOAD 5.0 10.0
    DISK * 90 95
    MEMPHYS 100 101
    MEMSWAP 50 80
    MEMACT 90 97
RULES TO SELECT HOSTS
All of the settings can be applied to a group of hosts, by preceding them with rules. A rule defines one or more filters using these keywords (note that this is identical to the rule definitions used in the alerts.cfg(5) file).
PAGE=targetstring Rule matching a host by the name of the page in Xymon. "targetstring" is the path of the page as defined in the hosts.cfg file. E.g. if you have this setup:
page servers All Servers
subpage web Webservers
10.0.0.1 www1.foo.com
subpage db Database servers
10.0.0.2 db1.foo.com

group Web
10.0.0.1 www1.foo.com
10.0.0.2 www2.foo.com
group Production databases
10.0.1.1 db1.foo.com
DIRECTING ALERTS TO GROUPS
For some tests - e.g. "procs" or "msgs" - the right group of people to alert in case of a failure may be different, depending on which of the client rules actually detected a problem. E.g. if you have PROC rules for a host checking both "httpd" and "sshd" processes, then the Web admins should handle httpd failures, whereas "sshd" failures are handled by the Unix admins. To handle this, all rules can have a "GROUP=groupname" setting. When a rule with this setting triggers a yellow or red status, the groupname is passed on to the Xymon alerts module, so you can use it in the alert rule definitions in alerts.cfg(5) to direct alerts to the correct group of people.
RULES: APPLYING SETTINGS TO SELECTED HOSTS
Rules must be placed after the settings, e.g.
LOAD 8.0 12.0 HOST=db.foo.com TIME=*:0800:1600

HOST=%db.*.foo.com TIME=W:0800:1600
    LOAD 8.0 12.0
    DISK /db 98 100
    PROC mysqld 1

HOST=%.*.foo.com
    LOAD 7.0 12.0
HOST=bax.foo.com
    LOAD 3.0 8.0
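The HOST and TIME filters used in these examples can be sketched in Python. This is a hypothetical model of the filters, not Xymon's rule engine. It assumes "%" marks a case-insensitive, unanchored regular expression (as for PROC and LOG), and it assumes the TIME day codes "*" (every day), "W" (Monday to Friday) and digits with 0 = Sunday, following the alerts.cfg(5) convention; the end time is treated as exclusive. The names host_matches and time_matches are hypothetical.

```python
import re
from datetime import datetime


def host_matches(spec: str, hostname: str) -> bool:
    """HOST=spec: exact name, or a "%"-prefixed regular expression."""
    if spec.startswith("%"):
        # Assumed: unanchored, case-insensitive search, like PROC and LOG.
        return re.search(spec[1:], hostname, re.IGNORECASE) is not None
    return spec.lower() == hostname.lower()


def time_matches(spec: str, now: datetime) -> bool:
    """TIME=days:HHMM:HHMM, e.g. "W:0800:1600" or "*:0800:1600".

    Assumed day codes: "*" = every day, "W" = Monday-Friday, otherwise a
    string of day digits with 0 = Sunday ... 6 = Saturday.
    """
    days, start, end = spec.split(":")
    weekday = now.isoweekday() % 7  # Sunday -> 0, Monday -> 1, ... Saturday -> 6
    if days == "*":
        day_ok = True
    elif days == "W":
        day_ok = 1 <= weekday <= 5
    else:
        day_ok = str(weekday) in days
    hhmm = now.hour * 100 + now.minute
    # The end time is treated as exclusive (assumption).
    return day_ok and int(start) <= hhmm < int(end)
```

Under these assumptions, "HOST=%db.*.foo.com TIME=W:0800:1600" applies to db1.foo.com at 09:30 on a Wednesday, but not on a Sunday.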
NOTES
For the LOG, FILE and DIR checks, it is also necessary to configure the actual file and directory names in the client-local.cfg(5) file. If the filenames are not listed there, the clients will not collect any data about these files/directories, and the settings in the analysis.cfg file will be silently ignored.
The ability to compute file checksums with MD5, SHA1 or RMD160 should not be used for general-purpose file integrity checking, since the overhead of calculating these on a large number of files can be significant. If you need this, look at tools designed for this purpose - e.g. Tripwire or AIDE.
At the time of writing (April 2006), the SHA-1 and RMD160 algorithms are considered cryptographically safe. The MD5 algorithm has been shown to have some weaknesses, and is not considered strong enough when a high level of security is required.
SEE ALSO
xymond_client(8), client-local.cfg(5), xymond(8), xymon(7)
Version 4.3.17: 23 Feb 2014 | Xymon