anomaly - anomalous data detection
anomaly [-h|--help] [-v|--version] [-d|--details]
[-t|--threshold] [--min N] [--max N]
[-s|--stddev] [-n|--sample N] [-c|--coefficient N]
Anomaly can detect anomalous data in a numeric stream. To do this, anomaly reads a stream of numeric data and applies one of its detection methods. If an anomaly is detected, anomaly responds using one or more of its built-in response methods.
Anomaly works best in a pipe, and will read only numeric data from its input. As a simple example, suppose you wish to monitor load average and look for unusual spikes. The load average can be obtained from the 'uptime' command:
11:40 up 15 days, 4:04, 6 users, load averages: 0.38 0.32 0.32
We can extract the 5-minute load (the second of the three numbers) using this:
That number can be extracted once a minute, using this:
That is the kind of data stream that anomaly monitors. White space (spaces, tabs, newlines) between the numbers is ignored, so we can simulate the above stream like this:
This is a convenient way to demonstrate anomaly, shown below.
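To make the input format concrete, here is a minimal Python sketch of reading such a stream. It is an illustration only, not anomaly's own code: the function name read_numbers and the sample values are hypothetical, and the sketch simply raises an error on non-numeric input, which may differ from what anomaly does.

```python
import io


def read_numbers(stream):
    """Yield floats from a text stream; any run of white space separates values."""
    for line in stream:
        for token in line.split():
            # Assumption: every token is numeric. float() raises ValueError otherwise.
            yield float(token)


# Simulated input: spaces, tabs, and newlines are all treated alike.
simulated = io.StringIO("0.38 0.32\t0.32\n0.41\n")
values = list(read_numbers(simulated))
print(values)
```

In a real pipe, sys.stdin could be passed in place of the simulated stream.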
DETECTION - THRESHOLD
The simplest detection method is threshold, which compares the data to an absolute value. This method can use a minimum and a maximum value for comparison. These alternatives are all valid, and make use of --min, --max or both:
anomaly --threshold --min 1.22
anomaly --threshold --max 9.75
In the following example, the values '1' and '10' would be detected as anomalies:
Anomalous data detected. The value 1 is below the minimum of 1.5.
Anomalous data detected. The value 10 is above the maximum of 8.
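The comparison behind those two messages can be sketched in Python as follows. This is a hypothetical illustration rather than anomaly's implementation: the function name check_threshold and the input values are invented for the example, and treating a value exactly equal to a bound as normal is an assumption.

```python
def check_threshold(value, minimum=None, maximum=None):
    """Return a description if value is out of bounds, otherwise None."""
    # Assumption: the bounds themselves are not anomalous (strict comparison).
    if minimum is not None and value < minimum:
        return f"The value {value:g} is below the minimum of {minimum:g}."
    if maximum is not None and value > maximum:
        return f"The value {value:g} is above the maximum of {maximum:g}."
    return None


# Hypothetical input stream, checked against a minimum of 1.5 and a maximum of 8.
stream = [1, 2, 3, 5, 10]
messages = []
for value in stream:
    message = check_threshold(value, minimum=1.5, maximum=8)
    if message is not None:
        messages.append(message)

for message in messages:
    print("Anomalous data detected.", message)
```

Either bound may be left out, which mirrors using --min or --max alone.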
DETECTION - STANDARD DEVIATION
Standard deviation measures differences from the mean value of a sample of data, and is useful for detecting extraordinary values. The sample size can be chosen such that there is enough data to determine a good mean value, but defaults to 10. The limited sample size means that a rolling window of data is used, and therefore the mean and standard deviation are updated for the current window. This makes the monitoring somewhat adaptive. Here is an example:
This uses a sample size of the 20 most recent values, and will detect any values that are more than 1 standard deviation above or below the mean. An example:
Anomalous data detected. The value 6 is more than 1 sigma(s) above the mean value 3, with a sample size of 5.
With a sample size of 5, comparisons begin only once the 6th value is seen. In the example, the mean value of [1 2 3 4 5] is 3, and the standard deviation is 1.58. This means that the 6th value is considered an anomaly if it is outside the range (3 +/- 1.58), which is between 1.42 and 4.58.
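The figures above can be reproduced with Python's statistics module. Note that 1.58 is the sample standard deviation (dividing by n - 1); the population form would give about 1.41. The variable names here are for illustration only.

```python
import statistics

sample = [1, 2, 3, 4, 5]
mean = statistics.mean(sample)
sigma = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

low, high = mean - sigma, mean + sigma
inside = low <= 6 <= high  # does the 6th value fall inside the range?

print(f"mean={mean} sigma={sigma:.2f} range={low:.2f}..{high:.2f} inside={inside}")
```

The 6th value, 6, lies above the upper end of the range, which is why it is reported.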
To make this less sensitive, a coefficient is introduced, which defaults to 1.0 (as above) but can be overridden:
In this example, the 6th value is not considered an anomaly because it is within the range (3 +/- (1.9 * 1.58)), which is between -0.002 and 6.002.
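A rolling-window detector with a coefficient can be sketched in Python as below. This is a sketch under stated assumptions, not anomaly's actual code: the function name detect is hypothetical, and it assumes that every value, anomalous or not, enters the window after being checked.

```python
from collections import deque
import statistics


def detect(values, sample=10, coefficient=1.0):
    """Return the values lying more than coefficient * sigma from the window mean."""
    if sample < 2:
        raise ValueError("a standard deviation needs at least 2 values")
    window = deque(maxlen=sample)  # keeps only the most recent `sample` values
    anomalies = []
    for value in values:
        # Comparisons start only once the window is full.
        if len(window) == sample:
            mean = statistics.mean(window)
            limit = coefficient * statistics.stdev(window)
            if abs(value - mean) > limit:
                anomalies.append(value)
        # Assumption: the value joins the window whether or not it was flagged.
        window.append(value)
    return anomalies


data = [1, 2, 3, 4, 5, 6]
default_result = detect(data, sample=5)
relaxed_result = detect(data, sample=5, coefficient=1.9)
print(default_result, relaxed_result)
```

With the default coefficient the value 6 is flagged; with a coefficient of 1.9 the allowed distance from the mean grows to about 3.002, so 6 passes.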
RESPONSE - MESSAGE
The message response is the default, and consists of a single line of printed text. It is a description of why the data value is considered an anomaly. Here is an example:
Anomalous data detected. The value 3 is above the maximum of 2.5.
The message can be suppressed, but another response must be specified, so that there is some kind of response:
RESPONSE - EXECUTE
Anomaly can execute a program in response to detection. The example here uses the 'date' command, but any program can be used:
RESPONSE - SIGNAL
Anomaly can send a USR1 signal to a program in response to detection:
This sends the USR1 signal to the process with PID 12345. The receiving program would need to respond accordingly.
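As an illustration of what responding accordingly might look like, here is a minimal Python sketch of a receiving program. The handler name on_usr1 and the counter are hypothetical, and the final os.kill call only stands in for anomaly sending the signal, so that the sketch can be run on its own on a Unix-like system.

```python
import os
import signal

anomaly_count = 0


def on_usr1(signum, frame):
    """Record that an anomaly notification arrived."""
    global anomaly_count
    anomaly_count += 1


# Install the handler; without one, the default action for USR1 ends the process.
signal.signal(signal.SIGUSR1, on_usr1)

# Stand-in for anomaly: deliver USR1 to this very process.
os.kill(os.getpid(), signal.SIGUSR1)
print("notifications received:", anomaly_count)
```

A real receiver would typically keep running and do its work inside or after the handler, for example by logging the event or raising an alert.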
CREDITS & COPYRIGHTS
Copyright (C) 2013 Göteborg Bit Factory.
Anomaly is distributed under the MIT license. See http://www.opensource.org/licenses/mit-license.php for more information.
For more information, see:
- The official site at
- You can contact the project by writing an email to