Anomaly Detection for Monitoring
First Edition
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Anomaly Detection for Monitoring, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-491-93578-1
Table of Contents

Foreword

1. Introduction
   Why Anomaly Detection?
   The Many Kinds of Anomaly Detection
   Conclusions

2. A Crash Course in Anomaly Detection

3. Modeling and Predicting

4. Dealing with Trends and Seasonality
   Fourier Transforms
   Conclusions

5. Practical Anomaly Detection for Monitoring

6. The Broader Landscape

A. Appendix
Foreword
Monitoring is currently undergoing a significant change. Until two or three years ago, the main focus of monitoring tools was to provide more and better data. Interpretation and visualization have too often been an afterthought. While industries like e-commerce jumped on the data analytics train very early, monitoring systems still need to catch up.
These days, systems are getting larger and more dynamic. Running hundreds of thousands of servers with continuous new code pushes in elastic, self-scaling server environments makes data interpretation more complex than ever. We as an industry have reached a point where we need software tooling to augment our human analytical skills to master this challenge.
At Ruxit, we develop next-generation monitoring solutions based on artificial intelligence and deep data (large amounts of highly interlinked pieces of information). Building self-learning monitoring systems, while still in its early days, helps operations teams to focus on core tasks rather than trying to interpret a wall of charts. Intelligent monitoring is also at the core of the DevOps movement, as well-interpreted information enables sharing across organizations.
Whenever I give a talk about this topic, at least one person asks where they can buy a book to learn more about it. This has been a tough question to answer, as most literature is targeted toward mathematicians: if you want to learn more on topics like anomaly detection, you are quickly exposed to very advanced content. This book, written by practitioners in the space, finds the perfect balance. I will definitely add it to my reading recommendations.
Alois Reitbauer,
Chief Evangelist, Ruxit
CHAPTER 1
Introduction
If you are like most of our friends in the DevOps and web operations communities, you probably picked up this book because you've been hearing a lot about anomaly detection in the last few years, and you're intrigued by it. In addition to the previously-mentioned goal of making assumptions explicit, we hope to achieve a number of outcomes in this book.
• We want to help orient you to the subject and the landscape in general. We want you to have a frame of reference for thinking about anomaly detection, so you can make your own decisions.
• We want to help you understand how to assess not only the meaning of the answers you get from anomaly detection algorithms, but how trustworthy the answers might be.
• We want to teach you some things that you can actually apply to your own systems and your own problems. We don't want this to be just a bunch of theory. We want you to put it into practice.
• We want your time spent reading this book to be useful beyond this book. We want you to be able to apply what you have learned to topics we don't cover in this book.
If you already know anything about anomaly detection, statistics, or any of the other things we cover in this book, you're going to see that we omit or gloss over a lot of important information. That is inevitable. From prior experience, we have learned that it is better to help people form useful thought processes and mental models than to tell them what to think.
As a result, we hope you will be able to combine the material in this book with your existing tools and skills to solve problems on your systems. By and large, we want you to get better at what you already do, and learn a new trick or two, rather than solving world hunger. If you ask, "what can I do that's a little better than Nagios?" you're on the right track.
Anomaly detection is not a black and white topic. There is a lot of gray area, a lot of middle ground. Despite the complexity and richness of the subject matter, it is both fun and productive. And despite the difficulty, there is a lot of promise for applying it in practice.
CHAPTER 2
A Crash Course in Anomaly Detection
This isn't a book about the overall breadth and depth of anomaly detection. It is specifically about applying anomaly detection to solve common problems that the DevOps community faces when trying to monitor the types of systems that we manage the most. One of the implications is that this book is mostly about time series anomaly detection. It also means that we focus on widely used tools such as Graphite, JavaScript, R, and Python. There are several reasons for these choices, based on assumptions we're making:
• We assume that our audience is largely like ourselves: developers, system administrators, database administrators, and DevOps practitioners using mostly open source tools.
• Neither of us has a doctorate in a field such as statistics or operations research, and we assume you don't either.
• We assume that you are doing time series monitoring, much like we are.
As a result of these assumptions, this book is quite biased. It is all about anomaly detection on metrics, and we will not cover anomaly detection on configuration, comparing machines amongst each other, log analysis, clustering similar kinds of things together, or many other types of anomaly detection. We also focus on detecting anomalies as they happen, because that is usually what we are trying to do with our monitoring systems.
The following images from that paper show the metric and its deviation from the usual behavior.
…cause most anomaly detection techniques to throw off lots and lots of false positives.
…problem in your system, that's great. Go ahead and alert on it. But otherwise, we suggest that you don't alert on things that may have no impact or consequence.
Instead, we suggest that you record these anomalous observations, but don't alert on them. Now you have essentially created an index into the most unusual data points in your metrics, for later use in case it is interesting (for example, during diagnosis of a problem that you have detected).
One of the assumptions embedded in this recommendation is that anomaly detection is cheap enough to do online, in one pass, as data arrives in your monitoring system, but that ad hoc, after-the-fact anomaly detection is too costly to do interactively. With the monitoring data sizes we see in the industry today, and the attitude that you should "measure everything that moves," this is generally the case. Multi-terabyte anomaly detection analysis is usually unacceptably slow and requires more resources than you have available. Again, we are placing this in the context of what most of us are doing for monitoring, using typical open source tools and methodologies.
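To make "online, in one pass" concrete, here is a minimal sketch. The EWMA update and the three-sigma cutoff are stand-ins for whatever model you actually choose, and the class name and parameters are our own inventions for illustration:

import math

class AnomalyIndex:
    """One-pass scoring: examine each point as it arrives, and record
    (rather than alert on) the observations that look unusual."""
    def __init__(self, alpha=0.05, sigmas=3.0):
        self.alpha = alpha    # EWMA smoothing factor; tune for your data
        self.sigmas = sigmas  # how far from the mean counts as unusual
        self.mean = None      # EWMA of the values
        self.sq = None        # EWMA of the squared values
        self.unusual = []     # the "index" of anomalous observations

    def push(self, timestamp, value):
        if self.mean is None:
            self.mean, self.sq = value, value ** 2
            return
        sd = math.sqrt(max(self.sq - self.mean ** 2, 0.0))
        if sd > 0 and abs(value - self.mean) > self.sigmas * sd:
            self.unusual.append((timestamp, value))  # record, don't alert
        self.mean = self.alpha * value + (1 - self.alpha) * self.mean
        self.sq = self.alpha * value ** 2 + (1 - self.alpha) * self.sq

Each point is touched once and only a couple of numbers are kept per metric, which is what makes this cheap enough to run as data arrives. (The same EWMA-of-values, EWMA-of-squares trick appears in the appendix code.)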
Conclusions
Although it's easy to get excited about success stories in anomaly detection, most of the time someone else's techniques will not translate directly to your systems and your data. That's why you have to learn for yourself what works, what's appropriate to use in some situations and not in others, and the like.
Our suggestion, which will frame the discussion in the rest of this book, is that, generally speaking, you should probably use anomaly detection online, as your data arrives. Store the results, but don't alert on them in most cases. And keep in mind that the map is not the territory: the metric isn't the system, an anomaly isn't a crisis, three sigmas isn't unlikely, and so on.
CHAPTER 3
Modeling and Predicting
1 http://bit.ly/littleslaw
2 http://bit.ly/stathandbook
…(say, the size of the drill bit), and the control lines are fixed some number of standard deviations away from that mean. If you've heard of the three sigma rule, this is what it's about: three sigmas represents three standard deviations away from the mean. The two control lines surrounding the mean represent an acceptable range of values.
One of the assumptions made by the basic, fixed control chart is that values are stable: the mean and spread of values are constant. As a formula, this set of assumptions can be expressed as y = μ + ε, where μ is the fixed mean and ε is a normally distributed error term.
3 History of the Normal Distribution
Figure 3-2. A basic control chart with fixed control limits, which are represented with dashed lines. Values are considered to be anomalous if they cross the control limits.
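A fixed control chart takes only a few lines of code. This sketch computes the limits once from a sample of in-control data (the calibration numbers are made up) and checks new values against them:

import statistics

def control_limits(values, sigmas=3):
    """Fixed control limits: mean plus or minus sigmas * stddev."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return mean - sigmas * sd, mean + sigmas * sd

calibration = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9]  # invented measurements
lower, upper = control_limits(calibration)
for value in [5.0, 4.95, 5.9]:
    if not (lower <= value <= upper):
        print("anomalous:", value)  # flags 5.9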
…anomalies. To fix this problem, the control chart needs to adapt to a changing mean and spread over time. There are two basic ways to do this:
• Slice up your control chart into smaller time ranges or fixed windows, and treat each window as its own independent fixed control chart with a different mean and spread. The values within each window are used to compute the mean and standard deviation for that window. Within a small interval, everything looks like a regular fixed control chart. At a larger scale, what you have is a control chart that changes across windows.
• Use a moving window, also called a sliding window. Instead of using predefined time ranges to construct windows, at each point you generate a moving window that covers the previous N points. The benefit is that instead of having a fixed mean within a time range, the mean changes after each value, yet still considers the same number of points to compute the mean.
Moving windows have major disadvantages. You have to keep track of recent history, because you need to consider all of the values that fall into a window. Depending on the size of your windows, this can be computationally expensive, especially when tracking a large number of metrics. Windows also have poor characteristics in the presence of large spikes: when a spike enters a window, it causes an abrupt shift, which lasts until the spike eventually leaves and causes another abrupt shift.
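Both of those costs are visible in code. Here is a sketch of a moving window control chart, with an arbitrary window size and threshold:

from collections import deque
import math

def moving_window_chart(series, size=30, sigmas=3):
    """Flag points outside mean +/- sigmas * stddev of the previous
    `size` points. Note that the whole window must be kept in memory."""
    window = deque(maxlen=size)
    flagged = []
    for i, value in enumerate(series):
        if len(window) == size:
            mean = sum(window) / size
            sd = math.sqrt(sum((x - mean) ** 2 for x in window) / size)
            if abs(value - mean) > sigmas * sd:
                flagged.append(i)
        window.append(value)
    return flagged

(The appendix shows a more efficient variant that maintains running sums instead of recomputing them for every point.)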
Figure 3-3. A moving window control chart. Unlike the fixed control chart shown in Figure 3-2, this moving window control chart has an adaptive control line and control limits. After each anomalous spike, the control limits widen to form a noticeable box shape. This effect ends when the anomalous value falls out of the moving window.
Moving window control charts have the following characteristics:
• They require you to keep some amount of historical data to compute the mean and control limits.
• The values are assumed to be Gaussian (normally) distributed around the mean.
• They can detect one or multiple points that are outside the desired range.
• Spikes in the data can cause abrupt changes in parameters when they are in the distant past (that is, when they exit the window).
Window Functions
Sliding windows and EWMAs are part of a much bigger category of window functions. They are window functions with two and one sharp edges, respectively.
There are lots of window functions with many different shapes and characteristics. Some functions increase smoothly from 0 to 1 and back again, meaning that they smooth data using both past and future data. Smoothing bidirectionally can eliminate the effects of large spikes.
Figure 3-5. A window function control chart. This time, the window is formed with values on both sides of the current value. As a result, anomalous spikes won't generate abrupt shifts in control limits, even when they first enter the window.
The downside to window functions is that they require a larger time delay, which is a result of not knowing the smoothed value until enough future values have been observed: when you center a bidirectional window function on "now," it extends into the future. In practice, EWMAs are a good enough compromise for situations where you can't measure or wait for future values.
Control charts based on bidirectional smoothing have the following characteristics:
• They introduce time lag into calculations. If you smooth symmetrically over 60-second windows, you won't know the smoothed value of "now" until 30 seconds, half the window, have passed (see the sketch after this list).
• Like sliding windows, they require more memory and CPU to compute.
• Like all the SPC control charts we've discussed thus far, they assume a Gaussian distribution of data.
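Here is a sketch of bidirectional smoothing with a small symmetric window; notice that nothing can be computed for the most recent half-window, which is exactly the time lag just described:

def centered_smooth(series, weights):
    """Smooth with a symmetric window function. Each output uses values
    on both sides of a point, so the newest len(weights) // 2 points
    cannot be smoothed yet."""
    half = len(weights) // 2
    total = float(sum(weights))
    smoothed = []
    for i in range(half, len(series) - half):
        window = series[i - half : i + half + 1]
        smoothed.append(sum(w * x for w, x in zip(weights, window)) / total)
    return smoothed

# A triangular window: weights rise smoothly to the center, then fall.
print(centered_smooth([1, 2, 9, 2, 1, 2, 3], [1, 2, 3, 2, 1]))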
4 http://bit.ly/arimamod
5 https://www.otexts.org/fpp/8
6 In statistics, "robust" generally means that outlying values don't throw things for a loop…
Evaluating Predictions
One of the most important and subtle parts of anomaly detection happens at the intersection between predicting how a metric should behave and comparing observed values to those expectations.
In anomaly detection, you're usually using "many standard deviations from the mean" as a replacement for "very unlikely," and when you get far from the mean, you're in the tails of the distribution. The fit tends to be much worse there than you'd expect, so even small deviations from Gaussian can result in many more outliers than you theoretically should get.
Similarly, a lot of statistical tests such as hypothesis tests are deemed to be "significant" or "good" based on what turn out to be statistician rules of thumb. Just because some p-value looks really good doesn't mean there's truly a lot of certainty. Significant might not signify much. Hey, it's statistics, after all!
As a result, there's a good chance your anomaly detection techniques will sometimes give you more false positives than you think they will. These problems will always happen; this is just par for the course. We'll discuss some ways to mitigate this in later chapters.
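You can see the tail problem with a small simulation. This snippet (illustrative only) compares how often points fall outside three sigmas for truly Gaussian data versus data with slightly heavier tails, faked here with a Student's t distribution built from Gaussians:

import math
import random
import statistics

def heavier_tailed():
    """A t-distributed value (4 degrees of freedom): only slightly
    heavier tails than a Gaussian."""
    z = random.gauss(0, 1)
    s = sum(random.gauss(0, 1) ** 2 for _ in range(4))
    return z / math.sqrt(s / 4)

for name, draw in [("gaussian", lambda: random.gauss(0, 1)),
                   ("heavier-tailed", heavier_tailed)]:
    sample = [draw() for _ in range(100000)]
    mean, sd = statistics.fmean(sample), statistics.pstdev(sample)
    rate = sum(abs(x - mean) > 3 * sd for x in sample) / len(sample)
    print(name, rate)  # roughly 0.003 for Gaussian; several times more for t

The two samples look almost identical in a histogram, but the heavier-tailed one produces several times as many "three sigma" events.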
Figure 3-7. Histogram of the mystery time series, overlaid with the normal distribution's bell curve.
Uh-oh! It doesn't look like a great fit. Should you give up hope?
7 There may be advantages to the first method when dealing with large floating-point…
Conclusions
All anomaly detection relies on predicting an expected value or range of values for a metric, and then comparing observations to the predictions. The predictions rely on models, which can be based on theory or on empirical evidence. Models usually use historical data as inputs to derive the parameters that are used to predict the future.
We discussed SPC techniques not only because they're ubiquitous and very useful when paired with a good model (a theme we'll revisit), but because they embody a thought process that is tremendously helpful in working through all kinds of anomaly detection problems. This thought process can be applied to lots of different kinds of models, including ARIMA models.
When you model and predict some data in order to detect anomalies in it, you need to evaluate the quality of the results. This really means you need to measure the prediction errors (the residuals) and assess how good your model is at predicting the system's data. If you'll be using SPC to determine which observations are anomalous, you generally need to ensure that the residuals are normally distributed (Gaussian). When you do this, be sure that you don't confuse the sample distribution with the population distribution!
CHAPTER 4
Dealing with Trends and Seasonality
Figure 4-1. A time series with a linear trend and two exponentially weighted moving averages with different decay factors, demonstrating that they lag the data when it has a trend.
How do you deal with trend? First, it's important to understand that metrics with trends can be considered as compositions of other metrics. One of the components is the trend, and so the solution for dealing with trend is simple: find a model that describes the trend, and subtract the trend from the metric's values! After the trend is removed, you can use the models we've previously mentioned on the remainder.
There can be many different kinds of trend, but linear is pretty common, meaning a time series increases or decreases at a constant rate. To remove a linear trend, you can simply use a first difference: you consider the differences between consecutive values of a time series rather than the raw values of the time series itself. If you remember your calculus, this is related to a derivative, and in time series it's pretty common to hear people talk about first differences as derivatives (or deltas).
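First differencing is a one-liner. A made-up series that climbs by roughly two units per step becomes approximately level:

def first_difference(series):
    """The change between consecutive values: a discrete derivative."""
    return [b - a for a, b in zip(series, series[1:])]

print(first_difference([10, 12, 14, 17, 18, 20]))  # [2, 2, 3, 1, 2]

The differenced series no longer trends, so the models we discussed earlier can be applied to it.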
Dealing with Seasonality
Figure 4-2. A server's load average, showing repeated cycles over time.
Seasonality has very similar effects as trend. In fact, if you zoom into a time series with seasonality, it really looks like trend. That's because seasonality is variable trend: instead of increasing or decreasing at a fixed rate, a metric with seasonality increases or decreases with rates that vary with time. As you can imagine, things like EWMAs have the same issues as with linear trend. They lag behind, and in some cases it can get so bad that the EWMA is completely out of phase with the seasonal pattern. This is easy to see in Figure 4-3.
Figure 4-3. A sine wave and an EWMA of the sine wave, showing how an EWMA's lag causes it to predict the wrong thing most of the time.
Coping with seasonality is exactly the same as with trend: you need to decompose and subtract. This time, however, it's harder to do, because the model of the seasonal component is much more complicated. Furthermore, there can be multiple seasonal components in a metric! For example, you can have a seasonal trend with a daily period as well as a weekly period.
…talked about when discussing methods to deal with trend, but this time we're doing it to the model instead of the original metric. With a single EWMA, there is a single smoothing factor: α (alpha). Because there are two more EWMAs, for trend and seasonality, they also have their own smoothing factors. Typically they're denoted β (beta) for trend and γ (gamma) for seasonality.
Predicting the current value of a metric is similar to the previous models we've discussed, but with a slight modification. You start with the same next = current formula, but now you also have to add in the trend and seasonal terms. Multiple exponential smoothing usually produces much better results than naive models in the presence of trend and seasonality.
Multiple exponential smoothing can get a little complicated to express in terms of mathematical formulas, but intuitively it isn't so bad. We recommend the "Holt-Winters seasonal method" section of Forecasting: Principles and Practice for a detailed derivation. It definitely makes things harder, though:
• You have to know the period of the seasonality beforehand. The method can't figure that out itself. If you don't get this right, your model won't be accurate and neither will your results.
• There are three EWMA smoothing parameters to pick, and it becomes a delicate process to pick the right values: small changes in the parameters can create large changes in the predicted values. Many implementations use optimization techniques to figure out the parameters that work best on given sample data.
With that in mind, you can use multiple exponential smoothing to build SPC control charts just as we discussed in the previous chapter. The advantages and disadvantages are largely the same as we've seen before.
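To make the moving parts visible, here is a compact sketch of the additive form of triple exponential smoothing. The initialization is deliberately crude and the alpha, beta, and gamma defaults are arbitrary; as noted above, real implementations optimize them against sample data:

def holt_winters(y, period, alpha=0.3, beta=0.05, gamma=0.1):
    """One-step-ahead forecasts as level + trend + seasonal term, with
    each component updated by its own EWMA. Assumes len(y) > period."""
    level = sum(y[:period]) / period
    trend = (y[period] - y[0]) / period
    season = [v - level for v in y[:period]]
    forecasts = []
    for t in range(period, len(y)):
        s = season[t % period]
        forecasts.append(level + trend + s)  # predict before seeing y[t]
        new_level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
        season[t % period] = gamma * (y[t] - level) + (1 - gamma) * s
    return forecasts

The residuals (each y[t] minus its forecast) are what you would feed into an SPC-style control chart.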
…common situations. You can probably guess, for example, that outlying data can throw off future predictions, and that's true, depending on the parameters you use:
• An outage can throw off a model by making it predict an outage again in the next cycle, which results in a false alarm.
• Holidays often aren't in sync with seasonality.
• There might be unusual events like Michael Jackson's death. This actually might be something you want to be alerted on, but it's clearly not a system fault or failure.
• There are annoying problems such as daylight saving time changes, especially across time zones and hemispheres.
In general, the Achilles' heel of predictive models is the same thing that gives them their power: they can observe predictable behavior and predict it, but as a result they can be fooled into predicting the wrong thing. This depends on the parameters you use: too sensitive and you get false positives; too robust and you miss real anomalies.
Another issue is that their predictive power operates at large time scales. In most systems you're likely to work with, the seasonality is hourly, daily, and/or weekly. If you're trying to predict things at higher resolutions, such as second by second, there's so much mismatch between the time scales that they're not very useful. Last week's Monday morning spike of traffic may predict this morning's spike pretty well in the abstract, but not down to the level of the second.
Fourier Transforms
It's sometimes difficult to determine the seasonality of a metric. This is especially true with metrics that are compositions of multiple seasonal components. Fortunately, there's a whole area of time series analysis that focuses on this topic: spectral analysis, the study of frequencies and their relative intensities. Within this field there's a very important function called the Fourier transform, which decomposes any signal (such as a time series) into separate frequencies. This makes use of the very interesting fact that any signal can be broken up into individual sine waves.
The Fourier transform is used in many domains, such as sound processing, to decompose, manipulate, and recombine frequencies…
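For example, here is one way to recover an unknown seasonal period with a fast Fourier transform, assuming NumPy; the "two weeks of hourly data with a daily cycle" is synthetic:

import numpy as np

def dominant_period(series, sample_interval=1.0):
    """Return the period of the strongest cycle in a series."""
    detrended = np.asarray(series, dtype=float) - np.mean(series)
    spectrum = np.abs(np.fft.rfft(detrended))
    freqs = np.fft.rfftfreq(len(detrended), d=sample_interval)
    peak = spectrum[1:].argmax() + 1  # skip the zero-frequency (mean) bin
    return 1.0 / freqs[peak]

t = np.arange(24 * 14)  # two weeks of hourly samples
y = np.sin(2 * np.pi * t / 24) + 0.3 * np.random.randn(len(t))
print(dominant_period(y))  # approximately 24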
Conclusions
Trend and seasonality throw monkey wrenches into lots of models, but they can often be handled fairly well by treating metrics as sums of several signals. Predicting a metric's behavior then becomes a matter of decomposing the signals into their component parts, fitting models to the components, and subtracting the predictable components from the original.
Once you've done that, you have essentially gotten rid of the non-stationary parts of the signal, and, in theory, you should be able to apply standard techniques to the stationary signal that remains.
2 http://bit.ly/netflixscryer
CHAPTER 5
Practical Anomaly Detection for Monitoring
Recall that one of our goals for this book is to help you actually get anomaly detection running in production, solving monitoring problems you have with your current systems.
Typical goals for adding anomaly detection probably include:
• To avoid setting or changing thresholds per server, because machines differ from each other
• To avoid modifying thresholds when servers, features, and workloads change over time
• To avoid static thresholds that throw false alerts at some times of the day or week, and miss problems at other times
In general, you can probably describe these goals as "just make Nagios a little better for some checks."
Another goal might be to find all metrics that are abnormal without generating alerts, for use in diagnosing problems. We consider this to be a pretty hard problem because it is very general; you probably understand why at this point in the book. We won't focus on this goal in this chapter, although you can easily apply the discussion in this chapter to that approach on a case-by-case basis.
The best place to begin is often where you experience the most painful monitoring problem right now. Take a look at your alert history or outages. What's the source of the most noise, or the place where problems happen the most without an alert to notify you?
If you can get close to that, you might have a pretty good shot at
using anomaly detection to solve your alerting problem.
Choosing a Metric
It's important to be sure that the problem you're trying to detect has a reliable signal. It's not a good idea to use anomaly detection to alert on metrics that looked weird during that one outage that one time. One of the things we've learned by doing this ourselves is that metrics are weird constantly during normal system operation. You need to find metrics (or combinations of metrics) that are always normal during healthy system behavior, and always abnormal when systems are in trouble. Here "normal" and "abnormal" are in comparison with the local behavior of the metric, because if there were a reliable global good/bad boundary, you could just use a threshold.
It's usually best to look at a single, specific API endpoint, web page, or other operation. You could detect anomalies globally (for example, over all API endpoints), but this causes two problems. First, a single small problem can be lost in the average. Second, you'll get multi-modal distributions and other complex signals. It's better to check each different kind of thing individually. If this is too much, then just pick one to begin with, such as the "add to cart" action.
Example metrics you could check include:
• Error rate
• Throughput
• Latency (response time), although this is tricky because latency almost always has a complex, multi-modal distribution
• Concurrency, service demand, backlog, queue length, utilization, and similar metrics of load or saturation of capacity; these also usually have characteristics that are difficult to analyze with standard statistical tools unless you find an appropriate model
1 Simple control charts also work well, but again, if you can use them you can use static thresholds instead.
2 We tried to see if queuing theory predicts this, but were unable to determine whether the underlying model of any type of queue would result in a particular distribution of concurrency. In cases such as this, it's great to be able to prove that a metric should behave in a specific way, but absent a proof, as we've said, it's okay to use a result that holds even if you don't know why it does.
A Worked Example
In this section, we'll go through a practical example demonstrating some of the techniques we've covered so far. We're going to use a database's throughput (queries per second) as our metric.
To summarize our thought process, we've created the following flowchart that you can use as a decision tree. This is an extreme simplification, and a little bit biased towards our own experiences, but it should be enough to get you started and orient yourself in the space of anomaly detection techniques.
…a trend component built into its model, so you don't have to do anything special to handle metrics with trend; it trains itself, so to speak, on the actual data it sees.
This is a tradeoff. You can either transform your data to use a better model, which may hurt interpretability, or try to develop a more complicated model.
It's worth noting that, based on Figure 5-1 and Figure 5-2, neither method seems to produce perfectly Gaussian residuals. This is not a major issue. At least with the exponential smoothing control chart, we're still able to reasonably predict and detect the anomalies we're interested in.
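If you want to check this on your own data, the residuals are easy to extract. This sketch yields one-step EWMA prediction errors (alpha is an arbitrary choice here); a histogram of its output is the kind of plot used to judge whether the residuals look Gaussian:

def ewma_residuals(series, alpha=0.2):
    """Each actual value minus the EWMA's prediction of it."""
    prediction = series[0]
    residuals = []
    for value in series[1:]:
        residuals.append(value - prediction)
        prediction = alpha * value + (1 - alpha) * prediction
    return residuals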
Keep in mind that this is a narrowly focused example that only demonstrates one path in our decision tree. We started with a very specific set of requirements (short timescale with significant spikes) that made our final solution work, but it won't work for everything. If we wanted to look at a larger time scale, like the full data set, we'd have to look at other techniques.
Conclusions
This chapter demonstrates relatively simple techniques that you can probably apply to your own problems with the tools you already have at hand, such as RRDTool, simple scripts, and Graphite. Maybe a Redis instance or something, if you really want to get fancy.
The idea here is to get as much done with as little fuss as possible. We're not trying to be data scientists; we're just trying to improve on a Nagios threshold check.
What makes this work? It's mostly about choosing the right battle, to tell the truth. Throughput is about as simple a KPI as you can choose for a database server. Then we visualized our results and picked the simplest thing that could possibly work.
Your mileage, needless to say, will vary.
CHAPTER 6
The Broader Landscape
Shape Catalogs
In the book A New Look at Anomaly Detection by Dunning and Friedman, the authors write about a technique that uses shape catalogs. The gist of the technique is as follows. First, you start with a sample data set that represents the time series of a metric without any anomalies. You break this data set up into smaller windows, using a window function to mask out all but a specific region, and catalog the resulting shapes. The assumption being made is that any non-anomalous observation of this time series can be reconstructed by rearranging elements from the shape catalog. Anything that doesn't match up to a reasonable extent is then considered to be an anomaly.
This is nice, but most machine data doesn't really behave like an EKG chart in our experience. At least, not on a small time scale. Most machine data is much noisier than this on a second-to-second basis.
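Still, the core idea is simple enough to sketch. This toy version (our own simplification: plain non-overlapping windows and Euclidean distance, not the authors' exact procedure) catalogs normalized window shapes from clean training data and scores new windows by their distance to the nearest known shape:

import numpy as np

def build_catalog(train, width):
    """Catalog normalized window shapes from anomaly-free data."""
    shapes = []
    for i in range(0, len(train) - width + 1, width):
        w = np.asarray(train[i : i + width], dtype=float)
        shapes.append((w - w.mean()) / (w.std() + 1e-9))
    return shapes

def shape_score(window, catalog):
    """Distance to the closest catalogued shape. A large value means
    the window cannot be rebuilt from known shapes."""
    w = np.asarray(window, dtype=float)
    w = (w - w.mean()) / (w.std() + 1e-9)
    return min(float(np.linalg.norm(w - s)) for s in catalog)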
1 Mean-shift analysis is not a single technique, but rather a family. There's a Wikipedia…
We could apply an EWMA control chart to this data set, as in the worked example. Here's what it looks like.
This control chart definitely could detect the mean shift, since the metric falls underneath the lower control line, but that happens often with this highly variable data set with lots of spikes! An EWMA control chart is great for detecting spikes, but not mean shifts. Let's try out CUSUM. In this image we'll show only the first portion of the data for clarity:
Much better! You can see that the CUSUM chart detected the mean shift where the points drop below the lower threshold.
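CUSUM is simple to implement. A common form is the tabular CUSUM sketched below: deviations beyond a small slack accumulate on each side of the target mean, so a persistent small shift eventually crosses the threshold even though no single point looks remarkable. (The slack k and threshold h below are conventional textbook defaults, not magic numbers.)

def cusum(series, target, sd, k=0.5, h=5.0):
    """Tabular CUSUM: signal when deviations accumulated beyond a slack
    of k*sd exceed h*sd on either side of the target mean."""
    high = low = 0.0
    alarms = []
    for i, value in enumerate(series):
        high = max(0.0, high + (value - target) - k * sd)
        low = max(0.0, low + (target - value) - k * sd)
        if high > h * sd or low > h * sd:
            alarms.append(i)
            high = low = 0.0  # reset after signaling
    return alarms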
Clustering
Not all anomaly detection is based on time series of metrics. Clustering, or cluster analysis, is one way of grouping elements together to try to find the odd ones out. Netflix has written about their anomaly detection methods based on cluster analysis.2 They apply cluster analysis techniques to server clusters to identify anomalous, misbehaving, or underperforming servers.
K-means clustering is a common algorithm that's fairly simple to implement. Here's an example:
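(What follows is a minimal sketch of the idea, assuming NumPy; the two-dimensional server data is invented, and real uses would have many more dimensions.)

import numpy as np

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        distances = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Invented data: servers described by (CPU %, p99 latency in ms).
servers = np.array([[20, 5], [22, 6], [21, 5],
                    [80, 40], [82, 38], [85, 90]], dtype=float)
labels, centroids = kmeans(servers, k=2)
distances = np.linalg.norm(servers - centroids[labels], axis=1)
print(servers[distances.argmax()])  # the server farthest from its cluster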
Non-Parametric Analysis
Not all anomaly detection techniques need models to draw useful conclusions about metrics. Some avoid models altogether! These are called non-parametric anomaly detection methods, and they use theory from a larger field called non-parametric statistics.
The Kolmogorov-Smirnov test is one non-parametric method that has gained popularity in the monitoring community. It tests for changes in the distributions of two samples. An example of the type of question it can answer is: is the distribution of CPU usage this week significantly different from last week's? Your time intervals don't necessarily have to be as long as a week, of course.
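With SciPy this takes only a few lines; the two samples below are synthetic, standing in for last week's and this week's CPU measurements:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
last_week = rng.gamma(shape=2.0, scale=10.0, size=5000)
this_week = rng.gamma(shape=2.5, scale=10.0, size=5000)

# ks_2samp makes no assumption about the shape of either distribution.
statistic, p_value = stats.ks_2samp(last_week, this_week)
print(statistic, p_value)  # a tiny p-value: the distributions differ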
We once learned an interesting lesson while trying to solve a sticky problem with a non-Gaussian distribution of values. We wanted to…
2 Tracking down the Villains: Outlier Detection at Netflix
Machine Learning
Machine learning is a meta-technique that you can layer on top of other techniques. It primarily involves the ability for computers to…
3 https://github.com/twitter/BreakoutDetection
4 http://bit.ly/grubbstest
Tools
You generally don't have to implement an entire anomaly detection framework yourself. As a significant component of monitoring, anomaly detection has been the focus of many monitoring projects and companies, which have implemented many of the things we've discussed in this book.
5 http://bit.ly/ruxitblog
R Packages
There are plenty of R packages available for many anomaly detection methods, such as forecasting and machine learning. The downside is that many are quite simple: they're often little more than reference implementations that were not intended for monitoring systems, so it may be difficult to integrate them into your own stack.
Twitter's anomaly detection R package, on the other hand, actually runs in their production monitoring system. Their package uses time series decomposition techniques to detect point anomalies in a data set.
APPENDIX A
Appendix
Code
Control Chart Windows
Moving Window
// A fixed-size moving window that maintains a running total and sum of
// squares, so the mean and standard deviation can be computed without
// re-scanning the stored points.
function fixedWindow(size) {
  this.name = 'window';
  this.ready = false;
  this.points = [];
  this.total = 0;
  this.sos = 0;  // sum of squares

  this.push = function(newValue) {
    if (this.points.length == size) {
      // Evict the oldest point and back its contribution out of the sums.
      var removed = this.points.shift();
      this.total -= removed;
      this.sos -= removed * removed;
    }
    this.total += newValue;
    this.sos += newValue * newValue;
    this.points.push(newValue);
    this.ready = (this.points.length == size);
  };

  this.mean = function() {
    if (this.points.length == 0) {
      return 0;
    }
    return this.total / this.points.length;
  };

  this.stddev = function() {
    // Population standard deviation: sqrt(E[x^2] - E[x]^2).
    var mean = this.mean();
    return Math.sqrt(this.sos / this.points.length - mean * mean);
  };
}

var win = new fixedWindow(5);
win.push(1);
win.push(5);
win.push(9);
console.log(win);
console.log(win.mean());
console.log(win.stddev() * 3);
EWMA Window
// An exponentially weighted moving average "window." Instead of storing
// points, it keeps one EWMA of the values and one of their squares.
function movingAverage(alpha) {
  this.name = 'ewma';
  this.ready = true;  // an EWMA can produce a value after a single point

  // A single EWMA; alpha is the smoothing factor, closed over from above.
  function ma() {
    this.value = NaN;
    this.push = function(newValue) {
      if (isNaN(this.value)) {
        this.value = newValue;  // seed with the first observation
        return;
      }
      this.value = alpha * newValue + (1 - alpha) * this.value;
    };
  }

  this.MA = new ma();     // EWMA of the values
  this.sosMA = new ma();  // EWMA of the squared values

  this.push = function(newValue) {
    this.MA.push(newValue);
    this.sosMA.push(newValue * newValue);
  };

  this.mean = function() {
    return this.MA.value;
  };

  this.stddev = function() {
    // Same identity as the fixed window: sqrt(E[x^2] - E[x]^2).
    return Math.sqrt(this.sosMA.value - this.mean() * this.mean());
  };
}
Window Function
// Smoothing with an arbitrary window function. The weights define the
// window's shape; the smoothed value uses points on both sides of the
// center, so the estimate lags by (weights.length - 1) / 2 points.
function kernelSmoothing(weights) {
  this.name = 'kernel';
  this.ready = false;
  this.points = [];
  this.lag = (weights.length - 1) / 2;

  this.push = function(newValue) {
    if (this.points.length == weights.length) {
      this.points.shift();  // evict the oldest point
    }
    this.points.push(newValue);
    this.ready = (this.points.length == weights.length);
  };

  this.mean = function() {
    // Weighted average; assumes the weights sum to (approximately) 1.
    var total = 0;
    for (var i = 0; i < weights.length; i++) {
      total += weights[i] * this.points[i];
    }
    return total;
  };

  this.stddev = function() {
    // Weighted version of sqrt(E[x^2] - E[x]^2).
    var mean = this.mean();
    var sos = 0;
    for (var i = 0; i < weights.length; i++) {
      sos += weights[i] * this.points[i] * this.points[i];
    }
    return Math.sqrt(sos - mean * mean);
  };
}

var ksmooth = new kernelSmoothing([0.3333, 0.3333, 0.3333]);
ksmooth.push(1);
ksmooth.push(5);
ksmooth.push(9);
console.log(ksmooth);
console.log(ksmooth.mean());
console.log(ksmooth.stddev() * 3);
Acknowledgments
We'd like to thank George Michie, who contributed some content to this book as well as helping us to clarify and keep things at an appropriate level of detail.