Lecture 12 Outliers and Guidelines For Exercises
What is an Outlier?
Guidelines for exercises
What Are Outliers?
Outlier: A data object that deviates significantly from the normal
objects, as if it were generated by a different mechanism
Ex.: Unusual credit card purchase; in sports: Michael Jordan, Wayne
Gretzky, ...
Outliers are different from noise
Noise is a random error or variance in a measured variable
Applications of outlier detection:
Customer segmentation
Medical analysis
Types of Outliers (I)
Three kinds: global, contextual and collective outliers
Global outlier (or point anomaly)
An object O_g is a global outlier if it significantly deviates from the rest of the data set
Ex. Intrusion detection in computer networks
Issue: Find an appropriate measurement of deviation
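As one possible measurement of deviation, the sketch below flags global outliers with a modified z-score based on the median and the median absolute deviation (MAD). This is an illustrative choice, not the method prescribed by the lecture: the function name global_outliers, the cutoff of 3.5, the scaling constant 0.6745, and the purchase amounts are all assumptions made for the example.

```python
from statistics import median

def global_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds `threshold`.

    Modified z-score: 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. Both constants are assumed defaults.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: this score is undefined, so flag nothing.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical credit card purchase amounts with one unusual purchase.
amounts = [23.5, 19.9, 25.1, 22.0, 24.3, 21.7,
           20.8, 23.0, 22.6, 24.9, 21.1, 950.0]
print(global_outliers(amounts))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can partly mask itself.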
The border between normal and outlier objects is often a gray area
Noise may blur the distinction between normal objects and outliers. It may help hide outliers and
reduce the effectiveness of outlier detection
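To illustrate how noise can hide an outlier, the sketch below scores the same suspect object against normal objects with a small and a large spread. It is a simplification under stated assumptions: noise is modeled by scaling fixed deviations instead of drawing random errors, and the names deviation_score, base, suspect and THRESHOLD are hypothetical.

```python
from statistics import mean, stdev

def deviation_score(suspect, normal_values):
    """Z-score of `suspect` relative to the normal objects."""
    return abs(suspect - mean(normal_values)) / stdev(normal_values)

# Hypothetical normal objects centered at 0, and one suspect object.
base = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
suspect = 6.0
THRESHOLD = 3.0  # assumed cutoff for calling an object an outlier

for noise_scale in (1.0, 3.0):
    # A larger scale stands in for noisier measurements of the normal objects.
    noisy = [noise_scale * v for v in base]
    score = deviation_score(suspect, noisy)
    print(noise_scale, round(score, 2), score > THRESHOLD)
```

With the smaller spread the suspect scores well above the cutoff; with the larger spread the same object falls below it and would go undetected.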
Understandability
Understand why these are outliers: Justification of the detection