Lecture 12: Outliers and Guidelines for Exercises

Uploaded by Ranjeet Singh
Copyright © All Rights Reserved

Outlier Analysis

- What Is an Outlier?
- Guidelines for Exercises

What Are Outliers?
- Outlier: a data object that deviates significantly from the normal objects, as if it were generated by a different mechanism
  - Ex.: unusual credit card purchases; in sports: Michael Jordan, Wayne Gretzky, ...
- Outliers are different from noise data
  - Noise is random error or variance in a measured variable
  - Noise should be removed before outlier detection
- Outliers are interesting: they violate the mechanism that generates the normal data
- Outlier detection vs. novelty detection: at an early stage a new pattern is treated as an outlier, but later it is merged into the model
- Applications:
  - Credit card fraud detection
  - Telecom fraud detection
  - Customer segmentation
  - Medical analysis
Types of Outliers (I)
- Three kinds: global, contextual, and collective outliers
- Global outlier (or point anomaly)
  - An object is O_g if it significantly deviates from the rest of the data set
  - Ex.: intrusion detection in computer networks
  - Issue: finding an appropriate measurement of deviation
- Contextual outlier (or conditional outlier)
  - An object is O_c if it deviates significantly based on a selected context
  - Ex.: 80°F in Urbana: is it an outlier? (It depends on whether it is summer or winter.)
  - Attributes of data objects should be divided into two groups
    - Contextual attributes: define the context, e.g., time and location
    - Behavioral attributes: characteristics of the object used in outlier evaluation, e.g., temperature
  - Can be viewed as a generalization of local outliers, i.e., objects whose density significantly deviates from that of their local area
  - Issue: how to define or formulate a meaningful context?
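To make the global vs. contextual distinction concrete, here is a minimal sketch, assuming a z-score as the measure of deviation and a fixed threshold of 2 (both illustrative choices, not prescribed by the slide). The readings are hypothetical: the season plays the role of the contextual attribute and the temperature is the behavioral attribute. An 80°F winter reading is unremarkable against the whole data set but stands out within its winter context.

```python
from statistics import mean, pstdev

# Hypothetical (season, temperature in °F) readings for one city.
# season = contextual attribute, temperature = behavioral attribute.
readings = (
    [("summer", t) for t in (78, 82, 80, 85, 79, 81, 83, 77, 84)]
    + [("winter", t) for t in (20, 25, 18, 22, 24, 19, 21, 23, 80)]
)


def z_scores(values):
    """Standardize values against their own mean and population std. dev."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def global_outliers(data, threshold=2.0):
    """Objects whose behavioral value deviates from the whole data set."""
    zs = z_scores([temp for _, temp in data])
    return [obj for obj, z in zip(data, zs) if abs(z) > threshold]


def contextual_outliers(data, threshold=2.0):
    """Objects whose behavioral value deviates from objects in the same context."""
    flagged = []
    for context in sorted({ctx for ctx, _ in data}):
        group = [obj for obj in data if obj[0] == context]
        zs = z_scores([temp for _, temp in group])
        flagged.extend(obj for obj, z in zip(group, zs) if abs(z) > threshold)
    return flagged


if __name__ == "__main__":
    print("global:", global_outliers(readings))
    print("contextual:", contextual_outliers(readings))
```

Run as a script, this reports no global outliers and flags only ('winter', 80) as a contextual outlier. The plain z-score is itself sensitive to the outliers it is trying to find, which is one face of the "appropriate measurement of deviation" issue; a robust alternative such as the median absolute deviation may behave better on small groups.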
Types of Outliers (II)
- Collective outliers
  - A subset of data objects collectively deviates significantly from the whole data set, even if the individual data objects may not be outliers
  - Applications: e.g., intrusion detection, when a number of computers keep sending denial-of-service packets to each other
- Detection of collective outliers
  - Consider not only the behavior of individual objects, but also that of groups of objects
  - Requires background knowledge of the relationships among data objects, such as a distance or similarity measure on objects
- A data set may have multiple types of outliers
- One object may belong to more than one type of outlier
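A collective outlier can also be illustrated with a time series instead of network traffic. In this minimal sketch (an illustration under stated assumptions, not a standard algorithm), the relationship among objects is simply adjacency in time: the series is cut into overlapping windows of a hypothetical width of 4, and a window is flagged when its spread (population standard deviation) falls below a quarter of the median window spread. Every reading in the hypothetical series lies between 12 and 21, so no single value is unusual, yet the run of identical 16s does not fluctuate the way normal data does.

```python
from statistics import median, pstdev

# Hypothetical sensor series: normal readings fluctuate between 12 and 21,
# but indices 6-11 hold a flat run of 16s. Each 16 is individually normal.
series = [12, 18, 14, 20, 13, 19,
          16, 16, 16, 16, 16, 16,
          21, 12, 18, 14, 20, 15]


def window_spreads(values, width):
    """Population std. dev. of every contiguous window of the given width."""
    return [pstdev(values[i:i + width]) for i in range(len(values) - width + 1)]


def collective_outlier_windows(values, width=4, ratio=0.25):
    """Start indices of windows whose spread is far below the typical spread."""
    spreads = window_spreads(values, width)
    typical = median(spreads)
    return [i for i, s in enumerate(spreads) if s < ratio * typical]


def collective_outlier_points(values, width=4, ratio=0.25):
    """Indices of all objects covered by a flagged window (the collective outlier)."""
    starts = collective_outlier_windows(values, width, ratio)
    return sorted({j for i in starts for j in range(i, i + width)})


if __name__ == "__main__":
    print(collective_outlier_windows(series))
    print(collective_outlier_points(series))
```

Here the flagged windows start at indices 6, 7, and 8, which together cover exactly the flat run at indices 6 to 11. The `width` and `ratio` values are tuning assumptions; for other data, such as hosts linked by network connections, the grouping would come from a different relationship among objects, which is the background knowledge the slide refers to.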
Challenges of Outlier Detection
- Modeling normal objects and outliers properly
  - It is hard to enumerate all possible normal behaviors in an application
  - The border between normal and outlier objects is often a gray area
- Application-specific outlier detection
  - The choice of distance measure among objects and the model of relationships among objects are often application-dependent
  - E.g., in clinical data a small deviation could be an outlier, while in marketing analysis objects are often subject to larger fluctuations
- Handling noise in outlier detection
  - Noise may distort the normal objects and blur the distinction between normal objects and outliers. It may help hide outliers and reduce the effectiveness of outlier detection
- Understandability
  - Understand why these are outliers: justification of the detection
  - Specify the degree of an outlier: the unlikelihood of the object being generated by a normal mechanism
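One simple way to express the degree of an outlier as an unlikelihood, assuming the normal mechanism is a single Gaussian fitted to reference data (an assumption made for illustration), is the two-sided tail probability of the object's z-score: the smaller the probability, the less likely the object was generated by the normal mechanism. The reference values and candidates below are hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical reference data assumed to come from the normal mechanism.
reference = [10, 12, 11, 13, 9, 11, 10, 12]


def fit_normal_model(data):
    """Estimate the mean and standard deviation of the assumed Gaussian mechanism."""
    return mean(data), pstdev(data)


def tail_probability(x, mu, sigma):
    """P(|X - mu| >= |x - mu|) for X ~ N(mu, sigma^2); lower means more outlying."""
    z = abs(x - mu) / sigma
    return math.erfc(z / math.sqrt(2))


def rank_by_outlierness(candidates, mu, sigma):
    """Sort candidates from most to least outlying, paired with their probability."""
    scored = [(x, tail_probability(x, mu, sigma)) for x in candidates]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    mu, sigma = fit_normal_model(reference)
    for value, p in rank_by_outlierness([11, 13, 15], mu, sigma):
        print(f"{value}: tail probability {p:.4f}")
```

The score doubles as a justification ("this value would occur this rarely under the fitted model"), which addresses both understandability points. With a different assumed mechanism, such as a mixture model, the same idea applies but the probability would be computed differently.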


Guidelines for Outlier Detection Exercises
- Open the glass.arff file in Weka (download it from the LMS).
- Change the last attribute (the class attribute) to a “Non-class attribute”, or delete this attribute.
- Open the Microsoft Word file on the LMS that describes the “To-Do today” exercises. Five exercises are listed.
- For each exercise, play the video and check the tasks you should perform on the given data file. Save your results and upload them to the LMS as your solution to the exercise.
- Repeat the same steps for the remaining four exercises.
