Outliers ML
Outliers ML
in Data
Identification, Impact and Management
Referen
ce
https://www.geeksforgeeks.org/machine-learning-outlier/
Types of Outliers
0 Global Outliers 0 Collective Outliers 0 Contextual Outliers
1
Global outliers are Collective outliers are Contextual outliers are
isolated data points that
are far away from the
2 groups of data points
that collectively deviate
3 data points that deviate
significantly from the
main body of the data. significantly from the expected behavior within
They are often easy to overall distribution of a a specific context or
identify and remove. dataset. subgroup.
Referen https://www.geeksforgeeks.org/types-of-outliers-in-
data-mining/
How Outliers are
Potential Causes Formed?
Measurement Errors: Data Processing Errors:
● K-Nearest Neighbors
(KNN) & Local
Outlier Factor (LOF):
● KNN identifies outliers as data points
whose K nearest neighbors are far away
from them.
● This method calculates the local density of
data points and identifies outliers as those
with significantly lower density compared
to their neighbors.
Referen https://www.geeksforgeeks.org/machine-learning
-outlier/.........
ce:
Methods to Detect Outliers
Clustering-Based Referen https://www.geeksforgeeks.org/machine-learning
-outlier/...........
Methods ce:
● Density-Based Spatial ● Hierarchical
Clustering of clustering:
Applications with Noise
It involves building a hierarchy of
(DBSCAN):
It clusters data points based on clusters by iteratively merging or
their density and identifies outliers splitting clusters based on their
as points not belonging to any similarity. Outliers can be identified
cluster. as clusters containing only a single
data point or clusters significantly
smaller than others.
Importance of outlier detection
in machine learning
0 Biased models:
0 Reduced accuracy:
1 Outliers can bias estimates of 2 Outliers can introduce noise into the
parameters like mean, variance, data, making it difficult for a
and coefficients, resulting in machine learning model to learn the
misleading inferences. true underlying patterns.
0 Increased 0 Reduced
3 variance:
Outliers can increase the variance of a 4 interpretability:
Outliers can make it difficult to
machine learning model, making it more understand what a machine
sensitive to small changes in the data. learning model has learned from the
data.
Referen https://www.geeksforgeeks.org/machine-learning-outlier/................
Handling Methods to Eliminate
Outliers
Transforming Data:
Log Box-Cox
Compresses theTransformatio
range of data, reducing the impact of StabilizesTransformation
variance and normalizes the data.
large outliers.
n Referen https://github.com/dhirajnair04/Outlier_handling
Handling Methods to Eliminate
Outliers
Trimming/Capping:
Trimming Capping
Remove outliers by excluding data beyond a certain Limit extreme values by setting a maximum threshold
percentile. to reduce the impact of outliers.
Referen https://pub.towardsai.net/outlier-detection-and-treatm
ent.........
https://github.com/dhirajnair04/Outlier_handling
Handling Methods to Eliminate
Outliers
Imputation:
Replacing Outliers
Replace outliers with more typical values like the
Referen https://www.linkedin.com/pulse/when-ho...
............ mean,
median, or mode
https://github.com/dhirajnair04/Outlier_handling
ce:
Handling Methods to Eliminate
Outliers
Model-Based Methods: