Machine Learning in Cyber Security
Lecture 05
• Replace missing values with the mean, median, or mode of the relevant variable.
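A minimal sketch of this kind of imputation, using only the Python standard library. The function name `impute`, the use of `None` as the missing-value marker, and the sample column are assumptions made for illustration.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    # Pick the summary statistic by name and compute it on the observed values only.
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


# Hypothetical column (e.g. packets per connection) with two missing entries.
packet_counts = [10, None, 20, 20, None, 70]

imputed_mean = impute(packet_counts, "mean")
imputed_median = impute(packet_counts, "median")
imputed_mode = impute(packet_counts, "mode")
```

The mean is pulled upward by the single large value (70), while the median and mode are not, which is one reason the median is often preferred for skewed columns.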
Scaling
A technique often applied as part of data preparation for machine learning.
Goal: Change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values.
Normalization
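Min-max normalization: Rescales every feature to the same fixed range (typically 0 to 1), so all features have the exact same scale, but it does not handle outliers well: a single extreme value compresses the remaining values into a narrow band.
Z-score standardization: Rescales each feature to zero mean and unit standard deviation. It is less sensitive to outliers than min-max normalization, but does not produce normalized data with the exact same range for every feature.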
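The two methods can be sketched as follows. This is an illustrative implementation with hypothetical function names and sample data, using the population standard deviation; library implementations may differ in details.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature with one outlier (100.0).
durations = [1.0, 2.0, 3.0, 4.0, 100.0]

scaled = min_max_normalize(durations)
standardized = z_score_standardize(durations)
```

With the outlier present, the first four min-max values all fall below about 0.03, which shows how one extreme value squeezes the rest of the range.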
Training, Testing and Validation Sets
K-Fold Cross Validation
K-fold cross-validation is a technique for evaluating predictive models.
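In the usual formulation, the data are divided into k folds; each fold is held out once as the test set while the model is trained on the remaining k - 1 folds, and the k scores are averaged. A minimal sketch of the index bookkeeping follows. The function name `k_fold_indices` is hypothetical, and the sketch does not shuffle, which is normally done first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 10 samples, 5 folds: every sample is used for testing exactly once.
folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```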
Under-fitting and Over-fitting
Overfitting
• Overfitting occurs when the model fits the training data too well and does not generalize, so it performs badly on the test data.
• It is the result of an excessively complicated model.
Underfitting
• Underfitting occurs when the model does not fit the data well enough.
• It is the result of an excessively simple model.
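The contrast can be illustrated on a small synthetic dataset. In this sketch, a constant predictor stands in for an excessively simple model and a 1-nearest-neighbour predictor (which memorizes the training points) stands in for an excessively complicated one; these stand-ins, the data, and all names are assumptions chosen for illustration.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(xs, ys):
    """Too simple: always predict the training mean."""
    avg = sum(ys) / len(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Least-squares straight line, matching the true trend of the data."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    c = y_bar - m * x_bar
    return lambda x: m * x + c


def fit_nearest_neighbour(xs, ys):
    """Too flexible: memorize the training set and echo the closest point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Synthetic data: y = 2x, with alternating +3 / -3 noise on the training set only.
train_x = list(range(0, 20, 2))
train_y = [2 * x + (3 if i % 2 == 0 else -3) for i, x in enumerate(train_x)]
test_x = list(range(1, 20, 2))
test_y = [2 * x for x in test_x]

results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line),
                  ("1-NN", fit_nearest_neighbour)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:8s} train MSE={results[name][0]:8.3f}  test MSE={results[name][1]:8.3f}")
```

The constant model has high error on both sets (underfitting), while 1-NN has zero training error but a clearly worse test error than the line (overfitting).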
Regression Task
Linear Regression Vs Logistic Regression
Linear Regression
Y = mx + c
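In this equation m is the slope and c is the intercept. One common way to estimate them from data is ordinary least squares; the sketch below assumes that method and uses made-up sample points, so it is an illustration rather than the lecture's own example.

```python
def fit_line(xs, ys):
    """Estimate slope m and intercept c of Y = mx + c by ordinary least squares."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Slope: covariance of x and y divided by the variance of x.
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    # Intercept: the fitted line passes through the point of means.
    c = y_bar - m * x_bar
    return m, c


# Hypothetical points lying exactly on Y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

m, c = fit_line(xs, ys)
prediction = m * 6 + c  # predicted Y for a new input x = 6
print(f"m={m}, c={c}, prediction={prediction}")
```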
Linear Regression Example