Lecture 5

The document discusses various aspects of machine learning, focusing on feature engineering, handling missing data, and the differences between supervised and unsupervised learning. It highlights techniques for data normalization, scaling, and the importance of training, testing, and validation sets, including K-fold cross-validation. Additionally, it addresses the issues of overfitting and underfitting in model training.

Special Topics of Machine Learning in Cyber Security
Lecture 05: Machine Learning Basics

Arslan Ali Khan
[email protected]
Department of Cyber-Security and Data Science, Riphah Institute of Systems Engineering (RISE), Riphah International University, Islamabad, Pakistan.
Feature Engineering

• Dealing with Missing Data
Missing values are data points that are absent for a specific variable in a dataset. They can be represented in various ways, such as blank cells, null values, or special symbols like "NA" or "unknown." These missing data points pose a significant challenge in data analysis, as they can:
• Reduce the sample size: This can decrease the accuracy and reliability of your analysis.
• Introduce bias: If the missing data is not handled properly, it can bias the results of your analysis.
• Make it difficult to perform certain analyses: Some statistical techniques require complete data for all variables, making them inapplicable when missing values are present.
Common strategies include imputation with estimated values (a code sketch follows below):

Using estimated values:
• Replacing missing values with estimated values.
• Preserves sample size: Doesn't reduce data points.
• Can introduce bias: Estimated values might not be accurate.

Use of mean, median, and mode:
• Replace missing values with the mean, median, or mode of the relevant variable.
• Simple and efficient: Easy to implement.
• Can be inaccurate: Doesn't consider the relationships between variables.
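A minimal sketch of mean imputation with pandas and scikit-learn (not from the slides; the age and salary columns are made-up examples):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical dataset with missing values
df = pd.DataFrame({
    "age":    [25, np.nan, 31, 47, np.nan],
    "salary": [50_000, 62_000, np.nan, 81_000, 58_000],
})

# Replace missing values with the column mean; "median" and
# "most_frequent" (mode) work the same way
imputer = SimpleImputer(strategy="mean")
df[["age", "salary"]] = imputer.fit_transform(df[["age", "salary"]])
print(df)
```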
• Handling Categorical Data
Categorical data is data that can be divided into groups or categories, such as gender, hair color, or product type.
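The slide doesn't prescribe an encoding; one common option is one-hot encoding, sketched here with pandas (the hair_color column is a made-up example):

```python
import pandas as pd

df = pd.DataFrame({"hair_color": ["black", "brown", "black", "red"]})

# One-hot encode the categorical column into binary indicator columns
encoded = pd.get_dummies(df, columns=["hair_color"])
print(encoded)
```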
• Normalizing Data
Normalization in machine learning is the process of translating data into the range [0, 1] (or any other range).
• Feature Construction or Generation
Feature generation (also known as feature construction, feature extraction, or feature engineering) is the process of transforming features into new features that better relate to the target. This can involve mapping a feature into a new feature using a function like log, or creating a new feature from one or multiple features using multiplication or addition.
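A minimal sketch of feature generation along these lines (the income and debt columns are made up; the log and ratio features are illustrative choices):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [30_000, 85_000, 120_000],
                   "debt":   [5_000, 20_000, 90_000]})

# Map an existing feature through a function (log compresses large ranges)
df["log_income"] = np.log(df["income"])

# Combine multiple features into a new one (here, a simple ratio)
df["debt_to_income"] = df["debt"] / df["income"]
print(df)
```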
Feature Scaling

A technique often applied as part of data preparation for machine learning.
Goal: Change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values.

Normalization

Min-max normalization: Guarantees all features will have the exact same scale but does not handle outliers well.

Z-score standardization: Handles outliers, but does not produce normalized data with the exact same scale.
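A minimal sketch contrasting the two with scikit-learn (the sample values, including the 100 acting as an outlier, are made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0], [100.0]])  # 100 acts as an outlier

# Min-max: x' = (x - min) / (max - min), maps everything into [0, 1]
print(MinMaxScaler().fit_transform(X).ravel())

# Z-score: z = (x - mean) / std, centered at 0 but not bounded
print(StandardScaler().fit_transform(X).ravel())
```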
Training, Testing and Validation Sets
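The original slides illustrate these sets with figures; as a minimal sketch, a common recipe applies scikit-learn's train_test_split twice (the 60/20/20 split and the Iris data are illustrative choices, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% as the test set, then carve a validation set out of the rest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 90 / 30 / 30 samples
```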
K-Fold Cross Validation

K-fold cross-validation is a technique for evaluating predictive models.

• The dataset is divided into k subsets or folds. The model is trained and evaluated k times, using a different fold as the validation set each time.

• Performance metrics from each fold are averaged to estimate the model's generalization performance.
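A minimal sketch with scikit-learn (logistic regression on Iris and k = 5 are illustrative choices, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Train and evaluate 5 times, each time holding out a different fold
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())  # per-fold accuracy and the averaged estimate
```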
Under-fitting and Over-fitting

Overfitting
• Overfitting occurs when the model fits the training data too well and does not generalize, so it performs badly on the test data.
• It is the result of an excessively complicated model.

Underfitting
• Underfitting occurs when the model does not fit the data well enough.
• It is the result of an excessively simple model.
• Both overfitting and underfitting lead to poor predictions on new datasets.

• A learning model that overfits or underfits does not generalize well.
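A minimal sketch of both failure modes using polynomial regression (synthetic data; degrees 1, 4, and 15 stand in for too simple, roughly right, and too complex):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Noisy samples of a sine wave
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 4, 15):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    # Overfitting shows up as a high train score with a much lower test score
    print(degree, model.score(X_tr, y_tr), model.score(X_te, y_te))
```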
Supervised vs. Unsupervised Learning

• Supervised learning (classification)
 Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations.
 New data is classified based on the training set.
• Unsupervised learning (clustering)
 The class labels of the training data are unknown.
 Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
Machine Learning

• Supervised: We are given input samples (X) and output samples (y) of a function y = f(X). We would like to "learn" f and evaluate it on new data. Types:
 Classification: y is discrete (class labels).
 Regression: y is continuous, e.g. linear regression.

• Unsupervised: Given only samples X of the data, we compute a function f such that y = f(X) is "simpler".
 Clustering: y is discrete.
 y is continuous: matrix factorization, Kalman filtering, unsupervised neural networks.
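A minimal sketch contrasting the two settings on the same data (the decision tree and k-means models are illustrative choices, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: learn f from labeled pairs (X, y), then predict on new data
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict(X[:3]))

# Unsupervised: only X is given; discover structure (clusters) in it
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:3])
```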
Techniques

• Supervised Learning:
 Linear Regression
 Logistic Regression
 Decision Tree
 Naïve Bayes
 Random Forests
• Unsupervised Learning:
 Clustering
 Factor Analysis
 Topic Models
Regression

Regression Task

Linear Regression vs. Logistic Regression

Linear Regression

Y = mx + c

Linear Regression Example
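The example slides are figures; as a hedged sketch of fitting Y = mx + c, with made-up points roughly following y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical points roughly following y = 2x + 1
X = np.array([[0], [1], [2], [3], [4]])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Fit Y = mx + c by least squares: coef_ is the slope m, intercept_ is c
model = LinearRegression().fit(X, y)
print("m =", model.coef_[0], "c =", model.intercept_)
print(model.predict([[5]]))  # predict at a new point
```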
