Summary of Feature Engineering


Chapter 1: Data and Models

1. Data:

o Data represents observations of real-world phenomena.

o It provides fragmented insights into reality and is often incomplete or noisy.

o The goal is to extract meaningful answers from data through workflows.

2. Tasks:

o Data is used to answer questions, but the path from data to answers is often complex, with many false starts and iterations.

3. Models:

o Mathematical/statistical models describe the relationships between different aspects of data.

o Feature: A numeric representation of raw data relevant to the task and model.

4. Feature Engineering:

o Process of selecting the right features based on the task, data, and model.

o Importance of Feature Quantity:

 Too few features: The model fails to perform.

 Too many features: The model becomes expensive and difficult to train.

5. Machine Learning Workflow:

o Good features simplify the modeling process.

o Bad features hurt model performance and may require more complex solutions.

6. Feature Engineering Types:

o Feature Improvement: Making features more usable (e.g., imputing missing data; a sketch follows this list).

o Feature Construction: Creating new interpretable features from existing ones.

o Feature Extraction: Automatically generating new features based on parametric assumptions.

o Feature Selection: Picking the best subset of features.

o Feature Learning: Automatically generating features from unstructured data (text, images).
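
A minimal sketch of Feature Improvement via mean imputation, assuming scikit-learn and pandas are available; the DataFrame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with missing values (np.nan).
df = pd.DataFrame({"age": [25.0, np.nan, 40.0, 31.0],
                   "income": [50000.0, 62000.0, np.nan, 58000.0]})

# Feature improvement: fill each gap with the column mean so the
# model receives a complete numeric matrix.
imputer = SimpleImputer(strategy="mean")
df[["age", "income"]] = imputer.fit_transform(df[["age", "income"]])
print(df)
```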

Chapter 2: Numeric Data and Scaling Techniques

1. Numeric Data:
o Easily ingestible by mathematical models.

o First check: Does the magnitude matter, or only the sign (positive/negative)?

o Consider the feature scale (range of values).

2. Binarization & Quantization:

o Convert numeric data into a binary format, or group it into bins, to simplify interpretation and processing (as sketched below).
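
A minimal sketch of both operations using NumPy; the counts array and the bin edges are illustrative assumptions.

```python
import numpy as np

counts = np.array([0, 3, 1, 12, 7, 0, 25])  # illustrative raw counts

# Binarization: keep only whether each value is nonzero.
binary = (counts > 0).astype(int)           # -> [0 1 1 1 1 0 1]

# Quantization: map each value to the index of its bin.
bin_edges = np.array([1, 5, 10, 20])        # assumed bin edges
quantized = np.digitize(counts, bin_edges)  # -> [0 1 1 3 2 0 4]
print(binary, quantized)
```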

3. Normalization:

o Scaling technique that transforms data to a common range, usually between 0 and 1.

o Useful when feature distribution is unknown.

o Min-Max scaling formula:

 $X_n = \frac{X - X_{min}}{X_{max} - X_{min}}$
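
The formula translates directly into NumPy; the sample values below are illustrative.

```python
import numpy as np

X = np.array([2.0, 5.0, 9.0, 1.0, 7.0])    # illustrative values
X_n = (X - X.min()) / (X.max() - X.min())  # min-max scaling to [0, 1]
print(X_n)                                 # [0.125 0.5 1. 0. 0.75]
```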

4. Standardization:

o Adjusts data to have a mean of 0 and a standard deviation of 1.

o Useful for models involving distance measures (e.g., KNN, PCA).

o Standardization formula:

 $X' = \frac{X - \mu}{\sigma}$
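
A minimal sketch of standardization, assuming scikit-learn's StandardScaler; the single sample column is illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [4.0], [7.0], [10.0]])  # one illustrative column
scaler = StandardScaler()
X_std = scaler.fit_transform(X)              # mean 0, std 1 per column
print(X_std.mean(), X_std.std())             # ~0.0 and ~1.0
```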

5. Normalization vs. Standardization:

o Normalization: Scales values to a fixed range, affected by outliers, good for unknown distributions.

o Standardization: Not restricted to a specific range, less affected by outliers, good for Gaussian distributions.

6. Feature Selection:

o Filtering: Preprocesses features by removing irrelevant ones.

o Wrapper Methods: Evaluate subsets of features but are computationally expensive.

o Embedded Methods: Feature selection occurs during model training.
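
A minimal sketch of all three approaches, assuming scikit-learn; the synthetic dataset and the parameter choices (k=5, C=0.1) are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Filtering: score each feature independently, keep the top k.
X_filtered = SelectKBest(f_classif, k=5).fit_transform(X, y)

# Wrapper: repeatedly refit a model on shrinking feature subsets
# (computationally expensive).
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
X_wrapped = rfe.fit_transform(X, y)

# Embedded: L1 regularization drives weak coefficients to zero during
# training, selecting features as a side effect.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
kept = [i for i, w in enumerate(l1.coef_[0]) if w != 0]
print(X_filtered.shape, X_wrapped.shape, kept)
```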

Questions:
Chapter 1: Data and Models

Q1. What is data, and what are its characteristics?


A1. Data represents observations of real-world phenomena, providing small, fragmented insights into
reality. It is often incomplete or noisy and is used to extract meaningful answers, though this process
involves complexity and iteration.

Q2. What is feature engineering, and why is it important?


A2. Feature engineering is the process of creating and selecting features (numeric representations of
data) that are most appropriate for the task, data, and model. It’s essential because good features
simplify the modeling process and improve the model's performance, while poor features complicate it.

Q3. What are the different types of feature engineering?


A3.

1. Feature Improvement: Enhancing existing features by transformations or imputing missing data.

2. Feature Construction: Creating new features from existing ones.

3. Feature Extraction: Automatically creating new, often uninterpretable features using algorithms (a sketch follows this list).

4. Feature Selection: Choosing the best subset of features.

5. Feature Learning: Automatically generating new features from unstructured data like text or
images.
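
As one concrete (assumed) example of feature extraction, PCA projects raw features onto a handful of components; the random data below is illustrative, and the resulting components are new features that are hard to interpret individually.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # 100 samples, 10 raw features

pca = PCA(n_components=3)
X_new = pca.fit_transform(X)    # 3 extracted features per sample
print(X_new.shape)              # (100, 3)
```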

Q4. What is a feature, and what role does it play in machine learning?
A4. A feature is a numeric representation of raw data that is relevant to the task at hand and can be
processed by the model. The right features simplify the modeling step and improve the model’s ability
to complete the task effectively.

Q5. What are the risks of having too few or too many features in a model?
A5. Too few features result in a model that cannot perform the task adequately. Too many features,
especially irrelevant ones, make the model expensive and difficult to train, and can negatively impact its
performance.

Chapter 2: Numeric Data and Scaling Techniques

Q6. What is normalization, and when should it be used?


A6. Normalization is a scaling technique that transforms numeric data to a common scale, usually
between 0 and 1. It is useful when the feature distribution is unknown and ensures that different
features contribute proportionally to the model. It is typically used when there are varying scales in the
data.

Q7. What is the difference between normalization and standardization?


A7.

 Normalization scales feature values to a fixed range (usually [0, 1]) and is affected by outliers.

 Standardization adjusts the data to have a mean of 0 and a standard deviation of 1 and is less
affected by outliers. It is useful when the feature distribution follows a Gaussian pattern.

Q8. What are the steps in handling numeric data in machine learning?
A8.

1. Check whether the magnitude matters, or only the sign (positive/negative).

2. Consider the scale of the features (maximum and minimum values).


3. Apply appropriate scaling techniques such as normalization or standardization to prepare the data for the model (a pipeline sketch follows).
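
A minimal sketch of these steps chained into one workflow, assuming scikit-learn; the tiny dataset and the choice of KNN are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# The two features live on very different scales, so scale them
# before a distance-based model such as KNN.
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 100.0], [4.0, 400.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([("scale", MinMaxScaler()),
                 ("knn", KNeighborsClassifier(n_neighbors=3))])
pipe.fit(X, y)
print(pipe.predict([[2.5, 250.0]]))
```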

Q9. What is feature selection, and what are its main approaches?
A9. Feature selection is the process of choosing the most relevant subset of features for a model. The
main approaches are:

1. Filtering: Preprocessing to remove irrelevant features.

2. Wrapper Methods: Trying out subsets of features (computationally expensive).

3. Embedded Methods: Performing feature selection during model training.

Q10. What is standardization, and when is it preferable over normalization?


A10. Standardization is a scaling technique where feature values are transformed to have a mean of 0 and a standard deviation of 1. It is preferable to normalization when the data follows a Gaussian distribution and when the model involves distance-based techniques (e.g., KNN, PCA), as it is less sensitive to outliers.
