Summary of Feature Engineering


Chapter 1: Data and Models

1. Data:

o Data represents observations of real-world phenomena.

o It provides fragmented insights into reality and is often incomplete or noisy.

o The goal is to extract meaningful answers from data through workflows.

2. Tasks:

o Data is used to answer questions, but the path from data to answers is often complex, with many false starts and iterations.

3. Models:

o Mathematical/statistical models describe the relationships between different aspects of data.

o Feature: A numeric representation of raw data relevant to the task and model.

4. Feature Engineering:

o Process of selecting the right features based on the task, data, and model.

o Importance of Feature Quantity:

 Too few features: The model fails to perform.

 Too many features: The model becomes expensive and difficult to train.

5. Machine Learning Workflow:

o Good features simplify the modeling process.

o Bad features hurt model performance and may require more complex solutions.

6. Feature Engineering Types:

o Feature Improvement: Making features more usable (e.g., imputing missing data; a sketch follows this list).

o Feature Construction: Creating new interpretable features from existing ones.

o Feature Extraction: Automatically generating new features based on parametric assumptions.

o Feature Selection: Picking the best subset of features.

o Feature Learning: Automatically generating features from unstructured data (text, images).
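
A minimal sketch of Feature Improvement via mean imputation, assuming scikit-learn and pandas are available; the DataFrame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with missing values (np.nan).
df = pd.DataFrame({"age": [25.0, np.nan, 40.0, 31.0],
                   "income": [50000.0, 62000.0, np.nan, 58000.0]})

# Feature improvement: fill each gap with the column mean so the
# model receives a complete numeric matrix.
imputer = SimpleImputer(strategy="mean")
df[["age", "income"]] = imputer.fit_transform(df[["age", "income"]])
print(df)
```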

Chapter 2: Numeric Data and Scaling Techniques

1. Numeric Data:
o Easily ingestible by mathematical models.

o First check: Does the magnitude matter, or only the sign (positive/negative)?

o Consider the feature scale (range of values).

2. Binarization & Quantization:

o Convert numeric data into a binary format, or group it into bins, to simplify interpretation and processing (as sketched below).
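
A minimal sketch of both operations using NumPy; the counts array and the bin edges are illustrative assumptions.

```python
import numpy as np

counts = np.array([0, 3, 1, 12, 7, 0, 25])  # illustrative raw counts

# Binarization: keep only whether each value is nonzero.
binary = (counts > 0).astype(int)           # -> [0 1 1 1 1 0 1]

# Quantization: map each value to the index of its bin.
bin_edges = np.array([1, 5, 10, 20])        # assumed bin edges
quantized = np.digitize(counts, bin_edges)  # -> [0 1 1 3 2 0 4]
print(binary, quantized)
```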

3. Normalization:

o Scaling technique that transforms data to a common range, usually between 0 and 1.

o Useful when feature distribution is unknown.

o Min-Max scaling formula:

 $X_n = \frac{X - X_{min}}{X_{max} - X_{min}}$
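
The formula translates directly into NumPy; the sample values below are illustrative.

```python
import numpy as np

X = np.array([2.0, 5.0, 9.0, 1.0, 7.0])    # illustrative values
X_n = (X - X.min()) / (X.max() - X.min())  # min-max scaling to [0, 1]
print(X_n)                                 # [0.125 0.5 1. 0. 0.75]
```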

4. Standardization:

o Adjusts data to have a mean of 0 and a standard deviation of 1.

o Useful for models involving distance measures (e.g., KNN, PCA).

o Standardization formula:

 $X' = \frac{X - \mu}{\sigma}$
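
A minimal sketch of standardization, assuming scikit-learn's StandardScaler; the single sample column is illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [4.0], [7.0], [10.0]])  # one illustrative column
scaler = StandardScaler()
X_std = scaler.fit_transform(X)              # mean 0, std 1 per column
print(X_std.mean(), X_std.std())             # ~0.0 and ~1.0
```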

5. Normalization vs. Standardization:

o Normalization: Scales values to a fixed range, affected by outliers, good for unknown distributions.

o Standardization: Not restricted to a specific range, less affected by outliers, good for Gaussian distributions.

6. Feature Selection:

o Filtering: Preprocesses features by removing irrelevant ones.

o Wrapper Methods: Evaluate subsets of features but are computationally expensive.

o Embedded Methods: Feature selection occurs during model training.
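
A minimal sketch of all three approaches, assuming scikit-learn; the synthetic dataset and the parameter choices (k=5, C=0.1) are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Filtering: score each feature independently, keep the top k.
X_filtered = SelectKBest(f_classif, k=5).fit_transform(X, y)

# Wrapper: repeatedly refit a model on shrinking feature subsets
# (computationally expensive).
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
X_wrapped = rfe.fit_transform(X, y)

# Embedded: L1 regularization drives weak coefficients to zero during
# training, selecting features as a side effect.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
kept = [i for i, w in enumerate(l1.coef_[0]) if w != 0]
print(X_filtered.shape, X_wrapped.shape, kept)
```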

Questions:
Chapter 1: Data and Models

Q1. What is data, and what are its characteristics?


A1. Data represents observations of real-world phenomena, providing small, fragmented insights into
reality. It is often incomplete or noisy and is used to extract meaningful answers, though this process
involves complexity and iteration.

Q2. What is feature engineering, and why is it important?


A2. Feature engineering is the process of creating and selecting features (numeric representations of
data) that are most appropriate for the task, data, and model. It’s essential because good features
simplify the modeling process and improve the model's performance, while poor features complicate it.

Q3. What are the different types of feature engineering?


A3.

1. Feature Improvement: Enhancing existing features by transformations or imputing missing data.

2. Feature Construction: Creating new features from existing ones.

3. Feature Extraction: Automatically creating new, often uninterpretable features using algorithms (a sketch follows this list).

4. Feature Selection: Choosing the best subset of features.

5. Feature Learning: Automatically generating new features from unstructured data like text or
images.
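
As one concrete (assumed) example of feature extraction, PCA projects raw features onto a handful of components; the random data below is illustrative, and the resulting components are new features that are hard to interpret individually.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # 100 samples, 10 raw features

pca = PCA(n_components=3)
X_new = pca.fit_transform(X)    # 3 extracted features per sample
print(X_new.shape)              # (100, 3)
```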

Q4. What is a feature, and what role does it play in machine learning?
A4. A feature is a numeric representation of raw data that is relevant to the task at hand and can be
processed by the model. The right features simplify the modeling step and improve the model’s ability
to complete the task effectively.

Q5. What are the risks of having too few or too many features in a model?
A5. Too few features result in a model that cannot perform the task adequately. Too many features,
especially irrelevant ones, make the model expensive and difficult to train, and can negatively impact its
performance.

Chapter 2: Numeric Data and Scaling Techniques

Q6. What is normalization, and when should it be used?


A6. Normalization is a scaling technique that transforms numeric data to a common scale, usually
between 0 and 1. It is useful when the feature distribution is unknown and ensures that different
features contribute proportionally to the model. It is typically used when there are varying scales in the
data.

Q7. What is the difference between normalization and standardization?


A7.

 Normalization scales feature values to a fixed range (usually [0, 1]) and is affected by outliers.

 Standardization adjusts the data to have a mean of 0 and a standard deviation of 1 and is less
affected by outliers. It is useful when the feature distribution follows a Gaussian pattern.

Q8. What are the steps in handling numeric data in machine learning?
A8.

1. Check whether the magnitude matters, or only the sign (positive/negative).

2. Consider the scale of the features (maximum and minimum values).


3. Apply appropriate scaling techniques such as normalization or standardization to prepare the data for the model (a pipeline sketch follows).
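
A minimal sketch of these steps chained into one workflow, assuming scikit-learn; the tiny dataset and the choice of KNN are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# The two features live on very different scales, so scale them
# before a distance-based model such as KNN.
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 100.0], [4.0, 400.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([("scale", MinMaxScaler()),
                 ("knn", KNeighborsClassifier(n_neighbors=3))])
pipe.fit(X, y)
print(pipe.predict([[2.5, 250.0]]))
```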

Q9. What is feature selection, and what are its main approaches?
A9. Feature selection is the process of choosing the most relevant subset of features for a model. The
main approaches are:

1. Filtering: Preprocessing to remove irrelevant features.

2. Wrapper Methods: Trying out subsets of features (computationally expensive).

3. Embedded Methods: Performing feature selection during model training.

Q10. What is standardization, and when is it preferable over normalization?


A10. Standardization is a scaling technique where feature values are transformed to have a mean of 0 and a standard deviation of 1. It is preferable to normalization when the data follows a Gaussian distribution and when the model involves distance-based techniques (e.g., KNN, PCA), as it is less sensitive to outliers.
