3_AML _Lecture 3_Feature Engg

Feature scaling is a crucial preprocessing step in machine learning that standardizes independent features to improve model performance and accuracy. Techniques like normalization and standardization help ensure that features contribute equally to the learning process, preventing larger-magnitude features from dominating. Additionally, feature scaling aids in avoiding numerical instability and enhances the convergence of algorithms such as gradient descent and K-Nearest Neighbors.


Feature Scaling

Scale data for better performance of a Machine Learning Model
https://towardsdatascience.com/all-about-feature-scaling-bcc0ad75cb35
Intro

• Feature Scaling is a technique to standardize the independent features present in the data within a fixed range.

• It is a vital preprocessing step in machine learning that involves transforming numerical features to a common scale.

• It plays a major role in ensuring accurate and efficient model training and performance.

• Scaling techniques aim to normalize the range, distribution, and magnitude of features, reducing potential biases and inconsistencies that may arise from variations in their values.

• If feature scaling is not done, a machine learning algorithm tends to give larger values more weight and treat smaller values as less important, regardless of the units of those values.

• Crucial for ensuring features are comparable in terms of magnitude, values, and units.
Why Use Feature Scaling?

 Ensures Comparable Scales Across Features
• Feature normalization helps prevent larger-magnitude features from dominating the learning process.
• Guarantees equal contribution of each feature to the model.
 Enhances Algorithm Performance
• Improves Convergence and Accuracy
• Gradient descent-based algorithms converge faster with scaled features.
• Distance-based algorithms (e.g., K-Nearest Neighbors) rely on scaled data for accurate distance
measurement.
• Support Vector Machines perform better with standardized data.
 Prevents Numerical Instability
• Avoids Computational Issues
• Prevents overflow/underflow problems in distance calculations and matrix operations.
• Ensures stable and reliable computations, reducing the risk of errors in model predictions.

What is Feature Scaling and Why Does Machine Learning Need It? | Medium
• Suppose we have a dataset with age, salary, and a 0/1 target indicating whether a product was purchased.
• This is a classification problem, and we apply a KNN model to it.
• We can clearly see that when we calculate distances, the salary feature will dominate, and KNN will not be able to perform well.
In simple words: no need to discuss further; this is the same issue as on the previous slide.

• Importance of Feature Scaling

• Equal Contribution: Scaling ensures that each feature has an equal impact on the learning process.

• Improved Model Performance: Helps models like SVMs, k-nearest neighbors, and neural networks perform better by improving convergence and accuracy.

• Prevent Numerical Instability: Avoids issues in calculations, especially with algorithms sensitive to distance metrics.

• Image: Visualization showing improved convergence in gradient descent with scaling.


• Impact of Feature Scaling on Algorithms

• Gradient Descent Algorithms: Scaling allows gradient descent to converge faster and more consistently by taking uniform steps towards the optimum.

• Distance-Based Algorithms: Algorithms like KNN rely on distance measurements, which can be skewed by unscaled features.

• Impact on Performance: Without scaling, larger features can dominate distance calculations, leading to biased results (see the sketch below).

• Image: Diagram illustrating KNN with scaled vs. unscaled data.

Feature Scaling Type

Normalization (Min-Max Scaling):

•Description: Normalization transforms the features to a fixed range, typically between 0 and 1.
•Formula: X' = (X - X_min) / (X_max - X_min)
•Use Case: Best suited for scenarios where the data does not follow a Gaussian (normal) distribution.

Standardization (Z-Score Scaling):

•Description: Standardization transforms the features so they have a mean of 0 and a standard deviation of 1.
•Formula: X' = (X - μ) / σ, where μ is the feature mean and σ is its standard deviation.
•Use Case: Preferable when the data follows a normal distribution or when the distribution is unknown.
•Key Point: Maintains the shape of the original distribution; does not restrict the feature to a specific range.
Standardization

• Suppose we have a dataset with columns Age and Salary (e.g., 500 rows, with ages such as 25, 26, 15, 18, 19, ..., 20).
• To standardize it, we compute (xi - mean) / SD for each value.
• We get 500 new numbers after transforming; the mean of these new numbers will be 0 and their SD will be 1 (see the sketch below).
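A minimal sketch of this standardization using scikit-learn's StandardScaler; the age/salary values are illustrative:

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"age": [25, 26, 15, 18, 19, 20],
                   "salary": [50000, 52000, 20000, 24000, 25000, 27000]})

scaler = StandardScaler()
scaled = scaler.fit_transform(df)      # each column becomes (x - mean) / SD

print(scaled.mean(axis=0))             # approximately 0 for every column
print(scaled.std(axis=0))              # approximately 1 for every column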
Understanding standardization
Geometric intuition

• We are doing mean centering (shifting the data so its mean sits at the origin).
• We are then rescaling by the SD, spreading or compressing the data to unit standard deviation.
When to use Standardization
Normalization (Min –max)
• Normalization, a vital aspect of Feature Scaling, is a data preprocessing technique employed
to standardize the values of features in a dataset, bringing them to a common scale.

• In machine learning it is often said that we should not work directly with raw units like weight, height, etc. We should bring features to a common scale to eliminate the effect of units.

• This process enhances data analysis and modeling accuracy by mitigating the influence of
varying scales on machine learning models.

• Normalization is a scaling technique in which values are shifted and rescaled so that they
end up ranging between 0 and 1. It is also known as Min-Max scaling.
Geometric intuition

• When the value of X is the minimum value in the column, the numerator will be 0, and hence X’ is 0
• On the other hand, when the value of X is the maximum value in the column, the numerator is
equal to the denominator, and thus the value of X’ is 1
• If the value of X is between the minimum and the maximum value, then the value of X’ is between 0
and 1
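A minimal sketch of Min-Max normalization with scikit-learn's MinMaxScaler, matching the boundary cases above (the minimum maps to 0, the maximum to 1); the values are illustrative:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[15.0], [18.0], [25.0], [40.0], [60.0]])

scaler = MinMaxScaler()                # default feature range is (0, 1)
X_scaled = scaler.fit_transform(X)     # (x - min) / (max - min)

print(X_scaled.ravel())                # 15 -> 0.0 ... 60 -> 1.0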
Types of Normalization
• Min-max scaling
• Mean normalization: produces centered data, much like standardization, and is hence rarely used
• Max-abs scaling: used when the data is sparse, i.e., contains too many zeros
Robust scaling

• Works very well when the data contains outliers.
• The scaled values will have their median and IQR set to 0 and 1, respectively.
• It is robust to outliers (see the sketch below).
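A minimal sketch of robust scaling with scikit-learn's RobustScaler on illustrative data containing one outlier:

import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[10.0], [12.0], [13.0], [15.0], [500.0]])   # 500 is an outlier

scaler = RobustScaler()                # centers on the median, scales by the IQR
X_scaled = scaler.fit_transform(X)     # (x - median) / IQR

print(X_scaled.ravel())                # median maps to 0; the outlier stays extreme
                                       # but no longer squashes the other values together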
Is feature scaling required? It depends on the algorithm: if we are working with decision trees, XGBoost, etc., there is no need.

Standardization is used most of the time.

Normalization is used when you know the min and max, as in image processing with CNNs on colored images, where each pixel has a min of 0 and a max of 255.

If there are outliers, use robust scaling.

https://proclusacademy.com/blog/robust-scaler-outliers/
Categorical Data
•Categorical data refers to variables that represent characteristics and can be divided into distinct groups or categories.
•Unlike numerical data, categorical data doesn’t involve numbers or measurements but rather labels or names.
•Common Use Cases:
•Grouping data by attributes like gender, nationality, brand, or type.
•Often used in statistical analysis to count occurrences, create frequency tables, or generate bar charts.

•Types of Categorical Data


•Nominal Data
• Definition:
• Nominal data consists of categories that are purely labels, with no specific order or ranking.
• Each category is unique, and there's no inherent logical sequence among them.
• Examples:
• Colors: Red, Blue, Green, Yellow (No color is “higher” than another).
• Types of Animals: Dog, Cat, Bird, Fish (Each type is distinct but not ordered).
• Brands: Nike, Adidas, Puma (Brand names without any ranking).
• Key Characteristics:
• No Order: Categories cannot be ordered or ranked.
• Mutually Exclusive: Each observation fits into one and only one category.
• Analysis Methods: Mode, frequency distribution, chi-square test for independence.
2. Ordinal Data
• Ordinal data involves categories that have a clear, meaningful order or ranking among them.
• However, the intervals between these categories are not necessarily equal or measurable.
•Examples:
• Survey Ratings: Poor, Fair, Good, Excellent (Ordered by quality).
• Education Levels: High School, Bachelor’s, Master’s, PhD (Ordered by level of education).
• Socioeconomic Status: Low, Middle, High (Ordered by status).
•Key Characteristics:
• Ordered: Categories can be ranked in a meaningful way.
• Unknown Intervals: Differences between ranks are not quantified.
• Analysis Methods: Median, percentile, non-parametric tests like the Mann-Whitney U test.

• Visualization & Analysis


• Visualization:
• Bar Charts: Used to represent the frequency of categories.
• Pie Charts: Used to show the proportion of categories within the whole.
• Common Analytical Techniques:
• Chi-Square Test: To assess relationships between categorical variables.
• Mode Analysis: To find the most common category.
• Logistic Regression: Often used when categorical data is the dependent variable.
Handling

• Some algorithms can work with categorical data directly (e.g., decision trees).

• Challenge: Many machine learning algorithms cannot operate on label data directly and require all input and output variables to be numeric.

• Reason: This is generally a constraint of efficient algorithm implementation rather than a fundamental limitation.

• Solution: Categorical data must be converted to numerical form, and predictions may need to be converted back to categorical form for presentation or application.
https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
Encoding Categorical data
• This involves two steps:
• Integer Encoding or label encoding or ordinal coding
• One-Hot Encoding

• 1. Integer Encoding
• As a first step, each unique category value is assigned an integer value.
• For example, “red” is 1, “green” is 2, and “blue” is 3.
• This is called a label encoding or an integer encoding and is easily reversible.
• For some variables, this may be enough.
• The integer values have a natural ordered relationship between each other and machine learning algorithms
may be able to understand and harness this relationship.
• For example, ordinal variables like the “place” example above would be a good example where a label
encoding would be sufficient.
• In scikit-learn, if the output (target) variable is categorical, we go for label encoding.
• There is no major difference; the method is the same (see the sketch below).
Use the ordinal encoding file.
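A minimal sketch of ordinal and label encoding with scikit-learn; the column names and the category order are illustrative assumptions:

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

df = pd.DataFrame({"education": ["High School", "Master's", "Bachelor's", "PhD"],
                   "purchased": ["no", "yes", "yes", "no"]})

# Pass an explicit category order so the integers reflect the real ranking.
oe = OrdinalEncoder(categories=[["High School", "Bachelor's", "Master's", "PhD"]])
df["education_enc"] = oe.fit_transform(df[["education"]])

# LabelEncoder is meant for the (categorical) output/target column.
le = LabelEncoder()
df["purchased_enc"] = le.fit_transform(df["purchased"])

print(df)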
One hot
• For categorical variables where no such ordinal relationship exists, the integer encoding is not enough.

• In fact, using this encoding and allowing the model to assume a natural ordering between categories may
result in poor performance or unexpected results (predictions halfway between categories).

• In this case, a one-hot encoding can be applied to the integer representation. This is where the integer
encoded variable is removed and a new binary variable is added for each unique integer value.

• Binary Vector Representation: Each category is represented as a binary vector, where only one element is
"1" (indicating the presence of that category) and the rest are "0".
• Example:
• Categories: [Red, Green, Blue]

• One-Hot Encoded Representation:

• Red: [1, 0, 0]

• Green: [0, 1, 0]

• Blue: [0, 0, 1]
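A minimal sketch of the colour example using scikit-learn's OneHotEncoder (sparse_output=False assumes scikit-learn 1.2 or newer; older versions use sparse=False):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

colors = pd.DataFrame({"color": ["Red", "Green", "Blue", "Green"]})

ohe = OneHotEncoder(sparse_output=False)       # dense output for readability
encoded = ohe.fit_transform(colors[["color"]])

print(ohe.get_feature_names_out())             # ['color_Blue' 'color_Green' 'color_Red']
print(encoded)                                 # one binary column per category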
Importance of One-Hot Encoding in Machine Learning

• Avoiding Ordinal Relationships:


• Prevents the model from assuming an ordinal relationship between categories (e.g., "Red" > "Green" >
"Blue"), which may not exist.

• Compatibility with Algorithms:


• Many ML algorithms require numerical input, and one-hot encoding allows categorical data to be used
effectively.

• Capturing Information:
• Each category is treated independently, ensuring that the model doesn't make incorrect assumptions about
the relationship between categories.

• Example :
• In logistic regression, treating categories as continuous variables without one-hot encoding can lead to
incorrect predictions.
Pros and Cons of One-Hot Encoding

•Advantages:
• Simplicity: Easy to implement and understand.

• No Assumptions: Does not impose an ordinal relationship between categories.

• Algorithm Compatibility: Works well with distance-based algorithms like KNN and linear models.

•Disadvantages:

• Curse of Dimensionality: For categorical features with a large number of unique values, the number
of dimensions increases significantly.

• Sparsity: The resulting vectors are sparse (many zeros), leading to inefficiencies in storage and
computation.

• Inapplicability for High-Cardinality Features: When categories are numerous (e.g., ZIP codes), one-
hot encoding becomes impractical.
Scenarios for Applying One-Hot Encoding
•Applicable Scenarios:
• Small Categorical Features: Works well for features with a limited number of unique categories (e.g., days
of the week).
• Nominal Data: Ideal for features where the categories do not have an intrinsic order (e.g., color, type).

•Not Ideal For:


• High Cardinality Features: Consider alternatives like target encoding or embedding for features with many
unique categories.
• Tree-Based Algorithms: Algorithms like decision trees and random forests can handle categorical data
without one-hot encoding.
What is Column Transformer?
•A utility in Scikit-Learn that allows for different preprocessing steps to be applied to different
subsets of features in a dataset.
•Handling Mixed Data Types: Essential when dealing with datasets containing both numerical
and categorical features.

•Example :
•Dataset Features: Age (Numerical), Gender (Categorical), Income (Numerical)

•Different Preprocessing: Age and Income might need scaling, while Gender
might require one-hot encoding.
Column transformer

Working:
Defining Transformers:
•Specify the preprocessing steps for different columns.
•Example:
•Numerical Columns: Apply StandardScaler to
standardize features.
•Categorical Columns: Apply OneHotEncoder to
transform categories into binary vectors.
•Combining Transformers:
•The ColumnTransformer combines these
preprocessing steps and applies them to the
corresponding columns simultaneously.
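A minimal sketch of a ColumnTransformer for the Age/Gender/Income example above; the data values are illustrative:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

df = pd.DataFrame({"age": [25, 32, 47, 51],
                   "gender": ["F", "M", "F", "M"],
                   "income": [40000, 52000, 80000, 61000]})

preprocess = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["age", "income"]),   # scale the numerical columns
    ("cat", OneHotEncoder(), ["gender"]),           # one-hot encode the categorical column
])

X = preprocess.fit_transform(df)                    # all steps applied in one call
print(X)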
Mathematical Transformer

• Function transformer
• Power Transformer
• Binning and binarization
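A minimal sketch of these transformers in scikit-learn, applied to an illustrative right-skewed column:

import numpy as np
from sklearn.preprocessing import (FunctionTransformer, PowerTransformer,
                                   KBinsDiscretizer, Binarizer)

X = np.array([[1.0], [2.0], [5.0], [20.0], [100.0]])

log_tf = FunctionTransformer(np.log1p)                        # function transformer: log(1 + x)
power_tf = PowerTransformer(method="yeo-johnson")             # power transform toward a Gaussian shape
bin_tf = KBinsDiscretizer(n_bins=3, encode="ordinal",
                          strategy="quantile")                # binning into 3 quantile bins
binarize_tf = Binarizer(threshold=10.0)                       # binarization: 1 if value > 10 else 0

print(log_tf.fit_transform(X).ravel())
print(power_tf.fit_transform(X).ravel())
print(bin_tf.fit_transform(X).ravel())
print(binarize_tf.fit_transform(X).ravel())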
Feature splitting and construction
• Feature splitting is a technique in machine learning that involves breaking down a
single feature into multiple features.

• This process generates more informative features that provide greater insight into the
relationships between input variables and the target variable.

• Purpose of Feature Splitting

• Convert continuous variables into categorical variables.

• Extract information from date and time features.

• Break down text features into smaller units for better analysis.
https://medium.com/@brijesh_soni/topic-11-feature-construction-splitting-b116c60c4b2f#:~:text=Feature%20splitting%20is%20a%20technique,variables%20and%20the%20target%20variable .
Benefits of Feature Splitting

• Enhances the performance of a machine-learning model by providing more relevant information.

• Helps models better capture relationships between features and the target variable.

• Increases the interpretability and effectiveness of machine-learning models.

Techniques for Feature Splitting


•Binning:
• Divides continuous variables into discrete intervals or bins.
• Useful for non-linear relationships.
•One-Hot Encoding:
• Converts categorical variables into binary features for each category.
• Each row has a binary value (0 or 1) for the corresponding category.

•Text Splitting:
• Splits text features into smaller units like words or phrases (tokenization, stemming).
•Date and Time Splitting:
• Extracts components like day, month, hour from date/time features.
• Useful for time-series data.
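A minimal sketch of date/time splitting with pandas; the column name order_date and the dates are illustrative:

import pandas as pd

df = pd.DataFrame({"order_date": ["2024-01-05 14:30", "2024-03-22 09:10"]})
df["order_date"] = pd.to_datetime(df["order_date"])

# Extract separate components as new features.
df["day"] = df["order_date"].dt.day
df["month"] = df["order_date"].dt.month
df["hour"] = df["order_date"].dt.hour
df["dayofweek"] = df["order_date"].dt.dayofweek   # 0 = Monday

print(df)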
Feature Construction
•Feature construction involves creating new features from existing data to enhance the information available for model
training.
•Importance:
•Improves Model Accuracy: Well-constructed features can lead to better model performance by providing additional
insights.
•Captures Complex Relationships: Helps to model non-linear relationships that simple features may miss.
•Reduces Dimensionality: Effective feature construction can replace multiple simpler features with a single, more
informative feature.

•Key Techniques:

•Mathematical Transformations:
• Example: Creating polynomial features (e.g., x² or x³).
•Aggregation:
• Example: Calculating the mean or sum of sales over different time periods.
•Domain-Specific Features:
• Example: For financial data, deriving ratios like debt-to-equity or creating features based on fiscal quarters.
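A minimal sketch combining a domain-specific ratio and a polynomial feature; the debt/equity columns and values are illustrative:

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

df = pd.DataFrame({"debt": [100, 250, 80], "equity": [400, 500, 160]})

# Domain-specific feature: debt-to-equity ratio.
df["debt_to_equity"] = df["debt"] / df["equity"]

# Mathematical transformation: add a squared term for the debt column.
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_feats = poly.fit_transform(df[["debt"]])      # columns: [debt, debt^2]
df["debt_squared"] = poly_feats[:, 1]

print(df)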
Handling mixed data and date-time variables
Use L&T
Linear Regression

Simple
Multiple
Polynomial
Regression Metrics
Gradient Descent
Cost Function
Learning rate
Regularization
Logistic
Precision, recall, F1 score
