3_AML _Lecture 3_Feature Engg
What is Feature Scaling and Why Does Machine Learning Need It?
• Suppose we have a dataset with age and salary features and a target of 1 or 0 indicating whether the product was purchased.
• It is a classification problem, and we apply a KNN model to it.
• We can clearly see that when we calculate distances, the salary feature will dominate (its values are much larger than age), and KNN will not be able to perform well.
In simple words: no need to discuss this again; it is the same issue as in the previous example.
• Equal Contribution: Scaling ensures that each feature has an equal impact on the learning
process.
• Improved Model Performance: Helps models like SVMs, k-nearest neighbors, and neural networks converge faster and perform better.
• Impact on Performance: Without scaling, larger features can dominate distance calculations,
leading to biased results.
Use Case: Preferable when the data follows a normal distribution or when the distribution is unknown.
Key Point: Maintains the shape of the original distribution and does not restrict the feature to a specific range.
Standardization
• Suppose we have a dataset of age and salary with 500 rows (for example, ages 25, 26, 15, 18, 19, …, 20).
• To standardize a feature, we compute x' = (x_i - mean) / SD for each value.
• We therefore get 500 new numbers after transforming.
• The mean of these new numbers will be 0 and their SD will be 1.
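Below is a minimal sketch of this computation using scikit-learn's StandardScaler; the age and salary values are made up for illustration (not the slide's 500-row dataset).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up age and salary values for illustration
X = np.array([[25, 15000],
              [26, 18000],
              [15, 12000],
              [18, 19000],
              [19, 20000]], dtype=float)

scaler = StandardScaler()            # learns the mean and SD of each column
X_scaled = scaler.fit_transform(X)   # applies x' = (x - mean) / SD

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```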
Understanding standardization
Geometric intuition
• In machine learning it is often said that we should not work with raw units like weight, height, etc., so we should bring features onto a common scale to eliminate the units.
• This process enhances data analysis and modeling accuracy by mitigating the influence of
varying scales on machine learning models.
• Normalization is a scaling technique in which values are shifted and rescaled so that they
end up ranging between 0 and 1. It is also known as Min-Max scaling.
Geometric intuition
• Min-max normalization computes X' = (X - X_min) / (X_max - X_min).
• When the value of X is the minimum value in the column, the numerator is 0, and hence X' is 0.
• On the other hand, when the value of X is the maximum value in the column, the numerator is equal to the denominator, and thus the value of X' is 1.
• If the value of X is between the minimum and the maximum value, then the value of X' is between 0 and 1.
Type of Normalization
• Min-max scaling
• Mean normalization: produces centered data, similar to standardization, and is hence rarely used.
• Max-abs scaling: used when the data is sparse, i.e., it contains many zeros (both min-max and max-abs are sketched below).
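A short sketch of min-max and max-abs scaling with scikit-learn, on made-up values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler

# Made-up age and salary values
X = np.array([[25.0,  15000.0],
              [30.0,  60000.0],
              [45.0, 120000.0]])

# Min-max scaling: X' = (X - X_min) / (X_max - X_min), result in [0, 1]
print(MinMaxScaler().fit_transform(X))

# Max-abs scaling: divides by the maximum absolute value, preserves zeros/sparsity
print(MaxAbsScaler().fit_transform(X))
```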
Robust scaling
Robust scaling subtracts the median and divides by the IQR, so the scaled values have their median and IQR set to 0 and 1, respectively.
It is robust to outliers.
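A minimal sketch with scikit-learn's RobustScaler on made-up data containing one extreme salary outlier:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Made-up data; the last salary is an extreme outlier
X = np.array([[25.0,    20000.0],
              [30.0,    25000.0],
              [35.0,    30000.0],
              [40.0, 1000000.0]])

# Subtracts the median and divides by the IQR (25th-75th percentile by default),
# so the scaled values have median 0 and IQR 1
print(RobustScaler().fit_transform(X))
```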
Is feature scaling required? It depends on the algorithm: if we are working with tree-based models such as decision trees or XGBoost, then there is no need.
Normalization is preferred when you know the min and max, as in image processing: when we work with CNNs on colored images, each pixel value has a minimum of 0 and a maximum of 255.
https://proclusacademy.com/blog/robust-scaler-outliers/
Categorical Data
•Categorical data refers to variables that represent characteristics and can be divided into distinct groups or categories.
•Unlike numerical data, categorical data doesn’t involve numbers or measurements but rather labels or names.
•Common Use Cases:
•Grouping data by attributes like gender, nationality, brand, or type.
•Often used in statistical analysis to count occurrences, create frequency tables, or generate bar charts.
• Some algorithms can work with categorical data directly (e.g., decision trees).
• 1. Integer Encoding
• As a first step, each unique category value is assigned an integer value.
• For example, “red” is 1, “green” is 2, and “blue” is 3.
• This is called a label encoding or an integer encoding and is easily reversible.
• For some variables, this may be enough.
• The integer values have a natural ordered relationship between each other and machine learning algorithms
may be able to understand and harness this relationship.
• For example, ordinal variables such as a “place” variable (first, second, third) are a good case where a label encoding is sufficient.
• In scikit-learn, if the output (target) variable is categorical, we go for label encoding; for ordered input features, we use ordinal encoding.
• There is no major difference between the two; the method is the same.
• Use the ordinal encoding file; a minimal sketch of both encoders follows below.
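A minimal sketch of both encoders in scikit-learn (the “size” feature and target values are made up): OrdinalEncoder for ordered input features and LabelEncoder for a categorical target.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Made-up ordinal feature
df = pd.DataFrame({"size": ["S", "M", "L", "M"]})

# OrdinalEncoder: for ordered input features; the category order is given explicitly
size_enc = OrdinalEncoder(categories=[["S", "M", "L"]])
df["size_encoded"] = size_enc.fit_transform(df[["size"]]).ravel()  # S=0, M=1, L=2

# LabelEncoder: meant for the output/target column, not for input features
y = ["yes", "no", "yes", "no"]
y_encoded = LabelEncoder().fit_transform(y)  # [1, 0, 1, 0]

print(df)
print(y_encoded)
```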
One-hot encoding
• For categorical variables where no such ordinal relationship exists, the integer encoding is not enough.
• In fact, using this encoding and allowing the model to assume a natural ordering between categories may
result in poor performance or unexpected results (predictions halfway between categories).
• In this case, a one-hot encoding can be applied to the integer representation. This is where the integer
encoded variable is removed and a new binary variable is added for each unique integer value.
• Binary Vector Representation: Each category is represented as a binary vector, where only one element is
"1" (indicating the presence of that category) and the rest are "0".
• Example:
• Categories: [Red, Green, Blue]
• Red: [1, 0, 0]
• Green: [0, 1, 0]
• Blue: [0, 0, 1]
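A minimal sketch of one-hot encoding the color example with scikit-learn and pandas (note that OneHotEncoder orders categories alphabetically, and the sparse_output parameter is called sparse in older scikit-learn versions):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["Red", "Green", "Blue", "Green"]})

# scikit-learn encoder (dense output for readability)
ohe = OneHotEncoder(sparse_output=False)
encoded = ohe.fit_transform(df[["color"]])
print(ohe.get_feature_names_out())  # ['color_Blue' 'color_Green' 'color_Red']
print(encoded)

# pandas alternative
print(pd.get_dummies(df, columns=["color"]))
```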
Importance of One-Hot Encoding in Machine Learning
• Capturing Information:
• Each category is treated independently, ensuring that the model doesn't make incorrect assumptions about
the relationship between categories.
• Example :
• In logistic regression, treating categories as continuous variables without one-hot encoding can lead to
incorrect predictions.
Pros and Cons of One-Hot Encoding
•Advantages:
• Simplicity: Easy to implement and understand.
• Algorithm Compatibility: Works well with distance-based algorithms like KNN and linear models.
•Disadvantages:
• Curse of Dimensionality: For categorical features with a large number of unique values, the number
of dimensions increases significantly.
• Sparsity: The resulting vectors are sparse (many zeros), leading to inefficiencies in storage and
computation.
• Inapplicability for High-Cardinality Features: When categories are numerous (e.g., ZIP codes), one-
hot encoding becomes impractical.
Scenarios for Applying One-Hot Encoding
•Applicable Scenarios:
• Small Categorical Features: Works well for features with a limited number of unique categories (e.g., days
of the week).
• Nominal Data: Ideal for features where the categories do not have an intrinsic order (e.g., color, type).
•Example :
•Dataset Features: Age (Numerical), Gender (Categorical), Income (Numerical)
•Different Preprocessing: Age and Income might need scaling, while Gender
might require one-hot encoding.
Column transformer
Working:
Defining Transformers:
•Specify the preprocessing steps for different columns.
•Example:
•Numerical Columns: Apply StandardScaler to
standardize features.
•Categorical Columns: Apply OneHotEncoder to
transform categories into binary vectors.
•Combining Transformers:
•The ColumnTransformer combines these
preprocessing steps and applies them to the
corresponding columns simultaneously.
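A minimal sketch of a ColumnTransformer for the Age/Gender/Income example above (the column names and values are assumptions for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Made-up dataset matching the Age / Gender / Income example
df = pd.DataFrame({"age":    [25, 32, 47, 51],
                   "income": [40000, 60000, 80000, 52000],
                   "gender": ["M", "F", "F", "M"]})

preprocess = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["age", "income"]),  # standardize numerical columns
    ("cat", OneHotEncoder(),  ["gender"]),         # one-hot encode categorical column
])

X = preprocess.fit_transform(df)  # both transformers applied to their columns
print(X)
```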
Mathematical Transformer
• Function transformer
• Power Transformer
• Binning and binarization
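A quick sketch of what each of these looks like in scikit-learn, applied to a made-up positively skewed column:

```python
import numpy as np
from sklearn.preprocessing import (FunctionTransformer, PowerTransformer,
                                   KBinsDiscretizer, Binarizer)

# Made-up, positively skewed feature
X = np.array([[1.0], [10.0], [100.0], [1000.0]])

log_tf   = FunctionTransformer(np.log1p)                 # function transformer: log(1 + x)
power_tf = PowerTransformer(method="box-cox")            # power transformer: make more Gaussian
bins     = KBinsDiscretizer(n_bins=3, encode="ordinal")  # binning into 3 discrete bins
binary   = Binarizer(threshold=50.0)                     # binarization: 0/1 around a threshold

print(log_tf.fit_transform(X).ravel())
print(power_tf.fit_transform(X).ravel())
print(bins.fit_transform(X).ravel())
print(binary.fit_transform(X).ravel())
```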
Feature splitting and construction
• Feature splitting is a technique in machine learning that involves breaking down a
single feature into multiple features.
• This process generates more informative features that provide greater insight into the
relationships between input variables and the target variable.
• Break down text features into smaller units for better analysis.
https://medium.com/@brijesh_soni/topic-11-feature-construction-splitting-b116c60c4b2f
Benefits of Feature Splitting
• Helps models better capture relationships between features and the target variable.
•Text Splitting:
• Splits text features into smaller units like words or phrases (tokenization, stemming).
•Date and Time Splitting:
• Extracts components like day, month, hour from date/time features.
• Useful for time-series data.
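A minimal pandas sketch of date/time splitting (the order_date column and its values are made up):

```python
import pandas as pd

df = pd.DataFrame({"order_date": ["2024-01-15 09:30", "2024-02-03 18:45"]})
df["order_date"] = pd.to_datetime(df["order_date"])

# Split one datetime feature into several more informative features
df["year"]      = df["order_date"].dt.year
df["month"]     = df["order_date"].dt.month
df["day"]       = df["order_date"].dt.day
df["hour"]      = df["order_date"].dt.hour
df["dayofweek"] = df["order_date"].dt.dayofweek  # 0 = Monday

print(df)
```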
Feature Construction
•Feature construction involves creating new features from existing data to enhance the information available for model
training.
•Importance:
•Improves Model Accuracy: Well-constructed features can lead to better model performance by providing additional
insights.
•Captures Complex Relationships: Helps to model non-linear relationships that simple features may miss.
•Reduces Dimensionality: Effective feature construction can replace multiple simpler features with a single, more
informative feature.
•Key Techniques:
•Mathematical Transformations:
• Example: Creating polynomial features (e.g., x² or x³).
•Aggregation:
• Example: Calculating the mean or sum of sales over different time periods.
•Domain-Specific Features:
• Example: For financial data, deriving ratios like debt-to-equity or creating features based on fiscal quarters.
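A minimal sketch of these three techniques on a made-up financial dataset (the column names are assumptions):

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Made-up financial data
df = pd.DataFrame({"debt":     [200.0, 150.0, 400.0],
                   "equity":   [100.0, 300.0, 250.0],
                   "sales_q1": [10, 12, 9],
                   "sales_q2": [11, 15, 8]})

# Domain-specific feature: debt-to-equity ratio
df["debt_to_equity"] = df["debt"] / df["equity"]

# Aggregation: mean sales over the two quarters
df["mean_sales"] = df[["sales_q1", "sales_q2"]].mean(axis=1)

# Mathematical transformation: polynomial features (x, x^2, x*y, ...)
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_feats = poly.fit_transform(df[["debt", "equity"]])
print(poly.get_feature_names_out())
print(df)
```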
Handling mixed data and date-time variables
Use L&T
Linear Regression
Simple
Multiple
Polynomial
Regression Metrics
Gradient Descent
Cost Function
Learning rate
Regularization
Logistic Regression
Precision, Recall, F1 score