​Machine Learning (SPPU 2019 Pattern)​

​Solutions to Insem Questions (Oct 2022, Sep 2023, Sep 2024)​

​Unit 1: Foundational Concepts​


​Comparison of Artificial Intelligence and Machine Learning​
​(Ref: Q1a Oct 2022, Q2a Sep 2024)​

​Definition:​
● Artificial Intelligence (AI): A broad branch of computer science concerned with building smart machines capable of performing tasks that typically require human intelligence. The scope of AI is vast and includes areas like problem-solving, reasoning, perception, and language understanding.
● Machine Learning (ML): A specific subset of AI that focuses on the development of algorithms that allow a computer to learn from and make predictions or decisions based on data, without being explicitly programmed for the task.
Relationship:
​Machine Learning is a method to achieve Artificial Intelligence. Not all AI systems use machine​
​learning; some rely on hard-coded rules and logic. However, ML is currently the most​
​successful and dominant approach to AI.​
​Key Differences:​

| Feature | Artificial Intelligence (AI) | Machine Learning (ML) |
|:--|:--|:--|
| Scope | Broad. Aims to create intelligent systems to simulate human intelligence. | Narrow. Aims to learn from data to perform a specific task accurately. |
| Approach | Can use logic, rule-based systems, optimization, and machine learning. | Primarily uses statistical methods and algorithms to learn from data. |
| Goal | To build a system that can perform complex, human-like tasks. | To build a model that can make accurate predictions or decisions on new data. |
| Example | A sophisticated humanoid robot or a strategic game-playing AI like Deep Blue. | An email spam filter that learns to identify junk mail, or a recommendation engine. |
​Parametric and Non-Parametric Machine Learning Models​
​(Ref: Q1b Oct 2022)​

​1. Parametric Models​


​●​ ​Definition:​​A parametric model is one that makes assumptions​​about the​
functional form of the relationship between input and output variables. It
​summarizes the data with a set of parameters of a fixed size, regardless of the​
​amount of training data. The learning process involves finding the optimal values​
​for these fixed parameters.​
​ ​ ​Characteristics:​

​○​ ​Assumptions:​​Makes strong assumptions about the data​​(e.g., assumes a​
​linear relationship).​
​○​ ​Speed:​​Faster to train and requires less data.​
​○​ ​Complexity:​​Simpler and easier to interpret.​
​○​ ​Limitation:​​Prone to high bias if the assumptions​​are incorrect, leading to​
​lower accuracy.​
​●​ ​Examples:​
​○​ ​Linear Regression:​​Assumes a linear relationship between​​features and​
​output. The parameters are the coefficients (β) in the equation Y=β0​+β1​X1​+....​
​○​ ​Logistic Regression:​​Assumes a linear decision boundary.​
​○​ ​Naive Bayes:​​Assumes features are conditionally independent.​

​2. Non-Parametric Models​


​●​ ​Definition:​​A non-parametric model does not make strong​​assumptions about​
the form of the target function. The number of parameters is not fixed and can
​grow as it learns from more data. These models are more flexible and can fit a​
​wide range of functional forms.​
​ ​ ​Characteristics:​

​○​ ​Assumptions:​​Makes few or no assumptions about the​​data's underlying​
​structure.​
​○​ ​Flexibility:​​Can model complex relationships, leading​​to potentially higher​
​accuracy.​
​○​ ​Complexity:​​Requires more data and is computationally​​more expensive.​
​○​ ​Limitation:​​Prone to high variance and overfitting​​if not handled carefully.​
​●​ ​Examples:​
​○​ ​k-Nearest Neighbors (k-NN):​​The model is the entire​​training dataset.​
​○​ ​Decision Trees:​​Can create complex, non-linear decision​​boundaries.​
​○​ ​Support Vector Machines (SVM) with kernels:​​Can model​​highly non-linear​
​boundaries.​
​Data Formats in Machine Learning​
​(Ref: Q1c Oct 2022, Q1c Sep 2023)​

Machine learning algorithms require data to be in a structured, machine-readable format. The choice of format depends on the data type, size, and performance requirements.

​1. Tabular Formats:​​For structured data organized​​in rows and columns.​


​●​ ​CSV (Comma-Separated Values):​​A plain text format​​where values in a row are​
separated by commas. It is simple, human-readable, and universally supported by
​data analysis tools like Pandas.​
​ ​ ​Excel (XLS, XLSX):​​A spreadsheet format common for​​business data. While easy​

​to use, it is less efficient for very large datasets compared to binary formats.​
2. Hierarchical/Semi-Structured Formats: For data with nested structures, common in web applications.
​●​ ​JSON (JavaScript Object Notation):​​A lightweight,​​text-based format using​
key-value pairs. It is ideal for data from web APIs and for storing configuration
​files.​
​ ​ ​XML (eXtensible Markup Language):​​A tag-based format​​that is more verbose​

​than JSON but provides strong schema enforcement, often used in enterprise​
​systems.​
​3. High-Performance Binary Formats:​​For large-scale​​and big data applications.​
​●​ ​Apache Parquet:​​A columnar storage format. It stores​​data column by column,​
which allows for highly efficient compression and fast queries, as only the
​required columns need to be read. It is the standard for large datasets in​
​ecosystems like Apache Spark.​
​ ​ ​HDF5 (Hierarchical Data Format):​​A format designed​​to store large,​

​multi-dimensional numerical arrays (tensors). It is widely used in scientific​
​computing and deep learning for storing model weights and large feature sets.​
4. Unstructured Data Formats: These are containers for data like text and images,
​which must be converted into a numerical representation before use.​
​●​ ​Image Formats (JPEG, PNG):​​Converted into 3D numerical​​arrays of pixel values​
(height x width x color channels).
​ ​ ​Text Formats (.txt):​​Converted into numerical vectors​​using techniques like​

​TF-IDF or word embeddings.​
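For quick reference, a minimal pandas sketch of loading some of these formats; the file names are placeholders, and Parquet support assumes pyarrow or fastparquet is installed.

```python
# Sketch: loading common data formats with pandas; file names are placeholders.
import pandas as pd

df_csv = pd.read_csv("sales.csv")             # tabular, plain-text format
df_json = pd.read_json("records.json")        # semi-structured key-value data
df_parquet = pd.read_parquet("big.parquet")   # columnar binary format (needs pyarrow/fastparquet)

print(df_csv.head())
```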
​Supervised, Unsupervised, and Semi-supervised Learning​
​(Ref: Q2a Oct 2022)​

​1. Supervised Learning​


​●​ ​Definition:​​A type of machine learning where the algorithm​​learns from a dataset​
that is fully labeled, meaning each data point is tagged with a correct output. The
​goal is to learn a mapping function that can predict the output for new, unseen​
​data.​
​ ​ ​Analogy:​​Learning with a teacher or an answer key.​

​●​ ​Types:​
​○​ ​Classification:​​The output is a discrete category​​(e.g., "Spam" or "Not​
​Spam").​
​○​ ​Regression:​​The output is a continuous value (e.g.,​​predicting the price of a​
​house).​
​●​ ​Example:​​A credit card fraud detection system trained​​on a historical dataset of​
​transactions labeled as "fraudulent" or "legitimate."​
​2. Unsupervised Learning​
​●​ ​Definition:​​A type of machine learning where the algorithm​​learns from a dataset​
that has no labels. The system tries to learn the patterns and structure from the
​data on its own.​
​ ​ ​Analogy:​​Finding patterns without a teacher.​

​●​ ​Types:​
​○​ ​Clustering:​​Grouping similar data points together​​(e.g., customer​
​segmentation).​
​○​ ​Association:​​Discovering rules that describe relationships​​in data (e.g.,​
​market basket analysis).​
​○​ ​Dimensionality Reduction:​​Reducing the number of variables​​(e.g., PCA).​
​●​ ​Example:​​A marketing firm using clustering to identify​​different segments of its​
​customer base from purchase history data.​
​3. Semi-Supervised Learning​
​●​ ​Definition:​​A hybrid approach that uses a training​​dataset containing​​both​
l​abeled and unlabeled data​​. This is useful in scenarios​​where labeling data is​
​expensive or time-consuming.​
​ ​ ​Analogy:​​Learning with a teacher who only answers​​a few questions.​

​●​ ​Process:​​The model uses the small set of labeled data​​to guide the learning​
​process and infer labels for the large set of unlabeled data.​
​●​ ​Example:​​A photo-tagging service that uses a few user-tagged​​faces (labeled) to​
help automatically identify and tag the same faces in a much larger collection of
​untagged photos (unlabeled).​
​Statistical Learning Approaches​
​(Ref: Q2b Oct 2022, Q1b Sep 2023)​

Statistical learning is a framework for understanding data that formalizes the learning
​problem as finding a function f that best models the relationship between input​
​variables (predictors, X) and an output variable (response, Y), represented as​
​Y=f(X)+ϵ, where ϵ is random error.​

​There are two main goals or approaches within this framework:​

​1. Prediction​
​●​ ​Objective:​​To accurately predict the output Y for​​new, unseen inputs X.​
​●​ ​Focus:​​The accuracy of the prediction is the primary​​concern. The exact form of​
the function f is often treated as a "black box" and is not important as long as it
​yields good predictions.​
​ ​ ​Example:​​Predicting stock prices or identifying spam​​emails.​

​2. Inference​
​●​ ​Objective:​​To understand the relationship between​​the inputs X and the output Y.​
​●​ ​Focus:​​The interpretability of the model is key. We​​want to answer questions like:​
​○​ ​Which predictors are most strongly associated with the output?​
​○​ ​What is the nature of the relationship (linear, non-linear)?​
​○​ ​Can we quantify the effect of each predictor on the output?​
​●​ ​Example:​​Understanding how factors like advertising​​spend, price, and​
​competitor pricing affect product sales.​
These goals are pursued using different methods, which are categorized based on the
​availability of a response variable (​​Supervised vs.​​Unsupervised Learning​​) and the​
​assumptions made about the function f (​​Parametric​​vs. Non-parametric Methods​​).​

​Machine Learning vs. Traditional Programming​


​(Ref: Q1a Sep 2023)​

The fundamental difference between machine learning and traditional programming lies in how a system generates an output.
​●​ ​Traditional Programming:​​A developer writes explicit,​​step-by-step rules (an​
​algorithm) that the computer follows to process input data and produce an​
​output. The logic is entirely defined by the human programmer.​
​○​ ​Workflow:​​Data + Program (Rules) -> Computer -> Output​
​ ​ ​Machine Learning:​​A developer provides the computer​​with input data and the​

​corresponding correct outputs (labels). The learning algorithm then discovers the​
​rules and patterns connecting the inputs and outputs on its own, creating a​
​"model." This model can then make predictions on new data.​
​○​ ​Workflow:​​Data + Outputs -> Computer (Learning Algorithm)​​-> Program​
​(Model)​
​Comparison Table:​

| Aspect | Traditional Programming | Machine Learning |
|:--|:--|:--|
| Logic | Explicitly coded by a programmer. | Learned automatically from data. |
| Process | Deterministic; follows predefined rules. | Probabilistic; makes predictions based on learned patterns. |
| Scalability | Difficult to scale for complex problems with many rules. | Scales well for complex problems by learning from more data. |
| Example: Spam Filter | if email contains "viagra" then mark as spam. | The model is trained on thousands of spam/non-spam emails and learns what features (words, senders) are indicative of spam. |

​Applications of Machine Learning in Data Science​


​(Ref: Q2a Sep 2023)​

Machine learning is a core component of data science, providing the tools to build
​predictive and descriptive models from data. Key applications include:​
​1.​ ​Predictive Analytics:​​Using historical data to forecast​​future outcomes.​
​○​ ​Example:​​A retail company using ML to predict sales​​for the next quarter​
based on past sales data, seasonality, and economic indicators.
2. Recommendation Engines: Personalizing user experiences by suggesting
​relevant items.​
​○​ ​Example:​​Netflix analyzing your viewing history to​​recommend movies and TV​
shows you are likely to enjoy.
​3.​ ​Customer Churn Prediction:​​Identifying customers who​​are at high risk of​
​leaving a service.​
​○​ ​Example:​​A telecom company using customer usage patterns​​and support​
​call history to predict which customers might switch to a competitor, allowing​
​them to offer retention incentives.​
​4.​ ​Fraud Detection:​​Identifying and preventing fraudulent​​activities in real-time.​
​○​ ​Example:​​Banks using ML models to analyze transaction​​patterns and flag​
​unusual activities that may indicate a stolen credit card.​
​5.​ ​Sentiment Analysis:​​Automatically determining the​​emotional tone of text data.​
​○​ ​Example:​​A company analyzing social media mentions​​of its brand to gauge​
​public opinion and customer satisfaction.​
​6.​ ​Image Recognition:​​Identifying and classifying objects​​within images.​
​○​ ​Example:​​A self-driving car using computer vision​​to identify pedestrians,​
​traffic signs, and other vehicles.​
​Geometric and Probabilistic Models​
​(Ref: Q2b Sep 2023, Q1c Sep 2024)​

Machine learning models can be conceptualized in different ways. Two major categories are geometric and probabilistic models.

​1. Geometric Models​


​●​ ​Concept:​​These models represent data instances as​​points in a high-dimensional​
space (feature space). The learning process involves defining a geometric shape
​or boundary to separate these points or find proximity between them.​
​ ​ ​Core Idea:​​Using concepts of distance, planes, and​​margins to make predictions.​

​●​ ​Types & Examples:​
​○​ ​Models based on Distance:​​Predictions are made based​​on the proximity of​
​data points.​
​■​ ​k-Nearest Neighbors (k-NN):​​A new data point is classified​​based on the​
​majority class of its 'k' nearest neighbors.​
​○​ ​Models based on Separating Hyperplanes:​​A linear boundary​​(a line in 2D, a​
​plane in 3D, a hyperplane in higher dimensions) is learned to separate classes.​
​■​ ​Support Vector Machines (SVM):​​Finds the optimal hyperplane​​that best​
​separates data points with the maximum possible margin.​
​■​ ​Linear Regression:​​Fits a line (or hyperplane) that​​is closest to all the​
​data points.​
​2. Probabilistic Models​
​●​ ​Concept:​​These models use the principles of probability theory to make​
predictions. They aim to model the probability distribution of the data or the
​probability of an outcome given the input.​
​ ​ ​Core Idea:​​Using probability to handle uncertainty​​and make predictions based​

​on the most likely outcome. The output is often a probability score.​
​●​ ​Examples:​
​○​ ​Naive Bayes:​​A classification algorithm based on Bayes'​​Theorem. It​
​calculates the probability of a data point belonging to a certain class, given its​
​features, e.g., P(Class∣Features).​
​○​ ​Logistic Regression:​​Although it has a geometric interpretation,​​it is​
​fundamentally a probabilistic model. It models the probability that a given​
​input belongs to a certain class using the logistic (sigmoid) function.​
​○​ ​Gaussian Mixture Models (GMM):​​A clustering algorithm​​that assumes data​
​points are generated from a mixture of several Gaussian (normal)​
​distributions.​
​Steps in a Machine Learning Application​
​(Ref: Q2c Sep 2023, Q2b Sep 2024)​

Developing a machine learning application is an iterative, cyclical process involving several key steps:
​1.​ ​Define the Objective & Frame the Problem:​​Clearly​​articulate the business​
problem and define the success metric. Determine if the problem is a classification, regression, or clustering task.
2. Data Collection: Gather all necessary data from various sources like databases,
​APIs, or files.​
​3.​ ​Data Preprocessing and Preparation:​​This is the most​​critical and​
​time-consuming phase.​
​○​ ​Data Cleaning:​​Handle missing values (imputation/deletion),​​correct errors,​
​and remove duplicates.​
​○​ ​Feature Engineering:​​Create new, more informative​​features from existing​
​ones.​
​○​ ​Feature Scaling:​​Normalize or standardize numerical​​features to bring them​
​to a common scale (e.g., Min-Max Scaling, Z-score Normalization).​
​○​ ​Encoding:​​Convert categorical features into a numerical​​format (e.g.,​
​One-Hot Encoding).​
​4.​ ​Data Splitting:​​Divide the dataset into three parts:​
​○​ ​Training Set (70-80%):​​Used to train the machine learning​​model.​
​○​ ​Validation Set (10-15%):​​Used to tune the model's​​hyperparameters.​
​○​ ​Test Set (10-15%):​​Used for the final, unbiased evaluation of the model's​
performance on unseen data.
​5.​ ​Model Selection & Training:​​Choose a suitable algorithm​​and train it on the​
​training dataset. During training, the model learns the underlying patterns by​
​minimizing a loss function.​
​6.​ ​Model Evaluation:​​Assess the trained model's performance​​on the test set using​
​appropriate metrics (e.g., accuracy, precision, recall for classification; Mean​
​Squared Error for regression).​
​7.​ ​Hyperparameter Tuning:​​Systematically adjust the model's​​hyperparameters​
​(e.g., using Grid Search or Random Search) to find the combination that yields​
​the best performance on the validation set.​
​8.​ ​Deployment & Monitoring:​​Deploy the final model into​​a production​
​environment where it can make predictions on real-world data. Continuously​
​monitor its performance and retrain it periodically with new data to maintain its​
​accuracy.​
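A minimal scikit-learn sketch of steps 4-6 (splitting, training, evaluation) on synthetic data; the 70/15/15 split and the logistic regression model are illustrative choices, not prescribed by the syllabus.

```python
# Sketch of data splitting, training, and evaluation on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# 70% train, 15% validation, 15% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```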
​Grouping and Grading Models​
​(Ref: Q2c Sep 2024)​

This terminology refers to two fundamental tasks in machine learning, corresponding to unsupervised and supervised learning, respectively.

​1. Grouping Models (Unsupervised Learning)​


​●​ ​Concept:​​"Grouping" models perform​​unsupervised learning​​,​​specifically​
clustering. Their goal is to automatically discover natural groupings or clusters in
​unlabeled data. The model groups data points such that points within the same​
​group are more similar to each other than to those in other groups.​
​●​ ​Purpose:​​To find hidden structure and patterns in​​data without prior knowledge​
​of the categories.​
​●​ ​Key Idea:​​Similarity or distance between data points.​
​●​ ​Examples:​
​○​ ​K-Means Clustering:​​Partitions data into a pre-specified​​number ('K') of​
​clusters.​
​○​ ​DBSCAN:​​Groups together points that are closely packed​​in high-density​
​regions.​
​●​ ​Application:​​Customer segmentation, social network​​analysis, anomaly detection.​
​2. Grading Models (Supervised Learning)​
​●​ ​Concept:​​"Grading" models perform​​supervised learning​​.​​Their goal is to assign​
​a "grade"—which can be a categorical label or a continuous score—to a new data​
point based on what it has learned from labeled training data.
​ ​ ​Purpose:​​To predict an output for new, unseen data.​

​●​ ​Key Idea:​​Learning a mapping function from inputs​​to labeled outputs.​
​●​ ​Types and Examples:​
​○​ ​Classification (Categorical Grade):​​Assigns a discrete​​class label.​
​■​ ​Example:​​An email is "graded" as either 'Spam' or​​'Not Spam'. A tumor is​
​"graded" as 'Benign' or 'Malignant'.​
​○​ ​Regression (Continuous Grade/Score):​​Assigns a continuous​​numerical​
​value.​
​■​ ​Example:​​A house is "graded" with a predicted price.​​A student is "graded"​
​with a predicted exam score.​

​Unit 2: Data Preprocessing and Feature Engineering​


​Feature Scaling Calculations​
​(Ref: Q3a Oct 2022, Q4a Sep 2023, Q4c Sep 2023, Q3a Sep 2024)​

​1. Min-Max Scaling (Normalization)​


This technique scales data to a fixed range, usually [0, 1].
Formula: X_scaled = (X − X_min) / (X_max − X_min)
Problem (Oct 2022): Consider a vector x = (23, 29, 52, 31, 45).
● Step 1: Find min and max values.
○ X_min = 23
○ X_max = 52
● Step 2: Apply the formula (Denominator = 52 − 23 = 29).
○ For 23: (23 − 23) / 29 = 0.0
○ For 29: (29 − 23) / 29 = 6/29 ≈ 0.207
○ For 52: (52 − 23) / 29 = 1.0
○ For 31: (31 − 23) / 29 = 8/29 ≈ 0.276
○ For 45: (45 − 23) / 29 = 22/29 ≈ 0.759
● Answer: The min-max scaled vector is (0.0, 0.207, 1.0, 0.276, 0.759).

​Problem (Sep 2024):​​Convert D = {23, 29, 52, 31, 45,​​19, 18, 27}.​
​●​ ​Step 1:​​Find min and max values.​
​○​ ​Xmin​=18​
​○​ ​Xmax​=52​
​●​ ​Step 2:​​Apply the formula (Denominator = 52 - 18 =​​34).​
○ For 23: (23 − 18) / 34 ≈ 0.147
○ For 29: (29 − 18) / 34 ≈ 0.324
○ For 52: (52 − 18) / 34 = 1.0
○ For 31: (31 − 18) / 34 ≈ 0.382
○ For 45: (45 − 18) / 34 ≈ 0.794
○ For 19: (19 − 18) / 34 ≈ 0.029
○ For 18: (18 − 18) / 34 = 0.0
○ For 27: (27 − 18) / 34 ≈ 0.265
​●​ ​Answer:​​The normalized data set is {0.147, 0.324,​​1.0, 0.382, 0.794, 0.029, 0.0,​
​0.265}.​
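A short scikit-learn sketch that reproduces this min-max calculation; MinMaxScaler is used here purely to verify the hand computation.

```python
# Sketch: reproducing the Sep 2024 min-max calculation with scikit-learn.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

D = np.array([23, 29, 52, 31, 45, 19, 18, 27], dtype=float).reshape(-1, 1)
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(D)
print(np.round(scaled.ravel(), 3))
# Expected (to 3 decimals): [0.147 0.324 1.    0.382 0.794 0.029 0.    0.265]
```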
​2. Z-Score Normalization (Standardization)​
This technique rescales data to have a mean (μ) of 0 and a standard deviation (σ) of 1.
Formula: Z = (X − μ) / σ
Problem (Oct 2022): For x = (23, 29, 52, 31, 45).
● Step 1: Calculate the mean (μ).
○ μ = (23 + 29 + 52 + 31 + 45) / 5 = 36
● Step 2: Calculate the standard deviation (σ).
○ Variance σ² = [(23−36)² + (29−36)² + (52−36)² + (31−36)² + (45−36)²] / 5 = (169 + 49 + 256 + 25 + 81) / 5 = 116
○ σ = √116 ≈ 10.77
● Step 3: Apply the z-score formula.
○ For 23: (23 − 36) / 10.77 ≈ −1.207
○ For 29: (29 − 36) / 10.77 ≈ −0.650
○ For 52: (52 − 36) / 10.77 ≈ 1.486
○ For 31: (31 − 36) / 10.77 ≈ −0.464
○ For 45: (45 − 36) / 10.77 ≈ 0.836
​●​ ​Answer:​​The z-score normalized vector is (-1.207,​​-0.650, 1.486, -0.464, 0.836).​

​Problem (Sep 2023):​​For AGE = {18, 22, 25, 42, 28,​​43, 33, 35, 56, 28}.​
​●​ ​Step 1:​​Calculate the mean (μ).​
○ μ = (18 + 22 + 25 + 42 + 28 + 43 + 33 + 35 + 56 + 28) / 10 = 33
● Step 2: Calculate the standard deviation (σ).
○ Σ(X − μ)² = (18−33)² + ... + (28−33)² = 225 + 121 + 64 + 81 + 25 + 100 + 0 + 4 + 529 + 25 = 1174
○ Variance σ² = 1174 / 10 = 117.4
○ σ = √117.4 ≈ 10.835
● Step 3: Apply the z-score formula.
○ Z(18) = (18 − 33) / 10.835 ≈ −1.384
○ Z(22) = (22 − 33) / 10.835 ≈ −1.015
○ Z(25) = (25 − 33) / 10.835 ≈ −0.738
○ Z(42) = (42 − 33) / 10.835 ≈ 0.831
○ Z(28) = (28 − 33) / 10.835 ≈ −0.461
○ Z(43) = (43 − 33) / 10.835 ≈ 0.923
○ Z(33) = (33 − 33) / 10.835 = 0.0
○ Z(35) = (35 − 33) / 10.835 ≈ 0.185
○ Z(56) = (56 − 33) / 10.835 ≈ 2.122
​●​ ​Answer:​​The Z-score normalized data is {-1.384, -1.015,​​-0.738, 0.831, -0.461,​
​0.923, 0.0, 0.185, 2.122, -0.461}.​
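A short NumPy sketch reproducing this z-score calculation; note it uses the population standard deviation (divide by N), matching the worked example.

```python
# Sketch: reproducing the Sep 2023 z-score calculation with NumPy
# (population standard deviation, i.e. dividing by N, as in the worked example).
import numpy as np

age = np.array([18, 22, 25, 42, 28, 43, 33, 35, 56, 28], dtype=float)
z = (age - age.mean()) / age.std()   # np.std defaults to the population form (ddof=0)
print(np.round(z, 3))
# Expected: [-1.384 -1.015 -0.738  0.831 -0.461  0.923  0.     0.185  2.122 -0.461]
```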
​Principal Component Analysis (PCA)​
​(Ref: Q3b Oct 2022, Q4c Sep 2024)​

Definition:
​Principal Component Analysis (PCA) is an unsupervised, linear dimensionality reduction​
​technique. Its main goal is to transform a dataset with a large number of potentially correlated​
​variables into a smaller set of new, uncorrelated variables called principal components, while​
​retaining as much of the original data's variance (information) as possible.​
​Process of PCA:​
​1.​ ​Standardize the Data:​​PCA is sensitive to the scale​​of the features. Therefore, all​
features must be scaled to have a mean of 0 and a standard deviation of 1
​(Z-score normalization).​
​2.​ ​Compute the Covariance Matrix:​​A covariance matrix​​is calculated for the​
​standardized data. This square matrix shows the correlation between all pairs of​
​variables and indicates how they move together.​
​3.​ ​Calculate Eigenvectors and Eigenvalues:​​The eigenvectors​​and eigenvalues of​
​the covariance matrix are computed.​
​○​ ​Eigenvectors:​​These represent the directions of the​​new feature space (the​
​principal components). They are orthogonal to each other.​
​○​ ​Eigenvalues:​​These represent the magnitude or importance​​of the​
​corresponding eigenvector. A high eigenvalue means that the principal​
​component explains a large amount of variance in the data.​
​4.​ ​Select Principal Components:​​The eigenvectors are​​sorted in descending order​
​based on their corresponding eigenvalues. The top 'k' eigenvectors are chosen to​
​be the new feature dimensions. The value of 'k' is selected based on the desired​
​amount of cumulative variance to be retained (e.g., 95%).​
​5.​ ​Transform the Data:​​The original standardized data​​is projected onto the new​
​feature space defined by the selected 'k' eigenvectors. This is done by taking the​
​dot product of the standardized data and the matrix of chosen eigenvectors. The​
​result is a new dataset with 'k' dimensions.​
​Use of PCA in Preprocessing:​
​●​ ​Dimensionality Reduction:​​It reduces the number of​​features, which helps to​
combat the "curse of dimensionality," reduce model training time, and lower
​computational complexity.​
​ ​ ​Noise Reduction:​​By discarding components with low​​variance (low eigenvalues),​

​PCA can effectively filter out statistical noise from the data.​
​●​ ​Multicollinearity Removal:​​It transforms correlated​​features into a set of​
​uncorrelated principal components. This is highly beneficial for algorithms that​
​are sensitive to multicollinearity, such as Linear Regression.​
​●​ ​Data Visualization:​​By reducing a high-dimensional​​dataset to 2 or 3 principal​
​components, PCA allows for the data to be plotted and visually inspected, which​
​can help in understanding its structure and identifying clusters or outliers.​
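An illustrative scikit-learn sketch of PCA as a preprocessing step; the synthetic data and the 95% cumulative-variance threshold are assumptions for demonstration.

```python
# Sketch: PCA as a preprocessing step with scikit-learn.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))              # placeholder feature matrix

X_std = StandardScaler().fit_transform(X)   # Step 1: standardize
pca = PCA(n_components=0.95)                # keep components explaining 95% of variance
X_reduced = pca.fit_transform(X_std)        # Steps 2-5 handled internally

print("Components kept:", pca.n_components_)
print("Explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
```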
​Handling Missing Values​
​(Ref: Q4a Oct 2022)​

Handling missing values is a critical data preprocessing step, as most ML algorithms cannot work with them. The choice of method depends on the nature and amount of missing data.

​1. Deletion Methods:​


​●​ ​Listwise Deletion:​​The entire row (observation) containing​​one or more missing​
values is removed. This is the simplest method but can lead to significant data
​loss if missing values are widespread, potentially introducing bias.​
​ ​ ​Pairwise Deletion:​​When calculating statistics like​​covariance, the algorithm only​

​uses pairs of data points that are complete, ignoring missing values for specific​
​calculations. This retains more data but can be complex to implement.​
2. Imputation Methods (Filling the Values):
​This involves filling the missing values with a plausible substitute.​
​●​ ​Mean/Median/Mode Imputation:​​This is the most common​​approach.​
​○​ ​Mean:​​Replace missing numerical values with the mean​​of the entire column.​
Best for normally distributed data without outliers.
​○​ ​Median:​​Replace missing numerical values with the​​median of the column.​
​This is more robust to outliers than the mean.​
​○​ ​Mode:​​Replace missing categorical values with the​​mode (most frequent​
​value) of the column.​
​ ​ ​End of Tail Imputation:​​Missing values are replaced​​by a value at the far end of​

​the distribution (e.g., mean + 3 * std dev). This can help the model learn that the​
​value was originally missing.​
​●​ ​Model-Based Imputation:​​Use other features to predict the missing value. This​
​is more accurate but more complex.​
​○​ ​Regression Imputation:​​A regression model is built​​to predict the missing​
​numerical value based on other features.​
​○​ ​k-NN Imputation:​​The missing value is imputed using​​the mean/median of​
​the 'k' most similar neighbors, where similarity is based on other features.​
​3. Using Algorithms that Support Missing Values:​
​●​ ​Some modern tree-based algorithms, such as​​XGBoost​​,​​LightGBM​​, and​
CatBoost, can handle missing values internally without requiring explicit
​imputation, often by learning the best direction to send missing values down the​
​tree.​
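An illustrative scikit-learn sketch of mean and k-NN imputation on a small made-up array.

```python
# Sketch: mean and k-NN imputation with scikit-learn on a small example array.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[25.0, 50000.0],
              [30.0, np.nan],
              [np.nan, 62000.0],
              [40.0, 71000.0]])

mean_filled = SimpleImputer(strategy="mean").fit_transform(X)
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)

print(mean_filled)
print(knn_filled)
```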
​Wrapper Methods for Feature Selection​
​(Ref: Q4b Oct 2022)​

Definition:
​Wrapper methods are a class of feature selection techniques that use a specific machine​
​learning model to evaluate the usefulness of a subset of features. They "wrap" the model​
​training process inside the feature selection loop. They treat feature selection as a search​
​problem, where different feature combinations are prepared, evaluated, and compared. The​
​performance of the model (e.g., accuracy) on a validation set is the objective function used to​
​score each feature subset.​
​Characteristics:​
​●​ ​Model-Specific:​​The selected features are optimized​​for the specific machine​
l​earning algorithm used.​
​ ​ ​Computationally Expensive:​​They require training a​​new model for each feature​

​subset considered, making them much slower than filter methods.​
​●​ ​High Performance:​​They tend to find feature subsets​​that yield better model​
​performance because they consider feature interactions.​
​Types of Wrapper Methods:​
​1.​ ​Forward Selection:​
​○​ ​Process:​​Starts with an empty set of features. In​​each iteration, it adds the​
single feature from the remaining set that results in the best model
​performance. This process is repeated until adding new features no longer​
​improves the model significantly.​
​○​ ​Limitation:​​It cannot remove features once they are​​added, so it might miss​
​the optimal combination if a feature becomes redundant later.​
2. Backward Elimination:
○ Process: Starts with the full set of all features. In each iteration, it removes the single feature whose removal leads to the best model performance (or the
​least performance degradation). This process is repeated until no further​
​improvement is gained by removing features.​
​○​ ​Limitation:​​Extremely computationally expensive, especially​​with a large​
​number of initial features, as it starts with the most complex model.​
3. Recursive Feature Elimination (RFE):
​○​ ​Process:​​This is a greedy optimization algorithm that​​is more efficient than​
​backward elimination.​
​1.​ ​Train a model on the entire set of features.​
​2.​ ​Compute an importance score for each feature (e.g., coefficients in a​
​linear model or feature importances from a tree-based model).​
​3.​ ​Remove the least important feature(s).​
​4.​ ​Repeat the process with the remaining features until the desired number​
​of features is reached.​
​○​ ​Advantage:​​It is often more robust and faster than​​simple backward​
​elimination.​
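An illustrative scikit-learn sketch of RFE; the logistic regression estimator and the choice of keeping 3 features are assumptions.

```python
# Sketch: Recursive Feature Elimination with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=0)

rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)

print("Selected feature mask:", rfe.support_)
print("Feature ranking (1 = kept):", rfe.ranking_)
```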
​Local Binary Pattern (LBP)​
​(Ref: Q4c Oct 2022, Q3c Sep 2023)​

Definition:
​Local Binary Pattern (LBP) is a simple yet very efficient feature extraction technique used for​
​texture classification in computer vision. It works by describing the local texture pattern of an​
​image by comparing each pixel with its surrounding neighbors.​
​Process of LBP:​
​1.​ ​For each pixel in an image, a neighborhood is selected (typically a 3x3 grid with​
the pixel of interest at the center).
​2.​ ​The intensity value of the center pixel is used as a​​threshold​​.​
​3.​ ​This threshold is compared with the intensity value of its 8 neighbors.​
​4.​ ​For each neighbor, if its value is greater than or equal to the center pixel's value, it​
​is assigned a binary '1'. Otherwise, it is assigned a '0'.​
​5.​ ​This creates an 8-bit binary number. The bits are collected in a sequence (e.g.,​
​clockwise starting from the top-left neighbor).​
​6.​ ​This binary number is converted to its decimal equivalent. This decimal value is​
​the LBP code for the center pixel and represents the local texture.​
​7.​ ​After computing the LBP code for every pixel, a​​histogram​​of these LBP codes is​
​created for the entire image (or for regions of it). This histogram serves as the​
​feature vector that describes the overall texture of the image and can be used to​
​train a classifier.​
Example Calculation (from Sep 2023 paper):
Calculate the LBP code for the central point (9) in the neighborhood:
| 10 | 12 | 18 |
| 7 | 9 | 6 |
| 9 | 2 | 4 |
​1.​ ​Center Pixel Value (Threshold):​​9​
​2.​ ​Compare neighbors to the center (9):​
​○​ ​Top-left (10) >= 9 ->​​1​
​○​ ​Top (12) >= 9 ->​​1​
​○​ ​Top-right (18) >= 9 ->​​1​
​○​ ​Right (6) < 9 ->​​0​
​○​ ​Bottom-right (4) < 9 ->​​0​
​○​ ​Bottom (2) < 9 ->​​0​
​○​ ​Bottom-left (9) >= 9 ->​​1​
​○​ ​Left (7) < 9 ->​​0​
​3.​ ​Form the binary string​​(reading clockwise from top-left):​​11100010​
​4.​ ​Convert the binary number to decimal:​
1·2⁷ + 1·2⁶ + 1·2⁵ + 0·2⁴ + 0·2³ + 0·2² + 1·2¹ + 0·2⁰
= 128 + 64 + 32 + 0 + 0 + 0 + 2 + 0
= 226
● Answer: The LBP code for the central pixel is 226.
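An illustrative Python sketch that reproduces this LBP calculation for the same 3x3 neighborhood.

```python
# Sketch: computing the LBP code for a single 3x3 neighborhood, reading the
# neighbors clockwise from the top-left, as in the worked example above.
import numpy as np

patch = np.array([[10, 12, 18],
                  [ 7,  9,  6],
                  [ 9,  2,  4]])

center = patch[1, 1]
# Clockwise order: top-left, top, top-right, right, bottom-right, bottom, bottom-left, left
neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
             patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]

bits = [1 if n >= center else 0 for n in neighbors]
lbp_code = sum(bit << (7 - i) for i, bit in enumerate(bits))  # first bit is the MSB

print("Bits:", "".join(map(str, bits)))   # 11100010
print("LBP code:", lbp_code)              # 226
```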

​Feature Selection and Filtering Technique​


​(Ref: Q3a Sep 2023)​

Definition of Feature Selection:
​Feature Selection is the process of automatically or manually selecting a subset of the most​
​relevant features from a dataset to be used in model construction. The primary goals are to:​
​●​ ​Simplify models to make them easier to interpret.​
​●​ ​Reduce the time required to train a model.​
​●​ ​Reduce overfitting by removing irrelevant or redundant features (combating the​
Curse of Dimensionality).
● Improve model performance and generalization to new data.

Filtering Technique (Filter Methods):
​Filter methods are a class of feature selection techniques where features are selected before​
​the model training process begins. They evaluate and rank features based only on their​
​intrinsic statistical properties and their relationship with the target variable, independent of​
​any machine learning algorithm.​
​●​ ​How they work:​
​1.​ ​A statistical measure (or scoring function) is used to score each feature's​
​relevance.​
​2.​ ​The features are ranked based on their scores.​
​3.​ ​A threshold is applied to select the highest-scoring features (e.g., select the​
top 'k' features or all features above a certain score).
​ ​ ​Characteristics:​

​○​ ​Fast and Efficient:​​They are computationally much​​cheaper than other​
​methods like wrapper methods.​
​○​ ​Model-Agnostic:​​The selected feature set is not tied​​to a specific model and​
​can be used with any algorithm.​
​○​ ​Limitation:​​They ignore the interaction between features.​​A feature might be​
​useless by itself but highly valuable when combined with another. They also​
​ignore the impact of the selected features on the performance of a specific​
​model.​
​Common Filter Techniques:​
​●​ ​Correlation Coefficient (e.g., Pearson's r):​​Measures​​the linear relationship​
between a numerical feature and the numerical target variable. Features with high
​correlation to the target are selected.​
​ ​ ​Chi-Squared Test:​​Used for categorical features. It​​tests the independence​

​between a categorical feature and the categorical target variable. A higher​
​Chi-Squared value indicates a stronger, more relevant relationship.​
​●​ ​Information Gain / Mutual Information:​​Measures the​​reduction in uncertainty​
​(entropy) of the target variable given the knowledge of a feature. A higher​
​information gain means the feature is more useful for predicting the target.​
​●​ ​ANOVA F-test:​​Used when the input features are numerical​​and the target​
​variable is categorical. It checks if the mean of a numerical feature is significantly​
​different across the different classes.​
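An illustrative scikit-learn sketch of a filter method using SelectKBest; the ANOVA F-test scorer and k=5 are assumptions.

```python
# Sketch: filter-method feature selection with SelectKBest and the ANOVA F-test.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=20, n_informative=4, random_state=0)

selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

print("Scores:", selector.scores_.round(1))
print("Kept feature indices:", selector.get_support(indices=True))
```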
​Kernel PCA​
​(Ref: Q3b Sep 2023)​

Definition:
​Kernel Principal Component Analysis (Kernel PCA) is a non-linear dimensionality reduction​
​technique. It is an extension of standard PCA that is used when the data is not linearly​
​separable, meaning its structure cannot be captured by linear components.​
​The Problem with Standard PCA:​
​Standard PCA works by finding linear projections of the data. If the data has a complex,​
​non-linear structure (e.g., data points arranged in two concentric circles), a straight line (a​
​linear principal component) cannot effectively separate the classes or capture the variance.​
​How Kernel PCA Works:​
​Kernel PCA overcomes this limitation by using the "kernel trick."​
​1.​ ​Implicit Mapping to a Higher-Dimensional Space:​​Kernel​​PCA uses a​​kernel​
function (e.g., Polynomial, Radial Basis Function - RBF, Sigmoid) to implicitly map
​the original data from its input space into a much higher-dimensional feature​
​space. The core idea is that in this higher-dimensional space, the complex​
​non-linear structure of the data becomes simpler and can be captured by linear​
​methods.​
2. The Kernel Trick: The key to Kernel PCA is that it never actually computes the
​coordinates​​of the data points in this high-dimensional​​space, which would be​
​computationally infeasible. Instead, the kernel function computes the dot​
​products between the images of all pairs of data points in the high-dimensional​
​space directly from the original data points. This matrix of dot products is called​
​the​​kernel matrix​​or​​Gram matrix​​.​
​3.​ ​Performing PCA in the High-Dimensional Space:​​Kernel​​PCA then performs​
​the standard PCA algorithm (i.e., centering the data, finding​
​eigenvectors/eigenvalues) in this new, high-dimensional space. However, it does​
​so using the kernel matrix instead of the explicit coordinates and the covariance​
​matrix.​
​4.​ ​Result:​​The result is a set of​​non-linear principal​​components​​in the original​
​space. These components are projections of the data that capture its non-linear​
​structure, allowing for effective dimensionality reduction and feature extraction​
​for complex datasets.​
In summary, Kernel PCA = Kernel Trick + Standard PCA. It allows PCA to find
​non-linear patterns in data by implicitly projecting it into a space where those patterns​
​become linear.​
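An illustrative scikit-learn sketch of Kernel PCA with an RBF kernel on concentric-circles data, a classic case where linear PCA fails; the gamma value is an assumption.

```python
# Sketch: Kernel PCA with an RBF kernel on data that linear PCA cannot untangle.
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)

print(X_kpca[:5])  # first few points in the new non-linear component space
```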

​Matrix Factorization and Content-Based Filtering​


​(Ref: Q4b Sep 2023)​

​1. Matrix Factorization​


​●​ ​Definition:​​Matrix Factorization is a class of collaborative​​filtering algorithms​
used in recommendation systems, and more generally, a dimensionality reduction
​technique. The core idea is to​​decompose a large matrix​​into the product of​
​two or more smaller matrices​​(called factors).​
​ ​ ​Application in Recommendation Systems:​

​○​ ​The large matrix is typically a​​user-item interaction​​matrix​​(e.g., a matrix of​
​users' ratings for movies). This matrix is usually very sparse, as users only rate​
​a small fraction of the available items.​
​○​ ​Matrix Factorization decomposes this user-item matrix (R) into two​
​lower-dimensional matrices: a​​user-feature matrix​​(P) and an​​item-feature​
​matrix​​(Q).​
■ R (m×n) ≈ P (m×k) × Qᵀ (k×n)
​ ​ ​The "features" (the 'k' dimension) are​​latent (hidden)​​factors​​learned by the​

​algorithm. These factors might represent abstract concepts like genres,​
​actors, or user tastes.​
​○​ ​The predicted rating for a user for an item is simply the dot product of that​
​user's vector in P and that item's vector in Q. This allows the system to predict​
​ratings for empty cells in the original matrix, thus generating​
​recommendations.​
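An illustrative NumPy sketch of low-rank factorization of a small ratings matrix via truncated SVD; the ratings values and the use of zeros for missing entries are made up for demonstration.

```python
# Sketch: low-rank matrix factorization of a small user-item ratings matrix via
# truncated SVD; zeros stand in for missing ratings purely for illustration.
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)

k = 2                                  # number of latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = U[:, :k] * s[:k]                   # user-feature matrix (m x k)
Q_t = Vt[:k, :]                        # item-feature matrix, transposed (k x n)

R_hat = P @ Q_t                        # predicted ratings, including the empty cells
print(np.round(R_hat, 2))
```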
​2. Content-Based Filtering​
​●​ ​Definition:​​Content-Based Filtering is a type of recommendation​​system that​
recommends items to a user based on the attributes of the items (their
​"content") that the user has liked in the past. It operates on the principle: "Show​
​me more of what I like." It does not use information about other users.​
​ ​ ​How it works:​

​1.​ ​Item Profile Creation:​​For each item, a profile is​​created containing its​
​attributes. For a movie, this could include genre, director, actors, plot​
​keywords, etc. This is often represented as a feature vector.​
​2.​ ​User Profile Creation:​​A profile is created for each​​user that summarizes​
​their preferences based on the content of the items they have rated highly. If​
​a user likes many sci-fi movies, their user profile will indicate a strong​
​preference for the "sci-fi" attribute.​
​3.​ ​Recommendation Generation:​​The system compares the​​user's profile with​
​the profiles of unrated items. It then recommends items whose content​
​profiles are a close match to the user's profile (e.g., by calculating the cosine​
​similarity between the user profile vector and item profile vectors).​
​●​ ​Example:​
​○​ ​A user watches and rates "The Matrix" and "Blade Runner" highly.​
​○​ ​The system analyzes the content of these movies and builds a profile for the​
​user that shows a high preference for attributes like genre: sci-fi, theme:​
​dystopian, theme: artificial intelligence.​
​○​ ​The system then searches its catalog for other movies with similar attributes,​
​such as "Ex Machina," and recommends it to the user.​
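An illustrative sketch of content-based scoring by cosine similarity; the movie titles and attribute vectors are made up for demonstration.

```python
# Sketch: content-based recommendation by cosine similarity between a user
# profile and item attribute vectors; the movies and attributes are made up.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Attribute columns: [sci-fi, dystopian, AI, comedy]
items = {
    "The Matrix":   np.array([1, 1, 1, 0]),
    "Blade Runner": np.array([1, 1, 1, 0]),
    "Ex Machina":   np.array([1, 0, 1, 0]),
    "Mean Girls":   np.array([0, 0, 0, 1]),
}

# User profile: average of the attribute vectors of liked items
liked = ["The Matrix", "Blade Runner"]
profile = np.mean([items[m] for m in liked], axis=0).reshape(1, -1)

for title, vec in items.items():
    if title not in liked:
        score = cosine_similarity(profile, vec.reshape(1, -1))[0, 0]
        print(f"{title}: {score:.2f}")
```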
​Categorical Variable Encoding and One-Hot Encoding​
​(Ref: Q3b Sep 2024)​

1. Need for Categorical Variable Encoding

Machine learning algorithms are based on mathematical equations and can only operate on numerical data. They cannot directly process text-based categorical variables (e.g., 'Red',
​'Green', 'USA', 'India'). Therefore, to use categorical data in a model, we must first convert​
​these non-numeric categories into a numerical format that the algorithm can understand. This​
​conversion process is called categorical variable encoding. Without it, the model cannot be​
​trained.​
​2. One-Hot Encoding​
​●​ ​Definition:​​One-Hot Encoding is one of the most common​​and effective​
techniques for encoding nominal categorical variables (where there is no intrinsic
​order among categories). It works by creating​​new​​binary (0 or 1) columns​​for​
​each unique category present in the original feature.​
​ ​ ​Process:​

​1.​ ​Identify all unique categories in the feature column.​
​2.​ ​Create a new binary column for each unique category.​
​3.​ ​For each data row, place a '1' in the column corresponding to its original​
​category, and place a '0' in all other new columns.​
​●​ ​Advantage:​​This method avoids imposing an artificial​​order on the categories. If​
​we were to encode 'Red' as 1, 'Green' as 2, and 'Blue' as 3 (Label Encoding), the​
​model might incorrectly assume that Green is "greater than" Red, which is not​
​true. One-Hot Encoding prevents this.​
​●​ ​Example:​
​Imagine a feature called 'City' in a dataset.​
​Original Data:​
​| ID | City |​
​|:--:|:---:|​
​| 1 | Pune |​
​| 2 | Mumbai |​
​| 3 | Delhi |​
​| 4 | Pune |​
​The feature 'City' has three unique categories: 'Pune', 'Mumbai', and 'Delhi'.​
​After One-Hot Encoding:​
​Three new binary columns are created: City_Pune, City_Mumbai, and City_Delhi.​

| ID | City_Pune | City_Mumbai | City_Delhi |
|:--:|:--:|:--:|:--:|
| 1 | 1 | 0 | 0 |
| 2 | 0 | 1 | 0 |
| 3 | 0 | 0 | 1 |
| 4 | 1 | 0 | 0 |

This new numerical representation can now be used by machine learning algorithms.
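An illustrative pandas sketch of one-hot encoding the same 'City' feature.

```python
# Sketch: one-hot encoding the 'City' feature from the example above with pandas.
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "City": ["Pune", "Mumbai", "Delhi", "Pune"]})

encoded = pd.get_dummies(df, columns=["City"], dtype=int)
print(encoded)
#    ID  City_Delhi  City_Mumbai  City_Pune
# 0   1           0            0          1
# ...
```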
​Statistical Methods to Describe Data​
​(Ref: Q4a Sep 2024)​

The statistical methods used to summarize and describe the main features and nature
​of a dataset are known as​​Descriptive Statistics​​.​​They provide simple summaries​
​about the sample and the measures, forming the basis of virtually every quantitative​
​analysis of data. They do not allow us to make conclusions or inferences about a​
​larger population beyond the data we have analyzed.​

​The key descriptive statistical methods are categorized as follows:​

1. Measures of Central Tendency:


​These measures describe the center or typical value of a dataset.​
​●​ ​Mean:​​The arithmetic average of all data points. It​​is sensitive to outliers.​
○ Formula: μ = ΣX / N
​●​ ​Median:​​The middle value of a dataset when it is sorted​​in ascending or​
descending order. It is robust to outliers.
● Mode: The most frequently occurring value in the dataset. A dataset can have

​one mode (unimodal), two modes (bimodal), or more (multimodal).​
2. Measures of Dispersion (or Variability):
​These measures describe the spread or how much the data points differ from each other and​
​from the central tendency.​
​●​ ​Range:​​The difference between the maximum and minimum​​values in the dataset.​
I​t is simple but highly affected by outliers.​
● Variance (σ²): The average of the squared differences of each data point from

​the Mean. It measures how far the data is spread out from its average value.​
○ Formula: σ² = Σ(X − μ)² / N
​●​ ​Standard Deviation (​​σ​):​​The square root of the variance.​​It is expressed in the​
​same units as the data, making it more interpretable than variance as a measure​
​of spread.​
​●​ ​Interquartile Range (IQR):​​The range between the first​​quartile (Q1, the 25th​
​percentile) and the third quartile (Q3, the 75th percentile). It represents the​
​spread of the middle 50% of the data and is robust to outliers.​
​3. Measures of Shape:​
​These describe the shape of the data distribution.​
​●​ ​Skewness:​​Measures the asymmetry of the probability​​distribution of a variable.​
A distribution can be right-skewed (positive skew), left-skewed (negative skew), or symmetric (zero skew).
● Kurtosis: Measures the "tailedness" of the distribution, or how heavy its tails are compared to a normal distribution. It indicates the presence of outliers.
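An illustrative pandas sketch computing these descriptive statistics for the AGE data used earlier; population variance (divide by N) is used to match the earlier worked example.

```python
# Sketch: descriptive statistics for the AGE data from the Sep 2023 problem.
import pandas as pd

age = pd.Series([18, 22, 25, 42, 28, 43, 33, 35, 56, 28])

print("Mean:", age.mean())                        # 33.0
print("Median:", age.median())                    # 30.5
print("Mode:", age.mode().tolist())               # [28]
print("Range:", age.max() - age.min())            # 38
print("Variance (population):", age.var(ddof=0))  # 117.4
print("Std dev (population):", age.std(ddof=0))   # ~10.835
print("IQR:", age.quantile(0.75) - age.quantile(0.25))
print("Skewness:", round(age.skew(), 3))
print("Kurtosis:", round(age.kurt(), 3))
```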
​Features of Multidimensional Scaling (MDS)​
​(Ref: Q4b Sep 2024)​

Definition:
​Multidimensional Scaling (MDS) is a dimensionality reduction and data visualization technique.​
​Its primary objective is to represent the relationships between a set of items as distances in a​
​low-dimensional space (typically 2D or 3D), such that the geometric distances between points​
​in the low-dimensional "map" correspond as closely as possible to the known dissimilarities​
​between the items.​
​Key Features of MDS:​
​1.​ ​Input is a Dissimilarity Matrix:​​Unlike PCA, which​​takes a feature matrix as​
i​nput, the primary input for MDS is a​​distance or​​dissimilarity matrix​​. This is an​
​N×N matrix where N is the number of items, and each entry (i,j) represents the​
​measured dissimilarity between item i and item j. This dissimilarity can be derived​
​from Euclidean distance, correlation distance, or even subjective human ratings​
​of dissimilarity.​
2. Preservation of Distances: The core goal of MDS is to find a low-dimensional
​configuration of points whose pairwise distances are as close as possible to the​
​original dissimilarities. It aims to create a faithful geometric representation of the​
​dissimilarity data.​
​3.​ ​Primary Use is Visualization:​​The most common application​​of MDS is to​
​visualize the underlying structure of data. By plotting the items as points in a 2D​
​or 3D space, one can visually inspect their relationships, identify natural clusters,​
​discover patterns, and understand the dimensions along which the items seem to​
​vary.​
​4.​ ​Types of MDS:​
​○​ ​Classical MDS (or Metric MDS):​​This type assumes the​​input dissimilarities​
​are actual distances in a high-dimensional Euclidean space. It aims to​
​preserve these distances as accurately as possible. It is mathematically​
​equivalent to PCA when the input is a Euclidean distance matrix.​
​○​ ​Non-metric MDS:​​This is a more flexible version that​​assumes only the​​rank​
​order​​of the dissimilarities is important. It tries​​to arrange the points in the​
​low-dimensional map such that the order of distances is preserved (i.e., if​
i​tem A is more dissimilar to B than to C, the distance between points A and B​
​on the map should be greater than the distance between A and C). This is​
​useful for psychological or survey data where dissimilarities are subjective.​
5. Axes are Not Directly Interpretable: A key difference from PCA is that the axes
​in an MDS plot do not have a direct, interpretable meaning. They are arbitrary​
​dimensions chosen to best represent the pairwise distances. The focus of​
​interpretation is on the​​relative positions of the​​points and the clusters they​
​form​​, not their specific coordinates on the axes.​
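An illustrative scikit-learn sketch of metric MDS on a small precomputed dissimilarity matrix; the matrix values are made up for demonstration.

```python
# Sketch: metric MDS on a precomputed dissimilarity matrix.
import numpy as np
from sklearn.manifold import MDS

# Pairwise dissimilarities between four items (symmetric, zero diagonal)
D = np.array([[0.0, 2.0, 6.0, 7.0],
              [2.0, 0.0, 5.0, 6.5],
              [6.0, 5.0, 0.0, 1.5],
              [7.0, 6.5, 1.5, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)

print(np.round(coords, 2))   # 2D "map" whose distances approximate D
print("Stress:", round(mds.stress_, 3))
```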
