
BANGALORE INSTITUTE OF TECHNOLOGY

K R ROAD, V V PURA, BENGALURU-04

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
II Internals - 2024-25 (EVEN)
Scheme of Evaluation

COURSE : Machine Learning Fundamentals    BATCH: 2023
CODE : BRI405B    SEM: 4th
DATE & TIME : 06/05/2025, 3.45 pm to 4.45 pm    MAX MARKS: 25
Q No. | Solution | Marks

1 a) Logistic regression is a statistical method used for binary classification problems, predicting 6M
the probability of a binary outcome (e.g., 0 or 1, yes or no).
1Mark
Linear regression predicts continuous outcomes, while logistic regression predicts
categorical outcomes. Linear regression outputs a continuous value, while logistic
regression outputs a probability between 0 and 1. 1Mark

What is the sigmoid function and why is it important in logistic regression? 2Mark
The sigmoid function, also known as the logistic function, maps any real-valued number to
a range between 0 and 1. This function is crucial in logistic regression because it transforms
the output of the linear equation into a probability.
Mathematical Representation
The sigmoid function is defined as:
σ(z) = 1 / (1 + e^(−z)), which maps any real number z to the interval (0, 1).
The different types of logistic regression 1Mark


 Binary logistic regression (two categories)
 Multinomial logistic regression (more than two unordered categories) and
 Ordinal logistic regression (more than two ordered categories).
How do you make predictions using a logistic regression model? 1Mark
After training the model, you can use it to predict the probability of a new data point
belonging to a specific category. The prediction is usually based on the predicted
probability value, with a threshold (e.g., 0.5) used to classify the data point.
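A minimal sketch of this prediction step, assuming scikit-learn's LogisticRegression; the one-feature dataset below is purely illustrative and not part of the question:

```python
# Minimal sketch: logistic regression prediction with a 0.5 threshold.
# Assumes scikit-learn is installed; the data below is purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # single feature
y = np.array([0, 0, 0, 1, 1, 1])                          # binary labels

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns P(class 0) and P(class 1); P(class 1) is the sigmoid of the linear score
proba = model.predict_proba([[3.5]])[0, 1]
label = int(proba >= 0.5)  # threshold at 0.5, as described above
print(f"P(y=1) = {proba:.3f} -> predicted class {label}")
```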
1 b) Definition carries 1Mark 7M
Overfitting in decision trees occurs when a model becomes too complex, learning the
training data's noise and irrelevant details, leading to poor performance on unseen data. A
decision tree can overfit because it recursively splits the data, creating highly specific rules
that only apply to the training set. This results in a model that performs well on the training
data but struggles to generalize to new, unobserved data.
Why Overfitting Happens: 3Mark
Model Complexity: Decision trees can become very complex with many branches and leaf
nodes, allowing them to fit the training data perfectly, including its noise.
Training Data Issues: If the training data is small, noisy, or not representative of the
overall population, the model is more likely to overfit.
Lack of Generalization: An overfitted model is essentially "memorizing" the training data
rather than learning the underlying patterns. This makes it unable to make accurate
predictions on new, unseen data.
How to Prevent Overfitting: 3Mark
Pruning: This involves removing parts of the decision tree that don't contribute
significantly to its predictive power.
Setting Parameters:
Maximum Depth: Limiting the depth of the tree prevents it from becoming overly
complex.
Minimum Samples: Ensuring each leaf node has a minimum number of samples prevents
it from being based on too little data.
Feature Selection: Using only relevant features and excluding irrelevant ones can help
reduce overfitting.
Ensemble Methods: Using ensemble methods like Random Forests and Gradient Boosting
can help reduce overfitting because they combine multiple decision trees, making the
overall model more robust.
Cross-Validation: This technique evaluates the model's performance on multiple subsets of
the data, helping to identify overfitting.
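A short sketch of these prevention measures, assuming scikit-learn; the Iris dataset and the chosen max_depth / min_samples_leaf values are illustrative assumptions, not values prescribed by the question:

```python
# Sketch of the pruning-style parameters and cross-validation described above.
# Dataset and parameter values are illustrative assumptions, not prescriptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: free to grow until it fits the training data almost perfectly.
deep_tree = DecisionTreeClassifier(random_state=0)

# Constrained tree: limited depth and a minimum number of samples per leaf.
pruned_tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)

for name, tree in [("unconstrained", deep_tree), ("constrained", pruned_tree)]:
    scores = cross_val_score(tree, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```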

2 a) Random Forest is a popular machine learning algorithm that belongs to the supervised 7M
learning technique. It can be used for both Classification and Regression problems in ML.
It is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the average to improve the
predictive accuracy of that dataset."
Instead of relying on one decision tree, the random forest takes the prediction from each
tree and, based on the majority vote of those predictions, produces the final output.
A greater number of trees in the forest leads to higher accuracy and helps prevent
overfitting.
Why use Random Forest?
o It takes less training time compared to other algorithms.
o It predicts output with high accuracy; even for large datasets, it runs efficiently.
o It can also maintain accuracy when a large proportion of data is missing.
How does the Random Forest algorithm work?
o Random Forest works in two phases: the first is to create the random forest by combining
N decision trees, and the second is to make predictions with each tree created in the first
phase.
o The working process can be explained in the below steps:
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 & 2 until N trees have been built.
Step-5: For a new data point, find the prediction of each decision tree, and assign the new
data point to the category that wins the majority of votes.
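A minimal sketch of this working process, assuming scikit-learn's RandomForestClassifier; the synthetic dataset and the n_estimators value (corresponding to N above) are illustrative choices:

```python
# Minimal sketch of the steps above using scikit-learn's RandomForestClassifier.
# n_estimators corresponds to N, the number of decision trees; the data is illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample (random subset with replacement);
# the final prediction is the majority vote across all trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```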
Advantages of Random Forest
Random Forest is capable of performing both Classification and Regression tasks.
It is capable of handling large datasets with high dimensionality.
It enhances the accuracy of the model and prevents the overfitting issue.
Disadvantages of Random Forest
Although Random Forest can be used for both classification and regression tasks, it is less
well suited to regression tasks.
Applications of Random Forest
There are mainly four sectors where Random Forest is mostly used:
Banking: The banking sector mostly uses this algorithm to identify loan risk.
Medicine: With the help of this algorithm, disease trends and disease risks can be
identified.
Land Use: Areas of similar land use can be identified with this algorithm.
Marketing: Marketing trends can be identified using this algorithm.

2 b) Support Vector Machines (SVMs) are supervised machine learning algorithms used for 6M
both classification and regression. They work by finding the optimal hyperplane that
separates data points into different classes, aiming to maximize the margin between the
hyperplane and the nearest data points of each class. This hyperplane acts as a decision
boundary, and the data points closest to it are called support vectors. SVMs can also handle
non-linear data by using kernel functions to map the data to higher-dimensional spaces,
where it becomes linearly separable.
1. Finding the Optimal Hyperplane:
SVMs aim to find the hyperplane that best separates the data into different classes. This
hyperplane is the decision boundary that minimizes the classification error. The margin is the
distance between the hyperplane and the closest data points (support vectors) of each class.
SVMs try to maximize this margin, making the decision boundary more robust to new,
unseen data.
2. Support Vectors:
Support vectors are the data points closest to the hyperplane. They define the margin and
are crucial in determining the decision boundary. Only the support vectors are needed to
represent the trained model, making SVMs memory efficient.

3. Kernel Trick:
When data is not linearly separable, SVMs use a kernel trick to map it into a
higher-dimensional space. This higher-dimensional space might make the data linearly
separable, allowing SVMs to find a hyperplane in this space. Common kernel functions
include linear, polynomial, and radial basis function (RBF) kernels.
Advantages of SVMs: Effective for high-dimensional data, Robust to outliers, Versatile
due to the use of kernel functions, and Can handle both linear and non-linear relationships.
Applications of SVMs: Classification, Regression, Text categorization, Image recognition,
and Facial expression classification.
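A small sketch of the kernel trick in practice, assuming scikit-learn's SVC with an RBF kernel; the two-moons dataset and the parameter values are illustrative assumptions:

```python
# Sketch of a non-linear SVM using the RBF kernel trick mentioned above.
# The moons dataset and parameter values are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)  # not linearly separable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel="rbf" implicitly maps the data to a higher-dimensional space;
# C controls the trade-off between a wide margin and classification error.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("Number of support vectors per class:", clf.n_support_)
print("Test accuracy:", clf.score(X_test, y_test))
```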
3 a) 6M
K-Nearest Neighbour (K-NN) is one of the simplest Machine Learning algorithms, based on
the Supervised Learning technique. The K-NN algorithm assumes similarity between the new
case/data and the available cases and puts the new case into the category that is most similar
to the available categories.
The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a
well-suited category using the K-NN algorithm. K-NN can be used for Regression as well as
for Classification, but it is mostly used for Classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data. It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs an action on it at the
time of classification.
At the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it
classifies that data into the category most similar to the new data.
Example: Suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification, we can use the KNN
algorithm, as it works on a similarity measure. Our KNN model will find the features of the
new data point that are similar to the cat and dog images and, based on the most similar
features, will place it in either the cat or the dog category.
Why do we need a K-NN Algorithm?
Suppose there are two categories, i.e., Category A and Category B, and we have a new data
point x1; which of these categories will this data point lie in? To solve this type of problem,
we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or
class of a particular data point.
Consider the below diagram:

How does K-NN work? The K-NN working can be explained on the basis of the below
algorithm:
Step-1: Select the number K of the neighbors
Step-2: Calculate the Euclidean distance of K number of neighbors
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data points to that category for which the number of the neighbor is
maximum.
Step-6: Our model is ready.
Suppose we have a new data point and we need to put it in the required category. Consider
the below image: Firstly, we will choose the number of neighbors, so we choose k = 5.
Next, we will calculate the Euclidean distance between the data points. The Euclidean
distance is the distance between two points, which we have already studied in geometry. It
can be calculated as:
d = √((x2 − x1)² + (y2 − y1)²)
By calculating the Euclidean distance we get the nearest neighbors: three nearest neighbors
in category A and two nearest neighbors in category B. Consider the below image:

As we can see, the 3 nearest neighbors are from category A; hence this new data point must
belong to category A.
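A minimal sketch of this majority-vote classification with k = 5, assuming scikit-learn's KNeighborsClassifier; the coordinates and labels below are illustrative and not taken from the diagram:

```python
# Minimal sketch of the k = 5 majority-vote step above, using scikit-learn.
# The two-dimensional points and labels are illustrative, not from the diagram.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]])
y = np.array(["A", "A", "A", "B", "B", "B"])  # Category A and Category B

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X, y)

# The new point is assigned to the category holding the majority of its 5 neighbours.
print(knn.predict([[4, 4]]))
```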
How to select the value of K in the K-NN Algorithm?
There is no particular way to determine the best value for "K", so we need to try some
values to find the best among them. The most preferred value for K is 5.
A very low value of K, such as K = 1 or K = 2, can be noisy and expose the model to the
effects of outliers.
Larger values for K are generally good, but very large values may cause difficulties (such as
including points from the other category).
Advantages of KNN Algorithm:
It is simple to implement.
It is robust to noisy training data.
It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
The value of K always needs to be determined, which can sometimes be complex.
The computation cost is high because the distance to every training sample must be
calculated for each new data point.
3 b) Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes' 6M
theorem and is used for solving classification problems. It is mainly used in text
classification that involves a high-dimensional training dataset.
Naïve Bayes Classifier is one of the simplest and most effective Classification algorithms,
and it helps in building fast machine learning models that can make quick predictions.
It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object. Some popular examples of the Naïve Bayes Algorithm are spam filtering, sentiment
analysis, and classifying articles.
Why is it called Naïve Bayes? The Naïve Bayes algorithm comprises two words, Naïve and
Bayes, which can be described as:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the
basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an
apple. Hence each feature individually contributes to identifying it as an apple, without
depending on the others.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
Bayes' theorem is also known as Bayes' Rule or Bayes' law, and is used to determine the
probability of a hypothesis with prior knowledge. It depends on conditional probability.
The formula for Bayes' theorem is given as:
P(A|B) = [P(B|A) × P(A)] / P(B)
Where,
P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood: the probability of the evidence given that hypothesis A is true.
P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the Marginal probability: the probability of the evidence.
Example: Consider a spam detection task where you want to classify emails as "spam" or
"not spam" based on the words they contain.
1. Training Data: You start with a dataset of emails labeled as either "spam" or "not
spam".
2. Calculating Probabilities: The Naive Bayes algorithm calculates the probability of a
word appearing in a spam email (P(word|spam)) and the probability of that word appearing
in a non-spam email (P(word|not spam)) based on the training data.
3. Making Predictions: When classifying a new email, the algorithm calculates the
probability of the email being spam given the words it contains, using Bayes' Theorem and
the independence assumption.
Text Classification (Spam Detection)
Imagine you want to classify emails as "Spam" or "Not Spam" based on words in the
email.

Sample Data:
Email Text | Class
"Win money now" | Spam
"Important meeting tomorrow" | Not Spam
"Claim your free prize" | Spam
"Project update attached" | Not Spam
From this data, the algorithm learns:
 P(Spam) and P(Not Spam) (prior probabilities)
 P(word | Spam) and P(word | Not Spam) (likelihood of each word given the class)
Classifying New Email: "Free money offer"
 The model computes:
o P(Spam | free, money, offer)
o P(Not Spam | free, money, offer)
 Whichever is higher, that class is chosen (likely Spam in this case).
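A small sketch of this spam example, assuming scikit-learn's CountVectorizer and MultinomialNB; the pipeline and the expected outcome are illustrative, not prescribed by the scheme:

```python
# Sketch of the spam example above using a bag-of-words Naive Bayes model.
# The four training emails mirror the sample data; everything else is illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["Win money now", "Important meeting tomorrow",
          "Claim your free prize", "Project update attached"]
labels = ["Spam", "Not Spam", "Spam", "Not Spam"]

# CountVectorizer builds word counts; MultinomialNB applies Bayes' theorem
# with the naive independence assumption over those word features.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["Free money offer"]))  # expected to lean towards "Spam"
```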
4 a) Ensemble methods are techniques that combine predictions from multiple models to create 7M
a more accurate and robust final model. The main idea is that a group of weak learners
(models that perform slightly better than random guessing) can be combined to form a
strong learner with significantly better performance. (Definition carries 1M)
Types of Ensemble Methods: Bagging vs. Boosting

Feature | Bagging (Bootstrap Aggregating) | Boosting
Purpose | Reduce variance | Reduce bias
How it works | Trains multiple models independently on random subsets of the data (with replacement) | Trains models sequentially, where each new model focuses on errors made by the previous ones
Data Sampling | Bootstrapped (random sampling with replacement) | Weighted sampling based on previous errors
Model Combination | Usually averaging (regression) or majority voting (classification) | Weighted sum of model predictions
Overfitting Risk | Lower risk, especially for high-variance models like decision trees | Higher risk if overdone, but generally improves accuracy
Parallelization | Models can be trained in parallel | Models must be trained sequentially
Examples | Random Forest | AdaBoost, Gradient Boosting, XGBoost
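An illustrative sketch contrasting one bagging and one boosting ensemble from the table, assuming scikit-learn; the synthetic dataset and hyperparameters are arbitrary demonstration choices:

```python
# Illustrative comparison of a bagging ensemble and a boosting ensemble from the table.
# Dataset and hyperparameters are assumptions chosen only for demonstration.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=12, random_state=0)

bagging = RandomForestClassifier(n_estimators=100, random_state=0)  # parallel trees, majority vote
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)     # sequential, error-weighted models

for name, model in [("Random Forest (bagging)", bagging), ("AdaBoost (boosting)", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```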


4 b) Step 1: Compute Euclidean Distance 5M
The Euclidean distance between two points (x1, y1) and (x2, y2) is:
d = √((x2 − x1)² + (y2 − y1)²)

Compute distances from (4, 4) to all training points:

Step 2: Pick the 3 Nearest Neighbors


Based on the distances:
 B – Red – 2.24
 C – Blue – 2.24
 D – Blue – 2.24
So the 3 nearest neighbors are B, C, D.
Step 3: Vote for the Class
 Red: 1 vote
 Blue: 2 votes
Predicted Class = Blue
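A small sketch of this distance-and-vote procedure with NumPy; since the original table of training points is not reproduced above, the coordinates below are hypothetical stand-ins chosen only so that points B, C, and D fall at distance ≈ 2.24 from (4, 4):

```python
# Sketch of Steps 1-3: Euclidean distances from the query point and a 3-NN vote.
# The coordinates and labels below are hypothetical, for illustration only.
import numpy as np
from collections import Counter

points = {"A": ((0.0, 0.0), "Red"),
          "B": ((2.0, 3.0), "Red"),
          "C": ((5.0, 6.0), "Blue"),
          "D": ((6.0, 5.0), "Blue")}
query = np.array([4.0, 4.0])

# d = sqrt((x2 - x1)^2 + (y2 - y1)^2) for each training point
distances = sorted(
    (float(np.linalg.norm(np.array(p) - query)), name, label)
    for name, (p, label) in points.items()
)

k = 3
nearest = distances[:k]                       # the 3 closest training points
votes = Counter(label for _, _, label in nearest)
print("3 nearest:", nearest)
print("Predicted class:", votes.most_common(1)[0][0])
```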

Faculty-Incharge
