
BANGALORE INSTITUTE OF TECHNOLOGY

K R ROAD, V V PURA, BENGALURU-04

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
II Internals - 2024-25 (EVEN)
Scheme of Evaluation

COURSE : Machine Learning Fundamentals    BATCH: 2023
CODE : BRI405B    SEM: 4th
DATE & TIME : 06/05/2025, 3.45 pm to 4.45 pm    MAX MARKS: 25
Q No. | Solution | Marks

1 a) Logistic regression is a statistical method used for binary classification problems, predicting 6M
the probability of a binary outcome (e.g., 0 or 1, yes or no).
1Mark
Linear regression predicts continuous outcomes, while logistic regression predicts
categorical outcomes. Linear regression outputs a continuous value, while logistic
regression outputs a probability between 0 and 1. 1Mark

What is the sigmoid function and why is it important in logistic regression? 2Mark
The sigmoid function, also known as the logistic function, maps any real-valued number to
a range between 0 and 1. This function is crucial in logistic regression because it transforms
the output of the linear equation into a probability.
Mathematical Representation
The sigmoid function is defined as:
σ(z) = 1 / (1 + e^(−z)), which maps any real number z to the interval (0, 1).
The different types of logistic regression 1Mark


 Binary logistic regression (two categories)
 Multinomial logistic regression (more than two unordered categories) and
 Ordinal logistic regression (more than two ordered categories).
How do you make predictions using a logistic regression model? 1Mark
After training the model, you can use it to predict the probability of a new data point
belonging to a specific category. The prediction is usually based on the predicted
probability value, with a threshold (e.g., 0.5) used to classify the data point.
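A minimal sketch of this prediction step, assuming scikit-learn's LogisticRegression; the one-feature dataset below is purely illustrative and not part of the question:

```python
# Minimal sketch: logistic regression prediction with a 0.5 threshold.
# Assumes scikit-learn is installed; the data below is purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # single feature
y = np.array([0, 0, 0, 1, 1, 1])                          # binary labels

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns P(class 0) and P(class 1); P(class 1) is the sigmoid of the linear score
proba = model.predict_proba([[3.5]])[0, 1]
label = int(proba >= 0.5)  # threshold at 0.5, as described above
print(f"P(y=1) = {proba:.3f} -> predicted class {label}")
```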
1 b) Definition carries 1Mark 7M
Overfitting in decision trees occurs when a model becomes too complex, learning the
training data's noise and irrelevant details, leading to poor performance on unseen data. A
decision tree can overfit because it recursively splits the data, creating highly specific rules
that only apply to the training set. This results in a model that performs well on the training
data but struggles to generalize to new, unobserved data.
Why Overfitting Happens: 3Mark
Model Complexity: Decision trees can become very complex with many branches and leaf
nodes, allowing them to fit the training data perfectly, including its noise.
Training Data Issues: If the training data is small, noisy, or not representative of the
overall population, the model is more likely to overfit.
Lack of Generalization: An overfitted model is essentially "memorizing" the training data
rather than learning the underlying patterns. This makes it unable to make accurate
predictions on new, unseen data.
How to Prevent Overfitting: 3Mark
Pruning: This involves removing parts of the decision tree that don't contribute
significantly to its predictive power.
Setting Parameters:
Maximum Depth: Limiting the depth of the tree prevents it from becoming overly
complex.
Minimum Samples: Ensuring each leaf node has a minimum number of samples prevents
it from being based on too little data.
Feature Selection: Using only relevant features and excluding irrelevant ones can help
reduce overfitting.
Ensemble Methods: Using ensemble methods like Random Forests and Gradient Boosting
can help reduce overfitting because they combine multiple decision trees, making the
overall model more robust.
Cross-Validation: This technique evaluates the model's performance on multiple subsets of
the data, helping to identify overfitting.
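A short sketch of these prevention measures, assuming scikit-learn; the Iris dataset and the chosen max_depth / min_samples_leaf values are illustrative assumptions, not values prescribed by the question:

```python
# Sketch of the pruning-style parameters and cross-validation described above.
# Dataset and parameter values are illustrative assumptions, not prescriptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: free to grow until it fits the training data almost perfectly.
deep_tree = DecisionTreeClassifier(random_state=0)

# Constrained tree: limited depth and a minimum number of samples per leaf.
pruned_tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)

for name, tree in [("unconstrained", deep_tree), ("constrained", pruned_tree)]:
    scores = cross_val_score(tree, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```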

2 a) Random Forest is a popular machine learning algorithm that belongs to the supervised 7M
learning technique. It can be used for both Classification and Regression problems in ML.
It is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the average to improve the
predictive accuracy of that dataset."
Instead of relying on one decision tree, the random forest takes the prediction from each
tree and, based on the majority vote of those predictions, produces the final output.
A greater number of trees in the forest leads to higher accuracy and helps prevent
overfitting.
Why use Random Forest?
o It takes less training time compared to other algorithms.
o It predicts output with high accuracy; even for large datasets, it runs efficiently.
o It can also maintain accuracy when a large proportion of data is missing.
How does the Random Forest algorithm work?
o Random Forest works in two phases: the first is to create the random forest by combining
N decision trees, and the second is to make predictions with each tree created in the first
phase.
o The working process can be explained in the below steps:
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 & 2 until N trees have been built.
Step-5: For a new data point, find the prediction of each decision tree, and assign the new
data point to the category that wins the majority of votes.
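A minimal sketch of this working process, assuming scikit-learn's RandomForestClassifier; the synthetic dataset and the n_estimators value (corresponding to N above) are illustrative choices:

```python
# Minimal sketch of the steps above using scikit-learn's RandomForestClassifier.
# n_estimators corresponds to N, the number of decision trees; the data is illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample (random subset with replacement);
# the final prediction is the majority vote across all trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```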
Advantages of Random Forest
Random Forest is capable of performing both Classification and Regression tasks.
It is capable of handling large datasets with high dimensionality.
It enhances the accuracy of the model and prevents the overfitting issue.
Disadvantages of Random Forest
Although Random Forest can be used for both classification and regression tasks, it is less
well suited to regression tasks.
Applications of Random Forest
There are mainly four sectors where Random Forest is mostly used:
Banking: The banking sector mostly uses this algorithm to identify loan risk.
Medicine: With the help of this algorithm, disease trends and disease risks can be
identified.
Land Use: Areas of similar land use can be identified with this algorithm.
Marketing: Marketing trends can be identified using this algorithm.

2 b) Support Vector Machines (SVMs) are supervised machine learning algorithms used for 6M
both classification and regression. They work by finding the optimal hyperplane that
separates data points into different classes, aiming to maximize the margin between the
hyperplane and the nearest data points of each class. This hyperplane acts as a decision
boundary, and the data points closest to it are called support vectors. SVMs can also handle
non-linear data by using kernel functions to map the data to higher-dimensional spaces,
where it becomes linearly separable.
1. Finding the Optimal Hyperplane:
SVMs aim to find the hyperplane that best separates the data into different classes. This
hyperplane is the decision boundary that minimizes the classification error. The margin is the
distance between the hyperplane and the closest data points (support vectors) of each class.
SVMs try to maximize this margin, making the decision boundary more robust to new,
unseen data.
2. Support Vectors:
Support vectors are the data points closest to the hyperplane. They define the margin and
are crucial in determining the decision boundary. Only the support vectors are needed to
represent the trained model, making SVMs memory efficient.

3. Kernel Trick:
When data is not linearly separable, SVMs use a kernel trick to map it into a
higher-dimensional space. This higher-dimensional space might make the data linearly
separable, allowing SVMs to find a hyperplane in this space. Common kernel functions
include linear, polynomial, and radial basis function (RBF) kernels.
Advantages of SVMs: Effective for high-dimensional data, Robust to outliers, Versatile
due to the use of kernel functions, and Can handle both linear and non-linear relationships.
Applications of SVMs: Classification, Regression, Text categorization, Image recognition,
and Facial expression classification.
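A small sketch of the kernel trick in practice, assuming scikit-learn's SVC with an RBF kernel; the two-moons dataset and the parameter values are illustrative assumptions:

```python
# Sketch of a non-linear SVM using the RBF kernel trick mentioned above.
# The moons dataset and parameter values are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)  # not linearly separable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel="rbf" implicitly maps the data to a higher-dimensional space;
# C controls the trade-off between a wide margin and classification error.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("Number of support vectors per class:", clf.n_support_)
print("Test accuracy:", clf.score(X_test, y_test))
```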
3 a) 6M
K-Nearest Neighbour (K-NN) is one of the simplest Machine Learning algorithms, based on
the Supervised Learning technique. The K-NN algorithm assumes similarity between the new
case/data and the available cases and puts the new case into the category that is most similar
to the available categories.
The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be classified into a
well-suited category using the K-NN algorithm. K-NN can be used for Regression as well as
for Classification, but it is mostly used for Classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data. It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs an action on it at the
time of classification.
At the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it
classifies that data into the category most similar to the new data.
Example: Suppose we have an image of a creature that looks similar to both a cat and a dog,
and we want to know whether it is a cat or a dog. For this identification, we can use the KNN
algorithm, as it works on a similarity measure. Our KNN model will find the features of the
new data point that are similar to the cat and dog images and, based on the most similar
features, will place it in either the cat or the dog category.
Why do we need a K-NN Algorithm?
Suppose there are two categories, i.e., Category A and Category B, and we have a new data
point x1; which of these categories will this data point lie in? To solve this type of problem,
we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or
class of a particular data point.
Consider the below diagram:

How does K-NN work? The K-NN working can be explained on the basis of the below
algorithm:
Step-1: Select the number K of the neighbors
Step-2: Calculate the Euclidean distance of K number of neighbors
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these k neighbors, count the number of the data points in each category.
Step-5: Assign the new data points to that category for which the number of the neighbor is
maximum.
Step-6: Our model is ready.
Suppose we have a new data point and we need to put it in the required category. Consider
the below image: Firstly, we will choose the number of neighbors, so we choose k = 5.
Next, we will calculate the Euclidean distance between the data points. The Euclidean
distance is the distance between two points, which we have already studied in geometry. It
can be calculated as:
d = √((x2 − x1)² + (y2 − y1)²)
By calculating the Euclidean distance we get the nearest neighbors: three nearest neighbors
in category A and two nearest neighbors in category B. Consider the below image:

As we can see, the 3 nearest neighbors are from category A; hence this new data point must
belong to category A.
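A minimal sketch of this majority-vote classification with k = 5, assuming scikit-learn's KNeighborsClassifier; the coordinates and labels below are illustrative and not taken from the diagram:

```python
# Minimal sketch of the k = 5 majority-vote step above, using scikit-learn.
# The two-dimensional points and labels are illustrative, not from the diagram.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]])
y = np.array(["A", "A", "A", "B", "B", "B"])  # Category A and Category B

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X, y)

# The new point is assigned to the category holding the majority of its 5 neighbours.
print(knn.predict([[4, 4]]))
```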
How to select the value of K in the K-NN Algorithm?
There is no particular way to determine the best value for "K", so we need to try some
values to find the best among them. The most preferred value for K is 5.
A very low value of K, such as K = 1 or K = 2, can be noisy and expose the model to the
effects of outliers.
Larger values for K are generally good, but very large values may cause difficulties (such as
including points from the other category).
Advantages of KNN Algorithm:
It is simple to implement.
It is robust to noisy training data.
It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
The value of K always needs to be determined, which can sometimes be complex.
The computation cost is high because the distance to every training sample must be
calculated for each new data point.
3 b) Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes' 6M
theorem and is used for solving classification problems. It is mainly used in text
classification that involves a high-dimensional training dataset.
Naïve Bayes Classifier is one of the simplest and most effective Classification algorithms,
and it helps in building fast machine learning models that can make quick predictions.
It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object. Some popular examples of the Naïve Bayes Algorithm are spam filtering, sentiment
analysis, and classifying articles.
Why is it called Naïve Bayes? The Naïve Bayes algorithm comprises two words, Naïve and
Bayes, which can be described as:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the
basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an
apple. Hence each feature individually contributes to identifying it as an apple, without
depending on the others.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
Bayes' theorem is also known as Bayes' Rule or Bayes' law, and is used to determine the
probability of a hypothesis with prior knowledge. It depends on conditional probability.
The formula for Bayes' theorem is given as:
P(A|B) = [P(B|A) × P(A)] / P(B)
Where,
P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood: the probability of the evidence given that hypothesis A is true.
P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the Marginal probability: the probability of the evidence.
Example: Consider a spam detection task where you want to classify emails as "spam" or
"not spam" based on the words they contain.
1. Training Data: You start with a dataset of emails labeled as either "spam" or "not
spam".
2. Calculating Probabilities: The Naive Bayes algorithm calculates the probability of a
word appearing in a spam email (P(word|spam)) and the probability of that word appearing
in a non-spam email (P(word|not spam)) based on the training data.
3. Making Predictions: When classifying a new email, the algorithm calculates the
probability of the email being spam given the words it contains, using Bayes' Theorem and
the independence assumption.
Text Classification (Spam Detection)
Imagine you want to classify emails as "Spam" or "Not Spam" based on words in the
email.

Sample Data:
Email Text | Class
"Win money now" | Spam
"Important meeting tomorrow" | Not Spam
"Claim your free prize" | Spam
"Project update attached" | Not Spam
From this data, the algorithm learns:
 P(Spam) and P(Not Spam) (prior probabilities)
 P(word | Spam) and P(word | Not Spam) (likelihood of each word given the class)
Classifying New Email: "Free money offer"
 The model computes:
o P(Spam | free, money, offer)
o P(Not Spam | free, money, offer)
 Whichever is higher, that class is chosen (likely Spam in this case).
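A small sketch of this spam example, assuming scikit-learn's CountVectorizer and MultinomialNB; the pipeline and the expected outcome are illustrative, not prescribed by the scheme:

```python
# Sketch of the spam example above using a bag-of-words Naive Bayes model.
# The four training emails mirror the sample data; everything else is illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["Win money now", "Important meeting tomorrow",
          "Claim your free prize", "Project update attached"]
labels = ["Spam", "Not Spam", "Spam", "Not Spam"]

# CountVectorizer builds word counts; MultinomialNB applies Bayes' theorem
# with the naive independence assumption over those word features.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["Free money offer"]))  # expected to lean towards "Spam"
```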
4 a) Ensemble methods are techniques that combine predictions from multiple models to create 7M
a more accurate and robust final model. The main idea is that a group of weak learners
(models that perform slightly better than random guessing) can be combined to form a
strong learner with significantly better performance. (Definition carries 1M)
Types of Ensemble Methods: Bagging vs. Boosting

Feature | Bagging (Bootstrap Aggregating) | Boosting
Purpose | Reduce variance | Reduce bias
How it works | Trains multiple models independently on random subsets of the data (with replacement) | Trains models sequentially, where each new model focuses on errors made by the previous ones
Data Sampling | Bootstrapped (random sampling with replacement) | Weighted sampling based on previous errors
Model Combination | Usually averaging (regression) or majority voting (classification) | Weighted sum of model predictions
Overfitting Risk | Lower risk, especially for high-variance models like decision trees | Higher risk if overdone, but generally improves accuracy
Parallelization | Models can be trained in parallel | Models must be trained sequentially
Examples | Random Forest | AdaBoost, Gradient Boosting, XGBoost
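An illustrative sketch contrasting one bagging and one boosting ensemble from the table, assuming scikit-learn; the synthetic dataset and hyperparameters are arbitrary demonstration choices:

```python
# Illustrative comparison of a bagging ensemble and a boosting ensemble from the table.
# Dataset and hyperparameters are assumptions chosen only for demonstration.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=12, random_state=0)

bagging = RandomForestClassifier(n_estimators=100, random_state=0)  # parallel trees, majority vote
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)     # sequential, error-weighted models

for name, model in [("Random Forest (bagging)", bagging), ("AdaBoost (boosting)", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```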


4 b) Step 1: Compute Euclidean Distance 5M
The Euclidean distance between two points (x1, y1) and (x2, y2) is:
d = √((x2 − x1)² + (y2 − y1)²)

Compute distances from (4, 4) to all training points:

Step 2: Pick the 3 Nearest Neighbors


Based on the distances:
 B – Red – 2.24
 C – Blue – 2.24
 D – Blue – 2.24
So the 3 nearest neighbors are B, C, D.
Step 3: Vote for the Class
 Red: 1 vote
 Blue: 2 votes
Predicted Class = Blue
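A small sketch of this distance-and-vote procedure with NumPy; since the original table of training points is not reproduced above, the coordinates below are hypothetical stand-ins chosen only so that points B, C, and D fall at distance ≈ 2.24 from (4, 4):

```python
# Sketch of Steps 1-3: Euclidean distances from the query point and a 3-NN vote.
# The coordinates and labels below are hypothetical, for illustration only.
import numpy as np
from collections import Counter

points = {"A": ((0.0, 0.0), "Red"),
          "B": ((2.0, 3.0), "Red"),
          "C": ((5.0, 6.0), "Blue"),
          "D": ((6.0, 5.0), "Blue")}
query = np.array([4.0, 4.0])

# d = sqrt((x2 - x1)^2 + (y2 - y1)^2) for each training point
distances = sorted(
    (float(np.linalg.norm(np.array(p) - query)), name, label)
    for name, (p, label) in points.items()
)

k = 3
nearest = distances[:k]                       # the 3 closest training points
votes = Counter(label for _, _, label in nearest)
print("3 nearest:", nearest)
print("Predicted class:", votes.most_common(1)[0][0])
```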

Faculty-Incharge
