Machine Learning Most Important Question For Mid Term Ipu University

Unit-1

1. Introduction to Machine Learning (ML)

Q: What is Machine Learning, and how does it differ from traditional programming?

Answer:

Machine Learning (ML) is a branch of artificial intelligence that enables systems to learn patterns and make decisions without
explicit programming. Unlike traditional programming, where the
logic is pre-defined by programmers, ML algorithms identify
patterns from data and improve over time based on experience.

In traditional programming, we provide rules and input data to generate output. However, in ML, we feed the system with input
data and the output, allowing it to discover rules (model) by itself.
ML models can handle complex, data-driven problems that would
be infeasible to solve using hand-written rules, such as image
recognition or recommendation systems.

2. Why is Machine Learning Important?

Q: Why is Machine Learning gaining importance across industries?

Answer:

Machine Learning is essential for several reasons:


1. Handling Complex Data: ML algorithms can analyze large and
complex datasets efficiently, which is difficult for humans.
2. Automation of Tasks: Many tasks, such as spam filtering,
fraud detection, and recommendation systems, can be
automated using ML.
3. Adaptability: ML systems improve with experience, becoming
more accurate over time as new data becomes available.
4. Personalization: ML enables businesses to tailor products
and services to individual customers (e.g., personalized
advertisements).
5. Decision Support: ML models assist in making better
business decisions, such as forecasting market trends or
stock prices.

Industries like healthcare, finance, retail, and manufacturing rely on ML for improved decision-making, operational efficiency, and
customer satisfaction.

3. Types of Machine Learning Problems

Q: What are the main categories of Machine Learning problems? Provide examples.

Answer:

Machine Learning problems are generally divided into the following categories:

1. Supervised Learning:
a. The algorithm learns from labeled data, meaning both
input and corresponding output are provided.
b. Examples:
i. Predicting house prices (Regression)
ii. Email spam detection (Classification)
2. Unsupervised Learning:
a. The algorithm learns from unlabeled data, aiming to
identify patterns or groupings.
b. Examples:
i. Customer segmentation
ii. Market basket analysis
3. Reinforcement Learning:
a. The algorithm learns by interacting with an environment
and receiving feedback in the form of rewards or
penalties.
b. Examples:
i. Self-driving cars
ii. Game-playing agents (e.g., AlphaGo)

4. Applications of Machine Learning

Q: Describe some real-world applications of Machine Learning.

Answer:

1. Healthcare:
a. Predicting diseases using patient data (e.g., early
detection of cancer).
b. Personalized treatment plans and drug discovery.
2. Finance:
a. Fraud detection in banking transactions.
b. Stock market predictions and automated trading.
3. E-commerce and Retail:
a. Recommendation systems (e.g., Amazon, Netflix).
b. Inventory management and demand forecasting.
4. Transportation:
a. Predictive maintenance of vehicles.
b. Optimization of delivery routes (e.g., Uber, FedEx).

5. Supervised Learning: Regression and Classification

Q: Explain the concepts of Regression and Classification in supervised learning.

Answer:

• Regression:
o Regression is used to predict a continuous numerical
value based on input variables.
o Example: Predicting the price of a house based on its
size, location, and other features.
o Algorithm Example: Linear Regression, Polynomial
Regression.
• Classification:
o Classification is used to assign data points to predefined
categories or classes.
o Example: Identifying whether an email is spam or not.
o Algorithm Example: Logistic Regression, Decision Trees,
Support Vector Machines (SVM).

6. Binary Classification, Multiclass Classification, and Multilabel Classification

Q: What is the difference between Binary, Multiclass, and Multilabel Classification?

Answer:

1. Binary Classification:
a. The model predicts one of two possible outcomes.
b. Example: Predicting whether a patient has a disease
(Yes/No).
2. Multiclass Classification:
a. The model predicts one of more than two classes.
b. Example: Classifying an image as a cat, dog, or bird.
3. Multilabel Classification:
a. Each instance can belong to multiple classes
simultaneously.
b. Example: A news article categorized as both politics and
sports.
7. Performance Measures: Confusion Matrix, Accuracy,
Precision & Recall, ROC Curve

Q: Explain the different performance measures used for classification models.

Answer:

1. Confusion Matrix:
a. A table that summarizes the performance of a
classification model.
i. True Positives (TP): Correctly predicted positives
ii. True Negatives (TN): Correctly predicted negatives
iii. False Positives (FP): Incorrectly predicted as
positive
iv. False Negatives (FN): Incorrectly predicted as
negative
2. Accuracy:
a. Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
b. It measures how often the model makes correct
predictions.
3. Precision:
a. Precision = \frac{TP}{TP + FP}
b. Precision measures the proportion of true positive
predictions out of all positive predictions.
4. Recall (Sensitivity):
a. Recall = \frac{TP}{TP + FN}
b. Recall measures how well the model identifies positive
cases.
5. ROC Curve:
a. A graphical representation of the trade-off between True
Positive Rate (TPR) and False Positive Rate (FPR).
b. The area under the curve (AUC) indicates the model's
ability to distinguish between classes.
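
As an illustrative sketch (not part of the original notes), the measures above can be computed with scikit-learn's metrics module; the label vectors below are made up for demonstration.

```python
# Computing the classification measures described above with scikit-learn.
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # actual classes (1 = positive)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]   # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_score))  # area under the ROC curve
```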

8. Advanced Python Libraries: NumPy and Pandas

Q: Explain the importance of NumPy and Pandas in data analysis.

Answer:

• NumPy:
o A library for numerical computing, providing support for
multi-dimensional arrays and mathematical operations.
o Example Usage: Matrix operations, linear algebra, and
random number generation.
• Pandas:
o A library for data manipulation and analysis, offering data
structures like DataFrames and Series.
o Example Usage: Data cleaning, filtering, and merging
datasets.
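
A minimal sketch showing the two libraries side by side (the column names and values below are made up for illustration):

```python
import numpy as np
import pandas as pd

# NumPy: multi-dimensional arrays and vectorized math
X = np.array([[1.0, 2.0], [3.0, 4.0]])
print(X.mean(axis=0))        # column means
print(X @ X.T)               # matrix multiplication

# Pandas: labelled, tabular data manipulation
df = pd.DataFrame({"size": [1200, 1500, 900], "price": [300, 380, 240]})
print(df[df["size"] > 1000])  # filtering rows
print(df["price"].mean())     # simple aggregation
```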
9. Scikit-Learn: A Python Machine Learning Library

Q: Why is Scikit-Learn popular for implementing Machine Learning algorithms?

Answer:

Scikit-Learn is a widely used ML library because:

• It provides easy-to-use tools for both supervised and unsupervised learning.
• It offers utilities for data preprocessing, model selection, and
performance evaluation.
• It integrates well with other libraries like NumPy and Pandas.
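
A minimal end-to-end sketch of the typical Scikit-Learn workflow (preprocessing, fitting, evaluation) on the built-in Iris dataset, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)            # preprocessing utility
model = LogisticRegression(max_iter=200)
model.fit(scaler.transform(X_train), y_train)     # training
pred = model.predict(scaler.transform(X_test))    # prediction
print("Test accuracy:", accuracy_score(y_test, pred))
```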

10. Linear Regression (Single and Multiple Variables)

Q: Explain the concept of Linear Regression with one and multiple variables.

Answer:

• Linear Regression with One Variable:


o A model that predicts the output as a linear function of a
single input feature.
o Equation: y = mx + c
• Linear Regression with Multiple Variables:
o A model that predicts the output using multiple input
features.
o Equation: y = w_1x_1 + w_2x_2 + ... + w_nx_n + b
o Example: Predicting house prices based on size,
location, and age.
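
A small sketch of multiple linear regression with scikit-learn; the housing-style numbers (size in sq. ft., age in years, price in thousands) are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1000, 10], [1500, 5], [1200, 20], [2000, 2]])  # [size, age]
y = np.array([200, 320, 210, 450])                            # price

model = LinearRegression().fit(X, y)
print("Weights (w1, w2):", model.coef_)      # one coefficient per feature
print("Intercept (b):  ", model.intercept_)
print("Prediction:", model.predict([[1800, 8]]))
```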

11. Logistic Regression

Q: What is Logistic Regression, and how is it used in classification problems?

Answer:

• Logistic Regression is a classification algorithm that predicts the probability of an instance belonging to a particular class.
• It uses the sigmoid function to map predicted values between
0 and 1.
• Example: Predicting whether a student will pass or fail based
on study hours.
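
A minimal sketch of the pass/fail example above with scikit-learn; the study-hour values and labels are made up:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 1 = pass, 0 = fail

clf = LogisticRegression(max_iter=1000).fit(hours, passed)
print(clf.predict([[4.5]]))        # predicted class
print(clf.predict_proba([[4.5]]))  # sigmoid output: P(fail), P(pass)
```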

Unit-2
1. Decision Trees

Q: Explain how the Decision Tree algorithm works and discuss its advantages and disadvantages.

Answer:

A Decision Tree is a supervised learning algorithm used for both classification and regression tasks. It works by recursively
splitting the dataset into subsets based on feature values to create
a tree structure, where each internal node represents a feature,
branches represent decision rules, and leaves represent the
output (class or value).

Working:

1. Selecting the Best Split:


a. The algorithm uses metrics like Gini Impurity or Entropy
(Information Gain) to select the best feature to split at
each step.
2. Recursive Splitting:
a. It continues splitting until either all data points are
classified perfectly, or a stopping condition is met.
3. Leaf Nodes:
a. The final nodes provide the predicted outcome (class or
value).

Advantages:

• Simple to understand and visualize.


• Can handle both numerical and categorical data.
• Requires little data preprocessing (no need for feature
scaling).

Disadvantages:

• Prone to overfitting, especially on noisy datasets.


• Not suitable for large datasets as trees can grow complex and
deep.
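
An illustrative sketch (assuming scikit-learn) of a small tree trained with the entropy criterion mentioned above; export_text prints the learned if-then structure:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(data.data, data.target)
print(export_text(tree, feature_names=data.feature_names))  # the learned splits
```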

2. Tree Pruning

Q: What is tree pruning in Decision Trees, and why is it important?

Answer:

Tree pruning is a technique used to reduce the size of a decision tree by removing sections that provide little predictive power. It
helps in avoiding overfitting, making the model more generalizable
to unseen data.

Types of Pruning:

1. Pre-pruning (Early Stopping):


a. The tree-building process stops early, before it perfectly
fits the data.
b. Criteria like minimum samples per leaf or maximum tree
depth are used to limit growth.
2. Post-pruning:
a. The tree is fully grown first and then non-critical branches
are removed.
b. Cost-complexity pruning uses a trade-off between
model complexity and accuracy.

Importance:

• Reduces overfitting by controlling the complexity of the model.


• Makes the model faster and more interpretable.
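
A sketch contrasting the two pruning styles with scikit-learn: pre-pruning via max_depth and post-pruning via cost-complexity pruning (ccp_alpha). The dataset and parameter values are only illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)                 # no pruning
pre  = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)    # pre-pruning
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr) # post-pruning

for name, m in [("unpruned", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(name, "test accuracy:", round(m.score(X_te, y_te), 3))
```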

3. Rule-based Classification

Q: Describe the concept of rule-based classification and its working. Provide examples.

Answer:

In rule-based classification, the model learns a set of if-then rules to classify data. Each rule corresponds to a specific condition on
feature values that determines the class label.

Working:

1. Generating Rules:
a. Rules are often extracted from decision trees or trained
directly using rule-generation algorithms like RIPPER.
2. Rule Matching:
a. For each input, the model checks which rule applies
based on the feature values.
3. Conflict Resolution:
a. If multiple rules apply, a conflict resolution strategy like
rule priority or majority voting is used.
Example:

• If (age > 30) AND (income > 50K) Then Class = Premium
Customer.
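
A minimal sketch of rule matching with a priority-ordered rule list; the function, field names, and rules are hypothetical and only mirror the example above:

```python
# First matching rule wins; a default class handles unmatched records.
def classify(record, rules, default="Standard Customer"):
    for condition, label in rules:   # rules checked in priority order
        if condition(record):
            return label
    return default

rules = [
    (lambda r: r["age"] > 30 and r["income"] > 50_000, "Premium Customer"),
    (lambda r: r["income"] > 80_000, "Premium Customer"),
]

print(classify({"age": 35, "income": 60_000}, rules))  # -> Premium Customer
print(classify({"age": 25, "income": 40_000}, rules))  # -> Standard Customer
```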

Advantages:

• Easy to interpret and implement.


• Good for small datasets where relationships between features
are simple.

Disadvantages:

• May not perform well with noisy or complex data.


• Generating high-quality rules can be challenging.

4. Naïve Bayes Algorithm

Q: What is the Naïve Bayes algorithm? Explain how it works with an example.

Answer:

Naïve Bayes is a probabilistic classifier based on Bayes’ theorem. It assumes that all features are independent given the class, which is rarely true in
real-world data but simplifies computation. Despite the
assumption, Naïve Bayes performs well in many practical
scenarios.
Bayes’ Theorem:

P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)}

• P(A | B): Posterior probability of class A given feature B.
• P(B | A): Likelihood of feature B given class A.
• P(A): Prior probability of class A.
• P(B): Probability of feature B.

Example:

In spam detection:

• Features: Occurrence of specific words in an email.


• Classes: Spam or Not Spam.
The algorithm calculates the probability of an email being spam
based on the presence of words and assigns the class with the
highest probability.

Advantages:

• Simple and fast to implement.


• Works well with high-dimensional data.

Disadvantages:

• Assumes feature independence, which may not always hold.


• Struggles with zero probabilities (handled using Laplace
smoothing).
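
A minimal spam-detection sketch with multinomial Naïve Bayes on word counts (the example emails are made up); alpha=1.0 is the Laplace smoothing mentioned above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win money now", "meeting at noon", "free money offer", "project update noon"]
labels = [1, 0, 1, 0]                         # 1 = spam, 0 = not spam

vec = CountVectorizer()                        # word-frequency features
X = vec.fit_transform(emails)
clf = MultinomialNB(alpha=1.0).fit(X, labels)  # alpha = Laplace smoothing
print(clf.predict(vec.transform(["free money"])))  # likely predicted as spam (1)
```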
5. Bayesian Network

Q: What is a Bayesian Network? Explain with an example.

Answer:

A Bayesian Network is a probabilistic graphical model that represents variables and their conditional dependencies using a
directed acyclic graph (DAG). Each node in the graph represents a
variable, and each edge represents a conditional dependency.

Example:

• In a healthcare scenario:
o Nodes: Smoking, Lung Cancer, Shortness of Breath.
o If a person is a smoker, it increases the probability of lung
cancer, which in turn increases the likelihood of
shortness of breath.

Bayesian Networks help in reasoning under uncertainty and are used in fields like medicine, fraud detection, and decision support
systems.

6. Support Vector Machines (SVM)

Q: Explain how Support Vector Machines (SVM) work.

Answer:

SVM is a supervised learning algorithm used for both classification and regression. It aims to find the optimal
hyperplane that maximally separates the data points of different
classes.

Working:

1. Margin Maximization:
a. SVM finds the hyperplane with the largest margin
(distance) from the nearest points of any class, called
support vectors.
2. Kernel Trick:
a. For non-linearly separable data, SVM applies the kernel
trick to map data into a higher-dimensional space where
it becomes linearly separable.

Advantages:

• Works well with high-dimensional data.


• Effective in cases where the number of features is large
relative to the number of samples.

Disadvantages:

• Computationally expensive for large datasets.


• Requires careful tuning of parameters (e.g., kernel type and
regularization).
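
A minimal sketch of a linear-kernel SVM on two well-separated synthetic clusters (assuming scikit-learn); C is the regularization parameter mentioned above:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
svm = SVC(kernel="linear", C=1.0).fit(X, y)
print("Support vectors per class:", svm.n_support_)  # the points defining the margin
print("Prediction for one point:", svm.predict(X[:1]))
```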

7. k-Nearest Neighbors (k-NN)

Q: What is the k-Nearest Neighbors algorithm? Explain with an example.
Answer:

k-NN is a lazy learning algorithm that makes predictions based on the k closest neighbors of a data point. It does not build an explicit
model but relies on the entire training dataset during prediction.

Working:

1. Choosing k:
a. Select the number of neighbors (k) to consider.
2. Calculating Distance:
a. Use distance metrics like Euclidean distance to find the
nearest neighbors.
3. Voting:
a. For classification, the class with the majority vote among
neighbors is assigned.

Example:

If k = 3, and the three nearest neighbors are two dogs and one cat,
the new instance is classified as a dog.
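
A sketch of the k = 3 voting example with scikit-learn; the 2-D feature values (e.g., weight and height) and labels are made up:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[4, 25], [5, 30], [20, 55], [22, 60], [25, 58]])  # features
y = np.array([0, 0, 1, 1, 1])                                   # 0 = cat, 1 = dog

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3, Euclidean distance by default
knn.fit(X, y)
print(knn.predict([[21, 50]]))              # majority of 3 nearest neighbours decides
```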

8. Ensemble Learning and Random Forest Algorithm

Q: What is Ensemble Learning? Explain the Random Forest algorithm.

Answer:
Ensemble Learning is a technique that combines multiple models
(often called weak learners) to produce a stronger, more robust
model.

Types of Ensemble Methods:

1. Bagging:
a. Multiple models are trained on different subsets of the
data. Example: Random Forest.
2. Boosting:
a. Models are trained sequentially, with each model
focusing on the mistakes of the previous ones. Example:
AdaBoost.

Random Forest Algorithm:

• Random Forest is an ensemble of decision trees, where each tree is trained on a random subset of the data with a random
subset of features.
• During prediction, the outputs of individual trees are
aggregated (majority voting for classification or averaging
for regression).

Advantages of Random Forest:

• Reduces the risk of overfitting compared to individual trees.


• Works well with large datasets and can handle missing data.

Disadvantages:

• Can be slower and more memory-intensive.


• Less interpretable compared to individual decision trees.
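
An illustrative Random Forest sketch (assuming scikit-learn): n_estimators sets the number of trees and max_features controls the random feature subset used at each split:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print("Cross-validated accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```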

PYQ

1. Different Types of Machine Learning Techniques

Machine Learning is classified into three main categories:

1.1 Supervised Learning

• Involves training a model on labeled data, where both input and output (target) values are known.
• Examples:
o Predicting house prices (Regression)
o Spam detection in emails (Classification)
• Algorithms: Linear Regression, Logistic Regression, Decision
Trees, Naïve Bayes.

1.2 Unsupervised Learning

• The algorithm learns patterns from unlabeled data (no target variable provided).
• Examples:
o Customer segmentation in marketing (Clustering).
o Market basket analysis to identify product associations
(Association Rules).
• Algorithms: K-Means, Hierarchical Clustering, Apriori
Algorithm.
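
To make the unsupervised case concrete, here is a minimal K-Means sketch (assuming scikit-learn; the points are made up and carry no labels):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],        # no target variable is provided
              [10, 2], [10, 4], [10, 0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", km.labels_)     # groupings discovered from the data
print("Cluster centres:\n", km.cluster_centers_)
```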

1.3 Reinforcement Learning (RL)

• In RL, agents interact with an environment and learn through trial-and-error, receiving rewards or penalties.
• Examples:
o Self-driving cars.
o Game-playing agents (e.g., AlphaGo).
• Algorithms: Q-Learning, Deep Q Networks (DQN).

2. Overfitting vs. Underfitting in Machine Learning


Definition:
• Overfitting: The model learns the noise and patterns in the training data too well, resulting in poor generalization to new data.
• Underfitting: The model is too simple and does not capture the underlying patterns in the data.

Cause:
• Overfitting: Excessive model complexity.
• Underfitting: Insufficient model complexity.

Symptoms:
• Overfitting: High training accuracy but low testing accuracy.
• Underfitting: Poor accuracy on both training and testing data.

Solution:
• Overfitting: Use regularization, cross-validation, or pruning.
• Underfitting: Increase model complexity or use more features.

3. Logistic Regression and its Application

Logistic Regression is a classification algorithm that predicts the probability of an event belonging to a particular class. It uses the
sigmoid function to map predictions between 0 and 1.

Equation of Logistic Regression:

P(Y=1|X) = \frac{1}{1 + e^{-(b_0 + b_1X)}}

Here, P(Y=1|X) is the probability of the positive class.

Applications:

• Spam Detection: Classifying emails as spam or not spam.


• Disease Prediction: Predicting whether a patient has
diabetes.
• Customer Churn Prediction: Predicting whether a customer
will leave a service.

4(a) Two-class Classification Problem

Given: Two-class problem (Man or Woman) with a test dataset of 10 records.
(i) Confusion Matrix Calculation:

Let's assume the following data:


Expected Predicted
Man Man
Woman Woman
Woman Man
Man Woman
Woman Woman
Man Man
Woman Woman
Man Man
Woman Woman
Man Man

From the above data:

• True Positive (TP) = 4 (Correctly predicted as Man)


• True Negative (TN) = 4 (Correctly predicted as Woman)
• False Positive (FP) = 1 (Incorrectly predicted as Man)
• False Negative (FN) = 1 (Incorrectly predicted as Woman)

(ii) Accuracy, Precision, Recall, Sensitivity, and Specificity

1. Accuracy:
   Accuracy = \frac{TP + TN}{TP + TN + FP + FN} = \frac{4 + 4}{10} = 0.8 \, (80\%)
2. Precision (for Man):
   Precision = \frac{TP}{TP + FP} = \frac{4}{4 + 1} = 0.8 \, (80\%)
3. Recall (Sensitivity):
   Recall = \frac{TP}{TP + FN} = \frac{4}{4 + 1} = 0.8 \, (80\%)
4. Specificity (for Woman):
   Specificity = \frac{TN}{TN + FP} = \frac{4}{4 + 1} = 0.8 \, (80\%)
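
The hand calculations above can be checked with a short sketch (assuming scikit-learn), encoding Man = 1 as the positive class and Woman = 0 from the table of expected/predicted values:

```python
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

expected  = [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]   # the 10 "Expected" rows
predicted = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]   # the 10 "Predicted" rows

tn, fp, fn, tp = confusion_matrix(expected, predicted).ravel()
print("TP, TN, FP, FN:", tp, tn, fp, fn)                   # 4, 4, 1, 1
print("Accuracy :", accuracy_score(expected, predicted))   # 0.8
print("Precision:", precision_score(expected, predicted))  # 0.8
print("Recall   :", recall_score(expected, predicted))     # 0.8
print("Specificity:", tn / (tn + fp))                      # 0.8
```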

4(b) Multiclass vs. Multilabel Classification

Multiclass Classification:

• Each instance belongs to only one of multiple classes.


• Example: Classifying an image as cat, dog, or bird.

Multilabel Classification:

• Each instance can belong to multiple classes simultaneously.
• Example: A news article tagged as sports and politics.

5(a) When to Use Precision or Recall over Accuracy

1. When Recall is More Important:


a. Example: Detecting cancer. Missing a cancer-positive
patient (false negative) is more harmful than incorrectly
labeling a healthy patient as cancer-positive.
b. Why Recall? We want to identify all positive cases,
even at the cost of false positives.
2. When Precision is More Important:
a. Example: Spam filtering. Misclassifying an important
email as spam (false positive) can be problematic.
b. Why Precision? We want to minimize false positives to
ensure accuracy in prediction.

5(b) Simple Linear Regression and Least Squares Method

Simple Linear Regression:

It models the relationship between a dependent variable Y and an independent variable X.

Y = b_0 + b_1X

Least Squares Method:

• This method minimizes the sum of squared residuals (differences between actual and predicted values) to fit the
best regression line.

Coefficient of Determination (R²):

• It measures how well the model explains the variability in the target variable.

R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
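
A small NumPy sketch of the least-squares fit and R² on made-up data, following the formulas above:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_pred = b0 + b1 * X

ss_res = np.sum((Y - Y_pred) ** 2)       # residual sum of squares
ss_tot = np.sum((Y - Y.mean()) ** 2)     # total sum of squares
r2 = 1 - ss_res / ss_tot
print("b0 =", round(b0, 3), "b1 =", round(b1, 3), "R^2 =", round(r2, 4))
```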
6(a) Decision Tree and Attribute Selection using
Information Gain

A Decision Tree splits the dataset based on feature values to create a tree-like structure for predictions.

Attribute Selection: Information Gain

• Information Gain (IG) measures the reduction in entropy after a dataset split:

IG = Entropy(parent) - \sum_{i=1}^{k} \frac{n_i}{n} \cdot Entropy(child_i)
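
A short sketch computing entropy and information gain for one candidate binary split; the class counts below are made up for illustration:

```python
import math

def entropy(labels):
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

parent = ["yes"] * 9 + ["no"] * 5     # 9 positives, 5 negatives
left   = ["yes"] * 6 + ["no"] * 1     # one child of a candidate split
right  = ["yes"] * 3 + ["no"] * 4     # the other child

ig = entropy(parent) - (len(left) / len(parent)) * entropy(left) \
                     - (len(right) / len(parent)) * entropy(right)
print("Information gain:", round(ig, 4))
```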

6(b) Naïve Bayes Classifier Example

Dataset:

Weather Play?
Sunny No
Sunny Yes
Overcast Yes
Rainy No

For "Sunny" day:

• Likelihood: Calculate probabilities from frequency counts (e.g., P(Play | Sunny)).
• Bayes’ theorem: Use these probabilities to predict if the
player can play.
7(a) Ensemble Learning: Bagging and Boosting

1. Bagging:
a. Uses multiple models trained on different subsets of
data.
b. Example: Random Forest.
2. Boosting:
a. Sequentially trains models to correct the errors of
previous ones.
b. Example: AdaBoost.

7(b) Support Vector Machine (SVM)

SVM finds the optimal hyperplane that separates data points of different classes.

• Hyperplane: A decision boundary.


• Support Vectors: Points closest to the hyperplane.
• Kernel: Transforms data into higher dimensions.
• Hard Margin: Perfectly separates data.
• Soft Margin: Allows some misclassification for better
generalization.

1. Entropy in Decision Tree Learning Algorithm

Entropy is a measure of the impurity or randomness in a dataset. In decision tree learning, it is used to evaluate how well a feature
separates the data into classes. If all samples belong to the same
class, entropy is 0 (pure). If the data is split equally across classes,
entropy is 1 (maximum impurity).

The formula for entropy (E) for a binary classification is:

E(S) = -p_1 \log_2(p_1) - p_2 \log_2(p_2)

where p_1 and p_2 are the proportions of the two classes. Decision trees use information gain, the reduction in entropy, to decide which feature to split on.

2. Classification and its Applications

Classification is a supervised learning technique where the goal is to assign input data to predefined categories or labels. It involves
training a model using labeled data to predict class labels for new,
unseen data.

Applications of Classification:

• Email filtering: Spam vs. non-spam emails


• Medical diagnosis: Identifying diseases based on symptoms
• Customer segmentation: Classifying customers by
purchasing behavior
• Sentiment analysis: Classifying customer reviews as positive
or negative
• Image recognition: Detecting objects or faces in images
3. Brief on NumPy Package of Python

NumPy (Numerical Python) is a core library for scientific computing in Python. It provides support for handling large multi-
dimensional arrays and matrices, along with mathematical
functions to operate on them. NumPy is highly optimized for
performance, making it essential for data science and machine
learning tasks.

Key Features:

• Support for n-dimensional arrays (ndarray)


• Broadcasting for operations on arrays with different shapes
• Linear algebra, Fourier transforms, and random number
generation
• Integration with other libraries like pandas and TensorFlow

4. Applications of Naive Bayes Classifier

The Naive Bayes classifier is a probabilistic model that assumes independence among features. It is efficient and works well even
with small datasets.

Applications:

• Spam filtering: Detecting spam emails using word frequencies
• Sentiment analysis: Classifying product reviews as positive or
negative
• Document categorization: Grouping news articles by topics
• Medical diagnosis: Predicting diseases based on symptoms
• Real-time predictions: Works well in systems that need fast
classifications, like recommendation engines

5. Brief on Supervised, Unsupervised, and Reinforcement Learning

• Supervised Learning: The model learns from labeled data, where both input and output (target) are known.
o Examples: Regression, classification models
o Use case: Predicting housing prices based on features
• Unsupervised Learning: The model learns from unlabeled
data, identifying patterns and relationships.
o Examples: Clustering, dimensionality reduction
o Use case: Customer segmentation for targeted marketing
• Reinforcement Learning: The model learns through
interaction with an environment, receiving rewards or
penalties for actions.
o Examples: Q-learning, Deep Q-Networks
o Use case: Training robots to walk or autonomous driving

6. Three Types of Classifiers

1. Linear Classifiers: Use a linear decision boundary to classify data points.
a. Example: Logistic Regression, Linear Support Vector
Machine (SVM)
b. Use case: Predicting whether a student will pass or fail
based on attendance
2. Tree-Based Classifiers: Use a tree-like structure to classify
instances.
a. Example: Decision Tree, Random Forest
b. Use case: Loan approval prediction based on customer
features
3. Probabilistic Classifiers: Make predictions based on
probabilities.
a. Example: Naive Bayes
b. Use case: Spam detection in emails based on word
frequencies

7. Illustrate Support Vector Machine (SVM) with Neat Labelled Diagram and Derive the Optimal Hyperplane

Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. Its goal is to find the
optimal hyperplane that separates data points of different classes
with the maximum margin. The hyperplane is a boundary that
divides the data points such that each class lies on either side of it.

Key Components of SVM:

• Support Vectors: Data points that lie closest to the hyperplane, which influence the boundary.
• Margin: The distance between the hyperplane and the nearest
support vectors from each class. A larger margin reduces the
chance of misclassification.
• Hyperplane: A linear boundary separating classes in the
dataset.

Diagram of SVM:

Imagine two classes (represented by circles and squares) with a hyperplane that divides them, along with margins (solid lines)
touching the nearest support vectors. The optimal hyperplane
maximizes the distance between the support vectors of both
classes.

Mathematics of the Optimal Hyperplane:

The equation of a hyperplane in an n-dimensional space is:

w \cdot x + b = 0

Where:

• w = weight vector
• x = feature vector
• b = bias term

The objective is to maximize the margin between the two classes, which is given by 2 / ||w||. SVM solves the following optimization problem to find the optimal hyperplane:

\text{Minimize: } \frac{1}{2} ||w||^2

Subject to:

y_i (w \cdot x_i + b) \geq 1 \quad \text{for all } i

where y_i \in \{-1, +1\} is the class label for data point x_i.

SVM also allows for non-linear classification by using kernels (like polynomial or radial basis function), which project data into higher
dimensions where a linear hyperplane can separate the data.
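
A sketch of the kernel trick (assuming scikit-learn): concentric rings cannot be separated by a linear hyperplane, but an RBF-kernel SVM separates them because the kernel implicitly maps the points into a higher-dimensional space:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
print("Linear kernel training accuracy:", linear_svm.score(X, y))  # low
print("RBF kernel training accuracy:   ", rbf_svm.score(X, y))     # near 1.0
```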

8. Explain the K-Nearest Neighbor (K-NN) Learning Algorithm

K-Nearest Neighbor (K-NN) is a simple, non-parametric, and lazy learning algorithm used for classification and regression tasks.
The algorithm assumes that similar data points exist close to each
other in feature space.

How K-NN Works:

1. Store the training data: In K-NN, the model only stores the
training examples and doesn’t explicitly learn a model during
training.
2. Choose the number of neighbors (K): K is a hyperparameter
that determines how many nearest data points will vote for the
label of the test point.
3. Compute distances: For each test point, the distances to all
training points are calculated using a distance metric (e.g.,
Euclidean distance).
4. Identify K-nearest neighbors: The algorithm selects the K
data points from the training set that are closest to the test
point.
5. Assign a label: In classification, the label with the most votes
among the K-neighbors is assigned to the test point. For
regression, the algorithm predicts the average value of the K-
neighbors.

Advantages and Disadvantages of K-NN:

• Advantages:
o Simple and easy to implement.
o Works well for smaller datasets with low dimensionality.
• Disadvantages:
o Requires high computation during prediction as it
computes distances for all training points.
o Performance decreases with high-dimensional data due
to the curse of dimensionality.
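
As a sketch of the steps listed above, here is a from-scratch K-NN classifier written directly with NumPy (the training points and the test point are made up):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    dists = np.linalg.norm(X_train - x_test, axis=1)  # step 3: compute distances
    nearest = np.argsort(dists)[:k]                   # step 4: k nearest indices
    votes = Counter(y_train[nearest])                 # step 5: majority vote
    return votes.most_common(1)[0][0]

X_train = np.array([[1, 1], [2, 1], [8, 8], [9, 7], [8, 9]])
y_train = np.array([0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([7, 8]), k=3))  # -> 1
```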
9. What is Linear Regression in Machine Learning? What Are Its Types?
Write the Cost Function and Explain the Importance of Gradient Descent

Linear Regression is a supervised learning algorithm used for predicting continuous outcomes. It models the relationship between a dependent variable Y and one or more independent variables X by fitting a straight line to the data points.

Types of Linear Regression:

1. Simple Linear Regression:
a. Involves one independent variable X.
b. Equation: Y = \beta_0 + \beta_1 X + \epsilon
Where:
i. Y: Dependent variable (predicted output)
ii. X: Independent variable (input feature)
iii. \beta_0: Intercept
iv. \beta_1: Coefficient (slope)
v. \epsilon: Error term
2. Multiple Linear Regression:
a. Involves multiple independent variables.
b. Equation: Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n + \epsilon
Cost Function in Linear Regression:

The Mean Squared Error (MSE) is the most commonly used cost
function in linear regression. It measures the average squared
difference between the predicted and actual values.

J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(X_i) - Y_i \right)^2

Where:

• m: Number of training examples
• h(X_i): Predicted value
• Y_i: Actual value

Importance of Gradient Descent in Regression:

Gradient Descent is an optimization algorithm used to minimize the cost function by iteratively adjusting the model parameters (\beta_0, \beta_1, etc.). The algorithm works by
computing the gradient (slope) of the cost function and updating the
parameters in the opposite direction of the gradient to reduce the
error.

\beta_j = \beta_j - \alpha \frac{\partial J(\beta_0, \beta_1)}{\partial \beta_j}

Where:

• \alpha = Learning rate (controls the step size)
• \frac{\partial J(\beta_0, \beta_1)}{\partial \beta_j} = Partial derivative of the cost function with respect to parameter \beta_j

Why Gradient Descent is Important:

• Helps find the optimal parameters to minimize the error.


• Useful when the dataset is too large for closed-form solutions
(like the Normal Equation).
• Converges to the global minimum for convex cost functions, which is the case for linear regression.
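
A minimal NumPy sketch of batch gradient descent minimizing the MSE cost above for simple linear regression; the data, learning rate, and iteration count are made up for illustration:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.0, 5.1, 7.2, 8.9, 11.0])   # roughly Y = 2X + 1

b0, b1 = 0.0, 0.0      # initial parameters
alpha = 0.01           # learning rate
m = len(X)

for _ in range(5000):
    error = (b0 + b1 * X) - Y             # h(X_i) - Y_i
    grad_b0 = (1 / m) * np.sum(error)     # partial derivative w.r.t. b0
    grad_b1 = (1 / m) * np.sum(error * X) # partial derivative w.r.t. b1
    b0 -= alpha * grad_b0                 # step against the gradient
    b1 -= alpha * grad_b1

print("b0 =", round(b0, 3), "b1 =", round(b1, 3))  # close to 1 and 2
```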
