
T1 ML QB Soln

1.​ Explain training, testing, and validation datasets in detail.


Training Dataset
●​ A training dataset teaches a machine learning model to process
information by recognizing patterns, relationships, and structures
within the data.
●​ It consists of input data used to train a model, allowing it to learn and
refine rules for making predictions on unseen data.
●​ The higher the quality of the training data, the better the algorithm will
perform.
●​ Training data is often the largest subset, approximately 60-80% of the
original dataset.
●​ For example, in training a sentiment analysis model, training data
could include labeled text to help the model understand and predict
sentiments.
Testing Dataset
●​ A testing dataset is a separate set of data used to evaluate the
performance of a fully trained machine learning model.
●​ It acts as "unseen" data to provide a real-world check, confirming the
effectiveness and accuracy of the algorithm.
●​ The testing data should reflect the actual data the model will encounter
and be large enough to produce meaningful predictions.
●​ Often, a dataset is split into 80% for training and 20% for testing.
●​ For example, after training a model to classify images of cats and dogs,
the testing dataset would consist of new images the model has never
seen.
●​ The model's predictions on these images are then compared to the
actual labels to assess its accuracy.

Validation Dataset
●​ A validation dataset is a set of data used to fine-tune a machine learning
model with the goal of finding and optimizing the best model to solve a
problem.
●​ It provides an unbiased evaluation of a model fit on the training data
while tuning the model's hyperparameters.
●​ This helps to prevent overfitting, which is when a model memorizes
training data patterns but struggles to make accurate predictions on
new, unseen data.
●​ Validation sets are also known as development or dev sets.
●​ For example, in training an artificial neural network, the validation
dataset can be used to optimize the number of hidden units in each
layer.
●​ By evaluating the model's performance on the validation set, machine
learning engineers can adjust hyperparameters to improve the model's
generalization ability.
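A minimal sketch (not from the source) of producing the three splits with scikit-learn's train_test_split; the 60/20/20 ratio and the random data are purely illustrative:

```python
# Hypothetical illustration: 60% train / 20% validation / 20% test split
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)           # example feature matrix
y = np.random.randint(0, 2, 1000)     # example binary labels

# First hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Then split the remaining 80% into 60% train / 20% validation (0.25 of 80% = 20%)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```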
2.​Explain the concept of cross-validation.
What are the different types of cross-validation methods?
●​ Cross-validation is a technique used to evaluate machine learning
models on a limited data sample.
●​ It assesses how well a model generalizes to unseen data by training and
testing it on different portions of the data.
●​ This helps to prevent overfitting, which occurs when a model performs
well on training data but poorly on new data.
●​ Cross-validation is a powerful tool for selecting the best model for a
specific task.
●​ Cross-validation helps identify whether a model is overfitting by
checking that its performance does not depend on one particular split of
the data.
●​ It provides a more accurate estimate of a model’s ability to perform on
new data by providing an unbiased estimate of the generalization error.

Here's how cross-validation works:


●​ The dataset is split into multiple subsets, known as folds.
●​ The model is trained on a portion of the data and tested on the
remaining portion.
●​ The process is repeated, rotating the training and test data to ensure
each data point is used for both training and testing at least once.
●​ The results are combined (e.g., averaged) to estimate the model's
predictive performance.

Methods of Cross-Validation:
●​ K-Fold Cross-Validation:
○​ Divides the input dataset into K groups (folds) of equal size.
○​ The model is trained on K-1 folds and tested on the remaining fold.
○​ This process is repeated K times, with a different fold reserved for
evaluation each time.

●​ Leave-One-Out Cross-Validation:
○​ A single data point is left out of the training data, and the
remaining data is used to train the model.
○​ This process is repeated for each data point.
●​ Leave-P-Out Cross-Validation:
○​ P data points are left out of the training data.
○​ If there are n data points in the original dataset, then n-p data
points are used as the training dataset and the p data points as the
validation set.
○​ This is repeated for all samples, and the average error is calculated
to know the effectiveness of the model.
●​ Validation Set Approach:
○​ The input dataset is divided into a training set and a validation set.
○​ Both subsets are given 50% of the dataset.
●​ Stratified K-Fold Cross-Validation:
○​ Split the dataset into K equal (or nearly equal) folds.
○​ Stratify by class to ensure each fold has the same proportion of
classes as the entire dataset.
○​ Train the model on K-1 folds.
○​ Test the model on the remaining fold.
○​ Repeat the training and testing steps K times, using each fold as the
test set once.
○​ Average the performance metrics across all K iterations.
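The following minimal sketch (not from the source) illustrates K-Fold and Stratified K-Fold cross-validation from the list above using scikit-learn; the classifier, dataset, and 5-fold setting are illustrative choices:

```python
# Hypothetical illustration of K-Fold and Stratified K-Fold cross-validation
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Plain K-Fold: 5 folds, each used once as the test set
kf = KFold(n_splits=5, shuffle=True, random_state=42)
print("K-Fold accuracy:", cross_val_score(model, X, y, cv=kf).mean())

# Stratified K-Fold: preserves the class proportions in every fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
print("Stratified K-Fold accuracy:", cross_val_score(model, X, y, cv=skf).mean())
```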
3.​ Explain the following evaluation parameters for ML models:
a) Accuracy, b) Recall, c) Precision, d) Specificity, e) F1 Score, f) RMSE,
g) Confusion Matrix
●​ Accuracy:
○​ Measures the proportion of correct predictions out of the total
predictions made.
○​ It is calculated as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

●​ Where:
●​ TP = True Positives
●​ TN = True Negatives
●​ FP = False Positives
●​ FN = False Negatives

●​ Recall (Sensitivity or True Positive Rate):


○​ Measures the ability of a model to identify all relevant instances.
It answers the question: "Of all the actual positive instances, how
many did the model correctly predict as positive?"
○​ It is calculated as:
Recall = TP / (TP + FN)
●​ Precision:
○​ Assesses the accuracy of positive predictions.
○​ It answers the question: "Of all the instances the model predicted
as positive, how many were actually positive?"
○​ It is calculated as:
Precision = TP / (TP + FP)

●​ Specificity:
○​ Measures the ability of a model to correctly identify negative
instances.
○​ It is calculated as:
Specificity = TN / (TN + FP)

●​ F1 Score:
○​ The harmonic mean of precision and recall.
○​ It balances the trade-off between precision and recall, which is
especially useful when the class distribution is imbalanced.
○​ It is calculated as:
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
●​ RMSE (Root Mean Squared Error):
○​ Measures the average error between predicted and observed values.
○​ In other words, it compares an observed or known value with a
predicted value.
○​ It is calculated as:
RMSE = sqrt( Σ (yᵢ − ŷᵢ)² / n )

●​ Confusion Matrix:
○​ A table that summarizes the performance of a classification model
by displaying the counts of true positive, true negative, false
positive, and false negative predictions.
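A minimal sketch (not from the source) computing these metrics with scikit-learn; the labels below are made-up illustrative data, and specificity is derived from the confusion matrix since scikit-learn has no dedicated function for it:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, confusion_matrix, mean_squared_error)

# Illustrative true and predicted labels for a binary classifier
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy:   ", accuracy_score(y_true, y_pred))    # (TP+TN)/(TP+TN+FP+FN)
print("Recall:     ", recall_score(y_true, y_pred))       # TP/(TP+FN)
print("Precision:  ", precision_score(y_true, y_pred))    # TP/(TP+FP)
print("Specificity:", tn / (tn + fp))                     # TN/(TN+FP)
print("F1 Score:   ", f1_score(y_true, y_pred))

# RMSE for a regression-style comparison of observed vs. predicted values
y_obs = [3.0, -0.5, 2.0, 7.0]
y_hat = [2.5,  0.0, 2.0, 8.0]
print("RMSE:", np.sqrt(mean_squared_error(y_obs, y_hat)))
```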
4.​Explain SVM in detail with a neat diagram.

●​ Support Vector Machine (SVM) is a supervised learning algorithm that
is effective for classification, regression, and outlier detection.
●​ Primarily, SVMs are used for classification problems.
●​ Core Idea:
○​ SVMs aim to find the optimal hyperplane that separates data into
distinct classes.
○​ The goal is to create the best decision boundary to easily categorize
new data points in the future.
○​ SVMs are particularly effective in binary classification problems.
●​ Hyperplane:
○​ The decision boundary is called a hyperplane.
○​ The dimensions of the hyperplane depend on the number of
features in the dataset.
○​ For example, with two features, the hyperplane is a straight line;
with three features, it's a 2-dimensional plane.
○​ SVM algorithms maximize the margin, i.e., the distance between the
hyperplane and the nearest data points of each class.
●​ Support Vectors:
○​ Support vectors are the data points closest to the hyperplane, and
they significantly influence the hyperplane's position.
○​ If these points are removed, it would alter the position of the
dividing hyperplane.
●​ How SVM Works:
○​ The SVM algorithm finds the best line or decision boundary to
separate classes.
○​ For non-linear data, SVM maps the points into a higher-dimensional
space where they become separable.

Types of SVM:
●​ Linear SVM:
○​ Uses a linear kernel to create a straight-line decision boundary.
○​ Effective when data is linearly separable or when a linear
approximation is sufficient.
●​ Nonlinear SVM:
○​ Uses kernel functions (polynomial, Gaussian/RBF, sigmoid) to
map data into a higher-dimensional feature space, where a linear
decision boundary can be found.
●​ Support Vector Regression (SVR):
○​ An extension of SVM designed for regression tasks.
○​ It models the relationship between input features and continuous
output values.
●​ One-Class SVM:
○​ Used for outlier and anomaly detection.
○​ It identifies whether new data points belong to the defined class or
are outliers.
●​ Multiclass SVM:
○​ SVMs are binary classifiers but can be used for multiclass
classification using methods like One-vs-One (OvO) or One-vs-All
(OvA).
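A minimal sketch (not from the source) of training linear and RBF-kernel SVM classifiers with scikit-learn; the synthetic dataset and hyperparameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative binary classification dataset
X, y = make_classification(n_samples=500, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVMs are sensitive to feature scaling, so standardize the features first
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Linear SVM: straight-line (hyperplane) decision boundary
linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
print("Linear SVM accuracy:", linear_svm.score(X_test, y_test))

# Nonlinear SVM with an RBF kernel for curved decision boundaries
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("RBF SVM accuracy:", rbf_svm.score(X_test, y_test))
```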

Advantages of SVMs:
●​ Effective in high-dimensional spaces
●​ Effective when the number of dimensions > number of samples
●​ Memory efficient (use of support vectors)
●​ Versatile with kernel functions
●​ Robust to overfitting with appropriate regularization
●​ Effective for both classification and regression tasks

Disadvantages of SVMs:
●​ Sensitive to feature scaling
●​ Choosing kernel functions and regularization terms is crucial in
high-dimensional data
●​ Computationally expensive (especially with large datasets)
●​ Difficult to interpret the model in complex cases
●​ Limited performance with noisy data
●​ Works well only in binary classification without multiclass strategies

●​ What is a Kernel in SVM?
○​ It is a mathematical function that helps organize data and make it
easier to classify.
○​ The primary goal of an SVM is to find a hyperplane that best
separates different classes of data points.
○​ However, in many real-world scenarios, the data is not linearly
separable in the original feature space.
○​ Kernels help by implicitly mapping the original feature space into a
higher-dimensional space where the data might be more easily
separable.
○​ The original feature space is the space defined by the input
features of the data. Each dimension in this space represents one
feature.
○​ For example, if your data has two features (say, height and
weight), the original feature space will be two-dimensional.
Kernel Types:

Linear Kernel:
●​ The linear kernel is the simplest and most commonly used kernel.
●​ It is used when the data is linearly separable, meaning a straight line
(or hyperplane in higher dimensions) can separate the classes.
●​ It is defined as the dot product of two input vectors.

Polynomial Kernel:
●​ The polynomial kernel allows the model to fit a non-linear decision
boundary.
●​ It computes the dot product of input vectors raised to a certain power
(degree).
●​ It is useful when the decision boundary is not a straight line but can
be represented by a polynomial curve.

Radial Basis Function (RBF) Kernel:
●​ Also known as the Gaussian kernel, the RBF kernel is commonly used for
non-linear classification.
●​ It maps input features into an infinite-dimensional space using the
exponential function, allowing the algorithm to handle highly complex
decision boundaries.

Sigmoid Kernel:
●​ The sigmoid kernel is based on the sigmoid function (similar to the
activation function used in neural networks).
●​ It can be used to map input vectors into a feature space where the
decision boundary is non-linear.
●​ However, it is less commonly used because it can behave unpredictably
depending on the parameters.

Custom Kernel:
●​ Users can define custom kernels based on the specific problem at hand.
●​ This allows flexibility to create kernel functions that best represent
the relationship between data points.
●​ Custom kernels are often based on domain-specific knowledge.
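As an illustration of kernel choice (a sketch, not from the source, assuming scikit-learn), built-in kernels can be selected by name while a custom kernel is passed to SVC as a callable; the custom kernel below simply re-implements the linear kernel as a dot product:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Built-in kernels are selected by name
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))

# A custom kernel is any callable returning the kernel (Gram) matrix K(X1, X2);
# this one just computes dot products, i.e. a hand-rolled linear kernel
def my_linear_kernel(X1, X2):
    return np.dot(X1, X2.T)

custom_clf = SVC(kernel=my_linear_kernel).fit(X, y)
print("custom kernel training accuracy:", custom_clf.score(X, y))
```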
5.​Explain the Hebbian rule and its significance in unsupervised learning.
●​ Hebbian Rule is a foundational theory in neuroscience and artificial
intelligence that describes how synaptic connections between neurons
strengthen through repeated activation.
●​ The principle is often summarized as "neurons that fire together, wire
together," indicating that simultaneous activation of neurons leads to
enhanced synaptic efficacy between them.

Key Principles of Hebbian Learning


●​ Synaptic Strengthening: When a presynaptic neuron (cell A) repeatedly
stimulates a postsynaptic neuron (cell B), the synaptic connection
between them becomes stronger. This process is crucial for learning
and memory formation.

●​ Weight Adjustment Formula: The change in the weight wij of the
connection from neuron j to neuron i can be mathematically expressed as:
Δwij = xi * xj
where xi and xj are the activations of neurons i and j, respectively.
With bipolar activations, this means the weight increases when both
neurons are activated simultaneously and decreases when they are not.

●​ Learning Dynamics: Hebbian learning operates under the premise that
if two neurons are activated together, their connection strengthens;
conversely, if they are activated at different times, their connection
weakens.
●​ This dynamic allows for the adaptation of neural networks based on
input stimuli without requiring explicit supervision or feedback.
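A minimal NumPy sketch of this update rule (the activation sequences and learning rate are illustrative, not from the source):

```python
import numpy as np

# Illustrative bipolar activations of pre- and postsynaptic neurons over time
pre_activity  = np.array([ 1, -1,  1,  1, -1])   # x_j
post_activity = np.array([ 1, -1, -1,  1,  1])   # x_i

w = 0.0     # initial weight of the connection j -> i
eta = 1.0   # learning rate (set to 1 to match the simple rule above)

for x_j, x_i in zip(pre_activity, post_activity):
    w += eta * x_i * x_j     # Hebbian update: Δw_ij = x_i * x_j
    print(f"x_i={x_i:+d}, x_j={x_j:+d} -> w={w:+.1f}")
```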

Significance in Unsupervised Learning


●​ Hebbian learning is particularly significant in the context of
unsupervised learning for several reasons:
●​ Statistical Learning: It enables neural networks to learn the statistical
properties of input data without labeled examples. By adjusting weights
based solely on the correlation of inputs, Hebbian learning captures
underlying patterns in the data autonomously.
●​ Principal Component Analysis (PCA): Hebbian rules can be related to
PCA, where the learning process extracts significant features from the
data by identifying directions of maximum variance. This relationship
illustrates how Hebbian learning can be utilized for dimensionality
reduction and feature extraction in unsupervised contexts.
●​ Biological Plausibility: The Hebbian rule reflects biological processes
occurring in the brain, providing insights into how real neural networks
might learn from their environment. This connection enhances its
applicability in developing artificial neural networks that mimic
cognitive functions.
●​ Foundation for Advanced Models: Many modern neural network
architectures, including deep learning models, incorporate principles
derived from Hebbian learning to enhance their performance in tasks
such as clustering, pattern recognition, and feature learning without
requiring labeled datasets
6.​What are the limitations of the Hebbian rule?
Inability to Account for All Forms of Plasticity:
●​ Hebbian theory does not encompass all types of synaptic long-term
plasticity.
●​ It primarily focuses on excitatory synapses and does not provide
predictions for inhibitory synapses or anti-causal spike sequences,
where the presynaptic neuron fires after the postsynaptic neuron.

Instability of Weight Adjustments:


●​ The basic Hebbian rule can lead to instability in weight adjustments.
●​ In networks where a presynaptic neuron consistently excites a
postsynaptic neuron, the weights can increase or decrease
exponentially over time, causing runaway excitation or inhibition.
●​ This instability necessitates additional mechanisms to limit weight
growth, such as Oja's rule or BCM theory.
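To make the stabilization idea concrete, here is a minimal sketch (an illustration, not from the source) contrasting the plain Hebbian update with Oja's rule, whose extra decay term keeps the weight vector bounded:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2)) * np.array([2.0, 1.0])   # inputs with more variance along x1
eta = 0.01

w_hebb = np.array([0.1, 0.1])
w_oja = np.array([0.1, 0.1])

for x in X:
    y = w_hebb @ x
    w_hebb = w_hebb + eta * y * x                 # plain Hebb: Δw = η·y·x (unbounded growth)

    y = w_oja @ x
    w_oja = w_oja + eta * y * (x - y * w_oja)     # Oja's rule: Δw = η·y·(x − y·w)

print("norm of plain Hebbian weights:", np.linalg.norm(w_hebb))   # grows very large
print("norm of Oja-rule weights:     ", np.linalg.norm(w_oja))    # stays close to 1
```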

Lack of Mechanisms for Homeostatic Plasticity:


●​ Hebbian learning does not incorporate homeostatic plasticity, which is
crucial for maintaining overall neural network stability.
●​ Homeostatic mechanisms help neurons adjust their excitability to
prevent excessive firing rates that could arise from Hebbian learning
alone.

Dependence on Correlated Inputs:


●​ The effectiveness of Hebbian learning is contingent upon the
correlation between input patterns.
●​ It performs well with orthogonal or uncorrelated inputs but struggles
with correlated inputs, which can lead to suboptimal learning
outcomes.

Neglect of Neighboring Synapses:


●​ Traditional Hebbian models focus on direct connections between pairs
of neurons but do not account for modifications that may occur at
neighboring synapses due to shared activity patterns.
●​ This oversight limits the model's ability to capture more complex
interactions within neural circuits.

Volume Learning Exclusion:


●​ The classic Hebbian framework does not include volume learning,
which involves diffuse synaptic modifications influenced by
retrograde signaling (e.g., nitric oxide).
●​ This type of plasticity can affect multiple neurons simultaneously and
is not explained by simple pairwise correlations.
7.​ Implement the bipolar AND gate function using Hebb’s rule.

Step 1: Define Inputs and Outputs in Bipolar Form

In bipolar representation:

●​ 1 → True
●​ -1 → False

X1 X2 AND output y

-1 -1 -1

-1 1 -1

1 -1 -1

1 1 1

Step 2: Hebbian Learning Rule

Hebbian learning updates weights using:

Δwi = xi * y

New weights:

wi = wi + Δwi
Bias update:
Δb = y,  b = b + Δb
Step 3: Initialize Weights and Bias
Start with:
w1 = 0, w2 = 0, b = 0
Step 4: Compute Weight and Bias Updates
For each input-output pair, apply Δwi = xi * y and Δb = y:

X1   X2   y   | Δw1  Δw2  Δb | w1   w2   b
-1   -1   -1  |  1    1   -1 |  1    1   -1
-1    1   -1  |  1   -1   -1 |  2    0   -2
 1   -1   -1  | -1    1   -1 |  1    1   -3
 1    1    1  |  1    1    1 |  2    2   -2

Step 5: Final Weights and Bias
w1 = 2, w2 = 2, b = −2

Step 6: Decision Function


Net input = w1*x1 + w2*x2 + b

If Net input > 0, output = 1, else output = -1.
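A minimal Python sketch of this training procedure (an illustration of the steps above, not from the source):

```python
# Hebbian learning of the bipolar AND gate
import numpy as np

# Bipolar training pairs: (x1, x2) -> y
samples = [(-1, -1, -1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]

w = np.zeros(2)   # w1, w2
b = 0.0

# One pass of Hebb's rule: Δw_i = x_i * y, Δb = y
for x1, x2, y in samples:
    w += np.array([x1, x2]) * y
    b += y

print("Weights:", w, "Bias:", b)      # expected: [2. 2.] and -2.0

# Decision function: output 1 if net input > 0, else -1
for x1, x2, y in samples:
    net = w @ np.array([x1, x2]) + b
    pred = 1 if net > 0 else -1
    print(f"x=({x1:+d},{x2:+d})  net={net:+.0f}  predicted={pred:+d}  target={y:+d}")
```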


8.​Explain the EM algorithm along with its applications.

●​ The Expectation-Maximization (EM) algorithm is an iterative method
used to find maximum likelihood estimates of model parameters
when the model includes unobservable variables (also called latent
variables).
●​ It's a natural generalization of maximum likelihood estimation to the
incomplete data case.
●​ The EM algorithm is used to find local maximum likelihood parameters
of a statistical model in cases where the equations cannot be solved
directly.

●​ How it Works:
●​ The EM algorithm works by iteratively applying two steps:
○​ Expectation Step (E step): Estimate the expected values of the
latent variables, given the observed data and current parameter
estimates.
○​ Maximization Step (M step): Find the parameters that maximize
the expected log-likelihood, using the completed data from the E
step.
●​ These steps are repeated until the parameter updates are smaller than a
pre-specified threshold, indicating convergence.

●​ Applications:
○​ The EM algorithm is used in machine learning to estimate missing
data in latent variables through observed data in datasets.
○​ Some applications include:
■​ Gaussian Mixture Model (GMM) Maximum Likelihood
Estimation.
■​ Mode of the posterior marginal distribution of parameters in
machine learning and data mining applications.
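A minimal sketch of one application listed above: fitting a Gaussian Mixture Model with scikit-learn, whose fit method runs the EM algorithm internally (the synthetic data and two-component setting are illustrative assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data from two Gaussian clusters; the cluster labels are the latent variables
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(-2.0, 0.5, 300),
                       rng.normal( 3.0, 1.0, 300)]).reshape(-1, 1)

# GaussianMixture.fit alternates E and M steps until the log-likelihood converges
gmm = GaussianMixture(n_components=2, tol=1e-4, max_iter=200, random_state=0)
gmm.fit(data)

print("Estimated means:    ", gmm.means_.ravel())
print("Estimated variances:", gmm.covariances_.ravel())
print("Mixing weights:     ", gmm.weights_)
print("Converged:", gmm.converged_, "after", gmm.n_iter_, "EM iterations")
```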
