MLL
Supervised learning is a type of machine learning where a model learns from a labeled dataset: each input is paired with a correct output. The goal is for the model to generalize from the training data to make accurate predictions on unseen data.
Example:
Consider predicting house prices. You are given a dataset where each house has features like square footage, number of rooms, and location, along with the actual selling price. The algorithm learns a mapping from features to price and can then predict prices for new houses.

2. What is the Bayes Theorem? Write its formula.
Bayes Theorem is a mathematical formula used to determine the probability of a hypothesis given some observed evidence. It updates the probability of a hypothesis as more evidence or information becomes available.
Formula:
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
Where:
- $P(A \mid B)$: Posterior probability (probability of A given B)
- $P(B \mid A)$: Likelihood (probability of B given A)
- $P(A)$: Prior probability of A
- $P(B)$: Probability of B
This theorem is fundamental in probabilistic reasoning, including the Naïve Bayes algorithm.

3. Differentiate between classification and regression.

| Feature | Classification | Regression |
| --- | --- | --- |
| Output Type | Discrete categories (e.g., spam/not spam) | Continuous values (e.g., temperature, price) |
| Goal | Predict class labels | Predict numeric values |
| Evaluation | Accuracy, precision, recall, F1-score | MSE, RMSE, R² |
| Algorithms | Decision Trees, SVM, Naïve Bayes | Linear Regression, SVR, Ridge Regression |
| Example | Classifying if an email is spam or not | Predicting the price of a car based on features |

9. Monitoring & Updates: Track performance and retrain periodically with new data. (Final step of the spam-detector design process in question 7.)

8. What are the properties of SVM? Discuss max-margin intuition.
Properties of SVM:
- Effective in high-dimensional spaces.
- Can handle linear and non-linear data using kernel functions.
- Robust to overfitting, especially in high dimensions.
- Only relies on support vectors (data points closest to the hyperplane).
Max-Margin Intuition:
SVM aims to find the hyperplane that maximizes the margin, i.e., the distance between itself and the nearest data points from each class. A larger margin implies better generalization to unseen data.

9. Define Reinforcement Learning with an example.
Reinforcement Learning (RL) is a learning paradigm where an agent learns to make decisions by interacting with an environment. The agent receives rewards or penalties for its actions and learns to maximize cumulative reward.
Example:
A robot learning to navigate a maze.
- The robot (agent) tries different paths (actions).
- Each move leads to a new position (state).
- It gets a reward for reaching the exit and penalties for hitting walls.
- Over time, it learns the optimal path by trial and error.

12. Define the three components: Task (T), Performance Measure (P), Experience (E). Illustrate with an example.
In machine learning, the learning problem is described using:
- Task (T): What the system is trying to do.
- Performance Measure (P): How success is measured.
- Experience (E): The data the system learns from.
Example:
Building a spam detection model:
- T: Classify emails into spam and not spam.
- P: Accuracy, F1-score.
- E: A labeled dataset of emails with known classifications.
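To make the performance measure P from the spam example concrete, here is a minimal sketch that computes accuracy, precision, recall, and F1-score by hand. The label lists are made-up illustrative values (1 = spam, 0 = not spam), and `binary_metrics` is a hypothetical helper name, not a library function.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for eight emails: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy data there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four metrics come out to 0.75.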
4. What does the term "hyperplane" mean in SVM?
In Support Vector Machines (SVMs), a hyperplane is a decision boundary that separates different classes of data.
- In 2D space, it's a line.
- In 3D space, it's a plane.
- In higher dimensions, it's a hyperplane.
SVM selects the hyperplane that best separates the classes by maximizing the margin, i.e., the distance between the hyperplane and the nearest data points from each class (support vectors).

5. Compare decision trees and Bayesian networks in terms of interpretability.

| Aspect | Decision Trees | Bayesian Networks |
| --- | --- | --- |
| Interpretability | Very high: decisions can be traced as rules | Moderate: requires understanding of probabilities |
| Visualization | Easy to visualize as a tree structure | Complex: visualized as a probabilistic graph |
| User Friendly | Suitable for non-experts | Requires knowledge of probability theory |
| Example Use Case | Customer churn prediction | Medical diagnosis under uncertainty |

Decision trees are more interpretable because you can follow a path from root to leaf to see how a decision was made. Bayesian networks model conditional dependencies, which can be less intuitive.

6. How does logistic regression work for binary classification?
Logistic Regression models the probability that a data point belongs to a particular class. It uses the sigmoid (logistic) function to squash the output of a linear equation between 0 and 1.
Formula:
$$P(y=1 \mid x) = \frac{1}{1 + e^{-(w^T x + b)}}$$
- If $P(y=1 \mid x) > 0.5$, classify as class 1; otherwise class 0.
- It's trained by optimizing the log-loss (cross-entropy) function.
- It assumes linearity in the log-odds space.

7. Describe the steps to design a learning system for a spam detector.
1. Problem Definition: Classify emails as spam or not.
2. Data Collection: Collect a labeled dataset of emails with "spam" or "not spam" labels.
3. Preprocessing: Clean the text (remove HTML, punctuation), tokenize words, remove stop words.
4. Feature Engineering: Extract features using Bag of Words, TF-IDF, or embeddings.
5. Model Selection: Choose an algorithm like Naïve Bayes or Logistic Regression.
6. Training: Fit the model to the training data.
7. Evaluation: Use metrics like accuracy, precision, recall, and F1-score on test data.

13. Compare Naïve Bayes and Logistic Regression for binary classification tasks.

| Feature | Naïve Bayes | Logistic Regression |
| --- | --- | --- |
| Assumptions | Features are conditionally independent given the class | Linear relationship between features and log-odds |
| Type | Generative probabilistic classifier | Discriminative classifier |
| Training Speed | Very fast | Slower due to iterative optimization |
| Performance | Works well with small datasets and text data | Better with larger, more complex datasets |
| Robustness | Struggles if features are correlated | Handles correlated features better |
| Interpretability | Easy to interpret probabilities | Also interpretable through coefficients |

Summary:
- Use Naïve Bayes when speed is crucial and feature independence holds.
- Use Logistic Regression when features are correlated or more complex relationships need modeling.

a. Define Entropy in Decision Trees.
Entropy is a metric used in decision trees (especially ID3) to measure the amount of uncertainty or impurity in a dataset. It helps determine the best attribute to split the data at each node.
- If all examples belong to the same class, entropy is 0 (pure).
- If the examples are evenly mixed between two classes, entropy is 1 (maximum impurity).
Mathematically:
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
Where $p_i$ is the probability of class $i$ in dataset $S$ and $c$ is the number of classes.
Entropy is used in conjunction with Information Gain to choose the best attribute for splitting the data.

b. What is Perceptron? Explain.
A Perceptron is a type of artificial neuron and a fundamental building block of neural networks. It is used for binary classification tasks and is inspired by the human brain.
Components:
- Inputs (x₁, x₂, ..., xₙ)
- Weights (w₁, w₂, ..., wₙ)
- Bias (b)
- Activation function (typically step or sign function)
Function:
$$y = \mathrm{activation}\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
With a step activation, if the weighted sum plus bias is greater than 0, it predicts class 1; otherwise class 0.
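The perceptron function from part b can be sketched in a few lines of Python. The notes describe only the forward computation; the training loop below uses the classic perceptron learning rule, and the learning rate, epoch count, and AND-gate data are assumptions chosen for illustration. The function names are hypothetical.

```python
def predict(weights, bias, x):
    """Step activation: class 1 if the weighted sum plus bias is > 0, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Perceptron learning rule (assumed here; not spelled out in the notes)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # error is -1, 0, or +1; weights move only on misclassified points.
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so a single perceptron can learn it.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]

weights, bias = train_perceptron(inputs, and_labels)
predictions = [predict(weights, bias, x) for x in inputs]
print(predictions)
```

Swapping in XOR labels (`[0, 1, 1, 0]`) would never converge, because no single line separates those classes.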
Limitation of the perceptron: It can only solve linearly separable problems. It was later extended to Multilayer Perceptrons to handle complex problems.

This is the core principle behind machine learning: learning patterns that generalize well. (Closing sentence of part i, Inductive Inference in Decision Trees.)
c. Define Markov Decision Process (MDP).
An MDP provides a mathematical framework for modeling decision-making where outcomes are partly random and partly under the control of a decision-maker.
Components:
1. States (S) – All possible situations the agent can be in.
2. Actions (A) – All actions available to the agent.
3. Transition Model (P) – Probability of moving to a new state $s'$ given state $s$ and action $a$: $P(s' \mid s, a)$
4. Reward Function (R) – Immediate reward received after transitioning.
5. Discount Factor (γ) – Degree of importance of future rewards (0 ≤ γ ≤ 1).
MDPs are the basis for Reinforcement Learning.

d. What is Genetic Mutation in GA?
In Genetic Algorithms (GA), mutation is a genetic operator used to maintain diversity in the population and prevent premature convergence.
Mutation:
- Randomly alters one or more genes (bits) in a chromosome.
- Helps explore new solutions that may not be reachable through crossover alone.
Example:
Chromosome: 101101
Mutation (flip bit 3): 100101
Mutation helps the GA avoid getting stuck in local optima and promotes exploration.

e. Explain ID3 Algorithm with Example.
ID3 (Iterative Dichotomiser 3) is a decision tree algorithm developed by Ross Quinlan that uses Information Gain to split data.
Steps:
1. Calculate entropy of the dataset.
2. For each attribute, calculate the Information Gain:
$$IG(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v)$$
Choose the attribute with the highest gain.
3. Recursively apply to each subset.
Example:
Dataset: Decide "Play Tennis" based on Outlook, Temperature, etc.
- Outlook = {Sunny, Overcast, Rain}
- Entropy is computed, and the attribute with the highest information gain (e.g., Outlook) is chosen for the root.

f. What is the Role of Activation Function in CNN?
In a Convolutional Neural Network (CNN), the activation function introduces non-linearity into the system, enabling it to model complex patterns like edges, textures, and shapes.
Common Activation Functions:
- ReLU (Rectified Linear Unit): $f(x) = \max(0, x)$
- Sigmoid: Used for binary classification
- Tanh: Used for values between -1 and 1
Without activation functions, the stacked layers of a CNN would collapse into a single linear model and could not learn complex features.

j. Explain Structure and Working of Multilayer Perceptron.
Multilayer Perceptron (MLP) is a type of feedforward neural network that consists of:
- Input layer – Accepts the features.
- Hidden layers – One or more layers where learning happens.
- Output layer – Produces the final prediction.
Each hidden and output layer uses weights, biases, and activation functions (ReLU, Sigmoid, etc.).
Working:
1. Inputs are fed forward layer by layer.
2. The weighted sum and activation functions compute neuron outputs.
3. The error is calculated at the output.
4. Backpropagation is used to update weights by propagating the error backward.
MLPs can solve non-linear problems and are the foundation of deep learning.

k. Explain Reinforcement Learning with Key Components and Example.
Reinforcement Learning (RL) is a feedback-based learning approach where an agent learns to make decisions by interacting with an environment.
Key Components:
- Agent: Learner and decision-maker.
- Environment: The world the agent interacts with.
- States (S): Configurations of the environment.
- Actions (A): Choices available to the agent.
- Rewards (R): Feedback from the environment.
- Policy (π): Mapping from states to actions.
- Value Function (V): Expected return from a state.
Example:
A robot learning to walk:
- State: Its current balance and position.
- Action: Move left, right, or forward.
- Reward: +1 for moving correctly, -1 for falling.
Over time, it learns to walk by maximizing cumulative reward.

l. Explain Deep Learning Architectures.
Deep Learning Architectures are composed of multiple layers that transform data through learnable parameters.
Types:
1. Feedforward Networks (MLP): Basic architecture, good for structured data.
2. Convolutional Neural Networks (CNN): For image processing, uses filters to detect features.
3. Recurrent Neural Networks (RNN): For sequential data like time series or language.
4. Long Short-Term Memory (LSTM): Variant of RNN for long-term dependencies.
5. Autoencoders: For feature learning and dimensionality reduction.
6. GANs (Generative Adversarial Networks): Generate new data via a generator and discriminator.
7. Transformers: Used in NLP, based on self-attention mechanisms (e.g., BERT, GPT).
Each architecture is suited to different kinds of data and tasks, and many can be combined in hybrid systems.

g. Describe Q-Learning with Pseudo Code.
Q-Learning is a reinforcement learning algorithm that learns the value (Q-value) of taking a certain action in a given state.
Update Rule:
$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$
Where:
- $\alpha$: learning rate
- $\gamma$: discount factor
- $r$: reward
- $s, a$: current state and action
- $s'$: next state
Pseudo Code:
Initialize Q(s, a) arbitrarily
For each episode:
    Initialize state s
    Repeat:
        Choose a from s using ε-greedy
        Take action a, observe r and s'
        Q(s, a) ← Q(s, a) + α [r + γ * max Q(s', a') - Q(s, a)]
        s ← s'
    until s is terminal
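The Q-learning pseudocode can be turned into a small runnable sketch. Everything about the environment here is an assumption made for illustration: a hypothetical five-state corridor where the agent starts at the left end and receives a reward of 1 only on reaching the right end. The hyperparameter values are likewise arbitrary choices.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (the goal)
ACTIONS = [0, 1]      # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: a 1-D corridor with the goal at the right end."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]    # Initialize Q(s, a)
    for _ in range(episodes):
        s = 0                                                 # Initialize state s
        while s != N_STATES - 1:                              # until s is terminal
            # Choose a from s using epsilon-greedy (ties broken at random).
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[s])
                a = rng.choice([i for i in ACTIONS if q[s][i] == best])
            s_next, r = step(s, a)                            # observe r and s'
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q


q_table = q_learning()
# Greedy policy for the non-terminal states: the action with the highest Q-value.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

After training, the greedy policy moves right in every non-terminal state, which is the shortest path to the goal in this corridor.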
h. What are the Components of Genetic Algorithms?
1. Population – Set of potential solutions (chromosomes).
2. Selection – Selects the fittest individuals to be parents.
3. Crossover (Recombination) – Combines two parents to produce
offspring.
4. Mutation – Introduces random variation.
5. Fitness Function – Evaluates how good a solution is.
6. Termination Condition – When to stop (e.g., after N generations).
These components mimic natural evolution to find optimal or near-optimal
solutions.
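The six components listed above can be seen working together in a minimal sketch. The problem (maximize the number of 1 bits, often called OneMax), tournament selection, single-point crossover, the mutation rate, and keeping the best individual each generation (elitism) are all assumptions chosen for illustration; the notes do not prescribe particular operators.

```python
import random

GENES = 12  # chromosome length in bits


def fitness(chromosome):
    """Fitness function: number of 1 bits (the toy OneMax problem)."""
    return sum(chromosome)


def select(population, rng, k=3):
    """Tournament selection: the fittest of k random individuals becomes a parent."""
    return max(rng.sample(population, k), key=fitness)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent joined to the tail of the other."""
    point = rng.randint(1, GENES - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rng, rate=0.05):
    """Bit-flip mutation: each gene flips independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def run_ga(pop_size=20, generations=40, seed=1):
    rng = random.Random(seed)
    # Population: random bit strings.
    population = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(pop_size)]
    initial_best = max(fitness(c) for c in population)
    for _ in range(generations):                  # termination: fixed N generations
        elite = max(population, key=fitness)      # elitism: never lose the current best
        children = [
            mutate(crossover(select(population, rng), select(population, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    return initial_best, max(population, key=fitness)


initial_best, best = run_ga()
print(initial_best, fitness(best), best)
```

Because the best individual is carried over unchanged, the best fitness can never decrease from one generation to the next; how close it gets to the optimum depends on the seed and settings.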
i. What is Inductive Inference in Decision Trees?
Inductive Inference refers to the process of learning general rules from specific
examples. In decision trees, it involves generalizing from a training dataset to
produce a tree that can classify new, unseen data accurately.
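As a small illustration of inductive inference, the sketch below induces a one-level decision tree (a "stump") from a handful of examples by picking the attribute with the highest information gain, then applies the learned rule to an example that was not in the training set. The data is a made-up toy set, not the classic Play Tennis table, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the full set minus the weighted entropy after splitting on attribute."""
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def learn_stump(rows, labels):
    """Induce a one-level tree: best attribute, then the majority label per value."""
    best = max(rows[0], key=lambda attr: information_gain(rows, labels, attr))
    rule = {}
    for value in {row[best] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[best] == value]
        rule[value] = Counter(subset).most_common(1)[0][0]
    return best, rule


# Made-up training examples: should we play outside?
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes", "yes"]

attribute, rule = learn_stump(rows, labels)

# This combination never appears in the training rows.
unseen = {"outlook": "overcast", "windy": "yes"}
prediction = rule[unseen[attribute]]
print(attribute, rule, prediction)
```

Here splitting on `outlook` leaves every subset pure, so it has the highest gain, and the general rule learned from five specific examples classifies the unseen case as "yes".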