ML Unit3 QB Solutions

The document explains two ensemble methods: Bagging and Boosting. Bagging builds multiple independent models in parallel using random samples and combines their predictions, while Boosting sequentially trains models to correct errors from previous ones. It also covers decision trees, regression models in exploratory data analysis, Gaussian Mixture Models, unsupervised learning, K-Nearest Neighbor algorithm, Weighted K-Means, and various methods to combine classifiers.


1. Explain about Boosting and Bagging

A) Bagging stands for Bootstrap Aggregating. It builds multiple independent models in parallel using random samples of the data, and combines their predictions (usually via voting or averaging).

How it Works:

1. Bootstrap sampling: create multiple datasets by randomly sampling (with replacement) from the original dataset.
2. Train a model (usually a decision tree) on each dataset.
3. Aggregate the predictions:
   a. Classification → majority vote
   b. Regression → average

Advantages:

• Easy to parallelize
• Reduces overfitting
• Handles high variance well

Disadvantages:

• May not improve bias
• Requires more computational resources
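
A minimal Python sketch of this procedure using scikit-learn's BaggingClassifier (scikit-learn availability, the synthetic dataset, and the choice of 20 estimators are assumptions made for illustration):

# Bagging sketch: several decision trees trained on bootstrap samples, combined by voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 20 estimators, each fitted on a bootstrap sample; the default base estimator is a decision tree
bagging = BaggingClassifier(n_estimators=20, bootstrap=True, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))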

Boosting is a sequential ensemble method where each new model tries to correct the errors
made by previous ones.

How it Works:

1. Train the first weak learner on the data.
2. Evaluate performance and assign higher weights to misclassified points.
3. Train the next learner to focus more on the hard examples.
4. Repeat for multiple rounds.
5. Combine predictions by weighted voting or summation.

Advantages:

• High prediction accuracy
• Works well with imbalanced datasets
• Reduces both bias and variance

Disadvantages:

• Slower to train (sequential)
• Prone to overfitting if not regularized
• Harder to parallelize
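
A matching sketch of boosting with scikit-learn's AdaBoostClassifier (again a hedged illustration on synthetic data; the 50 rounds and the default decision-stump learners are illustrative choices):

# Boosting sketch: weak learners trained one after another; misclassified points get higher weight.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boosting = AdaBoostClassifier(n_estimators=50, random_state=0)  # 50 sequential rounds
boosting.fit(X_train, y_train)
print("Boosting accuracy:", boosting.score(X_test, y_test))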

2. Build the structure of a decision tree with an example.

A) A decision tree is a supervised learning algorithm used for both classification and regression. It works by splitting the data into subsets based on the feature that results in the maximum information gain (for classification) or minimum variance (for regression).

Components of a Decision Tree:

• Root Node: the topmost node; represents the feature that is split on first
• Internal Nodes: nodes where the data is split further
• Leaf Nodes (Terminal Nodes): nodes that represent the output class or value
• Branches: paths from one node to another, based on decisions

📘 Example Problem: Weather & Play Tennis

Dataset:

Outlook Temperature Humidity Wind Play Tennis

Sunny Hot High Weak No

Sunny Hot High Strong No

Overcast Hot High Weak Yes

Rain Mild High Weak Yes

Rain Cool Normal Weak Yes

Rain Cool Normal Strong No

Overcast Cool Normal Strong Yes

Sunny Mild High Weak No

Sunny Cool Normal Weak Yes

Rain Mild Normal Weak Yes

Sunny Mild Normal Strong Yes

Overcast Mild High Strong Yes



Overcast Hot Normal Weak Yes

Rain Mild High Strong No

🏗️ Decision Tree Structure (Built using ID3 Algorithm)

Using information gain, we find that Outlook is the best feature to split on first.

                [Outlook]
           /        |        \
       Sunny    Overcast     Rain
         |          |          |
    [Humidity]     Yes      [Wind]
     /      \               /    \
   High   Normal         Weak  Strong
    No      Yes           Yes    No

Root Node = Outlook (highest information gain)

• Sunny:
o Best split: Humidity
▪ High → No
▪ Normal → Yes
• Overcast: All samples say Yes
• Rain:
o Best split: Wind
▪ Weak → Yes
▪ Strong → No
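
As a cross-check, here is a hedged Python sketch that fits the same Play Tennis table with scikit-learn's DecisionTreeClassifier; the entropy criterion approximates ID3's information gain, and one-hot encoding of the categorical features is an implementation choice, not part of the original example:

# Decision tree sketch for the Play Tennis dataset (entropy ~ information gain, as in ID3)
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":     ["Sunny","Sunny","Overcast","Rain","Rain","Rain","Overcast",
                    "Sunny","Sunny","Rain","Sunny","Overcast","Overcast","Rain"],
    "Temperature": ["Hot","Hot","Hot","Mild","Cool","Cool","Cool",
                    "Mild","Cool","Mild","Mild","Mild","Hot","Mild"],
    "Humidity":    ["High","High","High","High","Normal","Normal","Normal",
                    "High","Normal","Normal","Normal","High","Normal","High"],
    "Wind":        ["Weak","Strong","Weak","Weak","Weak","Strong","Strong",
                    "Weak","Weak","Weak","Strong","Strong","Weak","Strong"],
    "PlayTennis":  ["No","No","Yes","Yes","Yes","No","Yes",
                    "No","Yes","Yes","Yes","Yes","Yes","No"],
})

X = pd.get_dummies(data.drop(columns="PlayTennis"))   # one-hot encode the categorical features
y = data["PlayTennis"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))   # text view of the learned splits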

3. How does the structure of a decision tree help in classifying a data instance?

A) A decision tree classifies a data instance by using a tree-like model of decisions. Each
internal node represents a test on a feature, each branch represents the outcome of the test,
and each leaf node represents a class label or output value.

How It Works: Classification Process

Steps:

1. Start at the root node of the tree.
2. Check the feature specified at that node in the data instance.
3. Follow the branch that corresponds to the value of that feature.
4. Repeat the process at the next node.
5. When a leaf node is reached, assign the class label at that leaf to the instance.
Example:

Given Data Instance:

Python:

{
    'Outlook': 'Rain',
    'Temperature': 'Cool',
    'Humidity': 'Normal',
    'Wind': 'Strong'
}
Decision Tree (simplified):

                [Outlook]
           /        |        \
       Sunny    Overcast     Rain
         |          |          |
    [Humidity]     Yes      [Wind]
     /      \               /    \
   High   Normal         Weak  Strong
    No      Yes           Yes    No
Classification Path:

1. Outlook = Rain → go to the right subtree.
2. At node Wind: Wind = Strong → follow the right branch.
3. Reach a leaf → class = No.

So, the instance is classified as No (will not play tennis).
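
A minimal sketch of this traversal in Python, with the tree stored as a nested dictionary (the dictionary layout and the classify helper are illustrative assumptions, not a standard API):

# Classify an instance by walking from the root to a leaf.
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(node, instance):
    # A leaf is stored as a plain label; an internal node as {feature: {value: subtree}}.
    while isinstance(node, dict):
        feature = next(iter(node))                   # feature tested at this node
        node = node[feature][instance[feature]]      # follow the branch for the instance's value
    return node

instance = {"Outlook": "Rain", "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong"}
print(classify(tree, instance))   # -> "No"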

Why the Tree Structure Helps


• Hierarchical structure: makes decisions in steps (like a flowchart), simplifying complex problems.
• Feature-based splits: each decision is based on one feature, making classification interpretable.
• Path to leaf: each root-to-leaf path represents a logical rule (IF ... THEN ...), providing transparency.
• Leaves hold predictions: the final decision is stored in the leaves, so it is easy to read and apply.

Advantages of Using a Decision Tree for Classification

• Fast inference: once built, trees classify in O(depth) time.
• Human-readable rules: easy to interpret and explain.
• Handles both categorical and numerical features.
• No need for feature scaling (e.g., normalization is not required).

4. What is the role of regression models in exploratory data analysis?

A) Exploratory Data Analysis (EDA) is mostly about understanding the structure, patterns, and relationships in data; regression models play a helpful supporting role by uncovering trends and suggesting relationships between variables.

Why Use Regression During EDA?

Regression is not just for prediction — during EDA, it serves as a quantitative tool
to:

• Quantify relationships: understand how one variable changes in response to another (e.g., sales vs. price).
• Identify important predictors: spot which variables significantly affect the target.
• Detect trends or patterns: linear or non-linear trends can emerge from regression lines.
• Expose outliers: residuals from the regression help highlight abnormal data points.
• Check assumptions: variance, linearity, multicollinearity, and normality can be assessed.
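
A small Python sketch of this idea: a quick least-squares fit quantifies the trend and the residuals flag outliers (the synthetic sales-vs-price data and the 2-standard-deviation cutoff are assumptions made for illustration):

# Quick regression pass during EDA: the slope quantifies the trend, residuals expose outliers.
import numpy as np

rng = np.random.default_rng(0)
price = rng.uniform(5, 50, size=100)
sales = 200 - 3 * price + rng.normal(0, 10, size=100)   # roughly linear relationship
sales[10] += 80                                          # inject one abnormal point

slope, intercept = np.polyfit(price, sales, deg=1)       # first-degree least-squares fit
residuals = sales - (slope * price + intercept)

print(f"trend: sales ~ {slope:.2f} * price + {intercept:.1f}")
outliers = np.where(np.abs(residuals) > 2 * residuals.std())[0]
print("possible outliers at indices:", outliers)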

5. Explain about Gaussian Mixture Models.

A) Imagine a dataset of flowers. You have a mix of roses, tulips, and daisies. While they're
all flowers, they have distinct characteristics like color, size, and petal shape. A Gaussian
Mixture Model (GMM) is a statistical model that can identify these underlying groups or
clusters within the data.
How does a GMM work?
1. Assume a Mixture of Gaussians: We assume that our data is generated from a mixture
of multiple Gaussian distributions. Each Gaussian distribution represents a cluster or
group within the data.
2. Estimate Parameters: The GMM model estimates the parameters of each
Gaussian component:
- Mean: The centre of the cluster.
- Covariance matrix: The shape and orientation of the cluster.

- Mixing coefficient: The proportion of data points belonging to that cluster.


3. Iterative Process: The model uses an iterative process called Expectation-Maximization (EM) to find the best-fitting parameters.
- Expectation Step (E-step): Assigns each data point to a Gaussian component based on
the current parameter estimates.
- Maximization Step (M-step): Updates the parameters of each Gaussian component
to better fit the assigned data points.
4. Clustering: Once the model converges, the data points are assigned to the
Gaussian component with the highest probability.
Why Use GMMs?
• Soft Clustering: unlike traditional clustering methods such as K-Means, GMMs assign each data point a probability of belonging to each cluster.
• Modelling Complex Data: GMMs can model complex data distributions that are not well represented by a single Gaussian distribution.
• Density Estimation: GMMs can be used to estimate the probability density function of the data.
• Anomaly Detection: identifying outliers or anomalies in the data.
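
A hedged Python sketch with scikit-learn's GaussianMixture (the two synthetic blobs and n_components=2 are illustrative assumptions):

# GMM sketch: fit a 2-component mixture with EM and read out the soft cluster memberships.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),      # cluster around (0, 0)
               rng.normal(5, 1, size=(100, 2))])     # cluster around (5, 5)

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

print("means:\n", gmm.means_)                  # centre of each Gaussian
print("mixing coefficients:", gmm.weights_)    # proportion of points per component
print("soft assignment of first point:", gmm.predict_proba(X[:1]))   # probabilities, not hard labels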

6. Explain the concept of Unsupervised Learning and discuss its significance with examples.

A) Unsupervised learning is a type of machine learning where the algorithm is trained on data that does not have labelled responses. The goal is to identify hidden patterns or intrinsic structures in the input data.
Examples of unsupervised learning tasks include clustering, where the goal is to group similar items together, and association, where the goal is to find rules that describe large portions of the data.
Significance of Unsupervised Learning

✅ 1. Discover Hidden Patterns

• Reveals natural groupings or structures that are not immediately obvious.
• Helps understand the underlying distribution of the data.

✅ 2. Data Exploration and Preprocessing

• Useful for feature selection, noise reduction, or visualization (e.g., with PCA or t-SNE).

✅ 3. Customer Segmentation

• In marketing, businesses can segment customers into distinct groups without prior
labels.

✅ 4. Anomaly Detection

• Detect unusual patterns or outliers in financial transactions, cybersecurity, etc.

✅ 5. Recommendation Systems

• Find user/item similarity without labeled preferences (e.g., collaborative filtering).

Types of Unsupervised Learning

• Clustering
• Dimensionality Reduction
• Association Rule Mining
• Anomaly Detection

Examples of Unsupervised Learning

Example 1: Clustering (K-Means)

Use Case:

Segment customers based on behavior.

How it works:

• The algorithm groups customers into k clusters based on similarity in features (age, income, purchase history).
• The company can then target each group with tailored marketing strategies, as shown in the sketch below.
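
A minimal Python sketch of such a segmentation with scikit-learn's KMeans (the toy customer table and k = 3 are assumptions for illustration):

# K-Means sketch: group customers by age, income, and purchase history.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# columns: age, annual income (k$), purchases last year
customers = np.array([[23, 30, 5], [25, 32, 7], [47, 80, 2],
                      [51, 85, 3], [35, 55, 20], [33, 52, 22]])

X = StandardScaler().fit_transform(customers)            # put the features on a comparable scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("segment of each customer:", kmeans.labels_)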

Example 2: Dimensionality Reduction (PCA)

Use Case:

Visualize high-dimensional data.

How it works:

• PCA reduces 100 features down to 2 or 3 for plotting and exploration.
• Used in image compression, gene expression analysis, etc.
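
A short Python sketch with scikit-learn's PCA, reducing 100 features to 2 (random data stands in for a real table):

# PCA sketch: project 100-dimensional data down to 2 components for visualization.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))          # 500 samples, 100 features (placeholder data)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("reduced shape:", X_2d.shape)                         # (500, 2)
print("variance explained:", pca.explained_variance_ratio_)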

Example 3: Association Rules

Use Case:

Market basket analysis.

How it works:

• Discover rules like: If a customer buys bread and butter, they are likely to buy jam.
• Helps in cross-selling and product placement.

7. What is the K-Nearest Neighbor algorithm in ML?

A) K-Nearest Neighbor (KNN) is a simple, non-parametric, lazy learning algorithm used for classification and regression. Despite its simplicity, it is very powerful and widely used in pattern recognition and data mining.
Key Concepts of KNN

• Type: supervised learning algorithm
• Usage: classification and regression
• Learning style: instance-based (lazy learning)
• Assumption: similar data points exist in close proximity (distance-based learning)


How KNN Works

1. Choose the number of neighbors (k).
2. Calculate the distance between the test point and all training data points (commonly using Euclidean distance).
3. Sort the distances and identify the k nearest neighbors.
4. Make a prediction:
   o Classification: use the majority vote of the neighbors' labels.
   o Regression: take the average of the neighbors' values.
5. Return the predicted class or value (a short code sketch follows).
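
A minimal Python sketch of these steps with scikit-learn's KNeighborsClassifier on the Iris dataset (k = 3 is an illustrative choice):

# KNN sketch: the distance to the k nearest training points decides the class by majority vote.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3, Euclidean distance by default
knn.fit(X_train, y_train)                   # "lazy": just stores the training data

print("test accuracy:", knn.score(X_test, y_test))
print("predicted class of first test point:", knn.predict(X_test[:1]))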

8. Explain weighted K-Means Algorithm with an example.

A) Weighted K-Means is a clustering algorithm similar to standard K-Means, but it considers weights assigned to data points during clustering. This allows certain points to have more influence over the final cluster centers.

Why Use Weights?

• Some data points are more reliable or important than others.
• Weights can reflect:
   o Sample frequency
   o Confidence levels
   o Importance in business logic

How Weighted K-Means Works

Let’s break it down step by step.

Inputs:

• A dataset of n data points x1, x2, ..., xn
• Each point xi has a weight wi
• The number of clusters k

Steps:

1. Initialize k cluster centroids randomly.
2. Assign each data point to the nearest cluster centroid (as in standard K-Means).
3. Update each cluster centroid using the weighted mean of its assigned points:

   μj = ( Σ_{xi ∈ Cj} wi · xi ) / ( Σ_{xi ∈ Cj} wi )

   Where:
   o Cj is the set of points assigned to cluster j
   o wi is the weight of point xi

4. Repeat steps 2–3 until the centroids no longer change or a maximum number of iterations is reached.

Simple Example

Dataset:

Data Point Value (x) Weight (w)

A 1 1

B 2 2

C 10 1

D 11 1

Suppose we want to cluster into k = 2 clusters.

Step 1: Initialization

Start with centroids (randomly):

• Cluster 1: A, B
• Cluster 2: C, D

Step 2: Compute weighted centroids

For Cluster 1 (A and B):

μ1 = (1·1 + 2·2) / (1 + 2) = 5/3 ≈ 1.67

For Cluster 2 (C and D):

μ2 = (1·10 + 1·11) / (1 + 1) = 21/2 = 10.5

Now reassign points based on the new centroids and repeat if necessary.
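
A small NumPy sketch of the weighted update on this toy dataset (an illustrative implementation, not a library routine; the initial centroids mirror the A,B / C,D split above):

# Weighted K-Means on the toy data: centroids are weighted means of the assigned points.
import numpy as np

x = np.array([1.0, 2.0, 10.0, 11.0])    # points A, B, C, D
w = np.array([1.0, 2.0, 1.0, 1.0])      # weights
centroids = np.array([1.5, 10.5])        # initial centroids for clusters {A, B} and {C, D}

for _ in range(10):
    # assignment step: nearest centroid for every point
    labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
    # update step: weighted mean of the points in each cluster
    centroids = np.array([np.average(x[labels == j], weights=w[labels == j])
                          for j in range(2)])

print(centroids)   # ~ [1.67, 10.5], matching the hand computation above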

9. What are the different ways to combine classifiers?

A) Ensemble learning is like a team of experts working together to solve a problem. By combining their strengths and minimizing their weaknesses, the team can achieve better results than any individual expert.
1. Bagging

Bagging involves training multiple instances of the same classifier on different subsets of the
training data. The final prediction is made by combining the predictions of all the classifiers.
2. Boosting

Boosting involves training multiple classifiers sequentially, with each subsequent classifier
focusing on the mistakes made by the previous one. The final prediction is made by combining
the predictions of all the classifiers.

3. Stacking

Stacking involves training multiple classifiers and then using a meta-classifier to make the final
prediction based on the predictions of the individual classifiers.

4. Voting

Voting involves training multiple classifiers and then combining their predictions by taking a
vote. The class with the most votes is selected as the final prediction.

5. Averaging

Averaging involves training multiple classifiers and then combining their predictions by taking
the average of their output probabilities.
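
A brief Python sketch of voting and averaging with scikit-learn's VotingClassifier (the three base models and the synthetic data are illustrative assumptions; voting='hard' takes a majority vote, voting='soft' averages the predicted probabilities):

# Combining classifiers: majority vote ('hard') vs. averaged probabilities ('soft').
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

estimators = [("lr", LogisticRegression(max_iter=1000)),
              ("rf", RandomForestClassifier(random_state=1)),
              ("nb", GaussianNB())]

hard = VotingClassifier(estimators, voting="hard").fit(X_train, y_train)   # majority vote
soft = VotingClassifier(estimators, voting="soft").fit(X_train, y_train)   # average probabilities

print("hard voting accuracy:", hard.score(X_test, y_test))
print("soft voting accuracy:", soft.score(X_test, y_test))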
