1. Explain Boosting and Bagging.
A) Bagging stands for Bootstrap Aggregating. It builds multiple independent models in
parallel using random samples of the data, and combines their predictions (usually via voting
or averaging).
How it Works:
1. Bootstrap sampling: Create multiple datasets by randomly sampling (with replacement) from the original dataset.
2. Train a model (usually a decision tree) on each dataset.
3. Aggregate predictions:
   a. Classification → Majority vote
   b. Regression → Average
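A minimal sketch of bagging with scikit-learn follows; the synthetic dataset and hyperparameter values are illustrative assumptions, not part of the original example:
# Bagging sketch: by default each base model is a decision tree trained on a
# bootstrap sample, and predictions are combined by voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))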
Advantages:
• Easy to parallelize
• Reduces overfitting
• Handles high variance well
Disadvantages:
• May not improve bias
• Requires more computational resources
Boosting is a sequential ensemble method where each new model tries to correct the errors
made by previous ones.
How it Works:
1. Train the first weak learner on the data.
2. Evaluate performance and assign higher weights to misclassified points.
3. Train the next learner to focus more on the hard examples.
4. Repeat for multiple rounds.
5. Combine predictions by weighted voting or summation.
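For comparison, a minimal boosting sketch using scikit-learn's AdaBoost; the dataset and hyperparameters below are illustrative assumptions:
# AdaBoost sketch: weak learners (shallow trees by default) are added
# sequentially, and each round up-weights the points the previous learners
# misclassified; the final prediction is a weighted vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boosting = AdaBoostClassifier(n_estimators=50, random_state=0)
boosting.fit(X_train, y_train)
print("AdaBoost accuracy:", boosting.score(X_test, y_test))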
Advantages:
• High prediction accuracy
• Works well with imbalanced datasets
• Reduces both bias and variance
Disadvantages:
• Slower to train (sequential)
• Prone to overfitting if not regularized
• Harder to parallelize
2. Build the structure of a decision tree with an example.
A) A decision tree is a supervised learning algorithm used for both classification
and regression. It works by splitting the data into subsets based on the feature
that results in the maximum information gain (for classification) or minimum
variance (for regression).
Components of a decision tree:
• Root Node: The topmost node, representing the feature to split on first.
• Internal Nodes: Nodes where the data is split further.
• Leaf Nodes (Terminal Nodes): Nodes that represent the output class or value.
• Branches: Paths from one node to another based on decisions.
📘 Example Problem: Weather & Play Tennis
Dataset:
Outlook Temperature Humidity Wind Play Tennis
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
🏗️ Decision Tree Structure (Built using ID3 Algorithm)
Using information gain, we find that Outlook is the best feature to split on first.
                [Outlook]
               /    |    \
          Sunny  Overcast  Rain
            |       |        |
       [Humidity]  Yes     [Wind]
        /      \            /    \
     High    Normal      Weak  Strong
      No       Yes        Yes     No
Root Node = Outlook (highest information gain)
• Sunny:
o Best split: Humidity
▪ High → No
▪ Normal → Yes
• Overcast: All samples say Yes
• Rain:
o Best split: Wind
▪ Weak → Yes
▪ Strong → No
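As a rough illustration, the same dataset can be fed to scikit-learn's DecisionTreeClassifier with criterion="entropy" (an information-gain criterion). Note this is a sketch: scikit-learn builds binary splits on one-hot encoded columns, so the printed tree will not look identical to the multi-way ID3 tree above, but it encodes the same decisions:
# Build a play-tennis decision tree with scikit-learn; values as in the table.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Outlook":     ["Sunny","Sunny","Overcast","Rain","Rain","Rain","Overcast",
                    "Sunny","Sunny","Rain","Sunny","Overcast","Overcast","Rain"],
    "Temperature": ["Hot","Hot","Hot","Mild","Cool","Cool","Cool",
                    "Mild","Cool","Mild","Mild","Mild","Hot","Mild"],
    "Humidity":    ["High","High","High","High","Normal","Normal","Normal",
                    "High","Normal","Normal","Normal","High","Normal","High"],
    "Wind":        ["Weak","Strong","Weak","Weak","Weak","Strong","Strong",
                    "Weak","Weak","Weak","Strong","Strong","Weak","Strong"],
    "PlayTennis":  ["No","No","Yes","Yes","Yes","No","Yes",
                    "No","Yes","Yes","Yes","Yes","Yes","No"],
})

# One-hot encode the categorical features, since sklearn trees need numbers.
X = pd.get_dummies(data.drop(columns="PlayTennis"))
y = data["PlayTennis"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))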
3. How does the structure of a decision tree help in classifying a data instance?
A) A decision tree classifies a data instance by using a tree-like model of decisions. Each
internal node represents a test on a feature, each branch represents the outcome of the test,
and each leaf node represents a class label or output value.
How It Works: Classification Process
Steps:
1. Start at the root node of the tree.
2. Check the feature specified at that node in the data instance.
3. Follow the branch that corresponds to the value of that feature.
4. Repeat the process at the next node.
5. When a leaf node is reached, assign the class label at that leaf to the instance.
Example:
Given Data Instance:
{
'Outlook': 'Rain',
'Temperature': 'Cool',
'Humidity': 'Normal',
'Wind': 'Strong'
}
Decision Tree (simplified):
                [Outlook]
               /    |    \
          Sunny  Overcast  Rain
            |       |        |
       [Humidity]  Yes     [Wind]
        /      \            /    \
     High    Normal      Weak  Strong
      No       Yes        Yes     No
Classification Path:
1. Outlook = Rain → Go to the right subtree
2. At node Wind: Wind = Strong → Follow the right branch
3. Reach leaf → Class = No
So, the instance is classified as No (will not play tennis).
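A toy sketch of this traversal in Python, with the tree above hard-coded as nested dictionaries (the representation and function name are illustrative choices, not a standard API):
# An internal node is {feature: {value: subtree, ...}}; a leaf is a label string.
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No",  "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind":     {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(node, instance):
    # Walk from the root: test the node's feature, follow the branch that
    # matches the instance's value, stop when a leaf (plain label) is reached.
    while isinstance(node, dict):
        feature = next(iter(node))
        node = node[feature][instance[feature]]
    return node

instance = {"Outlook": "Rain", "Temperature": "Cool",
            "Humidity": "Normal", "Wind": "Strong"}
print(classify(tree, instance))  # -> "No"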
Why the Tree Structure Helps
• Hierarchical structure: Makes decisions in steps (like a flowchart), simplifying complex problems.
• Feature-based splits: Each decision is based on one feature, making classification interpretable.
• Path to leaf: Each root-to-leaf path represents a logical rule (IF ... THEN ...), providing transparency.
• Leaves hold predictions: The final decision is stored in the leaves, easy to read and apply.
Advantages of Using a Decision Tree for Classification
• Fast Inference: Once built, trees classify in O(depth) time.
• Human-readable rules: Easy to interpret and explain.
• Handles both categorical and numerical features.
• No need for feature scaling (e.g., normalization not required).
4. What is the role of regression model in exploratory data analysis?
A) Exploratory Data Analysis (EDA) is mostly about understanding the structure, patterns, and relationships in data; regression models play a helpful supporting role by uncovering trends and suggesting relationships between variables.
Why Use Regression During EDA?
Regression is not just for prediction — during EDA, it serves as a quantitative tool
to:
• Quantify relationships: Understand how one variable changes in response to another (e.g., sales vs. price).
• Identify important predictors: Spot which variables significantly affect the target.
• Detect trends or patterns: Linear or non-linear trends can emerge from regression lines.
• Expose outliers: Residuals from regression help highlight abnormal data points.
• Check assumptions: Variance, linearity, multicollinearity, and normality can be assessed.
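A small sketch of how this might look during EDA, assuming a toy price-vs-sales dataset (all numbers are invented for illustration):
# Fit a simple linear trend and inspect residuals for abnormal points.
import numpy as np

price = np.array([10, 12, 14, 16, 18, 20, 22, 24])
sales = np.array([95, 90, 84, 80, 74, 40, 65, 60])   # one suspicious point

slope, intercept = np.polyfit(price, sales, 1)        # sales ≈ slope*price + intercept
residuals = sales - (slope * price + intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print("largest residual at index", int(np.argmax(np.abs(residuals))))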
5. Explain about Gaussian Mixture Models.
A) Imagine a dataset of flowers. You have a mix of roses, tulips, and daisies. While they're
all flowers, they have distinct characteristics like color, size, and petal shape. A Gaussian
Mixture Model (GMM) is a statistical model that can identify these underlying groups or
clusters within the data.
How does a GMM work?
1. Assume a Mixture of Gaussians: We assume that our data is generated from a mixture
of multiple Gaussian distributions. Each Gaussian distribution represents a cluster or
group within the data.
2. Estimate Parameters: The GMM model estimates the parameters of each
Gaussian component:
- Mean: The centre of the cluster.
- Covariance matrix: The shape and orientation of the cluster.
- Mixing coefficient: The proportion of data points belonging to that cluster.
3. Iterative Process: The model uses an iterative process called Expectation-
Maximization (EM) to find the best-fitting parameters.
- Expectation Step (E-step): Assigns each data point to a Gaussian component based on
the current parameter estimates.
- Maximization Step (M-step): Updates the parameters of each Gaussian component
to better fit the assigned data points.
4. Clustering: Once the model converges, the data points are assigned to the
Gaussian component with the highest probability.
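A minimal sketch of these steps with scikit-learn's GaussianMixture; the two-blob dataset and the choice of 2 components are illustrative assumptions:
# Fit a 2-component GMM (EM runs under the hood), then read off hard and soft
# cluster assignments.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

hard_labels = gmm.predict(X)         # component with the highest probability
soft_labels = gmm.predict_proba(X)   # per-component membership probabilities
print(gmm.means_)                    # estimated cluster centres
print(soft_labels[:3])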
Why Use GMMs?
• Soft Clustering: Unlike traditional clustering methods such as K-means, GMMs assign each data point a probability of belonging to each cluster.
• Modelling Complex Data: GMMs can model complex data distributions that are not well represented by a single Gaussian distribution.
• Density Estimation: GMMs can be used to estimate the probability density function of the data.
• Anomaly Detection: GMMs can identify outliers or anomalies in data.
6. Explain the concept of Unsupervised Learning and discuss its significance with examples.
A) Unsupervised learning is a type of machine learning where the algorithm is trained on data that does not have labelled responses. The goal is to identify hidden patterns or intrinsic structures in the input data.
Examples of unsupervised learning tasks include clustering, where the goal is to
group similar items together, and association, where the goal is to find rules that
describe large portions of the data.
Significance of Unsupervised Learning
✅ 1. Discover Hidden Patterns
• Reveals natural groupings or structures that are not immediately obvious.
• Helps understand the underlying distribution of the data.
✅ 2. Data Exploration and Preprocessing
• Useful for feature selection, noise reduction, or visualization (e.g., with PCA or t-SNE).
✅ 3. Customer Segmentation
• In marketing, businesses can segment customers into distinct groups without prior
labels.
✅ 4. Anomaly Detection
• Detect unusual patterns or outliers in financial transactions, cybersecurity, etc.
✅ 5. Recommendation Systems
• Find user/item similarity without labeled preferences (e.g., collaborative filtering).
Types of Unsupervised Learning
Clustering
Dimensionality Reduction
Association Rule Mining
Anomaly Detection
Examples of Unsupervised Learning
Example 1: Clustering (K-Means)
Use Case:
Segment customers based on behavior.
How it works:
• Algorithm groups customers into k clusters based on similarity in features (age,
income, purchase history).
• The company can target different groups with tailored marketing strategies.
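A sketch of this segmentation with scikit-learn's KMeans; the feature columns and k = 3 are illustrative assumptions:
# Cluster toy customer profiles (age, income in k$, purchases per year).
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([
    [25, 30, 12], [27, 32, 10], [45, 80, 4],
    [48, 85, 5],  [33, 50, 20], [35, 52, 22],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # segment assigned to each customer
print(kmeans.cluster_centers_)  # average profile of each segment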
Example 2: Dimensionality Reduction (PCA)
Use Case:
Visualize high-dimensional data.
How it works:
• PCA reduces 100 features down to 2 or 3 for plotting and exploration.
• Used in image compression, gene expression analysis, etc.
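A quick sketch of the dimensionality reduction step; the random 100-feature dataset is a stand-in for real high-dimensional data:
# Project 100 features down to 2 principal components for plotting.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))          # 200 samples, 100 features

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)              # now 200 x 2, ready to scatter-plot
print(X_2d.shape, pca.explained_variance_ratio_)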
Example 3: Association Rules
Use Case:
Market basket analysis.
How it works:
• Discover rules like: If a customer buys bread and butter, they are likely to buy jam.
• Helps in cross-selling and product placement.
7. What is the K-Nearest Neighbor algorithm in ML?
A) K-Nearest Neighbor (KNN) is a simple, non-parametric, and lazy learning algorithm
used for classification and regression. Despite its simplicity, it is very powerful and
widely used in pattern recognition and data mining.
Key Concepts of KNN
• Type: Supervised learning algorithm
• Usage: Classification and regression
• Learning style: Instance-based (lazy learning)
• Assumption: Similar data points exist in close proximity (distance-based learning)
How KNN Works
1. Choose the number of neighbors (k).
2. Calculate the distance between the test point and all training data points (commonly
using Euclidean distance).
3. Sort the distances and identify the k nearest neighbors.
4. Make a prediction:
o Classification: Use majority vote of the neighbors' labels.
o Regression: Take the average of the neighbors' values.
5. Return the predicted class or value.
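A from-scratch sketch of these steps for classification; the tiny 2-D dataset and k = 3 are illustrative:
# Manual KNN: compute distances, take the k nearest, and vote on their labels.
import numpy as np
from collections import Counter

X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 5], [7, 6], [6, 6]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

def knn_predict(x, k=3):
    distances = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(distances)[:k]               # indices of the k closest points
    votes = Counter(y_train[nearest])                 # majority vote of their labels
    return votes.most_common(1)[0][0]

print(knn_predict(np.array([2, 2])))   # -> "A"
print(knn_predict(np.array([6, 5])))   # -> "B"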
8. Explain weighted K-Means Algorithm with an example.
A) Weighted K-Means is a clustering algorithm similar to standard K-Means, but it considers
weights assigned to data points during clustering. This allows certain points to have more
influence over the final cluster centers.
Why Use Weights?
• Some data points are more reliable or important than others.
• Weights can reflect:
o Sample frequency
o Confidence levels
o Importance in business logic
How Weighted K-Means Works
Let’s break it down step by step.
Inputs:
• A dataset of n data points x1, x2, ..., xn
• Each point has a weight wi
• Number of clusters k
Steps:
1. Initialize k cluster centroids randomly.
2. Assign each data point to the nearest cluster centroid (like standard K-Means).
3. Update the cluster centroids using weighted means:
μj = ( ∑_{xi ∈ Cj} wi · xi ) / ( ∑_{xi ∈ Cj} wi )
Where:
o Cj is the set of points assigned to cluster j
o wi is the weight of point xi
4. Repeat steps 2–3 until centroids no longer change or a max number of iterations is
reached.
Simple Example
Dataset:
Data Point Value (x) Weight (w)
A 1 1
B 2 2
C 10 1
D 11 1
Suppose we want to cluster into k = 2 clusters.
Step 1: Initialization
Start with a random initial assignment of points to clusters:
• Cluster 1: A, B
• Cluster 2: C, D
Step 2: Compute weighted centroids
For Cluster 1 (A and B):
μ1 = (1·1 + 2·2) / (1 + 2) = 5/3 ≈ 1.67
For Cluster 2 (C and D):
μ2 = (1·10 + 1·11) / (1 + 1) = 21/2 = 10.5
Now reassign points based on the new centroids and repeat if necessary.
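The weighted centroid update above can be reproduced with a few lines of numpy; the values and weights are those from the table:
# Weighted centroid update: sum(w*x)/sum(w) within each cluster.
import numpy as np

x = np.array([1.0, 2.0, 10.0, 11.0])   # points A, B, C, D
w = np.array([1.0, 2.0, 1.0, 1.0])     # their weights
assign = np.array([0, 0, 1, 1])        # current cluster of each point

for j in (0, 1):
    mask = assign == j
    mu = np.average(x[mask], weights=w[mask])
    print(f"cluster {j} centroid: {mu:.2f}")   # -> 1.67 and 10.50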
9. What are the different ways to combine classifiers?
A) Ensemble learning is like a team of experts working together to solve a
problem. By combining their strengths and minimizing their weaknesses, the
team can achieve better results than any individual expert.
1. Bagging
Bagging involves training multiple instances of the same classifier on different subsets of the
training data. The final prediction is made by combining the predictions of all the classifiers.
2. Boosting
Boosting involves training multiple classifiers sequentially, with each subsequent classifier
focusing on the mistakes made by the previous one. The final prediction is made by combining
the predictions of all the classifiers.
3. Stacking
Stacking involves training multiple classifiers and then using a meta-classifier to make the final
prediction based on the predictions of the individual classifiers.
4. Voting
Voting involves training multiple classifiers and then combining their predictions by taking a
vote. The class with the most votes is selected as the final prediction.
5. Averaging
Averaging involves training multiple classifiers and then combining their predictions by taking
the average of their output probabilities.
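A short sketch of combining classifiers by voting and stacking with scikit-learn; the choice of base models and the synthetic dataset are illustrative assumptions:
# Soft voting averages the base models' predicted probabilities; stacking
# trains a meta-classifier on the base models' predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
base = [("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("lr", LogisticRegression(max_iter=1000))]

voter = VotingClassifier(estimators=base, voting="soft").fit(X, y)
stacker = StackingClassifier(estimators=base,
                             final_estimator=LogisticRegression()).fit(X, y)
print(voter.score(X, y), stacker.score(X, y))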