Pre-T1 Assignment 1
Topics: Re-sampling methods: Bias–Variance Trade-off; Hypothesis Testing and Variable
Selection; Subsampling and Upsampling, SMOTE; Cross Validation (validation set, Leave-One-Out
(LOO), k-fold strategies) and the bootstrap; Evaluation measures: error functions, Confusion
Matrix, Accuracy, Precision and Recall, F1 Score.
1. Briefly differentiate between bias and variance in machine learning. How do they relate to
the model's capacity to generalize?
2. Discuss the impact of bias and variance on the performance of machine learning models,
emphasizing their role in the trade-off for supervised learning. Illustrate your explanation
with a real-world example showcasing scenarios of underfitting and overfitting.
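For reference in questions 1, 2, and 10, the standard decomposition of expected squared
prediction error is

    E[(y - \hat{f}(x))^2] = (E[\hat{f}(x)] - f(x))^2 + E[(\hat{f}(x) - E[\hat{f}(x)])^2] + \sigma^2
                          = Bias^2 + Variance + Irreducible Error

High bias corresponds to underfitting (the model is too simple to capture the signal); high
variance corresponds to overfitting (the model tracks noise in the training set).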
3. Define subsampling, oversampling, and Synthetic Minority Over-sampling Technique
(SMOTE) in the context of addressing imbalanced datasets.
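For reference, a minimal SMOTE sketch, assuming the imbalanced-learn package (its SMOTE
class and fit_resample API) is available; the dataset below is a synthetic stand-in:

    from collections import Counter
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    # A deliberately imbalanced toy dataset (roughly 95% / 5%).
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
    print("before:", Counter(y))

    # SMOTE synthesizes new minority examples by interpolating between a
    # minority point and one of its k nearest minority-class neighbours.
    X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
    print("after: ", Counter(y_res))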
4. What is cross-validation? Briefly explain Stratified k-Fold Cross-Validation and Time
Series Cross-Validation.
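A short scikit-learn sketch of both strategies (toy data; the fold counts are arbitrary
choices, not requirements):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, TimeSeriesSplit, cross_val_score

    X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)
    model = LogisticRegression(max_iter=1000)

    # Stratified k-fold: every fold preserves the overall class ratio.
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    print(cross_val_score(model, X, y, cv=skf))

    # Time Series split: training indices always precede test indices,
    # so no "future" observations leak into training.
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
        print(f"train [0..{train_idx[-1]}] -> test [{test_idx[0]}..{test_idx[-1]}]")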
5. What is bootstrap resampling? Explain its purpose and briefly describe its working
mechanism.
6. Discuss the advantages of using bootstrap resampling for estimating the confidence
intervals of a machine learning model's performance metrics. Why is this technique
particularly useful for small datasets?
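A minimal NumPy sketch covering questions 5 and 6 together: resample a (hypothetical) test
set with replacement many times and read a percentile confidence interval off the bootstrap
distribution:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical per-example correctness (1 = correct) of some model
    # on a small 20-point test set.
    correct = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1])

    # Bootstrap: draw samples of the same size, with replacement, and
    # recompute the metric on each resample.
    boot = np.array([rng.choice(correct, size=correct.size, replace=True).mean()
                     for _ in range(10_000)])

    # 95% percentile confidence interval.
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"accuracy = {correct.mean():.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")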
7. What do you understand by overfitting and underfitting of a machine learning model?
Provide concise definitions with examples.
8. Define the terms: True Positive (TP), True Negative (TN), False Positive (FP), and False
Negative (FN).
9. What is a confusion matrix? Why is it used?
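For questions 8 and 9 (and the calculations in questions 14 and 15), the four counts and the
metrics derived from them fit in a few lines of plain Python:

    def metrics(tp, tn, fp, fn):
        accuracy  = (tp + tn) / (tp + tn + fp + fn)
        precision = tp / (tp + fp)   # of all predicted positives, how many are real
        recall    = tp / (tp + fn)   # of all actual positives, how many are found
        f1        = 2 * precision * recall / (precision + recall)
        return accuracy, precision, recall, f1

    # e.g. the matrix in question 14: TP = 80, TN = 170, FP = 30, FN = 20
    print(metrics(80, 170, 30, 20))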
10. A supervised learning model gives the following error components for a dataset:
Bias = 0.6, Variance = 0.4, Irreducible Error = 0.2.
What is the total error of the model? If the bias then decreases to 0.5 while the variance
increases to 0.6, calculate the new total error. How will these changes affect model
performance? Suggest approaches to reduce the overall error, considering both high
variance and high bias scenarios.
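The arithmetic in question 10 depends on a convention worth noting: in the decomposition
given after question 2, bias enters squared, but figures like these are often quoted as the
already-additive components. Both readings, as a quick check:

    # Reading 1: the figures are the additive components themselves.
    print(f"{0.6 + 0.4 + 0.2:.2f}")     # 1.20 (original)
    print(f"{0.5 + 0.6 + 0.2:.2f}")     # 1.30 (after the change)

    # Reading 2: 0.6 and 0.5 are raw bias values, squared in the decomposition.
    print(f"{0.6**2 + 0.4 + 0.2:.2f}")  # 0.96 (original)
    print(f"{0.5**2 + 0.6 + 0.2:.2f}")  # 1.05 (after the change)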
11. A healthcare organization is developing a machine learning model to predict rare diseases
using patient records. The dataset contains 500,000 patient records, but only 2,500
records are labelled as patients diagnosed with the rare disease. Explain why the
imbalance in the dataset poses a significant problem for training a machine learning model
in this rare disease prediction scenario. Describe how SMOTE can be applied to balance
the dataset, providing a step-by-step explanation.
12. A data analyst is building a predictive model for stock price movements using a dataset
with 50,000 records. The goal is to fine-tune the model’s hyper-parameters while
considering the sequential nature of the data to avoid data leakage. Recommend an
appropriate cross-validation strategy and justify your choice. Discuss the pros and cons of
using Time Series Cross-Validation for this scenario, focusing on computational efficiency,
variance of performance estimates, and its suitability for time-dependent data.
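A sketch of the setup question 12 points toward: hyper-parameter search with
forward-chaining time-series folds (the estimator, grid, and synthetic stand-in data below
are illustrative assumptions, not the only valid choices):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))        # stand-in for time-ordered features
    y = X[:, 0] + rng.normal(size=500)   # stand-in target

    # Each fold trains only on data that precedes its test window,
    # which is what prevents look-ahead leakage.
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid={"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
        cv=TimeSeriesSplit(n_splits=5),
        scoring="neg_mean_squared_error",
    )
    search.fit(X, y)
    print(search.best_params_)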
13. A machine learning model is built to predict student performance in exams. The model
shows high accuracy on the training dataset but performs poorly on the test dataset.
Identify whether the model suffers from high bias or high variance. What steps would you
take to address the issue and improve test accuracy? Justify your suggestions.
14. A healthcare organization is using a machine learning model to predict whether a patient
has a specific disease (Positive) or not (Negative). The following confusion matrix is
obtained after testing the model on a dataset:
                        Predicted Positive    Predicted Negative
    Actual Positive             80                    20
    Actual Negative             30                   170
Based on the given confusion matrix, explain the meaning of each value (80, 20, 30, and
170) in the context of the healthcare diagnosis system. Calculate the model's accuracy,
precision, recall, and F1 Score. If the model's primary goal is to minimize False Negatives
(e.g., to avoid missing disease cases), discuss whether this model performs well. Suggest
ways to improve the model if necessary.
15. A company has developed a spam detection model. On evaluating the model with a test
dataset, the following confusion matrix is obtained:
                        Predicted Spam    Predicted Not Spam
    Actual Spam              120                  30
    Actual Not Spam           50                 300
Interpret the meaning of the values 120, 30, 50, and 300 in the context of spam detection.
Compute the accuracy, precision, recall, and F1 Score of the spam detection model. If the
company prioritizes minimizing False Positives (to ensure legitimate emails are not marked
as spam), assess whether the model is suitable. Provide recommendations to adjust the
model if improvements are needed.