BTECH III YEAR I SEMESTER (AR20)
Name of the Course : Machine Learning
Course Code : 20AI5023
Name of the Course Coordinator : Dr K V Satyanarayana
QUESTION BANK
S.No   QUESTIONS   [Level, Course Outcome, Marks]
UNIT-1
Introduction: Brief Introduction to Machine Learning, Abstraction and Knowledge
Representation, Types of Machine Learning Algorithms. Definition of learning systems, Goals and
applications of machine learning, Aspects of developing a learning system: Data Types, training data,
concept representation, function approximation.
Data Pre-processing: Definition, Steps involved in pre-processing, Techniques
1. Define machine learning, explain its applications, and state the primary goal of machine learning. [Remembering (L1), CO1, 4M]
2. Explain the types of machine learning algorithms and state the goals of machine learning. [Remembering (L1), CO1, 4M]
3. Define data pre-processing techniques in the context of machine learning. [Remembering (L1), CO1, 4M]
4. What are the aspects of developing a learning system? [Remembering (L1), CO1, 4M]
5. Explain the concept of knowledge representation in machine learning. Provide an example. [Understanding (L2), CO1, 7M]
6. Describe the role of training data in the machine learning process. Why is it essential? [Understanding (L2), CO1, 7M]
7. Differentiate between supervised and unsupervised machine learning algorithms. Give examples of each. [Understanding (L2), CO1, 7M]
8. How does data pre-processing contribute to the success of a machine learning model? Provide specific techniques and their significance. [Understanding (L2), CO1, 7M]
9. Compare and contrast supervised and unsupervised machine learning algorithms in terms of their goals, data requirements, and typical use cases. [Applying/Analyzing (L3/L4), CO1, 10M]
10. Evaluate the impact of the quality of training data on the performance of a machine learning model. [Applying/Analyzing (L3/L4), CO1, 10M]
11. Analyze the trade-offs between various data pre-processing techniques. [Applying/Analyzing (L3/L4), CO1, 10M]
12. Apply data pre-processing techniques to a given dataset. [Applying/Analyzing (L3/L4), CO1, 10M]
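As an illustration of the kind of answer the applying-level questions above call for (e.g. applying pre-processing techniques to a given dataset), the sketch below shows two standard pre-processing steps, mean imputation of missing values and min-max scaling, in plain Python. The ages feature and its values are invented for illustration; this is a minimal sketch, not a prescribed solution.

```python
# Minimal pre-processing sketch: mean imputation followed by min-max scaling.
# The feature values and the use of None for missing entries are illustrative.

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, None, 40, 60]          # toy feature with one missing value
clean = impute_mean(ages)          # [20, 40.0, 40, 60]
scaled = min_max_scale(clean)      # [0.0, 0.5, 0.5, 1.0]
print(scaled)
```

The same two steps are what library pipelines (e.g. an imputer followed by a scaler) automate on real datasets.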
UNIT-2
Performance measurement of models: Accuracy, Confusion matrix, TPR, FPR, FNR, TNR,
Precision & recall, F1-score, Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC).
Supervised Learning 1: Linear Regression, Multiple Variable Linear Regression, Naive
Bayes Classifiers, Gradient Descent, Multicollinearity, Bias-Variance trade-off.
1. Define accuracy as a performance measure for machine learning models. [Remembering (L1), CO2, 4M]
2. Explain how accuracy is calculated and its limitations. [Remembering (L1), CO2, 4M]
3. Define a confusion matrix and discuss its components, including true positive rate (TPR), false positive rate (FPR), false negative rate (FNR), and true negative rate (TNR). [Remembering (L1), CO2, 4M]
4. Define precision and recall as performance measures for classification models. [Remembering (L1), CO2, 4M]
5. Explain the concept of the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) metric. [Understanding (L2), CO2, 7M]
6. Discuss the concept of linear regression and its application in supervised learning. [Understanding (L2), CO2, 7M]
7. Explain the concept of multiple variable linear regression and its advantages over simple linear regression. [Understanding (L2), CO2, 7M]
8. Describe the Naive Bayes classifier and its underlying assumptions. [Understanding (L2), CO2, 7M]
9. Discuss the challenges of multicollinearity and how it can impact a regression model. [Applying/Analyzing (L3/L4), CO2, 10M]
10. Apply linear regression to a real-world dataset. Describe the dataset, discuss the independent and dependent variables, and explain the process of building a linear regression model. [Applying/Analyzing (L3/L4), CO2, 10M]
11. Discuss the performance of a fitted regression model and analyze the impact of multicollinearity on its accuracy. [Applying/Analyzing (L3/L4), CO2, 10M]
12. Design a classification model using the Naive Bayes algorithm for a specific problem. [Applying/Analyzing (L3/L4), CO2, 10M]
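The confusion-matrix metrics asked about in this unit (accuracy, precision, recall/TPR, F1-score) can be made concrete with a short sketch. The label vectors below are invented purely for illustration; this is a minimal sketch of the definitions, not a prescribed solution.

```python
# Compute the Unit-2 metrics from a binary confusion matrix.

def confusion(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 1, 0]        # invented ground-truth labels
y_pred = [1, 0, 0, 1, 1, 0]        # invented model predictions
tp, fp, fn, tn = confusion(y_true, y_pred)      # (2, 1, 1, 2)

accuracy  = (tp + tn) / (tp + fp + fn + tn)     # 4/6
precision = tp / (tp + fp)                      # 2/3
recall    = tp / (tp + fn)                      # this is the TPR, 2/3
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)
```

Sweeping a decision threshold over predicted scores and plotting TPR against FPR at each threshold gives the ROC curve; the AUC is the area under that plot.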
UNIT-3
Supervised Learning 2: Regularization, Logistic Regression, Squashing function, KNN, Support
Vector Machine.
Decision Tree Learning: Representing concepts as decision trees, entropy and recursive
induction of decision trees, picking the best splitting attribute: information gain, searching for
simple trees and computational complexity, Occam's razor, overfitting, noisy data, and
pruning. Decision Trees – ID3 – CART – Error bounds.
1. Discuss the concept of regularization in supervised learning and its role in preventing overfitting. [Remembering (L1), CO3, 4M]
2. Explain how regularization helps in improving the generalization of a model. [Remembering (L1), CO3, 4M]
3. Define logistic regression and explain its use in binary classification problems. [Remembering (L1), CO3, 4M]
4. Explain the K-nearest neighbors (KNN) algorithm and its application in supervised learning. [Remembering (L1), CO3, 4M]
5. Define support vector machines (SVM) and discuss their use in supervised learning. [Understanding (L2), CO3, 7M]
6. Explain the role of entropy in decision tree learning and how it is used to determine the best splitting attribute. [Understanding (L2), CO3, 7M]
7. Discuss the concept of decision tree learning and its representation of concepts as decision trees. [Understanding (L2), CO3, 7M]
8. Explain the process of recursive induction of decision trees and discuss the criteria for picking the best splitting attribute, including information gain and searching for simple trees. [Understanding (L2), CO3, 7M]
9. Discuss the challenges of overfitting and noisy data in decision tree learning. [Applying/Analyzing (L3/L4), CO3, 10M]
10. Apply logistic regression to a binary classification problem using a real-world dataset. [Applying/Analyzing (L3/L4), CO3, 10M]
11. Design a decision tree learning algorithm using the ID3 or CART algorithm for a specific problem. [Applying/Analyzing (L3/L4), CO3, 10M]
12. Explain how Occam's razor and pruning techniques can help address overfitting and noisy data in decision tree learning. [Applying/Analyzing (L3/L4), CO3, 10M]
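Entropy and information gain, the splitting criterion this unit names for ID3-style trees, can be sketched in a few lines. The tiny label sets below are invented for illustration; a real induction algorithm would evaluate the gain of every candidate attribute and recurse on the best one.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(parent, children):
    """Entropy of the parent node minus the weighted entropy of its splits."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = ['yes', 'yes', 'no', 'no']        # entropy = 1.0 bit
split = [['yes', 'yes'], ['no', 'no']]     # a perfectly pure split
print(information_gain(parent, split))     # → 1.0
```

A pure split recovers the full 1 bit of parent entropy; an uninformative split (each child half 'yes', half 'no') would have gain 0.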
UNIT-4
Unsupervised Learning: K-Means, Customer Segmentation, Hierarchical clustering, DBSCAN,
Anomaly Detection, Local Outlier Factor, Isolation Forest, Dimensionality Reduction, PCA, GMM,
Expectation Maximization.
1. Define unsupervised learning and provide examples of its applications in real-world scenarios. [Remembering (L1), CO4, 4M]
2. Explain the K-means algorithm and its use in clustering data. [Remembering (L1), CO4, 4M]
3. Define hierarchical clustering and discuss its advantages and disadvantages compared to K-means clustering. [Remembering (L1), CO4, 4M]
4. Explain the concept of anomaly detection and discuss the Local Outlier Factor (LOF) and Isolation Forest algorithms used for anomaly detection. [Remembering (L1), CO4, 4M]
5. Discuss the DBSCAN algorithm and its use in density-based clustering. [Understanding (L2), CO4, 7M]
6. Explain the concept of core points, border points, and noise points in DBSCAN. [Understanding (L2), CO4, 7M]
7. Explain the concept of dimensionality reduction and its importance in unsupervised learning. [Understanding (L2), CO4, 7M]
8. Discuss the Principal Component Analysis (PCA) algorithm and its use in dimensionality reduction. [Understanding (L2), CO4, 7M]
9. Discuss the Gaussian Mixture Model (GMM) and the Expectation-Maximization (EM) algorithm used in unsupervised learning. [Applying/Analyzing (L3/L4), CO4, 10M]
10. Apply the K-means algorithm to a real-world dataset for customer segmentation. [Applying/Analyzing (L3/L4), CO4, 10M]
11. Design an anomaly detection system using the Local Outlier Factor (LOF) algorithm for a specific problem. [Applying/Analyzing (L3/L4), CO4, 10M]
12. Discuss the steps involved in detecting anomalies, including data preprocessing and model training. [Applying/Analyzing (L3/L4), CO4, 10M]
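For the K-means and customer-segmentation questions above, the sketch below implements plain Lloyd's algorithm on a 1-D toy "spend" feature. The data, the feature name, and the use of one dimension are all invented for illustration; real segmentation would run on multi-dimensional customer features, typically via a library implementation.

```python
import random

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's algorithm on 1-D points; returns sorted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialize from the data
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious 1-D groups: low-spend vs high-spend customers (invented data).
spend = [1.0, 1.2, 0.8, 9.0, 9.5, 8.8]
print(kmeans(spend, 2))   # two centroids, one near 1.0 and one near 9.1
```

The two alternating steps (assign, then update) are exactly the structure the exam answer should describe; DBSCAN and GMM/EM replace them with density reachability and soft responsibilities, respectively.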
UNIT 5
Ensemble Models: Ensemble Definition, Bootstrapped Aggregation (Bagging) Intuition, Random
Forests and their construction, Extremely Randomized Trees, Gradient Boosting, Regularization by
Shrinkage, XGBoost, AdaBoost.
1. Define ensemble learning and explain its significance in machine learning. [Remembering (L1), CO5, 4M]
2. Explain the concept of bootstrapped aggregation (bagging) and its role in ensemble models. [Remembering (L1), CO5, 4M]
3. Discuss the construction of random forests and their advantages over individual decision trees. [Remembering (L1), CO5, 4M]
4. Define extremely randomized trees and discuss their characteristics and benefits. [Remembering (L1), CO5, 4M]
5. Discuss the concept of gradient boosting and its use in ensemble models. [Understanding (L2), CO5, 7M]
6. Explain the concept of regularization by shrinkage in ensemble models. [Understanding (L2), CO5, 7M]
7. Discuss the XGBoost algorithm and its significance in ensemble learning. [Understanding (L2), CO5, 7M]
8. Explain the features and advantages of XGBoost over other gradient boosting algorithms. [Understanding (L2), CO5, 7M]
9. Design a random forest model for a specific problem using a real-world dataset. [Applying/Analyzing (L3/L4), CO5, 10M]
10. Analyze the performance of a random forest model and discuss the role of feature importance measures. [Applying/Analyzing (L3/L4), CO5, 10M]
11. Apply the AdaBoost algorithm to a binary classification problem using a dataset. [Applying/Analyzing (L3/L4), CO5, 10M]
12. Discuss how regularization helps in reducing overfitting in ensemble models. [Applying/Analyzing (L3/L4), CO5, 10M]
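The bagging intuition behind these questions can be sketched end to end: draw bootstrap samples of a toy labelled dataset, fit a trivial one-rule "stump" to each sample, and combine the stumps by majority vote. The dataset, the stump rule (threshold at the sample mean), and the ensemble size are all invented for illustration; a random forest additionally randomizes the features each tree may split on.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Sample len(data) labelled points with replacement."""
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """A one-rule learner: threshold at the mean x, predict class 1 above it."""
    thr = sum(x for x, _ in sample) / len(sample)
    return lambda x: 1 if x > thr else 0

def bagged_predict(stumps, x):
    """Majority vote over the ensemble."""
    votes = Counter(s(x) for s in stumps)
    return votes.most_common(1)[0][0]

rng = random.Random(42)
# Invented (x, label) pairs: class 0 at low x, class 1 at high x.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
stumps = [fit_stump(bootstrap(data, rng)) for _ in range(25)]
print(bagged_predict(stumps, 0.7), bagged_predict(stumps, 9.5))
```

Each stump sees a slightly different resample, so its threshold varies; averaging the votes reduces the variance of any single stump, which is the core bagging argument the 4M question expects.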
L1: Remembering
L2: Understanding
L3: Applying
L4: Analyzing
Course Outcomes:
At the end of the Course, Student will be able to:
CO-1 Understand the basic concepts of machine learning techniques and data
preprocessing techniques.
CO-2 Understand performance evaluation metrics and supervised learning
techniques.
CO-3 Solve problems using decision trees and supervised learning methods.
CO-4 Apply the unsupervised learning techniques on data.
CO-5 Apply the ensemble models on data.
Text Books:
1 Machine Learning, Tom M. Mitchell, McGraw-Hill.
2 Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, Prentice Hall of
India, Third Edition, 2014.
3 The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani & Jerome
Friedman, Springer-Verlag, 2001.
Reference Books:
1 Machine Learning, Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das,
Pearson, 2019.
2 Stephen Marsland, “Machine Learning -An Algorithmic Perspective”, Second
Edition, Chapman and Hall/CRC Machine Learning and Pattern Recognition
Series, 2014.
3 Application of machine learning in industries (IBM ICE Publications).