
Answer Key - MCS1009 Machine Learning

PART A (10 x 2 = 20 Marks)


 QA101 (2 Marks):
Overfitting occurs when a model learns the training data too well, including noise and
outliers, causing poor generalization to new data. It can be reduced using techniques
like cross-validation, regularization, pruning, or simplifying the model.
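A minimal sketch (assuming scikit-learn and NumPy; the noisy toy dataset is made up for illustration) showing how cross-validation can compare an unregularized model against an L2-regularized one to reduce overfitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Toy regression data with many features and noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] + 0.1 * rng.normal(size=100)

# Unregularized model vs. L2-regularized (ridge) model,
# compared via 5-fold cross-validation R^2 scores.
plain = cross_val_score(LinearRegression(), X, y, cv=5).mean()
ridge = cross_val_score(Ridge(alpha=1.0), X, y, cv=5).mean()
print(f"LinearRegression CV R^2: {plain:.3f}")
print(f"Ridge (alpha=1.0) CV R^2: {ridge:.3f}")
```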

 QA102 (2 Marks):
Conditional probability is the probability of an event occurring given that another event
has already occurred. Formally, P(A|B) = P(A ∩ B) / P(B).
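A small worked example (all counts are hypothetical) applying the formula directly:

```python
# Out of 100 emails, 30 contain the word "offer" (event B)
# and 12 of those 30 are spam (A ∩ B).
p_b = 30 / 100                    # P(B)
p_a_and_b = 12 / 100              # P(A ∩ B)
p_a_given_b = p_a_and_b / p_b     # P(A|B) = P(A ∩ B) / P(B)
print(p_a_given_b)                # 0.4
```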

 QA103 (2 Marks):
Supervised learning uses labeled data to predict outcomes, while unsupervised learning
finds hidden patterns in unlabeled data.

 QA104 (2 Marks):
Entropy measures the impurity or randomness in data. It is used to compute
information gain, e.g., when choosing decision-tree splits.
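A minimal sketch (assuming NumPy; the parent node and split are hypothetical) computing entropy and the information gain of a candidate split:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Hypothetical parent node and a candidate binary split
parent = np.array([1, 1, 1, 0, 0, 0, 0, 0])
left, right = parent[:4], parent[4:]

gain = entropy(parent) - (len(left) / len(parent) * entropy(left)
                          + len(right) / len(parent) * entropy(right))
print(f"information gain: {gain:.3f}")
```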

 QA105 (2 Marks):
Advantages: (1) Automates decision-making, (2) Learns from data. Challenges: (1)
Requires large amounts of data, (2) Risk of bias.

 QA106 (2 Marks):
Precision = TP / (TP + FP), Recall = TP / (TP + FN). These are important in evaluating
classification performance, especially for imbalanced datasets.
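A minimal sketch (assuming scikit-learn; the labels are a hypothetical imbalanced example) computing both metrics from predictions:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/5
print("recall:   ", recall_score(y_true, y_pred))      # TP / (TP + FN) = 3/4
```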

 QA107 (2 Marks):
Euclidean distance between points x and y is √Σ(xᵢ − yᵢ)²; it measures the closeness
of data points in clustering and in classification tasks like KNN.
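A minimal sketch (assuming NumPy; the training points and query are hypothetical) using Euclidean distance to find a nearest neighbour:

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two points."""
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

# Nearest neighbour of a query point among hypothetical training points
points = np.array([[1.0, 2.0], [4.0, 6.0], [5.0, 1.0]])
query = np.array([2.0, 3.0])
nearest = min(range(len(points)), key=lambda i: euclidean(points[i], query))
print(nearest, euclidean(points[nearest], query))
```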

 QA108 (2 Marks):
Z-score standardizes data: z = (x − μ) / σ, i.e., subtract the mean and divide by the
standard deviation. Points with large |z| (e.g., above 3) are often flagged as outliers.
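A minimal sketch (assuming NumPy; the feature values and the |z| > 2 threshold are illustrative choices) standardizing a column and flagging outliers:

```python
import numpy as np

# Standardize a hypothetical feature column and flag outliers by z-score
x = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 45.0])
z = (x - x.mean()) / x.std()        # z = (x - mean) / std
outliers = x[np.abs(z) > 2]         # common rule of thumb: |z| > 2 or 3
print(z.round(2))
print("outliers:", outliers)
```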

 QA201 (2 Marks):
Discriminative models learn the boundary between classes, e.g., Logistic Regression.
Generative models learn the joint distribution of inputs and labels, e.g., Naïve Bayes.

 QA202 (2 Marks):
The objective of Linear Regression is to model the relationship between a dependent
variable and one or more independent variables by fitting a linear function that
minimizes the sum of squared errors between predictions and observed values.
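A minimal sketch (assuming scikit-learn; the one-dimensional data are hypothetical) fitting a least-squares line:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y is roughly 2x + 1 plus noise
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression().fit(X, y)   # minimizes the sum of squared errors
print("slope:", model.coef_[0], "intercept:", model.intercept_)
```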

PART B (5 x 13 = 65 Marks)
 QB101 (a) (13 Marks):
Bayes’ Theorem:
P(A|B) = [P(B|A) * P(A)] / P(B)
In ML, it's used in probabilistic classifiers like Naïve Bayes.
Example: Email spam filtering based on word probabilities.
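A minimal sketch of Bayes' Theorem applied to the spam example with a single word "offer" (all probabilities are made up for illustration):

```python
# P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam = 0.3                    # prior probability of spam
p_word_given_spam = 0.6         # "offer" appears in 60% of spam
p_word_given_ham = 0.05         # ... and in 5% of non-spam
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))   # ≈ 0.837
```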

 QB102 (b) (13 Marks):
Overfitting: Model fits training data too closely, including noise.
Underfitting: Model too simple to capture underlying patterns.
Bias-Variance Trade-off: total error decomposes into bias², variance, and irreducible noise.
- High bias: underfitting
- High variance: overfitting
Solutions: Cross-validation, pruning, regularization, simpler models.
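A minimal sketch (assuming scikit-learn; the sine-plus-noise data and the chosen polynomial degrees are illustrative) contrasting an underfit and an overfit model via cross-validation:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + 0.2 * rng.normal(size=40)

for degree in (1, 4, 15):   # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"degree {degree:2d}: mean CV R^2 = {score:.3f}")
```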

 QB103 (b) (13 Marks):
Steps in Naïve Bayes Classification:
1. Preprocess dataset (tokenization, cleaning).
2. Estimate probabilities P(feature|class).
3. Apply Bayes’ Theorem: P(class|features).
4. Select class with highest probability.
5. Evaluate model (accuracy, confusion matrix).
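A minimal end-to-end sketch of these steps (assuming scikit-learn; the tiny corpus and labels are hypothetical):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix

texts = ["free offer now", "meeting at noon", "win a free prize", "project status update"]
labels = [1, 0, 1, 0]        # 1 = spam, 0 = not spam

# Steps 1-2: preprocess/tokenize and estimate P(feature|class) by fitting
vec = CountVectorizer()
X = vec.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)

# Steps 3-4: apply Bayes' Theorem and pick the most probable class
test = vec.transform(["free prize meeting"])
print(clf.predict(test), clf.predict_proba(test))

# Step 5: evaluate (here on the training set, for illustration only)
pred = clf.predict(X)
print(accuracy_score(labels, pred))
print(confusion_matrix(labels, pred))
```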

 QB104 (a) (13 Marks):
Sensitivity (Recall): TP / (TP + FN)
Specificity: TN / (TN + FP)
Confusion Matrix: Summarizes TP, TN, FP, FN.
Handling Imbalanced Data: use resampling (random oversampling/undersampling, SMOTE)
and evaluate with F1-score or AUC-ROC rather than plain accuracy.
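A minimal sketch of this approach (assuming scikit-learn and the imbalanced-learn package for SMOTE; the synthetic 95/5 dataset is made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Hypothetical imbalanced dataset (95% majority class, 5% minority class)
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority class on the training split only
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)

pred = clf.predict(X_te)
print("F1:", f1_score(y_te, pred))
print(confusion_matrix(y_te, pred))
```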

 QB201 (a) (13 Marks):
Discriminative models: Learn boundaries (e.g., Logistic Regression, SVM).
Generative models: Learn joint probability (e.g., Naïve Bayes, HMM).
Use Case: Naïve Bayes for text classification; SVM for margin-based classification.
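A minimal sketch (assuming scikit-learn; the synthetic dataset is hypothetical) fitting one generative and one discriminative classifier to the same data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB           # generative: models P(x|y) and P(y)
from sklearn.linear_model import LogisticRegression  # discriminative: models P(y|x) directly

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (GaussianNB(), LogisticRegression(max_iter=1000)):
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(model).__name__, round(acc, 3))
```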
