Q1.
As an ML engineer, you have been assigned to predict the academic performance of a student and their
likelihood of getting placed at company X. Answer the following in detail:
• Discuss the three main types of machine learning techniques. Create a dataset with three instances to
predict student placement at company X, using academic scores, extracurricular activities, and
hackathon wins as input data, and placement status as the response. Briefly explain which machine
learning technique would be suitable to design a model to predict whether a student will be placed or
not, providing your reasoning.
• Draw a machine learning model for the above scenario. Outline the typical machine learning process,
from gathered data to model evaluation, specifically in the context of predicting student placement
success. Highlight the importance of model training, and evaluation in ensuring that the model can
accurately predict placement outcomes at company X.
ANSWER:
1. Three types of Machine Learning [explain more if need be]
• Supervised Learning: Learns from labeled data; used for classification (Placed/Not Placed) and
regression.
• Unsupervised Learning: Works with unlabeled data to find patterns like clustering or dimensionality
reduction.
• Reinforcement Learning: Agent interacts with environment, learns from feedback (reward/penalty).
2. Example Dataset
Student  Academic_Score  Extracurricular_Index  Hackathon_Wins  Placed
S1       88              6                      1               1
S2       72              2                      0               0
S3       79              8                      2               1
3. Suitable Technique
• The task is predicting Placed (Yes/No) based on features.
• This is a Supervised Classification problem.
• Logistic Regression or Decision Tree would be appropriate.
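A minimal sketch of the supervised classification idea, using only the three example rows from the dataset. The feature ranges used for scaling, the learning rate, and the epoch count are illustrative assumptions, and three rows are far too few to train a usable model; the point is only to show how labeled rows drive the fit.

```python
import math

# The three example rows: (Academic_Score, Extracurricular_Index, Hackathon_Wins) -> Placed
rows = [
    ((88, 6, 1), 1),
    ((72, 2, 0), 0),
    ((79, 8, 2), 1),
]

def scale(features):
    # ASSUMPTION: rough ranges (score 0-100, index 0-10, wins 0-5), for illustration only.
    score, extra, wins = features
    return (score / 100.0, extra / 10.0, wins / 5.0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    # Logistic regression: probability of the "Placed" class.
    x = scale(features)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(rows, epochs=2000, lr=0.5):
    # Stochastic gradient descent on the log-loss, one row at a time.
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in rows:
            x = scale(features)
            error = predict_proba(weights, bias, features) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

weights, bias = train(rows)
predictions = [int(predict_proba(weights, bias, f) >= 0.5) for f, _ in rows]
print(predictions)
```

A Decision Tree would fit the same rows just as easily; Logistic Regression is used here only because it is short to write out by hand.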
4. Machine Learning Model (Concept Diagram)
Inputs: [Academic_Score, Extracurricular_Index, Hackathon_Wins]
↓
Data Preprocessing
↓
Classification Model
↓
Prediction: Placed / Not Placed
5. ML Process for Placement Prediction
1. Data Collection – gather scores, activities, hackathon data, placement status.
2. Preprocessing – handle missing values, normalize features, encode labels.
3. Splitting – divide into training and testing sets.
4. Model Training – use supervised classifier (e.g., Logistic Regression).
5. Evaluation – check accuracy, precision, recall, F1-score.
6. Deployment – apply model to new students.
6. Importance of Training & Evaluation
• Training builds the relationship between input features and placement outcome.
• Evaluation ensures the model generalizes and predicts correctly, avoiding overfitting.
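The evaluation metrics named in the process (accuracy, precision, recall, F1-score) can be sketched as follows. The label lists are hypothetical test-set values invented for illustration, not real placement data.

```python
def classification_metrics(y_true, y_pred):
    # Count the four confusion-matrix cells for the positive class (1 = Placed).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# HYPOTHETICAL held-out labels and model predictions (1 = Placed, 0 = Not Placed).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Computing these on data the model did not see during training is what reveals overfitting: a model that scores well on training rows but poorly here has memorized rather than generalized.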
Q2. A student has to attend Smart India Hackathon scheduled for the forthcoming day, and the following are observed:
• The probability that the student is prepared is P(S)=0.6
• The probability that the student participates in Smart India Hackathon is P(P)=0.70
• The probability that the student is prepared given that he/she participates is P(S∣P)=0.30
Compute the probability numerically that the student will participate in the Smart India Hackathon given that
he/she is prepared.
Using Bayes’ Theorem, find the probability that student will participate in the Smart India Hackathon given that
he/she is prepared. Derive the formula.
ANSWER:
Bayes’ theorem provides a way to find the probability of an event A occurring given that another event B
has already occurred, by relating it to the reverse conditional probability.
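1) Events
Let S = "student is prepared" and P = "student participates in SIH", so the given values are
0.6 for being prepared, P(P)=0.70 and P(S∣P)=0.30.
2) Derivation
By the definition of conditional probability, P(P∣S) = P(P∩S) / P(S) and P(S∣P) = P(P∩S) / P(P).
The second relation gives P(P∩S) = P(S∣P) × P(P); substituting into the first gives Bayes’ theorem:
P(P∣S) = P(S∣P) × P(P) / P(S)
3) Substitution
P(P∣S) = (0.30 × 0.70) / 0.6 = 0.21 / 0.6 = 0.35
4) Interpretation (one line)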
Given that a student is prepared, the probability they participate in SIH is 35%.
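The 35% figure can be checked numerically. This is a small sketch of Bayes’ theorem, taking 0.6 as the probability of being prepared; the function and variable names are illustrative.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Values from the question: "prepared" and "participates".
p_prepared = 0.60
p_participates = 0.70
p_prepared_given_participates = 0.30

p_participates_given_prepared = bayes_posterior(
    p_prepared_given_participates, p_participates, p_prepared
)
print(f"P(participates | prepared) = {p_participates_given_prepared:.2f}")
# prints: P(participates | prepared) = 0.35
```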
Q3. Discuss different probability distribution functions (Gaussian, Binomial, Bernoulli, Poisson).
ANSWER:
Q4. Write detailed notes on discrete and continuous random variables with suitable examples.
ANSWER:
[FOR EXTRA MARKS DRAW BAR GRAPH FOR DISCRETE AND SMOOTH CURVE GRAPH FOR
CONTINUOUS]
Q5. Identify a suitable use case for supervised or unsupervised learning and Justify your answer.
ANSWER:
Supervised vs Unsupervised Learning
• Supervised Learning: A machine learning technique where the model is trained on labeled data
(input–output pairs). The goal is to learn the mapping from inputs to outputs.
• Unsupervised Learning: A technique where the model is trained on unlabeled data. The goal is to
discover hidden patterns, structures, or groups in the data.
Use Cases
1. Supervised Learning – Predicting Student Placement
o Scenario: We have past student records with academic scores, extracurricular activities, and
hackathon wins along with their placement status (Placed/Not Placed).
o Model: Classification algorithm (e.g., Logistic Regression, Decision Tree).
o Justification: Since the dataset already has labels (placement outcome), supervised learning
is suitable to predict placement for new students.
2. Unsupervised Learning – Grouping Students by Performance
o Scenario: If we only have student features (scores, activities, wins) without placement labels,
clustering can group students into “high performers,” “average,” and “low performers.”
o Model: Clustering algorithm (e.g., K-Means).
o Justification: As there are no predefined labels, unsupervised learning helps reveal natural
groupings in the data.
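The clustering use case can be sketched with a small hand-written K-Means. The student rows are hypothetical (score, activity index) pairs, and the initialization (first k points) is a simplification chosen to keep the example deterministic; real use would scale the features first so that the score does not dominate the distance.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means(points, k, max_iters=100):
    # Naive initialization: use the first k points as starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels, centroids

# HYPOTHETICAL unlabeled students: (academic score, extracurricular index).
students = [(90, 8), (72, 5), (55, 2), (88, 9), (70, 4), (58, 1)]
labels, centroids = k_means(students, k=3)
print(labels)
print(centroids)
```

No placement label is used anywhere; the three groups emerge only from how close the feature values are to each other.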
[draw diagram of supervised and unsupervised learning for better answer structuring]
[Figure: unsupervised learning diagram (change ANN to ML model)]
[Figure: supervised learning diagram]
Q6. Explain the key differences between classification and regression with examples.
ANSWER:
Classification vs Regression
1. Classification
• Definition: A supervised learning task where the output variable (target) is categorical (discrete
classes).
• Goal: Assign input data into one of the predefined categories.
• Output: Class label (e.g., Yes/No, Placed/Not Placed).
• Examples:
o Predicting if a student will be Placed (1) or Not Placed (0).
o Email spam detection (Spam / Not Spam).
o Disease diagnosis (Positive / Negative).
2. Regression
• Definition: A supervised learning task where the output variable (target) is continuous (real numbers).
• Goal: Estimate or predict a numeric value based on input features.
• Output: Real-valued number.
• Examples:
o Predicting a student’s exam score out of 100.
o Predicting house prices based on size and location.
o Forecasting stock prices.
Key Differences Table
Aspect           Classification                                            Regression
Target variable  Categorical (discrete classes)                            Continuous (real numbers)
Output           Class label (e.g., Yes/No, 0/1)                           Numeric value
Algorithms       Logistic Regression, Decision Trees, SVM, Random Forest   Linear Regression, Ridge/Lasso, SVR, Neural Networks
                 (classification mode)                                     (regression mode)
Evaluation       Accuracy, Precision, Recall, F1-score, ROC-AUC            Mean Squared Error (MSE), RMSE, MAE, R^2 score
Examples         Placement prediction, spam detection                      Exam score prediction, house prices
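The regression metrics in the table (MSE, RMSE, MAE, R^2) can be sketched as below. The exam scores are hypothetical values made up for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }

# HYPOTHETICAL actual vs. predicted exam scores (out of 100).
actual = [60, 70, 80, 90]
predicted = [62, 68, 83, 89]

scores = regression_metrics(actual, predicted)
print(scores)
```

Unlike classification metrics, these measure how far the predicted number is from the true number, which is why they only make sense when the target is continuous.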
Q7. List main applications of machine learning in real life and briefly explain any two.
ANSWER:
Main Applications of Machine Learning in Real Life
1. Healthcare – Disease prediction, medical imaging, personalized treatment.
2. Finance – Credit scoring, fraud detection, stock market prediction.
3. Education – Student performance prediction, personalized learning.
4. Retail & E-commerce – Product recommendation, customer segmentation.
5. Transportation – Self-driving cars, traffic prediction.
6. Natural Language Processing (NLP) – Chatbots, speech recognition, translation.
7. Cybersecurity – Intrusion detection, malware detection.
8. Agriculture – Crop yield prediction, pest detection.
Brief Explanation of Two Applications
1. Healthcare
• Machine learning helps in disease diagnosis using patient data and medical images.
• Example: ML models can flag signs of cancer in X-rays or MRI scans, often faster than manual
review and, in some reported cases, with comparable accuracy.
• Predictive models also help in drug discovery and personalized treatment plans.
2. Finance
• ML is widely used in fraud detection, where algorithms analyze transaction patterns and flag
suspicious activities.
• Credit scoring models use supervised learning to decide whether a person is eligible for a loan.
• Stock market prediction models help in forecasting future trends using historical financial data.
Q8. Explain the basic workflow/steps of supervised learning.
ANSWER:
Workflow of Supervised Learning
Supervised learning uses labeled data (input–output pairs) to train a model that can predict outcomes for
unseen data. The key steps are:
1. Problem Definition
• Clearly define the task as classification (categorical output) or regression (continuous output).
• Example: Predicting student placement (Yes/No → classification) or exam score (numerical →
regression).
2. Data Collection
• Gather relevant data containing both features (inputs) and labels (outputs).
• Example: Academic scores, extracurricular activities (features) and placement status (label).
3. Data Preprocessing
• Handle missing values (imputation).
• Remove outliers or noise.
• Encode categorical variables (e.g., Yes/No → 1/0).
• Normalize/standardize features to bring them to the same scale.
4. Splitting the Dataset
• Divide data into:
o Training set – used to fit the model (70–80%).
o Testing/validation set – used to evaluate the model (20–30%).
• Often cross-validation is used for more robust evaluation.
5. Model Selection
• Choose an algorithm depending on the task:
o Classification → Logistic Regression, Decision Trees, Random Forest, SVM.
o Regression → Linear Regression, Ridge/Lasso, SVR.
6. Model Training
• Feed the training data into the chosen algorithm.
• The model learns patterns by minimizing error between predicted and actual labels.
• Example: Linear regression minimizes Mean Squared Error (MSE).
7. Model Evaluation
• Test the trained model on unseen data.
• Use suitable metrics:
o For classification → Accuracy, Precision, Recall, F1-score, ROC-AUC.
o For regression → Mean Squared Error (MSE), Root MSE, R^2 score.
8. Hyperparameter Tuning
• Adjust algorithm parameters (e.g., learning rate, tree depth, regularization strength) to improve
performance.
• Done using techniques like Grid Search, usually combined with Cross-Validation.
9. Deployment
• The final model is integrated into a real-world system to make predictions on new incoming data.
• Example: A placement prediction model used by a university’s career cell.
10. Monitoring & Updating
• Model performance is continuously monitored.
• Retraining is done when new data or trends appear (to counter data drift and reduce bias).
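The splitting step (step 4) can be sketched with the standard library alone. The 25% test ratio, the seed, and the record IDs are illustrative assumptions; the k-fold helper shows how cross-validation rotates the validation portion across the data.

```python
import random

def train_test_split(rows, test_ratio=0.25, seed=0):
    # Shuffle a copy so the original order is untouched, then cut off the test part.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_rows, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size

# HYPOTHETICAL record IDs standing in for 20 labeled student rows.
records = list(range(20))
train_rows, test_rows = train_test_split(records, test_ratio=0.25)
folds = list(k_fold_indices(10, 3))
print(len(train_rows), len(test_rows))
print([len(validation) for _, validation in folds])
```

Each row appears in exactly one validation fold, so every row is used for both training and evaluation across the k rounds, but never in the same round.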
Q9. The following dataset shows items produced, defective items found last time, and predicted defective items
for the next batch in a production company:
ID  Items  Broken  Expected
1   10     12      10
2   12     15      14
3   9      8       9
4   15     20      18
5   11     10      11
• Differentiate discrete and continuous random variables with suitable examples.
• Justify how the number of broken items can be modeled as a discrete random variable.
• Suggest a suitable probability distribution for modeling the count of broken items in each batch.
ANSWER: [THIS IS A BIT TRICKY SO IF NOT UNDERSTOOD MEMORISE THIS]
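The number of broken items can only take whole values (0, 1, 2, ...), which is what makes it a discrete random variable. One common choice for such count data is the Poisson distribution. The sketch below assumes that model (an assumption, not something stated in these notes) and estimates its rate as the sample mean of the Broken column.

```python
import math

# "Broken" column from the table in the question.
broken = [12, 15, 8, 20, 10]

# Under a Poisson model, the maximum-likelihood estimate of the rate
# (average count per batch) is the sample mean.
rate = sum(broken) / len(broken)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

p_exactly_13 = poisson_pmf(13, rate)
p_at_most_10 = sum(poisson_pmf(k, rate) for k in range(11))

print(f"estimated rate = {rate}")
print(f"P(X = 13) = {p_exactly_13:.4f}")
print(f"P(X <= 10) = {p_at_most_10:.4f}")
```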
Q10.
How does MLE handle different species (A, B, C) in the dataset?
In this dataset, what role do class priors (species proportions) play in MLE classification?
What is the principle behind Maximum Likelihood Estimation (MLE) in machine learning?
ANSWER:
Q11. A real estate company wants to study the relationship between the number of rooms in a house (X) and the
estimated price of the house (Y). 8 different houses were observed. The data do not fit on a straight line, so the
company decides to use least squares to fit a straight line.
The best fit line will help to estimate the house price from the observed number of rooms in a house.
X : 1, 2, 3, 4
Y ($): 10, 15, 18, 20, 25
Find best fit line using OLS.
ANSWER:
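A sketch of the OLS calculation: slope = Sxy / Sxx and intercept = mean(Y) - slope × mean(X). The question lists four X values but five Y values, so the code ASSUMES the X list was meant to run 1 to 5; that is a guess made only so the pairs line up, and the result changes if the intended data differ.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# ASSUMPTION: X = 5 is added as a guess to match the five Y values given.
rooms = [1, 2, 3, 4, 5]
prices = [10, 15, 18, 20, 25]

intercept, slope = ols_fit(rooms, prices)
print(f"Y = {intercept:.1f} + {slope:.1f} X")
# prints: Y = 7.1 + 3.5 X
```

Under that assumption, each additional room adds about 3.5 to the estimated price.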
Q12.
What is Bayesian Regression in simple words?
How does Bayesian linear regression relate to ridge regression?
How does Bayesian regression quantify uncertainty in predicting house prices for a new sample with RM=7.0,
LSTAT=6.0, TAX=250?
ANSWER:
1. What is Bayesian Regression (in simple words)?
• Bayesian regression is a method where instead of finding just one best line (as in ordinary least
squares), we treat the model parameters (slope, intercept, coefficients) as random variables with
probability distributions.
• It combines:
o Prior belief (what we assume about parameters before seeing data), and
o Likelihood (how well the parameters explain the observed data).
• The result is a posterior distribution of parameters, which tells us not only the “most likely fit” but
also how uncertain we are.
In simple words: Bayesian regression gives a range of possible lines with probabilities instead of only one
line.
Points to remember:
• Bayesian regression treats coefficients as random, giving distributions not single
values.
• With a zero-mean Gaussian prior on the coefficients (and Gaussian noise), its posterior mean
matches the ridge regression solution, but with uncertainty quantified.
• For new houses, it predicts a distribution of possible prices (mean ± credible
interval) rather than just one number.
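The link to ridge regression and the "mean ± interval" prediction can be sketched for the simplest case: one feature, no intercept, known noise variance. The data, the noise variance, and the prior variance are all hypothetical; the house-price question uses three features (RM, LSTAT, TAX), which would need the matrix form of the same formulas.

```python
import math

def bayes_linear_1d(xs, ys, noise_var, prior_var):
    """Posterior (mean, variance) of w in y = w*x + noise, prior w ~ N(0, prior_var)."""
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    post_precision = 1.0 / prior_var + sum_xx / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * sum_xy / noise_var
    return post_mean, post_var

def ridge_1d(xs, ys, lam):
    """Ridge estimate of w: minimizes sum((y - w*x)^2) + lam * w^2."""
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sum_xy / (sum_xx + lam)

def predictive(x_new, post_mean, post_var, noise_var):
    # Predictive variance = noise variance + uncertainty about w.
    return post_mean * x_new, noise_var + x_new ** 2 * post_var

# HYPOTHETICAL data and variances, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
noise_var, prior_var = 1.0, 10.0

post_mean, post_var = bayes_linear_1d(xs, ys, noise_var, prior_var)
ridge_w = ridge_1d(xs, ys, lam=noise_var / prior_var)

pred_mean, pred_var = predictive(5.0, post_mean, post_var, noise_var)
half_width = 1.96 * math.sqrt(pred_var)  # approximate 95% interval
print(f"posterior mean of w = {post_mean:.4f}, ridge w = {ridge_w:.4f}")
print(f"prediction at x=5: {pred_mean:.2f} +/- {half_width:.2f}")
```

The posterior mean and the ridge estimate agree when the ridge penalty equals noise variance divided by prior variance; what Bayesian regression adds is the posterior variance, which widens the interval for inputs far from the observed data.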