MACHINE LEARNING MID-1
QUESTION BANK
One Mark Questions:
1. A computer program is said to learn from experience E with respect to some task T
and some performance measure P if its performance on T, as measured by P,
improves with experience E. Suppose we feed a learning algorithm a lot of historical
weather data, and have it learn to predict weather. In this setting, what is T?
The weather prediction task.
2. Suppose you are working on weather prediction, and use a learning algorithm to
predict tomorrow’s temperature (in degrees Centigrade/Fahrenheit). Would you
treat this as a classification or a regression problem?
Regression
3. Suppose you are working on weather prediction, and your weather station makes
one of three predictions for each day’s weather: Sunny, Cloudy or Rainy. You’d
like to use a learning algorithm to predict tomorrow’s weather. Would you treat
this as a classification or a regression problem?
Classification
4. Suppose you are working on stock market prediction, and you would like to predict
the price of a particular stock tomorrow (measured in dollars). You want to use
a learning algorithm for this. Would you treat this as a classification or a regression
problem?
Regression
5. Suppose you are working on stock market prediction. You would like to predict
whether or not a certain company will declare bankruptcy within the next 7
days (by training on data of similar companies that had previously been at risk of
bankruptcy). Would you treat this as a classification or a regression problem?
Classification
6. Suppose you are working on stock market prediction. Typically, tens of millions of
shares of Microsoft stock are traded (i.e., bought/sold) each day. You would like to
predict the number of Microsoft shares that will be traded tomorrow. Would you
treat this as a classification or a regression problem?
Regression
7. Which of these is a reasonable definition of machine learning?
a. Machine learning is the science of programming computers.
b. Machine learning learns from labeled data.
c. Machine learning is the field of allowing robots to act intelligently.
d. Machine learning is the field of study that gives computers the ability to
learn without being explicitly programmed.
8. Which of the following are supervised learning problems? (multiple may be
correct)
a. Learning to drive using a reward signal.
b. Predicting disease from blood sample.
c. Grouping students in the same class based on similar features.
d. Face recognition to unlock your phone.
9. Which of the following are classification problems? (multiple may be correct)
a. Predict the runs a cricketer will score in a particular match.
b. Predict which team will win a tournament.
c. Predict whether it will rain today.
d. Predict your mood tomorrow.
10. Which of the following is a regression task? (multiple options may be correct)
a. Predict the price of a house 10 years after it is constructed.
b. Predict if a house will be standing 50 years after it is constructed.
c. Predict the weight of food wasted in a restaurant during next month.
d. Predict the sales of a new Apple product.
11. Which of the following is an unsupervised learning task? (multiple options may be
correct)
a. Group audio files based on language of the speakers.
b. Group applicants to a university based on their nationality.
c. Predict a student’s performance in the final exams.
d. Predict the trajectory of a meteorite.
12. Which of the following is an unsupervised learning task? (multiple options may be
correct)
a. Group audio files based on language of the speakers.
b. Group applicants to a university based on their nationality.
c. Predict a student’s performance in the final exams.
d. Predict the trajectory of a meteorite.
13. Given below is your dataset. You are using KNN regression with K=1. What is the
prediction for a new input value (3, 2)?
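The dataset for this question is not reproduced here, so the following is only a minimal pure-Python sketch of KNN regression on a small hypothetical dataset (not the exam's). It assumes Euclidean distance, since the question does not name a metric. With K=1 the prediction is simply the target value of the single nearest training point; with larger K it is the mean of the K nearest targets.

```python
import math

def knn_regress(train, query, k=1):
    """Predict a numeric target as the mean target of the k nearest training points.

    train: list of ((x1, x2), y) pairs; query: (x1, x2) tuple.
    Euclidean distance is an assumption; the question does not specify a metric.
    """
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Average the neighbours' targets; with k=1 this is the nearest point's target.
    return sum(y for _, y in nearest) / k

# Hypothetical training data, for illustration only (NOT the exam table).
train = [((1, 1), 10.0), ((2, 3), 14.0), ((4, 2), 20.0), ((5, 5), 30.0)]
print(knn_regress(train, (3, 2), k=1))  # 20.0, from the nearest point (4, 2)
```

Here (4, 2) lies at distance 1 from (3, 2), closer than any other point, so K=1 returns its target; with K=2 the neighbours (4, 2) and (2, 3) would give (20 + 14) / 2 = 17.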
14. For regression, write the general prediction formula for linear regression.
Y = mX + b
15. For classification, write the general prediction formula for logistic regression.
p = exp(y) / [1 + exp(y)], where y = mX + b
16. Calculate the entropy for the following data set.
Patient ID  Chest Pain  Male  Smokes  Exercises  Heart Attack
1           Yes         Yes   No      Yes        Yes
2           Yes         Yes   Yes     No         Yes
3           No          No    Yes     No         Yes
4           No          Yes   No      Yes        No
5           Yes         No    Yes     Yes        Yes
6           No          Yes   Yes     Yes        No
17. Calculate the Information Gain for the attribute Male for the following data set.
Patient ID  Chest Pain  Male  Smokes  Exercises  Heart Attack
1           Yes         Yes   No      Yes        Yes
2           Yes         Yes   Yes     No         Yes
3           No          No    Yes     No         Yes
4           No          Yes   No      Yes        No
5           Yes         No    Yes     Yes        Yes
6           No          Yes   Yes     Yes        No
18. Calculate the Information Gain for the attribute Smokes for the following data set.
Patient ID  Chest Pain  Male  Smokes  Exercises  Heart Attack
1           Yes         Yes   No      Yes        Yes
2           Yes         Yes   Yes     No         Yes
3           No          No    Yes     No         Yes
4           No          Yes   No      Yes        No
5           Yes         No    Yes     Yes        Yes
6           No          Yes   Yes     Yes        No
19. Calculate the Information Gain for the attribute Chest Pain for the following data set.
Patient ID  Chest Pain  Male  Smokes  Exercises  Heart Attack
1           Yes         Yes   No      Yes        Yes
2           Yes         Yes   Yes     No         Yes
3           No          No    Yes     No         Yes
4           No          Yes   No      Yes        No
5           Yes         No    Yes     Yes        Yes
6           No          Yes   Yes     Yes        No
20. Calculate the Information Gain for the attribute Exercises for the following data set.
Patient ID  Chest Pain  Male  Smokes  Exercises  Heart Attack
1           Yes         Yes   No      Yes        Yes
2           Yes         Yes   Yes     No         Yes
3           No          No    Yes     No         Yes
4           No          Yes   No      Yes        No
5           Yes         No    Yes     Yes        Yes
6           No          Yes   Yes     Yes        No
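The entropy and information-gain calculations in Questions 16 to 20 can be checked with a short pure-Python sketch. It encodes the heart-attack table as tuples (Patient ID dropped, Heart Attack as the label) and uses base-2 logarithms, so results are in bits. The helper names entropy and information_gain are illustrative, not from any library.

```python
import math
from collections import Counter

# Rows of the heart-attack table: (Chest Pain, Male, Smokes, Exercises, Heart Attack)
ROWS = [
    ("Yes", "Yes", "No", "Yes", "Yes"),
    ("Yes", "Yes", "Yes", "No", "Yes"),
    ("No", "No", "Yes", "No", "Yes"),
    ("No", "Yes", "No", "Yes", "No"),
    ("Yes", "No", "Yes", "Yes", "Yes"),
    ("No", "Yes", "Yes", "Yes", "No"),
]
ATTRIBUTES = ["Chest Pain", "Male", "Smokes", "Exercises"]

def entropy(labels):
    """Shannon entropy in bits: H(S) = -sum p_i * log2(p_i)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """IG(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in set(row[attr_index] for row in rows):
        # Labels of the subset S_v where the attribute takes this value.
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

print(f"Entropy(S) = {entropy([r[-1] for r in ROWS]):.3f}")  # 0.918
for i, name in enumerate(ATTRIBUTES):
    print(f"IG(S, {name}) = {information_gain(ROWS, i):.3f}")
```

With 4 Yes and 2 No labels the entropy is about 0.918 bits, and the gains come out to about 0.459 (Chest Pain), 0.252 (Male), 0.044 (Smokes), and 0.252 (Exercises), so Chest Pain would be the best first split.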
5 Mark Questions:
1. Write your view on how machine learning programming is different from
classical programming. Write the types of machine learning and how they are
different from each other.
The difference between classical programming and machine learning is that classical
programming answers a problem using a predefined set of rules or logic written by the
programmer. In contrast, machine learning constructs a model (the rules or logic) for the
problem by analyzing input data together with the corresponding answers.
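A small illustrative sketch of this contrast, assuming Celsius-to-Fahrenheit conversion as the example task (not part of the original answer): in the classical version the rule is written by hand, while in the learning version the slope and intercept are inferred from example input/answer pairs with an ordinary least-squares line fit.

```python
def classical_fahrenheit(celsius):
    # Classical programming: the programmer writes the rule explicitly.
    return celsius * 1.8 + 32

def learn_line(xs, ys):
    # Machine learning: infer slope m and intercept b of y = m*x + b
    # from (input, answer) examples using ordinary least squares.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - m * mean_x
    return m, b

# Example data: inputs plus their known answers (generated from the rule here).
celsius = [-40, 0, 10, 25, 37, 100]
fahrenheit = [classical_fahrenheit(c) for c in celsius]

m, b = learn_line(celsius, fahrenheit)
print(round(m, 3), round(b, 3))  # 1.8 32.0
```

The learned model recovers the same rule (m = 1.8, b = 32) without it being written into learn_line; only the data carried it.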
Supervised Learning
• The machine is trained on a "labelled" dataset (inputs paired with their correct
outputs), and based on this training, it predicts the output for new inputs.
• Categories of Supervised Machine Learning
❖ Classification
❖ Regression
• Applications of Supervised Learning
❖ Image Segmentation
❖ Medical Diagnosis
❖ Fraud Detection
❖ Spam Detection
❖ Speech Recognition
Unsupervised Learning
• The main aim of an unsupervised learning algorithm is to group or categorize an
unlabelled, unsorted dataset according to similarities, patterns, and differences in the data.
• Categories of Unsupervised Machine Learning: unsupervised learning can be further
classified into two types, which are given below:
❖ Clustering
❖ Association
• Applications of Unsupervised Learning
❖ Network Analysis
❖ Recommendation Systems
❖ Anomaly Detection
❖ Singular Value Decomposition
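Clustering can be illustrated with k-means, one common clustering algorithm (chosen here as an example; the notes do not name a specific one). The sketch below is a plain Lloyd's-algorithm k-means on hypothetical, unlabelled 2-D points; the initial centroids and iteration count are assumptions. No labels are used: the groups emerge from distances alone.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    points: list of (x, y); centroids: initial list of (x, y) guesses.
    Returns the final centroids and the points assigned to each.
    """
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabelled data with two visible groups.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The two groups are recovered with centroids at (1.5, 1.5) and about (8.5, 8.33). In practice k (the number of clusters) must be chosen by the user.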
Reinforcement Learning
• Reinforcement learning works on a feedback-based process, in which an AI agent (a
software component) automatically explores its environment by trial and error: it takes
actions, receives rewards or penalties as feedback, learns from these experiences, and
improves its performance.
• Real-world Use cases of Reinforcement Learning
➢ Video Games
➢ Resource Management
➢ Robotics
➢ Text Mining
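The trial-and-error loop can be sketched with a multi-armed bandit, the simplest reinforcement-learning setting (one state, several actions). The epsilon-greedy strategy, the reward probabilities, and the step count below are illustrative assumptions. The agent never sees the probabilities; it only observes rewards and updates its estimates.

```python
import random

def epsilon_greedy_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely from reward feedback.

    reward_probs: hidden probability that each action returns reward 1
    (unknown to the agent). Returns the agent's estimated value per action.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # agent's current value estimate per action
    counts = [0] * n_actions       # how often each action was tried
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment gives feedback: reward 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Incremental average: move the estimate toward the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

estimates = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print(best, [round(v, 2) for v in estimates])
```

After enough trials the agent's estimates approach the hidden reward rates and it settles on action 2, the one with the highest payoff. Full reinforcement learning adds states and delayed rewards (for example, Q-learning), but the explore, act, get feedback, update loop is the same.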
2. Explain the packages available for machine learning programming in Python, and
write the details of each package.
Python is a popular programming language for machine learning due to its ease of
use, flexibility, and large ecosystem of libraries and packages.
NumPy: NumPy is one of the fundamental packages for scientific computing in
Python. It contains functionality for multidimensional arrays, high-level
mathematical functions such as linear algebra operations and the Fourier
transform, and pseudorandom number generators.
SciPy: SciPy is a collection of functions for scientific computing in Python. It
provides, among other functionality, advanced linear algebra routines,
mathematical function optimization, signal processing, special mathematical
functions, and statistical distributions.
Matplotlib: Matplotlib is the primary scientific plotting library in Python. It
provides functions for making publication-quality visualizations such as line
charts, histograms, scatter plots, and so on. Visualizing your data and different
aspects of your analysis can give you important insights, and we will be using
matplotlib for all our visualizations.
Pandas: pandas is a Python library for data wrangling and analysis. It is built
around a data structure called the DataFrame, which is modelled after the R data
frame. Simply put, a pandas DataFrame is a table, similar to an Excel spreadsheet.
pandas provides a great range of methods to modify and operate on this table; in
particular, it allows SQL-like queries and joins of tables.
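A short sketch of two of these packages in use, limited to NumPy and pandas to keep the example dependency-light (SciPy and Matplotlib are left out). The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: multidimensional arrays and linear algebra.
a = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
solution = np.linalg.solve(a, rhs)  # solves a @ x = rhs

# pandas: a DataFrame is a table with labelled columns.
df = pd.DataFrame({"city": ["A", "B", "A", "C"], "temp": [21.0, 25.5, 23.0, 19.0]})

# SQL-like GROUP BY with an aggregate.
mean_temp = df.groupby("city")["temp"].mean()

# A join of two tables, similar to a SQL LEFT JOIN.
regions = pd.DataFrame({"city": ["A", "B", "C"], "region": ["North", "South", "North"]})
joined = df.merge(regions, on="city", how="left")

print(solution)   # [0.8 1.4]
print(mean_temp)
print(joined)
```

The linear system 2x + y = 3, x + 3y = 5 has the solution x = 0.8, y = 1.4; the group-by gives a mean temperature of 22.0 for city A.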
3. Consider the following Balloons data set, which consists of four attributes (Color,
Size, Act, and Age) and a binary label (Inflated). Identify the root node of a decision
tree for the dataset below using Information Gain.
4. Compute the linear regression equation (trendline) for the data below. The trendline
has the form y = mx + b, where m is the slope of the trendline and b is the y-intercept.
Prices for round-shaped diamonds
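The diamond price table is not reproduced in this copy, so the following sketch uses hypothetical carat/price numbers, not the exam data. It applies the standard least-squares formulas for the trendline y = mx + b:
m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²), b = (Σy − mΣx) / n.

```python
def fit_trendline(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b.

    m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2)
    b = (Sy - m*Sx) / n
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Hypothetical data (NOT the exam table): carat weight vs. price in dollars.
carats = [0.3, 0.4, 0.5, 0.6, 0.7]
prices = [500, 700, 950, 1100, 1350]

m, b = fit_trendline(carats, prices)
print(f"y = {m:.1f}x + ({b:.1f})")
print("Predicted price for 0.8 carat:", round(m * 0.8 + b, 2))
```

For these made-up numbers, n = 5, Σx = 2.5, Σy = 4600, Σxy = 2510 and Σx² = 1.35, which give m = 1050 / 0.5 = 2100 and b = (4600 − 5250) / 5 = −130, so the trendline is y = 2100x − 130.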
5. Explain the KNN algorithm with an example, and make some reasonable remarks on the
choice of the K value in the KNN algorithm.
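A minimal pure-Python sketch of KNN classification on hypothetical 2-D points (assumed data, Euclidean distance assumed). To classify a query, it finds the K closest training points and takes a majority vote of their labels. The data deliberately includes one noisy "B" point inside the "A" group so that the effect of K is visible.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k training points closest to the query.

    train: list of ((x1, x2), label); Euclidean distance is assumed.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: an "A" group, a "B" group, and a noisy "B" point at (2, 2).
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B"),
    ((2, 2), "B"),
]
query = (1.8, 1.8)
for k in (1, 3, 7):
    print(k, knn_classify(train, query, k))
```

With these points, K=1 follows the single noisy point and returns B, K=3 outvotes it and returns A, and K=7 uses every point, so the larger class B wins regardless of locality. This mirrors the usual remarks on K: a very small K is sensitive to noise, a very large K over-smooths toward the majority class, an odd K avoids ties in two-class problems, and K is commonly tuned on validation data.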
6. How is logistic regression different from linear regression? Explain logistic
regression with a sample example.
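Linear regression outputs an unbounded number y = mx + b. Logistic regression passes that same linear score through the logistic (sigmoid) function, p = exp(y) / [1 + exp(y)] = 1 / [1 + exp(−y)], to get a probability between 0 and 1, and then classifies with a threshold (0.5 is a common default). The sketch below fits m and b by batch gradient descent on the log-loss. The data, learning rate, and epoch count are hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    # Logistic function: maps any real z to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p = sigmoid(m*x + b) by batch gradient descent on the average log-loss."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the average log-loss with respect to m and b.
        grad_m = sum((sigmoid(m * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(m * x + b) - y for x, y in zip(xs, ys)) / n
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

# Hypothetical data: hours studied -> passed (1) or failed (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

m, b = train_logistic(hours, passed)
for h in (1.0, 4.0):
    p = sigmoid(m * h + b)
    print(f"{h} h -> P(pass) = {p:.2f} -> {'pass' if p >= 0.5 else 'fail'}")
```

A student who studied 1 hour gets a low pass probability and is classified "fail"; one who studied 4 hours gets a high probability and is classified "pass". Fitting linear regression to the same 0/1 labels would instead produce values outside [0, 1] for extreme inputs.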
10 Mark Questions:
1. Consider the following data set about 11 different restaurants and in particular about
the kind of restaurant (fast food, ethnic or casual dining), their prices (low, average or
high), their locations (Oakland, Shadyside or Squirrel Hill), whether they can comply
with dietary restrictions (none, vegetarian or gluten free) and whether you enjoyed
them or not. The data is reported in the following table:
Using the above dataset, build a decision tree to decide whether you would enjoy a
particular restaurant or not, showing at each level how you decided which attribute to
expand next.
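The restaurant table is not reproduced in this copy, so the sketch below shows ID3-style tree building (choose the attribute with the highest information gain at each level, then recurse on each branch) on the heart-attack table from the one-mark questions instead. The function names are illustrative, and ties in gain are broken by attribute order, which is an assumption of this sketch.

```python
import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum p_i * log2(p_i), in bits.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    # IG(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

def id3(rows, attributes, target):
    """Return a tree as nested dicts {attribute: {value: subtree or class label}}."""
    labels = [r[target] for r in rows]
    # Stop when the node is pure or no attributes are left; use the majority label.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Expand the attribute with the highest information gain at this level
    # (max() keeps the first attribute listed when gains tie).
    best = max(attributes, key=lambda a: info_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {
        best: {
            value: id3([r for r in rows if r[best] == value], remaining, target)
            for value in sorted(set(r[best] for r in rows))
        }
    }

COLUMNS = ["Chest Pain", "Male", "Smokes", "Exercises", "Heart Attack"]
RAW = [
    ("Yes", "Yes", "No", "Yes", "Yes"),
    ("Yes", "Yes", "Yes", "No", "Yes"),
    ("No", "No", "Yes", "No", "Yes"),
    ("No", "Yes", "No", "Yes", "No"),
    ("Yes", "No", "Yes", "Yes", "Yes"),
    ("No", "Yes", "Yes", "Yes", "No"),
]
rows = [dict(zip(COLUMNS, values)) for values in RAW]
tree = id3(rows, COLUMNS[:-1], "Heart Attack")
print(tree)
```

Chest Pain has the highest gain at the root, and its "Yes" branch is already pure (all Heart Attack = Yes). On the "No" branch, Male and Exercises both split the three remaining rows perfectly; this sketch expands Male because it is listed first. The same procedure applies to the restaurant table once its rows are entered.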
2. Consider the following PlayTennis data set of 14 samples with four features:
Outlook, Temperature, Humidity, and Wind.
Given a new instance x′ = (Outlook=Sunny, Temperature=Mild, Humidity=Normal,
Wind=Strong), predict the class (Yes/No) of the given instance x′ using the Naïve Bayes
algorithm.
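Note: use the MAP rule.
P(Yes|x′) = P(Sunny|Yes) * P(Mild|Yes) * P(Normal|Yes) * P(Strong|Yes) * P(Play=Yes) / P(x′)
P(No|x′) = P(Sunny|No) * P(Mild|No) * P(Normal|No) * P(Strong|No) * P(Play=No) / P(x′)
Since P(x′) is the same for both classes, predict the class whose numerator is larger.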
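The PlayTennis table itself is not reproduced in this copy. The sketch below assumes it is the standard 14-day PlayTennis table from Tom Mitchell's textbook Machine Learning; that is an assumption, so check the rows against the table supplied with the question. Each likelihood is estimated by counting (no smoothing), and P(x′) is dropped because it is common to both classes.

```python
from collections import Counter

# ASSUMPTION: the standard 14-day PlayTennis table (Mitchell, "Machine Learning").
# Columns: Outlook, Temperature, Humidity, Wind, Play
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def naive_bayes_scores(data, x):
    """Unnormalised MAP scores P(x1|c) * ... * P(xn|c) * P(c) for each class c.

    P(x') is identical for every class, so it is omitted when comparing.
    """
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / len(data)  # prior P(c)
        for i, value in enumerate(x):
            # Likelihood P(feature_i = value | c), estimated by counting.
            matches = sum(1 for row in data if row[-1] == c and row[i] == value)
            score *= matches / n_c
        scores[c] = score
    return scores

x_new = ("Sunny", "Mild", "Normal", "Strong")
scores = naive_bayes_scores(DATA, x_new)
prediction = max(scores, key=scores.get)
print({c: round(s, 4) for c, s in scores.items()}, "->", prediction)
```

Under this assumed table, the Yes score is (2/9)(4/9)(6/9)(3/9)(9/14) ≈ 0.0141 and the No score is (3/5)(2/5)(1/5)(3/5)(5/14) ≈ 0.0103, so x′ is classified as Yes.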