ML Question Bank Ans

Some fundamental questions for the ML (CS-3035) mid-semester exam (but not limited to):

https://www.javatpoint.com/clustering-in-machine-learning

1. What is learning in computer science/machine learning?


‘Learning’ in machine learning refers to estimating a model’s parameters from the given
training dataset.

2. Differentiate between weak-AI and Strong-AI.

3. Differentiate between Machine Learning and Deep Learning.


4. Explain different types of learning using suitable real-world examples.

Learning Problems

● 1. Supervised Learning

A common example of supervised learning is text classification. In this set of problems, the goal is to
predict the class label of a given piece of text. One particularly popular text-classification task is
predicting the sentiment of a piece of text, such as a tweet or a product review.

● 2. Unsupervised Learning

Finding customer segments: clustering is an unsupervised technique where the goal is to find natural groups
or clusters in a feature space and interpret the input data. There are many different clustering algorithms.
One common approach is to divide the data points so that each data point falls into a group of similar
data points, based on a predefined similarity or distance metric in the feature space. Clustering is
commonly used for determining customer segments in marketing data. Being able to determine different
segments of customers helps marketing teams approach these customer segments in unique ways. (Think of
features like gender, location, age, education, income bracket, and so on.)

https://www.springboard.com/blog/lp-machine-learning-unsupervised-learning-supervised-learning/

● 3. Reinforcement Learning

A classic example of reinforcement learning is a cat acting as an agent exposed to an environment. The
biggest characteristic of this method is that there is no supervisor, only a reward signal (a real number).

5. Differentiate between supervised learning and unsupervised learning.

6. Explain different data types used in the modern machine learning paradigm with examples.

https://medium.com/swlh/data-types-in-statistics-used-for-machine-learning-5b4c24ae6036

7. What is the true zero point for a numeric data type? Explain with an example.
Absolute (true) zero means that the zero point represents the absence of the property being
measured. For example, 0 kg means the absence of weight (a ratio scale), whereas 0 °C does not
mean the absence of temperature (an interval scale), so the Celsius scale has no true zero.

8. Differentiate between Univariate and Multivariate data analysis.


9. What do you mean by central tendency? Explain it with suitable examples.

Central tendency means measuring the center, or location, of the distribution of values of a data set.
It gives an idea of the typical (average) value of the data in the data set.

Measures of Central Tendency

Generally, the central tendency of a dataset can be described using the following measures:

● Mean (average): represents the sum of all values in a dataset divided by the total number of
values.
● Median: the middle value in a dataset that is arranged in ascending order (from the smallest
value to the largest value). If a dataset contains an even number of values, the median of the
dataset is the mean of the two middle values.
● Mode: the most frequently occurring value in a dataset. In some cases, a dataset may contain
multiple modes, while some datasets have no mode at all.

Important examples of measures of central tendency include the mode, the median, the arithmetic mean,
and the geometric mean.
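For illustration, a minimal Python sketch (the data list is made up) computing all three measures with the standard statistics module:

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # a made-up dataset

print(statistics.mean(data))    # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5.0
print(statistics.median(data))  # even count: mean of the two middle values, (3 + 5) / 2 = 4.0
print(statistics.mode(data))    # the most frequent value: 3
```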
10. What is skewness? Explain at least one remedy for it.

Skewness is a quantifiable measure of how distorted a data sample is from the normal
distribution. In a normal distribution, the data is represented graphically by a bell-shaped curve
in which the mean, median, and mode are all equal.
Remedy 1: Log transform. Log transformation is most likely the first thing you should try to remove
skewness from a predictor. It can be done easily via NumPy, by calling the log() function on the
desired column.
Remedy 2: Square root transform. The square root sometimes works great and sometimes isn’t the
best option. The transformed distribution may still look somewhat exponential, but taking the square
root shrinks the range of the variable.
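A minimal sketch of both remedies, assuming a pandas DataFrame with a hypothetical right-skewed column named "income":

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed column; the name "income" is only illustrative.
df = pd.DataFrame({"income": [20_000, 25_000, 30_000, 45_000, 500_000]})

# Remedy 1: log transform (log1p = log(1 + x) also tolerates zeros).
df["income_log"] = np.log1p(df["income"])

# Remedy 2: square-root transform.
df["income_sqrt"] = np.sqrt(df["income"])
```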

11. What is kurtosis? Discuss at least one solution for it

Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal
distribution. Data sets with high kurtosis tend to have heavy tails, i.e., outliers. Data sets with
low kurtosis tend to have light tails, i.e., a lack of outliers; a uniform distribution is an
extreme case of light tails. A common remedy for excess kurtosis is to treat the outliers that
cause it, for example by trimming or winsorizing extreme values, or by applying a variance-reducing
transformation such as the log transform discussed above.

12. Explain the similarities and dissimilarities between the Normal distribution and
Student's t-distribution.

Dissimilarities:
1. Comparing the density functions, the t-distribution has a thicker tail than the standard
normal distribution.
2. The t-distribution gives better results than the normal distribution whenever we have a
small number of data points (fewer than about 30, as a rule of thumb).
3. The t-distribution is leptokurtic, and so has higher kurtosis than the normal distribution.
That means that, for a t and a normal distribution with the same mean and variance, data from
the t-distribution tend to appear either closer to the mean or farther from the mean than
typical normal data, with a more sudden transition in between. As a result, the probability of
obtaining values very far from the mean is larger than in the normal distribution.
Similarities:
1. Like the normal distribution, the t-distribution is symmetric: if you fold it in half at the
mean, each side is the same.
2. Like the standard normal distribution (or z-distribution), the t-distribution has a mean of
zero. (The standard normal is used when the population standard deviation is known; the
t-distribution is used when it must be estimated from the sample.)

13. Explain discriminative and generative learning models with suitable Examples.

14. What are the different types of errors used in machine learning?

https://medium.com/towards-artificial-intelligence/12-common-errors-in-machine-learning-729cb9d0952a

15. Differentiate between Type-I and Type-II errors.

16. Explain different evaluation metrics used in classification.

https://www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/

17. What are the properties of a distance metric?

Distance metrics play an important role in machine learning. They provide a strong foundation for several
machine learning algorithms, such as k-nearest neighbors for supervised learning and k-means clustering
for unsupervised learning. Different distance metrics are chosen depending on the type of the data. A
good distance metric significantly improves the performance of classification, clustering, and
information retrieval.

Formally, a function d(x, y) is a distance metric if it satisfies the following properties for all
points x, y, z:
1. Non-negativity: d(x, y) ≥ 0.
2. Identity of indiscernibles: d(x, y) = 0 if and only if x = y.
3. Symmetry: d(x, y) = d(y, x).
4. Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).

18. Discuss different types of distance metrics using suitable expressions.


Four commonly used distance metrics in machine learning are:
1. Euclidean Distance
2. Manhattan Distance
3. Minkowski Distance
4. Hamming Distance

Euclidean Distance -
Euclidean Distance represents the shortest (straight-line) distance between two points.

Manhattan Distance -
Manhattan Distance is the sum of the absolute differences between two points across all dimensions.

Minkowski Distance -
Minkowski Distance is the generalized form of the Euclidean and Manhattan Distances.

Hamming Distance -
Hamming Distance measures the dissimilarity between two strings of the same length: it is the number
of positions at which the corresponding characters differ.
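A minimal NumPy sketch of the four metrics (the vectors and strings are made-up examples):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))           # sqrt(9 + 4 + 0) ≈ 3.61
manhattan = np.sum(np.abs(x - y))                   # 3 + 2 + 0 = 5
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)   # reduces to the two above for p = 2 and p = 1

# Hamming distance between two equal-length strings.
s1, s2 = "karolin", "kathrin"
hamming = sum(c1 != c2 for c1, c2 in zip(s1, s2))   # 3 positions differ
```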

19. Explain how the Minkowski Distance is a generalization of the Manhattan and Euclidean Distance metrics.

Minkowski Distance calculates the distance between two points. It is a generalization of the
Euclidean and Manhattan distance measures and adds a parameter, called the “order” or “p”, that
allows different distance measures to be calculated.

The Minkowski distance between two points x and y is calculated as:

D(x, y) = (Σᵢ |xᵢ − yᵢ|^p)^(1/p)

where “p” is the order parameter.

When p is set to 1, the calculation is the same as the Manhattan distance. When p is set to 2, it is the
same as the Euclidean distance.

● p = 1: Manhattan distance.
● p = 2: Euclidean distance.

Intermediate values provide a controlled balance between the two measures.

It is common to use the Minkowski distance when implementing a machine learning algorithm that uses
distance measures, as it gives control over the type of distance measure used for real-valued vectors
via a hyperparameter “p” that can be tuned.

20. Explain any two bounded distance metrics with examples.

21. Explain the Voronoi diagram used in KNN.


https://www.youtube.com/watch?v=PGy1rATkViA

22. What is KNN classification? Why is it called lazy learning?

● K-Nearest Neighbour (K-NN) is one of the simplest machine learning algorithms, based on the
supervised learning technique.
● The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts
the new case into the category that is most similar to the available categories.
● The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can easily be assigned to a well-suited
category using K-NN.
● K-NN can be used for regression as well as classification, but it is mostly used for
classification problems.
● K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data.

K-NN is a lazy learner because it doesn’t learn a discriminative function from the training data but
memorizes the training dataset instead. There is no training step in K-NN; the prediction step is
expensive. Each time we want to make a prediction, K-NN searches for the nearest neighbors in the
entire training set. An eager learner has a model-fitting or training step; a lazy learner does not
have a training phase.

23. Explain the KNN algorithm with a small hand crafted dataset and
demonstrate its working principle.
1. Load the data.
2. Choose a positive value of k.
3. For the data point whose class is to be predicted, compute the distance to every training
point; the Euclidean distance is commonly used here.
4. Arrange the distances in non-decreasing order.
5. Take the k training points with the smallest distances: these are the k nearest neighbours.
6. Let k_a denote how many of these k points belong to class a.
7. Assign the new point x to the class with the majority vote, i.e., put x in class a if
k_a > k_b for every other class b.

https://www.analyticssteps.com/blogs/how-does-k-nearest-neighbor-works-machine-learning-classification-problem

<For the demonstration, reproduce the worked example from Question 50.>
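A minimal sketch of these steps in Python, using the small dataset from Question 50 (test point (3,7), K = 3, Euclidean distance):

```python
import numpy as np
from collections import Counter

# Training set from Question 50: columns X, Y and the class label.
train = np.array([[7, 7], [7, 4], [3, 4], [1, 4]])
labels = np.array([1, 1, 2, 2])
test = np.array([3, 7])
k = 3

# Steps 3-4: Euclidean distance from the test point to every training point.
dists = np.sqrt(((train - test) ** 2).sum(axis=1))  # [4.0, 5.0, 3.0, 3.61]
# Step 5: indices of the k smallest distances.
nearest = np.argsort(dists)[:k]
# Steps 6-7: majority vote among the k nearest labels.
prediction = Counter(labels[nearest]).most_common(1)[0][0]
print(prediction)  # -> 2 (two of the three nearest neighbours carry label 2)
```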

24. Explain the advantages and disadvantages of KNN algorithm.

Some Advantages of KNN

● Quick calculation time.
● Simple to implement and interpret.
● Versatile: useful for both regression and classification.
● Reasonably high accuracy, although more sophisticated supervised models may do better.
● No assumptions about the data: there is no need to make additional assumptions, tune several
parameters, or build a model. This makes it valuable in the nonlinear-data case.

Some Disadvantages of KNN

● Accuracy depends on the quality of the data.
● With large data, the prediction stage can be slow.
● Sensitive to the scale of the data and to irrelevant features.
● Requires high memory: it needs to store all of the training data.
● Because it stores all of the training data, it can be computationally expensive.

25. How many types of clustering are available in ML? Explain each type
with examples.

Types of Clustering Methods:

The clustering methods are broadly divided into Hard clustering (each data point belongs to only one
group) and Soft clustering (data points can belong to more than one group). Various other approaches
to clustering also exist. Below are the main clustering methods used in machine learning:

1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering

Partitioning Clustering

It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the
centroid-based method. The most common example of partitioning clustering is the K-Means clustering
algorithm.

In this type, the dataset is divided into a set of k groups, where K defines the number of pre-defined
groups. Cluster centers are created in such a way that each data point is closer to the centroid of
its own cluster than to any other cluster centroid.

Density-Based Clustering
The density-based clustering method connects highly dense areas into clusters, forming arbitrarily
shaped distributions as long as the dense regions can be connected. The algorithm identifies different
clusters in the dataset by connecting areas of high density into clusters; the dense areas in data
space are separated from each other by sparser areas.

These algorithms can face difficulty in clustering the data points if the dataset has varying densities and high dimensions.

Distribution Model-Based Clustering

In the distribution model-based clustering method, the data is divided based on the probability that a
data point belongs to a particular distribution. The grouping is done by assuming some distribution,
most commonly the Gaussian distribution.

The example of this type is the Expectation-Maximization Clustering algorithm that uses Gaussian Mixture Models (GMM).
Hierarchical Clustering

Hierarchical clustering can be used as an alternative to partitioning clustering because there is no
requirement to pre-specify the number of clusters to be created. In this technique, the dataset is
divided into clusters to create a tree-like structure, which is also called a dendrogram. The
observations, or any number of clusters, can be selected by cutting the tree at the appropriate level.
The most common example of this method is the Agglomerative Hierarchical algorithm.

Fuzzy Clustering

Fuzzy clustering is a type of soft method in which a data object may belong to more than one group or
cluster. Each data point has a set of membership coefficients, which describe its degree of membership
in each cluster. The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes
also known as the Fuzzy K-means algorithm.

26. Explain the KMeans algorithm with a small hand crafted dataset and
demonstrate its working principle.
The k-means clustering algorithm attempts to split a given anonymous data set (a set containing no
information as to class identity) into a fixed number (k) of clusters.

Initially, k so-called centroids are chosen. A centroid is a data point (imaginary or real) at the
center of a cluster. Typically, each initial centroid is an existing data point in the given input
set, picked at random, such that all centroids are unique (that is, for all centroids ci and cj,
ci ≠ cj). Each data point is then assigned to its nearest centroid (equivalent to a 1-nearest-neighbour
classification with the centroids as the training set), producing an initial randomized set of
clusters. Each centroid is thereafter set to the arithmetic mean of the cluster it defines. The process
of assignment and centroid adjustment is repeated until the values of the centroids stabilize. The
final centroids are used to produce the final clustering of the input data, effectively turning the
initially anonymous data points into data points, each with a class identity.

<For the demonstration, reproduce the worked example from Question 49.>
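A minimal NumPy sketch of this loop, using the data points from Question 49 (K = 2, Euclidean distance); the initialization and stopping rule follow the description above:

```python
import numpy as np

# Data points from Question 49; K = 2 with Euclidean distance.
X = np.array([[5, 7], [11, 45], [10, 6], [18, 29], [10, 25], [4, 3]], dtype=float)
k = 2

rng = np.random.default_rng(0)
# Pick k unique data points at random as the initial centroids.
centroids = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(100):
    # Assign each point to its nearest centroid (1-NN against the centroids);
    # squared distances suffice, since argmin is unchanged by the square root.
    assign = ((X[:, None] - centroids[None]) ** 2).sum(-1).argmin(axis=1)
    # Move each centroid to the arithmetic mean of its cluster
    # (assumes no cluster becomes empty, which holds for this tiny example).
    new_centroids = np.array([X[assign == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centroids, centroids):  # centroids have stabilized
        break
    centroids = new_centroids

# The two clusters typically found: {(5,7), (10,6), (4,3)} and {(11,45), (18,29), (10,25)}.
print(assign, centroids)
```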


27. Explain the advantages and disadvantages of KMeans clustering.
K-Means Advantages:

1) If there are many variables, K-Means is usually computationally faster than hierarchical
clustering, provided we keep k small.

2) K-Means produces tighter clusters than hierarchical clustering, especially if the clusters are globular.

K-Means Disadvantages:

1) It is difficult to predict the value of K.
2) It does not work well with non-globular clusters.
3) Different initial partitions can result in different final clusters.
4) It does not work well when the clusters (in the original data) have different sizes and
different densities.

28. Differentiate between classification and regression.

29. Why do we call the Linear Regression a linear model?


Linear regression is called linear because you model your output variable (let’s call it f(x)) as a
linear combination of inputs and weights (let’s call them x and w, respectively); that is,
f(x) = w₁x₁ + … + wₙxₙ + b.
30. Derive the cost function of Linear Regression using step by step
Explanation.
31. What are the assumptions in Linear Regression? Also mention the
solutions for them.
The Four Assumptions of Linear Regression

● Linear relationship: there exists a linear relationship between the independent variable,
x, and the dependent variable, y.
● Independence: the residuals are independent.
● Homoscedasticity: the residuals have constant variance at every level of x.
● Normality: the residuals of the model are normally distributed.

https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/R/R5_Correlation-Regression/R5_Correlation-Regression4.html

How to address violations of these assumptions:

https://towardsdatascience.com/assumptions-of-linear-regression-5d87c347140

32. What is Polynomial Regression?

● Polynomial Regression is a regression algorithm that models the relationship between a
dependent variable (y) and an independent variable (x) as an nth-degree polynomial. The
Polynomial Regression equation is given below:

y = b0 + b1x1 + b2x1² + b3x1³ + ... + bnx1ⁿ

● It is also called a special case of Multiple Linear Regression in ML, because we add some
polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial
Regression.
● It is a linear model with some modifications made to increase the accuracy.
● The dataset used in Polynomial regression for training is of non-linear nature.
● It makes use of a linear regression model to fit the complicated and non-linear functions
and datasets.
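A minimal sketch, assuming made-up noisy quadratic data, of fitting a polynomial with NumPy’s polyfit (which solves the linear least-squares problem on the polynomial terms):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data that is non-linear in x but linear in the polynomial terms.
x = np.linspace(-3, 3, 30)
y = 1.0 + 2.0 * x + 0.5 * x**2 + 0.3 * rng.standard_normal(x.shape)

# Fit y = b0 + b1*x + b2*x^2 by ordinary least squares on the polynomial features.
coeffs = np.polyfit(x, y, deg=2)   # returned highest degree first: [b2, b1, b0]
y_pred = np.polyval(coeffs, x)
```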

33. Differentiate between Linear Regression and Logistic Regression.


34. Justify the name “Logistic Regression”.

Logistic Regression is one of the basic and popular algorithms used to solve classification problems.
It is named ‘Logistic Regression’ because its underlying technique is quite similar to Linear
Regression. The term “Logistic” is taken from the logit function that is used in this method of
classification.
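For reference, a minimal sketch of the logit function and its inverse, the logistic (sigmoid) function:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Logit function: the log-odds, inverse of the sigmoid."""
    return np.log(p / (1.0 - p))

print(sigmoid(0.0))         # 0.5
print(logit(sigmoid(2.0)))  # ≈ 2.0, confirming the inverse relationship
```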
35. Why do we use a logistic function in Logistic Regression?
36. Derive the log-loss cost function of Logistic Regression using step by
step explanation.

37. What are the advantages and disadvantages/drawbacks of the Linear Regression algorithm?
38. What are the merits and demerits of Logistic Regression model if any?
39. What is regularization?

Regularisation is a technique used to reduce errors by fitting the function appropriately on the
given training set, thereby avoiding overfitting.

The commonly used regularisation techniques are :

1. L1 regularisation
2. L2 regularisation
3. Dropout regularisation
40. Explain different types of regularization in ML using appropriate examples.

L1 Regularization or Lasso Regularization

L1 Regularization or Lasso Regularization adds a penalty to the error function: the sum of the
absolute values of the weights,

Penalty = p × Σᵢ |wᵢ|

where p is the tuning parameter which decides how much we want to penalize the model.

L2 Regularization or Ridge Regularization

L2 Regularization or Ridge Regularization also adds a penalty to the error function, but here the
penalty is the sum of the squared values of the weights,

Penalty = p × Σᵢ wᵢ²

As in L1, p is the tuning parameter which decides how much we want to penalize the model.
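A minimal sketch of both penalties using scikit-learn, with synthetic data; note that in scikit-learn the tuning parameter is called alpha rather than p:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only features 0 and 3 actually matter.
rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + 0.1 * rng.standard_normal(100)

# In scikit-learn the tuning parameter is called alpha (the p above).
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: sum of |w_i|
ridge = Ridge(alpha=0.1).fit(X, y)  # L2 penalty: sum of w_i^2

print(lasso.coef_)  # typically sparse: some coefficients driven exactly to 0
print(ridge.coef_)  # shrunk toward zero, but generally non-zero
```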
41. Differentiate between Lasso (L1) and Ridge (L2) regularization.

42. What is Elastic-Net regularization?


In statistics, and in particular in the fitting of linear or logistic regression models, the elastic
net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso
and ridge methods. The elastic net method overcomes the limitations of the LASSO (least absolute
shrinkage and selection operator) method, which uses a penalty function based on

‖β‖₁ = Σⱼ₌₁..p |βⱼ|

43. Explain the penalty term used in the Linear Regression for regularization
of the model.
44. Explain how the penalty term influences the Logistic Regression for
regularization of the model.

45. What is the Stochastic Gradient Descent Algorithm?

In Stochastic Gradient Descent, a few samples are selected randomly, instead of the whole data set,
for each iteration. In Gradient Descent, the term “batch” denotes the total number of samples from
the dataset used to calculate the gradient in each iteration. In typical Gradient Descent
optimization, such as Batch Gradient Descent, the batch is taken to be the whole dataset. Although
using the whole dataset is useful for reaching the minima in a less noisy and less random manner, a
problem arises when the dataset gets big.
SGD ALGORITHM:
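The original algorithm listing is missing here; as a hedged sketch, SGD for linear regression with a squared-error loss might look like this:

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=100):
    """Fit y ≈ Xw + b by stochastic gradient descent on the squared error."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):  # one random sample per update
            err = (X[i] @ w + b) - y[i]          # prediction error on sample i
            w -= lr * err * X[i]                 # gradient of 0.5*err^2 w.r.t. w
            b -= lr * err                        # gradient w.r.t. the bias
    return w, b
```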

46. What is the Least Square Method?

The least squares method is a statistical procedure for finding the best fit to a set of data points
by minimizing the sum of the squared offsets (residuals) of the points from the fitted curve. Least
squares regression is used to predict the behavior of dependent variables.
47. Explain the OLS and Gauss-Markov Theorem.

Ordinary Least Squares regression (OLS) is more commonly called linear regression (simple or
multiple, depending on the number of explanatory variables).

In the case of a model with p explanatory variables, the OLS regression model writes:

Y = β0 + Σj=1..p βjXj + ε

where Y is the dependent variable, β0 is the intercept of the model, Xj corresponds to the jth
explanatory variable of the model (j = 1 to p), and ε is the random error with expectation 0 and
variance σ².

In the case where there are n observations, the estimation of the predicted value of the
dependent variable Y for the i​th​ observation is given by:

ŷi = β0 + Σj=1..p βjXij

The OLS method corresponds to minimizing the sum of square differences between the
observed and predicted values. This minimization leads to the following estimators of the
parameters of the model:

β = (X′DX)⁻¹X′Dy and σ² = 1/(W − p*) Σi=1..n wi(yi − ŷi)²

where β is the vector of the estimators of the βi parameters, X is the matrix of the explanatory
variables preceded by a vector of 1s, y is the vector of the n observed values of the dependent
variable, p* is the number of explanatory variables (plus 1 if the intercept is not fixed), wi is
the weight of the ith observation, W is the sum of the wi weights, and D is the diagonal matrix
with the wi weights on its diagonal.
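A minimal NumPy sketch of the unweighted case (D = I), with made-up data:

```python
import numpy as np

# Made-up data, roughly y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.2, 5.9, 8.1])

# Unweighted case (D = I): beta = (X'X)^-1 X'y.
X = np.column_stack([np.ones_like(x), x])   # vector of 1s prepended for the intercept
beta = np.linalg.solve(X.T @ X, X.T @ y)    # closed-form OLS estimator
print(beta)                                 # [intercept, slope] ≈ [0.15, 1.97]
```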

Gauss-Markov:
The Gauss Markov theorem tells us that if a ​certain set of assumptions​ are met, the ​ordinary least squares​ estimate
for regression coefficients gives you the ​best linear unbiased estimate (BLUE)​ possible.

Gauss Markov Assumptions

There are five Gauss Markov assumptions (also called conditions):

1. Linearity: the parameters we are estimating using the OLS method must themselves be linear.
2. Randomness: our data must have been randomly sampled from the population.
3. Non-collinearity: the regressors being calculated aren’t perfectly correlated with each other.
4. Exogeneity: the regressors aren’t correlated with the error term.
5. Homoscedasticity: no matter what the values of our regressors might be, the variance of the
error is constant.
Purpose of the Assumptions

The Gauss Markov assumptions guarantee the ​validity ​of ​ordinary least squares​ for estimating ​regression
coefficients​.

Checking how well our data matches these assumptions is an important part of estimating regression coefficients.
When you know where these conditions are violated, you may be able to plan ways to change your experiment setup
to help your situation fit the ideal Gauss Markov situation more closely.

In practice, the Gauss Markov assumptions are rarely all met perfectly, but they are still useful as a benchmark, and
because they show us what ‘ideal’ conditions would be. They also allow us to pinpoint problem areas that might
cause our estimated regression coefficients to be inaccurate or even unusable.

Difference between OLS and GLS:

https://stats.stackexchange.com/questions/155031/how-to-determine-if-gls-improves-on-ols

48. Find the equation of linear regression line using following data points:
(0,72),(5,66), (10,70), (15,64), (20,60)
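A hedged check of this exercise with NumPy’s least-squares line fit (slope = Sxy/Sxx gives the same result by hand):

```python
import numpy as np

x = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
y = np.array([72.0, 66.0, 70.0, 64.0, 60.0])

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line fit
print(slope, intercept)                     # -0.52, 71.6  ->  y = 71.6 - 0.52x
```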
49. Using K-mean Clustering Algorithm, Cluster the following data points:
(5,7), (11,45),(10, 6), (18,29), (10,25), (4,3); where K=2 and Euclidean
Distance.
50. Using KNN algorithm and the given data set, predict the label of the test
data point (3,7), where K=3 and Euclidean distance.
X Y Label
7 7 1
7 4 1
3 4 2
1 4 2
