Important Questions


1. Explain EM algorithm with steps.

Answer
The Expectation-Maximization (EM) algorithm is an iterative way to find
maximum-likelihood estimates for model parameters when the data is
incomplete, has missing data points, or has some hidden (latent) variables.
EM chooses random values for the missing data points and uses them to estimate a new
set of parameters. These new values are then used recursively to produce better estimates of the
missing points, and the process is repeated until the values converge.
These are the two basic steps of the EM algorithm :
a. Estimation (E) Step : i. Initialize the means μ_k, the covariance matrices Σ_k and the mixing coefficients π_k by random values (or by K-means clustering results, or by hierarchical clustering results). ii. Then, for those given parameter values, estimate the values of the latent variables (i.e., the responsibilities γ_k).
b. Maximization (M) Step : Update the values of the parameters (i.e., μ_k, Σ_k and π_k) using the ML method : i. Re-estimate all the parameters using the current γ_k values. ii. Compute the log-likelihood function. iii. Apply a convergence criterion. iv. If the log-likelihood (or all the parameters) converges to some value, then stop; else return to the Estimation step.
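A minimal Python sketch of these two steps for a Gaussian mixture model is given below; the function name em_gmm, the two-component default and the use of scipy.stats.multivariate_normal are illustrative assumptions, not part of the original answer.

```python
# Minimal sketch of EM for a Gaussian mixture model (illustrative only;
# the variable names and the default K=2 setup are assumptions).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K=2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization: K random data points as means, identity covariances, uniform mixing weights
    mu = X[rng.choice(n, K, replace=False)]
    sigma = np.array([np.eye(d) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k] = P(component k | x_i)
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mu[k], sigma[k]) for k in range(K)
        ])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mu, sigma, pi from the current responsibilities
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        pi = Nk / n
    return mu, sigma, pi
```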
2. Describe the usage, advantages and disadvantages of EM algorithm.
Answer Usage of EM algorithm : 1. It can be used to fill the missing data in a
sample.
2. It can be used as the basis of unsupervised learning of clusters.
3. It can be used for the purpose of estimating the parameters of Hidden Markov
Model (HMM).
4. It can be used for discovering the values of latent variables.
Advantages of EM algorithm are :
1. It is guaranteed that the likelihood will never decrease with each iteration.
2. The E-step and M-step are often pretty easy for many problems in terms of
implementation.
3. Solutions to the M-steps often exist in the closed form.
Disadvantages of EM algorithm are :
1. It has slow convergence.
2. It converges only to a local optimum.
3. It requires both the forward and backward probabilities (numerical optimization
requires only the forward probability).
3. What are the types of support vector machine ?
Answer Following are the types of support vector machine :
1. Linear SVM : Linear SVM is used for linearly separable data. If a dataset can be
separated into two classes by a single straight line, it is termed linearly separable
data, and the classifier used is called a linear SVM classifier.
2. Non-linear SVM : Non-linear SVM is used for non-linearly separable data. If a
dataset cannot be separated by a straight line, it is termed non-linear data, and the
classifier used is called a non-linear SVM classifier.
4. What is polynomial kernel ? Explain polynomial kernel using one dimensional
and two dimensional.
ANSWER: 1. The polynomial kernel is a kernel function used with Support Vector
Machines (SVMs) and other kernelized models, that represents the similarity of
vectors (training samples) in a feature space over polynomials of the original
variables, allowing learning of non-linear models.
2. The polynomial kernel function is given by the equation K(a, b) = (a · b + r)^d, where a and b
are two different data points that we need to classify, r determines the coefficient
of the polynomial, and d determines the degree of the polynomial.
3. We perform the dot products of the data points, which gives us the high
dimensional coordinates for the data.
4. When d = 1, the polynomial kernel computes the relationship between each pair
of observations in 1-Dimension and these relationships help to find the support
vector classifier.
5. When d = 2, the polynomial kernel computes the 2-Dimensional relationship
between each pair of observations which help to find the support vector classifier.
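A small sketch of the polynomial kernel computation is shown below; the specific values of r and d, and the explicit feature map used for the check, are assumptions chosen for the example.

```python
# Illustrative sketch of the polynomial kernel (a.b + r)^d for two 1-D points.
import numpy as np

def polynomial_kernel(a, b, r=1.0, d=2):
    """K(a, b) = (a . b + r)^d - equals a dot product in a higher-dimensional feature space."""
    return (np.dot(a, b) + r) ** d

x1, x2 = np.array([2.0]), np.array([3.0])
print(polynomial_kernel(x1, x2, r=1.0, d=2))   # (2*3 + 1)^2 = 49
# The same value is obtained by explicitly mapping each point to (x^2, sqrt(2)*x, 1)
# and taking an ordinary dot product.
phi = lambda x: np.array([x[0] ** 2, np.sqrt(2) * x[0], 1.0])
print(np.dot(phi(x1), phi(x2)))                # also 49
```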
5. Describe Gaussian Kernel (Radial Basis Function).
ANSWER: 1. The Gaussian kernel, also called the Radial Basis Function (RBF) kernel, measures the
similarity of two data points a and b as K(a, b) = exp(−||a − b||² / (2σ²)), often written as
exp(−γ||a − b||²) with γ = 1/(2σ²).
2. The similarity is 1 when the points coincide and decays towards 0 as the distance between
them grows, so each training point has a localized region of influence controlled by γ (or σ).
3. The RBF kernel implicitly maps the data into a very high-dimensional feature space, which
makes it a common default kernel for non-linearly separable data.
6. Write short note on hyperplane (Decision surface).


ANSWER:1. A hyperplane in an n-dimensional Euclidean space is a flat, n-1
dimensional subset of that space that divides the space into two disconnected parts.
2. For example, consider a line, which is a one-dimensional Euclidean space.
3. Now pick a point on the line; this point divides the line into two parts.
4. The line has 1 dimension, while the point has 0 dimensions, so a point is a
hyperplane of the line.
5. For two dimensions, the separating line is the hyperplane.
6. Similarly, for three dimensions, a plane with two dimensions divides the 3D
space into two parts and thus acts as a hyperplane.
7. Thus, for a space of n dimensions we have a hyperplane of n-1 dimensions
separating it into two parts.
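As a small illustration (the vector w, the offset b and the test points below are assumed example values), a hyperplane w·x + b = 0 splits the space into two parts identified by the sign of w·x + b:

```python
# Which side of a hyperplane w.x + b = 0 a point falls on (assumed example values).
import numpy as np

w = np.array([1.0, -2.0, 0.5])   # normal vector of the hyperplane
b = -1.0                          # offset

def side(x):
    """Return +1 or -1 depending on which side of the hyperplane x lies (0 means on it)."""
    return int(np.sign(np.dot(w, x) + b))

print(side(np.array([3.0, 0.0, 0.0])))   # +1: w.x + b = 3 - 1 = 2 > 0
print(side(np.array([0.0, 1.0, 0.0])))   # -1: w.x + b = -2 - 1 = -3 < 0
```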
7. What are the advantages and disadvantages of SVM ?
ANSWER: Advantages of SVM are : 1. Guaranteed optimality : Owing to the
nature of Convex Optimization, the solution will always be global minimum, not a
local minimum.
2. The abundance of implementations : We can access it conveniently.
3. SVM can be used for linearly separable as well as non-linearly separable data.
Linearly separable data poses a hard margin, whereas non-linearly separable data
poses a soft margin.
4. SVMs support semi-supervised learning, so they can be used in areas where the data is
partly labeled and partly unlabeled. This only requires adding a condition to the
minimization problem, which is known as the transductive SVM.
5. Feature Mapping used to be quite a load on the computational complexity of the
overall training performance of the model. However, with the help of Kernel Trick,
SVM can carry out the feature mapping using the simple dot product.
Disadvantages of SVM : 1. SVM does not give the best performance for handling
text structures as compared to other algorithms that are used in handling text data.
This leads to loss of sequential information and thereby to worse performance.
2. SVM does not return a probabilistic confidence value the way logistic regression does.
This limits interpretability, as the confidence of a prediction is important in several
applications.
3. The choice of the kernel is perhaps the biggest limitation of the support vector
machine. Considering so many kernels present, it becomes difficult to choose the
right one for the data.
8. Explain the properties of SVM.
ANSWER: Following are the properties of SVM :
1. Flexibility in choosing a similarity (kernel) function.
2. Sparseness of the solution when dealing with large data sets : only the support
vectors are used to specify the separating hyperplane.
3. Ability to handle large feature spaces : the complexity does not depend on the
dimensionality of the feature space.
4. Overfitting can be controlled by the soft margin approach.
5. Training is a simple convex optimization problem which is guaranteed to converge to a
single global solution.
9. What are the parameters used in support vector classifier ?
ANSWER: Parameters used in support vector classifier are :
1. Kernel : a. Kernel, is selected based on the type of data and also the type of
transformation. b. By default, the kernel is Radial Basis Function Kernel (RBF).
2. Gamma : a. This parameter decides how far the influence of a single training
example reaches during transformation, which in turn affects how tightly the
decision boundaries end up surrounding points in the input space. b. If there is a
small value of gamma, points farther apart are considered similar. c. So, more
points are grouped together and the decision boundaries are smoother (may be less
accurate). d. With larger values of gamma, points must be closer together to be
considered similar (may cause overfitting).
3. The 'C' parameter : a. This parameter controls the amount of regularization
applied on the data. b. Large values of C mean low regularization which in turn
causes the training data to fit very well (may cause overfitting). c. Lower values of
C mean higher regularization which causes the model to be more tolerant of errors
(may lead to lower accuracy).
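The following hedged sketch shows these three parameters in scikit-learn's SVC; the dataset (make_moons) and the particular values of gamma and C are arbitrary choices for illustration.

```python
# Sketch of the kernel, gamma and C parameters using scikit-learn's SVC.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel='rbf' is the default; small gamma -> smoother boundary, large C -> less regularization
clf = SVC(kernel="rbf", gamma=0.5, C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```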
10. Why do we use decision tree?
ANSWER:
1. Decision trees can be visualized, and they are simple to understand and interpret.
2. They require less data preparation whereas other techniques often require data
normalization, the creation of dummy variables and removal of blank values.
3. The cost of using the tree (for predicting data) is logarithmic in the number of
data points used to train the tree.
4. Decision trees can handle both categorical and numerical data whereas other
techniques are specialized for only one type of variable.
5. Decision trees can handle multi-output problems.
6. A decision tree is a white-box model, i.e., the reasoning behind a prediction can be
explained easily by Boolean logic, because each condition has two outcomes, for example yes
or no.
7. Decision trees can be used even if assumptions are violated by the dataset from
which the data is taken.
11. How can we express decision trees ?
ANSWER: 1. A decision tree expresses a function as a tree in which each internal node tests an
attribute, each branch corresponds to a value (or range of values) of that attribute, and each
leaf node assigns a class label (or value).
2. Every path from the root to a leaf is a conjunction of attribute tests, so the whole tree
represents a disjunction of conjunctions of constraints on the attribute values.
3. Equivalently, a decision tree can be expressed as a set of if-then rules, one rule per
root-to-leaf path.
12. Explain various decision tree learning algorithms.
ANSWER: Various decision tree learning algorithms are :
1. ID3 (Iterative Dichotomiser 3) :
i. ID3 is an algorithm used to generate a decision tree from a dataset.
ii. To construct a decision tree, ID3 uses a top-down, greedy search through the
given sets, where each attribute at every tree node is tested to select the attribute
that is best for classification of a given set. iii. Therefore, the attribute with the
highest information gain can be selected as the test attribute of the current node.
iv. In this algorithm, small decision trees are preferred over the larger ones. It is a
heuristic algorithm because it does not construct the smallest tree.
v. For building a decision tree model, ID3 only accepts categorical attributes.
Accurate results are not given by ID3 when there is noise and when it is serially
implemented.
vi. Therefore data is preprocessed before constructing a decision tree.
vii. For constructing a decision tree, the information gain is calculated for each
attribute, and the attribute with the highest information gain becomes the root
node. The possible values of that attribute are denoted by arcs (branches).
viii. All the outcome instances that are possible are examined whether they belong
to the same class or not. For the instances of the same class, a single name is used
to denote the class otherwise the instances are classified on the basis of splitting
attribute.
2. CART (Classification And Regression Trees) :
i. CART algorithm builds both classification and regression trees.
ii. The classification tree is constructed by CART through binary splitting of
the attribute.
iii. Gini Index is used for selecting the splitting attribute.
iv. The CART is also used for regression analysis with the help of regression
tree.
v. The regression feature of CART can be used in forecasting a dependent
variable given a set of predictor variables over a given period of time.
vi. CART has an average speed of processing and supports both continuous and
nominal attribute data.
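As an illustration of the two selection measures named above (information gain for ID3, Gini index for CART), here is a minimal sketch; the helper names and the toy labels are assumptions, not from the text.

```python
# Minimal sketch of information gain (ID3) and the Gini index (CART).
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(labels, attribute_values):
    """Entropy of the parent minus the weighted entropy of the children after splitting."""
    total = len(labels)
    children = 0.0
    for v in np.unique(attribute_values):
        subset = labels[attribute_values == v]
        children += (len(subset) / total) * entropy(subset)
    return entropy(labels) - children

y = np.array(["yes", "yes", "no", "no", "yes", "no"])
outlook = np.array(["sunny", "rain", "sunny", "rain", "rain", "sunny"])
print(information_gain(y, outlook), gini(y))
```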
13. Explain attribute selection measure used in decision tree.
ANSWER: The commonly used attribute selection measures are :
1. Information gain : Used by ID3, it measures the reduction in entropy obtained by splitting
the data on an attribute; the attribute with the highest information gain is chosen for the split.
2. Gain ratio : It normalizes the information gain by the split information of the attribute,
which reduces the bias of information gain towards attributes with many distinct values.
3. Gini index : Used by CART, it measures the impurity of a data partition; the attribute that
produces the smallest weighted Gini index after the split is selected.

14. Explain inductive bias with inductive system.


ANSWER: Inductive bias :
1. Inductive bias refers to the restrictions that are imposed by the assumptions
made in the learning method.
2. For example, assuming that the solution to the problem of road safety can be
expressed as a conjunction of a set of eight concepts.
3. This does not allow for more complex expressions that cannot be expressed as
a conjunction.
4. This inductive bias means that there are some potential solutions that we cannot
explore, because they are not contained within the version space we examine.
5. In order to have an unbiased learner, the version space would have to contain
every possible hypothesis that could possibly be expressed.
6. The solution that the learner produced could never be more general than the
complete set of training data.
7. In other words, it would be able to classify data that it had previously seen (as
the rote learner could) but would be unable to generalize in order to classify new,
unseen data.
8. The inductive bias of the candidate elimination algorithm is that it is only able
to classify a new piece of data if all the hypotheses contained within its version
space give that data the same classification.
9. Hence, the inductive bias does impose a limitation on the learning method.

15. Which learning algorithms are used in inductive bias ?


Answer :Learning algorithm used in inductive bias are :
1. Rote-learner : a. Learning corresponds to storing each observed training
example in memory. b. Subsequent instances are classified by looking them up in
memory. c. If the instance is found in memory, the stored classification is returned.
d. Otherwise, the system refuses to classify the new instance. e. Inductive bias :
There is no inductive bias.
2. Candidate-elimination : a. New instances are classified only in the case where
all members of the current version space agree on the classification. b. Otherwise,
the system refuses to classify the new instance. c. Inductive bias : The target
concept can be represented in its hypothesis space.
3. FIND-S : a. This algorithm finds the most specific hypothesis consistent with
the training examples. b. It then uses this hypothesis to classify all subsequent
instances (a minimal sketch is given below). c. Inductive bias : The target concept can be
represented in its hypothesis space, and all instances are negative instances unless the
opposite is entailed by its other knowledge.
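A minimal sketch of FIND-S over conjunctive attribute-value hypotheses follows; the toy attributes, the training examples and the 'Ø'/'?' encoding are illustrative assumptions.

```python
# Minimal sketch of FIND-S for conjunctive hypotheses; '?' means "any value" and
# 'Ø' means "no value yet". The toy attributes and examples are assumed, not from the text.
def find_s(examples):
    # Start with the most specific hypothesis: all attributes set to 'Ø'
    n_attrs = len(examples[0][0])
    h = ["Ø"] * n_attrs
    for x, label in examples:
        if label != "yes":          # FIND-S ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] == "Ø":
                h[i] = value        # first positive example: copy its values
            elif h[i] != value:
                h[i] = "?"          # conflicting value: generalize to "any value"
    return h

training = [
    (("sunny", "warm", "normal"), "yes"),
    (("sunny", "warm", "high"),   "yes"),
    (("rainy", "cold", "high"),   "no"),
]
print(find_s(training))   # ['sunny', 'warm', '?']
```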
16. Write short note on instance-based learning.
ANSWER: 1. Instance-Based Learning (IBL) is an extension of nearest neighbour
or K-NN classification algorithms.
2. IBL algorithms do not maintain a set of abstractions (models) created from the
instances.
3. The K-NN algorithms have large space requirement.
4. They also extend K-NN with a significance test to work with noisy instances, since a
lot of real-life datasets have noisy training instances and K-NN algorithms do not work
well with noise.
5. Instance-based learning is based on the memorization of the dataset.
6. The number of parameters is unbounded and grows with the size of the data.
7. The classification is obtained through memorized examples.
8. The cost of the learning process is zero; all the cost is in the computation of the
prediction.
9. This kind of learning is also known as lazy learning.
17. What are the performance dimensions used for instance-based learning
algorithms ?
ANSWER: Performance dimensions used for instance-based learning algorithms are :
1. Generality : a. This is the class of concepts that an algorithm's representation can
describe. b. IBL algorithms can PAC-learn any concept whose boundary is a union
of a finite number of closed hyper-curves of finite size.
2. Accuracy : This concept describes the accuracy of classification.
3. Learning rate : a. This is the speed at which classification accuracy increases
during training. b. It is a more useful indicator of the performance of the learning
algorithm than accuracy for finite-sized training sets.
4. Incorporation costs : These are the costs incurred while updating the concept
descriptions with a single training instance.
5. Storage requirement : This is the size of the concept description for IBL
algorithms, which is defined as the number of saved instances used for
classification decisions.
18. What are the functions of instance-based learning ?
ANSWER: Functions of instance-based learning are :
1. Similarity function : a. This computes the similarity between a training instance
i and the instances in the concept description. b. Similarities are numeric-valued.
2. Classification function : a. This receives the similarity function’s results and the
classification performance records of the instances in the concept description. b. It
yields a classification for i.
3. Concept description updater : a. This maintains records on classification
performance and decides which instances to include in the concept description. b.
Inputs include i, the similarity results, the classification results, and a current
concept description. It yields the modified concept description.
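A toy sketch of these three functions for a 1-nearest-neighbour learner is shown below; the class name SimpleIBL and the negative-distance similarity are assumptions made for illustration.

```python
# Sketch of the three IBL components described above (similarity, classification,
# concept-description updater) for a 1-nearest-neighbour learner.
import numpy as np

class SimpleIBL:
    def __init__(self):
        self.cases = []                      # concept description: stored (x, label) pairs

    def similarity(self, x, stored_x):
        # Numeric-valued similarity: negative Euclidean distance (larger = more similar)
        return -np.linalg.norm(np.asarray(x) - np.asarray(stored_x))

    def classify(self, x):
        # Classification function: label of the most similar stored instance
        best = max(self.cases, key=lambda c: self.similarity(x, c[0]))
        return best[1]

    def update(self, x, label):
        # Concept description updater: here we simply store every training instance
        self.cases.append((x, label))

ibl = SimpleIBL()
ibl.update([0.0, 0.0], "A")
ibl.update([5.0, 5.0], "B")
print(ibl.classify([1.0, 0.5]))   # "A"
```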
19. Describe K-Nearest Neighbour algorithm with steps.
ANSWER: 1. The KNN classification algorithm is used to decide which class a new
instance should belong to.
2. When K = 1, we have the nearest neighbour algorithm.
3. KNN classification is incremental.
4. KNN classification does not have a training phase as such; all instances are stored,
and training only builds an index so that neighbours can be found quickly.
5. During testing, KNN classification algorithm has to find K-nearest neighbours
of a new instance. This is time consuming if we do exhaustive comparison.
6. K-nearest neighbours use the local neighborhood to obtain a prediction.
Algorithm : Let m be the number of training data samples. Let p be an unknown
point.
1. Store the training samples in an array of data points array. This means each
element of this array represents a tuple (x, y).
2. For i = 1 to m : calculate the Euclidean distance d(arr[i], p).
3. Make a set S of the K smallest distances obtained. Each of these distances
corresponds to an already classified data point.
4. Return the majority label among S.
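The steps above can be sketched directly in Python as follows (exhaustive comparison; the sample data are assumed example values):

```python
# Direct sketch of the K-NN steps above.
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, p, K=3):
    # Step 2: Euclidean distance from the unknown point p to every training sample
    distances = np.linalg.norm(train_X - p, axis=1)
    # Step 3: indices of the K smallest distances
    nearest = np.argsort(distances)[:K]
    # Step 4: majority label among the K nearest neighbours
    return Counter(train_y[nearest]).most_common(1)[0][0]

train_X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
train_y = np.array(["A", "A", "B", "B"])
print(knn_classify(train_X, train_y, np.array([1.1, 1.0]), K=3))   # "A"
```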
20. Explain locally weighted regression.
ANSWER:1. Model-based methods, such as neural networks and the mixture of
Gaussians, use the data to build a parameterized model.
2. After training, the model is used for predictions and the data are generally
discarded.
3. In contrast, memory-based methods are non-parametric approaches that
explicitly retain the training data, and use it each time a prediction needs to be
made.
4. Locally Weighted Regression (LWR) is a memory-based method that performs
a regression around a point using only training data that are local to that point.
5. LWR has been shown to be suitable for real-time control, for example by constructing an
LWR-based system that learned a difficult juggling task.
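A minimal sketch of locally weighted linear regression at a single query point is given below; the Gaussian weighting function and the bandwidth value tau are common choices assumed for illustration.

```python
# Minimal sketch of locally weighted (linear) regression at a query point x_q.
import numpy as np

def lwr_predict(X, y, x_q, tau=0.5):
    # Weight each training point by its distance to the query point (Gaussian kernel)
    w = np.exp(-np.sum((X - x_q) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])      # add bias column
    xb = np.hstack([1.0, x_q])
    # Weighted least squares fit around x_q: theta = (Xb^T W Xb)^-1 Xb^T W y
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xb @ theta

X = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)
y = np.sin(X).ravel()
print(lwr_predict(X, y, np.array([1.0])))   # close to sin(1) ~ 0.84
```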
22. Explain Radial Basis Function (RBF).
ANSWER: 1. A Radial Basis Function (RBF) is a function that assigns a real
value to each input from its domain (it is a real-value function), and the value
produced by the RBF is always an absolute value i.e., it is a measure of distance
and cannot be negative.
2.Euclidean distance (the straight-line distance) between two points in Euclidean
space is used.
3. Radial basis functions are used to approximate functions, much as neural
networks act as function approximators.
4. The following sum represents a radial basis function network :
y(x) = Σ (i = 1 to N) w_i φ(||x − x_i||)
where the approximating function y(x) is a sum of N radial basis functions φ, each
associated with a different center x_i and weighted by a coefficient w_i.
5. The radial basis functions act as activation functions.
6. The approximant y(x) is differentiable with respect to the weights w_i, which are
learned using iterative update methods common among neural networks.
23. Explain the architecture of a radial basis function network.
ANSWER: 1. Radial Basis Function (RBF) networks have three layers : an input
layer, a hidden layer with a non-linear RBF activation function and a linear output
layer.
2. The input can be modeled as a vector of real numbers x ∈ R^n.
3. The output of the network is then a scalar function of the input vector,
φ(x) = Σ (i = 1 to n) a_i ρ(||x − c_i||)
where n is the number of neurons in the hidden layer, c_i is the center vector for
neuron i and a_i is the weight of neuron i in the linear output neuron.
4. Functions that depend only on the distance from a center vector are radially
symmetric about that vector.
5. In the basic form all inputs are connected to each hidden neuron.
6. The radial basis function is commonly taken to be Gaussian :
ρ(||x − c_i||) = exp(−β ||x − c_i||²)
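A short sketch of the forward pass of such a network with Gaussian activations is given below; the centers, weights and β value are assumed example values, not from the text.

```python
# Forward pass of an RBF network with Gaussian activations, matching the structure above.
import numpy as np

def rbf_network(x, centers, weights, beta=1.0):
    # Hidden layer: one Gaussian radial basis unit per center c_i
    activations = np.exp(-beta * np.sum((centers - x) ** 2, axis=1))
    # Linear output layer: weighted sum of the hidden activations
    return np.dot(weights, activations)

centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])   # c_i
weights = np.array([0.5, -1.0, 2.0])                        # a_i
print(rbf_network(np.array([1.0, 0.5]), centers, weights, beta=1.0))
```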
24. Write short note on case-based learning algorithm.


ANSWER: 1. Case-Based Learning (CBL) algorithms take as input a sequence of
training cases and output a concept description, which can be used to generate
predictions of goal feature values for subsequently presented cases.
2. The primary component of the concept description is the case base, but almost all
CBL algorithms maintain additional related information for the purpose of
generating accurate predictions (for example, settings for feature weights).
3. Current CBL algorithms assume that cases are described using a feature-value
representation, where features are either predictor or goal features.
4. CBL algorithms are distinguished by their processing behaviour.
25. What are the functions of case-based learning algorithm ?
ANSWER: Functions of case-based learning algorithm are :
1. Pre-processor : This prepares the input for processing (for example, normalizing
the range of numeric-valued features to ensure that they are treated with equal
importance by the similarity function, formatting the raw input into a set of cases).
2. Similarity : a. This function assesses the similarities of a given case with the
previously stored cases in the concept description. b. Assessment may involve
explicit encoding and/or dynamic computation. c. CBL similarity functions find a
compromise along the continuum between these extremes.
3. Prediction : This function inputs the similarity assessments and generates a
prediction for the value of the given case’s goal feature (i.e., a classification when
it is symbolic-valued).
4. Memory updating : This updates the stored case-base, such as by modifying or
abstracting previously stored cases, forgetting cases presumed to be noisy, or
updating a feature’s relevance weight setting.
26. What are the benefits of CBL as a lazy problem solving method ?
ANSWER: The benefits of CBL as a lazy Problem solving method are :
1. Ease of knowledge elicitation :
a. Lazy methods can utilise easily available case or problem instances instead of
rules that are difficult to extract.
b. So, classical knowledge engineering is replaced by case acquisition and
structuring.
2. Absence of problem-solving bias :
a. Cases can be used for multiple problem-solving purposes, because they are
stored in a raw form.
b. This is in contrast to eager methods, which can be used merely for the purpose for
which the knowledge has already been compiled.
3. Incremental learning :
a. A CBL system can be put into operation with a minimal set of solved cases
furnishing the case base.
b. The case base is then filled with new cases, increasing the system's problem-
solving ability.
c. Besides augmentation of the case base, new indexes and cluster categories can
be created and the existing ones can be changed.
d. In contrast, eager methods require a special training period whenever information
extraction (knowledge generalisation) is performed.
e. Hence, dynamic on-line adaptation in a non-rigid environment is possible.
4. Suitability for complex and not fully formalised solution spaces :
a. CBL systems can be applied to an incomplete model of the problem domain;
implementation involves both identifying the relevant case features and furnishing
a (possibly partial) case base with proper cases.
b. Lazy approaches are more appropriate for complex solution spaces than eager
approaches, which replace the presented data with abstractions obtained by
generalisation.
