Important Questions
1. What is the Expectation-Maximization (EM) algorithm ? Explain its steps.
Answer
The Expectation-Maximization (EM) algorithm is an iterative method for finding maximum-likelihood estimates of model parameters when the data is incomplete, has missing data points, or involves unobserved (hidden) latent variables.
EM begins by choosing random values for the missing data points and uses them to estimate a first set of parameters.
These estimates are then used, in turn, to fill in better values for the missing points, and the procedure repeats until the values converge.
These are the two basic steps of the EM algorithm :
a. Estimation (E) step :
i. Initialize the means μ_k, the covariance matrices Σ_k and the mixing coefficients π_k with random values, or with K-means clustering results, or with hierarchical clustering results.
ii. Then, for those given parameter values, estimate the values of the latent variables (i.e., the responsibilities γ_k).
b. Maximization (M) step : Update the values of the parameters (i.e., μ_k, Σ_k and π_k) using the ML method :
i. Initialize the mean μ_k, the covariance matrix Σ_k and the mixing coefficients π_k with random values (or other values).
ii. Compute the γ_k values for all k.
iii. Re-estimate all the parameters using the current γ_k values.
iv. Compute the log-likelihood function.
v. Set some convergence criterion.
vi. If the log-likelihood value converges to some value (or if all the parameters converge to some values), then stop; else return to Step 2.
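The following minimal NumPy sketch illustrates these two steps for a one-dimensional Gaussian mixture; the function name em_gmm_1d and all variable names are illustrative, not taken from any library.

import numpy as np

def em_gmm_1d(x, K=2, iters=100, tol=1e-6):
    # Illustrative EM sketch for a 1-D Gaussian mixture with K components.
    rng = np.random.default_rng(0)
    mu = rng.choice(x, K)                 # means mu_k, initialised randomly
    var = np.full(K, x.var())             # variances (1-D covariances) Sigma_k
    pi = np.full(K, 1.0 / K)              # mixing coefficients pi_k
    prev_ll = -np.inf
    for _ in range(iters):
        # E-step : responsibilities gamma[n, k] of the latent variables.
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
               / np.sqrt(2 * np.pi * var)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step : ML updates of mu_k, var_k and pi_k from the current gamma.
        Nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / len(x)
        # Convergence criterion : stop once the log-likelihood stops improving.
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return mu, var, pi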
2. Describe the usage, advantages and disadvantages of the EM algorithm.
Answer
Usage of the EM algorithm :
1. It can be used to fill in the missing data in a sample.
2. It can be used as the basis of unsupervised learning of clusters.
3. It can be used for the purpose of estimating the parameters of Hidden Markov
Model (HMM).
4. It can be used for discovering the values of latent variables.
Advantages of the EM algorithm are :
1. The likelihood is guaranteed not to decrease with each iteration.
2. The E-step and the M-step are often easy to implement for many problems.
3. Solutions to the M-step often exist in closed form.
Disadvantages of the EM algorithm are :
1. It has slow convergence.
2. It converges only to a local optimum.
3. It requires both the forward and backward probabilities (numerical optimization requires only the forward probability).
3. What are the types of support vector machine ?
Answer
Following are the types of support vector machine :
1. Linear SVM : Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by a single straight line, the data is termed linearly separable, and the classifier used is called the Linear SVM classifier.
2. Non-linear SVM : Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by a straight line, the data is termed non-linear, and the classifier used is called the Non-linear SVM classifier.
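As a quick, illustrative comparison (using scikit-learn's SVC; the dataset and parameter choices below are only for demonstration), the following sketch fits both classifiers to two concentric circles, which no single straight line can separate :

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles : the classes are not linearly separable.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)     # Linear SVM
nonlinear_svm = SVC(kernel="rbf").fit(X, y)     # Non-linear SVM

# The linear classifier typically scores near chance on this data,
# while the non-linear one separates the circles almost perfectly.
print("Linear SVM accuracy :", linear_svm.score(X, y))
print("Non-linear SVM accuracy :", nonlinear_svm.score(X, y))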
4. What is a polynomial kernel ? Explain the polynomial kernel in one dimension and two dimensions.
Answer
1. The polynomial kernel is a kernel function used with Support Vector Machines (SVMs) and other kernelized models. It represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, allowing the learning of non-linear models.
2. The polynomial kernel function is given by the equation K(a, b) = (a · b + r)^d, where a and b are two different data points that we need to classify, r determines the coefficients of the polynomial, and d determines the degree of the polynomial.
3. We take the dot product of the data points, which gives us the high-dimensional coordinates of the data.
4. When d = 1, the polynomial kernel computes the relationship between each pair of observations in one dimension, and these relationships help to find the support vector classifier.
5. When d = 2, the polynomial kernel computes the two-dimensional relationship between each pair of observations, which helps to find the support vector classifier.
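The small sketch below illustrates the kernel for two one-dimensional points; the names poly_kernel and phi are illustrative, and the d = 2, r = 1 feature map shown in the comments is one standard choice, not the only one :

import numpy as np

def poly_kernel(a, b, r=1.0, d=2):
    # Polynomial kernel K(a, b) = (a . b + r)^d.
    return (np.dot(a, b) + r) ** d

a, b = np.array([1.0]), np.array([3.0])   # two 1-D data points

# d = 1 : the (shifted) one-dimensional relationship between the points.
print(poly_kernel(a, b, r=1.0, d=1))      # (1*3 + 1)^1 = 4.0

# d = 2 : equals a dot product in a higher-dimensional feature space.
# For r = 1 : (ab + 1)^2 = a^2 b^2 + 2ab + 1, i.e. the dot product of
# phi(x) = (x^2, sqrt(2)*x, 1).
phi = lambda x: np.array([x[0] ** 2, np.sqrt(2) * x[0], 1.0])
print(poly_kernel(a, b, r=1.0, d=2))      # 16.0
print(np.dot(phi(a), phi(b)))             # 16.0, without an explicit mapping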
5. Describe Gaussian Kernel (Radial Basis Function).
Answer
1. A radial basis function (RBF) is a real-valued function whose value depends only on the distance between the input and a fixed point (the origin or a centre c), so that φ(x) = φ(||x − c||).
2. The Gaussian kernel is the most widely used radial basis function; it is given by K(a, b) = exp(−||a − b||² / (2σ²)), where σ controls the width of the Gaussian.
3. Sums of radial basis functions can be used to approximate given functions : y(x) = Σ_{i=1}^{N} w_i φ(||x − x_i||), where each basis function has its own centre x_i.
4. The approximating function y(x) is thus a sum of N radial basis functions, each weighted by a coefficient w_i.
5. The radial basis functions act as activation functions.
6. The approximant y(x) is differentiable with respect to the weights w_i, which are learned using iterative update methods common among neural networks.
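A minimal sketch of evaluating the Gaussian kernel, assuming the exp(−||a − b||² / (2σ²)) form given above (the name gaussian_kernel is illustrative) :

import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Similarity decays with the squared distance between a and b.
    return np.exp(-np.linalg.norm(a - b) ** 2 / (2 * sigma ** 2))

a = np.array([0.0, 0.0])
print(gaussian_kernel(a, np.array([0.1, 0.0])))   # nearby points  -> close to 1
print(gaussian_kernel(a, np.array([3.0, 4.0])))   # distant points -> close to 0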
23. Explain the architecture of a radial basis function network.
Answer
1. Radial Basis Function (RBF) networks have three layers : an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer.
2. The input can be modeled as a vector of real numbers x ∈ ℝⁿ.
3. The output of the network is then a scalar function of the input vector, φ : ℝⁿ → ℝ, given by
φ(x) = Σ_{i=1}^{N} a_i ρ(||x − c_i||),
where N is the number of neurons in the hidden layer, c_i is the center vector for neuron i, and a_i is the weight of neuron i in the linear output neuron.
4. Functions that depend only on the distance from a center vector are radially
symmetric about that vector.
5. In the basic form all inputs are connected to each hidden neuron.
6. The radial basis function is taken to be Gaussian :
ρ(||x − c_i||) = exp(−β ||x − c_i||²).
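A minimal NumPy sketch of this forward pass, assuming Gaussian radial basis functions; the function rbf_network and the example centres and weights are illustrative :

import numpy as np

def rbf_network(x, centers, weights, beta=1.0):
    # phi(x) = sum_i a_i * exp(-beta * ||x - c_i||^2)
    dists2 = np.sum((centers - x) ** 2, axis=1)   # squared distance to each centre c_i
    activations = np.exp(-beta * dists2)          # Gaussian RBF of each hidden neuron
    return weights @ activations                  # linear output layer

centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])   # one centre per hidden neuron
weights = np.array([0.5, -1.0, 2.0])                       # output weights a_i
print(rbf_network(np.array([1.0, 0.5]), centers, weights))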