Top 50 Machine Learning Interview Q&A
Machine learning is a branch of computer science that deals with programming systems so
that they automatically learn and improve with experience. For example, robots are
programmed so that they can perform tasks based on data they gather from sensors. The
system automatically learns programs from data.
Machine learning relates to the study, design and development of algorithms that give
computers the capability to learn without being explicitly programmed. Data mining, by
contrast, can be defined as the process of extracting knowledge or previously unknown
interesting patterns from unstructured data. Machine learning algorithms are used during
this process.
In machine learning, ‘overfitting’ occurs when a statistical model describes random error
or noise instead of the underlying relationship. Overfitting is normally observed when a
model is excessively complex, because it has too many parameters relative to the number
of training data points. A model that has been overfit exhibits poor predictive
performance on new data.
The possibility of overfitting exists because the criteria used for training the model
are not the same as the criteria used to judge the efficacy of the model.
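The gap between training performance and performance on unseen data can be illustrated with a small sketch. The data and the helper names (`nearest_label`, `fit_threshold`, `accuracy`) are made up for illustration: a model that memorizes every training point, noise included, is compared with a one-parameter threshold model.

```python
def accuracy(predict, data):
    """Fraction of (x, label) pairs that predict() gets right."""
    return sum(predict(x) == y for x, y in data) / len(data)

def nearest_label(train, x):
    """Memorizing model: return the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_threshold(train):
    """Simple model: pick the cut point with the best training accuracy."""
    xs = sorted(x for x, _ in train)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(lambda x: int(x > t), train))

# Assumed true rule: label 1 when x >= 5. The labels at x=2 and x=7 are noise.
train = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0), (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]
test = [(0.2, 0), (1.9, 0), (2.2, 0), (4.4, 0), (5.6, 1), (6.8, 1), (7.1, 1), (8.8, 1)]

threshold = fit_threshold(train)
memorizer = lambda x: nearest_label(train, x)
simple = lambda x: int(x > threshold)

print("memorizer train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("threshold train/test:", accuracy(simple, train), accuracy(simple, test))
```

On this toy data the memorizing model scores 1.0 on the training set but only 0.5 on the test set, while the threshold model scores 0.8 on training and 1.0 on test.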
Overfitting can be avoided by using a lot of data; it tends to happen when you have a
small dataset and try to learn from it. But if you have a small dataset and are forced
to come up with a model based on it, you can use a technique known as cross validation.
In this method the dataset is split into two sections, a testing dataset and a training
dataset: the testing dataset is used only to test the model, while the datapoints in the
training dataset are used to build the model.
In this technique, a model is usually given a dataset of known data on which training is
run (the training data set) and a dataset of unknown data against which the model is tested.
The idea of cross validation is to define a dataset to “test” the model in the training phase.
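A minimal sketch of how such splits can be generated follows. It assumes the common k-fold variant, which repeats the split so that every data point is used for testing exactly once; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In practice the data is usually shuffled before splitting; that step is left out here to keep the sketch deterministic.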
Inductive machine learning involves the process of learning by examples, where a system
tries to induce a general rule from a set of observed instances.
7) What are the five popular algorithms of Machine Learning?
a) Decision Trees
c) Probabilistic networks
d) Nearest Neighbor
b) Unsupervised Learning
c) Semi-supervised Learning
d) Reinforcement Learning
e) Transduction
f) Learning to Learn
9) What are the three stages to build the hypotheses or model in machine
learning?
a) Model building
b) Model testing
The standard approach to supervised learning is to split the set of examples into a
training set and a test set.
In various areas of information science like machine learning, a set of data used to
discover a potentially predictive relationship is known as the ‘Training Set’. The training
set is the set of examples given to the learner, while the ‘Test Set’ is used to test the
accuracy of the hypotheses generated by the learner and is the set of examples held back
from the learner. The training set is distinct from the test set.
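A hold-out split of this kind might look like the sketch below. The function name `train_test_split`, the 25% test fraction and the fixed seed are illustrative assumptions, not part of the original answer.

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold back a fraction as the test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train_set, test_set = train_test_split(range(20), test_fraction=0.25, seed=42)
print(len(train_set), "training examples,", len(test_set), "held back for testing")
```

The learner only ever sees `train_set`; `test_set` is used afterwards to estimate accuracy.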
Learning
a) Artificial Intelligence
b) Rule-based inference
a) Classification
b) Speech recognition
c) Regression
d) Predict time series
e) Annotate strings
Designing and developing algorithms whose behaviour is based on empirical data is known
as Machine Learning. Artificial intelligence, in addition to machine learning, also covers
other aspects like knowledge representation, natural language processing, planning,
robotics etc.
A Naïve Bayes classifier will converge quicker than discriminative models like logistic
regression, so you need less training data. The main disadvantage is that it can’t learn
interactions between features.
b) Speech Recognition
c) Data Mining
d) Statistics
e) Information Retrieval
f) Bio-Informatics
Genetic programming is one of the techniques used in machine learning. The model is
based on testing candidates and selecting the best choice among a set of results.
Inductive Logic Programming (ILP) is a subfield of machine learning which uses logic
programming to represent background knowledge and examples.
The process of selecting among different mathematical models that are used to describe
the same data set is known as Model Selection. Model selection is applied in the fields
of statistics, machine learning and data mining.
24) What are the two methods used for the calibration in Supervised Learning?
b) Isotonic Regression
These methods are designed for binary classification, and extending them to multiclass
problems is not trivial.
When there is sufficient data, ‘Isotonic Regression’ is used to prevent an overfitting issue.
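Isotonic regression for calibration fits a non-decreasing mapping from classifier scores to probabilities. One common way to compute such a fit is the pool-adjacent-violators algorithm. The sketch below assumes the 0/1 outcomes are already sorted by increasing classifier score; the function name `isotonic_fit` is hypothetical.

```python
def isotonic_fit(values):
    """Pool Adjacent Violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block is [sum_of_values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every member of a block gets the block mean
    return fitted

# Observed outcomes (1 = positive), ordered by increasing classifier score.
outcomes = [0, 1, 0, 0, 1, 1]
calibrated = isotonic_fit(outcomes)
print(calibrated)
```

Each output value can be read as the calibrated probability for the corresponding score. With very few points the blocks are small and the fit follows the noise, which is why sufficient data matters.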
26) What is the difference between heuristics for rule learning and heuristics for
decision trees?
The difference is that the heuristics for decision trees evaluate the average quality of a
number of disjoint sets, while rule learners only evaluate the quality of the set of
instances that is covered by the candidate rule.
A Bayesian Network is a graphical model used to represent the probability relationships
among a set of variables.
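As a small illustration, consider a hypothetical two-variable network, Rain -> WetGrass, with made-up probabilities. The joint distribution factorizes along the graph, and Bayes' rule gives the posterior.

```python
# Hypothetical two-node network: Rain -> WetGrass (all numbers are illustrative).
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True: 0.9, False: 0.1}  # P(WetGrass=True | Rain)

def joint(rain, wet):
    """P(Rain=rain, WetGrass=wet) = P(Rain) * P(WetGrass | Rain)."""
    p_wet = p_wet_given_rain[rain]
    return p_rain[rain] * (p_wet if wet else 1.0 - p_wet)

# Marginalize over Rain, then apply Bayes' rule.
p_wet_total = sum(joint(r, True) for r in (True, False))
p_rain_given_wet = joint(True, True) / p_wet_total

print("P(WetGrass) =", p_wet_total)
print("P(Rain | WetGrass) =", p_rain_given_wet)
```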
Instance-based learning algorithms are also referred to as lazy learning algorithms, as
they delay the induction or generalization process until classification is performed.
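A k-nearest-neighbour classifier is a typical example of this behaviour. In the sketch below (the class name `LazyKNN` and the data are illustrative), `fit` only stores the instances, and all the work happens when a query is classified.

```python
from collections import Counter

class LazyKNN:
    """k-nearest-neighbour classifier: no model is built until predict()."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, points, labels):
        # "Training" only stores the instances; no generalization happens here.
        self.examples = list(zip(points, labels))
        return self

    def predict(self, query):
        def sq_dist(point):
            return sum((a - b) ** 2 for a, b in zip(point, query))
        # Generalization is deferred to query time: find the k closest instances.
        nearest = sorted(self.examples, key=lambda ex: sq_dist(ex[0]))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
knn = LazyKNN(k=3).fit(points, labels)
print(knn.predict((0.5, 0.5)), knn.predict((5.5, 5.5)))
```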
31) What are the two classification methods that SVM (Support Vector
Machine) can handle?
Ensemble learning is used when you can build component classifiers that are
accurate and independent of each other.
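The benefit of independent errors can be seen in a toy majority vote. The three prediction lists below are made up so that each classifier is wrong on a different example.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most component classifiers."""
    return Counter(predictions).most_common(1)[0][0]

truth = [1, 0, 1, 1, 0, 1]
clf_a = [0, 0, 1, 1, 0, 1]  # wrong on example 0
clf_b = [1, 1, 1, 1, 0, 1]  # wrong on example 1
clf_c = [1, 0, 0, 1, 0, 1]  # wrong on example 2

# Combine the three classifiers example by example.
ensemble = [majority_vote(votes) for votes in zip(clf_a, clf_b, clf_c)]
print(ensemble == truth)
```

Because no two classifiers make the same mistake, the vote corrects every individual error. If the errors were correlated, the ensemble would inherit them.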
methods
Incremental learning is the ability of an algorithm to learn from new data that
becomes available after a classifier has already been generated from the existing
dataset.
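One simple way to get this behaviour is to keep running statistics that can be updated one example at a time. The sketch below uses a nearest-centroid classifier; the class name `IncrementalCentroids` and the data are illustrative assumptions.

```python
class IncrementalCentroids:
    """Nearest-centroid classifier whose centroids update one example at a time."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sum
        self.counts = {}  # label -> number of examples seen

    def learn_one(self, x, label):
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict(self, x):
        def sq_dist(label):
            n = self.counts[label]
            return sum((s / n - v) ** 2 for s, v in zip(self.sums[label], x))
        return min(self.sums, key=sq_dist)

model = IncrementalCentroids()
model.learn_one((0, 0), "low")
model.learn_one((10, 10), "high")
before = model.predict((4, 4))

# New data arrives later; the existing model is updated, not rebuilt.
model.learn_one((3, 3), "high")
model.learn_one((5, 5), "high")
after = model.predict((4, 4))
print(before, "->", after)
```

The earlier examples are never revisited: only the running sums and counts are stored, so each update costs the same regardless of how much data came before.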
In Machine Learning and statistics, dimension reduction is the process of reducing the
number of random variables under consideration. It can be divided into feature
selection and feature extraction.
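As an illustration of the feature selection side, the sketch below drops columns whose variance falls below a threshold. This is only one of many selection criteria; the function name `select_by_variance`, the threshold and the data are assumptions made for the example.

```python
from statistics import pvariance

def select_by_variance(rows, threshold):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return keep, reduced

rows = [
    [1.0, 5.0, 0.1],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.2],
    [4.0, 5.0, 0.1],
]
keep, reduced = select_by_variance(rows, threshold=0.01)
print("kept columns:", keep)
print(reduced)
```

Feature extraction, by contrast, builds new variables from combinations of the original ones rather than keeping a subset of them.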
Support vector machines are supervised learning algorithms used for classification and
regression analysis.
c) Cross Validation
e) Scoring Metric
f) Significance Test
43) What are the different methods for Sequential Supervised Learning?
b) Recurrent sliding windows
c) Hidden Markov models
d) Maximum entropy Markov
fields
f) Graph transformer networks
b) Structured prediction
47) What are the different categories into which you can categorize the
sequence learning process?
a) Sequence prediction
b) Sequence generation
c) Sequence recognition
d) Sequential decision
Programming
b) Inductive Learning
50) Give a popular application of machine learning that you see on a day-to-day
basis?