CHAPTER 5 : Classification Techniques
University Prescribed Syllabus w.e.f. Academic Year 2022-2023
Ensemble Classifiers: Introduction to Ensemble Methods, Bagging, Boosting, Random Forests, Improving Classification Accuracy of Class-Imbalanced Data, Metrics for Evaluating Classifier Performance, Holdout Method and Random Subsampling, Cross-Validation, Bootstrap, Model Selection Using Statistical Tests of Significance, Comparing Classifiers Based on Cost-Benefit and ROC Curves.
Self-learning Topics: Introduction to ML (Revision), Introduction to Reinforcement Learning
5.1 Classification
5.2 Model Evaluation and Selection
5.2.1 Metrics for Evaluating Classifier Performance
5.2.2 Holdout Method and Random Subsampling
5.2.3 Cross-Validation
5.2.4 Bootstrap
5.2.5 Model Selection Using Statistical Tests of Significance
5.2.6 Comparing Classifiers Based on Cost-Benefit and ROC Curves
5.3 Techniques to Improve Classification Accuracy
5.3.1 Introducing Ensemble Methods
5.3.2 Bagging (Bootstrap Aggregating)
5.3.3 Boosting and AdaBoost
5.3.4 Random Forests
5.3.5 Improving Classification Accuracy of Class-Imbalanced Data
5.4 Reinforcement Learning (RL)
5.5 Multiple Choice Questions
Chapter Ends...
5.1 CLASSIFICATION
We have studied classification and different classifiers earlier. Let us recall some key concepts of classification.
(1) Classification is a form of data analysis that extracts models describing data classes. It is a supervised learning technique that is used to identify the category of new observations on the basis of training data. A classifier, or classification model, predicts categorical labels (classes). Numeric prediction models predict continuous-valued functions. Classification and numeric prediction are the two major types of prediction problems.
(2) Decision tree induction is a top-down recursive tree induction algorithm, which uses an attribute selection measure to select the attribute tested for each non-leaf node in the tree. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome. ID3, C4.5, and CART are examples of such algorithms using different attribute selection measures. Tree pruning algorithms attempt to improve accuracy by removing tree branches reflecting noise in the data. Early decision tree algorithms typically assume that the data are memory resident. (A brief illustrative sketch covering points (2) to (4) follows this list.)
(3) Naive Bayesian classification is based on Bayes' theorem of posterior probability. It assumes class-conditional independence: the effect of an attribute value on a given class is independent of the values of the other attributes. It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. Some popular applications of the naive Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
(4) A rule-based classifier uses a set of IF-THEN rules for classification. Rules can be extracted from a decision tree. These rules are easily interpretable and thus these classifiers are generally used to generate descriptive models. The condition used with "IF" is called the antecedent and the predicted class of each rule is called the consequent. Rules may also be generated directly from training data using sequential covering algorithms.
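As a brief, illustrative sketch of points (2) to (4) above (not part of the original text; scikit-learn and the Iris dataset are assumed choices), the following Python listing induces a decision tree, reads IF-THEN style rules off its root-to-leaf paths, and fits a naive Bayesian classifier on the same data:

# Illustrative sketch only (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.naive_bayes import GaussianNB

data = load_iris()

# (2) Decision tree induction: top-down recursive partitioning.
#     criterion="entropy" uses information gain (ID3/C4.5-style measure);
#     criterion="gini" would correspond to the CART-style measure.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2)
tree.fit(data.data, data.target)

# (4) Rule extraction: every root-to-leaf path reads as an IF-THEN rule,
#     the conditions forming the antecedent and the leaf class the consequent.
print(export_text(tree, feature_names=list(data.feature_names)))

# (3) Naive Bayesian classification: predicts the most probable class under
#     the class-conditional independence assumption.
nb = GaussianNB().fit(data.data, data.target)
print(nb.predict(data.data[:3]))         # predicted class labels
print(nb.predict_proba(data.data[:3]))   # posterior probabilities per class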
5.2 MODEL EVALUATION AND SELECTION

• Once our classification model is ready, we would like an estimate of how accurately the classifier can predict/classify the output class.
• Based on this, we will come to know whether the training done is sufficient or not. We can even think of building more than one classifier and then comparing their accuracy.
• Let us see now: what is accuracy? How can we estimate it? Are some measures of a classifier's accuracy more appropriate than others? How can we obtain a reliable accuracy estimate?
5.2.1 Metrics for Evaluating Classifier Performance

• The following list depicts various metrics/measures for evaluating how "accurate" your classifier is at predicting the class label of tuples:
(1) Accuracy (2) Error rate (3) Precision (4) Recall (5) Sensitivity (6) Specificity (7) AUC-ROC (8) Log Loss
• Before discussing these measures, we need to understand certain terminology related to the confusion matrix.
• It is the easiest way to measure the performance of a classification problem where the output can be of two or more types of classes.
• A confusion matrix is nothing but a table with two dimensions, the actual classes and the predicted classes.
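As a small, hedged illustration (not from the book) of how these metrics are computed from confusion-matrix counts, the following sketch uses scikit-learn with made-up labels:

# Sketch: confusion matrix and basic metrics for a two-class problem.
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual labels (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # labels predicted by some classifier

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("Accuracy  :", accuracy_score(y_true, y_pred))      # (TP+TN)/total
print("Error rate:", 1 - accuracy_score(y_true, y_pred))
print("Precision :", precision_score(y_true, y_pred))     # TP/(TP+FP)
print("Recall    :", recall_score(y_true, y_pred))        # TP/(TP+FN), i.e. sensitivity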
• If t > z or t < -z, then our value of t lies in the rejection region, within the distribution's tails. This means that we can reject the null hypothesis that the means of M1 and M2 are the same and conclude that there is a statistically significant difference between the two models.
• Otherwise, we conclude that any difference between M1 and M2 can be attributed to chance.
• The t-statistic for pairwise comparison is computed as:

t = [err(M1) - err(M2)] / sqrt(var(M1 - M2) / k)

where

var(M1 - M2) = (1/k) * sum over i = 1..k of [err(M1)_i - err(M2)_i - (err(M1) - err(M2))]^2

Here err(M1) and err(M2) denote the mean error rates of M1 and M2 over the k folds, and err(M1)_i, err(M2)_i are the error rates on fold i.
• If two test sets are available instead of a single test set, then a nonpaired version of the t-test is used, where the variance between the means of the two models is estimated as:

var(M1 - M2) = var(M1)/k1 + var(M2)/k2

and k1 and k2 are the number of cross-validation samples used for M1 and M2, respectively. This is also known as the two-sample t-test. When consulting the table of the t-distribution in such a case, the number of degrees of freedom used is taken as the minimum number of degrees of freedom of the two models.
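A minimal sketch (not from the book) of the paired test above, assuming k = 10 cross-validation folds and made-up per-fold error rates:

# Sketch (assumed data): paired t-test on per-fold error rates of two models.
import numpy as np
from scipy import stats

err_m1 = np.array([0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.15, 0.14, 0.13])
err_m2 = np.array([0.10, 0.14, 0.12, 0.11, 0.12, 0.13, 0.11, 0.12, 0.13, 0.11])

k = len(err_m1)
diff = err_m1 - err_m2
var = np.sum((diff - diff.mean()) ** 2) / k           # var(M1 - M2) as defined above
t_stat = diff.mean() / np.sqrt(var / k)
crit = stats.t.ppf(0.975, df=k - 1)                   # two-sided test at 5% significance
print("t =", t_stat, " critical value =", crit)
print("significant difference" if abs(t_stat) > crit else "difference may be due to chance")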
5.2.6 Comparing Classifiers Based on Cost-Benefit and ROC Curves
• For assessing the costs (risks) and benefits (gains) associated with a classification model, the true positives, true negatives, false positives, and false negatives are useful.
• The cost associated with a false negative (such as incorrectly predicting that a diabetic patient is not diabetic) is far greater than the cost of a false positive (incorrectly yet conservatively labeling a nondiabetic patient as diabetic).
• In such cases, we can outweigh one type of error over another by assigning a different cost to each.
• These costs may consider the danger to the patient, financial costs of resulting therapies, and other hospital costs.
• Similarly, the benefits associated with a true positive decision may be different than those of a true negative.
• Up to now, to compute accuracy, we have assumed equal costs. Alternatively, we can incorporate costs and benefits by instead computing the average cost (or benefit) per decision.
• Other applications involving cost-benefit analysis include loan application decisions and target marketing mailouts.
• ROC (Receiver Operating Characteristic) curves are a useful visual tool for comparing two classification models.
• As we have studied previously, an ROC curve for a given model shows the trade-off between the true positive rate (TPR) and the false positive rate (FPR).
• The area under the ROC curve (AUC) is a measure of the accuracy of the model.
• Any increase in TPR occurs at the cost of an increase in FPR. For a two-class problem, an ROC curve allows us to visualize the trade-off between the rate at which the model can accurately recognize positive cases versus the rate at which it mistakenly identifies negative cases as positive, for different portions of the test set.
• It is immediately apparent that an ROC curve can be used to select a threshold for a classifier which maximizes the true positives while minimizing the false positives.
• However, different types of problems have different optimal classifier thresholds. For a cancer screening test, for example, we may be prepared to put up with a relatively high false positive rate in order to get a high true positive rate, as it is most important to identify possible cancer sufferers.
• For a follow-up test after treatment, however, a different threshold might be more desirable, since we want to minimize false negatives; we don't want to tell a patient they're clear if this is not actually the case.
• The AUC can be used to compare the performance of two or more classifiers. A single threshold can be selected and the classifiers' performance at that point compared, or the overall performance can be compared by considering the AUC.
• A classifier whose AUC is higher than that of another may appear clearly better. It is, however, possible to calculate whether such differences in AUC are statistically significant.
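As a small illustration (not from the book), the ROC/AUC comparison can be sketched with scikit-learn; the two models and the synthetic data below are arbitrary assumptions:

# Sketch: comparing two classifiers by ROC curve and AUC (assumed models/data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("LogReg", LogisticRegression(max_iter=1000)),
                  ("Tree", DecisionTreeClassifier(max_depth=4))]:
    scores = clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]   # positive-class scores
    fpr, tpr, thresholds = roc_curve(y_te, scores)           # TPR vs FPR trade-off
    print(name, "AUC =", roc_auc_score(y_te, scores))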
5.3 TECHNIQUES TO IMPROVE CLASSIFICATION ACCURACY
• In machine learning, no matter whether we are facing a classification or a regression problem, the choice of the model is extremely important to have any chance of obtaining good results.
• This choice can depend on many variables of the problem: quantity of data, dimensionality of the space, distribution hypothesis, etc.
• In ensemble learning theory, we call weak learners (or base models) models that can be used as building blocks for designing more complex models by combining several of them.
• The idea of ensemble methods is to try to reduce the bias and/or variance of such weak learners by combining several of them together in order to create a strong learner (or ensemble model) that achieves better performance.
• To outline the definition and practicality of ensemble methods, here we have used the example of a decision tree classifier. However, it is important to note that ensemble methods do not only pertain to decision trees.
• A decision tree determines the predictive value based on a series of questions and conditions. For instance, the simple decision tree shown in Fig. 5.3.1 determines whether an individual should play outside or not.
• The tree takes several weather factors into account, and given each factor either makes a decision or asks another question. In this example, every time it is overcast, we will play outside.
• However, if it is raining, we must ask if it is windy or not. If windy, we will not play.
• But given no wind, tie those shoelaces tight because we're going outside to play.
Fig. 5.3.1 : A decision tree to determine whether to play outside or not
Fig. 5.3.2 : A decision tree to determine whether or not to invest in real estate
• Decision trees can also solve quantitative problems with the same format. In the tree shown in Fig. 5.3.2, we want to know whether or not to invest in a commercial real estate property. Is it an office building? A warehouse? An apartment building? Good economic conditions? Poor economic conditions? How much will an investment return? These questions are answered and solved using this decision tree.
• When making decision trees, there are several factors we must take into consideration: On what features do we make our decisions? What is the threshold for classifying each question into a yes or no answer? In the first decision tree, what if we wanted to ask ourselves if we had friends to play with or not?
• If we have friends, we will play every time. If not, we might continue to ask ourselves questions about the weather. By adding an additional question, we hope to better define the Yes and No classes.
• This is where ensemble methods come into the picture! Rather than just relying on one decision tree and hoping we made the right decision at each split, ensemble methods allow us to take a sample of decision trees into account, calculate which features to use or questions to ask at each split, and make a final predictor based on the aggregated results of the sampled decision trees.

5.3.1 Introducing Ensemble Methods

• Ensemble learning is a machine learning technique that combines several base models in order to produce one optimal predictive model, which helps to improve machine learning results.
• This approach allows the production of better predictive performance compared to a single model. The basic idea is to learn a set of classifiers and to allow them to vote. Ensembles tend to be more accurate than their component classifiers.
Fig. 5.3.3 : An overview of ensemble learning (create multiple datasets, create multiple classifiers, combine the classifiers)
© Different types of ensemble classifiers are:
1. Bagging
2. Boosting and AdaBoost
3. Random Forests
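As a hedged illustration of the general idea (not a prescribed implementation), the sketch below builds a simple voting ensemble from heterogeneous base classifiers using scikit-learn; the dataset and base models are arbitrary choices:

# Sketch: a voting ensemble that aggregates several base classifiers by majority vote.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=5000)),
                ("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(max_depth=5))],
    voting="hard")                       # majority vote of the component classifiers
print("ensemble accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())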
5.3.2 Bagging (Bootstrap Aggregating)
• This approach combines Bootstrapping and Aggregation to form one ensemble model; that is why the name Bagging.
• Consider yourself as a patient and you would like to have a diagnosis made based on the symptoms. Instead of asking one doctor, you may choose to ask several.
• If a certain diagnosis occurs more than any other, you may choose this as the final or best diagnosis.
• That is, the final diagnosis is made based on a majority vote, where each doctor gets an equal vote.
• If we replace each doctor by a classifier, then that is the basic idea behind bagging.
• Naturally, a majority vote made by a large group of doctors may be more reliable than one made by a small group.
• Given a sample of data, multiple bootstrapped subsamples are pulled. A decision tree is formed on each of the bootstrapped subsamples.
• Each training set is a bootstrap sample. After each subsample decision tree has been formed, an algorithm is used to aggregate over the decision trees to form the most efficient predictor.
• To classify an unknown tuple, X, each classifier, Mi, returns its class prediction, which counts as one vote. The bagged classifier, M*, counts the votes and assigns the class with the most votes to X.
• Bagging often considers homogeneous weak learners, learns them independently from each other in parallel, and combines them following some kind of deterministic averaging process. Bagging can be applied to the prediction of continuous values by taking the average value of each prediction for a given test tuple.
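A minimal bagging sketch (assumed library and dataset choices, not from the book): each tree is trained on a bootstrap sample and the bagged classifier takes a majority vote:

# Sketch: bagging over decision trees vs. a single tree.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(),   # base classifier per bootstrap sample
                           n_estimators=50,
                           bootstrap=True,             # sample training tuples with replacement
                           random_state=0)
print("single tree :", cross_val_score(single, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())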
5.3.3 Boosting and AdaBoost

• Consider the same example as in the previous section: you, as a patient, have certain symptoms. Now, instead of consulting one doctor, you choose to consult several.
• Suppose you assign weights to the value or worth of each doctor's diagnosis, based on the accuracies of previous diagnoses they have made.
• The final diagnosis is then a combination of the weighted diagnoses. This is the basic idea behind boosting.
Fig. 5.3.4 : Bagging
• Boosting often considers homogeneous weak learners, learns them sequentially in a very adaptive way (a base model depends on the previous ones), and combines them following a deterministic strategy.
• In boosting, weights are also assigned to each training tuple.
• A series of k classifiers is iteratively learned. After a classifier, Mi, is learned, the weights are updated to allow the subsequent classifier, Mi+1, to "pay more attention" to the training tuples that were misclassified by Mi.
• The final boosted classifier, M*, combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy.
• In adaptive boosting (often called "AdaBoost"), we try to define our ensemble model as a weighted sum of L weak learners. It is a popular boosting algorithm.
• The basic idea is that when we build a classifier, we want it to focus more on the misclassified tuples of the previous round.
• Some classifiers may be better at classifying some "difficult" tuples than others. In this way, we build a series of classifiers that complement each other.
• We are given D, a data set of d class-labeled tuples, (X1, y1), (X2, y2), ..., (Xd, yd), where yi is the class label of tuple Xi.
• Initially, AdaBoost assigns each training tuple an equal weight of 1/d. Generating k classifiers for the ensemble requires k rounds through the rest of the algorithm.
• In round i, the tuples from D are sampled to form a training set, Di, of size d. Sampling with replacement is used; this means the same tuple may be selected more than once.
• Each tuple's chance of being selected is based on its weight. A classifier model, Mi, is derived from the training tuples of Di.
• Its error is then calculated using Di as a test set. The weights of the training tuples are then adjusted according to how they were classified.
• If a tuple was incorrectly classified, its weight is increased. If a tuple was correctly classified, its weight is decreased.
• A tuple's weight reflects how difficult it is to classify. The higher the weight, the more often it has been misclassified.
• These weights will be used to generate the training samples for the classifier of the next round. This is how a series of classifiers that complement each other are built.
• To compute the error rate of model Mi, we sum the weights of each of the tuples in Di that Mi misclassified:

error(Mi) = sum over j = 1..d of [wj x err(Xj)]

where err(Xj) is the misclassification error of tuple Xj: if the tuple was misclassified, then err(Xj) is 1; otherwise, it is 0. If the performance of classifier Mi is so poor that its error exceeds 0.5, then we abandon it. Instead, we try again by generating a new Di training set, from which we derive a new Mi.
• The error rate of Mi affects how the weights of the training tuples are updated.
• If a tuple in round i was correctly classified, its weight is multiplied by error(Mi) / (1 - error(Mi)).
• Once the weights of all the correctly classified tuples are updated, the weights for all tuples (including the misclassified ones) are normalized so that their sum remains the same as it was before.
• To normalize a weight, we multiply it by the sum of the old weights, divided by the sum of the new weights.
• As a result, the weights of misclassified tuples are increased and the weights of correctly classified tuples are decreased.
• To predict the class label for a tuple X, boosting assigns a weight to each classifier's vote, based on how well the classifier performed. The lower a classifier's error rate, the more accurate it is, and therefore, the higher its weight for voting should be. The weight of classifier Mi's vote is calculated as

log [(1 - error(Mi)) / error(Mi)]

• For each class, c, we sum the weights of each classifier that assigned class c to X. The class with the highest sum is the "winner" and is returned as the class prediction for tuple X.
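An illustrative AdaBoost sketch (assumed library and data; scikit-learn's implementation differs in detail from the hand computation above, but its classifier vote weights follow the same log((1 - error)/error) idea):

# Sketch: AdaBoost over decision stumps.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, random_state=0)  # default base learner: a decision stump
ada.fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, ada.predict(X_te)))
print("first few classifier vote weights:", ada.estimator_weights_[:5])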
• Bagging is less susceptible to model overfitting. While both bagging and boosting can significantly improve accuracy in comparison to a single model, boosting tends to achieve greater accuracy.
5.3.4 Random Forests
• Random Forest models can be thought of as an extension over bagging, as it is bagging with a slight twist. Each classifier in the ensemble is a decision tree classifier, so that the collection of classifiers is a "forest". Each classifier is generated using a random selection of attributes at each node to determine the split. During classification, each tree votes and the most popular class is returned.
• When deciding where to split and how to make decisions, bagged decision trees have the full set of features to choose from. Therefore, although the bootstrapped samples may be slightly different, the data is largely going to break off at the same features throughout each model.
• In contrast, Random Forest models decide where to split based on a random selection of features. Rather than splitting at similar features at each node throughout, Random Forest models implement a level of differentiation because each tree will split based on different features.
• This level of differentiation provides a greater ensemble to aggregate over, producing a more accurate predictor.
Fig. 5.3.5 : Random Forest Classifier
• Steps for implementing a Random Forest Classifier:
1. Multiple subsets are created from the original data set, selecting observations with replacement.
2. A subset of features is selected randomly, and whichever feature gives the best split is used to split the node iteratively.
3. The tree is grown to the largest extent possible.
4. Repeat the above steps; the prediction is given based on the aggregation of predictions from n trees.
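A brief Random Forest sketch (assumed library and dataset) following the steps above: a bootstrap sample per tree plus a random feature subset considered at each split:

# Sketch: random forest with randomized feature selection per split.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100,      # number of trees (n)
                                max_features="sqrt",   # random feature subset per split
                                bootstrap=True,        # bootstrap sample per tree
                                random_state=0)
print("random forest accuracy:", cross_val_score(forest, X, y, cv=5).mean())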
5.3.5 Improving Classification Accuracy of Class-Imbalanced Data
• When the number of observations in one class is higher than in the other classes, there exists a class imbalance.
• Class imbalance is a common problem in machine learning, especially in classification problems. Imbalanced data can hamper our model accuracy badly. If the data set is imbalanced, then in such cases you get a pretty high accuracy just by predicting the majority class, but you fail to capture the minority class, which is most often the point of creating the model in the first place.
• Having an imbalanced dataset (imbalanced target variables) in a problem statement is always frustrating, and having a perfectly balanced dataset is always a myth.
• Mostly in Medical Science / Healthcare machine learning problems, the data set is biased.
• So, predicting the outcome in such cases becomes very difficult as the data becomes biased towards one particular class of outcome.
• Class imbalance appears in many domains, such as:
(1) Fraud detection
(2) Spam filtering
(3) Disease screening
(4) SaaS subscription churn
(5) Advertising click-through
• If we consider a healthcare problem, our main goal is to reduce false negative outcomes, as you cannot afford to let patients go away with a disease because of a biased algorithm.
• Since there are more 'Negatives' in the dataset, the machine learning model becomes biased toward the Negative class.
• So, in some cases, it might predict 'Negative' for a 'Positive' class.
• There are many ways to reduce the class imbalance/bias problem and improve the classification accuracy of class-imbalanced data:
1. Improve Data Collection and Preprocessing Techniques: Collect more data and give much more time to preprocessing by detecting outliers and segmenting the data according to balanced classes.
2. Resampling (Up Sampling and Down Sampling): A widely adopted technique for dealing with highly unbalanced datasets is called resampling. It consists of removing samples from the majority class (under-sampling) and/or adding more examples from the minority class (over-sampling). If you have less data, then this technique is quite useful (see the sketch after this list).
Up (Over) Sampling is increasing the number of samples of the class which is less in number, by considering data points close to those of the original class. The simplest implementation of over-sampling is to duplicate random records from the minority class, which can cause overfitting.
Down Sampling is the inverse of over-sampling, i.e., reducing the number of samples of the classes having a higher number of data points. In under-sampling, the simplest technique involves removing random records from the majority class, which can cause loss of information.
Ll rect-neoPubteatons_A SACHIN SHAH VentureALR DS. (MU-Som 7-17) advance mi Cassticaon Tecigues)...Pa9e n,(5-14
5 ing, the agent
: © In Reinforcement Learning, ams
The threshold-moving = This approach tothe | ¢ Ih MT eedbacks without any lai
lass imbalance problem does not involve any
sampling. It applies to classifiers that
‘input tuple, retum a continuous output value. That
is, for an input tuple, X, such a classifier returns as
‘output a mapping, f (X) —» [0,1]. Rather than
‘manipulating the training tuples, this method
retums a classification decision based on the
‘output values. Inthe simplest approach, tuples for
which £0) > 1, for some threshold, 1, are
considered positive, while all other tuples are
considered negative. Other approaches may
involve manipulating the outputs by weighting.
In general, threshold moving moves the threshold,
1, $0 that the rare class tuples are easier to classify
‘and hence, there is Jess chance of costly false
Degative errors. Examples of such classifiers
include naive Bayesian classifiers and neural
network classifiers like backpropagation,
4. Use Specific Algorithm Properties: This is helpful for some machine learning algorithms where you can give weights to a particular class, which may decrease bias. For example, if you have 70% Class A and 30% Class B, you can give more weight to Class B because the algorithm may tend to be more biased towards Class A.
5. Ensemble methods discussed in this chapter have also been applied to the class imbalance problem. The individual classifiers making up the ensemble may include versions of the approaches described here, such as over-sampling and threshold moving.
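The sketch below (made-up imbalanced data, assumed library) illustrates three of the remedies above: random over-sampling of the minority class, class weighting, and threshold moving:

# Sketch (assumed data): handling class imbalance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# 1) Random over-sampling: duplicate minority-class records until classes balance.
X_min, X_maj = X[y == 1], X[y == 0]
X_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)
X_bal = np.vstack([X_maj, X_up])
y_bal = np.array([0] * len(X_maj) + [1] * len(X_up))

# 2) Class weighting: penalize mistakes on the rare class more heavily.
weighted = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# 3) Threshold moving: lower the decision threshold t so that rare-class tuples
#    are easier to classify as positive.
proba = weighted.predict_proba(X)[:, 1]
y_pred = (proba >= 0.3).astype(int)          # t = 0.3 instead of the default 0.5
print("positives predicted:", y_pred.sum(), " actual positives:", y.sum())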
5.4 REINFORCEMENT LEARNING (RL)
• Reinforcement learning is an area of machine learning in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. It is a feedback-based machine learning technique. It is about taking suitable action to maximize reward in a particular situation. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
• In reinforcement learning, the agent learns automatically using feedback, without any labeled data, unlike supervised learning. Since there is no labeled data, the agent is bound to learn from its experience only.
Fig. 5.4.1 : Reinforcement Learning
• RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game-playing, robotics, etc.
• The agent interacts with the environment and explores it by itself. The primary goal of an agent in reinforcement learning is to improve its performance by getting the maximum positive rewards.
• The agent learns by the process of hit and trial, and based on the experience, it learns to perform the task in a better way. Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within that." How a robotic dog learns the movement of its arms is an example of reinforcement learning.
• It is a core part of Artificial Intelligence, and all AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention.
• Example : Suppose there is an AI agent present within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing some actions, and based on those actions, the state of the agent gets changed, and it also receives a reward or penalty as feedback.
• The agent continues doing these three things (take action, change state or remain in the same state, and get feedback), and by doing these actions, it learns and explores the environment.
• The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalties. As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point.
• It is employed by various software and machines to find the best possible behavior or path it should take in a specific situation.
• Reinforcement learning differs from supervised learning in that in supervised learning the training data has the answer key with it, so the model is trained with the correct answer itself, whereas in reinforcement learning there is no answer; the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its experience.
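As an illustration only (not from the book), here is a tiny tabular Q-learning sketch in which an agent in a one-dimensional "maze" learns from rewards; all names and values are hypothetical:

# Sketch: tabular Q-learning on a 5-state corridor; reward +1 only at the goal state.
import numpy as np

n_states, n_actions = 5, 2          # states 0..4, actions: 0 = left, 1 = right
goal = 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.2

rng = np.random.default_rng(0)
for episode in range(200):
    s = 0
    while s != goal:
        # Epsilon-greedy action selection: mostly exploit, sometimes explore.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s_next == goal else 0.0            # reward (positive feedback)
        # Q-learning update: learn from reward plus estimated future reward.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # learned policy: move right (1) in every non-goal state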
5.5 MULTIPLE CHOICE QUESTIONS
Q. 5.1 Which of the following algorithms is not an example of an ensemble method?
(a) Extra Trees Regressor (b) Random Forest
(c) Gradient Boosting (d) Decision Tree    Ans. : (d)
Q. 5.2 What is true about an ensemble classifier?
1. Classifiers that are more sure can vote with more conviction
2. Classifiers can be more sure about a particular part of the space
3. Most of the time, it performs better than a single classifier
(a) 1 and 2 (b) 1 and 3 (c) 2 and 3 (d) All of the above    Ans. : (d)
Q. 5.3 Which of the following options is / are correct regarding the benefits of an ensemble model?
1. Better performance
2. Generalized models
3. Better interpretability
(a) 1 and 3 (b) 2 and 3 (c) 1 and 2 (d) 1, 2 and 3    Ans. : (c)
Q. 5.4 Which of the following can be true for selecting base learners for an ensemble?
1. Different learners can come from the same algorithm with different hyperparameters
2. Different learners can come from different algorithms
3. Different learners can come from different training spaces
(a) 2 (c) 1 and 3 (d) 1, 2 and 3    Ans. : (d)
Q. 5.5 True or False: Ensemble learning can only be applied to supervised learning methods.
(a) True (b) False    Ans. : (b)
Q. 5.6 True or False: Ensembles will yield bad results when there is significant diversity among the models.
Note: All individual models have meaningful and good predictions.
(a) True (b) False    Ans. : (b)
Q. 5.7 Which of the following is / are true about weak learners used in an ensemble model?
1. They have low variance and they don't usually overfit.
2. They have high bias, so they cannot solve hard learning problems.
3. They have high variance and they don't usually overfit.
(a) 1 and 2 (b) 1 and 3 (c) 2 and 3 (d) None of these    Ans. : (a)
Q. 5.8 True or False: An ensemble of classifiers may or may not be more accurate than any of its individual models.
(a) True (b) False    Ans. : (a)
Q. 5.9 If you use an ensemble of different base models, is it necessary to tune the hyperparameters of all base models to improve the ensemble performance?
(a) Yes (b) No (c) Can't say    Ans. : (b)
Q. 5.10 Generally, an ensemble method works better if the individual base models have ______?
Note: Suppose each individual base model has accuracy greater than 50%.
(a) Less correlation among predictions
(b) High correlation among predictions
(c) Correlation does not have any impact on ensemble output
(d) None of the above    Ans. : (a)
Tab rechieo putcations.A SACHIN SHAN VentureAL& DS. (MU-Som 7-11)
Q. 5.11 In an election, people vote for either of the candidates. Which of the following ensemble methods works similar to the above-discussed election procedure?
Hint: Persons are like base models of an ensemble method.
(a) Bagging (b) Boosting (c) A or B (d) None of these    Ans. : (a)
Q. 5.12 Suppose you are given 'n' predictions on test data by 'n' different models (M1, M2, ..., Mn) respectively. Which of the following method(s) can be used to combine the predictions of these models?
Note: We are working on a regression problem.
1. Median  2. Product  3. Average  4. Weighted sum  5. Minimum and Maximum  6. Generalized mean rule
(a) 1, 3 and 4 (b) 1, 3 and 6 (c) 1, 3, 4 and 6 (d) All of the above    Ans. : (d)
Q. 5.13 How can we assign the weights to the outputs of different models in an ensemble?
1. Use an algorithm to return the optimal weights
2. Choose the weights using cross validation
3. Give high weights to more accurate models
(a) 1 and 2 (b) 1 and 3 (c) 2 and 3 (d) All of the above    Ans. : (d)
Q. 5.14 Which of the following is true about an averaging ensemble?
(a) It can only be used in classification problems
(b) It can only be used in regression problems
(c) It can be used in both classification as well as regression
(d) None of these    Ans. : (c)
Q. 5.15 Suppose there are 25 base classifiers. Each classifier has an error rate of e = 0.35. Suppose you are using averaging as the ensemble technique. What will be the probability that the ensemble of the above 25 classifiers will make a wrong prediction?
Note: All classifiers are independent of each other.
(a) 0.05 (b) 0.06 (c) 0.07 (d) 0.09    Ans. : (b)
Q. 5.16 Which of the following parameters can be tuned for finding a good ensemble model in bagging based algorithms?
1. Max number of samples
2. Max features
3. Bootstrapping of samples
4. Bootstrapping of features
(a) 1 and 3 (b) 2 and 3 (c) 1 and 2 (d) All of the above    Ans. : (d)
Q. 5.17 For the below confusion matrix, what is the recall?
Nas [5
wos [53272 | 1307
3 io (| ae
(a) 0.7 (b) 0.8 (c) 0.9 (d) 0.95
Q. 5.18 Which among the following evaluation metrics would you NOT use to measure the performance of a classification model?
(a) Precision (b) Mean Squared Error (c) Recall (d) F1 score    Ans. : (b)
Q. 5.19 Which of the following is a correct use of cross validation?
(a) Selecting variables to include in a model
(b) Comparing predictors
(c) Selecting parameters in prediction function
(d) All of the mentioned    Ans. : (d)
Q. 5.20 Point out the wrong combination.
(a) True negative = correctly rejected
(b) False negative = correctly rejected
(c) False positive = correctly identified
(d) All of the mentioned    Ans. : (b)
Q. 5.21 Which of the following is a common error measure?
(a) Sensitivity (b) Median absolute deviation (c) Specificity (d) All of the mentioned    Ans. : (d)
Q. 5.22 Which of the following cross validation versions may not be suitable for very large datasets with hundreds of thousands of samples?
(a) k-fold cross-validation
(b) Leave-one-out cross-validation
(c) Holdout method
(d) All of the above    Ans. : (b)
Q. 5.23 Which of the following is a disadvantage of the k-fold cross validation method?
(a) The variance of the resulting estimate is reduced as k is increased
(c) Reduced bias
(d) The training algorithm has to rerun from scratch k times    Ans. : (d)
Q. 5.24 Reinforcement learning is ______
(a) Unsupervised learning (b) Supervised learning (c) Award based learning (d) None    Ans. : (c)
Q. 5.25 Which of the following is an application of reinforcement learning?
(a) Topic modeling (b) Recommendation system (c) Pattern recognition (d) Image classification    Ans. : (b)
Q. 5.26 Which of the following is true about reinforcement learning?
(a) The agent gets rewards or penalty according to the action
(b) It is an online learning
(c) The target of an agent is to maximize the rewards
(d) All of the above    Ans. : (d)
Q. 5.27 If TP = 9, FP = 6, FN = 26, TN = 70, then the error rate will be
(a) 45 percent (b) 99 percent (c) 28 percent (d) 20 percent    Ans. : (c)
Descriptive Questions
Q. 1 What is Reinforcement Learning? Explain the significance of reward and action in RL.
Q. 2 Which are the various metrics for evaluating the classifier performance?
Q. 3 What is a confusion matrix? What are the contents of it?
Q. 4 Compare and contrast different methods for evaluating the classifier performance.
Q. 5 How can we compare classifiers based on cost-benefit and ROC curves?
Q. 6 Write a short note on: Holdout method.
Q. 7 How can the cross validation method be used to evaluate classifiers?
Q. 8 Write a note on: Bootstrap.
Q. 9 What is meant by ensemble learning? What are the different types of ensemble classifiers?
Q. 10 Explain how the bagging method works.
Q. 11 How is the boosting method different than bagging?
Q. 12 Write a note on: AdaBoost.
Q. 13 Explain the Random Forest model in detail. Can we consider this as an extension over the bagging method?
Q. 14 What is the class imbalance problem? Explain it with an example.
Q. 15 How can we improve the classification accuracy of class-imbalanced data?
‘Chapter Ends...
ooo