0% found this document useful (0 votes)

107 views

10-701/15-781 Machine Learning - Midterm Exam, Fall 2010: Aarti Singh Carnegie Mellon University

This midterm exam for Machine Learning covers short questions, Bayes optimal classification, logistic regression, regression, SVM, and boosting. It contains 15 numbered pages and students have 90 minutes to complete it. Students can use class notes, slides, and readings but cannot use other materials or access the internet. Questions range in difficulty and students should allocate time efficiently to answer both easy and difficult questions. The exam is out of a total of 100 points and is divided among the different topics.

Uploaded by

Mahi S

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

107 views

10-701/15-781 Machine Learning - Midterm Exam, Fall 2010: Aarti Singh Carnegie Mellon University

Uploaded by

Mahi S

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

10-701/15-781 Machine Learning - Midterm Exam, Fall 2010

Aarti Singh
Carnegie Mellon University

1. Personal info:

• Name:
• Andrew account:
• E-mail address:

2. There should be 15 numbered pages in this exam (including this cover sheet).

3. You can use any material you brought: any book, class notes, your print outs of
class materials that are on the class website, including annotated slides and relevant
readings, and Andrew Moore’s tutorials. You cannot use materials brought by other
students. Calculators are not necessary. Laptops, PDAs, phones and Internet access
are not allowed.

4. If you need more room to work out your answer to a question, use the back of the page
and clearly mark on the front of the page if we are to look at what’s on the back.

5. Work efficiently. Some questions are easier, some more difficult. Be sure to give yourself
time to answer all of the easy ones, and avoid getting bogged down in the more difficult
ones before you have answered the easier ones.

6. You have 90 minutes.

7. Good luck!

Question Topic Max. score Score

1 Short questions 20
2 Bayes Optimal Classification 15
3 Logistic Regression 18
4 Regression 16
5 SVM 16
6 Boosting 15
Total 100

1
1 Short Questions [20 pts]
Are the following statements True/False? Explain your reasoning in only 1
sentence.

1. Density estimation (using say, the kernel density estimator) can be used to perform
classification.
True: Estimate the joint density P (Y, X), then use it to calculate P (Y |X).

2. The correspondence between logistic regression and Gaussian Naı̈ve Bayes (with iden-
tity class covariances) means that there is a one-to-one correspondence between the
parameters of the two classifiers.
False: Each LR model parameter corresponds to a whole set of possible GNB classifier
parameters, there is no one-to-one correspondence because logistic regression is discrimi-
native and therefore doesn’t model P (X), while GNB does model P (X).

3. The training error of 1-NN classifier is 0.

True: Each point is its own neighbor, so 1-NN classifier achieves perfect classification on
training data.

4. As the number of data points grows to infinity, the MAP estimate approaches the MLE
estimate for all possible priors. In other words, given enough data, the choice of prior
is irrelevant.
False: A simple counterexample is the prior which assigns probability 1 to a single choice
of parameter θ.

5. Cross validation can be used to select the number of iterations in boosting; this pro-
cedure may help reduce overfitting.
True: The number of iterations in boosting controls the complexity of the model, therefore,
a model selection procedure like cross validation can be used to select the appropriate
model complexity and reduce the possibility of overfitting.

6. The kernel density estimator is equivalent to performing kernel regression with the
value Yi = n1 at each point Xi in the original data set.
False: Kernel regression predicts the value of a point as the weighted average of the values
at nearby points, therefore if all of the points have the same value, then kernel regression
will predict a constant (in this case, n1 ) for all values.

7. We learn a classifier f by boosting weak learners h. The functional form of f ’s decision

boundary is the same as h’s, but with different parameters. (e.g., if h was a linear
classifier, then f is also a linear classifier).
False: For example, the functional form of a decision stump is a single axis-aligned split
of the input space, but the functional form of the boosted classifier is linear combinations of
decision stumps which can form a more complex (piecewise linear) decision boundary.

2
8. The depth of a learned decision tree can be larger than the number of training examples
used to create the tree.
False: Each split of the tree must correspond to at least one training example, therefore, if
there are n training examples, a path in the tree can have length at most n.
Note: There is a pathological situation in which the depth of a learned decision tree can be
larger than number of training examples n - if the number of features is larger than n and
there exist training examples which have same feature values but different labels. Points
have been given if you answered true and provided this explanation.

For the following problems, circle the correct answers:

1. Consider the following data set:

Circle all of the classifiers that will achieve zero training error on this data set. (You
may circle more than one.)

(a) Logistic regression

(b) SVM (quadratic kernel)
(c) Depth-2 ID3 decision trees
(d) 3-NN classifier

Solution: SVM (quad kernel) and Depth-2 ID3 decision trees

3
2. For the following dataset, circle the classifier which has larger Leave-One-Out Cross-
validation error.

a) 1-NN
b) 3-NN

Solution: 1-NN since 1-NN CV err: 5/10, 3-NN CV err: 1/10

4
2 Bayes Optimal Classification [15 pts]
In classification, the loss function we usually want to minimize is the 0/1 loss:

`(f (x), y) = 1{f (x) 6= y}

where f (x), y ∈ {0, 1} (i.e., binary classification). In this problem we will consider the effect
of using an asymmetric loss function:

`α,β (f (x), y) = α1{f (x) = 1, y = 0} + β1{f (x) = 0, y = 1}

Under this loss function, the two types of errors receive different weights, determined by
α, β > 0.

1. [4 pts] Determine the Bayes optimal classifier, i.e. the classifier that achieves minimum
risk assuming P (x, y) is known, for the loss `α,β where α, β > 0.
Solution: We can write
arg min E`α,β (f (x), y) = arg min EX,Y [α1{f (X) = 1, Y = 0} + β1{f (X) = 0, Y = 1}]
f f
= arg min EX [EY |X [α1{f (X) = 1, Y = 0} + β1{f (X) = 0, Y = 1}]]
f
Z
= arg min EX [ α1{f (X) = 1, y = 0} + β1{f (X) = 0, y = 1}dP (y|x)]
f y
Z
= arg min [α1{f (x) = 1}P (y = 0|x) + β1{f (x) = 0}P (y = 1|x)]dP (x)
f x

We may minimize the integrand at each x by taking:

(
1 βP (y = 1|x) ≥ αP (y = 0|x)
f (x) =
0 αP (y = 0|x) > βP (y = 1|x).

2. [3 pts] Suppose that the class y = 0 is extremely uncommon (i.e., P (y = 0) is small).

This means that the classifier f (x) = 1 for all x will have good risk. We may try to
put the two classes on even footing by considering the risk:

R = P (f (x) = 1|y = 0) + P (f (x) = 0|y = 1)

Show how this risk is equivalent to choosing a certain α, β and minimizing the risk
where the loss function is `α,β .
Solution: Notice that
E`α,β (f (x), y) = αP (f (x) = 1, y = 0) + βP (f (x) = 0, y = 1)
= αP (f (x) = 1|y = 0)P (y = 0) + βP (f (x) = 0|y = 1)P (y = 1)

1 1
which is same as the minimizer of the given risk R if α = P (y=0) and β = P (y=1) .

5
3. [4 pts] Consider the following classification problem. I first choose the label Y ∼
Bernoulli( 12 ), which is 1 with probability 21 . If Y = 1, then X ∼ Bernoulli(p); otherwise,
X ∼ Bernoulli(q). Assume that p > q. What is the Bayes optimal classifier, and what
is its risk?
Solution: Since label is equally likely to be 1 or 0, to minimize prob of error simply
predict the label for which feature value X is most likely. Since p > q, X = 1 is most
likely for Y = 1 and X = 0 is most likely for Y = 0. Hence f ∗ (X) = X. Baye’s risk
= P (X 6= Y ) = 1/2 · (1 − p) + 1/2 · q.
Formally: Notice that since Y ∼ Bernoulli( 21 ), we have P (Y = 1) = P (Y = 0) = 1/2.

f ∗ (x) = arg max P (Y = y|X = x) = arg max P (X = x|Y = y)P (Y = y)

y y
= arg max P (X = x|Y = y)
y

Therefore, f ∗ (1) = 1 since p = P (X = 1|Y = 1) > P (X = 1|Y = 0) = q, and f ∗ (0) = 0

since 1 − p = P (X = 0|Y = 1) < P (X = 0|Y = 0) = 1 − q. Hence f ∗ (X) = X. The risk is
R∗ = P (f ∗ (X) 6= Y ) = P (X 6= Y ).
1 1
R∗ = P (Y = 1)P (X = 0|Y = 1) + P (Y = 0)P (X = 1|Y = 0) = · (1 − p) + · q.
2 2

4. [4 pts] Now consider the regular 0/1 loss `, and assume that P (y = 0) = P (y = 1) =
1/2. Also, assume that the class-conditional densities are Gaussian with mean µ0 and
co-variance Σ0 under class 0, and mean µ1 and co-variance Σ1 under class 1. Further,
assume that µ0 = µ1 .
For the following case, draw contours of the level sets of the class conditional densities
and label them with p(x|y = 0) and p(x|y = 1). Also, draw the decision boundaries
obtained using the Bayes optimal classifier in each case and indicate the regions where
the classifier will predict class 0 and where it will predict class 1.

10 40
Σ0 = , Σ1 =
04 01

Solution: next page

6
Y= 0

Y= 1 Y= 1

Y= 0

7
3 Logistic Regression [18 pts]
We consider here a discriminative approach for solving the classification problem illustrated
in Figure 1.

Figure 1: The 2-dimensional labeled training set, where ‘+’ corresponds to class y=1 and
‘O’ corresponds to class y = 0.

1. We attempt to solve the binary classification task depicted in Figure 1 with the simple
linear logistic regression model
1
P (y = 1|~x, w)
~ = g(w0 + w1 x1 + w2 x2 ) = .
1 + exp(−w0 − w1 x1 − w2 x2 )

Notice that the training data can be separated with zero training error with a linear
separator.
Consider training regularized linear logistic regression models where we try to maximize
n
X
log (P (yi |xi , w0 , w1 , w2 )) − Cwj2
i=1

for very large C. The regularization penalties used in penalized conditional log-
likelihood estimation are −Cwj2 , where j = {0, 1, 2}. In other words, only one of the
parameters is regularized in each case. Given the training data in Figure 1, how does
the training error change with regularization of each parameter wj ? State whether the
training error increases or stays the same (zero) for each wj for very large C. Provide
a brief justification for each of your answers.

8
(a) By regularizing w2 [2 pts]

SOLUTION: Increases. When we regularize w2 , the resulting boundary can rely

less and less on the value of x2 and therefore becomes more vertical. For very large C, the
training error increases as there is no good linear vertical separator of the training data.

(b) By regularizing w1 [2 pts]

SOLUTION: Remains the same. When we regularize w1 , the resulting boundary

can rely less and less on the value of x1 and therefore becomes more horizontal and the
training data can be separated with zero training error with a horizontal linear separator.

(c) By regularizing w0 [2 pts]

SOLUTION: Increases. When we regularize w0 , then the boundary will eventually

go through the origin (bias term set to zero). Based on the figure, we can not find a linear
boundary through the origin with zero error. The best we can get is one error.

2. If we change the form of regularization to L1-norm (absolute value) and regularize w1

and w2 only (but not w0 ), we get the following penalized log-likelihood
n
X
log P (yi |xi , w0 , w1 , w2 ) − C(|w1 | + |w2 |).
i=1

Consider again the problem in Figure 1 and the same linear logistic regression model
P (y = 1|~x, w)
~ = g(w0 + w1 x1 + w2 x2 ).

(a) [3 pts] As we increase the regularization parameter C which of the following

scenarios do you expect to observe? (Choose only one) Briefly explain your choice:
( ) First w1 will become 0, then w2 .
( ) First w2 will become 0, then w1 .
( ) w1 and w2 will become zero simultaneously.
( ) None of the weights will become exactly zero, only smaller as C increases.
SOLUTION: First w1 will become 0, then w2 .
The data can be classified with zero training error and therefore also with high log-
probability by looking at the value of x2 alone, i.e. making w1 = 0. Initially we might
prefer to have a non-zero value for w1 but it will go to zero rather quickly as we increase
regularization. Note that we pay a regularization penalty for a non-zero value of w1 and
if it does not help classification why would we pay the penalty? Also, the absolute value
regularization ensures that w1 will indeed go to exactly zero. As C increases further,
even w2 will eventually become zero. We pay higher and higher cost for setting w2 to
a non-zero value. Eventually this cost overwhelms the gain from the log-probability of
labels that we can achieve with a non-zero w2 .

9
(b) [3 pts] For very large C, with the same L1-norm regularization for w1 and w2 as
above, which value(s) do you expect w0 to take? Explain briefly. (Note that the
number of points from each class is the same.) (You can give a range of values
for w0 if you deem necessary).
SOLUTION: For very large C, we argued that both w1 and w2 will go to zero. Note
that when w1 = w2 = 0, the log-probability of labels becomes a finite value, which is
equal to n log(0.5), i.e. w0 = 0. In other words, P (y = 1|~x, w)=P
~ (y = 0|~x, w)=0.5.
~ We
expect so because the number of elements in each class is the same and so we would
like to predict each one with the same probability, and w0 =0 makes P (y = 1|~x, w)=0.5.
~
(c) [3 pts] Assume that we obtain more data points from the ‘+’ class that corre-
sponds to y=1 so that the class labels become unbalanced. Again for very large
C, with the same L1-norm regularization for w1 and w2 as above, which value(s)
do you expect w0 to take? Explain briefly. (You can give a range of values for w0
if you deem necessary).
SOLUTION: For very large C, we argued that both w1 and w2 will go to zero. With
unbalanced classes where the number of ‘+’ labels are greater than that of ‘o’ labels,
we want to have P (y = 1|~x, w) ~ For that to happen the value of w0
~ > P (y = 0|~x, w).
should be greater than zero which makes P (y = 1|~x, w)
~ > 0.5.

10
4 Kernel regression [16 pts]
Now lets consider the non-parametric kernel regression setting. In this problem, you will
investigate univariate locally linear regression where the estimator is of the form:

fb(x) = β1 + β2 x

and the solution for parameter vector β = [β1 β2 ] is obtained by minimizing the weighted
least square error:
n Xi −x

X K
J(β1 , β2 ) = Wi (x)(Yi − β1 − β2 Xi )2 where Wi (x) = Pn h
Xi −x
,
i=1 i=1 K h

where K is a kernel with bandwidth h. Observe that the weighted least squares error can
be expressed in matrix form as

J(β1 , β2 ) = (Y − Aβ)T W (Y − Aβ),

where Y is a vector of n labels in the training example, W is a n × n diagonal matrix with

weight of each training example on the diagonal, and
 
1 X1
 1 X2 
A=  ... 


1 Xn

1. [4 pts] Derive an expression in matrix form for the solution vector β̂ that minimizes
the weighted least square.
Solution: Differentiating the objective function wrt β, we have:
∂J(β)
= 2AT W Aβ − 2AT W T Y.
β

Therefore, the solution β̂ satisfies the following normal equations:

AT W Aβ = AT W T Y

And if AT W A is invertible, then the solution is β̂ = (AT W A)−1 AT W T Y . (Note that W = W T ,

so the solution can be written in terms of either).

2. [3 pts] When is the above solution unique?

Solution: When AT W A is invertible. Since W is a diagonal matrix, AT W A = (W 1/2 A)T (W 1/2 A)
and hence rank(AT W A) = min(n, 2) - Refer TK’s recitation notes. Since a matrix is invert-
ible if it is full rank, a unique solution exists if n ≥ 2.

3. [3 pts] If the solution is not unique, one approach is to optimize the objective function
J using gradient descent. Write the update equation for gradient descent in this case.
Note: Your answer must be expressed in terms of the matrices defined above.

11
Solution: Let α > 0 denote the step-size.
α ∂J(β)
β (t+1) = β (t) −
2 ∂β
= β (t) − αAT W (Aβ − Y )

4. [3 pts] Can you identify the signal plus noise model under which maximizing the
likelihood (MLE) corresponds to the weighted least squares formulation mentioned
above?
iid
Solution: Y = β1 + β2 X + , where i ∼ N (0, σi2 ) for i = 1, . . . , n. Here σi2 ∝ 1/Wi (x).

5. [3 pts] Why is the above setting non-parametric? Mention one advantage and one
disadvantage of nonparametric techniques over parametric techniques.
Solution: The above setting is non-parametric since it performs locally linear fits, there-
fore number of parameters scale with data. Notice that Wi (x), and hence the solution β̂,
depends on x. Thus we are fitting the parameters to every point x - therefore total number
of parameters can be larger than n.
Nonparametric techniques do not place very strict assumptions on the form of the underlying
distribution or regression function, but are typically computationally expensive and require
large number of training examples.

12
5 SVM [16 pts]
5.1 L2 SVM
Let {(xi , yi )}li=1 be a set of l training pairs of feature vectors and labels. We consider binary
classification, and assume yi ∈ {−1, +1} ∀i. The following is the primal formulation of L2
SVM, a variant of the standard SVM obtained by squaring the hinge loss:
l
1 > CX 2
min w w+ ξ
w,b,ξ 2 2 i=1 i
s.t. yi (w> xi + b) ≥ 1 − ξi , i ∈ {1, . . . , l},
ξi ≥ 0, i ∈ {1, . . . , l}.

1. [4 pts] Show that removing the last set of constraints {ξi ≥ 0 ∀i} does not change the
optimal solution to the primal problem.
Solution: Let (w∗ , b∗ , ξ ∗ ) be the optimal solution to the problem without the last set of
constraints. It suffices to show that ξi∗ ≥ 0 ∀i. Suppose it is not the case, then there exists
some ξj∗ < 0. Then we have

yj ((w∗ )> xj + b∗ ) ≥ 1 − ξj∗ > 1,

implying that ξj0 = 0 is a feasible solution and yet gives a smaller objective value since
(ξj0 )2 = 0 < (ξj∗ )2 , a contradiction to the assumption that ξj∗ is optimal.

2. [3 pts] After removing the last set of constraints, we get a simpler problem:
l
1 > CX 2
min w w+ ξ
w,b,ξ 2 2 i=1 i (1)
s.t. yi (w> xi + b) ≥ 1 − ξi , i ∈ {1, . . . , l}.

Give the Lagrangian of (1).

Solution: The Lagrangian is
l l
1 > CX 2 X
L(w, b, ξ, α) := w w + ξi − αi (yi (w> xi + b) − 1 + ξi ),
2 2
i=1 i=1

where αi ≥ 0, ∀i are the Lagrange multipliers.

3. [6 pts] Derive the dual of (1). How is it different from the dual of the standard SVM
with the hinge loss?

13
Solution: Taking partial derivatives of the Lagrangian wrt w, b and ξi ,
l
X
∇w L(w, b, ξ, α) = 0 ⇐⇒ w = αi yi xi ,
i=1
l
X
∂b L(w, b, ξ, α) = 0 ⇐⇒ αi yi = 0,
i=1
∂ξi L(w, b, ξ, α) = 0 ⇐⇒ ξi = αi /C .

Plugging these back to the Lagrangian, rearranging terms and keeping constraints on the
Lagrange multipliers we obtain the dual
1
max − α> (Q + I/C)α + 1> α
α 2
>
s.t. y α = 0, αi ≥ 0 ∀i,

where 1 is a vector of ones, I is the identity matrix, y is the vector of labels yi ’s, and Q is
the l-by-l kernel matrix such that Qij = yi yj x>
i xj . Compared with the dual of the standard
SVM, the quadratic term is regularized by an additional positive diagonal matrix, and thus
has stronger convexity leading to faster convergence. The other difference is that the dual
variables here are only bounded from below, but in the standard SVM the dual variables are
bounded both from above (by C) and from below. In fact, for L2 svms the solution does not
depend on the tradeoff parameter C.

5.2 Leave-one-out Error and Support Vectors

[3 pts] Consider the standard two-class SVM with the hinge loss. Argue that under a given
value of C,
#SVs
LOO error ≤ ,
l
where l is the size of the training data and #SVs is the number of support vectors obtained
by training SVM on the entire set of training data.
Solution: Since the decision function only depends on the support vectors, removing a non-
support vector from the training data and then re-training an SVM would lead to the same decision
function. Also, non-support vectors must be classified correctly. As a result, errors found in the
leave-one-out validation must be caused by removing the support vectors, proving the desired
result.

14
6 Boosting [15 pts]
1. Consider training a boosting classifier using decision stumps on the following data set:

(a) [3 pts] Which examples will have their weights increased at the end of the first
iteration? Circle them.
Solution: The negative example since the decision stump with least error in first
iteration is constant over the whole domain. Notice this decision stump only predicts
incorrectly on the negative example, whereas any other decision stump predicts incor-
rectly on at least two training examples.
(b) [3 pts] How many iterations will it take to achieve zero training error? Explain.
Solution: At least three iterations. The first iteration misclassifies the negative ex-
ample, the second iteration misclassifies two of the positive examples as the negative
one has large weight. The third iteration is needed since a weighted sum of the first two
decision stumps can’t yield zero training error, and misclassifies the other two positive
examples. See Figures below.

+ + +

(c) [3 pts] Can you+add one more example to the training set so that+ boosting will
achieve zero training error in two steps? If not, explain why.
Solution: No. Notice that the simplest case is adding one more negative example
in center or one more positive example between any two positive examples, as it still
yields three decision regions with axis-aligned boundaries. If only two steps were
enough, then a linear combination of only two decision stumps sign(α1 h1 (x)+α2 h2 (x))

15
should be able to yield three decision regions. Also notice that at least one of h1 or h2
misclassifies two positive examples. If only h2 misclassifies two positive examples, the
possible decisions are (1) sign(α1 − α2 ) on those two positive examples, (2) sign(α1 +
α2 ) on the remaining positive examples and (3) sign(α1 −α2 ) on the negative examples
- which don’t yield zero training error since signs on (1) and (3) agree. If both h1 and h2
misclassify two positive examples, we have (1) sign(α1 −α2 ) on two positive examples,
(2) sign(−α1 + α2 ) on the remaining positive examples and (3) sign(−α1 − α2 ) on the
negative - which again don’t yield zero training error since signs on (1) and (2) don’t
agree.

2. [2 pts] Why do we want to use “weak” learners when boosting?

Solution: To prevent overfitting, since the complexity of the overall learner increases at
each step. Starting with weak learners implies the final classifier will be less likely to overfit.

3. [4 pts] Suppose AdaBoost is run on m training examples, and suppose on each round
that the weighted training error t of the tth weak hypothesis is at most 1/2 − γ, for
some number γ > 0. After how many iterations, T , will the combined hypothesis H
be consistent with the m training examples, i.e., achieves zero training error? Your
answer should only be expressed in terms of m and γ. (Hint: What is the training
error when 1 example is misclassified?)
Solution: Training error when 1 example is misclassified = 1/m. Therefore, we need to
guarantee that training error is < 1/m. Since t ≤ 1/2 − γ, from class notes we know that

Training err of the combined hypothesis H ≤ exp(−2T γ 2 )

The upper bound is < 1/m if T > ln m/2γ 2 .

Par Inc Golf Case Study For Hypothesis Testing
100% (1)
Par Inc Golf Case Study For Hypothesis Testing
2 pages
ECS7020P Sample Paper Solutions
No ratings yet
ECS7020P Sample Paper Solutions
6 pages
Predictive Modeling PDF
100% (3)
Predictive Modeling PDF
49 pages
Midterm 2010 F
No ratings yet
Midterm 2010 F
15 pages
Solution of Final Exam: 10-701/15-781 Machine Learning: Fall 2004 Dec. 12th 2004
No ratings yet
Solution of Final Exam: 10-701/15-781 Machine Learning: Fall 2004 Dec. 12th 2004
27 pages
cs675 SS2022 Midterm Solution PDF
No ratings yet
cs675 SS2022 Midterm Solution PDF
10 pages
Final: CS 189 Spring 2013 Introduction To Machine Learning
No ratings yet
Final: CS 189 Spring 2013 Introduction To Machine Learning
9 pages
ML Question CMU
No ratings yet
ML Question CMU
12 pages
Midterm Practice Questions
No ratings yet
Midterm Practice Questions
14 pages
07au Midterm
No ratings yet
07au Midterm
17 pages
midterm2008f_sol
No ratings yet
midterm2008f_sol
12 pages
Homework2 v1.0
No ratings yet
Homework2 v1.0
5 pages
Midterm Sp16 Solutions
100% (1)
Midterm Sp16 Solutions
17 pages
Assignment 1
No ratings yet
Assignment 1
6 pages
Exam 21
No ratings yet
Exam 21
17 pages
Exam 2011
No ratings yet
Exam 2011
22 pages
finals19
No ratings yet
finals19
16 pages
Assignment 5 Solution
No ratings yet
Assignment 5 Solution
6 pages
Mathematics of Machine Learning MIT
No ratings yet
Mathematics of Machine Learning MIT
411 pages
Exam in Statistical Machine Learning Statistisk Maskininlärning (1RT700)
No ratings yet
Exam in Statistical Machine Learning Statistisk Maskininlärning (1RT700)
10 pages
Exam in Statistical Machine Learning Statistisk Maskininlärning (1RT700)
No ratings yet
Exam in Statistical Machine Learning Statistisk Maskininlärning (1RT700)
13 pages
Machine Learning PYQ 2022 Ans
No ratings yet
Machine Learning PYQ 2022 Ans
17 pages
MIT18 657F15 LecNote PDF
No ratings yet
MIT18 657F15 LecNote PDF
194 pages
quiz2_B
No ratings yet
quiz2_B
6 pages
MLFA Spring 2024
No ratings yet
MLFA Spring 2024
11 pages
Machine 2021 Jul-Dec
No ratings yet
Machine 2021 Jul-Dec
46 pages
10-701 Midterm Exam, Fall 2007
No ratings yet
10-701 Midterm Exam, Fall 2007
25 pages
SS ZG568 EC 2R SECOND SEM 2020 2021 Solution 1617000149821
No ratings yet
SS ZG568 EC 2R SECOND SEM 2020 2021 Solution 1617000149821
6 pages
ML Midsem 2018 Solutions
No ratings yet
ML Midsem 2018 Solutions
7 pages
Statistical Learning Theory: 18.657: Mathematics of Machine Learning
No ratings yet
Statistical Learning Theory: 18.657: Mathematics of Machine Learning
9 pages
MCQs Dumps 2
No ratings yet
MCQs Dumps 2
15 pages
ass8_solns
No ratings yet
ass8_solns
10 pages
Statistical Methods for ML
No ratings yet
Statistical Methods for ML
24 pages
ML Midterm Question Pool
No ratings yet
ML Midterm Question Pool
7 pages
Wa0030.
No ratings yet
Wa0030.
36 pages
Exam in Statistical Machine Learning Statistisk Maskininlärning (1RT700)
No ratings yet
Exam in Statistical Machine Learning Statistisk Maskininlärning (1RT700)
12 pages
CS725 2020 Midsem
No ratings yet
CS725 2020 Midsem
3 pages
Linear Classification: 1 1 N N I D I
No ratings yet
Linear Classification: 1 1 N N I D I
33 pages
Midterm Solutions Machine
100% (1)
Midterm Solutions Machine
17 pages
Midterm Solutions PDF
No ratings yet
Midterm Solutions PDF
17 pages
ML 2023a Midsem Solution
No ratings yet
ML 2023a Midsem Solution
9 pages
Tute1 Questions
No ratings yet
Tute1 Questions
4 pages
189 Cheat Sheet Nominicards PDF
No ratings yet
189 Cheat Sheet Nominicards PDF
2 pages
ML Practice 1
No ratings yet
ML Practice 1
106 pages
Lista Fabio Cozman
No ratings yet
Lista Fabio Cozman
6 pages
Int 354 ML-1
No ratings yet
Int 354 ML-1
4 pages
Exam in Statistical Machine Learning Statistisk Maskininlärning (1RT700)
No ratings yet
Exam in Statistical Machine Learning Statistisk Maskininlärning (1RT700)
12 pages
CSD311: Artificial Intelligence
No ratings yet
CSD311: Artificial Intelligence
31 pages
MLvsMAP Merged
No ratings yet
MLvsMAP Merged
208 pages
601 sp09 Midterm Solutions
No ratings yet
601 sp09 Midterm Solutions
14 pages
ML Assignment 3
No ratings yet
ML Assignment 3
5 pages
Lec5 Class
No ratings yet
Lec5 Class
14 pages
Machine 2021 Jan-Apr
No ratings yet
Machine 2021 Jan-Apr
45 pages
Linearclassification
No ratings yet
Linearclassification
31 pages
Stat Risk
No ratings yet
Stat Risk
6 pages
Week 5 Prev & Current Assignments
No ratings yet
Week 5 Prev & Current Assignments
23 pages
ML_exam
No ratings yet
ML_exam
11 pages
EE2211_Past_Paper
No ratings yet
EE2211_Past_Paper
14 pages
Midterm 2006
No ratings yet
Midterm 2006
11 pages
Kernel PCA
No ratings yet
Kernel PCA
13 pages
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)
Calculus I Essentials
From Everand
Calculus I Essentials
Editors of REA
1/5 (1)
Standard Deviation Formulas
No ratings yet
Standard Deviation Formulas
8 pages
Bayes Factor Design Analysis With Informed Priors
No ratings yet
Bayes Factor Design Analysis With Informed Priors
27 pages
Module 7 ANOVAs
No ratings yet
Module 7 ANOVAs
44 pages
The Probability Distribution
No ratings yet
The Probability Distribution
6 pages
Lecture22 Stat104 V3 6up
No ratings yet
Lecture22 Stat104 V3 6up
9 pages
Analyze 4 Hypothesis Roadmap
No ratings yet
Analyze 4 Hypothesis Roadmap
1 page
Quiz Chapter 7 14 Questions Page 1 of 2
No ratings yet
Quiz Chapter 7 14 Questions Page 1 of 2
10 pages
EC424 Syllabus 2015
No ratings yet
EC424 Syllabus 2015
2 pages
Trend PDF
No ratings yet
Trend PDF
23 pages
Au Statistics Hypothesis Testing
No ratings yet
Au Statistics Hypothesis Testing
5 pages
PDF (Ebook) Excel 2019 in Applied Statistics for High School Students: A Guide to Solving Practical Problems by Thomas J. Quirk ISBN 9783030667566, 3030667561 download
100% (5)
PDF (Ebook) Excel 2019 in Applied Statistics for High School Students: A Guide to Solving Practical Problems by Thomas J. Quirk ISBN 9783030667566, 3030667561 download
81 pages
Jackknife: (I) Introduction
No ratings yet
Jackknife: (I) Introduction
11 pages
Stationarity and Unit Root Testing
No ratings yet
Stationarity and Unit Root Testing
21 pages
Robustness Analysis: P. Vanicek E. J. Krakiwsky M. R. Craymer
No ratings yet
Robustness Analysis: P. Vanicek E. J. Krakiwsky M. R. Craymer
37 pages
Exercise Set 1 Solution Key
No ratings yet
Exercise Set 1 Solution Key
9 pages
MULTICOLLINEARITY
No ratings yet
MULTICOLLINEARITY
8 pages
M3.Normal Distribution - Final PDF
No ratings yet
M3.Normal Distribution - Final PDF
23 pages
Lecture - 4.1 - Bayes Classifier
No ratings yet
Lecture - 4.1 - Bayes Classifier
31 pages
Chapter3 Statistics 2021 22
No ratings yet
Chapter3 Statistics 2021 22
35 pages
Digital Assignment - 5: V S Akshit 19bee0435
No ratings yet
Digital Assignment - 5: V S Akshit 19bee0435
5 pages
essential_python
No ratings yet
essential_python
16 pages
DrSoomro - 2588 - 20292 - 1 - Lecture 7 & 8
No ratings yet
DrSoomro - 2588 - 20292 - 1 - Lecture 7 & 8
60 pages
Unit 3
No ratings yet
Unit 3
20 pages
Solution Manual for Modern Business Statistics with Microsoft Excel, 7th Edition, David R. Anderson, Dennis J. Sweeney, Thomas A. Williams, Jeffrey D. Camm, James J. Cochran Michael J. Fry Jeffrey W. Ohlmann download pdf
100% (15)
Solution Manual for Modern Business Statistics with Microsoft Excel, 7th Edition, David R. Anderson, Dennis J. Sweeney, Thomas A. Williams, Jeffrey D. Camm, James J. Cochran Michael J. Fry Jeffrey W. Ohlmann download pdf
29 pages
Business Statistics Project
No ratings yet
Business Statistics Project
5 pages
Slides DS
No ratings yet
Slides DS
334 pages
BS 6
No ratings yet
BS 6
10 pages
Chap1-Overview of Data Science
No ratings yet
Chap1-Overview of Data Science
50 pages