homework4_v1.0
Rules:
1. Homework submission is done via the CMU Autolab system. Please package your writeup and code into
a zip or tar file, e.g., let submit.zip contain writeup.pdf and the code. Submit the package to
https://autolab.cs.cmu.edu/courses/10701-f15.
2. As on conference submission sites, repeated submission is allowed, so please feel free to refine your
answers. We will only grade the latest version.
3. Autolab may allow submission after the deadline; note, however, that this is only to accommodate the
late-day policy. Please see the course website for the policy on late submissions.
4. We recommend that you typeset your homework using appropriate software such as LaTeX. If you are
writing by hand, please make sure your homework is cleanly written up and legible. The TAs will not
invest undue effort to decipher bad handwriting.
5. You are allowed to collaborate on the homework, but you should write up your own solution and code.
Please indicate your collaborators in your submission.
1 VC dimension (20 Points) (Xun)
To show a concept class H has VC dimension d, we need to prove both the lower bound VCdim(H) ≥ d and
the upper bound VCdim(H) ≤ d.
You do not need to know anything about convexity beyond this hint to solve this problem.
2. Show that axis-aligned boxes $h(x) = \mathbb{1}\{a_i \le x_i \le b_i,\ \forall i\}$ in $\mathbb{R}^n$ have VC dimension $2n$.
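As a quick numeric sanity check for the lower bound (this is not a proof of either direction, and it is not required), the short Python script below verifies that for $n = 2$ the $2n$ points $\pm e_i$ are shattered by axis-aligned boxes: it brute-forces a realizing box for each of the $2^{2n}$ labelings over a small grid of candidate boundaries. The point set and the grid are our own illustrative choices, not part of the assignment.

```python
import itertools
import numpy as np

n = 2
# Candidate shattering set: the 2n signed unit vectors +/- e_i.
pts = np.vstack([np.eye(n), -np.eye(n)])
grid = [-1.5, -0.5, 0.5, 1.5]          # candidate box boundaries a_i, b_i

def in_box(p, a, b):
    """Indicator 1{a_i <= p_i <= b_i for all i}."""
    return all(a[i] <= p[i] <= b[i] for i in range(n))

shattered = True
for labels in itertools.product([False, True], repeat=len(pts)):
    # Look for a box whose indicator matches this labeling exactly.
    found = any(
        all(in_box(p, a, b) == l for p, l in zip(pts, labels))
        for a in itertools.product(grid, repeat=n)
        for b in itertools.product(grid, repeat=n)
    )
    shattered = shattered and found

print("all 2^(2n) labelings realized:", shattered)   # expect: True
```

The upper bound $\mathrm{VCdim} \le 2n$ still has to be argued analytically.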
2 AdaBoost

1. Let's first justify the update rule. Imagine there is an adversary who wants to fool $h_t$ in the next
round by adjusting the distribution. More formally, given $h_t$, the adversary wants to set $D_{t+1}$ such
that $\mathrm{err}_{D_{t+1}}(h_t) = \frac{1}{2}$. Show that the particular choice of
$$\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}$$
achieves this goal.
Note: why do we want such an adversarial setting? Because otherwise $A$ might as well return $h_t$ or
$-h_t$ again in round $t+1$ and still be slightly better than random guessing, which means it essentially
learns nothing.
2. Show that $D_{T+1}(i) = \left( m \prod_{t=1}^{T} Z_t \right)^{-1} e^{-y_i f(x_i)}$, where $f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$.
3. Show that $\mathrm{err}_S(H) \le \prod_{t=1}^{T} Z_t$.
4. Show that $\prod_{t=1}^{T} Z_t \le e^{-2 \sum_{t=1}^{T} \gamma_t^2}$.
[Figure 1: the nine labeled training points x1, ..., x9 plotted on an integer grid with axes 0-6 and 0-5.]
5. Now let $\gamma = \min_t \gamma_t$. From parts 3 and 4, we know the training error approaches zero at an
exponential rate in $T$. How many rounds are needed to achieve a training error of at most $\epsilon > 0$?
Please express your answer in big-O notation, $T = O(\cdot)$.
6. Consider the data set in Figure 1. Run $T = 3$ iterations of AdaBoost with decision stumps (axis-aligned
separators) as the base learners. Illustrate the learned weak hypotheses $\{h_t\}$ in Figure 1 and fill in
Table 1. The MATLAB code that generates Figure 1 is available on the course website.
We recommend writing a simple program, as the calculations may be tedious by hand; it will also help
you understand how AdaBoost works in practice. (A sketch follows Table 1.)
Table 1:
t | ε_t | α_t | D_t(1) | D_t(2) | D_t(3) | D_t(4) | D_t(5) | D_t(6) | D_t(7) | D_t(8) | D_t(9) | err_S(H)
1 |     |     |        |        |        |        |        |        |        |        |        |
2 |     |     |        |        |        |        |        |        |        |        |        |
3 |     |     |        |        |        |        |        |        |        |        |        |
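If you do write such a program, here is one possible Python/NumPy sketch; the exhaustive stump search, all variable names, and the integer-grid assumption for thresholds are our own choices, not the course-provided MATLAB code. As a side benefit, it prints the error of $h_t$ under the updated distribution $D_{t+1}$, which should come out to exactly $\frac{1}{2}$, matching part 1.

```python
import numpy as np

def best_stump(X, y, D):
    """Search all axis-aligned decision stumps h(x) = s * sign(x[j] - theta)
    and return the one with the smallest weighted error under D."""
    best = (np.inf, None, None, None)
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]) - 0.5:   # assumes integer coordinates
            for s in (-1, 1):
                pred = s * np.where(X[:, j] > theta, 1, -1)
                err = D[pred != y].sum()
                if err < best[0]:
                    best = (err, j, theta, s)
    return best

def adaboost(X, y, T=3):
    m = len(y)
    D = np.full(m, 1.0 / m)                      # D_1 is uniform
    f = np.zeros(m)                              # running weighted vote
    for t in range(1, T + 1):
        eps, j, theta, s = best_stump(X, y, D)
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = s * np.where(X[:, j] > theta, 1, -1)
        f += alpha * pred
        print(f"t={t}: eps={eps:.3f} alpha={alpha:.3f} "
              f"err_S(H)={np.mean(np.sign(f) != y):.3f} D_t={np.round(D, 3)}")
        D = D * np.exp(-alpha * y * pred)        # reweight by e^{-alpha y h(x)} ...
        D /= D.sum()                             # ... and normalize by Z_t
        print("  err of h_t under D_{t+1}:", D[pred != y].sum())  # = 1/2 (part 1)

# X (9 x 2 integer coordinates) and y (labels in {-1, +1}) are read off Figure 1.
```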
which assigns each $x_i \in \mathcal{X}$ to one of the clusters, so as to optimize the following objective:
$$\phi = \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i=1}^{n_k} \sum_{j=1}^{n_k} \left\| x_i^k - x_j^k \right\|^2 \qquad (4)$$
where $x_i^k$ denotes the $i$th sample in $C_k$ and $n_k$ is the number of data samples in $C_k$.
4.1 Theory
1. Prove the following lemma. (A numeric sanity check appears after this list.)
Lemma 1. Given a set of points $\mathcal{X} \subseteq \mathbb{R}^d$ with center $\bar{x}$, for any point $s$,
$$\sum_{x \in \mathcal{X}} \|x - s\|^2 - \sum_{x \in \mathcal{X}} \|x - \bar{x}\|^2 = |\mathcal{X}| \cdot \|\bar{x} - s\|^2 \qquad (5)$$
2. Use Lemma 1 to prove that minimizing the objective in Eq. 4 is equivalent to minimizing the following
objective:
$$\omega(U_K, f; \mathcal{X}) = \sum_{k=1}^{K} \sum_{i=1}^{n} \mathbb{1}(f(x_i) = k) \, \|x_i - \mu_k\|^2 \qquad (6)$$
3. Algorithm 1 presents how K-means proceeds. Show, respectively, that Step 1 and Step 2 each decrease
the objective $\phi$ (or $\omega$).
5. In K-means (as in Algorithm 1), we terminate the iterative process when the objective no longer
changes. Prove that K-means terminates in a finite number of iterations.
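Before diving into the proof of Lemma 1, it can be reassuring to check the identity numerically. The snippet below (our own construction, not required for the homework) evaluates both sides of Eq. 5 on random data and confirms they agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))     # a random point set X in R^3
s = rng.normal(size=3)              # an arbitrary point s
xbar = pts.mean(axis=0)             # the center x-bar of the point set

# Left- and right-hand sides of Eq. 5.
lhs = ((pts - s) ** 2).sum() - ((pts - xbar) ** 2).sum()
rhs = len(pts) * ((xbar - s) ** 2).sum()
print(np.isclose(lhs, rhs))         # expect: True
```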
4.2 Implementation
Now you are ready to implement K-means by yourself. A dataset of 2429 human faces is provided in
the file kmeans_data.csv. Each of the 2429 lines in this file corresponds to a 19 × 19 image of a human face.
Every image is represented as a 361-dimensional vector of grayscale values, in column-major format.
1. Implement the K-means algorithm, as detailed in Algorithm 1. Your implementation should initialize
$\{\mu_k\}_{k=1}^{K}$ by choosing uniformly at random from $\mathcal{X}$. Compute the objective value in Eq. 4 at each
iteration. Your K-means algorithm should terminate when a given number of iterations $M$ is reached.
(A sketch covering parts 1-3 of this list follows below.)
2. Run your implementation 15 times, using $k = 5$, $M = 50$. Plot the objective vs. iteration for all
15 runs in one figure. Have they converged? How many iterations does each run take to converge?
Choose the run with the minimal objective value and compute the mean faces for this run, i.e., the
centers of the clusters. Visualize the mean faces.
3. The clustering results of K-means can usually be greatly improved by carefully choosing the initialization
strategy. K-means++ is a randomized seeding technique which can improve both the speed and the
accuracy of K-means [1]. Algorithm 2 elaborates how K-means++ initializes the cluster centers
$\{\mu_k\}_{k=1}^{K}$.
Implement K-means++ on top of your K-means implementation. Note that you need to implement the
sampler yourself, drawing from a multinomial distribution. Then run your K-means++ implementation
15 times, using $k = 5$, $M = 50$. Plot the objective vs. iteration for all 15 runs in one figure. How many
iterations do they take to converge? Compute the mean faces for the run with the minimal objective
and visualize them. Compare your curves and mean faces to the previous ones, and summarize your
observations.
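To make parts 1-3 concrete, here is one possible NumPy sketch covering the uniformly initialized K-means of part 1, the hand-rolled multinomial sampler and K-means++ seeding of part 3, and the plotting and mean-face visualization of part 2. Since Algorithm 1 and Algorithm 2 are not reproduced here, the sketch follows the standard Lloyd iteration and the seeding procedure of Arthur and Vassilvitskii [1]; all function names, the CSV layout, and the matplotlib usage are our own assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def sample_multinomial(p, rng):
    """Draw one index from the categorical distribution p by inverting
    its CDF with a single uniform random number."""
    return int(np.searchsorted(np.cumsum(p), rng.random()))

def kmeanspp_init(X, k, rng):
    """k-means++ seeding: the first center is uniform at random; each later
    center is drawn with probability proportional to its squared distance
    to the nearest center chosen so far."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[sample_multinomial(d2 / d2.sum(), rng)])
    return np.array(centers)

def kmeans(X, k, M, rng, init="uniform"):
    """Lloyd iterations capped at M; returns centers, labels, and the
    per-iteration objective (computed as in Eq. 6; see part 2 of Section
    4.1 for its relation to Eq. 4)."""
    mu = X[rng.choice(len(X), k, replace=False)] if init == "uniform" \
        else kmeanspp_init(X, k, rng)
    history = []
    for _ in range(M):
        # Step 1: assign every point to its nearest center.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        history.append(d2[np.arange(len(X)), labels].sum())
        # Step 2: move each center to the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                mu[j] = X[labels == j].mean(axis=0)
    return mu, labels, history

X = np.loadtxt("kmeans_data.csv", delimiter=",")   # 2429 x 361, assumed layout
runs = [kmeans(X, 5, 50, np.random.default_rng(seed)) for seed in range(15)]
for _, _, hist in runs:
    plt.plot(hist)                                 # objective vs. iteration
plt.xlabel("iteration"); plt.ylabel("objective"); plt.show()

best_mu = min(runs, key=lambda r: r[2][-1])[0]     # run with the lowest objective
for j, face in enumerate(best_mu):
    plt.subplot(1, 5, j + 1)
    plt.imshow(face.reshape(19, 19, order="F"), cmap="gray")  # column-major pixels
    plt.axis("off")
plt.show()
```

Repeating the 15 runs with init="kmeanspp" produces the curves and mean faces to compare against the uniform-initialization results.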
Submit both the write-up and your code.
References
[1] D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the
Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035. Society for Industrial
and Applied Mathematics, 2007.