
10-701 Introduction to Machine Learning

Homework 4, version 1.0
Due Nov 13, 11:59 am

Rules:

1. Homework submission is done via CMU Autolab system. Please package your writeup and code into
a zip or tar file, e.g., let submit.zip contain writeup.pdf and the code. Submit the package to
https://autolab.cs.cmu.edu/courses/10701-f15.
2. As with conference submission sites, repeated submission is allowed, so please feel free to refine your
answers. We will only grade the latest version.
3. Autolab may allow submissions after the deadline; note, however, that this is only to accommodate the
late-day policy. Please see the course website for the policy on late submissions.
4. We recommend that you typeset your homework using appropriate software such as LaTeX. If you
write by hand, please make sure your homework is clean and legible. The TAs will not invest undue
effort to decipher bad handwriting.
5. You are allowed to collaborate on the homework, but you should write up your own solution and code.
Please indicate your collaborators in your submission.

1 VC dimension (20 Points) (Xun)
To show a concept class H has VC dimension d, we need to prove both the lower bound VCdim(H) ≥ d and
the upper bound VCdim(H) ≤ d.

1. Show that linear classifiers h(x) = 1{a^⊤ x + b ≥ 0} in R^n have VC dimension n + 1.


Hint: the following theorem might be useful in proving the upper bound. A set of n + 2 points in
R^n can be partitioned into two disjoint subsets S_1 and S_2 such that their convex hulls intersect. The
convex hull conv(C) of a set C is defined as the set of all convex combinations of points in C:

conv(C) = { ∑_{i=1}^{k} α_i x_i : x_i ∈ C, α_i ≥ 0, ∑_{i=1}^{k} α_i = 1 }.  (1)

You do not need to know anything about convexity beyond this hint to solve this problem. (A concrete
instance of the hinted theorem is given after problem 2 below.)
2. Show that axis-aligned boxes h(x) = 1{a_i ≤ x_i ≤ b_i, ∀i} in R^n have VC dimension 2n.
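As a concrete instance of the theorem in the hint (an illustration only, not something you need to prove):
take the n + 2 = 4 points (0, 0), (1, 1), (1, 0), (0, 1) in R² and the partition S_1 = {(0, 0), (1, 1)},
S_2 = {(1, 0), (0, 1)}. Their convex hulls are the two diagonals of the unit square, which intersect at
(1/2, 1/2), since

(1/2, 1/2) = (1/2)(0, 0) + (1/2)(1, 1) ∈ conv(S_1)   and   (1/2, 1/2) = (1/2)(1, 0) + (1/2)(0, 1) ∈ conv(S_2).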

2 AdaBoost (30 Points) (Xun)


Consider m training examples S = {(x_1, y_1), ..., (x_m, y_m)}, where x ∈ X and y ∈ {−1, 1}. Suppose we
have a weak learning algorithm A which produces a hypothesis h : X → {−1, 1} given any distribution D of
examples. AdaBoost works as follows (slightly different from the lecture slides, but they are equivalent):

• Begin with the uniform distribution D_1(i) = 1/m, i = 1, ..., m.

• At each round t = 1, ..., T,

  – Run A on D_t and get h_t.

  – Update D_{t+1}(i) = (D_t(i)/Z_t) · e^{−α_t y_i h_t(x_i)}, where Z_t is the normalizer and i = 1, ..., m.
Note that since A is a weak learning algorithm, the produced h_t at round t is only slightly better than
random guessing, say, by a margin γ_t:

ε_t = err_{D_t}(h_t) = Pr_{x∼D_t}[y ≠ h_t(x)] = 1/2 − γ_t.  (2)

In the end, AdaBoost outputs H = sign(∑_{t=1}^{T} α_t h_t) as the learned hypothesis. We will now prove that
the training error err_S(H) of AdaBoost decreases to zero at a very fast rate. In your answers, please state
clearly why each step of the derivation holds, for instance “by Cauchy-Schwarz, ...”.
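The following is not part of the assignment, but for concreteness, here is a minimal Python sketch of one
round of the reweighting described above. The weak learner is left abstract; h_pred stands for the
predictions of whatever hypothesis A returned, and the choice of α_t anticipates part 1 below.

```python
import numpy as np

def adaboost_reweight(D, y, h_pred):
    """One AdaBoost round: weighted error, alpha_t, and the updated distribution.

    D      : current distribution D_t over the m examples (non-negative, sums to 1)
    y      : labels in {-1, +1}
    h_pred : predictions of the weak hypothesis h_t on the m training examples
    """
    eps = np.sum(D[h_pred != y])               # eps_t = err_{D_t}(h_t)
    alpha = 0.5 * np.log((1 - eps) / eps)      # alpha_t = (1/2) log((1 - eps_t) / eps_t)
    D_new = D * np.exp(-alpha * y * h_pred)    # unnormalized D_{t+1}
    Z = D_new.sum()                            # normalizer Z_t
    return D_new / Z, alpha, eps
```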

1. Let’s first justify the update rule. Imagine there is an adversary who wants to fool h_t in the next
round by adjusting the distribution. More formally, given h_t, the adversary wants to set D_{t+1} such
that err_{D_{t+1}}(h_t) = 1/2. Show that the particular choice of α_t = (1/2) log((1 − ε_t)/ε_t) achieves
this goal.

Note: why do we want such an adversarial setting? Because otherwise A might as well return h_t or
−h_t again in round t + 1 and still be slightly better than random guessing, which means it essentially
learns nothing.
2. Show that D_{T+1}(i) = (m · ∏_{t=1}^{T} Z_t)^{−1} · e^{−y_i f(x_i)}, where f(x) = ∑_{t=1}^{T} α_t h_t(x).

3. Show that err_S(H) ≤ ∏_{t=1}^{T} Z_t.

4. Show that ∏_{t=1}^{T} Z_t ≤ e^{−2 ∑_{t=1}^{T} γ_t²}.

[Figure omitted from this text version: a scatter plot of nine labeled toy points x_1, ..., x_9 in the plane.]

Figure 1: Toy data for AdaBoost.

5. Now let γ = min_t γ_t. From parts 3 and 4, we know the training error approaches zero at an exponential
rate with respect to T. How many rounds are needed to achieve a training error of at most ε > 0? Please
express your answer in big-O notation, T = O(·).
6. Consider the data set in Figure 1. Run T = 3 iterations of AdaBoost with decision stumps (axis-aligned
separators) as the base learners. Illustrate the learned weak hypotheses {h_t} in Figure 1 and fill in
Table 1. The MATLAB code that generates Figure 1 is available on the course website.
We recommend writing a simple program, as the calculation can be tedious by hand; it will also help
you understand how AdaBoost works in practice. (A minimal decision-stump sketch is given after Table 1.)

t | ε_t | α_t | D_t(1) | D_t(2) | D_t(3) | D_t(4) | D_t(5) | D_t(6) | D_t(7) | D_t(8) | D_t(9) | err_S(H)
1 |     |     |        |        |        |        |        |        |        |        |        |
2 |     |     |        |        |        |        |        |        |        |        |        |
3 |     |     |        |        |        |        |        |        |        |        |        |

Table 1: AdaBoost results (to be filled in, one row per boosting round).
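Not part of the assignment: a minimal Python sketch of the exhaustive decision-stump search over a
weighted sample, which can be combined with the reweighting step sketched earlier in this section. The
function and variable names are illustrative; plug in the coordinates and labels read off Figure 1 (the
threshold grid below assumes the toy points lie on integer coordinates).

```python
import numpy as np

def best_stump(X, y, D):
    """Search all axis-aligned decision stumps and return the one with minimal weighted error.

    X : (m, d) data matrix, y : labels in {-1, +1}, D : distribution over the m examples.
    """
    m, d = X.shape
    best, best_err = None, np.inf
    for j in range(d):                            # feature (axis) to split on
        for thr in np.unique(X[:, j]) - 0.5:      # thresholds between integer grid values (assumption)
            for s in (+1, -1):                    # orientation of the stump
                pred = s * np.where(X[:, j] >= thr, 1, -1)
                err = np.sum(D[pred != y])        # weighted error under D
                if err < best_err:
                    best_err, best = err, (j, thr, s)
    return best, best_err
```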

3 Gaussian Mixture Model (10 Points) (Hao)


Consider a multivariate Gaussian Mixture Model with K components:
p(x) = ∑_{k=1}^{K} π_k N(x | µ_k, Σ_k).  (3)

1. Show that E[x] = ∑_k π_k µ_k.

2. Show that Cov[x] = ∑_k π_k (Σ_k + µ_k µ_k^⊤) − E[x] E[x]^⊤.
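Not part of the assignment: a quick Monte Carlo sanity check in Python of the identity in part 1. The
two-component parameters below are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-component GMM in R^2 (parameters are arbitrary)
pis = np.array([0.3, 0.7])
mus = np.array([[0.0, 0.0], [3.0, -1.0]])
covs = [np.eye(2), 2.0 * np.eye(2)]

# Sample: draw the component index first, then the corresponding Gaussian
n = 100_000
ks = rng.choice(len(pis), size=n, p=pis)
x = np.empty((n, 2))
for k in range(len(pis)):
    mask = ks == k
    x[mask] = rng.multivariate_normal(mus[k], covs[k], size=mask.sum())

print("empirical mean  :", x.mean(axis=0))
print("sum_k pi_k mu_k :", pis @ mus)   # should agree up to Monte Carlo error
```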

4 K-means (40 Points) (Hao)


Given n data samples in X ⊆ R^d and an integer K, we showed in class that the K-means algorithm tries to
determine K clusters {C_k}_{k=1}^{K} with centers U_K = {µ_k}_{k=1}^{K} ⊆ R^d, and a mapping function
f : X → {1, ..., K} which assigns each x_i ∈ X to one of the clusters, so as to optimize the following
objective:

φ = ∑_{k=1}^{K} (1/n_k) ∑_{i=1}^{n_k} ∑_{j=1}^{n_k} ‖x_{ki} − x_{kj}‖²,  (4)

where x_{ki} denotes the i-th sample in C_k and n_k is the number of data samples in C_k.

4.1 Theory
1. Prove the following Lemma.
Lemma 1. Let X ⊆ R^d be a set of points with mean (center) x̄. Then for any point s,

∑_{x∈X} ‖x − s‖² − ∑_{x∈X} ‖x − x̄‖² = |X| · ‖x̄ − s‖².  (5)

2. Use Lemma 1 to prove that minimizing the objective in Eq. (4) is equivalent to minimizing the following
objective:

ω(U_K, f; X) = ∑_{k=1}^{K} ∑_{i=1}^{n} 1(f(x_i) = k) ‖x_i − µ_k‖².  (6)

(A small numerical comparison of the two objectives is sketched at the end of this subsection.)

3. Algorithm 1 presents how K-means proceeds. Show that each of Step 1 and Step 2 can only decrease
(or leave unchanged) the objective φ (or ω).

Algorithm 1: K-means Algorithm

1 Initialize {µ_k}_{k=1}^{K} (randomly, if necessary).
2 repeat
3     Step 1: Decide the cluster memberships of {x_i}_{i=1}^{n} by assigning each point to its nearest
      cluster center.
4     Step 2: For each k ∈ {1, ..., K}, set µ_k to be the center of mass of all points in C_k:
      µ_k = (1/n_k) ∑_{i=1}^{n_k} x_{ki}.
5 until the objective no longer changes

4. Let Ω(K) = min_{U_K, f} ω(U_K, f; X). Show that Ω is non-increasing in K.

5. In K-means (as in Algorithm 1), we terminate the iterative process when the objective no longer
changes. Prove that K-means terminates in a finite number of iterations.
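Not part of the assignment: a small Python snippet that numerically compares the two objectives on
random data, which may help build intuition for part 2. The data and the random assignment are
placeholders; the centers are taken to be the cluster means.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 200, 5, 4
X = rng.normal(size=(n, d))
f = rng.integers(K, size=n)                                   # an arbitrary assignment x_i -> cluster
mu = np.stack([X[f == k].mean(axis=0) for k in range(K)])     # center of mass of each cluster

# Eq. (4): for each cluster, (1/n_k) times the sum of pairwise squared distances
phi = sum(
    ((X[f == k][:, None, :] - X[f == k][None, :, :]) ** 2).sum() / (f == k).sum()
    for k in range(K)
)

# Eq. (6): sum of squared distances of each point to its assigned center
omega = ((X - mu[f]) ** 2).sum()

print(phi, omega)   # compare the two values; part 2 asks you to establish how they relate
```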

4.2 Implementation
Now you are ready to implement K-means yourself. A dataset of 2429 human faces is provided in
the file kmeans data.csv. Each of the 2429 lines in this file corresponds to a 19 × 19 image of a human face.
Every image is represented as a 361-dimensional vector of grayscale values, in column-major format.
1. Implement the K-means algorithm, as detailed in Algorithm 1. Your implementation should initialize
{µ_k}_{k=1}^{K} by choosing points uniformly at random from X. Compute the objective value in Eq. (4)
at each iteration. Your K-means algorithm should terminate when a given number of iterations M has
been reached. (A minimal sketch of the core loop is given at the end of this section.)

2. Run your implementation 15 times, using K = 5, M = 50. Plot the objective vs. iterations for all
15 runs in one figure. Have they converged? How many iterations does each run take to converge?
Choose the run with the minimal objective value and compute the mean faces for this run, i.e., the
centers of the clusters. Visualize the mean faces.
3. The clustering results of K-means can often be greatly improved by carefully choosing an initialization
strategy. K-means++ is a randomized seeding technique that can improve both the speed and the
accuracy of K-means [1]. Algorithm 2 describes how K-means++ initializes the cluster centers
{µ_k}_{k=1}^{K}.

Algorithm 2: K-means++ Initialization

1 Take one center µ_1, chosen uniformly at random from X.
2 Take a new center µ_k (k > 1) from X, so that Pr(µ_k = x_i) = D(x_i)² / ∑_{j=1}^{n} D(x_j)², where D(x)
  is the distance from x to its nearest center among those already chosen, {µ_1, ..., µ_{k−1}}.
3 Repeat the above step until all of {µ_k}_{k=1}^{K} have been initialized.
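Not part of the assignment: a minimal Python sketch of the D²-weighted seeding above, including a
hand-rolled categorical (single-draw multinomial) sampler as required by the instructions below. Names
and structure are illustrative only.

```python
import numpy as np

def sample_index(p, rng):
    """Draw one index from the discrete distribution p by inverse-CDF sampling."""
    return int(np.searchsorted(np.cumsum(p), rng.random()))

def kmeanspp_init(X, K, rng):
    """K-means++ seeding: pick K rows of X by D^2-weighted sampling (Algorithm 2)."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]                       # mu_1: uniformly at random
    for _ in range(1, K):
        # squared distance of every point to its nearest already-chosen center
        diffs = X[:, None, :] - np.asarray(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=-1).min(axis=1)
        centers.append(X[sample_index(d2 / d2.sum(), rng)])
    return np.asarray(centers)
```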

Implement K-means++ on top of your K-means implementation. Note that you need to implement the
sampler from the multinomial distribution yourself. Then run your implementation with K-means++
initialization 15 times, using K = 5, M = 50. Plot the objective vs. iterations for all 15 runs in one
figure. How many iterations do they take to converge? Compute the mean faces for the run with the
minimal objective and visualize them. Compare your curves and mean faces to the previous ones and
summarize your observations.
Submit both the write-up and your code.
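Not part of the assignment: a minimal Python sketch of the core K-means loop referenced in part 1 of
Section 4.2, with uniform random initialization and a fixed iteration budget M. The objective tracked
here is the form in Eq. (6); the file-loading line is a commented-out assumption about the CSV format.

```python
import numpy as np

def kmeans(X, K, M, rng):
    """Algorithm 1 run for at most M iterations; returns centers, assignments, objectives."""
    n, d = X.shape
    mu = X[rng.choice(n, size=K, replace=False)]          # uniform random initialization from X
    objectives = []
    for _ in range(M):
        # Step 1: assign each point to its nearest center
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)   # (n, K) squared distances
        f = d2.argmin(axis=1)
        # Step 2: move each center to the mean of its cluster (keep the old center if empty)
        mu = np.stack([X[f == k].mean(axis=0) if np.any(f == k) else mu[k] for k in range(K)])
        objectives.append(((X - mu[f]) ** 2).sum())        # objective in the form of Eq. (6)
    return mu, f, objectives

# Hypothetical usage on the faces data (file name and delimiter assumed from the text):
# X = np.loadtxt("kmeans data.csv", delimiter=",")
# mu, f, obj = kmeans(X, K=5, M=50, rng=np.random.default_rng())
```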

References
[1] D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the
eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027–1035. Society for Industrial
and Applied Mathematics, 2007.
