Solution: It is not hard to see that the set of 3 points with coordinates
(1, 0), (0, 1), and (−1, 0) can be shattered by axis-aligned squares: e.g.,
to label two of these points positively, use a square admitting those two
points as corners and lying on the side that avoids the third point. Thus,
the VC-dimension is at least 3. No set of 4 points can be fully shattered.
To see this, let PT be the highest point, PB the lowest, PL the leftmost,
and PR the rightmost, assuming for now that these can be defined in a
unique way (no ties); the cases where there are ties can be treated in a
simpler fashion. Assume without loss of generality that the difference dBT
of y-coordinates between PT and PB is greater than the difference dLR of
x-coordinates between PL and PR. Then, PT and PB cannot be labeled
positively while PL and PR are labeled negatively: any axis-aligned square
containing PT and PB has side length at least dBT > dLR, so its horizontal
extent is an interval of length greater than dLR containing the x-coordinate
of PT, and it must therefore cover the x-coordinate of PL or that of PR;
since its vertical extent covers all y-coordinates between those of PB and
PT, the square contains PL or PR. Thus, the VC-dimension of axis-aligned
squares in the plane is 3.
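As a quick sanity check, the lower-bound claim can be verified by brute force. The following Python sketch (the grid of candidate squares is an ad hoc choice, not part of the solution) enumerates axis-aligned squares and collects the labelings they induce on the 3 points above.

```python
from itertools import product

# The 3 points claimed to be shattered by axis-aligned squares.
points = [(1, 0), (0, 1), (-1, 0)]

def in_square(p, left, bottom, side):
    """Closed axis-aligned square with lower-left corner (left, bottom)."""
    x, y = p
    return left <= x <= left + side and bottom <= y <= bottom + side

# Grid of candidate squares (arbitrary, but fine enough for these points).
steps = [i / 4 for i in range(-8, 9)]   # -2.0, -1.75, ..., 2.0
sides = [i / 4 for i in range(1, 13)]   # 0.25, 0.5, ..., 3.0
labelings = set()
for left, bottom, side in product(steps, steps, sides):
    labelings.add(tuple(in_square(p, left, bottom, side) for p in points))

# All 2^3 = 8 labelings should be realized.
print(len(labelings))  # expected: 8
```

Of course, such a search can only confirm the lower bound; the upper bound requires the geometric argument above.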
2. Consider right triangles in the plane with the sides adjacent to the
right angle both parallel to the axes and with the right angle in the
lower left corner. What is the VC-dimension of this family?
Solution: It is not hard to see that a set of 4 points, for instance the
points with coordinates (0, 4), (4, 0), (2, 3), and (3, 2), can be shattered
by such triangles: the leftmost point can be excluded alone by the vertical
side, the lowest point by the horizontal side, and each of the two remaining
points by a hypotenuse of suitable slope; the other labelings are obtained
by combining these three sides. (Note that collinear points cannot be used
here: on a line, these triangles trace out intervals.) To see that no set of 5
points can be shattered, the same argument as for axis-aligned rectangles
can be used: since these triangles are convex regions, labeling all points
positively except the one lying in the interior of the convex hull is not
possible (the degenerate cases where no point lies in the interior of the
convex hull can be handled similarly, each side of the triangle being able
to cut off only an extremal or contiguous group of points). Thus, the
VC-dimension of this family of triangles is 4.
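The same brute-force check applies to this family. The following self-contained Python sketch (the parameter grid is again an ad hoc choice, not part of the solution) verifies that the 4 points above are shattered.

```python
from itertools import product

# The 4 points claimed to be shattered by lower-left right triangles.
points = [(0, 4), (4, 0), (2, 3), (3, 2)]

def in_triangle(p, a, b, w, h):
    """Closed right triangle with right angle at (a, b) and axis-parallel
    legs of lengths w (horizontal) and h (vertical)."""
    x, y = p
    return x >= a and y >= b and (x - a) / w + (y - b) / h <= 1

# Grid of candidate triangles (arbitrary, but fine enough here).
corners = [i / 4 for i in range(-8, 21)]  # -2.0, -1.75, ..., 5.0
legs = list(range(1, 13))                 # 1, 2, ..., 12
labelings = set()
for a, b, w, h in product(corners, corners, legs, legs):
    labelings.add(tuple(in_triangle(p, a, b, w, h) for p in points))

# All 2^4 = 16 labelings should be realized.
print(len(labelings))  # expected: 16
```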
B. Growth function bound
Solution: Following the proof given in class and using Jensen's inequality
(at the last step), we can write:
\begin{align*}
\mathfrak{R}_m(G) &= \mathbb{E}_{S,\sigma}\left[\sup_{g \in G} \frac{1}{m}\sum_{i=1}^{m}\sigma_i\, g(z_i)\right] \\
&\leq \mathbb{E}_S\left[\frac{\sqrt{m}\,\sqrt{2\log\left|\{(g(z_1),\ldots,g(z_m)) : g \in G\}\right|}}{m}\right] && \text{(Massart's Lemma)} \\
&= \mathbb{E}_S\left[\frac{\sqrt{m}\,\sqrt{2\log \Pi(G,S)}}{m}\right] \\
&\leq \frac{\sqrt{m}\,\sqrt{2\log \mathbb{E}_S[\Pi(G,S)]}}{m} = \sqrt{\frac{2\log \mathbb{E}_S[\Pi(G,S)]}{m}}.
\end{align*}
Here, Massart's Lemma applies since the vectors $(g(z_1), \ldots, g(z_m))$ have norm at most $\sqrt{m}$, and the last inequality follows from Jensen's inequality, since $x \mapsto \sqrt{2\log x}$ is concave for $x \geq 1$.
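As an aside, combining this bound with Sauer's lemma, which gives $\Pi(G,S) \leq (em/d)^d$ for $m \geq d$ where $d$ is the VC-dimension, makes the bound easy to evaluate numerically. A small Python sketch (the sample values of m and d are arbitrary):

```python
from math import e, log, sqrt

def rademacher_bound(m, d):
    """Evaluate sqrt(2 log E[Pi(G,S)] / m) after bounding the growth
    function via Sauer's lemma: Pi(G, S) <= (e*m/d)**d for m >= d."""
    assert m >= d >= 1
    log_growth = d * log(e * m / d)  # log of (em/d)^d
    return sqrt(2 * log_growth / m)

# Illustrative values only: the bound vanishes as m grows.
for m in [100, 1000, 10000]:
    print(m, rademacher_bound(m, d=10))
```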
The output of the neural network for a given input vector $(x_1, \ldots, x_n)$
is obtained as follows. First, each of the n input nodes is labeled with the
corresponding value $x_i \in \mathbb{R}$. Next, the value at a node u of the
intermediate layer labeled with a concept c is obtained by applying c to the
values of the input nodes admitting an edge ending in u. Note that since c
takes values in {0, 1}, the value at u is in {0, 1}. The value at the top or
output node is obtained similarly, by applying the corresponding concept to
the values of the nodes admitting an edge to the output node.
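To make this evaluation concrete, here is a minimal Python sketch under our own representation choices (concepts as {0,1}-valued functions, wiring as tuples of input indices), none of which are prescribed by the problem:

```python
# Each internal node applies a {0,1}-valued concept to the inputs it is
# wired to; the output node applies its concept to the internal values.

def evaluate(x, internal_nodes, output_concept):
    """x: input vector; internal_nodes: list of (concept, input_indices)
    pairs; output_concept: {0,1}-valued function of the internal values."""
    hidden = [c(tuple(x[i] for i in idx)) for c, idx in internal_nodes]
    return output_concept(tuple(hidden))

# Example with threshold-style concepts (cf. question 3 below).
sgn01 = lambda v: 1 if v >= 0 else 0
node1 = (lambda z: sgn01(z[0] - z[1]), (0, 1))  # fires if x0 >= x1
node2 = (lambda z: sgn01(z[0] + z[1]), (1, 2))  # fires if x1 + x2 >= 0
out = lambda h: sgn01(h[0] + h[1] - 2)          # AND of the two nodes

print(evaluate((3.0, 1.0, -0.5), [node1, node2], out))  # -> 1
```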
1. Let H denote the set of all neural networks defined as above with
k ≥ 2 internal nodes. Show that the growth function $\Pi_H(m)$ can be
upper bounded in terms of the product of the growth functions of the
hypothesis sets defined at each intermediate layer.
3. Let C be the concept class defined by threshold functions
$C = \{\operatorname{sgn}(\sum_{j=1}^{r} w_j x_j) : w \in \mathbb{R}^r\}$. Give an upper bound on the
VC-dimension of H in terms of k and r.
Figure 1: A neural network with one intermediate layer.