Introduction to Machine Learning
Homework 7: Neural Networks
Prof. Sundeep Rangan and Yao Wang
Solution
1. (a) The linear functions in the hidden layer are:
\[
z^H = W^H x + b^H =
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \end{bmatrix} =
\begin{bmatrix} x_1 + x_3 \\ x_2 + x_3 \\ x_1 + x_2 - 1 \\ x_1 + x_2 + x_3 + 1 \end{bmatrix}.
\]
Hence, the activation functions are
\[
u^H = g_{\rm act}(z^H) =
\begin{bmatrix} g_{\rm act}(x_1 + x_3) \\ g_{\rm act}(x_2 + x_3) \\ g_{\rm act}(x_1 + x_2 - 1) \\ g_{\rm act}(x_1 + x_2 + x_3 + 1) \end{bmatrix}
=
\begin{bmatrix} \mathbb{1}_{\{x_1 + x_3 \geq 0\}} \\ \mathbb{1}_{\{x_2 + x_3 \geq 0\}} \\ \mathbb{1}_{\{x_1 + x_2 - 1 \geq 0\}} \\ \mathbb{1}_{\{x_1 + x_2 + x_3 + 1 \geq 0\}} \end{bmatrix}.
\]
For example, the region where $u^H_1 = 1$ is described by $x_1 + x_3 \geq 0$.
(b) The output $z^O$ is
\[
z^O = W^O u^H + b^O = [1,\ 1,\ -1,\ -1]
\begin{bmatrix} \mathbb{1}_{\{x_1 + x_3 \geq 0\}} \\ \mathbb{1}_{\{x_2 + x_3 \geq 0\}} \\ \mathbb{1}_{\{x_1 + x_2 \geq 1\}} \\ \mathbb{1}_{\{x_1 + x_2 + x_3 \geq -1\}} \end{bmatrix} - 1.5
= \mathbb{1}_{\{x_1 + x_3 \geq 0\}} + \mathbb{1}_{\{x_2 + x_3 \geq 0\}} - \mathbb{1}_{\{x_1 + x_2 \geq 1\}} - \mathbb{1}_{\{x_1 + x_2 + x_3 \geq -1\}} - 1.5.
\]
The points with $\hat{y} = 1$ lie in the region $z^O \geq 0$. The visualization of this region is not required.
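As an optional check, here is a minimal numpy sketch (not part of the required solution) that evaluates the network at a couple of points using the weights above; the test points are arbitrary illustrations.

import numpy as np

# Weights and biases from parts (a) and (b)
WH = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 1, 1]])
bH = np.array([0, 0, -1, 1])
WO = np.array([1, 1, -1, -1])
bO = -1.5

def yhat(x):
    """Evaluate the network at a point x = (x1, x2, x3)."""
    zH = WH @ x + bH
    uH = (zH >= 0).astype(float)   # step activation 1{z >= 0}
    zO = WO @ uH + bO
    return int(zO >= 0)

# Two hand-picked test points
print(yhat(np.array([1.0, 1.0, 1.0])))     # uH = [1,1,1,1], zO = -1.5, so yhat = 0
print(yhat(np.array([-2.0, -2.0, 2.0])))   # uH = [1,1,0,0], zO =  0.5, so yhat = 1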
2. (a) Since $W^H$ has three rows, the number of hidden units is $N_h = 3$. The outputs $z^H$ in the
hidden layer are
\[
z^H = W^H x + b^H =
\begin{bmatrix} -x \\ x \\ x \end{bmatrix} +
\begin{bmatrix} -1 \\ 1 \\ -2 \end{bmatrix} =
\begin{bmatrix} -x - 1 \\ x + 1 \\ x - 2 \end{bmatrix}.
\]
So, the outputs after ReLU activation are,
\[
u^H = \begin{bmatrix} u^H_1 \\ u^H_2 \\ u^H_3 \end{bmatrix} =
\begin{bmatrix} \max\{0, -x - 1\} \\ \max\{0, x + 1\} \\ \max\{0, x - 2\} \end{bmatrix}.
\]
The functions are plotted in Figure 1.
Figure 1: Problem 2(a). Hidden layer activations, $u^H_j$ vs. $x$ for $j = 1, 2, 3$.
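A minimal matplotlib sketch that reproduces Figure 1 (assuming matplotlib is available; not required by the problem):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 4, 200)
u1 = np.maximum(0, -x - 1)   # uH_1
u2 = np.maximum(0, x + 1)    # uH_2
u3 = np.maximum(0, x - 2)    # uH_3

for u, label in [(u1, 'uH_1'), (u2, 'uH_2'), (u3, 'uH_3')]:
    plt.plot(x, u, label=label)
plt.xlabel('x')
plt.legend()
plt.show()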
(b) Since the network is for regression, you can take
\[
\hat{y} = g_{\rm out}(z^O) = z^O.
\]
One possible loss function is the squared error,
\[
L = \sum_{i=1}^{N} (\hat{y}_i - y_i)^2.
\]
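In code, with yhat and y as length-$N$ numpy arrays of predictions and targets, this loss is a single line:

L = np.sum((yhat - y)**2)   # squared-error loss over the training set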
(c) Note that the output layer can be thought of as a linear regressor whose input is the
hidden layer activations $u^H$ and whose output is $\hat{y}$. Let $U$ be the data matrix whose $i$-th
row is $[1, (u^H_i)^T] = [1, u^H_{i,1}, u^H_{i,2}, u^H_{i,3}]$, let $\tilde{w}^T = [b^O, (W^O)^T]$, and let
$y = [y_1, y_2, \ldots, y_N]^T$. The problem is then the least squares problem
\[
\text{Minimize } \|U\tilde{w} - y\|^2.
\]
The analytical solution can be expressed as
\[
\tilde{w} = (U^T U)^{-1} U^T y.
\]
From the given $x_i$, we first determine the hidden layer outputs $z^H_i$ and $u^H_i$. These can be
computed manually from the equations in part (a), or with the Python code given at the end
of this part.
The table below lists the corresponding values:

$x_i$:            -2     -1      0      3      3.5
$z^H_{i,1}$:       1      0     -1     -4     -4.5
$z^H_{i,2}$:      -1      0      1      4      4.5
$z^H_{i,3}$:      -4     -3     -2      1      1.5
$u^H_{i,1}$:       1      0      0      0      0
$u^H_{i,2}$:       0      0      1      4      4.5
$u^H_{i,3}$:       0      0      0      1      1.5
$y_i$:             0      0      1      3      3
The data matrix is thus
\[
U = \begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 4 & 1 \\
1 & 0 & 4.5 & 1.5
\end{bmatrix}.
\]
The target vector is
\[
y = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 3 \\ 3 \end{bmatrix}.
\]
The solution is
\[
\tilde{w} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix},
\]
or equivalently
\[
b^O = 0, \qquad W^O = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}.
\]
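As a quick check of this result, the normal-equation form $\tilde{w} = (U^T U)^{-1} U^T y$ can be evaluated directly; a short sketch, with $U$ and $y$ as listed above:

import numpy as np

U = np.array([[1, 1, 0,   0  ],
              [1, 0, 0,   0  ],
              [1, 0, 1,   0  ],
              [1, 0, 4,   1  ],
              [1, 0, 4.5, 1.5]])
y = np.array([0, 0, 1, 3, 3])

# Normal equations: solve (U^T U) w = U^T y rather than forming the inverse explicitly
w_tilde = np.linalg.solve(U.T @ U, U.T @ y)
print(w_tilde)           # approximately [0, 0, 1, -1]
print(U @ w_tilde - y)   # residual is (numerically) zero, so the fit is exact

Solving the normal equations directly is fine here because $U^T U$ is small and well conditioned; np.linalg.lstsq, used in the script below, is the more general-purpose choice.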
The Python code for computing the hidden layer outputs and for determining the least
squares solution is given below.
import numpy as np

# Hidden layer parameters from part (a)
Wh = np.array([-1, 1, 1])
bh = np.array([-1, 1, -2])

# Training data
x = np.array([-2, -1, 0, 3, 3.5])
y = np.array([0, 0, 1, 3, 3])

# Hidden layer outputs: broadcasting gives one row per sample, one column per unit
zh = x[:, None] * Wh[None, :] + bh[None, :]
Uh = np.maximum(0, zh)

# Data matrix with a leading column of ones for the bias, then least squares
U = np.hstack((np.ones((5, 1)), Uh))
w_tilde = np.linalg.lstsq(U, y, rcond=None)[0]
bo = w_tilde[0]
Wo = w_tilde[1:]
(d) We can use the Python function predict defined in the next subproblem to compute $\hat{y}$
for $x$ in the range $[-3, 4]$. The resulting curve is shown in Figure 2.

x = np.linspace(-3, 4)
yhat = predict(x, Wh, bh, Wo, bo)
Figure 2: Problem 2(d). Output $\hat{y}$ vs. $x$ for the training data.
(e) We represent Wh, Wo, and bh as vectors and bo as a scalar. Then, we can write the predict
function as:
def predict(x, Wh, bh, Wo, bo):
    # Hidden layer: broadcasting gives one row per sample, one column per unit
    zh = x[:, None] * Wh[None, :] + bh[None, :]
    uh = np.maximum(0, zh)
    # Output layer: linear combination of the hidden activations plus bias
    yhat = uh.dot(Wo) + bo
    return yhat
Note the use of Python broadcasting.
3. (a) We simply add the index $i$ to all the terms:
\[
z_{ij} = \sum_{k=1}^{N_i} W_{jk} x_{ik} + b_j, \qquad
u_{ij} = \frac{1}{1 + \exp(-z_{ij})}, \qquad j = 1, \ldots, M, \qquad
\hat{y}_i = \frac{\sum_{j=1}^{M} a_j u_{ij}}{\sum_{j=1}^{M} u_{ij}}.
\tag{1}
\]
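As an illustration, a minimal numpy sketch of this forward pass; the array names (X for the $N \times N_i$ input matrix, W, b, a for the parameters) are assumptions for the sketch, not notation from the problem:

import numpy as np

def forward(X, W, b, a):
    # z_{ij} = sum_k W_{jk} x_{ik} + b_j, shape (N, M)
    Z = X @ W.T + b[None, :]
    # u_{ij} = 1 / (1 + exp(-z_{ij}))
    U = 1.0 / (1.0 + np.exp(-Z))
    # yhat_i = sum_j a_j u_{ij} / sum_j u_{ij}
    yhat = (U @ a) / U.sum(axis=1)
    return Z, U, yhat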
(b) The computation graph is shown in Fig. 3.
[Graph: $x_i \to z_i \to u_i \to \hat{y}_i \to L$, with parameters $(W, b)$ entering at $z_i$, $a$ entering at $\hat{y}_i$, and the target $y_i$ entering at $L$.]

Figure 3: Computation graph for Problem 3 mapping the training data $(x_i, y_i)$ and parameters
to the loss function $L$. Parameters are shown in light blue and data in light green.
(c) The gradient is as follows:
\[
\frac{\partial L}{\partial \hat{y}_i} = -2(y_i - \hat{y}_i)
\]
for all $i = 1, \ldots, N$.
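In code, with y and yhat as length-$N$ numpy arrays (e.g. from the forward-pass sketch above), this gradient is one line:

dloss_dyhat = -2 * (y - yhat)   # dL/dyhat_i for i = 1, ..., N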
(d) We first compute the partial derivative $\partial \hat{y}_i / \partial u_{ij}$. We rewrite the equation for $\hat{y}_i$ as
\[
\hat{y}_i = \frac{\sum_{l=1}^{M} a_l u_{il}}{\sum_{l=1}^{M} u_{il}}.
\tag{2}
\]
Note that before taking the derivative with respect to $u_{ij}$ we had to rewrite the sum in
(2) with the index $l$ so that it is not confused with the index $j$ of the variable $u_{ij}$.
Now, we use the quotient rule to find the derivative $\partial \hat{y}_i / \partial u_{ij}$:
\[
\frac{\partial \hat{y}_i}{\partial u_{ij}}
= \frac{\left( \sum_{l=1}^{M} u_{il} \right) \dfrac{\partial}{\partial u_{ij}} \sum_{l=1}^{M} a_l u_{il}
      - \left( \sum_{l=1}^{M} a_l u_{il} \right) \dfrac{\partial}{\partial u_{ij}} \sum_{l=1}^{M} u_{il}}
       {\left( \sum_{l=1}^{M} u_{il} \right)^2}.
\]
Note that
\[
\frac{\partial}{\partial u_{ij}} \sum_{l=1}^{M} a_l u_{il} = a_j
\qquad \text{and} \qquad
\frac{\partial}{\partial u_{ij}} \sum_{l=1}^{M} u_{il} = 1.
\]
Therefore,
\[
\frac{\partial \hat{y}_i}{\partial u_{ij}}
= \frac{a_j}{\sum_{l=1}^{M} u_{il}}
- \frac{\sum_{l=1}^{M} a_l u_{il}}{\left( \sum_{l=1}^{M} u_{il} \right)^2}.
\]
If we know $\partial L / \partial \hat{y}_i$ for all $i$, then $\partial L / \partial u_{ij}$ (for all $i, j$) is computed as
\[
\frac{\partial L}{\partial u_{ij}}
= \frac{\partial L}{\partial \hat{y}_i} \frac{\partial \hat{y}_i}{\partial u_{ij}}
= \frac{\partial L}{\partial \hat{y}_i}
  \left[ \frac{a_j}{\sum_{l=1}^{M} u_{il}}
       - \frac{\sum_{l=1}^{M} a_l u_{il}}{\left( \sum_{l=1}^{M} u_{il} \right)^2} \right].
\]
The derivative can be simplified further, but it is not necessary.
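A small finite-difference sketch to sanity-check the expression for $\partial \hat{y}_i / \partial u_{ij}$; the values of a and u are arbitrary and only for illustration:

import numpy as np

a = np.array([1.0, -2.0, 0.5, 3.0])   # arbitrary a_l
u = np.array([0.2, 0.7, 0.5, 0.9])    # one row u_{i,:} of sigmoid outputs
M = len(a)

def yhat_of_u(u):
    return np.sum(a * u) / np.sum(u)

# Analytical derivative from the expression above
usum = np.sum(u)
grad = a / usum - np.sum(a * u) / usum**2

# Centered finite differences
eps = 1e-6
grad_num = np.array([(yhat_of_u(u + eps * np.eye(M)[j]) -
                      yhat_of_u(u - eps * np.eye(M)[j])) / (2 * eps) for j in range(M)])
print(np.max(np.abs(grad - grad_num)))   # should be tiny (finite-difference error only)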
(e) Note that
\[
\frac{\partial u_{ij}}{\partial z_{ij}} = \frac{\exp(-z_{ij})}{(1 + \exp(-z_{ij}))^2}.
\]
Given that $\partial L / \partial u_{ij}$ is known, we compute the gradient $\partial L / \partial z_{ij}$ as
\[
\frac{\partial L}{\partial z_{ij}}
= \frac{\partial L}{\partial u_{ij}} \frac{\partial u_{ij}}{\partial z_{ij}}
= \frac{\partial L}{\partial u_{ij}} \frac{\exp(-z_{ij})}{(1 + \exp(-z_{ij}))^2}.
\]
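Continuing the numpy sketch, with Z and U from the forward pass and dloss_du holding $\partial L / \partial u_{ij}$ (computed as in part (h)); note the standard identity $e^{-z}/(1+e^{-z})^2 = u(1-u)$, which gives an equivalent form:

# du_{ij}/dz_{ij} for the sigmoid, written two equivalent ways
du_dz = np.exp(-Z) / (1 + np.exp(-Z))**2   # form used in the equations above
# du_dz = U * (1 - U)                      # equivalent via the identity above
dloss_dz = dloss_du * du_dz                # elementwise product, shape (N, M)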
(f) We first rewrite the sum in $z_{ij}$ using the index $\ell$, so that it is not confused with the index $k$:
\[
z_{ij} = \sum_{\ell=1}^{N_i} W_{j\ell} x_{i\ell} + b_j.
\]
Taking the partial derivatives,
\[
\frac{\partial z_{ij}}{\partial W_{jk}} = x_{ik}
\qquad \text{and} \qquad
\frac{\partial z_{ij}}{\partial b_j} = 1.
\]
Now, given $\partial L / \partial z_{ij}$, we can compute the gradient $\partial L / \partial W_{jk}$ using the multivariate chain
rule. (Note that $L$ depends on $W_{jk}$ through all of $z_{1j}, z_{2j}, \ldots, z_{Nj}$.)
\[
\frac{\partial L}{\partial W_{jk}}
= \sum_{i=1}^{N} \frac{\partial L}{\partial z_{ij}} \frac{\partial z_{ij}}{\partial W_{jk}}
= \sum_{i=1}^{N} \frac{\partial L}{\partial z_{ij}} x_{ik}.
\]
Similarly, $\partial L / \partial b_j$ is computed using the multivariate chain rule,
\[
\frac{\partial L}{\partial b_j}
= \sum_{i=1}^{N} \frac{\partial L}{\partial z_{ij}} \frac{\partial z_{ij}}{\partial b_j}
= \sum_{i=1}^{N} \frac{\partial L}{\partial z_{ij}}.
\]
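In code, with dloss_dz holding $\partial L / \partial z_{ij}$ (shape $N \times M$) and X the input matrix (shape $N \times N_i$), these two sums over $i$ reduce to a matrix product and a column sum (a sketch with assumed array names):

dloss_dW = dloss_dz.T @ X          # dL/dW_{jk} = sum_i (dL/dz_{ij}) x_{ik}, shape (M, N_i)
dloss_db = dloss_dz.sum(axis=0)    # dL/db_j    = sum_i  dL/dz_{ij},         shape (M,)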
(g) Putting all of this together, we get
\[
\frac{\partial L}{\partial W_{jk}}
= \sum_{i=1}^{N} -2(y_i - \hat{y}_i)
  \left[ \frac{a_j}{\sum_{l=1}^{M} u_{il}}
       - \frac{\sum_{l=1}^{M} a_l u_{il}}{\left( \sum_{l=1}^{M} u_{il} \right)^2} \right]
  \frac{\exp(-z_{ij})}{(1 + \exp(-z_{ij}))^2} \, x_{ik},
\tag{3}
\]
\[
\frac{\partial L}{\partial b_j}
= \sum_{i=1}^{N} -2(y_i - \hat{y}_i)
  \left[ \frac{a_j}{\sum_{l=1}^{M} u_{il}}
       - \frac{\sum_{l=1}^{M} a_l u_{il}}{\left( \sum_{l=1}^{M} u_{il} \right)^2} \right]
  \frac{\exp(-z_{ij})}{(1 + \exp(-z_{ij}))^2}.
\tag{4}
\]
(h) Assume u is an $N \times M$ matrix with entries $u_{ij}$ and dloss_dyhat is a length-$N$ vector
holding $\partial L / \partial \hat{y}_i$. Then, we can compute the gradients via Python broadcasting:
usum = np.sum(u, axis=1)                 # sum_l u_{il}, shape (N,)
uasum = np.sum(u * a[None, :], axis=1)   # sum_l a_l u_{il}, shape (N,)
dyhat_du = a[None, :] / usum[:, None] - (uasum / usum**2)[:, None]
dloss_du = dloss_dyhat[:, None] * dyhat_du
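Finally, a self-contained sketch that assembles the whole chain and checks one entry of $\partial L / \partial W_{jk}$ from equation (3) against a finite difference; the data and parameter values are random and purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
N, Ni, M = 4, 3, 5                      # number of samples, input size, hidden units
X = rng.normal(size=(N, Ni))
y = rng.normal(size=N)
W = rng.normal(size=(M, Ni))
b = rng.normal(size=M)
a = rng.normal(size=M)

def loss(W):
    Z = X @ W.T + b[None, :]
    U = 1.0 / (1.0 + np.exp(-Z))
    yhat = (U @ a) / U.sum(axis=1)
    return np.sum((yhat - y)**2)

def grad_W(W):
    Z = X @ W.T + b[None, :]
    U = 1.0 / (1.0 + np.exp(-Z))
    usum = U.sum(axis=1)
    yhat = (U @ a) / usum
    dloss_dyhat = -2 * (y - yhat)
    dyhat_du = a[None, :] / usum[:, None] - ((U @ a) / usum**2)[:, None]
    dloss_du = dloss_dyhat[:, None] * dyhat_du
    dloss_dz = dloss_du * U * (1 - U)   # sigmoid derivative u(1 - u)
    return dloss_dz.T @ X

# Compare one entry of the analytical gradient against a centered difference
j, k, eps = 1, 2, 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[j, k] += eps
Wm[j, k] -= eps
print(grad_W(W)[j, k], (loss(Wp) - loss(Wm)) / (2 * eps))   # the two values should agree closely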