Lecture 9: Dropout, Optimization, and CNNs
Announcements:
• HW #3 is due tonight. To submit your Jupyter Notebook, print the notebook to a pdf
with your solutions and plots filled in. You must also submit your .py files as pdfs.
• Midterm exam review session: Thursday, Feb 16, 6-9pm at WG Young CS50.
• All past exams are uploaded to Bruin Learn (under “Modules” → “past exams”).
This year, we will allow 4 cheat sheets (8.5 x 11” paper) that can be filled front and
back (8 sides total). The exam is otherwise closed book and closed notes.
Wednesday, Feb 15
Prof J.C. Kao, UCLA ECE
Dropout

With keep probability p = 0.5, a layer of 100 neurons keeps roughly 50 of them active on any given forward pass. Draw a mask m whose entries are Bernoulli random variables: m_i = 1 w.p. p and m_i = 0 otherwise.
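As a minimal sketch (assuming numpy; the layer size and p follow the example above, the other names are illustrative), the mask can be drawn as:

    import numpy as np

    p = 0.5                                        # keep probability
    h = np.random.randn(100)                       # placeholder activations for a 100-neuron layer
    m = (np.random.rand(100) < p).astype(h.dtype)  # Bernoulli mask: m_i = 1 w.p. p, else 0
    h_dropped = h * m                              # roughly half the units are zeroed out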
Dropout

The mask m ∈ {0, 1}^N, one entry per unit. With N units there are 2^N possible mask configurations, i.e., 2^N different sub-networks that share the same weights.
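Not from the notes, but to make the count concrete, a tiny sketch that enumerates every mask for N = 3 units:

    import itertools

    N = 3
    masks = list(itertools.product([0, 1], repeat=N))  # every binary mask over N units
    print(len(masks))                                   # 2**N = 8 sub-networks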
Dropout

Dropout in code. At training time, with keep probability p = 0.5:
  h = relu(Wx + b)        compute the layer's activations
  m ~ Bernoulli(p)        draw a fresh mask each forward pass
  h ← h ⊙ m               drop the masked units
At test time, no units are dropped; instead the activations are scaled:
  h_out = relu(Wx + b) · p
Over many training iterations, a unit h_i is present with probability p, so its expected contribution through a weight w_i is p · w_i h_i; multiplying the test-time activations by p matches this expectation.
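A minimal numpy sketch of this scheme (vanilla dropout with test-time scaling by p); the weights, input, and names are illustrative placeholders rather than the code from the slide:

    import numpy as np

    p = 0.5                                  # keep probability
    W = 0.01 * np.random.randn(100, 100)     # placeholder weights
    b = np.zeros(100)
    x = np.random.randn(100)                 # placeholder input

    def relu(z):
        return np.maximum(0, z)

    # Training time: drop units with a fresh Bernoulli mask on every forward pass.
    h = relu(W @ x + b)
    m = (np.random.rand(*h.shape) < p)       # m_i = 1 w.p. p, else 0
    h_train = h * m

    # Test time: keep every unit, but scale by p to match the expected training activation.
    h_test = relu(W @ x + b) * p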
Dropout
We call this approach the weight scaling inference rule. There is not yet any
theoretical argument for the accuracy of this approximate inference rule in
deep nonlinear networks, but empirically it performs very well.
In this class, instead of scaling the weights, we’ll scale the activations.
Thus, testing looks the same irrespective of whether we use dropout or not. See the code below:
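A minimal sketch of this activation-scaling variant (often called inverted dropout); the placeholder layer mirrors the earlier sketch, and the only change is dividing the mask by p at training time so the test-time pass needs no modification:

    import numpy as np

    p = 0.5                                  # keep probability
    W = 0.01 * np.random.randn(100, 100)     # placeholder weights
    b = np.zeros(100)
    x = np.random.randn(100)                 # placeholder input

    def relu(z):
        return np.maximum(0, z)

    # Training time: drop units AND divide the mask by p ("inverted dropout"),
    # so the expected activation already matches test time.
    h = relu(W @ x + b)
    m = (np.random.rand(*h.shape) < p) / p
    h_train = h * m

    # Test time: identical to a network trained without dropout.
    h_test = relu(W @ x + b)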
But what about the optimizer, stochastic gradient descent? Can we improve
this for deep learning?
In this lecture, we’ll talk about specific techniques in optimization that aid in
training neural networks.
Reading:
In this lecture, we talk about how to make optimization more efficient and
effective.
- Parameters: θ
- Gradient descent update: θ ← θ − ε ∇_θ J(θ)
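A minimal numpy sketch of this update; the quadratic loss and its gradient are illustrative placeholders, not from the notes:

    import numpy as np

    def grad_J(theta):
        # placeholder gradient of the simple quadratic loss J(theta) = 0.5 * ||theta||^2
        return theta

    theta = np.array([1.0, -2.0])             # initial parameters
    eps = 0.1                                 # learning rate
    for _ in range(100):
        theta = theta - eps * grad_J(theta)   # theta <- theta - eps * grad_theta J(theta)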
Figure: Beale's function, a standard test function for comparing optimizers.
Classical momentum. The update is
  v ← α v − ε ∇_θ J(θ)
  θ ← θ + v
Unrolling the velocity update with v_0 = 0 and g_t denoting the gradient at step t:
  v_1 = −ε g_1
  v_2 = α v_1 − ε g_2 = −ε (α g_1 + g_2)
  v_3 = α v_2 − ε g_3 = −ε (α² g_1 + α g_2 + g_3)
  v_4 = α v_3 − ε g_4 = −ε (α³ g_1 + α² g_2 + α g_3 + g_4)
The velocity is an exponentially weighted sum of past gradients: directions the gradients agree on accumulate, while oscillating components tend to cancel.
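A minimal sketch of classical momentum with the same placeholder gradient as above; the values of α and ε are illustrative:

    import numpy as np

    def grad_J(theta):
        return theta                          # placeholder gradient of J(theta) = 0.5 * ||theta||^2

    theta = np.array([1.0, -2.0])
    v = np.zeros_like(theta)                  # velocity, v_0 = 0
    alpha, eps = 0.9, 0.1                     # momentum coefficient and learning rate
    for _ in range(100):
        v = alpha * v - eps * grad_J(theta)   # v <- alpha * v - eps * grad
        theta = theta + v                     # theta <- theta + v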
Annealing.
A single global learning rate (even with annealing) is a compromise when the parameters θ ∈ ℝ^M have gradients of very different magnitudes, e.g., ∂J/∂w₁ vs. ∂J/∂w₂ on an elongated loss surface. One remedy is to keep a running average of the squared gradient for each parameter:
  a ← ρ a + (1 − ρ) g ⊙ g
where ρ is typically 0.9, 0.99, or 0.999; for ρ = 0.99 this is a ← 0.99 a + 0.01 g ⊙ g.
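Assuming the accumulator above is used in an RMSProp-style update (the square root and the small constant δ below are the standard RMSProp choices, not recovered from the notes), a minimal sketch:

    import numpy as np

    def grad_J(theta):
        return theta                          # placeholder gradient

    theta = np.array([1.0, -2.0])
    a = np.zeros_like(theta)                  # running average of squared gradients
    rho, eps, delta = 0.99, 0.01, 1e-8        # decay rate, learning rate, stabilizer
    for _ in range(100):
        g = grad_J(theta)
        a = rho * a + (1 - rho) * g * g                  # a <- rho*a + (1 - rho)*g^2
        theta = theta - eps * g / (np.sqrt(a) + delta)   # per-parameter step size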
+
1st
t- A
510) J10t) (0
=
+
-
0t) DoJ10t)
Pt 0t
29
=
-
+ 1
3(8t 1) 3(0t)
+
=
(87
+
-
5g
-
8z)bt)
=
J(8t) -
3gig ↓
mo
>0
~
~
·
J(O)
-
large step
↑V
&
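A small numeric check of this argument on a 1-D quadratic (illustrative, not from the notes): a modest step decreases J, while an overly large step overshoots the minimum and increases it.

    def J(theta):
        return 0.5 * theta ** 2              # simple 1-D quadratic loss

    def grad_J(theta):
        return theta

    theta = 2.0
    g = grad_J(theta)
    for eps in (0.1, 2.5):                   # modest step vs. overly large step
        theta_next = theta - eps * g
        print(eps, J(theta), J(theta_next))  # J decreases for eps = 0.1, increases for eps = 2.5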