6.036 Cheat Sheet

The document discusses regression and gradient descent algorithms. It covers topics like training error, test error, step size, iterations, and direct solutions versus gradient descent. It also mentions feature scaling, encoding techniques like one-hot and bag-of-words, and activation functions like sigmoid and tanh that are used in neural networks.


REGRESSION

J(θ) = (1/n) ||Xθ - y||^2 ; direct sol'n: θ* = (X^T X)^(-1) X^T y
J_ridge(θ, θ_0) = (1/n) Σ_i (θ^T x^(i) + θ_0 - y^(i))^2 + λ||θ||^2
Ridge sol'n: θ_ridge = (X^T X + nλI)^(-1) X^T y
Training error: E_n(h) = (1/n) Σ_{i=1..n} L(h(x^(i)), y^(i))
Test error: E(h) = (1/n') Σ L(h(x^(i)), y^(i)) over n' points not used in training
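As a quick check of the formulas above, here is a minimal numpy sketch (variable names are mine, and θ_0 is handled by appending a column of ones, which the sheet leaves implicit):

```python
import numpy as np

def ridge_fit(X, y, lam):
    # theta_ridge = (X^T X + n*lam*I)^(-1) X^T y ; lam = 0 recovers the OLS solution
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

def mean_squared_error(X, y, theta):
    # training/test error with squared loss: (1/n) * sum_i (theta^T x^(i) - y^(i))^2
    return np.mean((X @ theta - y) ** 2)

# tiny usage example: the last column of ones plays the role of theta_0
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 3.0, 5.0])
theta = ridge_fit(X, y, lam=0.01)
print(theta, mean_squared_error(X, y, theta))
```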

GRADIENT DESCENT

f needs to be convex for GD to converge to the global min; if f is non-convex, it converges to a local minimum.
Update rule (e.g. for least squares): θ^(t) = θ^(t-1) - η ∇f(θ^(t-1)), where η is the step size (it can be a function of the iteration t).
Stochastic GD: the point i is picked randomly at each step, to save runtime and memory.
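A sketch of the two kinds of update step for the least-squares objective, under the assumptions above (constant η; function names are mine):

```python
import numpy as np

def gd_step(theta, X, y, eta):
    # batch step for J(theta) = (1/n) * ||X theta - y||^2
    n = X.shape[0]
    return theta - eta * (2.0 / n) * X.T @ (X @ theta - y)

def sgd_step(theta, X, y, eta, rng):
    # stochastic step: the point i is picked randomly, so only one row is touched
    i = rng.integers(X.shape[0])
    return theta - eta * 2.0 * (X[i] @ theta - y[i]) * X[i]
```

Touching a single randomly chosen row per step is where the runtime and memory saving comes from.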
Negative log-likelihood loss: L_nll(g, y) = -(y log g + (1 - y) log(1 - g)), with guess g = σ(θ^T x + θ_0).
To optimize: ∇_θ J = (1/n) Σ_i (g^(i) - y^(i)) x^(i) and ∂J/∂θ_0 = (1/n) Σ_i (g^(i) - y^(i)).
Stop after T iterations, or stop when |J(θ^(t)) - J(θ^(t-1))| < ε.
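A sketch of the full loop for the NLL objective, using the gradients and stopping rule above (the values of η, ε, and T are illustrative, not from the sheet):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(X, y, theta, theta0):
    g = sigmoid(X @ theta + theta0)
    return -np.mean(y * np.log(g) + (1 - y) * np.log(1 - g))

def gd_nll(X, y, eta=0.1, eps=1e-6, T=10000):
    n, d = X.shape
    theta, theta0 = np.zeros(d), 0.0
    prev = nll(X, y, theta, theta0)
    for _ in range(T):                             # stop after T iterations ...
        g = sigmoid(X @ theta + theta0)
        theta = theta - eta * (X.T @ (g - y)) / n  # gradient w.r.t. theta
        theta0 = theta0 - eta * np.mean(g - y)     # gradient w.r.t. theta_0
        cur = nll(X, y, theta, theta0)
        if abs(prev - cur) < eps:                  # ... or when |J(t) - J(t-1)| < eps
            break
        prev = cur
    return theta, theta0
```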
Direct sol'n vs. GD:
Advantages of GD: no more matrix inversion (see lab).
Trade-off: have to look at n data points at every step.
Maybe when d is low we should still use the direct sol'n; the hypothesis can be linear but the input size can be huge.

Hypothesis is sigmoid: g(x) = σ(θ^T x + θ_0), where σ(z) = 1/(1 + e^(-z)).
If z → ∞, σ(z) → 1; if z → -∞, σ(z) → 0, so the output lies in (0, 1).
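The saturation behavior can be checked directly; this tiny sketch just evaluates σ at a few pre-activations (the sample values are mine):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)); the output always lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-50.0, 0.0, 50.0])))  # approximately [0.0, 0.5, 1.0]
```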
FEATURES: scale features so that their values are close together.

Thermometer code: e.g. 11000, 11100, ... — communicates order but not spacing.
One-hot encoding: e.g. 10000, 01000, ... — communicates discrete values; use when there's no order.

Transforming numeric values: standardizing: (x - x̄) / std(x).
With text data, use bag-of-words: 0/1 everywhere (one indicator per word).
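A sketch of the encodings above for a feature with k discrete levels, plus the standardizing and bag-of-words transforms (helper names are mine; this is the binary 0/1 bag-of-words the sheet describes):

```python
import numpy as np

def one_hot(value, k):
    # discrete values, no order: value=2, k=5 -> [0, 1, 0, 0, 0]
    v = np.zeros(k)
    v[value - 1] = 1.0
    return v

def thermometer(value, k):
    # communicates order but not spacing: value=2, k=5 -> [1, 1, 0, 0, 0]
    v = np.zeros(k)
    v[:value] = 1.0
    return v

def standardize(x):
    # (x - mean) / standard deviation, per numeric feature
    return (x - x.mean()) / x.std()

def bag_of_words(text, vocab):
    # 0/1 indicator for each vocabulary word
    words = set(text.lower().split())
    return np.array([1.0 if w in words else 0.0 for w in vocab])
```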

ACTIVATION FUNCTIONS
3. ReLU function: f(z) = max(0, z), output in [0, ∞)
4. Sigmoid function: σ(z) = 1/(1 + e^(-z)), output in (0, 1)
5. Hyperbolic tangent: tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)), output in (-1, 1)
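The three activations above as short functions, with their output ranges in comments (a minimal sketch; numpy already provides tanh):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)         # output in [0, inf)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # output in (-1, 1)
```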
Type of problem             Activation function (output layer)   Loss function
Binary classification       sigmoid                              NLL
Multiclass classification   softmax                              NLL (multiclass)

Input x: size m×1. Output A: size n×1.
The i-th unit: a_i = f(w_i^T x + w_0i); all together: A = f(W^T x + W_0).
A becomes the input for layer 2.
(Layer diagram) The input x feeds layer 1; each unit j in a layer computes a pre-activation z_j from the previous layer's activations, applies the activation function to get a_j, and the output layer's activations y_j form the final prediction.
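A sketch of a forward pass matching the layer equations above, with a softmax output layer and the multiclass NLL loss from the table (layer sizes, weights, and names are made up for illustration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())            # shift for numerical stability
    return e / e.sum()

def layer(A_prev, W, W0, f):
    # z = W^T A_prev + W0 ; A = f(z) ; A becomes the input to the next layer
    return f(W.T @ A_prev + W0)

rng = np.random.default_rng(0)
x = rng.normal(size=3)                              # input, size m x 1 (m = 3 here)
W1, W01 = rng.normal(size=(3, 4)), np.zeros(4)      # layer 1 weights
W2, W02 = rng.normal(size=(4, 2)), np.zeros(2)      # output layer weights

A1 = layer(x, W1, W01, np.tanh)                     # hidden layer activations
y_hat = layer(A1, W2, W02, softmax)                 # final prediction

y = np.array([0.0, 1.0])                            # one-hot label
loss = -np.sum(y * np.log(y_hat))                   # multiclass NLL
print(y_hat, loss)
```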
