💡
lecture 3
important - Tanishq, on Kaggle datasets
https://forums.fast.ai/t/lesson-3-official-topic/96254/9
1.0-Tips
learning tips by Tanishq
learning tips by me
asking questions about the lecture and what you learned:
how can I use it?
what did I actually learn?
what can I do with that information?
will help you learn how to think!
2.0-Resources
Important resources
https://forums.fast.ai/t/lesson-3-notebooks/104821
https://forums.fast.ai/t/lesson-3-official-topic/96254
setting up fastai on a Mac
https://forums.fast.ai/t/best-practices-for-setting-up-fastai-on-mackbook-pro-with-m1-max-chip/98961/4
https://forums.fast.ai/t/best-practices-for-setting-up-fastai-on-mackbook-pro-with-m1-max-chip/98961/3
math resources
derivative - really important to understand the concept:
https://www.khanacademy.org/math/differential-calculus/dc-diff-intro
3.0 - Missions
missions list
4.0- Book
chapter 4 - first iteration
showing what's inside the path, the actual folder
Everything in a computer is a number
so the idea - let's just look at a specific section of the image.
everything is a number, so we view the number representation of that
section of the image
so we look at a section starting 4 pixels from the top and left - we view
it as a numpy array, which is a numbered representation of the image
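A sketch of that numbered view, using a synthetic 28×28 array here instead of the book's actual MNIST image (the pixel values are made up):

```python
import numpy as np

# Stand-in 28x28 grayscale "image" (the book loads a real MNIST '3');
# every pixel is just a number from 0 (background) to 255 (ink)
img = np.zeros((28, 28), dtype=np.uint8)
img[4:24, 10:18] = 200  # paint a crude vertical bar of "ink"

# look at a section starting 4 pixels from the top and left,
# viewed as a plain numbered array
section = img[4:10, 4:10]
print(section.shape)  # (6, 6)
```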
gradient
pixels
so the image has a total of 784 pixels (28×28)
Tensor
dimensions
a stack - think of each matrix as "flat", and they stack on
each other
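A minimal sketch of that stacking idea (NumPy's `np.stack` here; PyTorch's `torch.stack` behaves the same way):

```python
import numpy as np

# two "flat" 2x3 matrices
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[7, 8, 9],
              [10, 11, 12]])

# stacking the flat matrices on top of each other adds a dimension:
# the result is a rank-3 tensor of shape (2, 2, 3)
stacked = np.stack([a, b])
print(stacked.shape)  # (2, 2, 3)
```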
Metric
Overfitting
Stochastic gradient descent
testing the effectiveness of any current weight assignment in terms of
performance, and providing a mechanism for altering the weight assignment
to improve performance - and doing that automatically, so a machine would
learn from its experience
weights and pixels
Derivative
Let's assume I have a loss function, ok, which depends upon a parameter
so our loss function is the quadratic function, yes?
then, because it's our loss function, we try some arbitrary input to it, and
see what the result is - what the loss value is
now, we would like to adjust the parameter by a little bit, and see what
happens
so it's the same as the slope - you change the input a little and watch
how the output changes
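The adjust-a-little idea can be sketched numerically, assuming a toy quadratic loss that is minimised at w = 3:

```python
def loss(w):
    # a toy quadratic loss, minimised at w = 3
    return (w - 3) ** 2

w = 1.0
eps = 1e-6  # "a little bit"

# adjust the parameter a little and see what happens to the loss;
# the ratio of the two changes is the slope (the derivative) at w
slope = (loss(w + eps) - loss(w)) / eps
print(round(slope, 3))  # ≈ -4, i.e. increasing w decreases the loss
```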
book - basic idea
no matter how complex the neural functions become, the basic idea is still
simple
This basic idea goes all the way back to Isaac Newton, who pointed out that
we can optimize arbitrary functions in this way. Regardless of how
complicated our functions become, this basic approach of gradient descent
will not significantly change. The only minor changes we will see later in
this book are some handy ways we can make it faster, by finding better steps.
Using calculus
just allows us to calculate quickly how a change in the parameters affects
the loss value
This is the derivative
how much a change in its parameter will change the result of the original
function
the derivative calculates the change, rather than the value
For instance, the derivative of the quadratic function at the value 3 tells
us how rapidly the function changes at the value 3
You may remember from your high school calculus class that the derivative
of a function tells you how much a change in its parameters will change
its result.
this is calculus - if we know how the function (the loss function) will
change, we know what we need to do to the parameters to make the loss
value smaller
we calculate based on every weight
so just calculate the derivative for one weight (parameter), and treat all
other parameters as constants, as not changed
then do the same for all the rest
calculating gradients
calculating the gradients for the parameter values at a single point in
time
basically just calculating all the current parameters' gradients
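A sketch of that one-weight-at-a-time idea using finite differences; the toy loss and its two data points are made up for illustration:

```python
def loss(params):
    # toy loss: squared error of a*x^2 + b*x + c against two data points
    a, b, c = params
    err = 0.0
    for x, y in [(1.0, 3.0), (2.0, 9.0)]:
        pred = a * x ** 2 + b * x + c
        err += (pred - y) ** 2
    return err / 2

def gradient(loss_fn, params, eps=1e-6):
    """Nudge ONE parameter at a time, holding all the others constant."""
    base = loss_fn(params)
    grads = []
    for i in range(len(params)):
        nudged = list(params)
        nudged[i] += eps          # change only parameter i
        grads.append((loss_fn(nudged) - base) / eps)
    return grads

g = gradient(loss, [1.0, 1.0, 1.0])
print([round(v, 3) for v in g])  # roughly [-8.0, -4.0, -2.0]
```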
Learning rate
we get the gradients - they are the slope of the function; if the slope is
very large, there is more adjustment to make
Distinguish
inputs - the inputs of the function
output - the relationships
parameters - the parameters of your model.
they basically
define your function
the parameters are your function's signature
the input is always changing
important
Loss function
The process
1. start with a function that is your model, with given parameters
2. initialize the parameters
3. calculate the predictions of your function for your inputs with the current
parameters
a. then you will have the predictions for your inputs (the validation set),
with the current parameters
b. of course, you could see how far off you are
4. then see how far your predictions are from your actuals using the loss
function
a. calculate the loss
5. calculate the gradients to improve the loss
6. step the parameters (change them based on the learning rate)
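The steps above can be sketched with PyTorch autograd; the synthetic data and the "true" coefficients [3, 2, 1] are assumptions for the demo:

```python
import torch

torch.manual_seed(42)

# synthetic "actuals": points from a known quadratic, plus a little noise
x = torch.linspace(-2, 2, 20)
y = 3 * x ** 2 + 2 * x + 1 + torch.randn(20) * 0.1

# 1-2. the model is a quadratic; initialize its parameters
params = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)

def predict(x, p):
    return p[0] * x ** 2 + p[1] * x + p[2]

lr = 0.1  # learning rate
for _ in range(500):
    preds = predict(x, params)          # 3. predictions with current params
    loss = ((preds - y) ** 2).mean()    # 4. how far off we are (MSE)
    loss.backward()                     # 5. gradients
    with torch.no_grad():
        params -= lr * params.grad      # 6. step by learning rate * gradient
        params.grad.zero_()

print(params.detach())  # close to the true coefficients [3, 2, 1]
```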
The main idea
we started with a synthetic function, and plotted the data points
these are our actuals; we have the "original" function that we use to
create the data points
we input the data points to the function and plot the output
then we create a function - in real life this will be the neural network,
but for now let's assume it's a quadratic
then we calculate the predictions based on the function we created (our
model), plot the predictions, and see if they are close to our actuals -
if the shape of the predictions looks close to our actuals (patterns),
then we improved the loss
the goal is to find the best possible quadratic
so that when we insert the input to it
the output will actually resemble the output of the actuals
you do that on the validation set so you actually have the
output of the actuals, and you can plot it
and then, for the model, each time you can plot the
predictions and see if the predictions' pattern / shape is similar to
the actuals' shape
and you improve it each time, so at the end you will have a
similar function shape for the predictions as for the actuals
summary
at first the output from our input won't have anything to do with what we want
using the labelled data
the model gives us outputs
we compare them with the targets (we have the labels for those inputs)
we do that using the loss function (MSE); this is how we compare,
using the predictions vs actuals
then we calculate the loss value - how wrong our predictions were
compared to the actuals
then we improve the model by changing the weights
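A minimal sketch of that MSE comparison, with made-up predictions and targets:

```python
# minimal MSE: compare the model's outputs with the labelled targets
preds   = [2.5, 0.0, 2.0]
targets = [3.0, -0.5, 2.0]

mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
print(mse)  # (0.25 + 0.25 + 0.0) / 3
```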
5.0- Lecture
tips on how to learn
Run notebooks and experiment:
run every code cell from the book - experimenting with different
inputs/outputs
reproduce results - use the "clean" versions of the notebooks
what is it for - what is it going to output, if anything
repeat with a different dataset
lecture 3 - first iteration
main concepts
training piece → then you get the model
feed the model inputs and it spits out outputs based on the model
error rate - accuracy
Questions
what does the "inference" time (in seconds) mean when training a model?
Model.pkl
fit a function to data
Derivative
if you increase the input, does the output increase/decrease - basically,
the slope of the function
Tensor
works with a list, with anything basically.
Optimization
gradient descent - calculate the gradient (of the parameters), and
decrease the loss
This is deep learning - deep learning is a
metaphor for life → do one iteration, and
improve over time.
just do this
then optimize
values adding together
gradient descent to optimize the parameters
and samples of the inputs and outputs you want
the computer draws the owl
using gradient descent to set some parameters, to make a wiggly
function, which is an addition of vectors?
model choosing is the last thing
once you gather, clean, and augment the data
you can reason about a model - it depends on the task
do you need it to be the most accurate? the fastest? etc.
train the model first!
fit a function to data
we start with a flexible function (a neural network), and we get it to do
a particular thing - recognize the pattern in our data
so the idea is to find a function that fits our data? so a neural network
is just a function
loss function - a measure of how good the function is
for each parameter - if we move it up, does it make it better? or
not?
Derivative - if you increase the input
say the slope is very high at first, so the derivative will be large
as the slope decreases, the derivative approaches zero
so the derivative basically tells us, for each value, whether the slope
at that value is high or not
slope === gradient
how rapidly the function changes at the value 3 - i.e., at this
parameter's value, is the function changing fast?
Our goal - is to make the loss small, we want to make the loss
smaller
if we know how our function will change -
we have a parameter, and the function tells us how rapidly the
function changes in this parameter
our goal is to make the loss smaller
so we will change the parameter a bit
and see if it makes our loss function output better
the magnitude tells us that at this point the slope is fairly steep, so
the loss changes significantly when we adjust w - each time we
adjust w, the output changes significantly
so let's subtract some value * the slope, and see what happens
why use the slope?
because the slope tells us how much of a step we can take:
where the slope is big, the function changes very rapidly, so we
might take a smaller step to avoid overshooting
where the slope is gentler, each adjustment changes the loss only
a little, so we can take a bigger step
lecture 3 - first iteration
What we did so far
the goal - we just built a detector / classifier
training piece and model.pkl
model → you feed it input and it spits out output
Parameters
where do they come from, and how do they help us
machine learning models → fit functions to data
start with a flexible function, and get it to recognize the patterns in the
data we give it
start with a function, yes?
then the goal - is to find the most appropriate function for our data -
so we will test different functions, to try to find the best one
so here - we have the ability to create any quadratic function, and we
will look for the best one for our data
So the idea:
we try to map the data to the shape of a function - and try to make the
function describe the data as well as possible, with as little noise as
possible
the goal - find the best function that matches the data we have!
the steps
1. plot your data
2. plot the starting function
3. try to change the parameters, and see if the function describes (or fits)
your data better
a. change the parameters: you can increase or decrease them and see what
improves our function
increase the parameter
decrease the parameter
then reiterate over all the parameters again, until you find the best
parameter values
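A sketch of step 3, with a single made-up parameter m and actuals drawn from y = 2x:

```python
def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

xs      = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]   # actuals drawn from y = 2x

def predict(m):
    return [m * x for x in xs]

m, step = 1.0, 0.1
base = mse(predict(m), targets)
up   = mse(predict(m + step), targets)   # increase the parameter
down = mse(predict(m - step), targets)   # decrease the parameter

# keep whichever direction improved (reduced) the loss
if up < base:
    m += step
elif down < base:
    m -= step
print(m)  # 1.1 — increasing m moved us toward the true slope of 2
```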
The question
if I move the parameter, does the model improve or get worse?
then we need to have a measurement of how good the model is - to
know the effect of the parameters on the model
this is called a loss function - the value it outputs is the loss value
- how good the model is.
so the goal - try to find the parameters that reduce the loss value - that
improve the model
The derivative:
if I increase the parameter - does the loss get better or worse?
the loss - we want the smallest loss possible.
the derivative - if you increase the input, does the output increase or
decrease, and by how much (the slope/gradient)
so this is the idea:
we want the smallest loss
we take the parameters, insert them into the loss function, and get the
loss value for these parameters - the measurement for how good
our model is FOR THESE PARAMETERS
then we want to adjust the parameters, so we will check:
if I increase parameter "a" → the loss is a function, so if I
increase parameter "a", does the loss value improve or not?
we adjust the parameters - does it make the loss value go down or
up?
but the question is, how is the parameter connected to the slope
of the function?
so the biggest question:
how do the parameters work with the loss function?
like, we have a million parameters -
how is the loss function connected to them?
ok, I understand that if the slope for this single parameter is negative,
then we will need to increase it to go toward the minimum point
but if f'(x) is the derivative of f(x), which is the loss function,
do we put the parameter into the derivative of the loss function?
why?
loss function - depends upon the parameters
Clarifying questions
parameters vs values
conclusion - most important
loss function - measures the errors in the outputs!
What is the function
what is the function you find parameters for?
this is the main idea - values getting added together, and
gradient descent optimizes the parameters, with samples of the inputs
and outputs you want: 49:00
Why a small increase
for each increase in the respective parameter by one unit, what
will happen to the loss?
everything is about fitting functions to data
each time we have a parameter, we put it in the function (in our
model)
the parameter is the x value
then, we measure its slope
then we change the x value
so the x value will be left or right of the blue circle
so the parameters are just the inputs, and we change them,
but the function is the same, and each time we measure
the slope at the parameter
Linear algebra
matrix multiplication
Dependent variable
the thing we try to predict - is it a bird? did he survive? etc.
gradient vs derivative
each time we increase the parameter,
it depends on how steep the slope is - if the slope is very steep, meaning
that if we change this parameter the loss value increases/decreases a lot -
we need to change the parameter a lot
think about the function f(x)=x^2
where it is steep, to get to the minimum point you will need to change x
a lot
but if you are right next to x=0, where the minimum value is, the slope
there is very small, so you need to change x very little when you are
very close to x=0
so at x=-50, you will change x a lot
but at x=0.7, the slope will be very small, so you change x a little
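That x² intuition can be sketched directly; the learning rate of 0.1 and the two starting points are arbitrary choices:

```python
def f_prime(x):
    return 2 * x  # derivative (slope) of f(x) = x^2

lr = 0.1  # learning rate
results = {}
for x0 in (-50.0, 0.7):
    x = x0
    for _ in range(50):
        x -= lr * f_prime(x)  # far from 0 the slope is steep -> big steps
    results[x0] = x           # near 0 the slope is tiny -> tiny steps

print(results)  # both starting points end up very close to x = 0
```

Because the step is the slope times the learning rate, the steps shrink automatically as x approaches the minimum.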
gradient descent
calculate the gradient (derivative)
and do a descent - decrease the loss
partial in python
the idea - we take a general function, and create specialized versions by
fixing some parameters
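A minimal sketch with Python's `functools.partial`, fixing assumed coefficients 3, 2, 1:

```python
from functools import partial

def quadratic(a, b, c, x):
    return a * x ** 2 + b * x + c

# fix the parameters a, b, c to get a specialized function of x alone
f = partial(quadratic, 3, 2, 1)   # f(x) = 3x^2 + 2x + 1
print(f(2))  # 3*4 + 2*2 + 1 = 17
```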
the Hebrew term for gradient descent
lecture 3 - second iteration
Matrix multiplication and rectified linear
what is the function that we find parameters for?
the relationship between a parameter and whether a pixel is part of a
basset hound being quadratic is highly unlikely
we try to improve parameters for a function, right?
so we have a function we try to improve parameters for.
the function is a neural net
the parameters define the function
so it's unlikely that the result will be a quadratic
function
rectified linear function
an infinitely flexible function
the idea - if we add more than one rectified linear function together, we
can get to any function we please.
adjusting parameters
what you did when plotting, changing the "m",
is changing the actual parameters
and that infinitely flexible function can create anything!
rectified linear
we create the flexible function just using addition of rectified linear
functions
adding - creates the bump, the downward and upward parts
then we can add as many as we want - you could match any function,
and use gradient descent for the parameters
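A sketch of adding rectified lines; the helper names and the particular slopes/intercepts are made up for illustration:

```python
def relu(x):
    return max(0.0, x)

def rectified_linear(m, b, x):
    # one building block: a line m*x + b, clipped at zero
    return relu(m * x + b)

def double_relu(m1, b1, m2, b2, x):
    # adding two rectified lines already gives a bendier shape;
    # add enough of them and you can approximate any function
    return rectified_linear(m1, b1, x) + rectified_linear(m2, b2, x)

print(double_relu(1, 0, -1, 1, -2.0))  # 0.0 + 3.0 = 3.0
```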
Slope
will decrease as you approach the minimum value
that's why we need to have the learning rate
Matrix multiplication
why - we need to do a lot of mx + b, and add them up
we will do a lot of them; the m values are the parameters, right? we might
have a million parameters
and we will have a lot of variables - and we multiply all of the variables,
for example, each pixel of an image, times a coefficient, and add them
together
then we do it for each parameter
so the coefficients will be the parameters, and then we will have the
actual inputs
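A sketch of why the matrix multiply replaces the loop of mx + b; the pixel values and coefficients are made up:

```python
import numpy as np

# 3 inputs of 4 "pixels" each; one coefficient per pixel, plus a bias.
# One matrix multiply does every pixel * coefficient product and all
# the adding-up at once, instead of looping over mx + b
x = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.],
              [9., 0., 1., 2.]])
w = np.array([0.1, 0.2, 0.3, 0.4])  # the coefficients (parameters)
b = 0.5                             # the bias

out = x @ w + b
print(out)  # [3.5 7.5 2.5]
```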
6.0-Blog
what we did
Part 1
understand how to use different architectures for your models, to get the
best architecture
Part 2
Learn about how a neural net actually works
Main ideas