💡
lecture 3
important - Tanishq, on Kaggle datasets
https://forums.fast.ai/t/lesson-3-official-topic/96254/9
1.0-Tips
learning tips by Tanishq
learning tips by me
asking questions about the lecture and what you learned:
how can I use it?
what did I actually learn?
what can I do with that information?
will help you learn how to think!
2.0-Resources
Important resources
https://forums.fast.ai/t/lesson-3-notebooks/104821
https://forums.fast.ai/t/lesson-3-official-topic/96254
setting up fastai on a Mac
https://forums.fast.ai/t/best-practices-for-setting-up-fastai-on-mackbook-pro-with-m1-max-chip/98961/4
https://forums.fast.ai/t/best-practices-for-setting-up-fastai-on-mackbook-pro-with-m1-max-chip/98961/3
math resources
derivative - really important to understand the concept:
https://www.khanacademy.org/math/differential-calculus/dc-diff-intro
3.0 - Missions
missions list
4.0- Book
chapter 4 - first iteration
showing what's inside the path, the actual folder
Everything in a computer is a number
so the idea - let's just look at a specific section of the image.
everything is a number, so we view the number representation of that
section of the image
so we look at a section starting 4 pixels from the top and left - we view
it as a numpy array, which is a numbered representation of the image
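A sketch of that numbered view, using a synthetic 28×28 array here instead of the book's actual MNIST image (the pixel values are made up):

```python
import numpy as np

# Stand-in 28x28 grayscale "image" (the book loads a real MNIST '3');
# every pixel is just a number from 0 (background) to 255 (ink)
img = np.zeros((28, 28), dtype=np.uint8)
img[4:24, 10:18] = 200  # paint a crude vertical bar of "ink"

# look at a section starting 4 pixels from the top and left,
# viewed as a plain numbered array
section = img[4:10, 4:10]
print(section.shape)  # (6, 6)
```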
gradient
pixels
so the image has a total of 784 pixels (28×28)
Tensor
dimensions
a stack - think of each matrix as "flat", and they stack on
each other
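A minimal sketch of that stacking idea (NumPy's `np.stack` here; PyTorch's `torch.stack` behaves the same way):

```python
import numpy as np

# two "flat" 2x3 matrices
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[7, 8, 9],
              [10, 11, 12]])

# stacking the flat matrices on top of each other adds a dimension:
# the result is a rank-3 tensor of shape (2, 2, 3)
stacked = np.stack([a, b])
print(stacked.shape)  # (2, 2, 3)
```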
Metric
Overfitting
Stochastic gradient descent
testing the effectiveness of any current weight assignment in terms of
performance, and providing a mechanism for altering the weight assignment
to improve performance - and doing that automatically, so a machine would
learn from its experience
weights and pixels
Derivative
Let's assume I have a loss function, ok, which depends upon a parameter
so our loss function is the quadratic function, yes?
then, because it's our loss function, we try some arbitrary input to it, and
see what the result is - what the loss value is
now, we would like to adjust the parameter by a little bit, and see what
happens
so it's the same as the slope - you change the input a little and watch
how the output changes
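The adjust-a-little idea can be sketched numerically, assuming a toy quadratic loss that is minimised at w = 3:

```python
def loss(w):
    # a toy quadratic loss, minimised at w = 3
    return (w - 3) ** 2

w = 1.0
eps = 1e-6  # "a little bit"

# adjust the parameter a little and see what happens to the loss;
# the ratio of the two changes is the slope (the derivative) at w
slope = (loss(w + eps) - loss(w)) / eps
print(round(slope, 3))  # ≈ -4, i.e. increasing w decreases the loss
```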
book - basic idea
no matter how complex the neural functions become, the basic idea is still
simple
This basic idea goes all the way back to Isaac Newton, who pointed out that
we can optimize arbitrary functions in this way. Regardless of how
complicated our functions become, this basic approach of gradient descent
will not significantly change. The only minor changes we will see later in
this book are some handy ways we can make it faster, by finding better steps.
Using calculus
just allows us to calculate quickly how a change in the parameters affects
the loss value
This is the derivative
how much a change in its parameter will change the result of the original
function
the derivative calculates the change, rather than the value
For instance, the derivative of the quadratic function at the value 3 tells
us how rapidly the function changes at the value 3
You may remember from your high school calculus class that the derivative
of a function tells you how much a change in its parameters will change
its result.
this is calculus - if we know how the function (the loss function) will
change, we know what we need to do to the parameters to make the loss
value smaller
we calculate based on every weight
so just calculate the derivative for one weight (parameter), and treat all
other parameters as constants, as not changed
then do the same for all the rest
calculating gradients
calculating the gradients for the parameter values at a single point in
time
basically just calculating all the current parameters' gradients
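A sketch of that one-weight-at-a-time idea using finite differences; the toy loss and its two data points are made up for illustration:

```python
def loss(params):
    # toy loss: squared error of a*x^2 + b*x + c against two data points
    a, b, c = params
    err = 0.0
    for x, y in [(1.0, 3.0), (2.0, 9.0)]:
        pred = a * x ** 2 + b * x + c
        err += (pred - y) ** 2
    return err / 2

def gradient(loss_fn, params, eps=1e-6):
    """Nudge ONE parameter at a time, holding all the others constant."""
    base = loss_fn(params)
    grads = []
    for i in range(len(params)):
        nudged = list(params)
        nudged[i] += eps          # change only parameter i
        grads.append((loss_fn(nudged) - base) / eps)
    return grads

g = gradient(loss, [1.0, 1.0, 1.0])
print([round(v, 3) for v in g])  # roughly [-8.0, -4.0, -2.0]
```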
Learning rate
we get the gradients - they are the slope of the function; if the slope is
very large, there is more adjustment to make
Distinguish
inputs - the inputs of the function
output - the relationships
parameters - the parameters of your model.
they basically
define your function
the parameters are your function's signature
the input is always changing
important
Loss function
The process
1. start with a function that is your model, with given parameters
2. initialize the parameters
3. calculate the predictions of your function for your inputs with the current
parameters
a. then you will have the predictions for your inputs (the validation set),
with the current parameters
b. of course, you could see how far off you are
4. then see how far your predictions are from your actuals using the loss
function
a. calculate the loss
5. calculate the gradients to improve the loss
6. step the parameters (change them based on the learning rate)
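The steps above can be sketched with PyTorch autograd; the synthetic data and the "true" coefficients [3, 2, 1] are assumptions for the demo:

```python
import torch

torch.manual_seed(42)

# synthetic "actuals": points from a known quadratic, plus a little noise
x = torch.linspace(-2, 2, 20)
y = 3 * x ** 2 + 2 * x + 1 + torch.randn(20) * 0.1

# 1-2. the model is a quadratic; initialize its parameters
params = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)

def predict(x, p):
    return p[0] * x ** 2 + p[1] * x + p[2]

lr = 0.1  # learning rate
for _ in range(500):
    preds = predict(x, params)          # 3. predictions with current params
    loss = ((preds - y) ** 2).mean()    # 4. how far off we are (MSE)
    loss.backward()                     # 5. gradients
    with torch.no_grad():
        params -= lr * params.grad      # 6. step by learning rate * gradient
        params.grad.zero_()

print(params.detach())  # close to the true coefficients [3, 2, 1]
```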
The main idea
we started with a synthetic function, and plotted the data points
these are our actuals; we have the "original" function that we use to
create the data points
we input the data points to the function and plot the output
then we create a function - in real life this will be the neural network,
but for now let's assume it's a quadratic
then we calculate the predictions based on the function we created (our
model), plot the predictions, and see if they are close to our actuals -
if the shape of the predictions looks close to our actuals (patterns),
then we improved the loss
the goal is to find the best possible quadratic
so that when we insert the input to it
the output will actually resemble the output of the actuals
you do that on the validation set so you actually have the
output of the actuals, and you can plot it
and then, for the model, each time you can plot the
predictions and see if the predictions' pattern / shape is similar to
the actuals' shape
and you improve it each time, so at the end you will have a
similar function shape for the predictions as for the actuals
summary
at first the output from our input won't have anything to do with what we want
using the labelled data
the model gives us outputs
we compare them with the targets (we have the labels for those inputs)
we do that using the loss function (MSE); this is how we compare,
using the predictions vs actuals
then we calculate the loss value - how wrong our predictions were
compared to the actuals
then we improve the model by changing the weights
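A minimal sketch of that MSE comparison, with made-up predictions and targets:

```python
# minimal MSE: compare the model's outputs with the labelled targets
preds   = [2.5, 0.0, 2.0]
targets = [3.0, -0.5, 2.0]

mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
print(mse)  # (0.25 + 0.25 + 0.0) / 3
```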
5.0- Lecture
tips on how to learn
Run notebooks and experiment:
run every code cell from the book - experimenting with different
inputs/outputs
reproduce results - use the "clean" versions of the notebooks
what is it for - what is it going to output, if anything
repeat with a different dataset
lecture 3 - first iteration
main concepts
training piece → then you get the model
feed the model inputs and it spits out outputs based on the model
error rate - accuracy
Questions
what does the "inference" time (in seconds) mean when training a model?
Model.pkl
fit a function to data
Derivative
if you increase the input, does the output increase/decrease - basically,
the slope of the function
Tensor
works with a list, with anything basically.
Optimization
gradient descent - calculate the gradient (of the parameters), and
decrease the loss
This is deep learning - deep learning is a
metaphor for life → do one iteration, and
improve over time.
just do this
then optimize
values adding together
gradient descent to optimize the parameters
and samples of the inputs and outputs you want
the computer draws the owl
using gradient descent to set some parameters, to make a wiggly
function, which is an addition of vectors?
model choosing is the last thing
once you gather, clean, and augment the data
you can reason about a model - it depends on the task
do you need it to be the most accurate? the fastest? etc.
train the model first!
fit a function to data
we start with a flexible function (a neural network), and we get it to do
a particular thing - recognize the pattern in our data
so the idea is to find a function that fits our data? so a neural network
is just a function
loss function - a measure of how good the function is
for each parameter - if we move it up, does it make it better? or
not?
Derivative - if you increase the input
say the slope is very high at first, so the derivative will be large
as the slope decreases, the derivative approaches zero
so the derivative basically tells us, for each value, whether the slope
at that value is high or not
slope === gradient
how rapidly the function changes at the value 3 - i.e., at this
parameter's value, is the function changing fast?
Our goal - is to make the loss small, we want to make the loss
smaller
if we know how our function will change -
we have a parameter, and the function tells us how rapidly the
function changes in this parameter
our goal is to make the loss smaller
so we will change the parameter a bit
and see if it makes our loss function output better
the magnitude tells us that at this point the slope is fairly steep, so
the loss changes significantly when we adjust w - each time we
adjust w, the output changes significantly
so let's subtract some value * the slope, and see what happens
why use the slope?
because the slope tells us how much of a step we can take:
where the slope is big, the function changes very rapidly, so we
might take a smaller step to avoid overshooting
where the slope is gentler, each adjustment changes the loss only
a little, so we can take a bigger step
lecture 3 - first iteration
What we did so far
the goal - we just built a detector / classifier
training piece and model.pkl
model → you feed it input and it spits out output
Parameters
where do they come from, and how do they help us
machine learning models → fit functions to data
start with a flexible function, and get it to recognize the patterns in the
data we give it
start with a function, yes?
then the goal - is to find the most appropriate function for our data -
so we will test different functions, to try to find the best one
so here - we have the ability to create any quadratic function, and we
will look for the best one for our data
So the idea:
we try to map the data to the shape of a function - and try to make the
function describe the data as well as possible, with as little noise as
possible
the goal - find the best function that matches the data we have!
the steps
1. plot your data
2. plot the starting function
3. try to change the parameters, and see if the function describes (or fits)
your data better
a. change the parameters: you can increase or decrease them and see what
improves our function
increase the parameter
decrease the parameter
then reiterate over all the parameters again, until you find the best
parameter values
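A sketch of step 3, with a single made-up parameter m and actuals drawn from y = 2x:

```python
def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

xs      = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]   # actuals drawn from y = 2x

def predict(m):
    return [m * x for x in xs]

m, step = 1.0, 0.1
base = mse(predict(m), targets)
up   = mse(predict(m + step), targets)   # increase the parameter
down = mse(predict(m - step), targets)   # decrease the parameter

# keep whichever direction improved (reduced) the loss
if up < base:
    m += step
elif down < base:
    m -= step
print(m)  # 1.1 — increasing m moved us toward the true slope of 2
```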
The question
if I move the parameter, does the model improve or get worse?
then we need to have a measurement of how good the model is - to
know the effect of the parameters on the model
this is called a loss function - the value it outputs is the loss value
- how good the model is.
so the goal - try to find the parameters that reduce the loss value - that
improve the model
The derivative:
if I increase the parameter - does the loss get better or worse?
the loss - we want the smallest loss possible.
the derivative - if you increase the input, does the output increase or
decrease, and by how much (the slope/gradient)
so this is the idea:
we want the smallest loss
we take the parameters, insert them into the loss function, and get the
loss value for these parameters - the measurement for how good
our model is FOR THESE PARAMETERS
then we want to adjust the parameters, so we will check:
if I increase parameter "a" → the loss is a function, so if I
increase parameter "a", does the loss value improve or not?
we adjust the parameters - does it make the loss value go down or
up?
but the question is, how is the parameter connected to the slope
of the function?
so the biggest question:
how do the parameters work with the loss function?
like, we have a million parameters -
how is the loss function connected to them?
ok, I understand that if the slope for this single parameter is negative,
then we will need to increase it to go toward the minimum point
but if f'(x) is the derivative of f(x), which is the loss function,
do we put the parameter into the derivative of the loss function?
why?
loss function - depends upon the parameters
Clarifying questions
parameters vs values
conclusion - most important
loss function - measures the errors in the outputs!
What is the function
what is the function you find parameters for?
this is the main idea - values getting added together, and
gradient descent optimizes the parameters, with samples of the inputs
and outputs you want: 49:00
Why a small increase
for each increase in the respective parameter by one unit, what
will happen to the loss?
everything is about fitting functions to data
each time we have a parameter, we put it in the function (in our
model)
the parameter is the x value
then, we measure its slope
then we change the x value
so the x value will be left or right of the blue circle
so the parameters are just the inputs, and we change them,
but the function is the same, and each time we measure
the slope at the parameter
Linear algebra
matrix multiplication
Dependent variable
the thing we try to predict - is it a bird? did he survive? etc.
gradient vs derivative
each time we increase the parameter,
it depends on how steep the slope is - if the slope is very steep, meaning
that if we change this parameter the loss value increases/decreases a lot -
we need to change the parameter a lot
think about the function f(x)=x^2
where it is steep, to get to the minimum point you will need to change x
a lot
but if you are right next to x=0, where the minimum value is, the slope
there is very small, so you need to change x very little when you are
very close to x=0
so at x=-50, you will change x a lot
but at x=0.7, the slope will be very small, so you change x a little
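That x² intuition can be sketched directly; the learning rate of 0.1 and the two starting points are arbitrary choices:

```python
def f_prime(x):
    return 2 * x  # derivative (slope) of f(x) = x^2

lr = 0.1  # learning rate
results = {}
for x0 in (-50.0, 0.7):
    x = x0
    for _ in range(50):
        x -= lr * f_prime(x)  # far from 0 the slope is steep -> big steps
    results[x0] = x           # near 0 the slope is tiny -> tiny steps

print(results)  # both starting points end up very close to x = 0
```

Because the step is the slope times the learning rate, the steps shrink automatically as x approaches the minimum.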
gradient descent
calculate the gradient (derivative)
and do a descent - decrease the loss
partial in python
the idea - we take a general function, and create specialized versions by
fixing some parameters
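A minimal sketch with Python's `functools.partial`, fixing assumed coefficients 3, 2, 1:

```python
from functools import partial

def quadratic(a, b, c, x):
    return a * x ** 2 + b * x + c

# fix the parameters a, b, c to get a specialized function of x alone
f = partial(quadratic, 3, 2, 1)   # f(x) = 3x^2 + 2x + 1
print(f(2))  # 3*4 + 2*2 + 1 = 17
```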
the Hebrew term for gradient descent
lecture 3 - second iteration
Matrix multiplication and rectified linear
what is the function that we find parameters for?
the relationship between a parameter and whether a pixel is part of a
basset hound being quadratic is highly unlikely
we try to improve parameters for a function, right?
so we have a function we try to improve parameters for.
the function is a neural net
the parameters define the function
so it's unlikely that the result will be a quadratic
function
rectified linear function
an infinitely flexible function
the idea - if we add more than one rectified linear function together, we
can get to any function we please.
adjusting parameters
what you did when plotting, changing the "m",
is changing the actual parameters
and that infinitely flexible function can create anything!
rectified linear
we create the flexible function just using addition of rectified linear
functions
adding - creates the bump, the downward and upward parts
then we can add as many as we want - you could match any function,
and use gradient descent for the parameters
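A sketch of adding rectified lines; the helper names and the particular slopes/intercepts are made up for illustration:

```python
def relu(x):
    return max(0.0, x)

def rectified_linear(m, b, x):
    # one building block: a line m*x + b, clipped at zero
    return relu(m * x + b)

def double_relu(m1, b1, m2, b2, x):
    # adding two rectified lines already gives a bendier shape;
    # add enough of them and you can approximate any function
    return rectified_linear(m1, b1, x) + rectified_linear(m2, b2, x)

print(double_relu(1, 0, -1, 1, -2.0))  # 0.0 + 3.0 = 3.0
```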
Slope
will decrease as you approach the minimum value
that's why we need to have the learning rate
Matrix multiplication
why - we need to do a lot of mx + b, and add them up
we will do a lot of them; the m values are the parameters, right? we might
have a million parameters
and we will have a lot of variables - and we multiply all of the variables,
for example, each pixel of an image, times a coefficient, and add them
together
then we do it for each parameter
so the coefficients will be the parameters, and then we will have the
actual inputs
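A sketch of why the matrix multiply replaces the loop of mx + b; the pixel values and coefficients are made up:

```python
import numpy as np

# 3 inputs of 4 "pixels" each; one coefficient per pixel, plus a bias.
# One matrix multiply does every pixel * coefficient product and all
# the adding-up at once, instead of looping over mx + b
x = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.],
              [9., 0., 1., 2.]])
w = np.array([0.1, 0.2, 0.3, 0.4])  # the coefficients (parameters)
b = 0.5                             # the bias

out = x @ w + b
print(out)  # [3.5 7.5 2.5]
```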
6.0-Blog
what we did
Part 1
understand how to use different architectures for your models, to get the
best architecture
Part 2
Learn about how a neural net actually works
Main ideas