MASTER BUILDING
AI
Train Your Own Neural Network
Step-by-Step Programming
Guide
By
Roronoa Hatake
TABLE OF CONTENTS
DEEP CONVOLUTIONAL Q-LEARNING INTUITION
ELIGIBILITY TRACE
INSTALLING OPEN AI GYM WALKTHROUGH (MAC VERSION)
INSTALLING OPEN AI GYM WALKTHROUGH (UBUNTU
VERSION)
DOOM - STEP 1
DOOM - STEP 2
DOOM - STEP 3
DOOM - STEP 4
DOOM - STEP 5
DOOM - STEP 6
DOOM - STEP 7
DOOM - STEP 8
DOOM - STEP 9
DOOM - STEP 10
DOOM - STEP 11
DOOM - STEP 12
DOOM - STEP 13
DOOM - STEP 14
DOOM - STEP 15
DOOM - STEP 16
DOOM - STEP 17
WATCHING OUR AI PLAY DOOM
PLAN OF ATTACK
THE THREE AS IN A3C
ACTOR-CRITIC
ASYNCHRONOUS
ADVANTAGE
LSTM LAYER
BREAKOUT - STEP 1
BREAKOUT - STEP 2
BREAKOUT - STEP 3
BREAKOUT - STEP 4
BREAKOUT - STEP 5
BREAKOUT - STEP 6
BREAKOUT - STEP 7
BREAKOUT - STEP 8
BREAKOUT - STEP 9
BREAKOUT - STEP 10
BREAKOUT - STEP 11
BREAKOUT - STEP 12
BREAKOUT - STEP 13
BREAKOUT - STEP 14
BREAKOUT - STEP 15
WHAT IS DEEP LEARNING
PLAN OF ATTACK
THE NEURON
THE ACTIVATION FUNCTION
HOW DO NEURAL NETWORKS WORK
HOW DO NEURAL NETWORKS LEARN
GRADIENT DESCENT
STOCHASTIC GRADIENT DESCENT
BACKPROPAGATION
PLAN OF ATTACK
WHAT ARE CONVOLUTIONAL NEURAL NETWORKS
STEP 1 - CONVOLUTION OPERATION
STEP 1(B) - RELU LAYER
STEP 2 – POOLING
STEP 3 – FLATTENING
STEP 4 - FULL CONNECTION
SUMMARY
SOFTMAX CROSS-ENTROPY
DEEP CONVOLUTIONAL
Q-LEARNING INTUITION
We're starting off the section on deep convolutional Q-learning, so let's have a look at what it's all about. So far we had an environment with an agent, and we had a vector describing that environment, which was fed into a neural network, and at the end we got the Q-values as our outputs. Then, of course, we found out how the learning part works, we found out how actions are decided based on those Q-values (that's the acting part), and we talked about action selection policies and other aspects of how deep Q-learning works. But the key concept in all of this is: how do we get from the actual environment and its states to the neural network? Well, the transition happens right here: the input vector is the input layer of our neural network, and it is just a vector. So what we're looking at is this: the agent isn't really looking at anything. The agent is simply handed this information. The environment is passing it this information, saying: you, the agent, are currently in this state, and your state is described by this vector. In this simplified example it's described by the vector (x1, x2) = (1, 2), so your coordinates are 1 and 2, and that is your whole state. In a more complex environment the state might include all sorts of other things the agent can observe, but the point is that it is posed as a vector.

And the thing is, that doesn't happen in real life, except perhaps for GPS systems and things like that. In real life, what do we use most of the time? We use our senses, we use our eyes. Even with GPS, it's not built into our brain, it's not feeding coordinates directly into our mind; we still use our eyes to look at the GPS and understand what's going on. So it's kind of cheating for an AI to receive information about the environment as a ready-made vector. It's too simple, it's not how things work in real life, and it's not how we as humans operate. Ultimately we want to create artificial intelligence that can operate in a similar fashion to humans, that can take on the same challenges as humans. In the human world we don't have coordinates or other vectors passed to us to explain the state we're in, so we're going to remove that to make things more realistic. And what can we replace it with? What do we do as humans to get information? We use all our senses, of course, but most of the information we get about the world around us comes through our sight. That is why we're going to change that little arrow we had into a whole convolutional neural network.

This comes from Annex 2, where we cover convolutional neural networks, and that's why it's important to be quite comfortable with convolutional neural networks and how they work. If you've done our deep learning course you should already be comfortable with that; otherwise you can have a look at Annex 2, where we have some very good intuition tutorials. So here we have the convolution operation being applied, because we're now going to be looking at the environment as an image. This is an image of the whole environment, and the agent is actually looking at that environment. It's not that the agent is looking from within the maze; it's as if it were playing this on a computer, so it can see the whole environment and therefore it can see where the figure representing the agent actually is. It sees the whole environment, or whatever a human would see if it were an actual maze seen from the inside, and the agent should be able to do exactly the same thing.

So what the agent sees goes through a convolutional layer, then pooling, then flattening (you can find out more about these different parts of convolutional neural networks in the annex). Then, after everything is flattened, we have the inputs which go into the neural network. This is far more realistic, because the agent has to use its sight, it has to process the images which the environment is supplying, just as a human would process images. And the beauty of this is not just that it's more realistic, that the agent is acting more the way a human would; it also allows us to process much more complex environments.
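Just to make that pipeline concrete, here is a minimal sketch in PyTorch (the library we'll use later in this section) of what it looks like: a game frame goes through convolution, pooling and flattening, and a small fully connected network turns the flattened vector into one Q-value per action. The layer sizes and names here are purely illustrative, not the architecture we'll build for Doom.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDeepQNetwork(nn.Module):
    """Illustrative only: a game image in, one Q-value per action out."""

    def __init__(self, number_actions):
        super(TinyDeepQNetwork, self).__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=5)   # convolution
        self.fc = nn.Linear(8 * 38 * 38, number_actions)                      # full connection

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv(x)), 2)   # convolution + ReLU + pooling
        x = x.view(x.size(0), -1)                   # flattening into one long vector
        return self.fc(x)                           # Q-values, one per possible action

# One fake 80x80 black-and-white frame -> Q-values for, say, 6 possible actions
q_values = TinyDeepQNetwork(number_actions=6)(torch.rand(1, 1, 80, 80))
```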
For instance, this is how we can play Doom and other games like it: instead of getting a vector of information which somebody would have had to create for us, we can hook up our artificial intelligence to any environment of which we have a visual representation. As a human, when you're playing this game you see exactly this picture, and that's exactly what the artificial neural network, the agent, will see now. So in this part of the course, when we move on to the practical tutorials, the agent will actually see this exact picture; it will get the pixels of this image, with this person, this face, this health percentage, everything exactly as we see it here. Then it will have to digest that through convolution, pooling and flattening, and then it will go into a neural network. And needless to say, the neural networks we'll actually use are much more complex than this simple diagram, so let's replace it with something like this. This looks a little more complex, but in reality the neural networks you're going to be working with and creating, if you want to do anything interesting, will be much more complex still. As you can see, even with just five inputs things quickly become more involved, and here we have many more actions that the agent can take. In the game of Doom: turn left, turn right, look down, look up, run, shoot, reload, and all the other actions that are possible in a first-person game.

Moreover, it doesn't have to be this particular game: you can attach this agent to another type of game. That's the beauty of it: it can then operate in any kind of environment you attach it to, because as long as there's a visual representation of that environment, the whole infrastructure is already in place to process it. So that's what deep convolutional Q-learning is all about. We're taking things to the next level: we're adding convolutional layers into our agents' brains, we're making them more complex, and therefore we'll be able to solve even more complex challenges. I hope you're very excited, because this is going to be an epic section and we're going to create some amazing things. I can't wait to see you in the next project.
ELIGIBILITY TRACE
We're going to cover quite a complex topic called eligibility trace, or n-step Q-learning. This is something we're going to implement in the practical side of things, so we need to become comfortable with it, and since it is quite a complex topic we've got a slightly different approach to getting you up to speed with the intuition behind it. I want to give you an example to start off with, one that will demonstrate the power of eligibility traces and give us the intuition behind things, and then, if you'd like to delve further into eligibility traces, I'll give you the best place to read about them: a reference to a book. So this project is going to be different. Rather than starting with the theory, we're going to look at an example first, and my hope is that the intuition becomes obvious once we talk through it. So let's have a look and see how that goes.

Here we've got two agents navigating the same environment, and we're going to see how these two agents work. The first one works without eligibility traces; the second one works with eligibility traces, and hopefully we'll see why the second one is so much more powerful than the first.

Let's look at the first agent. It operates in exactly the way we've discussed deep Q-learning so far: the agent takes an action to move into a new state, gets a certain reward, puts that through its algorithm and updates the neural network that's running it, the network in the mind of this agent. It learns from that one moment, then takes a new step: from the new state it takes a new action based on what its neural network tells it to do, gets a reward, updates, and so on, and it keeps doing that. This does quite a good job, and as we've seen from the previous practical tutorials we can get quite good results this way. But now we're going to add a new feature. Agent number two, this one over here, is going to navigate the same environment, but it's going to use eligibility traces. This is what that means: it's going to take n steps, in this case four steps, and only after taking those steps will it calculate the total reward it got from them and put it through the neural network that governs its decision-making, and then the neural network will learn from that.
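To make that concrete, here is a minimal sketch of how the cumulative n-step reward can be computed before a single learning update. It assumes a discount factor gamma and uses the Q-values of the state reached after the n steps as an estimate of what comes afterwards; the function name and arguments are illustrative only, not code from the practical tutorials.

```python
def n_step_return(rewards, q_values_after_n_steps, gamma=0.99):
    """Cumulative reward over n steps, bootstrapped from the last state reached.

    rewards: list of the n rewards collected along the way, in order
    q_values_after_n_steps: Q-values of the state reached after the n-th step
                            (None if that state ended the episode)
    """
    cumul = 0.0 if q_values_after_n_steps is None else max(q_values_after_n_steps)
    for r in reversed(rewards):          # fold the rewards back, most recent first
        cumul = r + gamma * cumul
    return cumul

# Agent 2 takes 4 steps, then makes one update from the whole trajectory:
target = n_step_return([0.0, -1.0, 0.0, +10.0], q_values_after_n_steps=[2.0, 1.5, 0.3])
```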
So, which one do you think is more powerful, right off the bat? The agent at the top, who takes one step at a time, kind of poking around in the dark: I'll take a step and see what happens, take another step and see what happens? Or the agent who courageously marches through four steps in a row and only then decides whether those steps, taken all together, were good or not? You can probably already get a sense of why the second agent is more powerful: it's because the second agent actually knows what's at the end. When the first agent assesses whether a step was good or not, it's only looking at the reward it got for that step; at every point, its only compass is the reward the environment just gave it: the reward, the reward, the reward. The second agent, on the other hand, can assess things after taking the steps: OK, I did get to the finish line, so this whole combination of steps was good, all of them were good. Or: oh no, I ended up in the fire pit, or my car didn't make it to the finish line, or I crashed into the sand wall, or I lost the game of Doom, and then it decides that this whole combination of steps was bad. So for the steps earlier on, it has more information, more insight. That's the intuitive way to look at it; again, this is a much more complex topic than we're portraying here, but intuitively: if the first agent takes a given step, the only information it has about that step comes from the single reward right after it, whereas for the second agent, that same exact step has information coming all the way back from the outcome after four or five steps, or however many.

And that is also where the name comes from. During this process, the algorithm doesn't just look at the cumulative reward (or the cumulative loss) and update accordingly; it actually keeps a trace of eligibility. There is a trace kept in the algorithm which says: if we do get, say, a punishment, a negative reward, then which of these steps is most likely to be eligible for that punishment? So not only do we evaluate this whole combination of steps as a whole, we also keep a trace of eligibility telling us which steps should be updated when a reward arrives. For instance, if it's a negative reward, the eligibility trace might indicate which step is most responsible for what we got in the end, and if it's a positive reward, again the algorithm helps us keep track of which step, which action, is eligible to be updated based on that reward we get.
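For completeness, the trace itself can be pictured like this in the tabular setting described in the Sutton and Barto book referenced below: every state-action pair carries an eligibility value that is bumped when the pair is visited and decays afterwards, and each learning signal updates all pairs in proportion to their current eligibility. This is a textbook-style sketch (a Q(lambda)-like update), not the n-step code we'll use for Doom, and the names and parameters are illustrative.

```python
from collections import defaultdict

def eligibility_update(Q, trace, state, action, delta, alpha=0.1, gamma=0.99, lam=0.9):
    """One learning update with an eligibility trace (tabular sketch).

    Q:      dict mapping (state, action) -> Q-value
    trace:  dict mapping (state, action) -> current eligibility
    delta:  the learning error just observed at (state, action)
    """
    trace[(state, action)] += 1.0                  # the visited pair becomes fully eligible
    for key in list(trace.keys()):
        Q[key] += alpha * delta * trace[key]       # every pair learns in proportion to its eligibility
        trace[key] *= gamma * lam                  # eligibility fades as more steps go by

Q, trace = defaultdict(float), defaultdict(float)
eligibility_update(Q, trace, state=(1, 2), action='up', delta=0.5)
```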
And that's why it's called an eligibility trace. So that's the basic intuition behind eligibility traces, and hopefully these two example agents make it quite obvious, or at least quite intuitive, why eligibility traces can be so powerful. As promised, if you'd like to delve further into eligibility traces and n-step learning, a wonderful book to look up is Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto (1998). I believe they're in the process of creating a second edition. This is the most popular and most referenced book on reinforcement learning; it has a ridiculous number of citations, in the tens of thousands if I'm not mistaken. The chapter you need for this is Chapter 7: there's a whole chapter on eligibility traces, and it goes into lots of detail, forward and backward eligibility traces, and also how, on one end of the spectrum, you have temporal difference learning, on the other end you have Monte Carlo methods, and in between you have eligibility traces; eligibility traces are the link that takes you from temporal difference methods to Monte Carlo methods. It's very interesting to read, with lots of pictures, which I really appreciated, and very intuitive explanations. There are lots of things you can learn from this book about artificial intelligence and reinforcement learning, but specifically for eligibility traces it's a very good place to go.

The second reference for today is something we're going to use in the practical tutorials: the Google DeepMind research paper on asynchronous methods for deep reinforcement learning. Yes, that's the paper, the A3C paper, that we're going to be discussing further down in the course; we're getting closer and closer to it, and as you can tell we're pretty excited about it. We're going to look a little at how they implemented eligibility traces in this paper, so we'll be using it more for the practical side of things. So hopefully you enjoyed today's project and you're now a bit more comfortable with eligibility traces. I can't wait to see you next time; until then, enjoy AI.
INSTALLING OPEN AI
GYM WALKTHROUGH
(MAC VERSION)
If you're on a Mac you can follow this project, or take a look at the other project, which focuses on Ubuntu. To get started we want to install OpenAI Gym. There are some dependencies, and if you're using a Mac you can use Homebrew to install them; they are required if you're looking to install the full set of environments, so it's good to have them. I'm just using a test environment here. So the first thing I want to do is use Homebrew: I have copied the brew install command, and I just run it so that it installs these dependencies. Then we'll move on to the next step, which will be cloning the Gym repository and installing everything, and after that we'll move on to doom-py and VizDoom as well, and then the ppaquette gym-doom package.

So what we want to do now is clone the OpenAI Gym repository: git clone followed by the address github.com/openai/gym. The clone should be pretty quick. We then want to cd into gym, and inside it we run the pip installation; don't forget the quotes, the period and the brackets in the command, pip install -e '.[all]', which will install everything in OpenAI Gym. Let that run and we'll move on to the next step.

All right, so we have finished the installation with pip there in gym. We cd back out, and now we're going to clone doom-py, so it's a similar operation: git clone followed by the doom-py repository address, github.com/openai/doom-py. Again, it clones reasonably quickly. Then we want to cd into doom-py, and once we're in doom-py we run the pip installation step to install everything. We're almost there. Once that's installed we cd back out again, and there are just a few more packages and a few more steps. Make sure when you run the pip install that you don't forget the period at the end of it; you'll see that pip does a pretty good job of letting you know where any errors are coming from. I was going to pause while VizDoom installs; actually, no, take that back, it installs rather quickly. We cd back out again and install the next piece too. So let's move on: we have VizDoom installed successfully, and the last quick step is to install the Doom package for Gym itself. So we use pip, the package manager, one more time, run the install command for the ppaquette gym-doom package, and let that run. It installs successfully as well, and we should now have Gym set up.

Now, I do recommend using Python 2.7; that's what I'm using right now for this demonstration. I recommend 2.7 for Gym because it runs more smoothly and you're less likely to run into errors than with 3.5 or 3.6. If you do run into any errors installing or running OpenAI Gym, please see the attached files, where you can find common debugging solutions; you can also see the steps to follow if you want to run this on Windows. I'd like to thank all the students that have contributed to the discussions on debugging tips and solutions. So again, if you run into anything, common fixes are to upgrade pip, to reinstall the environment and rerun the commands, and to check the common debugging solutions, where you'll find additional information for resolving any errors that may pop up. Again, aim to use 2.7 for this, but if you're using 3.5, 3.6 or another version of Python 3, take a look there and see if any of those tips help as well.
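Once everything is installed, a quick way to check that the setup works is to try creating one of the Doom environments from Python. This is just a sanity check, not part of the walkthrough above, and it assumes the Doom package installed here is ppaquette-gym-doom and that the corridor environment is registered under the id shown; adjust both if your versions differ.

```python
import gym
import ppaquette_gym_doom  # importing the package registers the Doom environments with Gym

# Environment id assumed from the ppaquette-gym-doom package; change it if yours differs
env = gym.make('ppaquette/DoomCorridor-v0')
observation = env.reset()
print(observation.shape)   # the raw screen image the AI will see
env.close()
```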
INSTALLING OPEN AI
GYM WALKTHROUGH
(UBUNTU VERSION)
For Ubuntu, we want to go to a terminal and install all the dependencies for OpenAI Gym, which you'll find on the main page of the repository. If you already have most of them that's fine, but you can copy the command for them from the Git repository, the one you'll see here. You just want to grab it and paste it into a terminal; this is basically setting up the dependencies, similar to the Mac setup, except the dependencies are different. Paste it in the terminal, run it, and let it install. This whole process, if you've seen the Mac project or done it, is again very similar, just with different dependencies. You may want to run it with sudo if you run into any permission issues, so add sudo, let that run, and install everything. Once we have that, we're going to clone Gym, cd into gym, and then do the same for doom-py and the gym-doom package, so just follow along with the commands and we will get it set up.

Now we want to do the git clone: git clone followed by the address of the OpenAI Gym repo, github.com/openai/gym, so we can download the required files. It should be relatively quick. Once it's done we cd into gym, and now we run the pip installation with the following command; don't forget the quotes, the period and the brackets with "all". Run it; it may take a moment, so pause the project if needed and come back for the next step once it's done. Once that's set up, we're going to clone the next repository: git clone https://github.com/openai/doom-py. That sets up quickly, so we cd into it, cd doom-py, and run another pip install; don't forget the period on the pip install command. Again it's a pretty quick setup for this one as well.

As you can see, I did run into an error here. If you run into this error, we need to install CMake; I thought I already had it on my system, but in this environment I don't. No worries: it's a CMake error, so we should be able to fix it with a pip install. Let that run, and then we rerun the pip install command, and it should install successfully this time. Again, pause the project and jump back when it's installed. As you'll see, after we added CMake it has successfully installed, and we can run the final pip installation command. So we resolved this one with CMake, and if you take a look at this and at other errors that may pop up, they can often be traced to a prerequisite dependency that's missing, as we saw with CMake; just take a look at the message and try installing whatever it asks for. We have one more installation to go through, which we'll look at now: the final step is to use pip to install the ppaquette gym-doom package, and this installation should again be relatively quick.

Once it's installed we are set up, and you can see that we have OpenAI Gym set up. Now, I just want to mention that if you do run into any errors, please see the debug logs and the common debugging solutions attached to this lecture. I want to take a moment to thank all the students that have contributed to these solutions and debugging information. If you run into anything, please comment or share it in the discussions and we will be there to help you as soon as possible. But again, I highly recommend using Python 2.7. If you use Python 3 (3.5 or 3.6), I have seen quite a few students using those versions successfully with these packages, but you may run into a couple of necessary changes. That's no problem: just check the debugging solutions and let us know if you have any questions. All right, that will wrap it up for this project.
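If you want to confirm that the base Gym installation itself is working before moving on to Doom, a quick check with one of the built-in environments of this era of Gym looks like this (CartPole is just a convenient example; it is not used in the Doom project):

```python
import gym

# Create a simple built-in environment and take one random action
env = gym.make('CartPole-v0')
observation = env.reset()
observation, reward, done, info = env.step(env.action_space.sample())
print(reward, done)
env.close()
```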
DOOM - STEP 1
We're going to play Doom with an artificial intelligence which will try to kill some monsters and reach a goal in a 3D environment, a 3D map exactly like the ones you see when you're playing first-person video games. You are not going to play this game yourself; the AI you're going to build and code will be playing it for you. So there we go, we have another very exciting challenge ahead of us, and besides, we're going to take it to the next level now, because we're going to build a more complex model: deep convolutional Q-learning. I hope that after the intuition lectures you're very excited to implement it; I am definitely very excited to implement it, so let's start right now.

As you can see, I'm starting with Google, because I want to show you the OpenAI Gym environment and walk you through the game we're going to play. So, starting with Google, we type "openai gym", then we click on the link and we arrive at the OpenAI Gym page. In the upper left corner of that page you can click on Environments, and that is where you have all the environments, which are actually environments of games. For each of these games you can implement an AI to beat the game, and each time it is specified which goal you have to accomplish, what the actions are and what the rewards are. Note that for most of the games the input states are actually input images. It's not like before with the self-driving car, where we had a vector encoding one state of the environment; for most of the games here the input state is the image itself. So it's exactly as if our future AI will have eyes, like humans do. In fact, this AI we're about to implement is the closest so far to a human brain, because it is going to observe a series of images, it is going to detect what it has to do in those images, and then, using a classic neural network, it will know which action it should play.

So now let's move on to the game. We want to look at the Doom games on this list, so let's click on them, and there we go: welcome to the Doom environments. As you can see, we have several versions of Doom, and these are just different maps with different levels. For example, let's have a look at the first one, DoomBasic, which is the game you see here. As you can see, for this game (and for any of the games) the goal is specified: here the goal is to kill the monster in three seconds with one shot. Then the rewards are specified: you get +101 points for killing the monster, you get a bad reward of -5 points for missing a shot (and right now the demo is collecting quite a few of those, because it's missing a lot of shots), and you get -1 point every small fraction of a second that passes, a living penalty. Those are all the rewards. Then you have how the game ends: it ends when the monster is dead, when the player is dead, or when there is a timeout. And you also have the actions: there are three actions for this one, attack (when you shoot), move right and move left.

OK, but that's not the game we're going to play; this one is too simple. My favourite is the one where we can move forward, move back, turn left, turn right and shoot many monsters. That one is much more exciting, and it's actually this one, Corridor; let's click on it. That's the one. As you can see, now we can actually move forward, move left, move right, turn left, turn right and shoot, and we have many more monsters, like these guys here. Our goal, of course, is not only to shoot them; the main goal is to reach the end of the map here, which they call the vest. The main goal, which gets us 1,000 points, is to reach the vest over there, and to do that we'll have to kill the monsters, because if we don't kill them we will get killed and we will not be able to reach the vest. So that's our main goal.

Let's see what our rewards are. We get a positive reward when we're getting closer to the vest: the +distance reward is exactly the distance we cover while getting closer to the vest. Then there is a -distance reward for moving further away from the vest, and -100 points for getting killed. So here the reward is measured with distance: when you get closer to the vest you get a reward equal to the distance you've covered towards it (a continuous positive reward), and when you move further from the vest you get a negative reward equal to the distance covered moving away from it. And if you get killed you get a bad reward of -100 points. Then the game can end when the player touches the vest, which is when we win; when the player is dead, which is when we lose; or when there is a timeout. And here are our allowed actions: shoot, move right, move left, move forward, turn right and turn left. We have more actions than before, six actions now, but that's OK, this will not increase the complexity too badly. All right, so I hope you're excited about this game; I actually think it's the most exciting one.
DOOM - STEP 2
We will now get ready to start the implementation of our AI, and as usual the first thing we need to do is set the right folder as the working directory, so let's do this now so that we can move on to what's more interesting. As usual, I start on my desktop, then I go to my Artificial Intelligence folder, then Module 2, then Doom, and there we go: that's the folder we have to set as the working directory. So let's do this: we click the button here, then restart the kernel, and yes, there we go, we now have the right folder as the working directory.

Now, as you can see, we have four items in this working directory: well, actually three files and one folder. Let's start with the first one, which is ai.py. That's of course the file that will contain our artificial intelligence, and it's nothing else than this file here, the ai.py file into which we will implement everything related to building an AI, and especially building an AI with the deep convolutional Q-learning model. Basically, that's where the big adventure will happen. Then we have some other files. We have the second file, which is experience_replay.py, and this time I put experience replay in a separate file simply because we already implemented it before and now we want to focus on what's new. And trust me, we have a lot of new things to do with this new artificial intelligence, because not only do we want to build an AI, we want to build an AI to beat Doom, so you can imagine this will require some quite advanced code. No worries: we have a big piece of code waiting for us and you will learn a lot of new tricks. That's why this experience replay trick, which you already know and which, as a reminder, improves the training a lot, is placed separately in this experience_replay.py file, so that we can focus on all the new concepts, techniques and tricks waiting for us.

Then we have the image_preprocessing.py file. That's another Python file, which will take care of preprocessing our images, because this time, remember, our AI will have eyes: the input states are no longer encoded by a vector, this time the input states are images. So the first layers of the big neural network we'll make will be the eyes, and those will be the convolutional layers of the convolutional neural network. To make sure these images can be accepted as inputs of the convolutional neural network, we need to preprocess them, and this file takes care of preprocessing the images so that they can go into the neural network. I separated this file out because it's not directly related to the AI itself, and again we want to keep the maximum of our brain power, memory and focus on everything that is related to the AI, so we put this aside so that the preprocessing is handled in a flash and we save some energy for the rest. You can have a look at it if you want, and you also have the deep learning course, where you can look at the practical tutorials on image processing. Here, again, we really want to focus on the AI; trust me, we have a lot to do.

Finally, the last item is the videos folder. Right now this folder is empty, as you can see, but when we execute the code, some videos of the AI playing Doom will be added to it. That will be very exciting, because we will see in these videos how well the AI is doing: we will literally see the AI killing the monsters and trying to run towards the goal. You'll see, it will be pretty exciting. Of course the first videos will be very bad, because the AI will not have been trained much yet, and so it will get killed very fast, but then you will see that as the training progresses the AI gets better and better, and eventually it manages to kill some monsters without getting killed, and hopefully it will be able to make it to the goal.

All right, so let's go back to our ai.py file, which is this one, and as you can see I already took care of importing all the essential libraries and packages that we need to play Doom. Let's quickly have a look at them one by one. We have, of course, numpy, because we'll be working with arrays; that's inevitable. Then we have torch, of course, because we're implementing the AI with PyTorch. Then we have the torch.nn module, which contains all the tools to implement a neural network; for example, this module contains the convolutional layers that will be part of our future neural network. Then we have the torch.nn.functional package, imported with a shortcut, which contains all the functions used in a neural network: typically the activation functions (we'll be using some rectifier activation functions) but also some pooling functions for the convolutional neural network; all these functions are contained in functional. Then we have torch.optim, which is of course for the optimizer; I think we'll be using an Adam optimizer, and this optimizer is contained in optim. And then the best of the best of PyTorch: the Variable class from the autograd module. That's where all the power of PyTorch is, because it contains the dynamic graphs allowing us to perform very fast computations of the gradients, even the gradients of compositions of functions. So we will definitely be using it, just as for the self-driving car, and trust me, for Doom we will be needing it very badly. I guess that's all for the essential libraries.

Then we need to import some packages related to OpenAI Gym and Doom. So of course we import gym, then we import the wrappers module of the gym library, and one of these wrappers is the skip wrapper. That's basically to import all the tools and environments of Gym. And finally we have the package that we need to import which is directly related to Doom: the action_space and ToDiscrete tools of the Doom wrapper, which basically contain the Doom environment and, more specifically, the actions that can be played, the number of actions for the specific game we're going to play. As a reminder, there are six actions: move left, move right, turn left, turn right, move forward and shoot. All right, so that's basically what you need to import for Doom. And then finally, of course, we need to import our two internal files: experience_replay.py for experience replay, and image_preprocessing.py to preprocess the images, which are nothing else than the images of the screen while playing the game. These images will be preprocessed, converted into numpy arrays, reshaped to a certain format, and then they will go into the neural network, that is, into the convolutional neural network.
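For reference, here is roughly what that block of imports looks like at the top of ai.py. It's a sketch based on the description above: the Doom-related module paths (the skip wrapper and the ppaquette ToDiscrete wrapper) are the ones used by the older gym and ppaquette-gym-doom packages and may differ with other versions.

```python
# AI for Doom -- imports (sketch)

# Essential libraries
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

# Packages for OpenAI Gym and Doom (module paths assumed from the older packages)
import gym
from gym.wrappers import SkipWrapper
from ppaquette_gym_doom.wrappers.action_space import ToDiscrete

# Our other Python files
import experience_replay
import image_preprocessing
```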
So I guess we're now ready to start the big implementation of this AI, and that brings me to a very important point about this module: since, as I told you, we have a big implementation waiting for us, in order not to get lost in all of it we need a good structure. I've already highlighted that structure: we will be implementing this in two parts. Part 1 will be about building the AI, and that's where we will make the brain of the AI. The brain, as you've understood, is nothing else than the neural network, this big CNN composed of some convolutional layers and then some fully connected layers to predict the outputs, which are still the Q-values. Then we'll make the body of the AI, and that's a new representation I'm bringing to you, again so that we don't get lost. You will see that the further we progress with the code, the more this structure will show itself and everything will make sense in the end, and to make sure it makes sense we need this representation of the AI. Basically, this first part of building the AI will be composed of three sections. The first section will be about making the brain, that is, the neural network. The second section will be about making the body, and I'm calling it the body because it's the part that tells the AI how to play the action. You know, first you have the brain, which sees the images and predicts the Q-values, but then you need to specify how the AI should play the action, and it does that with its body, like a human would. So the body will be the part where we specify the method of playing the action. For example, with our self-driving car, the brain was the neural network we made, and the body was how the action was played, that is, with the softmax method. Here it's the same: we're going to make a brain and a body which will play the action; I'll let you find out how. The key point in all of this is that we will have very structured code, so that not only can you take a step back and really understand what's going on, but you'll also be able to use it as a framework whenever you want to build an AI for other purposes. And after building the AI in Part 1, we will move on to Part 2, which will be about implementing deep convolutional Q-learning, and there again we'll have different sections, one of which will of course be training the AI. So I can't wait to dive into it now.
DOOM - STEP 3
As you can see, I've already added the titles to the structure of this implementation, with three sections composing Part 1, which clearly show how we are going to build this AI. First we're going to make the brain, which is nothing else than the neural network; then we're going to make the body, which will define how the actions are going to be played; and then, once we have the brain and the body, we will assemble them to make the AI, which will be the last section of this first part. So you can already have a good vision of the structure of this implementation: first we make the brain, then we make the body, and then we assemble the two to make the AI.

With that said, let's start with the first section, about making the brain, and this is going to take us a few projects; you can imagine that making a brain is not like making a cake, so it will require more than one project. Of course, as usual, we are going to represent the brain with a class, because we will need several functions, and in order to have several functions organized into one structure, some kind of instructions, we need a class. And that's very practical, because once we've made the class we will be able to create as many brains as we want just by creating objects of that class. Again, classes in Python, and in object-oriented programming languages in general, are very practical because you make a model of something you want to build, and then you can create as many objects as you want, and they will all have the features you defined in the class. For our brain, the features will be, first of all, the architecture of the neural network, which, as a reminder, will be a CNN, and of course some functions, like for example the one that propagates the signals from the input neurons to the output neurons: that will be the forward function we'll make.

So let's do this, let's start making the brain. This is going to be pretty exciting; it's one of my favourite parts, so let's get straight into it. We start by introducing the class, of course, and we need to name it. I hesitated to call it Brain, but let's be more direct and call it CNN, because the brain actually is a CNN, a convolutional neural network. You can call it Brain if you want, but at least this way we know what we're building. And CNN, like the network for the self-driving car, is going to inherit from the nn.Module class. Remember, nn is the module we imported up here, and we want to be able to use all the tools of this nn module; therefore we use that technique of object-oriented programming called inheritance, which allows us to use all the tools of a parent class, and this parent class is going to be nn.Module. There we go, and now we can use all the tools and objects of the nn module.

Now that we have our inheritance, we can go into the class to make our first function, and as you probably guessed, the first function is the __init__ function, which will define all the variables of the future brain objects, that is, the future CNN objects that will be created. So let's do this: def, then two underscores, init, two underscores. Now we need to add some arguments. First of all there is self, which of course refers to the object; I guess you're pretty comfortable with this by now. Then we're going to add another argument, which will be the number of actions in the Doom environment, so we're going to call it number_actions. Actually, this argument is not strictly compulsory for the function; it's just that if you want to test the AI we're going to build in other environments, it will be very practical, because we will import this number of available actions from the Doom environment wrappers with ToDiscrete, and when doing that we will only need to input the name of the environment. So if you want to experiment with this on other environments and play other games, you won't have anything extra to do, because number_actions will directly get the number of actions of the environment you'll be playing with.

OK, so that's it for the two arguments of this function. Now we can go inside, and remember what we have to do: the first thing is to activate the inheritance with the super function, exactly like for the self-driving car. We take the super function, then inside we start by inputting the class that defines the neural network, which is CNN, then we input self, to refer to the object. But remember, that's not all: we need to add a dot and then the __init__ function with some parentheses, and by doing that we activate the inheritance, and now we can use all the tools of the nn module.

All right, so now I think it's time to build the architecture of the neural network. As you remember, we are going to build a CNN, a convolutional neural network, simply because this time the AI will have eyes, and the eyes of the AI will be the convolutional layers of this convolutional neural network. Then, after the AI visualizes the images with the convolutional layers, it will pass the signals on into a classic artificial neural network, the kind we had before, with fully connected layers, and that's where it will try to predict the Q-values for each possible action that can be played. So keep that architecture in mind: first we're going to have some convolutional layers and then some fully connected layers, and this will be the brain of our AI. To be able to take a step back and see what we're making, let's lay out this architecture with the variables we want to create. Speaking of this architecture, we are going to build a CNN with three convolutional layers and, after that, one hidden layer. That means we will need three convolution connections and two full connections, and these connections are exactly what we're about to define: they will be the variables of our CNN class. So right now I'm going to define five variables: three for the convolution connections and two for the full connections.

Let's start with the convolution connections. I'm going to call them self.convolution1 (I'll copy that and paste it below), then self.convolution2 and self.convolution3. These are our convolution connections. The first one, convolution1, will apply a convolution to the input images to get a first convolutional layer. Then the second convolution here will take that first convolutional layer as input, and by applying some convolution again it will create a second convolutional layer; this layer will contain some new images, each of them detecting one specific feature. So we get these new images in a convolutional layer, then we apply the second convolution to connect the images of the first convolutional layer to some new images of a second convolutional layer, and these new images will again detect features in the images of the first convolutional layer, which reinforces the feature detection. And then, to the images of the second convolutional layer, we apply the third convolution to get, for each of them, some more images that detect even more features inside the input images. The more we do this, the more convolutions we apply to the successive layers of images, the more features we are able to detect, and it's by detecting features that the AI will understand where the monsters are, where it has to shoot to kill them and where it should go. It will also detect the walls and the obstacles, so literally where it has to go, all thanks to what these convolutional layers detect in the original images.

All right, so that's the convolution part of the CNN. But then, remember, after the convolutional layers we have to flatten all the pixels obtained by the different series of convolutions that were applied, and by flattening all the arrays of pixels we get one huge vector that becomes the input of a classic artificial neural network. That's where we get our fully connected layers, and therefore our full connections. So now we have to create two new variables, because we're going to have one hidden layer in this classic artificial neural network that comes next, and therefore we need one full connection from this huge flattened vector to the hidden layer, and a second full connection between the hidden layer and the output layer, composed of the output neurons that are the Q-values. So let's create these two full connections, and then we will define all of these connections. As in the self-driving car code, we're going to call them self.fc1 and self.fc2. All right, so now we have all our variables, and what we have to do next is, of course, to define them with the classes of the nn module.
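Here is a sketch of where we are at the end of this step: the class declaration, the inheritance from nn.Module, the activation of the inheritance with super, and the five variables we just named. The actual layer definitions are filled in in the next step.

```python
import torch.nn as nn

class CNN(nn.Module):

    def __init__(self, number_actions):
        # Activate the inheritance so we can use all the tools of nn.Module
        super(CNN, self).__init__()
        # The five variables named above will be defined next:
        # self.convolution1, self.convolution2, self.convolution3, self.fc1, self.fc2
```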
DOOM - STEP 4
So let's start with the first one. Convolutional one applies
convolution to the input images so that's the original images
and now you're going to see how everything will become so
simple to create this convolution. Well what we have to do is
actually create a subject of some specific class and this
class is taken from and then and then the classes come to
the top because we're working with 2D images and now as
you can see we need to put several arguments. First one is
in channels. Let's put it in channels. The second one is our
channel. The third one is Kerno size and the rest of them are
stride padding the dilation groups and bias. And we have
different values for all these ones. So we're not going to buy
them, we're going to keep the default values. But what's
important is these three arguments in channels and
channels and kernel size and so do I guess what they
correspond to. Well very simply in general correspond to the
input of the convolution and all channels correspond to the
output of the conclusion. So what is it going to be? Well very
simply that's going to be the number of channels in our
images. And actually we are going to work with black and
white images because basically we don't have colors to
recognize the monsters. The AI is totally capable of
recognizing the monsters in black and white. So we don't
see the colors at all, just recognize them by their shape.
Therefore we're going to use one channel so one channel is
when you have black and white images and three channels
is when you have called images. And therefore since we're
working with black and white images in channels it's going
to be equal to one then our channels so our channels it's
going to be equal to the images you want to have in the
convolutional there which is the output of this constitution
one. And so basically this is equal to the number of features
you want to the text in your original images because who
will create one image per feature we want to detect because
basically you know how it works. We applied one feature
detector to the input image to detect a specific feature in
the input image and therefore the number of output images
here is the number of features we want to detect. So now
the question is how many features do we want to detect.
Well a common practice is to start with 32 feature detectors
and so that will lead us to 32 percent images in this first
convolutional layer so the input is one black and white
image a real image and the output in the first convolutional
there is 32 processed images and by processed I mean of
course that the conclusion was applied to the input image to
get 32 new images with detected features. And then we
need to specify a kernel size which is nothing else than the
dimensions of the square that will go through the original
image. And in common practice we use either to buy two or
three wide three or five by five. And for the first one we're
going to use a five by five feature detector that is a feature
detector that will have five by five 10 engines. And then we
will reduce the size of this kernel for the next convolutional
layers. And speaking of it this is exactly what we're going to
do now. We are going to copy this to define the second
convolution and therefore I'm basing that here and now it's
very funny and very easy. It's like a domino. The input
channel of the second convolutional layer is the output
channel of the first convolutional layer there. So this
number of output 32 Here is the same number of inputs 32
here. And that's because we have 32 images in the input
convolutional layer of the second convolution. And so the
second convolution is applied to this second convolutional
layer to return a third convolutional layer. And so now the
question is how many new images do we want. Well same
as creating 32 new images 32 is actually a very common
number in convolutional neural networks if you look at the
architectures you will find 32 in many of them. And then for
the kernel size Well we need to reduce the kernel size that is
the dimensions of our feature detector. And so now we're
going to go from five to four or even three and then we'll go
even smaller. All right so our second convolution is ready. It
takes as inputs 32 process images. Each one is a first
feature of the original input image and it creates 32 new
images. Thanks to this reduced dimensions of the feature
detector. And so now let's push this even more. So I'm
copying this and pasting that here to create a third
convolution to detect some features. And so now that's the
same as the input channels. Here is the number of input
images at the left of the deconvolution connection and that
is the number of process images that went to the right of
the previous convolution connections. That's 32. Therefore
we keep sorry to hear. That's perfect. And now the question
is again how many new images do we want to detect. We
are going to take 64 and therefore 64 outputs process
images. And of course now we take a smaller kernel size
and we're going to take two. And so that's a very classic
convolutional architecture there and it's very efficient to
have a high level of feature detection inside images. All
right and so now that we have our three convolutional
there's are to our three convolution connections here. Well
now it's time to get our toothful connections that I remind
we'll take this huge vector that we obtain after flattening all
the 64 times 32 times 32 again images that we got from all
these convolutions so we flattened all the pixels of these
images and we can one huge vector that will become the
input of a new fully connected neural network. And so that's
when we have to make these connections between first this
huge vector and a hidden layer and then a second full
connection between the hidden layer and the output they're
composed of the output neurons. Each one corresponds to a
cube value of the possible actions. So let's make these two
connections. You know how to do that. That's exactly what
we did for the self-driving car. So let's do that again. Well
first we take our Maggio then we take Lynnie our class
because again the connection we create is an object of the
ruling class. And then in parenthesis. Well that's the same
for putting the input features that is the number of them
then the output features. And so the input features for the
first full connection are what it is going to be. Well that's
going to be equal to the number of pixels there are in this
huge vector change after flattening all the process images
after the three convolutions. And so what is this number?
Well actually there is a trick here. This number is actually
hard to get. We actually need to make a function to
compute that number. We don't have a variable that will get
us this number. We have to compute it and therefore what
we're going to do now. And now it's very important to
understand the mindset of programming that we must have
and try to bring to you the mindset that is what you must be
thinking right now to overcome this obstacle because the
first time you might say hey I don't have this number of
neurons in the Flaten vector. What should I do? I'm stuck
here.
Well no actually because what you can do now is simply
input any name here that will represent the number of
neurons so uncommon that number neurons number of
neurons and then we will simply make a function that will
return and this number of neurons variable. This number of
pixels we're looking for. So we can totally do that. We can
totally put this very vocal. Of course we will get a warning
because it doesn't exist yet but we will create it afterwards
with a function. And we are totally allowed to do that even if
the function comes afterwards. That's a typical
programming thinking you must have when you get that
kind of obstacle. Well you can make a function to get what
you're missing. All right and then our features and our
future. That's the number of neurons in a hidden layer and
that this time is up to you. That depends on the architecture
of the new network you want to create. And so a good
number would not be a small number. So for example 40
neurons might be fine. We can try to increase that. If the
training is not too slow you can try to increase that. Maybe
that will improve the predictions but let's start with 40.
Maybe we'll increase that afterwards. All right that's it for
All right, that's it for the first full connection. Then we copy this and paste it here for the second full connection, that is the connection between the hidden layer and the output layer. And so the in features here become the out features of the previous layer, and that is 40, so here we can put 40. That's of course the number of neurons in the hidden layer. And the out features are going to be equal to the number of output neurons of the neural network. And since each output neuron corresponds to one Q-value, and one Q-value corresponds to one action, well, the number of output neurons here is of course the number of actions, and we have one variable for this, which is number_actions. Therefore here we input number_actions, and there we go, congratulations: we just defined the architecture of our neural network. Our neural network is composed of three convolutional layers and one hidden layer, all this in one big CNN, and this CNN will detect the features in the game so that the AI will know what it has to do, where it has to go and when it needs to shoot. That's the first very important step done. Now we're going to move on to the next step, which is of course to get this number of neurons that is still missing. That's actually why we have the warning here on number_neurons, but no worries: now we will make a function that will return the number of neurons in this huge vector, and we will put that number in a variable called number_neurons.
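To fix ideas, here is a minimal sketch of what the init function of the brain could look like once this step and the next one are done. The 40 hidden neurons and the number_actions argument come straight from what we just did; the channel counts and kernel sizes of the three convolutional layers are assumptions made for the sake of the example, and count_neurons is the helper function we will only write in the next step.

import torch.nn as nn

class CNN(nn.Module):
    def __init__(self, number_actions):
        super(CNN, self).__init__()
        # Eyes of the AI: three convolutional layers (channel counts and kernel sizes are assumptions)
        self.convolution1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5)
        self.convolution2 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3)
        self.convolution3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=2)
        # Full connection 1: the huge flattened vector -> hidden layer of 40 neurons
        # (count_neurons is the function written in the next step)
        self.fc1 = nn.Linear(in_features=self.count_neurons((1, 80, 80)), out_features=40)
        # Full connection 2: hidden layer -> one output neuron (one Q-value) per action
        self.fc2 = nn.Linear(in_features=40, out_features=number_actions)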
DOOM - STEP 5
So now the next step is to make that count_neurons function which will give us what we want, that is the number of neurons in this huge vector after the convolutions are applied. That's the only missing information we need right now, and we are going to get it with a function. So let's make this function; we are going to call it count_neurons, very simply. And what is this count_neurons function going to take as arguments? Well, it is going to take the object self, but then it's going to take something else, because this number of output neurons in the flattening layer actually only depends on one thing: the dimensions of the original input image, the one that goes in at the very beginning of the neural network. And so the only argument we need right now is the dimensions of the input images. Therefore let's give a name to this argument representing the dimensions of the input image, and we're going to call it image_dim. All right. And I can tell you right now that the actual dimensions of the input images coming from Doom are going to be 80 by 80: we're going to reduce the size of the original images to 80 by 80 and that's going to be the format of the images going into the neural network. So image_dim is actually going to be 1, 80, 80, and the 1 corresponds to the fact that we're working with black and white images, that is with only one channel. So image_dim is going to be equal to the tuple 1, 80, 80. All right, so that's the only argument we need. And now let's count the neurons.
So how are we going to do that? Well, first of all we don't actually have any input image right now; we don't have any Doom image that we can import, we're going to do that later. So the first thing we have to do is create a fake image with dimensions 1 by 80 by 80. We're going to create that fake image with fake pixels, and that will still give us the number we want in the end, because that number only depends on the dimensions and not on the pixels inside the image. So this just creates a fake image to start with, and then we will compute the number of neurons that we want. The trick to create a fake image is the following: we are going to call it x, first of all, and then we are going to use torch.rand, because you know we're going to put some random pixels in this image, so we're using the random function from torch, which is the rand function. Then inside we input, as you can see, the dimensions of the image, that is 1, 80, 80. But since we're going to put this image into the neural network, and as you remember the neural network can only accept batches of input states, here batches of input images, we are going to add a fake batch dimension directly in this rand function: we just start with a 1 that will correspond to the batch, and then we put the 1, 80, 80 corresponding to the dimensions of the input image. And as you understood, these dimensions are contained in the image_dim argument, which represents the tuple 1, 80, 80, so we just need to add image_dim there. But in order to pass the elements of the tuple as a list of arguments of the function (because right now image_dim is a tuple), we need to add a star right before image_dim. That star will allow us to pass the elements of image_dim as a list of arguments of the function, and as you can see that's exactly what is specified here with the star and image_dim. All right, so that will create an image of fake pixels that has nothing to do with the Doom images, but again we'll still be able to get the final number of neurons. And now the last thing we need to do is to convert this input into a torch Variable, because this is going to go into the neural network. All right.
So this now represents an input image of random pixels that was just converted into a torch Variable, and it will now go into the neural network, more specifically into the convolutional layers of the neural network. Since we only need the number of neurons after the convolutions are applied, we will just go up to convolution 3, so right up to the third convolutional layer, and we will not go into the two full connections here. That's because the number of neurons that we want is between convolution 3 and fc1. All right, so now that we have one input image with the right dimensions, it's time to propagate this image into the neural network to reach the flattening layer; then we're going to get the neurons in the flattened layer, and we'll just get the information that we want, that is the number of neurons in this flattening layer. So now we have to do exactly what we would do in a forward function: we need to propagate the signals into the neural network, but only in the convolutional layers, until we reach the flattening layer. So let's do this. We're going to update x: now x is the input image, and with the second x here, x will become the first convolutional layer. And what we have to do is a three-step process: the first step is to apply the convolution to the input images, the second step is to apply max pooling to the convoluted images, and the third step is to activate the neurons in these pooled convoluted images. Then x will become this first convolutional layer composed of all these pooled convoluted images. So let's do this first step and apply the first convolution, convolution 1, to the input images. What we do is take our convolution1, self.convolution1, and apply it to our input images, which so far are represented by x.
So that's the first thing that is done. Now, second step: we are going to apply max pooling to our convoluted images returned by convolution1(x), and to apply max pooling we're going to take a function from the functional module. So we take our shortcut for the functional module, F, and then we use the function max_pool2d. That's the one. We put self.convolution1(x) in the parentheses of max_pool2d because we apply the max pooling to the convoluted images. But this max pooling function takes additional arguments, the first of which is the kernel size.
So again, that's the size of the window sliding through your images, and it will take the maximum of the pixels in each slide. That will still detect the features, because the features are associated with high pixel values in the arrays, as you saw in the intuition lectures. So the first argument we need to input is this kernel size, and we're going to take 3; that's a common choice for the kernel size. Then we need to put the stride, you know, by how many pixels the window is going to slide over the images, and we are going to take a stride of 2. Again, that's a common choice. So there we go, the second step is done. Now let's move on to the third step, which is to activate all the neurons in these pooled convoluted images of this first convolutional layer, and to do this we are again going to apply a function to all this. So here we take the F shortcut again, because we're going to take another function, which as you might have guessed is going to be an activation function. But which one? As usual it's going to be a rectifier activation function, and maybe you remember its name: relu. There we go, that's the one. And so we apply relu to our pooled convoluted images, that is to all this. All right, and that's it, three steps done. That was very quick.
So remember, the way we have to read this line is: first we apply the convolution to our input images, then we apply max pooling to the convoluted images obtained with the convolution, and then we activate the neurons in all this pooled convolutional layer with the rectifier activation function. So perfect, we get our first convolutional layer, on which we applied max pooling and in which the neurons are now activated. And basically what this line does is propagate the signals from the input images to the first convolutional layer, on their way to the next one. And speaking of the next one, that's exactly what we're going to take care of right now: we're going to do the same thing as we just did for the first convolutional layer, on the second convolutional layer, to again propagate the signals further into the neural network by activating the neurons of the second convolutional layer. But before doing this we need to get this second convolutional layer, and so we are going to apply convolution2 to x (which is now the first convolutional layer) to obtain the second convolutional layer, after which we will max pool it and finally activate its neurons. So let's do this; it's actually very easy: we just copy that line and paste it below. Of course we need to replace convolution1 by convolution2, and there we go, that's done, you see, very easily. And now, with this line, we propagate the signals from the second convolutional layer to the next one, which is going to be the third convolutional layer. To get this third convolutional layer we need to do that again, so I'm copying this, pasting it below and replacing convolution2 by convolution3, and that's done. Isn't that practical? We propagate the signals through the three convolutional layers in a flash, thanks to this awesome structure. All right, so perfect: now we have our signals propagated up to the third convolutional layer, and after that comes what we're looking for, what we're interested in: the flattening layer.
So now that we have our third convolutional layer, that's the last x here, it's time to get our flattening layer, and that's exactly what we're going to do now. We're going to flatten all the pixels of this third convolutional layer; that is, we're going to take all the pixels of all the channels of the third convolutional layer and put them one after the other in a huge vector. And of course this huge vector is going to be nothing else than the flattening layer, and at the same time we will use a trick to get the number of neurons in this flattening layer, which is what we're looking for, the number of neurons we're missing. Therefore let's directly return what we want: in this return we're going to flatten the third convolutional layer and, at the same time, get the number of neurons in this flattening layer. So we're going to take x, which is our third convolutional layer, we're going to take all the channels of this third convolutional layer, and we're going to use the size function to count all the pixels of all these channels once they are put one after the other in one same huge vector. And the trick, you can find it in the PyTorch tutorials. First we take the data of x, because x is a special structure, you know, it's a torch Variable, so it has a pretty complex structure, and we need to access its data. Then we need to view what's inside of it, so we use this view function with the arguments 1 and -1; you would have to dig into the structure to understand exactly why, but you can just remember that this is how we flatten it. And then, to finish, we add size in parentheses, and inside we input 1. So basically what we do here is take all the pixels of all the channels, put them one after the other in this huge vector which will be the input of the fully connected network, and the size function gives us the number of elements of this vector. And with this we get the number of neurons that we're looking for.
All right, so now we get what we want, and finally we can replace number_neurons here by what is returned by this function when it is applied to the format of the images, that is 1 by 80 by 80. So what we have to do now is replace number_neurons by the count_neurons function applied to the format of the images, which will be the tuple 1, 80, 80. And there we go. Of course we don't forget self, because count_neurons is actually a method of the CNN class, so we need to add the self. And now the warning should disappear. There we go, now everything is good: we get the architecture of the neural network with nothing missing, and we have this count_neurons function in case you want to try some other architectures and don't want to count the number of neurons manually. You just use this function, apply it to the format of your images, and it will directly get you what you want, that is the number of neurons in the flattening layer, without having to do anything, whatever the architecture is. That's practical. And now we're done with the first big important step of this brain that we're making, and we have one last step left, one last function to make, which is going to be the main forward function. With it we are going to propagate the signals from the beginning of the brain, that is from the eyes of the AI, to the output layer, that is after the second full connection.
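Putting this step together, here is a minimal sketch of the count_neurons method we just built, assuming the older PyTorch API used throughout this course (torch, torch.nn.functional imported as F, and torch.autograd.Variable) and the three convolution layers defined in the previous step.

# Inside the CNN class; assumes: import torch, import torch.nn.functional as F,
# and from torch.autograd import Variable
def count_neurons(self, image_dim):
    # Fake image of random pixels with the input dimensions (1, 80, 80);
    # the extra 1 in front is the batch dimension expected by the network.
    x = Variable(torch.rand(1, *image_dim))
    # Propagate it through the three convolutional layers only:
    # convolution -> max pooling (kernel size 3, stride 2) -> relu activation
    x = F.relu(F.max_pool2d(self.convolution1(x), 3, 2))
    x = F.relu(F.max_pool2d(self.convolution2(x), 3, 2))
    x = F.relu(F.max_pool2d(self.convolution3(x), 3, 2))
    # Flatten all the pixels of all the channels and return how many there are
    return x.data.view(1, -1).size(1)

Called as self.count_neurons((1, 80, 80)), it returns the size of the flattening layer, whatever the architecture.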
DOOM - STEP 6
This is the forward function that will propagate the signals through all the layers of the neural network, including the three convolutional layers and the fully connected layers. And so this function is the forward function exactly like for the self-driving car, except this time we have to propagate the signals through the convolutional layers before the fully connected layers. The good news is that we already did that in the previous step with the count_neurons function, so we already have the code to propagate the signal through the convolutional layers, and this will be very quick: we'll just combine what we did here and what we did for the self-driving car, and we'll get our forward function for our brain. So let's do this. We introduce a new function here, the last one for the brain, and this function is the forward function, which takes as arguments, well, exactly like before, self to refer to the object, and x, which will first be the input images; then, you know, x will be updated as the signal is propagated through the neural network. All right. So colon, and then let's go inside the function. As I just said, we already made the code to propagate the signals through the three convolutional layers; that's exactly these three lines of code. So I'm copying them and pasting them here, and there we go, we already have our propagation of the signal through the three convolutional layers. And so now we just need to propagate the signal from the convolutional layers to the hidden layer and then eventually to the output layer, at the very end of the neural network.
To do this, we first need to flatten the third convolutional layer that we obtained here. Remember, x at first is the input image; then here x becomes the first convolutional layer, then x becomes the second convolutional layer, and then here x becomes the third convolutional layer. So right now, at this stage, x is the third convolutional layer. And to get the flattening layer we need to flatten this third convolutional layer x, and to do this we are going to do something quite similar to what we did here, only this time we don't need the number of neurons, we simply need to flatten the channels of the third convolutional layer. So this will be a bit simpler but very similar. To do this, well, we take x again, because x is going to become the flattening layer; we're just updating x. So x equals, then we take x again, but this x is the old x, that is the third convolutional layer. So we take the third convolutional layer, and then we take the view function, to which we pass two arguments. The first one is x.size(0): so again we use the size function, to take all the pixels of all the channels of the third convolutional layer and put them one after the other in this huge vector that is going to become this x here, and this x will then become the input of the fully connected network. But that's not all we need to pass: then comes a comma and -1. That trick, you can find it in the PyTorch tutorials; that's how you can flatten a convolutional layer composed of several channels by using the view and size functions. And of course, if you want more details on how this works, you can go to the PyTorch tutorials; I will provide the link. So now that we have our flattening layer, well, you know, this flattening layer is going to become the input of a classic fully connected network with a simple linear transmission of the signal. And so now we're not going to use a convolution function to pass on the signal; we're going to use a linear transmission with the Linear class, and then, to break the linearity (because we're working with images, and images have non-linear relationships), we're going to use a rectifier function to be able to learn these non-linear relationships.
So let's do this; this is actually the next step, and it's exactly like what we did for the self-driving car. We take x, because we want to update it again: we want to get the hidden layer now. And so first what we do is take our full connection fc1, because fc1 is the one that connects the flattening layer to the hidden layer, and therefore we need to take fc1 and apply it to the x that we have right now, which is the flattening layer. And we don't forget the self, of course, because fc1 is a variable of our init function. So self.fc1(x), and that passes the signal on linearly from the flattening layer to the hidden layer. But now we need to activate these neurons while at the same time breaking the linearity, and that's exactly what we do with the rectifier activation function. So now what we have to do is take our functional module, and from this functional module we take of course our rectifier function, that is the relu, and we put self.fc1(x) inside the parentheses. All right, so what happens in this line of code is that first we propagate the signals from the flattening layer to the hidden layer of the fully connected network, and then we activate the neurons of this hidden layer, breaking the linearity with this rectifier activation function, and we get our hidden layer. That is perfect. And now we have one last step to do: of course, to propagate the signal from the hidden layer to the output layer with the final output neurons. And to do this, well, that's very simple, that's exactly like what we did with the car: we take our second full connection fc2 and we apply it to, of course, the neurons of the hidden layer, that is the current x. So we put this x here, and x then becomes of course the output neurons of the output layer containing the Q-values. And finally we simply return the output neurons, that is x, with the Q-values. So perfect. Congratulations, we just made a brain; we just made the brain of our AI, with the eyes and the rest of the brain cells. So congratulations. Now it's time to make the body, that is, to define how we're going to play the action after all the signals are processed in the brain. So that's our second big step.
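As a recap of this step, here is a sketch of the forward method we just assembled, reusing the three convolutional lines from count_neurons (same assumptions as before about the PyTorch version and the F shortcut).

# Inside the CNN class
def forward(self, x):
    # Eyes of the AI: the three convolutional layers
    x = F.relu(F.max_pool2d(self.convolution1(x), 3, 2))
    x = F.relu(F.max_pool2d(self.convolution2(x), 3, 2))
    x = F.relu(F.max_pool2d(self.convolution3(x), 3, 2))
    # Flatten the third convolutional layer into one huge vector
    x = x.view(x.size(0), -1)
    # Hidden layer, with relu breaking the linearity
    x = F.relu(self.fc1(x))
    # Output layer: one Q-value per action
    x = self.fc2(x)
    return x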
DOOM - STEP 7
All right, so we just made the brain, and now let's make the body. As you understood, the body is the part where we define how the actions are going to be played, like what happens for the real human body: you know, you have the brain that sends signals to the body, and then your body plays the action. Well, it's the same here: we have our signals coming from the brain, we get the output signal with the forward function here. You know, first what happens is that we get the images, the images go into the eyes of the neural network composed of the three convolutional layers, and then with the fully connected layers we get the output signal from the brain, which contains the Q-values. But then this output signal should be forwarded to the body, and the body will play the action. And so that's exactly the part that we're going to tackle right now: we're going to implement the way the body will play the action, and the way it will do it is with the softmax method, exactly like for the self-driving car. I insist that the softmax method is highly recommended for playing the action with the body of the AI, and therefore that's the one we're going to go for. But as opposed to the self-driving car, we're going to make a class, and this class will of course correspond to the body of the AI. Therefore let's start by introducing a class here that we're going to call SoftmaxBody, like this. I don't want to call it Softmax only, because Softmax is a class of PyTorch from the nn module, so it's dangerous to call it that way. Therefore I'm calling it SoftmaxBody; now it's very clear that our CNN convolutional neural network is the brain and SoftmaxBody is the body of the AI. So SoftmaxBody, and let's inherit from nn.Module. I don't think we're going to use it, but anyway we can still inherit from it, you know, in case you want to improve this SoftmaxBody class and want to use some tools from the nn module: well, you will be able to do it thanks to this inheritance. But at this point I don't think we will be using any tools from the nn module. So colon, and let's go inside the body.
All right, so first, as usual, we are going to start with our init function to define the variables and the features of the body of the AI. And actually, just as for a human body, a parameter that can define it is the temperature, and that's going to be the only one. So it's a simple body, but still, using this temperature parameter will do a lot for us. OK, but before the temperature let's not forget the self for the object of the body, and now we can input the temperature T, which is the same parameter as the one we used for the car. OK, and then colon, and let's define our variables. So since we inherit from nn.Module, we're going to use the super function again. And so let's be efficient, let's copy this, and let's paste it right here. And of course let's not forget to replace CNN here by SoftmaxBody. There you go. Now I suppose it has become a reflex for you to use the super function at this stage. And then what we have to do is of course set a temperature variable with self.T, and that will be equal to the argument that will be input when creating an object of the SoftmaxBody class. I remind you that whenever you create an object of this class, you have to put the arguments that are in the init function, and therefore there is T, and then the variable of your object, attached to the object, self.T, will be equal to this T, which is the argument that you will input. All right. And that's it for the init function; that's actually all we need. So I guess we're ready to move on to the next function of the SoftmaxBody class, and this is going to be the last one: there are only two functions in this class, the init function and the next one, which will be implemented in the next tutorial, and that is the forward function. And why forward? That's because right now we have to forward the output signal from the brain, that is, you know, the Q-values contained in the output neurons of the output layer, to the body, which will play the action. So we are forwarding the output signal from the brain to the body that will play the action: move forward, go left, go right, turn left, turn right or shoot.
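As a quick recap, the init function of this SoftmaxBody class boils down to a few lines; the forward function is added in the next step.

import torch.nn as nn

class SoftmaxBody(nn.Module):
    def __init__(self, T):
        super(SoftmaxBody, self).__init__()
        # Temperature parameter controlling how much the body explores the other actions
        self.T = T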
DOOM - STEP 8
We're going to make the forward function which will propagate the output signals of our brain to the body of the AI, so that it will play the right action to reach the vest. Well, there is no right action yet, because we have not trained the AI yet, but this is exactly what we will do in Part 2, implementing deep convolutional Q-learning, which by the way I will rename Training the AI with Deep Convolutional Q-Learning. But right now we need to forward the signal from the output layer of the brain to the body, and that's exactly what we're going to do with this forward function, which is the last function of our body. So let's do this. We start with def forward, and according to you, what arguments is it going to take? Well, it's going to take self first, and then there is another one. Well yes, there is; and what is it going to be? Well, very naturally, we want to forward the output signal of the brain to the body, and therefore the input will be the output signal of the brain. So now we need to give a name to these output signals, and I'm going to add here the argument outputs. All right, so that corresponds to the output signals of the brain after the input images are propagated through the whole brain to reach the output layer, which is the x here returned by the forward function of the brain. And now this output signal of the brain will be forwarded to the body with this new forward function that we are making in this class. So let's do this, let's add a colon here, and now, as you understand, we're going to use the softmax method to play the action. That means that the body of our AI, after receiving the output signals of the brain, will play the actions with the softmax technique.
So basically, now what we have to do is exactly the same as what we did for the car: we're going to get our distribution of probabilities, that's the first step, and then we're going to sample an action according to this distribution of probabilities. So basically what we could do now is get our self-driving car file and copy-paste what we implemented for the select_action function. But let's do it again, it will be good practice, and actually you can try to type it before me. OK, so first what we're going to do is get our probabilities. Remember, this is a distribution of probabilities over the Q-values, which depend on the input image and each action. We have one Q-value for each of the six or seven possible actions, and therefore we get a distribution of seven probabilities. I say seven because I think there are seven actions instead of six: besides moving forward, left, right or shooting, we can also run, and that makes seven possible actions. So we get a distribution of seven probabilities, one for each Q-value associated to each action. So probs equals. And now remember what we had to do: basically we have to use the softmax function from the functional module. That's very simple: we take our functional module first, then the dot, and then we take our softmax function. Here it is. We press enter, and now we put the arguments of the softmax function, which I remind you are the elements for which you want to create a distribution of probabilities. And that's of course the Q-values, that is the outputs of the neural network; that's the output of the neural network for which you want to create a distribution of probabilities. Now, remember why we want this distribution of probabilities: to be able to explore the different actions instead of directly picking the one that has the maximum Q-value. If we directly pick the one that has the maximum Q-value, well, we don't explore the other actions much and we might miss something. But with this softmax method we can do some more exploration and therefore maybe find some hidden solutions that might be much better. So again, I highly recommend softmax. And now what we have to do is input the Q-values, that is our outputs here, the outputs of our brain, so outputs, there we go. But then we have this temperature parameter that we can configure to customize the exploration. Remember that the higher we set the temperature, the less exploration of the other actions we will do, because the best action will be selected with a higher probability as opposed to the other actions, which will be selected with lower probabilities. That's exactly like with the car, and therefore we have to multiply the outputs here by our temperature parameter. So there we go, perfect. Now we get a little warning because we haven't used probs yet, but we are about to use it now, and that brings us to the next thing we have to do.
How are we going to use these probabilities? Well, we're going to sample the final action to play from this distribution of probabilities, and therefore what we have to do now is use the multinomial function to sample the action according to this distribution. So now we're ready to get our actions. I'm creating a new variable here, actions, because that will contain the actions that will be played by the body of our AI. And now we take our distribution of probabilities, to which we add a dot and then the multinomial method. All right, and now we get our final actions to play, sampled from our probs distribution. OK, perfect. So now we are ready to return what we wanted, that is the action to play, and that is of course actions; and now the warning should disappear, since we use everything we defined. There we go. Perfect. So now the forward function is ready, and congratulations, the body is also ready. So now we have our brain, we have our body, and therefore we're ready to assemble them to make the full AI. Our AI will be composed of nothing else than a brain and a body, and so it will have intelligence, and a body to play the actions, which will be the right actions to play thanks to its intelligence. But remember, before that we have to train its intelligence, and that's what we'll do in Part 2: training the AI with deep convolutional Q-learning. All right, so let's make the AI in the next tutorial. It's again going to be a class with two functions, I think, and it will take two or three tutorials.
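Before moving on, here is a sketch of the forward method of the body we just wrote, assuming the older PyTorch API used in this course; in recent PyTorch versions you would pass dim=1 to softmax and num_samples=1 to multinomial.

# Inside the SoftmaxBody class; assumes: import torch.nn.functional as F
def forward(self, outputs):
    # Distribution of probabilities over the Q-values, scaled by the temperature
    probs = F.softmax(outputs * self.T)
    # Sample the action to play from this distribution (keeps some exploration)
    actions = probs.multinomial()
    return actions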
DOOM - STEP 9
So now that we have a brain and a body we can make the AI: of course, we're going to assemble them, and you're going to see how things are now all going to make sense. You're going to understand why we had to make these classes, or more specifically the purpose of making these classes, and mostly you're going to get a sense of how you can play with objects to create new, bigger objects. As you understood, we already created two classes, the brain and the body; then, by creating some objects of these two classes, we can make a bigger object, which will be the AI. And that's exactly what we're going to do now: we are going to make the AI by assembling the two classes that we created, the brain and the body. So let's do this, it's going to become super intuitive. Now I'm going to introduce a new class, which of course I'm going to call AI, because now we're ready to make the final AI. And right here I'm going to start with the init function, and what is this function going to take? It is right now that things will make a lot of sense. Well, first we get the inevitable self for the object, but then the two arguments are going to be nothing else than a brain and a body. Those are the two elements that we need to build an AI: a brain, the neural network, and a body, which will play the action with softmax. And so now, very simply, what we have to do is define the variables of our object, because right now the brain and the body are just arguments. So let's do this: we have to make two variables. Obviously, the first one is self.brain, which will be equal to the brain argument, but which later will be a brain object created from the CNN class; and then the second variable, self.body, which will be equal to the body argument right here, but which in the future will of course be an object of the SoftmaxBody class. And so now it is as simple as that: thanks to this pretty awesome structure that we made here, we build an AI. It's not trained yet, not intelligent yet, but still, we have a brain and a body.
So that's perfect. And now the only thing we have left to do is make a big forward function that will do the whole propagation. Since our AI is now assembled, we're not going to use the two forward functions separately, you know, the one of the brain and the one of the body; we are going to make a big forward function, which will be our next function, which will take the images as input, then propagate the signals in the brain (and for this we will of course use the first forward function), and then, once we get the output signals of the brain, forward these output signals into the body with the forward function here that uses the softmax technique. And then eventually we will return the actions to play. And it is just now that we can make this big forward function, because we have assembled the brain and the body. And so we're not going to call this new function forward; we're going to use the __call__ function, which, like the init function, is an existing function, and with it we'll call the two forward functions here, from the brain and the body, to propagate the signal from the very beginning, with the input images, to the very end, with the actions to play. And that's why the next function will be this __call__ function that will combine the two forward functions of the brain and the body. That's the only thing we have to do, and then we will have our AI ready.
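The init function of this AI class is as simple as it sounds; here is a minimal sketch (the __call__ function comes in the next step).

class AI:
    def __init__(self, brain, body):
        self.brain = brain  # an object of the CNN class
        self.body = body    # an object of the SoftmaxBody class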
DOOM - STEP 10
Now the only thing we have left to do is to make this __call__ function that will propagate the signal from the very beginning, when the brain is getting the image, to the very end, when the AI plays the action. So we're going to make this whole function, and that's going to be our last step before we move on to Part 2, training our AI with deep convolutional Q-learning. So let's do this. We are going to take the function __call__, which actually is similar to the init function, that is, it's an existing function, but this time we use it to call some other functions, the ones that we made before, because you know we're going to use the forward function from the brain and the forward function from the body. So we're using this function now basically to call these functions. So __call__ is going to take two arguments: the first one is self, of course, the object; and the second argument, according to you, what is it going to be? Well, we are doing the whole propagation this time, so what we want to take as input are of course the input images, because that's the starting point when the AI is playing the game: it first visualizes the images of the game, then propagates the signals in the brain, and then plays the action. Therefore the second argument is going to be inputs. And now we are ready to make this whole propagation, so let's do it. The first step is to receive the input images from the game, and since these images are going to enter the neural network, well, you can imagine that we have to format them in a special structure, and the structure is of course a torch structure. So the first thing that will happen is that we will convert these images into a numpy array, then we will convert it into a torch tensor, and then finally we will put the torch tensor inside a torch Variable that will contain both the tensor and a gradient. That's for our dynamic graphs, to compute the gradients very efficiently later, for stochastic gradient descent. So that's our first step, and then, once we get the right format for our images, they will be able to enter the neural network, and that's where we'll do the whole propagation of the signals. So let's do this, first by converting the image into the right format.
So our images are the inputs. Now we're going to create a new variable, which I'm calling input; that's the real input of the neural network. And what is this input going to be? Well, first we need to take our inputs, that is our original images. Then, as we said, we want to convert these images into numpy arrays. To do this we can simply take numpy, which has the np shortcut, and then the function array, so we put inputs in the parentheses of the array function. There we go, now it is converted into a numpy array. But then, since the cells of the numpy array will contain the pixels, it is actually safer to specify the float type; it's better to make sure we have floats right now, so we add the float32 type here. All right, so now we still have a numpy array, but with the right type. And that's also for another reason: tensors are by definition arrays of a single type, and so we choose that single type to be float32. All right. Now that we have our numpy array, the next step is to convert it into a torch tensor, and to do this we can use the torch.from_numpy function, which will convert it into a torch tensor. There we go. And now the last step is to put this torch tensor into a torch Variable containing both the tensor and the gradient, and you know how to do that: of course, we take our Variable class, because actually everything that is inside here is the input of the Variable class. But I wanted to show it to you this way, because, you know, we start with our input images, then we convert them into numpy arrays, then into torch tensors, and then into a torch Variable. And now we're good.
They are allowed to enter the neural network, that is first the eyes of the AI and then the fully connected layers, to lead to the predictions. So, speaking of the eyes, that's exactly what we're going to do now: we're going to propagate these images into the eyes of the AI, that is through the three convolutional layers. And to do this, you're going to see now how simple it is; that's because we already have our brain and our body from the init function. We simply need to take our brain, self.brain, and apply this brain to the input images, and that will propagate the signals thanks to the forward function of the brain. And since the forward function of the brain returns the output signals, that is the neurons of the output layer containing the Q-values, well, this self.brain(input) here will return this output signal, and therefore we're going to put what it returns into a variable that we're going to call, very simply, output. This output is the output signal of the brain. And now that we have the output signal of the brain, we have to propagate this output signal to the body, and to do this we're going to use the second forward function, the one from the body. We simply need to take our body and apply it to, of course, the output, because the forward function of the body takes as input the output signals of the brain (so that's exactly what output is right now) and returns the actions. And therefore, since it returns the actions, well, here we are going to write actions equals the body applied to that very output. All right, so now you can see that, very simply, we propagated the signals inside the brain and then from the brain to the body: first by using the forward function from the brain, which takes as input the input images and then propagates them into the brain to return the Q-values, and then we propagate this output signal into the body with the forward function of our body, to get the action to play. And so now the only remaining thing we have to do, and that's the very last line of code of this Part 1, Building the AI, is to return the action to play, and that is actions. However, right now the actions are in torch format and we need to convert them back into a numpy array, and to do this we're going to take the data of these actions and then add here the numpy function, and there we go: now we have the actions returned in the right format.
So congratulations, we are now done with this first Part 1. We built the AI in three steps: first we made the brain, second we made the body, and third we assembled the brain and the body and propagated the whole signal from the eyes to the moment we play the action. That was a huge step, but, as you understood, the AI we built is still stupid: we need to train it to be intelligent. So we need to train it to do what we want it to do, and to do this we're going to use the rewards of the Doom environment, you know, because it's learning from the world by being reinforced when it gets a good reward and by being punished when it gets a bad reward. That's where the Q-learning will come into play.
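Here is a sketch of the __call__ function we just described, assuming the older PyTorch Variable API and that inputs arrives as the preprocessed game images.

# Inside the AI class; assumes: import numpy as np, import torch,
# and from torch.autograd import Variable
def __call__(self, inputs):
    # Images -> float32 numpy array -> torch tensor -> torch Variable
    input = Variable(torch.from_numpy(np.array(inputs, dtype=np.float32)))
    # Propagate through the brain (the eyes and the fully connected layers) to get the Q-values
    output = self.brain(input)
    # Forward the output signal to the body, which samples the action to play
    actions = self.body(output)
    # Convert the actions back to a numpy array before returning them
    return actions.data.numpy()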
DOOM - STEP 11
Now that we built the AI, with the architecture of the neural network, the body, the way the actions are played and everything, it's time to train the AI with deep convolutional Q-learning. So it's from now on that we will implement, you know, experience replay, working with Q-values, working with rewards. And there's even going to be a bonus which will improve the training process a lot, and that is called eligibility trace. Eligibility trace is a powerful technique which consists of accumulating the reward over several steps, and the Q-values are learned on this accumulation of rewards, as opposed to before, where the Q-values were learned after each transition, therefore after getting each reward. This time we will be learning the Q-values after getting several rewards instead of just one. So instead of having one transition after the other and, you know, updating the Q-values each time, well, the Q-values are going to be updated every n steps, because this is called an n-step eligibility trace, and n is the number of steps after which the Q-values are updated in our model. Here we are going to have n equal to 10, so that means there will be a 10-step eligibility trace, and therefore we will update and learn the Q-values every 10 steps.
After accumulating the rewards over these 10 steps, that's a bonus that will make our model even more powerful, and you will see that in the end we will get outstanding results. I was really amazed when I saw the final result. I used to work on models that took a lot of time to execute, you know, the AI took a lot of time to train, but you will see that with this one, plus the neural network that we made (that is our brain, and our body with softmax), we will get a very powerful model and therefore a very powerful AI, because you will see that it will really perform well. You'll see what I'm talking about. So, as you can see, in this Part 2 we are starting by getting the Doom environment, and I actually prepared these lines of code for you; we are just using the image_preprocessing external file from the working directory folder. So basically the idea is to first take this line of code with gym.make, whose argument is the name of the environment of the game we're playing. So first we import the environment with this gym.make; that's what you can find on the Open AI Gym page. But then we use this PreprocessImage class, which is a class from image_preprocessing, to preprocess the images that will come into the neural network: we preprocess them so that they have a square format with dimensions 80 by 80, and that, remember, is because in our neural network we set our input images to have the dimensions 1 by 80 by 80. Remember, 1 is the number of channels, and so 1 means that we're working with black and white images; that's the grayscale here. And 80 by 80 means that the dimensions of our input images will be 80 by 80, and that is what we set in the neural network; of course we then need to specify this when inputting the images, which is exactly what we do here with this PreprocessImage class. And then, after we import the environment with the right format of the input images, we record the whole game with the videos thanks to this line of code, and remember, the cool thing about this is that in the end we'll see the videos of our AI playing Doom: we will see how it kills the monsters, tries to reach the vest, and everything will be super exciting. And remember that these videos will go into the videos folder.
All right. And one last line here, but I want to show it to you because it's important; it's now more related to the AI that we're building. Well, remember that our neural network takes number_actions as input. That's because, you know, we want to make an AI that we can test easily on several environments, several Doom environments, and since the different environments have different numbers of actions, we specified this number_actions variable as the input of the CNN, the brain. And therefore now what we're going to do is get this number_actions variable using the Doom environment that we just imported and created, and later this number_actions variable that we're about to create will be the input of the brain. So let's do this. I'm introducing this new variable, number_actions. number_actions equals, and now we're going to take our Doom environment, that is the variable that we created, doom_env. Then we add a dot, and then, well, here we go, we take action_space: that's the set of the actions. I encourage you to have a look at the Open AI Gym documentation to see how it works, you know, how to open the gym environments, but basically this is the set of actions. And from this set of actions we can access the number of actions in the environment, and to do this we add .n here. And that's the number of actions, and therefore doom_env.action_space.n will return 7, because there are seven actions. I know that we can see six actions for the Doom environments on the Open AI Gym page, but I think we can also run, and so, you know, we can move forward, move left, move right, turn left, turn right and shoot, and besides we can run: that makes seven actions. All right, and that's it for getting the Doom environment. We have our Doom environment, we have the number of actions, so we have so far everything we need for our brain. We will then just create a brain object, which we'll call cnn in lower-case letters, and since the init function takes number_actions as argument, we will put number_actions into the cnn object that we will create; then of course we'll create the body and eventually the AI. And that's why I'm going to call the next section Building an AI, because now we can build as many AIs as we want. That's the awesome thing about object-oriented programming: we can build any AI we want. And so we're going to build our AI with its sophisticated brain, and that's exactly what we'll do in the next tutorial.
DOOM - STEP 12
Now it's time to build our very first AI, because right now we've only written the manual of instructions with the classes, but we haven't created any object yet, and so we don't have a real AI yet. But we are about to get it right now, because we are about to create one object of this AI class, and this object will be nothing else than an AI which has a brain and a body. All right, so let's do this. It's actually very simple to do now that we have defined everything with the classes. Basically, what we need to do is first create a brain, because as you can see, when we create an AI we need to input a brain, but we also input a body, so we need to create a body as well. And then, once we have created a brain object and a body object, we will be able to create the AI. But no worries, we will build the brain and the body in a flash; actually, let's just do it right now. Let's start with the brain. We're going to call the brain cnn, because the brain is a convolutional neural network and it will be an object of the CNN class, so it makes sense to call it cnn. cnn equals, and then we take our CNN class, and what do we put in the parentheses, according to you? Well, at this point, when creating an object of a class, what we have to put is very simply the argument of the init function, and that's number_actions. And thanks to what we did previously when getting the environment, we already have this number_actions, and therefore we simply need to input number_actions here into the CNN class. Perfect, so now we have the brain. Now let's make the body: we're going to create an object of the SoftmaxBody class, and we're going to call this object softmax_body; that will be the body of our AI. And this object is an object of the SoftmaxBody class, to which we have to input
the only argument of the init function of the SoftmaxBody class, which is the temperature T. And therefore we input T, and we have to specify its value, because so far T is just an argument. So T equals, and we're going to start with 1.0. That's a small temperature, but this might work very well, and actually I already know this will work very well. You can try other temperatures, you know how it works now: the higher the temperature, the more sure of itself the body is, meaning the action with the highest Q-value will have a higher probability of being selected, as opposed to the other actions, which will have lower probabilities of being selected and therefore will be less explored. But anyway, we can start with 1.0; this will get us good results. All right, so now we have a brain, we have a body, so I guess it's time to make the final AI. And now you're going to see how simple things become; this is when the intuition reaches its peak. To make an AI, we simply need to create a new object, that we call of course ai, from our AI class. And since the AI is composed of a brain and a body, we input the brain, which is our convolutional neural network, the cnn object, and a body, which is nothing else than the softmax_body object from the SoftmaxBody class. Very fast, and you see, we build an AI in a flash by just putting a brain and a body together. And now it is ready to be trained. So now it's time to launch the whole deep convolutional Q-learning process with experience replay, plus that bonus, the eligibility trace over 10 steps, and eventually, once we have all this, we will train the AI to make it smart. So let's do this. The next section is going to be about setting up the experience replay. We're not going to implement it all over again like for the self-driving car, because the good news is that we already have it implemented, so that will be fast: we'll just create an object of the ReplayMemory class that is in the experience_replay file. So that will help us a lot, and therefore we will move on quickly to the new and most important thing, that is the training.
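In code, this whole step is just three lines, reusing the two classes and the number_actions variable from before.

# Building an AI: a brain, a body, and the assembled AI
cnn = CNN(number_actions)
softmax_body = SoftmaxBody(T=1.0)
ai = AI(brain=cnn, body=softmax_body)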
DOOM - STEP 13
All right, now we have our AI ready to be trained, and the first step of the training is to set up our experience replay. So we're slowly getting to the training, and the good news is that we have an already implemented version of experience replay. Besides, it is adapted to the eligibility trace, which, I remind you, is a technique that, instead of learning the Q-values every transition, learns them every 10 transitions. So basically it's exactly the same as before, but instead of having a single target and a single reward for each step, we're going to have a cumulative target over ten steps and a cumulative reward over ten steps, and we will learn on these 10 steps each time. So we are learning on 10 steps instead of one like before, and this will work wonders for the training process: you know, the training will take much less time thanks to this technique. But we have to specify in the experience replay that we are learning every 10 steps, and that's why this experience replay is not a classic implementation of experience replay like the one for the self-driving car: it is an implementation taking into account this 10-step learning. Therefore you will find two classes in this experience_replay file. One class makes your AI progress during ten steps, so that it can sum the rewards observed over these 10 steps; that's the first class, and we need it because we need to include these 10 steps in the ReplayMemory class, which is the class implementing the experience replay, and that's how we make sure that the memory also takes into account the fact that we're learning on 10 steps. So that's why you will find two classes in this implementation of experience replay, but that's only to take into account that we're learning on 10 steps, and that this must also be taken into account in the memory.
So, speaking of our memory, let's create it. We're going to call our memory memory, and memory is going to be an object of the ReplayMemory class, and the ReplayMemory class is a class of this experience_replay file. And so I first take this experience_replay module, and that's where I take the ReplayMemory class. Perfect. And now, as you can see, we have to put two arguments. The first argument is n_steps, which corresponds exactly to the number of steps on which we're going to learn the Q-values, you know, the number of steps on which we accumulate the target and the reward: we're going to have a cumulative target and a cumulative reward. And the second argument is the capacity, that is the size of the memory. So for example here we can set ten thousand: if the capacity is equal to 10000, that means the memory will have a size of 10000, and therefore we will get a memory of the last 10000 steps performed by the AI. But again, we're not going to learn every transition; we're going to learn every ten steps along these last 10000 steps of the memory, and that's exactly the new feature we introduce here compared to before. Before, we only had the replay memory trick; here we have the replay memory trick plus this trick of learning every ten steps, and we're going to do it in the memory composed of the last 10000 steps. And this experience replay combined with the eligibility trace of 10 steps will considerably improve the training performance. So let's input these two arguments. The first one is n_steps, and that will be equal to... well, for now let's just write n_steps; we will specify what n_steps is right after. It will actually be an object of the other class of this experience_replay file, which is the NStepProgress class, and that allows the AI to make progress during ten steps. And remember, during the 10 steps we will sum the rewards over the ten steps to get the cumulative reward over 10 steps, and that is exactly the eligibility trace.
So now what we have to do is create this n_steps object, and we create it with the second class in this experience replay file, which is the NStepProgress class. So we create n_steps as an object of the NStepProgress class, taken again from our experience replay module. There we go. Now we have to pass three arguments: as you can see, we have to put the environment that we imported, then the second argument is our AI, of course the AI that we built right here, and the last argument is n_step, where we specify that we want 10 steps, that is, to learn every 10 transitions. So let's fill in these arguments. The first one is the environment, and that's doom_env. Then the second one is our AI, and that's the ai we built; this is just the name of the argument of the NStepProgress class, and our ai is the one we created above. And the last argument is n_step, and that is equal to 10. All right. So right now we are just telling the memory that the learning happens over 10 steps, and this learning over 10 steps is called eligibility trace. We're really working on advanced stuff here, but remember, that's because beating Doom is nothing like a piece of cake; we need these advanced techniques to make it work. So now we're almost ready. Before moving on to the next step, which will actually be about implementing the eligibility trace, the only thing left to include is the capacity, and let's say 10,000, meaning the memory will have a size of 10,000 and will contain the last 10,000 steps performed by the AI, which will allow us to generate some mini-batches. As you remember from the sample function, the memory contains 10,000 transitions, but to train the AI we're going to sample mini-batches of series of 10 transitions, not single transitions as before, and we will sample these mini-batches from the memory composed of the last 10,000 steps. All right, so now I guess we're ready to move on to the next step, which is about implementing eligibility traces. We're going to have some adventure here; this will not be a simple implementation.
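As a reference, here is a minimal sketch of what these two objects might look like in the script, assuming the helper classes NStepProgress and ReplayMemory live in the experience_replay file and that doom_env and ai were created earlier; the exact names are assumptions rather than a verbatim copy of the course code.

import experience_replay  # assumed helper module providing NStepProgress and ReplayMemory

# the AI progresses through the environment 10 steps at a time
n_steps = experience_replay.NStepProgress(env = doom_env, ai = ai, n_step = 10)

# replay memory over the last 10,000 steps, built on top of the n-step progress
memory = experience_replay.ReplayMemory(n_steps = n_steps, capacity = 10000)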
DOOM - STEP 14
This step is going to be super exciting because we are getting closer to the A3C algorithm. What we're about to implement, called eligibility trace, or n-step learning (close to n-step Sarsa), is actually one of the algorithms from the asynchronous actor-critic family, except that we cannot call it asynchronous, because we're still going to have only one agent. Still, what we're about to implement is taken from the paper "Asynchronous Methods for Deep Reinforcement Learning", and it is in this paper that you will find the A3C algorithm that we will implement as the final bonus of this course. As I said, we are getting closer to it, because the model we will implement right now is the asynchronous n-step Q-learning variant (but with a single agent), which is almost the A3C that comes after it. The powerful thing about this n-step idea is that we're going to accumulate rewards and learn from a cumulative target over n steps instead of one step as previously, and that's what will make the training perform much better and therefore be much more powerful. We actually have the pseudocode for this algorithm; it's this algorithm right here, so let's click on it, and there we go, that's the algorithm we are about to implement, but remember, with only one agent. Another difference is that in the paper they take an action a_t according to an epsilon-greedy policy based on the Q-values of the current state, whereas in our case we didn't implement an epsilon-greedy policy; we implemented a softmax. The rest is the same, as you can see: we are going to compute the cumulative reward over n steps (remember, n is equal to 10), so we will implement this line of the pseudocode, and we are also going to implement the line where we take the maximum of the Q-values for the state reached at the end of the n steps (the theta-minus here is just the target network parameter). So let's attack this algorithm. In the paper it's called asynchronous n-step Q-learning, but we don't have the right to say asynchronous as far as we're concerned, because we only have one agent. So we can call it n-step learning, eligibility trace, or even something close to n-step Sarsa.
All right, let's do this; it's going to be pretty fun. We can basically follow the pseudocode here, and that's what we're going to do. As you can see, a parameter that we'll need is the gamma parameter, the discount or decay parameter, so we will start by introducing a variable for this gamma parameter and choosing its value. We actually don't need a class for this; we can simply implement the eligibility trace with a function, because we don't really need to create objects for it. A function will be enough, because basically what we want to do is return the inputs and the targets, so that later, when training the AI, we are ready to minimize the distance between the predictions and the targets. To get the predictions we need the inputs, because we are going to apply our brain to the inputs to get the output signals, which will be our predictions. And once we have our predictions and our targets, we will be ready to train the AI by minimizing the squared distance between the predictions and the targets. So that's the whole point of what we're doing right now: we're implementing this function to be able to return these inputs and targets, so that we can be ready for the training and minimize the squared distance between predictions and targets. All right, so as we said, we want to implement a function. We start with def, and we're going to call it eligibility_trace. You could also call it sarsa, or n_step_q_learning, whatever you want, but let's call it eligibility_trace. This function is going to take one argument, which is going to be a batch. Why? Because we're going to get some inputs and some targets, and we're going to train the AI on batches, so the inputs and the targets will come from batches; therefore the argument here is this batch, which will contain the series from which we compute several inputs and several targets. There we go, that's the only argument we need.
Now let's go inside the function and define what it has to do. As we saw in the pseudocode of the paper, we need the gamma parameter, so as we said, we start by introducing this gamma parameter, and we can already assign its value: we're going to choose 0.99, a classic good value for gamma, and no worries, I checked that this is a good value for our AI. The next step is to prepare our inputs and our targets, because that's exactly what we want to return to prepare the training. We can already initialize them as empty lists, because inside the batch we're going to have several inputs, all in a list, and likewise several targets; so we initialize inputs as an empty list, as well as targets. There we go. In the end, this eligibility_trace function will return exactly these inputs and these targets, which by then will of course be filled: we will have several inputs and the associated targets in what is returned by the function. All right, the next step is to start a loop, and that's exactly because we're following the pseudocode of the paper. As you can see, there is this repeat section, and the repeat is exactly a for loop in the code. Inside it we are going to compute the cumulative reward, accumulated over the 10 steps. And how is it computed? Well, if the last state of the 10-step run is not a terminal state, we initialize the cumulative reward with the maximum of the Q-values of that last state; if the 10-step run ended in a terminal state, we initialize it to zero, because there is nothing to bootstrap from. Then we have the second for loop (they don't say repeat here, but it's the same idea): going backwards over the steps, we update the cumulative reward by multiplying it by the decay parameter gamma and adding the reward of the step. So let's do this, let's go back to Python and start our for loop.
What is going to be the iteration variable? Well, that's going to be our 10-step series, you know, our series of 10 transitions, so we're going to call this variable series; it represents a series of 10 transitions, like a sequence of 10 transitions. So: for series in, and then what do you think? Well, our series will come from our batch; those are the batches on which we will train the AI, so: for series in batch, that is, for all the series of 10 transitions in our input batch. Now, what are we going to do? Well, to get the cumulative reward, you can see in the pseudocode that we need the state of the first transition of the series and also the state of the last transition of the series. So what we have to do right now is get these two input states, and we are going to put them into a variable that we are going to call input. We will first put these two input states, the first one of the series and the last one, into a NumPy array; no worries, we will not stay with NumPy, we will of course convert that into a torch tensor, but the first step is to put these two states, the first one and the last one, into a NumPy array. So right here in this array we add the first input, which is the input state of the first transition of the series: that is series, then, to take the first transition, we take index zero of the series, and then we can access the state by taking its attribute, which is .state. That's because in our experience replay file we defined a special structure for each transition, and you know the structure: each transition is composed of a state, an action, a reward, and a done flag. This special structure that we are allowed to use right now comes from the way we defined the transition in experience_replay. All right, so with this we get the input state of the first transition, and now let's also get the input state of the last transition of the series. To do this we can just copy this, paste it, and replace the zero here by the last index of the series, which we can access with the minus-one trick: series[-1].state will get the input state of the last transition of the series. Then we need to put these two elements inside some square brackets, because that's what is expected by the np.array function, and then there is an important thing to do, since we are going to convert this into a torch tensor and a torch Variable. Remember, a torch tensor is by definition a special array containing one single type, so we need to force a single type, and as usual we're going to choose the float type, by adding the parameter dtype = np.float32. Now we can convert that into a torch tensor and a torch Variable. First let's convert it into a torch tensor: remember we can use torch.from_numpy, and we put the array of the two input states inside this torch.from_numpy function; that will convert the array of the two input states into a torch tensor. And now we put this torch tensor into a torch Variable using the Variable class, so input will be an object of the Variable class; as you understood, this Variable class takes the tensor as an argument and creates the object.
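To make the conversion concrete, here is a tiny standalone illustration (a toy example, not the course code): it packs two hypothetical 80x80 grayscale frames into one float32 torch Variable, the same way input is built from series[0].state and series[-1].state. The frame shape is only an assumption for the example.

import numpy as np
import torch
from torch.autograd import Variable

# hypothetical stand-ins for series[0].state and series[-1].state (shape is an assumption)
first_state = np.zeros((1, 80, 80), dtype=np.float32)
last_state = np.ones((1, 80, 80), dtype=np.float32)

# pack both states into one batch, force a single float type, then wrap as a torch Variable
input = Variable(torch.from_numpy(np.array([first_state, last_state], dtype=np.float32)))
print(input.size())  # torch.Size([2, 1, 80, 80])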
All right, so now we should be good: we have the two inputs that we need, that is, the input state of the first transition and the input state of the last transition. And now that we have the inputs, we can get the output signal of the brain of the AI, that is, the predictions, which we're going to call outputs. To get the outputs, well, that's very easy now, because we already have a brain created, which is our convolutional neural network, so we can simply take our brain, cnn, and apply it to the inputs, which will return the predictions, that is, the outputs. As simple as that. And now we are ready to move on to the next step, which is to start computing this cumulative reward. We're going to do exactly what the pseudocode says for this n-step algorithm: we're going to introduce the cumulative reward variable. Let's go back to the paper: as you can see, to initialize this cumulative reward, which is R here, we set it to zero if the series ended in a terminal state, or to the maximum of the Q-values of that last state if it didn't. That is simply an if/else condition, so let's go back to Python. This cumulative reward, as we just saw, is going to be equal to 0.0 if we reached a terminal state, and we can write this condition this way: if series[-1], that is, the last transition of the series, followed by .done, because done is an attribute of this transition structure that we defined in our experience replay file, and this done flag comes from the OpenAI Gym conventions. If we go to the OpenAI Gym website, which I prepared right here (that's the project I really encourage you to have a look at), and then to the documentation on running an environment, you can see that each step returns an observation, a reward, and this done flag, and done means exactly that the episode is over. So we're going to use this done flag for our if condition: series[-1].done means the last transition of the series is over, is completed, and so the cumulative reward is going to be equal to zero if the last transition of the series is done. Else, if we haven't reached a terminal state, the cumulative reward is going to be initialized with, as we said, the maximum of the Q-values. And since this output here is the output of the brain, that is, the predictions of the neural network, and as you know the predictions of the neural network are the predicted Q-values, well, this output contains the Q-values. Since we need to take the max of the Q-values of the last state, we first need to add the index 1, because this output structure contains the Q-values of the two states, and we want the ones of the last state.
Then we need to add .data to access the data, because this output structure, you know, has the special structure of a torch Variable. With this we get our Q-values, and then we want to take the maximum of our Q-values, so we add .max(). And now we get exactly what we want, as in the paper: the maximum of the Q-values for the non-terminal case. Perfect. So now what we're going to do is write the second for loop: over the steps of the series, we are going to update the cumulative reward by first multiplying it by gamma, the decay parameter, which we already have, and then adding the reward. Let's do this; we are actually going to do exactly the same as in the pseudocode, and as you can notice, they start from the right: they're not starting with the first step and going to the last step, they start with the last step, t minus 1, and go back to the start. That's exactly what we're going to do, and that's because we want to end up with the cumulative reward equal to r_0 + gamma * r_1 + gamma^2 * r_2 and so on, up to the last step we loop over (on top of the bootstrapped Q-value for a non-terminal series), where r_0, r_1, r_2, ... are the rewards obtained in each of the steps of the series.
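To make that formula concrete, here is a tiny standalone illustration (a toy example, not the course code) of how the reversed accumulation produces exactly r_0 + gamma*r_1 + gamma^2*r_2 + ..., assuming gamma = 0.99 and a terminal last state, so the bootstrap value is 0.0.

gamma = 0.99
rewards = [1.0, 0.0, -0.5, 2.0]   # hypothetical rewards r_0, r_1, r_2, r_3 of a short series
cumul_reward = 0.0                # 0.0 because we assume the last state is terminal here
for r in reversed(rewards):
    cumul_reward = r + gamma * cumul_reward
# cumul_reward is now r_0 + gamma*r_1 + gamma**2*r_2 + gamma**3*r_3, about 2.45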
DOOM - STEP 15
So let's do this, let's make this for loop starting from the right and going to the left. To do this we write for, and the iteration variable is going to be our step, because we're going to go from the last step to the first step of a series of transitions: for step in, and then the trick to go from the right to the left is to use the reversed function. Now we just need to give it a sequence, and this sequence is of course going to be our series. But as you can see in the paper, we go from t minus 1 down to the start, so we don't start from the very last step, the one we already used for the bootstrap, but from the step before it. So here, to stop at the step before the last one, we add in brackets a colon and minus one: series[:-1]. I'm sure that for those of you who followed the Machine Learning or Deep Learning courses, you know this trick: [:-1] means that you go up to the element before the last element, but not including the last element, and therefore we get the sequence we want. We then wrap it in reversed to go from the right to the left. All right, so we are ready to enter the for loop. Inside this for loop, what are we going to do? Exactly as in the paper: we're going to update the cumulative reward by multiplying it by gamma and adding the reward obtained in the current step, that is, the step of the for loop. So let's do this; going back to Python, we update our cumulative reward the following way: cumul_reward = step.reward + gamma * cumul_reward. We can access the reward this way thanks to the special transition structure; remember, reward is an attribute of the step object. So the new cumulative reward equals the reward of the step we are currently in within the loop, plus gamma times the previous cumulative reward. Perfect.
So now I think we're good; we're following the algorithm thoroughly. Now, time for the next steps, and it's going to become pretty easy. We go back to the outer for loop, because this inner for loop was just there to compute the cumulative reward, going from the right to the left and updating it this way, following the algorithm. As you remember, the goal of doing all this is to get our inputs ready and our targets ready, so that we can minimize the squared difference between the two during training, and so right now the only thing left to do is get these inputs and targets ready. First, we need to add the first state of the series to our inputs list. So far this state was only part of the input variable we used to compute the outputs, so we're going to get this input state of the first step separately, because that's exactly what we need to append to our list. Let's call it state, and exactly as before we can get it by taking the first index of the series, which contains the first transition, and then adding .state to get the state of this first transition. That's the state we need. In the same way, we're going to get separately the target associated with this input state of the first transition, so we introduce a new variable here, target, which will be equal to the Q-values of the first state. Since the Q-values are returned by the neural network, they are contained in output, and since output is the output associated with this input, whose index zero contains the first transition's state, we can get the Q-values of the first state by just taking output[0], and then we add .data; that will simply get us the Q-values of the input state of the first transition, and that is exactly the target. So that's how we take it. Then we are going to update this target variable, but only for the action that was selected in the first step of the series. To access this first step of the series we take series[0], because that is exactly the first step of the series, and to access the action corresponding to this first step we add .action, again using this attribute structure: action is an attribute of the first transition of the series, because each transition of the series has the following structure: state, action, reward, and done. So series[0].action simply gets us the action of this first step, and the target for that specific action of the first step is exactly what needs to be updated with the cumulative reward. So basically here we just write that the target associated with the action that was played in the first step of the series is the cumulative reward that we just computed. All right, and now we're finally ready to update our inputs by appending this first state, and our targets by appending this first target. We only need to use the first step of the series because, you know, we train the AI on 10 steps at a time, so the input is the state of the first step of the ten steps, and the target is computed at this first step as well; we don't collect any inputs or targets for the following steps, because the learning happens once per 10 steps. That's why right now we only get the state and the target of the first step of the series. It's important to understand that, and once we do, we know we have to add them to our list of inputs and our list of targets. So let's do this. First let's append the state to our inputs: we take our inputs list and we use the append function to add the state, which, remember, is the input state of the first step of the series. Then we append the target of the first step to our list of targets: we take our targets list and we use the append function to append this first target. There we go.
Almost done. Now we need to return the last things, which are of course what we wanted from the beginning of this step: the inputs and the targets, now updated. So we add a return here. We return our inputs first, but here's the thing: we need to convert them into a NumPy array first, then do a type conversion to make sure we have a single type, with dtype = np.float32, and then we convert this into a torch tensor, because of course we're working with PyTorch, so that's compulsory; we use the torch.from_numpy function again, and that gives us our inputs. Perfect. Now let's do the same for the targets; here we can use a quicker trick: we're going to stack the targets together, and to do this we take our torch library, because we are going to use the stack function from torch to stack the targets. All right. So this line of code basically returns the inputs and the targets that were just updated through this eligibility trace algorithm, or call it n-step Q-learning if you prefer. Congratulations, we are now ready to do the final training, because basically the training consists of minimizing the squared difference between the predictions on our inputs and the targets.
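Putting Steps 14 and 15 together, here is a minimal sketch of what the eligibility_trace function could look like. It assumes cnn is the convolutional brain built earlier and that each transition in a series carries state, action, reward and done attributes, as defined in the experience replay file; treat the exact names as assumptions rather than the verbatim course code.

import numpy as np
import torch
from torch.autograd import Variable

def eligibility_trace(batch):
    gamma = 0.99                       # discount / decay parameter
    inputs = []
    targets = []
    for series in batch:               # each series is a sequence of 10 transitions
        # states of the first and last transitions, packed as one float32 array
        input = Variable(torch.from_numpy(
            np.array([series[0].state, series[-1].state], dtype=np.float32)))
        output = cnn(input)            # predicted Q-values for both states
        # bootstrap: 0 if the last state is terminal, else max Q-value of the last state
        cumul_reward = 0.0 if series[-1].done else output[1].data.max()
        # accumulate the rewards from right to left, skipping the last transition
        for step in reversed(series[:-1]):
            cumul_reward = step.reward + gamma * cumul_reward
        state = series[0].state        # input of the training sample
        target = output[0].data        # Q-values of the first state
        target[series[0].action] = cumul_reward   # update only the action that was played
        inputs.append(state)
        targets.append(target)
    return torch.from_numpy(np.array(inputs, dtype=np.float32)), torch.stack(targets)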
DOOM - STEP 16
Now that we are ready to train the network to minimize the squared distance between the outputs and the targets, thanks to what we did with the eligibility trace in the previous section, we are basically ready to start the whole training: getting our inputs, our targets, our predictions, then computing the loss error between the predictions and the targets, and then doing backward propagation with gradient descent to update the weights. So we are ready to do all this, but since we want to compute a moving average over 100 steps, you know, to keep track of the average reward during the training, just before this whole training we are going to make a class that computes this moving average over 100 steps. No worries, we will do it quickly: it's a class with three functions, and we will do it all in this single step, because we want to focus on the training right now, since that's the most important part. So let's make this class. We are going to introduce a new class, which we're going to call MA, for moving average, and then here we go with our first function, which is of course the __init__ function, the one that never changes. This __init__ function is going to take two arguments: the first one is self, for the future moving average object, and the second is size, which will correspond to the size of the list of rewards over which we compute the average; this is going to be 100. So those are the arguments of the function; now let's go inside it. You know what to do: we have to initialize the variables specific to the object. The first one is going to be the list of rewards, the list containing the 100 rewards over which we compute the average, and right now we simply initialize it as an empty list, so list_of_rewards = []. The second variable of our future object is of course the size, and the size is going to be equal to the argument we will pass when creating the future moving average object, so self.size = size.
And already we are ready to move on to the next function, which is going to be the add function, and that will add the cumulative rewards. Be careful, it's not the simple reward, it's the cumulative reward, because, you know, we are doing eligibility traces and therefore learning every 10 steps, and therefore learning with cumulative rewards and not simple rewards. So this add function that we're about to make will add the cumulative reward to the list of rewards. So, def, we're going to call it add, of course, and this function is going to take two arguments. The first one is self, because we're going to use this list of rewards here (we are simply going to append the cumulative reward to it), so we need self to be able to access it. The second one is rewards, which will represent the cumulative reward. Those are the two arguments of the function; now let's go inside it and define what it has to do. OK, very simply, the first thing we have to do is this: whenever we get a new cumulative reward, you know, after we progress by ten steps, we add this cumulative reward to the list. That's exactly what we're going to write: the code that adds this new cumulative reward, obtained after progressing by ten steps, to this list of rewards. To do this we have to separate two cases, because since we will be working with batches, sometimes the rewards will come in a list, but in some other cases the rewards can also come as a single element, and the syntax to add something to a list (which is the list of rewards here) is not the same depending on whether you're adding a list or a single element. So we just have to write a condition that separates these two cases. Let's start with the first case, which is when what we're adding to this list of rewards is itself a list, and to check this we use isinstance: in parentheses we put two arguments, the first one is rewards, the thing we're adding, and the second one is list. So isinstance(rewards, list) means the rewards are in a list. And if the rewards are in a list, what we do is very simple: we take self.list_of_rewards and we add this list to it, because when rewards is a list we can use a simple addition, summing the two lists; so we can simply write self.list_of_rewards += rewards, and by doing this we're just extending the list by joining these two lists together. Then the second case: we can simply add an else, which covers the case when the reward is not a list and is therefore a single element. What happens in that case? Well, it's the same, we want to add the reward to our list of rewards, but we cannot use the same syntax, because rewards will no longer be a list; it will be a single element. What we need is another syntax, the append function: when you want to add a single element to a list, you have to use append. So this is exactly what we do: we take the list of rewards of the object and we add .append, and of course in parentheses we put the element we want to append, which is rewards; in that case rewards is not a list, it's a single element, like a single cumulative reward not wrapped in a list. All right, then we have to add one more thing: what happens when this list of rewards gets more than 100 elements? In that case we have to delete the first element of the list of rewards, to make sure that the list never contains more than 100 elements, exactly like what we did for the self-driving car when making the sliding window. To make sure of this we add a while condition specifying that whenever the length of our list of rewards, that is, the number of elements in it, is larger than self.size (the size we set here, which will later be equal to 100 when we create the object), we delete the first element of our list of rewards, which we access with index 0. We delete it as long as the list contains more than 100 elements; with this while condition we make sure that our list of rewards never contains more than 100 elements, and therefore we can now make a new function to compute the average of this list of rewards, which will contain around one hundred elements, so we compute the moving average over 100 steps each time. Let's make this average function; it's going to be very easy because there is the mean function from NumPy to compute the average of a list. So let's introduce our last function here, which we're going to call average, and this function is going to take one argument, self, because we're going to use, of course, our list of rewards, which is a variable of the object. Then a colon, and now let's compute the average: we directly return it, because we can get it with the mean function, and what we want to take the mean of is of course our list of rewards. So we simply return the mean of our list of rewards, and mean, as I said, is a function from NumPy, so np.mean(self.list_of_rewards). There we go, we have our average over 100 steps. Perfect, we made that class very efficiently; now we have the instructions on how to obtain a moving average over 100 steps. And since we're going to use one moving average object during the training, let's already create it: we're going to call it ma, and ma is simply going to be an object of the MA class, and as we said we want the size to be 100, because we want to compute the moving average over 100 steps. Perfect, there we go. We are now ready to train our AI to finally make it intelligent. It's about time: it is from this point that our AI will become smart, so I can't wait to train it. It's going to be quite easy, because it's something we already did, but it's going to be fun. And besides, after that it will be time to have even more fun, because our AI will be fully ready and intelligent, and therefore we will execute the code, the AI will play Doom, and eventually we'll watch the videos of our AI playing Doom and see if it manages to reach the vest.
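For reference, here is a minimal sketch of the moving average helper described above, a plain Python and NumPy class; the names MA, add and average follow the description, but treat them as assumptions rather than the exact course file.

import numpy as np

class MA:
    def __init__(self, size):
        self.list_of_rewards = []    # the last `size` cumulative rewards
        self.size = size             # window size, e.g. 100

    def add(self, rewards):
        # rewards can be a list of cumulative rewards or a single value
        if isinstance(rewards, list):
            self.list_of_rewards += rewards
        else:
            self.list_of_rewards.append(rewards)
        # keep only the last `size` elements
        while len(self.list_of_rewards) > self.size:
            del self.list_of_rewards[0]

    def average(self):
        return np.mean(self.list_of_rewards)

ma = MA(100)  # moving average over 100 steps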
DOOM - STEP 17
So that's exactly what happens now: when training the AI, we will train its intelligence to reach the goal we want to accomplish, that is, reaching the vest without getting killed. To do this, we're basically going to train the neural network to output the right predictions, and then everything else is already in place, because these output signals from the brain already have the right transmission to the body to play the final actions. So what we're about to do is something we already did before: we're going to take some random batches from the memory, get our inputs from the samples, get the outputs, get the targets, get the predictions, compute the loss error between the predictions and the targets, and then perform backward propagation to update the weights according to how much they contributed to this loss error. So let's do all this. You're going to see how easy it is, because we already have all the tools to implement it: not only do we have the PyTorch tools, like the optimizer and the loss functions, but we also have all the classes that we made before, like our brain, of course, which we're going to use to get the predictions, then our experience replay implementation, the eligibility trace, and all these tools combined, which will make the training perform really well, so that eventually we get a super powerful AI. So let's make this happen. The first thing we're going to do now is get the loss function that we use during the training, when computing the error, and an optimizer.
That's the first thing we'll do. So let's create a variable for the loss function; we're going to call it loss, and this will be equal to the MSELoss function from the nn module. MSELoss is the loss function we'll use because basically our predictions are Q-values, you know, we're predicting the Q-values of the different actions, and since these are real numbers, we're essentially doing a neural network for regression, and the loss function for that is the mean squared error, the loss function we use in general for regression. All right, now that we have our loss function, let's get our optimizer. So optimizer, that's the variable we create for it, and we're going to take, as usual, the same as for the self-driving car: the Adam optimizer. That's a very powerful optimizer that will work wonders for the training, so let's get this one, optim.Adam. And remember, exactly as for the self-driving car, we have to pass two essential arguments. The first one is the one that makes the connection between the optimizer and the parameters of our neural network, that is, the weights of the neurons of our brain; to do this we take our brain, which we called cnn (that's the object we created for our brain), and write cnn.parameters() with some parentheses, and that makes the connection between the optimizer and the weights of the neurons in the brain of our AI. The second argument is the learning rate, given by lr. Here we have to take a small learning rate, because we don't want to converge too fast, so a good learning rate we can take here is a small one: 0.001, that is 0.1 percent. I think that's the same one we used for the self-driving car. All right, so now we have a loss function and an optimizer, and we are almost ready to start the for loop. Actually, we will start writing it right now, but just before that we're going to decide the number of epochs we will be training over, so we create a new variable, which will correspond to this number of epochs, and let's set it equal to 100. That will be more than enough to train the AI, and the AI will probably even manage to reach the vest well before epoch 100, like 20 or 30; let's see. For now let's take 100, and if we need to, we will increase it, but I don't think that will be necessary.
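As a reference, this is a minimal sketch of these three lines, assuming cnn is the convolutional brain created earlier (the variable names are assumptions rather than the verbatim course code).

import torch.nn as nn
import torch.optim as optim

loss = nn.MSELoss()                                  # mean squared error between predictions and targets
optimizer = optim.Adam(cnn.parameters(), lr=0.001)   # Adam optimizer with a small learning rate
nb_epochs = 100                                      # number of training epochs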
OK, so now that we have our number of epochs, we can start writing the main loop of the training, where we train over the epochs. So: for epoch, that's the variable we choose for the iteration; then of course we're going to use the range function, and we want to go from the first epoch, 1, to the number of epochs plus one, because remember the upper bound of a range is not included, so since we want to go up to 100 we have to specify nb_epochs + 1. Then a colon, and now let's get into the loop. The first thing we're going to do is perform 200 runs of 10 steps: each epoch will consist of 200 runs, one after the other, of 10 steps each, and to do this we have the run_steps function from our experience replay module. To use this function, which is actually a method, we get it from our memory object, an object of the ReplayMemory class, to generate these two hundred runs of 10 steps. So we take our memory object, which, remember, we created right here (memory is an object of the ReplayMemory class with 10-step progress and a capacity of 10,000), and from this object we call run_steps, specifying 200 successive runs of 10 steps. That will simply add, at each epoch, these 200 runs of steps to the memory. Now that we have these 200 runs happening at each epoch, it's time to sample some batches from them, and to do this we have another function from our memory, sample_batch, which will generate batches from these two hundred runs. But remember, these batches are this time batches of series of transitions, that is, series of 10 steps, as opposed to before, where the batches were just batches of single transitions; this time they are batches of series of 10 transitions. So now it's time to get these random batches from our memory, and to get them we use the sample_batch function, to which we have to pass the batch size. For the batch size we can take 32, or 64, or even 128; remember that it's common practice to use 32, and that's what you will see in general in neural network architectures when doing batch learning, but this time it's a bit different: we're sampling batches of series of ten steps, so it's better to take batches with larger sizes. That's why we can take 64 or 128, and we're going to take 128. And actually this is going to be inside a for loop, because we want to take several batches, iterating over what is returned by the sample_batch function. So this for loop, for batch in memory.sample_batch(128), means that every 128 steps our memory gives us a batch of size 128, which will contain the last 128 steps that were just run. We're just getting some batches of size 128, and the learning happens on these batches; and besides, inside these batches we have the eligibility trace at work, you know, learning every ten steps. All right, so now, inside this inner loop, which is still happening within one epoch, we are in a specific batch, and the first thing we're going to do is get our inputs and our targets separately. That, as I told you, is done with one of the tools we implemented, the eligibility trace: as you can see, this eligibility_trace function takes a batch as input and returns the inputs and the targets as output. So right now we simply create two new variables, inputs and targets, and set them equal to exactly what the eligibility_trace function returns when applied to the batch of our loop.
All right, so that gets us the inputs and the targets, but in PyTorch there is always something more to do, and of course that is converting the inputs of the neural network, and also the targets, into torch Variables. There's nothing new here, we know how to do it: we take our inputs, then our targets, and they will be equal to Variable(inputs) for the inputs and Variable(targets) for the targets. So the inputs of the brain are converted into torch Variables, and the targets are converted into Variables as well. Now we can feed the inputs into the neural network. And why do we need to do this? Because the next step is to get the predictions: we have the inputs, we have the targets, and of course we need the predictions, because then we will compute the loss between the predictions and the targets. So let's get these predictions. Again, this is very simple now: we just take our brain, cnn, our convolutional neural network, and apply it to our inputs. There we go: the inputs go into the neural network, and the neural network outputs the predictions. Perfect. Now we have the predictions and the targets, so we can compute the loss, and that's the next step. We introduce a new variable, because right now we're going to get the loss error, which is different from the loss function: we use the loss function to compute the loss error. So, loss_error here, and we get it with the loss function applied to our predictions and the targets. There we go. See how smooth everything is now, how logical: we get the inputs and the targets first, then thanks to the inputs we get the predictions, and then thanks to the predictions and the targets we get the loss error. Very logical and smooth. And what is the next step? Same logical path: now that we have the loss error, we can backpropagate it into the neural network to update the weights, and we do that with gradient descent. To perform gradient descent we need our optimizer, which we already created here, our Adam optimizer. At this point, remember what we have to do: we have to re-initialize it, and to do that we take our optimizer object and apply the zero_grad method; there we go, and we don't forget the parentheses. The next step is to propagate the loss error back into the neural network, and to do this we take our loss error and apply the backward method on it; that's exactly what performs backward propagation. Then, finally, now that the loss error is backpropagated into the neural network, we can update the weights, still following gradient descent, and to do this we take our optimizer and apply the step method. There we go, the weights are now updated. As I told you, not only have we already done all this before, but by now it feels simple and natural.
So now we're going to do something fun: we are going to print the average reward at every epoch, so that, you know, we can keep track of how the AI is doing and how the training is going. We want to see the average reward increasing over the epochs; at first, of course, there is the exploration phase, so the average reward will not increase at the beginning, but once the AI moves into the exploitation phase we'll see the average reward definitely increase, up to a certain level, which is reached when it gets to the vest as fast as possible. So let's start with the print. You know we are doing this at every epoch, so we have to be back inside the epoch loop here. Print, and then we're going to print: first "Epoch:" followed by %s, because we're going to convert everything to a string, that's cleaner; then we add "Average Reward:" followed by %s as well; then we close the quote and add a percent sign, and on the other side, you know, we put the variables that fill these placeholders: the first %s is the epoch, and the second one corresponds to the average reward, which we will compute right now (the average reward variable doesn't exist yet; we're going to create it right away). So we use str on our epoch, even though it's a number, to convert it into a string, and we also wrap in str what's going to be the average reward, a variable that we're going to call avg_reward; we're going to create this variable and compute it now. OK, so let's do this, that's the only thing left. epoch we already have, so now let's compute the average reward, and we need to compute it right here, still inside the epoch loop but outside the batch loop, because the forward propagation plus the backward propagation are done over the batches, and once that's finished we are back in the epoch loop and we can collect the cumulative rewards. We can do that with our n_steps object, because our n_steps object contains a function that gives us the cumulative rewards obtained in the completed 10-step runs. We use it right now to collect the new rewards of the steps, then we update the moving average object by adding these cumulative rewards to it, and then we compute the average; that's how we get the average reward. So the first thing we need is the rewards of the steps; let's call them rewards_steps, and then, as we said, we take our n_steps object, which, remember, was created here as an object of the NStepProgress class from our experience replay module, and we add .rewards_steps followed by some parentheses. All right, that gets us the new cumulative rewards of the steps. Then we need to add these new cumulative rewards to our moving average object, and to do this we have a method this time, in the moving average class, the add method. Very simple: we take our moving average object, which we created here with a window of 100 steps, then we use the add method, and inside it we put our rewards_steps; this adds the rewards of the steps into the moving average. And finally we can compute the average reward, the variable here: avg_reward is going to be equal to the average, and to get it we just use the average method, this time from our moving average object, so ma.average(), and that's it, because our moving average was already updated with the new rewards_steps that we just added through the add method. Great, now we have our average reward, so it will populate the print here, and this is going to be printed at every epoch. So this is going to be very exciting, because at each epoch we're going to keep track of how the AI is doing; you know, we're going to see whether the average reward is increasing.
And speaking of something exciting: what's very exciting is that I know the AI will reach the vest for sure once the average reward reaches 1500. So what we're going to do now is that if the average reward reaches 1500, that is, if it gets higher than 1500, we're going to print something new, and that's going to be the final print, the very successful print: we're going to print, let's say, "Congratulations, your AI wins". That would be awesome, because if we reach an average reward larger than 1500, we can be pretty much certain the AI reaches the vest. Maybe the real threshold is a little lower than that; I experimented with 1000, and sometimes it didn't reach the vest, but with 1500, if we get there, the AI wins for sure. We're going to watch that afterwards. All right, so that would be very exciting. And then, and that's actually the end of the code, we need to specify that we want to stop the training if that happens: you know, once it reaches the vest reliably, it no longer needs to train. So we end the whole process with a break if we reach an average reward of 1500; we can finish the whole process, and to do this we use this break here. Then one last thing to do when the game finishes, that is, when the AI wins: we can close the environment, and we do so by adding just one last instruction here, which I'm going to comment as closing the Doom environment. To do this we take our Doom environment, which we called doom_env, remember, and we just add .close(), and that's all; that will close the environment, which is better to do once you've finished the game. All right, so we're done.
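Before we run it, here is a minimal sketch of the whole training loop we just described, assuming the pieces sketched earlier (cnn, memory, n_steps, ma, loss, optimizer, nb_epochs and eligibility_trace) and the helper methods run_steps, sample_batch and rewards_steps from the experience replay file; the exact names are assumptions, not a verbatim copy of the course code.

from torch.autograd import Variable

for epoch in range(1, nb_epochs + 1):
    memory.run_steps(200)                      # 200 runs of 10 steps each (assumed helper method)
    for batch in memory.sample_batch(128):     # mini-batches of series of 10 transitions
        inputs, targets = eligibility_trace(batch)
        inputs, targets = Variable(inputs), Variable(targets)
        predictions = cnn(inputs)
        loss_error = loss(predictions, targets)
        optimizer.zero_grad()                  # re-initialize the gradients
        loss_error.backward()                  # backpropagate the loss error
        optimizer.step()                       # update the weights
    rewards_steps = n_steps.rewards_steps()    # cumulative rewards of the completed 10-step runs (assumed helper)
    ma.add(rewards_steps)                      # feed them into the moving average
    avg_reward = ma.average()
    print("Epoch: %s, Average Reward: %s" % (str(epoch), str(avg_reward)))
    if avg_reward >= 1500:
        print("Congratulations, your AI wins")
        break
doom_env.close()                               # closing the Doom environment once training ends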
I'm so excited to see the results, and actually I'm going to have a surprise for you in the next section while watching them; it's going to be pretty exciting. So now I guess it's time to execute the code and have fun. Let's hope the AI can reach the vest as fast as possible; that's the goal of this environment. All right, so let's have fun: prepare yourself a good coffee or a good tea, sit comfortably in your chair, and watch some very cool videos of our AI playing.
WATCHING OUR AI PLAY
DOOM
And just so you know, we've been really grinding through these lines of code together: we have 156 lines of code now, and we're so relieved to finally watch the results. Great job. OK, so to get the results, very simply, we're going to select the whole code and execute it. And as a reminder, you have the videos folder, and right now it's empty for a reason: it's going to be populated, as soon as the AI plays, with some videos, so it's happening in real time. Yes, real time. OK, good stuff. All right, are we ready? Yes, let's do it. Execute. And there we go, starting with the first video. So first we are recording its first attempts at playing the game. And there we go: the first average reward is negative. That's pretty bad, but that's totally normal; right now the AI is exploring. Yeah, and remember, we have to watch for that 1500. I think we got lucky on this run; it's already reaching rewards I don't usually see this early. OK, interesting. So what does it mean if it reaches 1500? Oh, if it reaches 1500, it means we know, pretty much for certain, that the AI has learned to reach the vest. This is the scenario where it has to get to that vest as soon as possible. Yes. Actually, we have time to talk about the environment while it trains. Yeah: you know, in this Doom scenario, as you can see, the mission is to run as fast as possible and grab the vest, so the reward is super positive when it reaches it, and the faster, the better. Yeah, so it really tries to reach the vest. And this is a special scenario, because it doesn't really have to kill the monsters; it has to figure out that it needs to run, it has to figure out the best strategy to reach the vest as soon as possible. Maybe the best strategy is to kill the monsters. Maybe, or maybe it's to avoid the monsters, or to run very fast while dodging them, I don't know. Yeah, but the goal is definitely to reach the vest as fast as possible. OK, that's interesting. It seems to be going down. Is it going down? No, it's normal, I think; we're just at the beginning, you know, the training is still exploring, and I think it went too far in one direction and got killed.
And so that was not a good strategy, because it took some directions towards the monsters and got killed, but it still did well, because it reached more than 1000. And we're going to see that video; you know, the videos are being recorded, so we're going to watch each one of them afterwards. OK. So how does it get points? Does it get the biggest reward for reaching the vest? Well, it's a continuous reward: when it's approaching the vest, it's getting positive rewards, and if it's getting further away from the vest, it's getting negative rewards. For example, the minus 41 at epoch one means it got further away from the vest, or it got killed; yes, getting killed also gets you a negative reward. Gotcha, gotcha. OK, so walk me through the main sections of the code. OK, so at first we made the brain. The brain, yeah. Yes, the convolutional layers, which are the eyes of the AI, and the forward function that propagates the signal inside the brain. OK. Then we had to make the body, and that is the part of the AI that plays the action. OK, so we see that we used the softmax way of choosing the action to play, based on what the best action is, you know? Yes, and the cool thing about this softmax method is that the best action is clearly favoured, but the other actions are still explored. Yeah, because it works with a probabilistic approach: the other actions have a low probability of being played, but it's still a probability, not zero. So sometimes, since we take a random draw from the distribution of probabilities over the actions, these other actions do get played, and that's a way of exploring. You know, that's something we talked about earlier: the fact that you've got to combine exploitation and exploration. Right, exactly. Otherwise you might get stuck on one local maximum and never find the other maxima. Exactly right. And by the way, guys, as you can see, I think the exploration phase is nearly over, because right now the average reward is definitely increasing. Look, you know, we went up to 697, but then we went down, then down again, but now we reach 525, then 818; I think the AI is starting to get smarter, and I think the average reward is going to keep increasing. That's great. Me too, me too; I was a bit surprised at the beginning, but see, it's increasing now, so I think the exploration phase is mostly over, and it might reach the vest reliably very soon. When the exploration phase is over, is it fully over, or does it still explore randomly sometimes? It's not fully over, but it is less likely to explore. Yes, less likely to explore, that's right. Now I bet the average reward is going to keep increasing until it reaches, you know, the last checkpoint, which is this 1500 reward. It's like when you move to a new country: you go to the store and they have new groceries, different groceries, so at first you experiment with your grocery shopping, sometimes buying something you don't like, and then you figure out, oh, this is what I like to have for lunch, and then you mostly just buy that. But occasionally you'll still go and buy something new to see how you like it. Yes, I bet you do that often. Exactly. And then once in a while you'll find something really cool and you'll change your pattern a little bit. Yes, I think that's a good analogy. All right, so we might reach the final average soon, but let's continue with the code. Yes, that's a great way to summarize it. So, once we made the brain and the body, we assembled the brain and the body to make the whole AI. That's this section, yeah, and we have this overall forward function that propagates the whole thing: from the original input images, then through the brain, then through the body, and out to the action space. So it was really an analogy, you know, having a brain that does the thinking and a body that acts. Right, you would never actually see the AI as having a brain and a body; it's just the way we chose to picture it. So the brain makes decisions, and the body then acts on those decisions. Yeah, exactly; that was just a way to play with objects and object-oriented programming to make it intuitive. OK. So that's part one, making the AI with the brain and the body, and then part two is of course the training of the AI. In the training, of course, we implement the eligibility trace, which is the n-step technique added on top of deep convolutional Q-learning: instead of learning every single step, we learn every n steps, and over those n steps we compute a cumulative reward and a cumulative target. We did it with 10 steps, and that improved the training a lot. That's pretty good, yeah. And then we made the moving average, and that's, you know, like a score-keeping function. And finally, of course, we train the AI; that's the core of it. You know, we have our inputs, they go into the neural network, and since we also have the targets, we can compare the predictions to the targets; that generates a loss, then the loss error is backpropagated into the neural network, and then gradient descent updates the weights to try to minimize the distance, well, actually the squared distance, between the predictions and the targets. Yes, that's exactly what we do here, and once again we're getting deeper practice working with neural networks, basically. And then finally we print the average reward at each epoch, so that we can keep track of how the AI is doing, and we print the winning message if we reach 1500; that's something we still need to get to, yes. OK, so how's it going right now?
So how is it going right now? You were right — it is still exploring a little bit. It's definitely reaching an average above one thousand already, but it's still exploring; we haven't got to 1500 yet, though we should soon. OK, let's start looking at the first video. Good idea. So we only have four videos so far, even though there are many more epochs, because it doesn't record that frequently. In the first one the agent is basically just acting randomly. So here's the first video — it really doesn't know what it's doing yet. The second one might not be much better. Oh, maybe it is, because it lasts nine seconds now and it got one of them. Good — two, three even.
Oh, in that last one it got to the vest. Let me play that one again — I think it was a lucky one. OK, so it got there, but it wasn't really ready for a fight. Still, that's the perfect catch, because as we said, the best strategy is simply to reach the vest as fast as possible. So let's see how the training is going. Still running. All right, let's watch the next one. OK, the next one — and after a couple of seconds it stopped shooting. It decided to explore the pacifist strategy: it's trying to make friends with the monsters. So, the next one. Once again it got there. Yeah, but it got to the vest pretty much right away. So how would it get more points? Because the faster you get to the vest, the more positive your reward. Can it go even faster than that? I think so — let's watch the next one. See, it understood it: it was running straight towards the vest, although you can see it still drifts a little left and right. I think the best runs are the ones where it just goes straight. OK, interesting. And you can see it's getting to 27 here. Let's see how this one does.
So guys, I think 1500 might have been too much. Yeah, it could be a good thing to research — if you replace this 1500 with 1000, then maybe you'll get to the winning video sooner. That's something to experiment with as well. So it looks like it's converging, right? It's around 1300 now. Looks like it's figuring it out — we can see that it's figured out how to play. Let's wait until epoch 25; maybe it will reach 1500. But anyway, you're right, that was really quick — the training didn't take much time. Why do you think that is? I think it's because of the eligibility trace, the n-step method. OK. There are actually other policies for selecting the action. As we saw in the paper — the DeepMind paper on asynchronous actor-critic methods — they take actions according to an epsilon-greedy policy. So that's another policy, another way of selecting the action, and that's also good. But I wanted to use softmax because I thought the results would be better. If you want to practice, guys, try implementing epsilon-greedy. OK, so where are we — 23 so far? There should be another video soon. Twenty-five. Yes. OK.
Interesting. While it's training — you know what I read recently? Why computer games don't use artificial intelligence like the one we just created. You know, like for those monsters that are shooting at him. That's an old game, of course, but even in new games they don't use these kinds of algorithms for the monsters. You know why? Hmm — maybe they need more powerful algorithms? Are you talking about the project where it gets killed over and over again? Which one? You know, the airplane project. No, no — I mean when you, as a human, play a game and there's an AI playing against you in the computer. Why don't they use these algorithms? Oh, we absolutely could use these algorithms. For example, I once implemented an algorithm to play against me — I had a Pong game. So we can. But the thing is, they don't do it on purpose, because otherwise it would be impossible to beat them. Think about it: in a good game, if they used something like this, the AI just gets so good that you would never win. Yeah, that's why they have to really dumb it down. So the artificial intelligence you play against, even in the most basic games like chess or whatever — and especially in the more advanced ones — is not going to be an algorithm like this, because it would just be impossible to beat. It's going to kill you, right? Exactly. And I also suspect that online games get access to so many human players that you could train on all the different human play behaviours and the AI would understand humans almost instantly. So the bots in today's games are nothing like this — because, as you said, we would have no chance. It's funny when you think about it.
OK, so how are we doing? Well, it's definitely increasing. So it found a way, right? Look at that — between roughly epochs 13 and 24 it was stuck around a thousand, eleven hundred, and now it has finally found a breakthrough. Maybe it found a way to run even faster. I honestly can't imagine how it can improve much further. I remember it reached 1900 once when I ran it — it was definitely running fast, but I don't think there's much difference beyond that. And there we go — it finally recorded this one as a video as well. Yes, it only adds a recording every few epochs, so wait and see; that's how we got this video. Anyway, we've already met this one. I was going to ask: what if it wins but we don't record the winning run? Well, we'd like to record it, but I suppose the win still counts. Let's watch the last one.
OK, so congratulations guys — it reached 1500, I can hardly believe it. That's the win, right? Yes. And you know, it was 2:36 not long ago and right now it's 2:43, so that really didn't take long. Where's the video? Good question — the video should appear very soon. Maybe it hasn't finished writing the results yet. Well, that's fine, this one is good enough anyway. That one would have been really fast. And my friend who was on the peaceful path, trying to make friends with the monsters — that strategy, I think, is actually not a good idea; compare it with the next one and you'll really see the difference. OK, that's pretty cool. That's so exciting. Hope you guys got the same results. And you might even get the result before epoch 26, because I actually tried this several times — sometimes you can get it at epoch 10 or 15, but this time 26 was the first. OK, so that's the end of this module. Now on to the next module, which is about beating Breakout. Are you guys excited? Breakout is the last one because it's the most difficult, the most challenging one, and the main reason for that is that the little ball that the deep convolutional neural network has to detect is much smaller than the big monsters in Doom, which are much easier to detect.
PLAN OF ATTACK
We're going to discuss the plan of attack for this amazing section on the most powerful algorithm in artificial intelligence at the moment: A3C. So let's have a look. Today we'll talk about the following things. We'll start with the three A's of A3C — what the abbreviation stands for and how we're going to deal with each of the three A's — and we'll also look at the actual research paper that gave birth to A3C, the Google DeepMind paper, and have a quick look at it so that you are more comfortable navigating it in your free time. Plus, we will see exactly why A3C is the cutting-edge, most advanced algorithm on the planet for artificial intelligence at the moment, and why it's so important to have it in your arsenal. Next we're going to talk about the first A of A3C, which is actually the actor-critic. The abbreviation stands for Asynchronous Advantage Actor-Critic algorithm, so we're not going through the A's in order; we're going through them in an order that makes sense for us and will help our intuitive understanding. Then we'll talk about the asynchronous element of the A3C algorithm, and then we'll talk about the advantage — and that's the tutorial where we'll put everything together and it will all start making sense.
A quick word of caution: as you're going through these, you might notice that after the actor-critic tutorial, and even after the asynchronous tutorial, you will still not have the full picture, the full understanding, of how this all ties together. And that's totally OK, because these three elements really do work together and they require each other to function. So just power through and get to the third tutorial, the one on the advantage — that's where it will all come together. And if at that point you still don't quite understand something, you can simply revisit the specific tutorial — the actor-critic one, the asynchronous one or the advantage one — to put it all together. Basically, don't get thrown off if, after one of these tutorials, you still feel you're missing some information. That's because they all work together to form one algorithm, and the advantage tutorial is where it all ties together very nicely. And then finally we'll talk about some additions you can make — some additional modules and elements that you can add to the A3C algorithm. The thing is, there are different versions of this algorithm with different modifications; we will already see some modifications throughout the first tutorials in this section, and at the end we will add an extra one: the long short-term memory, the LSTM. Now, here we won't go into a lot of detail on LSTMs — that's a topic for another course — but nevertheless you will get an overview of what an LSTM is and how those layers work inside neural networks, so that you have a better understanding when they come up in the practical tutorials. Very importantly, this will also showcase how the A3C algorithm can be modified to suit certain purposes. So there we go — that's the plan for this section. It's definitely going to be an exciting algorithm to look at, because it's the top algorithm in the world at the moment. We're going to develop an intuitive understanding, and then together with Adlon you're going to be coding it, which is very, very exciting, and I can't wait to see you in the first tutorial.
THE THREE AS IN A3C
We're taking our first step into the world of A3C, and as a first step we're going to find out what the abbreviation stands for. A3C stands for Asynchronous Advantage Actor-Critic. This is an algorithm which was developed at Google DeepMind in 2016 by a group of researchers, and it is the cutting-edge algorithm for artificial intelligence to date. It now has multiple modifications, and we'll discuss those more in the course, especially in the practical tutorials, but nevertheless this algorithm blows everything else — including deep convolutional Q-learning networks — completely out of the water. It is faster, it takes less time to train, and it gets better results. Throughout this part of the course we'll be referencing — and we have already referenced — the paper that first introduced A3C. It's called "Asynchronous Methods for Deep Reinforcement Learning", by Volodymyr Mnih and others from Google DeepMind, and I'll show you this paper now so that you have an introduction to it. So here is the paper I wanted to show you, so that you can get a feel for it and get a little bit acquainted with it. Of course, it is highly recommended to read through the paper and understand exactly what they're talking about, and you'll see that throughout the practical tutorials Adlon will take you through certain parts of the paper — certain paragraphs or sections — that are relevant to what you will be programming at that point in time. What I wanted to point out here is that, as you can see, a lot of research went into this and there are a lot of references, but the part I really want to highlight is at the end, where they compare the different algorithms and their results. So let's zoom in a little bit.
So here, as you can see, even at Google DeepMind they train and evaluate their algorithms on games, just as we're doing in this course — exactly the same principle, because games are a simulated environment, a small, confined environment with certain rules, and they want to understand how well the artificial intelligence does in those games. And here we go: all these games, many of which you can find on OpenAI Gym, are the kinds of games we've been working with. For instance, in this section we're working with Breakout, and it's here in the table. For each game they've highlighted the best algorithm in bold. So there's DQN — that's the kind of algorithm we've been working with — and then some other algorithms, and then you've got A3C and A3C LSTM, the version with the long short-term memory. That's the one we'll be implementing in this part of the course: A3C with the LSTM, which makes it even stronger. As you can see, for Breakout the best result is achieved by A3C LSTM — that's the score, 766.8, compared to the others. And if you now take a bigger-picture view, you can see that most of the bold numbers are actually in this last column. Yes, there are some games where other algorithms perform better — and you can see that DQN is actually not the best in any of the games — but while other algorithms sometimes perform better, the A3C LSTM performs best in most cases: this one, this one, these ones, this one and so on. So you can see that A3C is a really powerful algorithm. It is indeed at the forefront of artificial intelligence, and that's exactly what we will be implementing. A very exciting section ahead — I highly encourage you to go through this paper and really get a feel for what we're going to be talking about. Throughout this section, and especially on the practical side of things, we're going to be going through it in detail; we're actually going to be working with the pseudocode that's available in the paper, and Adlon will show you how to implement it and how we're going to work with it. On that note, I hope you're going to enjoy this paper, and I look forward to seeing you next time.
ACTOR-CRITIC
Today we're talking about the first part of A3C: the actor-critic part. So here we've got the Asynchronous Advantage Actor-Critic algorithm, and we're going to be talking about the underlined actor-critic — that's where we're going to start. You could technically start anywhere, but it just makes a lot more sense to start from the actor-critic, because that way we'll have a consecutive, intuitive explanation of what's going on; it will make things easier than if we started from the first letter of the abbreviation. All right — so far in this course we've come up to deep convolutional Q-learning, which is illustrated here: we've got the computer seeing the pixels, the actual image in pixels, not just a vector. So it's not cheating; it's seeing exactly what a human sees. It sees the monsters, it sees the health, it sees the parameters at the bottom, it sees the gun. It is exactly the same thing a human would see when playing this game. Then that image is passed through a convolutional layer, then through a pooling layer, then it's flattened and goes into a neural network, and at the output we've got the actions — remember, we've got those Q-values — and then we apply an action selection policy to them. For instance, we apply a softmax and we find out which action we want to take, so there is some exploration plus exploitation going on, a combination of the two. That is how deep convolutional Q-learning works.
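As a tiny illustration of that last step — a sketch with made-up numbers rather than anything taken from the course code — softmax turns the output scores into probabilities and we sample an action from them, so higher-scored actions are picked more often without being picked every single time:

import torch
import torch.nn.functional as F

# Hypothetical scores for four possible actions (illustrative values only)
action_scores = torch.tensor([[1.0, 0.2, -0.5, 0.7]])

probs = F.softmax(action_scores, dim=-1)        # turn scores into probabilities
action = probs.multinomial(num_samples=1)       # sample: exploration + exploitation
print(probs, action)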
But now let's see what we're going to do with it. For simplicity's sake — just so it's easier for us to operate with, because we're going to adjust this diagram and move it around — we're going to replace the circles with these rectangular boxes, and we're also going to get rid of those lines and just change them to arrows. This doesn't change the essence; it's just the representation on this chart. Even though this representation is still deep convolutional Q-learning, it's simply going to be easier for us to modify and show exactly what's going on. So that's how we're going to represent things from here on. And what does this specific part do? Remember, we're going step by step, starting with the actor-critic part — we're going to see how we get from deep convolutional Q-learning to A3C step by step, and the first step is introducing this actor-critic part over here. So the first thing that happens is this last bit, the output. We're just going to redraw it like this, so it's exactly the same output, exactly the same Q-values, exactly the same actions. If you had eight possible actions you still have eight possible actions; we're just going to put them at the top so they take up less space. So far nothing has changed: this and this are exactly the same. But now this is where the actor-critic part comes in. We're going to have a second output: the first one stays as a set of outputs, and here we're going to have a separate, individual output. So technically, when the image and the values go through the network from left to right, the network doesn't spit out just one set of values — it spits out two sets. The top set we already know: it's the possible actions. But here we're going to have another, extra value, so let's have a look at it. What is that value? Here we go — we've just reduced the size of the illustration a bit. The top output is the Q-values, as we discussed previously, for the actions; they're the same as before. The top part is called the actor — we give it that name because that's the part where the agent chooses what it wants to do, as if it's acting, as if it's performing on stage — and it'll make more sense once we have the second name up on the screen as well. The second output is just one value, and that is V(s): the value of the state. So where Q(s, a) is the Q-value of a certain action — and that's why there's action one, action two, action three, up to action six or however many possible actions there are in that state, i.e. in a given state s, what is the Q-value of taking action one, action two and so on — here we're also using the neural network to predict the value of the state we're actually in, and this part is called the critic. And that's the start of the intuition behind the actor-critic: there are two outputs now from the neural network, not just one. Before, we had just the one output, which we now call the actor; now we have two outputs, actor and critic, and there is going to be a dynamic between them which we'll explore further. For now it's important to understand that we're predicting not just the values of the actions that the agent can take from the current state, but also the value of being in this current state, using that same neural network. That's the core of the first step towards the actor-critic.
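To make that picture concrete, here is a minimal sketch — simplified, with illustrative layer sizes that are our own assumption rather than the course's exact architecture — of one network with two heads: an actor head with one output per possible action, and a critic head with a single output, V(s).

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """One shared 'eye' (convolutions), two outputs: actor (per-action scores) and critic (V(s))."""
    def __init__(self, num_actions):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)   # sees the raw pixels
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3)
        self.fc = nn.Linear(32 * 8 * 8, 256)            # hidden layer after flattening
        self.actor = nn.Linear(256, num_actions)        # one score per action (the policy side)
        self.critic = nn.Linear(256, 1)                 # a single number: the value of the state

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = F.adaptive_avg_pool2d(x, (8, 8))            # fix the spatial size so flattening always matches
        x = F.relu(self.fc(x.view(x.size(0), -1)))
        return self.actor(x), self.critic(x)            # two outputs from the same network

# Example: one 80x80 grayscale frame in, eight action scores and one V(s) out
frame = torch.zeros(1, 1, 80, 80)
action_scores, state_value = ActorCritic(num_actions=8)(frame)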
Now we're going to need to talk about the asynchronous part, which we'll do in the next tutorial, in order to understand exactly what's going on between the two. The final thing for today is that all of these Q-values together are also called the policy. So in some literature, some blogs and some discussions of the actor-critic, you might find the author talking about Q-values on the side of the actor, and in other literature, blog posts and discussions you'll find the author talking about the policy — usually using the Greek letter π (pi) to represent it, or just writing the policy of the state. Altogether these outputs are the policy of the state, because, as we remember, the policy is what you get when you put all the possible actions together and then decide which action to take: these are going to be like the probabilities of taking each action, and that's the policy. So don't be thrown off if you see one term or the other; they basically mean the same thing. On one hand you've got the policy, or the Q-values, and on the other hand you've got the actual value of the state, and they're both being predicted from that same neural network. So that's the start of the actor-critic.
ASYNCHRONOUS
We continue our journey into the world of A3C, and today we're talking about the asynchronous side of A3C. So there we have our abbreviation — Asynchronous Advantage Actor-Critic — and today we're going to find out what the asynchronous part stands for. Let's go back a step and look at how we started this whole course: reinforcement learning and what it's all about. The agent is in a certain state; it observes the state, it makes a decision and takes an action in that state, and then the state changes — it ends up in a new state — plus it gets a reward (which could also be a penalty) for taking that action. Based on the new state it takes another action, gets a reward, ends up in a new state, takes another action, and so on. That is the basis behind all of reinforcement learning, and that's what we've been using in deep Q-learning and deep convolutional Q-learning, and it has allowed our agents to gradually beat more and more complex environments. But now we're going to introduce an even better concept and take this even further. What A3C does through this asynchronous element is, instead of having one agent attack the environment, it has three agents — or several, whatever the number — attacking the same environment. The key here, and why it's called asynchronous, is that they are initialized differently, so their starting points are different. For instance, as you'll see in the practical tutorials, you set a random seed and you set it differently for each of the agents. Because their starting points are different, they will first go through the environment in different ways, then explore it in different ways, and in the next iterations they'll again explore in different ways. So if, for instance, we have three agents, you're all of a sudden getting triple the amount of experience: instead of just one agent going through and exploring the environment and trying to understand how to operate in it, you now have three — or however many — doing that and gathering experience, and each of them learns from this bigger pool of experience. Apart from simply giving a broader range of experience, it also reduces the chance of an agent getting stuck in a local maximum. For instance, if one agent finds a way to beat the environment which is not the most optimal — because whenever it deviates left or right from the solution it found, it gets penalized more — it might get stuck in that local maximum; it might just keep doing the same thing, thinking it's the optimal solution when it's actually not. Well, the likelihood of several agents getting stuck in that same local maximum decreases with the number of agents. The probability of one agent getting stuck in a certain local maximum might be high, or at least some given value, but the probability of all three of them getting stuck in that same local maximum is much lower. And as long as they share experience with each other, they can help each other out. If one of them gets stuck in a local maximum and simply keeps thinking that that's the best solution and keeps doing the same thing, then, as long as it interacts with the other agents through the way we build the whole algorithm, they will help it out. They will pass on the knowledge that, hey, you should explore this, and it will be more likely to get out of that trap. And overall the system will know that even though this is a local maximum, the other agents have seen better options, so it should keep exploring.
So, as a rough, intuitive summary, those are some of the advantages of having these asynchronous agents: you have more experience to draw on and learn from, you can get to the solution faster, and generally speaking there's a smaller chance of getting stuck in a certain local maximum. Let's see how this all plays out in the model we've built so far. Remember, this is what we've got so far through the actor-critic tutorial — and as you remember from that first tutorial, we already had all of this even in deep convolutional Q-learning; we just renamed the first output the actor and introduced a critic. But so far the critic doesn't really make sense: what's the point of having this critic and predicting the value of the state using the same neural network? Now this part is going to start making more sense. What we're going to do is replicate this, because now we have multiple agents, and with multiple agents this is what it looks like. The first way of imagining it is that we now have these three agents — remembering what we said about them sharing their experience with each other. Right now, in this picture, they're all independent: you have one playing the game, another playing the game, and another playing the game. It's like launching your agent on three different computers: you put three computers next to each other and you launch them, and that's great — you will indeed get more experience and more variety, especially if they're initialized differently. We can assume from here on that they are initialized differently, even though we show the same picture, so it's not going to be identical training, identical learning, from this game. So even if you just put three computers side by side and launch them, yes, you're going to have more experience, because you're going to have three agents playing, and you're also going to have a bigger variety of possible solutions. That's true.
But the problem is that, set up like that, they're not sharing their experience with each other and not learning from each other. They don't have that synergy; they don't have the advantage, the extra power, that they would get if they were cooperating — you know, like how a team of people works better together than each person separately. Outside a team, one plus one plus one is three, but in a team one plus one plus one is more than three, because they leverage each other's strengths and mitigate each other's weaknesses. Same thing here: if you put these three computers side by side, yes, you'll have more experience and possibly one of them will find a better solution — that's great — but it will be even better if they start sharing that experience. And how do they do that? Well, it's through this V that we calculate — this V value that's the output of our network. All of these agents contribute to the same critic. They don't have separate critics; they have a common critic, and that's the key to how the actor-critic ties in with the asynchronous part. There is one critic that's watching them all as they gather experience. So how do we calculate the V? As you remember, we can compute V through the rewards that we get from the environment. So as the agents explore the environment, they're predicting V with the network, and at the same time they have a V they can calculate — this all ties back into what we've already discussed in previous sections of this course. They already have a value they can expect through the rewards that they know exist in the environment and that they've already explored — and of course that value can change as they explore further — but they also have the V that is the output of the neural network, and as they go through the environment they adjust their neural networks so that the predicted V better matches the expected one. So basically this critic part is shared between the agents, and that is how they share information with each other. That's how they're able to see, collectively, what is going on in the environment, share it with each other, and then — as we'll see in the next part — use it to optimize how they behave in the environment. And there's one more thing to note here.
What we've described so far is the core of A3C — one version of A3C. But there's an actually even better implementation of A3C, which you'll hear Adlon talk about in one of the first tutorials on the practical side of things. What he'll be talking about is how the creator of PyTorch made an adjustment to one of the implementations that was shared on GitHub: as you can see, right now the agents have separate neural networks and only share the V; the adjustment was to take all of these neural networks and merge them into one. So ultimately there's only one neural network here, shared among the agents. Before, each agent had its own neural network, shared between its actor and its critic; now they all have one neural network which is shared for the actor and the critic across all the agents, and the critic is there in common. So let's move these pictures to the left to make some space — this is basically the architecture, the structure, that we're going to be using in the practical tutorials. I know this may sound a bit overwhelming at this stage, but we've got one more element to talk about, the advantage, and there we'll see it all in action. Generally speaking, this is what it is: there's one network which all the agents use, which they share. What that means is that the weights of the network are shared between the agents, and when one of them performs an update it updates the whole network, not just its own copy. Then they have their outputs — the actions for each agent — and they have the critic, which is shared and which is going to be monitoring things. I know there's a lot going on right now, but hopefully it's slowly coming together. The main takeaway is that, because the critic is shared, the agents are able to cooperate and get to the result much faster. In the next tutorial we'll see even further how all of this adds up — how it all comes together.
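For a feel of what that shared setup can look like in code, here is a minimal sketch in PyTorch — our own simplified illustration, assuming the ActorCritic class sketched earlier and leaving the actual training loop as an empty placeholder — of several asynchronous workers, each with its own random seed, all holding the same network in shared memory:

import torch
import torch.multiprocessing as mp

def run_worker(rank, shared_model):
    # Placeholder for the real training loop: create an environment,
    # roll out some steps, compute the losses and update shared_model.
    pass

def worker(rank, shared_model):
    torch.manual_seed(1 + rank)          # a different seed per agent -> different exploration
    run_worker(rank, shared_model)

if __name__ == "__main__":
    shared_model = ActorCritic(num_actions=4)   # the two-headed network sketched earlier
    shared_model.share_memory()                 # put the weights in shared memory
    processes = []
    for rank in range(4):                        # e.g. four asynchronous agents
        p = mp.Process(target=worker, args=(rank, shared_model))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()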
For now we'd like to recommend some additional reading. This is a blog by Jaromír Janisch called "Let's make an A3C", which actually comes in several parts covering the theory and the implementation — there's the link. It's very similar to what Adlon will be implementing on the practical side, so it's relevant not just to this tutorial but to this whole section. We encourage you to check it out; there is some additional information and some additional insights there, and that's why we're bringing it up here. Nevertheless, in the next tutorial we're going to start pulling all of this together.
ADVANTAGE
We're talking about the final A in A3C: the advantage. We've already spoken about the actor-critic and the asynchronous parts, and they've built the way to what we're going to look at today; with the advantage we're going to put everything together. This is what we have so far: we've got a neural network which is shared between the asynchronous agents, and we've got the critic, which is also shared between the agents. So how does this all play out, and why is the critic shared between the agents? To understand it better, we're going to look at an example. Take this agent and see what happens when it's in a certain state and needs to decide which action to play. The agent is in a state: it sees this image, and that information goes into the neural network — through the convolutional layer, then the pooling layer, then the flattening layer, and from there into the hidden layers — and as outputs it gets all of the policy values (the Q-values are the policy) and also the value from the critic. Now, as we know, in order to learn, neural networks need to propagate certain errors, or losses, back through the network to update the weights. So what losses are we going to be dealing with here? Well, there are two losses: the value loss and the policy loss. The value loss is linked to the value part, and the policy loss is linked to the policy. The value loss we've already dealt with before: we know that we have rewards and we know that we have a discount factor, so this is very similar to what we were talking about in the deep convolutional Q-learning tutorials. The network predicts a certain value V, and at the same time we can estimate, based on what we know about the environment so far, what the value of the state should be; by comparing the two we can calculate the value loss, which is then backpropagated through the network to update the weights. So that part we already know; the new thing here is the policy loss.
So what is this policy loss and how does it work? This is the part where the whole setup — the critic shared between the actors, between the agents — finally comes into its own. To understand the policy loss we need to introduce a value called the advantage, hence the name of this part of the tutorial. The advantage is calculated as Q(s, a) minus V(s): the Q-value of the action that you chose to play, in the state s that you are in, minus the value of that state. The difference between the two is called the advantage, and the advantage is used in the calculation of the policy loss. We won't go into the exact formula for the policy loss because it's quite complex — it can also include an entropy term — and we're not going to dissect that formula here; instead we're going to understand it on an intuitive level. Why are we calculating this advantage and how is it going to help us? Well, look at the premise for a second. The Q-value here comes from what the neural network predicted for this agent — for this specific action, in this specific state, among the actions it can play: it's got these actions and it can select one of them and play it. Whereas the V value is the value dictated by the critic: it is the value we have in this shared part, and the fact that this part is shared is exactly how the critic comes into play. We've got the value of the action that we choose to play for this agent in that state, but the critic can tell us the known value of that state — the value known, overall, to this whole group of agents that are performing together, because they're all sharing the critic and all contributing to these V values being calculated for the different states. So the whole A3C algorithm effectively asks: how much better is the Q-value of the action you're selecting compared to the known V value? That's basically it. The agent selects an action based on its policy — whether we use something like a softmax function or an epsilon-greedy policy or something like that, and of course there will be exploration plus exploitation combined in there — so we select a Q-value, and then the question is: what is the extra, what is the advantage, that your selected action brings compared to the known value of that state? That is the essence of the advantage. It is then used to calculate the policy loss, and the policy loss is backpropagated through the network — both losses are backpropagated through the network — and the weights are adjusted: on one hand so that the network better represents the value of the state, the critic's side; on the other hand — and this is the key part — when the policy loss is backpropagated, the weights are adjusted in such a way that the advantage is maximized. That's the intuitive understanding of it: we backpropagate this policy loss through the network in order to help maximize this advantage.
What that means, basically, is that when an agent comes across bad actions — actions where the Q-value is less than the known value for the state — the algorithm reacts. The whole A3C algorithm knows that the value of the state is some number, and all of a sudden you chose a very bad action. What that means for the algorithm is: why would we do something like that when it's worse than what we already know about this environment and what could have been done? We shouldn't do more of that, and therefore the weights are adjusted so that this happens more rarely — so that choosing that bad action becomes a less frequent occurrence. On the other hand, if you choose a very good action, where the Q-value is greater than V, or much greater, then during this backpropagation of the policy loss through the network the weights are going to be updated in such a way as to reinforce that, to encourage it to happen again. The A3C algorithm will think: oh, that was really good, that advantage was very high there, I should do more of that — and therefore it will update the weights so that this action becomes more likely to occur in the future. That's how the network is slowly, slowly going to adapt and construct itself into something that, on one hand, calculates the value of the state correctly, or as correctly as possible, and on the other hand encourages actions which have a high advantage. So there we go — that's this part.
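Here is a minimal sketch of how those two losses could be computed for one short rollout — a simplified illustration rather than the course's exact code (which, as mentioned, also adds an entropy term). It assumes the worker's loop has already collected the rewards, the log-probabilities of the chosen actions and the predicted V values for an n-step series, plus a bootstrap value for the last state; the n-step return R plays the role of Q(s, a) in the advantage.

import torch

GAMMA = 0.99  # discount factor

def a3c_losses(rewards, log_probs, values, bootstrap_value):
    """rewards: list of floats; log_probs, values: lists of scalar tensors;
    bootstrap_value: V of the last state (0 if the episode ended there)."""
    R = bootstrap_value
    policy_loss = torch.zeros(1)
    value_loss = torch.zeros(1)
    for t in reversed(range(len(rewards))):
        R = rewards[t] + GAMMA * R                 # n-step discounted return, our Q(s, a) estimate
        advantage = R - values[t]                  # advantage = Q(s, a) - V(s)
        value_loss = value_loss + advantage.pow(2)                     # squared error for the critic
        policy_loss = policy_loss - log_probs[t] * advantage.detach()  # push up high-advantage actions
    return policy_loss, value_loss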
Now let's have a look at the other agent, just to reinforce what we just discussed, this time for the one at the top. Same thing here: the top agent is in a situation, a state, and needs to decide what to do, so it sends this information — this image — into the network. It goes through the convolutional layer, the pooling layer, the flattening layer, into the hidden layers, and from there we get the outputs: the Q-values, the policy, and the V value. Again, the same thing: we've got two losses, the policy loss, which is here, and the value loss, which is here. We already know how the value loss is calculated — we discussed it for deep Q-learning and we've just discussed it again now. And then there's the policy loss. Again, we're not going to go into the exact formula, but on an intuitive level we're calculating the advantage: OK, we took a certain action based on our selection policy — whether that's softmax or epsilon-greedy or whatever other selection policy we're using — and now let's compare it to the known value of the state, which comes from the shared critic. If you think about it, this critic is observing all of these agents at the same time — looking at this one, and this one, and this one — and they're all contributing towards the critic, getting it more up to speed with the environment, making sure it is representative of what's actually going on in the environment. That's where the value loss comes in: it makes the weights of the neural network reflect the actual situation in the environment, so that the agents can then rely on this value and use it here. So all of these agents are contributing to the critic through the value loss, but at the same time the critic is observing the decisions, the policies, of these agents. It's as if it's looking back at them — imagine an arrow drawn back towards the policy — and criticizing their decisions through the advantage, saying: OK, you made a decision, you chose this action. Great. Now let's calculate the advantage: the advantage equals the Q-value of the action the agent chose to take, minus the value known to the critic. Calculate the difference. If it's a low difference, then when your policy loss is backpropagated through the network, the weights will be adjusted in such a way that this doesn't happen again — that Q-value is going to be pushed lower. Because our policy selects actions based on the Q-values: the higher the Q-value, the more likely it is to be selected. If we were using an argmax policy, we'd always select the one with the highest value — remember, we discussed this — but we're actually using a probabilistic approach, a softmax or an epsilon-greedy policy, so we can select any one of them, but the higher the Q-value the better its chances. So if we selected something and the advantage turned out very low, then bam — the network is adjusted so that next time the value of that particular action will be lower and maybe something else will be higher. That's how it plays out. And on the other hand, if we select something where the advantage is high, then this goes into the policy loss and the network is updated so that it becomes a more common scenario. So basically this whole policy loss helps the network adapt, or morph, in such a way that we do more of the good stuff — the good actions — and less of the bad things. That's how these two losses come into play, and that's how they're backpropagated.
Hopefully that clears things up in an intuitive way. Of course, we didn't go into the formulas, into the complex mathematics behind all of this or into the very intricate details, but hopefully, on an intuitive level, it is now clear why we have the actor and the critic and how they interact: you have these asynchronous agents — the asynchronous side of things — then the actor and the critic, and then the advantage, and how it all comes into play. So these asynchronous agents are playing, exploring the environment, working through it, and all together contributing to a critic, which is then observing their policies — observing the actors, which is what they're called — and, through the advantage, coming up with the policy loss; then the policy and value losses are backpropagated to adjust the network, on one hand so that it represents the true state of things in the environment, and on the other hand to improve the actors' performance. So there we go — that's a quick recap of the intuition we discussed. Once again, hopefully this is all coming together on an intuitive level, and of course in the practical tutorials we'll talk more about how all of this works, and Adlon will walk you through the process of building it. But having this image in your mind — this kind of roadmap of how everything comes together — will be, I hope, very helpful for you in navigating the practical side of things. In terms of additional reading for today, we've got two items, and the first one is on the advantage.
Here we've got "High-Dimensional Continuous Control Using Generalized Advantage Estimation" by John Schulman — with an image of a stick figure standing up — and here you can find out even more about the advantage. You'll find the different types of advantage, including the generalized advantage estimation, and the advantages that are actually used in the formulas and calculations. So if you want to find out more about the advantage and exactly how it works — the formulas behind it and the main variants of this quantity we discussed — this is the article to go to. And one more piece of work we wanted to remind you about is the series of blog posts by Arthur Juliani, which we've mentioned a couple of times already. This is part eight, which is specifically about A3C, so here you can get another explanation, with a bit more mathematics, of what's going on, and maybe you can pick up some additional things from it. Just two things to keep in mind: first of all, as always, this blog uses TensorFlow, whereas we're using PyTorch, so keep that in mind. And the second thing is that the way we structured our approach is that we talked about the actor-critic first, then about asynchronous, and then about the advantage, whereas in the blog Arthur talks about asynchronous first, then actor-critic, then advantage — so keep that in mind too, and hopefully it doesn't throw you off. Other than that, it is of course a great piece of content and we highly recommend checking it out for some additional information. So there we go — hopefully you enjoyed today's tutorial.
LSTM LAYER
We're going to talk about an add-on that we're going to be implementing for our A3C algorithm: the long short-term memory, or LSTM for short. Let's have a look at what we have so far, and then we'll discuss why we need the LSTM and what an LSTM is. So far we've discussed A3C and we've talked about all three A's in A3C, and of course we've seen that it's actually much more complex than what we have on this image: we actually have three, or multiple, agents going through the environment and communicating with each other, and so on. But for simplicity's sake, for today's tutorial, we're going to illustrate everything with this one agent. At the end we have this output: the actor part and the critic part. So basically, once we have a state — let's say this image — it goes through a convolutional layer, then a pooling layer, then a flattening layer, and at that point we have values, numbers, which are propagated through the network; they go into the hidden layers, and as the output we get the policy, the actor part, and we get the value of the state, the critic part. What we're going to do today is talk about this hidden part — the hidden layers — because we can actually take it to the next level and add a modification. We've already seen that multiple modifications exist for A3C. We've seen one of them: in some cases you can have the main part of the network individual to every agent, or you can have it shared, and in our previous intuition tutorials we had a shared network — this network was shared between the agents, and as Adlon will tell you more in the practical tutorials, that really helps with the challenge of Breakout. There are lots of other ways you can modify the algorithm, lots of other additions that can be implemented, and one of them we're going to discuss because we're actually going to use it on the practical side: before the hidden layers, what you can add is an LSTM layer, a neural network layer which allows your algorithm to have memory — which allows it to remember what happened before. We'll talk about the LSTM in more detail just now, but basically you can add an extra layer here, an LSTM layer, and enhance your algorithm with this additional feature, memory. And what you will actually see in the practical tutorials is that we didn't even need any hidden layers after the LSTM: you'll see that in Adlon's implementation he's got the flattening layer, then right after that the LSTM layer — so basically this box represents the LSTM layer — and then right after that the output. You don't even need any other hidden layers after that, simply because of how much power the LSTM layer adds to the algorithm. And again, the architecture of a neural network is to some degree a personal preference; it's a creative thing, so you might want to have two LSTM layers, or one LSTM layer followed by several — like five — hidden layers. That's totally up to you, and for you to experiment and explore. But this is what we came up with in the practical tutorials, so you'll see that we have a flattening layer, after that an LSTM layer, and then the output.
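As a rough sketch of that "flatten, then LSTM, then outputs" idea — with illustrative sizes of our own choosing rather than the exact architecture used in the practical tutorials — the LSTM cell sits between the flattened convolutional features and the two heads:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCriticLSTM(nn.Module):
    """Flattened convolutional features -> LSTM cell -> actor and critic heads."""
    def __init__(self, num_actions):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, kernel_size=5)
        self.lstm = nn.LSTMCell(32 * 8 * 8, 256)     # the memory layer
        self.actor = nn.Linear(256, num_actions)     # policy head, straight after the LSTM
        self.critic = nn.Linear(256, 1)              # V(s) head, straight after the LSTM

    def forward(self, x, hidden):
        x = F.relu(F.max_pool2d(self.conv(x), 2))
        x = F.adaptive_avg_pool2d(x, (8, 8)).view(x.size(0), -1)   # flatten
        hx, cx = self.lstm(x, hidden)                # hidden state and memory cell
        return self.actor(hx), self.critic(hx), (hx, cx)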
So, now that we've talked about this LSTM layer so much, what is it? Well, the LSTM layer adds memory: it gives the neural network a way to remember what happened in the previous iterations, and it is often symbolized with a diagram which looks like this. I'm just putting it here to get us started — I know it looks a bit crooked — so that when we discuss this image further you can see what's going on. The output of the previous layer goes in here, so a whole layer — a vector of values, x — goes into the LSTM, which we'll just call the cell, and as an output you get another vector, plus these other values that come out of it, which we'll get to. So let's look at this in more detail and focus on this part. In fact, as you probably noticed from the letters being on the side, we're going to turn it sideways — like that. All this juggling around was just to reiterate that, even though it looks like a single box, what is actually happening is that a whole layer of values, a whole vector, goes in here, something happens (which we'll discuss just now), and then a whole vector of values comes out. So this is the layer itself, not just one element of it. Let's go back again just to reiterate: a layer goes into this box where something happens — the LSTM is simply drawn on its side, because it's easier to show it this way and that's a common representation. Now we know why this image was on its side and how we're going to proceed. So let's start digging into this LSTM a bit more: what happens inside an LSTM layer? This is what it looks like, and of course it looks very complex. We're definitely not going to go through all of it right now, simply because there's a lot to discuss — there are pointwise operations and layer-wise operations, a lot of intricate details which we're not going to go into, because otherwise it would blow out the scope of this course. The purpose here is not to study LSTMs in depth but to use them. If you'd like to learn more about LSTMs, you can either go to Christopher Olah's blog — he's got a good description of LSTMs — or check out our deep learning course, which has a whole section on recurrent neural networks and LSTMs as well.
So basically this is the internal part of the system. And what
happens inside. We're going to talk about this on an intuitive level, on a very basic level, just enough for us to understand why there is memory, and so that you can also better understand what the instructor is talking about when he implements this. So when the layer receives its input, something else is going on as well. What we need to see is that, besides the usual parts, there are additional inputs into this layer. Normally you have an input from the previous layer, then the layer itself, and then an output, if you think of the picture we had previously of the normal network, read from left to right or from bottom to top. But in an LSTM you have more inputs. I know it's getting more complex, but at least we can understand them. This is your memory cell. This is the key, and this is what you'll hear the instructor talk about. The memory cell is something that is saved inside the LSTM. These extra inputs and outputs sit along the time axis: the diagram is unrolled in time, so in one specific iteration this happens, but this value is taken from the past, and these values are passed on to the future. How exactly they pass through the gates of the LSTM is not something we need to worry about too much. All we need to understand is that when the input goes in, we also have a value which came from the past and which is stored inside the LSTM, the long short-term memory. That's the memory cell: whatever value was there before just stays there, as you can see — it flows through freely, except at these pointwise operations where it can be closed off or have something added to it. But regardless of that, it's just some value that flows through freely and is passed on to the next point in time. You can think of it as a little flash drive that the cell has: it remembers the previous value, and it can either add to that value or read from it. The other value is the hidden state, hence the h. The hidden state is the value that comes from the past output and is used inside the cell. And as you can see, at the end, after all this happens, the value that comes out is the same value that is passed forward in time. So basically the LSTM remembers two things. First, a constant value that just stays in the cell and can be modified — the memory cell, the flash drive — so you have the luxury of storing something in that memory and it will be passed on to the future. In other words, if the algorithm was in an environment, saw something, did something and so on, it can store a certain value in the LSTM and it will still remember that value when it is in the next state. Second, it will remember its own previous output: the output automatically loops back in as the hidden state. That's the very, very high-level picture of what happens in an LSTM. Once again, if you'd like more details, there are lots of resources where you can find them, and at this stage we just don't need to go into that much depth. We just need to understand what a memory cell is, what a hidden state is, and how they give the network memory.
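To make these two pieces of state concrete, here is a minimal PyTorch sketch (the tensor sizes are made up for illustration) showing that an nn.LSTMCell takes the current input together with the previous hidden state hx and cell state cx, and returns the updated pair:

import torch
import torch.nn as nn

lstm = nn.LSTMCell(input_size=288, hidden_size=256)  # sizes are illustrative

x = torch.randn(1, 288)   # current input (e.g. flattened CNN features)
hx = torch.zeros(1, 256)  # previous output of the cell (hidden state)
cx = torch.zeros(1, 256)  # memory cell carried over from the past

# one step in time: the cell reads x, hx and cx, then emits the new pair
hx, cx = lstm(x, (hx, cx))
print(hx.shape, cx.shape)  # torch.Size([1, 256]) torch.Size([1, 256])

At the next time step you simply feed the new hx and cx back in, which is exactly the "value passed from the past to the future" described above.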
And now that we have a general overview of all this, to solidify the knowledge — to give a reason for it — let's ask the question: why do we need memory? Why do we need memory in our A3C or other algorithms? Well, let's look at the challenge we're taking on in this section. The challenge is Breakout. In Breakout you've got this environment with little blocks that you need to destroy with a ball, and you've got a kind of racket or platform moving around at the bottom. Wherever the ball is flying, the platform must catch it so the ball bounces back up, hits a block and comes back down. That's the essence of what you need to accomplish.
But now look at the ball. Imagine you're an A3C algorithm, or rather one of the agents inside the A3C. You see this single frame: what do you extract from it? What would your action be? You can see the ball on the right side of the screen, so maybe it's flying towards the right, and maybe you could anticipate that it's coming towards you. You might even be in the right spot to catch it. But what if the ball is actually not flying that way but the other way? The thing is, you cannot tell from this one image which way the ball is flying, because you don't know where it was at the previous moment in time. If you knew the previous position, you'd do what a human does: draw a line through the two points and say it's going this way. If it had been somewhere else, you'd draw a different line and get a different direction. It could even have been lower down and be flying upwards. From a single frame it is geometrically impossible to tell which way the ball is flying.
And that's why the LSTM — the memory — really helps. Without it the algorithm can still do a reasonable job; it can guess or find other ways of working out where to go. But with the LSTM, even with just that one value passed from the previous point in time, you can carry information about the environment from the previous step. If we go back to the architecture, remember how the image comes into the system and is turned into those flattened values — that diagram was for Doom, but we're doing the same thing for Breakout. Now, on top of the information from the current image, you also have information about what happened previously, simply through the architecture, because it stays in the LSTM layer. That information helps the algorithm make a decision. Now it knows whether the ball is flying in this direction or that one, so it can either stick around because the ball is coming its way, or start moving left right now, because if it waits any longer it will be too late and it will miss the ball. So that's how the LSTM layer really helps the algorithm, and that's exactly what you will see in the practical tutorials.
And just one additional note, as we mentioned at the start: LSTMs are not 100 percent necessary. They're not permanently attached to the A3C algorithm — you might want one in your A3C, or you might not, depending on the situation and the architecture you choose. There are lots of possible additions. We've already discussed the modification where the neural network is shared between the actor and the critic, or kept separate. The LSTM is another. And there's yet another that you will see in the practical tutorials, where we add an entropy term calculated from the policy — the instructor will walk you through that. So there are lots of different modifications that can go into an A3C algorithm. Just remember that it depends on what you want to achieve, and it's something I'd encourage you to explore if you're going to be implementing many of these and trying different algorithms. We've already discussed a couple, and maybe you can find additional modifications that interest you, or maybe by the time you watch these tutorials new modifications will have come out that are very interesting. That's definitely something you could look into to further enhance your knowledge of artificial intelligence and how to create these algorithms.
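As a quick taste of that entropy modification — this is only a sketch of the idea, not the exact code from the tutorials — the entropy of the policy can be computed from the action probabilities and subtracted, scaled by a small coefficient, from the policy loss, which pushes the agent to keep exploring:

import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 0.5]])   # raw actor outputs for one state (made-up numbers)
probs = F.softmax(logits, dim=-1)          # action probabilities
log_probs = F.log_softmax(logits, dim=-1)

entropy = -(probs * log_probs).sum()       # high when the policy is still uncertain
entropy_coef = 0.01                        # illustrative coefficient
# policy_loss = -log_prob_of_taken_action * advantage - entropy_coef * entropy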
BREAKOUT - STEP 1
Welcome to the third module of this course: the A3C, the Asynchronous Advantage Actor-Critic. And now I can really say welcome to state-of-the-art machine learning — well, state of the art at the time I'm recording this, in 2017, because some of you may take this course in one or two years. You are about to work on one of the most powerful models in artificial intelligence, but that's not the only special thing about this module. Not only are we about to work with a very powerful model, we are going to implement a highly optimized version of it. You can picture the A3C itself as the heart of the algorithm, with a lot of tools around it that can be used to optimize the whole model. So not only are you going to build the heart of the A3C, you are also going to implement all these tools around it to make the model truly powerful.
Why did I want to do that? For two reasons. The first reason is that we are approaching the end of this course — this is the highest-level of our three courses on AI — so I think you're ready to take it to the next level. The second reason is that solving Breakout is actually super challenging. When we originally planned this course we wanted to put Breakout as the first module, because we thought it would be the easiest challenge. Not at all: it turned out to be the most difficult one. An easy way of explaining this is that in Doom the monsters are big, and therefore easier to detect, and therefore easier to kill or avoid. But in Breakout we have a tiny ball that the AI has to detect visually, because the AI will still have eyes — we're still doing deep reinforcement learning. So it's genuinely hard, and that's why we don't really have a choice but to implement a very powerful version of the A3C.
Now, why do I say this is the most powerful version? That's for a particular reason — it's not just me saying so. The reason is that the version we are about to implement, and this is something very special, is a version of the A3C that was implemented by somebody and then corrected by one of the most influential people in machine learning today, who happens to be one of the creators of PyTorch: Adam Paszke. If we go to the PyTorch page on GitHub and scroll down to the bottom, we can see the team of PyTorch creators and contributors, and we can see that PyTorch is currently maintained by Adam Paszke, among others. That's the person we should really be grateful to, because there are very few versions of the A3C that work well on Breakout, and he corrected one of them so that it works very well. So Adam Paszke is not only a maintainer of PyTorch, he is one of its creators, and today he's among the most influential people in machine learning. So we can feel confident that the version we are about to implement is probably one of the most powerful versions of the A3C available today.
So what is this implementation? Originally it comes from a developer called Ilya Kostrikov, who made a PyTorch implementation of the A3C which originally didn't work well on Breakout. Then somebody made a pull request — if we look at the closed pull requests we can find one called "a cleaner solution to the gradient sharing problem" — and guess who it was made by: Adam Paszke, creator of PyTorch. That pull request solved the problem and makes the A3C work very well on Breakout without training for days and days. If we go back to the implementation we can see its four contributors — thank you very much to all of them, and a huge and special thank you to Adam Paszke for fixing the gradient sharing problem. He started by doing a fork, which is a sub-branch of the code, and then he made a pull request back to the developer to fix this gradient sharing problem that was in the code. That's how he became a major contributor to this implementation, making the whole thing work perfectly well.
And trust me, I did a lot of experimentation on the A3C model — I actually implemented five models. I was desperate enough at one point that I made my own Breakout environment with a bigger ball, to make the image pre-processing easier, and then went back to the original game and made my own implementation of the A3C. But it took ages to train, even on a pretty powerful computer, so I wanted to find a better way — and this is it: a very powerful implementation of the A3C, one of whose major contributors is a creator of PyTorch.
So what we're going to do in this module — and I think you're ready for it — is re-implement this high-level code. We will mostly concentrate on the files that are directly related to the A3C: all the parts directly related to the A3C we will implement line by line. For the other files I will just explain the code, so that we can tackle this without finding it too overwhelming. So this is quite a special module: not only are we working on a state-of-the-art model of AI, but at the time I'm speaking I'm highly confident we are implementing one of the most powerful versions of the A3C. So let's do it. Let's go back to Python, and before we start, we'll do the simplest thing in this whole module: setting the working directory folder. Let's go to our AI templates folder and into the Breakout module, the most challenging one. And there we go — those are all our files. So let's see which ones are directly related to the A3C, that is, which ones we're going to implement line by line and focus our energy on.
So there are actually two files. The first one is model.py, which we will re-implement line by line because it's the most important: that's where we make the A3C brains. The key thing to understand here is that we will have a shared model, with the same encoding weights shared between the actor and the critic — that's part of what is special about this version of the A3C, the shared model with the shared weights. The other important file to implement line by line is train.py, because of course, right after we make the brains of the A3C, we have to train them, and we train them in train.py. This is quite a long piece of code, but it contains the heart of the A3C model, which has two losses to reduce: the value loss, which is the loss related to the predictions of the critic, and the policy loss, which is the loss related to the predictions of the actor. This will be quite new, but that's because in the A3C we're basically working with several agents, each one having its own copy of the environment. We also have a fully connected layer that outputs the value V(s) of the state, which is basically a shared assessment of what's happening in the game. So this will be quite challenging — make sure you're in good shape.
As for the rest of the files, I will explain them, but without spending too much time on them; believe me, you'll want to keep your energy for the main ones — that will already be a lot. These other files are, first, envs.py, which is an improvement of the Gym environment thanks to Universe. Basically it improves the Gym environment with Universe, which allows us to have an optimal pre-processing of the images, and also to normalize all the values of the environment, like the colour intensities or the reward intensities — this file normalizes all these values and makes sure we have a good pre-processing of the images. As you can see, it is taken from the OpenAI GitHub page of the universe-starter-agent, so we will not spend too much time on it. You just need to understand that we improve the Gym environment with Universe to get an optimal pre-processing of the images; the rest is not that important, especially for the A3C. Then we have main.py, which is the code that executes the whole thing — the code that runs everything, creates the brain, trains the brain and outputs the result — because it calls all the other files. Then my_optim.py is a special optimizer: it's basically the Adam optimizer, but adapted to the shared model we're implementing. We will explain this code in the tutorials. Then we have test.py, which is actually the last one — it is the file that implements a test agent, an agent that plays Breakout without updating the model, so it's totally independent from the training. We will also explain this code in detail. Besides, the good news is that you will have two versions of the code: one is the code we implement in the tutorials, without any comments, and the other is a code folder with all six files fully commented and well connected. So if you miss something in a tutorial, you will be able to look at the commented code to understand what's going on. So there we go — I hope you're excited to implement this. You're really at the top of the mountain now, or just below the top, because you still need to get through this, but you're getting there. So take a good breath of oxygen.
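To keep the six files straight, here is the layout as I understand it from the description above (the exact names in your own folder may differ slightly):

model.py     - the A3C brain: CNN + LSTM shared encoding, actor and critic heads (implemented line by line)
train.py     - the training loop: value loss for the critic, policy loss for the actor (implemented line by line)
envs.py      - the Gym environment wrapped with Universe-style pre-processing and normalization
main.py      - entry point that creates the shared model and launches training and testing
my_optim.py  - the Adam optimizer adapted to the shared model
test.py      - a test agent that plays Breakout without updating the model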
BREAKOUT - STEP 2
All right, so we're going to start with the first file of our model, and the most important one: model.py. It's in this file that we will implement the brain of the whole model — the brain at the heart of the A3C — and it's in this file that we will make the neural network, which will of course contain some convolutional neural networks, because we're still doing deep reinforcement learning, so our AI has to have eyes. Inside this neural network we will integrate everything related to the actor-critic model, and there is a bonus: as I told you, we are implementing one of the most powerful A3C models, and part of what makes it so powerful is that it will contain a recurrent neural network — more precisely an LSTM, a long short-term memory — so that we can learn the temporal properties of what's going on, that is, the temporal properties of the input, so that the predictions can be even better. So there we go: we are implementing a very powerful model that combines basically all the neural networks we saw in the Deep Learning course — an artificial neural network, a convolutional neural network and a recurrent neural network — and at the heart of all these networks there is of course the A3C model, which will make the AI very powerful. So let's attack this model and implement it.
We're going to start by making two functions. These are helper functions that take care of how we initialize the weights, because we're going to have some neural networks, and therefore weights, and we want these two tools ready so we can plug them very easily into the neural network later. The first function is normalized_columns_initializer: a function that not only initializes some weights, but sets a specific variance for the tensor of weights. That's exactly what we're about to implement right now. Then we will implement the second function, weights_init, which will initialize the weights in a way that is good for the learning. And once we're done with these two functions, we will start implementing the neural network itself.
So let's quickly make the first one. I'm starting with a def, then the name of the function, which is normalized_columns_initializer. This function takes just two inputs: first the weights we want to initialize, and then the standard deviation, because, as I just said, we want to set a specific variance for our tensor of weights. If you want to understand why we do this: when we make the neural network that will be the actor and the critic, as the A3C model requires, we'll make two separate fully connected layers, one for the actor and one for the critic. These two fully connected layers will have weights, and we will set a different standard deviation for each of these two groups of weights: a small standard deviation for the actor, around 0.01, and a large standard deviation for the critic, around 1. That's why we're making this function — so we can very easily set the standard deviation of the weights we will initialize later for the actor and the critic. For now we just give std a default value of 1.0; this will change afterwards when we initialize the weights.
Now let's define what's inside the function. First we prepare the output, which we call out — that's what the function will return. As you understood, this output will be a tensor of weights with a specific standard deviation, but before we take care of the standard deviation we just want to initialize it. To initialize a tensor of weights, you might already know how: we use our torch library and take the randn function, which fills a torch tensor with random values that follow a normal distribution — that's why it's called randn, with the n for normal. What we need to pass it is the number of elements this tensor will contain, which is of course the number of weights, because we are initializing a tensor for these weights. To get it, we simply take our weights and call .size() on them. This creates a torch tensor with the same shape as our weights, initialized with random values drawn from a normal distribution.
And now it's time to set the standard deviation we want — the std argument of the function. What we do now is a simple normalization: we have a torch tensor of weights and we want to normalize it, so we write the computation explicitly. We take our output out and update it by multiplying it by the standard deviation we want, divided by the square root of the sum of the squared weights. To get that, we use the square root function of torch, torch.sqrt, and inside it we put the squared sum of the weights: we take our output out, use the pow function with 2 because we want the squares, then take the sum, specifying the index of the dimension along which we want to sum, and then, to broadcast this sum back over the weights, we use expand_as with our output out. So we take the tensor of weights, take the sum of squares, take the square root, and use that to normalize. The fact that the standard deviation sits in the numerator means we can write: the variance of out will be equal to the square of the standard deviation. This formula makes sure that the tensor of weights we initialize has a variance equal to the square of the standard deviation we passed as an argument, and that is how we will set a specific standard deviation for the actor and the critic later — a small one for the actor and a larger one for the critic — very easily, thanks to this function. And now there is only one thing left to do: return the output, which is now normalized with this specific standard deviation. Perfect — that's the first function, the first tool we will be very happy to use in our A3C brain. We have one more function to make now.
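Putting the above together, here is a minimal sketch of this initializer as I read the description (the exact course code may differ slightly in API details; older PyTorch versions use .expand_as instead of keepdim):

import torch

def normalized_columns_initializer(weights, std=1.0):
    # random values from a normal distribution, same shape as the weights
    out = torch.randn(weights.size())
    # rescale each row of the tensor so that its norm equals std
    out *= std / torch.sqrt(out.pow(2).sum(1, keepdim=True))
    return out

For example, normalized_columns_initializer(torch.empty(4, 256), std=0.01) would give a 4 x 256 tensor of very small weights — exactly what we will want for the actor later on.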
BREAKOUT - STEP 3
The second function — the one that will actually initialize these weights — we're going to call weights_init, and it will take as argument the object m, which represents the neural network. Then a colon, and now let's get inside the function to define what we want it to do. Basically, we want to initialize the weights of the neural network in such a way that we get optimal learning. This will not seem particularly intuitive — it is based on research papers and experiments — so we are going to initialize the weights in a specific way that we haven't seen before, and we'll just implement it without going into the details of why it works.
We're going to start with a trick that will be used to make the distinction between a convolution and a full connection, because, remember, our AI will have eyes and therefore some convolutional layers, and of course it will also have some fully connected layers, and we will use a different initialization of the weights for these two types of connections. So we use this trick to separate the two kinds of connections, and then we'll use some conditions to apply a different initialization to each. The trick is to create a new variable, which we're going to call classname, from the object m that represents the neural network, using a special attribute of that object: double underscore, class, double underscore, dot, double underscore, name, double underscore — in other words m.__class__.__name__. It's a pretty ugly trick for looking up the type of connection of our neural network object, but it gives us exactly what we want, and it's going to make sense when we start the conditions.
And speaking of conditions, we can start them right now. The first condition handles the first case, where the connection is a convolution. To write it: if classname.find('Conv'), using the find method, is different from -1 — because -1 means not found, so different from -1 means we do have a convolution — then in that case we do a special initialization of the weights. So this condition means: if we have a convolution connection, run the specific initialization we want, and this is where the non-intuitive part comes in. We start by creating a variable that we call weight_shape, which will be a list containing the shape of the weights of our network in the convolution connection. We use the list function to create a list, and inside we put m.weight, the weights of the neural network in the convolution connection; to get their shape, we use the data attribute and then size() — that gets us the shape of these weights. So now weight_shape contains, as a list, the shape of the weights of the convolution connection of our network.
Then, to initialize the weights of this convolution connection, we need two values: first the product of dimension one by dimension two by dimension three, and second, dimension zero times dimension two times dimension three. We will then use these two values in the computation of how we initialize the weights. Let's get the first product. We call it fan_in, and it is equal to the product — using the prod function from numpy, which has the shortcut np, so np.prod — of what we want to multiply, namely dimensions one, two and three of our weights. To get them we take weight_shape and slice the indexes of these three dimensions, from index one up to index three included — so from 1 to 4, because the upper bound is excluded. The same goes for fan_out: as we said, fan_out is the product of dimension two times dimension three — which we slice from index two included to index four excluded — multiplied by dimension zero, which we access with weight_shape[0]. So to sum up: fan_in is dimension one times two times three, and fan_out is dimension zero times two times three of our weight_shape list.
Now we use these two values, fan_in and fan_out, to proceed with the initialization, because we're going to compute a new value we call w_bound, equal to the square root — which we get with np.sqrt — of 6 divided by fan_in plus fan_out. This w_bound represents, in some way, the size of the tensor of weights. And why did we compute it? Because what we're about to do now is generate random weights that are inversely proportional to the size of the tensor of weights. Indeed, we take our neural network m, then get its weights with the weight attribute, then access its data, that is, the tensor itself, and on this tensor of weights we call the uniform_ function, to which we pass a lower bound, minus w_bound, and an upper bound, plus w_bound. OK, that's for the weights. Now we need to initialize the biases, and good news: for the biases it's much simpler. We initialize them all with zeros. To get the biases we take them from our model m, the neural network, using the bias attribute, then access the data as usual, and then use the fill_ method — with the underscore — which, as you might have guessed, fills the tensor of biases with a value; we want zeros, so we pass 0. So to summarize: we generate random weights inversely proportional to the size of the tensor of weights, and we initialize the biases with zeros. That's the initialization for the convolution connections.
Now we need to do the same for the full connections, so we add a new condition, and we reuse the same trick with classname, the variable that contains the name of the connection. This time we use the find method with 'Linear' in quotes, for a full connection — a classic linear connection as in a classic artificial neural network — and again we check that the result is different from -1. So elif classname.find('Linear') is different from -1 means: if the connection is linear, that is, if we have a classic full connection. In that case, how do we initialize the weights? It's going to be quite similar. We introduce weight_shape again — it won't clash with the first one, because we will be in either this case or that case — then, likewise, a fan_in variable, which this time will not be the product of three dimensions but simply dimension one. That's because a full connection has fewer dimensions than a convolution connection, as we saw in the intuition lectures of the ANN and CNN sections. So fan_in is just dimension one this time, and fan_out, which will again be used to compute w_bound, is the weight_shape at index zero, that is, dimension zero. Then w_bound is the same as before: the square root of 6 divided by the sum of fan_in and fan_out. And then, good news, the rest is exactly the same as previously: we use the uniform_ function for the weights and the fill_ function for the biases, to get the same kind of initialization, this time with a different fan_in and fan_out and therefore a different w_bound. Same principle, same idea — the only thing that changes is that there are fewer dimensions for the full connection, and therefore a simpler computation of the bound used to generate these random weights.
The good news is that we are now done with these initialization functions, so we have our two tools and we are ready to start building the brain. I can't wait — this will of course be the most exciting part; this was just a warm-up to get us ready for the big thing. We'll take care of it in the next tutorials — it will actually take several tutorials, of course. We'll start by making the eyes, then, remember, we will add an LSTM to learn the temporal properties of the input, and then we'll take care of the actor and the critic. And that's where we'll use these two functions: normalized_columns_initializer and weights_init.
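Here is a compact sketch of this second function as described above — a Xavier/Glorot-style uniform initialization with zeroed biases. Treat it as my reading of the description rather than the verbatim course code:

import numpy as np

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        weight_shape = list(m.weight.data.size())                # [out_ch, in_ch, kH, kW]
        fan_in = np.prod(weight_shape[1:4])                      # dims 1 * 2 * 3
        fan_out = np.prod(weight_shape[2:4]) * weight_shape[0]   # dims 0 * 2 * 3
        w_bound = np.sqrt(6. / (fan_in + fan_out))
        m.weight.data.uniform_(-w_bound, w_bound)                # weights ~ U(-w_bound, w_bound)
        m.bias.data.fill_(0)                                     # biases start at zero
    elif classname.find('Linear') != -1:
        weight_shape = list(m.weight.data.size())                # [out_features, in_features]
        fan_in = weight_shape[1]
        fan_out = weight_shape[0]
        w_bound = np.sqrt(6. / (fan_in + fan_out))
        m.weight.data.uniform_(-w_bound, w_bound)
        m.bias.data.fill_(0)

Calling model.apply(weights_init) later will run this function on every sub-module of the network, which is how both kinds of connection get initialized in one pass.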
BREAKOUT - STEP 4
Now we're going to start building the model — that is, the brain of our AI. And speaking about brains, I would like to highlight something. Remember, in the first module we made a simple brain with only fully connected layers. Then in the second module, for Doom, we made a brain that not only had fully connected layers but also eyes, because we added the convolutional layers, which gave eyes to the AI so it could observe the images and understand what's going on in them. Now we're going to take that even further, because we're going to make a brain that not only has eyes and fully connected layers but also memory. As I said in the previous tutorial, we're going to add a recurrent neural network inside this big brain, and that will give a long memory to our brain, so it can understand the temporal relationships and the temporal properties of the input images. So there we go — an even more powerful brain. I can tell you that the model we're about to implement right now is really, really powerful, and we can see how building AIs and doing deeper and deeper reinforcement learning is all about getting closer and closer to how the human brain works. We started with the basic relationships of the brain, the linear connections; then we added eyes; now we add memory. Who knows what we'll add in future models — maybe in 2018 they will add something that makes the brain look even more like a human brain. But already, with fully connected layers, eyes and memory, we have a really good and functional brain. So let's do it, let's make this brain.
As usual, we're going to make a class for it, because it's going to have a lot of properties, with the convolutions and so on. We'll make an init function to initialize all this and create all these connections, and then of course we'll have the forward function that will propagate the signal through the brain so that we eventually get the output. All right, are you ready? Let's do this.
So we introduce a class, which we call ActorCritic — because of course I'm talking about brains here, but let's not forget that we're making a model based on the actor-critic principle, with the actor and the critic kept separate, so we'll actually make one linear full connection for the actor and one for the critic. You'll see how we do that; it will actually be quite simple. So, class ActorCritic, and this ActorCritic class is going to inherit from nn.Module so that we can use all the PyTorch tools. To inherit from nn.Module we take first the torch library, then nn, then Module — torch.nn.Module. That way we inherit from it.
And there we go with our first function, which will of course be the __init__ function. We start with init between double underscores, and this function takes as arguments self, the object; then the input shape, that is, the dimensions of our input images, which we call num_inputs; and the action space, which is basically the space that contains all the actions. From this action space we can get the number of actions, that is, the number of possible actions, which we will actually use very soon — that's why we need it among the arguments. That's all we need. Now let's go inside the function and create all the variables of our brain. But before we do that, remember what we have to do to activate the inheritance so that we can use all the tools from nn.Module: we use the super function, inside which we pass ActorCritic, our class, and then self, the object, and then we call its __init__. There we go — that gives us all the tools we need from torch to build our brain. All right, now it's time to make the eyes of the AI — that is, the convolutions.
We're going to do this quickly, because we already explained it in detail for Doom — remember, the Doom AI already had eyes. It's exactly the same: we're going to make some convolutions, and we will use a fairly simple architecture with 32 feature detectors of size three by three, a stride of two and a padding of one. That's a pretty classic architecture, but it will be enough for the AI to understand what's going on in Breakout. All right, so let's make those convolutions. We start with self, because the convolutions will be variables of the object, so self.conv1 — and there are going to be four convolutions, conv1 through conv4. We take the nn module and then the Conv2d class, because each convolution will be an object of this class. Inside, first we put the input shape of the images, which is exactly our num_inputs argument, so we can enter it as the first input. The second argument is the number of feature detectors, which is also the number of kernels — we take 32, as we just said, a classic choice. Then we need to choose the size of the kernel, that is, the grid of cells that slides over the input image; remember we can typically take three, four or five, and we're going to choose three. Then we choose a stride of two and a padding of one. That's the first convolution, the one that goes from the input image to the first convolutional layer composed of 32 convolved images.
Now we're ready to make the second convolution, and it's going to be almost the same, so I'm copying this line, pasting it below, pasting it again, and pasting it one last time, because we're going to have four convolutions with almost nothing to change. We can already replace conv1 by conv2, conv3 and conv4 — those will be our four convolutions. Now of course we need to change a few things, but not much: we keep the kernel size of three, the stride of two and the padding of one for each, and they will all have 32 feature detectors, so 32 output convolved images. But remember, the first argument corresponds to the input side of the convolution, which is the output side of the previous convolution — it works like dominoes. Therefore here we put 32, and likewise 32 and 32 for the following ones. So to sum up: we start with our input images; we apply the first convolution to get 32 convolved images, each one detecting a specific feature; then from these 32 convolved images we apply the second convolution to get 32 new convolved images; then the same again with the third convolution; and finally, from those 32 convolved images, we apply the fourth convolution to get the final features. This will be enough for our AI to have super vision; it will detect the ball very well.
So that's it for the convolutions — that's it for the eyes. Now let's take care of the memory, the new feature of this brain compared to before: not only will it have super vision, it will also have a long memory, because we're going to use a long short-term memory, the kind of recurrent neural network that gives a model a long memory so it can learn long temporal relationships from the past. So we create a new variable, again starting with self, and we call it simply lstm, because it will correspond to the LSTM network inside the brain. Before we write the code for the LSTM, let's make sure we understand what this LSTM part of the brain will do. As we understood, the LSTM is used to learn the temporal properties of the input images — for example, when the ball bounces, the LSTM will encode that bounce. So that's the first thing to understand: it will, in a sense, encode what's happening in the game. The next important thing to understand when implementing an LSTM is the order of the temporal dependencies we can learn. Since we're going to feed our neural network a sequence of four images, we can already learn some temporal dependencies of order four — dependencies where what happens at time t depends on what happened at t minus 1, t minus 2 and t minus 3. We could definitely do that. But the good news is that we're using an LSTM, and therefore we will be able to learn even more complex temporal relationships, where what happens at time t can depend on what happened at t minus 1, t minus 2, t minus 3 and much further back into the past. That's the "long" part in long short-term memory: with the LSTM we can learn some very long-range temporal relationships. All right, so let's add our LSTM.
To do this we use the nn module again, and this time we take the LSTMCell class, which will create the lstm object representing the LSTM part of the neural network. What's important to understand right now is that we're making a CRNN — a convolutional recurrent neural network — and the RNN part comes after the CNN part. Therefore, the first thing we need to give this LSTM cell is the size of the output of the convolutions, which is 32 times 3 times 3. This 32 * 3 * 3 is the size of the flattened output after the four convolutions here, and it becomes the input of the RNN, of the LSTM. Now, why is the output of the four convolutions of size 32 times 3 times 3? Don't worry, it's not obvious — it's not a simple formula, but there is a formula to compute the number of output neurons after flattening the convolved images, and if you gather the terms of that big formula you get 32 * 3 * 3. I didn't want to spend too much time on this because we have a lot to do, and besides, we already made a function to compute this number: remember the count_neurons function we made for Doom. You can reuse it if you're not convinced; it will give you exactly this number of outputs. So that's the first argument, and the second argument is the number of output neurons of the LSTM, and we're going to go for 256. So what does that mean? It means that we now have a 256-dimensional vector that encodes each event of the game — in other words, we have an encoded state. And it is now that we can make the separation between the actor and the critic, because we're going to make what are effectively two separate neural networks, one for the actor and one for the critic, but they will share the same encoding of the images and of the temporal relationships. This is the common part of the two networks — the same beginning for both — but from here on things diverge for the actor and the critic, because we're going to make one linear full connection for the actor and a different linear full connection for the critic.
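To fix ideas, here is a sketch of what the __init__ built so far might look like, based on the description above (num_inputs is the number of input image channels, and the 32 * 3 * 3 flattened size assumes the frame dimensions used by the implementation we're following — an assumption on my part):

import torch
import torch.nn as nn

class ActorCritic(nn.Module):

    def __init__(self, num_inputs, action_space):
        super(ActorCritic, self).__init__()
        # the eyes: four convolutions, 32 feature detectors of 3x3, stride 2, padding 1
        self.conv1 = nn.Conv2d(num_inputs, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.conv4 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        # the memory: an LSTM cell taking the flattened CNN output (32 * 3 * 3)
        # and producing a 256-dimensional encoded state
        self.lstm = nn.LSTMCell(32 * 3 * 3, 256)
        # ... the actor and critic heads are added in the next step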
BREAKOUT - STEP 5
All right, so after we made these four convolutions and the LSTM, we now have an encoded state that is going to be the input of the two neural networks we're going to make for the actor and the critic. And speaking of them, the only thing we have to do now is create two linear full connections, one for the actor and one for the critic. But before we do that, we need the number of possible actions, so I'm going to create a variable here that is not a variable of the object — I'm not using self — called num_outputs, which will represent the number of possible actions. We can get it from the action space: we take action_space, which is an argument of the function when we create the object, and add .n to get the number of possible actions. Now the actor and the critic will each take the same input, namely the output of this whole process here with the convolutions and the LSTM — the encoded state — but they will have two different linear connections, so that eventually we get what are effectively two neural networks, one for the actor and one for the critic. Since we already did the big job with the encoding, all we need to do is create two objects: one linear full connection for the actor and another linear connection for the critic.
So that's exactly what I'm going to do. The first object is the linear connection of the critic, which I'm going to call critic_linear, and to create this linear connection — you know how to do it — we simply take the nn module and then the Linear class, to which we give two arguments. First the input neurons, which are the outputs of all the encoding here with the convolutions and the LSTM, that is 256 neurons, so I put 256. Then we have one output, because remember, the output of the neural network for the critic is the value of the V function applied to the input state — to the encoded state we made here. If we call the input state s, the output of the critic's network is V(s), and that has one dimension; it's just a value, so here we put 1. And remember, this is what is shared among the agents, so that they can get some common information they can use to play their actions in a more relevant way. OK, that's the neural network of the critic.
Now let's make the neural network of the actor, so I add self.actor_linear, and again we already have the encoded state as input, so we simply need to add a linear connection: we take the nn module, then the Linear class. This neural network of the actor takes the encoded state, which has a size of 256, so 256 here, but the output is going to be different, because the output of the neural network for the actor consists of the Q-values of the input state and the possible actions. Again, if we call the encoded state s and the action played a, the output of this actor network will be Q(s, a), and since we have one Q-value for each action, we have num_outputs outputs — num_outputs is the number of Q-values. OK, perfect. So, to write it out: for the critic the output is V(s), where s is the encoded state, and for the actor the output is Q(s, a). It's very important to understand this distinction, and to understand that we therefore have two separate neural networks, one for the critic and one for the actor.
Perfect. So we are almost done with this function — the most important part is done. The only remaining thing we have to do is initialize all the weights and biases of these two neural networks, and of course to do that we're going to use the two functions we made earlier: normalized_columns_initializer and weights_init. Let's do that quickly; it's going to be pretty easy and pretty fast. First we initialize some random weights, and to do this we apply the weights_init function to our object: we take self to get our object, and to this object we apply the apply method, inside which we just put the weights_init function. That applies the function to our object, and by doing this we are initializing random weights that set up a good starting point for the future learning of these weights.
Now what we have to do is a special normalization for the actor and the critic. Remember, I think I said this in a previous tutorial: we're not going to set the same variance for the actor and the critic — the actor will get a small standard deviation, a small variance, and the critic will get a big one. Why do we do this? What's the purpose of giving a small standard deviation to the weights of the actor and a large standard deviation to the weights of the critic? It allows us to manage exploration versus exploitation: by giving a small variance to the actor and a larger variance to the critic, we get a good balance of exploration versus exploitation.
So let's first take care of the actor. We take self, our object, then the neural network of our actor, actor_linear, then we access the weights of this network, and remember, to access the data of the weights we add .data. With this we get the weights, and now we use our function normalized_columns_initializer — I'm copying it and pasting it here — and we pass it two arguments. First the weights we want to initialize, so we simply take the same expression again and paste it inside; and second the standard deviation we want these weights to have, and as we said, we want a small standard deviation for the actor, so it's going to be 0.01. Perfect — those are the weights of the actor's neural network. Now let's take care of the biases of the actor's network. Here we do almost the same thing: we copy this line, paste it below, replace weight by bias to access the biases, and after data we simply add fill_, and remember, inside we put 0 because we want all the biases to be initialized with zero. Actually, I don't think this line is strictly necessary, because as you remember the biases are already initialized to zero with the fill_ function inside weights_init — we're doing this just to make sure the biases really are zero, but now we are 100 percent sure.
Now we do the same for the critic, so let's be efficient: we copy these two lines, paste them here, and replace actor with critic in each of them. The only other thing we have to change is the standard deviation we want the weights of the critic's network to have, and as you remember, this time it's a large one: instead of 0.01 we put 1. So there we go: a small standard deviation for the weights of the actor's network and a large standard deviation for the weights of the critic's network.
All right, now we have two remaining things to do. The first is to also initialize the biases of the LSTM, and to do this we take our object self, because the LSTM belongs to our object, then self.lstm, and then the two types of biases that live in the LSTM cell: bias_ih and bias_hh. Both of them are going to be initialized to zero, so we access the data and then use the fill_ function to fill these biases with zeros. And for the second group of biases we do the same, replacing ih by hh. That initializes the LSTM biases to zero. Now the last thing we need to do is use a method inherited from the nn.Module class: the train method. It's simply a method that puts the module in train mode. What's the use of it? It makes sure that any dropout or batch normalization layers are active during training. To use it we just add self.train(), and that puts the module in train mode. Perfect — we're done with the __init__ function. We have our convolutions, we have the LSTM, we have our two separate neural networks for the critic and the actor, and all the weights and biases are properly initialized. So that's all good. We are ready to move on to the next step, which is to make the forward function that will forward-propagate the signal from the very beginning, the original input images, through the brain until we get the output.
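Here is how the end of the __init__ might look, putting Steps 4 and 5 together — again a sketch that continues the class body started in the earlier snippet, based on my reading of the description rather than the verbatim course file:

        # number of possible actions, taken from the Gym action space
        num_outputs = action_space.n

        # two separate heads on top of the shared 256-dimensional encoded state
        self.critic_linear = nn.Linear(256, 1)            # outputs V(s)
        self.actor_linear = nn.Linear(256, num_outputs)   # outputs one value per action

        # random starting weights for every connection
        self.apply(weights_init)
        # small variance for the actor, large variance for the critic
        # (helps balance exploration vs exploitation)
        self.actor_linear.weight.data = normalized_columns_initializer(
            self.actor_linear.weight.data, 0.01)
        self.actor_linear.bias.data.fill_(0)
        self.critic_linear.weight.data = normalized_columns_initializer(
            self.critic_linear.weight.data, 1.0)
        self.critic_linear.bias.data.fill_(0)

        # LSTM biases start at zero, and the module is put in train mode
        self.lstm.bias_ih.data.fill_(0)
        self.lstm.bias_hh.data.fill_(0)
        self.train()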
BREAKOUT - STEP 6
We're going to make the forward function, which will forward-propagate the signal through the whole brain, from the very beginning with the input images up to the outputs, which will contain the Q-values for the actor and the value V(s) returned by the critic. It's going to be quite similar to what we did for Doom, but this time something changes: now we have an LSTM in the brain, so we have to do something extra to propagate the signal, and we need to be careful with that. The other thing that changes compared to before — less important, but still a change — is that we're not going to use the ReLU activation function, the nonlinear activation you know, but the ELU, which is a slightly more sophisticated version of the ReLU. You'll see it in the PyTorch docs in a moment.
So let's make this function. We start with a def; it's actually the last function of this ActorCritic class, and we call it forward. This forward function takes the object self, because we're going to use the object's attributes, and the inputs. It's important to understand what these inputs will be: they will not only contain the input images, they will also contain the hidden states and the cell states of the LSTM. That's why I wanted to highlight that some things change now: the forward function of this network also carries the hidden and cell states of the LSTM. And speaking of them, the first thing we do is separate the two parts of this inputs argument. We create a new variable, inputs, which will be the input images, and we separate it from the tuple (hx, cx), which holds the hidden states and the cell states of the LSTM — hx being the hidden states and cx the cell states — and all of that is equal to the inputs argument here. Now that we've made the separation, we can start to propagate the signal through the brain, and to do that we go through the layers one by one, from the first to the last, using our connections: the convolutions, the LSTM connection and the linear full connections.
So let's do this; it's going to be like before. We get our first layer, which we call x, and to get this first layer we need to propagate the signal from the inputs to it, using the first convolution, because it is the first convolution that propagates the signal from the input images to the first layer. So we copy the first convolution we defined and apply it to our input images, which are now the separated inputs — that propagates the signal from the input images to the first layer. But remember, we have to use a nonlinear activation function to break the linearity, in order to be able to learn the nonlinear relationships inside the images, and to do this we're going to use, as we said, the ELU activation function.
But before that let's get it. To get it, as with the ReLU, we take the functional module, which has the shortcut F, then a dot, then elu, and we put all this in parentheses around the first layer obtained by applying the first convolution to the inputs, because we want to non-linearly activate its neurons. Now let's go to the PyTorch documentation to understand what this ELU is. You can access it on pytorch.org, in the docs, on the nn page, where you have to find the non-linear activations, and among the non-linear activation functions you will find it. There is the ReLU, the classic one we know, which is just the max of zero and x; you have the graph in mind. Then you have the ReLU6, which is a little more sophisticated. And then, if we keep looking, we can see that the ELU is a ReLU plus an additional exponential element, so it's a bit more sophisticated again. That's the one we use to non-linearly activate the neurons in the different layers, and by the way this activation function is called the exponential linear unit. So there we go, we apply the elu on the first convolutional layer. And now things are going to be easy: we proceed to the next propagation of the signal, from the first convolutional layer to the second convolutional layer, which we also call x, because by propagating the signal from the first convolutional layer to the next one, x becomes the next convolutional layer. To propagate the signal from the first convolutional layer to the second one we can simply copy this line, paste it, and replace one by two; and of course the second convolution is not applied to the input images but to x, that is the first convolutional layer. Perfect, now we get our second convolutional layer. Then we propagate the signal again from the second convolutional layer to the third one, so we copy this again and replace two by three. And the last one: to propagate the signal from the third convolutional layer to the fourth and last one, we copy this one more time and replace three by four. There we go. So let's recap. We start with our inputs. We apply the first convolution to get the first convolutional layer. Then we apply the second convolution to the first convolutional layer to obtain the second convolutional layer. Then we apply the third convolution to the second convolutional layer to obtain the third convolutional layer. And finally we apply the fourth convolution to the third convolutional layer to obtain the fourth convolutional layer. That's how the signal is propagated through the eyes of the AI. So there we go, we now have the output signal after the four convolutions, and now we know what to do: we need to flatten this whole output signal into a one-dimensional vector. That's the flattening step. So we update x again: x now becomes this flattened one-dimensional vector. To do this, same as before, we take x, which is so far the fourth convolutional layer, then we use the view function; we first put minus one to say that we want a one-dimensional vector, and as a second argument we put the number of elements in the vector, which is, remember, 32 times 3 times 3. So we can put here 32 * 3 * 3. There we go. Now we have our flattened vector and the flattening step is done. Perfect.
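Here is a sketch of what this first part of the forward function looks like so far (using F as the usual shortcut for torch.nn.functional):

import torch.nn.functional as F

# inputs: the batch of input images (one 1 x 42 x 42 black-and-white image in our case)
x = F.elu(self.conv1(inputs))   # first convolutional layer, activated with the ELU
x = F.elu(self.conv2(x))        # second convolutional layer
x = F.elu(self.conv3(x))        # third convolutional layer
x = F.elu(self.conv4(x))        # fourth and last convolutional layer
x = x.view(-1, 32 * 3 * 3)      # flattening into a one-dimensional vector of 32*3*3 elements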
Now let's take care of the LSTM part. As you understood, the LSTM takes as input the flattened vector, this one-dimensional vector of 32 * 3 * 3 elements, so it's already well prepared for the LSTM. The LSTM is now ready to take this flattened vector as input, and therefore we take our lstm and give it as arguments, first, x, our already flattened vector, the one we just created; but also, and that's where the tuple comes into play, hx and cx. We can use hx and cx here because we made that separation from the original inputs argument of the forward function. So the lstm gets x, the flattened output vector after the four convolutions, and the tuple of the hidden and cell states. Then we must not forget the self, because lstm is a variable attached to the object, so self.lstm. This will actually return two outputs, the output hidden states and the output cell states, so it's actually a tuple, and therefore we can update hx, the hidden states, and cx, the cell states, because that's exactly the output of this LSTM. Great, so we're almost done now. Now that we have the outputs of the LSTM we need to get the useful output, because actually only the hidden states are useful, and therefore we update x again: x will now be equal to hx, the first element of the output tuple of the LSTM. And we're almost done. Remember that we have two brains, one brain for the actor and one brain for the critic, and therefore we have two output signals to return: the output signal of the actor and the output signal of the critic. So what we're going to do now is return these two output signals, and how can we do that? Well that's very easy: we simply take our linear connections, but separately, that is the linear connection of the critic and the full connection of the actor, and we apply each of these connections to the output x, that is the useful output of the LSTM, and that will be the output signal. So there we go, let's do it.
We first take self, our object, then we get the linear connection of the critic, which is critic_linear, and we apply it to x, the output signal of the LSTM; and then, same thing, we take self again, then a dot, then the linear connection of the actor, which is actor_linear, and we also apply it to x. There we go, that's the main thing we need. But then we also return the tuple (hx, cx), the hidden states and the cell states, because we will be using them later in the recurrent loop of the LSTM. All right, perfect. So now we're done with the brain — or should I say the brains, because we actually made two brains, one for the actor and one for the critic. So congratulations for making the AI with two brains. I hope that wasn't too overwhelming to combine a CNN and an LSTM, but at least the good news is that we're really working with one of the best and most powerful models. So there we go, we're actually done with this first file, model.py. In the next tutorials we'll take care of the optimizer, because we're going to make a separate, custom optimizer. We're not going to code it line by line, because a lot of it comes from the research papers and is pretty specific, and going into the deep details of this optimizer might be a little too overwhelming given what comes next: we still have the train function to make, which will be a huge function containing the algorithm itself. So trust me, you want to keep some energy for that. Therefore we will not spend too much time on the optimizer, but still, I will explain the code and you will understand the whole idea behind this optimization.
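Putting it all together, here is a sketch of the complete forward function we just described:

def forward(self, inputs):
    # inputs is a tuple: the input images, and the tuple (hx, cx) of hidden and cell states
    inputs, (hx, cx) = inputs
    x = F.elu(self.conv1(inputs))
    x = F.elu(self.conv2(x))
    x = F.elu(self.conv3(x))
    x = F.elu(self.conv4(x))
    x = x.view(-1, 32 * 3 * 3)        # flattening step
    hx, cx = self.lstm(x, (hx, cx))   # the LSTM cell returns the new hidden and cell states
    x = hx                            # only the hidden states are the useful output
    # critic output V(s), actor output (action values), and the LSTM states for the next step
    return self.critic_linear(x), self.actor_linear(x), (hx, cx)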
BREAKOUT - STEP 7
We made the brain, or if you prefer the brains, for the A3C. Now we need to train this brain, but in order to train these brains we need an optimizer: that's the tool that performs stochastic gradient descent to update the weights according to how much they contribute to the error between the predictions and the targets. Up until now, in the first and second modules, we used the Adam optimizer from PyTorch for the training. But as I told you, we are dealing with a very challenging problem, which is Breakout, and the A3C algorithm by itself is not enough to solve it. We need a customized optimizer and a few different tricks to solve this problem without waiting for ages. That's the purpose of doing this, and that is why we have a separate custom optimizer based on the Adam optimizer. It is contained in this SharedAdam class, and why shared Adam? Because it is actually the Adam optimizer, but one that works on shared states. So we're going to explain how it works in this tutorial. We're going to go through the different functions here without coding them line by line, because, you know, we want to keep some energy for the next tutorial, the train.py file, which will take more than one hundred lines of code. So be ready for that. We will simply explain what's going on here, step by step. And let's start right now.
So first we introduce this class, SharedAdam, which will contain three functions: the init function, the share_memory function and the step function. The first thing we do is inherit from optim.Adam, which is of course the Adam optimizer that we get from the optim module of the torch library. Here we are applying inheritance to get all the tools related to the Adam optimizer, and then we start with the init function. So what happens here? First we use the super function to inherit all the tools and all the basic parameters from the Adam class; these basic parameters are params, the learning rate, the betas, epsilon and the weight decay. Then we start a for loop, the first for loop, over the param groups. So what is self.param_groups? It contains all the attributes of the optimizer, and among these attributes we have the parameters that we want to optimize; these parameters are the weights of the network, contained in self.param_groups under 'params'. So there we go: a group belonging to self.param_groups. And here we have the second for loop, which gets these parameters that we want to optimize and that are exactly contained in group['params']. Basically we go through self.param_groups, which contains all the groups of parameters, and for each group of parameters we go through the parameters that we want to optimize; so "for p in group['params']" here means: for each tensor of weights that we want to optimize. And then what happens inside this loop, with these few lines of code? Basically, the update made by the Adam optimizer is based on an exponential moving average of the gradient — that's this line of code, the exponential moving average of the gradient of moment one, that is of order one. But the updates made by Adam are not only based on that; they are also based on an exponential moving average of the square of the gradient, that is an exponential moving average of moment two, of order two. So here is the exponential moving average of order one, and here is the exponential moving average of order two, initialized for each tensor of weights, together with a step counter. So that's what happens here.
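For reference, here is a sketch of what this double for loop inside the init function probably looks like, following the description above:

for group in self.param_groups:
    for p in group['params']:               # p: one tensor of weights to optimize
        state = self.state[p]
        state['step'] = torch.zeros(1)      # counter of optimization steps
        # exponential moving average of the gradient (moment of order 1)
        state['exp_avg'] = p.data.new().resize_as_(p.data).zero_()
        # exponential moving average of the squared gradient (moment of order 2)
        state['exp_avg_sq'] = p.data.new().resize_as_(p.data).zero_()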
And now, if you want to get more in depth on how the exponential moving averages work, I highly encourage you to have a look at the research paper "Adam: A Method for Stochastic Optimization", because the Adam optimizer that we're implementing right now is based on the algorithm in that paper. So if you want more details on how the algorithm works, this paper will definitely be helpful; it also contains further explanations of the update rules. That's only if you want to attack this before attacking the big train function that comes afterwards. OK, so let's go back to Python, and let's move on to the second function, share_memory. I'm just going to say a few words here. The idea of this share_memory function is a little like tensor.cuda(), which, as you know, sends a tensor to the GPU to accelerate computation. What happens here is that we take these tensors of the optimizer states and call share_memory_() on them, which behaves a bit like tensor.cuda(), with one difference: here the tensors are moved to a part of memory that is accessible to all the parallelized agents, so that all the processes can work with the same optimizer states. That's basically what is done here. All right. Then we have the last function, step. This function is like the step method of the Adam optimizer that we've been using in this course, and again it is based on Algorithm 1 of the same paper we saw before. So if you want to understand the following lines of code in detail, again I encourage you to have a look at Algorithm 1 in that paper. Besides, what is done here is not totally compulsory, because it is actually a copy-paste of the step method of the optim.Adam class. Basically, we could have obtained it through our inheritance, because here we inherit from optim.Adam; so instead of copying all this we could, as I'm writing here in a comment, just use the super function applied to our SharedAdam class and our object self, and then add .step() with parentheses — step being the method of the Adam class — and that's exactly the same thing. That's why I was saying that this is just a copy-paste of the step method of the Adam class: if you replace all this by the super function applied to SharedAdam and the step method, you should get exactly the same result. All right, so that was interesting to have a quick look at. Basically you can see this class as the Adam optimizer; it's as if we had a deeper look inside it. But again, if you want to go into more detail and understand what happens behind the scenes, I encourage you to have a look at this research paper — I'll put the link in the comments, and remember you will get the full code commented in great detail, so it's really good if you can have a look at it. And now I hope you have some great energy, because we are going to move on to the train file, which will contain this huge train function that will actually train our brains — which we can now do, because we have our optimizer. So have a good break now.
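To summarize this tutorial, here is a condensed sketch of the SharedAdam class as described above, with the long copy-pasted step method replaced by the super call we just mentioned; the default hyperparameter values shown here are only indicative assumptions.

import torch
import torch.optim as optim

class SharedAdam(optim.Adam):
    # Adam optimizer whose internal states live in shared memory,
    # so that all the parallel agents can work with the same optimizer states.

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0):
        super(SharedAdam, self).__init__(params, lr, betas, eps, weight_decay)
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                state['step'] = torch.zeros(1)
                state['exp_avg'] = p.data.new().resize_as_(p.data).zero_()
                state['exp_avg_sq'] = p.data.new().resize_as_(p.data).zero_()

    def share_memory(self):
        # move the optimizer states into shared memory, accessible to all processes
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                state['step'].share_memory_()
                state['exp_avg'].share_memory_()
                state['exp_avg_sq'].share_memory_()

    def step(self):
        # in the original file this is a copy of optim.Adam.step();
        # calling the parent's step should give the same result
        super(SharedAdam, self).step()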
BREAKOUT - STEP 8
We now also have the optimizer, so basically we are ready to train our different agents, that is our different brains. From now on we will make this big train function, which will contain the whole A3C algorithm. What we're about to implement in this train.py file is just this huge train function — nothing else, no class — and the one that will use this train function is the last step of this module, the main code. So there we go. But before we start, you can notice a few things. First we import some libraries: the classic ones, including the torch library; then the envs file, which contains the tool to create the environment, which will be Breakout; then of course we import the ActorCritic class from our model file, the one we just made; and finally we will use Variable from torch.autograd, which allows fast computation of the gradients thanks to the dynamic graphs. Then we have this ensure_shared_grads function, on which I didn't want to spend too much time: first, it is just a function that makes sure everything works correctly if the model used by the agent doesn't share its gradients yet — that's why it's called ensure_shared_grads — and second, I don't think this function is strictly necessary. But we never know, and with it we are 100 percent sure the code will execute properly; that's not really important here. What we must focus on is this train function, which we start making right now. So here we go: def, then train, and this train function will take several arguments. The first one is rank — I'm going to explain what it is in a moment. The second one is params, that is all the parameters, including those of the environment. The third argument is going to be shared_model — you know, the shared model is what the agent will use to run its little exploration over a certain number of steps. And finally the last argument is going to be the optimizer, that is the one we made earlier. Perfect, that's our four arguments, and now we are ready to start implementing the train function itself. So the first thing we'll do — remember what A3C stands for: Asynchronous Advantage Actor-Critic agents.
So in A3C there is asynchrony, and as you understood, we have to desynchronize each training agent. To desynchronize them, we're going to use the rank to shift the seed of each agent; this rank parameter here is just there to shift the seed so that each training agent is desynchronized. For example, if there are ten training agents, there will be one integer rank per agent. When we shift the seed by one per thread, all the pseudo-random numbers created by this thread will be independent from the other threads. However, the seeds are fixed numbers, so when we reproduce the experiment we will find exactly the same events, because everything is deterministic with respect to the seed. It's important to understand that, and that's why we need to desynchronize each training agent by using the rank to shift the seed. So let's do this. We take our torch library, then we get the manual_seed function, with parentheses, and inside we take the seed of the agents, which we can access with params.seed, and we shift it by the rank to desynchronize each of these agents: we just add plus rank. That shifts the seed with the rank and desynchronizes each training agent, because there is one seed for each training agent. All right, first thing done, and now the next step is to get the environment. We create a new variable that we're going to call env, and we use the create_atari_env function from the envs module to create the Breakout environment, that is to get the environment for Breakout. So we take this create_atari_env function, and we have to give it just one argument, the name of the environment, which we have because it is one of the parameters passed to the train function — this params argument here. Therefore, to get the Breakout environment we take this params argument, then a dot, then env_name, which, in the main code that will execute everything, will be set to the Breakout environment (for example Breakout-v0). All right, so that gets us the environment. Perfect. Now the next step is to align the seed of the environment with the one of the agent. Why do we do that? Because, remember, each agent of the A3C model has its own vision of the environment, like its own copy of the environment, and therefore we need to align each agent with one specific copy of the environment. To do that we use the seed, because each seed determines a specific environment; by associating a different seed to each agent we'll get exactly what we want, that is, each agent will have its own environment. How can we do that? We take our environment, then a dot, then we use the seed function, which lets us choose the seed of the environment. And to align the seed of the environment with the seed of the agent, we simply pass the same thing as before, params.seed plus rank, because that corresponds to the seed of the agent shifted by the rank, which gives us the desynchronized training agents — they're all on a different seed. So we just put that here, and this aligns the seed of the environment with the one of the agent.
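Here is a sketch of the beginning of the train function as built so far (assuming, as in the course, that the params object carries a seed and an env_name attribute, and that create_atari_env comes from the envs file):

def train(rank, params, shared_model, optimizer):
    # desynchronize each training agent by shifting the seed with its rank
    torch.manual_seed(params.seed + rank)
    # each agent gets its own copy of the Breakout environment, aligned on its own seed
    env = create_atari_env(params.env_name)
    env.seed(params.seed + rank)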
Okay, now we are going to get our model, that is our A3C brain. This is where we use the ActorCritic class from our model file: we create a new object of this ActorCritic class, and we call this object model, or brain if you like. Basically this object will contain all the convolutions, the LSTM, the linear connections and the forward function to propagate the signal, so it will contain the brains of the actor and the critic, with the ability to propagate the signal throughout the brain to get the final output. So let's do this, let's create our model. As we said, we want to call this object model, and we create an object of the ActorCritic class, so we take the class ActorCritic, and remember which arguments we have to put in — they are the arguments of the init function, which is what we have to supply when creating the object. These arguments are num_inputs, which is the number of channels of the input images, and the action space, which contains, you know, the set of possible actions. So let's put these arguments in. The first one we can get from our environment: we take env, then a dot, then observation_space — that's the space of observations — then a dot, then, to get the number of input channels, shape with brackets and zero. All right, that's for the inputs. And for the action space, that's almost the same: we get it from our environment, so env, then a dot, then action_space. All right, and that gives us the arguments we need to input when creating our model object of the ActorCritic class. OK, so now we have our model, and the next step is to prepare our input states. Remember we're still doing deep reinforcement learning, so the input states are input images, and therefore this will originally be a numpy array, which will contain one channel — because we work with black-and-white images — and will have dimensions 42 by 42. It's important to keep in mind that the input states are input images. So what we have to do is get the numpy array, and then convert it into a torch tensor. The first step, as we did previously, is to get the numpy array, and it's actually quite simple. First we create a variable for the input state, which we call state, and to get the numpy array we simply take our environment, then a dot, then the reset function. This initializes state as a numpy array of dimensions 1 by 42 by 42: 1 means one channel, so a black-and-white image, and 42 by 42 are of course the dimensions of the image, the number of pixels in width and in height. That's just the format we work with. Now that we have this state as a numpy array — because env.reset() gives us the image in that format — we can convert it into a torch tensor, and to do this we update state again, because we don't need to keep the numpy array. That's where we use the torch module: remember, we already did that, with the from_numpy function, with parentheses, and inside this function we put the numpy array that we want to convert into a torch tensor, which is state. The previous version of state, the numpy array, becomes, through the from_numpy function, a torch tensor; that just creates the tensor from the state. And now we just need to initialize the done variable. Remember, done is the variable that says whether an episode is over, that is whether the game is over. Here we introduce this done variable and initialize it to True; it will be True whenever the game is done, and that will be useful later so that the AI doesn't play Breakout indefinitely. All right, so that was the beginning of this train function, with some initializations and some things that we had to do. The most important part here was that we have to desynchronize each training agent — that's one of the first principles of the A3C model we have to apply. In the next tutorial we will proceed to the synchronization with the shared model. Let's not forget that each agent has its own model, but there is also the shared model, a model that all the agents share, and each agent has to synchronize with this shared model so that it can use it to proceed to a small exploration over a certain number of steps.
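Continuing the sketch of the train function, the part we just described might look like this (still inside train, right after creating the environment):

    # the agent's own model (brain), built from the ActorCritic class
    model = ActorCritic(env.observation_space.shape[0], env.action_space)
    # first input state: a numpy array of shape 1 x 42 x 42 (one black-and-white image)
    state = env.reset()
    state = torch.from_numpy(state)   # convert the numpy array into a torch tensor
    done = True                       # True when the episode (game) is over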
BREAKOUT - STEP 9
So what we're going to do now, still inside the train function of course, is initialize the length of one episode. We call it episode_length, and we initialize it to zero; this episode length will then be incremented. And speaking of incrementing, that's exactly what we'll do next: we use a while loop, with the classic trick of saying while True, to repeat what's going to happen inside. The first thing that happens inside this loop is the incrementation of the length of the episode, so the first thing we do is increment it by 1; to do so we simply take episode_length and add plus-equals 1. And now we are going to synchronize with the shared model. That means it is now that the agent gets the shared model, to do its little exploration over a certain number of steps. How is the model going to get this shared model? Well, we take our model, then a dot, then we use the load_state_dict method, to which we give shared_model.state_dict(), that is the state dictionary of our shared model: we take the shared model first and apply the state_dict method to get the parameters of the shared model. That's how our model here gets the shared model for its little exploration. Okay.
distinguish two cases. The first one is if done meaning if the
game is done so the game is done then what happens in
that case. Well we have to re-initialize the hidden states and
the cell states of the LSD and the mall. And so that's why
I'm going to take See X the cell States and also age X the
hidden states and I'm going to re-initialize them books and
how are you going to re-initialize them. Well with only
zeroes there will be a vector of 256 zeroes because
remember the outputs of the rest. As I mentioned 1 and
256. So there we go we are going to initialize them by using
the torch library then the zero's function. And since we want
a vector of 256 zeroes we are going to hear the dimensions
one for the vector and 256 for the number of elements
which will be zeroes and then we go. But then we will
convert that into a torch Voivode because then some
gradients will be computed. So we need to integrate this
with a gradient. All right. And we're going to do the same for
the hidden states just below and really analyze them the
same way. There we go. So that's if the game is done. And
now the other case which we can access with Else other
than what happens in that case. Well we're going to keep
the old cell States and hidden stakes and so very easily we
can keep the old ones this way by typing see X equals
variable c x that data and same for that in the States we
can simply add here H x equals variable x x that data are at.
Good thing done. Now we can get out of the equation
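As a sketch (using the Variable API from torch.autograd, as in the course), this synchronization and re-initialization step inside the while loop might look like:

    while True:
        episode_length += 1
        model.load_state_dict(shared_model.state_dict())  # synchronize with the shared model
        if done:   # game over: re-initialize the LSTM hidden and cell states to zero vectors of size 256
            cx = Variable(torch.zeros(1, 256))
            hx = Variable(torch.zeros(1, 256))
        else:      # otherwise: keep the old states, detached from the previous graph
            cx = Variable(cx.data)
            hx = Variable(hx.data)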
Now we can get out of this if/else condition, because we're basically done with these two cases — whether the game is over or not — but we stay in the while loop, because now we're going to do some more things, which are basically the whole training process. What we do now is initialize several variables that will be at the heart of the computations in the training. So let's do this. We need the values — remember, the outputs of the critic, the values of the V function — which we initialize as an empty list. Then we need the log probabilities, so log_probs, also initialized as an empty list. Then of course we need the rewards, also initialized as an empty list. And finally we need the entropies — this might be something new, but it is indeed at the heart of the training computations — also initialized as an empty list. Now that these four variables are initialized, we can start a new for loop, and this new for loop will update the values of these four lists. It is going to be a loop over the exploration steps, and therefore the looping variable is going to be the step: for step in range, and inside we can directly put params.num_steps, because the num_steps parameter is exactly the number of steps of the exploration. So, for each step of the exploration, what do we do? Well, we get the predictions of the model, that is, what is returned by the model, and to get these predictions we simply take the model and apply it to the inputs, so that the input signal goes through the brains in the model. That gives us the outputs — but it gives several outputs, you know: it gives us the value of the V function, which is the output of the critic, then the Q-values Q(s, a), which are the output of the actor, but also, don't forget, the tuple of the hidden states and cell states. Because, remember, if we go back to our model, in the forward function we can see that it indeed returns the output of the critic, that is the value of the V function V(s), then the output of the actor, that is the action values Q(s, a), and also the output of the LSTM, which is this tuple (hx, cx) of the hidden states and the cell states. We must be careful with that — this is quite different from what happened before. So we now apply the model to the inputs, which are the states, but there are several things to do that are related to PyTorch and that of course give power to what we're doing. The first thing we need to do is unsqueeze the states, to add the fake dimension with index 0; that's because the model can only accept a batch of inputs, and not a single input state by itself in a vector or tensor. That's the first thing, the unsqueeze; but that's not all — we also need to convert our input states into a torch Variable, so we add Variable around it. So now we have the state part, but remember that the inputs of the forward function are actually the input images — that's what we just took care of — plus the tuple (hx, cx) of the hidden states and the cell states, and therefore we need to add this second part of the input, the tuple of hx and cx. All right, and we close the parentheses. There we go: we have our two inputs, the first one being the input states — the input images, converted into Variables and unsqueezed to add this fake batch dimension — and the second one being the tuple of the hidden states and the cell states. So we're all good to go, we are ready to get our predictions. And since this returns our three outputs — the output of the critic, the output of the actor, and the tuple of the hidden and cell states returned by the LSTM — we introduce three new variables to receive these three outputs. The first output is the value of the V function, which is the output of the critic, so we call it value. There we go, that's the first output. The second output is the output of the actor, the Q-values Q(s, a); since the Q-values are associated to the actions, we can also call them the action_values. All right. And the final output returned by the model is the tuple of the hidden states hx and the cell states cx. There we go, we have the three outputs returned by the model. Perfect. So now that we have the predictions, we need to use a softmax to play the right action.
And that's going to be exactly the same as what we did before. The next step is to get our probabilities, so we can call them prob, and that's where we use the softmax, which we take from the functional module that has the shortcut F, so F.softmax. That generates a distribution of probabilities from the input we're about to put in right now — which is of course the action values, that is the Q-values, that is the outputs of the actor in the model. OK, so we have our probabilities. But as you noticed, we're going to work with the entropy, and to get the entropy we need not only the probabilities but also the log probabilities, because the entropy is the sum of the products log_prob times prob, all of this multiplied by minus 1. So we also need to get our log_prob, which is generated by the log softmax: instead of taking the distribution of the probabilities we take the log of the distribution of the probabilities, and that's what we do with F.log_softmax, the log softmax function, which, same thing, we apply to the Q-values, which we call the action values. All right, so now we have the prob and the log_prob, and we are ready to get the entropy. What is the formula for it? Well, as I just mentioned, we take the log_prob, we multiply it by the prob, then we take the sum of all this — we can add here .sum(1), we've actually used this trick many times now — and, as we said, we multiply this by minus 1. So it's the minus of the sum of the products log_prob times prob. Perfect. And now we are going to store this entropy that was just computed in our list of entropies. There we go, we have the latest computation of the entropy, and we need to store it in the entropies list; to do this, nothing simpler, we use the append function, of course, because entropies is a list. So we take our entropies list, then a dot, then we use the append function to add the entropy that was just computed. All right, we're going to take a break now. We're doing this step by step, and in the next tutorial we will play the action by taking a random draw from this generated distribution of probabilities; after we play the action we'll get the value of the new state, and we will eventually store our new transition: state, reward, and so on.
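Here is a sketch of this part of the exploration loop (old PyTorch API as in the course: on recent versions you would pass a dim argument to softmax and log_softmax):

        values, log_probs, rewards, entropies = [], [], [], []
        for step in range(params.num_steps):   # exploration over num_steps steps
            # forward pass: the model returns V(s), the action values Q(s, a), and (hx, cx)
            value, action_values, (hx, cx) = model(
                (Variable(state.unsqueeze(0)), (hx, cx)))   # unsqueeze(0) adds the fake batch dimension
            prob = F.softmax(action_values)          # distribution of probabilities over the actions
            log_prob = F.log_softmax(action_values)  # log of that distribution
            entropy = -(log_prob * prob).sum(1)      # entropy = - sum(log_prob * prob)
            entropies.append(entropy)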
BREAKOUT - STEP 10
So we just computed the entropy and added it to the entropies list. Now what we're going to do is take a random draw of an action according to the distribution of probabilities. So let's do this; that's the next step. We are still in the for loop, because we're still looping over the steps, and you now know how to play the action. We first introduce a variable for the action, called action, then we take our distribution of probabilities, prob, and we use the multinomial function to take a random draw from this distribution of probabilities, and then we add .data. It's important to note that the action will actually be a tensor with only one value; you should not see it as a simple value but as a tensor of dimensions one by one that contains this value for the action — that's because it isn't squeezed out. Still in the same for loop, we get the log probability associated to the action that was just selected. So I'm updating my log probability here, taking the previous log_prob that we computed above, and I'm going to use the gather method, to which I give 1 and the action that was just selected, because we want the log probability associated to this action. For the second argument I put my action, but it has to be a torch Variable, as required by the gather function — the gather function simply indexes with an integer tensor. All right, so now we just got the log probability associated to the selected action, and the next step is to append what we got to the lists. We got the value — the output of the model — and we also got the log_prob, so we add the value to the values list and the log_prob to the log_probs list; we already appended the entropy to the entropies list, and the rewards we will get afterwards. Let's do this: we take our values list, we add a dot, we use the append function and we add the value that was returned by the model. Perfect. Then, same thing for the log probs: we just got our new log_prob and we append it to the log_probs list, so in this append function we put the log_prob that was just computed. All right, so our lists are now well updated.
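As a sketch (note that on recent PyTorch versions multinomial requires the number of samples, e.g. prob.multinomial(num_samples=1)):

            # (inside the for loop over the steps)
            action = prob.multinomial().data                 # random draw from the distribution of probabilities
            log_prob = log_prob.gather(1, Variable(action))  # log probability of the selected action
            values.append(value)
            log_probs.append(log_prob)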
Now what we're going to do is actually play the action, because so far we only selected the action by taking a random draw from the distribution of probabilities; we haven't played it yet. We play it now, so that we can reach the new state and therefore get the new transition. To play it, we take our environment — because we play the action in our environment — then we use the step method, and inside we specify the action that was selected; to do this we take our action and we add .numpy(), because that's what the step function expects. This actually returns the new state, and also the new reward, because by reaching the new state we get a new reward, and also a new value for done, to know whether the game is over or not. All right, so with this we play the action, we reach a new state, we get a reward and we know if we're done with the game. And speaking of being done with the game, we add something here that makes sure an agent doesn't get stuck in some state. To do that, we update the done variable the following way: it becomes equal to done, or a condition saying that an episode of the game should not last too long. We will see in the main function that there is a max episode length parameter, which will be equal to 10000, and we don't want an episode to last more than 10000 steps. So we write here: episode_length, which is the length of the episode, with the condition "greater than or equal to the max episode length". We haven't actually set this max episode length yet; we get it from our parameters, so we write params.max_episode_length. This means that if the game is done, or the length of the episode is larger than the maximum episode length (which will be set to 10000), the game will be considered done and we will start a new game. OK, so that's just a precaution. And speaking of precautions, we add another one: clamping the reward between minus 1 and plus 1. We already got the reward, but we want to make sure it stays between minus 1 and plus 1, and to do this we simply update the reward this way: we take the max of minus 1 and the min of the reward and 1, and that makes sure the reward is between minus one and plus one. All right, another precaution.
And now we just want to check whether the game is done, in which case we restart the environment. Why do we need to check that now? Because we just reached a new state, we just made a new transition, so we need to check, after making this new transition, whether the game is done. So: if done, then we restart the environment by setting the episode length back to zero, and the state is also re-initialized; to re-initialize it we take our environment and we use the reset function. OK. Now we get out of this condition, and what we do next is, since we reached a new state — and this new state is, right now, a numpy array, because remember the states are the input images, which originally come as numpy arrays — convert the new state into a torch tensor. So we update our state, we use the torch library, and of course the from_numpy function, to convert this numpy state, the input images, into a torch tensor. Perfect. And now the last thing we need to do, before getting out of this for loop over the steps, is of course to append the reward to the rewards list; that's the last thing that needs to be updated — we updated all the lists above except the rewards, so we do that right now: we take our rewards list and we use the append function to append the reward that was just received. Perfect. And just before we get out of the for loop, we do one last check: if the game is done, then we want to stop the exploration, so we simply add a break here, meaning that if it's done we stop the exploration and we directly move on to the next step, which will be the update of the shared model. And with that we are done with this for loop — the agent has done its exploration.
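A sketch of the end of this exploration loop, following the description above:

            # (still inside the for loop over the steps)
            state, reward, done, _ = env.step(action.numpy())   # play the selected action
            # precaution 1: an episode should not last more than max_episode_length steps
            done = (done or episode_length >= params.max_episode_length)
            # precaution 2: clamp the reward between -1 and +1
            reward = max(min(reward, 1), -1)
            if done:                    # game over: restart the environment
                episode_length = 0
                state = env.reset()
            state = torch.from_numpy(state)   # back to a torch tensor
            rewards.append(reward)
            if done:
                break                   # stop the exploration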
BREAKOUT - STEP 11
So now the agent has done its exploration, and what it's about to do is update the shared network. The first thing we do is initialize the cumulative reward, which we're going to call R, capital R, and we initialize it as a torch tensor of dimensions one by one, because it's just a single value but we want it to be a tensor. So I'm using here torch.zeros and then 1, 1; basically the cumulative reward is initialized to 0. OK. Then, saying if not done, that is if the game is not over, what we want is the cumulative reward to be equal to the value of the last state reached by the agent, as predicted by the model. So we get the value output — you know, the value of the V function output by our model — and this is the value we give to the cumulative reward. Let's first get this value: we write value, and since we only want the value we can add here an underscore and then another underscore for the two other outputs, and then we call our model, which will output this value as its first output. We open the parentheses, and here we can just copy-paste what we had before, that is the input of the model with the input images and the tuple of the hidden states and the cell states. I'm just pasting that, and there we go, we get the value. And now we give it to R: R will be equal to value, and to access the raw value we add .data. All right, the if condition is done. Now, since we just got a new value by getting the first output of the model, let's append this new value to the values list: we take our values list, then a dot, then append, and we put Variable of R, because this last value has to be attached to the graph. That is done. Next, we initialize the losses. Remember the intuition lectures: you have two losses, the policy loss, which is the loss related to the predictions of the actor, and the value loss, which is the loss related to the predictions of the critic. So we introduce these two variables, both initialized to zero: a policy_loss variable, initialized to zero, and then a value_loss variable, the loss of the value, also initialized to zero. Then let's not forget to set the cumulative reward as a torch Variable, because we need it to be a Variable — we will be computing a gradient with respect to it, since the cumulative reward is going to be a term of the value loss. So this variable is now attached to the dynamic graph, with a gradient. And finally, the last thing we need to do before starting the big training loop — you know, where we apply stochastic gradient descent to reduce the loss between the predictions and the targets — is to initialize the GAE; be careful with the name, the gae variable we're about to initialize stands for Generalized Advantage Estimation. As a reminder, the generalized advantage estimation is built on the advantage of playing the action a while observing the state s. The advantage is a function of the action a and the state s, and it is equal to the difference between the Q-value of the action a in the state s and the value of the V function applied to the state s: A(a, s) = Q(a, s) - V(s). That's what the generalized advantage estimation is based on, and that's what we want to initialize right now. We initialize it to zero, but it has to be a torch tensor, so we use the same trick as what we just did above: we take the torch library and apply the zeros function to set it as a tensor of only one value, which is zero. We create this new variable, gae, equal to torch.zeros(1, 1). So this is initialized to zero, which corresponds to the case where the Q-value of the action a in the state s is equal to the value of the V function of the state s. All right, and now we are ready to start the for loop. So we're going to have an adventure here.
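As a sketch, this initialization right after the exploration loop might look like this:

        # (after the exploration loop, still inside while True)
        R = torch.zeros(1, 1)                  # cumulative reward, as a 1 x 1 tensor
        if not done:                           # game not over: bootstrap with the value of the last state
            value, _, _ = model((Variable(state.unsqueeze(0)), (hx, cx)))
            R = value.data
        values.append(Variable(R))             # append this last value to the values list
        policy_loss = 0                        # loss of the actor
        value_loss = 0                         # loss of the critic
        R = Variable(R)                        # attach R to the graph (it enters the value loss)
        gae = torch.zeros(1, 1)                # generalized advantage estimation, based on A(a, s) = Q(a, s) - V(s)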
BREAKOUT - STEP 12
Now we're going to make the for loop that computes the policy loss and the value loss, and once we have these two losses we will be able to use our optimizer to apply stochastic gradient descent and reduce them. All right, so there we go, we start here. By the way, in the previous tutorial I implemented this section and forgot to remove the indents — sorry about that; starting from here we are no longer inside the exploration for loop. Now we start a new for loop, so I'm starting here with for. What we do is start from the last step that was done during the exploration and move backward in time; that's why we loop over a reversed range of the indices of the rewards list — each step of the exploration is associated with a reward, because at each step we get a reward, so the number of rewards is the number of steps, and the reversed here is used so that we can move back in time. Now, inside the loop, we update the cumulative reward R. We update it this way — it's actually the same as what we did for Doom: R becomes gamma, which we get from our parameters by taking params.gamma, times R, plus the reward of this step, which we get by taking the rewards list and indexing it with i. So first this will be the reward of the last step, then the reward of the previous step, and so on, and each time we update R by multiplying it by gamma and adding the reward of the step. By doing this, remember, we will get in the end — I'm going to write it as a comment — the cumulative reward, which at the end of the loop will be equal to:

R = r_0 + gamma * r_1 + gamma^2 * r_2 + ... + gamma^(n-1) * r_(n-1) + gamma^n * V(last state)

where r_i is the reward obtained at step i and n is the number of steps. Be careful: at the end we have gamma to the power of the number of steps, times the value of the V function applied to the last state. We do get that term, because remember we obtained this value of the last state just before, at the end of the exploration, and we set R to be equal to it; so right now, at the beginning of this second for loop, R is equal to the value of the last state, and by running the reversed loop this is exactly the equation we get in the end. That's the main thing to understand, and this is the equation of the cumulative reward; that's why it is important to start by initializing R with the value of the last state and to do this reversed loop to get this final equation. Perfect. Now that we have the right value for the cumulative reward, we compute the advantage, and the advantage here is just the advantage of getting this cumulative reward compared to the value predicted by the critic. So I introduce a variable, advantage, and it will be equal to this cumulative reward minus the value of the V function obtained at the step i: advantage equals R minus values[i]. Perfect. And now that we have the cumulative reward and the advantage, we can get the value loss — this is the first loss we can get now. We update our value loss the following way. Remember, so far the value loss was initialized to zero; we take the value loss again and we add 0.5 times the square of the advantage, which we can get with advantage.pow(2) — that just means raising the advantage to the power of two — and that is exactly the value loss, the loss generated by the predictions of the value of the V function output by the critic. It makes sense that this is the value loss, because remember that the advantage of the action a in the state s is the difference between the Q-value and the value of the V function, and when we play the optimal action we reach the stationary case where Q*(a*, s), the optimal Q-value of the optimal action a* played in the state s, equals the optimal value V*(s) of the state s. So it's quite intuitive that whenever the advantage is not equal to zero there is a difference between these two, and that's how this loss is measured. OK. So the value loss is computed: one loss down, one more to go. It is the policy loss, and that is what we compute right now. To compute it we need to consider again the generalized advantage estimation, because to compute the policy loss we need the generalized advantage estimation, and to get the generalized advantage estimation we first need the temporal difference of the state values. So we have several things to compute here, and we start with this temporal difference; once we have the temporal difference we will get the generalized advantage estimation, and once we have the generalized advantage estimation we will get the policy loss. All right, so let's start with the temporal difference: TD is equal to the reward of the step i, plus gamma — which we get again thanks to our params object, so params.gamma — times the value of the step i plus one, to which we add .data to access it, minus the value of the step i, again with .data. All right, that's the formula of the temporal difference of the state values. Now we can update the generalized advantage estimation, and how is it updated? Well, we take our gae and we multiply it by gamma times the tau parameter, which we also access with our parameters, so params.tau, and we add this temporal difference of the state values. Be careful: we are inside the loop, and at each step we multiply the gae by gamma times tau and we add the temporal difference, so it's important to understand that at the end of this loop the generalized advantage estimation will be equal to the sum, over all the steps, of gamma times tau to the power of i, times the temporal difference at step i. Keep that in mind. And now that we have the generalized advantage estimation and the temporal differences, we can finally compute the policy loss. So let's do this. We update our policy loss the following way: we take the old policy loss and we subtract the log probability obtained at the step i, multiplied by this generalized advantage estimation, which we have to wrap in a Variable because gradients will be computed, so it has to be attached to the graph; and then we also subtract 0.01 times the entropy obtained at the step i. Again, be careful: this is inside the loop, which means that at the end of the loop what you'll get is policy_loss equal to minus the sum, over the steps, of the product of the log of the policy at step i times the generalized advantage estimation, plus 0.01 times the entropy at step i. And what is the policy at step i? Well, that's the softmax probability of the actions, and the entropy at step i, you know what it is: it's what we computed earlier and appended to the entropies list, so we already have it. And why do we put a minus here? That's because the log of the probability and the entropy are negative values, and since we want to minimize their absolute value, we must see this loss as a log-likelihood as opposed to a distance: we want to maximize the probability of playing the action that will maximize the advantage. That's the whole idea behind it. And for those of you who might be wondering what the purpose of this entropy term with its factor 0.01 is: the purpose is just to prevent the policy from falling too quickly into a trap where the distribution of probabilities has zeros for all the actions except one, which has a probability of one. If that happened it would minimize the entropy, so that's why we add this small 0.01 term, which pushes the entropy to increase during the gradient descent.
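A sketch of this whole loss-computation loop, following the formulas above:

        for i in reversed(range(len(rewards))):      # move backward in time over the exploration steps
            R = params.gamma * R + rewards[i]        # cumulative reward
            advantage = R - values[i]                # advantage of this cumulative reward over V(s_i)
            value_loss = value_loss + 0.5 * advantage.pow(2)   # loss of the critic
            # temporal difference of the state values
            TD = rewards[i] + params.gamma * values[i + 1].data - values[i].data
            gae = gae * params.gamma * params.tau + TD         # generalized advantage estimation
            # loss of the actor, with the small entropy bonus
            policy_loss = policy_loss - log_probs[i] * Variable(gae) - 0.01 * entropies[i]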
OK, so now the good news is that the most difficult part is done. We have the two losses, and what we only need to do now — and we already know how to do it — is to perform stochastic gradient descent to reduce these two losses. So what we do now is get out of this loop and take our optimizer, the one we made separately. Remember, the first thing we have to do is re-initialize all the gradients to zero, and to do this we add a dot and then call the zero_grad method. All right, that's done. Then we perform backward propagation, but we give twice as much importance to the policy loss as to the value loss, because the policy loss is smaller. To do this we put in parentheses policy_loss plus 0.5 times value_loss — so 0.5 times the value loss — and we add a dot and apply the backward method to perform the backward propagation; thanks to this trick, with the policy loss plus half of the value loss, we give twice as much importance to the policy loss as to the value loss.
OK, then we use another trick, which is to prevent the gradient from taking extremely large values and therefore degenerating the algorithm. The trick is to take first our torch library, then the nn module from the torch library, then the utils submodule, and then use the function clip_grad_norm, into which we input our model parameters, with a second argument which will be 40. That trick makes sure the gradients won't take extremely large values and degenerate the algorithm. And for those of you who might be wondering what exactly this 40 is: it just means that we use this value so that the norm of the gradient stays between 0 and 40, and that's how we prevent the gradient from taking too large values. OK, now we're almost done. Remember, we made this ensure_shared_grads function at the beginning of the file, which is there to ensure that the agent's model and the shared model share the same gradients; to make sure of that we apply this function here, adding ensure_shared_grads with the model and the shared model as arguments. All right, that's just a precaution — I'm not sure it's totally necessary, but at least we won't get an issue here. Okay. And finally, in the last line of code, we are of course going to perform the optimization step to reduce the losses, and you know how to do it: we take our optimizer and we add .step() with parentheses, and there we go — that trains our brains.
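As a sketch, this final update of the shared model (note that clip_grad_norm has been renamed clip_grad_norm_ in more recent PyTorch versions):

        optimizer.zero_grad()                                   # re-initialize the gradients to zero
        (policy_loss + 0.5 * value_loss).backward()             # backprop, giving twice the weight to the policy loss
        torch.nn.utils.clip_grad_norm(model.parameters(), 40)   # keep the norm of the gradient between 0 and 40
        ensure_shared_grads(model, shared_model)                # agent and shared model share the same gradients
        optimizer.step()                                        # optimization step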
training our brains. So congratulations. I hope this wasn't
too overwhelming. Don't worry I will provide the code with
all the comments. So if you missed any detail you can have
a look at the comments. And don't worry if you haven't
understood anything, this is very advanced. But rest
assured this is also the most powerful memory visit made
from the creator of pi. So we are really working with the best
here. The state of the art so it's totally normal if you didn't
get everything the first time but by working on it many
times you will definitely get more and more comfortable. So
now we're done with the training. So basically we made all
the most important things. You know, we made the brains
by building the architectures of the neural networks with the
convolutions of the LCN and the fully connected layers. We
trained his brain by making this train code here. So basically
the heart of the algorithm is done. You made the A3 see
congratulations. Now we have a few more things to do but
that is just to get the fun part. You know we need to make
this test that we found which will test the agents and
provide the projects and the airplane break out. So this will
be very fun to watch. We'll not code all the lines of this test
that I felt because as we said we did the most important
thing. All related 23C but I will of course explain the code
and eventually we have this made up I found which will
execute the code. And from the moment we execute this
code all the code will be generated. So the brains will be
made. The training will happen and the eye will play new
games of breakout and we will get all the projects. So I can't
wait to eventually watch them. We are going to see if he is
smart enough to catch the ball. So now I'm going to see in
the next project for this desktop UI so that we can test the
AI on some new games. And until then enjoy AI.
BREAKOUT - STEP 13
See, we made it: we made the brains and we trained them. But now we still have to make a test agent, which will not update the model at all but will just use the shared model to do its own exploration. And in this code we will record some videos, and these will be the videos of the test agent playing Breakout with a certain score. So let's go through this code. The most important part is done, so as I told you we're not going to code it line by line, but I think it's important that you understand what's going on here. So there we go with this code. In the first section, as you noticed, we imported the libraries, and then we defined this test function, which will make this test agent do its own exploration and play the Breakout game. So we get this test function that takes three arguments. The first one is the rank, which is still there to desynchronize the test agent, as we did for the training agents. Then we have our parameters, of course, because we need some. And of course we have the shared model, because this test agent will use the shared model to do its own exploration. All right, then we go inside the function, and in this line of code we seed the test agent, exactly as we did before. Then we import the environment. As a reminder, in the main code, which will be in the next tutorial, env_name here will be replaced by Breakout-v0, so that we can go into the Breakout-v0 environment and play the game, and we add the video wrapper that will record the videos of our AI playing Breakout. So basically this line of code means that we run one environment with video. Then, at the next line of code, we seed this environment, on the exact same principle as in the train function. Then we get our model, and to do this we create an object of the ActorCritic class, and we put as the input shape our environment's observation_space.shape[0], exactly like in the train function, and as outputs the actions, with action_space. So exactly like before. Then something new here: since we're done with the training, we don't want to put the model in train mode, because we simply don't want it to train; we want to put it in evaluation mode, and that's what we do here with model.eval(). That's just to put the test agent in a mode where we only measure its performance. Then here we get our input states, which are the input images from the game and which at this point are numpy arrays. Then we convert them into torch tensors. Here we initialize the sum of rewards to zero, and here we initialize done to True, so still just like last time. Then something new again: we introduce this start_time variable, using the time function, to measure the time of the computations; that's just to get the starting point. Then, for the actions, we use a very practical type of queue, a deque, that allows us to add an element to the queue from the right or from the left. That's very practical, and I'll give you the reference in the commented version of the code, so you can have a look at what this deque is and what it allows us to do. Then we initialize the length of the episode to zero, of course, and then we will increment it in this while loop. So we use the same trick here.
We loop with while True, and inside the loop we increment the length of the episode by one. When the game is done, when the game is over, we reload the last state of the shared model, the shared model that is updated by the other agents; remember that here the shared model is no longer updated by the test agent. Still, if the game is over, if the game is done, we re-initialize the cell states cx and the hidden states hx, and else, if the game is not over, well, we keep the same cell states and hidden states, but we make sure they are torch Variables. OK, so that's something we already did in the train function. And then, still in the while loop, after having updated the cell states and the hidden states the right way depending on these two cases, well, what do we do? We get the predictions of the model. That's exactly what we do here with this line of code.
So we get the value, which is the output of the critic, and the action values, which are the output of the actor, and on top of that the hidden states hx and the cell states cx. Then we generate a distribution of probabilities over the actions, that is, over the action values here, and we do this with the softmax function. And of course we don't need to get the log probabilities here, because that was just for the training; the test agent will just play the actions, we're just using the model to play, we're not doing any training here. So we just have the probabilities, and from these we play the action by taking directly the argmax of the probabilities, that is, we take the action that has the highest probability. The reason is that the test agent doesn't do any exploration. Remember that during training we want to have a chance to pick some actions that have low probabilities, to do some exploration of these other actions instead of taking each time the action that has the highest probability. But here the test agent doesn't do any exploration, and therefore that's why we directly take the action that has the maximum probability. Then, once we play the action, we reach the next state, we get the next reward, and we also get done, which indicates whether or not the game is over. So we get all this with this line of code, by playing the action after having selected it with our argmax here. So we play the action here and we get the state, the reward and done again, and then, since we just got a new reward, we're going to update the sum of rewards by simply adding this new reward. And finally, whenever the game is done, so if done, meaning when we finish playing the game, well, we're going to print the results: the time, the sum of rewards and the length of the episode, that is, how long it lasted playing Breakout. And this is just how we print all these variables, using these little formatting tricks for the time; then reward_sum, which is just the variable holding the sum of rewards, and episode_length, which is the length of the episode. OK. And then, once we've printed all the results, well, since the game is over and we want to start a new game, we're going to re-initialize everything: the sum of rewards to zero, the length of the episode to zero, we clear all the actions using the clear method of the deque, and we reset the input images by restarting the environment altogether. And finally we use time.sleep(60) to take a break of one minute, to let the other agents train. And that's if the game is over. OK. And finally we have this last line of code, which will get us the new state, and then we can move forward, we can continue in this new game. So there we go, that's the test function, thanks to which you will see the videos in one or two tutorials. I hope you will be there like last time to watch the results together; it will be fun. And I'm telling you, expect to see good results, but keep in mind that this Breakout game was super challenging. We thought it was a simple game to play at first, but not at all; it actually turned out to be much more difficult than Doom, and that's why we put it in the last module. But anyway, let's make this main function in the next tutorial. This is not the most important thing; now that the A3C is implemented, we will not code it line by line. I will explain the code, and very quickly we will get the results.
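To tie the whole walkthrough together, here is a condensed sketch of what such a test function might look like. The helper names (create_atari_env, ActorCritic, the fields of params) and the LSTM size of 256 are assumptions based on the description above, not a verbatim copy of the course code.

```python
import time
from collections import deque

import torch
import torch.nn.functional as F

# create_atari_env and ActorCritic are assumed to come from the project's other files,
# e.g. from envs import create_atari_env; from model import ActorCritic

def test(rank, params, shared_model):
    torch.manual_seed(params.seed + rank)                 # desynchronize this agent
    env = create_atari_env(params.env_name, video=True)   # record videos of the games
    env.seed(params.seed + rank)
    model = ActorCritic(env.observation_space.shape[0], env.action_space)
    model.eval()                                          # evaluation mode: no training here

    state = torch.from_numpy(env.reset())
    reward_sum, episode_length, done = 0, 0, True
    start_time = time.time()
    actions = deque(maxlen=100)                           # remembers the last 100 actions

    while True:
        episode_length += 1
        if done:
            model.load_state_dict(shared_model.state_dict())  # reload the latest shared weights
            cx = torch.zeros(1, 256)                           # fresh cell state (size assumed)
            hx = torch.zeros(1, 256)                           # fresh hidden state
        else:
            cx, hx = cx.data, hx.data                          # keep the states, detach them

        value, action_values, (hx, cx) = model((state.unsqueeze(0), (hx, cx)))
        prob = F.softmax(action_values, dim=-1)
        action = prob.max(1)[1].item()                    # argmax: no exploration for the test agent

        state, reward, done, _ = env.step(action)
        reward_sum += reward
        done = done or episode_length >= params.max_episode_length

        actions.append(action)
        if actions.count(actions[0]) == actions.maxlen:   # stuck repeating the same action
            done = True

        if done:
            print("Time {}, episode reward {}, episode length {}".format(
                time.strftime("%Hh %Mm %Ss", time.gmtime(time.time() - start_time)),
                reward_sum, episode_length))
            reward_sum, episode_length = 0, 0
            actions.clear()
            state = env.reset()
            time.sleep(60)                                # one-minute break while the others train

        state = torch.from_numpy(state)
```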
BREAKOUT - STEP 14
I'm just going to explain the main code that will execute the whole thing, before we get the exciting results and watch the videos. So this is the main code, and as you can see it is pretty short. We start by importing the libraries and the modules, and also the different classes and functions we made, like ActorCritic from our model file, the train function from the train file and the test function from the test file, and of course we import our optimizer. Then we start with the first section, where we gather all the parameters into a class. This is the Params class; remember this params object that we created from this Params class. Each time we access a parameter, like the learning rate, we go through this object. So let's go through the parameters quickly. The first one here is the learning rate; as you can see, we choose a small learning rate. The second one is the gamma parameter, the discount factor. Then we have the tau parameter set to 1, 16 processes, 20 steps and a maximum episode length of 10,000. We spoke about that: this is the parameter we set to make sure the agent doesn't get stuck indefinitely in a state of the environment, so it will stop the game if the episode length goes over this maximum length. And eventually, of course, we get the name of our environment, Breakout-v0. And by the way, you can also play on some other environments just by changing the name of the environment here; if you want to try some other Breakout versions or even some other Atari games, you can simply replace this Breakout-v0 here by some other game. But I can tell you that Breakout is already very challenging. All right, so those are all the parameters, and then there is the main code for the main run. So here, let's see what we do. In the first line we set one thread per core. Then in the second line we get all the parameters by creating a new object of the Params class, which will initialize all these parameters here, because they are variables attached to this params object. Then we set the seed. Then we get our environment using the create_atari_env function, with the name of our environment, which is Breakout-v0; you see it takes params.env_name, and params.env_name is Breakout-v0, so that will give us the Breakout environment. And by the way, this is not the usual way of creating an environment, but to improve the whole process and the performance, well, we use this to create an optimized environment, and we do this thanks to universe. Universe is a package that comes along with the packages you installed for OpenAI Gym, and thanks to universe we get an optimized environment; that's what this is all about here. Then we get our shared model by creating an object of the ActorCritic class. And here it's important to understand that this shared model is the model shared by the different agents, which run in different threads on different cores.
And speaking of threads, at the next line here, shared_model.share_memory(), what we do is store the model in the shared memory of the computer, so that all the threads can get access to it even if they are on different cores. That's what we do here; this is to enable that sharing. Then we get our optimizer, linked to the parameters of our shared model and with our learning rate. And again, it's important to understand that the optimizer is also shared, because it's going to act on the shared model, and that's why at the next line, optimizer.share_memory(), we store the optimizer in shared memory so that all the agents can get access to it to optimize the model. Then we initialize our processes. The test process doesn't update the shared model; it just uses it to play episodes, print the score and record the videos. That's exactly what is done here with target=test: that's the test process, and this Process here comes from torch.multiprocessing. What it does is basically run a function on an independent thread. Then, with p.start(), we start this new process that we just initialized, and with processes.append(p) we add the process to the list of processes. And finally, in this loop here, we just run all the other processes, the ones that will train by updating the shared model. That's basically what happens in the last lines of code here. So even if you don't want to get into the details of it, the important thing to understand is that this will run the processes in an optimal way, and therefore we should be all good to execute this code, have a trained model and eventually watch the results. So I can't wait to do that. This is going to be pretty exciting.
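As a recap of this main file, here is a condensed sketch of the structure just described. The class and function names (Params, ActorCritic, SharedAdam, create_atari_env, train, test) and the exact parameter values follow the walkthrough and are assumptions, not a verbatim copy of the course code.

```python
import os
import torch
import torch.multiprocessing as mp

# ActorCritic, SharedAdam, create_atari_env, train and test are assumed to be
# imported from the other files of the project (model, optimizer, envs, train, test).

class Params:
    def __init__(self):
        self.lr = 0.0001                # small learning rate
        self.gamma = 0.99               # discount factor (assumed value)
        self.tau = 1.0
        self.seed = 1                   # assumed seed
        self.num_processes = 16
        self.num_steps = 20
        self.max_episode_length = 10000
        self.env_name = 'Breakout-v0'   # swap this to try other Atari environments

if __name__ == '__main__':
    os.environ['OMP_NUM_THREADS'] = '1'        # one thread per core
    params = Params()
    torch.manual_seed(params.seed)
    env = create_atari_env(params.env_name)    # optimized environment (gym + universe)

    shared_model = ActorCritic(env.observation_space.shape[0], env.action_space)
    shared_model.share_memory()                # visible to every process, on every core

    optimizer = SharedAdam(shared_model.parameters(), lr=params.lr)
    optimizer.share_memory()                   # the optimizer state is shared too

    processes = []
    p = mp.Process(target=test, args=(params.num_processes, params, shared_model))
    p.start()                                  # the test process: plays, prints scores, records videos
    processes.append(p)
    for rank in range(params.num_processes):   # one training agent per process
        p = mp.Process(target=train, args=(rank, params, shared_model, optimizer))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()                               # wait for all agents to finish
```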
BREAKOUT - STEP 15
This time we're going to watch the results for Breakout, which, remember, was much more challenging than Doom. So let's hope we get some good results. OK, so here is the main code. We have all the other code files here, and here is the main file, the code that will execute all the other code: building the whole neural network, training the agents and recording the videos. So basically what we're going to do is just execute it, just to see how it goes, but we will not wait for the result, because it actually takes several hours. We prepared the results beforehand in this test folder; we already have the videos, and we will get right to them. OK, so to execute it, we basically just select the whole code here, and here we go. Right, and as you can see, the videos folder is being populated. So right now we already have two videos, and the agents are training, but it's going to take a long time, so what we're going to do now is watch the results we prepared, which, by the way, we haven't seen yet either. All right, so in the folder that you will find on the website, you will find the code in two versions, one without comments and one with comments, so if you've missed any detail you can still have a look at this code; all the comments will explain it, each line is commented so you can understand. And then in this folder you will find this test folder, and that's the folder that gets populated with the videos. So these are the videos of our previous training; we trained the model for a couple of hours and we got these videos. We're going to watch them one by one, starting with the first one, and we're going to see whether we get some smart AI playing Breakout, and whether we manage to solve it. OK, so let's have a look at the first video. Actually, you know what, we're going to stop the training first, because it's slowing down my computer right now. OK, done. So let's watch it. It's going to be really bad, because it's really the first one; the paddle might not even move. There we go: it doesn't play at all, because it's the very beginning, it has learned nothing yet, it's still just processing the images. And it might even be the case for the second video that the paddle still doesn't move. OK, so what do you think of this? Is it worth spending so much energy on? OK, so in the next video we might have something, maybe a little something, a little bit of an idea of what it's doing, or maybe not; we'll see whether it breaks any bricks or not. Not yet. Let's hope we get something in the next one. Yeah, I think we got something, because now it's doing a bit better. OK, so now the AI has eyes, it can see and it can sense what it's doing, but it's not very quick yet; it's not a smart catch, but it's a start. OK, next one. You know, it looks much better now. Yeah, more on time there. All right, let's have a look. OK, got it, got another one. Yeah, still not perfect, but it's definitely starting to play like a human. Very good job, but still, you know, like a five year old, a two year old human. Wow, look at it now. Yeah, it's trying to catch the ball with some weird tactic; it's not like a perfect prediction of where the ball is going to be. OK, and it's getting much better, 27. OK, next one, 370. OK, that was 20 seconds. OK, now I would say it's like a five year old. Yeah, and now the ball is going faster, so it's of course much more difficult. So let's see if we can beat 27: 23, 24, 28. OK, here we go.
See, it's getting better, and that's a good sign. OK, that one lasted three lives. The next one is a one minute 10 second video. There we go. That's great, and that's definitely progress, isn't it? Yeah, and you can tell the AI is playing, because it's moving sporadically, like super fast. Yes, and it's making a lot of unnecessary moves, but it's getting there. Let's see how it does when the ball is moving really fast. Is the ball speed supposed to depend on something? I don't know, maybe on the color of the block it hits, or is it just random? It looks random, but we should check the rules. But it's really doing great now. Yeah, so three, four in a row, that score is really, really great. So yeah, we definitely have some A3C playing Breakout there. Yes, it seems like we have a brain with memory, not only with eyes; it's got pretty quick reactions, but it's also got some memory. You do use your memory when you play Breakout yourself, of course. Yeah. OK, so well done to the A3C. Is this the last one? Fifty-five seconds or a little less. Do you remember what the score was on the previous one? We should check whether it keeps improving; whether it improves every time would be the interesting thing. OK, here we go. You know, I wish I could play some music. Look at it, it's like on steroids, clicking left or right all the time, but it's doing well. Yeah, but we might not solve the whole game of Breakout; we would just need to train it more. Yeah, those were three really good balls. It got to about 60; did it beat the score? I didn't see exactly, but I think it was very nice. Awesome. So I hope you liked this; I hope you liked this powerful implementation of the A3C. And you can see how well it can work. Yeah, that's right: at the end of the paper they have a comparison of different algorithms, and this one is the best one by far. And not only do they have that comparison, but I also experimented with several variants myself. I tried the version without the LSTM and the version with it, and the LSTM definitely improved the algorithm; it's much, much better. Before, I barely had some artificial intelligence for the first 15 or 20 seconds, but definitely nothing like what we saw here. Yeah, that's good. OK, cool. Thanks guys, thanks a lot. That was the end of the third module, and therefore the end of the course. Congratulations for finishing the three modules, and mostly the A3C, which was a pretty big piece of work. And for sure, explore the different modifications that exist today. You know, maybe by the time you are watching this there are some really cool new advancements on A3C, and you can probably find an even newer model, even better than what we've seen. Let's see what 2018 brings.
WHAT IS DEEP
LEARNING
What is the Internet, anyway? What are you saying? I don't know... that massive computer, the one that's becoming really big now? What do you mean? What do you write to it? I don't know, a lot of people use it and communicate; I guess they can communicate with NBC writers and producers. Allison, can you explain what the Internet is? How amazing is that? Just over 20 years ago people didn't even know what the Internet was, and today we can't even imagine our lives without it. Welcome to the Deep Learning A-Z course. My name is Kirill Eremenko, and along with my co-instructor Hadelin de Ponteves we're super excited to have you on board. Today we're going to give you a quick overview of what deep learning is and why it's picking up right now. So let's get started. Why did we have a look at that clip, and what is this photo over here? Well, that clip was from 1994, and this is a photo of a computer from 1980. The reason why we delve into history a little bit is that neural networks, along with deep learning, have been around for quite some time, and they've only now started picking up and impacting the world. If you look back at the 80s, you'll see that even though they were invented in the 60s and 70s, they really caught on as a trend, a new wave, in the 80s: people were talking about them a lot, there was a lot of research in the area, and everybody thought that deep learning, or neural networks, were this new thing that was going to impact the world and solve all its problems. And yet they kind of slowly died off over the next decade. So what happened? Why did neural networks not survive and not change the world back then? Is the reason that they were just not good enough, not that good at predicting things, not that good at modeling, basically just not a good invention? Or is there another reason? Well, actually there is another reason, and the reason is in front of us: it's the fact that technology back then was not up to the right standard to facilitate neural networks. In order for neural networks and deep learning to work properly,
you need two things: you need data, a lot of data, and you need processing power, strong computers to process that data and feed the neural networks. So let's have a look at how data, or rather the storage of data, has evolved over the years, and then we'll look at how processing technology has evolved. So here we go, three years: 1956, 1980, 2017. What did storage look like back in 1956? Well, there's a hard drive, and that hard drive is only a five megabyte hard drive. That's five megabytes right there on the forklift, the size of a small room; that's a hard drive being transported to another location on a plane. That is what storage looked like in 1956: you had to pay a company two and a half thousand dollars, in those days' dollars, to rent that hard drive, not buy it, rent it, for one month. In 1980 the situation improved a little bit: here we've got a 10 megabyte hard drive for three and a half thousand dollars. Still very expensive, and only 10 megabytes, so that's like one photo these days. And today, in 2017, we've got a 256 gigabyte SD card for $150, which can fit on your finger. And if you're watching this a year later, or in 2019 or 2025, you're probably laughing at us, because by then you'll have even larger storage capacity. But nevertheless the point stands. If we compare these across the board, without even taking price and size into consideration, just the capacity of whatever was mainstream at the time, then from 1956 to 1980 capacity roughly doubled, and from 1980 to 2017 it increased about twenty-five thousand six hundred times. And the length of the period is not that different: from 1956 to 1980 is 24 years, from 1980 to 2017 is 37 years, so not that much more time, but a huge jump in technological progress. That goes to show that this is not a linear trend; this is exponential growth in technology, and if we also take price and size into account, you'd be looking at increases in the millions. And here we actually have a chart on a logarithmic scale: if we plot the hard drive cost per gigabyte, you'll see that it looks something like this. We're very quickly approaching zero.
Right now you can get storage on Dropbox and Google Drive which doesn't cost you anything: cloud storage. And that's going to continue; in fact, over the years this is going to go even further. Right now scientists are looking into using DNA for storage. At the moment it's quite expensive: it costs about $7,000 to synthesize two megabytes of data and then roughly another $2,000 to read it. But that kind of reminds you of the whole situation with the hard drive and the plane; you know that this is going to be mitigated very, very quickly with this exponential curve. Ten years from now, twenty years from now, everybody might be using DNA storage if we go down this direction. And here are some stats on that, so you can explore it further; maybe pause here if you want to read a bit more about this. This is from nature.com, and basically you could store all of the world's data in just one kilogram of DNA storage, or about 1 billion terabytes of data in one gram of DNA.
So that's just something to show how quickly we're progressing, and why deep learning is picking up now: we are finally at the stage where we have enough data to train super sophisticated models. Back in the 80s, when all this was first invented, that just wasn't the case. The second thing we talked about is processing capacity. Here we've got an exponential curve again, on a log scale; it's portrayed as a straight line precisely because it's a log scale, and this is how computers have been evolving. So again, feel free to pause on this slide. This is called Moore's Law; you've probably heard of how quickly the processing capacity of computers has been growing. Right now we're somewhere over here, where an average computer you can buy for a thousand bucks thinks at about the speed of the brain of a rat; around 2023 to 2025 it will reach the speed of a human brain, and by 2045 or so it will surpass all of the humans combined. So basically we're entering the era of computers that are extremely powerful, that can process things way faster than we can imagine, and that is what is facilitating deep learning. All of this brings us to the question: what is deep learning? What is this whole neural network situation, what is going on, what are we even talking about here? You've probably seen a picture like this, so let's dive into it. What is deep learning? This gentleman over here, Geoffrey Hinton, is known as the godfather of deep learning. He did research on deep learning in the 80s, he has done lots and lots of work, he has published many research papers on deep learning, and right now he works at Google. So a lot of the things we're going to be talking about actually come from Geoffrey Hinton, and he's got quite a few YouTube videos where he explains things really well, so I highly recommend checking them out. And so the idea behind deep learning is to look at the human brain; there's going to be quite a bit of neuroscience coming up in these tutorials.
What we're trying to do here is mimic how the human brain operates. And, you know, we don't know that much; we don't know everything about the human brain, but the little that we do know, we want to mimic and recreate. And why is that? Well, because the human brain seems to be one of the most powerful tools on this planet for learning, adapting skills and then applying them, and if computers could copy that, then we could just leverage what natural selection has already decided for us, all of those algorithms that it has decided are the best. Why reinvent the wheel, right? So let's see how this works. Here we've got some neurons: these neurons have been smeared onto glass and then looked at under a microscope with some coloring, and this is what they look like. They have a body, they have these branches, and they have these long tails, and you can see they have a nucleus inside, in the middle; that's basically what a neuron looks like in the human brain. There are approximately 100 billion neurons all together. These are individual neurons, and these particular ones are motor neurons, because they're bigger and easier to see, but nevertheless there are a hundred billion neurons in the human brain, and each one is connected to as many as about a thousand of its neighbors. To give you a picture, this is what it looks like: this is an actual cross-section of the human brain. This is the cerebellum, which is the part of your brain at the back; it is responsible for motor functions, for keeping your balance, and for some language capabilities and things like that. So this is just to show how many neurons there are: billions and billions of neurons all connecting. We're not talking about five, or five hundred, or a thousand; there are billions of neurons in there. And that's what we're going to be trying to recreate. So how do we recreate this in a computer? Well, we create an artificial structure called an artificial neural network, where we have nodes, or neurons, and we're going to have some neurons for the input values, the values that you know about a certain situation. For instance, you're modeling something, you want to predict something, and you always have some inputs to base your prediction on; that's called the input layer. Then you have the output, the value that you want to predict: for instance, whether somebody is going to leave the bank or stay, whether this is a fraudulent transaction or a real one, and so on. That's going to be the output layer. And in between we're going to have a hidden layer. As you can imagine, in your brain you have so many neurons: some information comes in through your eyes, ears, nose, basically your senses, and then it doesn't just go right away to the output where you have the result; it goes through billions and billions of neurons before you get the output. And this is the whole concept behind it: we're going to model the brain, so we need these hidden layers that sit before the output. The input layer neurons are connected to the hidden layer neurons, and those are connected to the output. So this is pretty cool, but what is this all about? Where is the deep learning here, and why is it called deep? There's nothing deep in here. This is kind of an option one might call shallow learning, where there isn't much depth going on. So why is it called deep learning?
Well, because then we take this to the next level. We expand it even further: we have not just one hidden layer, we have lots and lots and lots of hidden layers, and then we connect everything, just like in the human brain; everything is interconnected. That's how the input values are processed through all these hidden layers, just like in the human brain, and then we have an output value, and now we're talking about deep learning. So that's what deep learning is all about, on a very abstract level. In the further tutorials we're going to dissect it and dive deep into deep learning, and by the end of it you will know what deep learning is all about and you will know how to apply it in your projects. I'm super excited about this, I can't wait to get started, and I look forward to seeing you in the next tutorial.
PLAN OF ATTACK
Super excited to get things started. Today we're going to find out how we're going to tackle this section. In this section we will learn the following things. First of all, we'll talk about the neuron: there will be a little bit of neuroscience, we'll find out a bit about how the human brain works and why we are trying to replicate it, and we'll also see what the main building block of a neural network, the neuron, looks like. Then, in the next tutorial, we'll talk about the activation function: we'll look at a couple of examples of activation functions that you could use in your neural networks, we'll find out which one of them is the most commonly used in neural networks, and in which layers you'd rather use which functions.

Then we'll talk about how neural networks work. In contrast to what you might expect, and to what is probably conveyed in other courses and tutorials, we're not going to go into the learning first; we're actually going to look at the working of neural networks first, because seeing a neural network in action will allow us to understand what we're aiming towards, what our goal is. Here we'll look at an example of a neural network, a very simplified, hypothetical example of a neural network working to predict housing prices, basically real estate prices. By looking at that example we'll understand better exactly what we're aiming towards and what we want to achieve in the end. Then we will move on to understanding how neural networks learn, so that we'll be more prepared for what's coming. Then we'll talk about gradient descent. This is also part of how neural networks learn, and we'll understand why that algorithm is better than the brute-force method that you might be intending, or willing, to take as a first resort, the first method that comes to mind. So we'll find out how great the advantages of gradient descent are. Then we'll talk about stochastic gradient descent. It's a continuation of the gradient descent tutorial, but it's an even better and even stronger method, and we'll find out exactly how it works. And finally, we'll wrap things up by mentioning the important things about backpropagation and summarizing everything in a step-by-step set of instructions for training your artificial neural networks. I hope this all sounds very exciting to you, because I am very excited myself and I can't wait to get started.
THE NEURON
Today we're talking about the neuron, which is the basic building block of artificial neural networks. So let's get started. Previously we saw an image which looked like this. These are actual, real-life neurons which have been smeared onto glass, colored a little bit and observed through a microscope. This is what they look like; as you can see, quite an interesting structure: a body and then lots of different tails, kind of branches coming out of them. And this is very interesting, but the question is: how can we recreate that in a machine? And we really do need to recreate it in a machine, since the whole purpose of deep learning is to mimic how the human brain works, in the hope that by doing so we're going to create something amazing, an amazing infrastructure for machines to be able to learn. And why do we hope for that? Well, because the human brain just happens to be one of the most powerful learning tools, or learning mechanisms, on the planet, and we hope that if we recreate it, we'll have something just as awesome. So our challenge right now, our very first step in creating artificial neural networks, is to recreate a neuron. How do we do it? Well, first of all, let's have a closer look at what it actually is. This image was first created by the Spanish neuroscientist Santiago Ramon y Cajal in 1899. What he did was dye neurons in actual brain tissue, look at them under a microscope, and, while he was looking at them, draw what he saw; and this is what he saw. He saw these two large neurons over there at the top, which had all these branches coming off of them towards their top parts, and then each one had this long thread coming out towards the bottom, a very long one. So that's what he saw. Now, technology has advanced quite a lot, we have seen neurons much closer, in more detail, and we can actually draw what one looks like diagrammatically. So let's have a look at that. Here's a neuron. This is what it looks like: very similar to what Santiago Ramon y Cajal drew over here. What we can see is that it's got a body, that's the main part of the neuron, then it's got some branches at the top which are called dendrites, and it's also got an axon, which is that long tail of the neuron.
So what are these dendrites for, and what is the axon for? Well, the key point to understand here is that neurons by themselves are pretty much useless. It's like an ant: an ant on its own can't do much; maybe five ants together can pick something up, but they still can't build an anthill or establish a colony, they can't work together as a huge organism. But when you have lots and lots of ants, like a million of them, they can build a whole colony, they can build an anthill. Same thing with neurons: by itself a neuron is not that strong, but when you have lots of neurons together, they work together to do magic. And how do they work together? That's the question, and that's what the dendrites and the axon are for. The dendrites are kind of like the receivers of the signal for the neuron, and the axon is the transmitter of the signal for the neuron. Here's an image of how it all works conceptually. At the top you've got a neuron, and you can see that its dendrites are connected to the axons of other neurons further up above it. The signal from the neuron then travels down its axon and passes on to the dendrites of the next neuron, and that's how they're connected. In that small image over there, you can also see that the axon doesn't actually touch the dendrite. A lot of scientists are very adamant about the fact that it doesn't touch; it has been shown that there is no physical connection there. But the thing we're interested in is the connection between them, the whole concept of the signal being passed on: that's called the synapse. You can see it over there in that little image, the curly bracket, that's the synapse. And that's the term we're going to be using. So instead of calling the lines, the connectors of our artificial neurons, axons or dendrites (because then the question would be whose connection it is, this neuron's or that neuron's), we just call them synapses, and that answers all the questions at once: the synapse is just where the signal is passed, and it doesn't matter which neuron that element belongs to.
They're just a representation of the signal being passed, and we'll see that just now. So basically that's how a neuron works. Now let's move on to how we are going to represent neurons, how we're going to create neurons in machines. We're moving away from neuroscience and into technology. And here we go: here's our neuron, also sometimes called a node. The neuron gets some input signals and it has an output signal: dendrites and axons, remember, but again, we're going to call these synapses. And these input signals, we're going to represent them as neurons as well. So in this specific case you can see that this neuron is green and it's getting signals from yellow neurons. In this course we're going to try and stick to a certain color-coding scheme, where yellow means the input layer, so basically all of the neurons on the outer layer, at the front, where the signals are coming in. And by signal, well, it might be a bit of an overkill to call this a signal; it's just basically input values. You know how even in a simple linear regression you have input values and then you have a predicted value; same thing here. You have input values, and those are the yellow ones, and then on the right, as you'll see just now, there will be a red one, the output value. The other thing I wanted to point out here is that in this specific example we're looking at a neuron which is getting its signals from the input layer neurons. Those are also neurons, but they are input layer neurons. Sometimes you'll have neurons which get their signal from other hidden layer neurons, so from other green neurons, and the concept is going to be exactly the same; it's just that in this case, for simplicity's sake, we're portraying the example in terms of the input layer. The way to think about it, in the analogy of the human brain, is that the input layer is your senses: whatever you can see, hear, feel, touch or smell. And of course there's a lot of information coming in, but that's what your brain is limited to, pretty much. It really lives in a box made out of bones, and it's a mind-blowing concept to think about: your brain is locked in a dark box, and the only thing it's getting is electrical impulses coming from these organs that you have, which we call your ears, nose, eyes, your sense of touch and your taste. So your brain is just getting signals; it lives in this dark box and it's making sense of the world through your senses. It's phenomenal. So you have these inputs that are coming in: in terms of the human brain, those are your five senses, and in terms of machine learning or deep learning, your input values are basically your independent variables, and we'll get to that in a second. So your input values, the signals, are passed through synapses to the neuron, and then the neuron has an output value that it passes further on down the chain. In this specific case, in terms of color coding again, yellow means the input layer; we're simplifying everything here, we're saying we're only going to have the input layer, then one hidden layer, the green one, and then the output layer right away. So just so that we can get used to these colors for now. There we go, that's the basic structure.
So now let's look in a bit more detail at these different elements. We've got the input layer, and what do we have here? Well, we have these inputs, which are in fact independent variables: independent variable one, independent variable two, and so on. The important thing to remember here is that these independent variables are all for one single observation. Think of it as just one row in your database, one observation: you take all of the independent variables, maybe it's the age of the person, the amount of money in their bank account, whether they drive or walk to work, and so on. All of these are descriptors of one specific person that you are either training your model on or performing a prediction on. The other thing you need to know about these variables is that you need to scale them: you either standardize them, which means making sure they have a mean of zero and a variance of one, or, and Hadelin will point out these cases in a bit more detail in the practical tutorials, sometimes, instead of standardizing, you might want to normalize them. That means that instead of making the mean zero and the variance one, you subtract the minimum value and then divide by the range, the maximum minus the minimum, and therefore you get values between 0 and 1. Depending on the scenario you might want to do one or the other, but basically you want all of these variables to be quite similar, in about the same range of values. And why is that? Well, all of these values are going to go into a neural network where, as we'll see just now, they'll be multiplied by weights and added up, and it's just going to be easier for the neural network to process them if they're all in about the same range; in fact, that's pretty much a requirement for it to work properly. And if you want to read more about standardization, normalization and other things that you can do with your input variables, a good additional reading is the paper called Efficient BackProp by Yann LeCun, 1998; the link is over there. We're actually going to talk more about this phenomenal person in the space of deep learning in the part of the course about convolutional neural networks, and you'll see that this is definitely someone who knows what he's talking about; he's a close friend of Geoffrey Hinton, whom we've already met. In this paper you'll learn more about standardization and normalization, but you can also pick up lots of other tips and tricks, and it will be a good source of additional reading as you go through this course. So check it out if you are interested in some additional reading.
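As a tiny illustration of the two scaling options just described, here is a sketch on a made-up column of input values (the numbers are purely hypothetical):

```python
import numpy as np

# One hypothetical column of input values (e.g. ages of five customers).
x = np.array([23.0, 35.0, 47.0, 29.0, 61.0])

standardized = (x - x.mean()) / x.std()           # standardization: mean 0, variance 1
normalized = (x - x.min()) / (x.max() - x.min())  # normalization: values squeezed between 0 and 1
```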
There we go. So that's it for the input variables. Then here we've got the output value. What can our output value be? Well, we've got a couple of options. The output value can be continuous, like a price for instance; it can be binary, for instance whether a person will exit or will stay; or it can be a categorical variable. The important thing to remember is that in the categorical case your output won't be just one value, it will be several output values, because these will be dummy variables representing your categories; that's just how it works, and it's important to remember that that's how you're going to get your categories out of the artificial neural network. But let's go back to the simple case of one output value. And now one more point, or rather a point I already made that I just want to reiterate: on the left you've got a single observation, so one row from your data set, and on the right you have a single prediction as well, and that is for the same observation.
So it's important to remember that whatever inputs you put in, they are for one row, and the output you get is for that same exact row. Or, if you're training your neural network, you're putting the inputs in for that one row and you're putting the output in for that one row. If you want to simplify the complexity, think of it as something like a regression, a multivariate linear regression: you're putting in your values and you get the output. There's no question about it when we're talking about regression, because we're so used to it. Same thing here; it's nothing too complex, we're just putting in values and getting an output. But just remember that every time it's one row you're dealing with, so you don't get confused and start thinking that these are different rows you're putting into your artificial neural network. These are all just values in that one row: different characteristics or attributes relating to that one observation, every single time. OK, so the next thing we want to talk about here is our synapses. Here we've got the synapses, and they all actually get assigned weights. We're going to talk more about weights further down, but in short, weights are crucial to an artificial neural network's functioning, because weights are how neural networks learn. By adjusting the weights, the neural network decides, in every single case, which signal is important and which signal is not important to a certain neuron, which signal gets passed along and which doesn't, or to what strength, to what extent, signals get passed along. So weights are crucial; they are the things that get adjusted through the process of learning. When you're training an artificial neural network, you're basically adjusting all of the weights in all of the synapses across the whole network. And that's where gradient descent and backpropagation come into play, and those are concepts that we'll also discuss. So basically those are the weights.
That's what you need to know for now. And then we've got the neuron: signals go into the neuron, and what happens inside the neuron? So this is the interesting part; we're talking about the neuron today, so what happens inside it? A few things happen. The first step is that all of the values it's getting are added up: it takes the weighted sum of all of the input values. Very simple, very straightforward: multiply each value by its weight and add them all up. Then it applies an activation function. We're going to talk more about activation functions further down, but it's basically a function that is assigned to this neuron, or to this whole layer, and it is applied to this weighted sum. From that, the neuron understands whether it needs to pass on a signal, and the signal it passes on is the function applied to the weighted sum. Basically, depending on the function, the neuron will either pass the signal on or it won't. And that's exactly what happens in step three: the neuron passes that signal on to the next neuron down the line. That's what we're going to talk about in the next tutorial, because it is quite an important topic and we want to delve deeper into the activation function. But hopefully for now everything is pretty clear: you've got the input values, you've got weights, you've got the synapses, you've got something happening in the neuron, a weighted sum and an activation function applied, and then that result is passed down the line, and that is repeated throughout the whole neural network, on and on, thousands or hundreds of thousands of times, depending on how many neurons and how many synapses you have in your neural network.
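To make those three steps concrete, here is a tiny sketch of a single neuron in code; the input values, the weights and the choice of activation function are all made up for illustration:

```python
import numpy as np

# A toy illustration of the three steps described above for a single neuron.
inputs = np.array([0.5, 0.2, 0.1])      # one observation's input values (hypothetical)
weights = np.array([0.4, 0.7, -0.2])    # one weight per synapse (hypothetical)

weighted_sum = np.dot(inputs, weights)  # step 1: weighted sum of the inputs
output = max(0.0, weighted_sum)         # step 2: apply an activation function (here, the rectifier)
print(output)                           # step 3: this is the signal passed on to the next neuron
```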
THE ACTIVATION
FUNCTION
All right, today we're talking about the activation function. Let's get straight into it. This is where we left off: previously we talked about the structure of one neuron. There it is in the middle. We know that it has some input values coming in, it's got some weights, then it calculates the weighted sum of those inputs, applies the activation function, and in step 3 it passes the signal on to the next neuron. That's what we're talking about today: the value that is going to be passed on, so the activation function that's being applied. So what options do we have for the activation function? Well, we're going to look at four different types of activation functions that you can choose from. Of course there are more types of activation functions, but these are the predominant ones that you'll be hearing about and that we'll be using in this course. So here is the threshold function. This is what it looks like: on the x axis you have the weighted sum of inputs, and on the y axis you have the values from 0 to 1. Basically, the threshold function is a very simple type of function: if the value is less than zero, the threshold function passes on a zero, and if the value is greater than or equal to zero, it passes on a one. So it's basically a yes/no type of function, very straightforward, very rigid: either yes or no, no other options. So there you go, that's how it works, a very simple function. Let's move on to something a bit more complex. Next is the sigmoid function, which has a very interesting formula, as you'll see just now: it is one divided by one plus e to the power of minus x, that is 1 / (1 + e^(-x)), where x is, of course, the value of the weighted sum.
So this is what the sigmoid looks like. It's the function used in logistic regression, if you recall from the machine learning course. What is good about this function is that it is smooth: unlike the threshold function, it doesn't have those kinks in its curve, it's just a nice, smooth, gradual progression. Anything far below zero drops off towards zero, and above zero it approximates towards one. The sigmoid function is very useful in the final layer, the output layer, especially when you're trying to predict probabilities, and we'll see that throughout the course. Then we've got the rectifier function. The rectifier function, even though it has a kink, is one of the most popular functions for artificial neural networks. It is zero all the way up to zero, and from there it progresses gradually as the input value increases. We'll see it throughout the course, in the other intuition tutorials, and we'll also see how we use this function in the practical side of the course; I will comment on this a bit more in a few slides from now. So just remember: the rectifier function is one of the most used functions in artificial neural networks. And finally we've got one more function that you will probably hear about: the hyperbolic tangent function. It's very similar to the sigmoid function, but the hyperbolic tangent goes below zero, so the values go from 0 to approximately 1 on one side and from 0 to minus 1 on the other side, and that can be useful in some applications. We're not going to go into too much depth on each of these functions; I just wanted to acquaint you with them, so that you know what they look like and what they're called. If you'd like some additional reading, then check out the paper by Xavier Glorot and co-authors called Deep Sparse Rectifier Neural Networks, from 2011.
And there you will find out exactly why the rectifier function
is such a valuable function and why it's so popularly used.
But nevertheless for now we don't really need to know all of
those things. For now we're just going to start applying
them as you start using them more and more and more. And
so when you feel comfortable with the practical side of
things then you can go and refer to this paper and then you
will be able to soak in that knowledge much quicker and it
will make much more sense. But just keep this in mind that
when you're ready, when you feel that you're ready, then
you can go and research papers and get some valuable
knowledge from them. So just to quickly recap we have the
threshold activation function which goes like this the
sigmoid activation function which looks like this. We have
the rectifier function and we have the hyperbolic tangent
function and now to finish off this project Let's quickly do a
few exercises so just do two quick exercises to help that
knowledge sink in. So first one is we've got an example here
of a neural network of just one neuron and that right away
the output layer. And the question is assuming that your
dependent variable is binary So it's either 0 or 1 which
threshold function would you use. So out of the ones that
we've discussed we have a threshold function, the sigmoid
function, the rectifier function and we've got the hyperbolic
tangent function. In their raw forms, which ones would you be able to use for a binary variable? OK, so the answer here is that there are two options we can approach this with. One is the threshold activation function, because we know that it's between 0 and 1: it gives us 0 below zero and otherwise it gives us 1, so it can only give two values. It fits this requirement perfectly, and therefore you could say y equals the threshold function applied to your weighted sum, and that's it. The second option you could use is the sigmoid activation function. It is actually also between 0 and 1, which is just what we need, but at the same time it isn't exactly what we need, because it can output any value in between, not just 0 or 1. In this case, though, you could use it as the probability of y being a yes or a no. So we want y to be 0 or 1, but instead we'll say that the sigmoid activation function tells us the probability of y being equal to 1. Basically, the closer you get to the top, the more likely it is that this is indeed a one, or a yes, rather than a no. And that's very similar to
the logistic regression approach. And those are just two
examples if you have a binary variable. Now let's have a look at another practical application. Let's have a look at how all this would play out if we had a neural network like this. In the first layer we have some inputs. They are sent off to our first hidden layer, and then an activation function is applied. Usually what you would apply here, and what you will see throughout the course, is a rectifier activation function, so it would look something like that.
We apply the rectifier activation function and then from
there the signals would be passed on to the output layer
where the sigmoid activation function would be applied and
that would be our final output. And that could predict a
probability for instance so this combination is going to be
quite common where in the hidden layers we apply the
rectifier function and in the output layer we apply the
sigmoid function. So there we go. Hope you enjoyed this
project now you are quite well versed in four different types
of activation functions and you will get some hands-on
practical experience with them throughout this course. You
will be using them all over the place so you'll get to know
them quite intimately and you should be quite comfortable
with them.
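Just to make all of that concrete, here is a minimal sketch in Python (using NumPy) of the four activation functions we just covered, plus the common combination of a rectifier in the hidden layer and a sigmoid at the output. The layer sizes and the random weights are made up purely for illustration; a real network would have learned its weights through training.

import numpy as np

def threshold(x):
    # threshold function: 1 if the weighted sum is at or above zero, otherwise 0
    return (x >= 0).astype(float)

def sigmoid(x):
    # sigmoid: 1 / (1 + e^(-x)), a smooth curve rising from 0 towards 1
    return 1.0 / (1.0 + np.exp(-x))

def rectifier(x):
    # rectifier (ReLU): zero for negative inputs, then grows with the input
    return np.maximum(0.0, x)

def hyperbolic_tangent(x):
    # hyperbolic tangent: values go from -1 to 1
    return np.tanh(x)

# A tiny forward pass with a rectifier in the hidden layer and a sigmoid at the output.
# The weights are random stand-ins for a network that has already been trained.
rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])      # input values
W1 = rng.normal(size=(3, 4))        # input -> hidden weights
W2 = rng.normal(size=(4, 1))        # hidden -> output weights

hidden = rectifier(x @ W1)          # hidden layer activations
y_hat = sigmoid(hidden @ W2)        # output, interpreted as a probability
print(y_hat)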
HOW DO NEURAL
NETWORKS WORK
Today we're talking about how neural networks work. Now
we've done a lot of ground work. We've talked about how
neural networks are structured, what elements they consist
of and even their functionality. And today we're going to
look at a real example of how a neural network can be applied, and we're actually going to work step by
step through the process of its application so we know what
is going on. So let's have a look at what example we're
going to be talking about. We're going to be looking at a
property valuation: we're going to look at a neural network that takes in some parameters of a property and values it. And the thing here, there's a small caveat for
today's project and that is we're not actually going to train
the network. So a very important part in neural networks is
training them up and we're going to look at that in the next
projects in this section. For now we're going to focus on the actual application: we're going to work with a neural network that we're going to pretend is already trained up, and that will allow us to focus on the application side of things and not get bogged down in the training aspect; then we'll cover the training once we already know the end goal we're working towards. Sounds good? All right, let's jump straight into it. So let's say we have some input parameters. Let's say we have four parameters about the property: the area in square feet, the number of bedrooms, the distance to the city in miles (say, to New York City), and the age of the property. All of those four are going to comprise our input layer. Now of course there are probably way more parameters that define the price of a property, but for simplicity's sake we're going to look at just these for now. In its most basic form, a neural network only has an input layer and an output layer, so no hidden layers, and our output layer is the price that we're predicting. In this form, what these input variables would do is they would just be weighted by the synapses and then the output would be calculated. Basically the price would be calculated and we would get a price. And for
instance the price could be calculated as simply as the
weighted sum of all of the inputs. And again here you could
use pretty much any function. We could use any of the activation functions we covered previously, you could use logistic regression. You could
use a squared function; you can do pretty much anything
here but the point is that you get a certain output. And
moreover most of the machine learning algorithms that
exist can be represented in this form and this is basically a
diagrammatic representation of how you deal with the variables: by changing the way the output is formulated, you can take quite a lot of the machine learning algorithms that we've talked about before and put them into this form, and that just goes to show how powerful neural networks are.
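As a tiny illustration of that simplest form, here is a short Python sketch where the price is nothing more than the weighted sum of the four inputs. The weight values are invented for the example; a real network would learn them during training.

# Inputs for one property: area (sq ft), bedrooms, distance to city (miles), age (years)
inputs = [2100.0, 3.0, 12.0, 40.0]

# Hypothetical trained weights for each input (made up for illustration)
weights = [110.0, 15000.0, -2500.0, -800.0]

# With no hidden layer, the predicted price is just the weighted sum of the inputs
price = sum(x * w for x, w in zip(inputs, weights))
print(price)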
Even without the hidden layers we already have a representation that works for most other machine learning algorithms. But in neural networks what we do have is an advantage that gives us lots of flexibility and power, which is where that increase in accuracy comes from. And that power is the hidden layers. There we go, that's our hidden layer, we added it in, and now we're going to understand how that hidden layer gives us that extra power. In fact, to do that we're going to walk through an example. As we agreed, this neural network has already been trained up, and now we're going to imagine we're plugging in a property, and we're going to walk step by step through how the neural network will deal with the input variables, calculate the hidden layer, and then calculate the output. So let's go through. This is going to be exciting. All right. We've got all four variables on the left and we're going to first start with the top neuron in the hidden layer. Now, as we previously saw, all of the neurons in the input layer have synapses connecting them to the top neuron in the hidden layer. And those synapses have weights. Now let's agree that some weights will have a non-zero value, and some weights will have a zero value, because basically not all inputs will be relevant, or not all
inputs will be important for every single neuron, sometimes
inputs will not be important. Here we can see an example where x1 and x3, the area and the distance to the city in miles, are important for that neuron, whereas bedrooms and age are not. Let's think about this for a second: why would
that be the case. Like why would a neuron be linked to the
area and the distance what does that what could that mean.
Well that could mean that normally the further away you get
from the city the cheaper real estate becomes and therefore
the space in square feet of properties becomes larger. So for
the same price you can get a larger property the further
away you go from the city; that's normal, right? That
makes sense and probably what this neuron is doing is it is
looking, specifically like a sniper, for properties which are not so far from the city but have a large area. So for their distance from the city they have an unusually large square foot area, something that is higher than average: they're quite close to the city but they're still large compared to the other ones at the same distance. And so that neuron, again we're speculating here, but that neuron might be picking out those specific properties, and it will activate, hence the activation function; it'll fire up only when a certain criterion is met. It looks at the distance to the city and the area of the property, it performs some calculations inside itself, it combines those two, and as soon as that criterion is met it fires up and contributes to the price in the output.
And therefore this neuron doesn't really care about
bedrooms and age of the property because it's focused on
that specific thing. That's where the power of the neural
network comes from, because you have many of these neurons, and we'll see just now how the other ones work. But what I want to agree here is: let's not even draw the lines for the synapses that are not in place, so that we don't clutter up our image; that's the only reason we're not going to draw them, so let's just get rid of those. And that way we'll
know exactly OK so this neuron is focused on area and
distance to the city. All right, so now that we agree on that, let's move on to the next one. Let's take the one in the middle
here. We've got three parameters feeding into this neuron
so we've got the area, the bedrooms and the age of the
property. So what could be the reason here? Let's again try to understand the intuition and the thinking of this neuron. How is this neuron thinking? Why is it picking these three parameters? What could it be, what could it have found in the data? Right, so we've already established that this is a trained up network; the training happened some time ago, maybe a day ago, and now we're just applying it. And we know that this neuron
through all of the thousands of examples of properties has
found out that the area plus the bedrooms plus the age
combination of those parameters is important. Why could
that be the case?
Well for instance maybe in that specific city in those suburbs
that this neural network has been trained up in perhaps
there's a lot of families with kids who have two or more
children who are looking for large properties with lots of bedrooms but which are new, which are not old, because maybe in that area most of the big properties are usually old. But there are lots of modern families, and maybe there has been a social demographic shift, or maybe there's been a lot of growth in terms of employment and jobs for the younger population; maybe the population demographics have changed and now younger couples or younger families are looking for properties, but they prefer new properties, so they want the age of the property to be lower. And hence, from the training that this neural network has undergone, it knows that when there's a property with a large area and with lots of bedrooms, at least three bedrooms, one for the parents, one for the first child, one for the second child, maybe a guest room, a newer property with a high area and lots of bedrooms, that in that market is valuable. So that neuron has picked that up. It knows that. OK so this
is what I'm going to be looking for. I don't care about the
distance to the city in miles, wherever it is, as long as it has a high area and lots of bedrooms. As soon as that criterion is met the neuron fires up. And this is again where the power of the neural network is coming from, because it combines these parameters into a brand new attribute that helps with the valuation of the property. It combines them into a new attribute and therefore it's more precise. So there we go,
that's how that works. And let's look at another one. Let's
look at the very bottom one. For instance, this neuron could have picked up just one parameter: it could have just picked up age and not any of the other ones. And how could that be the case? Well, this is a classic example of what age could mean: as we all know, the older a property is, usually the less valuable it is because it's worn out.
Probably the building is old, probably you know things are
falling apart. More maintenance is required so the price
drops in terms of the price of the real estate. Whereas a
brand new building it would be more expensive because it's
brand new. Perhaps if a property is over a certain age that
could indicate that it's a historic property. For instance if a
property is under 100 years old then the older it is the less
valuable it is. But as soon as it jumps over 100 years old all
of a sudden it becomes a historic property, because this is a property where people lived hundreds of years ago. It
tells a story. It's got all this history behind it and some
people like that some people value that. In fact quite a lot of
people would like that and would be proud to live in a
property and especially in the higher socioeconomic classes
they would show off to their friends or things like that and
therefore properties that are over 100 years old could be
deemed as historic. And therefore this neuron as soon as it
sees a property over 100 years old it'll fire up and contribute
to the overall price. Otherwise, if it's under 100 years old, it won't fire, and this is a good example of the rectifier function being applied. Here you've got something that stays at zero until a certain point, let's say 100 years old, and then after 100 years old, the older the property gets, the higher the contribution of this neuron to the overall price. That's just a wonderful, very simple example of the rectifier function in action, as the little sketch below shows. So there we go. That could be this neuron.
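Just to make that rectifier-on-age idea concrete, here is a tiny Python sketch. The 100-year threshold and the weight are invented purely for illustration.

def age_contribution(age_years, weight=1000.0):
    # Rectifier behaviour: zero contribution until 100 years,
    # then the contribution grows with every extra year of age.
    return max(0.0, age_years - 100.0) * weight

print(age_contribution(40))   # 0.0 -> too new to count as historic
print(age_contribution(130))  # 30000.0 -> historic, adds to the price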
And moreover the neural network could have picked up
things that we wouldn't have thought of ourselves right. For
instance, bedrooms plus distance to the city: maybe that combination somehow contributes to the price, maybe not as strongly as the other neurons, but it still contributes, or maybe it detracts from the price, that could also be the case, or other things like that. And maybe a neuron picked up a combination of all four of these parameters. As you can see, these neurons, this whole hidden layer situation, increases the flexibility of your neural network and really allows the neural network to look for very specific things, and then, in combination, that's where the power comes from. It's like the example with ants: an ant by itself cannot build an anthill, but when you have a thousand or a hundred thousand ants, they can build an anthill together. And that's the situation here. Each one of these
neurons by itself cannot predict the price. But together they
have super powers and they predict the price and they can
do quite an accurate job if trained properly. And that's what
this whole Course is about understanding how to utilize
them.
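To tie the walkthrough together, here is a minimal Python sketch of the whole property network: four inputs, a small hidden layer with rectifier activations (with some synapses zeroed out, like the specialized neurons we just described), and a price computed as a weighted sum of the hidden activations. Every number here is a made-up stand-in for weights that a trained network would have found.

import numpy as np

def rectifier(x):
    return np.maximum(0.0, x)

# One property: [area (sq ft), bedrooms, distance to city (miles), age (years)]
x = np.array([2100.0, 3.0, 12.0, 40.0])

# Hypothetical trained input-to-hidden weights (rows: inputs, columns: hidden neurons).
# Zeros mark synapses the neuron effectively ignores, like the examples above.
W1 = np.array([
    [0.05, 0.04, 0.0],   # area feeds neurons 1 and 2
    [0.0,  20.0, 0.0],   # bedrooms feeds neuron 2
    [-3.0, 0.0,  0.0],   # distance feeds neuron 1
    [0.0,  -0.5, 1.0],   # age feeds neurons 2 and 3
])

# Hypothetical hidden-to-output weights
w2 = np.array([400.0, 300.0, 250.0])

hidden = rectifier(x @ W1)   # each hidden neuron fires only when its criteria are met
price = hidden @ w2          # predicted price is the weighted sum of hidden activations
print(hidden, price)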
HOW DO NEURAL
NETWORKS LEARN
Now that we've seen neural networks in action, it's time
for us to find out how they learn. So let's get right into it.
There are two fundamentally different approaches to getting a program to do what you want it to do. One is hard coding, where you actually give the program specific rules and tell it what outcomes you want. And you just guide it
throughout the whole way and you account for all the
possible options that the program has to deal with. On the
other hand you have neural networks where you create a
facility for the program to be able to understand what it
needs to do on its own. So you basically create this neural
network where you provide inputs, you tell it what you want
as outputs and then you let it figure everything out on its
own. Two fundamentally different approaches and that is
something to keep in mind as we go through these projects.
Our goal is to create this network which then learns on its
own. We are going to avoid trying to put in the rules and a
good example that I can give you right now is this will come
further in the course but it's just a very visual example for
instance: how do you distinguish between a dog and a cat? In the process depicted on the left, you would program things like: the cat's ears have to be like this, look out for whiskers, look out for this type of nose, look out for this shape of face, look out for these colors. You'd describe all these things and you'd have conditions like: if the ears are pointy, then cat; if the ears are sloping down, then possibly dog; and so on. On the other hand, for a neural network you just code the architecture and then you point the neural
network at a folder with all these cats and dogs with images
of cats and dogs which are already categorized and you tell
it: OK, I've got some images of cats and dogs, go
and learn what a cat is. Go and learn what a dog is. And the
neural network will on its own understand everything it
needs to understand and then further down once it's trained
up when you give it a new image of a cat or dog it will be
able to understand what it was. So there they are, those are
the two fundamentally different approaches.
And today we're going to slowly start getting into how that
second approach works. All right. So let's get straight to it.
Here we have a very basic neural network with one layer
called a single layer feedforward neural network, and it is also called a perceptron. Now before we proceed, one thing that we do need to adjust is that output value. Right now you can see that it's just a y; we need to put a y hat in there. And the reason for that is that usually y stands for the actual value, the value which we see in reality, whereas y hat is the predicted value, the output produced by the algorithm, by the neural network. Basically, y hat is the notation for the output value. The perceptron was first invented in 1957 by Frank Rosenblatt, and his whole idea was to create something that can actually learn and adjust itself. And this is what we're going to be looking at now. So we've got our perceptron drawn. Let's see how our perceptron learns. So let's say we have some input values that have been supplied to the perceptron, or basically to our neural network. Then the activation function is applied.
We have an output and now we're going to plot the output
on a chart. So there it is our output y hat. Now what we
need to do is in order to be able to learn we need to
compare the output value to the actual value that we want
the neural network to get right. And that is the value y. And
so if we put it here you'll see that there's a bit of a
difference. Now we're going to calculate a function called
the cost function that is calculated as one half of the square
difference between the actual value and output value. Now
there are many ways you can come up with cost functions. There are many different cost functions that you can use. This is probably the most commonly used cost function, and why it is specifically this function that we use, we'll find out further down when we're talking about gradient descent. But for now we're just going to agree that this is the cost function, and basically what the cost function is telling us is what error you have in your prediction. And our goal is to minimize the cost function, because the lower the cost function, the closer y hat is to y. OK, so as long as we
agree on that, let's proceed. So basically, from here what happens is: once we've calculated the cost function, we're going to feed this information back into the neural network. So there we go, there's the information going back into the neural network, and it goes to the weights, and the weights get updated. Basically, the only thing that we have control of in this very simple neural network are the weights w1, w2, all the way to wm. And our goal is to minimize the cost function, so all we can do is update the weights. So we update the weights and tweak them a little bit. And how exactly, we'll find out further down, but for now we agree that we update the weights and then we continue. But here I've put up this screenshot of the data just to make one point very clear: right now, throughout this whole experiment, we're dealing with just one row. So we're dealing with a dataset of one row where we have, for instance, hours studied, and the variable that we're predicting is: what result are you going to get on an exam?
And the independent variables that we have are: how many hours did you study, how many hours did you sleep, and what did you get on the quiz in the middle of the semester, what percentage did you get there? So based on those variables we're trying to predict what score you'll get on the exam, and the exam result of 93 percent is the actual value. So that's y. So we feed these three values into the neural network, again for the second time now, and then we're going to be comparing the result to y. So let's see how this works. We feed these values into the neural network, everything gets calculated, and the weights get adjusted. As you can see, we feed the values in again; the point here is that we're feeding in the same row, so we only have one row, we're training on one row. This is because this is just a very simple, basic example. Then we'll see what happens when there are more rows. So again, we feed the row in, our cost function gets recalculated, and as you can see everything happens along those lines again. Every time, our y hat is changing because we've tweaked the weights; y hat changes, our cost function changes, and the whole thing repeats: we feed those values in, y hat changes, the cost function changes, we get information fed back to the weights so that the weights get adjusted again. We feed in the same values, everything gets recalculated, and the feedback goes back to the weights. And one more time we feed them in, and another time, so we've adjusted the weights, we feed in the information. And there we go: this time y hat is equal to y and the cost function is 0. Usually you won't get a cost function equal to zero, but this is a very simple example. So hopefully all that makes sense: every time we feed in exactly that same row, because in this case we're just dealing with that one row, into our neural network; the values get weighted by the weights, the activation function is applied, we get y hat, y hat is compared to y, then we see how the cost function has changed, we feed that information back into the neural network, and we adjust the weights again.
And then we repeat the same process again with the same
exact row. We're trying to minimize that cost. So up until
now we've been dealing with just that one row. Let's see
what happens when you have multiple rows. So here's the full dataset. We have eight rows; maybe these are different students taking the same exam: how many hours they studied, how many hours they slept before the exam, and their final result on the test. And as you can see here on the left, I've got eight of these perceptrons. They are actually all the same perceptron, and this is also important: I just duplicated it eight times just so that we can visualize this. The important thing here is that it's the same neural network; we're going to be feeding these rows into one and the same neural network. So let's get started. One epoch, as you'll hear Hadelin mentioning, is when we go through the whole dataset and train our neural network on all of these rows. So there's our first row and there's y hat for the first row; there's the second row, there's y hat for the second row. So again, it's being fed into the same neural network every time; I just copied it several times so we can visually see how this is happening. Then again, that's the third row, the fourth row, there is our y hat for the fourth row, and so on. Then we get the y hat values for the remaining four rows as well. So every time we just feed a row into our neural network, we get a y hat. Then we
compare them to the actual values. So there are the actual values; for every single row we have an actual value. And now, based on all of these differences between y hat and y, we can calculate the cost function, which is the sum of all of those squared differences between y hat and y, with all of that halved. And there's our cost function. Basically, what we do after we have the full cost function is we go back and update the weights: we update w1, w2, and so on. And the important thing to remember here is that all of these perceptrons are actually one neural network.
So there's not eight of them, there's just one. And when we update the weights, we're going to update the weights in that one neural network, so basically the weights are going to be the same for all of the rows. It's not the case that every row has its own weights; no, all the rows share the weights. And so that's why we looked at the cost function, which is the sum of the squared differences, and then we updated the weights. And now, from here, that was just one
iteration. Next we're going to run this whole thing again.
We're going to feed every single row into the neural
network, find out our cost function and do this whole
process again. So just as we saw previously where we had
just one row and we were doing everything again and again
and again, the same thing happens here. But now we're going to be doing eight rows, or 800 rows, or eight thousand rows, however many rows you have in your dataset. You do this process and then you calculate the cost function. And the goal here is to minimize the cost function; as soon as you have found the minimum of the cost function, that is your final neural
network that means your weights have been adjusted and
you have found the optimal weights for this dataset that you
began your training on and you're ready to proceed to the
testing phase or to the application phase. And this whole
process is called back propagation. So some additional
reading that you might want to do on the cost function: I know we just talked about one, but there are many different ones. A good article is located on Cross Validated. It's called "A list of cost functions used in neural networks, alongside applications". The URL is there, but you can just google that exact search phrase and you will see that this one will be the first one that pops up. It's actually
got some good examples and applications or use cases for
different cost functions so if you're interested to learn more
about cost functions Check out this article. And on that note
I hope you enjoy this project.
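Here is a minimal Python sketch of that whole loop on a single row: forward propagation, the one-half squared-difference cost, and repeated weight updates fed back from the error. The data values, the learning rate, and the use of a plain weighted sum as the output are all simplifications chosen just for illustration.

# One row: hours studied, hours slept, mid-semester quiz score
x = [7.0, 6.0, 0.70]
y = 0.93                      # actual exam result (93%)

weights = [0.1, 0.1, 0.1]     # small starting weights
learning_rate = 0.01

for step in range(100):
    # Forward propagation: here just a weighted sum (no activation, for simplicity)
    y_hat = sum(w * xi for w, xi in zip(weights, x))

    # Cost function: C = 1/2 * (y_hat - y)^2
    cost = 0.5 * (y_hat - y) ** 2

    # Feed the error back and nudge every weight a little
    error = y_hat - y
    weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]

print(cost, weights)          # the cost shrinks towards zero as y_hat approaches y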
GRADIENT DESCENT
We're talking about gradient descent. What we learned
previously was that in order for a neural network to learn
what needs to happen is back propagation and that is when
the error the difference or the sum of squared differences
between y hat and Y is back propagated through the neural
network and the weights are adjusted accordingly. So we
saw that and today we're going to learn exactly how these
weights are adjusted. So let's have a look. This is our very
simple version of a neural network, a perceptron, a single layer feedforward neural network, and what we can see here is this whole process in action: we've got some input values, then we've got the weights, then an activation function is applied, we get y hat, and then we compare it to the actual value and we calculate the cost function. So how can we
minimize the cost function? What can we do about it? Well
one approach to do it is a brute force approach where we
just take lots of different possible weights, look at them, and see which one looks best. What we would do, for instance, is try out, let's say, a thousand weights, and for the cost function we'd get something like this: a chart with the cost function on the vertical axis and y hat on the horizontal axis. And because the formula is one half of (y hat minus y) squared, this is what the cost function would look like, and basically you'd find the best one is over here, at the bottom of the curve.
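Here is a tiny Python sketch of that brute-force idea for a model with a single weight: try a grid of a thousand candidate values, compute the cost for each, and keep the best one. The toy data and the one-weight model are made up purely to show the idea before we see why it doesn't scale.

# Toy data: one input per row and its actual target value
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

def cost(w):
    # Sum of one-half squared differences over all rows for a one-weight model y_hat = w * x
    return sum(0.5 * (w * x - y) ** 2 for x, y in rows)

# Brute force: try a thousand candidate weights and keep the one with the lowest cost
candidates = [i / 100.0 for i in range(1000)]   # 0.00, 0.01, ..., 9.99
best_w = min(candidates, key=cost)
print(best_w, cost(best_w))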
So a very simple, very intuitive approach. Why not do this
brute force method? Why not just try out a thousand different values for each weight and see which one works best? You'll find the best one that way. Well, if you have just one weight to optimize,
this might work, but as you increase the number of weights, increase the number of synapses in your network, you have to face the curse of dimensionality. And so what is the curse
of dimensionality? The best way to describe this or explain it
is to just look at a practical example. So remember this
example we had when we were talking about how neural
networks actually work where we were building or running a
neural network for a property valuation. So this is what it looked like when it was already trained up. Well, before it's trained we don't yet know what the weights are, so the actual neural network looks like this: we have all these different possible synapses and we still have to train up the weights. Here we have a total of 25 weights: four times five at the start, plus five more going into the output, 25 weights in total. And let's see how we could possibly brute force 25 weights. This is a very simple neural network right here, very simple, just one hidden layer; how could we brute force our way through a neural network of this size? Well, there are some simple mathematical calculations. We have 25 weights. So that means if we try a thousand candidate values for every weight, the total number of combinations is 1000 to the power of 25, which is 10 to the power of 75 different combinations. Now let's see how the Sunway TaihuLight, the world's fastest supercomputer as of June 2016, would approach this problem. The Sunway TaihuLight looks like this: it's pretty much a whole huge building for this one supercomputer, and it got the Guinness World Record for being the fastest supercomputer. Right now it is the fastest supercomputer in the world, and the Sunway TaihuLight can operate at a speed of 93 petaflops; FLOPS stands for floating point operations per second, so it can do 93 times 10 to the power of 15 floating point operations per second. That's how quick it is. In comparison, average computers right now do just several gigaflops or so, nowhere near those ranges; the Sunway TaihuLight is at the forefront of technology. And let's say, hypothetically, that it can test one combination of weights for our neural network in one floating point operation. That is not realistic, because you would need multiple floating point operations to test out even a single set of weights in a neural network, but let's give it a head start: let's say that in an ideal world it can do one test per floating point operation. That means it would still require 10 to the power of 75 divided by 93 times 10 to the power of 15 seconds to run all of those tests, to brute force through that network. That works out to approximately 10 to the power of 58 seconds, which is about 10 to the power of 50 years. That is a huge number, far longer than the universe has existed, and it is definitely not going to work for us in our optimization. So there we go, this is a no-no, even on the world's fastest supercomputer, the Sunway TaihuLight. So we have to come up
with a different approach. How are we going to find the
optimal weights? By the way, our neural network was very simple. What if the neural network looks something like this, or even bigger than that? Then yeah, it's just not going to happen, ever. So the method we're
going to be looking at is called gradient descent and you
may have heard of it already. If not we will find out what it is
right now. So there's our cost function and now we go to see
how we can find a faster way to get to the best option. So let's say we start somewhere; you have to start somewhere, so we start over there. And from that point in the top left, what we're going to do is look at the slope of our cost function at that point; that's basically what's called the gradient, because you have to differentiate. We're not going to look at the
mathematical equations. We will provide some tips on
additional reading at the end of the next lecture. But
basically you just need to differentiate, find out what the
slope is in that specific point and find out if the slope is
positive or negative. If the slope is negative like in this case
means that you're going downhill so to the right is downhill
to the left is uphill. And from there it means you need to go
right. Basically you need to go downhill. And that's what
we're going to do: boom, take a step to the right, the ball rolls down. Again, same thing: you calculate the slope, and this time the slope is positive, meaning right is uphill and left is downhill, so you need to go left, and you roll the ball down.
And again you calculate the slope and you roll the ball, and there you go: that's how, in simple terms, you find the best weights, the best situation that minimizes your cost function. Of course it's not going to be
like a ball rolling is going to be a very zigzag type of
approach but it's easier to remember or kind of it is more
fun to look at it as a ball rolling. But in reality yes you just
it's going to be like a step by step approach is going to be a
zigzag type of method. And also there are lots of other elements to it. There are questions like, for instance, why it steps down by a certain amount and doesn't jump way over to the other side of the curve and end up going upwards instead of downwards, and things like that; there are parameters that you can tweak. And again, we will mention where you can find out more on that, plus we'll see this in the practical application. But in the simplest, most intuitive terms, this is what is happening: we are getting to the bottom by just understanding which way we need to go, instead of brute forcing through thousands and millions and billions and quadrillions of combinations. Every time, we simply have a look at which way it is sloping. Imagine you're standing on a hill: whichever way it feels like it's going downwards, you just keep walking that way; you take 50 steps, then you assess again, OK, which way is it going downwards now, and you take another 50 steps, or maybe 40 steps, in that direction. So it gets less
and less and less as you get closer. So here's an example of
gradient descent applied in a two dimensional space. So
that was a one dimensional example. Here we have a two
dimensional space for the gradient descent as you can see
it's getting closer to the minimum and it's also called
gradient descent because you're descending into the
minimum of the cost function. And finally, here is gradient descent applied in three dimensions. This is what it looks like: if you project it onto two dimensions you can see it zigzagging its way into the minimum. So there you go, that was gradient descent. Next we'll talk about stochastic gradient descent, which is really a continuation of this project.
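And here is a minimal Python sketch of that ball-rolling idea in one dimension: start somewhere, compute the slope of the cost curve at that point, and keep stepping downhill. The cost curve and the learning rate are invented so that the example has a known minimum at w = 3.

def cost(w):
    # A simple convex cost curve with its minimum at w = 3 (made up for illustration)
    return (w - 3.0) ** 2

def slope(w):
    # Derivative of the cost: tells us which way is downhill
    return 2.0 * (w - 3.0)

w = -10.0            # start somewhere on the curve
learning_rate = 0.1

for step in range(50):
    w = w - learning_rate * slope(w)   # negative slope -> step right, positive -> step left

print(w, cost(w))    # w ends up very close to 3, the bottom of the curve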
STOCHASTIC GRADIENT
DESCENT
Previously we learned about gradient descent and we found
out that it is a very efficient method to solve our
optimization problem where we're trying to minimize the
cost function. It basically takes us from a problem that would take on the order of 10 to the power of 50 years to brute force, to something we can solve within minutes, or hours, or within a day or so. And it really helps speed things up, because we can see which way is downhill and we can just go in that direction, take steps, and get to the minimum faster. But the thing with gradient descent is that this method requires the cost function to be convex.
And as you can see here we've specifically chosen a convex
cost function. Basically, convex means that the function looks similar to what we are seeing now: it just curves in one direction and in essence has one global minimum. And that's the one that we're going to find.
But what if our function is not convex? What if our cost function is not convex, what if it looks something like this? Well, first of all, how could that happen? That could happen if we choose a cost function which is not the squared difference between y hat and y, or even if we do choose a cost function like that.
But then in a multi dimensional space it can actually turn
into something that is not convex. And so what would
happen in this case if we just tried to apply our normal
gradient descent method something like this could happen.
We could find a local minimum of the cost function rather
than the global one. So this one was the best one and we
found the wrong one and therefore we don't have the
correct weight. We don't have an optimized neural network.
We have a subpar neural network. And so what do we do in
this case? Well, the answer here is stochastic gradient descent. And it turns out that stochastic gradient descent doesn't require the cost function to be convex. So let's have a look at the differences between the normal gradient descent that we talked about and stochastic gradient descent. Normal gradient descent is when we take all of our rows and plug them into our neural network; once again, here we've got the neural network copied over several times, but the rows are being plugged into that same neural network every time, so there's really only one neural network, the copies are just for visualization purposes. And then, once we've plugged them in, we calculate our cost function based on the formula (looking at the chart at the bottom) and then we adjust the weights. This is called the gradient descent method, or, the more proper term, the batch gradient descent method.
So we take the whole batch from our sample, we apply it, and then we run it. The stochastic gradient descent method is a bit different. Here we take the rows one by one: we take this row, we run our neural network, and then we adjust the weights. Then we move on to the second row: we take the second row, we run our neural network, we look at the cost function, and then we adjust the weights again. Then we take another row, row three, we run our neural network, we look at the cost function, we adjust the weights. So basically we're adjusting the weights after every single row, rather than running everything together and then adjusting the weights. Two different approaches. And
now we're going to just compare the two side by side. So
here they are: this is how to visually remember them. So
you've got the batch gradient descent, where you are adjusting the weights after you've run all of the rows through your neural network; then you adjust the weights and you run the whole thing again, iteration after iteration. In the stochastic gradient descent, you run one row at a time and you adjust the weights, run the next row and adjust the weights, and you do everything again and again, and that is called stochastic gradient descent. And as we said, there are two main differences. First, the stochastic gradient descent method helps you avoid the problem where you find those local extremes, local minimums, rather than the overall global minimum. The reason for that, in simple terms, is that the stochastic gradient descent method has much higher fluctuations, because it can afford them: it's doing one iteration, one row, at a time, and therefore the fluctuations are much higher and it is much more likely to find the global minimum rather than just a local minimum. And the other thing about stochastic gradient descent, compared to batch gradient descent, is that it's faster. The first impression you might have is that because it's doing one row at a time it is slower, but in fact it is faster, because it doesn't have to load up all the data into memory and wait until all of those rows are run altogether; you can just run them one by one, so it's a much lighter algorithm and it is much faster in that sense. So it has those advantages over the batch gradient descent method. The main advantage, the main kind of pro, of the batch gradient descent method is that it is a deterministic algorithm, whereas stochastic gradient descent is a stochastic algorithm, meaning it's random. With the batch gradient descent method, as long as you have the same starting weights for your neural network, every time you run it you will get the same iterations, the same results for the way your weights are being updated, whereas with the stochastic gradient descent method you won't get that, because it is a stochastic method: you're picking your rows possibly at random and you are updating your neural network in a stochastic manner, and therefore every single time you run the stochastic gradient descent method, even if you have the same weights at the start, you're going to have a different process and different iterations to get there. So that's, in a nutshell, what stochastic gradient descent is. Also, there's a method in-between
the two called the Mini batch gradient descent method
where you combine the two: rather than running the whole batch or running one row at a time, you run batches of rows, maybe 5, 10 or 100, however many rows you decide to set; you run that number of rows at a time, then you update your weights, and so on. And that's called the mini-batch gradient descent method. If you'd like to learn more about gradient descent, there's a great article which you can have a look at. It's called "A Neural Network in 13 Lines of Python (Part 2 - Gradient Descent)" by Andrew Trask, and the link is below. It's a good 12 to 15 minute read, very well written, in very simple terms.
It's got some interesting philosophical or just interesting
thoughts on how to apply gradient descent, what the advantages and disadvantages are, and how to do things in certain situations, so you get some very cool tips, tricks and hacks. A very easy read, so definitely check that out. And another one, a bit more of a heavy read, for those of you who are into mathematics, who want to get to the bottom of the mathematics of why gradient descent works the way it does, what the formulas driving gradient descent are, how it is calculated, and so on: check out the article, or actually the book, a free online book called Neural Networks and Deep Learning by Michael Nielsen (2015). It's all online, so you can go ahead and check it out there. It starts with a fairly soft introduction to the mathematics, but the mathematics gets pretty heavy as you read through; at the same time it gets you into that mood, like a warm-up chapter where you first warm up with the math and then jump in. So if you're interested in the math, this is the one to go to. And there we go, so that's, in a nutshell, the difference between gradient descent and stochastic gradient descent and how they work.
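To make the three flavors concrete, here is a minimal Python sketch of the update loops for batch, stochastic and mini-batch gradient descent on a toy one-weight model. The data, the learning rate and the model itself are invented for illustration; in the practical projects you'll use a proper library rather than writing these loops by hand.

import random

rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]  # made-up (x, y) pairs
lr = 0.01

def gradient(w, batch):
    # Gradient of the one-half squared-difference cost for the model y_hat = w * x
    return sum((w * x - y) * x for x, y in batch) / len(batch)

# Batch gradient descent: one weight update per pass over the whole dataset
w_batch = 0.0
for epoch in range(100):
    w_batch -= lr * gradient(w_batch, rows)

# Stochastic gradient descent: update after every single row, rows in random order
w_sgd = 0.0
for epoch in range(100):
    random.shuffle(rows)
    for row in rows:
        w_sgd -= lr * gradient(w_sgd, [row])

# Mini-batch gradient descent: update after small batches of rows (here, 2 at a time)
w_mini = 0.0
for epoch in range(100):
    for i in range(0, len(rows), 2):
        w_mini -= lr * gradient(w_mini, rows[i:i + 2])

print(w_batch, w_sgd, w_mini)   # all three end up near the same value (about 2)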
BACKPROPAGATION
We're going to wrap up with backpropagation. All right, so we now know pretty much everything we need to know about what happens in a neural network. We know that there's a process called forward propagation, where information is entered into the input layer and then propagated forward to get our y hats, our output values, and then we
compare those to the actual values that we have in our
training set. And then we calculate the errors then the errors
are back propagated through the network in the opposite
direction and that allows us to train the network by
adjusting the weights. So the one key important thing to
remember here is that back propagation is an advanced
algorithm driven by very interesting and sophisticated
mathematics which allows us to adjust the weights. All of
them at the same time all the weights are adjusted
simultaneously. If we were doing this manually, or if we were coming up with a different type of algorithm, then even if we calculated the error and were trying to understand what effect each of the weights has on the error, we'd have to somehow adjust each of the weights independently, or individually. The huge advantage of backpropagation, and it's a key thing to remember, is that during the process of backpropagation, simply because of the way the algorithm is structured, you are able to adjust all the weights at the same time, so you basically know which part of the error each of your weights in the neural network is responsible for. Now, that is the key
fundamental underlying principle of back propagation. And
this was why it picked up so rapidly in the 1980s and this
was a major breakthrough. And if you'd like to learn more
about that and how exactly the mathematics works in the
background, then a good resource which we've already mentioned is Neural Networks and Deep Learning, which is actually a book, by Michael Nielsen. There you'll find the mathematics written out, and it will help you understand how exactly this is possible. But for now, for our purposes, from an intuition point of view, the important part is to remember that that's what backpropagation does: it adjusts
all of the weights at the same time. And now we're going to
just wrap everything up with a step by step walkthrough of
what happens in the training of a neural network. All right. Step one: we randomly initialize the weights to small numbers close to zero, but not zero. We didn't really focus on the initialization of weights during the intuition projects, but we have to start somewhere, and they are initialized with random values near zero; from there, through the process of forward propagation and backpropagation, these weights are adjusted until the error is minimized, so the cost function is minimized. Step two: input the first observation of your dataset, the first row, into the input layer, each feature in one input node. So basically you take the columns and put them into the input nodes separately. Step three: forward propagation. From left to right, the neurons are activated in a way that the impact of each neuron's activation is limited by the weights; the weights basically determine how important each neuron's activation is. Then propagate the activations until you get the predicted result, y hat. So basically you propagate from left to right, you go all the way until you get y hat. Step four: compare the predicted result to the actual result and measure the generated error. Step five: backpropagation. From right to left, the error is backpropagated; update the weights according to how much they are responsible for the error. Again, you are able to calculate that because of the way the backpropagation algorithm is structured. The learning rate decides by how much we update the weights; the learning rate is a parameter you can control in your neural network. Step six: repeat steps one to five and update the weights after each observation (reinforcement learning); in our case, that was stochastic gradient descent. Or repeat steps one to five but update the weights only after a batch of observations (batch learning); that's either full batch gradient descent or mini-batch gradient descent. And step seven: when the whole training set has passed through the artificial neural network, that makes an epoch. Redo more epochs. So basically just keep doing that, to allow your neural network to train better and better and constantly adjust itself as you minimize the cost function. So there we go. Those are the steps you need to take to build your artificial neural networks and train them. And these are the steps that you will be taking with Hadelin in the practical projects.
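Pulling those seven steps together, here is a minimal Python (NumPy) sketch of a tiny one-neuron network being trained with updates after every observation, stochastic gradient descent style. The architecture, the made-up data, the learning rate and the sigmoid output are all simplifications for illustration only.

import numpy as np

rng = np.random.default_rng(42)

# Toy dataset: 3 features per row, binary target (made up for illustration)
X = rng.random((8, 3))
y = (X.sum(axis=1) > 1.5).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: randomly initialize the weights to small numbers close to zero
w = rng.normal(scale=0.01, size=3)
lr = 0.5

for epoch in range(100):                 # Step 7: redo more epochs
    for xi, yi in zip(X, y):             # Steps 2-6: one observation at a time
        y_hat = sigmoid(xi @ w)          # Steps 2-3: forward propagation
        error = y_hat - yi               # Step 4: compare prediction to actual
        # Step 5: backpropagate - each weight is nudged according to its share of the error
        w -= lr * error * y_hat * (1 - y_hat) * xi

print(w)
print(sigmoid(X @ w).round(2), y)        # predictions move towards the actual targets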
PLAN OF ATTACK
We're going to cover off the plan of attack. How are we going
to learn everything in this section? There's so much to learn.
Let's see how we're going to approach this. All right, what
we'll learn in this section. First of all we'll talk about what convolutional neural networks actually are; it's very important to understand the end goal that you're working towards before you actually start working towards it. We'll have a look at a few little examples and we'll compare the human brain to artificial neural networks in
terms of image recognition. So it'll be a fun light project to
get us started for this whole section. Then we'll talk about
Step 1, diving straight into the convolution operation. This part of the course contains several steps that we need to go through in order to build a convolutional neural network, and that's how these tutorials are going to be broken up.
So this is going to be step one, the convolution operation.
We'll learn everything about feature detectors. We'll talk
about filters. We'll talk about feature maps, what the different parameters are, what they mean, and
have a look at some visual examples as well. Then we'll talk
about Step 1 Part B, the ReLU layer, the rectified linear units layer, and we'll talk about why
linearity is not good and how we want more nonlinearity in
our network for image recognition. Then we'll talk about
Step 2, pooling, and we'll understand how pooling works. We'll talk specifically about max pooling, and we'll also mention a couple of things about mean pooling and sum pooling and other approaches that you can take to the process of
pooling. Also in this lecture we'll have a really cool example
so there will be a very visual interactive tool that we're
going to look at. So make sure to stick around to the end of
that lecture because that's going to add a lot of value to
your learning process. What we're going to discuss at the
end there. Step three: flattening. This is going to be a quick project on how to proceed from your pooled layers to your flattened layer. And then we're going to talk about full connection. This is a very meaty project that puts everything together, puts everything into perspective, and actually shows you how everything works at the end of the day, how those final neurons understand how to classify images. It's a very, very important tutorial, and hopefully it will
summarize or kind of put everything together for you. And
finally we'll have a summary which will summarize
everything we've talked about. And as an extra little feature
I've included a project on Softmax and cross-entropy. So you
don't have to take this project but I thought it would be a
great addition of knowledge because these are terms that
you will come across when dealing with convolutional neural
networks. So maybe take it right away, or maybe when you come across these terms later: you will always know you can come back to this course and take this project to understand better what Softmax and cross-entropy are. And also, as always, throughout these tutorials there will be lots of recommended reading for you to further upskill and get more knowledge. And on that note, I can't wait to see you in the first project.
WHAT ARE
CONVOLUTIONAL
NEURAL NETWORKS
Today we're kicking off convolutional neural networks. It is
going to be exciting. Let's dive straight into it. We're going
to start off with an image. What do you see when you look
at this image? Do you see a person looking at you, or do you see a person looking to the right? You can feel that your brain is struggling, struggling to decide. If you look at the right side of the image, just look at the right border, you'll see a person looking to the right. If you look at the left border of the image, you'll see a person looking at you. And this just proves that what our brain is looking for
when we see things is features: depending on the features that it sees, depending on the features that you process, you categorize things in certain ways. So when you look at the right side of the image you see certain features of a person looking to the right, because they're closer to your center of focus, and therefore your brain classifies it as a person looking to the
right. When you look to the left side of the image you see
more features of a person looking at you and therefore your
brain classifies it as such. So let's have a look at another
one. This is a very famous image. You probably have already
seen it. But what do you see here? Some people will say that
they see a young lady wearing a dress looking away. Some
people say they see an old lady wearing a scarf on her head
looking down. So I'm going to point this out and you'll see
that it will become very obvious. So this is the face of the young lady looking away; she's looking into the distance; that's her coat, that's her hair, that's her little feather in her hair. And on the other hand, this is the head of the old lady looking down: her nose, her mouth, her chin, that's the scarf
on her head and she's looking down. So as you can see two
in one and depending on which features your brain picks up
it will switch between classifying each image as one or the
other. The oldest one of these illusions recorded in the
printed work is this one. It's the duck or the rabbit. So is this
a duck or is this a rabbit. Another example. And now I'm
going to show an image which will just for a second just look
at it and see what emotions or what kind of visual
experience you go through. So what do you see? Do you feel a bit, not dizzy, but a little bit dazzled, like your brain is trying to understand what it is? It's jumping between her eyes, up and down, and this is a classic example of when there are certain features where it could be this or it could be that, but your brain cannot decide, because both seem plausible. So basically, all
these examples illustrate to us how the brain works: it processes certain features of an image, or of whatever you see in real life, and it classifies things based on them. You've probably been in situations when you look over your shoulder quickly and you see something and you think it's, I don't know, a ball, but it turns out to be a cat, or you think it's a car and it turns out to be a shadow, or things like that, because you didn't
have enough time to process those features or you don't
have enough features to classify things as such. And this is
for me this is very interesting, because what we're going to be doing with convolutional neural networks is very similar, and you'll find that the way
that computers are going to be processing images is going
to be extremely similar to the way we are processing
images, so it's very valuable to understand and just kind of
remember these things that this is how we do it. And I'm
going to take this lady off your screens because she's
probably already freaking out by now. So here's something
different. Here's an experiment done on computers on
convolutional neural networks so we're slowly moving now
from humans to computers. And this slide is from a presentation by Geoffrey Hinton, and it basically describes an experiment that he had done on some convolutional neural networks that he had trained up. So here you see three
images and we're going to go through them from left to
right, and see how you would classify them, and then see how the computer classified them. So on the left, what do you think this is? You probably said cheetah, and you would be right. And this is what the computer said. And right away, right off the bat, we're going to learn how to read these images, because if you're going to go deep into convolutional neural networks, no pun intended, you're going to start learning more and more about them and using them, and you'll see a lot of these. And I've actually seen people read them incorrectly. So here at the top, cheetah is what it actually is: that's the actual correct label of the image, the label of the image regardless of any processing or computer vision. And then here are the guesses, the top four or five guesses of the algorithm, and they're given with their probabilities. So the computer, the neural network, said it could be a cheetah or one of a few other cat-like categories such as an Egyptian cat, one of those four, and the cheetah has the highest vote. Throughout this part of the course you'll understand what these votes mean and how they are derived, but for now it's pretty intuitive, right? So it's a cheetah in reality, and the neural network guessed right: it said so with a probability of about 95 to 99 percent. Then the second one. What do you
think? It is a bullet train. And the neural network was able to
distinguish between bullet train passenger car subway train
electric locomotives. Those are the top choice of course. It
had many more options these neural networks learn to
distinguish from not just four categories from dozens
thousands of categories at the same time. So those are the
four options that it picked. And so that's the bullet train and
its will. And so what did you think the last one is? There are
a couple of options or it's not very clear what it could be, a
frying pan could be a magnifying glass, it could even be a
pair of scissors some might say while the neural network
said it was a pair of scissors. But you can see how you can
go wrong here. First of all it's not a very clear image. And
also you can see that the probabilities are not as clear here
so the neural network was a bit confused and a bit
indecisive just as we are. So I said Scissors with the high
probability but then it had hand-glass which it actually was
with not not so far away on second place and frying pan
stethoscope.
So basically here you can see that scissors was its first
guess but the correct option was number two and that's why
it's highlighted in red. So there we go. That's what all the
drugs are already capable of. And this is actually quite an
old slide. This was several years ago. Now they're even
better and you will see that from the practical application
that you'll be coding together at lunch. But now let's try this
But now let's work out a bit more clearly what convolutional neural networks actually are and why they are gaining so much popularity. And they really are gaining popularity: you can see here a Google Trends comparison I did just yesterday, and you can see that convolutional neural networks are even overtaking artificial neural networks, a massive increase. And it's going to keep going that way, because this is a very important field; it's where things like self-driving cars happen. How do they recognize people on the road, how do they recognize stop signs and things like that? How is Facebook able to tag people in images? Remember, years ago you had to tag people yourself; then it would recognize faces but you still had to add the names; and now it just recognizes the faces and adds the names at the same time. Well, that is what convolutional neural networks are capable of at Facebook. If Geoffrey Hinton is the godfather of artificial neural networks and deep learning, then Yann LeCun is the grandfather of convolutional neural networks. LeCun was a student of Geoffrey Hinton's, and in fact here you can see them together. Geoffrey Hinton is now pioneering deep learning at Google, and Yann LeCun is the director of Facebook's artificial intelligence research and also a professor at NYU. So slowly, throughout this part of the course, we are building up this picture of the profiles of the people who are driving this field, and in the next couple of parts we'll get to know a few more, and you'll learn about this whole group, the mafia or conspiracy of deep learning as they sometimes call themselves, and a bit more about how this whole field developed. These are just some great, great people. So LeCun, back in the 80s and the 90s, made significant contributions to the field of convolutional neural networks and, as you'll see throughout this course, helped the world develop something extremely powerful.
So moving on to how convolutional neural networks work. It's very simple, very straightforward: you have an input image, it goes through the convolutional neural network, and you get a label, so it classifies that image as something like a cheetah, or a bullet train, or something else. Now, going into a bit more detail: say a convolutional neural network has been trained up on certain images that were classified or categorized beforehand, for example trained to recognize facial expressions and emotions. You can give it the face of a smiling person, not just a drawing of a face like this but the actual face of a person smiling, and it will tell you that that person is happy; and you can give it the face of a person that's frowning, and it will tell you that the person is sad. It can recognize these emotions, and as you can see, that's already very powerful in terms of the many implications you can think of right away, from just this one example. And in both cases it will give you a probability, so it won't say it's 100 percent certain the person is happy or sad; it'll be 99 or 98 percent, or maybe 80 percent when it's unclear what's going on, just like us. Sometimes we mistake things for what they're not, or sometimes it's just not clear whether a person is smiling or frowning, whether it's a dog or a cat, or whether it's a train or a bullet train. Sometimes we just haven't seen enough features. It all comes down to features, because that's how we process visual information, as we saw at the start of this tutorial.
So how is a neural network able to recognize these features? Well, it all starts at the very basic level. Let's say you have two images: one is a black and white image of two by two pixels and one is a color image of two by two pixels. Neural networks leverage the fact that the black and white image is a two-dimensional array. The way we see it on the left is just the visual representation, some kind of picture, and for simplicity's sake it's just a two by two picture, but in computer terms it's actually a two-dimensional array, with every single one of those pixels having a value between 0 and 255. That's eight bits of information: two to the power of eight is 256, and therefore the values run from 0 to 255. That value is the intensity of the color, in this case the color white, so 0 is a completely black pixel, 255 is a completely white pixel, and between them you have the grayscale range of possible values for that pixel. Based on that information, computers are able to work with the image, and that's the starting point: any image that has a digital form is ultimately just ones and zeros that encode a number from 0 to 255 for every single pixel, and that's what the computer works with. It doesn't actually work with colors or anything like that; it works with ones and zeros at the end of the day. That's the foundation of it all.
In a color image it's actually a three-dimensional array: you've got a blue layer, a green layer and a red layer, and that stands for RGB, red green blue. Each one of those colors has its own intensity, so basically a pixel has three values assigned to it, each of them between 0 and 255, and by combining those three values you can tell exactly what color that pixel is. Again, that's what computers work with. So that's the foundation of it all: the red channel, the green channel, and the blue channel. And finally, let's have a look at a very trivial example, a smiling face, in computer terms. To really simplify things, instead of having values from 0 to 255, and just so that we can grasp the concepts better, we're going to say zero is white and one is black. So we're simplifying things to the extreme, and you will see that the image can be represented like that.
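If you'd like to see this representation in code, here is a minimal sketch in Python with NumPy. The pixel values are made up for illustration; they're not the ones from the slide.

import numpy as np

# A 2x2 black-and-white image: one intensity value (0-255) per pixel.
grayscale = np.array([[  0, 255],
                      [128,  64]], dtype=np.uint8)   # 0 = black, 255 = white
print(grayscale.shape)   # (2, 2) - a two-dimensional array

# A 2x2 color image: three values (red, green, blue) per pixel,
# so the array is three-dimensional.
color = np.array([[[255,   0,   0], [  0, 255,   0]],
                  [[  0,   0, 255], [255, 255, 255]]], dtype=np.uint8)
print(color.shape)       # (2, 2, 3) - height, width, and the three channels

# The simplified convention we'll use in the intuition tutorials:
# 0 = white, 1 = black.
smiley = np.array([[0, 1, 0, 0, 0, 1, 0],
                   [0, 1, 0, 0, 0, 1, 0],
                   [0, 0, 0, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0, 0, 1],
                   [0, 1, 0, 0, 0, 1, 0],
                   [0, 0, 1, 1, 1, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0]])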
The reason we've brought this up is because throughout our intuition tutorials we'll get to work with images structured like this, which is very simple, but at the same time all of those concepts translate straight back to the 0 to 255 range of values, and everything applies the same way there. And the steps we're going to be going through with these images are: step one, convolution; step two, max pooling; step three, flattening; and step four, full connection. I can imagine that these words probably don't mean much to you at the moment, but by the end of this section of the course you will understand them in great detail and know exactly what they're doing. So we'll get started in the next tutorial. For now, some additional reading you might want to look into is Yann LeCun's original paper that gave rise to convolutional neural networks. It's called "Gradient-Based Learning Applied to Document Recognition". You may have seen this image before, floating around the Internet; it is from that paper. So if you want to go back to the very beginnings of how it all happened and where it all came from, this is the paper to look into, and I look forward to seeing you in the next tutorial.
STEP 1 - CONVOLUTION
OPERATION
We found out what convolutional neural networks are all about, and today we're going to dive into step 1: convolution. This is the convolution function, and although we try to stay away from mathematics and keep things intuitive, I couldn't help but share this formula with you because it is so simple: a convolution is basically a combined integration of two functions, (f * g)(t) = ∫ f(τ) g(t − τ) dτ, and it shows you how one function modifies the shape of the other. If you've done any signal processing or electrical engineering, or work in a profession where signal processing is required, you will inevitably have come across the convolution function; it's quite popular. Once again, we're going to keep the mathematics light, or keep it separate, and if you'd like to get into the math behind convolutional neural networks, a great additional read is "Introduction to Convolutional Neural Networks" by Jianxin Wu, a professor at Nanjing University in China. This paper was published literally days ago, like five or six days ago, and it is aimed specifically at people who are starting out, beginners who are getting to know convolutional neural networks, so the mathematics should be accessible. I actually emailed Professor Wu, and he said his whole goal is to break complex things down so that people who are new to this field can understand them. He also mentioned that he has materials available on his homepage, so if you just trim the last part off the paper's link you'll get to his home page, and you'll be able to find additional tutorials and materials which haven't been published as papers but which he uses in his teaching. You might find those useful, so browse around there if you'd like an introduction to the mathematics behind convolutional neural networks and to build a solid base in that area.
But we're going to move on and talk about the convolution step itself. So what is a convolution in intuitive terms? Here on the left we've got an input image, as we discussed: that's how we're going to look at images, just ones and zeros to simplify things, and you can see the smiley face there. Then we've got a feature detector. Feature detectors here are a three by three matrix. Does it have to be three by three? No, it doesn't. AlexNet, for instance, uses much larger filters in its first layer, and other famous architectures use things like five by five or seven by seven feature detectors. They can be different sizes, but usually you'll see that they are three by three, and there are reasons to make them three by three, so we're going to stick to the conventional way and use a three by three feature detector. Also, note what the feature detector is called; these are important terms because you might come across them. There are many different names for it, but the most common ones are feature detector, kernel, or filter. In this course we're going to use filter and feature detector interchangeably, but just bear in mind that it has all of those names. A convolution operation is signified by an X in a circle, just as you saw in the formula before. And here is what happens, on an intuitive level; just think of it in terms of what is actually happening rather than the mathematics. You take this feature detector, or filter, and you put it on your image like you see on the left, so it covers, in this case, the nine pixels in the top left corner. Then you multiply each value of the filter by the respective value of the image: the value in position (1,1) by the value in position (1,1), the value in position (1,2) by the value in position (1,2), and so on. So it's an element-wise multiplication of these matrices, and then you add up the results. In this case nothing matches up, it's always 0 times 0 or 0 times 1, so the result is zero. Then here you can see that one of them matched up, the one on the left, and therefore we've got a 1 here. Then nothing matched up, nothing matched up, nothing matched up. We move on to the next row, and the step by which we move this whole filter is called the stride. Here we have a stride of one pixel. Again something matched up: the bottom right corner matched up, then the bottom middle matched up, here the top right matched up, then nothing matched. The stride is one, but you can change the stride; you can make it one, two, three, whatever you like. The one that works well is usually one or two, so that's what people stick to, and we'll talk more about the stride towards the end of this tutorial. So we keep matching things up; here you can see we've got a two because two of them matched up, and so on and so on; there we go, there's another one that matches up, and we're done.
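To make those mechanics concrete, here is a minimal sketch of the operation in Python with NumPy. Strictly speaking this is cross-correlation, which is what deep learning libraries implement under the name "convolution", and the image and filter values below are made up for illustration rather than copied from the slide.

import numpy as np

def convolve2d(image, feature_detector, stride=1):
    # Slide the feature detector over the image, multiply element-wise
    # and sum, exactly as described above (no padding).
    h, w = image.shape
    kh, kw = feature_detector.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            feature_map[i, j] = np.sum(patch * feature_detector)
    return feature_map

image = np.array([[0, 0, 0, 0, 0, 0, 0],     # a 7x7 binary input image
                  [0, 1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 0, 1, 0],
                  [0, 0, 0, 0, 0, 0, 0],
                  [1, 0, 0, 0, 0, 0, 1],
                  [0, 1, 1, 1, 1, 1, 0],
                  [0, 0, 0, 0, 0, 0, 0]])
feature_detector = np.array([[0, 1, 0],      # a 3x3 feature detector
                             [1, 0, 1],
                             [0, 1, 0]])
print(convolve2d(image, feature_detector, stride=1))   # a 5x5 feature map

With a stride of two you would call convolve2d(image, feature_detector, stride=2) and get a 3x3 feature map instead.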
So what have we created? A couple of important things here. The image on the right is called a feature map, and it also goes by several other names. It can be called a convolved feature; when you apply a convolution operation to something it doesn't become "convoluted", it becomes convolved (I sometimes catch myself saying it the wrong way, but the correct term is convolved), so "convolved feature" is one name, and it can also be called an activation map. We're going to call it a feature map in this course, but it can be called any one of those things. And what have we done here? Well, as you can see, we've reduced the size of the image. That's number one, and that's the important thing I wanted to mention about your input image, the feature detector and the stride. If you have a stride of one, you can see the image is reduced a bit, but if you have a stride of two, the image is going to be reduced more, so the feature map is going to be even smaller. And that's a very important function of this whole convolution step: to make the image smaller, because that makes it easier and faster to process. Here we've only got a seven by seven image, but imagine a proper photo, say a 300 by 300 pixel image (let's use 300 rather than 256 so we don't get confused with the 0 to 255 intensity range). Then you have 300 squared pixels, which is a huge number, and therefore feature detectors reduce the size of the image, and a stride of two is actually beneficial.
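To put a number on that size reduction, here is the standard relationship for a convolution with no padding; this is general knowledge rather than something from the slide.

def feature_map_size(image_size, filter_size, stride):
    # Output side length of a no-padding ('valid') convolution.
    return (image_size - filter_size) // stride + 1

print(feature_map_size(7, 3, stride=1))     # 5   -> a 5x5 feature map
print(feature_map_size(7, 3, stride=2))     # 3   -> a 3x3 feature map
print(feature_map_size(300, 3, stride=2))   # 149 -> far fewer values to process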
But then the question is: are we losing information when we apply the feature detector? Some information we are losing, of course, because the resulting matrix has fewer values. But at the same time, the purpose of the feature detector is to detect certain features, certain parts of the image that are integral. Think about it this way: the feature detector has a certain pattern on it, and the highest numbers in your feature map appear where that pattern matches up. In fact, in our simplified example, the highest number you can get is when the feature matches exactly, and you can see that with the four we have in our feature map: if you look here, that's exactly where the feature detector, which has only four ones in it, matched perfectly, so the feature was detected in this part over here. And as we discussed at the very start of this section, features are how we see things, how we recognize things. We don't look at every single pixel, whether in an image or in real life; we look at features. We look at the nose, the hat, the feather, the eyes, the little black tear marks under the cheetah's eyes that distinguish a cheetah from a leopard, or the shape of the train that distinguishes a bullet train from a normal train, and so on. We don't look at everything; we look at features, and that's what the feature map helps us preserve.
Actually, that's what it allows us to bring forward while getting rid of all the unnecessary things. Even as humans we don't process everything: there is an enormous amount of information going into your eyes at any given time, gigabytes if not terabytes per second if you count every single dot, and still we're able to cope because we discard what is unnecessary and focus only on the features that are important to us. That is exactly what the feature detector does. So, moving on: this is our input image, and from it we create a feature map. Let's say the front one is the one we just created. But then how come there are many of them? We create multiple feature maps because we use different filters.
And that's another way that we preserve a lot of the information: we don't just have one feature map. The network looks for certain features, or rather, through its training (and this is something we'll discuss towards the end of the section), it decides which features are important for certain categories and looks for them, and therefore it will have different filters. We'll talk about filters in a moment, but basically it applies these filters: to get this feature map it applied a filter like the one we saw, then to get this feature map it applied a different filter, to get this one another filter, and so on, and that's how it creates these feature maps. Actually, that's why I personally think the term feature detector is better than filter. Remember, here we have this filter, which we can also call a feature detector; the term feature detector is, I think, better suited, because that's the purpose, right? We don't want to just filter our image; even though it's the same thing and just a question of terminology, what we really want is to detect features. So in this layer, with this feature map we've detected where a certain feature is located in the image, with this feature map we've detected where a certain other feature is located, and with this feature map where yet another feature is located on the image. That's what we're doing.
Now, we've got a couple of examples. Here we're using material from gimp.org and their documentation; GIMP is a free tool, a bit like Paint, that you can use to adjust and work with your images. They have some valuable examples in their documentation, and here they have a picture of the Taj Mahal, and you can choose which filter you want to apply. So if you download this program, upload a photo into it and open the convolution matrix tool, you can apply filters and see that these things really are used in image processing and design. Let's have a look at what we get. If we apply this filter, a five in the middle with minus ones around it, you can see that it sharpens the image. This is quite intuitive if you think about it: the 5 boosts the main pixel in the middle of the filter, and the minus ones reduce the pixels around it, in an intuitive sense. Then blur: this one gives equal significance to all of the pixels, including the one in the center, so it blends them together and you get a blur. Edge enhancement: here you can see a minus one and a one, with zeros elsewhere, so you remove the pixels around the main one in the middle, keep this one and the minus one, and it gives you an edge; this one is a bit harder to understand intuitively. Edge detect: this one probably makes more sense. You reduce the strength of the middle pixel and increase the strength of the ones around it, and that gives you an edge, and you can see what you get there. Emboss is another one: the key here is that the kernel is asymmetric, and you can see the image becomes asymmetric as well, so you get that feeling that it's standing out towards you; that's what you get when you have minuses on one side and pluses on the other. This is getting a bit technical now, but at least we can get some kind of intuition. Let's go quickly through them again: there's sharpen, there's blur, there's edge enhance, there's edge detect, there's emboss. As you can see, these are great examples where we're getting feature maps of the same image: we use different feature detectors to get different feature maps of the same image, and in each one we've tried to detect certain things. In these exact terms they're not all applicable to us: emboss is probably not applicable to convolutional neural networks, but edge detect, that's important, we want to detect edges; edge enhancement, probably not; blur and sharpen, not really. Something like edge detect is probably the most important one for our type of work. And in practice, neural networks will decide for themselves what's important and what's not, and it probably won't even be recognizable to the human eye; you won't be able to understand what those features mean. But the computer will decide, and that's the beauty of neural networks: they can process so many different things and work out which features are important to them, whether or not we have a name for those features; that's an irrelevant question for the network. And my favorite one: here's an image of Geoffrey Hinton passing through one of these filters.
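If you'd like to try these filters yourself outside of GIMP, here is a minimal sketch using NumPy and SciPy. The sharpen kernel matches the one described above (a five surrounded by minus ones); the edge-detect kernel is a commonly used one and may differ from the exact values on the slide, and the small synthetic gradient image simply stands in for the Taj Mahal photo.

import numpy as np
from scipy import ndimage

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)       # boost the centre, reduce the surround

edge_detect = np.array([[0,  1, 0],
                        [1, -4, 1],
                        [0,  1, 0]], dtype=float)     # reduce the centre, boost the surround

# Any grayscale image will do; a small gradient keeps the example self-contained.
image = np.tile(np.arange(8, dtype=float), (8, 1))

sharpened = ndimage.convolve(image, sharpen)          # increases local contrast
edges     = ndimage.convolve(image, edge_detect)      # large values where intensity changes
print(edges)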
All right, that brings us to the end of this tutorial. I hope you enjoyed learning about convolution. The key takeaway is that the primary purpose of convolution is to find features in your image using the feature detector and put them into a feature map, and having them in a feature map still preserves the spatial relationships between pixels, which is very important, because if they were completely jumbled up we would have lost the pattern. At the same time, it's important to understand that most of the time the features a neural network detects and uses to recognize certain images and classes will mean nothing to humans, but nevertheless they work. And that's what convolution is.
STEP 1(B) - RELU LAYER
Today we're talking about ReLU, the rectified linear unit, and this is an additional step on top of our convolution step. So it's not a separate big step, it's a small step inside step one, basically. What is going on here? Well, we have our input image, we have our convolutional layer, which we've discussed, and on top of that we're going to apply, wait for it, our favorite rectifier function. You're familiar with the rectifier function from the previous section on artificial neural networks. Sometimes authors or instructors treat the convolution and the rectifier as two separate steps; in our examples we're going to consider them as just one big step: first the convolution, then the rectifier. And the reason we're applying the rectifier is that we want to increase non-linearity in our image, or rather in our network, our convolutional neural network, and the rectifier acts as the function which breaks up linearity. The reason we want to increase non-linearity in our network is that images themselves are highly non-linear, especially if you're recognizing different objects next to each other or against a background; the image is going to have lots of non-linear elements, and the transition between adjacent pixels is often non-linear, because of borders, different colors and different elements in the image. But at the same time, when we apply a mathematical operation such as convolution, running this feature detection to create our feature maps, we risk creating something linear, and therefore we need to break up the linearity.
So let's have a look at an example. Here is an original image. When we apply a feature detector to this image we get something like this, and you can see here that black is negative and white is positive. When you apply a feature detector to a proper image, which has not just zeros and ones but lots of different values, and the detectors themselves can have negative values as we saw previously, you'll sometimes get negative values in the feature map; here the black ones are negative and the white ones are positive. What the rectified linear unit function does is remove all the black: anything below zero it turns into zero. And so from this it turns into this. Now, it's pretty hard to see exactly what the benefit is in terms of breaking up linearity. I'll try to explain, and I'll try to show an example on this image, but at the end of the day it's a very mathematical concept, and we would have to go into a lot of math to really explain what is going on. But let's try. For instance, look at this building here. This is a building on its own, and then you can see this shadow, this black part over here. You see that it's white, the reflection of the light, then it's gray, then it gets darker, and then darker again. When we apply the rectifier we take out that black spot. Think of it in terms of linearity: it looks like when you go from white to gray, the next step would be black; it's a linear progression from bright to dark, so this is kind of a linear situation. When you take out the black, you break up that linearity. Let's try another one, let's have a look here.
And at the same time it's still that same building, right? It's not like you're blending two buildings into each other; but that is secondary, the main point is breaking up the linearity. So let's look at the same thing here: you see white, gray, black, gray, white, and when you break it up you don't have that gradual progression anymore, you just have an abrupt change, and that helps introduce non-linearity into your image. It's a very rough, back-of-the-envelope explanation rather than a technical one, but hopefully it helps you understand a bit better what we're talking about here. Here again you can see an even better example: white, gray, then darker, darker, darker, and this part looks different when you break it up like that. Again, this is a very rough explanation, it's not absolutely perfect, but at least it gives you some idea of what's going on. If you'd like to learn more, there's a good paper, as always there's a paper. This one is by C.-C. Jay Kuo from the University of Southern California, and it's called "Understanding Convolutional Neural Networks with a Mathematical Model". He answers two questions in it, and you only need to look at the first one: why is a non-linear activation function essential at the filter output of all intermediate layers? It explains this in a bit more detail, both in terms of intuition and mostly in terms of mathematics, so that's an interesting paper where you can get some additional information on this topic. And if you really want to dig in and explore some cool stuff, there's another paper you might be interested in. It's called "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification", and the lead author is Kaiming He, with others from Microsoft Research. They propose a different type of rectified linear unit, the parametric rectifier function which you see here on the right, and they argue that it delivers better results without sacrificing performance. So it's interesting to read if you'd like to get a bit deeper into this topic. And that's all for today. The ReLU layer is pretty simple, it's just applying the rectifier function, and I look forward to seeing you next time.
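Before we move on to pooling, here is a minimal sketch of what this layer does in code; the feature-map values are made up for illustration.

import numpy as np

# A small feature map straight out of the convolution step; the negative
# values correspond to the black pixels in the image above.
feature_map = np.array([[ 0.9, -0.4,  0.2],
                        [-1.3,  0.7, -0.1],
                        [ 0.0,  0.5, -0.8]])

# The rectifier: every negative value becomes zero, positives are untouched.
rectified = np.maximum(feature_map, 0)
print(rectified)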
STEP 2 – POOLING
Today we're talking about max pooling, and we've got some very exciting slides coming up, and even a special surprise at the very end of the tutorial. So let's get started. The first question is: what is pooling and why do we need it? To answer that, let's have a look at these images. We've got a cheetah; in fact it's the same exact cheetah in all three. In the first image it's positioned properly and looking straight at you, in the second image it's a bit rotated, and in the third image it's a bit squashed. The thing is, we want the neural network to be able to recognize the cheetah in every single one of these images. And that's just one cheetah. What if we have lots of different cheetahs? Here's a cheetah, here's a cheetah, here's another cheetah, and another, and another, and we want the neural network to recognize all of them as cheetahs. How can it do that if they're all looking in different directions, their faces are positioned in different parts of the image, one is on the right-hand side, one in the left corner, one in the middle; they're all a bit different, the texture is a little bit different, the lighting is a bit different, there are lots of little differences. So imagine the neural network looks for exactly a certain feature. For instance, a distinctive feature of the cheetah is the tear marks on its face, the dark pattern going from its eyes down the sides of its nose that looks like tears. But if the network is looking for that feature, which it learned from certain cheetahs, in an exact location or an exact shape or form or texture, it will never find it in these other cheetahs. So we have to make sure that our neural network has a property called spatial invariance, meaning that it doesn't care where the features are, not so much in which part of the image, because we've kind of taken that into consideration with our convolutional layer, but it shouldn't care if the features are a bit tilted, a bit different in texture, a bit closer together or a bit further apart relative to each other. If the feature itself is a bit distorted, our neural network has to have some level of flexibility to still be able to find it, and that is what pooling is all about.
So let's have a look at how pooling works. Here's our feature map; we've already done our convolution and completed that part, so now we're working with the convolutional layer, and we're going to apply pooling. How does it work? We're going to be applying max pooling. There are several different types of pooling: mean pooling, max pooling, sum pooling, and we'll comment on those towards the end of the tutorial, but for now we're just applying max pooling. We take a box of two by two pixels, like that; again, it doesn't have to be two by two, you can choose any size of box, and we'll comment on that towards the end of the tutorial as well. You place it in the top left-hand corner, you find the maximum value in that box, you record only that value and you disregard the other three. So in your box you have four values; you disregard three and keep only the maximum, which is one in this case. Then you move your box to the right by the stride; once again you select the stride, and here we slide with a stride of two, which is what you'd normally select. You could use a stride of one so the boxes overlap, or any stride you like, even three, but we're selecting a stride of two here and that's what is commonly used. Then you repeat the process: you record the maximum here, and if the box crosses over the edge it doesn't matter, you just keep doing what you're doing. So you keep recording the maximum in each box: here it's 0, here the maximum is 4, here it's 2, here 1, and so on until you've covered the whole feature map.
So as you can see, a few things have happened. First of all, we were still able to preserve the features. The maximum numbers represent the features, because we know how the convolution layer works: the large numbers in your feature map represent where you found the closest similarity to a feature. By pooling these features we are, first of all, getting rid of 75 percent of the information that is not the feature, not the important thing we're looking for, because we're discarding three values out of four and keeping only 25 percent. And also, because we are taking the maximum of the values that we have, we are accounting for any distortion. For instance, take two images in which the cheetah's tear marks are, in one image, as they're supposed to be, and in the other a bit rotated to the left: the pooled feature will be exactly the same. You can see it here: if we're talking about the cheetah's tear marks, let's say this four is where the feature was found, and if the image were a bit rotated, the four might end up over here instead; then when we do the pooling we're still going to get the same pooled feature map. That's the principle behind it. It's a very rough, intuitive explanation, but that's the point of pooling: we're still able to preserve the features and, moreover, account for possible spatial or textural or other distortions. And in addition to all of that, we are reducing the size, so there's another benefit. So: we're preserving the features, we're introducing spatial invariance, and we're reducing the size by 75 percent, which is huge and is really going to help in terms of processing. Moreover, another benefit of pooling is that we are reducing the number of parameters, again by 75 percent, that are going to go into the final layers of the neural network, and therefore we're helping prevent overfitting.
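Here is a minimal sketch of max pooling in Python with NumPy, following the two by two box and stride of two described above; the feature-map values are made up for illustration.

import math
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Slide a size x size box over the feature map and keep only the
    # maximum value in each box; boxes that hang over the edge still count.
    h, w = feature_map.shape
    out_h = math.ceil((h - size) / stride) + 1
    out_w = math.ceil((w - size) / stride) + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            box = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            pooled[i, j] = box.max()
    return pooled

feature_map = np.array([[0, 1, 0, 0, 0],
                        [0, 0, 1, 1, 1],
                        [0, 0, 1, 4, 2],
                        [0, 1, 0, 2, 1],
                        [1, 0, 0, 0, 1]])
print(max_pool(feature_map))   # a 3x3 pooled feature map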
It is a very important benefit of pooling that we're removing information, and that is a good thing, because that way our model won't be able to overfit to that information. Remember, at the very start we talked about how even for us humans it's important to see the key features rather than all the other noise coming into our eyes. It's the same for neural networks: by disregarding the unnecessary, unimportant information, we're helping to prevent overfitting. So there we go, that is what pooling is about. And the question, of course, is: why max pooling? There are lots of different types of pooling. Why a stride of two? Why a box of two by two pixels? Lots of these questions. On that note, I'd like to introduce you to a lovely research paper called "Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition" by Dominik Scherer and others from the University of Bonn. There is the link, and the beauty of this paper is that it's very simple and very straightforward, so if you've never read a research paper before and you'd like to give it a go, this is a great place to start. It's very short, only 10 pages, and very easy to read. Plus, the extra benefit is that now that we've discussed convolution and pooling, you'll be totally comfortable with everything they're talking about in this paper, so it's a great way to reinforce what you've learned. I highly recommend checking this paper out; it'll take about 20 minutes to read, and you can even skip part 2, which is called "related work", if it feels a bit far-fetched or alienating. Just go straight from part 1 to part 3.
One thing that you do need to know about this paper is that it talks about a concept called subsampling, which is basically average pooling. Remember how we were taking the maximum value in our box? There is also a concept called mean pooling (or average pooling), where you take the average of those values, and sum pooling, where you simply sum the values up. Subsampling is essentially a generalization of mean pooling, a more general approach to taking the average of those values. You can read a bit more about it in the paper, but otherwise just think of it as average pooling when you're reading the paper.
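As a tiny illustration of the difference between these variants, take one two by two box from a feature map (values made up):

import numpy as np

box = np.array([[1, 0],
                [4, 2]])     # one 2x2 pooling box

print(box.max())    # max pooling                          -> 4
print(box.mean())   # mean/average pooling ("subsampling") -> 1.75
print(box.sum())    # sum pooling                          -> 7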
So that's where you can get some additional information on this topic. Now let's recap where we've gotten to. Here is our input image; then we applied the convolution operation and got the convolutional layer; and now to each of the feature maps we've applied the pooling layer. So we've got these two steps, convolution and pooling, and now we're going to do something very fun and exciting: we're going to experiment with this. This is a screenshot I took from a tool created by Adam Harley back when he was at Ryerson University doing computer science, and now he's at Carnegie Mellon, I think, doing his PhD. It's a great tool, so let's open it up and have a look. You can find it through Google, although you pretty much have to know the address, because there isn't much text on the page itself for a search engine to pick up; you can see the Ryerson address at the start of the URL. Basically, this is exactly what we've been doing, but visualized. Here you need to draw a number, so say I draw the number four, and the tool will put the number four here; that's your input image. Then this is the convolution step, and this is the pooling step. By the way, pooling is also called downsampling.
Pooling and downsampling are the same thing. So you can see it has applied convolution, then applied pooling, and you can see exactly how it works. You can see what kind of filters it has applied and what they look like, what features it is looking for. Then it has applied pooling, so it's reducing the size, and this is the important part: this is the convolved image and this is the pooled image, and you can still see the same features, just with less information; the same features are preserved. Moreover, if the four were rotated a bit to the side, it would still be able to pick up very similar pooled layers. After that it's got more layers, which we haven't talked about yet: it has another convolutional layer here, which we actually won't have, and then another pooling layer, basically just repeating the same process. And after that, and this is what we'll be talking about further down in the course, it has the fully connected layers and so on. You can definitely play around with this. If I delete that and draw a 7, you'll see that it tells you its first guess is that this is a 7, and its second guess is a 3. You can draw some challenging things and see if it can pick them up. Say I draw something that looks like a 0 but isn't finished; this time it didn't pick it up, it looks like a 9 to it. What if I finish it like that? Now it thinks it's a 0 or a 9, and you can see over there that the 0 is lighting up, but we'll talk about that part later. Let's do one more, say an 8; I think it's pretty hard for this tool to pick up an 8. You can see how the drawing goes into an 8, and after that the features it's working with stop being recognizable and stop making sense to us humans, but at the same time it correctly recognizes that it's an 8. So definitely play around with it. You can draw a smiley face; what happens then? It looks like a 3 to this tool, because the tool is obviously trained only on digits from 0 to 9, so it has to recognize it as one of those, and it recognizes a 3. It's like in life, when you see a type of fruit that you've never seen before, like a custard apple, and you think it's a pear, because you've never actually seen one before and you don't know what else to classify it as. Same thing here: it hasn't been trained on smiley faces, and that's why it thinks it's a 3. So there you go, it's a very powerful tool, and it'll be helpful for you to play around with it. Also, when you put your mouse over a pixel, it shows you where the feature detector was when it picked up that pixel, so you can see where those pixels are coming from and how the filter was moving through the image, exactly as we talked about. And here you can see the pooling: you can see that it's done with a little square of two by two, and with a stride of two as well, just as we discussed in today's tutorial.
STEP 3 – FLATTENING
I hope you're tracking along with these intuition tutorials just fine and that you've had a chance to play around with everything we've learned so far. Today we're talking about flattening, and the good news is that this is a very simple step, so this tutorial is going to be very quick and then we'll be able to move on to the next interesting things. All right, so far we've got the pooled feature map, which we arrived at by applying the convolution operation to our image and then applying pooling to the result, the convolved image. So what are we going to do with this pooled feature map? We're going to take it and flatten it into a column: we basically just take the numbers row by row and put them into one long column. The reason for that is that we want to later input this into an artificial neural network for further processing. And this is what it looks like when you have a pooling layer with many pooled feature maps: you flatten each of them and put them into this one long column, sequentially, one after the other, and you get one huge vector of inputs for an artificial neural network. So, to sum it all up: we've got an input image; we apply a convolutional layer, and let's not forget the ReLU, the rectified linear unit function that we apply after the convolution as well; then we apply pooling; and then we flatten everything into a long vector, which will be our input layer for an artificial neural network. Exactly how that works, we'll find out in the next tutorial.
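As a quick recap of this step in code, here is a minimal sketch; the three pooled feature maps are just placeholder numbers.

import numpy as np

# Three pooled feature maps of size 3x3 (placeholder values).
pooled_maps = [np.arange(9).reshape(3, 3) for _ in range(3)]

# Flatten each map row by row and stack them into one long input vector
# for the artificial neural network.
input_vector = np.concatenate([m.flatten() for m in pooled_maps])
print(input_vector.shape)   # (27,) - 3 maps x 9 values each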
STEP 4 - FULL
CONNECTION
Today we're finally at step number four: full connection. So what is this step all about? In this step we're adding a whole artificial neural network to our convolutional neural network, so on top of all the things that we've done so far, convolution, pooling and flattening, we're now adding a whole new thing on the back of that. How intense is that? Here we've got the input layer, we've got a fully connected layer, and an output layer. By the way, in artificial neural networks we used to call these hidden layers, and here we're calling them fully connected layers, because they are hidden layers, but a more specific type: they are fully connected. In artificial neural networks hidden layers don't have to be fully connected, whereas in convolutional neural networks we're going to be using fully connected layers, and that's why they're generally called fully connected layers. So basically, that whole column or vector of outputs that we have after the flattening is passed into the input layer; here we've got a very simplified example, just for illustration purposes. The main purpose of the artificial neural network is to combine our features into more attributes that predict the classes even better. In our vector of outputs, the flattened result of what we've already done, we have some features encoded in the numbers, and they can probably already do a pretty good job of predicting which class we're looking at, whether it's a dog or a cat, or whether it's a tumor or not a tumor, and so on. But at the same time, we know that we have this structure called an artificial neural network whose purpose is to take features, come up with new attributes, and combine attributes together to predict even better the things we're trying to predict; we know that from the previous parts, so why not leverage it? That's exactly the plan here: we pass those values into an artificial neural network and let it further optimize everything we're doing. And that's what we're going to be doing.
But let's look at a more realistic example, because this one is a bit too simple. Here we've got a better-looking artificial neural network, where we have several attributes in the input layer, a couple of fully connected layers with more neurons in them, and then two outputs: one for dog and one for cat. An important thing for us to talk about here is: why do we have two outputs? We're used to having only one output in our artificial neural networks. Well, one output is for when you're predicting a numerical value, when you're running a regression type of problem. But when you're doing classification, you need an output per class. The exception is when you have just two classes, like we have here with dog and cat; then we could have done just one output, made it binary, and said one is a dog and zero is a cat, and that would have worked totally fine. In fact, that's how the practical tutorials will be structured. But if you have more than two categories, for instance dogs, cats and birds, then you have to have one neuron per category, and that's why we're going to practice with two categories in this example, so that we know what to expect if we ever have more than two categories.
So what's going to be happening here? We've already done all the groundwork, we've done the convolution, the pooling and the flattening, and now the information is going to go through the artificial neural network. So let's have a look at how it all happens. Information flows through from the very start, from the moment the image is processed: it is convolved, then pooled, then flattened, and then passed through the artificial neural network, all four steps, and then a prediction is made. We'll see in a moment exactly how this happens, and it will be very interesting, but for now let's just say a prediction is made: for instance, 80 percent that it's a dog. But it turns out to be a cat, and then an error is calculated. What we used to call a cost function in an artificial neural network, where we used the mean squared error, is called a loss function in convolutional neural networks, and we use a cross-entropy function for it. We'll talk about cross-entropy and mean squared error and how all of that works in a separate tutorial, but for now let's just say we have a loss function which tells us how well our network is performing, and we're trying to minimize that function to optimize our network. So the error is calculated and then back-propagated through the network, just as it was in artificial neural networks, and certain things are adjusted in the network to help optimize performance. The things that are adjusted are, as usual, the weights in the artificial neural network part, the blue lines that you see here, the synapses. The other thing that is adjusted is the feature detectors. We know that we're looking for features, but what if we're looking for the wrong features, what if the prediction didn't work out because the features are incorrect? So the feature detectors, remember those little three by three matrices we had, are adjusted as well, so that maybe next time the result will be better. And of course it's all done with a lot of science, a lot of math, in the background; it's all done through gradient descent and backpropagation, so it's not just random perturbations, it's actually very well thought through. Nevertheless: the feature detectors are adjusted, the weights are adjusted, and this whole process happens again, the errors are back-propagated again, and this keeps going on and on. That's how our network is optimized, that's how it trains on the data. The important thing here is that the data goes through the whole network from the very start to the very end, then the error is calculated, and then it's back-propagated. It's the same story as with artificial neural networks, just a bit longer because of the first three steps that we already had.
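To tie the four steps together, here is a minimal sketch of such a network in PyTorch, the library used in the practical parts of this book. The layer sizes, image size and learning rate are illustrative assumptions, not the settings used in the practical tutorials.

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=2):                        # two classes: dog and cat
        super().__init__()
        self.conv = nn.Conv2d(3, 32, kernel_size=3)            # step 1: convolution
        self.relu = nn.ReLU()                                  # step 1(b): rectifier
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)      # step 2: max pooling
        self.fc1 = nn.Linear(32 * 31 * 31, 64)                 # step 4: full connection
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.pool(self.relu(self.conv(x)))
        x = x.view(x.size(0), -1)                              # step 3: flattening
        x = self.relu(self.fc1(x))
        return self.fc2(x)                                     # raw scores for dog / cat

model = SimpleCNN()
loss_fn = nn.CrossEntropyLoss()                                # the loss function mentioned above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One training step on a dummy batch of 64x64 RGB images:
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,))                             # 0 = cat, 1 = dog (arbitrary)
loss = loss_fn(model(images), labels)
optimizer.zero_grad()
loss.backward()      # the error is back-propagated...
optimizer.step()     # ...and the weights and the feature detectors are both adjusted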
Now let's have a look at the really interesting part: how do these two output neurons work? Before, we've always had one output neuron; what happens when we have two? How does classification of images play out? Let's start with the top neuron, the dog. The first thing the network needs to do is figure out what weights to assign to all of the synapses that connect into the dog neuron, so that it knows which of the previous layer's neurons are actually important for the dog. Let's see how that's done. Say, hypothetically, we've got these numbers in the final fully connected layer. These numbers can be absolutely anything, but just for argument's sake we'll agree that we're looking at numbers between 0 and 1, so it's easier to reason about them: one means that a neuron is very confident that it found the feature it's looking for, and zero means that it didn't find that feature. Because, at the end of the day, each of these neurons on the left side is still looking at features of the image. The information has already been heavily processed, but each neuron is still detecting a certain feature or combination of features: before the convolution step we had recognizable features, in the pooled maps they're less recognizable, they become even less recognizable in the flattened vector, and then they get combined, and so on; but nevertheless what we're talking about here is certain features, or combinations of features, that are present in the image. So a value of one, and this is important, gets passed to both the dog and the cat neuron at the same time, to both output neurons. For our purposes, one means that this neuron is firing up: it's confidently detecting its feature, which, for simplicity's sake, might be an eyebrow. It communicates that to the dog neuron and to the cat neuron, saying "I can see my eyebrow, I can see my eyebrow", and then it's up to the dog and the cat neuron to work out what that means for them.
these three neurons are firing up the eyebrow and that says
the nose is saying I can see I can see a big nose and I can
see floppy ears. So it and it's saying that to the dog and to
the cat and then what the dog. And then what happens is
we know that this is a dog. So the dog neuron knows that
the answer is it is actually a dog because at the end we're
comparing to the picture or to the label on the picture and
when another dog. So basically the dog neuron is going to
say Aha. So I should be triggered in this case. So these are
neurons they're telling this signal that they're sending to
both me to the dog and the cat is actually an indication for
me that it is a dog. And throughout these lots and lots and
lots of iterations of this happens many times the dog will
learn that these neurons do indeed fire up when the feature
belongs to a dog. On the other hand the cat neuron will
know that it's not a cat and it will know that this feature is
firing up and this neuron is telling me it can see floppy ears
floppy ears ears. But at the same time it's not a cat. So
basically to me that's a signal that I should ignore this
neuron like and the more that happens the more the cat
neuron is going to ignore this neuron about the floppy ears.
And so basically that's how through lots and lots of
iterations if this happens often. So this is just one example
but if this happens often maybe a one maybe 0.8 0.9 maybe
sometimes it won't fire but overall on average this neuron is
lighting up very often when it is indeed a dog the dog
neuron will start attributing higher importance to this
neuron. And so there we go. That's how we're going to
signify it. We're going to say that these three neurons
through this iterative process with me with many many
many many samples and many many a remember so a
sample is a row in your data set and Apoc is when you go
through your whole dataset again and again and again there
are lots and lots of iterations. This dog neuron learned that
this eyebrowed neuron and this big nose neuron and this
floppy ear neuron all seem to really contribute very well to
the classification of what it's looking for and which is a dog.
So that's how it works. And again these ears and nose and
eyebrows those are very very approximate or like very far
fetched examples because by this stage in this whole
convolution conventional neural network it is completely
unrecognizable what they're looking for but at the same
time it is something in the features of dogs or cats or
whatever you classify it. And then let's move on to the next.
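As a side note, the real weights are of course learned by backpropagation through the whole network, but if you would like a rough, simplified sketch of the idea that "a neuron which consistently fires for the right class gets a bigger weight", here is one in Python. The activations, the starting weights and the learning rate are all made-up numbers purely for illustration.

import numpy as np

# Made-up activations of three hidden neurons ("eyebrow", "big nose",
# "floppy ears") for one training image, values between 0 and 1.
hidden = np.array([1.0, 0.8, 0.9])

# Current weights from those hidden neurons to the "dog" output neuron.
w_dog = np.array([0.1, 0.1, 0.1])

label = 1.0          # the label says this image really is a dog
learning_rate = 0.1

for step in range(1000):                             # lots and lots of iterations
    prediction = 1 / (1 + np.exp(-w_dog @ hidden))   # sigmoid output of the dog neuron
    error = label - prediction
    # Neurons that fire strongly while the prediction is still too low
    # get their weights increased (a simple delta-rule style update).
    w_dog += learning_rate * error * hidden

print(w_dog)   # the weights to the consistently firing neurons have grown

If the label had been cat (label = 0.0), the very same update would push these weights down instead, which is exactly the "learn to ignore the floppy-ears neuron" behaviour described above.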
Now let's look at the cat neuron, remembering that the dog's weights are already sorted out: the dog neuron is pretty much ignoring all these other neurons, one, two, three, four, five, and is really paying attention to what these three neurons are saying. So what is the cat neuron listening to? Well, look at what happens when the image actually is a cat. In this example you'll see that these three neurons are at 0.9, 0.9 and 1. They are saying something, and they are saying it to both the dog and the cat neuron. This, again, is important to remember: the output signal goes both ways, and it is the same value. It says the same thing to the dog as it says to the cat, but then it is up to each of them to decide whether to take that signal into account and learn from it or not.

And during training, both the dog and the cat neuron can see that this is a photo of a cat (I should have put a photo of a cat here, but just imagine one). So the dog neuron is thinking: "OK, these whiskers, these pointy triangular ears, this small size, or maybe these cat eyes (you know how cats have those slit-shaped pupils rather than round ones), these neurons are definitely not working for me. They are not helping me predict, because every time they light up, the answer is not what I am looking for." The cat neuron, on the other hand, is thinking: "Hmm, that's interesting. Every time this one lights up, or at least most of the time, it matches my expectation, it matches what I am looking for. OK, I am going to listen to this guy more. Same with this one: most of the times it lights up, I happen to be rewarded for my prediction, because I got it right, it is a cat. I am going to listen to him more. This one over here is useless to me: it is a cat, but he is not even lighting up, so the opposite is happening, and I am not going to listen to him. But this one, the cat-eyes one, whenever it lights up, it matches most of the time, so I am going to learn from that and listen to these three guys." And so the cat neuron ends up listening to these three neurons and ignoring the other five.

That is how the output neurons learn which neurons in the final fully connected layer to listen to, and basically that is how the features are propagated through the network and conveyed to the output. Even though these features don't really have a human-readable meaning like "floppy ears" or "whiskers", they are still distinctive features of that specific class, and that is how the network is trained. Remember also that during the backpropagation process we adjust the feature detectors as well, so if a feature is useless to the output it will probably be disregarded. This doesn't happen in one or two iterations; it happens through thousands and thousands of iterations. With time, a feature that is useless to the network will be disregarded and replaced with one that is useful, and so at the end of the day, in this final layer of neurons, you are likely to have lots of features, or combinations of features, from the image that are indeed representative and descriptive of dogs and cats.
And so then, once your network is trained up, this is how it is applied. Let's see what happens when the trained network is used. Say we pass in an image of a dog. The values are propagated through the network and we get certain activations. This time the dog and the cat neurons don't know what is in the image; they don't have the label, they have no idea what it is, but they have learned which neurons to listen to. The dog neuron has learned to listen to its three neurons, and the cat neuron listens to its three. So the dog neuron looks at one, two, three and says: "Aha, these are pretty high, so my probability is going to be high. That is a dog." The cat neuron looks at its three and says: "OK, these ones are pretty high, but these are pretty low. Interesting. My probability is going to be 0.05." And that is where you get your prediction: the first choice for this neural network is a dog, the second choice is a cat, so the answer is dog.

The same thing happens when you pass in an image of a cat. You get new values, and you can see that for the dog, even though this one is high, these ones are low, while for the cat this one was high, this one was high, and this one is a bit low. So the probability here might not be as great as previously, but you can still see that it is a cat with 79 percent probability, and so the neural network is going to vote that it is a cat. Voting is actually the term that is used here: these neurons in the final fully connected layer get to vote, and these are their votes. Again, we are just putting values between 0 and 1 here for argument's sake; they could be any values. They get to vote, and then these weights are the importance of their vote.

So these purple weights are how the dog neuron views their votes, how much importance it assigns to these neurons and their votes, and these are how much importance the cat neuron assigns to the votes of these neurons. So these neurons vote, the dog and the cat neurons decide who to listen to based on their learned weights, they make their predictions, and then the whole neural network concludes that, in this case, it is a cat.
And that is your conclusion. That is also how you get images like this one, where you have a cheetah and the cheetah class gets a very high probability (this is the probability that the network has predicted), while the other classes get low probabilities. Those low probabilities still exist, because there is still a small chance: the other output neurons are also listening to their voters and saying "maybe it is actually a leopard", so that class still gets some small probability. Or here: scissors gets the top probability, but hand glass is a very close second, and in fifth place is stethoscope, because the scissors output neuron listened to its voters and got the predominant vote overall, while the hand-glass neuron had a good outcome as well. So there we go. That is how the full connection works and how this all plays out together. I hope you enjoyed this tutorial.
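If you would like to see this voting idea written out in code, here is a minimal sketch in Python. All of the votes and weights below are invented purely for illustration (in a real network they come out of training), and the squashing into probabilities at the end is the softmax function explained in the softmax and cross-entropy tutorial later in this section.

import numpy as np

# Votes of eight neurons in the final fully connected layer for one image
# (values between 0 and 1, just for argument's sake).
votes = np.array([0.9, 1.0, 0.8, 0.1, 0.0, 0.2, 0.1, 0.0])

# Learned importance each output neuron assigns to those votes.
w_dog = np.array([1.0, 0.9, 1.0, 0.1, 0.0, 0.1, 0.0, 0.1])   # dog listens to the first three
w_cat = np.array([0.0, 0.1, 0.0, 1.0, 0.9, 1.0, 0.1, 0.0])   # cat listens to the next three

scores = np.array([w_dog @ votes, w_cat @ votes])   # weighted votes per class

# Turn the two scores into probabilities that add up to 1.
probabilities = np.exp(scores) / np.exp(scores).sum()
print(dict(zip(["dog", "cat"], probabilities)))     # dog gets the much higher probability

Because the first three votes are high and the dog neuron weighs exactly those votes heavily, the dog class wins here; feed in a cat-like pattern of votes and the cat class would win instead.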
SUMMARY
So we've learned quite a lot in this section of the course. Let's summarize what we've talked about. We started with an input image, to which we applied multiple different feature detectors, also called filters, to create feature maps; this comprises our convolutional layer. Then, on top of that convolutional layer, we applied the ReLU, or rectified linear unit, to remove linearity, or rather to increase non-linearity, in our images. Then we applied a pooling layer to our convolutional layer, so from every single feature map we created a pooled feature map, and the pooling layer has lots of advantages. The main purpose of the pooling layer is to make sure that we have spatial invariance in our images: if something tilts, or twists, or is a bit different from the ideal scenario, we can still pick up that feature. Pooling also significantly reduces the size of our images, and it helps avoid overfitting of our model to the data, simply because it gets rid of a lot of the data while, by the way it is constructed, still preserving the main features that we are after. The type of pooling we used was max pooling. Then we flattened all of the pooled feature maps into one long vector, or column, of all of these values, and we fed that into an artificial neural network; that was the flattening step. Step four is the fully connected artificial neural network, where all of these features are processed through the network, and then we have the final fully connected layer, which performs the voting towards the classes that we are after. All of this is trained through forward propagation and backpropagation, over lots and lots of iterations and epochs, and in the end we have a very well trained neural network. Another important thing is that not only are the weights in the artificial neural network part trained, but the feature detectors are also trained and adjusted in that same gradient descent process, and that allows us to come up with the best possible feature maps. In the end we get a fully trained convolutional neural network which can recognize images and classify them. So there we go. That is how convolutional neural networks work, and now you should be totally comfortable with this concept and ready to proceed to the practical applications.
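To tie these four steps together, here is a minimal sketch of such a network written in Python with Keras. It is only an illustration of the structure we just summarized, not the exact code used in the practical sections, and the input size, the number of filters and the two output classes are all assumptions made for the example.

from tensorflow.keras import layers, models

model = models.Sequential([
    # Step 1: convolution - 32 feature detectors (filters) of size 3x3,
    # with ReLU applied on top (step 1b) to increase non-linearity.
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    # Step 2: max pooling - keeps the main features, adds spatial invariance
    # and shrinks the feature maps.
    layers.MaxPooling2D(pool_size=(2, 2)),
    # Step 3: flattening - the pooled feature maps become one long vector.
    layers.Flatten(),
    # Step 4: full connection - a hidden layer plus the final "voting" layer.
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),   # two classes: dog and cat
])

# Cross-entropy, discussed in the next tutorial, is the usual loss here.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

Both the weights of the dense layers and the values inside the convolutional filters get adjusted during training, which is the point made above about the feature detectors being trained in the same gradient descent process.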
If you'd like to do some additional reading, there is a great blog post by Adit Deshpande from 2016; you can see the link at the bottom. The post is called "The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3)", and it gives you a short overview of nine different CNNs that have been created by people like Yann LeCun and others, which you can then go ahead and study further. There will be a lot of things in there that are totally new to you and that you will have to get your head around, but just keep this blog post, or these nine papers, in mind. Even if you are not ready to go through them right now, maybe after the practical projects, or after you do some additional training in the space of deep learning, you can slowly start referencing these works. I think you will get a lot of value from looking through other people's neural networks, their convolutional nets, and how they are structured. It will help you understand the best practices and why people did certain things in a certain way, and that in turn will help you with your own architecture of neural networks, because neural networks (and convolutional neural networks are no exception) are a bit of an architecture challenge: you have to come up with an idea, then structure it, then adjust and tweak it to get the best possible design and optimal performance.
SOFTMAX CROSS-
ENTROPY
This is an additional tutorial to talk about the softmax and cross-entropy functions. It is not 100 percent necessary for you to go through, given everything we covered in the main part of this section on convolutional neural networks, but at the same time I thought it would be a good addition to your bag of knowledge and skills. So let's go ahead and dig into these functions.

To start off with, what we have here is the conclusion of the neural network that we built in the main part of the section. At the end it pops out some probabilities: 0.95, or 95 percent, for a dog and 0.05, or 5 percent, for a cat, given that photo on the left as an input. This is after the training has been conducted; the network is running and classifying a certain image. And so the question here is: how come these two values add up to one? Because, as far as we know from everything we learned about artificial neural networks, there is nothing to say that these two final neurons are connected to each other. So how would each of them know what the value of the other one is, and how would they know to make their values add up to one? Well, the answer is that in the classic version of an artificial neural network they wouldn't. The only reason they do is that we introduce a special function, called the softmax function, to help us out of this situation. Normally the dog and the cat neurons would end up with arbitrary real values that don't have to add up to one; but then we apply the softmax function, which is written up there at the top, and that brings these values to be between 0 and 1 and makes them add up to 1. The softmax function, or normalized exponential function, is "a generalization of the logistic function that 'squashes' a K-dimensional vector of arbitrary real values to a K-dimensional vector of real values in the range (0, 1) that add up to 1." So basically it does exactly what we want. The way this is possible is that at the bottom of the formula there is a summation: it takes the exponential of each of the z values and adds them up across all of your classes, and that is where the normalization happens. So that is how the softmax function works, and it makes sense to introduce it into convolutional neural networks, because how strange would it be if your possible classes were dog and cat and for the dog class you had a probability of 80 percent while for the cat class you had 45 percent? It just doesn't make sense, and therefore it is much better to introduce the softmax function, and that is what you will find happening most of the time in convolutional neural networks.
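To make the formula concrete, here is the softmax written out in plain Python with NumPy; the raw scores fed into it are made-up numbers chosen so that the output roughly matches the 95 / 5 percent example above.

import numpy as np

def softmax(z):
    # softmax(z)_j = exp(z_j) / sum over k of exp(z_k)
    exp_z = np.exp(z - np.max(z))   # subtracting the max keeps the exponentials stable
    return exp_z / exp_z.sum()

raw_scores = np.array([3.2, 0.2])   # arbitrary real values from the dog and cat neurons
print(softmax(raw_scores))          # roughly [0.95, 0.05]: between 0 and 1, summing to 1

Whatever real values the two output neurons produce, after this squashing they always land between 0 and 1 and always add up to 1, which is exactly the behaviour we wanted.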
Now, the other thing is that the softmax function comes hand in hand with something called the cross-entropy function, and it is a very handy thing for us. Let's first look at the formula. This is what the cross-entropy function looks like; we are actually going to be using a slightly different calculation, this other representation of cross-entropy, but the results are basically the same and this one is just easier to calculate. I know this might sound very unrelated to anything right now, just formulas on your screen, but there will be some recommended reading at the end of this tutorial, so don't worry if you are not picking up on all of the math even though we haven't explained it in detail. The point here is: what is the cross-entropy function? Remember how previously, in artificial neural networks, we had a function called the mean squared error, which we used as the cost function for assessing our network's performance, and our goal was to minimize the MSE in order to optimize the network's performance? That was our cost function there. In convolutional neural networks we can still use MSE, but a better option, after you apply the softmax function, turns out to be the cross-entropy function. And in convolutional neural networks, when you apply the cross-entropy function, it is not called the cost function anymore; it is called the loss function. They are very similar; there are small terminological differences and slight differences in what exactly they mean, but for all practical purposes it is pretty much the same thing: the loss function is, again, something we want to minimize in order to maximize the performance of our network.

So let's have a look at a quick example of how this function can be applied. Say we put an image of a dog into our network. The predicted value for dog is 0.9 and the predicted value for cat is 0.1, and since this is during training we also know the label: it is a dog, so the label is 1 for dog and 0 for cat. In this case you need to plug these numbers into the cross-entropy formula, and the way you do it is this: the values on the left, the predictions, go into the variable q, the one that sits under the logarithm on the right-hand side, and the values on the right, the labels, go into p. It is important to remember which one goes where, because if you mix them up you could end up taking the logarithm of a zero value. So just make sure you plug them into the correct places, and then you basically add everything up.
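For the dog example above, plugging the numbers in looks like this (a minimal sketch using natural logarithms):

import numpy as np

p = np.array([1.0, 0.0])   # the labels: it really is a dog (dog = 1, cat = 0)
q = np.array([0.9, 0.1])   # the predictions: 0.9 for dog, 0.1 for cat

# H(p, q) = -sum over the classes of p * log(q); the predictions q sit under the logarithm.
cross_entropy = -np.sum(p * np.log(q))
print(cross_entropy)       # -log(0.9), which is about 0.105

Notice that if you swapped p and q here, you would end up taking log(0), which is exactly the mistake warned about above.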
So that is how the cross-entropy works, and right now we are going to look at a specific step-by-step example of applying this function, which should make cross-entropy make more sense and feel less intimidating. My goal in this tutorial is to make you more comfortable with cross-entropy, because, no pun intended, like convolutional neural networks it can sound very convoluted, complex and scary, but it is not. That is the point. So let's go ahead and apply it, just so we know it is not scary. This will also explain why we are looking into different cost functions in the first place.

Let's say we have two neural networks, and we pass them an image of a dog (we know it is a dog, not a cat), then an image of another animal which is a cat, not a dog, and finally an image of an animal which is in fact a dog, not a cat, if you look very closely. We want to see what our neural networks will predict. First image: neural network 1 says 90 percent dog, 10 percent cat, which is correct; neural network 2 says 60 percent dog, 40 percent cat, which is still correct, worse, but correct. Second image: neural network 1 says 10 percent dog, 90 percent cat, correct; neural network 2 says 30 percent dog, 70 percent cat, worse but still correct. And finally, the third image: neural network 1 says 40 percent dog, 60 percent cat, which is incorrect; neural network 2 says 10 percent dog and 90 percent cat, incorrect and worse.

So the key here is that even though both networks got the last image wrong, across all three images network 1 was outperforming network 2. Even in the last case it was better: it still gave the dog a 40 percent chance, as opposed to network 2, which only gave the dog a 10 percent chance. Neural network 1 is outperforming neural network 2 across the board. And so now let's look at functions that can measure the performance we have just been rating by eye. Let's put these results into a table. For neural network 1 you have the row number, which is the image number, and for image one it predicted 90 percent dog and 10 percent cat, so those are the predicted ("hat") values; then you have the actual values, where dog is correct and cat is incorrect. Same thing for image number two, same for image number three, and the same for neural network 2: dog 60 percent, cat 40 percent in the first image, that is what it predicted, while the ground truth is dog, not cat. And so on.

Now let's see what errors we can actually calculate to estimate and monitor the performance of our networks. One type of error is called the classification error, and it basically just asks whether you got it right or not, regardless of the probabilities. Both neural networks each got one image out of three wrong, so that is a 33 percent error rate for neural network 1 and a 33 percent error rate for neural network 2. From this standpoint both networks perform at the same level, but we know that is not true; we know that neural network 1 is outperforming neural network 2. That is why the classification error is not a good measure, especially for the purposes of backpropagation. The mean squared error behaves differently. (By the way, I did these calculations in Excel; I just didn't want to bore you with them, but you can totally sit down and do them on paper or in Excel. They are very straightforward: you basically take the sum of squared errors and then take the average across your observations, and that is pretty much it.) For neural network 1 you get 0.25, and for neural network 2 you get 0.71, so as you can see this measure is more informative: it is telling us that network 1 has a much lower error rate than network 2. And then the cross-entropy, which we have seen the formula for and which is actually even easier to calculate than the mean squared error, gives you 0.38 for neural network 1 and 1.06 for neural network 2. So you can see the results are quite different.
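If you would like to reproduce those numbers rather than take my word for them, here is roughly what that Excel sheet looks like in Python. The labels are one-hot vectors, the predictions are the percentages from the example above, and natural logarithms are used for the cross-entropy.

import numpy as np

labels = np.array([[1.0, 0.0],    # image 1: dog
                   [0.0, 1.0],    # image 2: cat
                   [1.0, 0.0]])   # image 3: dog

nn1 = np.array([[0.9, 0.1], [0.1, 0.9], [0.4, 0.6]])   # predictions of network 1
nn2 = np.array([[0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])   # predictions of network 2

def classification_error(y, q):
    # fraction of images where the most probable class is not the true class
    return np.mean(np.argmax(q, axis=1) != np.argmax(y, axis=1))

def mean_squared_error(y, q):
    # sum of squared differences per image, averaged over the images
    return np.mean(np.sum((q - y) ** 2, axis=1))

def cross_entropy(y, q):
    # -sum of label * log(prediction) per image, averaged over the images
    return np.mean(-np.sum(y * np.log(q), axis=1))

for name, q in [("network 1", nn1), ("network 2", nn2)]:
    print(name, classification_error(labels, q),
          mean_squared_error(labels, q), cross_entropy(labels, q))
# network 1: 0.33, 0.25, 0.38   /   network 2: 0.33, 0.71, 1.06

Both networks tie on the classification error, but the mean squared error and, even more sharply, the cross-entropy show that network 1 is the better one.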
Now, the question of why you would use cross-entropy over mean squared error is not just about the numbers they produce. All these calculations were just to show you that this is doable, that you can do it on paper, and that it is not very intense mathematics; these are pretty simple, straightforward calculations. But why would you use cross-entropy rather than mean squared error? That is a very good question, and I am glad you asked. The answer is that cross-entropy has several advantages over mean squared error which are not obvious. I will mention a couple, and then point you to where you can find out more.

One of them is this: if, at the very start of your backpropagation, your output value is very, very tiny, much smaller than the actual value you want, then the gradient in your gradient descent step will be very low, and it will be hard for the neural network to actually start doing something, start adjusting those weights and moving in the right direction. Whereas when you use something like cross-entropy, because it has that logarithm in it, it helps the network assess even a small error like that and do something about it. Here is how to think about it, in a very intuitive way.
(There will be a link to the mathematics, and you can derive these things in more detail there, but let's stay with the intuitive approach.) Let's say the outcome you want is 1, and right now your network's output is one millionth, 0.000001. Then, the next time around, you improve your output from one millionth to one thousandth. In terms of the squared error, you just subtract one value from the other and square it in each case, and you will see that the squared error barely changed between the two cases: you didn't improve your network by much, as far as the mean squared error is concerned. But if you look at the cross-entropy, because you are taking a logarithm, you are effectively comparing the two values as a ratio rather than a difference, and you will see that you have actually improved your network significantly. That jump from one millionth to one thousandth is insignificant in mean squared error terms; it will still guide your gradient descent and backpropagation in the right direction, but the guidance will be very slow and it won't have enough power. Cross-entropy, on the other hand, will recognize that even though these are tiny adjustments in absolute terms, in relative terms it is a huge improvement, and we are definitely going in the right direction.
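To put rough numbers on that intuition (the values are just the ones from the example, with the target output being 1):

import numpy as np

target = 1.0
before, after = 1e-6, 1e-3   # the output improves from one millionth to one thousandth

# The squared error barely notices the improvement...
print((target - before) ** 2)   # about 0.999998
print((target - after) ** 2)    # about 0.998

# ...while the cross-entropy term, -log of the probability given to the
# true class, roughly halves, so the gradient signal stays strong.
print(-np.log(before))          # about 13.8
print(-np.log(after))           # about 6.9

The squared error improves by less than a fifth of a percent, while the cross-entropy drops from about 13.8 to about 6.9, a huge relative improvement, which is why the logarithm gives backpropagation so much more to work with in this situation.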
So cross-entropy helps your neural network get to the optimal state; it is a better way for the network to get there. But bear in mind that it is the preferred method only for classification. If you are dealing with something like regression, as we were with the plain artificial neural networks, you would rather go with mean squared error; cross-entropy is better for classification, and again that has to do with the fact that we are using the softmax function. So that is a kind of intuitive explanation. A good place to learn a bit more, if you are really interested in why we use cross-entropy rather than mean squared error, is to Google a lecture by Geoffrey Hinton called "The softmax output function"; he explains it very well, and, being the godfather of deep learning, who could explain it better? By the way, anything by Geoffrey Hinton is gold; he has a huge talent for explaining things.

So that is softmax versus cross-entropy, and I hope it gives you an intuitive understanding of what is going on here, but more importantly that you are not put off by the term cross-entropy when it comes up in the practical tutorials; I wanted to make sure you are prepared for that. It is just another way of calculating your loss function, and another way of optimizing your network, which is specifically tailored to classification problems, and therefore to convolutional neural networks, and which comes hand in hand with the softmax function.

As for additional reading: if you would like a light introduction to cross-entropy, a good article to check out is "A Friendly Introduction to Cross-Entropy Loss" by Rob DiPietro (2016). Here is the link below.
It is very nice and gentle, nothing super complex, with good analogies and good examples (it uses analogies with cars, and talks about information, bits, and how you would encode and decode messages), so it is a good article to have a look at and it will give you a good overview of cross-entropy from an introductory standpoint. If you want to dig into the heavy math, like the formulas you see here, then check out the blog post "How to implement a neural network: Intermezzo 2" by Peter Roelants (2016). An intermezzo is an intermission, you know, like when you go to the theatre and there is a break between the first part and the second part; it is called that because the author is going through all the steps of implementing a neural network and then stops and says, "I have to explain this first." No other reason, as far as I understand. Both articles are quite recent, so check this one out if you would like to dig into the mathematics behind softmax and cross-entropy.