
Artificial Intelligence A-Z™ 2023: Build an AI with

ChatGPT4

Author: Arnab Dana, Master of Technology, National Institute of Technology, Uttarakhand

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING with Specialization in Artificial Intelligence & Machine Learning

Syllabus

Resources

Artificial Intelligence A-Z™ 2023: Build an AI with ChatGPT4 (16h 9m total length): https://www.udemy.com/course/artificial-intelligence-az/

AI for Everyone by Andrew Ng on Coursera: https://www.coursera.org/learn/ai-for-everyone/home/week/1

5 Best Courses: https://www.alphaa.ai/cds-resources/5-best-courses-to-learn-artificial-intelligence-machine-learning-in-2023-beginner-to-advanced

https://www.superdatascience.com/pages/artificial-intelligence

https://inst.eecs.berkeley.edu/~cs188/sp11/projects/reinforcement/reinforcement.html

http://ai.berkeley.edu/lecture_slides.html

http://ai.berkeley.edu/more_courses_other_schools.html

PPT: https://www.superdatascience.com/blogs/artificial-neural-networks-how-do-neural-networks-learn

Image to text: https://www.imagetotext.io/

Text to speech: https://cloud.google.com/text-to-speech?hl=en

Content

1. The Bellman Equation
2. Markov decision process (MDP)
3. Stochastic search
4. Q-Learning
5. Deep Learning
6. Deep Q-Learning


1. The Bellman Equation


https://medium.com/analytics-vidhya/bellman-equation-and-dynamic-programming-773ce67fc6a7
https://www.geeksforgeeks.org/bellman-equation/

s : State
a : Action
R : Reward
γ : Discount factor

V(s') : value of being in the next state
V(s) : value of being in a certain state
R(s, a) : reward we get after taking action a in state s
P(s, a, s') : probability of ending in state s' from s by taking action a



Dynamic Programming

Dynamic programming (DP) is a technique for solving complex problems. Instead of tackling a complex problem in one go, DP breaks it into simple subproblems, computes and stores the solution to each subproblem, and reuses the stored solution whenever the same subproblem occurs again rather than recomputing it.

We solve a Bellman equation using two powerful algorithms:

Value Iteration

Policy Iteration
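As a concrete illustration of value iteration, here is a minimal sketch for a tiny made-up MDP; the states, transition probabilities, rewards, and discount factor are placeholder values chosen for the example, not taken from the course.

```python
# Minimal value iteration sketch (hypothetical 3-state MDP, made-up numbers).
# Update rule: V(s) <- max_a [ sum_s' P(s, a, s') * (R + gamma * V(s')) ]

gamma = 0.9  # discount factor

# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "s0": {"left": [(1.0, "s0", 0.0)], "right": [(0.8, "s1", 0.0), (0.2, "s0", 0.0)]},
    "s1": {"left": [(1.0, "s0", 0.0)], "right": [(1.0, "s2", 1.0)]},
    "s2": {},  # terminal (goal) state
}

V = {s: 0.0 for s in transitions}

for _ in range(100):                      # sweep until the values stop changing
    delta = 0.0
    for s, actions in transitions.items():
        if not actions:                   # terminal state keeps its value
            continue
        best = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-6:
        break

print(V)  # roughly {'s0': 0.88, 's1': 1.0, 's2': 0.0}
```

Policy iteration alternates between evaluating the current policy and greedily improving it; the evaluation step applies essentially the same Bellman backup as above, just without the max over actions, for a fixed policy.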

Example
Without Bellman Equation

https://www.geeksforgeeks.org/bellman-equation/



What happens without Bellman Equation?

Initially, we give the agent some time to explore the environment and figure out a path to the goal. As soon as it reaches the goal, it backtracks its steps to the starting position and marks the value of every state that eventually leads to the goal as V = 1.

The agent faces no problem until we change its starting position: because all of the marked states have the same value of 1, from a new start it has no way to tell which direction leads towards the trophy state.

So, to solve this problem, we use the Bellman Equation:

V(s) = max_a ( R(s, a) + γ V(s') )

2. Markov decision process (MDP)


MDP = Bellman Equation + stochasticity (probabilistic, non-deterministic transitions)

Deterministic vs non-deterministic algorithms:

Definition: A deterministic algorithm is one whose behavior is completely determined by its inputs and the sequence of its instructions. A non-deterministic algorithm is one in which the outcome cannot be predicted with certainty, even if the inputs are known.

Complexity: A deterministic algorithm can solve the problem in polynomial time; a non-deterministic algorithm cannot, in general, solve the problem in polynomial time.

Typical uses: Deterministic algorithms are commonly used in applications where precision is critical, such as cryptography, numerical analysis, and computer graphics. Non-deterministic algorithms are often used in applications where finding an exact solution is difficult or impractical, such as artificial intelligence, machine learning, and optimization problems.

Examples: Deterministic algorithms include sorting algorithms like bubble sort, insertion sort, and selection sort, searches like linear search and binary search, and many numerical algorithms. Non-deterministic examples include probabilistic methods like Monte Carlo methods, genetic algorithms, simulated annealing, and problems like the 0/1 knapsack problem.

Markov property

A stochastic process has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present values) depends only upon the present state; that is, given the present, the future does not depend on the past. A process with this property is said to be Markov or Markovian and is known as a Markov process.

Markov decision process


https://www.geeksforgeeks.org/markov-decision-process/
https://en.wikipedia.org/wiki/Markov_decision_process

In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming.

P : transition probability
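With the transition probability written explicitly, the Bellman equation for an MDP takes its standard textbook form (stated here for reference, using the notation defined above rather than any particular course slide):

V(s) = max_a ( R(s, a) + γ Σ_s' P(s, a, s') V(s') )

The only change from the deterministic version is that the value of the next state is replaced by an expectation over all possible next states s', weighted by P(s, a, s').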



Adding a "Living Penalty”



3. Stochastic search
Stochastic search is a class of optimization and search algorithms that use randomness or probabilistic techniques to explore
and find solutions in a search space. In stochastic search, the search process is not entirely deterministic; instead, it involves an
element of randomness, which can help in finding solutions in complex or high-dimensional spaces.
Stochastic search methods are often used when:

1. Deterministic methods are impractical: In some optimization problems, the search space is so vast or complex that
deterministic algorithms like gradient descent may not be effective.

2. There is uncertainty: Stochastic search methods can be used when there is uncertainty or noise in the objective function,
making it difficult to find a single optimal solution.

3. Escaping local optima: Randomness in the search process can help algorithms escape local optima and explore a wider
range of the search space.

Common stochastic search methods include:

1. Random Search: This method involves randomly sampling points in the search space and evaluating their objective function
values. It doesn't rely on any gradient information and can be effective for high-dimensional problems.

2. Simulated Annealing: Simulated annealing is inspired by the annealing process in metallurgy. It uses a temperature
parameter to control the level of randomness in the search. As the algorithm progresses, the temperature decreases,
reducing the randomness and converging toward an optimal solution.

3. Genetic Algorithms: Genetic algorithms are inspired by the process of natural selection and genetics. They maintain a
population of potential solutions and use operators like mutation and crossover to generate new solutions. Over time, these
algorithms evolve better solutions.

4. Particle Swarm Optimization (PSO): PSO is inspired by the social behavior of birds or fish. It involves a population of
particles moving through the search space, adjusting their positions based on their individual and social experiences to find
the optimal solution.

5. Ant Colony Optimization: This method is inspired by the foraging behavior of ants. It models the search process as the
movement of artificial ants on a graph, with pheromone trails influencing their choices to find optimal paths.

6. Markov Chain Monte Carlo (MCMC): MCMC methods are often used for sampling from complex probability distributions.
They use a Markov chain to explore the distribution and converge to a representative sample.

Stochastic search methods can be effective in various domains, including machine learning, operations research, and
engineering, particularly when dealing with complex, noisy, or high-dimensional optimization problems. They provide a way to
balance exploration and exploitation, making them useful in both global and local optimization tasks.
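As a minimal illustration of the simplest method above (pure random search), here is a short sketch that minimizes a toy objective function; the objective, bounds, and sample budget are made-up placeholders, not part of the course material.

```python
import random

# Pure random search: sample candidate points uniformly and keep the best one.
# The objective function and search bounds below are made up for illustration.

def objective(x, y):
    return (x - 1.0) ** 2 + (y + 2.0) ** 2   # known minimum at (1, -2)

best_point, best_value = None, float("inf")

for _ in range(10_000):                      # fixed sampling budget
    x = random.uniform(-5.0, 5.0)
    y = random.uniform(-5.0, 5.0)
    value = objective(x, y)
    if value < best_value:                   # keep the best point seen so far
        best_point, best_value = (x, y), value

print(best_point, best_value)                # approaches (1, -2) with value near 0
```

Simulated annealing, genetic algorithms, and the other methods listed above replace the uniform sampling step with more structured randomness (temperature schedules, mutation and crossover, pheromone updates, and so on).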

4. Q-Learning



Q-Learning Visualization small project

http://ai.berkeley.edu/reinforcement.html
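For a quick reminder of the idea the Berkeley project exercises, here is a minimal tabular Q-learning sketch. The environment interface (env.reset() / env.step()), the action set, and the hyperparameters are assumptions made for illustration; they are not the Berkeley project's actual API.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch.
# Update rule: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
# `env` is a hypothetical environment with reset()/step(), similar to the Gym API.

alpha, gamma, epsilon = 0.1, 0.9, 0.1          # learning rate, discount, exploration rate
actions = ["up", "down", "left", "right"]      # assumed discrete action set
Q = defaultdict(float)                         # Q[(state, action)] -> estimated value

def choose_action(state):
    if random.random() < epsilon:              # explore with probability epsilon
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])   # otherwise exploit

def train(env, episodes=500):
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = choose_action(state)
            next_state, reward, done = env.step(action)
            target = reward + gamma * max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
```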
5. Deep Learning
https://www.linkedin.com/pulse/industry-use-cases-neural-networks-sanskriti-shevgaonkar/
https://www.ibm.com/topics/neural-networks
https://aws.amazon.com/what-is/neural-network/
https://www.superdatascience.com/blogs/artificial-neural-networks-how-do-neural-networks-learn

Introduction
Prominent Researchers
The Neuron
Multivariate Linear Regression and Linear Regression.
The Activation/ Threshold Function

Cost Function or loss function or objective function


How do Neural Networks Learn?



Gradient Descent

Stochastic Gradient Descent


Backpropagation

Introduction

Deep learning is a subfield of machine learning, which is itself a subfield of artificial intelligence (AI). It focuses on algorithms inspired by the structure and function of the human brain, known as artificial neural networks. These networks are composed of multiple layers of interconnected nodes (neurons) that process information in a hierarchical, layered fashion.
Key characteristics of deep learning include:

1. Neural Networks: Deep learning models are typically built using deep neural networks. These networks can have many
layers, hence the term "deep." Common architectures include feedforward neural networks, convolutional neural networks
(CNNs) for image analysis, and recurrent neural networks (RNNs) for sequential data.

2. Feature Learning: Deep learning algorithms can automatically learn features from raw data. This is in contrast to traditional
machine learning, where feature engineering is often required to extract relevant information from the data.

3. Training with Big Data: Deep learning models often require a large amount of data for training, which can be a challenge.
However, with sufficient data, deep learning models can generalize well and make accurate predictions.

4. Backpropagation: Deep learning models are trained using the backpropagation algorithm, which involves adjusting the
model's weights and biases to minimize the difference between predicted and actual outcomes. This process is typically
guided by a loss or cost function.

5. Activation Functions: Neural networks use activation functions to introduce non-linearity into the model. Common
activation functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh.

6. Applications: Deep learning has achieved remarkable success in various applications, including image and speech
recognition, natural language processing, autonomous vehicles, recommendation systems, and healthcare, among others.

7. Deep Learning Frameworks: Several open-source libraries and frameworks, such as TensorFlow, PyTorch, and Keras,
have made it easier for researchers and developers to build and train deep learning models.

8. Challenges: Deep learning models can be computationally expensive to train, and they may require substantial hardware
resources, like powerful GPUs or TPUs. They can also be susceptible to overfitting when not enough data is available, and
understanding the inner workings of deep networks can be challenging.

Deep learning has revolutionized the field of AI by enabling the development of complex, high-performance models that excel in
various tasks. It has been a driving force behind the recent advancements in AI and has been applied in a wide range of
industries and domains.

Several prominent researchers made significant contributions to the development of deep learning techniques and neural networks. Some of the key figures often associated with the early development of deep learning include:
Geoffrey Hinton : Geoffrey Hinton is often referred to as one of the pioneers of deep learning. He made fundamental
contributions to the development of neural networks and was instrumental in popularizing the concept of deep neural networks.
His work on backpropagation and Boltzmann machines laid the foundation for modern deep learning techniques. Hinton, along
with his colleagues, significantly advanced the field and played a crucial role in its resurgence.



Yann LeCun : Yann LeCun is known for his work in convolutional neural networks (CNNs) and the development of the
backpropagation algorithm for training neural networks. His contributions to the field of deep learning have been influential,
particularly in the domain of computer vision.

Yoshua Bengio : Yoshua Bengio is another key figure in the deep learning community. He has made important contributions to
deep neural networks, including the development of deep learning architectures and algorithms. Bengio's work on deep
learning's theoretical foundations and applications has had a lasting impact.

The Neuron

https://en.wikipedia.org/wiki/Neuron

A neuron, also known as a nerve cell, is the fundamental building block of the nervous system in living organisms, including humans. Neurons are specialized cells that transmit information in the form of electrical and chemical signals within the body. They are responsible for processing and transmitting information, allowing the nervous system to perform a wide range of functions, including sensory perception, motor control, and cognitive processes.



Multivariate Linear Regression and Linear Regression.

1. Linear Regression (Simple Linear Regression):

Single Variable: In simple linear regression, you have a single independent variable (predictor variable) and a single
dependent variable (response variable).

Equation: The relationship between the independent variable (X) and the dependent variable (Y) is represented as a
straight line equation: Y = aX + b. Here, "a" is the slope of the line, and "b" is the intercept.

Objective: The objective is to find the best-fitting line that minimizes the sum of squared differences between the
predicted values (Y) and the actual values.

Example: Predicting a person's weight (Y) based on their height (X) where there is only one predictor variable.

2. Multivariate Linear Regression:

Multiple Variables: In multivariate linear regression, you have more than one independent variable (predictor variables)
and a single dependent variable (response variable).

Equation: The relationship between multiple independent variables (X1, X2, X3, etc.) and the dependent variable (Y) is represented as: Y = a1*X1 + a2*X2 + a3*X3 + ... + b. Here, "a1," "a2," "a3," and so on are the coefficients for each independent variable, and "b" is the intercept.

Objective: The objective is to find the best-fitting linear equation that minimizes the sum of squared differences
between the predicted values (Y) and the actual values while accounting for multiple predictor variables.

Example: Predicting a house's price (Y) based on multiple features, such as square footage (X1), number of bedrooms
(X2), and neighborhood quality (X3), where each of these features is a predictor variable.
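A minimal sketch of fitting both models with numpy's least-squares solver; all of the data values below (heights, weights, house features, prices) are made up purely for illustration.

```python
import numpy as np

# Least-squares fits for simple and multivariate linear regression.
# All data values are made-up illustrative numbers.

# Simple linear regression: weight (kg) vs height (cm), Y = a*X + b
heights = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weights = np.array([50.0, 58.0, 65.0, 72.0, 80.0])
X = np.column_stack([heights, np.ones_like(heights)])     # add an intercept column
(a, b), *_ = np.linalg.lstsq(X, weights, rcond=None)
print(f"weight ~= {a:.2f} * height + {b:.2f}")

# Multivariate linear regression: price vs (square footage, bedrooms, quality score)
features = np.array([
    [1000.0, 2.0, 6.0],
    [1500.0, 3.0, 7.0],
    [2000.0, 3.0, 8.0],
    [2500.0, 4.0, 9.0],
])
prices = np.array([200_000.0, 280_000.0, 360_000.0, 450_000.0])
X = np.column_stack([features, np.ones(len(features))])
coeffs, *_ = np.linalg.lstsq(X, prices, rcond=None)
print("coefficients (a1, a2, a3, b):", coeffs)
```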



The Activation/ Threshold Function


How do Neural Networks Work?


Neural networks, often referred to as artificial neural networks, work by simulating the functioning of biological neurons to
process information and make predictions or decisions. These networks are a fundamental component of deep learning and are
used in a wide range of applications, including image recognition, natural language processing, and more. Here's how neural
networks work at a high level:

1. Neuron Model:

Neural networks are composed of artificial neurons, also called nodes or units. These neurons are organized into layers.

Each neuron receives input from one or more other neurons, processes the information, and produces an output.

2. Layers:



Neural networks are typically organized into three types of layers:

Input Layer: This layer receives the initial data or features and passes them to the next layer.

Hidden Layers: These intermediate layers, which can be one or more, perform complex computations on the input
data. They are responsible for feature extraction and representation learning.

Output Layer: The final layer produces the network's output, which can be a prediction, classification, or decision.

3. Connections and Weights:

Neurons are connected to neurons in adjacent layers through weighted connections.

Each connection has a weight associated with it, which determines the strength of the connection. These weights are
adjusted during training to make the network learn from data.

4. Activation Function:

Neurons apply an activation function to the weighted sum of their inputs, which introduces non-linearity into the network.
Common activation functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh.

5. Forward Propagation:

Information is passed through the network from the input layer to the output layer in a process called forward
propagation.

Neurons in each layer perform a weighted sum of their inputs, apply the activation function, and pass the result to the
next layer.

6. Loss Function:

The network's output is compared to the actual target values, and a loss (or cost) function quantifies the error or
difference between the predicted and actual values.

7. Backpropagation:

Backpropagation is the process of adjusting the weights in the network to minimize the loss.

The gradients of the loss with respect to the weights are computed, and the weights are updated using optimization
algorithms (e.g., gradient descent).

8. Training:

The network goes through multiple iterations of forward and backward passes to learn the best set of weights that
minimizes the loss function.

This training process continues until the model's performance converges to a satisfactory level.

9. Inference:

Once the network is trained, it can be used for inference, where it takes new, unseen data as input and produces
predictions or decisions.
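To make steps 3 to 5 above concrete, here is a tiny forward-propagation sketch through one hidden layer; the layer sizes, random weights, and input values are arbitrary choices for illustration.

```python
import numpy as np

# Tiny forward pass: 3 inputs -> 4 hidden units (ReLU) -> 1 output (sigmoid).
# Layer sizes, weights, and the input are arbitrary illustrative values.

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 3.0])            # one input example with 3 features

W1 = rng.normal(scale=0.5, size=(4, 3))   # hidden-layer weights
b1 = np.zeros(4)                          # hidden-layer biases
W2 = rng.normal(scale=0.5, size=(1, 4))   # output-layer weights
b2 = np.zeros(1)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = relu(W1 @ x + b1)                     # weighted sum + activation (hidden layer)
y_hat = sigmoid(W2 @ h + b2)              # weighted sum + activation (output layer)

print(y_hat)                              # predicted probability for this input
```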

Neural networks vary in complexity: a simple feedforward neural network may consist of just a single layer, while complex architectures like convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have multiple layers and specialized connections for tasks such as image recognition and sequence processing.



The success of neural networks lies in their ability to learn complex patterns and representations from data, making them a
powerful tool for various machine learning and artificial intelligence applications.
Cost Function or loss function or objective function
https://www.javatpoint.com/cost-function-in-machine-learning
A cost function, also known as a loss function or objective function, is a critical component in many machine learning and
optimization algorithms, including neural networks. It is a mathematical function that quantifies the difference between the
predicted values generated by a model and the true values (or target values) for a given set of data points. The cost function is
used to measure how well the model is performing in terms of its predictions, and the goal during training is to minimize this cost
function. The choice of a specific cost function depends on the nature of the problem (regression, classification, etc.).
Here are some common types of cost functions:

1. Mean Squared Error (MSE):

MSE is used for regression problems. It calculates the average of the squared differences between the predicted values
and the true values.

The formula for MSE is: MSE = (1/n) * Σ(y_true - y_predicted)^2, where "n" is the number of data points, "y_true" is the
true value, and "y_predicted" is the predicted value.

2. Cross-Entropy (Log Loss):

Cross-entropy is used for binary and multiclass classification problems. It quantifies the dissimilarity between predicted
class probabilities and true class labels.

Binary Cross-Entropy: For binary classification, the formula is: BCE = -[y_true * log(y_predicted) + (1 - y_true) * log(1 -
y_predicted)].

Categorical Cross-Entropy: For multiclass classification, the formula is: CCE = -Σ(y_true_i * log(y_predicted_i)), where
"i" ranges over all classes.

3. Hinge Loss (SVM Loss):

Hinge loss is often used in support vector machines (SVMs) and is employed in binary classification tasks. It
encourages correct classification and penalizes incorrect predictions.

The formula is: Hinge Loss = max(0, 1 - (y_true * y_predicted)), where "y_true" is the true class label (either +1 or -1),
and "y_predicted" is the predicted value.

4. Huber Loss:

Huber loss is a robust regression loss that is less sensitive to outliers compared to MSE.

It combines quadratic and linear loss and is often used in regression problems where the data may contain outliers.

The formula involves a threshold "δ" and is a piecewise function that transitions from quadratic loss to linear loss as the
error grows.
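Here is a short sketch of the first two losses above (MSE and binary cross-entropy); the example arrays are made-up values used only to show the formulas in action.

```python
import numpy as np

# Two common cost functions evaluated on made-up example values.

def mse(y_true, y_pred):
    # Mean squared error: average of the squared differences
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions away from 0 and 1 so the logarithms stay finite
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Regression example
print(mse(np.array([3.0, 5.0, 7.0]), np.array([2.5, 5.5, 8.0])))   # 0.5

# Binary classification example (true labels vs predicted probabilities)
print(binary_cross_entropy(np.array([1.0, 0.0, 1.0]),
                           np.array([0.9, 0.2, 0.7])))             # about 0.23
```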



The choice of the cost function depends on the specific problem and the desired characteristics of the model. During the training
of machine learning models, the goal is to minimize the cost function by adjusting the model's parameters (such as weights and
biases in a neural network) through optimization techniques like gradient descent. Minimizing the cost function results in better
model performance and more accurate predictions on new, unseen data.

How do Neural Networks Learn?


https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
Neural networks learn through a process called training, during which they adjust their internal parameters (weights and biases)
to minimize a defined objective function, often referred to as a loss or cost function. The primary learning algorithm used in
training neural networks is called backpropagation, which is a form of supervised learning. Here's how neural networks learn:

1. Initialization:

Before training, the weights and biases of the neural network are typically initialized with small random values. These
initial parameters are the starting point for the learning process.

2. Forward Propagation:

During training, input data is fed into the neural network's input layer, and it propagates through the network layer by
layer, following the connections and weights.

Each neuron computes a weighted sum of its inputs, applies an activation function, and passes the result to the neurons
in the next layer. This process continues until the output layer is reached.

3. Loss Calculation:

The output of the neural network is compared to the true or target values (ground truth) associated with the input data.

A loss function (or cost function) is used to quantify the difference between the predicted values and the true values.
Common loss functions include mean squared error (MSE) for regression tasks and cross-entropy for classification
tasks.

4. Backpropagation:

The backpropagation algorithm is used to calculate the gradients of the loss with respect to the network's parameters
(weights and biases).

Gradients represent the direction and magnitude of the change needed to minimize the loss.

5. Weight Updates:

An optimization algorithm, such as gradient descent or one of its variants, is used to update the weights and biases of
the network.

The gradients guide the direction of weight updates, with the goal of reducing the loss. The learning rate is a
hyperparameter that controls the step size during these updates.

6. Iterative Process:

Steps 2 to 5 are repeated iteratively for a specified number of epochs or until the loss converges to a satisfactory level.

During each iteration, the network's weights are adjusted incrementally to improve its performance on the training data.

7. Generalization:

As the neural network continues to learn, it aims to capture underlying patterns and relationships in the training data.

The goal is not only to perform well on the training data but also to generalize its learned knowledge to make accurate
predictions on unseen or test data.

8. Validation and Testing:

The trained network is evaluated on a separate validation dataset to ensure it is not overfitting (i.e., memorizing the
training data) and can generalize to new examples.

Finally, the network's performance is assessed on a testing dataset to estimate its real-world predictive accuracy.

Neural networks learn by iteratively adjusting their internal parameters based on the gradients of the loss function, which guide
them toward minimizing prediction errors. The quality of the training data, the architecture of the network, the choice of loss
function, and the hyperparameters (e.g., learning rate) all play important roles in the learning process. The goal is to achieve a
well-trained network that can make accurate predictions on new, unseen data.
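To tie the steps above together, here is a tiny end-to-end training sketch for a single sigmoid neuron (logistic regression): initialization, forward pass, loss, gradient computation, and weight updates. The dataset (an OR-style toy problem) and the hyperparameters are made up for illustration.

```python
import numpy as np

# End-to-end training of one sigmoid neuron on a toy OR-style dataset.
# Data and hyperparameters are made-up illustrative values.

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])           # target labels (logical OR)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)            # 1. initialization: small random weights
b = 0.0
lr = 0.5                                     # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(2000):                    # 6. iterate over many epochs
    y_hat = sigmoid(X @ w + b)               # 2. forward propagation
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # 3. cross-entropy loss

    grad_z = (y_hat - y) / len(y)            # 4. backpropagation: dLoss/dz for sigmoid + BCE
    grad_w = X.T @ grad_z                    #    gradient w.r.t. the weights
    grad_b = grad_z.sum()                    #    gradient w.r.t. the bias

    w -= lr * grad_w                         # 5. weight update (gradient descent)
    b -= lr * grad_b

print(np.round(sigmoid(X @ w + b), 2))       # predictions approach [0, 1, 1, 1]
```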



Gradient Descent (How do the weights in a neural network adjust themselves while training the network?)
Read R 8 Gradient Descent for more information.
Gradient descent is an optimization algorithm used in machine learning and deep learning to minimize the cost function, also
known as the loss function. It's a fundamental technique for training models, including neural networks. The goal of gradient
descent is to find the set of model parameters (weights and biases) that results in the lowest possible value of the cost function.
Here's how gradient descent works:

1. Initialization: The algorithm begins by initializing the model's parameters with some initial values, often randomly. These
parameters are the values that the algorithm seeks to optimize to minimize the cost function.

2. Forward Propagation: The model processes a batch of training data using the current parameter values and calculates the
predicted values (output) for each data point.

3. Cost Calculation: The cost function is then applied to compare the predicted values with the true target values for the data
points in the batch. The cost function quantifies the difference between the predictions and the actual values.

4. Gradient Calculation: The key step in gradient descent is computing the gradient of the cost function with respect to each
parameter. The gradient represents the direction and magnitude of the steepest ascent of the cost function.

5. Parameter Update: The parameters are updated using the gradients. The goal is to move the parameters in the opposite
direction of the gradient to reduce the cost function. This update is performed iteratively for each parameter.

For each parameter, the update formula is:

New_Parameter = Old_Parameter - (learning_rate * Gradient)

Here, the learning rate is a hyperparameter that determines the step size for the parameter updates. It controls the trade-off between convergence speed and stability.

6. Iteration: Steps 2 to 5 are repeated for a fixed number of iterations (epochs) or until the cost function converges to a
minimum. During each iteration, the parameters are updated to gradually reduce the cost.

Gradient descent aims to find the set of parameters that minimizes the cost function, which represents the model's performance
on the training data. It's an iterative process where the algorithm gradually converges towards the optimal parameter values.
There are variations of gradient descent, such as stochastic gradient descent (SGD), mini-batch gradient descent, and others,
which adapt the way in which data is processed and parameters are updated. These variations are often used to make training
more efficient and to handle large datasets.
The choice of learning rate and the specific gradient descent variant can impact the algorithm's convergence and effectiveness,
so fine-tuning these hyperparameters is an important part of model training.
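A minimal sketch of the update rule on a one-parameter quadratic cost, just to make the loop above concrete; the cost function, starting value, and learning rate are arbitrary choices.

```python
# Gradient descent on a one-parameter quadratic cost.
# cost(w) = (w - 3)^2, so the gradient is 2 * (w - 3) and the minimum is at w = 3.

def gradient(w):
    return 2.0 * (w - 3.0)

w = 0.0                 # initial parameter value (arbitrary)
learning_rate = 0.1     # step size (arbitrary)

for step in range(100):
    w = w - learning_rate * gradient(w)   # New_Parameter = Old_Parameter - (lr * Gradient)

print(w)  # converges towards 3.0
```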



Stochastic Gradient Descent
https://iamtrask.github.io/2015/07/27/python-network-part2/
http://neuralnetworksanddeeplearning.com/about.html
In (batch) gradient descent there is a problem of getting stuck in local minima of the cost function. To overcome this issue, we use stochastic gradient descent, which updates the weights after each observation (or small mini-batch); the noise this introduces helps the search escape local minima and is much cheaper per update on large datasets.

Backpropagation
Neural networks and deep learning



Training the ANN with Stochastic Gradient Descent
STEP 1: Randomly initialize the weights to small numbers close to 0 (but not 0).
STEP 2: Input the first observation of your dataset in the input layer, each feature in one input node.
STEP 3: Forward-Propagation: from left to right, the neurons are activated in a way that the impact of each neuron's activation is limited by the weights. Propagate the activations until the predicted result y is obtained.
STEP 4: Compare the predicted result to the actual result. Measure the generated error.
STEP 5: Back-Propagation: from right to left, the error is back-propagated. Update the weights according to how much they are
responsible for the error. The learning rate decides by how much we update the weights.
STEP 6: Repeat Steps 1 to 5 and update the weights after each observation (Reinforcement Learning). Or: Repeat Steps 1 to 5
but update the weights only after a batch of observations (Batch Learning).
STEP 7: When the whole training set has passed through the ANN, that makes an epoch. Redo more epochs.
Perceptron
https://www.javatpoint.com/perceptron-in-machine-learning
https://datascientest.com/en/perceptron-definition-and-use-cases
https://www.w3schools.com/ai/ai_perceptrons.asp
A perceptron is a simple mathematical model of a biological neuron, and it forms the basis of artificial neural networks. It was
developed by Frank Rosenblatt in the late 1950s. The perceptron is the simplest form of a feedforward neural network, which
means that information flows in one direction, from the input layer to the output layer.

Here's how a perceptron works:

1. Input: The perceptron takes multiple binary or continuous input values, each of which is assigned a weight. These input
values could represent features of the data you're working with.

2. Weighted Sum: The inputs are multiplied by their respective weights, and the results are summed up. This is known as the
weighted sum. Mathematically, this can be represented as:
weighted_sum = (input_1 * weight_1) + (input_2 * weight_2) + ... + (input_n * weight_n)

3. Activation Function: The weighted sum is then passed through an activation function. The purpose of the activation
function is to introduce non-linearity into the model. In the case of a simple perceptron, the activation function is a step
function, which means that if the weighted sum is above a certain threshold, the perceptron "fires" and produces an output
of 1; otherwise, it produces an output of 0. Mathematically, this can be represented as:
output = 1 if weighted_sum >= threshold else 0
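A minimal sketch of this computation; the input values, weights, and threshold below are arbitrary example numbers.

```python
# Perceptron: weighted sum of inputs followed by a step (threshold) activation.
# Inputs, weights, and threshold are arbitrary example values.

def perceptron(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0   # step activation

inputs = [1.0, 0.0, 1.0]
weights = [0.6, 0.4, 0.3]
threshold = 0.8

print(perceptron(inputs, weights, threshold))  # weighted sum = 0.9 >= 0.8, so the output is 1
```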

The perceptron's output can be thought of as a decision made by the model based on the inputs and their associated weights. It
can be used to classify data into two categories (e.g., binary classification problems), and it's particularly suited for linearly
separable data. In cases where data is not linearly separable, more complex neural network architectures like multilayer
perceptrons (MLPs) are used.
While a single perceptron has limitations, when you combine multiple perceptrons in layers, you create more powerful neural
networks capable of learning complex patterns and making more sophisticated decisions. These networks are the foundation of
modern deep learning and artificial neural networks.



6. Deep Q-Learning
Deep Q-Learning Intuition

1. Learning, 2. Acting

Experience replay
Action Selection Policies

L : loss. We want to make this loss close to 0; to make it near 0, we backpropagate the loss and adjust the weights.
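For reference, the loss that deep Q-learning drives towards zero is usually written as the squared temporal-difference error between the target value and the network's current estimate (standard DQN form, stated here from general knowledge rather than copied from the course slides):

L = Σ [ ( r + γ max_a' Q(s', a') ) - Q(s, a) ]^2

Backpropagating this loss through the network adjusts the weights so that Q(s, a) moves closer to the target r + γ max_a' Q(s', a').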


