Artificial Intelligence A-Z™ 2023: Build an AI with ChatGPT4
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING with Specialization in Artificial Intelligence & Machine Learning
Syllabus
Resource

Artificial Intelligence A-Z™ 2023: Build an AI with ChatGPT4 (16h 9m total length)
https://www.udemy.com/course/artificial-intelligence-az/

AI for Everyone by Andrew Ng on Coursera
https://www.coursera.org/learn/ai-for-everyone/home/week/1

5 Best Courses to Learn Artificial Intelligence & Machine Learning in 2023 (Beginner to Advanced)
https://www.alphaa.ai/cds-resources/5-best-courses-to-learn-artificial-intelligence-machine-learning-in-2023-beginner-to-advanced

https://www.superdatascience.com/pages/artificial-intelligence
https://inst.eecs.berkeley.edu/~cs188/sp11/projects/reinforcement/reinforcement.html
http://ai.berkeley.edu/lecture_slides.html
http://ai.berkeley.edu/more_courses_other_schools.html

PPT: https://www.superdatascience.com/blogs/artificial-neural-networks-how-do-neural-networks-learn
Content

3. Stochastic search
4. Q-Learning
5. Deep Learning
6. Deep Q-Learning
s : state
a : action
R : reward
γ : discount factor
Dynamic programming (DP) is a technique for solving complex problems. In DP, instead of solving a complex problem all at once, we break it into simpler subproblems, compute the solution to each subproblem, and store it. If the same subproblem occurs again, we do not recompute it; instead, we reuse the already computed solution.
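As a minimal illustration of this idea (my own example, not from the course material), a memoized Fibonacci function stores each subproblem's result so it is computed only once:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # store each subproblem's solution after the first computation
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, reusing stored subproblem results."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # runs in linear time thanks to memoization
```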
Value Iteration
Policy Iteration
Example: Without the Bellman Equation
https://www.geeksforgeeks.org/bellman-equation/
Initially, we give our agent some time to explore the environment and figure out a path to the goal. As soon as it reaches its goal, it backtracks to its starting position and marks the value of every state that eventually leads to the goal as V = 1.

The agent faces no problem until we change its starting position: it will then be unable to find a path to the trophy state, since all the marked states have the same value of 1 and therefore give no indication of which direction leads to the goal.
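The Bellman equation resolves this by discounting the value of future states, so states closer to the goal receive higher values. In the notation above (and as given in the linked article):

$$V(s) = \max_a \left( R(s,a) + \gamma V(s') \right)$$

where s' is the state reached by taking action a in state s. Because γ < 1, values decay with distance from the goal, and the agent can follow increasing values from any starting position.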
Markov property
A stochastic process has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present values) depends only upon the present state; that is, given the present, the future does not depend on the past. A process with this property is said to be Markov or Markovian and is known as a Markov process.
In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical
framework for modeling decision making in situations where outcomes are partly random and partly under the control of a
decision maker. MDPs are useful for studying optimization problems solved via dynamic programming.
P : transition probability
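To connect the MDP pieces with the Value Iteration idea listed above, here is a minimal sketch (my own illustration on a made-up two-state MDP, not from the course) of repeatedly applying the Bellman update:

```python
# Value iteration on a tiny hypothetical MDP.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 1, 0.0)]},
}
gamma = 0.9              # discount factor
V = {s: 0.0 for s in P}  # initialize all state values to zero

for _ in range(100):     # repeat the Bellman update until values settle
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in P[s].values()
        )
        for s in P
    }

print(V)  # state 0 gains value because "go" can reach the reward
```

Policy Iteration, by contrast, alternates between evaluating a fixed policy and greedily improving it until the policy stops changing.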
Stochastic search methods are typically useful in situations such as the following:

1. Deterministic methods are impractical: In some optimization problems, the search space is so vast or complex that deterministic algorithms like gradient descent may not be effective.
2. There is uncertainty: Stochastic search methods can be used when there is uncertainty or noise in the objective function,
making it difficult to find a single optimal solution.
3. Escaping local optima: Randomness in the search process can help algorithms escape local optima and explore a wider
range of the search space.
Common stochastic search methods include:

1. Random Search: This method involves randomly sampling points in the search space and evaluating their objective function values. It doesn't rely on any gradient information and can be effective for high-dimensional problems.
2. Simulated Annealing: Simulated annealing is inspired by the annealing process in metallurgy. It uses a temperature
parameter to control the level of randomness in the search. As the algorithm progresses, the temperature decreases,
reducing the randomness and converging toward an optimal solution.
3. Genetic Algorithms: Genetic algorithms are inspired by the process of natural selection and genetics. They maintain a
population of potential solutions and use operators like mutation and crossover to generate new solutions. Over time, these
algorithms evolve better solutions.
4. Particle Swarm Optimization (PSO): PSO is inspired by the social behavior of birds or fish. It involves a population of
particles moving through the search space, adjusting their positions based on their individual and social experiences to find
the optimal solution.
5. Ant Colony Optimization: This method is inspired by the foraging behavior of ants. It models the search process as the
movement of artificial ants on a graph, with pheromone trails influencing their choices to find optimal paths.
6. Markov Chain Monte Carlo (MCMC): MCMC methods are often used for sampling from complex probability distributions.
They use a Markov chain to explore the distribution and converge to a representative sample.
Stochastic search methods can be effective in various domains, including machine learning, operations research, and
engineering, particularly when dealing with complex, noisy, or high-dimensional optimization problems. They provide a way to
balance exploration and exploitation, making them useful in both global and local optimization tasks.
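As one concrete illustration (my own sketch, not from the course), here is a minimal simulated annealing loop that minimizes a bumpy one-dimensional function, accepting occasional uphill moves to escape local optima:

```python
import math
import random

def objective(x: float) -> float:
    # A bumpy function with several local minima.
    return x * x + 10 * math.sin(x)

x = random.uniform(-10, 10)   # random starting point
best = x
temp = 10.0                   # temperature controls the level of randomness

for step in range(10_000):
    candidate = x + random.gauss(0, 1)           # random local move
    delta = objective(candidate) - objective(x)  # change in cost
    # Always accept improvements; accept worse moves with a probability
    # that shrinks as the temperature cools.
    if delta < 0 or random.random() < math.exp(-delta / temp):
        x = candidate
    if objective(x) < objective(best):
        best = x
    temp = max(1e-3, temp * 0.999)               # cooling schedule

print(f"best x = {best:.3f}, objective = {objective(best):.3f}")
```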
4. Q-Learning
http://ai.berkeley.edu/reinforcement.html
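The linked Berkeley project is a full implementation exercise. As a minimal standalone sketch (my own, with a made-up set of actions), the core tabular update rule Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′,a′) − Q(s,a)] with ε-greedy action selection looks like this:

```python
from collections import defaultdict
import random

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
actions = ["up", "down", "left", "right"]
Q = defaultdict(float)                   # Q[(state, action)] defaults to 0.0

def choose_action(state):
    """Epsilon-greedy action selection policy."""
    if random.random() < epsilon:
        return random.choice(actions)                 # explore
    return max(actions, key=lambda a: Q[(state, a)])  # exploit

def q_update(state, action, reward, next_state):
    """One Q-Learning update step."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```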
5. Deep Learning
https://www.linkedin.com/pulse/industry-use-cases-neural-networks-sanskriti-shevgaonkar/
https://www.ibm.com/topics/neural-networks
https://aws.amazon.com/what-is/neural-network/
https://www.superdatascience.com/blogs/artificial-neural-networks-how-do-neural-networks-learn
Introduction
Prominent Researchers
The Neuron
Linear Regression and Multivariate Linear Regression
The Activation/Threshold Function
Introduction
Deep learning is a subfield of machine learning, which is itself a subfield of artificial intelligence (AI). Deep learning focuses on algorithms inspired by the structure and function of the human brain, known as artificial neural networks. These networks are composed of multiple layers of interconnected nodes (neurons) that process information in a hierarchical, layered fashion.
Key characteristics of deep learning include:
1. Neural Networks: Deep learning models are typically built using deep neural networks. These networks can have many
layers, hence the term "deep." Common architectures include feedforward neural networks, convolutional neural networks
(CNNs) for image analysis, and recurrent neural networks (RNNs) for sequential data.
2. Feature Learning: Deep learning algorithms can automatically learn features from raw data. This is in contrast to traditional
machine learning, where feature engineering is often required to extract relevant information from the data.
3. Training with Big Data: Deep learning models often require a large amount of data for training, which can be a challenge.
However, with sufficient data, deep learning models can generalize well and make accurate predictions.
4. Backpropagation: Deep learning models are trained using the backpropagation algorithm, which involves adjusting the
model's weights and biases to minimize the difference between predicted and actual outcomes. This process is typically
guided by a loss or cost function.
5. Activation Functions: Neural networks use activation functions to introduce non-linearity into the model. Common
activation functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh.
6. Applications: Deep learning has achieved remarkable success in various applications, including image and speech
recognition, natural language processing, autonomous vehicles, recommendation systems, and healthcare, among others.
7. Deep Learning Frameworks: Several open-source libraries and frameworks, such as TensorFlow, PyTorch, and Keras,
have made it easier for researchers and developers to build and train deep learning models.
8. Challenges: Deep learning models can be computationally expensive to train, and they may require substantial hardware
resources, like powerful GPUs or TPUs. They can also be susceptible to overfitting when not enough data is available, and
understanding the inner workings of deep networks can be challenging.
Deep learning has revolutionized the field of AI by enabling the development of complex, high-performance models that excel in
various tasks. It has been a driving force behind the recent advancements in AI and has been applied in a wide range of
industries and domains.
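To make the frameworks point above concrete, here is a minimal sketch (my own, assuming PyTorch is installed) of defining a small feedforward network:

```python
import torch
import torch.nn as nn

# A small feedforward network: 4 input features -> 16 hidden units -> 3 classes.
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),        # non-linear activation, as described above
    nn.Linear(16, 3),
)

x = torch.randn(8, 4)    # a batch of 8 made-up input examples
logits = model(x)        # forward pass
print(logits.shape)      # torch.Size([8, 3])
```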
Several prominent researchers made significant contributions to the development of deep learning techniques and neural networks. Some of the key figures often associated with the early development of deep learning include:
Geoffrey Hinton : Geoffrey Hinton is often referred to as one of the pioneers of deep learning. He made fundamental
contributions to the development of neural networks and was instrumental in popularizing the concept of deep neural networks.
His work on backpropagation and Boltzmann machines laid the foundation for modern deep learning techniques. Hinton, along
with his colleagues, significantly advanced the field and played a crucial role in its resurgence.
Yoshua Bengio : Yoshua Bengio is another key figure in the deep learning community. He has made important contributions to
deep neural networks, including the development of deep learning architectures and algorithms. Bengio's work on deep
learning's theoretical foundations and applications has had a lasting impact.
The Neuron
https://en.wikipedia.org/wiki/Neuron
Single Variable: In simple linear regression, you have a single independent variable (predictor variable) and a single
dependent variable (response variable).
Equation: The relationship between the independent variable (X) and the dependent variable (Y) is represented as a
straight line equation: Y = aX + b. Here, "a" is the slope of the line, and "b" is the intercept.
Objective: The objective is to find the best-fitting line that minimizes the sum of squared differences between the
predicted values (Y) and the actual values.
Example: Predicting a person's weight (Y) based on their height (X) where there is only one predictor variable.
Multiple Variables: In multivariate linear regression, you have more than one independent variable (predictor variables)
and a single dependent variable (response variable).
Equation: The relationship between multiple independent variables (X1, X2, X3, etc.) and the dependent variable (Y) is represented as: Y = a1*X1 + a2*X2 + a3*X3 + ... + b. Here, "a1," "a2," "a3," and so on are the coefficients for each independent variable, and "b" is the intercept.
Objective: The objective is to find the best-fitting linear equation that minimizes the sum of squared differences
between the predicted values (Y) and the actual values while accounting for multiple predictor variables.
Example: Predicting a house's price (Y) based on multiple features, such as square footage (X1), number of bedrooms
(X2), and neighborhood quality (X3), where each of these features is a predictor variable.
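As a quick worked illustration of fitting such a model (my own sketch with made-up numbers, using ordinary least squares in NumPy):

```python
import numpy as np

# Hypothetical data: predict house price from square footage, bedrooms, quality.
X = np.array([[1400, 3, 7],
              [2000, 4, 8],
              [1100, 2, 5],
              [2400, 4, 9],
              [1700, 3, 6]], dtype=float)
y = np.array([240_000, 340_000, 170_000, 410_000, 280_000], dtype=float)

# Append a column of ones so the intercept "b" is learned as a coefficient.
X1 = np.hstack([X, np.ones((X.shape[0], 1))])

# Least squares minimizes the sum of squared differences described above.
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
a1, a2, a3, b = coef
print(f"price = {a1:.1f}*sqft + {a2:.1f}*beds + {a3:.1f}*quality + {b:.1f}")
```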
1. Neuron Model:
Neural networks are composed of artificial neurons, also called nodes or units. These neurons are organized into layers.
Each neuron receives input from one or more other neurons, processes the information, and produces an output.
2. Layers:
Input Layer: This layer receives the initial data or features and passes them to the next layer.
Hidden Layers: These intermediate layers, which can be one or more, perform complex computations on the input
data. They are responsible for feature extraction and representation learning.
Output Layer: The final layer produces the network's output, which can be a prediction, classification, or decision.
3. Weights:
Each connection has a weight associated with it, which determines the strength of the connection. These weights are adjusted during training to make the network learn from data.
4. Activation Function:
Neurons apply an activation function to the weighted sum of their inputs, which introduces non-linearity into the network.
Common activation functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh.
5. Forward Propagation:
Information is passed through the network from the input layer to the output layer in a process called forward
propagation.
Neurons in each layer perform a weighted sum of their inputs, apply the activation function, and pass the result to the
next layer.
6. Loss Function:
The network's output is compared to the actual target values, and a loss (or cost) function quantifies the error or
difference between the predicted and actual values.
7. Backpropagation:
Backpropagation is the process of adjusting the weights in the network to minimize the loss.
The gradients of the loss with respect to the weights are computed, and the weights are updated using optimization
algorithms (e.g., gradient descent).
8. Training:
The network goes through multiple iterations of forward and backward passes to learn the best set of weights that
minimizes the loss function.
This training process continues until the model's performance converges to a satisfactory level.
9. Inference:
Once the network is trained, it can be used for inference, where it takes new, unseen data as input and produces
predictions or decisions.
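A minimal numeric sketch of steps 4 and 5 (my own illustration in NumPy, with made-up weights):

```python
import numpy as np

def relu(z):
    """ReLU activation: introduces non-linearity (step 4)."""
    return np.maximum(0, z)

# Hypothetical weights for a 3-input, 4-hidden-unit, 2-output network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

x = np.array([0.5, -1.2, 3.0])   # one input example

# Forward propagation (step 5): weighted sum, activation, pass to next layer.
h = relu(x @ W1 + b1)            # hidden layer
out = h @ W2 + b2                # output layer
print(out)
```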
1. Mean Squared Error (MSE):
MSE is used for regression problems. It calculates the average of the squared differences between the predicted values and the true values.
The formula for MSE is: MSE = (1/n) * Σ(y_true - y_predicted)^2, where "n" is the number of data points, "y_true" is the
true value, and "y_predicted" is the predicted value.
2. Cross-Entropy Loss:
Cross-entropy is used for binary and multiclass classification problems. It quantifies the dissimilarity between predicted class probabilities and true class labels.
Binary Cross-Entropy: For binary classification, the formula is: BCE = -[y_true * log(y_predicted) + (1 - y_true) * log(1 -
y_predicted)].
Categorical Cross-Entropy: For multiclass classification, the formula is: CCE = -Σ(y_true_i * log(y_predicted_i)), where
"i" ranges over all classes.
3. Hinge Loss:
Hinge loss is often used in support vector machines (SVMs) and is employed in binary classification tasks. It encourages correct classification and penalizes incorrect predictions.
The formula is: Hinge Loss = max(0, 1 - (y_true * y_predicted)), where "y_true" is the true class label (either +1 or -1),
and "y_predicted" is the predicted value.
4. Huber Loss:
Huber loss is a robust regression loss that is less sensitive to outliers compared to MSE.
It combines quadratic and linear loss and is often used in regression problems where the data may contain outliers.
The formula involves a threshold "δ" and is a piecewise function that transitions from quadratic loss to linear loss as the
error grows.
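These formulas translate directly into code; here is a minimal NumPy sketch (my own) of the first three loss functions:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """BCE for labels in {0, 1}; eps guards against log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def hinge(y_true, y_pred):
    """Hinge loss for labels in {-1, +1}."""
    return np.mean(np.maximum(0, 1 - y_true * y_pred))

y_t = np.array([1.0, 0.0, 1.0])
y_p = np.array([0.9, 0.2, 0.6])
print(mse(y_t, y_p), binary_cross_entropy(y_t, y_p))
```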
1. Initialization:
Before training, the weights and biases of the neural network are typically initialized with small random values. These
initial parameters are the starting point for the learning process.
2. Forward Propagation:
During training, input data is fed into the neural network's input layer, and it propagates through the network layer by
layer, following the connections and weights.
Each neuron computes a weighted sum of its inputs, applies an activation function, and passes the result to the neurons
in the next layer. This process continues until the output layer is reached.
3. Loss Calculation:
The output of the neural network is compared to the true or target values (ground truth) associated with the input data.
A loss function (or cost function) is used to quantify the difference between the predicted values and the true values.
Common loss functions include mean squared error (MSE) for regression tasks and cross-entropy for classification
tasks.
4. Backpropagation:
The backpropagation algorithm is used to calculate the gradients of the loss with respect to the network's parameters
(weights and biases).
Gradients represent the direction and magnitude of the change needed to minimize the loss.
5. Weight Updates:
An optimization algorithm, such as gradient descent or one of its variants, is used to update the weights and biases of
the network.
The gradients guide the direction of weight updates, with the goal of reducing the loss. The learning rate is a
hyperparameter that controls the step size during these updates.
6. Iterative Process:
Steps 2 to 5 are repeated iteratively for a specified number of epochs or until the loss converges to a satisfactory level.
During each iteration, the network's weights are adjusted incrementally to improve its performance on the training data.
7. Generalization:
As the neural network continues to learn, it aims to capture underlying patterns and relationships in the training data.
The goal is not only to perform well on the training data but also to generalize its learned knowledge to make accurate
predictions on unseen or test data.
8. Evaluation:
The trained network is evaluated on a separate validation dataset to ensure it is not overfitting (i.e., memorizing the training data) and can generalize to new examples.
Finally, the network's performance is assessed on a testing dataset to estimate its real-world predictive accuracy.
Neural networks learn by iteratively adjusting their internal parameters based on the gradients of the loss function, which guide
them toward minimizing prediction errors. The quality of the training data, the architecture of the network, the choice of loss
function, and the hyperparameters (e.g., learning rate) all play important roles in the learning process. The goal is to achieve a
well-trained network that can make accurate predictions on new, unseen data.
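Putting steps 1 through 6 together, a minimal end-to-end training loop (my own sketch, assuming PyTorch, with made-up data) looks like this:

```python
import torch
import torch.nn as nn

# Made-up regression data: y = 3x plus a little noise.
x = torch.randn(64, 1)
y = 3 * x + 0.1 * torch.randn(64, 1)

model = nn.Linear(1, 1)                                   # step 1: initialization
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(100):                                  # step 6: iterate
    pred = model(x)                                       # step 2: forward propagation
    loss = loss_fn(pred, y)                               # step 3: loss calculation
    optimizer.zero_grad()
    loss.backward()                                       # step 4: backpropagation
    optimizer.step()                                      # step 5: weight update

print(model.weight.item())   # should approach 3.0
```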
1. Initialization: The algorithm begins by initializing the model's parameters with some initial values, often randomly. These
parameters are the values that the algorithm seeks to optimize to minimize the cost function.
2. Forward Propagation: The model processes a batch of training data using the current parameter values and calculates the
predicted values (output) for each data point.
3. Cost Calculation: The cost function is then applied to compare the predicted values with the true target values for the data
points in the batch. The cost function quantifies the difference between the predictions and the actual values.
4. Gradient Calculation: The key step in gradient descent is computing the gradient of the cost function with respect to each
parameter. The gradient represents the direction and magnitude of the steepest ascent of the cost function.
5. Parameter Update: The parameters are updated using the gradients. The goal is to move the parameters in the opposite
direction of the gradient to reduce the cost function. This update is performed iteratively for each parameter.
6. Iteration: Steps 2 to 5 are repeated for a fixed number of iterations (epochs) or until the cost function converges to a
minimum. During each iteration, the parameters are updated to gradually reduce the cost.
Gradient descent aims to find the set of parameters that minimizes the cost function, which represents the model's performance
on the training data. It's an iterative process where the algorithm gradually converges towards the optimal parameter values.
There are variations of gradient descent, such as stochastic gradient descent (SGD), mini-batch gradient descent, and others,
which adapt the way in which data is processed and parameters are updated. These variations are often used to make training
more efficient and to handle large datasets.
The choice of learning rate and the specific gradient descent variant can impact the algorithm's convergence and effectiveness,
so fine-tuning these hyperparameters is an important part of model training.
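A bare-bones numeric version of these six steps (my own sketch, minimizing a single-parameter quadratic cost):

```python
# Minimize cost(w) = (w - 4)^2 with plain gradient descent.
def cost(w):
    return (w - 4) ** 2

def grad(w):
    return 2 * (w - 4)   # derivative of the cost with respect to the parameter

w = 0.0                  # step 1: initialization
lr = 0.1                 # learning rate: the step size of each update

for i in range(50):      # step 6: iterate
    g = grad(w)          # step 4: gradient calculation
    w -= lr * g          # step 5: move against the gradient
print(w)                 # converges toward the minimum at w = 4
```

With lr = 0.1, each update closes 20% of the remaining distance to the minimum, so after 50 steps the parameter is essentially at w = 4; too large a learning rate would overshoot and diverge.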
Backpropagation
Neural networks and deep learning
1. Input: The perceptron takes multiple binary or continuous input values, each of which is assigned a weight. These input
values could represent features of the data you're working with.
2. Weighted Sum: The inputs are multiplied by their respective weights, and the results are summed up. This is known as the
weighted sum. Mathematically, this can be represented as:
weighted_sum = (input_1 * weight_1) + (input_2 * weight_2) + ... + (input_n * weight_n)
3. Activation Function: The weighted sum is then passed through an activation function. The purpose of the activation
function is to introduce non-linearity into the model. In the case of a simple perceptron, the activation function is a step
function, which means that if the weighted sum is above a certain threshold, the perceptron "fires" and produces an output
of 1; otherwise, it produces an output of 0. Mathematically, this can be represented as:
output = 1 if weighted_sum >= threshold else 0
The perceptron's output can be thought of as a decision made by the model based on the inputs and their associated weights. It
can be used to classify data into two categories (e.g., binary classification problems), and it's particularly suited for linearly
separable data. In cases where data is not linearly separable, more complex neural network architectures like multilayer
perceptrons (MLPs) are used.
While a single perceptron has limitations, when you combine multiple perceptrons in layers, you create more powerful neural
networks capable of learning complex patterns and making more sophisticated decisions. These networks are the foundation of
modern deep learning and artificial neural networks.
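A runnable version of the perceptron computation described above (my own minimal sketch):

```python
def perceptron(inputs, weights, threshold):
    """Classic perceptron: weighted sum followed by a step activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Made-up example: two binary inputs with equal weights behave like an AND gate.
print(perceptron([1, 1], [0.6, 0.6], threshold=1.0))  # 1: both inputs active
print(perceptron([1, 0], [0.6, 0.6], threshold=1.0))  # 0: weighted sum below threshold
```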
6. Deep Q-Learning
Deep Q-Learning has two phases: 1. Learning, 2. Acting
Experience replay
Action Selection Policies
L : loss
We want to make this loss close to 0. To bring it near 0, we backpropagate the loss and adjust the weights.
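In Deep Q-Learning this loss is typically the squared temporal-difference error, L = (r + γ·max_a′ Q(s′,a′) − Q(s,a))², computed over random mini-batches drawn from the experience replay buffer. A minimal framework-free sketch of such a buffer (my own illustration):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past (state, action, reward, next_state) transitions so the
    network can train on uncorrelated random mini-batches of experience."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences fall off

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer()
buf.push("s0", "go", 1.0, "s1")   # one made-up transition
```

Sampling randomly instead of replaying consecutive steps breaks the correlation between successive experiences, which stabilizes training.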