Deep Learning
Various paradigms of learning problems, Perspectives and Issues in deep learning framework, review of fundamental
learning techniques
Supervised learning is a type of machine learning algorithm that learns from labeled data, i.e., data that has been tagged with a correct answer or classification.
Supervised learning, as the name indicates, involves a supervisor acting as a teacher: we teach or train the machine using well-labelled data, meaning each example is already tagged with the correct answer. The machine is then given a new set of examples (data) so that the algorithm, having analysed the training data (the set of training examples), can produce correct outputs for unseen data.
For example, a labeled dataset of images of Elephant, Camel, and Cow would have each image tagged with either “Elephant”, “Camel”, or “Cow”.
Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” and “no disease”.
1- Regression
Regression is a type of supervised learning that is used to predict continuous values, such as house prices, stock prices, or temperatures. Regression algorithms learn a function that maps from the input features to the output value. Common examples include linear regression and polynomial regression.
2- Classification
Classification is a type of supervised learning that is used to predict categorical values, such as whether a customer will
churn or not, whether an email is spam or not, or whether a medical image shows a tumor or not. Classification
algorithms learn a function that maps from the input features to a probability distribution over the output classes.
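To make the distinction concrete, here is a minimal sketch using scikit-learn (an assumption; the toy arrays are invented for illustration) that fits one regression model and one classification model on the same inputs:
Python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy labeled data: one input feature (invented for illustration)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Regression: continuous targets (e.g., prices)
y_reg = np.array([1.9, 4.1, 6.2, 7.8, 10.1])
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[6.0]]))        # a real-valued prediction

# Classification: categorical targets (e.g., spam vs. not spam)
y_clf = np.array([0, 0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_clf)
print(clf.predict([[6.0]]))        # a class label
print(clf.predict_proba([[6.0]]))  # a probability distribution over classes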
Evaluating supervised learning models is an important step in ensuring that the model is accurate and generalizable.
There are a number of different metrics that can be used to evaluate supervised learning models, but some of the most
common ones include:
For Regression
Mean Squared Error (MSE): MSE measures the average squared difference between the predicted values and
the actual values. Lower MSE values indicate better model performance.
Root Mean Squared Error (RMSE): RMSE is the square root of MSE, representing the standard deviation of the
prediction errors. Similar to MSE, lower RMSE values indicate better model performance.
Mean Absolute Error (MAE): MAE measures the average absolute difference between the predicted values and
the actual values. It is less sensitive to outliers compared to MSE or RMSE.
R-squared (Coefficient of Determination): R-squared measures the proportion of the variance in the target
variable that is explained by the model. Higher R-squared values indicate better model fit.
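The four regression metrics above can be computed directly; a short sketch, assuming scikit-learn and made-up prediction arrays:
Python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (invented)
y_pred = np.array([2.8, 5.4, 2.9, 6.6])   # model predictions (invented)

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                       # RMSE is the square root of MSE
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f} R^2={r2:.3f}")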
For Classification
Accuracy: Accuracy is the percentage of predictions that the model makes correctly. It is calculated by dividing
the number of correct predictions by the total number of predictions.
Precision: Precision is the percentage of positive predictions that the model makes that are actually correct. It is
calculated by dividing the number of true positives by the total number of positive predictions.
Recall: Recall is the percentage of all positive examples that the model correctly identifies. It is calculated by
dividing the number of true positives by the total number of positive examples.
F1 score: The F1 score combines precision and recall into a single number. It is calculated by taking the harmonic mean
of precision and recall.
Confusion matrix: A confusion matrix is a table that shows the number of predictions for each class, along with
the actual class labels. It can be used to visualize the performance of the model and identify areas where the
model is struggling.
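Similarly, the classification metrics above map one-to-one onto scikit-learn helpers; a minimal sketch with invented labels:
Python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (invented)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (invented)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # harmonic mean of P and R
print(confusion_matrix(y_true, y_pred))                # rows: actual, cols: predicted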
Applications of Supervised learning
Spam filtering: Supervised learning algorithms can be trained to identify and classify spam emails based on their
content, helping users avoid unwanted messages.
Image classification: Supervised learning can automatically classify images into different categories, such as
animals, objects, or scenes, facilitating tasks like image search, content moderation, and image-based product
recommendations.
Medical diagnosis: Supervised learning can assist in medical diagnosis by analyzing patient data, such as medical
images, test results, and patient history, to identify patterns that suggest specific diseases or conditions.
Fraud detection: Supervised learning models can analyze financial transactions and identify patterns that
indicate fraudulent activity, helping financial institutions prevent fraud and protect their customers.
Natural language processing (NLP): Supervised learning plays a crucial role in NLP tasks, including sentiment
analysis, machine translation, and text summarization, enabling machines to understand and process human
language effectively.
Supervised learning allows collecting data and producing outputs informed by previous experience.
Supervised machine learning helps to solve various types of real-world computation problems.
We have complete control over choosing the number of classes we want in the training data.
On the downside, training supervised models requires a lot of computation time.
Unsupervised learning is a type of machine learning that learns from unlabeled data. This means that the data does not
have any pre-existing labels or categories. The goal of unsupervised learning is to discover patterns and relationships in
the data without any explicit guidance.
Unsupervised learning is the training of a machine using information that is neither classified nor labeled and allowing
the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted
information according to similarities, patterns, and differences without any prior training of data.
Unlike supervised learning, no teacher is provided, which means no training labels are given to the machine. The machine is therefore left to find the hidden structure in unlabeled data by itself.
For example, you can use unsupervised learning to examine gathered animal data and distinguish several groups according to the traits and actions of the animals. These groupings might correspond to different animal species, allowing you to categorize the creatures without depending on pre-existing labels.
Key Points
Unsupervised learning allows the model to discover patterns and relationships in unlabeled data.
Clustering algorithms group similar data points together based on their inherent characteristics.
Feature extraction captures essential information from the data, enabling the model to make meaningful
distinctions.
Label association assigns categories to the clusters based on the extracted patterns and characteristics.
Example
Imagine you have a machine learning model trained on a large dataset of unlabeled images containing both dogs and cats. The model has never seen an image of a dog or a cat before and has no pre-existing labels or categories for these animals. Your task is to use unsupervised learning to identify the dogs and cats in a new, unseen image.
Because the machine has no idea about the features of dogs and cats, it cannot directly categorize the images as “dogs” and “cats”. But it can categorize them according to their similarities, patterns, and differences: the collection can easily be split into two parts, one containing all pictures with dogs and the other all pictures with cats, without any prior training data or examples.
It allows the model to work on its own to discover patterns and information that was previously undetected. It mainly
deals with unlabelled data.
Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as
grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules that describe large
portions of your data, such as people that buy X also tend to buy Y.
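The association idea can be sketched in plain Python by counting co-occurrences; the transactions below are invented, and "support" and "confidence" are the standard rule-quality measures:
Python
# Invented market-basket transactions, for illustration only
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

def rule_stats(antecedent, consequent, transactions):
    """Support and confidence of the rule: antecedent -> consequent."""
    n = len(transactions)
    has_a = [t for t in transactions if antecedent <= t]
    has_both = [t for t in has_a if consequent <= t]
    support = len(has_both) / n
    confidence = len(has_both) / len(has_a) if has_a else 0.0
    return support, confidence

s, c = rule_stats({"bread"}, {"milk"}, transactions)
print(f"bread -> milk: support={s:.2f}, confidence={c:.2f}")  # 0.60, 0.75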
Clustering
Clustering is a type of unsupervised learning that is used to group similar data points together. Clustering
algorithms work by iteratively moving data points closer to their cluster centers and further away from data points in
other clusters.
Clustering approaches include:
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Common clustering algorithms:
1. Hierarchical clustering
2. K-means clustering
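Both clustering types named above are available in scikit-learn; a brief sketch on invented 2-D points (real use would involve choosing the number of clusters carefully):
Python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

# Two loose blobs of invented 2-D points
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.1],
              [8.0, 8.0], [8.5, 7.9], [7.8, 8.2]])

# K-means: iteratively moves points toward k cluster centers
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("K-means labels:     ", kmeans.labels_)

# Hierarchical (agglomerative): merges the closest clusters bottom-up
agglo = AgglomerativeClustering(n_clusters=2).fit(X)
print("Hierarchical labels:", agglo.labels_)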
Anomaly detection: Unsupervised learning can identify unusual patterns or deviations from normal behaviour in
data, enabling the detection of fraud, intrusion, or system failures.
Scientific discovery: Unsupervised learning can uncover hidden relationships and patterns in scientific data,
leading to new hypotheses and insights in various scientific fields.
Recommendation systems: Unsupervised learning can identify patterns and similarities in user behaviour and
preferences to recommend products, movies, or music that align with their interests.
Customer segmentation: Unsupervised learning can identify groups of customers with similar characteristics,
allowing businesses to target marketing campaigns and improve customer service more effectively.
Image analysis: Unsupervised learning can group images based on their content, facilitating tasks such as image
classification, object detection, and image retrieval.
Unsupervised learning can help you gain insights from unlabeled data that you might not have been able to get
otherwise.
Unsupervised learning is good at finding patterns and relationships in data without being told what to look for.
This can help you learn new things about your data.
Difficult to measure accuracy or effectiveness due to lack of predefined answers during training.
The user needs to spend time interpreting and labeling the clusters that the algorithm produces.
Unsupervised learning can be sensitive to data quality, including missing values, outliers, and noisy data.
Without labeled data, it can be difficult to evaluate the performance of unsupervised learning models, making it
challenging to assess their effectiveness.
Training data: Supervised learning uses labeled training data to infer the model; unsupervised learning uses no training data.
Model: A supervised model can be tested against known answers; an unsupervised model cannot be tested this way.
Reinforcement learning
Reinforcement Learning (RL) is a branch of machine learning focused on making decisions to maximize cumulative
rewards in a given situation. Unlike supervised learning, which relies on a training dataset with predefined answers,
RL involves learning through experience. In RL, an agent learns to achieve a goal in an uncertain, potentially complex
environment by performing actions and receiving feedback through rewards or penalties.
RL operates on the principle of learning optimal behaviour through trial and error. The agent takes actions within the
environment, receives rewards or penalties, and adjusts its behaviour to maximize the cumulative reward. This
learning process is characterized by the following elements:
Policy: A strategy used by the agent to determine the next action based on the current state.
Reward Function: A function that provides a scalar feedback signal based on the state and action.
Value Function: A function that estimates the expected cumulative reward from a given state.
Model of the Environment: A representation of the environment that helps in planning by predicting future
states and rewards.
The problem is as follows: we have an agent and a reward, with many hurdles in between, and the agent is supposed to find the best possible path to reach the reward. A classic illustration involves a robot, a diamond, and fire: the goal of the robot is to get the reward (the diamond) while avoiding the hurdles (the fire). The robot learns by trying all the possible paths and then choosing the path that reaches the reward with the fewest hurdles. Each right step gives the robot a reward and each wrong step subtracts from it; the total reward is calculated when it reaches the final reward, the diamond.
Input: The input should be an initial state from which the model will start
Output: There are many possible outputs as there are a variety of solutions to a particular problem
Training: The training is based upon the input. The model returns a state, and the user decides to reward or punish the model based on its output.
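The trial-and-error loop described above can be sketched as tabular Q-learning on a tiny one-dimensional corridor. Everything here (corridor length, rewards, hyperparameters) is an invented toy, not the robot-and-diamond environment itself:
Python
import random

N = 5                 # corridor states 0..4; the "diamond" (goal) is state 4
actions = [-1, +1]    # step left or step right
Q = [[0.0, 0.0] for _ in range(N)]      # Q[state][action]
alpha, gamma, eps = 0.5, 0.9, 0.2       # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != N - 1:
        # epsilon-greedy: mostly exploit, sometimes explore
        a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
        s2 = min(max(s + actions[a], 0), N - 1)
        r = 1.0 if s2 == N - 1 else -0.01    # reward at the goal, small step cost
        # Q-learning update: pull Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The greedy policy should now step right in every state
print([Q[s].index(max(Q[s])) for s in range(N - 1)])   # expect [1, 1, 1, 1]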
Deep learning, a branch of artificial intelligence, uses neural networks to analyze and learn from large datasets. It powers advancements in image recognition, natural language processing, and autonomous systems. Despite its impressive capabilities, deep learning is not without its challenges: issues such as data quality, computational demands, and model interpretability are common obstacles. Understanding these challenges and finding ways to overcome them is crucial for successful implementation.
Artificial Neural Networks (ANNs) have revolutionized the field of machine learning, offering powerful tools for pattern
recognition, classification, and predictive modeling. Among the various types of neural networks, the Feedforward
Neural Network (FNN) is one of the most fundamental and widely used. In this article, we will explore the structure,
functioning, and applications of Feedforward Neural Networks, providing a comprehensive understanding of this
essential machine learning model.
A Feedforward Neural Network (FNN) is a type of artificial neural network where connections between the nodes do not
form cycles. This characteristic differentiates it from recurrent neural networks (RNNs). The network consists of an input
layer, one or more hidden layers, and an output layer. Information flows in one direction—from input to output—hence
the name "feedforward."
1. Input Layer: The input layer consists of neurons that receive the input data. Each neuron in the input layer
represents a feature of the input data.
2. Hidden Layers: One or more hidden layers are placed between the input and output layers. These layers are
responsible for learning the complex patterns in the data. Each neuron in a hidden layer applies a weighted sum
of inputs followed by a non-linear activation function.
3. Output Layer: The output layer provides the final output of the network. The number of neurons in this layer
corresponds to the number of classes in a classification problem or the number of outputs in a regression
problem.
Each connection between neurons in these layers has an associated weight that is adjusted during the training process
to minimize the error in predictions.
Activation Functions
Activation functions introduce non-linearity into the network, enabling it to learn and model complex data patterns.
Common activation functions include:
Sigmoid: σ(x) = 1 / (1 + e^(−x))
Tanh: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
Sigmoid Function
It is a function which is plotted as an ‘S’ shaped graph.
Nature : Non-linear. Notice that for x values between −2 and 2, the y values are very steep: small changes in x bring about large changes in the value of y.
Value Range : 0 to 1
Uses : Usually used in the output layer of a binary classifier, where the result is either 0 or 1. Since the value of the sigmoid function lies between 0 and 1 only, the result can easily be predicted to be 1 if the value is greater than 0.5 and 0 otherwise.
Tanh Function
The activation that almost always works better than the sigmoid function is the Tanh function, also known as the Tangent Hyperbolic function. It is actually a mathematically shifted version of the sigmoid function: the two are similar and can be derived from each other.
Equation :-
f(x) = tanh(x) = 2 / (1 + e^(−2x)) − 1
OR
tanh(x) = 2 · sigmoid(2x) − 1
Value Range :- -1 to +1
Nature :- non-linear
Uses :- Usually used in hidden layers of a neural network, as its values lie between −1 and 1, so the mean of the hidden-layer activations comes out to be 0 or very close to it. This helps center the data by bringing the mean close to 0, which makes learning for the next layer much easier.
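The claim that tanh is a shifted, rescaled sigmoid is easy to verify numerically; a quick NumPy check of tanh(x) = 2 · sigmoid(2x) − 1:
Python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
lhs = np.tanh(x)
rhs = 2 * sigmoid(2 * x) - 1   # the shifted/scaled sigmoid
print(np.allclose(lhs, rhs))   # True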
RELU Function
It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of neural networks.
Equation :- f(x) = max(0, x); it outputs x when x is positive and 0 otherwise.
Value Range :- [0, ∞)
Nature :- non-linear
Uses :- ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any given time only a few neurons are activated, making the network sparse and therefore efficient and easy to compute.
In simple words, ReLU learns much faster than the sigmoid and Tanh functions.
Softmax Function
The softmax function is also a type of sigmoid function but is handy when we are trying to handle multi-class classification problems.
Nature :- non-linear
Uses :- Usually used when handling multiple classes. The softmax function is commonly found in the output layer of image classification problems; it squeezes the output for each class between 0 and 1 and divides by the sum of the outputs.
Output :- The softmax function is ideally used in the output layer of the classifier, where we are actually trying to obtain the probabilities defining the class of each input.
The basic rule of thumb is: if you really don’t know which activation function to use, simply use ReLU, as it is a general-purpose activation function for hidden layers and is used in most cases these days.
If your output is for binary classification, the sigmoid function is a very natural choice for the output layer.
If your output is for multi-class classification, Softmax is very useful for predicting the probabilities of each class.
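For reference, all four activation functions discussed above fit in a few lines of NumPy (the max-subtraction in softmax is a standard numerical-stability trick):
Python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)         # passes positives, zeroes out negatives

def softmax(x):
    e = np.exp(x - np.max(x))         # subtract max for numerical stability
    return e / e.sum()                # outputs are positive and sum to 1

z = np.array([1.0, -2.0, 0.5])
print(sigmoid(z), tanh(z), relu(z), softmax(z), sep="\n")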
Training a Feedforward Neural Network involves adjusting the weights of the neurons to minimize the error between
the predicted output and the actual output. This process is typically performed using backpropagation and gradient
descent.
1. Forward Propagation: During forward propagation, the input data passes through the network, and the output
is calculated.
2. Loss Calculation: The loss (or error) is calculated using a loss function such as Mean Squared Error (MSE) for
regression tasks or Cross-Entropy Loss for classification tasks.
3. Backpropagation: In backpropagation, the error is propagated back through the network to update the weights.
The gradient of the loss function with respect to each weight is calculated, and the weights are adjusted using
gradient descent.
Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the loss function by iteratively updating the weights in
the direction of the negative gradient. Common variants of gradient descent include:
Batch Gradient Descent: Updates weights after computing the gradient over the entire dataset.
Stochastic Gradient Descent (SGD): Updates weights for each training example individually.
Mini-batch Gradient Descent: Updates weights after computing the gradient over a small batch of training
examples.
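The three variants differ only in how much data each update sees. A compact sketch on a one-parameter linear model (the data is invented; batch_size equal to the dataset size gives batch gradient descent, 1 gives SGD, and anything in between gives mini-batch):
Python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 100)
y = 3.0 * X + rng.normal(0, 0.1, 100)   # true slope is 3.0

def fit(batch_size, lr=0.1, epochs=50):
    w = 0.0                             # single weight, no bias, for brevity
    for _ in range(epochs):
        idx = rng.permutation(len(X))   # shuffle each epoch
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            grad = 2 * np.mean((w * X[b] - y[b]) * X[b])  # d(MSE)/dw on the batch
            w -= lr * grad                                # step down the gradient
    return w

print("batch     :", fit(batch_size=100))
print("SGD       :", fit(batch_size=1))
print("mini-batch:", fit(batch_size=16))   # all three converge near 3.0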
Accuracy: The proportion of correctly classified instances out of the total instances.
Precision: The ratio of true positive predictions to the total predicted positives.
Recall: The ratio of true positive predictions to the total actual positives.
F1 Score: The harmonic mean of precision and recall, providing a balance between the two.
Confusion Matrix: A table used to describe the performance of a classification model, showing the true positives, true negatives, false positives, and false negatives.
This code demonstrates the process of building, training, and evaluating a neural network model using TensorFlow and
Keras to classify handwritten digits from the MNIST dataset. Initially, the MNIST dataset is loaded and normalized by
scaling the pixel values to the range [0, 1]. The model architecture is defined using the Sequential API, consisting of a
Flatten layer to convert the 2D image input into a 1D array, followed by a Dense layer with 128 neurons and ReLU
activation, and a final Dense layer with 10 neurons and softmax activation to output probabilities for each digit class.
The model is compiled with the Adam optimizer, SparseCategoricalCrossentropy loss function, and
SparseCategoricalAccuracy metric. The model is then trained for 5 epochs on the training data. Finally, the model's
performance is evaluated on the test set, and the test accuracy is printed.
Python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Flatten the 28x28 images, then Dense(128, ReLU) -> Dense(10, softmax)
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

# Train for 5 epochs, then evaluate on the held-out test set
model.fit(x_train, y_train, epochs=5)
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
Let’s understand how errors are calculated and weights are updated in backpropagation networks (BPNs).
Consider a simple multi-layer feed-forward (backpropagation) network with three layers: an input layer with two neurons x1 and x2, a hidden layer with two neurons z1 and z2, and an output layer with one neuron y, whose net input is denoted yin.
Now let’s write down the weights and bias values for each neuron. (The network figure is not reproduced here; the initial values used below, recovered from the calculations that follow, are x1 = 0, x2 = 1, v01 = 0.3, v21 = −0.1, v02 = 0.5, v12 = −0.3, v22 = 0.4, w01 = −0.2, w11 = 0.4, w21 = 0.1, with learning rate α = 0.25 and sigmoid activations throughout. v11 does not affect any computed value since x1 = 0.)
Since it is the input layer, only the input values are present there.
Here v11 refers to the weight of the first input x1 on z1, v21 refers to the weight of the second input x2 on z1, and v01 refers to the bias value on z1.
Here v12 refers to the weight of the first input x1 on z2, v22 refers to the weight of the second input x2 on z2, and v02 refers to the bias value on z2.
Here w11 refers to the weight of the first hidden neuron z1 on yin, w21 refers to the weight of the second hidden neuron z2 on yin, and w01 refers to the bias value on yin. Let’s consider three index variables: ‘k’ for the neurons in the output layer, ‘j’ for the neurons in the hidden layer, and ‘i’ for the neurons in the input layer.
Therefore,
k = 1
Conditions/Constraints:
Target value, t = 1; learning rate, α = 0.25
The procedure involves three steps:
1. Forward propagation of the inputs to compute the output y.
2. Backpropagation of errors, i.e., between output and hidden layer, and between hidden and input layer.
3. Updating weights.
Step 1:
The value y is calculated by finding yin and applying the activation function:
yin = w01 + z1·w11 + z2·w21 eq. (3)
Here, z1 and z2 are the values from the hidden layer, calculated by finding zin1 and zin2 and applying the activation function to them:
zin1 = v01 + x1·v11 + x2·v21 eq. (4)
zin2 = v02 + x1·v12 + x2·v22 eq. (5)
From (4):
zin1 = 0.3 + 0·v11 + 1·(−0.1) = 0.2
z1 = f(zin1) = 0.5498
From (5):
zin2 = 0.5 + 0·(−0.3) + 1·0.4 = 0.9
z2 = f(zin2) = 0.7109
From (3):
yin = (−0.2) + 0.5498·0.4 + 0.7109·0.1 = 0.0910
y = f(yin) = 0.5227
Here, y is not equal to the target ‘t’, which is 1. And we proceed to calculate the errors and then update weights from
them in order to achieve the target value.
Step 2:
Error between the output and hidden layer is represented as δk, where k indexes the neurons in the output layer as mentioned above. The error is calculated as:
δk = (t − y) · f′(yin)
For the sigmoid activation, f′(yin) = y · (1 − y) = 0.5227 · (1 − 0.5227) = 0.2495
Therefore,
δk = (1 − 0.5227) · 0.2495 = 0.1191
Note: (Target − Output), i.e., (t − y), is the error in the output, not in the layer. The error in a layer is also shaped by other factors such as weights and biases.
Error between the hidden and input layer is represented as δj, where j indexes the neurons in the hidden layer as mentioned above. The error is calculated as:
δj = δinj · f′(zinj)
where
δinj = ∑k=1 to n (δk · wjk) eq. (9)
As j = 1, 2, we will have one error value for each hidden neuron, for a total of 2 error values.
δin1 = δk · w11 = 0.1191 · 0.4 = 0.04764
f′(zin1) = z1 · (1 − z1) = 0.2475
δ1 = δin1 · f′(zin1) = 0.04764 · 0.2475 = 0.0118
δin2 = δk · w21 = 0.1191 · 0.1 = 0.0119
f′(zin2) = z2 · (1 − z2) = 0.2055
δ2 = δin2 · f′(zin2) = 0.0119 · 0.2055 = 0.00245
Now that the errors have been calculated, the weights have to be updated using these error values.
Step 3:
w01(new) = w01(old) + Δw01 = w01(old) + α · δk · 1 = (−0.2) + 0.25 · 0.1191 · 1 = −0.1702. Kindly note that the 1 taken here is the input considered for the bias, as per the conditions.
v01(new) = v01(old) + Δv01 = 0.3 + α · δ1 · 1 = 0.3 + 0.25 · 0.0118 · 1 = 0.30295
v02(new) = v02(old) + Δv02 = 0.5 + α · δ2 · 1 = 0.5 + 0.25 · 0.00245 · 1 = 0.50061
The remaining weights are updated in the same way, e.g., w11(new) = w11(old) + α · δk · z1 and v21(new) = v21(old) + α · δ1 · x2.
These three steps are repeated until the output ‘y’ is equal to the target ‘t’.
This is how BPNs work. The “backpropagation” in BPN refers to the fact that the error at the present layer is used to update the weights between the present and previous layers by propagating the error values backwards.
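The whole worked example can be reproduced in a few lines of NumPy. The initial values are the ones recovered above from the calculations (v11 is set to 0.6 as a placeholder; it does not affect any number since x1 = 0):
Python
import numpy as np

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.0, 1.0]); t = 1.0; alpha = 0.25
V = np.array([[0.6, -0.3],           # v11, v12 (v11 is a placeholder)
              [-0.1, 0.4]])          # v21, v22
v0 = np.array([0.3, 0.5])            # hidden biases v01, v02
w = np.array([0.4, 0.1]); w0 = -0.2  # w11, w21 and output bias w01

# Step 1: forward pass
z = sig(v0 + x @ V)                  # [0.5498, 0.7109]
y = sig(w0 + z @ w)                  # 0.5227

# Step 2: backpropagate errors; sigmoid derivative is f(a) * (1 - f(a))
d_k = (t - y) * y * (1 - y)          # 0.1191
d_j = (d_k * w) * z * (1 - z)        # [0.0118, 0.0024]

# Step 3: update weights and biases
w = w + alpha * d_k * z;  w0 = w0 + alpha * d_k           # w01 -> -0.1702
V = V + alpha * np.outer(x, d_j);  v0 = v0 + alpha * d_j  # v01 -> 0.3029, v02 -> 0.5006

print(y, d_k, d_j, w0, v0)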
As you read this article, which organ in your body is thinking about it? It’s the brain of course! But do you know how the
brain works? Well, it has neurons or nerve cells that are the primary units of both the brain and the nervous system.
These neurons receive sensory input from the outside world which they process and then provide the output which
might act as the input to the next neuron.
Each of these neurons is connected to other neurons in complex arrangements at synapses. Now, are you wondering how this is related to Artificial Neural Networks? Let’s check out what they are in detail and how they learn information.
Well, Artificial Neural Networks are modeled after the neurons in the human brain.
Artificial Neural Networks
Artificial Neural Networks contain artificial neurons, which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network in a system. A layer can have a dozen units or millions of units, depending on how complex the network must be to learn the hidden patterns in the dataset. Commonly, an Artificial Neural Network has an input layer, an output layer, and hidden layers. The input layer receives data from the outside world which the neural network needs to analyze or learn about. This data then passes through one or multiple hidden layers that transform the input into data that is valuable for the output layer. Finally, the output layer provides an output in the form of the Artificial Neural Network’s response to the input data provided.
In the majority of neural networks, units are interconnected from one layer to another. Each of these connections has
weights that determine the influence of one unit on another unit. As the data transfers from one unit to another, the
neural network learns more and more about the data which eventually results in an output from the output layer.
The structures and operations of human neurons serve as the basis for artificial neural networks, also known as neural networks or neural nets. The input layer of an artificial neural network is the first layer; it receives input from external sources and passes it to the hidden layer, which is the second layer. In the hidden layer, each neuron receives input from the previous layer’s neurons, computes the weighted sum, and sends it to the neurons in the next layer. These connections are weighted, meaning the effect of each input from the previous layer is scaled up or down by a weight assigned to it; the weights are adjusted during the training process to improve model performance.
The concept of artificial neural networks comes from biological neurons found in animal brains, so the two share a lot of similarities in structure and function.
Structure : The structure of artificial neural networks is inspired by biological neurons. A biological neuron has a
cell body or soma to process the impulses, dendrites to receive them, and an axon that transfers them to other
neurons. The input nodes of artificial neural networks receive input signals, the hidden layer nodes compute
these input signals, and the output layer nodes compute the final output by processing the hidden layer’s
results using activation functions.
Biological Neuron: Artificial Neuron
Dendrites: Inputs
Cell nucleus or soma: Nodes
Synapses: Weights
Axon: Output
Synapses : Synapses are the links between biological neurons that enable the transmission of impulses from
dendrites to the cell body. Synapses are the weights that join the one-layer nodes to the next-layer nodes in
artificial neurons. The strength of the links is determined by the weight value.
Learning : In biological neurons, learning happens in the cell body nucleus, or soma, which processes the impulses. An action potential is produced and travels through the axon if the impulses are powerful enough to reach the threshold. This is made possible by synaptic plasticity, the ability of synapses to become stronger or weaker over time in reaction to changes in their activity. In artificial neural networks, backpropagation is the technique used for learning, which adjusts the weights between nodes according to the error, i.e., the difference between predicted and actual outcomes.
Activation : In biological neurons, activation is the firing rate of the neuron, which rises when the impulses are strong enough to reach the threshold. In artificial neural networks, a mathematical function known as an activation function maps the input to the output and performs the activation.
Artificial neural networks are trained using a training set. For example, suppose you want to teach an ANN to recognize
a cat. Then it is shown thousands of different images of cats so that the network can learn to identify a cat. Once the
neural network has been trained enough using images of cats, then you need to check if it can identify cat images
correctly. This is done by making the ANN classify the images it is provided by deciding whether they are cat images or
not. The output obtained by the ANN is corroborated by a human-provided description of whether the image is a cat
image or not. If the ANN identifies incorrectly then back-propagation is used to adjust whatever it has learned during
training. Backpropagation is done by fine-tuning the weights of the connections in ANN units based on the error rate
obtained. This process continues until the artificial neural network can correctly recognize a cat in an image with
minimal possible error rates.
Feedforward Neural Network : The feedforward neural network is one of the most basic artificial neural
networks. In this ANN, the data or the input provided travels in a single direction. It enters into the ANN through
the input layer and exits through the output layer while hidden layers may or may not exist. So the feedforward
neural network has a front-propagated wave only and usually does not have backpropagation.
Convolutional Neural Network : A Convolutional neural network has some similarities to the feed-forward
neural network, where the connections between units have weights that determine the influence of one unit on
another unit. But a CNN has one or more than one convolutional layer that uses a convolution operation on the
input and then passes the result obtained in the form of output to the next layer. CNN has applications in
speech and image processing which is particularly useful in computer vision.
Modular Neural Network: A Modular Neural Network contains a collection of different neural networks that
work independently towards obtaining the output with no interaction between them. Each of the different
neural networks performs a different sub-task by obtaining unique inputs compared to other networks. The
advantage of this modular neural network is that it breaks down a large and complex computational process
into smaller components, thus decreasing its complexity while still obtaining the required output.
Radial basis function Neural Network: Radial basis functions are functions that consider the distance of a point with respect to a center. RBF networks have two layers: in the first, the input is mapped onto all the radial basis functions in the hidden layer, and then the output layer computes the output in the next step. Radial basis function networks are normally used to model data that represents an underlying trend or function.
Recurrent Neural Network: The Recurrent Neural Network saves the output of a layer and feeds this output
back to the input to better predict the outcome of the layer. The first layer in the RNN is quite similar to the
feed-forward neural network and the recurrent neural network starts once the output of the first layer is
computed. After this layer, each unit will remember some information from the previous step so that it can act
as a memory cell in performing computations.
1. Social Media: Artificial Neural Networks are used heavily in Social Media. For example, let’s take the ‘People
you may know’ feature on Facebook that suggests people that you might know in real life so that you can send
them friend requests. Well, this magical effect is achieved by using Artificial Neural Networks that analyze your
profile, your interests, your current friends, and also their friends and various other factors to calculate the
people you might potentially know. Another common application of machine learning in social media is facial recognition. This is done by finding around 100 reference points on the person’s face and then matching them with those already available in the database using convolutional neural networks.
2. Marketing and Sales: When you log onto e-commerce sites like Amazon and Flipkart, they recommend products for you to buy based on your previous browsing history. Similarly, if you love pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and previous order history. This is true across all new-age marketing segments like book sites, movie services, hospitality sites, etc., and it is done by implementing personalized marketing. This uses Artificial Neural Networks to identify a customer’s likes, dislikes, previous shopping history, etc., and then tailor the marketing campaigns accordingly.
3. Healthcare : Artificial Neural Networks are used in Oncology to train algorithms that can identify cancerous
tissue at the microscopic level at the same accuracy as trained physicians. Various rare diseases may manifest in
physical characteristics and can be identified in their premature stages by using Facial Analysis on the patient
photos. So the full-scale implementation of Artificial Neural Networks in the healthcare environment can only
enhance the diagnostic abilities of medical experts and ultimately lead to the overall improvement in the quality
of medical care all over the world.
4. Personal Assistants: I am sure you have all heard of Siri, Alexa, Cortana, etc., and perhaps even used them on your own phone! These personal assistants are an example of speech recognition that uses Natural Language Processing to interact with the users and formulate a response accordingly. Natural Language Processing uses artificial neural networks that are built to handle many of these assistants’ tasks, such as managing the language syntax, semantics, correct speech, and the ongoing conversation.
An activation function in the context of neural networks is a mathematical function applied to the output of a neuron.
The purpose of an activation function is to introduce non-linearity into the model, allowing the network to learn and
represent complex patterns in the data. Without non-linearity, a neural network would essentially behave like a linear
regression model, regardless of the number of layers it has.
The activation function decides whether a neuron should be activated or not by calculating the weighted sum and
further adding bias to it. The purpose of the activation function is to introduce non-linearity into the output of a
neuron.
Explanation: We know, the neural network has neurons that work in correspondence with weight, bias, and their
respective activation function. In a neural network, we would update the weights and biases of the neurons on the basis
of the error at the output. This process is known as back-propagation. Activation functions make the back-propagation
possible since the gradients are supplied along with the error to update the weights and biases.
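To make the last point concrete, the sigmoid gradient that backpropagation relies on, σ′(x) = σ(x)·(1 − σ(x)), can be checked against a numerical finite-difference estimate:
Python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
analytic = sigmoid(x) * (1 - sigmoid(x))               # gradient used in backprop
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
print(np.allclose(analytic, numeric, atol=1e-6))       # True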