MODULE 2

Computational Graphs, Layers and Blocks, Shallow Neural Network, Deep Neural Network, Optimization for Training Deep Models, Self-Organizing Maps, Case Study
Computational graphs
• Computational graphs are a type of graph that can be used to
represent mathematical expressions. This is similar to descriptive
language in the case of deep learning models, providing a functional
description of the required computation.

• In general, the computational graph is a directed graph that is used for expressing and evaluating mathematical expressions.
• These can be used for two different types of calculations:
  1. Forward computation
  2. Backward computation
Key terminologies in computational graphs
• A variable is represented by a node in a graph. It could be a
scalar, vector, matrix, tensor, or even another type of variable.
• A function argument and data dependency are both
represented by an edge. These are similar to node pointers.
• A simple function of one or more variables is called an
operation. There is a set of operations that are permitted.
Functions that are more complex than these operations in this
set can be represented by combining multiple operations.

EXAMPLE
Y = (a + b) * (b - c)
Here we have three operations: addition, subtraction, and multiplication. To create a computational graph, we create a node for each operation along with its input variables. The direction of each arrow shows the direction in which input is fed to the other nodes.
d = a + b
e = b - c
Y = d * e
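As a rough sketch of how such a graph behaves in practice, the same expression can be built and differentiated with PyTorch autograd; the values a = 2, b = 3, c = 1 are assumed here purely for illustration.

import torch

# Leaf variables of the graph (requires_grad makes autograd track them)
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(1.0, requires_grad=True)

# Forward computation builds the graph node by node
d = a + b        # addition node
e = b - c        # subtraction node
Y = d * e        # multiplication node

# Backward computation traverses the graph in reverse to obtain gradients
Y.backward()
print(Y.item())  # 10.0
print(a.grad)    # dY/da = e = 2
print(b.grad)    # dY/db = d + e = 7
print(c.grad)    # dY/dc = -d = -5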
Types of computational graphs:
• Type 1: Static Computational Graphs
• Involves two phases:
  • Phase 1: Define the architecture of the graph.
  • Phase 2: Feed the model large amounts of data to train it and generate predictions.
• The benefit of using this graph is that it enables powerful offline graph optimization and scheduling. As a result, static graphs should generally be faster than dynamic graphs.
• The drawback is that dealing with structured and even variable-sized data is awkward.
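A minimal sketch of this two-phase workflow, assuming TensorFlow is available: tf.function traces the Python function into a static graph once (Phase 1), which is then optimized and reused for every call that feeds it data (Phase 2).

import tensorflow as tf

# Phase 1: define the computation; tf.function traces it into a static graph
@tf.function
def compute(a, b, c):
    d = a + b
    e = b - c
    return d * e

# Phase 2: feed data; the traced graph is optimized once and reused across calls
y = compute(tf.constant(2.0), tf.constant(3.0), tf.constant(1.0))
print(y.numpy())  # 10.0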
Type 2: Dynamic Computational Graphs
• As the forward computation is performed, the graph is implicitly
defined.
• This graph has the advantage of being more adaptable. The library
is less intrusive and enables interleaved graph generation and
evaluation. The forward computation is implemented in your
preferred programming language, complete with all of its features
and algorithms. Debugging dynamic graphs is simple. Because it
permits line-by-line execution of the code and access to all
variables, finding bugs in your code is considerably easier. If you
want to employ Deep Learning for any genuine purpose in the
industry, this is a must-have feature.
• The disadvantage of employing this graph is that there is limited
time for graph optimization, and the effort may be wasted if the
graph does not change.
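A minimal sketch of this style with PyTorch, where the graph is built implicitly as ordinary Python code runs, so data-dependent control flow and line-by-line debugging work naturally; the loop count and branching below are arbitrary choices for illustration.

import torch

def forward(x, n):
    # The graph is defined implicitly as this Python code executes,
    # so ordinary loops, branches, and debuggers work line by line.
    for _ in range(n):       # loop length chosen at call time
        if x.sum() > 0:      # data-dependent branch
            x = torch.relu(x)
        else:
            x = x * 2
    return x

x = torch.randn(3, requires_grad=True)
y = forward(x, n=4).sum()
y.backward()                 # gradients flow through whichever path actually ran
print(x.grad)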
Layers and Blocks
•Layers are blocks.
•Many layers can comprise a block.
•Many blocks can comprise a block.
•A block can contain code.
•Blocks take care of lots of housekeeping, including
parameter initialization and backpropagation.
•Sequential concatenations of layers and blocks are
handled by the Sequential block.
Multiple layers are combined into blocks,
forming repeating patterns of larger models.
• A block could describe a single layer, a component consisting of multiple layers, or the entire model itself. One benefit of working with the block abstraction is that blocks can be combined into larger artifacts, often recursively.
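A minimal sketch of this idea in PyTorch, where nn.Module plays the role of a block; the layer sizes below are arbitrary.

import torch
from torch import nn

# A block: an nn.Module that can itself contain layers or other blocks.
class MLPBlock(nn.Module):
    def __init__(self, in_features, hidden, out_features):
        super().__init__()
        self.hidden = nn.Linear(in_features, hidden)  # a layer is itself a block
        self.out = nn.Linear(hidden, out_features)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

# Blocks compose recursively; the Sequential block chains layers and blocks in order.
model = nn.Sequential(
    MLPBlock(20, 64, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)

x = torch.randn(2, 20)
print(model(x).shape)  # torch.Size([2, 10])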
Shallow Neural Network
• Shallow Neural Network: A neural network with only one
hidden layer, often used for simpler tasks or as a building block
for larger networks.
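For example, a shallow network with a single hidden layer can be sketched as follows; the sizes and activations are assumed for illustration, e.g. for a binary classification task such as XOR.

import torch
from torch import nn

# Shallow network: input -> one hidden layer -> output.
shallow_net = nn.Sequential(
    nn.Linear(2, 8),   # the single hidden layer
    nn.Tanh(),
    nn.Linear(8, 1),
    nn.Sigmoid(),      # probability of the positive class
)

print(shallow_net(torch.tensor([[0.0, 1.0]])))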
Optimization for training Deep Models
• Optimization provides a way to minimize the loss function for deep learning.
• Minimizing the training error does not guarantee that we find the best set of parameters to minimize the generalization error.
• The optimization problems may have many local minima.
• The problems may have even more saddle points, as they are generally not convex.
• Vanishing gradients can cause optimization to stall. Often a reparameterization of the problem helps, and good initialization of the parameters can be beneficial too.
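As a concrete reference point, a minimal gradient-descent training loop in PyTorch might look like this; the linear model, synthetic data, and learning rate are assumed for illustration.

import torch
from torch import nn

# Minimal training loop: repeatedly step the parameters opposite the gradient of the loss.
model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.randn(64, 3)
y = X @ torch.tensor([[1.0], [-2.0], [0.5]]) + 0.1 * torch.randn(64, 1)

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)   # forward pass: compute training loss
    loss.backward()               # backward pass through the computational graph
    optimizer.step()              # gradient-descent update of the parameters

print(loss.item())                # training loss after optimization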
Goal of Optimization
Optimization provides a way to minimize the loss function for deep learning. In essence, however, the goals of optimization and deep learning are fundamentally different: optimization is concerned with minimizing the training error, while deep learning ultimately cares about the generalization error.
There are many challenges in deep learning
optimization.
• Local Minima
• Saddle Points
• Vanishing Gradients
Local optima
• A local optimum is an extremum (maximum or minimum) of the objective function within a certain region of the input space. More formally, for the minimization case, x_local is a local minimum of the objective function f if

f(x) ≥ f(x_local)

for all values of x in the range [x_local − ε, x_local + ε].
Global Optima
• A global optimum is the maximum or minimum value the objective function can take over the entire input space. More formally, for the minimization case, x_global is a global minimum of the objective function f if

f(x) ≥ f(x_global)

for all values of x.

Saddle Points
• A saddle point is any location where all gradients of a function vanish
but which is neither a global nor a local minimum.
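A minimal sketch of both failure modes with plain gradient descent in PyTorch; the functions f(x) = (x^2 − 1)^2 + 0.3x (a shallow local minimum near x ≈ 0.96 and a deeper global minimum near x ≈ −1.04) and g(x, y) = x^2 − y^2 (saddle at the origin) are assumed purely for illustration.

import torch

# f has a global minimum near x ≈ -1.04 and a shallower local minimum near x ≈ 0.96.
def f(x):
    return (x**2 - 1)**2 + 0.3 * x

x = torch.tensor(1.5, requires_grad=True)   # start on the "wrong" side of the bump
optimizer = torch.optim.SGD([x], lr=0.01)
for _ in range(500):
    optimizer.zero_grad()
    f(x).backward()
    optimizer.step()
print(x.item())                             # converges to the local minimum (~0.96)

# At a saddle point all gradients vanish, so plain gradient descent makes no progress.
u = torch.tensor(0.0, requires_grad=True)
v = torch.tensor(0.0, requires_grad=True)
(u**2 - v**2).backward()
print(u.grad, v.grad)                       # both zero: no update would be made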
Vanishing Gradients
The problem:
As more layers using certain activation functions are added to a neural network, the gradients of the loss function approach zero, making the network hard to train.
Why:
Certain activation functions, like the sigmoid function, squash a large input space into a small output range between 0 and 1. Therefore, a large change in the input of the sigmoid function causes only a small change in the output. Hence, the derivative becomes small.
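This saturation can be sketched directly with PyTorch autograd; the sample input values are arbitrary.

import torch

# Derivative of the sigmoid at a few inputs: it peaks at 0.25 near zero
# and collapses toward zero for large |x| (gradients vanish).
x = torch.tensor([-10.0, -2.0, 0.0, 2.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)   # ≈ [4.5e-05, 0.105, 0.25, 0.105, 4.5e-05]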
Solution
• The simplest solution is to use other activation functions, such as ReLU, which does not cause a small derivative.
• Residual networks are another solution, as they provide residual connections straight to earlier layers.
• Batch normalization layers can also resolve the issue. As stated before, the problem arises when a large input space is mapped to a small one, causing the derivatives to disappear; batch normalization keeps the layer inputs in a range where the activation does not saturate.
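A rough sketch of the first and third remedies in PyTorch; the layer sizes and input scaling are assumed for illustration.

import torch
from torch import nn

# ReLU keeps the gradient at 1 for every positive input, so it does not saturate.
x = torch.tensor([-2.0, 0.5, 3.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)                                      # tensor([0., 1., 1.])

# Batch normalization re-centers and re-scales layer inputs, keeping them out of
# the saturating regions of activations such as the sigmoid.
layer = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Sigmoid())
out = layer(torch.randn(32, 10) * 100)             # even badly scaled inputs stay usable
print(out.mean().item(), out.std().item())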
Self Organizing Map
• SOM is a type of Artificial Neural Network, also inspired by biological models of neural systems from the 1970s. It follows an unsupervised learning approach and trains its network through a competitive learning algorithm.
• SOM is used for clustering and mapping (or dimensionality reduction): it maps multidimensional data onto a lower-dimensional space, which reduces complex problems to a form that is easier to interpret.
• SOM has two layers, one is the Input layer and the other one is
the Output layer.
[Figure: architecture of a Self-Organizing Map with two clusters (output units) and n input features per sample.]
SOM working
• Consider input data of size (m, n), where m is the number of training examples and n is the number of features in each example.
• First, the algorithm initializes weights of size (n, C), where C is the number of clusters.
• Then, iterating over the input data, for each training example it updates the winning vector (the weight vector with the shortest distance, e.g., Euclidean distance, from the training example).
The weight update rule is given by:
wij = wij(old) + alpha(t) * (xik − wij(old))
• where alpha(t) is the learning rate at time t, j denotes the winning vector, i denotes the ith feature of the training example, and k denotes the kth training example from the input data.
• After training, the learned weights are used to cluster new examples: a new example falls into the cluster of its winning vector.
• After training the SOM network, trained weights are used for
clustering new examples. A new example falls in the cluster of
winning vectors.
Algorithm
Training:
• Step 1: Initialize the weights wij; random values may be assumed. Initialize the learning rate α.
• Step 2: Calculate the squared Euclidean distance for each output unit j:
D(j) = Σ (wij − xi)^2, where i = 1 to n (input features) and j indexes the output (cluster) units.
• Step 3: Find the winning index J, i.e. the j for which D(j) is minimum.
• Step 4: For each unit j within a specific neighborhood of J, and for all i, calculate the new weight:
wij(new) = wij(old) + α[xi − wij(old)]
• Step 5: Update the learning rate using:
α(t+1) = 0.5 * α(t)
• Step 6: Test the Stopping Condition.
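A minimal sketch of these steps in Python with NumPy; the winner-only update (no neighborhood), the halving learning-rate schedule, the toy data, and the epoch count are all assumptions made for illustration.

import numpy as np

def train_som(X, C=2, alpha=0.5, epochs=10, seed=0):
    """Simplified SOM: m examples with n features, C output (cluster) units."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((C, n))                   # Step 1: random initial weights
    for t in range(epochs):
        for x in X:
            D = ((W - x) ** 2).sum(axis=1)   # Step 2: squared Euclidean distance D(j)
            J = D.argmin()                   # Step 3: winning index J
            W[J] += alpha * (x - W[J])       # Step 4: w(new) = w(old) + α[x − w(old)]
        alpha *= 0.5                         # Step 5: decay the learning rate
    return W                                 # Step 6: here, stop after a fixed number of epochs

def assign(W, x):
    # A new example falls into the cluster of its winning vector.
    return int(((W - x) ** 2).sum(axis=1).argmin())

X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [7.5, 8.2]])
W = train_som(X)
print([assign(W, x) for x in X])             # two well-separated clusters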
CASE STUDY on computational Graph
• IMAGE CLASSIFICATION USING CONVOLUTIONAL NEURAL NETWORKS
• GRAPH NEURAL NETWORKS FOR SOCIAL NETWORK ANALYSIS
• SMART AGRICULTURE USING DEEP LEARNING
• IMPROVING NATURAL LANGUAGE PROCESSING WITH COMPUTATIONAL GRAPHS
• OPTIMIZING DRUG DISCOVERY WITH COMPUTATIONAL GRAPHS
• PREDICTING STOCK MARKET TRENDS USING DYNAMIC GRAPHS
• IDENTIFYING DEEP LEAKING IN A CONVOLUTIONAL NEURAL NETWORK
• DEEP LEARNING DATA VISUALIZATION USING ARCHITECTURAL DATA
• NATURAL LANGUAGE PROCESSING (NLP) ADVANCEMENT
CASE STUDY on Shallow neural network
• Binary classification with a shallow neural network
• Exploring the power of shallow neural networks
• Building a spam email classifier using a shallow neural network
• A shallow neural network approach for the short-term forecast of hourly energy consumption
• Handwritten digit recognition with a shallow neural network
• Improving customer churn prediction with shallow neural networks
• Shallow neural networks for sentiment analysis
• Solving the XOR problem with a shallow neural network
• Shallow neural networks in stock markets
