
CMR UNIVERSITY

SCHOOL OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF CSE AND IT

DEEP LEARNING – 4BCS803

ONLINE CIE IAT - 1

ON 15/04/2020 FROM 9:00AM TO 10:30AM

Name: AJAY SINGH NEGI

Reg.No: 16UG08003

Section: A

Branch: COMPUTER SCIENCE


2. a) Explain the training of an RNN with the backpropagation algorithm

Ans.

Backpropagation Through Time (BPTT) is the algorithm used to update the weights of a recurrent
neural network; the LSTM is one common example of such a network. Understanding BPTT is
essential if you want to frame sequence prediction problems effectively for a recurrent network.
You should also be aware of the effects of Backpropagation Through Time, and of its truncation,
on the stability and the speed of the system during training.

The basic Truncated BPTT algorithm is:

- First, present a sequence of, say, k1 time steps of input and output pairs to the network.
- Then calculate and accumulate the errors across, say, k2 time steps by unrolling the network.
- Finally, update the weights by rolling the network back up.
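
A minimal sketch of truncated BPTT for a plain (vanilla) RNN is given below. Everything concrete in it is an assumption made for illustration (the sine-wave data, the hidden size, the learning rate and the truncation length k2); it only shows the unroll / accumulate-errors / update pattern from the list above.

# Minimal sketch of truncated BPTT for a vanilla RNN (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
H, D = 8, 1                      # hidden size and input/output size (assumed)
Wx = rng.normal(0, 0.1, (H, D))  # input-to-hidden weights
Wh = rng.normal(0, 0.1, (H, H))  # hidden-to-hidden weights
Wy = rng.normal(0, 0.1, (D, H))  # hidden-to-output weights
lr, k2 = 0.05, 5                 # learning rate and truncation length k2 (assumed)

def train_step(xs, ys, h):
    """One truncated-BPTT update over a window of k2 (input, target) pairs."""
    hs = [h]
    # Forward: unroll the network across the k2 time steps.
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
        hs.append(h)
    loss = 0.0
    dWx, dWh, dWy = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(Wy)
    dh_next = np.zeros(H)
    # Backward: calculate and accumulate the errors across the same k2 steps.
    for t in reversed(range(len(xs))):
        y_hat = Wy @ hs[t + 1]
        dy = y_hat - ys[t]                   # gradient of 0.5 * ||y_hat - y||^2
        loss += 0.5 * float(dy @ dy)
        dWy += np.outer(dy, hs[t + 1])
        dh = Wy.T @ dy + dh_next
        dz = (1.0 - hs[t + 1] ** 2) * dh     # back through the tanh nonlinearity
        dWx += np.outer(dz, xs[t])
        dWh += np.outer(dz, hs[t])
        dh_next = Wh.T @ dz
    # "Roll up" the network: one gradient-descent update of the shared weights.
    for W, dW in ((Wx, dWx), (Wh, dWh), (Wy, dWy)):
        W -= lr * dW
    return loss, hs[-1]

# Toy usage: learn to predict a sine wave one step ahead.
seq = np.sin(np.linspace(0, 8 * np.pi, 200)).reshape(-1, 1)
h = np.zeros(H)
for start in range(0, len(seq) - k2 - 1, k2):
    xs = seq[start:start + k2]
    ys = seq[start + 1:start + k2 + 1]
    loss, h = train_step(xs, ys, h)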

b) Describe Hessian Free Optimization

Ans. Hessian-free optimization (aka truncated Newton) is a 2nd-order optimization approach
which builds on two key ideas:
1) Approximating the error function by a 2nd-order quadratic at each step
2) Using Conjugate Gradient (CG) to solve the resulting optimization problem

Approximating the Error:

The error function at each point is approximated by a 2nd-order quadratic. The matrix of
2nd-order derivatives is called the Hessian, which is where the method gets its name. Once you
formulate the error as a 2nd-order quadratic, you can solve it as an optimization problem to find
the best search direction to move in.
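
Written out, the local quadratic model and the resulting search direction are (standard second-order Taylor expansion, added here for clarity):

E(\theta + d) \approx E(\theta) + \nabla E(\theta)^{\top} d + \tfrac{1}{2}\, d^{\top} H d, \qquad H = \nabla^{2} E(\theta)

Minimizing this model over the step d gives the Newton system H d = -\nabla E(\theta), and its solution d is the search direction.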
Using CG instead of gradient descent:

In the method of steepest descent you start at an arbitrary point and then slide down the error
surface until you think you are close enough to the local minimum.

You might end up taking small steps in the same direction as a few steps back, which means you
need a lot of steps to reach the minimum. In contrast, in CG you take exactly one step in each of
the orthogonal directions required to reach the minimum.

A simpler way to understand it: think of the initial error between the starting point and the
minimum as a sum of n orthogonal components; each CG step then eliminates one of these
components. This is only a rough picture of the CG method and misses a lot of details, but the
main advantage of using CG is that it often leads to faster convergence.

The above two techniques form the gist of the Newton method. Truncated Newton (Hessian-free
optimization) then approximates the expensive steps of the Newton method to make it faster,
most notably by solving the Newton system only approximately, with a few CG iterations that
need only Hessian-vector products rather than the full Hessian matrix.
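
A minimal sketch of that idea is given below, on an assumed toy quadratic error function (the matrix A, the vector b, the iteration counts and the tolerance are illustrative choices, not part of the answer above).

# Minimal sketch of Hessian-free (truncated Newton) optimization (illustrative only).
import numpy as np

def hessian_vector_product(grad_fn, theta, v, eps=1e-5):
    """Approximate H @ v with finite differences of the gradient,
    so the full Hessian is never formed."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)

def conjugate_gradient(hvp, b, iters=10, tol=1e-8):
    """Approximately ("truncated") solve H d = b using only H @ v products."""
    d = np.zeros_like(b)
    r = b.copy()             # residual b - H d (d starts at zero)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

# Assumed toy error: E(theta) = 0.5 * theta^T A theta - b^T theta, so grad = A theta - b.
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])
error_grad = lambda theta: A @ theta - b

theta = np.zeros(2)
for _ in range(3):      # a few Hessian-free steps
    g = error_grad(theta)
    step = conjugate_gradient(lambda v: hessian_vector_product(error_grad, theta, v), -g)
    theta += step
print(theta)            # approaches the minimizer A^{-1} b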

4. a) Explain the reading of an advanced SOM using voting patterns for decisions

Ans. This SOM displays voting patterns inside the United States Congress, with each sub-SOM
showing the patterns for a specific act or decision.
Without any supervision, as usual, the SOM was able to learn the voting patterns among Congress
members and, based on these patterns, divide the members into groups according to their
partisan affiliations.

SOM 1x1: Clusters


The first SOM, displaying the clusters, actually differs from the "Party" SOM in the same row.
You would expect the clusters to be divided along partisan lines, but apparently the SOM has
detected some overlap in the voting patterns of members from both parties.
The "Clusters" SOM is based on the sum of all the votes on all these different issues, so it is
very likely to find members of the Democratic Party siding with the Republicans and vice versa.

SOM 1x2: Unified Distance Matrix


The Unified Distance Matrix, or U-matrix, represents the distances between the points or
nodes on the SOM. The dark parts of the matrix represent the parts of the map where the nodes
are far from each other, and the lighter parts represent smaller distances between the nodes.
Notice that the darkest parts are the same parts where the clusters are divided, which are
logically the parts with the widest distances between the nodes.
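
A minimal sketch of how such a U-matrix is computed from a trained SOM's codebook is shown below (the 10x10 grid and the random stand-in weights are assumptions made for illustration).

# Minimal sketch of computing a U-matrix from SOM node weights (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
rows, cols, dim = 10, 10, 16
weights = rng.normal(size=(rows, cols, dim))   # stand-in for a trained SOM codebook

u_matrix = np.zeros((rows, cols))
for i in range(rows):
    for j in range(cols):
        dists = []
        # average distance from each node to its immediate grid neighbors
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
        u_matrix[i, j] = np.mean(dists)
# Large u_matrix values correspond to the dark ridges that separate clusters on the map.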

SOM 1x4: Bankruptcy/Abuse Prevention


In this map, you can see that the border between the two sides is drawn a bit further to the right
than where it appears in the "Clusters" SOM. That means that more members took the Republican
side, including members who appear on the Democratic Party's side in the "Clusters" map.
SOMs make it possible to detect these deviations visually without having to inspect all
the data.

SOM 1x5: Border Protection/Anti-Terrorism


A lot of members from the Republican cluster appear to have taken the Democratic side;
interestingly, the border deviates in the middle more than anywhere else.

SOM 1x6: Broadcast Decency Enforcement


What you see in this map is one of the rare instances of cross-party unity in the US Congress.
Almost the entire Democratic bloc appears to have turned conservative on that particular issue.
The simplicity of SOMs makes such varied readings easy to implement; all of the maps above are
forms of SOMs.
b) Compare K-means clustering with Self-Organizing Maps.

Ans. K-means is one of the simplest unsupervised learning algorithms that solve the well-known
clustering problem. The procedure follows a simple and easy way to classify a given data set
into a certain number of clusters (assume k clusters) and has a low computational cost. The
shortcoming of K-means is that the value of k (the number of groups/clusters) must be determined
beforehand. K-means is also a greedy algorithm, so it is hard for it to attain the globally
optimal clustering.

In K-means the nodes (centroids) are independent of each other, and clusters are formed through
the centroids and the cluster size. In a SOM (Self-Organizing Map), by contrast, the number of
neurons in the output layer has a close relationship with the number of classes in the input
data, and the clusters are formed geometrically on the grid. From the performance point of view,
the K-means algorithm performs better than SOM as the number of clusters increases, but K-means
is more sensitive to noise in the dataset than SOM.
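
A minimal sketch of K-means on assumed toy data is shown below (the three Gaussian blobs, k = 3 and the scikit-learn settings are illustrative choices); the comments point out the contrast with a SOM.

# Minimal sketch of K-means clustering (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Three blobs, so here k = 3 is known; in practice k must be chosen beforehand.
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
                  for c in ((0, 0), (3, 0), (0, 3))])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
print(kmeans.cluster_centers_)   # independent centroids, no grid structure
print(kmeans.labels_[:10])       # cluster assignment of the first few points
# Unlike a SOM, the centroids have no neighborhood relationship: moving one
# centroid never pulls the others along with it.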

6. Explain the intuitions of SOM.

Ans. As in other neural network algorithms, the main building blocks of a SOFM (Self-Organizing
Feature Map) are neurons. Each neuron is typically connected to some other neurons, but the
number of these connections is small; each neuron is connected to just a few others, which are
called its close neighbors. There are many ways to arrange these connections, but the most
common one is to arrange them into a two-dimensional grid.

- Each blue dot in the image is a neuron, and a line between two neurons means that they are
connected. We call this arrangement of neurons a grid.
- Each neuron in the grid has two properties: position and connections to other neurons. We
define the connections before we start training the network, and position is the only thing that
changes during training. There are many ways to initialize the positions of the neurons, but the
easiest one is to do it randomly. After this initialization, the grid is gradually refined over
the training iterations.
- In each training iteration we introduce some data point and try to find the neuron that is
closest to this point. The neuron closest to the point is called the winner. But instead of
updating only the position of this neuron, we also find its neighbors.
- Note that these are not the same as the closest neighbors. Before training we specify a special
parameter known as the learning radius. It defines the radius within which we consider other
neurons to be neighbors.
- In the image below you can see the same grid as before, with the neuron in the center marked
as the winner. You can see in the pictures that a larger radius includes more neurons.

And at the end of the iteration we update the positions of the winner neuron and its neighbors.
We change their positions by pushing them closer to the data point that we used to find the
winner. We "push" the winner neuron much closer to the data point than the neighboring neurons;
in fact, the further away a neighbor is, the less "push" it gets towards the data point. You can
see how the neurons are updated in the image below for different learning radius parameters.
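
A minimal sketch of this training loop is given below (illustrative only: the grid size, the learning-rate and radius schedules, and the 2-D toy data are assumptions, and a smooth Gaussian neighborhood function stands in for the hard learning-radius cutoff described above).

# Minimal sketch of SOM training: find winner, push winner and neighbors (illustrative only).
import numpy as np

rng = np.random.default_rng(3)
rows, cols, dim = 10, 10, 2
positions = rng.random((rows, cols, dim))        # random initial neuron positions
grid_i, grid_j = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")

data = rng.random((500, dim))                    # toy 2-D data points
lr0, radius0, n_iter = 0.5, 3.0, 2000            # assumed schedules

for t in range(n_iter):
    x = data[rng.integers(len(data))]            # introduce one data point
    # 1) Find the winner: the neuron whose position is closest to x.
    dists = np.linalg.norm(positions - x, axis=2)
    wi, wj = np.unravel_index(np.argmin(dists), dists.shape)
    # 2) Decay the learning rate and the learning radius over time.
    lr = lr0 * np.exp(-t / n_iter)
    radius = radius0 * np.exp(-t / n_iter)
    # 3) Push the winner and its grid neighbors towards x; the further a
    #    neighbor is on the grid, the weaker the push it gets.
    grid_dist2 = (grid_i - wi) ** 2 + (grid_j - wj) ** 2
    influence = np.exp(-grid_dist2 / (2 * radius ** 2))
    positions += lr * influence[..., None] * (x - positions)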
