CMR University School of Engineering and Technology, Department of CSE and IT
Reg.No: 16UG08003
Section: A
Ans.
Backpropagation Through Time (BPTT) is the algorithm used to update the weights of a
recurrent neural network; a common example of a recurrent network is the LSTM.
Backpropagation is an essential skill to know if you want to effectively frame sequence
prediction problems for recurrent networks. You should also be aware of the effects of
Backpropagation Through Time on the stability and the speed of the system during
training.
- First, present a sequence of, say, k1 time steps of input and output pairs to the network.
- Then calculate and accumulate the error across, say, k2 time steps by unrolling the network.
- Finally, update the weights by rolling the network back up.
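The three steps above can be sketched for a tiny vanilla RNN (not an LSTM) with a squared-error loss; the network sizes, weight names, and learning rate here are illustrative assumptions, not from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny vanilla RNN: h_t = tanh(W_h h_{t-1} + W_x x_t),
# prediction y_hat_t = W_y h_t, squared-error loss.
H, X = 4, 3
W_h = rng.standard_normal((H, H)) * 0.1
W_x = rng.standard_normal((H, X)) * 0.1
W_y = rng.standard_normal((1, H)) * 0.1

def truncated_bptt_step(xs, ys, h0, lr=0.01):
    """Unroll over k2 = len(xs) steps, accumulate the error,
    then backpropagate through the unrolled steps and update weights."""
    global W_h, W_x, W_y
    hs, zs = [h0], []
    loss = 0.0
    for x, y in zip(xs, ys):                        # forward: unroll the network
        z = W_h @ hs[-1] + W_x @ x
        hs.append(np.tanh(z)); zs.append(z)
        loss += 0.5 * float((W_y @ hs[-1] - y) ** 2)  # accumulate the error
    dW_h, dW_x, dW_y = map(np.zeros_like, (W_h, W_x, W_y))
    dh_next = np.zeros(H)
    for t in reversed(range(len(xs))):              # backward: roll the network up
        err = float(W_y @ hs[t + 1] - ys[t])
        dW_y += err * hs[t + 1][None, :]
        dh = err * W_y.ravel() + dh_next
        dz = dh * (1.0 - hs[t + 1] ** 2)            # derivative of tanh
        dW_h += np.outer(dz, hs[t]); dW_x += np.outer(dz, xs[t])
        dh_next = W_h.T @ dz
    W_h -= lr * dW_h; W_x -= lr * dW_x; W_y -= lr * dW_y
    return loss, hs[-1]
```

Repeated calls on successive chunks of a long sequence, passing the final hidden state of one chunk in as `h0` of the next, give the usual truncated-BPTT training loop.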
Ans. Hessian-free optimization (aka truncated Newton) is a second-order optimization approach
which builds on two key ideas:
1) Approximating the error function by a second-order quadratic at each step
2) Using Conjugate Gradient (CG) to solve the resulting optimization problem
The error function at each point is approximated by a second-order quadratic. The matrix of
second-order derivatives is called the Hessian, which is where the method gets its name. Once
you formulate the problem as a second-order quadratic, you can solve it as an optimization
problem to find the best search direction to move in.
Using CG instead of gradient descent:
In the method of steepest descent you start at an arbitrary point and then slide down to the
bottom of the error surface until you think you are close enough to the local minimum.
You might end up taking small steps in the same direction as steps taken earlier, which means
you need many steps to reach the minimum. In contrast, CG takes exactly one step
in each of the mutually conjugate directions required to reach the minimum.
A simpler way to understand it: think of the initial error between the starting point and the
minimum as a sum of n conjugate components; each CG step then eliminates one of these
components. This is just a simplified view of the CG method that omits a lot of details, but
the main advantage of using CG is that it often leads to faster convergence.
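The one-step-per-direction idea can be sketched for the quadratic 0.5 xᵀAx − bᵀx with A symmetric positive definite, where CG reaches the minimum in at most n steps in exact arithmetic (a textbook sketch, not the exact routine used in any particular Hessian-free implementation):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Minimize 0.5 x^T A x - b^T x (equivalently, solve A x = b)
    by taking exactly one step along each conjugate direction."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x                       # residual = negative gradient
    d = r.copy()                        # first search direction
    for _ in range(len(b)):             # at most n steps
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (d @ A @ d)   # exact line search along d
        x += alpha * d
        r_new = r - alpha * (A @ d)
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d            # next direction, A-conjugate to the previous ones
        r = r_new
    return x
```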
The above two techniques form the gist of the Newton method. Truncated Newton (Hessian-free
optimization) then simply approximates some of the expensive steps required in the Newton
method to make it faster.
4. a) Explain the reading of advanced SOM using the voting pattern for decision
Ans. This SOM displays voting patterns inside the United States Congress, with each sub-SOM
showing the patterns for a specific act or decision.
Without any supervision, as usual, the SOM was able to learn the voting patterns of Congress
members and, based on these patterns, divide the members into groups corresponding to
their partisan affiliations.
Ans. K-means is one of the simplest unsupervised learning algorithms that solve the well-known
clustering problem. The procedure follows a simple and easy way to classify a given data set
into a certain number of clusters (assume k clusters) and has a low computational cost. The
shortcoming of k-means is that the value of k (the number of groups/clusters) must be determined
beforehand. K-means is also a greedy algorithm, so it is hard for it to attain the globally
optimal clustering result.
In k-means the nodes (centroids) are independent of each other; clusters are formed through the
centroids (nodes) and the cluster size. In a SOM (Self-Organizing Map), by contrast, the number
of neurons in the output layer has a close relationship with the number of classes in the input
data, and the clusters are formed geometrically. From a performance point of view, the k-means
algorithm performs better than SOM as the number of clusters increases, but k-means is more
sensitive to noise in the dataset than SOM.
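A minimal sketch of the k-means procedure (plain Lloyd's iterations with random initialization; the function name and parameters are illustrative), showing that k is fixed up front and that the greedy updates only reach a local optimum:

```python
import numpy as np

def kmeans(data, k, iters=100, seed=0):
    """Minimal k-means: k must be chosen beforehand; the greedy
    assign/update loop converges only to a local optimum that
    depends on the random initialization."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
        # move each centroid to the mean of its assigned points
        new = np.array([data[labels == j].mean(0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```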
Ans. As in any neural network algorithm, the main building blocks of a SOFM are neurons.
Each neuron is typically connected to some other neurons, but the number of these connections
is small: each neuron is connected to just a few other neurons, called its close neighbors. There
are many ways to arrange these connections, but the most common one is to arrange them into a
two-dimensional grid.
- Each blue dot in the image is a neuron, and a line between two neurons means that they are
connected. We call this arrangement of neurons a grid.
- Each neuron in the grid has two properties: a position and connections to other neurons. We
define the connections before we start network training, and the position is the only thing that
changes during training. There are many ways to initialize the positions of the neurons, but the
easiest one is just to do it randomly. After this initialization, the grid is refined over many
training iterations.
- In each training iteration we present a data point and try to find the neuron closest to
this point. The closest neuron is called the winner. But instead of updating only the position
of this neuron, we also find its neighbors.
- Note that these are not the same as the closest neighbors. Before training we specify a special
parameter known as the learning radius, which defines the radius within which we consider
other neurons as neighbors.
- On the image below you can see the same grid as before, with the neuron in the center marked
as the winner. You can see in the pictures that a larger radius includes more neurons.
- At the end of the iteration we update the positions of the winner neuron and its neighbors. We
change their positions by pushing them closer to the data point that we used to find the winner.
We push the winner neuron much closer to the data point than the neighbor neurons; in fact, the
further away a neighbor is, the less of a "push" it gets towards the data point. You can see how
the neurons are updated on the image below for different learning-radius parameters.
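The full iteration described above can be sketched as follows. A Gaussian neighborhood function is assumed here (step-function neighborhoods are also common), and all names, grid sizes, and parameter values are illustrative:

```python
import numpy as np

def find_winner(weights, x):
    """Grid coordinates of the neuron whose weight vector is closest to x."""
    d = ((weights - x) ** 2).sum(-1)
    return np.unravel_index(np.argmin(d), d.shape)

def som_update(weights, x, radius=1.5, lr=0.5):
    """One SOM iteration on a 2-D grid of shape (rows, cols, dim):
    find the winner, then pull the winner and its grid neighbors toward
    the data point x, with the pull decaying with grid distance from
    the winner (assumed Gaussian neighborhood)."""
    wi, wj = find_winner(weights, x)
    rows, cols, _ = weights.shape
    for i in range(rows):
        for j in range(cols):
            d2 = (i - wi) ** 2 + (j - wj) ** 2
            influence = np.exp(-d2 / (2 * radius ** 2))  # further neighbors get less "push"
            weights[i, j] += lr * influence * (x - weights[i, j])
    return weights
```

In a full training run, `som_update` is called once per data point per epoch, usually with `radius` and `lr` decaying over time so the map settles.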