National Institute of Technology Tiruchirappalli
Department of Computer Applications
RNN
Name: NIKHIL CHHIPA
Roll No: 205123058
MCA IV SEMESTER
Under the guidance of Dr. U. Srinivasulu Reddy,
Assistant Professor
Department of Computer Applications
NIT Trichy
Introduction
Recurrent Neural Networks (RNNs) work a little differently from regular neural networks. In
a standard neural network, information flows in one direction, from input to output. In an RNN,
however, information is fed back into the network after each step. Think of it like reading a
sentence: when you try to predict the next word, you don't just look at the current word; you
also need to remember the words that came before to make an accurate guess.
RNNs let the network "remember" past information by feeding the output of one step into the
next step. This helps the network understand the context of what has already happened and
make better predictions based on it. For example, when predicting the next word in a
sentence, the RNN uses the previous words to decide which word is most likely to come
next.
How Do RNNs Differ from Feedforward Neural Networks?
Feedforward Neural Networks (FNNs) process data in one direction, from input to output,
without retaining information from previous inputs. This makes them suitable for tasks with
independent inputs, such as image classification. However, FNNs struggle with sequential
data because they lack memory.
Recurrent Neural Networks (RNNs) solve this by incorporating loops that allow information
from previous steps to be fed back into the network. This feedback enables RNNs to
remember prior inputs, making them ideal for tasks where context is important.
Key Components of RNNs
1. Recurrent Neurons
The fundamental processing unit in an RNN is the recurrent unit. A recurrent unit holds a
hidden state that maintains information about previous inputs in a sequence. By feeding this
hidden state back into itself, the unit can "remember" information from prior steps and
capture dependencies across time.
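As a rough sketch of what a single recurrent unit does, the NumPy lines below compute one hidden-state update; the sizes and random values are assumptions chosen only for illustration, not taken from the text above.

import numpy as np

hidden_size, input_size = 8, 4
W_xh = np.random.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
W_hh = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights (the feedback)
b_h = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)       # hidden state carried over from the previous step
x_t = np.random.randn(input_size)    # current input

# The new hidden state mixes the current input with the remembered state.
h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)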
2. RNN Unfolding
RNN unfolding, or unrolling, is the process of expanding the recurrent structure over time
steps. During unfolding, each step of the sequence is represented as a separate layer in a
series, illustrating how information flows across each time step.
This unrolling enables backpropagation through time (BPTT), a learning process in which
errors are propagated across time steps to adjust the network's weights, improving the
RNN's ability to learn dependencies within sequential data.
Recurrent Neural Network Architecture
RNNs share similarities in input and output structures with other deep learning architectures
but differ significantly in how information flows from input to output. Unlike traditional deep
neural networks, where each dense layer has distinct weight matrices, RNNs use shared
weights across time steps, allowing them to remember information over sequences.
The core idea of an RNN is its ability to process data sequentially while retaining information.
Here’s how it works:
● Input Layer: Takes the input at each time step (e.g., a word in a sentence or a value
in a time series).
● Hidden Layer: This is where the "recurrence" happens. The hidden state at time t,
denoted ht, is computed from the current input xt and the previous hidden state
h(t−1):
ht = f(Wxh · xt + Whh · h(t−1) + bh)
where Wxh and Whh are weight matrices, bh is the bias, and f is a non-linear
activation function (e.g., tanh or ReLU).
● Output Layer: Produces the output yt at each time step, computed as:
yt = g(Why · ht + by)
where Why is the hidden-to-output weight matrix, by is the output bias, and g is
typically a softmax or linear function depending on the task.
The weights are shared across all time steps, which reduces the number of parameters and
allows the network to generalize across sequences.
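Putting these pieces together, here is a minimal NumPy sketch of the forward pass described above. It mirrors the unrolled view from the unfolding section: the same shared weights are applied at every time step. All sizes and the random toy sequence are assumptions made only for illustration.

import numpy as np

input_size, hidden_size, output_size = 4, 8, 3
rng = np.random.default_rng(0)

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden -> hidden
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1  # hidden -> output
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs):
    """Apply the shared weights to every time step of the sequence xs."""
    h = np.zeros(hidden_size)                       # h0, the initial hidden state
    outputs = []
    for x_t in xs:                                  # one iteration per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)    # ht = f(Wxh·xt + Whh·h(t−1) + bh)
        outputs.append(softmax(W_hy @ h + b_y))     # yt = g(Why·ht + by)
    return outputs

sequence = [rng.standard_normal(input_size) for _ in range(5)]
ys = rnn_forward(sequence)
print(len(ys), ys[0])   # 5 outputs, one probability vector per time step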
Key Features
1. Sequential Processing: RNNs process data one step at a time, making them ideal
for ordered data.
2. Memory: The hidden state acts as a memory, capturing information from earlier
steps.
3. Parameter Sharing: Weights are reused across time steps, making RNNs efficient
for variable-length sequences (illustrated in the sketch below).
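To see parameter sharing concretely, the hypothetical Keras model below accepts sequences of any length (the None time dimension), yet its recurrent layer has a fixed parameter count because the same weights serve every time step. The sizes (32 units, 26 input features) are arbitrary choices for this sketch.

import tensorflow as tf
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import SimpleRNN

# None in the time dimension means sequences of any length are accepted.
model = Sequential([
    Input(shape=(None, 26)),
    SimpleRNN(32),
])
model.summary()
# SimpleRNN parameters: 26*32 (input->hidden) + 32*32 (hidden->hidden) + 32 (bias) = 1888,
# independent of how long the input sequence is.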
Types of RNNs
● One-to-One: Single input, single output (e.g., classification).
● One-to-Many: Single input, multiple outputs (e.g., image captioning).
● Many-to-One: Multiple inputs, single output (e.g., sentiment analysis).
● Many-to-Many: Multiple inputs and outputs (e.g., machine translation); two of these patterns are contrasted in the sketch below.
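As an illustration of two of these patterns, the sketch below builds a many-to-one model (one prediction per sequence) and a many-to-many model (one prediction per time step) in Keras. The feature and output sizes are made-up values, and real machine translation would typically use an encoder-decoder setup rather than this simple per-step form.

from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import SimpleRNN, Dense

# Many-to-one: read the whole sequence, emit a single output (e.g., a sentiment score).
many_to_one = Sequential([
    Input(shape=(None, 8)),
    SimpleRNN(16),                         # returns only the final hidden state
    Dense(1, activation='sigmoid'),
])

# Many-to-many: emit one output per time step (e.g., a label for every token).
many_to_many = Sequential([
    Input(shape=(None, 8)),
    SimpleRNN(16, return_sequences=True),  # returns the hidden state at every step
    Dense(5, activation='softmax'),        # applied independently to each time step
])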
Challenges with RNNs
1. Vanishing Gradient Problem: During backpropagation through time (BPTT),
gradients can shrink exponentially, making it hard for the network to learn long-term
dependencies.
2. Exploding Gradients: Conversely, gradients can grow uncontrollably, destabilizing
training.
3. Limited Memory: Basic RNNs struggle to remember information from many steps
back.
Solutions to Challenges
● Long Short-Term Memory (LSTM): A variant of RNNs with specialized gates (input,
forget, and output) to regulate the flow and retention of information, solving the
vanishing gradient problem.
● Gated Recurrent Unit (GRU): A simpler alternative to LSTMs with update and reset
gates, balancing performance and computational efficiency (both variants appear in the sketch below).
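In Keras, switching from a plain RNN to these gated variants is usually just a matter of swapping the recurrent layer, as in the rough sketch below; the layer sizes here are arbitrary assumptions.

from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import LSTM, GRU, Dense

# LSTM: input, forget, and output gates control what is stored and exposed.
lstm_model = Sequential([
    Input(shape=(None, 8)),
    LSTM(32),
    Dense(1),
])

# GRU: a lighter cell with only update and reset gates.
gru_model = Sequential([
    Input(shape=(None, 8)),
    GRU(32),
    Dense(1),
])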
Applications
1. Natural Language Processing (NLP): Text generation, sentiment analysis, and
language translation.
2. Time-Series Analysis: Stock price prediction, weather forecasting.
3. Speech Recognition: Converting audio sequences into text.
4. Video Analysis: Frame-by-frame processing for action recognition.
Training RNNs
RNNs are trained using Backpropagation Through Time (BPTT), an extension of
backpropagation that unrolls the network across time steps. The loss is computed as the
sum of errors at each time step, and gradients are used to update the shared weights.
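The snippet below is a minimal sketch of one BPTT update written with TensorFlow's GradientTape instead of model.fit; the layer sizes and the random toy batch are assumptions for illustration. The loss covers every time step, and the resulting gradients flow back through the unrolled steps into the shared weights.

import tensorflow as tf

vocab, hidden, T = 10, 16, 5
cell = tf.keras.layers.SimpleRNN(hidden, return_sequences=True)
readout = tf.keras.layers.Dense(vocab)

x = tf.random.uniform((2, T, vocab))                               # a toy batch of sequences
targets = tf.random.uniform((2, T), maxval=vocab, dtype=tf.int32)  # one target per time step

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

with tf.GradientTape() as tape:
    states = cell(x)                  # hidden states for all T steps (unrolled internally)
    logits = readout(states)          # an output at every time step
    loss = loss_fn(targets, logits)   # error aggregated over all time steps

# Gradients are propagated back through every time step (BPTT)
# and update the weights shared across those steps.
variables = cell.trainable_variables + readout.trainable_variables
grads = tape.gradient(loss, variables)
optimizer.apply_gradients(zip(grads, variables))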
Advantages
● Effective for sequential and temporal data.
● Flexible with variable input/output lengths.
● Can model complex patterns over time.
Disadvantages
● Computationally intensive due to sequential processing.
● Basic RNNs struggle with long-term dependencies.
● Prone to gradient-related issues.
Code
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Toy corpus and character-level vocabulary
text = "This is GeeksforGeeks a software training institute"
chars = sorted(list(set(text)))
char_to_index = {char: i for i, char in enumerate(chars)}
index_to_char = {i: char for i, char in enumerate(chars)}

# Build training pairs: each 3-character window predicts the next character
seq_length = 3
sequences = []
labels = []
for i in range(len(text) - seq_length):
    seq = text[i:i + seq_length]
    label = text[i + seq_length]
    sequences.append([char_to_index[char] for char in seq])
    labels.append(char_to_index[label])

X = np.array(sequences)
y = np.array(labels)

# One-hot encode the inputs and targets
X_one_hot = tf.one_hot(X, len(chars))
y_one_hot = tf.one_hot(y, len(chars))

# A single recurrent layer followed by a softmax over the vocabulary
model = Sequential()
model.add(SimpleRNN(50, input_shape=(seq_length, len(chars)),
                    activation='relu'))
model.add(Dense(len(chars), activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_one_hot, y_one_hot, epochs=100)

# Generate text one character at a time, feeding each prediction back in
start_seq = "This is G"
generated_text = start_seq
for i in range(50):
    x = np.array([[char_to_index[char] for char in
                   generated_text[-seq_length:]]])
    x_one_hot = tf.one_hot(x, len(chars))
    prediction = model.predict(x_one_hot)
    next_index = np.argmax(prediction)
    next_char = index_to_char[next_index]
    generated_text += next_char

print("Generated Text:")
print(generated_text)
Conclusion
RNNs are a powerful tool in the field of deep learning, bridging the gap between static data
and dynamic sequences. While they face challenges like vanishing gradients, advancements
like LSTMs and GRUs have made them indispensable in modern AI applications.
Understanding their architecture and applications provides a strong foundation for exploring
more advanced neural network models.