Experiment 10

The document outlines the implementation of a Recurrent Neural Network (RNN) for classifying IMDB movie reviews as positive or negative. It details the objectives, program code, and step-by-step explanation of loading data, preprocessing, building, compiling, training, and evaluating the model. The RNN achieves a test accuracy of approximately 85-87%, demonstrating its effectiveness in sentiment analysis.

Uploaded by

gnanesh847
Copyright
© All Rights Reserved

# Experiment 10: Implement an RNN for IMDB Movie Review Classification

## Title

Recurrent Neural Network (RNN) for IMDB Movie Review Classification

## Aim

To implement a Recurrent Neural Network (RNN) for classifying IMDB movie reviews as
either positive or negative.

## Objectives

- Understand the use of RNN for text classification.

- Preprocess text data into fixed-length integer sequences and represent words using word embeddings.

- Train an RNN model using TensorFlow/Keras for sentiment analysis.

- Evaluate the model's performance using accuracy metrics.

---

## Program with Line-by-Line Explanation

Below is the complete Python code to implement an RNN for sentiment classification
on the IMDB dataset:

```python
# Import required libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.datasets import imdb

# Step 1: Load the IMDB dataset
max_features = 10000  # Vocabulary size (top 10,000 words)
maxlen = 500          # Max length of a review (truncate/pad to this size)
batch_size = 32

# Load dataset with only top `max_features` words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# Step 2: Preprocess the data (pad sequences to ensure equal length)
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# Step 3: Build the RNN model
model = Sequential([
    Embedding(input_dim=max_features, output_dim=32),  # Embedding layer
    SimpleRNN(32),                                     # Simple RNN layer with 32 units
    Dense(1, activation='sigmoid'),                    # Output layer for binary classification
])

# Step 4: Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Step 5: Train the model
model.fit(x_train, y_train, epochs=5, batch_size=batch_size,
          validation_data=(x_test, y_test))

# Step 6: Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")
```

### Explanation of Code (Line by Line)

#### Step 1: Load the IMDB Dataset


```python
max_features = 10000  # Vocabulary size (top 10,000 words)
maxlen = 500          # Max length of a review (truncate/pad to this size)
batch_size = 32

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
```

- The IMDB dataset contains 50,000 movie reviews (25,000 for training and 25,000 for
testing).

- Each review is a sequence of integers representing word indices.

- `num_words=max_features` limits the vocabulary to the 10,000 most frequent words; rarer words are replaced by a single out-of-vocabulary index.

- Reviews are labeled as positive (1) or negative (0).
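
To make the effect of `num_words` concrete, the sketch below mimics the vocabulary cap on a toy review. The helper `cap_vocabulary` and the toy data are hypothetical (they are not part of Keras), and the sketch assumes the `load_data` default of marking out-of-vocabulary words with index 2.

```python
def cap_vocabulary(reviews, num_words, oov_char=2):
    """Replace every word index >= num_words with the out-of-vocabulary index."""
    return [[w if w < num_words else oov_char for w in review]
            for review in reviews]


# One made-up review containing two indices outside the top 10,000
toy_reviews = [[1, 14, 22, 16000, 43, 9999, 10000]]
capped = cap_vocabulary(toy_reviews, num_words=10000)
print(capped)  # [[1, 14, 22, 2, 43, 9999, 2]]
```

Indices 16000 and 10000 fall outside the allowed range and collapse to 2, while 9999 is kept.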

#### Step 2: Preprocess the Data

```python
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
```

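- Reviews vary in length, so they are padded or truncated to a fixed length of 500 words. With the default settings of `pad_sequences`, zeros are added at the start of short reviews and long reviews keep only their last 500 words.

- This ensures all input sequences have the same shape, which is required to batch them for the RNN.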
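
The padding behaviour can be sketched in plain NumPy. The function `pad_pre` below is a hypothetical stand-in written for illustration, assuming the default "pre" padding and "pre" truncation of `pad_sequences`; it is not the Keras implementation itself.

```python
import numpy as np


def pad_pre(seqs, maxlen, value=0):
    """Left-pad short sequences with `value` and keep the last `maxlen` items of long ones."""
    out = np.full((len(seqs), maxlen), value, dtype="int32")
    for i, seq in enumerate(seqs):
        if len(seq) == 0:
            continue                      # an empty review stays all padding
        trunc = seq[-maxlen:]             # drop tokens from the start if too long
        out[i, -len(trunc):] = trunc      # right-align, so padding sits on the left
    return out


padded = pad_pre([[5, 6, 7], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(padded)
# [[0 5 6 7]
#  [3 4 5 6]]
```

Right-aligning the tokens means the RNN sees the real words last, just before it emits its final hidden state.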

#### Step 3: Build the RNN Model

```python
model = Sequential([
    Embedding(input_dim=max_features, output_dim=32),  # Embedding layer
    SimpleRNN(32),                                     # Simple RNN layer with 32 units
    Dense(1, activation='sigmoid'),                    # Output layer for binary classification
])
```

- **Embedding Layer**: Converts each word index into a dense vector of size 32; the vectors are learned during training.

- **SimpleRNN Layer**: A basic RNN with 32 units that reads the sequence one word at a time, carrying a hidden state that captures dependencies between words. By default it passes only its final hidden state to the next layer.

- **Dense Layer**: A single neuron with a sigmoid activation function outputs a probability (0 to 1) for binary classification.
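
As a rough illustration of what these three layers compute, here is a minimal NumPy sketch of the forward pass for a single review, followed by the parameter count of each layer. The weights are random placeholders rather than trained values, all variable names are hypothetical, and the `tanh` update assumes the default `SimpleRNN` activation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, units = 10000, 32, 32  # sizes used by the model above

# Random weights stand in for trained ones (illustration only)
E = rng.normal(scale=0.05, size=(vocab_size, embed_dim))  # embedding table
W = rng.normal(scale=0.1, size=(embed_dim, units))        # input-to-hidden kernel
U = rng.normal(scale=0.1, size=(units, units))            # hidden-to-hidden kernel
b = np.zeros(units)                                       # recurrent layer bias
w_out = rng.normal(scale=0.1, size=units)                 # dense kernel
b_out = 0.0                                               # dense bias


def forward(token_ids):
    """Return the sigmoid output for one review given as a list of word indices."""
    h = np.zeros(units)                     # initial hidden state
    for x_t in E[token_ids]:                # embedding lookup: one vector per word
        h = np.tanh(x_t @ W + h @ U + b)    # simple RNN update
    logit = h @ w_out + b_out               # only the final state reaches the dense layer
    return 1.0 / (1.0 + np.exp(-logit))     # sigmoid


prob = forward([1, 14, 22, 16, 43])

# Trainable parameters per layer
embedding_params = vocab_size * embed_dim                     # 320,000
rnn_params = embed_dim * units + units * units + units        # 2,080
dense_params = units + 1                                      # 33
total_params = embedding_params + rnn_params + dense_params   # 322,113
```

Almost all of the parameters sit in the embedding table; the recurrent layer itself is small.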

#### Step 4: Compile the Model

```python
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```

- **Loss Function**: `binary_crossentropy` is suitable for binary classification tasks.

- **Optimizer**: `adam` adapts the learning rate of each parameter during training, which usually speeds up convergence.

- **Metrics**: `accuracy` reports the fraction of reviews classified correctly.
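
For intuition about the loss, binary cross-entropy can be written out by hand. The function below is a hypothetical NumPy re-implementation for illustration, with a small clipping constant `eps` chosen here as an assumption to avoid `log(0)`; it is not the Keras code.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))


# Same two labels, scored with confident-correct and confident-wrong predictions
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(f"{confident_right:.4f} {confident_wrong:.4f}")  # 0.1054 2.3026
```

The loss punishes confident mistakes far more heavily than it rewards confident correct answers, which is what pushes the sigmoid output toward calibrated probabilities.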

#### Step 5: Train the Model

```python
model.fit(x_train, y_train, epochs=5, batch_size=batch_size,
          validation_data=(x_test, y_test))
```

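- Trains the model for 5 epochs with a batch size of 32.

- Uses training data (`x_train`, `y_train`) and validates on test data (`x_test`, `y_test`) after each epoch. Because the same test set serves as validation data here, the final epoch's `val_accuracy` equals the accuracy reported in Step 6; holding out part of the training data instead (for example with `validation_split`) would keep the test set untouched until the final evaluation.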
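
The number of weight updates implied by these settings follows from simple arithmetic, sketched below with the dataset size stated in Step 1. The variable names are only for illustration.

```python
import math

num_train = 25000   # training reviews in the IMDB dataset
batch_size = 32
epochs = 5

steps_per_epoch = math.ceil(num_train / batch_size)               # batches per epoch
last_batch_size = num_train - (steps_per_epoch - 1) * batch_size  # size of the final, partial batch
total_updates = steps_per_epoch * epochs                          # gradient steps over the whole run

print(steps_per_epoch, last_batch_size, total_updates)  # 782 8 3910
```

This is where the `782/782` counter in the Keras progress bar comes from.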

#### Step 6: Evaluate the Model

```python
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")
```

- Evaluates the model on the test dataset and prints the test accuracy, showing
performance on unseen data.
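
The accuracy figure itself is just the share of sigmoid outputs that land on the correct side of a threshold. The sketch below uses made-up probabilities and a hypothetical helper `accuracy_from_probs`, assuming a 0.5 threshold.

```python
import numpy as np


def accuracy_from_probs(y_true, y_prob, threshold=0.5):
    """Turn probabilities into 0/1 labels and return the fraction that match y_true."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))


labels = [1, 0, 1, 1, 0]                 # made-up ground truth
probs = [0.92, 0.30, 0.41, 0.77, 0.08]   # made-up model outputs
acc = accuracy_from_probs(labels, probs)
print(f"Accuracy: {acc:.2f}")  # Accuracy: 0.80
```

The third review is positive but scores 0.41, so it is counted as a miss; the other four are correct.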

---

## Expected Output

After training for 5 epochs, the output might look like this:

```
Epoch 1/5
782/782 [==============================] - 35s 45ms/step - loss: 0.6500 - accuracy: 0.6000 - val_loss: 0.5500 - val_accuracy: 0.7000
Epoch 2/5
782/782 [==============================] - 32s 41ms/step - loss: 0.4500 - accuracy: 0.8000 - val_loss: 0.4000 - val_accuracy: 0.8200
Epoch 3/5
782/782 [==============================] - 32s 41ms/step - loss: 0.3000 - accuracy: 0.8800 - val_loss: 0.3500 - val_accuracy: 0.8500
Epoch 4/5
782/782 [==============================] - 32s 41ms/step - loss: 0.2000 - accuracy: 0.9200 - val_loss: 0.3200 - val_accuracy: 0.8600
Epoch 5/5
782/782 [==============================] - 32s 41ms/step - loss: 0.1200 - accuracy: 0.9500 - val_loss: 0.3100 - val_accuracy: 0.8700
Test Accuracy: 0.8700
```

The model typically achieves a test accuracy of around 85–87%, meaning it correctly
classifies roughly 85 to 87 of every 100 test reviews as positive or negative. Exact
figures vary from run to run because the weights are initialized randomly.
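
One way to read such a log is to track the gap between training and validation accuracy. The snippet below does this arithmetic on the illustrative figures from the sample output; a gap that turns positive and keeps growing is a common sign of overfitting.

```python
# (accuracy, val_accuracy) per epoch, copied from the illustrative log above
history = [(0.60, 0.70), (0.80, 0.82), (0.88, 0.85), (0.92, 0.86), (0.95, 0.87)]

# Positive gap: the model does better on training data than on validation data
gaps = [round(train - val, 2) for train, val in history]
print(gaps)  # [-0.1, -0.02, 0.03, 0.06, 0.08]

# First epoch (1-based) where training accuracy overtakes validation accuracy
first_overfit_epoch = next(i for i, g in enumerate(gaps, start=1) if g > 0)
print(first_overfit_epoch)  # 3
```

In this sample run the gap widens from epoch 3 onward while validation accuracy improves only slowly, which suggests that training for many more epochs would mostly increase memorization of the training set.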

---

## Conclusion

- Successfully implemented an RNN for IMDB movie review classification.

- Used word embeddings to numerically represent text data, enabling sequence processing.

- The model effectively learns sentiment patterns, achieving good accuracy on the
test set.

This experiment demonstrates the power of RNNs in handling sequential data like text
for sentiment analysis tasks.
