RNN (Recurrent Neural Network) in Practice
Date: 2025-02-20 07:55:24
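### RNN Hands-On Project Examples
#### Building a Simple Character-Level Text Generator with TensorFlow
Building an RNN-based character-level text generator is a good introductory case for understanding how RNNs work. This example shows how to train a model to predict the next character that follows a given input sequence.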
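To make the "predict the next character" setup concrete, here is a minimal pure-Python sketch on a toy string. The string `"hello"` and the names `toy_text`, `toy_inputs`, `toy_targets`, and `toy_pairs` are illustrative assumptions rather than part of the original script. The only point is that the target sequence is the input sequence shifted by one character.

```python
# Toy corpus (hypothetical; the article's script reads shakespeare.txt instead).
toy_text = "hello"

# Map each distinct character to an integer id, in sorted order.
toy_vocab = sorted(set(toy_text))
toy_char2idx = {ch: i for i, ch in enumerate(toy_vocab)}
toy_encoded = [toy_char2idx[ch] for ch in toy_text]

# Input is everything but the last id; target is everything but the first.
toy_inputs = toy_encoded[:-1]
toy_targets = toy_encoded[1:]

# The same shift, shown as (current character, next character) pairs.
toy_pairs = [(toy_text[i], toy_text[i + 1]) for i in range(len(toy_text) - 1)]

print(toy_vocab)
print(toy_inputs, toy_targets)
print(toy_pairs)
```

At every time step the model sees the characters up to position `i` and is trained to output the character at position `i + 1`.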
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Embedding


def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    """Embedding -> stateful LSTM -> per-character logits."""
    model = Sequential([
        # A stateful LSTM needs a fixed batch size; the sequence length stays variable.
        # (Older Keras releases accepted batch_input_shape on the Embedding layer instead.)
        tf.keras.Input(batch_shape=(batch_size, None)),
        Embedding(vocab_size, embedding_dim),
        LSTM(rnn_units, return_sequences=True, stateful=True),
        Dense(vocab_size),  # raw logits, one score per vocabulary entry
    ])
    return model


# Load the corpus and build character <-> integer lookups.
with open('shakespeare.txt', 'rb') as f:
    text = f.read().decode(encoding='utf-8')
vocab = sorted(set(text))
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])

# Cut the text into chunks of seq_length + 1 characters.
seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)


def split_input_target(chunk):
    """Shift each chunk by one character so the target is the next character."""
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text


dataset = sequences.map(split_input_target)

BATCH_SIZE = 64
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

VOCAB_SIZE = len(vocab)
EMBEDDING_DIM = 256
RNN_UNITS = 1024
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)

# from_logits=True because the final Dense layer applies no softmax.
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)

# Sanity check: loss of the untrained model on a single batch.
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    example_batch_loss = loss(target_example_batch, example_batch_predictions)
    print(f'Loss: {np.mean(example_batch_loss.numpy())}')

model.compile(optimizer='adam', loss=loss)
EPOCHS = 10
history = model.fit(dataset, epochs=EPOCHS)
```
This code builds a simple character-level language model that can learn to imitate Shakespeare-style text[^2].
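The script above stops at training; to actually produce text, the logits from the final `Dense` layer have to be turned into a character choice. One common approach, which the original article does not show, is temperature sampling. The following is a minimal standard-library sketch under that assumption: the helpers `softmax_with_temperature` and `sample_next_char`, the four-character vocabulary, and the logit values are all hypothetical and do not come from a trained model.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_char(logits, vocab, temperature=1.0, rng=random):
    """Draw one character from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical logits for a 4-character vocabulary (not real model output).
toy_vocab = ['a', 'b', 'c', 'd']
toy_logits = [2.0, 0.5, 0.1, -1.0]

probs_sharp = softmax_with_temperature(toy_logits, temperature=0.1)
probs_flat = softmax_with_temperature(toy_logits, temperature=10.0)
next_char = sample_next_char(toy_logits, toy_vocab, temperature=0.5,
                             rng=random.Random(0))
```

With a low temperature almost all probability mass lands on the highest-scoring character, so the output is repetitive but safe; with a high temperature the distribution flattens and the output becomes more varied and more error-prone. In a full generation loop the sampled character would be fed back into the model as the next input.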
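#### Sequence Classification Task: Sentiment Analysis
Another common application is sentiment analysis in natural language processing. The example below uses the IMDB movie review dataset to classify reviews as positive or negative: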
```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM

max_features = 20000  # keep only the 20,000 most frequent words
maxlen = 80           # truncate or pad every review to this length
batch_size = 32       # mini-batch size

# Reviews arrive as lists of word indices; labels are 0 (negative) or 1 (positive).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(
    num_words=max_features)
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)

# Embedding -> LSTM -> single sigmoid unit for binary classification.
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))

score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print(f'Test score: {score}')
print(f'Test accuracy: {acc}')
```
The script above performs binary sentiment classification of the movie reviews in the IMDb dataset.
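The `maxlen = 80` step relies on `pad_sequences`, which by default both pads and truncates at the front of each sequence. The sketch below mimics that default behavior in plain Python so the effect is easy to see. The helper name `pad_or_truncate` and the toy token ids are hypothetical, and the helper assumes `maxlen` is a positive integer.

```python
def pad_or_truncate(tokens, maxlen, value=0):
    """Mimic the pad_sequences defaults: keep the last `maxlen` tokens, left-pad with `value`."""
    kept = list(tokens)[-maxlen:]                 # drop tokens from the front if too long
    return [value] * (maxlen - len(kept)) + kept  # pad at the front if too short


short_review = [11, 12, 13]        # shorter than maxlen
long_review = list(range(1, 11))   # ten token ids, longer than maxlen

padded = pad_or_truncate(short_review, maxlen=5)
trimmed = pad_or_truncate(long_review, maxlen=5)

print(padded)   # [0, 0, 11, 12, 13]
print(trimmed)  # [6, 7, 8, 9, 10]
```

In other words, with the default settings a long review contributes only its final 80 tokens to the model, while a short one is prefixed with zeros.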