TensorFlow之循环神经网络&自然语言处理学习总结_以cos函数加一定噪声为标准函数生成训练数据集与测试数据集,使用循环神经网络预测-CSDN博客

本文链接：https://blog.csdn.net/jliang3/article/details/88318372

作者：jliang

https://blog.csdn.net/jliang3

junliang 20190303

说明：以下所有代码使用版本TensorFlow1.4.0或1.12.0版本

import tensorflow as tf
print(tf.__version__)

1.12.0

8. 循环神经网络

TensorFlow中实现LSTM结构的循环神经网络的前向传播过程

BasicLSTMCell类提供了zero_state函数来生成全零状态。
state是一个包含两个张量的LSTMStateTuple类，其中state.c和state.h分别对应c状态和h状态。
和其他神经网络类似，在优化循环神经网络时，每次也会使用一个batch的训练样本。


# LSTM中使用的变量也会在函数中自动被声明
lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_hidden_size)

# 将LSTM中的状态初始化为全0数组。BasicLSTMCell类提供了zero_state函数来生成全零状态。
state = lstm.zero_state(batch_size, tf.float32)

# 定义损失函数
loss = 0.0

# 虽然在测试时循环神经网络可以处理任意长度的序列，但是在训练中为了将循环网络展开成前馈神经网络，
# 我们需要知道训练数据的序列长度。
# 以下使用num_steps来表示这个长度。
# 第9章中将介绍使用dynamic_rnn动态处理变长序列的方法。
for i in range(num_steps):
    # 在第一个时刻声明LSTM结构中使用的变量，在之后的时刻都需要复用之前定义好的变量。
    if i > 0: tf.get_variable_scope().reuse_variables()
        
    # 每一步处理时间序列中的一个时刻，将当前输入current_input
    # 和前一个时刻state（h和c）传入定义的LSTM结构
    # 可以得到当前的LSTM的输出lstm_output(h)和更新后状态state(h和c)
    # lstm_output用于输出给其他层，state用于输出给下一时刻，它们在dropout等方面可以有不同的处理方式。
    lstm_output, state = lstm(current_input, state)
    
    # 把当前时刻LSTM结构输出传入一个全连接层得到最后的输出。
    final_output = fully_connected(lstm_output)
    
    # 计算当前时刻的输出损失
    loss += calc_loss(final_output, expected_output)

8.3 循环神经网络的变种

在经典的循环神经网络中，状态的传输是从前往后单向的。然而，有些问题中当前时刻的输出不仅和之前的状态有关系，也和之后的状态有关系，这是就需要使用双向循环神经网络来解决这类问题。
如：预测一个语句中缺失的单词不仅需要根据前文来判断，也需要根据后文来判断。

双向循环神经网络时由两个独立的循环神经网络叠加在一起组成，输出由两个循环神经网络的输出拼接而成。
每一层网络中的循环体可以自由选用任意结构，如RNN、LSTM。

深层循环神经网络

为了增强模型的表达能力，可以在网络中设置多个循环层，将每层循环网络的输出传给下一层进行处理。

TensorFlow提供了MultiRNNCell类来实现深层循环神经网络的前向传播过程
只需要在BasicLSTMCell的基础上再封装一层MultiRNNCell就可以非常容易地实现深层循环神经网络

# 定义一个基本的LSTM结构作为循环体的基础结构
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell

# 通过MultiRNNCell类实现深层循环神经网络中每一个时刻的前向传播过程。
# number_of_layers表示有多少层
# 注意：从TensorFlow1.1版本起，不能使用[lstm_cell(lstm_size)] * N的形式来初始化MultiRNNCell，
# 否则TensorFlow会在每一层之间共享参数。
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(
    [lstm_cell(lstm_size) for _ in range(number_of_layers)]
)

# 和经典的循环神经网络一样，可以通过zero_state来获取初始状态
state = stacked_lstm.zero_state(batch_size, tf.float32)

# 计算每一时刻的前向传播结果
for i in range(len(num_steps)):
    if i > 0: tf.get_variable_scope().reuse_variables()
        
    stacked_lstm_output, state = stacked_lstm(current_input, state)
    final_output = fully_connected(stacked_lstm_output)
    loss += calc_loss(final_output, expected_output)

循环神经网络的dropout

通过dropout，可以让卷积神经网络更加健壮，类似，在循环神经网络中使用dropout也有同样的功能。
循环神经网络一般只在不同层循环体结构中使用dropout，而不在同一层的循环体结构之间使用（不同时刻之间不使用）
TensorFlow中使用tf.nn.rnn_cell.DropoutWrapper类可以很容易实现dropout功能

# 定义LSTM结构
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell

# 使用DropoutWrapper类实现dropout功能。该类通过两个参数来控制dropout的概率，
# 一个参数为Input_keep_prob，可以控制输入的dropout概率；另一个为output_keep_prob，它可以用来控制输出的dropout概率。
stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.DropoutWrapper(lstm_cell(lstm_size)) for _ in range(number_of_layers)]
)

...

8.5 循环神经网络样例应用

利用循环神经网络实现函数sinx取值的预测

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# 1. 定义RNN的参数。
HIDDEN_SIZE = 30                            # LSTM中隐藏节点的个数。
NUM_LAYERS = 2                              # LSTM的层数。
TIMESTEPS = 10                              # 循环神经网络的训练序列长度。
TRAINING_STEPS = 10000                      # 训练轮数。
BATCH_SIZE = 32                             # batch大小。
TRAINING_EXAMPLES = 10000                   # 训练数据个数。
TESTING_EXAMPLES = 1000                     # 测试数据个数。
SAMPLE_GAP = 0.01                           # 采样间隔。

# 2. 产生正弦数据。
def generate_data(seq):
    X = []
    y = []
    # 序列的第i项和后面的TIMESTEPS-1项合在一起作为输入；第i + TIMESTEPS项作为输
    # 出。即用sin函数前面的TIMESTEPS个点的信息，预测第i + TIMESTEPS个点的函数值。
    for i in range(len(seq) - TIMESTEPS):
        X.append([seq[i: i + TIMESTEPS]])
        y.append([seq[i + TIMESTEPS]])
    return np.array(X, dtype=np.float32), np.array(y, dtype=np.float32)  

# 用正弦函数生成训练和测试数据集合。
test_start = (TRAINING_EXAMPLES + TIMESTEPS) * SAMPLE_GAP
test_end = test_start + (TESTING_EXAMPLES + TIMESTEPS) * SAMPLE_GAP
train_X, train_y = generate_data(np.sin(np.linspace(
    0, test_start, TRAINING_EXAMPLES + TIMESTEPS, dtype=np.float32)))
test_X, test_y = generate_data(np.sin(np.linspace(
    test_start, test_end, TESTING_EXAMPLES + TIMESTEPS, dtype=np.float32)))

# 3. 定义网络结构和优化步骤。
def lstm_model(X, y, is_training):
    # 使用多层的LSTM结构。
    cell = tf.nn.rnn_cell.MultiRNNCell([
        tf.nn.rnn_cell.BasicLSTMCell(HIDDEN_SIZE) 
        for _ in range(NUM_LAYERS)])    

    # 使用TensorFlow接口将多层的LSTM结构连接成RNN网络并计算其前向传播结果。
    outputs, _ = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
    output = outputs[:, -1, :]

    # 对LSTM网络的输出再做加一层全链接层并计算损失。注意这里默认的损失为平均
    # 平方差损失函数。
    predictions = tf.contrib.layers.fully_connected(
        output, 1, activation_fn=None)
    
    # 只在训练时计算损失函数和优化步骤。测试时直接返回预测结果。
    if not is_training:
        return predictions, None, None
        
    # 计算损失函数。
    loss = tf.losses.mean_squared_error(labels=y, predictions=predictions)

    # 创建模型优化器并得到优化步骤。
    train_op = tf.contrib.layers.optimize_loss(
        loss, tf.train.get_global_step(),
        optimizer="Adagrad", learning_rate=0.1)
    return predictions, loss, train_op

# 4. 定义测试方法。
def run_eval(sess, test_X, test_y):
    # 将测试数据以数据集的方式提供给计算图。
    ds = tf.data.Dataset.from_tensor_slices((test_X, test_y))
    ds = ds.batch(1)
    X, y = ds.make_one_shot_iterator().get_next()
    
    # 调用模型得到计算结果。这里不需要输入真实的y值。
    with tf.variable_scope("model", reuse=True):
        prediction, _, _ = lstm_model(X, [0.0], False)
    
    # 将预测结果存入一个数组。
    predictions = []
    labels = []
    for i in range(TESTING_EXAMPLES):
        p, l = sess.run([prediction, y])
        predictions.append(p)
        labels.append(l)

    # 计算rmse作为评价指标。
    predictions = np.array(predictions).squeeze()
    labels = np.array(labels).squeeze()
    rmse = np.sqrt(((predictions - labels) ** 2).mean(axis=0))
    print("Root Mean Square Error is: %f" % rmse)
    
    #对预测的sin函数曲线进行绘图。
    plt.figure()
    plt.plot(predictions, label='predictions')
    plt.plot(labels, label='real_sin')
    plt.legend()
    plt.show()
    
# 5. 执行训练和测试。
# 将训练数据以数据集的方式提供给计算图。
ds = tf.data.Dataset.from_tensor_slices((train_X, train_y))
ds = ds.repeat().shuffle(1000).batch(BATCH_SIZE)
X, y = ds.make_one_shot_iterator().get_next()

# 定义模型，得到预测结果、损失函数，和训练操作。
with tf.variable_scope("model"):
    _, loss, train_op = lstm_model(X, y, True)
    
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # 测试在训练之前的模型效果。
    print("Evaluate model before training.")
    run_eval(sess, test_X, test_y)
    
    # 训练模型。
    for i in range(TRAINING_STEPS):
        _, l = sess.run([train_op, loss])
        if i % 1000 == 0:
            print("train step: " + str(i) + ", loss: " + str(l))
    
    # 使用训练好的模型对测试数据进行预测。
    print("Evaluate model after training.")
    run_eval(sess, test_X, test_y)

WARNING:tensorflow:From <ipython-input-4-b027e70174db>:39: BasicLSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is deprecated, please use tf.nn.rnn_cell.LSTMCell, which supports all the feature this cell currently has. Please replace the existing code with tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell').
Evaluate model before training.
Root Mean Square Error is: 0.681598

train step: 0, loss: 0.4930264
train step: 1000, loss: 0.0015030965
...
train step: 9000, loss: 3.4491877e-06
Evaluate model after training.
Root Mean Square Error is: 0.001859

9.自然语言处理

利用循环神经网络来搭建自然语言处理方面的一些经典应用，如语言模型、机器翻译等。

9.1语言模型的背景知识

语言模型：假设一门语言中所有可能的句子服从某一个概率分布，每个句子出现的概率加起来为1，那么语言模型的任务就是预测每个句子在语言中出现的概率。

对于语言中常见的句子，一个好的语言模型应得出相对较高的概率；而对于不合语法的句子，计算出的概率则应接近零。
语言模型仅仅对句子出现的概率进行建模，并不尝试去理解句子的内容含义。
神经网络机器翻译的Seq2Seq模型可以看作是一个条件语言模型（Conditional Language Model），它相当于在给定输入的情况下对目标语言的所有句子估算概率，并选座其中概率最大的句子作为输出。
常见的方法有：n-gram模型、决策树、最大熵模型、条件随机场、神经网络语言模型等。

语言模型的评价方法：语言模型效果好坏的常用评价指标是复杂度（perplexity）。在测试集上perplexity越低，效果越好。

perplexity值刻画的是语言模型预测一个语言样本的能力。比如已经知道(w1,w2,...wm)这句话会出现在语料库中，那么通过语言模型计算得到这句子的概率越高，说明语言模型对这个语料库拟合得越好。
perplexity实际是计算每一个单词得到的概率倒数的几何平均，因此perplexity可以理解为平均分支系数，即模型预测下一个词时的平均可选择数量。
目前在PTB(Penn Tree Bank)数据集上最好的语言模型perplexity为47.7，即在平均情况下，该模型预测下一个词时，有47.7个词等可能地作为下一个词的合理选择。
在神经网络模型中，p(wi|w1,w2,...wi-1)分布通常是由一个softmax层产生的，这时TensorFlow中提供了两个方便计算交叉熵的函数
- tf.nn.softmax_cross_entropy_with_logits
- tf.nn.sparse_softmax_cross_entropy_with_logits

tf.nn.softmax_cross_entropy_with_logits与tf.nn.sparse_softmax_cross_entropy_with_logits的区别

由于softmax_cross_entropy_with_logits允许提供一个概率分布，因此在使用时有更大的自由度。
举个例子：一种叫label smoothing的技巧是将正确数据的概率设为一个比1.0略小的值，将错误数据的概率设为比0.0略大的值，这样可以避免模型与数据过拟合，在某些时候可以提高训练效果。

# 假设词汇表的大小为3（即整个语料库只有3个单词），语料包含两个单词“2 0”
word_labels = tf.constant([2, 0])

# 假设模型对两个单词预测时，产生的logit分别是[2.0, -1.0, 3.0]和[1.0, 0.0, -0.5]
# 注意这里的logit不是概率，因此它们不是0.0~1.0之间的数字。
# 如果需要计算概率，则需要调用prop=tf.nn.softmax(logits)。但这里计算交叉熵的函数直接输入logits即可。
predict_logits = tf.constant([[2.0, -1.0, 3.0], [1.0, 0.0, -0.5]])

# 使用tf.nn.sparse_softmax_cross_entropy_with_logits计算交叉熵
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=word_labels, logits=predict_logits)
with tf.Session() as sess:
    print(sess.run(loss))

    # softmax_cross_entropy_with_logits与上面类似，但是需要将预测目标以概率分布的形式给出。
    word_prob_distribution = tf.constant([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
    loss = tf.nn.softmax_cross_entropy_with_logits(
        labels=word_prob_distribution, logits=predict_logits
    )
    print(sess.run(loss))
    
    # 由于softmax_cros