Introduction to LSTM
Read up on RNNs first.
LSTM uses "gates" (sigmoid functions) to address the RNN's inability to remember long-range information, and its additive cell-state update alleviates the RNN's exploding/vanishing gradient problem.
There are three gates in an LSTM:
1. Forget gate
The forget gate controls how much of the previous memory $C_{t-1}$ is kept.
2. Input gate
The input gate controls how much of the current input $x_t$ is written into memory.
After passing through the forget gate and the input gate, we obtain the current cell state $C_t$.
To clarify the notation: $f_t$ is the forget gate and $i_t$ is the input gate; $C_{t-1}$ plays a role analogous to $h_{t-1}W_{hh}$ in an RNN, and the candidate memory $\tilde{C}_t$ is analogous to $x_tW_{ih}$. The gates act as selectors, and the LSTM then passes the result through the output gate to obtain $h_t$.
3. Output gate
The output gate controls how much is exposed as the output $h_t$.
The simplified overall flow of the LSTM is captured by its update equations.
Update equations
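Written out in the standard formulation (with $\sigma$ the sigmoid, $\odot$ element-wise multiplication, and $[h_{t-1}, x_t]$ the concatenation of the previous hidden state and the current input), one LSTM step is:
$$f_t = \sigma(W_f[h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i[h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C[h_{t-1}, x_t] + b_C)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$o_t = \sigma(W_o[h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$
The additive form of the $C_t$ update (rather than repeated multiplication by a weight matrix, as in an RNN) is what allows gradients to flow across many time steps.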
Effects of the different gates being open or closed
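As a minimal numeric sketch (hypothetical gate values, not a trained LSTM): when the forget gate is near 1 and the input gate near 0, the old memory is carried through unchanged; flipping the two replaces the memory with the new candidate.
import torch

c_prev = torch.randn(3, 20)            # previous cell state C_{t-1}
c_tilde = torch.randn(3, 20)           # candidate memory ~C_t
f = torch.ones(3, 20)                  # forget gate fully open
i = torch.zeros(3, 20)                 # input gate fully closed
c_t = f * c_prev + i * c_tilde         # additive cell-state update
print(torch.allclose(c_t, c_prev))     # True: the memory passes through unchanged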
torch.nn.LSTM
Constructor arguments:
input_size: the feature dimension feature_len of each input vector (e.g. the embedding dimension)
hidden_size: the dimension of the LSTM hidden state
num_layers: the number of stacked LSTM layers, 1 by default
Forward pass of LSTM
out, (h_t, c_t) = lstm(x, (h_t0, c_t0))
x: the input data, with shape (seq_len, batch_size, feature_len)
h/c: the hidden/cell state for every layer, with shape (num_layers, batch_size, hidden_size)
out: the hidden state of the last layer at every time step, i.e. [h_1, h_2, …, h_t], with shape (seq_len, batch_size, hidden_size)
Verifying torch.nn.LSTM with code
import torch
lstm = torch.nn.LSTM(input_size=100, hidden_size=20, num_layers=4)
x = torch.randn(10, 3, 100)    # (seq_len, batch_size, feature_len)
h_0 = torch.zeros(4, 3, 20)    # (num_layers, batch_size, hidden_size)
c_0 = torch.zeros(4, 3, 20)
out, (h_t, c_t) = lstm(x, (h_0, c_0))
print(out.shape, h_t.shape, c_t.shape)
torch.Size([10, 3, 20]) torch.Size([4, 3, 20]) torch.Size([4, 3, 20])
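The shapes match the description above: out stacks the last layer's hidden state over all 10 time steps, while h_t and c_t hold the final-step hidden and cell states of each of the 4 layers.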
torch.nn.LSTMCell
The constructor arguments are the same as for LSTM, except there is no num_layers.
Forward pass of LSTMCell
h_t, c_t = lstmcell(x_t, (h_t0, c_t0))
x_t: a single time step of input, with shape (batch_size, feature_len)
h_t/c_t: the hidden/cell state, with shape (batch_size, hidden_size)
Verifying torch.nn.LSTMCell with code
import torch
lstmcell = torch.nn.LSTMCell(input_size=100, hidden_size=20)
x = torch.randn(10, 3, 100)    # (seq_len, batch_size, feature_len)
h_0 = torch.zeros(3, 20)       # (batch_size, hidden_size)
c_0 = torch.zeros(3, 20)
for x_t in x:                                # iterate over the seq_len dimension
    h_0, c_0 = lstmcell(x_t, (h_0, c_0))     # feed the updated state back in at every step
print(h_0.shape, c_0.shape)
torch.Size([3, 20]) torch.Size([3, 20])
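Note that the updated hidden and cell states must be fed back into the cell at every step; this manual loop over the seq_len dimension is what torch.nn.LSTM does internally for a single layer.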
import torch
lstmcell1 = torch.nn.LSTMCell(input_size=100, hidden_size=20)   # layer 1: 100 -> 20
lstmcell2 = torch.nn.LSTMCell(input_size=20, hidden_size=10)    # layer 2: 20 -> 10
x = torch.randn(10, 3, 100)
h_0 = torch.zeros(3, 20)   # state of layer 1
c_0 = torch.zeros(3, 20)
h_1 = torch.zeros(3, 10)   # state of layer 2
c_1 = torch.zeros(3, 10)
for x_t in x:
    h_0, c_0 = lstmcell1(x_t, (h_0, c_0))   # layer 1 consumes the raw input
    h_1, c_1 = lstmcell2(h_0, (h_1, c_1))   # layer 2 consumes layer 1's hidden state
print(h_1.shape, c_1.shape)
torch.Size([3, 10]) torch.Size([3, 10])
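Structurally this manual stack mirrors a 2-layer torch.nn.LSTM, except that nn.LSTM uses the same hidden_size for every layer, while separate LSTMCells are free to use different sizes (20 and 10 here).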