Week N9: Seq2Seq Translation in Practice, Reproduced in PyTorch
Posted: 2025-04-17 20:41:04
### Implementing a Seq2Seq Translation Model with PyTorch
#### Preparing the Environment and Dataset
Building an effective translation model starts with setting up the development environment and the training data. This usually means installing a compatible version of PyTorch along with the other dependencies, then downloading and preprocessing the target corpus.
For the English-Chinese translation task in this walkthrough, loading the predefined mapping tables `en2id` and `zh2id` is an important first step[^2]. These dictionaries turn tokens into integer indices that the neural network can work with.
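The post does not show how these mapping tables are used, so the snippet below is only a minimal sketch, assuming `en2id` and `zh2id` map tokens to integer ids and reserve entries for `<pad>`, `<sos>`, `<eos>`, and `<unk>`:
```python
import torch

# Minimal sketch (assumption): the vocab dicts map tokens to integer ids and
# contain special entries for <pad>, <sos>, <eos>, and <unk>.
def sentence_to_ids(tokens, vocab):
    unk = vocab["<unk>"]
    ids = [vocab["<sos>"]] + [vocab.get(tok, unk) for tok in tokens] + [vocab["<eos>"]]
    return torch.tensor(ids, dtype=torch.long)

# Example usage (illustrative):
# src = sentence_to_ids("i love you".split(), en2id)   # English side, word-level
# trg = sentence_to_ids(list("我爱你"), zh2id)           # Chinese side, character-level (assumed)
```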
#### Building the Encoder
```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.dropout = nn.Dropout(dropout)
        self.rnn = nn.LSTM(emb_dim, hid_dim, num_layers=n_layers,
                           bidirectional=True, batch_first=True)
        # The forward and backward final states are concatenated, then projected
        # down to a single decoder-sized hidden state.
        self.fc_out = nn.Linear(hid_dim * 2, hid_dim)

    def forward(self, src):
        # src: [batch, src_len]
        embedded = self.dropout(self.embedding(src))    # [batch, src_len, emb_dim]
        outputs, (hidden, cell) = self.rnn(embedded)    # outputs: [batch, src_len, hid_dim * 2]
        # hidden[-2] / hidden[-1] are the top layer's forward / backward states.
        hidden = torch.tanh(self.fc_out(torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)))
        return outputs, hidden, cell
```
This implements the encoder: the source sentence is embedded and passed through a (bidirectional, possibly multi-layer) LSTM to capture context. The projected final state is handed to the decoder to start generation, while the per-token outputs are later consumed by the attention mechanism[^1].
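As a quick shape check (not from the original post; the sizes are made up), the encoder can be run on a dummy batch:
```python
# Shape sanity check with illustrative sizes.
enc = Encoder(input_dim=1000, emb_dim=128, hid_dim=256, n_layers=1, dropout=0.1)
dummy_src = torch.randint(0, 1000, (4, 12))      # [batch=4, src_len=12]
enc_outputs, enc_hidden, enc_cell = enc(dummy_src)
print(enc_outputs.shape)   # torch.Size([4, 12, 512]) -> hid_dim * 2 (bidirectional)
print(enc_hidden.shape)    # torch.Size([4, 256])     -> projected decoder initial state
```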
#### Designing a Decoder with Attention (Attention Decoder)
```python
class AttentionDecoder(nn.Module):
    def __init__(self, output_dim, emb_dim, enc_hid_dim, dec_hid_dim, dropout, attention):
        super().__init__()
        self.output_dim = output_dim          # needed by the Seq2Seq wrapper below
        self.attention = attention
        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.dropout = nn.Dropout(dropout)
        # The GRU consumes the previous token's embedding concatenated with the
        # attention-weighted context vector over the encoder outputs.
        self.rnn = nn.GRU(enc_hid_dim * 2 + emb_dim, dec_hid_dim, batch_first=True)
        self.fc_out = nn.Linear(dec_hid_dim, output_dim)

    def forward(self, trg, encoder_outputs, hidden):
        # trg: [batch] (previous target token), hidden: [batch, dec_hid_dim]
        # encoder_outputs: [batch, src_len, enc_hid_dim * 2]
        attn_weights = self.attention(hidden, encoder_outputs)                   # [batch, src_len]
        context_vector = torch.bmm(attn_weights.unsqueeze(1), encoder_outputs)   # [batch, 1, enc_hid_dim * 2]
        embedded = self.dropout(self.embedding(trg)).unsqueeze(1)                # [batch, 1, emb_dim]
        rnn_input = torch.cat([embedded, context_vector], dim=-1)
        output, hidden = self.rnn(rnn_input, hidden.unsqueeze(0))
        prediction = self.fc_out(output.squeeze(1))                              # [batch, output_dim]
        return prediction, hidden.squeeze(0), attn_weights
```
The attention mechanism introduced here strengthens the model: at each decoding step it dynamically re-weights how much attention is paid to each source position, which helps capture long-distance dependencies[^4].
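The original post passes an `attention` module into the decoder without defining it. The additive (Bahdanau-style) scorer below is one possible sketch that matches the call signature used above; treat it as an assumption rather than the post's own implementation:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    def __init__(self, enc_hid_dim, dec_hid_dim):
        super().__init__()
        # Scores each encoder position against the current decoder hidden state.
        self.attn = nn.Linear(enc_hid_dim * 2 + dec_hid_dim, dec_hid_dim)
        self.v = nn.Linear(dec_hid_dim, 1, bias=False)

    def forward(self, hidden, encoder_outputs):
        # hidden:          [batch, dec_hid_dim]
        # encoder_outputs: [batch, src_len, enc_hid_dim * 2]
        src_len = encoder_outputs.shape[1]
        hidden = hidden.unsqueeze(1).repeat(1, src_len, 1)    # [batch, src_len, dec_hid_dim]
        energy = torch.tanh(self.attn(torch.cat((hidden, encoder_outputs), dim=2)))
        scores = self.v(energy).squeeze(2)                    # [batch, src_len]
        return F.softmax(scores, dim=1)                       # weights sum to 1 over source positions
```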
#### Defining the Complete Seq2Seq Framework
```python
import random

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.device = device

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        batch_size, max_len = trg.shape
        vocab_size = self.decoder.output_dim
        outputs = torch.zeros(batch_size, max_len, vocab_size).to(self.device)
        # Encode once: the per-token outputs feed the attention mechanism, and the
        # projected final hidden state initializes the decoder.
        encoder_outputs, decoder_hidden, _ = self.encoder(src)
        decoder_input = trg[:, 0]                  # <sos> tokens
        for t in range(1, max_len):
            output, decoder_hidden, _ = self.decoder(
                decoder_input, encoder_outputs, decoder_hidden)
            outputs[:, t] = output
            top1 = output.argmax(1)
            # Teacher forcing: sometimes feed the ground-truth token,
            # otherwise the model's own prediction.
            teacher_force = random.random() < teacher_forcing_ratio
            decoder_input = trg[:, t] if teacher_force else top1
        return outputs
```
The code above combines the encoder and the attention-based decoder into a complete Seq2Seq architecture. Applying teacher forcing during the decoding loop can, to a degree, speed up convergence and improve final performance[^3].
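To show how these pieces might fit together in training, here is a minimal training-loop sketch. The hyperparameters, `PAD_IDX`, `train_loader`, and the `Attention` class from the earlier sketch are illustrative assumptions, not part of the original post:
```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assumed: en2id / zh2id were loaded earlier and PAD_IDX is the padding token id.
INPUT_DIM, OUTPUT_DIM = len(en2id), len(zh2id)
ENC_HID = DEC_HID = 256
attention = Attention(ENC_HID, DEC_HID)
encoder = Encoder(INPUT_DIM, emb_dim=128, hid_dim=ENC_HID, n_layers=1, dropout=0.1)
decoder = AttentionDecoder(OUTPUT_DIM, emb_dim=128, enc_hid_dim=ENC_HID,
                           dec_hid_dim=DEC_HID, dropout=0.1, attention=attention)
model = Seq2Seq(encoder, decoder, device).to(device)

optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)   # ignore padding positions

def train_one_epoch(model, loader):
    model.train()
    total_loss = 0.0
    for src, trg in loader:                              # loader yields padded index tensors
        src, trg = src.to(device), trg.to(device)
        optimizer.zero_grad()
        outputs = model(src, trg, teacher_forcing_ratio=0.5)   # [batch, trg_len, vocab]
        # Skip position 0 (<sos>) when computing the loss.
        loss = criterion(outputs[:, 1:].reshape(-1, outputs.size(-1)),
                         trg[:, 1:].reshape(-1))
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

# Example usage (assuming a `train_loader` DataLoader exists):
# for epoch in range(10):
#     print(epoch, train_one_epoch(model, train_loader))
```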