How torch.nn.Bilinear computes its output
1. Official documentation for torch.nn.Bilinear
2. Author's explanation
To instantiate the torch.nn.Bilinear class, write Bilinear_layer = nn.Bilinear(in1_features, in2_features, out_features).
Calling the Bilinear_layer object takes two inputs, $x_1$ and $x_2$, for example output = Bilinear_layer(input1, input2), where input1.shape is (batch_size, in1_features) and input2.shape is (batch_size, in2_features).
The weight W of the bilinear layer has shape (out_features, in1_features, in2_features);
the bias B of the bilinear layer has shape (out_features);
$x_1$ has shape (batch_size, in1_features);
$x_2$ has shape (batch_size, in2_features);
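The shapes listed above can be checked directly on a layer instance. The following is a minimal sketch; the sizes and the variable names (layer, x1, x2) are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen only for illustration.
batch_size, in1_features, in2_features, out_features = 4, 3, 5, 2

layer = nn.Bilinear(in1_features, in2_features, out_features)
x1 = torch.randn(batch_size, in1_features)
x2 = torch.randn(batch_size, in2_features)
y = layer(x1, x2)

print(layer.weight.shape)  # W: (out_features, in1_features, in2_features)
print(layer.bias.shape)    # B: (out_features,)
print(y.shape)             # y: (batch_size, out_features)
```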
The bilinear transformation $y = x_1^T W x_2 + B$ is computed as follows:
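- Loop over dimension 0 of W (out_features); for each index k, perform the next two steps;
- Matrix-multiply $x_1$, of shape (batch_size, in1_features), by the slice W[k], of shape (in1_features, in2_features), to get (batch_size, in2_features), then multiply the result element-wise by $x_2$, of shape (batch_size, in2_features), which again gives (batch_size, in2_features);
- Sum that (batch_size, in2_features) result over the in2_features axis (axis 1) to get a vector of shape (batch_size);
- After the loop there are out_features vectors of shape (batch_size); stack them as columns to form (batch_size, out_features), then add the bias to get the output $y$.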
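Written per element, the steps above amount to

$$y_{b,k} = \sum_{i}\sum_{j} x_{1,b,i}\, W_{k,i,j}\, x_{2,b,j} + B_k$$

where b indexes the batch, k the output feature, i the in1 feature and j the in2 feature. As a rough sketch (not necessarily how PyTorch implements the layer internally), the same contraction can be written in one call to torch.einsum and compared against the layer's output. The variable names and the tolerance are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Bilinear(10, 20, 30)
x1 = torch.randn(128, 10)
x2 = torch.randn(128, 20)

with torch.no_grad():
    expected = layer(x1, x2)
    # b: batch, k: output feature, i: in1 feature, j: in2 feature
    manual = torch.einsum("bi,kij,bj->bk", x1, layer.weight, x2) + layer.bias

print(manual.shape)
# The tolerance is an assumption; float32 rounding differs between the two paths.
print(torch.allclose(manual, expected, atol=1e-4))
```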
3. Example
import torch
import torch.nn as nn
import numpy as np
m = nn.Bilinear(10, 20, 30)
input1 = torch.randn(128, 10)
input2 = torch.randn(128, 20)
output = m(input1, input2)
print(output.size())
arr_output = output.data.cpu().numpy()
# Copy the data and recompute y using the procedure described above
weight = m.weight.data.cpu().numpy()
bias = m.bias.data.cpu().numpy()
x1 = input1.data.cpu().numpy()
x2 = input2.data.cpu().numpy()
print(x1.shape,weight.shape,x2.shape,bias.shape)
y = np.zeros((x1.shape[0],weight.shape[0]))
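for k in range(weight.shape[0]):
    buff = np.dot(x1, weight[k])   # (128, 10) @ (10, 20) -> (128, 20)
    buff = buff * x2               # element-wise product, (128, 20)
    buff = np.sum(buff, axis=1)    # sum over in2_features, (128,)
    y[:, k] = buff
y += bias
# Compute the error against the layer's output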
dif = y - arr_output
print(np.mean(np.abs(dif.flatten())))
# Sample output (the error value varies from run to run because the inputs are random):
# torch.Size([128, 30])
# (128, 10) (30, 10, 20) (128, 20) (30,)
# 1.6828271327540277e-07