This article focuses on the code implementation of the network; for the full formula derivations, see: https://zhuanlan.zhihu.com/p/61898234
Complete code: https://github.com/hui126/Deep_Learning_Coding/blob/main/Conv.py
A convolutional neural network can be viewed as an extension of the perceptron network: the number of neurons corresponds to the number of feature-map channels, and the input to the network changes from a vector to a tensor. The biggest difference from the perceptron network is weight sharing, i.e., the convolution for each output channel reuses a single kernel across all spatial positions.
Forward pass
Working in NumPy, suppose the input feature map $a$ has shape $(1, 3, 4, 4)$, the kernel $w$ has shape $(2, 3, 2, 2)$, and the stride is $(1, 1)$:
$$
a[0,0,:,:] = \left[\begin{matrix}
1 & 6 & 6 & 2 \\ 4 & 3 & 4 & 3 \\ 3 & 6 & 7 & 7 \\ 1 & 5 & 1 & 2
\end{matrix}\right],\quad a[0,1,:,:] = \left[\begin{matrix}
9 & 1 & 6 & 7 \\ 5 & 3 & 2 & 6 \\ 5 & 7 & 1 & 7 \\ 6 & 8 & 5 & 8
\end{matrix}\right],\quad a[0,2,:,:] = \left[\begin{matrix}
8 & 6 & 9 & 5 \\ 4 & 6 & 1 & 6 \\ 2 & 3 & 3 & 8 \\ 3 & 5 & 3 & 6
\end{matrix}\right]
$$
The kernels are
$$
w[0,0,:,:] = \left[\begin{matrix} 5 & 9 \\ 5 & 8\end{matrix}\right],\quad
w[0,1,:,:] = \left[\begin{matrix} 1 & 1 \\ 8 & 8\end{matrix}\right],\quad
w[0,2,:,:] = \left[\begin{matrix} 7 & 1 \\ 2 & 8\end{matrix}\right] \\
w[1,0,:,:] = \left[\begin{matrix} 5 & 6 \\ 9 & 3\end{matrix}\right],\quad
w[1,1,:,:] = \left[\begin{matrix} 2 & 1 \\ 9 & 1\end{matrix}\right],\quad
w[1,2,:,:] = \left[\begin{matrix} 8 & 4 \\ 3 & 6\end{matrix}\right]
$$
and the bias is $b = [1, 2]$.
The output feature map is then
$$
z[i,j,:,:] = \sum^{2}_{k=0}a[i,k,:,:]*w[j,k,:,:] + b[j]
$$
which evaluates to:
$$
z[0,0,:,:] = \left[\begin{matrix}
296 & 250 & 288 \\ 277 & 280 & 294 \\ 302 & 297 & 315
\end{matrix}\right],\quad z[0,1,:,:] = \left[\begin{matrix}
291 & 252 & 263 \\ 230 & 267 & 239 \\ 223 & 283 & 257
\end{matrix}\right]
$$
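These numbers can be reproduced with a few lines of NumPy; a quick sanity check (the array literals transcribe the matrices above, and the loop implements the channel sum directly):

```python
import numpy as np

a = np.array([[[1, 6, 6, 2], [4, 3, 4, 3], [3, 6, 7, 7], [1, 5, 1, 2]],
              [[9, 1, 6, 7], [5, 3, 2, 6], [5, 7, 1, 7], [6, 8, 5, 8]],
              [[8, 6, 9, 5], [4, 6, 1, 6], [2, 3, 3, 8], [3, 5, 3, 6]]])[None]  # (1, 3, 4, 4)
w = np.array([[[[5, 9], [5, 8]], [[1, 1], [8, 8]], [[7, 1], [2, 8]]],
              [[[5, 6], [9, 3]], [[2, 1], [9, 1]], [[8, 4], [3, 6]]]])          # (2, 3, 2, 2)
b = np.array([1, 2])

z = np.zeros((1, 2, 3, 3))
for j in range(2):           # output channels
    for p in range(3):       # output rows
        for q in range(3):   # output columns
            z[0, j, p, q] = np.sum(a[0, :, p:p+2, q:q+2] * w[j]) + b[j]
print(z[0, 0])               # matches z[0,0,:,:] above, e.g. z[0,0,0,0] == 296
```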
To make the gradient backpropagation easier to compute, we transform the kernel and the input feature map so that the convolution becomes a matrix multiplication. The kernel is reshaped into a $(3 \cdot 2 \cdot 2, 2)$ matrix,
$$
w_t = w.reshape(2, -1).T = \left[\begin{matrix}
5 & 9 & 5 & 8 & 1 & 1 & 8 & 8 & 7 & 1 & 2 & 8 \\
5 & 6 & 9 & 3 & 2 & 1 & 9 & 1 & 8 & 4 & 3 & 6
\end{matrix}\right]^T
$$

(Here $reshape(2, -1)$ lays each kernel $w[j]$ out as a row, so after the transpose, column $j$ of $w_t$ is the flattened kernel $w[j]$.)
and the input feature map is transformed into a $(1, 9, 12)$ matrix,
$$
a_t[0]=\left[\begin{matrix}
1 & 6 & 4 & 3 & 9 & 1 & 5 & 3 & 8 & 6 & 4 & 6 \\
6 & 6 & 3 & 4 & 1 & 6 & 3 & 2 & 6 & 9 & 6 & 1 \\
6 & 2 & 4 & 3 & 6 & 7 & 2 & 6 & 9 & 5 & 1 & 6 \\
4 & 3 & 3 & 6 & 5 & 3 & 5 & 7 & 4 & 6 & 2 & 3 \\
3 & 4 & 6 & 7 & 3 & 2 & 7 & 1 & 6 & 1 & 3 & 3 \\
4 & 3 & 7 & 7 & 2 & 6 & 1 & 7 & 1 & 6 & 3 & 8 \\
3 & 6 & 1 & 5 & 5 & 7 & 6 & 8 & 2 & 3 & 3 & 5 \\
6 & 7 & 5 & 1 & 7 & 1 & 8 & 5 & 3 & 3 & 5 & 3 \\
7 & 7 & 1 & 2 & 1 & 7 & 5 & 8 & 3 & 8 & 3 & 6
\end{matrix}\right]
$$
Then $z_t[0] = a_t[0] w_t$, where each column of $z_t[0]$ holds one channel of the first output feature map, so $z = z_t.transpose([0, 2, 1]).reshape(1, 2, 3, 3) + b.reshape(1, -1, 1, 1)$.
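Continuing with `a`, `w`, and `b` from the snippet above, the matrix route gives the same result; a sketch of the transform (not the repo's exact code):

```python
a_t = np.empty((1, 9, 12))
ind = 0
for h in range(3):
    for v in range(3):
        # flatten each (3, 2, 2) window in (channel, row, col) order
        a_t[:, ind, :] = a[:, :, h:h+2, v:v+2].reshape(1, -1)
        ind += 1

w_t = w.reshape(2, -1).T                      # (12, 2): column j is kernel w[j] flattened
z_t = a_t @ w_t                               # (1, 9, 2)
z2 = z_t.transpose([0, 2, 1]).reshape(1, 2, 3, 3) + b.reshape(1, -1, 1, 1)
print(np.allclose(z, z2))                     # True
```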
Notation: let the number of neurons in each layer (the number of output feature-map channels) be $n^l$, with $L$ layers in total, where $n^0$ is the number of channels of the input image.
$a^{l-1}$: input feature map of the $l$-th convolutional layer, with shape $(B, C^{l-1}, H^{l-1}, W^{l-1})$;
$z^l$: convolution output before the activation function, with shape $(B, C^l, H^l, W^l)$;
$h(z)$: activation function;
$w^l$: convolution kernels, with shape $(C^l, C^{l-1}, h^l, w^l)$;
$b^l$: bias, with shape $(C^l,)$.
The convolution output is
$$
z^{l}[i,j,:,:] = \sum^{C^{l-1}-1}_{k=0}a^{l-1}[i,k,:,:]*w^l[j,k,:,:] + b^l[j] \quad i=0,\cdots,B-1,\; j=0,\cdots,C^l-1
$$
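As a reference implementation of this formula, a minimal direct-loop sketch (an illustrative helper, independent of the im2col version implemented below):

```python
import numpy as np

def conv_forward_naive(a, w, b, stride=(1, 1)):
    # direct translation of the formula above (a cross-correlation)
    B, C_in, H, W = a.shape
    C_out, _, kh, kw = w.shape
    H_out = (H - kh) // stride[0] + 1
    W_out = (W - kw) // stride[1] + 1
    z = np.empty((B, C_out, H_out, W_out))
    for i in range(B):
        for j in range(C_out):
            for p in range(H_out):
                for q in range(W_out):
                    hs, ws = p * stride[0], q * stride[1]
                    z[i, j, p, q] = np.sum(a[i, :, hs:hs+kh, ws:ws+kw] * w[j]) + b[j]
    return z
```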
Transforming the convolution into a matrix product,
$$
a_t^{l-1} = trans(a^{l-1}), \quad dim=(B,\ H^l\cdot W^l,\ h^l\cdot w^l\cdot C^{l-1})
$$
where $trans(a^{l-1})$ flattens the values under the kernel at each position into one row and stores them in $a^{l-1}_t$.
$$
w^l_t = w^l.reshape(C^l, -1).T, \quad dim=(h^l\cdot w^l\cdot C^{l-1},\ C^l)
$$
$$
z_t^l = a^{l-1}_t w^l_t \\
z^l = z_t^l.transpose([0, 2, 1]).reshape(B, C^l, H^l, W^l) + b^l.reshape(1, -1, 1, 1)
$$
```python
def forward(self, inputs):
    inputs = self.pad(inputs)
    self.input_shape = inputs.shape
    self.batch_size, in_channels, self.H_in, self.W_in = inputs.shape
    assert in_channels == self.in_channels, 'inputs dim1({}) is not equal to convolutional in_channels({})'.format(in_channels, self.in_channels)
    # output spatial size of a valid convolution on the (already padded) input
    self.H_out = (inputs.shape[2] - self.kernel_size[0]) // self.stride[0] + 1
    self.W_out = (inputs.shape[3] - self.kernel_size[1]) // self.stride[1] + 1
    # im2col: flatten the values under each kernel position into one row
    self.input_trans = np.empty((self.batch_size, self.H_out * self.W_out, self.kernel_trans.shape[0]))
    ind = 0
    h = 0
    while h + self.kernel_size[0] <= inputs.shape[2]:
        w = 0
        while w + self.kernel_size[1] <= inputs.shape[3]:
            self.input_trans[:, ind, :] = inputs[:, :, h:h + self.kernel_size[0], w:w + self.kernel_size[1]].reshape(self.batch_size, -1)
            w += self.stride[1]
            ind += 1
        h += self.stride[0]
    # convolution as a batched matrix product, then back to (B, C_out, H_out, W_out)
    output = self.input_trans @ self.kernel_trans
    output = output.transpose([0, 2, 1]).reshape(self.batch_size, self.out_channels, self.H_out, self.W_out)
    if self.bias is not None:
        output += self.bias.reshape(1, -1, 1, 1)
    return self.input_trans, output
```
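As a quick usage sketch of this forward method, reusing `a`, `w`, and `b` from the earlier snippets. It assumes the constructor signature seen in the MNIST example at the end of this article, and that overwriting the `kernel_trans` and `bias` attributes (visible in the code above) is permissible; with `padding=(0, 0)` the `pad` call is assumed to be a no-op:

```python
conv = Conv2d(in_channels=3, out_channels=2, kernel_size=(2, 2), stride=(1, 1), padding=(0, 0))
conv.kernel_trans = w.reshape(2, -1).T   # pin the layer to the example kernel
conv.bias = b
_, out = conv.forward(a)                 # forward returns (input_trans, output)
print(out[0, 0])                         # matches z[0,0,:,:] above
```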
Backward propagation
As with the fully connected layer, backpropagation first computes the error of the loss function with respect to $z^l$, and then the derivatives with respect to the kernel and the bias.
Suppose the input feature map is
$$
a = \left[\begin{matrix} a_{11}&a_{12}&a_{13}&a_{14} \\ a_{21}&a_{22}&a_{23}&a_{24} \\ a_{31}&a_{32}&a_{33}&a_{34} \\ a_{41}&a_{42}&a_{43}&a_{44}\end{matrix}\right]
$$
the kernel is
$$
w = \left[\begin{matrix}w_{11}&w_{12}\\w_{21}&w_{22}\end{matrix}\right]
$$
the stride is $(1, 1)$, and the backpropagated error of the loss function with respect to the convolution output is:
$$
\delta = \left[\begin{matrix} \delta_{11}&\delta_{12}&\delta_{13} \\ \delta_{21}&\delta_{22}&\delta_{23} \\ \delta_{31}&\delta_{32}&\delta_{33} \end{matrix}\right]
$$
Then the gradient of the loss function with respect to the input feature map is:
$$
\left[\begin{matrix}
w_{11}\delta_{11} & w_{11}\delta_{12}+w_{12}\delta_{11} & w_{12}\delta_{12}+w_{11}\delta_{13} & w_{12}\delta_{13} \\
w_{21}\delta_{11}+w_{11}\delta_{21} & w_{22}\delta_{11}+w_{21}\delta_{12}+w_{12}\delta_{21}+w_{11}\delta_{22} & w_{22}\delta_{12}+w_{21}\delta_{13}+w_{12}\delta_{22}+w_{11}\delta_{23} & w_{22}\delta_{13}+w_{12}\delta_{23} \\
w_{21}\delta_{21}+w_{11}\delta_{31} & w_{22}\delta_{21}+w_{21}\delta_{22}+w_{12}\delta_{31}+w_{11}\delta_{32} & w_{22}\delta_{22}+w_{21}\delta_{23}+w_{12}\delta_{32}+w_{11}\delta_{33} & w_{22}\delta_{23}+w_{12}\delta_{33} \\
w_{21}\delta_{31} & w_{22}\delta_{31}+w_{21}\delta_{32} & w_{22}\delta_{32}+w_{21}\delta_{33} & w_{22}\delta_{33}
\end{matrix}\right]
$$
In other words, zero-padding the error of the output map and then convolving it with the kernel rotated by 180 degrees yields the error of the loss function with respect to the input feature map.
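This identity is easy to verify numerically: scattering each $\delta_{ij}$ back through its window must agree with a valid correlation of the zero-padded error against the 180-degree-rotated kernel. A self-contained check:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((2, 2))
delta = rng.standard_normal((3, 3))          # dC/dz for a 4x4 input, stride 1

# Route 1: scatter each delta entry back through the window it came from.
grad_a = np.zeros((4, 4))
for i in range(3):
    for j in range(3):
        grad_a[i:i+2, j:j+2] += delta[i, j] * w

# Route 2: zero-pad delta and correlate with the 180-degree rotated kernel.
delta_p = np.pad(delta, 1)                   # (5, 5)
w_rot = w[::-1, ::-1]
grad_a2 = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        grad_a2[i, j] = np.sum(delta_p[i:i+2, j:j+2] * w_rot)

print(np.allclose(grad_a, grad_a2))          # True
```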
So, when $\delta^{l+1}$ is known, $\delta^l$ is computed as

$$
\delta^l = \delta^{l+1}*ROT180(w^{l+1})\odot \frac{\partial a^l}{\partial z^l}
$$
Here we again take a different route: using the transformed forward-pass formulas from the previous subsection, we compute the backpropagated error of the loss with respect to $a^{l-1}_t$ and then map the result back onto $a^{l-1}$. Given $\delta^l$,
$$
\delta^l_t = \delta^l.transpose([0, 2, 3, 1]).reshape(B, H^l\cdot W^l, C^l)
$$
Taking the tensor product of $\delta^l_t$, $dim=(B, H^lW^l, C^l)$, and $(w^l_t)^T$, $dim=(C^l, C^{l-1}h^lw^l)$, yields the gradient of the loss function with respect to $a^{l-1}_t$.
$$
\frac{\partial C}{\partial a^{l-1}_t} = np.tensordot(\delta^l_t,(w^l_t)^T, [(2),(0)])
$$
where $[(2),(0)]$ means that the third axis of $\delta^l_t$ is contracted with the first axis of $(w^l_t)^T$; the result has shape $(B,\ H^lW^l,\ C^{l-1}h^lw^l)$.
Applying the inverse of the forward transform to this intermediate error (entries mapped to the same input location are added together) gives the gradient of the loss function with respect to $a^{l-1}$.
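A quick shape check of this contraction, using the sizes from the running example ($B=1$, $H^lW^l=9$, $C^l=2$, $C^{l-1}h^lw^l=12$):

```python
import numpy as np

delta_t = np.random.randn(1, 9, 2)            # (B, H*W, C_out)
w_t = np.random.randn(12, 2)                  # (C_in*h*w, C_out)
grad_a_t = np.tensordot(delta_t, w_t.T, [(2,), (0,)])
print(grad_a_t.shape)                         # (1, 9, 12): one row per kernel position
```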
```python
def backward(self, grad):
    # grad: dLoss/d(output), shape (B, C_out, H_out, W_out)
    grad_trans = grad.transpose([0, 2, 3, 1]).reshape(self.batch_size, -1, self.out_channels)
    # dLoss/d(input_trans): contract the channel axis with the transposed kernel matrix
    grad_backward_trans = np.tensordot(grad_trans, self.kernel_trans.T, [(2,), (0,)])
    # scatter each row back to its window, summing overlapping contributions
    grad_backward = np.zeros(self.input_shape)
    ind = 0
    for ih in range(grad.shape[2]):
        begin_h = ih * self.stride[0]
        for iw in range(grad.shape[3]):
            begin_w = iw * self.stride[1]
            grad_backward[:, :, begin_h:(begin_h + self.kernel_size[0]), begin_w:(begin_w + self.kernel_size[1])] += \
                grad_backward_trans[:, ind, :].reshape(self.batch_size, self.in_channels, self.kernel_size[0], self.kernel_size[1])
            ind += 1
    # remove the zero padding that was added in forward
    grad_backward = grad_backward[:, :, self.padding[0]:self.input_shape[2] - self.padding[0], self.padding[1]:self.input_shape[3] - self.padding[1]]
    # gradients of the kernel matrix and the bias
    self.grad_k_trans = np.tensordot(self.input_trans, grad_trans, [(0, 1), (0, 1)])
    if self.bias is not None:
        self.grad_b = np.sum(grad_trans, axis=(0, 1)).reshape(1, -1)
    return grad_backward
```
Given $\delta^l_t$, the gradients of the loss function with respect to $w^l_t$ and $b^l$ are
$$
\frac{\partial C}{\partial w^l_t} = np.tensordot(a^{l-1}_t,\delta^l_t,[(0,1),(0,1)]) \\
\frac{\partial C}{\partial b^l} = np.sum(\delta^l_t, axis=(0,1))
$$
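These formulas can be validated with a finite-difference check. The sketch below re-implements the transformed forward pass as a standalone function (names are illustrative) and compares the analytic kernel gradient with a numerical one for the loss $C = \sum z$:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((2, 3, 4, 4))        # (B, C_in, H, W)
w = rng.standard_normal((2, 3, 2, 2))        # (C_out, C_in, h, w)
b = rng.standard_normal(2)

def forward(a, w, b):
    B, C_in, H, W = a.shape
    C_out, _, kh, kw = w.shape
    H_out, W_out = H - kh + 1, W - kw + 1
    a_t = np.empty((B, H_out * W_out, C_in * kh * kw))
    ind = 0
    for i in range(H_out):
        for j in range(W_out):
            a_t[:, ind, :] = a[:, :, i:i+kh, j:j+kw].reshape(B, -1)
            ind += 1
    w_t = w.reshape(C_out, -1).T
    z = (a_t @ w_t).transpose([0, 2, 1]).reshape(B, C_out, H_out, W_out) + b.reshape(1, -1, 1, 1)
    return a_t, z

a_t, z = forward(a, w, b)
delta = np.ones_like(z)                      # dC/dz for C = z.sum()
delta_t = delta.transpose([0, 2, 3, 1]).reshape(2, -1, 2)

grad_w_t = np.tensordot(a_t, delta_t, [(0, 1), (0, 1)])   # (C_in*h*w, C_out)
grad_b = np.sum(delta_t, axis=(0, 1))                     # (C_out,)
grad_w = grad_w_t.T.reshape(w.shape)                      # undo the reshape/transpose

eps = 1e-6
w2 = w.copy()
w2[0, 0, 0, 0] += eps
numeric = (forward(a, w2, b)[1].sum() - z.sum()) / eps
print(np.isclose(grad_w[0, 0, 0, 0], numeric))            # True
print(np.allclose(grad_b, 18.0))             # True: C = Σz sums 2·3·3 positions per channel
```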
Max pooling can be handled with a similar idea, as sketched below.
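Concretely, max pooling also slides a window over the input, but the forward pass stores the argmax of each window and the backward pass routes each output gradient to exactly that position. A minimal standalone sketch (illustrative, not the repo's MaxPool2d):

```python
import numpy as np

def maxpool_forward(a, k=2, s=2):
    # record the argmax inside every window so backward can route gradients
    B, C, H, W = a.shape
    H_out, W_out = (H - k) // s + 1, (W - k) // s + 1
    out = np.empty((B, C, H_out, W_out))
    argmax = np.empty((B, C, H_out, W_out), dtype=int)
    for i in range(H_out):
        for j in range(W_out):
            win = a[:, :, i*s:i*s+k, j*s:j*s+k].reshape(B, C, -1)
            argmax[:, :, i, j] = win.argmax(axis=2)
            out[:, :, i, j] = win.max(axis=2)
    return out, argmax

def maxpool_backward(grad, argmax, a_shape, k=2, s=2):
    # each output gradient flows only to the input position that produced the max
    B, C = a_shape[:2]
    grad_a = np.zeros(a_shape)
    bb, cc = np.ogrid[:B, :C]
    for i in range(grad.shape[2]):
        for j in range(grad.shape[3]):
            idx = argmax[:, :, i, j]
            grad_a[bb, cc, i*s + idx // k, j*s + idx % k] += grad[:, :, i, j]
    return grad_a
```

For the LeNet-style network below, the kernel and stride are both (2, 2), so windows never overlap and each input position receives at most one gradient.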
Suppose the input feature map $a$ has shape $(16, 3, 128, 128)$, the kernel $w$ has shape $(8, 3, 3, 3)$, the bias $b$ has shape $(8,)$, the stride is $(1, 1)$, and no padding is used, so the output has shape $(16, 8, 126, 126)$. The forward pass is then

$$
a_t = trans(a), \quad dim=(16,\ 126\cdot 126,\ 27) \\
w_t = w.reshape(8, -1).T, \quad dim=(27,\ 8) \\
z = (a_t w_t).transpose([0, 2, 1]).reshape(16, 8, 126, 126) + b.reshape(1, -1, 1, 1)
$$

After obtaining the backpropagated error $\delta$ of shape $(16, 8, 126, 126)$ for the output feature map, the error with respect to the input feature map is computed as

$$
\delta_t = \delta.transpose([0, 2, 3, 1]).reshape(16,\ 126\cdot 126,\ 8) \\
\frac{\partial C}{\partial a_t} = np.tensordot(\delta_t, w_t^T, [(2),(0)]), \quad dim=(16,\ 126\cdot 126,\ 27)
$$

which is then scattered back (summing overlapping windows) into the $(16, 3, 128, 128)$ gradient for $a$.
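A shape check of this larger example (a standalone sketch; the im2col loop mirrors the forward code above):

```python
import numpy as np

a = np.random.randn(16, 3, 128, 128)
w = np.random.randn(8, 3, 3, 3)
b = np.random.randn(8)

H_out = W_out = 128 - 3 + 1                      # 126: no padding, stride 1
a_t = np.empty((16, H_out * W_out, 27))
ind = 0
for i in range(H_out):
    for j in range(W_out):
        a_t[:, ind, :] = a[:, :, i:i+3, j:j+3].reshape(16, -1)
        ind += 1

w_t = w.reshape(8, -1).T                         # (27, 8)
z = (a_t @ w_t).transpose([0, 2, 1]).reshape(16, 8, H_out, W_out) + b.reshape(1, -1, 1, 1)
print(z.shape)                                   # (16, 8, 126, 126)

delta = np.ones_like(z)                          # stand-in backpropagated error
delta_t = delta.transpose([0, 2, 3, 1]).reshape(16, -1, 8)
grad_a_t = np.tensordot(delta_t, w_t.T, [(2,), (0,)])
print(grad_a_t.shape)                            # (16, 15876, 27) -> scatters to (16, 3, 128, 128)
```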
To classify the MNIST dataset, we build the following network:
```python
layers = [Conv2d(in_channels=1, out_channels=6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
          MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
          ReLU(),
          Conv2d(in_channels=6, out_channels=16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
          MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
          ReLU(),
          Flatten(),
          Linear(in_features=784, out_features=120),
          ReLU(),
          Linear(in_features=120, out_features=10)]
```
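For orientation, here is a rough sketch of what one SGD training step over these layers could look like. It is illustrative only: the uniform `forward(x) -> x` layer API, the `step` update hook, and the inline softmax cross-entropy loss are assumptions, not the repo's actual interfaces (the `Conv2d.forward` shown earlier returns a tuple, so the repo presumably adapts it); see the linked code for the authoritative version.

```python
import numpy as np

def softmax_cross_entropy(logits, y):
    # loss and dLoss/dlogits for integer class labels y, batch-averaged
    shifted = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    n = len(y)
    loss = -np.log(p[np.arange(n), y]).mean()
    grad = p
    grad[np.arange(n), y] -= 1
    return loss, grad / n

def train_step(layers, x, y, lr=0.01):
    out = x
    for layer in layers:
        out = layer.forward(out)        # assumes a uniform forward(x) -> x API
    loss, grad = softmax_cross_entropy(out, y)
    for layer in reversed(layers):
        grad = layer.backward(grad)
    for layer in layers:
        if hasattr(layer, 'step'):      # hypothetical SGD parameter-update hook
            layer.step(lr)
    return loss
```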
The parameters are updated with stochastic gradient descent over 5 training epochs.
(Figure: training loss curve.)
(Figure: validation accuracy curve.)
The accuracy on the test set is 0.9798.
GitHub: https://github.com/hui126/Deep_Learning_Coding