Convolutional Neural Networks: Formula Derivation and a NumPy Implementation

This article focuses on the code implementation of convolutional neural networks, which can be viewed as an extension of perceptron networks with weight sharing. It details the forward pass, converting the convolution operation into a matrix multiplication, and backpropagation, computing the gradients of the loss function with respect to the relevant parameters. Finally, the network is tested on MNIST classification, reaching a test-set accuracy of 0.9798.


This article focuses mainly on the code implementation; for the detailed formula derivation, see: https://zhuanlan.zhihu.com/p/61898234
Full code: https://github.com/hui126/Deep_Learning_Coding/blob/main/Conv.py

A convolutional neural network can be viewed as an extension of a perceptron network: the number of neurons equals the number of channels of the image, and the input to the network changes from a vector to a tensor. The biggest difference from a perceptron network is weight sharing, i.e., each output channel reuses a single convolution kernel across all spatial positions of the input.

Forward Pass

Based on numpy, suppose the input feature map $a$ has dimensions $(1, 3, 4, 4)$, the convolution kernel $w$ has dimensions $(2, 3, 2, 2)$, and the stride is $(1, 1)$:
$$a[0,0,:,:] = \left[\begin{matrix} 1 & 6 & 6 & 2 \\ 4 & 3 & 4 & 3 \\ 3 & 6 & 7 & 7 \\ 1 & 5 & 1 & 2 \end{matrix}\right], \quad a[0,1,:,:] = \left[\begin{matrix} 9 & 1 & 6 & 7 \\ 5 & 3 & 2 & 6 \\ 5 & 7 & 1 & 7 \\ 6 & 8 & 5 & 8 \end{matrix}\right], \quad a[0,2,:,:] = \left[\begin{matrix} 8 & 6 & 9 & 5 \\ 4 & 6 & 1 & 6 \\ 2 & 3 & 3 & 8 \\ 3 & 5 & 3 & 6 \end{matrix}\right]$$
The convolution kernels are
$$w[0,0,:,:] = \left[\begin{matrix} 5 & 9 \\ 5 & 8 \end{matrix}\right], \quad w[0,1,:,:] = \left[\begin{matrix} 1 & 1 \\ 8 & 8 \end{matrix}\right], \quad w[0,2,:,:] = \left[\begin{matrix} 7 & 1 \\ 2 & 8 \end{matrix}\right]$$
$$w[1,0,:,:] = \left[\begin{matrix} 5 & 6 \\ 9 & 3 \end{matrix}\right], \quad w[1,1,:,:] = \left[\begin{matrix} 2 & 1 \\ 9 & 1 \end{matrix}\right], \quad w[1,2,:,:] = \left[\begin{matrix} 8 & 4 \\ 3 & 6 \end{matrix}\right]$$
The bias is $b = [1, 2]$.

The output feature map is then
$$z[i,j,:,:] = \sum_{k=0}^{2} a[i,k,:,:] * w[j,k,:,:] + b[j]$$
The result is:
$$z[0,0,:,:] = \left[\begin{matrix} 296 & 250 & 288 \\ 277 & 280 & 294 \\ 302 & 297 & 315 \end{matrix}\right], \quad z[0,1,:,:] = \left[\begin{matrix} 291 & 252 & 263 \\ 230 & 267 & 239 \\ 223 & 283 & 257 \end{matrix}\right]$$
To make gradient backpropagation easier to compute, we transform the kernel and the input feature map so that the convolution becomes a matrix multiplication. The kernel is reshaped into a $(3 \cdot 2 \cdot 2, 2)$ matrix,
$$w_t = w.reshape(2, -1).T = \left[\begin{matrix} 5 & 9 & 5 & 8 & 1 & 1 & 8 & 8 & 7 & 1 & 2 & 8 \\ 5 & 6 & 9 & 3 & 2 & 1 & 9 & 1 & 8 & 4 & 3 & 6 \end{matrix}\right]^T$$
and the input feature map is transformed into a $(1, 9, 12)$ matrix, one row per kernel position,
$$a_t[0] = \left[\begin{matrix}
1 & 6 & 4 & 3 & 9 & 1 & 5 & 3 & 8 & 6 & 4 & 6 \\
6 & 6 & 3 & 4 & 1 & 6 & 3 & 2 & 6 & 9 & 6 & 1 \\
6 & 2 & 4 & 3 & 6 & 7 & 2 & 6 & 9 & 5 & 1 & 6 \\
4 & 3 & 3 & 6 & 5 & 3 & 5 & 7 & 4 & 6 & 2 & 3 \\
3 & 4 & 6 & 7 & 3 & 2 & 7 & 1 & 6 & 1 & 3 & 3 \\
4 & 3 & 7 & 7 & 2 & 6 & 1 & 7 & 1 & 6 & 3 & 8 \\
3 & 6 & 1 & 5 & 5 & 7 & 6 & 8 & 2 & 3 & 3 & 5 \\
6 & 7 & 5 & 1 & 7 & 1 & 8 & 5 & 3 & 3 & 5 & 3 \\
7 & 7 & 1 & 2 & 1 & 7 & 5 & 8 & 3 & 8 & 3 & 6
\end{matrix}\right]$$
Then $z_t[0] = a_t[0]\, w_t$, where each column of $z_t[0]$ holds the values of one output channel of the first output feature map, so $z = z_t.transpose([0, 2, 1]).reshape(1, 2, 3, 3) + b.reshape(1, -1, 1, 1)$.
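This worked example can be verified directly. The following minimal numpy sketch (written for this article; the variable names are illustrative) rebuilds $a$, $w$, and $b$ from the matrices above, applies the im2col transformation, and recovers $z$ with a single matrix multiplication:

import numpy as np

# Input feature map (1, 3, 4, 4), kernel (2, 3, 2, 2), and bias (2,) from the example above
a = np.array([[[[1, 6, 6, 2], [4, 3, 4, 3], [3, 6, 7, 7], [1, 5, 1, 2]],
               [[9, 1, 6, 7], [5, 3, 2, 6], [5, 7, 1, 7], [6, 8, 5, 8]],
               [[8, 6, 9, 5], [4, 6, 1, 6], [2, 3, 3, 8], [3, 5, 3, 6]]]])
w = np.array([[[[5, 9], [5, 8]], [[1, 1], [8, 8]], [[7, 1], [2, 8]]],
              [[[5, 6], [9, 3]], [[2, 1], [9, 1]], [[8, 4], [3, 6]]]])
b = np.array([1, 2])

# Kernel as a (3*2*2, 2) matrix: one column per output channel
w_t = w.reshape(2, -1).T

# im2col: flatten the patch under each of the 9 kernel positions into one row
a_t = np.empty((1, 9, 12))
ind = 0
for h in range(3):
    for v in range(3):
        a_t[:, ind, :] = a[:, :, h:h + 2, v:v + 2].reshape(1, -1)
        ind += 1

# Convolution as matrix multiplication, then back to (1, 2, 3, 3)
z = (a_t @ w_t).transpose([0, 2, 1]).reshape(1, 2, 3, 3) + b.reshape(1, -1, 1, 1)
print(z[0, 0])   # [[296 250 288] [277 280 294] [302 297 315]]
print(z[0, 1])   # [[291 252 263] [230 267 239] [223 283 257]]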


Notation: let $n^l$ be the number of neurons (output feature-map channels) of layer $l$, with $L$ layers in total, where $n^0$ is the number of channels of the input image.

al−1a^{l-1}al1lll层卷积层的输入特征图,维度为(B,Cl−1,Hl−1,Wl−1)(B,C^{l-1},H^{l-1},W^{l-1})(B,Cl1,Hl1,Wl1)

zlz^lzl 卷积输出结果,未经过激活函数,维度为(B,Cl,Hl,Wl)(B,C^l,H^l,W^l)(B,Cl,Hl,Wl)

h(z)h(z)h(z) 激活函数;

wlw^lwl 卷积核,维度为(Cl,Cl−1,hl,wl)(C^{l}, C^{l-1}, h^l, w^l)(Cl,Cl1,hl,wl)

blb^lbl 偏置系数,维度为(Cl,)(C^l,)(Cl,)

The convolution output is
$$z^l[i,j,:,:] = \sum_{k=0}^{C^{l-1}-1} a^{l-1}[i,k,:,:] * w^l[j,k,:,:] + b^l[j], \quad i = 0, \cdots, B-1, \; j = 0, \cdots, C^l-1$$
Converting the convolution into a matrix product,
$$a_t^{l-1} = trans(a^{l-1}), \quad dim = (B, \; H^l \cdot W^l, \; h^l \cdot w^l \cdot C^{l-1})$$
where $trans(a^{l-1})$ flattens the values under each kernel position into one row and stores them in $a_t^{l-1}$ (the im2col transformation), and
$$w_t = w^l.reshape(C^l, -1).T$$

$$z_t^l = a_t^{l-1} w_t$$
$$z^l = z_t^l.transpose([0, 2, 1]).reshape(B, C^l, H^l, W^l) + b.reshape(1, -1, 1, 1)$$

def forward(self, inputs):
    # Zero-pad the input and record the (padded) shape
    inputs = self.pad(inputs)
    self.input_shape = inputs.shape
    self.batch_size, in_channels, self.H_in, self.W_in = inputs.shape
    assert in_channels == self.in_channels, 'inputs dim1({}) is not equal to convolutional in_channels({})'.format(in_channels, self.in_channels)

    # Output spatial dimensions
    self.H_out = (inputs.shape[2] - self.kernel_size[0]) // self.stride[0] + 1
    self.W_out = (inputs.shape[3] - self.kernel_size[1]) // self.stride[1] + 1

    # im2col buffer: one row per kernel position
    self.input_trans = np.empty((self.batch_size, self.H_out * self.W_out, self.kernel_trans.shape[0]))

    ind = 0
    h = 0
    while h + self.kernel_size[0] <= inputs.shape[2]:
        w = 0
        while w + self.kernel_size[1] <= inputs.shape[3]:
            # Flatten the patch under the current kernel position into one row
            self.input_trans[:, ind, :] = inputs[:, :, h:h + self.kernel_size[0], w:w + self.kernel_size[1]].reshape(self.batch_size, -1)
            w += self.stride[1]
            ind += 1
        h += self.stride[0]   # advance to the next row only after the inner loop finishes

    # Convolution as one matrix multiplication, then reshape back to (B, C_out, H_out, W_out)
    output = self.input_trans @ self.kernel_trans
    output = output.transpose([0, 2, 1]).reshape(self.batch_size, self.out_channels, self.H_out, self.W_out)
    if self.bias is not None:
        output += self.bias.reshape(1, -1, 1, 1)
    return self.input_trans, output

Backpropagation

As with a fully connected layer, backpropagation first computes the error of the loss function with respect to $z^l$, and then the derivatives with respect to the kernel and the bias.

Suppose the input feature map is
$$a = \left[\begin{matrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{matrix}\right]$$
the convolution kernel is
$$w = \left[\begin{matrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{matrix}\right]$$
and the stride is $(1, 1)$. The backpropagated error of the loss with respect to the convolution output is:
$$\delta = \left[\begin{matrix} \delta_{11} & \delta_{12} & \delta_{13} \\ \delta_{21} & \delta_{22} & \delta_{23} \\ \delta_{31} & \delta_{32} & \delta_{33} \end{matrix}\right]$$
Then the gradient of the loss with respect to the input feature map is:
$$\left[\begin{matrix}
w_{11}\delta_{11} & w_{11}\delta_{12}+w_{12}\delta_{11} & w_{12}\delta_{12}+w_{11}\delta_{13} & w_{12}\delta_{13} \\
w_{21}\delta_{11}+w_{11}\delta_{21} & w_{22}\delta_{11}+w_{21}\delta_{12}+w_{12}\delta_{21}+w_{11}\delta_{22} & w_{22}\delta_{12}+w_{21}\delta_{13}+w_{12}\delta_{22}+w_{11}\delta_{23} & w_{22}\delta_{13}+w_{12}\delta_{23} \\
w_{21}\delta_{21}+w_{11}\delta_{31} & w_{22}\delta_{21}+w_{21}\delta_{22}+w_{12}\delta_{31}+w_{11}\delta_{32} & w_{22}\delta_{22}+w_{21}\delta_{23}+w_{12}\delta_{32}+w_{11}\delta_{33} & w_{22}\delta_{23}+w_{12}\delta_{33} \\
w_{21}\delta_{31} & w_{22}\delta_{31}+w_{21}\delta_{32} & w_{22}\delta_{32}+w_{21}\delta_{33} & w_{22}\delta_{33}
\end{matrix}\right]$$
That is, zero-padding the error of the output map and then convolving it with the kernel rotated by 180 degrees yields the error of the loss with respect to the input feature map.

So when $\delta^{l+1}$ is known, $\delta^l$ is computed as
$$\delta^l = \delta^{l+1} * ROT180(w^{l+1}) \odot \frac{\partial a^l}{\partial z^l}$$
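This relation can be checked numerically. The sketch below (an illustration written for this article: single channel, stride $(1, 1)$, random values) compares the scatter-add form of the input gradient against zero-padding the error and sliding the 180-degree-rotated kernel over it:

import numpy as np

rng = np.random.default_rng(0)
w = rng.integers(1, 5, (2, 2)).astype(float)       # 2x2 kernel
delta = rng.integers(1, 5, (3, 3)).astype(float)   # error w.r.t. the 3x3 output

# Scatter-add form: each output error flows back through its kernel window
grad_a = np.zeros((4, 4))
for i in range(3):
    for j in range(3):
        grad_a[i:i + 2, j:j + 2] += delta[i, j] * w

# Equivalent form: zero-pad the error, then apply the kernel rotated 180 degrees
w_rot = w[::-1, ::-1]
delta_pad = np.pad(delta, 1)                       # (5, 5)
grad_a2 = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        grad_a2[i, j] = np.sum(delta_pad[i:i + 2, j:j + 2] * w_rot)

assert np.allclose(grad_a, grad_a2)                # the two forms agree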


Here we again take a different route: using the transformed forward-pass formulas from the previous section, we compute the backpropagated error of the loss with respect to $a_t^{l-1}$ and then map the result back to the corresponding $a^{l-1}$. Given $\delta^l$,
$$\delta_t^l = \delta^l.transpose([0, 2, 3, 1]).reshape(B, H^l \cdot W^l, C^l)$$
The tensor product of $\delta_t^l$, $dim = (B, H^l W^l, C^l)$, and $(w_t^l)^T$, $dim = (C^l, C^{l-1} h^l w^l)$, gives the gradient of the loss with respect to $a_t^{l-1}$:
$$\frac{\partial C}{\partial a_t^{l-1}} = np.tensordot(\delta_t^l, (w_t^l)^T, [(2), (0)])$$
where $[(2), (0)]$ means contracting the third axis of $\delta_t^l$ with the first axis of $(w_t^l)^T$; the result has dimensions $(B, H^l W^l, C^{l-1} h^l w^l)$.

Applying the inverse of the im2col transformation to this intermediate error (values that map to the same input position are added together) gives the gradient of the loss with respect to $a^{l-1}$.

def backward(self, grad):
    # Rearrange the output error to match z_t: (B, H_out*W_out, C_out)
    grad_trans = grad.transpose([0, 2, 3, 1]).reshape(self.batch_size, -1, self.out_channels)
    # Error w.r.t. the im2col matrix: contract the channel axis with kernel_trans
    grad_backward_trans = np.tensordot(grad_trans, self.kernel_trans.T, [(2), (0)])
    grad_backward = np.zeros(self.input_shape)

    # col2im: scatter-add each row back to the input positions it was read from
    ind = 0
    for ih in range(grad.shape[2]):
        begin_h = ih * self.stride[0]
        for iw in range(grad.shape[3]):
            begin_w = iw * self.stride[1]
            grad_backward[:, :, begin_h:(begin_h + self.kernel_size[0]), begin_w:(begin_w + self.kernel_size[1])] += \
                grad_backward_trans[:, ind, :].reshape(self.batch_size, self.in_channels, self.kernel_size[0], self.kernel_size[1])
            ind += 1
    # Crop away the zero padding added in the forward pass
    grad_backward = grad_backward[:, :, self.padding[0]:self.input_shape[2] - self.padding[0], self.padding[1]:self.input_shape[3] - self.padding[1]]

    # Gradients w.r.t. the (transformed) kernel and the bias
    self.grad_k_trans = np.tensordot(self.input_trans, grad_trans, [(0, 1), (0, 1)])
    if self.bias is not None:
        self.grad_b = np.sum(grad_trans, axis=(0, 1)).reshape(1, -1)
    return grad_backward

Given $\delta_t^l$, the gradients of the loss with respect to $w_t^l$ and $b^l$ are
$$\frac{\partial C}{\partial w_t^l} = np.tensordot(a_t^{l-1}, \delta_t^l, [(0, 1), (0, 1)]), \quad \frac{\partial C}{\partial b^l} = np.sum(\delta_t^l, axis=(0, 1))$$
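These formulas can be sanity-checked against finite differences. The sketch below is an illustration written for this article: it assumes a hypothetical quadratic loss $C = \frac{1}{2}\sum z_t^2$ (so that $\delta_t = z_t$) and a stride of $(1, 1)$, with a stripped-down `im2col` helper implementing the transformation above:

import numpy as np

def im2col(a, kh, kw):
    # Flatten each kernel window of a (B, C, H, W) tensor into one row (stride 1)
    B, C, H, W = a.shape
    Ho, Wo = H - kh + 1, W - kw + 1
    cols = np.empty((B, Ho * Wo, C * kh * kw))
    ind = 0
    for h in range(Ho):
        for v in range(Wo):
            cols[:, ind, :] = a[:, :, h:h + kh, v:v + kw].reshape(B, -1)
            ind += 1
    return cols

rng = np.random.default_rng(0)
B, Cin, Cout, kh, kw = 2, 3, 4, 2, 2
a = rng.standard_normal((B, Cin, 5, 5))
w = rng.standard_normal((Cout, Cin, kh, kw))

def loss(w):
    # Hypothetical quadratic loss on the pre-activation output
    z_t = im2col(a, kh, kw) @ w.reshape(Cout, -1).T
    return 0.5 * np.sum(z_t ** 2)

# Analytic gradient: for this loss, delta_t = z_t
a_t = im2col(a, kh, kw)
delta_t = a_t @ w.reshape(Cout, -1).T
grad_w_t = np.tensordot(a_t, delta_t, [(0, 1), (0, 1)])    # (Cin*kh*kw, Cout)

# Numerical gradient for the single entry w[1, 2, 0, 1]
eps = 1e-6
w_pert = w.copy()
w_pert[1, 2, 0, 1] += eps
numeric = (loss(w_pert) - loss(w)) / eps
row = np.ravel_multi_index((2, 0, 1), (Cin, kh, kw))        # row of w_t for (channel 2, offset 0, 1)
print(numeric, grad_w_t[row, 1])                            # the two values should agree closely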
Max pooling can be handled with a similar idea: record which position within each window produced the maximum in the forward pass, and route the gradient back to exactly that position in the backward pass, as the sketch below shows.
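The following is a minimal sketch of such a layer, written for this article (it is not necessarily identical to the MaxPool2d class in the linked repository):

import numpy as np

class MaxPool2dSketch:
    def __init__(self, kernel_size, stride):
        self.kernel_size = kernel_size
        self.stride = stride

    def forward(self, inputs):
        B, C, H, W = inputs.shape
        kh, kw = self.kernel_size
        sh, sw = self.stride
        H_out = (H - kh) // sh + 1
        W_out = (W - kw) // sw + 1
        self.input_shape = inputs.shape
        # Remember which flattened position inside each window held the maximum
        self.argmax = np.empty((B, C, H_out, W_out), dtype=np.int64)
        output = np.empty((B, C, H_out, W_out))
        for ih in range(H_out):
            for iw in range(W_out):
                patch = inputs[:, :, ih * sh:ih * sh + kh, iw * sw:iw * sw + kw].reshape(B, C, -1)
                self.argmax[:, :, ih, iw] = patch.argmax(axis=2)
                output[:, :, ih, iw] = patch.max(axis=2)
        return output

    def backward(self, grad):
        B, C, H_out, W_out = grad.shape
        kh, kw = self.kernel_size
        sh, sw = self.stride
        grad_backward = np.zeros(self.input_shape)
        bi = np.arange(B)[:, None]     # broadcast batch indices
        ci = np.arange(C)[None, :]     # broadcast channel indices
        for ih in range(H_out):
            for iw in range(W_out):
                # Unflatten the stored argmax into (row, col) offsets within the window
                flat = self.argmax[:, :, ih, iw]
                dh, dw = flat // kw, flat % kw
                grad_backward[bi, ci, ih * sh + dh, iw * sw + dw] += grad[:, :, ih, iw]
        return grad_backward

With non-overlapping windows (stride equal to the kernel size, as in the MNIST network below), each input position receives the gradient from at most one window.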

Suppose the input feature map $a$ has dimensions $(16, 3, 128, 128)$, the convolution kernel $w$ is $(8, 3, 3, 3)$, the bias $b$ is $(8,)$, the stride is $(1, 1)$, and no padding is applied; the forward pass then proceeds as follows:

[Figure: forward-pass computation for this example]

After obtaining the backpropagated error $\delta$ of dimensions $(16, 8, 126, 126)$ with respect to the output feature map, the backpropagated error with respect to the input feature map is computed with the formulas above.
To classify the MNIST dataset, we build the following network:

layers = [Conv2d(in_channels=1, out_channels=6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),   # (B,1,28,28) -> (B,6,28,28)
          MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),                                               # -> (B,6,14,14)
          ReLU(),
          Conv2d(in_channels=6, out_channels=16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),  # -> (B,16,14,14)
          MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),                                               # -> (B,16,7,7)
          ReLU(),
          Flatten(),                                                                                  # -> (B, 16*7*7) = (B,784)
          Linear(in_features=784, out_features=120),
          ReLU(),
          Linear(in_features=120, out_features=10)]
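For reference, a hypothetical training loop over this layer stack might look like the sketch below; `load_mnist` and `softmax_cross_entropy` are assumed helpers introduced for illustration, not the repository's actual API, and the parameter update only shows the Conv2d attributes from the code above (Linear layers would be updated analogously):

# Hypothetical SGD loop; load_mnist and softmax_cross_entropy are assumptions.
lr, epochs, batch_size = 0.01, 5, 64
X_train, y_train = load_mnist()                    # assumed helper: (N, 1, 28, 28), (N,)

for epoch in range(epochs):
    for i in range(0, len(X_train), batch_size):
        x, y = X_train[i:i + batch_size], y_train[i:i + batch_size]
        for layer in layers:                       # forward pass
            out = layer.forward(x)
            # Conv2d.forward above returns (input_trans, output); keep the output
            x = out[1] if isinstance(out, tuple) else out
        loss, grad = softmax_cross_entropy(x, y)   # assumed loss helper
        for layer in reversed(layers):             # backward pass
            grad = layer.backward(grad)
        for layer in layers:                       # SGD parameter update
            if hasattr(layer, 'grad_k_trans'):
                layer.kernel_trans -= lr * layer.grad_k_trans
                if layer.bias is not None:
                    layer.bias -= lr * layer.grad_b.reshape(layer.bias.shape)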

Using stochastic gradient descent for the parameter updates and training for 5 epochs, the training-loss curve is:

[Figure: training loss over 5 epochs]

The validation-accuracy curve is:

[Figure: validation accuracy over 5 epochs]

The test-set accuracy is 0.9798.

GitHub: https://github.com/hui126/Deep_Learning_Coding
