Time: August 20–25
This week covers Neural Networks learning: the cost function, the backpropagation algorithm (used to minimize J), forward propagation, gradient checking, and random initialization.
Neural Networks Learning
1 Neural Networks
1.3 Feedforward and cost function
The TA's hint on the course forum about forward propagation (a short Octave sketch of these steps follows the list):
Perform the forward propagation:
a1 equals the X input matrix with a column of 1's added (bias units)
z2 equals the product of a1 and Θ1
a2 is the result of passing z2 through g()
a2 then has a column of 1's added (bias units)
z3 equals the product of a2 and Θ2
a3 is the result of passing z3 through g()
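In Octave, these steps map to something like the following minimal sketch (the dimension comments assume the exercise's setup: X is 5000×400, Theta1 is 25×401, Theta2 is 10×26):
% Sketch of the TA's steps above
a1 = [ones(m,1) X];             % 5000 x 401, bias column of 1's added
z2 = a1 * Theta1';              % 5000 x 25
a2 = [ones(m,1) sigmoid(z2)];   % 5000 x 26, bias column of 1's added
z3 = a2 * Theta2';              % 5000 x 10
a3 = sigmoid(z3);               % 5000 x 10, this is h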
To compute J(Θ) you first need h. According to the problem statement, for the 3-layer network Θ1 is 25×401 and a1 is 5000×401, which gives a2 as 5000×26; Θ2 is 10×26, so h comes out as 5000×10.
Q: Why does h have to end up as 5000×10? Can't it be 10×5000? (That gave me the wrong answer 304.799133.)
A: According to the problem, y is 5000×1. In this exercise's neural network the output y (10 units) is not a decimal digit: for example, 5 is represented as 0000100000. So y must be converted from a 5000×1 vector into a 5000×10 matrix, which is then element-wise multiplied with log(h). That is:
Update: Remember to use element-wise multiplication with the log() function.
This gives the correct result. (No regularization is needed here yet.)
% Forward propagation to get the hypothesis h (5000 x 10)
a = sigmoid([ones(m,1) X] * Theta1');
h = sigmoid([ones(size(a,1), 1) a] * Theta2');
% Convert y (5000 x 1 labels) into a 5000 x 10 matrix of 0/1 rows
y_matrix = zeros(size(h));
for i = 1:size(y_matrix,1)
    for j = 1:size(y_matrix,2)
        if j == y(i)
            y_matrix(i,j) = 1;
        end
    end
end
% Unregularized cost
J = - (1 / m) * sum(sum(y_matrix .* log(h) + (1 - y_matrix) .* log(1 - h)));
% 2nd approach: build y_matrix by indexing into an identity matrix
% tmp_eye = eye(num_labels);
% y_matrix = tmp_eye(y,:);
Here I made an absolutely speechless mistake: I wrote Theta1 twice and just could not find the error, which wasted a long, long time...
J = J + (lambda / (2 * m)) * (sum(sum(Theta1(:,2:end) .^ 2)) + sum(sum(Theta2(:,2:end) .^ 2)));
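For reference, written out, the regularized cost computed above is the following (the bias column of each Theta, i.e. its first column, is excluded from the penalty, matching Theta1(:,2:end) and Theta2(:,2:end); here K = 10):

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[ y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + \big(1-y_k^{(i)}\big)\log\Big(1-\big(h_\Theta(x^{(i)})\big)_k\Big)\Big] + \frac{\lambda}{2m}\sum_{l=1}^{2}\sum_{j}\sum_{k\ge 2}\big(\Theta^{(l)}_{j,k}\big)^2$$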
2 Backpropagation
2.1 Sigmoid gradient
This is just computing the derivative: g'(z) = d/dz g(z) = g(z)(1 − g(z)).
g = sigmoid(z) .* (1 - sigmoid(z));
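For completeness, the derivative follows directly from the definition of the sigmoid:

$$g(z) = \frac{1}{1+e^{-z}},\qquad g'(z) = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}}\cdot\Big(1-\frac{1}{1+e^{-z}}\Big) = g(z)\big(1-g(z)\big)$$

A quick sanity check: sigmoidGradient(0) should return 0.25, and the gradient should approach 0 for large positive or negative z.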
2.3 Backpropagation
Now we work from the output layer back to the hidden layer, calculating how bad the errors are.
Note that the loop runs m times: a1 is the column vector X(i,:)' with a 1 prepended, yk is a one-hot vector, and the accumulated Delta1 is 25×401 while Delta2 is 10×26. Everything inside this for loop is a vector or small matrix... vectors and matrices are driving me crazy.
Delta1 = zeros(hidden_layer_size, input_layer_size+1);
Delta2 = zeros(num_labels, hidden_layer_size+1);
for i = 1:m
    % Forward propagation for example i
    a1 = X(i,:)';
    a1 = [1; a1];
    a2 = sigmoid(Theta1 * a1);
    a2 = [1; a2];
    a3 = sigmoid(Theta2 * a2);
    % Error at the output layer
    yk = zeros(num_labels,1);
    yk( y(i) ) = 1;
    d3 = a3 - yk;
    % Error at the hidden layer; drop the bias-unit term before accumulating
    d2 = (Theta2' * d3) .* sigmoidGradient([1; Theta1 * a1]);
    d2 = d2(2:end);
    % Accumulate the gradients
    Delta2 = Delta2 + d3 * a2';
    Delta1 = Delta1 + d2 * a1';
end
Theta1_grad = (1 / m) * Delta1;
Theta2_grad = (1 / m) * Delta2;
When j = 0 no regularization is applied, i.e., the first column of each Theta is left out.
Theta1_grad = (1 / m) * Delta1 + (lambda / m) * [zeros(size(Theta1,1),1) Theta1(:,2:end)];
Theta2_grad = (1 / m) * Delta2 + (lambda / m) * [zeros(size(Theta2,1),1) Theta2(:,2:end)];
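For reference, the same gradients can also be computed without the per-example loop. This is only a sketch under the same assumptions as above (X is 5000×400, y is 5000×1, Theta1 is 25×401, Theta2 is 10×26), not the required implementation:
% Vectorized alternative to the loop above (same variable names as the exercise)
a1 = [ones(m,1) X];                                   % 5000 x 401
z2 = a1 * Theta1';                                    % 5000 x 25
a2 = [ones(m,1) sigmoid(z2)];                         % 5000 x 26
a3 = sigmoid(a2 * Theta2');                           % 5000 x 10
tmp_eye = eye(num_labels);
y_matrix = tmp_eye(y,:);                              % 5000 x 10
d3 = a3 - y_matrix;                                   % 5000 x 10
d2 = (d3 * Theta2(:,2:end)) .* sigmoidGradient(z2);   % 5000 x 25
Delta1 = d2' * a1;                                    % 25 x 401
Delta2 = d3' * a2;                                    % 10 x 26
Theta1_grad = Delta1 / m + (lambda / m) * [zeros(size(Theta1,1),1) Theta1(:,2:end)];
Theta2_grad = Delta2 / m + (lambda / m) * [zeros(size(Theta2,1),1) Theta2(:,2:end)];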