Time: August 20–25
This week covers Neural Networks learning: the cost function, the backpropagation algorithm (used to minimize J), forward propagation, gradient checking, and random initialization.
Neural Networks Learning
1 Neural Networks
1.3 Feedforward and cost function
The TA's hint on the course forum about forward propagation (a short Octave sketch of these steps follows the list):
Perform the forward propagation:
a1 equals the X input matrix with a column of 1's added (bias units)
z2 equals the product of a1 and Θ1
a2 is the result of passing z2 through g()
a2 then has a column of 1's added (bias units)
z3 equals the product of a2 and Θ2
a3 is the result of passing z3 through g()
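In Octave, these steps map to something like the following minimal sketch (the dimension comments assume the exercise's setup: X is 5000×400, Theta1 is 25×401, Theta2 is 10×26):
% Sketch of the TA's steps above
a1 = [ones(m,1) X];             % 5000 x 401, bias column of 1's added
z2 = a1 * Theta1';              % 5000 x 25
a2 = [ones(m,1) sigmoid(z2)];   % 5000 x 26, bias column of 1's added
z3 = a2 * Theta2';              % 5000 x 10
a3 = sigmoid(z3);               % 5000 x 10, this is h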
To compute J(Θ) you first need h. According to the problem statement, for the 3-layer network Θ1 is 25×401 and a1 is 5000×401, which gives a2 as 5000×26; Θ2 is 10×26, so h comes out as 5000×10.
Q: Why does h have to end up as 5000×10? Can't it be 10×5000? (That gave me the wrong answer 304.799133.)
A: According to the problem, y is 5000×1. In this exercise's neural network the output y (10 units) is not a decimal digit: for example, 5 is represented as 0000100000. So y must be converted from a 5000×1 vector into a 5000×10 matrix, which is then element-wise multiplied with log(h). That is:
Update: Remember to use element-wise multiplication with the log() function.
This gives the correct result. (No regularization is needed here yet.)
% Forward propagation to get the hypothesis h (5000 x 10)
a = sigmoid([ones(m,1) X] * Theta1');
h = sigmoid([ones(size(a,1), 1) a] * Theta2');
% Convert y (5000 x 1 labels) into a 5000 x 10 matrix of 0/1 rows
y_matrix = zeros(size(h));
for i = 1:size(y_matrix,1)
    for j = 1:size(y_matrix,2)
        if j == y(i)
            y_matrix(i,j) = 1;
        end
    end
end
% Unregularized cost
J = - (1 / m) * sum(sum(y_matrix .* log(h) + (1 - y_matrix) .* log(1 - h)));
% 2nd approach: build y_matrix by indexing into an identity matrix
% tmp_eye = eye(num_labels);
% y_matrix = tmp_eye(y,:);
Here I made an absolutely speechless mistake: I wrote Theta1 twice and just could not find the error, which wasted a long, long time...
J = J + (lambda / (2 * m)) * (sum(sum(Theta1(:,2:end) .^ 2)) + sum(sum(Theta2(:,2:end) .^ 2)));
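For reference, written out, the regularized cost computed above is the following (the bias column of each Theta, i.e. its first column, is excluded from the penalty, matching Theta1(:,2:end) and Theta2(:,2:end); here K = 10):

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[ y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + \big(1-y_k^{(i)}\big)\log\Big(1-\big(h_\Theta(x^{(i)})\big)_k\Big)\Big] + \frac{\lambda}{2m}\sum_{l=1}^{2}\sum_{j}\sum_{k\ge 2}\big(\Theta^{(l)}_{j,k}\big)^2$$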
2 Backpropagation
2.1 Sigmoid gradient
This is just computing the derivative: g'(z) = d/dz g(z) = g(z)(1 − g(z)).
g = sigmoid(z) .* (1 - sigmoid(z));
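For completeness, the derivative follows directly from the definition of the sigmoid:

$$g(z) = \frac{1}{1+e^{-z}},\qquad g'(z) = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}}\cdot\Big(1-\frac{1}{1+e^{-z}}\Big) = g(z)\big(1-g(z)\big)$$

A quick sanity check: sigmoidGradient(0) should return 0.25, and the gradient should approach 0 for large positive or negative z.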
2.3 Backpropagation
Now we work from the output layer back to the hidden layer, calculating how bad the errors are.
Note that the loop runs m times: a1 is the column vector X(i,:)' with a 1 prepended, yk is a one-hot vector, and the accumulated Delta1 is 25×401 while Delta2 is 10×26. Everything inside this for loop is a vector or small matrix... vectors and matrices are driving me crazy.
Delta1 = zeros(hidden_layer_size, input_layer_size+1);
Delta2 = zeros(num_labels, hidden_layer_size+1);
for i = 1:m
    % Forward propagation for example i
    a1 = X(i,:)';
    a1 = [1; a1];
    a2 = sigmoid(Theta1 * a1);
    a2 = [1; a2];
    a3 = sigmoid(Theta2 * a2);
    % Error at the output layer
    yk = zeros(num_labels,1);
    yk( y(i) ) = 1;
    d3 = a3 - yk;
    % Error at the hidden layer; drop the bias-unit term before accumulating
    d2 = (Theta2' * d3) .* sigmoidGradient([1; Theta1 * a1]);
    d2 = d2(2:end);
    % Accumulate the gradients
    Delta2 = Delta2 + d3 * a2';
    Delta1 = Delta1 + d2 * a1';
end
Theta1_grad = (1 / m) * Delta1;
Theta2_grad = (1 / m) * Delta2;
When j = 0 no regularization is applied, i.e., the first column of each Theta is left out.
Theta1_grad = (1 / m) * Delta1 + (lambda / m) * [zeros(size(Theta1,1),1) Theta1(:,2:end)];
Theta2_grad = (1 / m) * Delta2 + (lambda / m) * [zeros(size(Theta2,1),1) Theta2(:,2:end)];
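For reference, the same gradients can also be computed without the per-example loop. This is only a sketch under the same assumptions as above (X is 5000×400, y is 5000×1, Theta1 is 25×401, Theta2 is 10×26), not the required implementation:
% Vectorized alternative to the loop above (same variable names as the exercise)
a1 = [ones(m,1) X];                                   % 5000 x 401
z2 = a1 * Theta1';                                    % 5000 x 25
a2 = [ones(m,1) sigmoid(z2)];                         % 5000 x 26
a3 = sigmoid(a2 * Theta2');                           % 5000 x 10
tmp_eye = eye(num_labels);
y_matrix = tmp_eye(y,:);                              % 5000 x 10
d3 = a3 - y_matrix;                                   % 5000 x 10
d2 = (d3 * Theta2(:,2:end)) .* sigmoidGradient(z2);   % 5000 x 25
Delta1 = d2' * a1;                                    % 25 x 401
Delta2 = d3' * a2;                                    % 10 x 26
Theta1_grad = Delta1 / m + (lambda / m) * [zeros(size(Theta1,1),1) Theta1(:,2:end)];
Theta2_grad = Delta2 / m + (lambda / m) * [zeros(size(Theta2,1),1) Theta2(:,2:end)];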