Cost Function / Loss Function
In this cost function, the error for each training example is calculated and then
the mean of all these errors is taken.
Averaging the errors is the simplest and most intuitive approach possible.
The errors can be both negative and positive, so they can cancel each
other out during summation, giving a mean error of zero even for a poorly fitting model.
Thus this is not a recommended cost function, but it does lay the foundation
for other cost functions of regression models.
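As a minimal sketch with made-up numbers, the following Python snippet shows how signed errors can cancel when averaged:

```python
import numpy as np

# Hypothetical data: two predictions that are equally wrong in opposite directions.
y_true = np.array([10.0, 10.0])
y_pred = np.array([8.0, 12.0])   # one under-prediction, one over-prediction

errors = y_true - y_pred         # [ 2.0, -2.0]
mean_error = errors.mean()       # the signed errors cancel during averaging
print(mean_error)                # 0.0, zero mean error despite both predictions being wrong
```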
One such cost function, the mean absolute error, is robust to outliers and thus
gives better results even when our dataset has noise or outliers.
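Assuming the sentence above refers to averaging the absolute values of the errors (the mean absolute error), here is a sketch with illustrative numbers, including one outlier:

```python
import numpy as np

# Same kind of hypothetical data as above, plus one outlier prediction.
y_true = np.array([10.0, 10.0, 10.0])
y_pred = np.array([8.0, 12.0, 100.0])   # the last prediction is an outlier

abs_errors = np.abs(y_true - y_pred)    # [2.0, 2.0, 90.0], no cancellation
mae = abs_errors.mean()                 # each error contributes only linearly,
print(mae)                              # ~31.33, so one outlier does not dominate as
                                        # strongly as it would if errors were squared
```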
For a classification model that distinguishes between three classes (Orange, Apple, and Tomato), the output is a probability distribution over those classes:
Output = [P(Orange), P(Apple), P(Tomato)]
The actual probability distribution for each class is shown below.
Orange = [1,0,0]
Apple = [0,1,0]
Tomato = [0,0,1]
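As an illustrative sketch, these distributions could be represented in Python as follows (the predicted values are the ones used in the worked example further below):

```python
import numpy as np

# One-hot encoded actual distributions for the three classes.
targets = {
    "Orange": np.array([1, 0, 0]),
    "Apple":  np.array([0, 1, 0]),
    "Tomato": np.array([0, 0, 1]),
}

# A hypothetical model output for one input: [P(Orange), P(Apple), P(Tomato)].
predicted = np.array([0.1, 0.3, 0.6])   # probabilities sum to 1
```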
If, during the training phase, the input class is Tomato, the predicted probability
distribution should tend towards the actual probability distribution of Tomato. If
the predicted probability distribution is not close to the actual one, the model
has to adjust its weights. This is where cross-entropy becomes a tool to
calculate how far the predicted probability distribution is from the actual
one. In other words, cross-entropy can be considered a way to measure
the distance between two probability distributions. The cross-entropy image
(Fig. 3) illustrates the intuition behind cross-entropy.
Let us now define the cost function using the above example (refer to the
cross-entropy image, Fig. 3), with the natural logarithm:

y(Tomato) = [0, 0, 1]
P(predicted) = [0.1, 0.3, 0.6]

Cross-Entropy(y, P) = -(0 * log(0.1) + 0 * log(0.3) + 1 * log(0.6)) = 0.51
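A small sketch that reproduces the worked example above, assuming the natural logarithm is used:

```python
import numpy as np

y_tomato = np.array([0, 0, 1])         # actual distribution for Tomato
p_pred   = np.array([0.1, 0.3, 0.6])   # predicted distribution from the example

# Cross-entropy: negative sum over classes of y * log(p), with the natural logarithm.
cross_entropy = -np.sum(y_tomato * np.log(p_pred))
print(round(cross_entropy, 2))         # 0.51, matching the worked example
```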
The above formula measures the cross-entropy for only a single observation or
input example. The error in classification for the complete model is given by
categorical cross-entropy, which is simply the mean of the cross-entropy over
all N training examples.
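As a minimal sketch of categorical cross-entropy, assuming a batch of N one-hot targets and predicted distributions (the probabilities below are illustrative only):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    """Mean cross-entropy over all N training examples.

    y_true: (N, C) one-hot encoded actual distributions
    y_pred: (N, C) predicted probability distributions
    """
    per_example = -np.sum(y_true * np.log(y_pred), axis=1)
    return per_example.mean()

# Hypothetical batch of three inputs (one per class).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.3, 0.6]])

print(categorical_cross_entropy(y_true, y_pred))   # mean of the three per-example losses
```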