Cost Function / Loss Function
In this cost function, the error for each training example is calculated and then
the mean of all these errors is taken.
Averaging the errors is the simplest and most intuitive approach possible.
The errors can be both negative and positive, so they can cancel each
other out during summation, giving a mean error of zero even for a poorly fitting model.
Thus this is not a recommended cost function, but it does lay the foundation
for other cost functions of regression models.
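As a minimal sketch with made-up numbers, the following Python snippet shows how signed errors can cancel when averaged:

```python
import numpy as np

# Hypothetical data: two predictions that are equally wrong in opposite directions.
y_true = np.array([10.0, 10.0])
y_pred = np.array([8.0, 12.0])   # one under-prediction, one over-prediction

errors = y_true - y_pred         # [ 2.0, -2.0]
mean_error = errors.mean()       # the signed errors cancel during averaging
print(mean_error)                # 0.0, zero mean error despite both predictions being wrong
```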
One such cost function, the mean absolute error, is robust to outliers and thus
gives better results even when our dataset has noise or outliers.
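Assuming the sentence above refers to averaging the absolute values of the errors (the mean absolute error), here is a sketch with illustrative numbers, including one outlier:

```python
import numpy as np

# Same kind of hypothetical data as above, plus one outlier prediction.
y_true = np.array([10.0, 10.0, 10.0])
y_pred = np.array([8.0, 12.0, 100.0])   # the last prediction is an outlier

abs_errors = np.abs(y_true - y_pred)    # [2.0, 2.0, 90.0], no cancellation
mae = abs_errors.mean()                 # each error contributes only linearly,
print(mae)                              # ~31.33, so one outlier does not dominate as
                                        # strongly as it would if errors were squared
```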
For a classification model that distinguishes between three classes (Orange, Apple, and Tomato), the output is a probability distribution over those classes:
Output = [P(Orange), P(Apple), P(Tomato)]
The actual probability distribution for each class is shown below.
Orange = [1,0,0]
Apple = [0,1,0]
Tomato = [0,0,1]
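As an illustrative sketch, these distributions could be represented in Python as follows (the predicted values are the ones used in the worked example further below):

```python
import numpy as np

# One-hot encoded actual distributions for the three classes.
targets = {
    "Orange": np.array([1, 0, 0]),
    "Apple":  np.array([0, 1, 0]),
    "Tomato": np.array([0, 0, 1]),
}

# A hypothetical model output for one input: [P(Orange), P(Apple), P(Tomato)].
predicted = np.array([0.1, 0.3, 0.6])   # probabilities sum to 1
```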
If, during the training phase, the input class is Tomato, the predicted probability
distribution should tend towards the actual probability distribution of Tomato. If
the predicted probability distribution is not close to the actual one, the model
has to adjust its weights. This is where cross-entropy becomes a tool to
calculate how far the predicted probability distribution is from the actual
one. In other words, cross-entropy can be considered a way to measure
the distance between two probability distributions. The cross-entropy image
(Fig. 3) illustrates the intuition behind cross-entropy.
Let us now define the cost function using the above example (refer to the
cross-entropy image, Fig. 3), with the natural logarithm:

y(Tomato) = [0, 0, 1]
P(predicted) = [0.1, 0.3, 0.6]

Cross-Entropy(y, P) = -(0 * log(0.1) + 0 * log(0.3) + 1 * log(0.6)) = 0.51
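A small sketch that reproduces the worked example above, assuming the natural logarithm is used:

```python
import numpy as np

y_tomato = np.array([0, 0, 1])         # actual distribution for Tomato
p_pred   = np.array([0.1, 0.3, 0.6])   # predicted distribution from the example

# Cross-entropy: negative sum over classes of y * log(p), with the natural logarithm.
cross_entropy = -np.sum(y_tomato * np.log(p_pred))
print(round(cross_entropy, 2))         # 0.51, matching the worked example
```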
The above formula measures the cross-entropy for only a single observation or
input example. The error in classification for the complete model is given by
categorical cross-entropy, which is simply the mean of the cross-entropy over
all N training examples.
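As a minimal sketch of categorical cross-entropy, assuming a batch of N one-hot targets and predicted distributions (the probabilities below are illustrative only):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    """Mean cross-entropy over all N training examples.

    y_true: (N, C) one-hot encoded actual distributions
    y_pred: (N, C) predicted probability distributions
    """
    per_example = -np.sum(y_true * np.log(y_pred), axis=1)
    return per_example.mean()

# Hypothetical batch of three inputs (one per class).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.3, 0.6]])

print(categorical_cross_entropy(y_true, y_pred))   # mean of the three per-example losses
```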