Backpropagation in Data Mining
Last Updated: 21 Apr, 2025
Backpropagation is a method used to train neural networks in which the model learns from its mistakes. It works by measuring how wrong the output is and then adjusting the weights step by step so the network makes better predictions next time. In this article we will learn how backpropagation works in data mining.
Working of Backpropagation
Neural networks generate an output vector from the input vector they operate on. The generated output is compared with the desired output, and if the two do not match an error is computed. The weights are then adjusted to move the output toward the desired value. Backpropagation is based on gradient descent and updates the weights by minimizing the error between the predicted and actual output. Training with backpropagation consists of three stages:
- Forward propagation of input data.
- Backward propagation of error.
- Updating weights to reduce the error.
Let's walk through an example of backpropagation. The network has two inputs [Tex]x_1 = 0.35[/Tex] and [Tex]x_2 = 0.7[/Tex], two hidden neurons h1 and h2 (outputs [Tex]y_3[/Tex] and [Tex]y_4[/Tex]) and one output neuron O3 (output [Tex]y_5[/Tex]). The initial weights are [Tex]w_{1,1} = w_{2,1} = 0.2[/Tex], [Tex]w_{1,2} = w_{2,2} = 0.3[/Tex], [Tex]w_{1,3} = 0.3[/Tex] and [Tex]w_{2,3} = 0.9[/Tex]. Assume the neurons use the sigmoid activation function for the forward and backward pass. The target output is 0.5 and the learning rate is 1.

Example (1): the network used for the backpropagation walkthrough
1. Forward Propagation
1. Initial Calculation
The weighted sum at each node is calculated using:
[Tex]a_j = \sum_i (w_{i,j} \times x_i)[/Tex]
Where,
- [Tex]a_j[/Tex] is the weighted sum of all the inputs and weights at each node
- [Tex]w_{i,j}[/Tex] represents the weights between the [Tex]i^{th}[/Tex]input and the [Tex]j^{th}[/Tex] neuron
- [Tex]x_i[/Tex] represents the value of the [Tex]i^{th}[/Tex] input
- [Tex]o_j[/Tex] (output): after applying the activation function to [Tex]a_j[/Tex] we get the output of the neuron:
[Tex]o_j = \text{activation function}(a_j)[/Tex]
2. Sigmoid Function
The sigmoid function returns a value between 0 and 1, introducing non-linearity into the model.
[Tex]y_j = \frac{1}{1+e^{-a_j}}[/Tex]
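The weighted sum and sigmoid activation above can be sketched in a few lines of Python (the function names are ours, chosen for illustration):

```python
import math

def weighted_sum(inputs, weights):
    # a_j = sum of w_ij * x_i over all inputs feeding neuron j
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(a):
    # Squashes any real number into (0, 1), introducing non-linearity
    return 1 / (1 + math.exp(-a))
```

For example, `sigmoid(0)` returns exactly 0.5, and the sum of the example's first node, `weighted_sum([0.35, 0.7], [0.2, 0.2])`, gives 0.21.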

To find the outputs y3, y4 and y5:
3. Computing Outputs
At h1 node
[Tex]\begin {aligned}a_1 &= (w_{1,1} x_1) + (w_{2,1} x_2) \\& = (0.2 * 0.35) + (0.2* 0.7)\\&= 0.21\end {aligned}[/Tex]
Once we have calculated the a1 value, we can proceed to find the y3 value:
[Tex]y_j= F(a_j) = \frac 1 {1+e^{-a_1}}[/Tex]
[Tex]y_3 = F(0.21) = \frac 1 {1+e^{-0.21}}[/Tex]
[Tex]y_3 = 0.56[/Tex]
Similarly, find the values of y4 at h2 and y5 at O3:
[Tex]a_2 = (w_{1,2} * x_1) + (w_{2,2} * x_2) = (0.3*0.35)+(0.3*0.7)=0.315[/Tex]
[Tex]y_4 = F(0.315) = \frac 1{1+e^{-0.315}}[/Tex]
[Tex]a_3 = (w_{1,3}*y_3)+(w_{2,3}*y_4) = (0.3*0.56)+(0.9*0.59) = 0.699[/Tex]
[Tex]y_5 = F(0.699) = \frac 1 {1+e^{-0.699}} = 0.67[/Tex]

Values of y3, y4 and y5
4. Error Calculation
The target output is 0.5 but we obtained 0.67. To calculate the error we can use the formula:
[Tex]Error_j = y_{target} - y_5[/Tex]
[Tex]Error = 0.5 - 0.67 = -0.17[/Tex]
Using this error value we backpropagate.
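The forward pass and error calculation can be reproduced in Python as a self-contained sketch, using the inputs and initial weights of the example network:

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

# Inputs, initial weights and target from the example network
x1, x2 = 0.35, 0.7
w11, w21 = 0.2, 0.2    # inputs -> h1
w12, w22 = 0.3, 0.3    # inputs -> h2
w13, w23 = 0.3, 0.9    # hidden -> O3
target = 0.5

# Hidden layer
a1 = w11 * x1 + w21 * x2   # 0.21
y3 = sigmoid(a1)           # ~0.55 (the text rounds this up to 0.56)
a2 = w12 * x1 + w22 * x2   # 0.315
y4 = sigmoid(a2)           # ~0.58 (the text rounds this up to 0.59)

# Output layer
a3 = w13 * y3 + w23 * y4
y5 = sigmoid(a3)           # ~0.67

error = target - y5        # ~ -0.17
```

Note that computing with full precision gives slightly smaller hidden outputs than the rounded values used in the worked example, but the output and error still round to 0.67 and -0.17.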
2. Backpropagation
1. Calculating Gradients
The change in each weight is calculated as:
[Tex]\Delta w_{i,j} = \eta \times \delta_j \times O_i[/Tex]
Where:
- [Tex]\delta_j[/Tex] is the error term of the unit the connection feeds into,
- [Tex]O_i[/Tex] is the output of the unit the connection comes from (for the first layer this is the input [Tex]x_i[/Tex]),
- [Tex]\eta[/Tex] is the learning rate.
2. Output Unit Error
For O3:
[Tex]\delta_5 = y_5(1-y_5) (y_{target} - y_5) [/Tex]
[Tex] = 0.67(1-0.67)(-0.17) = -0.0376[/Tex]
3. Hidden Unit Error
For h1:
[Tex]\delta_3 = y_3 (1-y_3)(w_{1,3} \times \delta_5)[/Tex]
[Tex]= 0.56(1-0.56)(0.3 \times -0.0376) = -0.0027[/Tex]
For h2:
[Tex]\delta_4 = y_4(1-y_4)(w_{2,3} \times \delta_5) [/Tex]
[Tex]=0.59 (1-0.59)(0.9 \times -0.0376) = -0.0082[/Tex]
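These error terms can be checked directly in Python (a sketch using the rounded outputs from the forward pass; variable names are ours):

```python
# Rounded outputs from the forward pass and the weights into O3
y3, y4, y5 = 0.56, 0.59, 0.67
target = 0.5
w13, w23 = 0.3, 0.9

# Output unit: delta_5 = y5 * (1 - y5) * (target - y5)
d5 = y5 * (1 - y5) * (target - y5)   # ~ -0.0376

# Hidden units: delta_j = y_j * (1 - y_j) * (w_j,5 * delta_5)
d3 = y3 * (1 - y3) * (w13 * d5)      # ~ -0.0027
d4 = y4 * (1 - y4) * (w23 * d5)      # ~ -0.0082
```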
3. Weight Updates
For the weights from hidden to output layer:
[Tex]\Delta w_{2,3} = 1 \times (-0.0376) \times 0.59 = -0.022184[/Tex]
New weight:
[Tex]w_{2,3}(\text{new}) = 0.9 + (-0.022184) = 0.877816[/Tex]
For weights from input to hidden layer:
[Tex]\Delta w_{1,1} = 1 \times (-0.0027) \times 0.35 = -0.000945[/Tex]
New weight:
[Tex]w_{1,1}(\text{new}) = 0.2 + (-0.000945) = 0.199055[/Tex]
Similarly the other weights are updated:
- [Tex]w_{1,2}(\text{new}) = 0.3 + (-0.0082)(0.35) = 0.29713[/Tex]
- [Tex]w_{1,3}(\text{new}) = 0.3 + (-0.0376)(0.56) = 0.278944[/Tex]
- [Tex]w_{2,1}(\text{new}) = 0.2 + (-0.0027)(0.7) = 0.19811[/Tex]
- [Tex]w_{2,2}(\text{new}) = 0.3 + (-0.0082)(0.7) = 0.29426[/Tex]
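The update rule [Tex]w_{\text{new}} = w_{\text{old}} + \eta \times \delta \times O[/Tex] can be applied to every connection in a few lines (a sketch using the example's numbers; variable names are ours):

```python
eta = 1.0
x1, x2 = 0.35, 0.7
y3, y4 = 0.56, 0.59
d3, d4, d5 = -0.0027, -0.0082, -0.0376

# Hidden -> output weights use the hidden outputs as their inputs
w13 = 0.3 + eta * d5 * y3   # 0.278944
w23 = 0.9 + eta * d5 * y4   # 0.877816

# Input -> hidden weights use x1, x2 as their inputs
w11 = 0.2 + eta * d3 * x1   # 0.199055
w21 = 0.2 + eta * d3 * x2   # 0.19811
w12 = 0.3 + eta * d4 * x1   # 0.29713
w22 = 0.3 + eta * d4 * x2   # 0.29426
```

Because the error terms are negative, every update slightly decreases the weights, which pushes the output down toward the target of 0.5.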
This completes the backward pass: every weight has been updated using its error term.
After updating the weights the forward pass is repeated, yielding:
- [Tex]y_3 = 0.55[/Tex]
- [Tex]y_4 = 0.58[/Tex]
- [Tex]y_5 = 0.66[/Tex]
Since [Tex]y_5 = 0.66[/Tex] is still not the target output, the error is computed again:
[Tex]Error = y_{target} - y_5 = 0.5 - 0.66 = -0.16[/Tex]
This error is backpropagated in the same way, and the cycle of forward pass, error calculation and weight update repeats until the network's output is close enough to the target. This demonstrates how backpropagation iteratively updates the weights to minimize the error.
Backpropagation is the technique that makes neural networks learn: by repeatedly propagating errors backward and adjusting the weights and biases, the network gradually improves its predictions.
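Putting the three stages together, the whole training loop for this example network can be sketched end to end. The stopping tolerance, iteration cap and function name are our own choices, not part of the original example:

```python
import math

def sigmoid(a):
    return 1 / (1 + math.exp(-a))

def train(x1, x2, target, eta=1.0, tol=0.01, max_iters=1000):
    # Initial weights from the example network
    w11, w21, w12, w22, w13, w23 = 0.2, 0.2, 0.3, 0.3, 0.3, 0.9
    y5 = None
    for _ in range(max_iters):
        # 1. Forward pass
        y3 = sigmoid(w11 * x1 + w21 * x2)
        y4 = sigmoid(w12 * x1 + w22 * x2)
        y5 = sigmoid(w13 * y3 + w23 * y4)
        if abs(target - y5) < tol:      # close enough to the target: stop
            return y5
        # 2. Backward pass: error terms
        d5 = y5 * (1 - y5) * (target - y5)
        d3 = y3 * (1 - y3) * w13 * d5
        d4 = y4 * (1 - y4) * w23 * d5
        # 3. Weight updates
        w13 += eta * d5 * y3
        w23 += eta * d5 * y4
        w11 += eta * d3 * x1
        w21 += eta * d3 * x2
        w12 += eta * d4 * x1
        w22 += eta * d4 * x2
    return y5

out = train(0.35, 0.7, 0.5)   # converges near the target of 0.5
```

Each iteration nudges the output a few percent closer to the target, so the loop terminates well before the iteration cap for this example.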