1. INTRODUCTION
In recent years, machine learning has played a major role in the development of computer systems. Machine learning (ML) is a branch of Artificial Intelligence (AI) focused on the study of computer algorithms that improve the performance of computer programs [1-10].

Linear regression is one of the important applications of machine learning. It is a common area that shares knowledge between various fields such as machine learning, programming, data science, mathematics, statistics, and numerical methods [11-17].

Fig 1: Field of Linear Regression

In this paper, linear regression is applied using least squares and gradient descent to find the optimal line that best fits the data points using a linear polynomial. Linear regression is extensively used in data analysis and has a wide range of applications in many fields such as engineering and industry.

A straight line is described by the equation y = mx + c, where m = Δy/Δx is the slope and c is the intercept.

Fig 2: Equation of Line

A line can have different shapes depending on the values of slope and intercept, as shown in the following diagram:

Fig 3: Different Shapes of Line (different slopes, different intercepts)
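As an illustration (the values below are chosen here and are not taken from the paper), a few lines with different slopes and intercepts can be drawn with matplotlib:

import matplotlib.pyplot as plt

x = range(0, 11)
# same intercept, different slopes
for m in (0.5, 1.0, 2.0):
    plt.plot(x, [m*xi + 1 for xi in x], label=f"y = {m}x + 1")
# same slope, different intercepts
for c in (0, 3):
    plt.plot(x, [xi + c for xi in x], linestyle="--", label=f"y = x + {c}")
plt.legend()
plt.show()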
The fundamental concepts of linear regression using least squares and gradient descent are explained in the following section.
Linear Regression:
Linear regression is an algorithm used to find the line that best fits the data points. It is used to model the relationship between the independent variable (x) and the dependent variable (y) using a linear polynomial with slope (m) and intercept (c). Linear regression is widely used in data analysis because of its simplicity and efficiency.

Fig 4: Observed and Predicted Values
To find the optimal line that best fits the data points, the concept of least squares is used to minimize the variance between the observed and predicted points. The concept of least squares is explained in the following section.

Least Squares:
Least squares is a mathematical method used to minimize the error between the observed and predicted points. The error function is defined by the Mean Squared Error (MSE), which is computed by the following formula:

MSE = (1/n) Σ (y - yp)^2                                            (1)
    = (1/n) [(y0 - yp0)^2 + (y1 - yp1)^2 + ... + (yn - ypn)^2]

The differences between the observed and predicted points (Δ1, Δ2, Δ3, ...) are shown in the following diagram:

Fig 5: Explanation of Least Squares
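As a small worked example (values chosen for illustration): for two observed points y0 = 2 and y1 = 4 with predicted points yp0 = 1 and yp1 = 5, the error is MSE = (1/2) [(2 - 1)^2 + (4 - 5)^2] = (1/2)(1 + 1) = 1.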
Gradient Descent:
Gradient descent is an optimization method used to find the optimal solution that provides the minimum value of the error function.

In general, the gradient descent method starts by giving an initial value to the required parameter (p). Then, the partial derivative of the error function (E) with respect to the parameter is used to update the value of the parameter. The iterative process continues until the optimal solution, which provides the minimum value of the error function, is reached.

The concept of gradient descent is explained in the following diagram:

Fig 6: Concept of Gradient Descent (error E(p) against parameter p, descending from the initial value to the optimal value at the minimum)

The parameter (p) is updated by the following formula:

pnew = pold - α (∂E/∂p)                                             (2)

where (p) is the required parameter, (α) is the learning rate, and (∂E/∂p) is the partial derivative of the error function (E) with respect to the parameter (p).

Now, applying this to the error function (MSE): the partial derivative of (MSE) with respect to the slope (m) is given by the following formula (the constant factor 2 from differentiating the square is absorbed into the learning rate):

∂MSE/∂m = (-1/n) Σ (y - yp) (x)                                     (3)

And the partial derivative of (MSE) with respect to the intercept (c) is given by the following formula:

∂MSE/∂c = (-1/n) Σ (y - yp)                                         (4)

The slope and intercept are then updated accordingly:

mnew = mold - α (∂MSE/∂m)                                           (5)
cnew = cold - α (∂MSE/∂c)                                           (6)

The steps of the gradient descent method are explained in the following algorithm:

Algorithm 1: Gradient Descent Method
# initialize slope
m = 0
# initialize intercept
c = 0
# learning rate
α = 0.0001
# number of iterations
Nt = 1000
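The listing breaks off at the page boundary in the source. A minimal self-contained sketch of the complete loop, assembled from formulas (3)-(6) above (the data values here are synthetic placeholders, not the paper's Kaggle dataset), is:

X = [1.0, 2.0, 3.0, 4.0, 5.0]   # synthetic observed points
Y = [2.1, 4.3, 6.2, 8.1, 9.9]
n = len(X)

m, c = 0.0, 0.0                 # initial slope and intercept
alpha = 0.0001                  # learning rate
Nt = 1000                       # number of iterations

for t in range(Nt):
    Yp = [m*x + c for x in X]                                   # predicted points
    dm = (-1/n) * sum((Y[i] - Yp[i]) * X[i] for i in range(n))  # formula (3)
    dc = (-1/n) * sum(Y[i] - Yp[i] for i in range(n))           # formula (4)
    m = m - alpha * dm                                          # formula (5)
    c = c - alpha * dc                                          # formula (6)

print(m, c)                     # fitted slope and intercept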
The R2 can take values in the range [0, 1], where (0) indicates that the line does not fit the data points and (1) indicates that the line fits the data points exactly.
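The R2 formula itself is lost in the extracted text; a minimal sketch using the standard definition R2 = 1 - SSres/SStot (an assumption about the paper's exact computation) is:

def compute_R2(Y, Yp):
    # coefficient of determination: one minus the ratio of the residual
    # sum of squares to the total sum of squares around the mean of Y
    mean_y = sum(Y) / len(Y)
    ss_res = sum((y - yp)**2 for y, yp in zip(Y, Yp))
    ss_tot = sum((y - mean_y)**2 for y in Y)
    return 1 - ss_res/ss_tot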
Fig 8: Steps of Linear Regression
3. Computing Predicted Points:
The predicted points (X, Yp) are computed for the current values of the slope (m) and intercept (c). It is done by the following code:

# predicted y-value for every observed x-value
for i in range(n):
    Yp[i] = m*X[i] + c
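Equivalently, the predicted points can be computed in one vectorized step with NumPy (cited in the paper's references); this variant is a sketch with illustrative values, not the paper's listing:

import numpy as np

m, c = 0.5, 1.0                  # example slope and intercept
X = np.array([1.0, 2.0, 3.0])    # example observed x-values
Yp = m*X + c                     # all predicted points at once
print(Yp)                        # [1.5 2.  2.5]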
4. Computing Partial Derivatives:
The partial derivative of the error function with respect to the slope (dm) is computed by the following code:

def compute_dm(X, Y, Yp):
    # formula (3): dMSE/dm = (-1/n) * sum((y - yp) * x)
    n = len(X)
    total = 0
    for i in range(n):
        total += (Y[i] - Yp[i])*X[i]
    return (-1/n)*total
The partial derivative of the error function with respect to the intercept (dc) is computed by the following code:

def compute_dc(Y, Yp):
    # formula (4): dMSE/dc = (-1/n) * sum(y - yp)
    n = len(Y)
    total = 0
    for i in range(n):
        total += (Y[i] - Yp[i])
    return (-1/n)*total
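As a quick usage illustration of the two helpers (hypothetical sample values, with all predictions still at zero so the gradients are easy to verify by hand):

X = [1.0, 2.0, 3.0]
Y = [2.0, 4.0, 6.0]
Yp = [0.0, 0.0, 0.0]             # predictions for m = 0, c = 0
print(compute_dm(X, Y, Yp))      # -(2*1 + 4*2 + 6*3)/3 ≈ -9.333
print(compute_dc(Y, Yp))         # -(2 + 4 + 6)/3 = -4.0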
5. Updating Slope and Intercept:
The values of the slope (m) and intercept (c) are updated by the following code:

# formulas (5) and (6): step against the gradient, scaled by the learning rate
m = m - alpha * dm
c = c - alpha * dc
6. Computing Error Function:
The error function (MSE) is computed by the following code:

def compute_MSE(Y, Yp):
    # formula (1): mean of the squared differences
    n = len(Y)
    total = 0
    for i in range(n):
        total += (Y[i] - Yp[i])**2
    return total/n
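Continuing the small worked example from the least squares section (values chosen for illustration):

print(compute_MSE([2, 4], [1, 5]))   # ((2-1)**2 + (4-5)**2) / 2 = 1.0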
7. Making Equation of Line:
The equation of the line is obtained in the following form:

y = m * x + c
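The exact print statement is not shown in the extract; one way to produce the equation string in the format printed by the program (using the final run (1) values reported below) is:

m, c = 1.478331327454368, 0.0590585566568512
print(f"y = {m:.6f} x + {c:.6f}")    # y = 1.478331 x + 0.059059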
8. Plotting Predicted Line:
The predicted line is plotted using the "matplotlib" library.
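The plotting code itself is cut off at the page break; a minimal matplotlib sketch (using the first few observed and predicted values from the output views that follow) is:

import matplotlib.pyplot as plt

X = [25.128, 31.588, 32.502]     # first observed points, for illustration
Y = [53.454, 50.393, 31.707]
Yp = [37.207, 46.757, 48.108]    # corresponding predicted values

plt.scatter(X, Y, label="Observed points")
plt.plot(X, Yp, color="red", label="Predicted line")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()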
Observed Points:
The observed points (X, Y) are printed as shown in the following view:

     X        Y
---------------------------------
0    25.128   53.454
1    31.588   50.393
2    32.502   31.707
3    32.669   45.571
4    32.94    67.171
5    33.094   50.72
6    33.645   69.9
7    33.864   52.725
8    34.333   55.723
9    35.568   41.413
10   35.678   52.722
...

Processing Gradient Descent:
The gradient descent method is processed for (1,000) iterations. For each iteration, the slope (m), intercept (c), and error function (MSE) are printed as shown in the following view:

t     m          c          MSE
----------------------------------------
0     0.368535   0.007274   5565.107834
100   1.478861   0.032102   112.648861
200   1.478802   0.035105   112.647057
300   1.478743   0.038107   112.645254
400   1.478684   0.041108   112.643452
500   1.478625   0.044107   112.641652
600   1.478566   0.047106   112.639853
700   1.478507   0.050103   112.638055
800   1.478448   0.053099   112.636259
900   1.47839    0.056095   112.634464

A summary of the results is shown in the following view:

Nt = 1000
m = 1.478331327454368
c = 0.0590585566568512
MSE = 112.63268870038829
R2 = 0.59001

Predicted Points:
The final values of the predicted points (X, Yp) are printed as shown in the following view:

     X        Yp
---------------------------------
0    25.128   37.207
1    31.588   46.757
2    32.502   48.108
3    32.669   48.355
4    32.94    48.756
5    33.094   48.983
6    33.645   49.797
7    33.864   50.122
8    34.333   50.815
9    35.568   52.64
10   35.678   52.803
...
Fig 10: Error Function Plot

The error function plot shows that MSE decreases with iterations, which indicates that the linear regression model is converging well toward the optimal solution.
Equation of Line: Table 1: Comparison of R2 between The Three Runs
The equation of the line is formed and printed as shown in the following view:

y = 1.478331 x + 0.059059
Predicted Line:
The predicted line is plotted as shown in the following chart:

Fig 11: Linear Regression Model

The R2 value is (0.59001), which indicates that the predicted line fits the observed points.
Improving the Results:
To improve the results of the program, the number of iterations is increased tenfold, to (10,000) iterations. The results of run (2) are shown in the following view:

Nt = 10000
m = 1.4731251028822563
c = 0.3239430786286303
MSE = 112.47669301199636
R2 = 0.590577

The number of iterations is then increased to (100,000) iterations. The results of run (3) are shown in the following view:

Nt = 100000
m = 1.4297282427708158
c = 2.531907100632597
MSE = 111.38251276556622
R2 = 0.59456

y = 1.429728 x + 2.531907

The R2 value is (0.59456), which indicates that the resulting line fits the data points better than in the previous runs.

A comparison of R2 between the three runs of the linear regression model is shown in the following table:

Table 1: Comparison of R2 between The Three Runs

      Linear Regression Model
      Run 1     Run 2      Run 3
R2    0.59001   0.590577   0.59456

In summary, the program output clearly shows that the program has successfully performed the basic steps of linear regression using least squares and gradient descent and provided the required results.
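As a cross-check outside the paper's program, the closed-form least-squares line can be computed directly with NumPy's polyfit. The sketch below uses only the first three observed points for illustration; the full dataset would be needed to reproduce the values that the runs in Table 1 approach:

import numpy as np

X = np.array([25.128, 31.588, 32.502])   # first observed points, for illustration
Y = np.array([53.454, 50.393, 31.707])
m_ls, c_ls = np.polyfit(X, Y, 1)         # degree-1 fit returns [slope, intercept]
print(f"y = {m_ls:.6f} x + {c_ls:.6f}")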
5. CONCLUSION
Machine learning is playing a major role in the development of computer systems. Linear regression is an important application of machine learning. It helps to find the line that best fits the data points. Linear regression is used to model the relationship between the independent variable (x) and the dependent variable (y) using a linear polynomial with slope (m) and intercept (c). Least squares is used to minimize the error between the observed and predicted points. Gradient descent is used to find the optimal solution that provides the minimum value of the error function.

In this research, the author developed a program to perform linear regression using least squares and gradient descent in Python. The developed program performed the basic steps of linear regression: preparing observed points, initializing slope and intercept, computing predicted points, computing partial derivatives, updating slope and intercept, computing error function, making equation of line, and plotting predicted line.

The program was tested on an experimental dataset from Kaggle and provided the required results: computed slope and
intercept, computed error function, predicted points, equation of line, and predicted line.

In future work, more research is needed to improve and develop the current methods of linear regression using least squares and gradient descent. In addition, these methods should be investigated further across different fields, domains, and datasets.
6. REFERENCES [19] Groß, J. (2003). "Linear Regression". Springer Science &
[1] Sammut, C., & Webb, G. I. (2011). "Encyclopedia of Machine Learning". Springer Science & Business Media.

[2] Jung, A. (2022). "Machine Learning: The Basics". Singapore: Springer.

[3] Kubat, M. (2021). "An Introduction to Machine Learning". Cham, Switzerland: Springer International Publishing.

[4] Dey, A. (2016). "Machine Learning Algorithms: A Review". International Journal of Computer Science and Information Technologies, 7(3), 1174-1179.

[5] Jordan, M. I., & Mitchell, T. M. (2015). "Machine Learning: Trends, Perspectives, and Prospects". Science, 349(6245), 255-260.

[6] Forsyth, D. (2019). "Applied Machine Learning". Cham: Springer International Publishing.

[7] Chopra, D., & Khurana, R. (2023). "Introduction to Machine Learning with Python". Bentham Science Publishers.

[8] Sarker, I. H. (2021). "Machine Learning: Algorithms, Real-world Applications and Research Directions". SN Computer Science, 2(3), 160.

[9] Das, S., Dey, A., Pal, A., & Roy, N. (2015). "Applications of Artificial Intelligence in Machine Learning: Review and Prospect". International Journal of Computer Applications, 115(9), 31-41.

[10] Dhall, D., Kaur, R., & Juneja, M. (2020). "Machine Learning: A Review of the Algorithms and its Applications". Proceedings of ICRIC 2019: Recent Innovations in Computing, 47-63.

[11] Raschka, S. (2015). "Python Machine Learning". Packt Publishing.

[12] Müller, A. C., & Guido, S. (2016). "Introduction to Machine Learning with Python: A Guide for Data Scientists". O'Reilly Media.

[13] Swamynathan, M. (2019). "Mastering Machine Learning with Python in Six Steps: A Practical Implementation Guide to Predictive Data Analytics using Python". Apress.

[14] Brandt, S. (2014). "Data Analysis: Statistical and Computational Methods for Scientists and Engineers". Springer.

[15] VanderPlas, J. (2017). "Python Data Science Handbook: Essential Tools for Working with Data". O'Reilly Media.

[16] Atkinson, K. (1989). "An Introduction to Numerical Analysis". John Wiley & Sons.

[17] Chapra, S. C. (2010). "Numerical Methods for Engineers". McGraw-Hill.

[18] Gray, J. B. (2002). "Introduction to Linear Regression Analysis". Technometrics, 44, 191-192.

[19] Groß, J. (2003). "Linear Regression". Springer Science & Business Media.

[20] Olive, D. J. (2017). "Linear Regression". Berlin: Springer International Publishing.

[21] Yan, X., & Su, X. (2009). "Linear Regression Analysis: Theory and Computing". World Scientific.

[22] Su, X., Yan, X., & Tsai, C. L. (2012). "Linear Regression". Wiley Interdisciplinary Reviews: Computational Statistics, 4(3), 275-294.

[23] Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). "Introduction to Linear Regression Analysis". Wiley Series in Probability and Statistics: John Wiley & Sons.

[24] Kutner, M. H., Nachtsheim, C., & Neter, J. (2004). "Applied Linear Regression Models". McGraw-Hill/Irwin Series: Operations and Decision Sciences.

[25] Leemis, L. M. (1991). "Applied Linear Regression Models". Journal of Quality Technology, 23, 76-77.

[26] Seber, G. A., & Lee, A. J. (2003). "Linear Regression Analysis". John Wiley & Sons.

[27] Neter, J., Wasserman, W., & Kutner, M. H. (1983). "Applied Linear Regression Models". Irwin.

[28] Weisberg, S. (2005). "Applied Linear Regression". John Wiley & Sons.

[29] Malik, M. B. (2005). "Applied Linear Regression". Technometrics, 47, 371-372.

[30] Maulud, D., & Abdulazeez, A. M. (2020). "A Review on Linear Regression Comprehensive in Machine Learning". Journal of Applied Science and Technology Trends, 1(4), 140-147.

[31] Stigler, S. M. (1981). "Gauss and the Invention of Least Squares". The Annals of Statistics, 9(3), 465-474.

[32] Python: https://www.python.org

[33] NumPy: https://www.numpy.org

[34] Pandas: https://pandas.pydata.org

[35] Matplotlib: https://www.matplotlib.org

[36] NLTK: https://www.nltk.org

[37] SciPy: https://scipy.org

[38] Scikit-learn: https://scikit-learn.org

[39] Kaggle: https://www.kaggle.com