
E3S Web of Conferences 517, 16001 (2024) https://doi.org/10.1051/e3sconf/202451716001
ICETIA 2023

Multiple Linear Regression Modeling for Analysis of Factors Affecting COD and BOD on River Water Quality in Yogyakarta, Indonesia
Muhammad Andang Novianta 1,2,*, Syafrudin 3,4, Budi Warsito 4,5

1 Students Study Program of Doctoral Environmental Science, School of Postgraduate Studies, Diponegoro University, Semarang 50275, Indonesia
2 Department of Electrical Engineering, Faculty of Industrial Technology, Institut Sains & Teknologi AKPRIND Yogyakarta, Indonesia
3 Department of Environmental Engineering, Faculty of Engineering, Diponegoro University, Semarang 50275, Indonesia
4 Study Program of Doctoral Environmental Science, School of Postgraduate Studies, Diponegoro University, Semarang 50275, Indonesia
5 Department of Statistics, Faculty of Science and Mathematics, Diponegoro University, Semarang 50275, Indonesia

Abstract. Many factors can affect the quality of river water in the Special Region of Yogyakarta (DIY), including the activities of the population and of industry. Two river water quality parameters that can be used to assess the health of river water are Chemical Oxygen Demand (COD) and Biological Oxygen Demand (BOD). This study tested the effect of Total Suspended Solids (TSS) and Dissolved Oxygen (DO) on BOD and COD in 10 rivers in DIY. The method used is multiple linear regression modeling. Based on hypothesis testing in multiple linear regression with a significance level of 5%, TSS and DO significantly affect BOD and COD in 2019, whereas in 2020 only DO significantly affects COD. The models predict that if TSS is high, BOD and COD will be high, and that if DO is high, COD and BOD will be low.

1 Introduction

River water quality parameters consist of physical, chemical, and biological parameters. Chemical Oxygen Demand (COD) and Biological Oxygen Demand (BOD) are chemical parameters. COD indicates the total amount of oxygen required to chemically oxidize organic matter, or the level of inorganic waste as measured by the amount of oxygen required to break down inorganic waste. If the water contains a lot of inorganic waste, the amount of oxygen needed by microorganisms to break down the waste will be large, so the COD will also be high. Meanwhile, BOD is a measure of the amount of oxygen used by the microbial populations in the water in response to the entry of organic matter that can be decomposed. BOD indicates the amount of easily decomposed organic matter present in the water. If the water contains a lot of organic waste, the amount of oxygen needed by microorganisms to break down the waste will be large, so the BOD will also be high.

Previous researchers have also studied water quality, including with clustering analysis algorithms. That research focuses on classifying rivers based on water quality classes [1] and on clustering [2]. In China, the water quality of the Yangtze River has been studied using machine learning [3], a more modern technique. In Banjarmasin, Indonesia, river water quality was studied using K-Means Clustering [4]. Other research combined K-Means Clustering and Fuzzy techniques with genetic algorithms to identify groundwater quality [5].

Many factors contribute to high COD and BOD. The use of land around the river for hospitality activities affects the quality of river water: rivers can be polluted by waste from hotels operating along the river. Waste generated from industrial activities can pollute rivers, which are a source of water for daily needs, and affect the development of the biota in them. Meanwhile, it is stated that the roughness of the channel and the physical condition of the river have a big impact on pollutant concentrations based on the COD parameter. River water temperature also affects COD [6]. This is because the temperature follows the movement of the flow and of the pollutant discharges that enter the water body; together with the physical condition of the river, this causes turbulence in the water body, which has a direct but small effect on COD.

Factors that can affect COD and BOD are dissolved oxygen, organic matter, and other pollutant sources. The environmental parameters pH, Dissolved Oxygen (DO), and temperature have a very strong relationship with, and are inversely proportional to, BOD5 and COD in Lake Maninjau, West Sumatra [7]. If DO is high, BOD and COD will be low. DO is the amount of dissolved oxygen in the water, which comes from photosynthesis and from absorption from the atmosphere. It is also said that BOD5, Na⁺, T, DO, and PO₄³⁻ are important factors that can be relied upon to predict COD values as indicators of organic and non-organic pollution in rivers [8]. In research on the Riva River, Türkiye, COD values

* Corresponding author: [email protected]

© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).

increased when NH4-N, TSS, and T increased [9].

Analysis of the factors affecting COD and BOD is very important for evaluating river water quality. Many statistical methods can be used, one of which is multiple linear regression analysis [10-11]. The method is a type of modeling that produces a mathematical equation showing the effect of the independent variables on the dependent variable [12]. This equation can also be used to predict the value of the dependent variable. The method has been used to analyze factors that influence COD [8], and linear regression has been used to estimate the water quality index in the Yamuna River, India [13]. Many researchers use linear regression analysis, including Modal Linear Regression, in which the regression coefficients of a modal linear regression are estimated [14]. Other work has studied multiple linear regression itself [15] and used it to find the factors that best predict an outcome [16]. In other research, multiple linear regression was applied to regression experiments on random numbers, producing critical values (Fmax) that can be used to assess significance [17].

The multiple linear regression method will be applied to the analysis of river water quality in the Special Region of Yogyakarta (DIY). DIY has 10 rivers that flow through five regencies and differ in quality. The activities of the population and of industry greatly influence river quality. The Department of Environment and Forestry for the Special Region of Yogyakarta (DIY) listed river water pollution as one of 17 DIY environmental issues in 2021. It is also one of the three main priority issues in improving environmental quality in DIY, together with waste and land conversion that does not comply with spatial planning.

Many factors can affect the quality of river water in DIY. This study tested the effect of TSS and DO on BOD and COD. The method used is multiple linear regression modeling. This analysis shows whether TSS and DO significantly affect BOD and COD and what form their influence takes.

2 Method

The source of the data in this study was secondary data from the book Environmental Quality Index, by the Department of Environment and Forestry of the Special Province of Yogyakarta. From this secondary data, 149 sample points were taken for 2019 and 210 sample points for 2020. In 2020, the sample points came from the Winongo River, Code River, Gadjah Wong River, Tambakbayan River, Kuning River, Konteng River, Bedog River, Belik River, Bulus River, and Oyo River.

The variables used in this study are several physical and chemical parameters of river water quality, divided into dependent and independent variables. The independent variables are DO and TSS. The dependent variables are BOD and COD. The data analysis steps are as follows.
1. Prepare river water quality data.
2. Determine the dependent and independent variables.
3. Exploration of data based on minimum, average, and maximum values, and their comparison with quality standards. This quality standard is based on Government Regulation of the Republic of Indonesia Number 22 of 2021 concerning the Implementation of Environmental Protection and Management.
4. Identification of the relationship between variables through the scatterplot and Pearson correlation test.
5. Regression analysis

Multiple linear regression analysis determines the effect of two or more independent variables on one dependent variable. The general form of the multiple linear regression model with the dependent variable $Y$ and the independent variables $X_1, X_2, \ldots, X_p$ is as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon \tag{1}$$

with $\varepsilon \sim N(0, \sigma^2)$, where $\beta$ is the regression parameter and $p$ is the number of independent variables. The matrix form is given in equation (2):

$$Y = X\beta + \varepsilon \tag{2}$$

This research estimates models that show the effect of TSS and DO on BOD and on COD, in 2019 and in 2020. The form of the equations is as follows:

$$\mathrm{BOD} = \beta_0 + \beta_1\,\mathrm{TSS} + \beta_2\,\mathrm{DO} + \varepsilon \tag{3}$$

$$\mathrm{COD} = \beta_0 + \beta_1\,\mathrm{TSS} + \beta_2\,\mathrm{DO} + \varepsilon \tag{4}$$

The details of the modeling steps to obtain equations (3) and (4) are as follows:
1. Parameter estimation of $\beta$ uses the Ordinary Least Square (OLS) method, using equation (5) as follows:

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y \tag{5}$$

2. Residual assumption test (identical, independent, normal distribution) using the Breusch-Pagan, Durbin-Watson, and Kolmogorov-Smirnov tests, respectively.
3. Test the significance of the parameter of each independent variable using the t test. The hypotheses used are

$H_0: \beta_p = 0$ (the p-th independent variable has no significant effect on the dependent variable)
$H_1: \beta_p \neq 0$ (the p-th independent variable has a significant effect on the dependent variable)

$$t = \frac{\hat{\beta}_p}{SE(\hat{\beta}_p)} \tag{6}$$

where $SE(\hat{\beta}_p)$ is the standard error of the coefficient of the p-th independent variable. With a significance level $\alpha = 5\%$, $H_0$ is rejected if $|t| > t_{\alpha/2,\,n-p-1}$, where $n$ is the number of sample points, or if the P value $< \alpha$.


4. Model interpretation and coefficient of determination.

3 Result and discussion

3.1 Data description

A description of the data used in this study is shown in Table 1. The average TSS of the 149 samples in 2019 was 35.052 mg/L. There were 45 sample points (30%) above the quality standard. In 2020, the average TSS of the 210 samples was 17 mg/L, a decrease compared to 2019. The number of sample points above the quality standard also decreased, to 12 sample points (6%).

Table 1. Data Description

Characteristics      TSS (mg/L)   DO (mg/L)   BOD (mg/L)   COD (mg/L)
In 2019
  Minimum                 0.800       2.390        0.100        1.390
  Average                35.052       6.242        3.178       15.287
  Maximum               101.800      12.440       11.560       61.034
In 2020
  Minimum                 0.019       4.420        0.250        3.180
  Average                17.000       7.908        4.722       23.841
  Maximum               147.000      12.130       75.080      243.890
Quality Standard             50           4            3           25

DO conditions were also good: the averages in 2019 and 2020 were above the quality standard, and 2020 was better than 2019. Meanwhile, the BOD and COD conditions in 2019 and 2020 were still not good because there were still sample points above the quality standard. In 2019, there were 78 sample points (53%) that had a BOD above the quality standard and 15 sample points (10%) that had a COD above the quality standard. In 2020, there were 102 sample points (49%) that had a BOD above the quality standard and 61 sample points (23%) that had a COD above the quality standard.

3.2 Relationship patterns

The initial stage before modeling is to identify the pattern of relationships between the variables TSS, DO, BOD, and COD. This relationship pattern is identified through the scatterplots between variables in Fig. 1 and Fig. 2, as well as the Pearson correlation test in Table 2. Fig. 1 shows the data relationship pattern in 2019. TSS is positively related to BOD and COD: the higher the TSS, the higher the BOD and COD. TSS, which measures suspended solids in the water, reduces the natural dissolved oxygen in the water, so the BOD and COD values are high. Meanwhile, DO has the opposite relationship with BOD and COD: the higher the DO, the lower the BOD and COD. The greater the DO value, the higher the amount of dissolved oxygen and the better the water quality, which results in low BOD and COD. Based on the correlation test in Table 2, the relationships of TSS and DO with BOD and COD in 2019 are all statistically significant at the 5% significance level, although the correlation coefficients themselves are weak to moderate (absolute values between 0.235 and 0.432). These significant correlations indicate that TSS and DO are related to BOD and COD.

Fig. 2 shows the data relationship pattern in 2020. The relationship pattern is slightly different compared to 2019. From Fig. 2 it can be seen that the higher the TSS, the lower the BOD and COD. However, this relationship is not significant, which means that TSS does not really affect BOD and COD. This is different from the pattern in 2019. The higher the DO, the lower the BOD and COD. The greater the DO value, the higher the amount of dissolved oxygen and the better the water quality, which results in low BOD and COD.
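The correlation test used for Table 2 can be illustrated with a short sketch. The paper does not state which software or formula was used, so the code below is an assumption: it uses the standard sample Pearson coefficient and the usual t statistic with n - 2 degrees of freedom for the null hypothesis of no correlation. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom.

    Assumes |r| < 1; the statistic is undefined for a perfect correlation.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Reported coefficient for DO versus COD in 2020 (Table 2), n = 210 sample points.
t_do_cod_2020 = correlation_t_statistic(-0.160, 210)
```

For the reported DO versus COD coefficient in 2020, this gives a t statistic of roughly -2.34, whose magnitude exceeds the two-sided 5% critical value (a little under 2 for about 200 degrees of freedom), in line with the asterisk in Table 2.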


Fig. 1. Scatterplot Pattern of relationship between TSS, DO, BOD, and COD variables in 2019

Fig. 2. Scatterplot Pattern of relationship between TSS, DO, BOD, and COD variables in 2020


Table 2. Pearson Correlation Coefficient Value

Variable   Statistic              BOD (mg/L)   COD (mg/L)
In 2019
TSS        Pearson Correlation         0.432        0.316
           P-value                    0.000*       0.000*
DO         Pearson Correlation        -0.235       -0.279
           P-value                    0.004*       0.001*
In 2020
TSS        Pearson Correlation        -0.048       -0.109
           P-value                     0.218        0.115
DO         Pearson Correlation        -0.061       -0.160
           P-value                     0.379       0.020*

Note: *) significant at the 5% significance level; the null hypothesis is that there is no significant correlation between variables.

3.3 Regression analysis

Based on the identification of the relationship pattern, it can be seen that TSS and DO have a relationship with BOD and COD. To examine these relationships and influences in more detail, multiple linear regression modeling was carried out. The results of model parameter estimation are presented in Table 3 and Table 4.

Table 3. Parameter Data Estimation in 2019

Variable    Parameter Estimation   P-value   t-Test Statistics
Dependent variable: BOD
Constant                   1.710     0.011                2.56
TSS*                       0.032     0.000                4.91
DO*                       -0.054     0.009               -5.71
Dependent variable: COD
Constant                  15.700     0.000                5.25
TSS*                       0.067     0.024                2.28
DO*                       -0.445     0.028               -2.39

Note: *) significant at the 5% significance level; the null hypothesis is that the variable has no significant effect.

Table 4. Parameter Data Estimation in 2020

Variable    Parameter Estimation   P-value   t-Test Statistics
Dependent variable: BOD
Constant                   8.546     0.046                2.01
TSS                       -0.020     0.533               -0.62
DO                        -0.439     0.411               -0.82
Dependent variable: COD
Constant                   52.61     0.000                4.36
TSS                       -0.134     0.162               -1.41
DO*                       -3.350     0.028               -2.21

Note: *) significant at the 5% significance level; the null hypothesis is that the variable has no significant effect.

The resulting model equations are as follows:

In 2019:
BOD = 1.71 + 0.0321 TSS – 0.0545 DO
COD = 15.7 + 0.0669 TSS – 0.445 DO
In 2020:
BOD = 8.55 – 0.0209 TSS – 0.439 DO
COD = 52.61 – 0.134 TSS – 3.350 DO

In the 2019 data, if TSS increases by 1 mg/L, BOD will increase by 0.032 mg/L and COD will increase by 0.067 mg/L. Meanwhile, if DO increases by 1 mg/L, BOD will decrease by 0.054 mg/L and COD will decrease by 0.445 mg/L. The significance test at the 5% significance level shows that both TSS and DO significantly affect BOD and COD. The coefficient of determination is 18.9% for the BOD model and 11% for the COD model, which means that TSS and DO explain only that share of the variation in BOD and COD. Many other factors besides TSS and DO therefore influence BOD and COD.

The results of regression modeling in 2020 are different from 2019. If TSS increases by 1 mg/L, BOD will decrease by 0.02 mg/L and COD will decrease by 0.134 mg/L. However, this relationship is not significant. Meanwhile, if DO increases by 1 mg/L, BOD will decrease by 0.439 mg/L and COD will decrease by 3.350 mg/L. In this model only the DO variable has a significant effect, and only on COD. The coefficient of determination is 6% for the BOD model and 30% for the COD model, which means that TSS and DO explain only that share of the variation in BOD and COD. Many other factors besides TSS and DO therefore affect BOD and COD.

To find out whether the fitted models meet the required assumptions, this study also tests the residual assumptions, namely normal, independent, and identical distributions, as shown in Table 5.
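The paper reports only P-values for the residual tests and does not say how they were computed. As a hedged illustration of the independence check, the sketch below computes the Durbin-Watson statistic itself from a list of residuals. The function name and the example residuals are hypothetical and are not taken from the study.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals given in their sampling order.

    It is the sum of squared successive differences divided by the sum of
    squared residuals, and always lies between 0 and 4.
    """
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that alternate in sign (illustration only).
example_residuals = [0.5, -0.5, 0.5, -0.5]
dw_example = durbin_watson(example_residuals)
```

Values near 2 are usually read as no first-order autocorrelation, values toward 0 as positive autocorrelation, and values toward 4 as negative autocorrelation. The statistic depends on the order of the residuals, so the sample points need a meaningful ordering before it is applied.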


Table 5. Residual Assumption Test Results

Model   Test Name             P-Value
In 2019
BOD     Kolmogorov-Smirnov      0.062
        Breusch-Pagan           0.121
        Durbin-Watson           0.053
COD     Kolmogorov-Smirnov      0.071
        Breusch-Pagan           0.066
        Durbin-Watson           0.031
In 2020
BOD     Kolmogorov-Smirnov      0.085
        Breusch-Pagan           0.271
        Durbin-Watson           0.065
COD     Kolmogorov-Smirnov      0.125
        Breusch-Pagan           0.332
        Durbin-Watson           0.021

Based on the hypothesis tests, the P value in the Kolmogorov-Smirnov and Breusch-Pagan tests exceeds the 5% significance level for every model. This shows that the residuals in all models meet the assumptions of normal and identical distribution. However, based on the Durbin-Watson test, the COD models (P values 0.031 in 2019 and 0.021 in 2020) do not meet the independence assumption, while the BOD models do (P values 0.053 and 0.065).

The prediction results for BOD and COD in 2019 and 2020 are presented in Fig. 3 and Fig. 4. Based on the 2019 data, if TSS increases by 1 mg/L then BOD is predicted to increase by 0.032 mg/L, and if DO increases by 1 mg/L then BOD is predicted to decrease by 0.054 mg/L. For example, if the TSS is 110 mg/L and DO is 13 mg/L, the 2019 BOD equation gives 1.71 + 0.0321(110) – 0.0545(13) = 4.53 mg/L. Meanwhile, if TSS increases by 1 mg/L then COD is predicted to increase by 0.0669 mg/L, and if DO increases by 1 mg/L then COD is predicted to decrease by 0.445 mg/L. For example, if the TSS is 110 mg/L and DO is 13 mg/L, the COD will be 17.274 mg/L.

Based on the 2020 data, if TSS increases by 1 mg/L then BOD is predicted to decrease by 0.0209 mg/L, and if DO increases by 1 mg/L then BOD is predicted to decrease by 0.439 mg/L. For example, if the TSS is 150 mg/L and DO is 13 mg/L, the BOD will be -0.292 mg/L. Meanwhile, if TSS increases by 1 mg/L then COD is predicted to decrease by 0.134 mg/L, and if DO increases by 1 mg/L then COD is predicted to decrease by 3.350 mg/L. For example, if the TSS is 150 mg/L and DO is 13 mg/L, the COD will be -11.04 mg/L.
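The worked examples above amount to plugging values into the reported model equations. A minimal Python sketch follows, assuming only the coefficients printed in the model equations for 2019 and 2020. The `MODELS` table and the `predict` function are illustrative names, not part of the original study.

```python
# Coefficients as printed in the model equations: (intercept, TSS slope, DO slope).
MODELS = {
    ("BOD", 2019): (1.71, 0.0321, -0.0545),
    ("COD", 2019): (15.7, 0.0669, -0.445),
    ("BOD", 2020): (8.55, -0.0209, -0.439),
    ("COD", 2020): (52.61, -0.134, -3.350),
}


def predict(parameter, year, tss, do):
    """Predicted BOD or COD (mg/L) for given TSS and DO values (mg/L)."""
    intercept, slope_tss, slope_do = MODELS[(parameter, year)]
    return intercept + slope_tss * tss + slope_do * do


# The COD example for 2019: TSS = 110 mg/L and DO = 13 mg/L.
cod_2019_example = predict("COD", 2019, tss=110, do=13)
```

Calling `predict("COD", 2019, tss=110, do=13)` reproduces the 17.274 mg/L example up to floating-point rounding. The 2020 examples evaluate to negative concentrations, which are not physically meaningful, so predictions at such extreme inputs should be read with caution.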

Fig. 3. BOD and COD predictions based on modeling results in 2019


Fig. 4. BOD and COD predictions based on modeling results in 2020
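For readers who want to reproduce estimates of the kind shown in Table 3 and Table 4, the OLS step named in the Method section can be sketched in plain Python for the two-predictor case. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical and the example data are noise-free values generated from a made-up equation rather than measurements from the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(tss, do, y):
    """Estimate (intercept, TSS slope, DO slope) from the normal equations."""
    rows = [[1.0, t, d] for t, d in zip(tss, do)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def r_squared(tss, do, y, beta):
    """Coefficient of determination of the fitted model."""
    fitted = [beta[0] + beta[1] * t + beta[2] * d for t, d in zip(tss, do)]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical noise-free data generated from y = 2 + 0.05 * TSS - 0.3 * DO.
tss = [10.0, 20.0, 35.0, 50.0, 80.0]
do = [8.0, 7.5, 6.0, 6.5, 4.0]
bod = [2.0 + 0.05 * t - 0.3 * d for t, d in zip(tss, do)]
beta = fit_ols(tss, do, bod)
```

Because the example data contain no noise, the fit recovers the generating coefficients and the coefficient of determination is essentially 1. With real measurements it would be far lower, as the reported values of 6% to 30% show. Solving the normal equations directly is adequate for three parameters; a statistics library would normally be preferred in practice.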


4 Conclusion

Based on BOD and COD, more river water sample points exceeded the quality standard in 2020 than in 2019 (102 versus 78 for BOD, and 61 versus 15 for COD). Based on multiple linear regression modeling, TSS and DO significantly affect BOD and COD in 2019, whereas in 2020 only DO significantly affects COD. The form of the influence is that if TSS is high, BOD and COD are predicted to be high. Conversely, if DO is high, COD and BOD are predicted to be low. However, the COD modeling results do not meet the assumption of independent residuals. This means that the residuals are still correlated, which suggests that the sample points are related to one another. Future research can use other modeling as an alternative, for example, spatial regression or nonlinear regression.

Acknowledgement

This research was supported by the Directorate of Research and Community Service of the Ministry of Education, Culture, Research and Technology with contract No: 449A-06/UN7.D2/PP/VI/2023.

References

1. Warsito, B., et al. Evaluation of river water quality by using hierarchical clustering analysis. IOP Conference Series: Earth and Environmental Science. 896(1), p. 012072, IOP Publishing (2021)
2. Novianta, M. A., Syafrudin, and Warsito, B. K-Means Clustering for Grouping Rivers in DIY based on Water Quality Parameters. JUITA: Jurnal Informatika, 11(1), p. 155 (2023)
3. Di, Zhenzhen, Miao Chang, and Peikun Guo. Water quality evaluation of the Yangtze River in China using machine learning techniques and data monitoring on different time scales. Water, 11(2), p. 339 (2019)
4. Zubaidah, Tien, Nieke Karnaningroem, and Agus Slamet. K-means method for clustering water quality status on the rivers of Banjarmasin, Indonesia. ARPN Journal of Engineering and Applied Sciences, 13(6), p. 3692 (2018)
5. Mohammadrezapour, Omolbani, Ozgur Kisi, and Fariba Pourahmad. Fuzzy c-means and K-means clustering with genetic algorithm for identification of homogeneous regions of groundwater quality. Neural Computing and Applications, 32(8), p. 3763 (2020)
6. Marlina, Nelly, Hudori Hudori, and Ridwan Hafidh. Pengaruh Kekasaran Saluran dan Suhu Air Sungai pada Parameter Kualitas Air COD, TSS di Sungai Winongo Menggunakan Software QUAL2Kw. Jurnal Sains & Teknologi Lingkungan, 9(2), p. 122 (2017)
7. Komala, P. S., A. Nur, and I. Nazhifa. Pengaruh Parameter Lingkungan Terhadap Kandungan Senyawa Organik Danau Maninjau Sumatera Barat. Seminar Nasional Pembangunan Wilayah dan Kota Berkelanjutan. 1(1), p. 265 (2019)


8. Ali Abed, Salwan, Salam Hussein Ewaid, and Nadhir Al-Ansari. Evaluation of water quality in the Tigris River within Baghdad, Iraq using multivariate statistical techniques. Journal of Physics: Conference Series. 1294(1), p. 072025 (2019)
9. Oz, Nurtac, Bayram Topal, and Halil Ibrahim Uzun. Prediction of water quality in Riva River watershed. Ecological Chemistry and Engineering S. 26(4), p. 727 (2019)
10. Montgomery, D. C., Peck, E. A. and Vining, G. G. Introduction to linear regression analysis. John Wiley & Sons (2021)
11. Poole, Michael A., and Patrick N. O'Farrell. The assumptions of the linear regression model. Transactions of the Institute of British Geographers, 52(1), p. 145 (1971)
12. Olive, David J. Multiple linear regression. Springer International Publishing (2017)
13. Gaya, Muhammad Sani, et al. Estimation of water quality index using artificial intelligence approaches and multi-linear regression. Int. J. Artif. Intell. ISSN 2252 (2020)
14. Yao, Weixin, and Longhai Li. A new regression model: modal linear regression. Scandinavian Journal of Statistics, 41(3), p. 656 (2014)
15. Uyanık, Gülden Kaya, and Neşe Güler. A study on multiple linear regression analysis. Procedia-Social and Behavioral Sciences, 106, p. 234 (2013)
16. Pandis, Nikolaos. Multiple linear regression analysis. American Journal of Orthodontics and Dentofacial Orthopedics, 149(4), p. 581 (2016)
17. Livingstone, David J., and David W. Salt. Judging the significance of multiple linear regression models. Journal of Medicinal Chemistry, 48(3), p. 661 (2005)
