Air Quality Index Analysis of Bengaluru City Air Pollutants Using Expectation Maximization Clustering
Dr. R. Senthil Kumar
Associate Professor, Department of Computer Science and Engineering
New Horizon College of Engineering
Marathalli, Bengaluru
[email protected]

Dr. Anidha Arulanandham
Associate Professor, Department of Computer Science and Engineering
New Horizon College of Engineering
Marathalli, Bengaluru
[email protected]

Dr. Suresh Arumugam
Associate Professor, Department of Artificial Intelligence and Machine Learning
New Horizon College of Engineering
Marathalli, Bengaluru
[email protected]
Abstract: Local air quality is important because it affects human breathing and life. Air quality changes from day to day, like the weather. The air quality index (AQI) conveys information about outdoor air quality, that is, the pollution in the air. The AQI measuring system informs people about the air quality at their location. The objective of the proposed method is to analyze and visualize the AQI of Bengaluru city. The data were collected from ten different stations in Bengaluru city over the six years from 2015 to 2020. The proposed work analyses the important pollutants, which are selected from the data set of all the stations using attribute selection methods such as a decision tree and a correlation matrix. The Expectation Maximization (EM) clustering technique is applied to analyze the data, and the results are discussed.

Keywords: Air quality index, Machine Learning, Classification, ELM, SVM

I. INTRODUCTION

The AQI is divided into six levels. If the AQI value is between 0 and 50, the air quality is good and satisfactory and poses no health risk. If the AQI is between 51 and 100, it is moderate, but sensitive individuals may experience respiratory symptoms. If the value is between 151 and 200, the AQI is unhealthy: people may experience minor health effects, and sensitive people may experience some serious health effects. If the value is between 201 and 300, the AQI is very unhealthy, which triggers a health alert. If the index is over 300, the AQI is termed hazardous, a condition that may cause serious health issues.

Bengaluru is the capital of Karnataka state and the third most populous city in India. The major sources of pollution in Bengaluru are the roads, namely the various vehicles on the road and the road dust raised into the atmosphere. This dust consists of fine particulate matter from the gases released by vehicles and the dirt particles of the particular areas. The extremely large number of vehicles on the road produces emissions in very high quantity. Diesel vehicles emit far more pollution than vehicles running on other available fuels. Burning these fossil fuels and other organic matter, combined with the many chemicals in vehicular emissions such as Nitrogen Dioxide (NO2), Sulfur Dioxide (SO2) and Ozone (O3), releases pollutants that have detrimental effects on human health. Vehicles and the construction industry are therefore the main culprits in the high air pollution in various places of Bengaluru. Machine learning classification is a two-step process. The first step selects a subset of significant and relevant features from the dataset, and the second step implements a classification model that predicts which pollutants have a major impact. The goal of the attribute subset selection method is to determine a small subset of highly informative attributes, which reduces processing complexity and provides higher classification accuracy.

The remainder of this paper is organized as follows. Section II gives the details of the related work implementation and analysis. Section III gives the details of the preprocessing techniques [13] applied to the data set and the selection of attributes using the attribute subset selection method. The results are discussed and presented in Section IV, and Section V concludes the paper.

II. LITERATURE

Kingsy et al. [1] calculated the AQI using an enhanced k-means algorithm. The quality of air is indicated in numeric values. The accuracy of k-means is greater than that of the other implemented algorithms. Ojeda-Magaña [2] developed a technique to monitor PM10 in the air. Particulate matter smaller than 2.5 micrometers in diameter is dangerous to health as per the WHO standard. Austin [3] proposed pollutant monitoring of NO2, O3 and PM2.5 for about 48 hours. The proposed model uses machine learning techniques, which prove more accurate than the linear regression technique.

Yajie et al. [4] introduced a technique for a pollutant dataset of PM10, PM2.5, CO and O3 from London, which uses grid sensor data. Pearl Pullan et al. [5] proposed an Air Quality Management System that considers pollution data in the form of PM2.5 levels and calculates the Air Quality Index (AQI). S Sankar Ganesh et al. [6] developed Support Vector Regression (SVR) for the AQI, which depends on the pollutants NO2, CO, O3, PM2.5, PM10 and SO2. Among the models compared, the SVR exhibited high performance in terms of quality measures.

Liye Song [7] analyzed the effects of various pollutants such as PM2.5, NO2, PM10, SO2, CO and O3 on the AQI in Jinan in the year 2016. The effects were analyzed using correlation analysis and path analysis. Ruijun Yang et al. [8] developed a Bayes network, which creates a directed acyclic graph (DAG), to evaluate air quality characteristics and check city air quality. The training and validation data sets contain Shanghai data, and the experimental result coincides with the real situation. Ranjana Waman Gore et al. [9] conducted an AQI analysis using data on various pollutants such as SO2, CO, NO2 and O3. The Kaggle dataset contains information on the air pollutants and their AQI values. The proposed work applies
Authorized licensed use limited to: Academy for Technical and Management Excellence. Downloaded on April 03,2025 at 08:10:23 UTC from IEEE Xplore. Restrictions apply.
Naive Bayesian [12] classification and the J48 decision tree classification algorithm to predict the health concerns related to AQI. Jana Shafi et al. [10] experimented with a K-Means clustering technique that shows the quick changes in AQI from the lowest toxic level to the highest toxic level at the same place, based on the fire pollutants on an hourly basis.

III. AIR QUALITY ANALYSIS AND VISUALIZATION

A. Data Preprocessing

The Central Pollution Control Board (CPCB) has installed ten active stations in and around Bengaluru city. The data set was collected from the Kaggle website and gives the pollution data of BTM Layout, Kadabesanahalli, Bapuji Nagar, City Railway Station, Hebbal, Hombegowda Nagar, Jayanagar 5th Block, Peenya, Sanegurava Halli and Silk Board. The data set comprises the numerical values of the pollutants PM2.5, PM10, NO, NO2, CO, SO2, NOx, NH3, O3, Benzene, Toluene and Xylene for these ten stations, as shown in Fig. 1 before preprocessing.

[...] the burning of fossil fuels, which contain sulfur, and is found in the atmosphere. Ozone (O3) gas is produced when high temperature causes chemical reactions between oxides of nitrogen (NOx) and Volatile Organic Compounds (VOCs). Benzene, toluene and xylene are VOCs present in the atmosphere that act as precursors for ground-level ozone production.

The AQI values are also classified into the six standard classifications: good, moderate, satisfactory, poor, very poor and severe. The data set contains daily values of all the above attributes from the year 2015 to 2020. The original data contains many missing values, which are handled by filling them with mean values.

The actual AQI bucket values contain six categories as per the CPCB standard, which are reduced to three categories such as Moderate, Satisfactory and Poor. The six category values are converted to three classes by applying min-max normalization, as represented in Table I. The normalization is applied based on the minimum and maximum values of the data set. The lower two boundaries and the upper two boundaries of the actual data set have very few values, which are scaled down to the above-mentioned three levels.
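The two preprocessing steps named above, mean filling of missing values and min-max normalization, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the sample PM2.5 readings are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in values]


def min_max_normalize(values):
    """Scale values linearly to [0, 1] using the column minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no range information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical daily PM2.5 readings for one station, with two missing days.
pm25 = [40.0, None, 60.0, 80.0, None]
filled = fill_missing_with_mean(pm25)
scaled = min_max_normalize(filled)
print(filled)
print(scaled)
```

Filling is done before scaling so that the imputed days do not change the minimum or maximum used for normalization.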
[...] analysis. Fig. 2 shows the preprocessed data taken for the analysis.

TABLE I. CLUSTER RANGE VS NORMALIZED CLUSTER RANGE

AQI Range    AQI Value       Normalized Cluster Range
0-50         Good            Cluster-0 (Good)
51-100       Moderate        Cluster-0 (Good)
101-150      Satisfactory    Cluster-1 (Satisfactory)
151-200      Poor            Cluster-2 (Poor)
201-250      Very Poor       Cluster-2 (Poor)
251-300      Severe          Cluster-2 (Poor)

The three clusters Good, Moderate and Poor are represented by blue, red and green respectively. All 10 stations are numbered from KA-2 to KA-11 on the Y-axis.

The visualizations in Fig. 3 (PM10), Fig. 4 (PM2.5), Fig. 5 (O3) and Fig. 6 (CO) clearly show that stations KA-9 and KA-10, located in Peenya and Sanegurava Halli respectively, have a poor air quality index. The Peenya area of Bengaluru is one of the biggest industrial areas, comprising a large number of small-scale, medium-scale and large-scale industries. The Sanegurava Halli area is in the center of the city, where the maximum number of vehicles cross. Table II presents the fine-tuned mean values of the pollutants, where the improved mean value is obtained by mixing two normally distributed values. The results are drawn from these improved mean values, which actually contribute to the analysis.
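The idea of refining mean values by mixing two normal distributions is the core of Expectation Maximization clustering, and can be sketched for a single pollutant as follows. This is a minimal one-dimensional sketch under stated assumptions, not the configuration used in the paper: the function names, the initialization at the data extremes, the fixed iteration count and the sample PM10 values are all hypothetical.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """Fit a mixture of two 1-D Gaussians with EM and assign each point a cluster."""
    # Assumed initialization: components start at the data extremes,
    # both with the overall variance and equal mixing weights.
    mu = [min(data), max(data)]
    overall_mean = sum(data) / len(data)
    var0 = sum((x - overall_mean) ** 2 for x in data) / len(data) or 1.0
    var = [var0, var0]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weight[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mean, variance and weight from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has no support; leave it unchanged
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(spread, 1e-6)  # floor keeps the density well defined
            weight[k] = nk / len(data)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return mu, var, weight, labels


# Hypothetical PM10 readings: five low-pollution days and five high-pollution days.
pm10 = [40.0, 45.0, 50.0, 55.0, 60.0, 180.0, 190.0, 200.0, 210.0, 220.0]
means, variances, weights, labels = em_two_gaussians(pm10)
print(means)
print(labels)
```

Each iteration moves the two component means toward the centers of the low and high groups, which corresponds to the "improved mean values" that the cluster analysis reports per pollutant.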
Fig. 6. CO visualization of all stations

V. CONCLUSION

The air quality index of the four major features, PM10, PM2.5, O3 and CO, was analyzed using the Expectation Maximization clustering technique. Feature selection used the J48 decision tree to select the features with the maximum gain ratio. Correlation matrix analysis was used to remove similar features from the input data. The analysis results of the 10-fold cross-validation test using the standard datasets show the areas that have poor air quality in Bengaluru city. Future work can extend this study by relating the health issues of the people living in these specific areas to the pollutants present in the air.
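The two feature selection criteria mentioned in this paper, gain ratio (the split criterion of J48) and correlation-based removal of similar features, can be illustrated with a short sketch. It is an assumption-laden example rather than the authors' pipeline: the function names, the 0.9 correlation threshold and the toy data are hypothetical, and the attribute passed to `gain_ratio` is assumed to be already discretized.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(attribute, labels):
    """Information gain of a categorical attribute divided by its split information."""
    total = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(attribute)
    if split_info == 0.0:
        return 0.0  # attribute has a single value and cannot split the data
    return (entropy(labels) - remainder) / split_info


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| with every already kept feature is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical discretized pollutant level versus AQI class.
level = ["high", "high", "low", "low"]
aqi_class = ["Poor", "Poor", "Good", "Good"]
print(gain_ratio(level, aqi_class))

# Hypothetical readings: PM2.5 is an exact multiple of PM10, O3 is unrelated.
features = {
    "PM10": [100.0, 120.0, 140.0, 160.0],
    "PM2.5": [50.0, 60.0, 70.0, 80.0],
    "O3": [30.0, 10.0, 40.0, 20.0],
}
print(drop_correlated(features))
```

In this toy data PM2.5 is dropped because it duplicates the information in PM10, while O3 is kept because it is uncorrelated with PM10.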