
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 5, 2022

Anomaly Detection for Spatiotemporal Rainfall Data in a Cloud Computing Environment

Radhika T V 1
Dept. of Computer Science & Engineering, RNS Institute of Technology, Bengaluru, India
[email protected]

Dr. K C Gouda 2
Council of Scientific & Industrial Research (CSIR) Fourth Paradigm Institute (4PI), Bengaluru, India
[email protected]

Dr. S Sathish Kumar 3
Dept. of Information Science & Engineering, RNS Institute of Technology, Bengaluru, India
[email protected]

Abstract— Anomaly detection is a critical task for maintaining the performance of a cloud system, as it helps to identify unusual behavior or patterns in large datasets that may indicate security threats or system failures. Traditional techniques for anomaly detection can be time-consuming and resource-intensive, especially when dealing with large volumes of data. Our proposed cloud computing technique aims to address these challenges by providing a scalable and efficient solution for detecting anomalies in real time. One of the main motivations for developing the proposed technique is to improve the security and reliability of cloud-based systems: by continuously monitoring data streams and identifying anomalies, we can detect potential threats and take appropriate action to mitigate them. Additionally, the technique can help to identify and resolve system failures or performance issues in a timely manner, ensuring that cloud-based systems remain available and efficient. Overall, our proposed cloud computing technique for anomaly detection represents a significant advance in the field, and we believe it has the potential to make a substantial impact on the security and reliability of cloud-based systems. Our work considers the problem of detecting anomalous (abnormal or unexpected) behavior in the global climate system, discovering teleconnection patterns and providing consequential insights to analysts.

Keywords— Anomaly detection, Big Climate Data,

I. INTRODUCTION

Big climate data are preferably provided to scientists for on-demand processing and for analyzing critical problems, which may relieve them from time-consuming computational tasks. Since processing big climate data requires efficient data management approaches, scalable computing resources and complex parallel computing algorithms, dealing with this problem is a challenging task. To address these challenges, high performance computing technologies have been applied to climate data analysis, modeling and prediction [3]. Many exhaustive big data analytics applications have evolved around big climate data, and the emergence of technologies such as the Internet of Things (IoT), cloud computing and advanced Big Data analytics tools has driven investigations of climate; the resulting intelligent analytic platforms and new technological advancements have further emphasized their importance and potential impact on climate science and Big Data science development [14][15].

Traditional Big Data techniques are usually incapable of handling large amounts of spatiotemporal data. For example, research has added spatial indexing, spatiotemporal indexing [1] and trajectory analytics features to Hadoop. One of the basic ideas behind spatiotemporal data, with respect to large spatial database systems, is the notion of moving objects [4]. A moving object is a spatial object that varies in geographical position or dimensions over a time period. For example, rainfall in one region differs from that in others; a river that switches its path over a geologic time scale may be represented as a moving line; and a moving region can be exemplified by a hurricane that changes its dimension and geographic position as it evolves [16]. Thus there is a need for a High Performance Computing (HPC) environment to process big spatiotemporal data.

The number of cores in an HPC environment is persistently increasing according to application requirements, and these applications generate large volumes of spatiotemporal data that will ultimately be stored and accessed in parallel [15]. Scientific applications such as weather prediction models use standard high-level libraries and data formats such as the Network Common Data Format 4 (NetCDF-4) and the Hierarchical Data Format 5 (HDF5), which help to store and operate on datasets residing in a parallel file system. Various file formats and software libraries exist to reduce the restrictions imposed by plain binary files; among them, the NetCDF file format was introduced for the systematic reading and writing of various kinds of scientific data, mainly array data. A NetCDF file is composed of various kinds of data, including BYTE, CHAR, SHORT, LONG, FLOAT and DOUBLE. The main intention of the NetCDF format is to store rectangular arrays of data such as Interactive Data Language (IDL) arrays [12]. NetCDF files are self-descriptive; that is, every file contains the basic information required to read it.
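As a brief illustration of this self-describing layout, the following sketch inspects and reads a NetCDF file with the Python netCDF4 library; the file name, the variable name pr_wtr and the array dimensions are assumptions for illustration and will differ with the actual NCEP/IMD product used.

# Minimal sketch: inspecting and reading a NetCDF file with the netCDF4 library.
# The file name and variable names are illustrative assumptions; the actual
# NCEP/IMD product may use different names, units and dimensions.
from netCDF4 import Dataset

with Dataset("pr_wtr_2010.nc", mode="r") as nc:
    # A NetCDF file is self-describing: dimensions, variables and attributes
    # can be discovered without any external documentation.
    print(nc.dimensions.keys())          # e.g. time, lat, lon
    print(nc.variables.keys())           # e.g. time, lat, lon, pr_wtr

    pr = nc.variables["pr_wtr"]          # precipitable-water variable (assumed name)
    print(pr.dimensions, pr.shape)       # rectangular array, e.g. (365, 73, 144)
    print(getattr(pr, "units", "n/a"))   # units stored as a variable attribute

    # Read one grid cell as a NumPy array (a daily time series for that point).
    point_series = pr[:, 0, 0]
    print(point_series[:5])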
Big spatiotemporal data have gained huge attention in recent years. Analyzing such massive amounts of multidimensional data is one of the most common requirements today, and processing these data is a very challenging task. The ability to assess global concerns such as climate change and natural disasters, as well as their influence on sectors such as agriculture and disease, requires efficient data processing. This is challenging not only because of the large data volume, but also because of the intrinsically high-dimensional nature of climate data. The emergence of Apache Spark provides a quicker solution for big spatiotemporal data analysis, and processing time is reduced drastically compared with the traditional way of processing multidimensional data on multi-core processors.

In the proposed work we use the Spark MapReduce framework for processing big spatiotemporal data at multiple spatial and temporal scales. A time series ARIMA model is also used for rainfall prediction over the Bengaluru region.

In the proposed work we consider time series rainfall data: we read rainfall (precipitation) data of the past 20 years (2000-2020) and identified the Box-Jenkins seasonal ARIMA (Auto Regressive Integrated Moving Average) approach for predicting rainfall on monthly scales. The seasonal ARIMA (0, 0, 0)(0, 2, 0) model for rainfall was identified as the best model to forecast rainfall for the next 5 years, with a confidence level of 76 percent, by analysing the last 20 years of data (2000-2020).

Apache Spark is an integrated platform for cluster computing that facilitates efficient big data management and analytics [13]. It is a non-proprietary, distributed computing scheme which enhances the MapReduce framework. The Spark system is made up of several main modules, including the Spark core and various high-level libraries such as Spark's MLlib for machine learning, GraphX for graph analysis, Spark Streaming for stream processing and Spark SQL for structured data processing [17]. It functions as a consolidated tool for machine learning, SQL, streaming and graph processing, and it supports batch, interactive and stream processing.

Spark is considered an excellent platform for data scientists, as it offers a number of data-centric tools that help them move beyond the problems pertinent to a single machine; it also assists data engineers, since its integrated approach removes the need to use various special-purpose tools for streaming, machine learning and graph analytics [13]. More importantly, Spark is very useful for researchers, as the platform fosters new opportunities and ideas to design and develop distributed algorithms and to test their performance on various clusters.
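As a minimal illustration of this consolidated design, the sketch below starts a single SparkSession and uses it for both DataFrame and Spark SQL processing; the sample monthly rainfall values are invented for illustration only.

from pyspark.sql import SparkSession

# One entry point exposes the core engine, DataFrames and Spark SQL; the
# higher-level libraries (MLlib, streaming, graph processing) build on the
# same unified runtime.
spark = SparkSession.builder.appName("SparkOverview").getOrCreate()

# Small illustrative DataFrame of monthly rainfall totals (assumed numbers).
df = spark.createDataFrame(
    [("Jan", 2.1), ("Jun", 105.4), ("Sep", 211.3)],
    ["month", "rain_mm"],
)

# The same data can be queried either through the DataFrame API or via SQL.
df.createOrReplaceTempView("rain")
spark.sql("SELECT month, rain_mm FROM rain WHERE rain_mm > 100").show()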
The rest of this paper is organized as follows: Section II provides an overview of research on Hadoop-based approaches for processing array-based multidimensional spatiotemporal data; Section III presents our proposed Spark-based approach for processing multidimensional spatiotemporal data and highlights the prediction of rainfall using an ARIMA model; Section IV describes the evaluation of the proposed work through a sequence of experiments; finally, Section V summarizes the proposed research and envisages future enhancements.

II. BACKGROUND STUDY

Big Data analytics has evolved with advanced opportunities for research, development, business and innovation. Big Data is characterized by four Vs: volume, velocity, veracity and variety, and may deliver value via its processing [2]. The conversion of these four Vs into the fifth (value) is one of the great challenges for processing capacity. Cloud Computing has emerged as a new standard that provides computing as a utility service to deal with various processing needs such as on-demand services, pooled resources, elasticity, broadband access and measured services. This capability of delivering computing capacity offers a possible path for converting Big Data's four Vs into the fifth (value). The continuously increasing volume of big data has accelerated technological developments and practical applications.

Earth is a complex dynamic system; because big data analytics works with vast amounts of climate data, it poses greater challenges in climate research than in any other field [3]. Climate change is a present concern throughout the globe and also a data-intensive subject, making it one of the main research areas for big data experts in recent decades [4]. The anomalous growth of climate data makes climate data a natural candidate for Big Data research. Climate scientists have been exploring historical data to understand the physics and dynamics, merging millions and billions of daily global observational records, and undertaking simulations of various climate-change scenarios, all of which leads to huge volumes of data [8].


Extremities in climate such as floods, droughts, cold waves and heat waves may have a considerable impact on society, ecology and the global economy. Thus spatiotemporal data acquisition, analysis, management and processing are considered all the more important and are helpful for various sectoral applications. Spatiotemporal data refers to data connected to both space and time; it is at least 2-dimensional and often 3-dimensional, so the volume of data increases at tremendous speed [8]. Since general-purpose databases cannot manage such large volumes of data, large-scale database software is needed to play a significant role in the management of spatiotemporal data. Big data is collected from a range of sources, archived, and processed in a variety of computing modes, including cloud computing, mobile computing, edge computing, and wearable computing.

Spatiotemporal data mining is the process of identifying interesting patterns and critical information from spatiotemporal data. Discovering weather patterns, anticipating earthquakes and storms, exposing the progressive history of towns and regions, and identifying global warming trends are examples of such processes. The unusual rise in spatiotemporal data, combined with the introduction of new technologies, has increased the demand for automatic spatiotemporal knowledge discovery. Spatiotemporal data mining techniques are essential for many organizations that make decisions based on huge spatiotemporal datasets. As these data are multidimensional in nature, the complexity of such data and their interrelationships creates computational and statistical challenges [11].

Researchers in climate science have access to ample recognized resources of big climate data for analysis and prediction, for instance NASA Global Climate Change (climate.nasa.gov), the Global Climate Observing System (GCOS), the NASA Center for Climate Simulation (nccs.nasa.gov), the Earth System Grid Federation (esgf.llnl.gov), the National Center for Atmospheric Research (ncar.ucar.edu), United Nations Global Pulse (unglobalpulse.org), the Climate Data Guide (climatedataguide.ucar.edu), and many other international and national climate analysis and monitoring centers around the world.

Multi-dimensional, array-based data models are mainly used to represent climate data. GRIB, HDF and NetCDF are the three most commonly used data formats for storing climate data. HDF5/NetCDF-4 was mainly developed to support nested structures, ragged arrays, unsigned data types, chunked data structures and caching techniques, which ultimately help to organize climate science data systematically and to keep up with changing computer models [19]. Meanwhile, in order to flexibly use data as multi-dimensional arrays, many software tools and libraries such as Panoply, h5py and NetCDF-Java were introduced. These standard software tools and data formats have added major benefits for storing, acquiring, examining and exchanging climate data. There are also a number of tools available for climate data analytics and visualization; one such tool is the Apache Open Climate Workbench, a Python-based tool for carrying out evaluations in climate science using remote-sensing rainfall data taken from various sources as well as climate model outputs.

However, the above-mentioned tools and libraries deal only with discrete machines and have restrictions regarding cloud computing systems, compatibility with HPC, and scalability. The absence of proper libraries makes it difficult to deal with the variety, veracity, format and resolution of big climate data, which poses a challenge for the emergence of advanced computing technologies.

1) Big Climate Data Management and Analytics

In [19] the authors present a case study supervised by Deutscher Wetterdienst (DWD), which includes the storage of array-based multidimensional raster data with hands-on exposure to the extraction and processing of gridded meteorological data sets. Big data brings various challenges, such as repositioning, managing and processing with high computational requirements; one of the key resolutions is a database system with the capability of parallel processing and distributed storage. In [20] the authors conducted a study on processing multi-temporal satellite image data using SciDB, an array-based database mainly used to store, manage and perform computations on such data. The main goal of that work is to provide an elastic solution using SciDB to store multi-temporal satellite imagery and to execute time series analysis on it.

In [22] the authors illustrate the working of SpatialHadoop, regarded as one of the first capable open-source MapReduce frameworks to support spatiotemporal data. The working of ST-Hadoop is illustrated in [21]; it supports spatio-temporal data and is considered one of the first proficient open-source MapReduce frameworks. In [23] the authors introduce SciHadoop, a Hadoop plugin that allows scientists to specify logical queries over array-based data models; SciHadoop executes these queries as map/reduce programs over the logical data model. The authors present an implementation of the SciHadoop paradigm for NetCDF data and evaluate the performance of five separate optimizations that address the stated goals for an integrated aggregate-function query.

2) Time Series Analysis for Rainfall Prediction


Time series analysis is a statistical technique that deals with time series data, or trend analysis. Time series data means that the data form a series over particular time periods or intervals. Data are considered to be of three types: time series data, which is a set of observations of the values that a variable takes at different times; cross-sectional data, which consists of one or more variables collected at the same point in time; and pooled data, which is a combination of time series and cross-sectional data. Various research groups have attempted to predict rainfall on seasonal time scales using different techniques. Below we discuss existing work on rainfall prediction using ARIMA.

Climate and rainfall are highly non-linear and complicated phenomena, which require classical, modern and detailed models to obtain accurate predictions. The authors in [24] considered various statistical models for the prediction of rainfall time series data: the statistical method based on the autoregressive integrated moving average (ARIMA), the emerging fuzzy time series (FTS) model and the non-parametric Theil's regression. To evaluate prediction efficiency, they used 31 years of annual rainfall data, from 1982 to 2012, for Ibadan, South West Nigeria. ARIMA(1, 2, 1) was used to derive the weights and the regression coefficients, while Theil's regression was used to fit a linear model. The performance of the models was evaluated using the mean absolute forecast error (MAE), the root mean square forecast error (RMSE) and the coefficient of determination.

To forecast future climatic data, the ARIMA model was utilized. The authors in [25] proposed an ARIMA-based daily weather forecasting tool, presented as a case study for predicting the weather of Varanasi. They implemented the ARIMA algorithm in R to create the forecasting tool. The India Meteorological Department provided 65 years of daily meteorological data (1951-2015) for the study. The accuracy of the model was calculated from the root mean square error (RMSE) estimated for each forecast. They approximated future values for the following fifteen years using ARIMA(2, 0, 2) for rainfall data and ARIMA(2, 1, 3) for temperature data. The root mean square error values for the rainfall and temperature data were 0.0948 and 0.085, respectively, indicating that the technique functioned correctly. The outcome can further be used for the management of solar power stations, agriculture, natural resources and tourism. The error is regarded as minimal judging by the RMSE values, indicating that the ARIMA model forecast the data properly.

III. RESEARCH METHODOLOGY

In the proposed work we consider the Spark MapReduce framework, which is regarded as an excellent platform for data scientists since it provides a number of data-centric tools that help them move beyond the problems pertinent to a single machine.

1) Data Analysis and Processing using Spark MapReduce

As weather data are multidimensional and array based, in the proposed work we consider precipitation and temperature data of Bangalore for rainfall prediction; seasonal weather analysis has also been carried out for other states of India. Various experiments are carried out by reading, analyzing and processing the data. NetCDF data procured from the National Center for Environmental Prediction (NCEP) and the India Meteorological Department (IMD) have been used. The work carried out is as follows:

• Initially, raw station-level NetCDF-based temperature and precipitation data of Bangalore district, located around 12° latitude and 77° longitude, are read in the Google Colab environment. The data considered for analysis span Jan 2010 to Dec 2020 (11 years of data). Data from each year are displayed as a single plot, and the 11 years of data are also plotted together as a single graph to analyze the past 11 years and to use them for processing to assist future prediction.
• The mean value is computed for every year (Jan-Dec) using the past 11 years of data and plotted as a single point in a graph for analysis.
• The mean value is computed across all 11 years using the Spark MapReduce platform and plotted as a single graph. This step is the most important, as the data are effectively processed with the Spark MapReduce platform for analysis and future weather prediction. A detailed diagram illustrating how the data are processed using the Spark platform is given in Fig. 1, and a condensed code sketch follows the steps below:
  o The first step is to import and execute the main library files for setting up Spark MapReduce functions in the Google Colab environment.
  o Raw station-level precipitation data (pr_wtr) of Bangalore (from Jan 2010 to Dec 2020) are read individually, and a data frame is created for each year.
  o A new data frame is created by adding the years 2010-2020 as columns (12 columns in total); the values of the corresponding year are placed in the appropriate cells, and the data frame is converted to a .csv file.


  o Split data: Spark MapReduce works by splitting the data and assigning key-value pairs; here the key is the day and the value is pr_wtr. In this step we split the data row-wise and perform the read operation using the spark.read.option() function. Each row refers to the daily data of every year, i.e. row 1 is day-1 data from 2010 to 2020, the next row is day-2 data from 2010 to 2020, and so on until the last row, which is day 365 from 2010 to 2020. A temporary last column is created in the data frame to hold the final row-wise mean value.
  o Map phase: this step computes the sum of all the values in each row and calculates n, where n is the number of year columns (2010-2020), using the formula n = lit(len(df.columns) - 1.0); this value is used in the next step.
  o Reduce phase: the row mean is calculated using Spark's reduce over the year columns, i.e. reduce(add, (col(x) for x in df.columns[1:])) / n, aliased as "11 years mean", and the result is displayed.
  o Aggregate phase: this step aggregates all the mean values and places them in the last column created in the split step.
  o Finally, the aggregated precipitation values are displayed as a single plot.

Fig. 1. Spark MapReduce model to compute aggregated climate parameters, namely precipitation and temperature.
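The following is a minimal PySpark sketch of the split, map, reduce and aggregate steps above. It assumes PySpark has been installed in the Colab session (e.g. via pip) and that the per-year values have already been written to a .csv file whose first column is the day and whose remaining columns are the years 2010-2020; the file name years.csv and the exact column layout are illustrative assumptions, not the paper's actual artefacts.

from functools import reduce
from operator import add
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

# Start a local Spark session (in Colab this follows a `pip install pyspark`).
spark = SparkSession.builder.appName("RainfallRowMean").getOrCreate()

# Split/read step: each row holds one day's pr_wtr values across the years,
# i.e. row 1 is day 1 for 2010-2020, row 2 is day 2, and so on.
df = (spark.read.option("header", True)
                .option("inferSchema", True)
                .csv("years.csv"))                 # assumed file name

# Map step: n is the number of year columns (everything except the day column).
n = lit(float(len(df.columns) - 1))

# Reduce step: sum the year columns of every row and divide by n, giving the
# day-wise mean over the 11 years, exposed under the alias "11 years mean".
mean_col = (reduce(add, [col(c) for c in df.columns[1:]]) / n).alias("11 years mean")

# Aggregate/display step: keep the day index together with its aggregated mean.
df.select(col(df.columns[0]), mean_col).show(5)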
2) Seasonal Analysis

In the proposed work, we consider seasonal analysis of the temperature and precipitation data of Bangalore to analyze the state of the weather during various seasons: pre-monsoon (March 1 to May 31), monsoon (June 1 to September 30) and post-monsoon (October 1 to December 31). Various graphs are shown to illustrate the season-wise analysis of the weather of a particular region such as Bangalore, and a comparison of the weather status of different cities is undertaken. The results are discussed in Section IV.
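As a small illustration of how these season windows can be applied, the following sketch slices a daily, datetime-indexed pandas series into pre-monsoon, monsoon and post-monsoon subsets and reports a seasonal mean; the synthetic temperature series and the use of pandas here are assumptions for illustration only.

import numpy as np
import pandas as pd

# Illustrative daily temperature series for one year (replace with real data).
days = pd.date_range("2012-01-01", "2012-12-31", freq="D")
temp = pd.Series(np.random.uniform(18, 34, len(days)), index=days)

# Season windows used in the paper: pre-monsoon (Mar 1 - May 31),
# monsoon (Jun 1 - Sep 30), post-monsoon (Oct 1 - Dec 31).
seasons = {
    "pre-monsoon": ("2012-03-01", "2012-05-31"),
    "monsoon": ("2012-06-01", "2012-09-30"),
    "post-monsoon": ("2012-10-01", "2012-12-31"),
}

# Mean temperature per season, e.g. for a bar plot like the one in Fig. 5.
for name, (start, end) in seasons.items():
    print(name, round(temp.loc[start:end].mean(), 2))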
3) Time Series Forecasting using the ARIMA Model

In this study, we take the past 11 years of data and train an ARIMA (autoregressive integrated moving average) model on them; the trained model is then used for future forecasting. ARIMA is a class of models that predicts a given time series based on its own past values. An ARIMA model is one where the time series has been differenced at least once to make it stationary. The working principle behind the autoregressive (AR) model is that there is a relationship between the present value and the past values: the present value is equal to a combination of past values plus a random term. The moving average (MA) model says that the present value is related to the residuals of the past. AR is not capable of forecasting nonlinear data; it can only be used for data that are linearly related. Using AR and MA together gives better results, but such a combination suits stationary weather data and short-term weather forecasting; the proposed work therefore considers the ARIMA model, which works well for long-term rainfall prediction. We worked with ARIMA(2, 0, 2) for the rainfall data. The following steps are used for time series forecasting of rainfall using ARIMA; a code sketch of the same steps is given after equation (1):

1. Plot the data.
2. Make the data stationary.
3. Identify the model technique best suited for rainfall forecasting; in the proposed work we use the ARIMA model.
4. Build the model.
5. Compute the mean error and the root mean squared error (RMSE), and use them to find the accuracy of the model.
6. Do the future forecasting based on the accuracy of the ARIMA model.

The generalized equation used in the ARIMA model is shown below in (1):

Y_t = α + β_1·Y_{t-1} + β_2·Y_{t-2} + … + β_p·Y_{t-p} + ε_t + φ_1·ε_{t-1} + φ_2·ε_{t-2} + … + φ_q·ε_{t-q}    (1)

where α is the intercept term, β_1 is the coefficient of lag 1 that the model estimates, Y_{t-1} is the value of the series at lag 1, and the ε terms are the errors of the corresponding lagged forecasts.
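The sketch below walks through these six steps with the statsmodels ARIMA implementation, using the ARIMA(2, 0, 2) order reported in the paper. The synthetic series, the ADF stationarity check and the train/test split are illustrative assumptions rather than the paper's exact pipeline.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Steps 1-2: load (here: synthesize) a daily series and check stationarity.
rng = np.random.default_rng(0)
series = pd.Series(25 + 5 * np.sin(np.arange(365) * 2 * np.pi / 365)
                   + rng.normal(0, 1, 365))
adf_p = adfuller(series)[1]              # a small p-value suggests stationarity
print("ADF p-value:", round(adf_p, 4))

# Steps 3-4: identify and build the model; the paper reports ARIMA(2, 0, 2).
train, test = series[:300], series[300:]
model = ARIMA(train, order=(2, 0, 2)).fit()

# Step 5: error metrics on the held-out part of the series.
forecast = model.forecast(steps=len(test))
errors = test.values - forecast.values
me = float(errors.mean())
rmse = float(np.sqrt((errors ** 2).mean()))
print("ME:", round(me, 3), "RMSE:", round(rmse, 3))

# Step 6: refit on the full series and forecast further ahead.
future = ARIMA(series, order=(2, 0, 2)).fit().forecast(steps=30)
print(future.head())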
IV. EXPERIMENTAL STUDIES


The proposed work is executed in the Google Colab environment. Python code is used for the implementation, and the necessary libraries were imported. The results are as follows. In Figs. 2(a-c) the precipitation (rainfall) data are read individually and plotted as separate graphs, whereas Figs. 2(d-e) show the 5-year and 11-year data plotted as single graphs.

Figs. 2(a-e): Raw station-level 11 years of daily data (of the Bangalore region) read and plotted individually, and also as a single graph for comparison.

Next we compute the mean value for every year (Jan-Dec) using the 11 years of data (2010-2020) and plot it as a single point in a graph for analysis. The same is plotted as line and bar graphs, as shown in Figs. 3(a-b).

Figs. 3(a-b): Mean value for every year (Jan-Dec) using 11 years of data (2010-2020), plotted as a single point in a graph for analysis.

The mean value is then computed over the 11 years (Jan 2010-Dec 2020) using the Spark MapReduce platform and plotted as a single graph. Fig. 4 shows how the aggregated mean value is placed in a new data frame, together with the final plot after applying Spark MapReduce.

Fig. 4: Final plot from the aggregated mean precipitation values of 11 years (2010-2020) using the Spark MapReduce model.

TABLE I. Daily rainfall (precipitable water, pr_wtr in mm) dataset for the years 2010-2020

Day    2010     2011     2012     ……     2020
0      34.75    39.14    32.89    ……     29.48
1      29.5     42.82    28.85    ……     30
2      29.62    43.72    27.42    ……     32.48
……     ……       ……       ……       ……     ……
364    39.14    41.64    35.92    ……     40.05


TABLE II. Aggregated daily rainfall data (precipitable water) after processing the 11-year dataset on the Spark MapReduce platform

       Day        Pr_wtr
0      Day 1      23.87
1      Day 2      22.26
2      Day 3      22.13
3      Day 4      22.56
4      Day 5      21.08
5      Day 6      20.73
:      :          :
364    Day 365    25.71

Table I shows an overview of the daily dataset of precipitable water (in mm) used for rainfall prediction from the years 2010 to 2020. These daily data of the past 11 years have been processed using the Spark MapReduce platform, which gives the aggregated result shown in Table II. The same result is used in the analysis and prediction of future rainfall.

The seasonal analysis of Bangalore weather using the temperature data of 2012 is shown in Fig. 5. The bar plots show the pre-monsoon, monsoon and post-monsoon temperatures.

Fig. 5: Seasonal temperature analysis of Bangalore for the year 2012.

A sample of the first five rows of the precipitation dataset (pr_wtr) for the year 2010 is shown in Fig. 6.

Fig. 6: Sample view of the first five rows of the precipitation dataset for the year 2010.

Time series forecasting is done using the ARIMA model; we worked with ARIMA(2, 0, 2) for the rainfall data. The past 11 years of rainfall data are used to train the model, and the trained model is used for future prediction. Figs. 7(a-b) show the ARIMA forecasting results.

Figs. 7(a-b): Preview and results of the ARIMA(2, 0, 2) model.

The metrics used for evaluation are the mean error (ME) and the root mean squared error (RMSE); the average error is calculated as shown below in equation (2):

Average error = ME / RMSE × 100    (2)

The accuracy for the rainfall data considered in our work using the ARIMA(2, 0, 2) model is found to be 76.8 percent.
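For clarity, the following sketch computes ME, RMSE and the average error of equation (2) from a pair of observed and forecast arrays; the sample values are illustrative and are not the paper's results.

import numpy as np

# Illustrative observed and forecast precipitable-water values (not real results).
observed = np.array([23.9, 22.3, 22.1, 22.6, 21.1])
forecast = np.array([22.8, 23.0, 21.5, 23.4, 20.6])

errors = observed - forecast
me = errors.mean()                       # mean error (ME)
rmse = np.sqrt((errors ** 2).mean())     # root mean squared error (RMSE)

# Equation (2): average error expressed as a percentage of the RMSE.
average_error = me / rmse * 100
print(round(me, 3), round(rmse, 3), round(average_error, 2))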

V. CONCLUSION

Vast amounts of climate data are being generated rapidly by satellite observations and numerical climate models. Agriculture, tourism, water, electricity, wildfire management and other sectors all require climate data, and the utility of climatic data depends on timely analysis. Existing technologies such as Apache Hadoop, which are based on the idea of breaking problems down into smaller chunks and solving them on a cluster of commodity servers, have emerged as a possible solution for analysing huge climate datasets. Apache Spark has recently emerged as a viable alternative to Hadoop's disk-based architecture. The proposed work considers the analysis and processing of big spatiotemporal data using the Spark MapReduce platform. Multidimensional NetCDF-based precipitation and temperature data from NCEP and CSIR-4PI are considered for analysis. The analysis shows that the Spark platform is computationally more efficient (about twice as fast) than a Hadoop MapReduce platform of the same configuration. Monthly and seasonal analysis of the climate data has been carried out. A time series prediction approach, the ARIMA(2, 0, 2) model, was used for forecasting the future rainfall of the Bangalore region; the results show that ARIMA performs well for long-term weather prediction.


Performance analysis of the model has been carried out using the NetCDF data of NCEP and CSIR-4PI, Bangalore.
REFERENCES
[1] Z. Li, F. Hu, J. L. Schnase, D. Q. Duffy, T. Lee, M. K. Bowen, and C. Yang, A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce, International Journal of Geographical Information Science, pp. 17–35, 2017.
[2] Chaowei Yang, Manzhu Yu, Fei Hu, Yongyao Jiang, and Yun Li, Utilizing cloud computing to address big geospatial data challenges, Computers, Environment and Urban Systems, 2016, http://dx.doi.org/10.1016/j.compenvurbsys.2016.10.010.
[3] James H. Faghmous and Vipin Kumar, A big data guide to understanding climate change: The case for theory-guided data science, Big Data, Vol. 2, No. 3, Sep 2014, pp. 155–163, https://doi.org/10.1089/big.2014.0026.
[4] Markus Götz, Christian Bodenstein, Matthias Richerzhagen, and Gabriele Cavallaro, On scalable data mining techniques for Earth science, Procedia Computer Science, Vol. 51, December 2015, pp. 2188–2197.
[5] Jinsong Wu, Song Guo, Jie Li, and Deze Zeng, Big data meet green challenges: Greening big data, IEEE Systems Journal, Vol. 10, No. 3, May 2016, pp. 873–887.
[6] Ralf Hartmut Guting, M. H. Bohlen, Martin Erwig, Christian S. Jensen, Nikos A. Lorentzos, Markus Schneider, and Michalis Vazirgiannis, A foundation for representing and querying moving objects, ACM Transactions on Database Systems (TODS), Vol. 25, No. 1, March 2000, pp. 1–42.
[7] Sebestyén Viktor, Czvetkó Tímea, and Abonyi János, The applicability of Big Data in climate change research: The importance of system of systems thinking, Frontiers in Environmental Science, Vol. 9, March 2021, DOI: 10.3389/fenvs.2021.619092.
[8] Yang C., Clarke K., Shekhar S., and Tao C. V., Big spatiotemporal data analytics: a research and innovation frontier, International Journal of Geographical Information Science, April 2020, https://doi.org/10.1080/13658816.2019.1698743.
[9] Fei Hu, Chaowei Yang, Daniel Q. Duffy, Michael Bowen, Weiwei Song, Tsengdar Lee, Mengchao Xu, and John L. Schnase, ClimateSpark: An in-memory distributed computing framework for big climate data analytics, Computers and Geosciences, March 2018, pp. 154–166, https://doi.org/10.1016/j.cageo.2018.03.011.
[10] Christopher Bartz, Konstantinos Chasapis, Michael Kuhn, Petra Nerge, and Thomas Ludwig, A best practice analysis of HDF5 and NetCDF-4 using Lustre, International Conference on High Performance Computing (ISC High Performance 2015), Vol. 9137, pp. 274–281.
[11] Gowtham Atluri, Anuj Karpatne, and Vipin Kumar, Spatio-temporal data mining: A survey of problems and methods, ACM Computing Surveys, Vol. 51, Issue 4, Article No. 83, July 2019, pp. 1–41, https://doi.org/10.1145/3161602.
[12] R. Rew and G. Davis, NetCDF: an interface for scientific data access, IEEE Computer Graphics and Applications, Vol. 10, No. 4, July 1990, pp. 76–82, DOI: 10.1109/38.56302.
[13] Salman Salloum, Ruslan Dautov, Xiaojun Chen, Patrick Xiaogang Peng, and Joshua Zhexue Huang, Big data analytics on Apache Spark, International Journal of Data Science and Analytics, Springer International Publishing Switzerland, 2016, pp. 145–164.
[14] Abdul Salam, Internet of Things for sustainable human health, book chapter in Internet of Things for Sustainable Community Development, Internet of Things, Springer, January 2020, pp. 217–242, https://doi.org/10.1007/978-3-030-35291-2_7.
[15] Pankaj Mudholkar and Megha Mudholkar, Internet of Things (IoT) and Big Data: A review, International Journal of Management, Technology and Engineering, Vol. 8, Issue XII, December 2018, ISSN: 2249-7455, pp. 5001–5007.
[16] Mark McKenney, Niharika Nyalakonda, Jarrod McEvers, and Mitchell Shipton, Pyspatiotemporalgeom: A Python library for spatiotemporal types and operations, Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, October 2016, Article No. 93, pp. 1–4.
[17] R. Rew and G. Davis, NetCDF: an interface for scientific data access, IEEE Computer Graphics and Applications, Vol. 10, No. 4, July 1990, pp. 76–82.
[18] Dimitar Misev, Peter Baumann, and Jürgen Seib, Towards large-scale meteorological data services: A case study, Datenbank-Spektrum, Springer, Vol. 21, Issue 1, pp. 183–192, 22 September 2012.
[19] A. Joshi, E. Pebesma, R. Henriques, and M. Appel, SciDB based framework for storage and analysis of remote sensing big data, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-5/W3, Capacity Building and Education Outreach in Advance Geospatial Technologies and Land Management, 10–11 December 2019, Dhulikhel, Nepal.
[20] Louai Alarabi and Mohamed F. Mokbel, A demonstration of ST-Hadoop: A MapReduce framework for big spatiotemporal data, Proceedings of the VLDB Endowment, Vol. 10, No. 12, August 2017.
[21] Ahmed Eldawy and Mohamed F. Mokbel, A demonstration of SpatialHadoop: An efficient MapReduce framework for spatial data, Proceedings of the VLDB Endowment, Vol. 6, Issue 12, August 2013, pp. 1230–1233, https://doi.org/10.14778/2536274.2536283.
[22] Joe B. Buck, Noah Watkins, Jeff LeFevre, and Kleoni Ioannidou, SciHadoop: Array-based query processing in Hadoop, Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, November 2011, Article No. 66, pp. 1–11, https://doi.org/10.1145/2063384.2063473.
[23] Timothy Olatayo and A. I. Taiwo, Statistical modelling and prediction of rainfall time series data, Global Journal of Computer Science and Technology: Interdisciplinary, Vol. 14, Issue 1, Version 1.0, 2014, Online ISSN: 0975-4172, Print ISSN: 0975-4350.
[24] Nikita Shivhare, Atul Kumar Rahul, Shyam Bihari Dwivedi, and Prabhat Kumar Singh Dikshit, ARIMA based daily weather forecasting tool: A case study for Varanasi, Mausam, Vol. 70, No. 1, January 2019, pp. 133–140.