
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 13, No. 5, 2022

Anomaly Detection for Spatiotemporal Rainfall Data in a Cloud Computing Environment

Radhika T V 1
Dept. of Computer Science & Engineering, RNS Institute of Technology, Bengaluru, India
[email protected]

Dr. K C Gouda 2
Council of Scientific & Industrial Research (CSIR) Fourth Paradigm Institute (4PI), Bengaluru, India
[email protected]

Dr. S Sathish Kumar 3
Dept. of Information Science & Engineering, RNS Institute of Technology, Bengaluru, India
[email protected]

Abstract— Anomaly detection is a critical task for maintaining the performance of a cloud system, as it helps to identify unusual behavior or patterns in large datasets that may indicate security threats or system failures. Traditional techniques for anomaly detection can be time-consuming and resource-intensive, especially when dealing with large volumes of data. Our proposed cloud computing technique aims to address these challenges by providing a scalable and efficient solution for detecting anomalies in real time. One of the main motivations for developing the proposed technique is to improve the security and reliability of cloud-based systems: by continuously monitoring data streams and identifying anomalies, we can detect potential threats and take appropriate action to mitigate them. Additionally, the technique can help to identify and resolve system failures or performance issues in a timely manner, ensuring that cloud-based systems remain available and efficient. Overall, our proposed cloud computing technique for anomaly detection represents a significant advance in the field, and we believe it has the potential to make a substantial impact on the security and reliability of cloud-based systems. Our work considers the problem of detecting anomalous (abnormal or unexpected) behavior in the global climate system, discovering teleconnection patterns and providing consequential insights to analysts.

Keywords— Anomaly detection, Big Climate Data,

I. INTRODUCTION

Big climate data are preferably provided to scientists for on-demand processing and for analyzing critical problems, which may relieve them from time-consuming computational tasks. Since processing big climate data requires efficient data management approaches, scalable computing resources and complex parallel computing algorithms, dealing with this problem is a challenging task. To address these challenges, high performance computing technologies have been applied to climate data analysis, modeling and prediction [3]. Many exhaustive big data analytics applications have evolved around big climate data, and the emergence of technologies such as the Internet of Things (IoT), cloud computing and advanced Big Data analytics tools has driven investigations of climate; the resulting intelligent analytic platforms and new technological advancements have further emphasized their importance and potential impact on climate science and Big Data science development [14][15].

Traditional Big Data techniques are usually incapable of handling large amounts of spatiotemporal data. For example, research has added spatial indexing, spatiotemporal indexing [1] and trajectory analytics features to Hadoop. One of the basic ideas behind spatiotemporal data, with respect to large spatial database systems, is the notion of moving objects [4]. A moving object is a spatial object that varies in geographical position or dimensions over a time period. For example, rainfall in one region differs from that in others; a river that switches its path over a geologic time scale may be represented as a moving line; and a moving region can be exemplified by a hurricane that changes its dimension and geographic position as it evolves [16]. Thus there is a need for a High Performance Computing (HPC) environment to process big spatiotemporal data.

The number of cores in an HPC environment is persistently increasing according to application requirements, and these applications generate large volumes of spatiotemporal data that will ultimately be stored and accessed in parallel [15]. Scientific applications such as weather prediction models use standard high-level libraries and data formats such as the Network Common Data Format 4 (NetCDF-4) and the Hierarchical Data Format 5 (HDF5), which help to store and operate on datasets residing in a parallel file system. Various file formats and software libraries exist to reduce the restrictions imposed by plain binary files; among them, the NetCDF file format was introduced for the systematic reading and writing of various kinds of scientific data, mainly array data. A NetCDF file is composed of various kinds of data, including BYTE, CHAR, SHORT, LONG, FLOAT and DOUBLE. The main intention of the NetCDF format is to store rectangular arrays of data such as Interactive Data Language (IDL) arrays [12]. NetCDF files are self-descriptive; that is, every file contains the basic information required to read it.
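As a brief illustration of this self-describing layout, the following sketch inspects and reads a NetCDF file with the Python netCDF4 library; the file name, the variable name pr_wtr and the array dimensions are assumptions for illustration and will differ with the actual NCEP/IMD product used.

# Minimal sketch: inspecting and reading a NetCDF file with the netCDF4 library.
# The file name and variable names are illustrative assumptions; the actual
# NCEP/IMD product may use different names, units and dimensions.
from netCDF4 import Dataset

with Dataset("pr_wtr_2010.nc", mode="r") as nc:
    # A NetCDF file is self-describing: dimensions, variables and attributes
    # can be discovered without any external documentation.
    print(nc.dimensions.keys())          # e.g. time, lat, lon
    print(nc.variables.keys())           # e.g. time, lat, lon, pr_wtr

    pr = nc.variables["pr_wtr"]          # precipitable-water variable (assumed name)
    print(pr.dimensions, pr.shape)       # rectangular array, e.g. (365, 73, 144)
    print(getattr(pr, "units", "n/a"))   # units stored as a variable attribute

    # Read one grid cell as a NumPy array (a daily time series for that point).
    point_series = pr[:, 0, 0]
    print(point_series[:5])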
Big spatiotemporal data have gained huge attention in recent years. Analyzing such massive amounts of multidimensional data is one of the most common requirements today, and processing these data is a very challenging task. The ability to assess global concerns such as climate change and natural disasters, as well as their influence on sectors such as agriculture and disease, requires efficient data processing. This is challenging not only because of the large data volume, but also because of the intrinsically high-dimensional nature of climate data. The emergence of Apache Spark provides a quicker solution for big spatiotemporal data analysis, and processing time is reduced drastically compared with the traditional way of processing multidimensional data on multi-core processors.

In the proposed work we use the Spark MapReduce framework for processing big spatiotemporal data at multiple spatial and temporal scales. A time series ARIMA model is also used for rainfall prediction over the Bengaluru region.

In the proposed work we consider time series rainfall data: we read rainfall (precipitation) data of the past 20 years (2000-2020) and identified the Box-Jenkins seasonal ARIMA (Auto Regressive Integrated Moving Average) approach for predicting rainfall on monthly scales. The seasonal ARIMA (0, 0, 0)(0, 2, 0) model for rainfall was identified as the best model to forecast rainfall for the next 5 years, with a confidence level of 76 percent, by analysing the last 20 years of data (2000-2020).

Apache Spark is an integrated platform for cluster computing that facilitates efficient big data management and analytics [13]. It is a non-proprietary, distributed computing scheme which enhances the MapReduce framework. The Spark system is made up of several main modules, including the Spark core and various high-level libraries such as Spark's MLlib for machine learning, GraphX for graph analysis, Spark Streaming for stream processing and Spark SQL for structured data processing [17]. It functions as a consolidated tool for machine learning, SQL, streaming and graph processing, and it supports batch, interactive and stream processing.

Spark is considered an excellent platform for data scientists, as it offers a number of data-centric tools that help them move beyond the problems pertinent to a single machine; it also assists data engineers, since its integrated approach removes the need to use various special-purpose tools for streaming, machine learning and graph analytics [13]. More importantly, Spark is very useful for researchers, as the platform fosters new opportunities and ideas to design and develop distributed algorithms and to test their performance on various clusters.
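As a minimal illustration of this consolidated design, the sketch below starts a single SparkSession and uses it for both DataFrame and Spark SQL processing; the sample monthly rainfall values are invented for illustration only.

from pyspark.sql import SparkSession

# One entry point exposes the core engine, DataFrames and Spark SQL; the
# higher-level libraries (MLlib, streaming, graph processing) build on the
# same unified runtime.
spark = SparkSession.builder.appName("SparkOverview").getOrCreate()

# Small illustrative DataFrame of monthly rainfall totals (assumed numbers).
df = spark.createDataFrame(
    [("Jan", 2.1), ("Jun", 105.4), ("Sep", 211.3)],
    ["month", "rain_mm"],
)

# The same data can be queried either through the DataFrame API or via SQL.
df.createOrReplaceTempView("rain")
spark.sql("SELECT month, rain_mm FROM rain WHERE rain_mm > 100").show()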
The rest of this paper is organized as follows: Section II provides an overview of research on Hadoop-based approaches for processing array-based multidimensional spatiotemporal data; Section III presents our proposed Spark-based approach for processing multidimensional spatiotemporal data and highlights the prediction of rainfall using an ARIMA model; Section IV describes the evaluation of the proposed work through a sequence of experiments; finally, Section V summarizes the proposed research and envisages future enhancements.

II. BACKGROUND STUDY

Big Data analytics has evolved with advanced opportunities for research, development, business and innovation. Big Data is characterized by four Vs: volume, velocity, veracity and variety, and may deliver value via its processing [2]. The conversion of these four Vs into the fifth (value) is one of the great challenges for processing capacity. Cloud Computing has emerged as a new standard that provides computing as a utility service to deal with various processing needs such as on-demand services, pooled resources, elasticity, broadband access and measured services. This capability of delivering computing capacity offers a possible path for converting Big Data's four Vs into the fifth (value). The continuously increasing volume of big data has accelerated technological developments and practical applications.

Earth is a complex dynamic system; because big data analytics works with vast amounts of climate data, it poses greater challenges in climate research than in any other field [3]. Climate change is a present concern throughout the globe and also a data-intensive subject, making it one of the main research areas for big data experts in recent decades [4]. The anomalous growth of climate data makes climate data a natural candidate for Big Data research. Climate scientists have been exploring historical data to understand the physics and dynamics, merging millions and billions of daily global observational records, and undertaking simulations of various climate-change scenarios, all of which leads to huge volumes of data [8].


Extremities in climate such as floods, droughts, cold waves and heat waves may have a considerable impact on society, ecology and the global economy. Thus spatiotemporal data acquisition, analysis, management and processing are considered all the more important and are helpful for various sectoral applications. Spatiotemporal data refers to data connected to both space and time; it is at least 2-dimensional and often 3-dimensional, so the volume of data increases at tremendous speed [8]. Since general-purpose databases cannot manage such large volumes of data, large-scale database software is needed to play a significant role in the management of spatiotemporal data. Big data is collected from a range of sources, archived, and processed in a variety of computing modes, including cloud computing, mobile computing, edge computing, and wearable computing.

Spatiotemporal data mining is the process of identifying interesting patterns and critical information from spatiotemporal data. Discovering weather patterns, anticipating earthquakes and storms, exposing the progressive history of towns and regions, and identifying global warming trends are examples of such processes. The unusual rise in spatiotemporal data, combined with the introduction of new technologies, has increased the demand for automatic spatiotemporal knowledge discovery. Spatiotemporal data mining techniques are essential for many organizations that make decisions based on huge spatiotemporal datasets. As these data are multidimensional in nature, the complexity of such data and their interrelationships creates computational and statistical challenges [11].

Researchers in climate science have access to ample recognized resources of big climate data for analysis and prediction, for instance NASA Global Climate Change (climate.nasa.gov), the Global Climate Observing System (GCOS), the NASA Center for Climate Simulation (nccs.nasa.gov), the Earth System Grid Federation (esgf.llnl.gov), the National Center for Atmospheric Research (ncar.ucar.edu), United Nations Global Pulse (unglobalpulse.org), the Climate Data Guide (climatedataguide.ucar.edu), and many other international and national climate analysis and monitoring centers around the world.

Multi-dimensional, array-based data models are mainly used to represent climate data. GRIB, HDF and NetCDF are the three most commonly used data formats for storing climate data. HDF5/NetCDF-4 was mainly developed to support nested structures, ragged arrays, unsigned data types, chunked data structures and caching techniques, which ultimately help to organize climate science data systematically and to keep up with changing computer models [19]. Meanwhile, in order to flexibly use data as multi-dimensional arrays, many software tools and libraries such as Panoply, h5py and NetCDF-Java were introduced. These standard software tools and data formats have added major benefits for storing, acquiring, examining and exchanging climate data. There are also a number of tools available for climate data analytics and visualization; one such tool is the Apache Open Climate Workbench, a Python-based tool for carrying out evaluations in climate science using remote-sensing rainfall data taken from various sources as well as climate model outputs.

However, the above-mentioned tools and libraries deal only with discrete machines and have restrictions regarding cloud computing systems, compatibility with HPC, and scalability. The absence of proper libraries makes it difficult to deal with the variety, veracity, format and resolution of big climate data, which poses a challenge for the emergence of advanced computing technologies.

1) Big Climate Data Management and Analytics

In [19] the authors present a case study supervised by Deutscher Wetterdienst (DWD), which includes the storage of array-based multidimensional raster data with hands-on exposure to the extraction and processing of gridded meteorological data sets. Big data brings various challenges, such as repositioning, managing and processing with high computational requirements; one of the key resolutions is a database system with the capability of parallel processing and distributed storage. In [20] the authors conducted a study on processing multi-temporal satellite image data using SciDB, an array-based database mainly used to store, manage and perform computations on such data. The main goal of that work is to provide an elastic solution using SciDB to store multi-temporal satellite imagery and to execute time series analysis on it.

In [22] the authors illustrate the working of SpatialHadoop, regarded as one of the first capable open-source MapReduce frameworks to support spatiotemporal data. The working of ST-Hadoop is illustrated in [21]; it supports spatio-temporal data and is considered one of the first proficient open-source MapReduce frameworks. In [23] the authors introduce SciHadoop, a Hadoop plugin that allows scientists to specify logical queries over array-based data models; SciHadoop executes these queries as map/reduce programs over the logical data model. The authors present an implementation of the SciHadoop paradigm for NetCDF data and evaluate the performance of five separate optimizations that address the stated goals for an integrated aggregate-function query.

2) Time Series Analysis for Rainfall Prediction


Time series analysis is a statistical technique that deals with time series data, or trend analysis. Time series data means that the data form a series over particular time periods or intervals. Data are considered to be of three types: time series data, which is a set of observations of the values that a variable takes at different times; cross-sectional data, which consists of one or more variables collected at the same point in time; and pooled data, which is a combination of time series and cross-sectional data. Various research groups have attempted to predict rainfall on seasonal time scales using different techniques. Below we discuss existing work on rainfall prediction using ARIMA.

Climate and rainfall are highly non-linear and complicated phenomena, which require classical, modern and detailed models to obtain accurate predictions. The authors in [24] considered various statistical models for the prediction of rainfall time series data: the statistical method based on the autoregressive integrated moving average (ARIMA), the emerging fuzzy time series (FTS) model and the non-parametric Theil's regression. To evaluate prediction efficiency, they used 31 years of annual rainfall data, from 1982 to 2012, for Ibadan, South West Nigeria. ARIMA(1, 2, 1) was used to derive the weights and the regression coefficients, while Theil's regression was used to fit a linear model. The performance of the models was evaluated using the mean absolute forecast error (MAE), the root mean square forecast error (RMSE) and the coefficient of determination.

To forecast future climatic data, the ARIMA model was utilized. The authors in [25] proposed an ARIMA-based daily weather forecasting tool, presented as a case study for predicting the weather of Varanasi. They implemented the ARIMA algorithm in R to create the forecasting tool. The India Meteorological Department provided 65 years of daily meteorological data (1951-2015) for the study. The accuracy of the model was calculated from the root mean square error (RMSE) estimated for each forecast. They approximated future values for the following fifteen years using ARIMA(2, 0, 2) for rainfall data and ARIMA(2, 1, 3) for temperature data. The root mean square error values for the rainfall and temperature data were 0.0948 and 0.085, respectively, indicating that the technique functioned correctly. The outcome can further be used for the management of solar power stations, agriculture, natural resources and tourism. The error is regarded as minimal judging by the RMSE values, indicating that the ARIMA model forecast the data properly.

III. RESEARCH METHODOLOGY

In the proposed work we consider the Spark MapReduce framework, which is regarded as an excellent platform for data scientists since it provides a number of data-centric tools that help them move beyond the problems pertinent to a single machine.

1) Data Analysis and Processing using Spark MapReduce

As weather data are multidimensional and array based, in the proposed work we consider precipitation and temperature data of Bangalore for rainfall prediction; seasonal weather analysis has also been carried out for other states of India. Various experiments are carried out by reading, analyzing and processing the data. NetCDF data procured from the National Center for Environmental Prediction (NCEP) and the India Meteorological Department (IMD) have been used. The work carried out is as follows:

• Initially, raw station-level NetCDF-based temperature and precipitation data of Bangalore district, located around 12° latitude and 77° longitude, are read in the Google Colab environment. The data considered for analysis span Jan 2010 to Dec 2020 (11 years of data). Data from each year are displayed as a single plot, and the 11 years of data are also plotted together as a single graph to analyze the past 11 years and to use them for processing to assist future prediction.
• The mean value is computed for every year (Jan-Dec) using the past 11 years of data and plotted as a single point in a graph for analysis.
• The mean value is computed across all 11 years using the Spark MapReduce platform and plotted as a single graph. This step is the most important, as the data are effectively processed with the Spark MapReduce platform for analysis and future weather prediction. A detailed diagram illustrating how the data are processed using the Spark platform is given in Fig. 1, and a condensed code sketch follows the steps below:
  o The first step is to import and execute the main library files for setting up Spark MapReduce functions in the Google Colab environment.
  o Raw station-level precipitation data (pr_wtr) of Bangalore (from Jan 2010 to Dec 2020) are read individually, and a data frame is created for each year.
  o A new data frame is created by adding the years 2010-2020 as columns (12 columns in total); the values of the corresponding year are placed in the appropriate cells, and the data frame is converted to a .csv file.


  o Split data: Spark MapReduce works by splitting the data and assigning key-value pairs; here the key is the day and the value is pr_wtr. In this step we split the data row-wise and perform the read operation using the spark.read.option() function. Each row refers to the daily data of every year, i.e. row 1 is day-1 data from 2010 to 2020, the next row is day-2 data from 2010 to 2020, and so on until the last row, which is day 365 from 2010 to 2020. A temporary last column is created in the data frame to hold the final row-wise mean value.
  o Map phase: this step computes the sum of all the values in each row and calculates n, where n is the number of year columns (2010-2020), using the formula n = lit(len(df.columns) - 1.0); this value is used in the next step.
  o Reduce phase: the row mean is calculated using Spark's reduce over the year columns, i.e. reduce(add, (col(x) for x in df.columns[1:])) / n, aliased as "11 years mean", and the result is displayed.
  o Aggregate phase: this step aggregates all the mean values and places them in the last column created in the split step.
  o Finally, the aggregated precipitation values are displayed as a single plot.

Fig. 1. Spark MapReduce model to compute aggregated climate parameters, namely precipitation and temperature.
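The following is a minimal PySpark sketch of the split, map, reduce and aggregate steps above. It assumes PySpark has been installed in the Colab session (e.g. via pip) and that the per-year values have already been written to a .csv file whose first column is the day and whose remaining columns are the years 2010-2020; the file name years.csv and the exact column layout are illustrative assumptions, not the paper's actual artefacts.

from functools import reduce
from operator import add
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

# Start a local Spark session (in Colab this follows a `pip install pyspark`).
spark = SparkSession.builder.appName("RainfallRowMean").getOrCreate()

# Split/read step: each row holds one day's pr_wtr values across the years,
# i.e. row 1 is day 1 for 2010-2020, row 2 is day 2, and so on.
df = (spark.read.option("header", True)
                .option("inferSchema", True)
                .csv("years.csv"))                 # assumed file name

# Map step: n is the number of year columns (everything except the day column).
n = lit(float(len(df.columns) - 1))

# Reduce step: sum the year columns of every row and divide by n, giving the
# day-wise mean over the 11 years, exposed under the alias "11 years mean".
mean_col = (reduce(add, [col(c) for c in df.columns[1:]]) / n).alias("11 years mean")

# Aggregate/display step: keep the day index together with its aggregated mean.
df.select(col(df.columns[0]), mean_col).show(5)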
2) Seasonal Analysis

In the proposed work, we consider seasonal analysis of the temperature and precipitation data of Bangalore to analyze the state of the weather during various seasons: pre-monsoon (March 1 to May 31), monsoon (June 1 to September 30) and post-monsoon (October 1 to December 31). Various graphs are shown to illustrate the season-wise analysis of the weather of a particular region such as Bangalore, and a comparison of the weather status of different cities is undertaken. The results are discussed in Section IV.
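As a small illustration of how these season windows can be applied, the following sketch slices a daily, datetime-indexed pandas series into pre-monsoon, monsoon and post-monsoon subsets and reports a seasonal mean; the synthetic temperature series and the use of pandas here are assumptions for illustration only.

import numpy as np
import pandas as pd

# Illustrative daily temperature series for one year (replace with real data).
days = pd.date_range("2012-01-01", "2012-12-31", freq="D")
temp = pd.Series(np.random.uniform(18, 34, len(days)), index=days)

# Season windows used in the paper: pre-monsoon (Mar 1 - May 31),
# monsoon (Jun 1 - Sep 30), post-monsoon (Oct 1 - Dec 31).
seasons = {
    "pre-monsoon": ("2012-03-01", "2012-05-31"),
    "monsoon": ("2012-06-01", "2012-09-30"),
    "post-monsoon": ("2012-10-01", "2012-12-31"),
}

# Mean temperature per season, e.g. for a bar plot like the one in Fig. 5.
for name, (start, end) in seasons.items():
    print(name, round(temp.loc[start:end].mean(), 2))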
3) Time Series Forecasting using the ARIMA Model

In this study, we take the past 11 years of data and train an ARIMA (autoregressive integrated moving average) model on them; the trained model is then used for future forecasting. ARIMA is a class of models that predicts a given time series based on its own past values. An ARIMA model is one where the time series has been differenced at least once to make it stationary. The working principle behind the autoregressive (AR) model is that there is a relationship between the present value and the past values: the present value is equal to a combination of past values plus a random term. The moving average (MA) model says that the present value is related to the residuals of the past. AR is not capable of forecasting nonlinear data; it can only be used for data that are linearly related. Using AR and MA together gives better results, but such a combination suits stationary weather data and short-term weather forecasting; the proposed work therefore considers the ARIMA model, which works well for long-term rainfall prediction. We worked with ARIMA(2, 0, 2) for the rainfall data. The following steps are used for time series forecasting of rainfall using ARIMA; a code sketch of the same steps is given after equation (1):

1. Plot the data.
2. Make the data stationary.
3. Identify the model technique best suited for rainfall forecasting; in the proposed work we use the ARIMA model.
4. Build the model.
5. Compute the mean error and the root mean squared error (RMSE), and use them to find the accuracy of the model.
6. Do the future forecasting based on the accuracy of the ARIMA model.

The generalized equation used in the ARIMA model is shown below in (1):

Y_t = α + β_1·Y_{t-1} + β_2·Y_{t-2} + … + β_p·Y_{t-p} + ε_t + φ_1·ε_{t-1} + φ_2·ε_{t-2} + … + φ_q·ε_{t-q}    (1)

where α is the intercept term, β_1 is the coefficient of lag 1 that the model estimates, Y_{t-1} is the value of the series at lag 1, and the ε terms are the errors of the corresponding lagged forecasts.
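The sketch below walks through these six steps with the statsmodels ARIMA implementation, using the ARIMA(2, 0, 2) order reported in the paper. The synthetic series, the ADF stationarity check and the train/test split are illustrative assumptions rather than the paper's exact pipeline.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Steps 1-2: load (here: synthesize) a daily series and check stationarity.
rng = np.random.default_rng(0)
series = pd.Series(25 + 5 * np.sin(np.arange(365) * 2 * np.pi / 365)
                   + rng.normal(0, 1, 365))
adf_p = adfuller(series)[1]              # a small p-value suggests stationarity
print("ADF p-value:", round(adf_p, 4))

# Steps 3-4: identify and build the model; the paper reports ARIMA(2, 0, 2).
train, test = series[:300], series[300:]
model = ARIMA(train, order=(2, 0, 2)).fit()

# Step 5: error metrics on the held-out part of the series.
forecast = model.forecast(steps=len(test))
errors = test.values - forecast.values
me = float(errors.mean())
rmse = float(np.sqrt((errors ** 2).mean()))
print("ME:", round(me, 3), "RMSE:", round(rmse, 3))

# Step 6: refit on the full series and forecast further ahead.
future = ARIMA(series, order=(2, 0, 2)).fit().forecast(steps=30)
print(future.head())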
IV. EXPERIMENTAL STUDIES


The proposed work is executed in the Google Colab environment. Python code is used for the implementation, and the necessary libraries were imported. The results are as follows. In Figs. 2(a-c) the precipitation (rainfall) data are read individually and plotted as separate graphs, whereas Figs. 2(d-e) show the 5-year and 11-year data plotted as single graphs.

Figs. 2(a-e): Raw station-level 11 years of daily data (of the Bangalore region) read and plotted individually, and also as a single graph for comparison.

Next we compute the mean value for every year (Jan-Dec) using the 11 years of data (2010-2020) and plot it as a single point in a graph for analysis. The same is plotted as line and bar graphs, as shown in Figs. 3(a-b).

Figs. 3(a-b): Mean value for every year (Jan-Dec) using 11 years of data (2010-2020), plotted as a single point in a graph for analysis.

The mean value is then computed over the 11 years (Jan 2010-Dec 2020) using the Spark MapReduce platform and plotted as a single graph. Fig. 4 shows how the aggregated mean value is placed in a new data frame, together with the final plot after applying Spark MapReduce.

Fig. 4: Final plot from the aggregated mean precipitation values of 11 years (2010-2020) using the Spark MapReduce model.

TABLE I. Daily rainfall (precipitable water, pr_wtr in mm) dataset for the years 2010-2020

Day    2010     2011     2012     ……     2020
0      34.75    39.14    32.89    ……     29.48
1      29.5     42.82    28.85    ……     30
2      29.62    43.72    27.42    ……     32.48
……     ……       ……       ……       ……     ……
364    39.14    41.64    35.92    ……     40.05


TABLE II. Aggregated daily rainfall data (precipitable water) after processing the 11-year dataset on the Spark MapReduce platform

       Day        Pr_wtr
0      Day 1      23.87
1      Day 2      22.26
2      Day 3      22.13
3      Day 4      22.56
4      Day 5      21.08
5      Day 6      20.73
:      :          :
364    Day 365    25.71

Table I shows an overview of the daily dataset of precipitable water (in mm) used for rainfall prediction from the years 2010 to 2020. These daily data of the past 11 years have been processed using the Spark MapReduce platform, which gives the aggregated result shown in Table II. The same result is used in the analysis and prediction of future rainfall.

The seasonal analysis of Bangalore weather using the temperature data of 2012 is shown in Fig. 5. The bar plots show the pre-monsoon, monsoon and post-monsoon temperatures.

Fig. 5: Seasonal temperature analysis of Bangalore for the year 2012.

A sample of the first five rows of the precipitation dataset (pr_wtr) for the year 2010 is shown in Fig. 6.

Fig. 6: Sample view of the first five rows of the precipitation dataset for the year 2010.

Time series forecasting is done using the ARIMA model; we worked with ARIMA(2, 0, 2) for the rainfall data. The past 11 years of rainfall data are used to train the model, and the trained model is used for future prediction. Figs. 7(a-b) show the ARIMA forecasting results.

Figs. 7(a-b): Preview and results of the ARIMA(2, 0, 2) model.

The metrics used for evaluation are the mean error (ME) and the root mean squared error (RMSE); the average error is calculated as shown below in equation (2):

Average error = ME / RMSE × 100    (2)

The accuracy for the rainfall data considered in our work using the ARIMA(2, 0, 2) model is found to be 76.8 percent.
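For clarity, the following sketch computes ME, RMSE and the average error of equation (2) from a pair of observed and forecast arrays; the sample values are illustrative and are not the paper's results.

import numpy as np

# Illustrative observed and forecast precipitable-water values (not real results).
observed = np.array([23.9, 22.3, 22.1, 22.6, 21.1])
forecast = np.array([22.8, 23.0, 21.5, 23.4, 20.6])

errors = observed - forecast
me = errors.mean()                       # mean error (ME)
rmse = np.sqrt((errors ** 2).mean())     # root mean squared error (RMSE)

# Equation (2): average error expressed as a percentage of the RMSE.
average_error = me / rmse * 100
print(round(me, 3), round(rmse, 3), round(average_error, 2))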

V. CONCLUSION

Vast amounts of climate data are being generated rapidly by satellite observations and numerical climate models. Agriculture, tourism, water, electricity, wildfire management and other sectors all require climate data, and the utility of climatic data depends on timely analysis. Existing technologies such as Apache Hadoop, which are based on the idea of breaking problems down into smaller chunks and solving them on a cluster of commodity servers, have emerged as a possible solution for analysing huge climate datasets. Apache Spark has recently emerged as a viable alternative to Hadoop's disk-based architecture. The proposed work considers the analysis and processing of big spatiotemporal data using the Spark MapReduce platform. Multidimensional NetCDF-based precipitation and temperature data from NCEP and CSIR-4PI are considered for analysis. The analysis shows that the Spark platform is computationally more efficient (about twice as fast) than a Hadoop MapReduce platform of the same configuration. Monthly and seasonal analysis of the climate data has been carried out. A time series prediction approach, the ARIMA(2, 0, 2) model, was used for forecasting the future rainfall of the Bangalore region; the results show that ARIMA performs well for long-term weather prediction.


Performance analysis of the model has been carried out using the NetCDF data of NCEP and CSIR-4PI, Bangalore.
REFERENCES
[1] Z. Li, F. Hu, J. L. Schnase, D. Q. Duffy, T. Lee, M. K. Bowen, and C. Yang, A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce, International Journal of Geographical Information Science, pp. 17–35, 2017.
[2] Chaowei Yang, Manzhu Yu, Fei Hu, Yongyao Jiang, and Yun Li, Utilizing cloud computing to address big geospatial data challenges, Computers, Environment and Urban Systems, 2016, http://dx.doi.org/10.1016/j.compenvurbsys.2016.10.010.
[3] James H. Faghmous and Vipin Kumar, A big data guide to understanding climate change: The case for theory-guided data science, Big Data, Vol. 2, No. 3, Sep 2014, pp. 155–163, https://doi.org/10.1089/big.2014.0026.
[4] Markus Götz, Christian Bodenstein, Matthias Richerzhagen, and Gabriele Cavallaro, On scalable data mining techniques for Earth science, Procedia Computer Science, Vol. 51, December 2015, pp. 2188–2197.
[5] Jinsong Wu, Song Guo, Jie Li, and Deze Zeng, Big data meet green challenges: Greening big data, IEEE Systems Journal, Vol. 10, No. 3, May 2016, pp. 873–887.
[6] Ralf Hartmut Guting, M. H. Bohlen, Martin Erwig, Christian S. Jensen, Nikos A. Lorentzos, Markus Schneider, and Michalis Vazirgiannis, A foundation for representing and querying moving objects, ACM Transactions on Database Systems (TODS), Vol. 25, No. 1, March 2000, pp. 1–42.
[7] Sebestyén Viktor, Czvetkó Tímea, and Abonyi János, The applicability of Big Data in climate change research: The importance of system of systems thinking, Frontiers in Environmental Science, Vol. 9, March 2021, DOI: 10.3389/fenvs.2021.619092.
[8] Yang C., Clarke K., Shekhar S., and Tao C. V., Big spatiotemporal data analytics: a research and innovation frontier, International Journal of Geographical Information Science, April 2020, https://doi.org/10.1080/13658816.2019.1698743.
[9] Fei Hu, Chaowei Yang, Daniel Q. Duffy, Michael Bowen, Weiwei Song, Tsengdar Lee, Mengchao Xu, and John L. Schnase, ClimateSpark: An in-memory distributed computing framework for big climate data analytics, Computers and Geosciences, March 2018, pp. 154–166, https://doi.org/10.1016/j.cageo.2018.03.011.
[10] Christopher Bartz, Konstantinos Chasapis, Michael Kuhn, Petra Nerge, and Thomas Ludwig, A best practice analysis of HDF5 and NetCDF-4 using Lustre, International Conference on High Performance Computing (ISC High Performance 2015), Vol. 9137, pp. 274–281.
[11] Gowtham Atluri, Anuj Karpatne, and Vipin Kumar, Spatio-temporal data mining: A survey of problems and methods, ACM Computing Surveys, Vol. 51, Issue 4, Article No. 83, July 2019, pp. 1–41, https://doi.org/10.1145/3161602.
[12] R. Rew and G. Davis, NetCDF: an interface for scientific data access, IEEE Computer Graphics and Applications, Vol. 10, No. 4, July 1990, pp. 76–82, DOI: 10.1109/38.56302.
[13] Salman Salloum, Ruslan Dautov, Xiaojun Chen, Patrick Xiaogang Peng, and Joshua Zhexue Huang, Big data analytics on Apache Spark, International Journal of Data Science and Analytics, Springer International Publishing Switzerland, 2016, pp. 145–164.
[14] Abdul Salam, Internet of Things for sustainable human health, book chapter in Internet of Things for Sustainable Community Development, Internet of Things, Springer, January 2020, pp. 217–242, https://doi.org/10.1007/978-3-030-35291-2_7.
[15] Pankaj Mudholkar and Megha Mudholkar, Internet of Things (IoT) and Big Data: A review, International Journal of Management, Technology and Engineering, Vol. 8, Issue XII, December 2018, ISSN: 2249-7455, pp. 5001–5007.
[16] Mark McKenney, Niharika Nyalakonda, Jarrod McEvers, and Mitchell Shipton, Pyspatiotemporalgeom: A Python library for spatiotemporal types and operations, Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, October 2016, Article No. 93, pp. 1–4.
[17] R. Rew and G. Davis, NetCDF: an interface for scientific data access, IEEE Computer Graphics and Applications, Vol. 10, No. 4, July 1990, pp. 76–82.
[18] Dimitar Misev, Peter Baumann, and Jürgen Seib, Towards large-scale meteorological data services: A case study, Datenbank-Spektrum, Springer, Vol. 21, Issue 1, pp. 183–192, 22 September 2012.
[19] A. Joshi, E. Pebesma, R. Henriques, and M. Appel, SciDB based framework for storage and analysis of remote sensing big data, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-5/W3, Capacity Building and Education Outreach in Advance Geospatial Technologies and Land Management, 10–11 December 2019, Dhulikhel, Nepal.
[20] Louai Alarabi and Mohamed F. Mokbel, A demonstration of ST-Hadoop: A MapReduce framework for big spatiotemporal data, Proceedings of the VLDB Endowment, Vol. 10, No. 12, August 2017.
[21] Ahmed Eldawy and Mohamed F. Mokbel, A demonstration of SpatialHadoop: An efficient MapReduce framework for spatial data, Proceedings of the VLDB Endowment, Vol. 6, Issue 12, August 2013, pp. 1230–1233, https://doi.org/10.14778/2536274.2536283.
[22] Joe B. Buck, Noah Watkins, Jeff LeFevre, and Kleoni Ioannidou, SciHadoop: Array-based query processing in Hadoop, Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, November 2011, Article No. 66, pp. 1–11, https://doi.org/10.1145/2063384.2063473.
[23] Timothy Olatayo and A. I. Taiwo, Statistical modelling and prediction of rainfall time series data, Global Journal of Computer Science and Technology: Interdisciplinary, Vol. 14, Issue 1, Version 1.0, 2014, Online ISSN: 0975-4172, Print ISSN: 0975-4350.
[24] Nikita Shivhare, Atul Kumar Rahul, Shyam Bihari Dwivedi, and Prabhat Kumar Singh Dikshit, ARIMA based daily weather forecasting tool: A case study for Varanasi, Mausam, Vol. 70, No. 1, January 2019, pp. 133–140.