Timeseries forecasting
1 author:
Qais Abdulqader
Duhok Polytechnic University
All content following this page was uploaded by Qais Abdulqader on 14 September 2019.
Abstract:
In this paper, the Box-Jenkins methodology for Autoregressive Integrated Moving Average (ARIMA)
models is applied to forecast the census of Iraq, using 61 annual census observations from 1950 to
2010. Several adequate time series models were built, and performance criteria were used to compare
them. The analysis showed that the ARI(2,2) model is adequate for forecasting the annual census data
of Iraq. Over the period 2011 to 2020, the population is forecast to increase by 33.58%, reaching
41,358,200 persons in 2020.
Keywords: Box-Jenkins, ARIMA Models, Time Series Forecasting, Census.
1. Introduction:
Time series analysis is the process of using statistical techniques to model and explain a
time-dependent series of data points. Time series forecasting is the process of using a model to
generate predictions (forecasts) of future events based on known past events. The Box-Jenkins
forecasting methodology is a univariate, self-projecting time series forecasting method. It was
popularized by George E. P. Box and Gwilym M. Jenkins in 1970 (George et al., 2008).
Many applications have been carried out in this area. Zakria and Muhammad (2009) used ARIMA models
to forecast the population of Pakistan. They showed that the results of the estimated ARI(1,2) model
are close to the findings of other researchers and of non-government organizations, for use in future
planning and projects. Mutar and Ilias (2010) compared forecasts from the ARIMA methodology with those
from a neural network method. They showed that the ARIMA methodology gave more appropriate forecasts
than a feed-forward artificial neural network. Tuama (2012) used the ARIMA methodology to forecast the
number of patients with malignant tumors in Anbar province. The analysis showed that the suitable
model is the integrated autoregressive model of order two, ARI(2,1). Sarpong (2013) applied ARIMA
models to modeling and forecasting maternal mortality. The study showed that the ARIMA(1,0,2) model is
adequate for forecasting quarterly maternal mortality ratios at the hospital. Ghafil (2013) used
Box-Jenkins models to forecast the production of electric power. The analysis showed that
ARIMA(1,0,2) was the best model, ahead of the ARIMA(1,0,1) and AR(1) models, in terms of predictive
performance.
Recently, the ARIMA methodology has been used in population forecasting studies. Wan et al. (2013)
used the ARIMA methodology to forecast prison populations using sentencing and arrest data. The
analysis showed that although modeling suggests an uptrend in the remand prisoner population, this
should be more than offset by a decrease in the sentenced prisoner population over the next months.
Pang and McElroy (2014) used the ARIMA methodology to forecast fertility and mortality by
race/ethnicity and gender. Their results were produced using fertility and mortality data from 1989 to
2009. For total rates, they determined that a model without drift produces more tenable forecasts than
the occasionally implausible results from the model with drift. Brajesh and Shekhar (2015) used
statistical models to forecast accidental mortality in India. The study showed that, on validation of
the models, ARIMA performed better than damped trend exponential smoothing (DTES), which will help
policy makers control this type of incidence in the future.
The underlying goal of this paper is to use the ARIMA methodology to find an appropriate formula when
building a model from the time series data of the population census in Iraq, so that the residuals are
as small as possible and exhibit no pattern.
Journal of University of Zakho, Vol. 4(A), No.2, Pp 258-268, 2016 ISSN: 2410-7549
squares in the sum, it can be stated that the Portmanteau test statistic is asymptotically distributed
as $\chi^2_m$ under the null hypothesis that all m autocorrelation coefficients are zero. As for any
joint hypothesis test, only one autocorrelation coefficient needs to be statistically significant for
the test to result in a rejection.
2.3. Application and Enforcement
Once a model has been selected carefully and judiciously and its parameters estimated appropriately,
it can be used for application and to make forecasts, and the users of the forecasts will evaluate the
pros and cons of the model as time progresses. A forecasting assignment is not complete when the model
has been fitted to the known data. The performance of the model can only be properly evaluated after
the data for the forecast period have become available (Makridakis et al., 1998). The Box-Jenkins
methodology can be represented and summarized through Figure 1.
3. The Data
Censuses provide population numbers, household or family size and composition, and information
on sex and age distribution. They often also cover other demographic, employment, disability,
fertility, migration, education, economic and health-related topics. Table 2 and Figure 2 show the
variable used in the analysis: the annual census data of Iraq (in thousands), a sample of 61
observations from 1950 to 2010. The first 51 observations were used for estimation and the last 10
observations were used for forecasting. The data were taken from the web page of the United Nations
Statistics Division.
Table (2): The annual census data of Iraq (in thousands) during the period 1950-2010
Years   Population   Years   Population   Years   Population
1950    5719         1975    11685        2001    24517
1951    5902         1976    12068        2002    25238
1952    6065         1977    12461        2003    25960
1953    6216         1978    12860        2004    26674
1954    6360         1979    13258        2005    27377
1955    6503         1980    13653        2006    28064
1956    6647         1981    14045        2007    28741
1957    6795         1982    14436        2008    29430
1958    6951         1983    14823        2009    30163
1959    7116         1984    15203        2010    30962
1960    7290         1985    15576
Figure 2 shows an upward trend, which suggests that the time series is non-stationary. Figure 3
presents the plots of the autocorrelation function (ACF) and the partial autocorrelation function
(PACF), respectively. The values of the ACF decline gradually from the first-order autocorrelation
coefficient onward. The Box-Pierce Portmanteau test computed with seventeen lags takes a value of
261.468 (p-value = 0.00), which is highly significant and confirms the autocorrelation pattern. The
partial autocorrelation function shows a large peak at lag 1 with a rapid decline thereafter, which
indicates a highly persistent autoregressive structure in the series.
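As a rough illustration of the diagnostics described above, the following sketch computes sample autocorrelations and the Box-Pierce statistic Q = n * (r_1^2 + ... + r_m^2) for a trending series. The series is a hypothetical linear trend of length 51, not the census data, and the function names are illustrative. The constant 27.587 is the 5% critical value of the chi-square distribution with 17 degrees of freedom.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()                      # deviations from the sample mean
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:-k]) / denom
                     for k in range(1, max_lag + 1)])

def box_pierce(x, max_lag):
    """Box-Pierce statistic: n times the sum of squared autocorrelations."""
    r = sample_acf(x, max_lag)
    return len(x) * np.sum(r ** 2)

# Hypothetical trending series (an assumption, for illustration only).
trend = np.arange(51, dtype=float)
q_stat = box_pierce(trend, max_lag=17)

CHI2_CRIT_17_5PCT = 27.587               # chi-square(17) critical value at 5%
print(q_stat, q_stat > CHI2_CRIT_17_5PCT)
```

A statistic far above the critical value, as for any strongly trending series, leads to rejecting the hypothesis that the autocorrelations are jointly zero.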
Figure (3): ACF and PACF of census of Iraq during the period 1950-2000 from left to right.
The Box-Cox transformation gave the value λ = 0.0, and its interval (-0.745, 0.741) contains zero.
This indicates that the log transformation is an appropriate choice to make the series stationary in
variance before differencing. After applying the log transformation to the original series and
checking the ACF and PACF again, we concluded that the series needs to be differenced twice to be
stationary in the mean. Figures 4 and 5 show, respectively, the transformed series after second
differencing and its ACF and PACF.
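The two transformation steps can be sketched on a short excerpt of the data. The values below are the 1950-1960 populations listed in Table 2; the excerpt only illustrates the mechanics and is not meant to reproduce Figures 4 and 5.

```python
import numpy as np

# Population of Iraq (thousands), 1950-1960, as listed in Table 2.
population = np.array([5719, 5902, 6065, 6216, 6360, 6503,
                       6647, 6795, 6951, 7116, 7290], dtype=float)

# A Box-Cox parameter of zero corresponds to the natural log transform.
log_pop = np.log(population)

# Second difference of the log series: y_t - 2*y_{t-1} + y_{t-2}.
second_diff = np.diff(log_pop, n=2)

print(len(population), len(second_diff))
print(np.round(second_diff, 5))
```

Each round of differencing shortens the series by one observation, so the 11 values above yield 9 second differences.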
Figure (5): ACF and PACF of log of census in Iraq after 2nd differencing 1950-2000 from left to right.
After achieving stationarity, we proceed to fit an ARMA model to the second difference of the log of
the census of Iraq series. To select the appropriate model order, we apply two measures of accuracy,
RMSE and MAE, together with two measures of goodness of fit, AIC and HQC, mentioned in the
theoretical part. Table 2 shows different combinations of ARIMA specifications and the estimated
criteria values.
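The two accuracy measures can be sketched as follows. The function names and the sample numbers are hypothetical and are not taken from the paper's tables. AIC and HQC are left out because their exact form depends on the definitions given in the theoretical part.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between observed and fitted values."""
    e = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def mae(actual, predicted):
    """Mean absolute error between observed and fitted values."""
    e = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(e)))

# Hypothetical observed and fitted values, for illustration only.
observed = [1.0, 2.0, 3.0]
fitted = [1.0, 2.0, 5.0]
print(rmse(observed, fitted), mae(observed, fitted))
```

Among candidate models, smaller values of both measures indicate a closer fit to the estimation sample.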
After estimating the ARI(2,2) model, we check its residuals for randomness. Figure 6 shows the ACF
and PACF of the residuals from the ARI(2,2) model fitted to the log of the census of Iraq.
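For reference, an ARI(2,2) model for the log series can be written in backshift notation as below. This is the standard textbook form, written without a constant term, which is an assumption. The symbols are illustrative: $X_t$ is the census series, $B$ the backshift operator, $\phi_1, \phi_2$ the autoregressive coefficients and $a_t$ a white noise error.

```latex
(1 - \phi_1 B - \phi_2 B^2)\,(1 - B)^2 \ln X_t = a_t
\quad\Longleftrightarrow\quad
w_t = \phi_1 w_{t-1} + \phi_2 w_{t-2} + a_t,
\qquad w_t = \ln X_t - 2\ln X_{t-1} + \ln X_{t-2}
```

If the model is adequate, the residuals $a_t$ should behave like white noise, which is what the ACF and PACF of the residuals are used to check.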
Figure (6): ACF and PACF of residuals using ARI(2,2) on log of census of Iraq from left to right.
From Figure 6, we conclude that none of the autocorrelation coefficients of the ACF and PACF is
statistically significant, implying that the residual series may well be completely random (white
noise). We also tested the randomness of the residuals using the Portmanteau test (Box-Pierce test)
given by equation (6) in the theoretical part. The value of the test statistic was 8.52774 and the
P-value was 0.860063. Since the P-value for this test is greater than or equal to 0.05, we cannot
reject the hypothesis that the series is random (white noise) at the 95% or higher confidence level.
Having checked and selected the best model, an ARI(2,2) on the log of the annual census data of Iraq,
we now use the model to make forecasts, because the main purpose of modeling a time series is to make
forecasts that are then used directly for making decisions. We validate the forecasts by splitting the
data into two parts: the first 51 observations are used for modeling and the last 10 observations are
used for forecasting. Table 5 shows the forecasts for the next twenty years of the annual census (in
thousands) of Iraq.
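The way an ARI(2,2) model produces point forecasts can be sketched as a simple recursion on the log scale: forecast the next second difference from the last two, then undo the differencing. The function name, the coefficients and the input values are hypothetical; the paper's estimated coefficients are not reproduced here, and the model is assumed to have no constant term.

```python
import math

def forecast_ari22(log_series, phi1, phi2, steps):
    """Point forecasts on the log scale from an ARI(2,2) model without constant."""
    if len(log_series) < 4:
        raise ValueError("need at least 4 observations")
    y = list(log_series)
    for _ in range(steps):
        # Last two second differences of the (extended) log series.
        w_t = y[-1] - 2 * y[-2] + y[-3]
        w_prev = y[-2] - 2 * y[-3] + y[-4]
        # AR(2) forecast of the next second difference (future errors set to 0).
        w_next = phi1 * w_t + phi2 * w_prev
        # Undo the double differencing: y_next = w_next + 2*y_t - y_{t-1}.
        y.append(w_next + 2 * y[-1] - y[-2])
    return y[len(log_series):]

# Hypothetical history and coefficients, for illustration only.
log_history = [math.log(v) for v in (100.0, 103.0, 106.5, 110.0)]
log_forecasts = forecast_ari22(log_history, phi1=0.5, phi2=-0.2, steps=3)

# Back-transform from the log scale to the original units.
forecasts = [math.exp(v) for v in log_forecasts]
print([round(v, 1) for v in forecasts])
```

With both coefficients set to zero the recursion simply extends the most recent linear trend in the log series, which is a convenient sanity check.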
Table (5): Forecast values of the annually census (thousands) of Iraq using ARI(2,2) model
Using the ARI(2,2) model obtained and the statistical software Statgraphics Centurion XVI, we forecast
the annual census data of Iraq from 2001 to 2020 (Table 5) and compared the first ten forecasts with
the observed values from 2001 to 2010. In 2001, the predicted value (24513.9) is very close to the
true value (24517) recorded and published by the United Nations Statistics Division. This observed
value also falls inside the confidence interval, and the same holds for the remaining values. In
addition, the increase in the population of Iraq during the period 2001 to 2010 was 30.08% based on
the true values and 31.57% based on the forecast values, and the two ratios are close to each other.
Hence, the ARI(2,2) model is adequate for forecasting the annual census data of Iraq. During the
period 2011 to 2020 there will be a 33.58% increase in the population, and the population of Iraq in
2020 would be 41,358,200 persons. Figure 7 presents the forecasts for the log of the annual census
data of Iraq for the period 2001 to 2020 using the ARI(2,2) model.
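The reported growth figure can be checked with simple arithmetic. The sketch below assumes the increase for 2011 to 2020 is measured from the observed 2010 value in Table 2 to the 2020 forecast; the text does not state the base year explicitly, so that choice is an assumption, and the function name is illustrative.

```python
def percent_increase(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0

pop_2010_observed = 30962.0   # thousands, from Table 2
pop_2020_forecast = 41358.2   # thousands, i.e. 41,358,200 persons

growth = percent_increase(pop_2010_observed, pop_2020_forecast)
print(round(growth, 2))
```

Under that assumption the result rounds to 33.58, which matches the percentage reported in the text.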
Figure (7): Forecasts for the log of the annual census of Iraq from 2001 to 2020 using the ARI(2,2) model
4. Conclusions
In this paper, we built and applied the systematic Box-Jenkins ARIMA methodology to forecast the
annual census data of Iraq. We concluded that:
1- The suitable model for forecasting the census of Iraq is ARI(2,2).
2- The percentage increases in the population of Iraq during the period 2001 to 2010, computed from
the true values and from the forecast values, were close to each other.
3- During the period 2011 to 2020, there will be a 33.58% increase in the population, and the
population of Iraq in 2020 would be 41,358,200 persons.
5. Recommendations
1- This model may be used for forecasting the census of Iraq in the future.
2- We recommend comparing the results of the ARIMA methodology for forecasting the census of Iraq
with other methods, such as wavelet transforms or neural networks, to see the differences.
References:
Ayalew S., Babu M. C. and Raw L. K. (2012). Comparison of New Approach Criteria for Estimating the
Order of Autoregressive Process. Journal of Mathematics, 1, 10-20.
Box G. E. P. and Cox D. R. (1982). An analysis of transformations revisited, rebutted. Journal of the
American Statistical Association, 77, 209-210.
Brajesh and Shekhar C. (2015). Accidental mortality in India: Statistical models for forecasting.
International Journal of Humanities and Social Science Invention, 4, 35-45.
Brockwell P. J. and Davis R. A. (2002). Introduction to time series and forecasting (2nd ed.).
Springer.
Brooks C. and Tsolacos S. (2010). Real Estate Modeling and Forecasting. Cambridge University Press.
George E., Jenkins G. M. and Reinsel G. C. (2008). Time series analysis: Forecasting and control
(4th ed.). John Wiley & Sons, Inc.
Ghafil A. A. (2013). Using Box-Jenkins (ARIMA) models for forecasting the production of electric
power. Journal of Karbala University, 11, 196-207.
Makridakis S., Wheelwright S. C. and Hyndman R. J. (1998). Forecasting: Methods and applications
(3rd ed.). John Wiley & Sons, Inc.
Mutar D. R. and Ilias I. I. (2010). Analysis and modeling time series of water flow into Mosul city:
A comparative study. Iraqi Journal of Statistical Sciences, 10, 1-32.
Ngo T. H. D. and Bros W. (2013). The Box-Jenkins Methodology for Time Series Models. Journal of
statistics and data analysis, 454, 1-13.
Pang O. and McElroy T. (2014). Forecasting Fertility and Mortality by Race/Ethnicity and Gender.
Center for Statistical Research & Methodology, 3, 1-55.
Polhemus N. W. (2011). Time Series Analysis Using Statgraphics Centurion. StatPoint Technologies,
Inc.
Sarpong S. A. (2013). Modeling and Forecasting Maternal Mortality; an Application of ARIMA Models.
International Journal of Applied Science and Technology, 3, 19-28.
Shumway R. H. and Stoffer D. S. (2011). Time series analysis and its applications (3rd ed.).
Springer.
Tsay R. S. (2002). Analysis of Financial Time Series. John Wiley & Sons, Inc.
Tuama S. A. (2012). Using analysis of time series to forecast numbers of the patients Malignant
Tumors in Anbar province. Al-Anbar University Journal of Economics and Administration Sciences, 4,
371-393.
Wan W-Y, Moffatt S., Xie Z., Corben S. and Weatherburn D. (2013). Forecasting prison populations
using sentencing and arrest data. Crime and Justice Bulletin, 174, 1-12.
Yaffee R. and McGee M. (1999). Introduction to time series analysis and forecasting: With
applications of SAS and SPSS. Academic Press.
Zakria M. and Muhammad F. (2009). Forecasting the population of Pakistan using ARIMA models.
Pakistan Journal of Agricultural Science, 46, 214-223.