Module 3.1 Time Series Forecasting ARIMA Model
Supervised regression–based machine learning is a predictive form of modeling in which the goal is to model the
relationship between a target and the predictor variable(s) in order to estimate a continuous set of possible outcomes.
The first part of Module 3 will cover time series models. In its broadest form, time series analysis is about inferring
what has happened to a series of data points in the past and attempting to predict what will happen to it in the future.
There have been a lot of comparisons and debates in academia and the industry regarding the differences between
supervised regression and time series models. Most time series models are parametric (i.e., a known function is
assumed to represent the data), while the majority of supervised regression models are nonparametric.
The biggest difference between statistical modelling (SM) and machine learning (ML) is their purposes. While SM is
used for finding and explaining the relationships between variables, ML models are built for providing accurate
predictions without explicit programming.
Statistical models explicitly specify a probabilistic model or function for the data and identify variables that are usually
interpretable and of special interest, such as effects of predictor variables. In addition to identifying relationships
between variables, statistical models establish both the scale and significance of the relationship. For example, consider
an exponential smoothing model: Y’(t) = αY(t−1) + (1−α)Y’(t−1) + ε(t), with Y(t) the actual value and Y’(t) the predicted
value. We just need to fit the right parameters (in this case α) to that function.
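To make "fitting the right parameter" concrete, here is a minimal R sketch that applies the smoothing recursion above to a toy series. The value α = 0.3 and the data are made up for illustration and are not part of the module data.
alpha = 0.3                                  #illustrative smoothing parameter
y = c(100, 102, 101, 105, 107, 110)          #toy series, not the module data
y_hat = numeric(length(y))
y_hat[1] = y[1]                              #initialize the first predicted value at the first observation
for (t in 2:length(y)) {
  y_hat[t] = alpha * y[t - 1] + (1 - alpha) * y_hat[t - 1]
}
y_hat                                        #the predicted values Y'(t)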
Meanwhile, in ML, we don’t make any assumptions about the shape of the function that represents our data. We rely
on the universal approximation properties of our algorithm to find the best fit for our data.
Statistical modelling vs. machine learning:
▪ Statistical models are more interpretable as compared to machine learning; machine learning models are less interpretable and more complex.
▪ Statistical models are not best suited to large amounts of data; machine learning models can handle data sets ranging from small to large.
An Autoregressive Integrated Moving Average (ARIMA) model is one of the most popular and widely used statistical
models for time series forecasting. It is a class of statistical algorithms that captures the standard temporal dependencies
that are unique to time series data.
Before we introduce ARIMA models, we must first recall the concept of stationarity and autocorrelation. ARIMA models
are, in theory, the most general class of models for forecasting a time series which can be made to be stationary by
differencing (if necessary). A stationary series has no trend, its variations around its mean have a constant amplitude,
and it wiggles in a consistent fashion (i.e., its short-term random time patterns always look the same in a statistical
sense).
The latter condition means that its autocorrelations (correlations with its own prior deviations from the mean) remain
constant over time, or equivalently, that its power spectrum remains constant over time. A time series of this form
can be viewed as a combination of signal and noise, and the signal (if one is apparent) could be a pattern of fast or
slow mean reversion, or sinusoidal oscillation, or rapid alternation in sign. It could also have a seasonal component.
An ARIMA model can be viewed as a “filter” that tries to separate the signal from the noise, and the signal is then
extrapolated into the future to obtain forecasts.
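As a quick, hedged illustration of checking stationarity and the amount of differencing needed, the sketch below uses the built-in monthly AirPassengers series rather than the module data; adf.test() is from the tseries package and ndiffs() is from forecast.
library(tseries)
library(forecast)
adf.test(AirPassengers)    #a large p-value is consistent with non-stationarity (unit root not rejected)
ndiffs(AirPassengers)      #suggested order of non-seasonal differencing to stationarize the series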
Before we dive into the parameter tuning of an ARIMA model, let us first discuss what an Autoregressive (AR) model
is and how it differs from a Moving Average (MA) model. An autoregressive model of order p can be written as:
y_t = μ + φ_1 y_{t−1} + φ_2 y_{t−2} + ⋯ + φ_p y_{t−p} + ε_t,
where μ is the intercept or constant, φ_p is the AR coefficient at lag p, and ε_t is white noise. This is like a multiple
regression but with lagged values of y_t as predictors. We refer to this as an AR(p) model, an autoregressive model of
order p.
A moving average model of order q can be written as:
y_t = μ + ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2} + ⋯ + θ_q ε_{t−q},
where μ is the intercept or constant, ε_t is white noise, θ_q is the MA coefficient at lag q, and ε_{t−q} is the forecast error that
was made at period t − q. We refer to this as an MA(q) model, a moving average model of order q. A moving average
process states that the current or predicted value is linearly dependent on the current and past error terms. Again, the
error terms are assumed to be mutually independent and normally distributed, just like white noise.
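To make the two definitions concrete, the sketch below simulates an AR(2) and an MA(1) process with arbitrary coefficients (purely illustrative, not the module data) and plots their ACF and PACF, which previews the identification rules discussed next.
set.seed(123)
ar2 = arima.sim(model = list(ar = c(0.6, 0.3)), n = 500)    #AR(2) with φ_1 = 0.6, φ_2 = 0.3
ma1 = arima.sim(model = list(ma = 0.7), n = 500)            #MA(1) with θ_1 = 0.7
par(mfrow = c(2, 2))
acf(ar2);  pacf(ar2)     #ACF tails off, PACF cuts off after lag 2
acf(ma1);  pacf(ma1)     #ACF cuts off after lag 1, PACF tails off
par(mfrow = c(1, 1))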
If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model, which in its differenced form can be written as:
y′_t = μ + φ_1 y′_{t−1} + ⋯ + φ_p y′_{t−p} + θ_1 ε_{t−1} + ⋯ + θ_q ε_{t−q} + ε_t,
where y′_t is the differenced series (it may have been differenced more than once or not at all). The “predictors” on
the right-hand side include both lagged values of y_t and lagged errors. We call this an ARIMA(p,d,q) model, where p = the number of AR terms, d = the number of (non-seasonal) differences applied, and q = the number of MA terms.
Notes on Non-Seasonal ARIMA Models
The following rules of thumb use the ACF and PACF of the stationarized series to choose p and q:
▪ If the PACF displays a sharp cutoff while the ACF is exponentially decaying or sinusoidal, we say that the
stationarized series displays an "AR signature," meaning that the autocorrelation pattern can be explained
more easily by adding AR terms than by adding MA terms. The lag at which the PACF cuts off is the indicated
number of AR terms. An AR(1) model has a single spike in the PACF and an ACF with the pattern ρ_k = φ_1^k.
An AR(2) model has two spikes in the PACF and a sinusoidal ACF that converges to 0.
▪ If the ACF of the differenced series displays a sharp cutoff while the PACF is exponentially decaying or
sinusoidal, the series displays an “MA signature,” meaning that the autocorrelation pattern can be explained
more easily by adding MA terms than by adding AR terms. The lag at which the ACF cuts off is the indicated
number of MA terms. Below is a sample of an MA(1) model.
▪ In most cases, the best model turns out to be a model that uses either only AR terms or only MA terms, although in
some cases a "mixed" model with both AR and MA terms may provide the best fit to the data.
▪ It is possible for an AR term and an MA term to cancel each other's effects. So, if a mixed ARMA model seems
to fit the data, try a model with one fewer AR term and one fewer MA term – particularly if the parameter
estimates in the original model require more than 10 iterations to converge. BEWARE OF USING MULTIPLE AR
TERMS AND MULTIPLE MA TERMS IN THE SAME MODEL.
▪ ARMA models (including both AR and MA terms) have ACFs and PACFs that both tail off to 0. These are the
trickiest because the order will not be particularly obvious. Basically, you just have to guesstimate that one or
two terms of each type may be needed and then see what happens when you estimate the model. Below is a
sample of an ARIMA(1,1,1) model.
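One hedged way to see the "try one fewer AR and one fewer MA term" advice in practice is to simulate a simple ARMA(1,1) series and compare an over-parameterized ARMA(2,2) fit against the simpler ARMA(1,1) fit by AIC. Everything below (series, coefficients, orders) is illustrative.
library(forecast)
set.seed(1)
x = arima.sim(model = list(ar = 0.5, ma = 0.4), n = 300)    #toy ARMA(1,1) series
fit_22 = Arima(x, order = c(2, 0, 2))                       #mixed model with redundant terms
fit_11 = Arima(x, order = c(1, 0, 1))                       #one fewer AR and one fewer MA term
c(AIC(fit_22), AIC(fit_11))                                 #the simpler fit is typically competitive or better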
Notes on Seasonal ARIMA Models
Seasonality in a time series is a regular pattern of changes that repeats over S time periods, where S defines the number
of time periods until the pattern repeats again.
For example, there is seasonality in monthly data for which high values tend always to occur in particular months and
low values tend always to occur in other particular months. In this case, S = 12 (months per year) is the span of the
periodic seasonal behavior. For quarterly data, S = 4 time periods per year.
A seasonal ARIMA (SARIMA) model is formed by including additional seasonal terms in the ARIMA models we have
seen so far. It is written as ARIMA(p,d,q)(P,D,Q)[m],
where m = the seasonal period (e.g., the number of observations per year). We use uppercase notation for the seasonal
parts of the model, and lowercase notation for the non-seasonal parts of the model. Here, P = number of seasonal
autoregressive (SAR) terms, D = number of seasonal differences, Q = number of seasonal moving average (SMA) terms.
In a SARIMA model, seasonal AR and MA terms predict 𝑦𝑡 using data values and errors at times with lags that are
multiples of S (the span of the seasonality).
▪ With monthly data (and S = 12), a seasonal first order autoregressive model would use 𝑦𝑡 −12 to predict 𝑦𝑡 . For
instance, if we were selling cooling fans, we might predict this March's sales using last March's sales. (This
relationship of predicting using last year's data would hold for any month of the year.)
▪ A seasonal second order autoregressive model would use 𝑦𝑡−12 and 𝑦𝑡−24 to predict 𝑦𝑡 . Here we would predict
this March’s values from the past two years’ March values.
▪ A seasonal first order MA(1) model (with S = 12) would use 𝜀𝑡−12 as a predictor. A seasonal second order MA(2)
model would use 𝜀𝑡−12 and 𝜀𝑡−24 as predictors.
In identifying a seasonal model, the first step is to determine whether a seasonal difference is needed. If the series has
a strong and consistent seasonal pattern, then consider using an order of seasonal differencing – but never use more
than one order of seasonal differencing or more than 2 orders of total differencing (non-seasonal and seasonal
combined).
The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF. For example, an
ARIMA(0,0,0)(0,0,1)[12] model will show:
▪ a spike at lag 12 in the ACF but no other significant spikes;
▪ exponential decay in the seasonal lags of the PACF (i.e., at lags 12, 24, 36, …).
In considering the appropriate seasonal orders for a seasonal ARIMA model, restrict attention to the seasonal lags. If
the autocorrelation at the seasonal period is positive, consider adding an SAR term to the model. If the autocorrelation
at the seasonal period is negative, consider adding an SMA term to the model. Try to avoid mixing SAR and SMA terms
in the same model, and avoid using more than one of either kind.
Usually an SAR(1) or SMA(1) term is sufficient. You will rarely encounter a genuine SAR(2) or SMA(2) process, and even
more rarely have enough data to estimate 2 or more seasonal coefficients without the estimation algorithm getting
into a "feedback loop."
ARIMAX is an extension of the traditional ARIMA model that allows for the inclusion of additional variables, known as
exogenous variables, which may have an effect on the time series being forecasted. These exogenous variables can be
any type of data, but for our purposes, we will only consider time-varying measurements: economic indicators such as
inflation rate or price indices, weather data, inventory turnover, etc.
By incorporating these external factors, ARIMAX models can provide more accurate and comprehensive predictions.
Additionally, ARIMAX models can also be used for causal analysis, where the relationship between the exogenous
variables and the time series data can be examined. Overall, ARIMAX models offer a powerful tool for forecasting and
analyzing time series data in a multivariate context.
We can see how this ARIMAX model compares with the standard ARIMA. For simplicity let’s first consider an
ARIMA(1,1,1): y′_t = μ + φ_1 y′_{t−1} + θ_1 ε_{t−1} + ε_t
Adding the exogenous term gives the ARIMAX(1,1,1): y′_t = μ + φ_1 y′_{t−1} + θ_1 ε_{t−1} + β X_t + ε_t
The new term consists of the ARIMAX coefficient β, fitted based on the model and data, and the exogenous variable X.
It is important to remark that this exogenous variable must be available for every time period. Make sure to include
exogenous variables with a strong correlation with the variable of interest/time series data.
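A minimal ARIMAX sketch with a simulated series and a made-up exogenous regressor (everything here is illustrative); in forecast's Arima() the regressor is passed through the xreg argument.
library(forecast)
set.seed(42)
x_exo = rnorm(120)                                            #made-up exogenous variable
y = 10 + 0.8 * x_exo + arima.sim(model = list(ar = 0.5), n = 120)
fit_arimax = Arima(y, order = c(1, 0, 0), xreg = x_exo)       #β is reported as the xreg coefficient
summary(fit_arimax)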
Residual Diagnostics
A good forecasting method will yield model residuals with the following properties:
1. The residuals are uncorrelated. If there are correlations between the residuals, then there is information left
in the residuals which should be used in computing forecasts.
2. The residuals have zero mean. If they have a mean other than zero, then the forecasts are biased.
3. The residuals have constant variance. This is known as “homoscedasticity”.
4. The residuals are normally distributed.
The first two properties can be checked by performing an ADF test for stationarity and a Ljung-Box test for serial
autocorrelation on the model residuals. Recall that:
▪ ADF test: p-value < .05 to reject H0 that a series of residuals contains a unit root and is non-stationary
▪ Ljung-Box test: p-value > .05 to fail to reject H0 that residuals are independently distributed
The last two properties make the calculation of prediction intervals easier.
▪ ARCH test: p-value > .05 to fail to reject H0 that a series of residuals exhibits no conditional heteroscedasticity
(ARCH effects)
▪ Jarque-Bera test: p-value > .05 to fail to reject H0 that the residuals follow a normal distribution
The first prediction interval is easy to calculate. If σ̂ is the standard deviation of the residuals, then a 95% prediction
interval for the one-step-ahead forecast is given by ŷ_{T+1|T} ± 1.96σ̂. This result is true for all ARIMA models regardless of their parameters and orders.
More general results, and other special cases of multi-step prediction intervals for an ARIMA(p,d,q) model, are given
in more advanced textbooks such as Brockwell & Davis (2016).¹
¹ Brockwell, P. J., & Davis, R. A. (2016). Introduction to Time Series and Forecasting (3rd ed.). Springer.
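For instance, here is a minimal sketch of that one-step-ahead interval on a built-in series (WWWusage; the model order is arbitrary), compared with what forecast() reports.
library(forecast)
fit = Arima(WWWusage, order = c(1, 1, 1))
sigma_hat = sqrt(fit$sigma2)                                       #residual standard deviation
fc = forecast(fit, h = 1, level = 95)
c(fc$mean[1] - 1.96 * sigma_hat, fc$mean[1] + 1.96 * sigma_hat)    #manual 95% interval
c(fc$lower[1], fc$upper[1])                                        #interval reported by forecast()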
The prediction intervals for ARIMA models are based on assumptions that the residuals are uncorrelated and normally
distributed. If either of these assumptions does not hold, then the prediction intervals may be incorrect. For this
reason, always plot the ACF and histogram of the residuals to check the assumptions before producing prediction
intervals and perform the Jarque-Bera test on the model residuals.
If the residuals are uncorrelated but not normally distributed, then bootstrapped intervals can be obtained instead. In
R, this is easily achieved by simply adding bootstrap=TRUE in the forecast() function.
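A short sketch of that option, re-using the same illustrative WWWusage fit as above (not the module data).
library(forecast)
fit = Arima(WWWusage, order = c(1, 1, 1))            #illustrative fit
fc_boot = forecast(fit, h = 12, bootstrap = TRUE)    #intervals simulated from resampled residuals instead of normality
fc_boot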
In general, prediction intervals from ARIMA models increase as the forecast horizon increases. For stationary models
(i.e., with d = 0) they will converge, so that prediction intervals for long horizons are all essentially the same. For d ≥ 1,
the prediction intervals will continue to grow into the future.
We will be using ARIMA models to forecast local inflation. Our data consists of monthly Philippine Consumer Price
Index (CPI) and the Rice Price Index from January 2005 to August 2023. Our target variable is the CPI data.
CODE BLOCK #1: Loading of libraries, data importation, train-test split, data integrity check
#load libraries
library(forecast)
library(FinTS)
library(tseries)
library(urca)
library(tidyverse)
#data importation
data = read.csv(file.choose(), header = T)
head(data)
tail(data)
#train-test split (95-5): hold out the final 12 months (Sep 2022 - Aug 2023) as the test set
#(assumes the CPI column is named `CPI`; adjust to the actual column name in the CSV)
ts_data = ts(data$CPI, start = c(2005, 1), frequency = 12)
ts_train = window(ts_data, end = c(2022, 8))
ts_test = window(ts_data, start = c(2022, 9))
Instead of the standard 70-30 or 80-20 train-test split, I chose a 95-5 split so that the training set captures how
the time series behaved during the pandemic and the “back-to-normal” period from late 2021 to early 2022. This
assumption in the data is important as we are forecasting a macroeconomic variable.
CODE BLOCK #2: Time series visuals
#decompose the training series into trend, seasonal, and random components
decompose_train = decompose(ts_train)
par(mfrow = c(4,1))
plot(as.ts(decompose_train$trend))
plot(as.ts(decompose_train$seasonal))
plot(as.ts(decompose_train$random))
plot(as.ts(decompose_train$x))        #the observed series itself
par(mfrow = c(1,1))
The time series has an observable upward trend, while it’s difficult to detect any sort of seasonal pattern. We’ll have
to check the ACF and PACF plots to inspect it better.
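The plots discussed below can be reproduced along these lines (a sketch; plotting the differenced series is an assumption made here to strip out the trend, and whether to difference first is a judgment call).
par(mfrow = c(1, 2))
acf(diff(ts_train), main = "ACF of differenced series")
pacf(diff(ts_train), main = "PACF of differenced series")
par(mfrow = c(1, 1))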
The ACF plot shows significant lags up to lag 3, with a spike at lag 1 and a significant drop at lags 2 and 3. The PACF plot
shows a significant spike at lag 1 that drops off immediately at lag 2; the PACF also has a more sinusoidal
pattern.
The time series displays a stronger AR signature, but it is not strong enough to merit a pure AR(p) model. We will have
to work with a mixed ARMA model, while being careful not to overfit by using too high an order for the AR(p) and MA(q)
terms.
#model 1 -> default auto.arima() fit on the training series
mod1 = auto.arima(ts_train)
summary(mod1)
jarque.bera.test(mod1$residuals)
ArchTest(mod1$residuals)
adf.test(mod1$residuals)
Box.test(mod1$residuals, type = "Ljung-Box")
auto.arima() is a statistical algorithm used for time series forecasting. It automatically determines the optimal
parameters for an ARIMA model, such as the order of differencing, autoregressive (AR) terms, and moving average
(MA) terms. It searches through different combinations of these parameters to find the best fit for the given time series
data. This automated process saves time and effort, making it easier for users to generate accurate forecasts without
requiring extensive knowledge of time series analysis.
The auto.arima() process fitted an ARIMA(1,1,0)(0,0,1)[12] with drift model. As expected, it applied a differencing
transformation to the time series with d = 1. The AR(1) term is something we expected from the ACF and PACF plots.
The algorithm also saw an MA signature, but included it in the seasonal parameters instead. The model diagnostics
also show that the model residuals pass all the regression assumptions.
#model 2 -> auto.arima() without seasonal terms (the call is reconstructed from the discussion below)
mod2 = auto.arima(ts_train, seasonal = FALSE)
summary(mod2)
jarque.bera.test(mod2$residuals)
ArchTest(mod2$residuals)
adf.test(mod2$residuals)
Box.test(mod2$residuals, type = "Ljung-Box")
#model 3 -> manually tuned ARIMA(1,1,3) (orders taken from the discussion below)
mod3 = Arima(ts_train, order = c(1, 1, 3))
summary(mod3)
jarque.bera.test(mod3$residuals)
ArchTest(mod3$residuals)
adf.test(mod3$residuals)
Box.test(mod3$residuals, type = "Ljung-Box")
#model 4 -> SARIMAX: (p,d,q) from Model #3, seasonal (P,D,Q) from Model #1, plus the rice
#price index as exogenous regressor (the column name `Rice` is assumed; align it with ts_train)
rice_train = window(ts(data$Rice, start = c(2005, 1), frequency = 12), end = c(2022, 8))
mod4 = Arima(ts_train, order = c(1, 1, 3), seasonal = c(0, 0, 1), xreg = rice_train)
jarque.bera.test(mod4$residuals)
ArchTest(mod4$residuals)
adf.test(mod4$residuals)
Box.test(mod4$residuals, type = "Ljung-Box")
Model #2 shows how to fit an auto.arima() model without seasonality. It fitted a mixed ARMA model (with d = 1
differencing, as expected) with p = 2 AR terms and q = 2 MA terms. A high order of AR and MA terms at the same time poses
stability and accuracy concerns for our forecasts, even if the model diagnostics pass all assumptions. The AR(2) fit is due
to the significant lag 2 in our ACF plot. We will have to check our actual forecasts against the test set to assess the accuracy
of the mixed ARMA(2,2) fit.
Model #3 shows how we can manually tune the (p,d,q) parameters of the ARIMA model. Again, we manually set
d = 1 given our knowledge from the earlier ADF test that we need to difference the time series to make
it stationary. I fitted an AR(1) parameter here given the strong AR signature of the time series, especially at lag 1 as
seen in the ACF plot. The MA term was trickier to tune; it was honestly fitted with guesswork. The (1,1,1) and (1,1,2)
fits both produced residuals that violated more than one assumption, and the (1,1,3) fit still violated the autocorrelation
assumption. I stopped there because, as with linear regression models, a less parsimonious fit (i.e., an AR or MA order
greater than 3) will be penalized during estimation.
Model #4 shows a SARIMAX fit. The exogenous variable here is the local price index for rice. Rice, being a staple
commodity locally, is one of the most heavily weighted items in the CPI basket. As in Model #3, we manually fitted
the parameters by combining the (p,d,q) parameters from Model #3 with the (P,D,Q) fit from Model #1. The model
residuals pass all regression assumptions, but we have to check the forecasts to assess their accuracy.
#model 1 forecasts
mod1_fcst = forecast(mod1, h = 12)
mod1_fcst
#model 2 forecasts
mod2_fcst = forecast(mod2, h = 12)
mod2_fcst
#model 3 forecasts
mod3_fcst = forecast(mod3, h = 12)
mod3_fcst
#model 4 forecasts (forecasting with xreg needs future values of the regressor; `Rice` column name assumed)
rice_test = window(ts(data$Rice, start = c(2005, 1), frequency = 12), start = c(2022, 9))
mod4_fcst = forecast(mod4, xreg = rice_test, h = 12)
mod4_fcst
The h argument is the number of time steps ahead that you want to forecast. Since our test set is 12 months long, we set
h = 12.
We did not use the forecasts from Model #3 because the residual diagnostics failed the autocorrelation assumption.
Here, we can see that the best model that we fitted is Model #1, the auto.arima() fit.
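One hedged way to make this comparison explicit is to score each model's forecasts against the held-out test set with forecast::accuracy(); the objects below are the ones defined in the earlier code blocks (Model #3 is excluded because its residual diagnostics failed).
accuracy(mod1_fcst, ts_test)    #RMSE/MAE/MAPE on both the training and test sets
accuracy(mod2_fcst, ts_test)
accuracy(mod4_fcst, ts_test)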
It’s not always the case that the auto.arima() fit will produce the best forecast. Here, we hoped that the exogenous
variable would help stabilize the forecast. Instead, it acted as a dampener on the Model #4 forecasts, hence the understated
forecast values.
You’ll also notice that the ARIMA models we fitted barely reached the levels in the test set. A couple of things
contributed to this.
First, the ARIMA model, like most regression models based on a linear function, tends to underestimate
forecasts over the long term because the stationarity assumption forces the forecast to be mean-reverting. Second, the
ARIMA model has a recency bias (we set the AR(p) lag to 1), and the latest data in our train set covered the pandemic (when
prices dropped significantly) and the first few months of the economy reopening. Our test set shows a significant increase
from Q4 2022 onward that the ARIMA models were not able to chase. You’ll notice in the table below that there was a
significant month-on-month jump in October 2022 that the ARIMA models were not able to consider.
So what are the next steps? We can try to re-fit our ARIMA models (as stated in the notes on non-seasonal
ARIMA, parameter tuning for a mixed ARMA model takes a lot of guesswork). We can also try transforming the actual data
itself and winsorizing the 2020 and 2021 observations to force the data to follow the pre-pandemic growth rate. This
means we would model the train set as if the pandemic did not happen.
We can also fit other models suited to time series forecasting. For further reading, you can check out Vector
Autoregression and the ARCH family of models, which is better suited to modelling volatility. The other alternative is, of
course, to use other machine learning models that do not depend on Gaussian assumptions, such as recurrent neural network
models.
CODE BLOCK #7: Forecasting 12-month ahead using the best ARIMA model
# 12-month ahead forecast on the entire dataset using best ARIMA model
#re-fit Model #1's parameters (ARIMA(1,1,0)(0,0,1)[12] with drift) on the full series from Code Block #1
mod1.series = Arima(ts_data, order = c(1, 1, 0), seasonal = c(0, 0, 1), include.drift = TRUE)
jarque.bera.test(mod1.series$residuals)
ArchTest(mod1.series$residuals)
adf.test(mod1.series$residuals)
Box.test(mod1.series$residuals, type = "Ljung-Box")
#series forecasts
series_fcst = forecast(mod1.series, h = 12)
series_fcst
Assuming that we will use ARIMA as our time series model, we would eventually want to produce the 12-month-ahead
forecasts for inflation using the entire dataset and the parameters of the best ARIMA model.
In our case, we will use Model #1. Recall that Model #1 is the auto.arima() fit and returned an ARIMA(1,1,0)(0,0,1)[12]
with drift. In the code block above, I did not re-run auto.arima() but instead fitted those exact parameters using the entire
dataset this time.