Autocorrelation is a fundamental concept in time series analysis: it measures the degree of correlation between the values of a variable at different points in time. This article discusses the fundamentals of autocorrelation and how it works.
What is Autocorrelation?
Autocorrelation measures the degree of similarity between a given time series and a lagged version of itself over successive time periods. It is similar to calculating the correlation between two different variables, except that in autocorrelation we calculate the correlation between two versions, X_t and X_{t-k}, of the same time series.
Calculation of Autocorrelation
Mathematically, the autocorrelation coefficient is denoted by the symbol ρ (rho) and is expressed as ρ(k), where 'k' represents the time lag, i.e., the number of intervals between the observations. The autocorrelation coefficient is computed like a Pearson correlation, using the covariance between the series and its lagged version.
For a time series dataset, the autocorrelation at lag 'k' (ρ(k)) is determined by comparing the values of the variable at time 't' with the values at time 't-k'.
\rho(k) = \frac{Cov(X_t, X_{t-k})}{\sigma(X_t) \cdot \sigma(X_{t-k})}
Here,
- Cov is the covariance
- \sigma is the standard deviation
- X_t represents the variable at time 't'
- X_{t-k} represents the variable at time 't-k'
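To make the formula concrete, here is a minimal sketch of computing ρ(k) by hand with NumPy. The helper name autocorr and the simulated random-walk series are assumptions invented for this example; note that, as in most standard estimators, the sketch normalizes by the overall variance of the series rather than by the two standard deviations separately.
Python
import numpy as np

def autocorr(x, k):
    # Illustrative helper: sample autocorrelation of series x at lag k
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    # Covariance between X_t and its lag-k version X_{t-k}
    cov = np.sum((x[k:] - mean) * (x[:-k] - mean)) / len(x)
    # Lag-0 autocovariance (the variance of the series)
    var = np.sum((x - mean) ** 2) / len(x)
    return cov / var

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))  # a random walk, strongly autocorrelated
print(autocorr(series, 1))                # close to 1 for a random walk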
Interpretation of Autocorrelation
- A positive autocorrelation (ρ > 0) indicates a tendency for values at one time point to be positively correlated with values at a subsequent time point. A high autocorrelation at a specific lag suggests a strong linear relationship between the variable's current values and its past values at that lag.
- A negative autocorrelation (ρ < 0) suggests an inverse relationship between values at different time intervals. A low or zero autocorrelation indicates a lack of linear dependence between the variable's current and past values at that lag.
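As a quick illustration of these two cases, the sketch below simulates AR(1) series with a positive and a negative coefficient (the helper names, coefficients, and seed are made up for this example) and prints their lag-1 autocorrelation.
Python
import numpy as np

rng = np.random.default_rng(1)

def ar1(phi, n=500):
    # Illustrative helper: simulate x_t = phi * x_{t-1} + noise
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

def lag1_autocorr(x):
    # Sample autocorrelation at lag 1
    x = x - x.mean()
    return np.sum(x[1:] * x[:-1]) / np.sum(x ** 2)

print(lag1_autocorr(ar1(0.7)))   # positive: consecutive values move together
print(lag1_autocorr(ar1(-0.7)))  # negative: consecutive values tend to alternate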
Use of Autocorrelation
- Autocorrelation detects repeating patterns and trends in time series data. Positive autocorrelation at specific lags may indicate the presence of seasonality.
- Autocorrelation guides the choice of the order of the AR and MA terms in ARIMA models by providing insight into how many lag terms to include.
- Autocorrelation helps to check whether a time series is stationary or exhibits trends and non-stationary behavior.
- Sudden spikes or drops in autocorrelation at certain lags may indicate the presence of anomalies and outliers.
What is Partial Autocorrelation?
In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, after the effects of the series at all shorter lags have been regressed out. It differs from the autocorrelation function, which does not control for the other lags.
Partial autocorrelation quantifies the relationship between a specific observation and its lagged values. This helps us examine the direct influence of a past time point on the current time point, excluding the indirect influence transmitted through the intermediate lags. It seeks to determine the unique correlation between two time points, accounting for the influence of the time points in between.
PACF(T_i, k) = \frac{Cov([T_i|T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}], [T_{i-k}|T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}])}{\sigma_{[T_i|T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}]} \cdot \sigma_{[T_{i-k}|T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}]}}
Here,
- T_i|T_{i-1}, T_{i-2}, \ldots, T_{i-k+1} is the series of residuals obtained from fitting a multivariate linear model to T_{i-1}, T_{i-2}, \ldots, T_{i-k+1} for predicting T_i.
- T_{i-k}|T_{i-1}, T_{i-2}, \ldots, T_{i-k+1} is the series of residuals obtained from fitting a multivariate linear model to T_{i-1}, T_{i-2}, \ldots, T_{i-k+1} for predicting T_{i-k}.
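This definition translates almost literally into code: regress out the intermediate lags from both T_i and T_{i-k}, then correlate the two residual series. The sketch below is illustrative only (the helper pacf_at_lag and the simulated AR(2) series are invented for this example, and it assumes k ≥ 2); it is cross-checked against statsmodels' pacf with the OLS method.
Python
import numpy as np
from statsmodels.tsa.stattools import pacf

rng = np.random.default_rng(2)
x = np.zeros(500)
for t in range(2, 500):  # simulate an AR(2) process for illustration
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

def pacf_at_lag(x, k):
    # Partial autocorrelation at lag k (assumes k >= 2)
    n = len(x)
    # Design matrix holding the intermediate lags T_{i-1} ... T_{i-k+1}
    Z = np.column_stack([x[k - j: n - j] for j in range(1, k)])
    Z = np.column_stack([np.ones(n - k), Z])
    y_cur, y_lag = x[k:], x[: n - k]
    beta_cur = np.linalg.lstsq(Z, y_cur, rcond=None)[0]
    beta_lag = np.linalg.lstsq(Z, y_lag, rcond=None)[0]
    res_cur = y_cur - Z @ beta_cur  # T_i with intermediate lags regressed out
    res_lag = y_lag - Z @ beta_lag  # T_{i-k} with intermediate lags regressed out
    return np.corrcoef(res_cur, res_lag)[0, 1]

print(pacf_at_lag(x, 2))                  # manual estimate
print(pacf(x, nlags=2, method='ols')[2])  # statsmodels' estimate, for comparison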
Testing For Autocorrelation - Durbin-Watson Test
The Durbin-Watson test is a statistical test used to detect the presence of first-order autocorrelation in the residuals of a regression analysis. The value of the DW statistic always ranges between 0 and 4.
In the stock market, positive autocorrelation (DW < 2) in stock prices suggests that price movements have a persistent trend: if the variable increased or decreased on a previous day, there is a tendency for it to move in the same direction on the current day. For example, if the stock fell yesterday, there is a higher likelihood it will fall today. Negative autocorrelation (DW > 2) indicates that if a variable increased or decreased on a previous day, there is a tendency for it to move in the opposite direction on the current day. For example, if the stock fell yesterday, there is a greater likelihood it will rise today.
Assumptions for the Durbin-Watson Test:
- The errors are normally distributed, and the mean is 0.
- The errors are stationary.
Calculation of DW Statistics
The DW statistic is computed from the residuals e_t obtained from the Ordinary Least Squares (OLS) method, using the formula given below.
The null hypothesis and alternate hypothesis for the Durbin-Watson Test are:
- H0: There is no first-order autocorrelation in the residuals (ρ = 0).
- HA: First-order autocorrelation is present (ρ ≠ 0).
Formula of DW Statistics
d = \frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T}e_{t}^{2}}
Here,
- e_t is the residual at time t
- T is the number of observations.
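The formula maps directly onto a few lines of NumPy. This is a minimal sketch with an invented helper name (durbin_watson_stat) and made-up residuals; statsmodels' durbin_watson function, used later in this article, computes the same quantity.
Python
import numpy as np

def durbin_watson_stat(residuals):
    # Illustrative helper implementing the formula above
    e = np.asarray(residuals, dtype=float)
    num = np.sum(np.diff(e) ** 2)  # sum over t = 2..T of (e_t - e_{t-1})^2
    den = np.sum(e ** 2)           # sum over t = 1..T of e_t^2
    return num / den

e = np.array([0.5, -0.3, 0.2, -0.4, 0.1])  # made-up residuals
print(durbin_watson_stat(e))  # values near 2 suggest no first-order autocorrelation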
Interpretation of DW Statistics
- If the value of the DW statistic is 2.0, it suggests that there is no autocorrelation detected in the sample.
- If the value is less than 2, it suggests that there is a positive autocorrelation.
- If the value is between 2 and 4, it suggests that there is a negative autocorrelation.
Decision Rule
- If the Durbin-Watson test statistic is significantly different from 2, it suggests the presence of autocorrelation.
- The decision to reject the null hypothesis depends on the critical values provided in statistical tables for different significance levels.
Need For Autocorrelation in Time Series
Autocorrelation is important in time series as:
- Autocorrelation helps reveal repeating patterns or trends within a time series. By analyzing how a variable correlates with its past values at different lags, analysts can identify the presence of cyclic or seasonal patterns in the data. For example, in economic data, autocorrelation may reveal whether certain economic indicators exhibit regular patterns over specific time intervals, such as monthly or quarterly cycles.
- Financial analysts and traders often use autocorrelation to analyze historical price movements in financial markets. By identifying autocorrelation patterns in past price changes, they may attempt to predict future price movements. For instance, if there is a positive autocorrelation at a specific lag, indicating a trend in price movements, traders might use this information to inform their predictions and trading strategies.
- The Autocorrelation Function (ACF) is a crucial tool for modeling time series data. ACF helps identify which lags have significant correlations with the current observation. In time series modeling, understanding the autocorrelation structure is essential for selecting appropriate models. For instance, if there is a significant autocorrelation at a particular lag, it may suggest the presence of an autoregressive (AR) component in the model, influencing the current value based on past values. The ACF plot allows analysts to observe the decay of autocorrelation over lags, guiding the choice of lag values to include in autoregressive models.
Autocorrelation Vs Correlation
- Autocorrelation refers to the correlation between a variable and its past values at different lags in a time series. It focuses on understanding the temporal patterns within a single variable. Correlation represents the statistical association between two distinct variables. It focuses on assessing the strength and direction of the relationship between separate variables.
- Autocorrelation is measured using functions such as the ACF and PACF, which quantify the correlation between a variable and its lagged values at each lag. Correlation is measured using coefficients such as the Pearson correlation coefficient for linear relationships or the Spearman rank correlation for monotonic relationships, providing a single value ranging from -1 to 1.
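To see the distinction in code, the sketch below (with simulated data invented for this example) computes a single Pearson correlation between two distinct variables, then contrasts it with the ACF of one series, which yields one coefficient per lag.
Python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(size=300))  # one time series
y = 2 * x + rng.normal(size=300)     # a second, related variable

# Correlation: one coefficient between two distinct variables
print(np.corrcoef(x, y)[0, 1])

# Autocorrelation: one coefficient per lag, within a single variable
print(acf(x, nlags=3))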
Difference Between Autocorrelation and Multicollinearity
Feature | Autocorrelation | Multicollinearity
---|---|---
Definition | Correlation between a variable and its lagged values | Correlation between independent variables in a model
Focus | Relationship within a single variable over time | Relationship among multiple independent variables
Purpose | Identifying temporal patterns in time series data | Detecting interdependence among predictor variables
Nature of Relationship | Examines correlation between a variable and its past values | Investigates correlation between independent variables
Impact on the model | Can lead to biased parameter estimates in time series models | Can lead to inflated standard errors and difficulty in isolating individual variable effects
Statistical Test | Ljung-Box test, Durbin-Watson statistic | Variance Inflation Factor (VIF), correlation matrix, condition indices
How to calculate Autocorrelation in Python?
This section demonstrates how to calculate autocorrelation in Python, along with the interpretation of the resulting graphs. We will be using a Google stock price dataset.
Importing Libraries and Dataset
We use Pandas, NumPy, Matplotlib, and statsmodels (including its OLS linear regression model, the durbin_watson statistic, and the tsaplots module).
Python
# Importing necessary dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.regression.linear_model import OLS
from statsmodels.graphics.tsaplots import plot_acf
# Load the dataset with the date column as the index, parsed as dates
goog_stock_Data = pd.read_csv('GOOG.csv', header=0, index_col=0, parse_dates=True)
# Plot the adjusted close price over time
goog_stock_Data['Adj Close'].plot()
plt.show()
Output:
Here, we have plotted the adjusted close price of the Google stock.
Plotting Autocorrelation Function
Python
# Plot the autocorrelation for stock price data with 0.05 significance level
plot_acf(goog_stock_Data['Adj Close'], alpha=0.05)
plt.show()
Output:
The graph plotted above represents the autocorrelation at different lags in the time series. In the ACF plot, the x-axis represents the lag, or time gap, between observations, while the y-axis represents the autocorrelation coefficients. Here, we can see significant autocorrelation at the 0.05 significance level. Peaks above the horizontal axis indicate positive autocorrelation, suggesting a repeating pattern at the corresponding lag.
The Autocorrelation Function plot represents the autocorrelation coefficients for a time series dataset at different lag values.
Performing Durbin-Watson Test
Python
# Code for the Durbin-Watson test
# Dependent variable: the adjusted close price
Y = goog_stock_Data['Adj Close'].values
# Explanatory variable: the observation index (a simple time trend)
X = sm.add_constant(np.arange(len(Y)))
# Fit the ordinary least squares regression
ols_res = OLS(Y, X).fit()
# Apply the Durbin-Watson statistic to the OLS residuals
durbin_watson(ols_res.resid)
Output:
0.13568583561262496
The DW statistic value of 0.13 falls close to 0, indicating strong positive autocorrelation in the residuals.
How to Handle Autocorrelation?
To handle autocorrelation in a model,
- For positive serial correlation
- Include lagged values of the dependent variable or relevant independent variables in the model. This helps capture the autocorrelation patterns in the data.
- For example, when dealing with time series data, consider using lagged values in an autoregressive (AR) model, as in the sketch after this list.
- For negative serial correlation
- Ensure that differencing (if applied) is not excessive. Over-differencing can introduce negative autocorrelation.
- If differencing is used to achieve stationarity, consider adjusting the differencing order or exploring alternative methods like seasonal differencing.
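As a concrete illustration of the first remedy, the sketch below simulates a positively autocorrelated series (the coefficient 0.8 and the seed are invented for this example), fits an AR(1) model with statsmodels' AutoReg, and checks that the Durbin-Watson statistic of the residuals moves back toward 2.
Python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
x = np.zeros(300)
for t in range(1, 300):  # simulate a positively autocorrelated series
    x[t] = 0.8 * x[t - 1] + rng.normal()

# Include one lagged value of the dependent variable via an AR(1) model
model = AutoReg(x, lags=1).fit()
print(model.params)                # intercept and lag-1 coefficient (near 0.8)
print(durbin_watson(model.resid))  # should now be close to 2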