How to Check if Time Series Data is Stationary with Python?
Last Updated: 19 Jan, 2022
Time series data are characterized by their temporal ordering, which often introduces a trend or seasonality into the data. A time series is said to be stationary if its statistical properties, such as the mean and variance, do not change over time, that is, if it has no time-dependent structure. Many time series models assume stationarity, so it is important to check whether the data is stationary before analysis and forecasting; conclusions drawn from a non-stationary series without accounting for its trend or seasonality can be misleading.
Example plot of stationary data:

Types of stationarity:
Checking whether data is stationary means checking it against one of several, more fine-grained notions of stationarity. The types of stationarity observed in time series data include:
- Trend Stationary – A time series that does not show a trend.
- Seasonal Stationary – A time series that does not show seasonal changes.
- Strictly Stationary – The joint distribution of observations is invariant to time shift.
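To make these notions concrete, the following minimal sketch builds two synthetic series with NumPy: white noise, which is stationary, and the same noise with a linear trend added, which is not. The data, the seed, and the helper name half_means are illustrative assumptions and are not part of the datasets used in this article.

```python
import numpy as np

# Synthetic data for illustration only (not one of the article's datasets)
rng = np.random.default_rng(seed=0)
n = 1000
noise = rng.normal(loc=0.0, scale=1.0, size=n)   # white noise: stationary
trended = noise + 0.01 * np.arange(n)            # noise plus a linear trend


def half_means(series):
    """Return the mean of the first half and of the second half of a series."""
    mid = len(series) // 2
    return series[:mid].mean(), series[mid:].mean()


print("white noise:", half_means(noise))
print("with trend: ", half_means(trended))
```

For the white noise, the two half means should be close to each other, while for the trended series the second half mean should be clearly larger than the first.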
Stepwise Implementation
The following steps show how to check whether a given time series is stationary.
Step 1: Plotting the time series data
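The practice dataset for this step is daily-female-births-IN.csv. The code below reads it under the file name daily-total-female-births-IN.csv, so adjust the path to match your local copy.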
Python3
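import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset; the first column becomes the index
data = pd.read_csv("daily-total-female-births-IN.csv",
                   header=0, index_col=0)

# Plot the series to inspect it visually for trend or seasonality
plt.plot(data)
plt.show()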
Output:

Step 2: Evaluating the descriptive statistics
This is usually done by splitting the data into two or more partitions and calculating the mean and variance of each partition. If the mean and variance are consistent across the partitions, we can assume that the data is stationary. Let's use the airline passenger count dataset, which covers 1949 to 1960.
The practice dataset for this step is AirPassengers.csv.
Python3
import pandas as pd
import matplotlib.pyplot as plt

# Load the passenger counts; the first column becomes the index
data = pd.read_csv("AirPassengers.csv", header=0, index_col=0)

print(data.head(10))   # first ten rows
plt.plot(data)
plt.show()
Output:
Now, let's partition this data into three groups, calculate the mean and variance of each group, and check them for consistency.
Python3
import pandas as pd

data = pd.read_csv("AirPassengers.csv", header=0, index_col=0)
values = data.values

# Split the series into three consecutive, equally sized partitions
parts = len(values) // 3
part_1 = values[0:parts]
part_2 = values[parts:parts * 2]
part_3 = values[parts * 2:parts * 3]

# Mean and variance of each partition
mean_1, mean_2, mean_3 = part_1.mean(), part_2.mean(), part_3.mean()
var_1, var_2, var_3 = part_1.var(), part_2.var(), part_3.var()

print('mean1=%f, mean2=%f, mean3=%f' % (mean_1, mean_2, mean_3))
print('variance1=%f, variance2=%f, variance3=%f' % (var_1, var_2, var_3))
Output:
The output shows that the mean and variance of the three groups differ considerably from one another, which indicates that the data is non-stationary. If, for example, the means were mean_1 = 150, mean_2 = 160, mean_3 = 155 and the variances were variance_1 = 33, variance_2 = 35, variance_3 = 37, we could conclude that the data is stationary. This method can fail for some distributions, such as log-normal distributions.
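The partition check can also be wrapped in a small reusable function. The sketch below is one possible way to do it, not the article's own code: the name partition_stats and its n_parts parameter are hypothetical, and it uses np.array_split, which keeps every observation (chunks may differ in length by one) instead of dropping a remainder.

```python
import numpy as np


def partition_stats(values, n_parts=3):
    """Split values into n_parts consecutive chunks and return a list of
    (mean, variance) tuples, one per chunk."""
    flat = np.asarray(values, dtype=float).ravel()
    chunks = np.array_split(flat, n_parts)   # chunk lengths may differ by 1
    return [(chunk.mean(), chunk.var()) for chunk in chunks]


# Illustrative series with an upward trend (synthetic, not AirPassengers)
series = np.arange(1, 13, dtype=float)
stats = partition_stats(series)
for i, (mean, var) in enumerate(stats, start=1):
    print("part %d: mean=%.2f, variance=%.2f" % (i, mean, var))
```

Here the partition means increase steadily while the variances stay equal, which is the pattern a pure linear trend would be expected to produce.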
Let’s try the same example as above but take the log of the passengers’ count using NumPy’s log() function and check the results.
Python3
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

data = pd.read_csv("AirPassengers.csv", header=0, index_col=0)

# Natural log of the passenger counts
values = np.log(data.values)

print(values[0:15])
plt.plot(values)
plt.show()
Output:
The plot still shows a trend, but it is not as steep as in the previous case. Now let's compute the mean and variance of each partition.
Python3
# values holds the log-transformed counts from the previous snippet
parts = len(values) // 3
part_1 = values[0:parts]
part_2 = values[parts:parts * 2]
part_3 = values[parts * 2:parts * 3]

mean_1, mean_2, mean_3 = part_1.mean(), part_2.mean(), part_3.mean()
var_1, var_2, var_3 = part_1.var(), part_2.var(), part_3.var()

print('mean1=%f, mean2=%f, mean3=%f' % (mean_1, mean_2, mean_3))
print('variance1=%f, variance2=%f, variance3=%f' % (var_1, var_2, var_3))
Output:
Since the series is still non-stationary, we would have expected the means and variances to differ considerably, but they are close to one another. In such cases this method fails badly. To avoid this, we can use a statistical test, which is discussed below.
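One way to see why the log transform narrows the gap: for a series that grows roughly exponentially, the logarithm turns the growth into a straight line, so the partition variances become nearly identical and the partition means differ only by small absolute amounts, even though the trend is still there. The sketch below shows this on a synthetic series. The starting level, growth rate, and helper name third_stats are arbitrary assumptions, and the data is not the AirPassengers dataset.

```python
import numpy as np

# Synthetic exponential growth (assumed values, for illustration only)
t = np.arange(144)
raw = 100.0 * np.exp(0.01 * t)
logged = np.log(raw)          # equals log(100) + 0.01 * t, a straight line


def third_stats(values):
    """Mean and variance of three equal, consecutive partitions."""
    parts = np.split(values, 3)   # 144 observations divide evenly by 3
    return [(p.mean(), p.var()) for p in parts]


raw_stats = third_stats(raw)
log_stats = third_stats(logged)
print("raw:   ", raw_stats)
print("logged:", log_stats)
```

On the raw series the variance grows from one partition to the next, while on the logged series the three variances coincide and only the means keep rising.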
Step 3: Augmented Dickey-Fuller test
The Augmented Dickey-Fuller (ADF) test is a statistical test designed to check whether a univariate time series is stationary. It is a hypothesis test, and its p-value indicates how much evidence the data provides against the null hypothesis. It belongs to the family of unit root tests, which determine how strongly a univariate time series is defined by a trend. Let's define the null and alternate hypotheses:
- H0 (Null Hypothesis): The time series data is non-stationary
- H1 (Alternate Hypothesis): The time series data is stationary
Assume alpha = 0.05, which corresponds to a 95% confidence level. The result is interpreted through the p-value: if p > 0.05, we fail to reject the null hypothesis; if p <= 0.05, we reject the null hypothesis. Now, let's use the same air passengers dataset and test it with the adfuller() function provided by the statsmodels package to check whether the data is stationary.
Python3
import pandas as pd
from statsmodels.tsa.stattools import adfuller

data = pd.read_csv("AirPassengers.csv", header=0, index_col=0)
values = data.values.ravel()   # adfuller expects a one-dimensional series

# res[0]: test statistic, res[1]: p-value, res[4]: critical values
res = adfuller(values)

print('Augmented Dickey-Fuller Statistic: %f' % res[0])
print('p-value: %f' % res[1])
print('critical values at different levels:')
for k, v in res[4].items():
    print('\t%s: %.3f' % (k, v))
Output:
The ADF test rejects the null hypothesis only when the statistic is more negative than a critical value. Here the ADF statistic is much greater than the critical values at every level, and the p-value is greater than 0.05. We therefore fail to reject the null hypothesis at the 90%, 95%, and 99% confidence levels, so the test gives no evidence of stationarity and we treat the time series as non-stationary.
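The decision rule can be written down as a small helper. This is a minimal sketch, assuming the statistic, p-value, and critical values are taken from positions 0, 1, and 4 of the adfuller() result as in the code above. The function name adf_decision is hypothetical, and the numbers passed to it are made up for illustration rather than taken from a real test run.

```python
def adf_decision(statistic, p_value, critical_values, alpha=0.05):
    """Summarise an ADF result.

    Returns a dict with the p-value decision and, for each critical level,
    whether the statistic is below (more negative than) the critical value.
    """
    return {
        "reject_null": p_value <= alpha,
        "below_critical": {level: statistic < value
                           for level, value in critical_values.items()},
    }


# Made-up numbers for illustration, not the output of a real test run
example = adf_decision(
    statistic=-1.2,
    p_value=0.67,
    critical_values={"1%": -3.5, "5%": -2.9, "10%": -2.6},
)
print(example)
```

With these inputs the helper reports that the null hypothesis is not rejected and that the statistic is not below any of the critical values, which is the situation described for a non-stationary series.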
Now, let's run the ADF test on the log-transformed values and cross-check our results.
Python3
import pandas as pd
from statsmodels.tsa.stattools import adfuller
import numpy as np

data = pd.read_csv("AirPassengers.csv", header=0, index_col=0)

# Natural log of the counts, flattened to one dimension for adfuller
values = np.log(data.values).ravel()
res = adfuller(values)

print('Augmented Dickey-Fuller Statistic: %f' % res[0])
print('p-value: %f' % res[1])
print('critical values at different levels:')
for k, v in res[4].items():
    print('\t%s: %.3f' % (k, v))
Output:
As you can see, the ADF test once again shows that the ADF statistic is greater than the critical values at every level and that the p-value is well above 0.05. We again fail to reject the null hypothesis at the 90%, 95%, and 99% confidence levels, so the log-transformed series is also treated as non-stationary.
Hence, unlike the partition-based comparison of means and variances, the ADF unit root test proves to be a robust way to check whether time series data is stationary.