How to Resample Time Series Data in Python?
Last Updated: 19 Dec, 2021
In time series analysis, data consistency is of prime importance, and resampling ensures that the data is distributed at a consistent frequency. Resampling can also offer a different perspective on the data: depending on the resampling frequency, it can reveal additional insights.
resample() function: It is primarily used for time series data.
Syntax:
# import the python pandas library
import pandas as pd
# syntax for the resample function.
pd.Series.resample(rule, axis=0, closed='left',
convention='start', kind=None, offset=None,
origin='start_day')
Resampling primarily involves changing the time-frequency of the original observations. The two popular methods of resampling in time series are upsampling and downsampling.
Upsampling
Upsampling involves increasing the time-frequency of the data. It is a data disaggregation procedure in which we break the observations down from a coarser time scale to a finer one, for example from months to days, days to hours, or hours to seconds. Upsampling usually increases the size of the data, depending on the sampling frequency. If D is the size of the original data and D' is the size of the upsampled data, then D' > D.
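The relation D' > D can be checked on a small scale. The sketch below is only an illustration: the series `monthly` and its values are made up, and the month-start alias "MS" is an assumption chosen for the example, not something taken from the practice dataset.

```python
import pandas as pd

# Hypothetical monthly series: six month-start observations (made-up values).
monthly = pd.Series(
    [100.0, 120.0, 90.0, 110.0, 130.0, 125.0],
    index=pd.date_range("2021-01-01", periods=6, freq="MS"),
)

# Upsample from monthly to daily frequency.
daily = monthly.resample("D").mean()

print(len(monthly))  # D: number of original observations
print(len(daily))    # D': number of rows after upsampling
```

The daily series covers every calendar day from the first to the last observation, so its length depends on the span of the data rather than on the number of original rows.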
Now, let's look at an example using Python to perform resampling in time-series data.
The practice dataset used for the implementation is Detergent sales data.csv.
Example:
Python3
# import the python pandas library
import pandas as pd
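# read data using read_csv; squeeze("columns") turns the
# single-column DataFrame into a Series (the squeeze=True
# argument of read_csv was removed in pandas 2.0)
data = pd.read_csv("Detergent sales data.csv", header=0,
                   index_col=0, parse_dates=True).squeeze("columns")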
The detergent sales data shows the sales value for the first 6 months. Assume the task is to predict daily sales. Since only monthly data is available, it has to be broken down to a daily frequency, which calls for upsampling.
Python3
# Use resample function to upsample months
# to days using the mean sales of month
upsampled = data.resample('D').mean()
The result is the dataset upsampled from months to days, based on the mean value of each month. You can also use another aggregation such as sum() or median(), whichever best suits the problem.
The upsampled dataset contains NaN values for every day except the days that were present in the original dataset (the total sales figure for each month).
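How the empty days are represented depends on the aggregation. The following sketch uses a made-up month-end series (the name `monthly` and its values are assumptions for illustration) to compare mean() and sum() after upsampling.

```python
import pandas as pd

# Made-up month-end totals, used only for illustration.
monthly = pd.Series(
    [200.0, 260.0, 230.0],
    index=pd.to_datetime(["2021-01-31", "2021-02-28", "2021-03-31"]),
)

# mean() leaves the newly created days empty (NaN) ...
by_mean = monthly.resample("D").mean()
# ... whereas sum() reports 0 for days that have no observation.
by_sum = monthly.resample("D").sum()

print(by_mean.isna().sum())  # days without a value after mean()
print((by_sum == 0).sum())   # days filled with 0 after sum()
```

If the gaps are meant to be interpolated afterwards, an aggregation that leaves NaN is the safer choice, because zeros would be treated as real observations.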
Now, we can fill these NaN values using a technique called interpolation. Pandas provides the DataFrame.interpolate() function (also available as Series.interpolate()) for this purpose. Interpolation fills the NaN values using one of several methods, such as 'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline', 'barycentric', or 'polynomial'. We will choose 'linear' interpolation. This draws a straight line between the available data points, in this case the values on the last of each month, and fills in values at the chosen frequency from this line.
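To see the arithmetic behind linear interpolation, here is a minimal sketch on two made-up observations four days apart. The series `sparse` and its values are hypothetical and not part of the detergent dataset.

```python
import pandas as pd

# Two made-up observations four days apart.
sparse = pd.Series(
    [10.0, 18.0],
    index=pd.to_datetime(["2021-01-01", "2021-01-05"]),
)

# Upsample to daily frequency: the three days in between become NaN.
daily = sparse.resample("D").mean()

# Linear interpolation spreads the difference (18 - 10 = 8) evenly
# over the four daily steps, i.e. 2 units per day.
filled = daily.interpolate(method="linear")
print(filled.tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0]
```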
Python3
# use interpolate function with method linear
# to fill the NaN values of the upsampled days
# linearly
interpolated = upsampled.interpolate(method='linear')
# Printing the linear interpolated values for month 2
print(interpolated.loc['2021-02'])
Upsampling with a polynomial interpolation
Another common interpolation method is to use a polynomial or a spline to connect the values. This produces smoother curves and can look more realistic on many datasets. Both the 'polynomial' and 'spline' methods require you to specify the order (the degree of the polynomial), and both rely on SciPy being installed.
Python3
# use interpolate function with method polynomial
# This fills the values of the remaining days
# using a polynomial of degree 2 (requires SciPy).
interpolated = upsampled.interpolate(method='polynomial', order=2)
# Printing the polynomial interpolated value
print(interpolated)
Thus, we can use the resample() and interpolate() functions to upsample the data. Try this out with different configurations of these functions.
Downsampling
Downsampling involves decreasing the time-frequency of the data. It is a data aggregation procedure in which we aggregate the observations from a finer time scale to a coarser one, for example from days to months, hours to days, or seconds to hours. Downsampling usually shrinks the size of the data, depending on the sampling frequency. If D is the size of the original data and D' is the size of the downsampled data, then D' < D.
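The relation D' < D can be illustrated with a made-up daily series. The names `daily` and `monthly`, the values, and the month-start alias "MS" are assumptions chosen for this sketch only.

```python
import pandas as pd

# Made-up daily series for the first six months of 2021.
days = pd.date_range("2021-01-01", "2021-06-30", freq="D")
daily = pd.Series(range(len(days)), index=days, dtype="float64")

# Aggregate to one row per month, labelled by the month start ("MS").
monthly = daily.resample("MS").mean()

print(len(daily))    # D: 181 daily observations
print(len(monthly))  # D': 6 monthly averages
```

Each monthly value summarizes all the days that fall into that month, so the individual daily values cannot be recovered from the downsampled series.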
For example, the car sales data contains daily sales values for the first 6 months. Assume the task is to predict quarterly sales. Since the data is daily, it first has to be aggregated to quarters, which calls for downsampling.
The practice dataset used in this implementation is car-sales.csv.
Example:
Python3
# import the python pandas library
import pandas as pd
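# read the data using pandas read_csv() function;
# squeeze("columns") turns the single-column DataFrame
# into a Series (squeeze=True was removed in pandas 2.0)
data = pd.read_csv("car-sales.csv", header=0,
                   index_col=0,
                   parse_dates=True).squeeze("columns")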
# printing the first 6 rows of the dataset
print(data.head(6))
We can use the quarterly resampling frequency 'Q' to aggregate the data quarter-wise. In pandas 2.2 and later, this alias is deprecated in favor of 'QE' (quarter end).
Python3
# Use resample function to downsample days
# to quarters using the mean sales of each quarter.
downsampled = data.resample('Q').mean()
# printing the downsampled data.
print(downsampled)
Now, this downsampled data can be used for predicting quarterly sales.
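Whether the quarterly figure should be a mean or a total depends on the question being asked. As a hedged illustration, the sketch below uses a made-up constant daily series (`daily_sales` is hypothetical) and the quarter-start alias "QS", chosen here as an assumption because it is accepted across pandas versions, to compute both aggregations at once.

```python
import pandas as pd

# Made-up data: 10 units sold every day in the first half of 2021.
days = pd.date_range("2021-01-01", "2021-06-30", freq="D")
daily_sales = pd.Series(10.0, index=days)

# Downsample to quarters and compute the mean and the total side by side.
quarterly = daily_sales.resample("QS").agg(["mean", "sum"])
print(quarterly)
```

The mean stays at the daily level (10 units per day), while the sum reflects the number of days in each quarter (90 and 91), so the two answers are not interchangeable.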