Manipulating Time Series Data in Python
Last Updated :
18 Mar, 2022
Time-series data is a collection of observations (activity) for a single subject (entity) recorded at successive points in time. For metrics, the observations are equally spaced; for events, they are unequally spaced. With pandas we can attach a date and time to each record, fetch dataframe records by timestamp, and select data within a specific date and time range.
Generate a date range:
The pandas package is imported, and the pd.date_range() method creates a date range between a start and an end date. Here the range has a monthly (month-end) frequency.
Python3
# importing pandas
import pandas as pd
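# creating a month-end date range; ISO dates (YYYY-MM-DD) avoid
# day/month ambiguity. pandas 2.2+ prefers the alias 'ME' over 'M'.
Date_range = pd.date_range(start='2020-01-12', end='2021-05-20', freq='M')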
print(Date_range)
print(type(Date_range))
print(type(Date_range[0]))
Output:
DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
'2020-05-31', '2020-06-30', '2020-07-31', '2020-08-31',
'2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31',
'2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30'],
dtype='datetime64[ns]', freq='M')
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
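The freq argument accepts other aliases besides the monthly one, and periods can stand in for an end date. The following is a minimal sketch under those assumptions; the dates and the variable names daily and business are chosen only for illustration.

```python
import pandas as pd

# Seven consecutive calendar days starting on 2020-01-12 ('D' = daily)
daily = pd.date_range(start='2020-01-12', periods=7, freq='D')
print(daily)

# Business days (Monday to Friday) that fall inside the same week ('B')
business = pd.date_range(start='2020-01-12', end='2020-01-18', freq='B')
print(business)
```

Because 2020-01-12 is a Sunday and 2020-01-18 a Saturday, the business-day range keeps only the five weekdays in between.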
Operations on timestamp data:
The date range is converted into a dataframe with the pd.DataFrame() method. The column is then passed through pd.to_datetime(); because date_range() already yields datetime64 values this changes nothing here, but the step is required when the column holds strings. The info() method reports the dtype of each column and how many non-null values it contains.
Python3
# importing pandas
import pandas as pd
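# creating a month-end date range; ISO dates (YYYY-MM-DD) avoid
# day/month ambiguity. pandas 2.2+ prefers the alias 'ME' over 'M'.
Date_range = pd.date_range(start='2020-01-12', end='2021-05-20', freq='M')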
# creating a Dataframe
Data = pd.DataFrame(Date_range, columns=['Date'])
# converting the column to datetime
Data['Date'] = pd.to_datetime(Data['Date'])
# info() prints its report directly and returns None
Data.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 16 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 256.0 bytes
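Once a column has the datetime64 dtype, its components can be read through the .dt accessor. The sketch below assumes a small three-row frame named dates, built only for this illustration, and derives the year, month and weekday name from it.

```python
import pandas as pd

# A small illustrative frame with a datetime64 column
dates = pd.DataFrame(
    {'Date': pd.to_datetime(['2020-01-31', '2020-02-29', '2020-03-31'])}
)

# The .dt accessor exposes datetime components element-wise
dates['Year'] = dates['Date'].dt.year
dates['Month'] = dates['Date'].dt.month
dates['DayName'] = dates['Date'].dt.day_name()
print(dates)
```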
Convert data from a string to a timestamp:
If we have a list of strings that look like dates, we can first load it into a dataframe with the pd.DataFrame() method and then convert the column to datetime with the pd.to_datetime() method.
Python3
# importing pandas
import pandas as pd
# creating string data
string_data = ['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
'2020-05-31', '2020-06-30', '2020-07-31', '2020-08-31',
'2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31',
'2021-01-31', '2021-02-28', '2021-03-31', '2021-04-30']
Data = pd.DataFrame(string_data, columns=['Date'])
Data['Date'] = pd.to_datetime(Data['Date'])
# info() prints its report directly and returns None
Data.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 16 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 256.0 bytes
When the strings follow a custom format, we can parse them by passing the matching format codes to the datetime.strptime() function.
Python3
# importing pandas
import pandas as pd
from datetime import datetime
# string data
string_data = ['May-20-2021', 'May-21-2021', 'May-22-2021']
timestamp_data = [datetime.strptime(x, '%B-%d-%Y') for x in string_data]
print(timestamp_data)
Data = pd.DataFrame(timestamp_data, columns=['Date'])
# info() prints its report directly and returns None
Data.info()
Output:
[datetime.datetime(2021, 5, 20, 0, 0), datetime.datetime(2021, 5, 21, 0, 0), datetime.datetime(2021, 5, 22, 0, 0)]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 3 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 152.0 bytes
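The same format string can also be handed to pd.to_datetime(), which parses the whole list in one call instead of looping with strptime(). This is a minimal sketch of that alternative; the extra 'not a date' entry is a made-up value used to show how errors='coerce' turns unparseable strings into NaT.

```python
import pandas as pd

string_data = ['May-20-2021', 'May-21-2021', 'May-22-2021']

# Vectorized parsing with an explicit format
parsed = pd.to_datetime(string_data, format='%B-%d-%Y')
print(parsed)

# errors='coerce' replaces values that do not match the format with NaT
# ('not a date' is a hypothetical bad record)
coerced = pd.to_datetime(['May-20-2021', 'not a date'],
                         format='%B-%d-%Y', errors='coerce')
print(coerced)
```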
Slicing and indexing time series data:
In this example a CSV file is read, and a column holding date strings is converted to datetime using the pd.to_datetime() method. That column is then set as the index, which lets us slice and index the data by date. data.loc['2020-01-22'][:10] selects the rows for the day '2020-01-22', and the result is further sliced to return the first 10 observations on that day.
Python3
# importing pandas
import pandas as pd
# reading csv file
data = pd.read_csv('covid_data.csv')
# converting string data to datetime
data['ObservationDate'] = pd.to_datetime(data['ObservationDate'])
# setting index
data = data.set_index('ObservationDate')
print(data.head())
# indexing and slicing through the dataframe
print(data.loc['2020-01-22'][:10])
Output:
Unnamed: 0 Province/State ... Deaths Recovered
ObservationDate ...
2020-01-22 0 Anhui ... 0.0 0.0
2020-01-22 1 Beijing ... 0.0 0.0
2020-01-22 2 Chongqing ... 0.0 0.0
2020-01-22 3 Fujian ... 0.0 0.0
2020-01-22 4 Gansu ... 0.0 0.0
[5 rows x 7 columns]
Unnamed: 0 Province/State ... Deaths Recovered
ObservationDate ...
2020-01-22 0 Anhui ... 0.0 0.0
2020-01-22 1 Beijing ... 0.0 0.0
2020-01-22 2 Chongqing ... 0.0 0.0
2020-01-22 3 Fujian ... 0.0 0.0
2020-01-22 4 Gansu ... 0.0 0.0
2020-01-22 5 Guangdong ... 0.0 0.0
2020-01-22 6 Guangxi ... 0.0 0.0
2020-01-22 7 Guizhou ... 0.0 0.0
2020-01-22 8 Hainan ... 0.0 0.0
2020-01-22 9 Hebei ... 0.0 0.0
[10 rows x 7 columns]
In this example, we slice data from '2020-01-22' to '2020-02-22'. When slicing a datetime index with date strings, both endpoints are included.
Python3
# importing pandas
import pandas as pd
# reading csv file
data = pd.read_csv('covid_data.csv')
# converting string data to datetime
data['ObservationDate'] = pd.to_datetime(data['ObservationDate'])
# setting index
data = data.set_index('ObservationDate')
# indexing and slicing through the dataframe
print(data.loc['2020-01-22':'2020-02-22'])
Output:
Unnamed: 0 Province/State ... Deaths Recovered
ObservationDate ...
2020-01-22 0 Anhui ... 0.0 0.0
2020-01-22 1 Beijing ... 0.0 0.0
2020-01-22 2 Chongqing ... 0.0 0.0
2020-01-22 3 Fujian ... 0.0 0.0
2020-01-22 4 Gansu ... 0.0 0.0
... ... ... ... ... ...
2020-02-22 2169 San Antonio, TX ... 0.0 0.0
2020-02-22 2170 Seattle, WA ... 0.0 1.0
2020-02-22 2171 Tempe, AZ ... 0.0 0.0
2020-02-22 2172 Unknown ... 0.0 0.0
2020-02-22 2173 NaN ... 0.0 0.0
[2174 rows x 7 columns]
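A datetime index also accepts partial date strings, so a whole month can be selected without spelling out its first and last day. The sketch below does not use the CSV file; it assumes a synthetic daily frame named frame so that it can run on its own.

```python
import pandas as pd

# Synthetic daily data for the first quarter of 2020 (illustration only)
idx = pd.date_range('2020-01-01', '2020-03-31', freq='D')
frame = pd.DataFrame({'value': range(len(idx))}, index=idx)

# Partial string indexing: every row that falls in February 2020
feb = frame.loc['2020-02']
print(len(feb))

# Label-based slicing includes both endpoints
window = frame.loc['2020-01-30':'2020-02-02']
print(window)
```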
Resampling time series data for various aggregates/summary statistics for different time periods:
To resample time-series data, use the pandas resample() function, a convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index; otherwise the caller must pass the label of a datetime-like column or index level to the on or level keyword argument.
Python3
# importing pandas
import pandas as pd
# reading csv file
data = pd.read_csv('covid_data.csv')
# converting string data to datetime
data['ObservationDate'] = pd.to_datetime(data['ObservationDate'])
# setting index
data = data.set_index('ObservationDate')
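# resampling data by year and averaging the numeric columns
# (numeric_only=True skips text columns; pandas 2.2+ prefers 'YE' over 'Y')
data = data.resample('Y').mean(numeric_only=True)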
print(data)
Output:
Unnamed: 0 Confirmed Deaths Recovered
ObservationDate
2020-12-31 96232.5 39696.116550 1160.959453 24659.893368
2021-12-31 249447.0 163315.277678 3514.893386 93925.632661
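Resampling is not limited to yearly means. As a minimal sketch, the snippet below builds a synthetic daily series (the values are simply 0 to 89, chosen for illustration) and computes several monthly aggregates at once with agg().

```python
import pandas as pd

# Synthetic daily series covering January to March 2021 (illustration only)
idx = pd.date_range('2021-01-01', periods=90, freq='D')
series = pd.Series(range(90), index=idx)

# 'MS' labels each bin with the month start; agg() applies several
# summary statistics to every monthly bin in one pass
monthly = series.resample('MS').agg(['sum', 'mean', 'max'])
print(monthly)
```

Each row of the result describes one calendar month, and each column holds one of the requested statistics.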
Calculate a rolling statistic like a rolling average:
The rolling() method computes statistics over a sliding window. Rolling windows are most commonly used in signal processing and time-series analysis: we take k consecutive values at a time and apply an operation such as a sum or a mean to them. In the simplest case, all k values are weighted equally. The first k-1 rows have no complete window, so their result is NaN. In the example below the window size is 5.
Python3
# importing pandas
import pandas as pd
# reading csv file
data = pd.read_csv('covid_data.csv')
# converting string data to datetime
data['ObservationDate'] = pd.to_datetime(data['ObservationDate'])
data['Last Update'] = pd.to_datetime(data['Last Update'])
# setting index
data = data.set_index('ObservationDate')
data = data[['Last Update', 'Confirmed']]
# rolling sum over 5 consecutive rows of the numeric 'Confirmed' column
data['rolling_sum'] = data['Confirmed'].rolling(5).sum()
print(data.head())
Output:
Last Update Confirmed rolling_sum
ObservationDate
2020-01-22 2020-01-22 17:00:00 1.0 NaN
2020-01-22 2020-01-22 17:00:00 14.0 NaN
2020-01-22 2020-01-22 17:00:00 6.0 NaN
2020-01-22 2020-01-22 17:00:00 1.0 NaN
2020-01-22 2020-01-22 17:00:00 0.0 22.0
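The leading NaN values come from incomplete windows. The sketch below reuses the five Confirmed values shown in the output above and compares the default behaviour with min_periods=1, which allows partial windows, and with a 3-row rolling mean. The variable names are chosen for illustration.

```python
import pandas as pd

# The first five 'Confirmed' values from the output above
confirmed = pd.Series([1.0, 14.0, 6.0, 1.0, 0.0])

# Default: a result appears only once 5 values are available
full = confirmed.rolling(5).sum()

# min_periods=1 also emits results for partial windows
partial = confirmed.rolling(5, min_periods=1).sum()

# A rolling average over 3 consecutive values
avg = confirmed.rolling(3).mean()

print(pd.DataFrame({'full': full, 'partial': partial, 'avg': avg}))
```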
Dealing with missing data:
In the previous example, the rolling_sum column contains NaN values, so we can use it to demonstrate how to deal with missing data.
Missing values appear as NaN in a dataframe, for example when a CSV file has empty cells. fillna() replaces NaN values with values of our choosing, whereas dropna() removes the rows (or columns) that contain them. Passing backfill as the method argument of fillna() fills each gap with the next valid value (backward fill), and passing ffill fills it with the last valid value (forward fill). Recent pandas versions deprecate the method argument in favour of the equivalent bfill() and ffill() methods.
Python3
# importing pandas
import pandas as pd
# reading csv file
data = pd.read_csv('covid_data.csv')
# converting string data to datetime
data['ObservationDate'] = pd.to_datetime(data['ObservationDate'])
data['Last Update'] = pd.to_datetime(data['Last Update'])
# setting index
data = data.set_index('ObservationDate')
data = data[['Last Update', 'Confirmed']]
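# rolling sum over 5 consecutive rows of the numeric 'Confirmed' column
data['rolling_sum'] = data['Confirmed'].rolling(5).sum()
print(data.head())
# dealing with missing data: bfill() fills each NaN with the next valid
# value, like fillna(method='backfill'), which newer pandas deprecates
data['rolling_backfilled'] = data['rolling_sum'].bfill()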
print(data.head(5))
Output:
Last Update Confirmed rolling_sum
ObservationDate
2020-01-22 2020-01-22 17:00:00 1.0 NaN
2020-01-22 2020-01-22 17:00:00 14.0 NaN
2020-01-22 2020-01-22 17:00:00 6.0 NaN
2020-01-22 2020-01-22 17:00:00 1.0 NaN
2020-01-22 2020-01-22 17:00:00 0.0 22.0
Last Update Confirmed rolling_sum rolling_backfilled
ObservationDate
2020-01-22 2020-01-22 17:00:00 1.0 NaN 22.0
2020-01-22 2020-01-22 17:00:00 14.0 NaN 22.0
2020-01-22 2020-01-22 17:00:00 6.0 NaN 22.0
2020-01-22 2020-01-22 17:00:00 1.0 NaN 22.0
2020-01-22 2020-01-22 17:00:00 0.0 22.0 22.0
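The other strategies mentioned above can be compared side by side on a small series. This is a minimal sketch; the values in s are made up for illustration and do not come from the CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps at the start, in the middle and at the end
s = pd.Series([np.nan, 22.0, np.nan, np.nan, 30.0, np.nan])

forward = s.ffill()     # carry the last valid value forward
backward = s.bfill()    # pull the next valid value backward
constant = s.fillna(0)  # replace every NaN with a fixed value
dropped = s.dropna()    # remove the missing entries altogether

print(pd.DataFrame({'original': s, 'ffill': forward,
                    'bfill': backward, 'fillna(0)': constant}))
print(dropped)
```

A forward fill leaves the leading NaN in place because there is no earlier value to carry forward, and a backward fill likewise leaves the trailing NaN.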
Fundamentals of Unix/epoch time:
We may come across time values expressed in Unix time while working with time-series data. Unix time, also known as epoch time, is the number of seconds elapsed since 00:00:00 Coordinated Universal Time (UTC) on Thursday, January 1, 1970. Because it always refers to UTC, Unix time lets us interpret timestamps without being confused by time zones, daylight saving time, and similar factors.
In the example below we convert epoch time to a timestamp using the pd.to_datetime() method with unit='s'. To convert the UTC time to a particular time zone, we first mark the timestamp as UTC with tz_localize() and then call tz_convert(). Here we convert it to the 'Europe/Berlin' time zone.
Python3
# importing pandas
import pandas as pd
# epoch time
epoch = 1598776989
# converting to timestamp
timestamp = pd.to_datetime(epoch, unit='s')
print(timestamp)
# converting it to a particular time zone
print(timestamp.tz_localize('UTC').tz_convert('Europe/Berlin'))
Output:
2020-08-30 08:43:09
2020-08-30 10:43:09+02:00
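The conversion also works in the opposite direction and on whole columns. The sketch below assumes the same epoch value as above, plus a second value one hour later that is added only for illustration.

```python
import pandas as pd

# From a timezone-aware timestamp back to epoch seconds
timestamp = pd.Timestamp('2020-08-30 08:43:09', tz='UTC')
epoch = int(timestamp.timestamp())
print(epoch)

# Vectorized: convert a Series of epoch seconds to UTC in one step,
# then shift the whole column to another time zone with .dt.tz_convert()
epochs = pd.Series([1598776989, 1598776989 + 3600])
utc_times = pd.to_datetime(epochs, unit='s', utc=True)
berlin = utc_times.dt.tz_convert('Europe/Berlin')
print(berlin)
```

Passing utc=True makes the result timezone-aware from the start, so a separate tz_localize('UTC') call is not needed.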