
Group Data by Time Intervals in Python Pandas
Data analysis has become a crucial part of nearly every industry. Many organizations depend heavily on data to make strategic decisions, forecast trends, and understand consumer behavior. In this environment, Python's Pandas library has emerged as a powerful tool, offering a wide range of functionality to manipulate, analyze, and visualize data effectively. One of these capabilities is grouping data by time intervals.
This article focuses on how to group data by time intervals using Pandas. We will explore the syntax, an easy-to-follow algorithm, and two distinct approaches, each with a fully executable code example.
Syntax
The method we'll focus on is the DataFrame groupby() method combined with pd.Grouper, which bins rows by time much as resampling does. The syntax is as follows:
df.groupby(pd.Grouper(key='date', freq='T')).sum()
In the syntax:
df: Your DataFrame.
groupby(pd.Grouper()): The call that groups the data; pd.Grouper describes how the groups are formed.
key: The column you want to group by. Here, it's the 'date' column.
freq: The frequency of the time intervals ('T' for minutes, 'H' for hours, 'D' for days, etc.). Pandas 2.2 and later deprecate 'T' and 'H' in favor of 'min' and 'h'.
sum(): The aggregation function.
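Since the syntax above is closely related to resampling, a short comparison may help. The sketch below assumes a small, made-up DataFrame with a 'date' column; DataFrame.resample() with its on argument is used here only as a point of comparison and is not part of the original syntax.

```python
import pandas as pd

# Hypothetical sample data: six readings spread over three days
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2022-01-01 00:00', '2022-01-01 12:00',
        '2022-01-02 00:00', '2022-01-02 12:00',
        '2022-01-03 00:00', '2022-01-03 12:00',
    ]),
    'value': [1, 2, 3, 4, 5, 6],
})

# Daily totals with groupby() and pd.Grouper
grouped = df.groupby(pd.Grouper(key='date', freq='D')).sum()

# The same daily totals with resample() on the 'date' column
resampled = df.resample('D', on='date').sum()

print(grouped)
print(resampled)
```

Both calls should produce the same daily totals here. pd.Grouper tends to be the more convenient choice when the time grouping has to be combined with other grouping keys.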
Algorithm
Here's the step-by-step algorithm for grouping data by time intervals:
Import the necessary libraries, i.e., Pandas.
Load or create your DataFrame.
Convert the date column to a datetime object, if it isn't already.
Apply the groupby() function with pd.Grouper on the date column with the desired frequency.
Apply the aggregation function like sum(), mean(), etc.
Print or store the result.
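The steps above can be sketched as follows. This is a minimal illustration with made-up readings whose timestamps start out as strings, so the conversion step actually does something; mean() stands in for the aggregation function.

```python
# Step 1: import the library
import pandas as pd

# Step 2: create a DataFrame (hypothetical data, dates stored as strings)
df = pd.DataFrame({
    'date': ['2022-01-01 08:00', '2022-01-01 20:00',
             '2022-01-02 09:30', '2022-01-02 18:45'],
    'value': [10.0, 20.0, 30.0, 50.0],
})

# Step 3: convert the date column to datetime
df['date'] = pd.to_datetime(df['date'])

# Steps 4 and 5: group by day, then aggregate with mean()
daily_mean = df.groupby(pd.Grouper(key='date', freq='D')).mean()

# Step 6: print the result
print(daily_mean)
```

With this data, the two readings on each day are averaged into one row per day.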
Approaches
We'll consider two distinct approaches:
Approach 1: Grouping by Daily Frequency
In this example, we create a DataFrame with a range of hourly timestamps and values. We then group the data by daily frequency and sum the values for each day.
Example
# Import pandas
import pandas as pd

# Create a dataframe
df = pd.DataFrame({
    'date': pd.date_range(start='1/1/2022', periods=100, freq='H'),
    'value': range(100)
})

# Convert 'date' to datetime object, if not already
df['date'] = pd.to_datetime(df['date'])

# Group by daily frequency
daily_df = df.groupby(pd.Grouper(key='date', freq='D')).sum()
print(daily_df)
Output
            value
date
2022-01-01    276
2022-01-02    852
2022-01-03   1428
2022-01-04   2004
2022-01-05    390
Explanation
The first thing we do in this code is import the Pandas library, which is required for this kind of data manipulation. The next step is to build a DataFrame with pd.DataFrame(). This DataFrame has two columns, 'date' and 'value'. The pd.date_range() function creates a series of 100 hourly timestamps for the 'date' column, while the 'value' column simply holds the integers 0 through 99.
Although our 'date' column already has a datetime dtype, we still call pd.to_datetime() to make sure it is converted. This step matters because the grouping operation requires the key column to hold datetime values.
After that, to group our data by a daily ('D') frequency, we use the groupby() method together with pd.Grouper. We then apply the sum() function, which adds up all of the 'value' entries that belong to the same day into a single total.
Finally, the grouped DataFrame is printed, showing the total for each day. The last day's total is smaller because only four hourly rows (96 through 99) fall on 2022-01-05.
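The daily totals in the output can be checked by hand. As a rough sketch in plain Python, without Pandas, the 100 hourly values are split into chunks of 24 rows (one chunk per day) and each chunk is summed:

```python
# The 'value' column holds 0..99, one value per hour
values = list(range(100))

# Each day covers 24 consecutive hourly rows; the last day is partial
daily_totals = [sum(values[i:i + 24]) for i in range(0, len(values), 24)]

print(daily_totals)  # [276, 852, 1428, 2004, 390]
```

These numbers match the grouped output: four full days of 24 rows each, plus a fifth day containing only the last four rows.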
Approach 2: Grouping by a custom frequency, such as 15-minute intervals
Example
# Import pandas
import pandas as pd

# Create a dataframe
df = pd.DataFrame({
    'date': pd.date_range(start='1/1/2022', periods=100, freq='T'),
    'value': range(100)
})

# Convert 'date' to datetime object, if not already
df['date'] = pd.to_datetime(df['date'])

# Group by 15-minute frequency
custom_df = df.groupby(pd.Grouper(key='date', freq='15T')).sum()
print(custom_df)
Output
                     value
date
2022-01-01 00:00:00    105
2022-01-01 00:15:00    330
2022-01-01 00:30:00    555
2022-01-01 00:45:00    780
2022-01-01 01:00:00   1005
2022-01-01 01:15:00   1230
2022-01-01 01:30:00    945
Explanation
The second approach starts with the same import of the Pandas library as the first, followed by the creation of a DataFrame. This DataFrame is identical to the one used in the previous example; the only difference is that the 'date' column now contains minute-by-minute timestamps.
The 'date' column must have a datetime dtype for the grouping operation to work properly, and the pd.to_datetime() call ensures this.
Next, we group the data using a custom frequency of 15 minutes ('15T') with pd.Grouper inside the groupby() method. To aggregate the 'value' entries for each 15-minute interval, we use the sum() function, just as in the first approach.
The code finishes by printing the grouped DataFrame, which shows the total of the 'value' column for each 15-minute interval.
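To see why the final interval's total (945) is lower than the one before it (1230), it can help to look at how many rows land in each interval. The sketch below is a variation on the example, not part of it: it uses the 'min' alias (equivalent to 'T') and agg() to compute the sum and the row count side by side.

```python
import pandas as pd

# Same data as the example: 100 minute-wise timestamps with values 0..99
df = pd.DataFrame({
    'date': pd.date_range(start='2022-01-01', periods=100, freq='min'),
    'value': range(100),
})

# Sum and row count for each 15-minute interval
summary = (
    df.groupby(pd.Grouper(key='date', freq='15min'))['value']
      .agg(['sum', 'count'])
)
print(summary)
```

The first six intervals each contain 15 rows, while the last one contains only the remaining 10 rows (values 90 through 99), which explains the smaller total.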
Conclusion
Pandas' power extends to a variety of data manipulations, one of which is grouping data by time intervals. By using the groupby() function combined with pd.Grouper, we can effectively segment data based on daily frequency or a custom frequency, allowing for efficient, flexible data analysis.
The capability to group data by time intervals enables analysts and businesses to extract meaningful insights from their data. Whether it's calculating the sum of sales every day, obtaining the average temperature every hour, or counting website hits every 15 minutes, grouping data by time intervals allows us to better understand trends, patterns, and outliers in our data over time.
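As an illustration of the last use case, counting website hits every 15 minutes might look like the sketch below. The timestamps are made up, and size() is used as the aggregation because each row represents one hit.

```python
import pandas as pd

# Hypothetical hit log: one row per page request
hits = pd.DataFrame({
    'date': pd.to_datetime([
        '2022-01-01 00:01', '2022-01-01 00:07', '2022-01-01 00:14',
        '2022-01-01 00:31', '2022-01-01 00:44', '2022-01-01 00:46',
    ]),
})

# Count the rows that fall into each 15-minute interval
hits_per_interval = hits.groupby(pd.Grouper(key='date', freq='15min')).size()
print(hits_per_interval)
```

With this data, the interval starting at 00:15 has no hits, yet it still appears in the result with a count of 0, which keeps the time axis evenly spaced.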
Remember, Python's Pandas library is a powerful tool for data analysis. Learning how to use its functions, like the groupby method, can help you become a more effective and proficient data analyst or data scientist.