Grouping Rows in pandas

Pandas GroupBy

Last Updated : 15 May, 2025

The groupby() function in Pandas splits a dataset into groups based on one or more keys and lets us apply a function to each group independently. It is the standard tool for aggregation, transformation and filtration on grouped data.

Concepts of groupby()

The groupby() operation is divided into three main steps:

Step 1: Splitting Data into Groups

The splitting process refers to dividing the dataset into groups based on a particular condition or key. This can be done using the groupby() function by passing one or more columns as keys.

Common ways to split data and to inspect the resulting groups are as follows:

1. Group data by a single key: To group data by one key, we pass a single column name to groupby(). Here we group the data by the Name column.

Python

import pandas as pd

data1 = {'Name':['Jai', 'Anuj', 'Jai', 'Princi',
                 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
        'Age':[27, 24, 22, 32,
               33, 36, 27, 32],
        'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj',
                   'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd',
                         'B.Tech', 'B.com', 'Msc', 'MA']}
df = pd.DataFrame(data1)
print(df)


# groupby() returns a GroupBy object; .groups maps each group key to its row labels
print(df.groupby('Name').groups)

Output:

[Figure: Custom dataset]

Now we display the first entry of each group using first().

Python

gk = df.groupby('Name')
gk.first()

Output:

2. Grouping data with multiple keys: To group data by multiple keys, we pass a list of column names to groupby(). Here we group the data by "Name" and "Qualification" together.

Python

# With several keys, each group is identified by a tuple of key values
print(df.groupby(['Name', 'Qualification']).groups)

Output:

{('Abhi', 'MA'): [7], ('Anuj', 'B.com'): [5], ('Anuj', 'MA'): [1], ('Gaurav', 'B.Tech'): [4], ('Jai', 'MCA'): [2], ('Jai', 'Msc'): [0], ('Princi', 'Msc'): [6], ('Princi', 'Phd'): [3]}

3. Grouping data by sorting keys: groupby() sorts the group keys by default (sort=True). Here we apply groupby() with the default sorting.

Python

df.groupby('Name')['Age'].sum()

Output:

Name
Abhi      32
Anuj      60
Gaurav    33
Jai       49
Princi    59
Name: Age, dtype: int64

Now we apply groupby() without sorting by passing sort=False. The groups then appear in the order in which their keys first occur in the data.

Python

df.groupby(['Name'], sort=False)['Age'].sum()

Output:

Name
Jai       49
Anuj      60
Princi    59
Gaurav    33
Abhi      32
Name: Age, dtype: int64
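To make the effect of sort concrete, the following minimal sketch compares the key order produced with and without sorting. It rebuilds only the Name and Age columns of the dataset above; the variable names sorted_sums and unsorted_sums are illustrative.

```python
import pandas as pd

# Name and Age columns from the dataset used in this article
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

sorted_sums = df.groupby('Name')['Age'].sum()                # sort=True is the default
unsorted_sums = df.groupby('Name', sort=False)['Age'].sum()  # keep first-appearance order

print(list(sorted_sums.index))    # ['Abhi', 'Anuj', 'Gaurav', 'Jai', 'Princi']
print(list(unsorted_sums.index))  # ['Jai', 'Anuj', 'Princi', 'Gaurav', 'Abhi']

# Sorting only changes the order of the result, not the aggregated values
print(sorted_sums.to_dict() == unsorted_sums.to_dict())  # True
```

When the key order does not matter, turning sorting off may save some time on large inputs.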

4. Grouping data with object attributes: The groups attribute of a GroupBy object behaves like a dictionary whose keys are the unique group values and whose values are the index labels (row labels) of the rows in each group.

Python

df.groupby('Name').groups

Output:

{'Abhi': [7], 'Anuj': [1, 5], 'Gaurav': [4], 'Jai': [0, 2], 'Princi': [3, 6]}
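Because groups behaves like a dictionary, its values can be passed to df.loc to pull out the rows of one group. The sketch below assumes the same Name and Age data as above; anuj_rows is an illustrative name.

```python
import pandas as pd

# Name and Age columns from the dataset used in this article
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

groups = df.groupby('Name').groups

# Each value holds the row labels of one group
for name, labels in groups.items():
    print(name, list(labels))

# Use the stored labels to select the rows of a single group
anuj_rows = df.loc[groups['Anuj']]
print(anuj_rows)
```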

5. Iterating through groups: To visit each group in turn, we iterate over the GroupBy object. Every iteration yields a pair consisting of the group name and a DataFrame with the rows of that group.

Python

grp = df.groupby('Name')
for name, group in grp:
    print(name)
    print(group)
    print()

Output:

Now we iterate over groups formed from multiple keys. In this case each group name is a tuple of key values.

Python

grp = df.groupby(['Name', 'Qualification'])
for name, group in grp:
    print(name)
    print(group)
    print()

Output:
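With multiple keys, the group name can be unpacked directly in the loop header. As a small sketch under the same data assumptions, the loop below records how many rows each (Name, Qualification) pair contains; group_sizes is a hypothetical helper dictionary.

```python
import pandas as pd

# Name, Qualification and Age columns from the dataset used in this article
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd', 'B.Tech', 'B.com', 'Msc', 'MA'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

group_sizes = {}
for (name, qualification), group in df.groupby(['Name', 'Qualification']):
    # group is a DataFrame holding only the rows for this key pair
    group_sizes[(name, qualification)] = len(group)

print(group_sizes[('Jai', 'Msc')])  # 1
print(len(group_sizes))             # 8
```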

6. Selecting a group: We can select a single group with GroupBy.get_group(), which returns the rows that belong to the given key.

Python

grp = df.groupby('Name')
grp.get_group('Jai')

Output:

Now we select a group from an object grouped on multiple columns by passing the key values as a tuple.

Python

grp = df.groupby(['Name', 'Qualification'])
grp.get_group(('Jai', 'Msc'))

Output:
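get_group() raises a KeyError when the requested key does not exist, so it can help to guard the call. The following sketch, using the same Name and Age data, shows a successful lookup and a failed one; the key 'Nobody' and the flag found are made up for illustration.

```python
import pandas as pd

# Name and Age columns from the dataset used in this article
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

grp = df.groupby('Name')

# Existing key: returns the rows of that group with their original labels
jai = grp.get_group('Jai')
print(jai['Age'].tolist())  # [27, 22]

# Missing key: get_group() raises KeyError
try:
    grp.get_group('Nobody')
except KeyError:
    found = False
else:
    found = True
print(found)  # False
```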

Step 2: Applying Functions to Groups

After splitting the data into groups, we apply a function to each group. The operations generally fall into the following categories:

1. Aggregation: It computes a summary statistic for each group (e.g. summing or averaging values).

Python

grp1 = df.groupby('Name')
# Sum the ages within each group; the string 'sum' needs no NumPy import
grp1['Age'].aggregate('sum')

Output:

Now we perform aggregation on a group containing multiple keys.

Python

grp1 = df.groupby(['Name', 'Qualification'])
# Sum the ages for every (Name, Qualification) pair
grp1['Age'].aggregate('sum')

Output:

We can apply multiple functions at once by passing a list or dictionary of functions.

Python

grp = df.groupby('Name')
# One output column per function name in the list
grp['Age'].agg(['sum', 'mean', 'std'])

Output:
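The dictionary form lets each column receive its own functions. As a hedged sketch, the example below borrows the Score column from the second dataset in this article and also shows named aggregation; the output column names mean_age and max_score are chosen here for illustration.

```python
import pandas as pd

# Name, Age and Score columns from the datasets used in this article
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
    'Score': [23, 34, 35, 45, 47, 50, 52, 53],
})

grp = df.groupby('Name')

# Dictionary: map each column to one function or to a list of functions
per_column = grp.agg({'Age': 'mean', 'Score': ['min', 'max']})
print(per_column)

# Named aggregation: output_name=(column, function)
summary = grp.agg(mean_age=('Age', 'mean'), max_score=('Score', 'max'))
print(summary.loc['Anuj', 'mean_age'])  # 30.0
print(summary.loc['Jai', 'max_score'])  # 35
```

The dictionary form produces two-level column labels such as ('Score', 'max'), while named aggregation gives flat column names, which is often easier to work with afterwards.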

2. Transformation: It is a process in which we perform some group-specific computations and return a result with the same index as the original data.

Here we use a second dataset, which contains the same records as the first plus a Score column.

Python

import pandas as pd

data2 = {'Name':['Jai', 'Anuj', 'Jai', 'Princi',
                 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
        'Age':[27, 24, 22, 32,
               33, 36, 27, 32],
        'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj',
                   'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd',
                         'B.Tech', 'B.com', 'Msc', 'MA'],
        'Score':[23, 34, 35, 45, 47, 50, 52, 53]}
# Build the DataFrame that the following examples refer to as df2
df2 = pd.DataFrame(data2)

grp2 = df2.groupby('Name')
# Standardize each value within its own group (z-score)
sc = lambda x: (x - x.mean()) / x.std()
grp2['Age'].transform(sc)

Output:

[Figure: Transformation]
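One consequence of the z-score lambda above is worth checking: a group with a single row has an undefined sample standard deviation, so its transformed value is NaN. The sketch below, built on the same Name and Age data, verifies that the result keeps the original index and shows this edge case; z is an illustrative name.

```python
import pandas as pd

# Name and Age columns from the dataset used in this article
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

# Standardize Age within each Name group
z = df.groupby('Name')['Age'].transform(lambda x: (x - x.mean()) / x.std())

print(z.index.equals(df.index))  # True: one value per original row
print(round(z[0], 4))            # 0.7071 (Jai, age 27, group mean 24.5)
print(pd.isna(z[4]))             # True: Gaurav forms a single-row group
```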

3. Filtration: It discards entire groups based on a condition evaluated once per group. For example, we can filter out groups that have fewer records than a certain threshold. Here we keep only the names that appear at least twice.

Python

grp2 = df2.groupby('Name')
grp2.filter(lambda x: len(x) >= 2)

Output:
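As a quick check of what filter() returns, the sketch below applies the same condition to the Name and Age columns and inspects which rows survive. The surviving rows keep their original order and index labels; kept is an illustrative name.

```python
import pandas as pd

# Name and Age columns from the dataset used in this article
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

# Keep only the groups that contain at least two rows
kept = df.groupby('Name').filter(lambda x: len(x) >= 2)

print(list(kept.index))               # [0, 1, 2, 3, 5, 6]
print(sorted(kept['Name'].unique()))  # ['Anuj', 'Jai', 'Princi']
```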

Step 3: Combining Results

After applying the functions, results are combined back into a data structure (like a DataFrame or Series). We can further analyze or visualize the data in its new form.

Python

df.groupby('Name').agg({'Age': 'sum'})

Output:

[Figure: Combining Results]
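The shape of the combined result depends on how the aggregation is requested. A minimal sketch, assuming the same Name and Age data: selecting one column returns a Series, a dictionary passed to agg() returns a DataFrame indexed by the group key, and as_index=False keeps the key as a regular column. The names as_series, as_frame and flat are illustrative.

```python
import pandas as pd

# Name and Age columns from the dataset used in this article
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

as_series = df.groupby('Name')['Age'].sum()             # Series indexed by Name
as_frame = df.groupby('Name').agg({'Age': 'sum'})       # DataFrame indexed by Name
flat = df.groupby('Name', as_index=False)['Age'].sum()  # Name stays a column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(list(flat.columns))        # ['Name', 'Age']
```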

With the ability to group data by one or more keys and apply various operations, groupby simplifies tasks such as summarizing data, transforming values, or filtering groups, making it essential for efficient and scalable data analysis.
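To tie the three steps together, the sketch below performs split, apply and combine by hand and compares the result with a single groupby() call. It only illustrates the idea and is not a description of how Pandas implements it internally; pieces, applied and combined are hypothetical names.

```python
import pandas as pd

# Name and Age columns from the dataset used in this article
df = pd.DataFrame({
    'Name': ['Jai', 'Anuj', 'Jai', 'Princi', 'Gaurav', 'Anuj', 'Princi', 'Abhi'],
    'Age': [27, 24, 22, 32, 33, 36, 27, 32],
})

# Step 1: split into one DataFrame per key using boolean masks
pieces = {name: df[df['Name'] == name] for name in df['Name'].unique()}

# Step 2: apply a function to each piece
applied = {name: piece['Age'].sum() for name, piece in pieces.items()}

# Step 3: combine the per-group results into one Series
combined = pd.Series(applied, name='Age')

# The same three steps in a single groupby() expression
direct = df.groupby('Name')['Age'].sum()
print(combined.to_dict() == direct.to_dict())  # True
```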

ABHISHEK TIWARI 13
