Different ways to iterate over rows in Pandas DataFrame

Last Updated : 27 Nov, 2024

Iterating over rows in a Pandas DataFrame lets you access row-wise data for operations like filtering or transformation. The most common methods include iterrows(), itertuples(), and apply(). However, iteration can be slow for large datasets, so vectorized operations are often preferred.
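As a point of comparison for the methods that follow, here is a minimal sketch of the vectorized alternative mentioned above. It uses the same illustrative sales data as the later examples; the `Total` column name is an assumption made for this sketch.

```python
import pandas as pd

# Same illustrative sales data used throughout this article
df = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Orange'],
    'Quantity': [10, 20, 30],
    'Price': [0.5, 0.3, 0.7],
})

# Vectorized: multiply whole columns at once, with no Python-level loop over rows
# ('Total' is a column name chosen for this sketch)
df['Total'] = df['Quantity'] * df['Price']
print(df)
```

When a calculation can be written as an expression on whole columns like this, no row iteration is needed at all.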

Let’s start with a quick example using iterrows(). It is a reasonable choice for smaller datasets or for row-specific operations that can’t be easily vectorized. For larger datasets or performance-critical tasks, you may want to explore more efficient methods like itertuples() or, in some cases, apply().

Method 1: Using `iterrows()` - For smaller datasets

For example, suppose you have a DataFrame representing sales data and you want to calculate the total sales by multiplying the quantity by the price for each row. One way to do this is to iterate over the rows. Here’s a quick look at how it works:

Python

import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],'Quantity': [10, 20, 30],'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# Iterating over rows
for index, row in df.iterrows():
    total_sales = row['Quantity'] * row['Price']
    print(f"{row['Item']} Total Sales: ${total_sales}")

Output

Apple Total Sales: $5.0
Banana Total Sales: $6.0
Orange Total Sales: $21.0

In this example, the iterrows() method iterates over each row of the DataFrame, and we calculate the total sales for each item. Now, let’s explore how you can loop through rows, why different methods exist, and when to use each.

Method 2: Using `itertuples()` - For larger datasets

itertuples() is another method for iterating over rows, and it is usually more efficient than iterrows(). Unlike iterrows(), it returns each row as a named tuple instead of a Series. This can be faster, especially for larger datasets, because building a named tuple is cheaper than building a Series for every row. This method also preserves each column's data type, which makes it ideal when you need to access row data quickly and in a structured format. By default, the first field of each tuple is the row's index, available as row.Index.

Example:

Python

import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],'Quantity': [10, 20, 30],'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)
# Iterating using itertuples()
for row in df.itertuples():
    total_sales = row.Quantity * row.Price
    print(f"{row.Item} Total Sales: ${total_sales}")

Output

Apple Total Sales: $5.0
Banana Total Sales: $6.0
Orange Total Sales: $21.0
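The point about data types is easier to see on a numeric-only DataFrame. The sketch below uses a small hypothetical frame with one integer column and one float column, and compares the type each method hands back for the integer value.

```python
import pandas as pd

# Hypothetical numeric-only frame: one integer column, one float column
df = pd.DataFrame({'Quantity': [10, 20], 'Price': [0.5, 0.3]})

# iterrows() packs each row into a single Series, so the integer
# Quantity is upcast to float to share one dtype with Price
_, first_series = next(df.iterrows())
print(type(first_series['Quantity']))

# itertuples() keeps each field's own type; index=False drops the
# leading Index field from the named tuple
first_tuple = next(df.itertuples(index=False))
print(type(first_tuple.Quantity))
```

If the integer type matters downstream, for example when a value is used as a count or a key, this difference is one more reason to prefer itertuples().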

Method 3: Using `apply()` - For complex row-wise transformations

The apply() method applies a function to each row (axis=1) or each column (axis=0) of a DataFrame. With axis=1, pandas still calls the function once per row, so this is not a vectorized operation; it is usually faster than iterrows() but slower than a true column-wise expression. Its strength is convenience: apply() lets you express complex row-wise or column-wise transformations without writing an explicit loop, and it collects the results into a Series or DataFrame for you.

Python

import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],'Quantity': [10, 20, 30],'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)
# Defining a function to calculate total sales for each row
def calculate_total_sales(row):
    return f"{row['Item']} Total Sales: ${row['Quantity'] * row['Price']}"

# Applying the function row-wise
result = df.apply(calculate_total_sales, axis=1)
for res in result:
    print(res)

Output

Apple Total Sales: $5.0
Banana Total Sales: $6.0
Orange Total Sales: $21.0
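apply() becomes more useful when the per-row logic has branches that are awkward to write as a single column expression. The sketch below assumes a made-up pricing rule (a 10% discount for orders of 20 units or more) and stores the result in a new `Total` column; the rule, the function name and the column name are all hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Orange'],
    'Quantity': [10, 20, 30],
    'Price': [0.5, 0.3, 0.7],
})

# Hypothetical pricing rule, used only for illustration:
# orders of 20 units or more get a 10% discount
def discounted_total(row, threshold=20, discount=0.10):
    total = row['Quantity'] * row['Price']
    if row['Quantity'] >= threshold:
        total *= 1 - discount
    return total

# axis=1 passes each row to the function; the returned values form a Series
df['Total'] = df.apply(discounted_total, axis=1)
print(df)
```

Returning a number rather than a formatted string keeps the new column usable for further calculations.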

Method 4: Index-based Iteration (`iloc[]` or `loc[]`) – For specific rows

Index-based iteration uses iloc[] (integer position) or loc[] (label) to access specific rows or slices of a DataFrame. The approach is straightforward, but every lookup inside the loop builds a new Series, which makes it slow on large datasets.

Python

import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],'Quantity': [10, 20, 30],'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# Iterate over rows using .iloc[] (index-based)
print("Using iloc[] for index-based iteration:")
for i in range(len(df)):
    print(f"Row {i}: {df.iloc[i]}")

# Iterate over rows using .loc[] (label-based)
print("\nUsing loc[] for label-based iteration:")
for i in df.index:  # Here, index defaults to 0, 1, 2
    print(f"Row {i}: {df.loc[i]}")

Output

Using iloc[] for index-based iteration:
Row 0: Item        Apple
Quantity       10
Price         0.5
Name: 0, dtype: object
Row 1: Item        Banana
Quantity        20
Price          0.3
Name: 1, dty...

Index-based iteration is useful when you need precise control over which rows to access or modify. However, it should be avoided for large-scale iterations due to its slower performance compared to vectorized operations.
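As a minimal sketch of that kind of precise control, the example below visits only selected rows by position and then updates a single cell by label. The step of 2 and the new price are arbitrary values chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Orange'],
    'Quantity': [10, 20, 30],
    'Price': [0.5, 0.3, 0.7],
})

# Visit only every other row by integer position (rows 0 and 2)
for pos in range(0, len(df), 2):
    row = df.iloc[pos]
    print(pos, row['Item'], row['Quantity'])

# Update one cell by row label and column name; the new price is made up
df.loc[1, 'Price'] = 0.35
print(df.loc[1, 'Price'])
```

Writing through df.loc[row_label, column] modifies the DataFrame itself, whereas the row objects produced by iterrows() or itertuples() should be treated as read-only copies.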

Which Method to Use and When

For small datasets or debugging, all methods work well, but for larger datasets:

Use itertuples() for better performance when iterating.
Use apply() for concise row-wise transformations without an explicit loop, keeping in mind that with axis=1 it still runs the function once per row.

Method	Notes
`iterrows()`	Converts rows to Series, which can be inefficient for large datasets.
`itertuples()`	Preserves data types and is more efficient than `iterrows()`.
`apply()`	Convenient for transformations or calculations; with axis=1 it still calls the function once per row.
Index-based	Useful for precise control but less efficient for large datasets.
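
Relative speed depends on the data and the pandas version, so it is worth measuring on your own workload. The following is a rough timing sketch, not a rigorous benchmark: the synthetic data, the row count and the helper function names are all assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data for illustration only; the size is an arbitrary choice
n = 5_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Quantity': rng.integers(1, 100, size=n),
    'Price': rng.random(n),
})

def with_iterrows():
    return [row['Quantity'] * row['Price'] for _, row in df.iterrows()]

def with_itertuples():
    return [row.Quantity * row.Price for row in df.itertuples(index=False)]

def with_apply():
    return df.apply(lambda row: row['Quantity'] * row['Price'], axis=1)

def with_vectorized():
    return df['Quantity'] * df['Price']

timings = {}
for func in (with_iterrows, with_itertuples, with_apply, with_vectorized):
    # Best of three single runs, in seconds
    timings[func.__name__] = min(timeit.repeat(func, number=1, repeat=3))

for name, seconds in timings.items():
    print(f"{name:16s} {seconds:.5f} s")
```

Taking the minimum of several runs reduces noise from other processes; for more careful measurements, increase the row count and the number of repeats.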
