Different ways to iterate over rows in Pandas DataFrame

Last Updated : 27 Nov, 2024

Iterating over rows in a Pandas DataFrame lets you access row-wise data for operations like filtering or transformation. The most common methods include iterrows(), itertuples(), and apply(). However, iteration can be slow for large datasets, so vectorized operations are often preferred.

Let’s start with a quick example using iterrows(). It is a good choice when dealing with smaller datasets or when you need to perform row-specific operations that can’t be easily vectorized. However, for larger datasets or performance-critical tasks, you may want to explore alternatives such as itertuples() or apply().

Method 1: Using iterrows() - For smaller datasets

For example, suppose you have a DataFrame representing sales data and want to calculate the total sales for each row by multiplying the quantity by the price. To do this with a loop, you iterate over the rows. Here’s a quick look at how it works:

Python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],'Quantity': [10, 20, 30],'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# Iterating over rows
for index, row in df.iterrows():
    total_sales = row['Quantity'] * row['Price']
    print(f"{row['Item']} Total Sales: ${total_sales}")

Output
Apple Total Sales: $5.0
Banana Total Sales: $6.0
Orange Total Sales: $21.0

In this example, iterrows() iterates over each row of the DataFrame, yielding an (index, Series) pair, and we calculate the total sales for each item. Now, let’s explore the other ways to loop through rows, why different methods exist, and when to use each.
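As a small extension of the iterrows() loop above, the per-row results can be collected and stored instead of printed. This is only a sketch: the list name `totals` and the new column name `Total` are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Orange'],
    'Quantity': [10, 20, 30],
    'Price': [0.5, 0.3, 0.7],
})

# Collect one result per row; `totals` is a hypothetical helper list
totals = []
for index, row in df.iterrows():
    totals.append(row['Quantity'] * row['Price'])

# Attach the collected values as a new column once the loop has finished
df['Total'] = totals
print(df)
```

Assigning the column after the loop avoids changing the DataFrame while it is being iterated.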

Method 2: Using itertuples() - For larger datasets

itertuples() is another method for iterating over rows, and usually a more efficient one. Unlike iterrows(), it returns each row as a named tuple instead of a Series. This can be faster, especially for larger datasets, because named tuples are more lightweight. It also preserves data types, which makes it ideal for scenarios where you need to access row data quickly and in a structured format.

Example:

Python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],'Quantity': [10, 20, 30],'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)
# Iterating using itertuples()
for row in df.itertuples():
    total_sales = row.Quantity * row.Price
    print(f"{row.Item} Total Sales: ${total_sales}")

Output
Apple Total Sales: $5.0
Banana Total Sales: $6.0
Orange Total Sales: $21.0
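The point about itertuples() preserving data types is easiest to see on a numeric-only DataFrame. The sketch below assumes a reduced version of the sales data without the Item column. In that case iterrows() packs each row into a single Series and upcasts the integer quantity to float, while itertuples() leaves it as an integer. The variable names are illustrative.

```python
import pandas as pd

# Numeric-only frame: one integer column and one float column
df = pd.DataFrame({'Quantity': [10, 20, 30], 'Price': [0.5, 0.3, 0.7]})

# iterrows(): the row is one Series, so both values share a float dtype
_, first_series = next(df.iterrows())
print(first_series.dtype, first_series['Quantity'])

# itertuples(): each field keeps the type of its own column
first_tuple = next(df.itertuples())
print(type(first_tuple.Quantity), first_tuple.Quantity)

# index=False drops the index field; name=None yields plain tuples
plain = next(df.itertuples(index=False, name=None))
print(plain)
```

Plain tuples (name=None) are accessed by position rather than by attribute, which is a reasonable trade when only raw values are needed.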

Method 3: Using apply() - For complex row-wise transformations

The apply() method applies a function to each row or column of a DataFrame. With axis=1 the function is called once per row, so it is not a vectorized operation, and its speed is typically closer to a Python loop than to column-wise arithmetic. Its main strength is versatility: it expresses complex row-wise or column-wise transformations concisely, without writing an explicit loop.

Python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],'Quantity': [10, 20, 30],'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)
# Defining a function to calculate total sales for each row
def calculate_total_sales(row):
    return f"{row['Item']} Total Sales: ${row['Quantity'] * row['Price']}"

# Applying the function row-wise
result = df.apply(calculate_total_sales, axis=1)
for res in result:
    print(res)

Output
Apple Total Sales: $5.0
Banana Total Sales: $6.0
Orange Total Sales: $21.0
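When the goal is a new column rather than printed text, the function passed to apply() can return a plain number. The following is a minimal sketch, assuming the same sales data. The column name `Total` is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Orange'],
    'Quantity': [10, 20, 30],
    'Price': [0.5, 0.3, 0.7],
})

# axis=1 passes each row to the function as a Series;
# the returned values are gathered into a new Series aligned with df's index
df['Total'] = df.apply(lambda row: row['Quantity'] * row['Price'], axis=1)
print(df)
```

For a calculation this simple, apply() mainly saves writing the loop; it pays off more when the per-row logic involves branching or several columns.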

Method 4: Index-based Iteration (iloc[] or loc[]) - For specific rows

Index-based iteration uses iloc[] (integer position-based) or loc[] (label-based) to access specific rows or slices of a DataFrame. This approach is straightforward but less efficient for large datasets.

Python
import pandas as pd

data = {'Item': ['Apple', 'Banana', 'Orange'],'Quantity': [10, 20, 30],'Price': [0.5, 0.3, 0.7]}
df = pd.DataFrame(data)

# Iterate over rows using .iloc[] (index-based)
print("Using iloc[] for index-based iteration:")
for i in range(len(df)):
    print(f"Row {i}: {df.iloc[i]}")

# Iterate over rows using .loc[] (label-based)
print("\nUsing loc[] for label-based iteration:")
for i in df.index:  # Here, index defaults to 0, 1, 2
    print(f"Row {i}: {df.loc[i]}")

Output
Using iloc[] for index-based iteration:
Row 0: Item        Apple
Quantity       10
Price         0.5
Name: 0, dtype: object
Row 1: Item        Banana
Quantity        20
Price          0.3
Name: 1, dty...

Index-based iteration is useful when you need precise control over which rows to access or modify. However, it should be avoided for large-scale iterations due to its slower performance compared to vectorized operations.
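The difference between position and label only becomes visible when the index is not the default 0, 1, 2. The sketch below assumes the item names are used as the index. Those labels and the updated price of 0.35 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Quantity': [10, 20, 30], 'Price': [0.5, 0.3, 0.7]},
    index=['Apple', 'Banana', 'Orange'],  # illustrative labels
)

# Same row, reached two ways: by integer position and by label
second_by_position = df.iloc[1]
banana_by_label = df.loc['Banana']

# Modify a single cell by row label and column name
df.loc['Banana', 'Price'] = 0.35

# Position-based loop over a chosen range of rows (here: skip the first row)
for i in range(1, len(df)):
    print(df.index[i], df.iloc[i]['Quantity'])
```

Writing through df.loc[...] changes the DataFrame itself, which is what gives this method its control over individual rows.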

Which Method to Use and When

For small datasets or debugging, all methods work well, but for larger datasets:

  • Use itertuples() for better performance when iterating.
  • Use apply() for concise, Pythonic transformations without writing an explicit loop; with axis=1 it still calls the function once per row.
Method | Notes
iterrows() | Converts each row to a Series, which can be inefficient for large datasets.
itertuples() | Preserves data types and is more efficient than iterrows().
apply() | Concise for transformations or calculations; with axis=1 it is still a per-row function call, not a vectorized operation.
Index-based | Useful for precise control but less efficient for large datasets.
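For comparison, the vectorized alternative mentioned in the introduction needs no row iteration at all for this calculation. This is a minimal sketch on the same sales data. The column name `Total` is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Apple', 'Banana', 'Orange'],
    'Quantity': [10, 20, 30],
    'Price': [0.5, 0.3, 0.7],
})

# Multiply whole columns at once; pandas aligns the values row by row
df['Total'] = df['Quantity'] * df['Price']
print(df[['Item', 'Total']])
```

When a calculation can be written as column arithmetic like this, it is usually the first option to try before reaching for any of the iteration methods.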

