Pandas DataFrame itertuples() Method
Last Updated: 18 Dec, 2024
itertuples() is a DataFrame method for iterating over the rows of a DataFrame. It yields each row as a lightweight namedtuple that holds the row's index and column values. This is generally faster and lighter than other row-iteration methods such as iterrows(), because no Series object is built for each row. Consider the following example.
Python
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
# Iterate using itertuples()
for row in df.itertuples():
    print(row)
Output:
Pandas(Index=0, Name='Alice', Age=25, City='New York')
Pandas(Index=1, Name='Bob', Age=30, City='Los Angeles')
Pandas(Index=2, Name='Charlie', Age=35, City='Chicago')
The output shows that each row is returned as a namedtuple called Pandas, with the row index stored in the Index field.
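The speed and memory comparison above is the usual reason given for preferring itertuples(). The pandas documentation notes a related difference. iterrows() returns each row as a Series, so a row that mixes integer and float columns is upcast to a single float dtype. itertuples() keeps each value's own type. Here is a small sketch with a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame mixing an integer column and a float column
df = pd.DataFrame({'id': [1, 2], 'score': [0.5, 1.5]})

# iterrows() builds a Series per row; the int is upcast to float to share one dtype
_, series_row = next(df.iterrows())
print(series_row['id'])   # 1.0

# itertuples() keeps each column's own type
tuple_row = next(df.itertuples())
print(tuple_row.id)       # 1
```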
Pandas DataFrame itertuples() Method
itertuples() iterates over the rows of a DataFrame and returns them as lightweight namedtuples. A namedtuple is a tuple whose elements can be accessed by field name (for example, row.Age) as well as by position. It is an alternative to iterrows() and needs less memory per row, because it creates a small tuple for each row instead of a full Series.
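To make the namedtuple idea concrete, the sketch below reuses the Name/Age/City data from the first example. It takes only the first row yielded by itertuples() and reads the same value by field name and by position. `_fields` is the standard namedtuple attribute that lists the field names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Take only the first row from the iterator
first = next(df.itertuples())

# Same value by field name and by position (position 0 is the index)
print(first.Name, first[1])   # Alice Alice
print(first._fields)          # ('Index', 'Name', 'Age', 'City')

# A namedtuple is still a tuple, so it can be unpacked
idx, name, age, city = first
print(idx, name, age, city)   # 0 Alice 25 New York
```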
Syntax:
DataFrame.itertuples(index=True, name='Pandas')
- DataFrame is the DataFrame whose rows are iterated over.
- index=True includes the row's index label as the first element of each tuple, under the field name Index. Set index=False to leave it out.
- name='Pandas' is the type name of the returned namedtuples. If name is set to None, plain tuples with no field names are returned instead.
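Both parameters can be seen side by side in a short sketch. The column name 'first name' is a made-up example. According to the pandas documentation, column names that are not valid Python identifiers, are repeated, or start with an underscore are renamed to positional field names such as _1.

```python
import pandas as pd

# Hypothetical data: 'first name' contains a space, so it is not a valid identifier
df = pd.DataFrame({'first name': ['Alice', 'Bob'], 'age': [25, 30]})

# A custom name changes the namedtuple's type name
rows = list(df.itertuples(name='Person'))
print(rows[0])      # Person(Index=0, _1='Alice', age=25)

# The invalid name became _1 ('Index' occupies position 0)
print(rows[0]._1)   # Alice

# name=None gives plain tuples
print(next(df.itertuples(name=None)))  # (0, 'Alice', 25)
```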
In the following example we iterate over the rows of a DataFrame using itertuples() with index set to False.
Python
import pandas as pd
# Sample fruit data
data = {
    'name': ['Apple', 'Banana', 'Cherry'],
    'color': ['Red', 'Yellow', 'Red'],
    'price': [1.2, 0.5, 2.5]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Iterate over rows using itertuples, leaving out the index
for row in df.itertuples(index=False):
    print(row)
Output:
Pandas(name='Apple', color='Red', price=1.2)
Pandas(name='Banana', color='Yellow', price=0.5)
Pandas(name='Cherry', color='Red', price=2.5)
The index is excluded from each tuple because we set index=False. Since name keeps its default value 'Pandas', every row is a namedtuple called Pandas whose field names are the column names.
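Because each row is still a tuple, its fields can also be unpacked directly in the for statement. With index=False the tuple holds only the column values, in column order. Below is a minimal sketch with the same fruit data. Collecting the red fruits and their total price is only an illustration.

```python
import pandas as pd

data = {
    'name': ['Apple', 'Banana', 'Cherry'],
    'color': ['Red', 'Yellow', 'Red'],
    'price': [1.2, 0.5, 2.5]
}
df = pd.DataFrame(data)

red_fruits = []
red_total = 0.0
# Unpack the three column values of each row into separate variables
for name, color, price in df.itertuples(index=False):
    if color == 'Red':
        red_fruits.append(name)
        red_total += price

print(red_fruits)           # ['Apple', 'Cherry']
print(round(red_total, 2))  # 3.7
```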
How to access a particular field in a namedtuple while using itertuples()
Individual fields of a row can be read with the dot operator: the row variable, a dot, and then the field name, which is the column name. In the example below we use this to print each row in a readable format instead of as a raw namedtuple.
Python
import pandas as pd
# Sample flower data
data = {
    'name': ['Rose', 'Tulip', 'Daisy'],
    'color': ['Red', 'Yellow', 'White'],
}
# Create a DataFrame
df = pd.DataFrame(data)
# Iterate over rows using itertuples
for row in df.itertuples():
    print(f"Flower: {row.name}, Color: {row.color}")
Output:
Flower: Rose, Color: Red
Flower: Tulip, Color: Yellow
Flower: Daisy, Color: White
The output shows that the column names can be used to extract values from each namedtuple.
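Dot notation requires the field name to be written in the code. Sometimes the column to read is only known at run time, for example when its name is stored in a variable. In that case Python's built-in getattr() reads the same field. The sketch below uses the flower data, and the columns_to_show list is a hypothetical input. This only works for columns whose names are valid Python identifiers, because other names are renamed when the namedtuple is created.

```python
import pandas as pd

data = {
    'name': ['Rose', 'Tulip', 'Daisy'],
    'color': ['Red', 'Yellow', 'White'],
}
df = pd.DataFrame(data)

# Hypothetical: the columns to display are decided at run time
columns_to_show = ['color']

lines = []
for row in df.itertuples():
    # getattr(row, 'color') is equivalent to row.color
    values = [str(getattr(row, col)) for col in columns_to_show]
    lines.append(f"{row.name}: {', '.join(values)}")

print("\n".join(lines))
```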
If we set name=None, itertuples() returns plain tuples instead of namedtuples. Plain tuples have no field names, so values are accessed by indexing, which starts at 0. With the default index=True, position 0 holds the row index and the columns start at position 1.
Python
import pandas as pd
# Sample flower data
data = {
    'color': ['Red', 'Yellow', 'White', 'Yellow'],
    'bloom_season': ['Spring', 'Spring', 'Summer', 'Summer']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Iterate over rows using itertuples (plain tuples, no namedtuples)
for row in df.itertuples(name=None):  # `name=None` returns plain tuples instead of namedtuples
    print(f"Index: {row[0]}, Color: {row[1]}, Bloom Season: {row[2]}")
Output:
Index: 0, Color: Red, Bloom Season: Spring
Index: 1, Color: Yellow, Bloom Season: Spring
Index: 2, Color: White, Bloom Season: Summer
Index: 3, Color: Yellow, Bloom Season: Summer
The output shows that indexing gives access to the items of each tuple. The drawback is that plain tuples carry no attribute names, so we must remember which position holds which column.
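One way to soften that drawback is to look up each column's position once, before the loop, with `df.columns.get_loc()`. With the default index=True the index occupies position 0, so every column position is shifted by one. Below is a minimal sketch with the same flower data. Picking out the summer colors is only an illustration.

```python
import pandas as pd

data = {
    'color': ['Red', 'Yellow', 'White', 'Yellow'],
    'bloom_season': ['Spring', 'Spring', 'Summer', 'Summer']
}
df = pd.DataFrame(data)

# Position of each column inside the plain tuple (+1 because position 0 holds the index)
color_pos = df.columns.get_loc('color') + 1
season_pos = df.columns.get_loc('bloom_season') + 1

summer_colors = []
for row in df.itertuples(name=None):
    if row[season_pos] == 'Summer':
        summer_colors.append(row[color_pos])

print(summer_colors)  # ['White', 'Yellow']
```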
Some other operations using itertuples()
itertuples() can also be used for common row-wise tasks such as filtering, calculations, grouping and building a dictionary of rows.
1. Filtering Rows
Since each row is a namedtuple, we can take any of its fields, apply a comparison operator to it, and keep only the rows that satisfy the condition. The example below illustrates this.
Python
import pandas as pd
# Sample student data
data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Filter rows with Marks greater than 75 using itertuples
print("Students scoring more than 75 marks:")
for row in df.itertuples(name='Pandas'):
    if row.Marks > 75:  # compare the 'Marks' field by name
        print(f"Student: {row.Student}, Subject: {row.Subject}")
Output:
Students scoring more than 75 marks:
Student: Alice, Subject: Math
Student: Charlie, Subject: English
Here we have filtered the rows on the Marks field using the comparison operator >, so Bob (42 marks) is skipped.
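Sometimes the goal is a new DataFrame rather than printed lines. In that case the filtered namedtuples can be collected in a list and passed to the pd.DataFrame constructor, which takes the column names from the namedtuple fields. The sketch below also compares the result with a boolean mask, the usual vectorized way to filter, which is typically much faster on large frames.

```python
import pandas as pd

data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
}
df = pd.DataFrame(data)

# Keep the namedtuples that pass the condition (index=False so 'Index' is not a column)
kept = [row for row in df.itertuples(index=False) if row.Marks > 75]
filtered = pd.DataFrame(kept)
print(filtered)

# Equivalent vectorized filter with a boolean mask
mask_filtered = df[df['Marks'] > 75].reset_index(drop=True)
print(filtered.equals(mask_filtered))  # True
```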
2. Performing Calculations
We can also compute new values for each row while iterating. Here we iterate over the DataFrame and add columns A and B in every row.
Python
import pandas as pd
# Sample data with columns A and B
data = {
    'A': [10, 20, 30, 40, 50],
    'B': [5, 15, 25, 35, 45],
}
# Create a DataFrame
df = pd.DataFrame(data)
# Perform addition using itertuples
print("Sum of A and B for each row:")
for row in df.itertuples(name='Pandas'):  # namedtuples, so fields are accessed by name
    sum_ab = row.A + row.B  # with index=True, 'A' is at position 1 and 'B' at position 2
    print(f"Row {row.Index}: {sum_ab}")
Output:
Sum of A and B for each row:
Row 0: 15
Row 1: 35
Row 2: 55
Row 3: 75
Row 4: 95
3. Grouping Based on a Specific Column
We can also group rows by the values of a particular column without using groupby() and compute aggregates such as min, max, count and sum. In the example below we compute the sum of Value for each Group.
Python
import pandas as pd
# Sample data
data = {
    'Group': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, 20, 30, 40, 50, 60]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Group data by the 'Group' column and calculate the sum of 'Value'
grouped_data = {}
for row in df.itertuples(name=None):  # Use plain tuples
    group = row[1]  # 'Group' column at position 1 (position 0 is the index)
    value = row[2]  # 'Value' column at position 2
    # Aggregate the values by group
    if group not in grouped_data:
        grouped_data[group] = 0
    grouped_data[group] += value
# Print the grouped results
for group, total in grouped_data.items():
    print(f"Group: {group}, Total Value: {total}")
Output:
Group: A, Total Value: 90
Group: B, Total Value: 60
Group: C, Total Value: 60
Here we iterate over the rows and keep a running sum of the values for each group. When a group name is not yet in the dictionary, we add it as a key with an initial value of 0. We then add the current row's value to that key.
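The text above mentions count and max as well as sum. The sketch below tracks several aggregates per group in a single pass. It uses collections.defaultdict, so no "key not present" check is needed. It then cross-checks the sums against pandas' own groupby().

```python
from collections import defaultdict

import pandas as pd

data = {
    'Group': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, 20, 30, 40, 50, 60]
}
df = pd.DataFrame(data)

totals = defaultdict(int)   # missing keys start at 0
counts = defaultdict(int)
maxima = {}

for row in df.itertuples(index=False):
    totals[row.Group] += row.Value
    counts[row.Group] += 1
    maxima[row.Group] = max(maxima.get(row.Group, row.Value), row.Value)

for group in totals:
    print(f"Group: {group}, Sum: {totals[group]}, Count: {counts[group]}, Max: {maxima[group]}")

# Cross-check the sums with groupby (a Series indexed by group name)
print(dict(totals) == df.groupby('Group')['Value'].sum().to_dict())  # True
```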
4. Creating a Dictionary of Rows
We can also build a dictionary of rows keyed by the row index. This technique is useful when the rows need to be stored in a JSON-like format.
Python
import pandas as pd
# Sample DataFrame
data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
}
df = pd.DataFrame(data)
# Create a dictionary of rows
rows_dict = {}
for row in df.itertuples():  # namedtuples, with the row index as the first field
    key = row.Index  # use the row index as the key
    rows_dict[key] = {
        'Student': row.Student,
        'Marks': row.Marks,
        'Subject': row.Subject,
    }
# Print the resulting dictionary
for k, v in rows_dict.items():
    print(k, v)
Output:
0 {'Student': 'Alice', 'Marks': 85, 'Subject': 'Math'}
1 {'Student': 'Bob', 'Marks': 42, 'Subject': 'Science'}
2 {'Student': 'Charlie', 'Marks': 78, 'Subject': 'English'}
The output shows that the row index is the key, and each value is a dictionary that maps the column names to that row's values. This nested structure is similar to the JSON format.
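Instead of listing the fields by hand, each namedtuple can be turned into a dictionary with its _asdict() method, and the result can be serialized with the standard json module. The sketch below assumes the row index should become the key. JSON object keys are always strings, so json.dumps converts the integer keys 0, 1, 2 to "0", "1", "2".

```python
import json

import pandas as pd

data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
}
df = pd.DataFrame(data)

rows_dict = {}
for row in df.itertuples():
    fields = row._asdict()        # {'Index': 0, 'Student': 'Alice', ...}
    key = fields.pop('Index')     # move the row index out and use it as the key
    rows_dict[key] = fields

print(json.dumps(rows_dict, indent=2))

# pandas can build the same structure directly
print(rows_dict == df.to_dict(orient='index'))  # True
```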