Using a dictionary to remap values in Pandas DataFrame columns
While working with data in Pandas, we often need to modify or transform values in specific columns. One common transformation is remapping values using a dictionary. This technique is useful when we need to replace categorical values with labels, abbreviations or numerical representations. In this article, we’ll explore different ways to remap values in a Pandas DataFrame using dictionary mapping.
Creating a sample Pandas DataFrame
Let’s start by creating a sample DataFrame that contains event details.
import pandas as pd
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
'Cost': [10000, 5000, 15000, 2000]})
print(df)
Output
        Date    Event   Cost
0  10/2/2011    Music  10000
1  11/2/2011   Poetry   5000
2  12/2/2011  Theatre  15000
3  13/2/2011   Comedy   2000
Using replace() function
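The replace() function in Pandas allows us to remap values using a dictionary. Called on a DataFrame column, it returns a new Series in which every value that matches a dictionary key is swapped for the corresponding dictionary value; values with no matching key are left unchanged. Assigning the result back to the column updates the DataFrame.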
import pandas as pd
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
'Cost': [10000, 5000, 15000, 2000]})
# Define a dictionary for remapping values
d = {'Music': 'M', 'Poetry': 'P', 'Theatre': 'T', 'Comedy': 'C'}
# Remap the values using replace()
df['Event'] = df['Event'].replace(d)
print(df)
Output
        Date  Event   Cost
0  10/2/2011      M  10000
1  11/2/2011      P   5000
2  12/2/2011      T  15000
3  13/2/2011      C   2000
Explanation:
- replace() searches for values in the column and replaces them based on the dictionary.
- replace() is available on both Series and DataFrame, so it can remap a single column or, with a nested dictionary keyed by column name, several columns in one call.
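To illustrate the multiple-column case, here is a minimal sketch that passes a nested dictionary to DataFrame.replace(). The Venue column and its abbreviations are hypothetical additions for this example and are not part of the sample data above.

```python
import pandas as pd

df = pd.DataFrame({
    'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
    'Venue': ['Hall', 'Cafe', 'Hall', 'Club'],  # hypothetical column, for illustration only
})

# Outer keys are column names; each inner dictionary maps old values to new ones.
nested = {
    'Event': {'Music': 'M', 'Poetry': 'P', 'Theatre': 'T', 'Comedy': 'C'},
    'Venue': {'Hall': 'H', 'Cafe': 'CF', 'Club': 'CL'},
}

# replace() returns a new DataFrame; the original df is left untouched.
remapped = df.replace(nested)
print(remapped)
```

Because each inner dictionary is scoped to its own column, a value such as 'Hall' would not be touched if it happened to appear in the Event column.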
Using map() function
Another way to remap values in a Pandas column is by using the map() function.
import pandas as pd
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
'Cost': [10000, 5000, 15000, 2000]})
# Define a dictionary for remapping values
d = {'Music': 'M', 'Poetry': 'P', 'Theatre': 'T', 'Comedy': 'C'}
# Remap the values using map()
df['Event'] = df['Event'].map(d)
print(df)
Output
        Date  Event   Cost
0  10/2/2011      M  10000
1  11/2/2011      P   5000
2  12/2/2011      T  15000
3  13/2/2011      C   2000
Explanation:
- map() applies the dictionary mapping to each element in the column.
- Unlike replace(), dictionary-based map() is a Series method, so it is applied to one column at a time.
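Since the dictionary form of map() operates on one Series at a time, one possible way to reuse the same mapping across several columns is to run it column by column with apply(). This is only a sketch; the Backup column is a hypothetical addition for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Event': ['Music', 'Poetry'],
    'Backup': ['Poetry', 'Music'],  # hypothetical second column, for illustration only
})
d = {'Music': 'M', 'Poetry': 'P'}

# apply() hands each selected column to the lambda as a Series,
# and Series.map() then performs the dictionary lookup.
cols = ['Event', 'Backup']
df[cols] = df[cols].apply(lambda col: col.map(d))
print(df)
```

When the columns need different mappings, replace() with a nested dictionary is usually the simpler choice.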
Differences between replace() and map()
Feature | replace() | map() |
---|---|---|
Works on entire DataFrame? | Yes | No, only on Series (columns) |
Supports multiple column replacements? | Yes | No, works on a single column |
Values with no matching key? | Left unchanged | Replaced with NaN |
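The difference in how the two methods treat unmatched values can be seen by applying the same incomplete dictionary with each. The variable names below are chosen for this sketch only.

```python
import pandas as pd

events = pd.Series(['Music', 'Poetry', 'Theatre', 'Comedy'])
partial = {'Music': 'M', 'Poetry': 'P'}  # no entries for Theatre or Comedy

# replace() leaves values without a matching key as they were.
replaced = events.replace(partial)
print(replaced.tolist())  # ['M', 'P', 'Theatre', 'Comedy']

# map() turns values without a matching key into NaN.
mapped = events.map(partial)
print(mapped.isna().tolist())  # [False, False, True, True]
```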
Handling missing values during mapping
If the dictionary used in map() does not contain a key for some values in the column, those values will be replaced with NaN. To handle this, we can use the fillna() function.
import pandas as pd
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
'Cost': [10000, 5000, 15000, 2000]})
# Define an incomplete dictionary
d = {'Music': 'M', 'Poetry': 'P'}
# Apply map() and handle missing values
df['Event'] = df['Event'].map(d).fillna('Unknown')
print(df)
Output
        Date    Event   Cost
0  10/2/2011        M  10000
1  11/2/2011        P   5000
2  12/2/2011  Unknown  15000
3  13/2/2011  Unknown   2000
Explanation:
- The map() function translates only the values that appear as keys in the dictionary.
- Values with no matching key (here, Theatre and Comedy) become NaN.
- fillna('Unknown') then replaces those NaN entries with "Unknown".
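If the goal is to keep the original value rather than a placeholder such as "Unknown", two possible variations are sketched below. Both are illustrative alternatives to the approach shown above, not part of it.

```python
import pandas as pd

events = pd.Series(['Music', 'Poetry', 'Theatre', 'Comedy'])
d = {'Music': 'M', 'Poetry': 'P'}  # incomplete dictionary

# Option 1: fill the NaN gaps left by map() from the original Series.
kept = events.map(d).fillna(events)

# Option 2: look each value up with dict.get(), defaulting to the value itself.
kept_via_get = events.map(lambda value: d.get(value, value))

print(kept.tolist())          # ['M', 'P', 'Theatre', 'Comedy']
print(kept_via_get.tolist())  # ['M', 'P', 'Theatre', 'Comedy']
```

Note that fillna() also fills any NaN that was already present in the column before mapping, so the second option may be preferable when genuine missing values need to stay distinguishable.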