String Manipulations in Pandas DataFrame

Last Updated : 10 Dec, 2025

String manipulation refers to cleaning, transforming, and processing text data so that it becomes suitable for analysis. Pandas exposes a wide collection of string methods through the .str accessor, which makes it easy to work with string columns in a DataFrame: converting case, trimming whitespace, splitting, extracting patterns, replacing values, and more.

In this article, we will perform string manipulation using the dataset shown below:

Python

import pandas as pd
import numpy as np

data = { 'Name': ['Lukas', 'Sofia', 'Hiroshi', 'Marta', 'Yannis', np.nan, 'Elena'],
         'City': ['Berlin', 'Madrid', 'Tokyo', 'Warsaw', 'Athens', 'Oslo', 'Lisbon'] }

df = pd.DataFrame(data)
print(df)

Output

      Name    City
0    Lukas  Berlin
1    Sofia  Madrid
2  Hiroshi   Tokyo
3    Marta  Warsaw
4   Yannis  Athens
5      NaN    Oslo
6    Elena  Lisbon

Column Datatype in Pandas

Columns that look like strings may be stored internally as another dtype, such as the generic object dtype or a numeric type. To ensure consistent string operations, it is often useful to convert selected columns to the string dtype.

Below, we convert the entire DataFrame to string type using .astype('string').

Python

print(df.astype('string'))

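Note that astype() returns a new DataFrame and leaves df unchanged; assign the result (for example, df = df.astype('string')) to keep the conversion. After conversion, every column supports the .str methods, and missing values are displayed as <NA>. The examples below keep the original object-dtype df, so their outputs show NaN and dtype: object.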
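As a minimal sketch of why the conversion can matter, the snippet below uses a hypothetical integer column (codes, which is not part of this article's dataset). The .str accessor is rejected on the numeric column but works once the column is converted with astype('string').

```python
import pandas as pd

# Hypothetical numeric column, used only for illustration
codes = pd.Series([101, 2024, 7])

# The .str accessor is not available on a non-string column
try:
    codes.str.len()
    accessor_failed = False
except AttributeError:
    accessor_failed = True

# After converting to the string dtype, .str methods work
codes_str = codes.astype('string')
lengths = codes_str.str.len()

print(accessor_failed)    # True
print(lengths.tolist())   # [3, 4, 1]
```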

String Operations in Pandas

Below are the commonly used string manipulation methods in Pandas, explained with short examples.

1. lower(): This method converts every character in the column to lowercase, ensuring consistent text formatting.

Python

print(df['Name'].str.lower())

Output

0 lukas
1 sofia
2 hiroshi
3 marta
4 yannis
5 NaN
6 elena
Name: Name, dtype: object

2. upper(): This method transforms all characters in the column to uppercase for uniform, standardized text.

Python

print(df['Name'].str.upper())

Output

0 LUKAS
1 SOFIA
2 HIROSHI
3 MARTA
4 YANNIS
5 NaN
6 ELENA
Name: Name, dtype: object
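Case conversion is commonly used to normalize text before comparing or grouping values. The sketch below assumes a hypothetical column (cities) with inconsistent capitalization and shows how lower() collapses the variants into one value.

```python
import pandas as pd

# Hypothetical column with inconsistent capitalization
cities = pd.Series(['Berlin', 'BERLIN', 'berlin', 'Tokyo'])

# Before normalizing, every spelling counts as a distinct value
distinct_before = cities.nunique()

# After lowercasing, the three Berlin variants become identical
normalized = cities.str.lower()
distinct_after = normalized.nunique()

print(distinct_before)  # 4
print(distinct_after)   # 2
```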

3. strip(): This method removes leading and trailing whitespace from each string to clean the data. The sample names contain no padding, so the output below is unchanged.

Python

print(df['Name'].str.strip())

Output

0 Lukas
1 Sofia
2 Hiroshi
3 Marta
4 Yannis
5 NaN
6 Elena
Name: Name, dtype: object
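Because the article's dataset has no padded values, the effect of strip() is easier to see on a small, hypothetical Series (padded) that does contain stray whitespace. The related lstrip() method, which trims only the left side, is included for comparison.

```python
import pandas as pd
import numpy as np

# Hypothetical values with leading/trailing spaces, a tab and a newline
padded = pd.Series(['  Lukas', 'Sofia  ', '\tHiroshi\n', np.nan])

# strip() trims both ends; missing values stay missing
cleaned = padded.str.strip()

# lstrip() trims only the leading side
left_only = padded.str.lstrip()

print(cleaned.tolist())
print(left_only.tolist())
```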

4. split(): This method splits each string into a list of parts based on a given separator.

Python

print(df['Name'].str.split('a'))

Output

0 [Luk, s]
1 [Sofi, ]
2 [Hiroshi]
3 [M, rt, ]
4 [Y, nnis]
5 NaN
6 [Elen, ]
Name: Name, dtype: object
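In practice, split() is often used to break one column into several. The sketch below assumes a hypothetical Series of full names (full_names) and uses the expand=True and n parameters to return a DataFrame with at most two parts per row.

```python
import pandas as pd

# Hypothetical full names, used only for illustration
full_names = pd.Series(['Lukas Meyer', 'Sofia Garcia Lopez', 'Elena'])

# n=1 splits at the first space only; expand=True returns columns
parts = full_names.str.split(' ', n=1, expand=True)
parts.columns = ['First', 'Rest']

print(parts)
```

Rows with no separator, such as 'Elena', get a missing value in the second column.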

5. len(): This method returns the number of characters in each string in the column. The result below is float64 because the missing value is returned as NaN.

Python

print(df['Name'].str.len())

Output

0 5.0
1 5.0
2 7.0
3 5.0
4 6.0
5 NaN
6 5.0
Name: Name, dtype: float64
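The lengths can also be used as a filter. As a small sketch on the same dataset, the comparison below keeps only rows whose name is longer than five characters; the missing name compares as False and is dropped.

```python
import pandas as pd
import numpy as np

data = {'Name': ['Lukas', 'Sofia', 'Hiroshi', 'Marta', 'Yannis', np.nan, 'Elena'],
        'City': ['Berlin', 'Madrid', 'Tokyo', 'Warsaw', 'Athens', 'Oslo', 'Lisbon']}
df = pd.DataFrame(data)

# NaN > 5 evaluates to False, so the row with the missing name is excluded
long_names = df[df['Name'].str.len() > 5]

print(long_names)
```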

6. cat(): This method concatenates all strings in the column into a single string using a chosen separator. Missing values are skipped unless a replacement is given with na_rep.

Python

print(df['Name'].str.cat(sep=', '))

Output

Lukas, Sofia, Hiroshi, Marta, Yannis, Elena
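Two further uses of cat() are sketched below on the same dataset: na_rep substitutes a placeholder for missing values, and passing another column concatenates the two columns row by row. The placeholder strings 'Unknown' and '?' are arbitrary choices for this example.

```python
import pandas as pd
import numpy as np

data = {'Name': ['Lukas', 'Sofia', 'Hiroshi', 'Marta', 'Yannis', np.nan, 'Elena'],
        'City': ['Berlin', 'Madrid', 'Tokyo', 'Warsaw', 'Athens', 'Oslo', 'Lisbon']}
df = pd.DataFrame(data)

# Join the whole column, replacing the missing name with a placeholder
joined = df['Name'].str.cat(sep=', ', na_rep='Unknown')

# Join Name and City row by row into a new Series
labels = df['Name'].str.cat(df['City'], sep=' - ', na_rep='?')

print(joined)
print(labels.tolist())
```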

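7. get_dummies(): This method creates one indicator (one-hot) column per unique value, which is useful for modeling. Each string is first split on sep (default '|'), so a single cell can set several indicators.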

Python

print(df['City'].str.get_dummies())

Output

   Athens  Berlin  Lisbon  Madrid  Oslo  Tokyo  Warsaw
0       0       1       0       0     0      0       0
1       0       0       0       1     0      0       0
2       0       0       0       0     0      1       0
3       0       0       0       0     0      0       1
4       1       0       0       0     0      0       0
5       0       0       0       0     1      0       0
6       0       0       1       0     0      0       0
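The sep parameter matters when a cell holds several labels. The sketch below assumes a hypothetical Series (languages) in which labels are separated by '|'; each label becomes its own indicator column, with columns in sorted order.

```python
import pandas as pd

# Hypothetical multi-label column, used only for illustration
languages = pd.Series(['German|English', 'Spanish', 'English|Japanese'])

# Each cell is split on '|' and every distinct label gets a 0/1 column
dummies = languages.str.get_dummies(sep='|')

print(dummies)
```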

8. startswith(): This method checks whether each string begins with the specified prefix.

Python

print(df['Name'].str.startswith('E'))

Output

0 False
1 False
2 False
3 False
4 False
5 NaN
6 True
Name: Name, dtype: object

9. endswith(): This method checks whether each string ends with the specified suffix.

Python

print(df['Name'].str.endswith('a'))

Output

0 False
1 True
2 False
3 True
4 False
5 NaN
6 True
Name: Name, dtype: object
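The boolean results of startswith() and endswith() are typically used to filter rows. In the sketch below, na=False is passed so that the missing name is treated as False and the mask contains only booleans.

```python
import pandas as pd
import numpy as np

data = {'Name': ['Lukas', 'Sofia', 'Hiroshi', 'Marta', 'Yannis', np.nan, 'Elena'],
        'City': ['Berlin', 'Madrid', 'Tokyo', 'Warsaw', 'Athens', 'Oslo', 'Lisbon']}
df = pd.DataFrame(data)

# na=False turns the missing entry into False instead of NaN
mask = df['Name'].str.endswith('a', na=False)

# Keep only the rows whose name ends with 'a'
ends_with_a = df[mask]

print(ends_with_a)
```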

10. replace(): This method replaces occurrences of a specific substring or pattern with a new value. In pandas 2.0 and later, the pattern is treated as a literal string unless regex=True is passed.

Python

print(df['Name'].str.replace('Elena', 'Emily'))

Output

0 Lukas
1 Sofia
2 Hiroshi
3 Marta
4 Yannis
5 NaN
6 Emily
Name: Name, dtype: object
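For pattern-based replacement, regex=True can be passed explicitly. As an illustrative sketch on the same names, the call below masks every lowercase vowel with an asterisk; the character class and the '*' replacement are arbitrary choices for this example.

```python
import pandas as pd
import numpy as np

names = pd.Series(['Lukas', 'Sofia', 'Hiroshi', 'Marta', 'Yannis', np.nan, 'Elena'],
                  name='Name')

# regex=True makes '[aeiou]' a character class rather than literal text
masked = names.str.replace(r'[aeiou]', '*', regex=True)

print(masked.tolist())
```

The uppercase 'E' in 'Elena' is left alone because the character class lists only lowercase vowels.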

11. repeat(): This method duplicates each string a given number of times.

Python

print(df['Name'].str.repeat(2))

Output

0 LukasLukas
1 SofiaSofia
2 HiroshiHiroshi
3 MartaMarta
4 YannisYannis
5 NaN
6 ElenaElena
Name: Name, dtype: object

12. count(): This method counts how many times a pattern appears in each string. The pattern is interpreted as a regular expression.

Python

print(df['Name'].str.count('a'))

Output

0 1.0
1 1.0
2 0.0
3 2.0
4 1.0
5 NaN
6 1.0
Name: Name, dtype: float64
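Since count() takes a regular expression, it can count whole classes of characters. The sketch below counts vowels in each name, using the flags parameter with re.IGNORECASE so that the uppercase 'E' in 'Elena' is included.

```python
import re

import pandas as pd
import numpy as np

names = pd.Series(['Lukas', 'Sofia', 'Hiroshi', 'Marta', 'Yannis', np.nan, 'Elena'],
                  name='Name')

# Count vowels regardless of case; the missing name stays NaN
vowels = names.str.count(r'[aeiou]', flags=re.IGNORECASE)

print(vowels)
```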

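13. find(): This method returns the index of the first occurrence of a substring within each string, or -1 when the substring is not found. The substring is matched literally, not as a regular expression.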

Python

print(df['Name'].str.find('a'))

Output

0 3.0
1 4.0
2 -1.0
3 1.0
4 1.0
5 NaN
6 4.0
Name: Name, dtype: float64

14. findall(): This method returns a list of all matches of a regular-expression pattern found in each string.

Python

print(df['Name'].str.findall('a'))

Output

0 [a]
1 [a]
2 []
3 [a, a]
4 [a]
5 NaN
6 [a]
Name: Name, dtype: object
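The difference between literal and regular-expression matching shows up with special characters such as '.'. The sketch below assumes a hypothetical Series of version-like strings (versions): find() treats '.' literally, while count() and findall() interpret their argument as a regular expression unless it is escaped.

```python
import re

import pandas as pd

# Hypothetical values, used only for illustration
versions = pd.Series(['v1.2', 'v12', 'beta'])

# find() looks for a literal dot
positions = versions.str.find('.')

# As a regex, '.' matches any character, so this counts every character
any_char = versions.str.count('.')

# Escaping the dot counts literal dots only
literal_dots = versions.str.count(re.escape('.'))

# findall() with a regex returns every run of digits
digits = versions.str.findall(r'\d+')

print(positions.tolist())     # [2, -1, -1]
print(any_char.tolist())      # [4, 3, 4]
print(literal_dots.tolist())  # [1, 0, 0]
print(digits.tolist())        # [['1', '2'], ['12'], []]
```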

15. islower(): This method checks whether all cased characters in each string are lowercase. Every sample name starts with a capital letter, so the result is False throughout.

Python

print(df['Name'].str.islower())

Output

0 False
1 False
2 False
3 False
4 False
5 NaN
6 False
Name: Name, dtype: object

16. isupper(): This method checks whether all cased characters in each string are uppercase.

Python

print(df['Name'].str.isupper())

Output

0 False
1 False
2 False
3 False
4 False
5 NaN
6 False
Name: Name, dtype: object

17. isnumeric(): This method checks whether each string contains only numeric characters.

Python

print(df['Name'].str.isnumeric())

Output

0 False
1 False
2 False
3 False
4 False
5 NaN
6 False
Name: Name, dtype: object
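Since every check above returns False on the sample names, a small hypothetical Series (samples) makes the behavior of islower(), isupper(), and isnumeric() easier to compare side by side.

```python
import pandas as pd

# Hypothetical values chosen so that each check is True at least once
samples = pd.Series(['abc', 'ABC', '123', 'Abc1'])

checks = pd.DataFrame({
    'value': samples,
    'islower': samples.str.islower(),
    'isupper': samples.str.isupper(),
    'isnumeric': samples.str.isnumeric(),
})

print(checks)
```

Note that '123' is neither lowercase nor uppercase, because it contains no cased characters.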

18. swapcase(): This method swaps uppercase letters to lowercase and lowercase letters to uppercase for each string.

Python

print(df['Name'].str.swapcase())

Output

0 lUKAS
1 sOFIA
2 hIROSHI
3 mARTA
4 yANNIS
5 NaN
6 eLENA
Name: Name, dtype: object
