
Search and Replace Text in Python
Problem
You want to search for and replace a text pattern in a string.
For simple literal patterns, the str.replace() method is usually all you need.
Example
def sample():
    yield 'Is'
    yield 'USA'
    yield 'Colder'
    yield 'Than'
    yield 'Canada?'

text = ' '.join(sample())
print(f"Output \n {text}")
Output
Is USA Colder Than Canada?
Let us first see how to search the text.
# check whether the whole string equals 'USA' (an exact comparison, not a substring search)
print(f"Output \n {text == 'USA'}")
Output
False
We can search for the text using the basic string methods, such as str.find(), str.endswith(), str.startswith().
# text starts with
print(f"Output \n {text.startswith('Is')}")
Output
True
# text ends with
print(f"Output \n {text.endswith('Canada?')}")
Output
True
# search text with find (returns the index of the first match)
print(f"Output \n {text.find('USA')}")
Output
3
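One detail worth noting: str.find() returns -1 when the substring is absent, and the in operator is often the simplest way to test for membership. The following is a minimal sketch that rebuilds the same sentence as a literal; the variable names found_at, missing_at and has_canada are illustrative only.

```python
text = 'Is USA Colder Than Canada?'

# find() returns the index of the first match, or -1 if there is none
found_at = text.find('USA')
missing_at = text.find('Mexico')

# the in operator gives a plain True/False answer
has_canada = 'Canada' in text

print(found_at, missing_at, has_canada)
```

Because -1 is a truthy value, compare the result of find() against -1 explicitly rather than using it directly in an if statement.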
If the pattern to search for is more complicated, we can use regular expressions and the re module.
import re

# Let us create a date in string format
date1 = '22/10/2020'

# Check whether the text starts with a date-like pattern.
# \d+ matches one or more digits
if re.match(r'\d+/\d+/\d+', date1):
    print('yes')
else:
    print('no')
Output
yes
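Note that re.match() only tests the beginning of the string. When the pattern may appear anywhere in the text, re.search() and re.findall() are the usual choices. The sketch below assumes a sample sentence containing two dates; the names date_pattern, at_start, first and all_dates are illustrative.

```python
import re

sentence = 'Date is 22/11/2020. Tomorrow is 23/11/2020.'
date_pattern = r'\d+/\d+/\d+'

# match() succeeds only if the pattern matches at position 0
at_start = re.match(date_pattern, sentence)

# search() returns the first match found anywhere in the string
first = re.search(date_pattern, sentence)

# findall() returns every non-overlapping match as a list of strings
all_dates = re.findall(date_pattern, sentence)

print(at_start)
print(first.group())
print(all_dates)
```

Here match() returns None because the sentence starts with the word "Date", not with digits.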
Now, coming back to replacing text. If both the text to find and its replacement are simple literal strings, use str.replace().
print(f"Output \n {text.replace('USA', 'Australia')}")
Output
Is Australia Colder Than Canada?
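str.replace() also accepts an optional third argument that limits how many occurrences are replaced, and it always returns a new string because Python strings are immutable. A minimal sketch, using a made-up string with repeated words:

```python
text = 'USA, USA, USA'

# replace only the first occurrence
once = text.replace('USA', 'Canada', 1)

# replace every occurrence (the default)
everywhere = text.replace('USA', 'Canada')

print(once)
print(everywhere)
print(text)  # the original string is unchanged
```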
If the pattern to search and replace is more complicated, we can use the sub() function in the re module.
The first argument to sub() is the pattern to match, the second is the replacement pattern, and the third is the string to process.
In the example below, we find dates in dd/mm/yyyy format and rewrite them as yyyy-dd-mm. Backslashed digits such as \3 refer to capture group numbers in the pattern.
import re

sentence = 'Date is 22/11/2020. Tommorow is 23/11/2020.'

# reorder the captured groups: day/month/year -> year-day-month
replaced_text = re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', sentence)
print(f"Output \n {replaced_text}")
Output
Date is 2020-22-11. Tommorow is 2020-23-11.
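Numbered groups can become hard to read as patterns grow. As a sketch of two alternatives that re.sub() supports, the replacement can refer to named groups with \g<name>, or it can be a function that receives the match object and returns the replacement string. The group names and the helper to_iso below are hypothetical, chosen for illustration.

```python
import re

sentence = 'Date is 5/3/2020.'
pattern = r'(?P<day>\d+)/(?P<month>\d+)/(?P<year>\d+)'

# refer to groups by name instead of by number
named = re.sub(pattern, r'\g<year>-\g<month>-\g<day>', sentence)

def to_iso(match):
    """Build a zero-padded yyyy-mm-dd string from a match object."""
    day = int(match.group('day'))
    month = int(match.group('month'))
    year = int(match.group('year'))
    return f'{year:04d}-{month:02d}-{day:02d}'

# a callable replacement is invoked once per match
padded = re.sub(pattern, to_iso, sentence)

print(named)
print(padded)
```

A callable is useful when the replacement needs logic, such as padding or validation, that a replacement string cannot express.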
If the same pattern is used many times, another option is to compile it first with re.compile(), which can improve performance.
# compile once, then reuse the pattern object
pattern = re.compile(r'(\d+)/(\d+)/(\d+)')
replaced_pattern = pattern.sub(r'\3-\1-\2', sentence)
print(f"Output \n {replaced_pattern}")
Output
Date is 2020-22-11. Tommorow is 2020-23-11.
re.subn() performs the same replacement but returns a tuple containing the new string and the number of substitutions made.
# subn() returns (new_string, number_of_substitutions)
output, count = pattern.subn(r'\3-\1-\2', sentence)
print(f"Output \n {output}")
Output
Date is 2020-22-11. Tommorow is 2020-23-11.
print(f"Output \n {count}")
Output
2
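The count returned by subn() is handy for telling whether a line was changed at all. The following is a minimal sketch that assumes a small list of made-up input lines; lines and results are illustrative names.

```python
import re

# compile once and reuse the pattern for every line
pattern = re.compile(r'(\d+)/(\d+)/(\d+)')

lines = [
    'Due 22/11/2020 and 23/11/2020',
    'No dates here',
]

results = []
for line in lines:
    # subn() returns the rewritten line and how many replacements were made
    new_line, count = pattern.subn(r'\3-\1-\2', line)
    results.append((new_line, count))

for new_line, count in results:
    status = 'changed' if count else 'unchanged'
    print(f'{status} ({count}): {new_line}')
```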