
Match Whitespace but Not Newlines Using Python Regular Expressions
In Python, regular expressions (regex) let us search and manipulate strings in various ways. Sometimes we need to match whitespace characters such as spaces and tabs without also matching newline characters. This article explores how to achieve this using Python's re module.
The following approaches are covered: the re.sub() method, the re.findall() method, and a positive lookahead -
Using re.sub() Method
The re.sub() method offers an efficient way to replace whitespace characters (excluding newlines) within a string. This method takes the pattern to search for, the replacement string, and the target string as its arguments. The regex pattern [ \t]+ matches one or more occurrences of spaces or tabs.
Example
In the following example, we replace each run of spaces and tabs with a single space while keeping the newlines, using the regex pattern [ \t]+ and the re.sub() method.
import re

text = "This is a sample text.\nIt contains multiple spaces.\n\tAnd tabs too."
# Replace every run of spaces/tabs with one space; newlines are untouched
normalized_text = re.sub(r'[ \t]+', ' ', text)
print(normalized_text)
Following is the output of the above code -
This is a sample text.
It contains multiple spaces.
 And tabs too.
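To make the collapsing behavior easier to see, here is a minimal sketch that wraps the same re.sub() call in a helper and runs it on a string containing longer runs of spaces and tabs. The function name collapse_inline_whitespace and the sample string are illustrative assumptions, not part of the original example.

```python
import re

def collapse_inline_whitespace(text):
    """Collapse each run of spaces/tabs to one space; leave newlines alone.

    Hypothetical helper name, used only for this illustration.
    """
    return re.sub(r'[ \t]+', ' ', text)

sample = "a  \t b\n\tc   d"
result = collapse_inline_whitespace(sample)
print(repr(result))  # 'a b\n c d'
```

The newline between "b" and the tab survives because it is not in the character class, so the runs on either side of it are replaced independently.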
Using re.findall() Method
The re.findall() method is another useful function for matching patterns in strings. It returns a list of all non-overlapping matches of the pattern in the string, in the order they are found.
For our purpose, we can use the same pattern [ \t]+ to find all whitespace sequences excluding newlines.
Example
In the following example, we find all whitespace sequences in a string by using the re.findall() method with the same [ \t]+ pattern.
import re

text = "This is a sample text.\nIt contains multiple spaces."
# Collect every run of spaces/tabs; the newline is never part of a match
whitespace_matches = re.findall(r'[ \t]+', text)
print(whitespace_matches)
Following is the output of the above code -
[' ', ' ', ' ', ' ', ' ', ' ', ' ']
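The pattern [ \t]+ covers only spaces and tabs. As a hedged sketch of an alternative this article does not otherwise use, the negated class [^\S\n] matches any whitespace character except a newline, so it also picks up characters such as the form feed. The sample string below is an assumption chosen to show the difference.

```python
import re

# Sample with two spaces, two tabs, a newline, and a form feed (\x0c)
text = "one  two\t\tthree\nfour\x0cfive"

# Spaces and tabs only
spaces_and_tabs = re.findall(r'[ \t]+', text)

# Any whitespace except newline: "not non-whitespace and not \n"
any_but_newline = re.findall(r'[^\S\n]+', text)

print(spaces_and_tabs)   # ['  ', '\t\t']
print(any_but_newline)   # ['  ', '\t\t', '\x0c']
```

Which pattern fits depends on whether less common whitespace characters, such as carriage returns or form feeds, should count as matches.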
Using Positive Lookahead
Lookaheads check what comes next in the text without including it in the match. With a positive lookahead, the pattern X(?=Y) finds X only if it is immediately followed by Y. The "Y" part is not part of the matched text; it is only a condition. For example, X(?=Y) matches the "X" in "XY".
A negative lookahead works the other way: the pattern X(?!Y) finds X only if it is NOT immediately followed by Y. Again, "Y" is not part of the match; it is a condition that prevents one. For example, X(?!Y) matches the "X" in "XZ" but not the "X" in "XY".
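The two lookahead forms can be sketched with literal characters before applying them to whitespace. The sample string "XY XZ X" is an assumption made for illustration.

```python
import re

text = "XY XZ X"

# Positive lookahead: "X" only when the next character is "Y"
followed_by_y = [m.start() for m in re.finditer(r'X(?=Y)', text)]

# Negative lookahead: "X" only when the next character is not "Y"
# (this also succeeds at the end of the string, where nothing follows)
not_followed_by_y = [m.start() for m in re.finditer(r'X(?!Y)', text)]

print(followed_by_y)                 # [0]
print(not_followed_by_y)             # [3, 6]
print(re.findall(r'X(?=Y)', text))   # ['X'] - the "Y" is not in the match
```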
Example
The following example uses the re.findall() method with the pattern [ \t]+(?=\S) to find runs of spaces or tabs that are immediately followed by a non-whitespace character. Whitespace that sits directly before a newline or at the end of the string is therefore not matched.
import re

text = "This is a test.\nNew line here.\tAnother line."
# Match spaces/tabs only when a non-whitespace character follows
matches = re.findall(r'[ \t]+(?=\S)', text)
print(matches)
Following is the output of the above code -
[' ', ' ', ' ', ' ', ' ', '\t', ' ']
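The effect of the lookahead is clearest when a line ends with trailing whitespace. The following sketch, using an assumed sample string, compares the plain pattern with the lookahead version.

```python
import re

# Three trailing spaces sit right before the newline
text = "end of line   \nnext\tline"

# Without the lookahead, the trailing run is matched too
plain = re.findall(r'[ \t]+', text)

# With the lookahead, whitespace before the newline is skipped,
# because no non-whitespace character follows it
with_lookahead = re.findall(r'[ \t]+(?=\S)', text)

print(plain)           # [' ', ' ', '   ', '\t']
print(with_lookahead)  # [' ', ' ', '\t']
```

Even after backtracking to a shorter run, the trailing spaces are followed only by another space or the newline, so no part of that run satisfies (?=\S).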