
Strip Spaces, Tabs, and Newlines Using Python Regular Expression
Python regular expressions (regex) can match whitespace characters such as spaces, tabs, and newlines, which makes it possible to strip them from a string or to split a string wherever they occur.
This article explains how to split a string on newline characters using different regular expressions. The following methods are covered:
- Using re.split(r"[\n]", text)
- Splitting on One or More Newlines Using the Quantifier \n+
- Splitting on Newlines with Whitespace Using re.split(r"\n\s*\n", text)
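Before turning to splitting, the stripping mentioned in the introduction can be sketched with re.sub(). This is a minimal illustration, not part of the methods listed above; the sample string and variable names are made up for the example.

```python
import re

# Hypothetical sample text with spaces, a tab, and newlines.
s = "  \tHello,\n World \t\n"

# Remove every whitespace character anywhere in the string.
no_whitespace = re.sub(r"\s+", "", s)

# Remove whitespace only at the start and at the end of the string.
trimmed = re.sub(r"^\s+|\s+$", "", s)

print(repr(no_whitespace))  # 'Hello,World'
print(repr(trimmed))        # 'Hello,\n World'
```

For trimming only the ends, the built-in s.strip() gives the same result without a regular expression.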
Using re.split(r"[\n]", text)
The re.split() function splits a string wherever the specified regular expression pattern matches. The pattern [\n] is a character class containing only the newline character, so it matches a single newline and behaves the same as the plain pattern \n. This is useful when we want to break a string into lines.
Example
Let's assume we have a multi-line string and want to break it into a list of individual lines. Using the re.split() function with the [\n] pattern, we match every newline character and break the text at those points. Leading spaces on a line are preserved, because only the newline itself is removed.
import re

s = """I find
 Tutorialspoint
 useful"""
result = re.split(r"[\n]", s)
print(result)
Following is the output of the above code -
['I find', ' Tutorialspoint', ' useful']
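As a small supplementary sketch (the sample string is invented for illustration), the following compares [\n] with the plain pattern \n and with the built-in str.splitlines() method. It also shows that consecutive or trailing newlines produce empty strings when splitting on a single newline.

```python
import re

# Hypothetical sample: one blank line in the middle and a trailing newline.
text = "alpha\nbeta\n\ngamma\n"

with_class = re.split(r"[\n]", text)   # character class form
with_plain = re.split(r"\n", text)     # plain newline form
with_method = text.splitlines()        # built-in alternative

print(with_class)   # ['alpha', 'beta', '', 'gamma', '']
print(with_plain)   # ['alpha', 'beta', '', 'gamma', '']
print(with_method)  # ['alpha', 'beta', '', 'gamma']
```

The two regex forms give identical results, while splitlines() does not add an empty string for the trailing newline.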
Splitting on One or More Newlines Using the Quantifier \n+
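To split on several newlines in a row, use the pattern \n+ with the re.split() function. The quantifier + means one or more occurrences of the preceding character or group, so a run of consecutive newlines is treated as a single separator. Note that the quantifier must be placed outside any square brackets: [\n+] is a character class that matches a single newline or a literal plus sign.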
Example
The following example demonstrates how to split a string containing runs of newlines into a list of segments using the re.split() function with the \n+ pattern.
import re

s = """First paragraph.

Second paragraph.

Third paragraph."""
result = re.split(r"\n+", s)
print(result)
Following is the output of the above code -
['First paragraph.', 'Second paragraph.', 'Third paragraph.']
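The difference between \n+ and the character class [\n+] is easy to overlook, so here is a short sketch of both on the same input. The sample string is an assumption chosen to contain a plus sign and a double newline.

```python
import re

# Hypothetical sample containing a literal '+' and two consecutive newlines.
text = "a+b\n\nc"

# Quantifier outside brackets: one or more newlines form a single separator.
by_quantifier = re.split(r"\n+", text)

# Character class: splits on every single newline AND on every '+'.
by_char_class = re.split(r"[\n+]", text)

print(by_quantifier)  # ['a+b', 'c']
print(by_char_class)  # ['a', 'b', '', 'c']
```

Inside square brackets the plus sign loses its meaning as a quantifier, which is why the second call splits on the literal + and leaves an empty string between the two newlines.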
Splitting on Newlines with Whitespace Using re.split(r"\n\s*\n", text)
The regular expression r"\n\s*\n" splits a string into parts at blank lines, that is, wherever two newlines are separated by nothing or only by whitespace. Following is the breakdown of the pattern.
- \n: This matches a newline character.
- \s*: \s matches a whitespace character (space, tab, newline, etc.), and * means "zero or more occurrences" of the preceding character or group.
- \n: This matches another newline character.
Example
The following example demonstrates how to split on blank lines, including lines that contain only whitespace, by using the r"\n\s*\n" pattern.
import re

# The line between "Line 1" and " Line 2" contains only spaces.
s = "Line 1\n  \n Line 2 with blank space above\n\nLine 3"
result = re.split(r"\n\s*\n", s)
print(result)
Following is the output of the above program -
['Line 1', ' Line 2 with blank space above', 'Line 3']
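As the output above shows, splitting on blank lines leaves any leading or trailing whitespace inside each part. One possible way to combine the split with stripping is sketched below; the sample text and the variable names are assumptions made for this illustration.

```python
import re

# Hypothetical text: paragraphs separated by blank or whitespace-only lines.
text = "  Intro line\n \t\nBody line one\nBody line two\n\n\nClosing line \n"

# Split wherever two newlines are separated by optional whitespace.
paragraphs = re.split(r"\n\s*\n", text)

# Strip surrounding whitespace from each part and drop empty parts.
cleaned = [p.strip() for p in paragraphs if p.strip()]

print(paragraphs)  # ['  Intro line', 'Body line one\nBody line two', 'Closing line \n']
print(cleaned)     # ['Intro line', 'Body line one\nBody line two', 'Closing line']
```

The single newline inside the second paragraph is kept, because the pattern only matches when a second newline follows the first.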