Match Pattern Over Multiple Lines in Python



When working with regular expressions (regex) in Python, you may need to match text that spans multiple lines. This is common when you read data from a file or scrape content from a website.

This chapter will show you how to match patterns across several lines using Python's re module, which allows you to work with regular expressions (regex).

What is a Regular Expression?

A regular expression is a sequence of characters that defines a search pattern, which you can use to find a string or a set of strings. Regular expressions are also called regex. Here is a brief overview of some common regular expression symbols -

  • .: It is used to match any character except a newline.

  • *: It is used to match zero or more occurrences of the preceding element.

  • +: It is used to match one or more occurrences of the preceding element.

  • ?: It is used to match zero or one occurrence of the preceding element.

  • []: It is used to match any single character from the set.

  • (): It groups part of a pattern so that a quantifier can be applied to it or so that its match can be captured.
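The symbols listed above can be tried out on a small string. The following is a minimal sketch; the sample text and the variable names are made up for illustration and are not part of the original tutorial.

```python
import re

# Hypothetical sample text used only to illustrate the symbols.
sample = "cat cot cut ct caat"

# '.' matches exactly one character (except a newline) between 'c' and 't'.
any_char = re.findall(r"c.t", sample)

# '[]' matches one character from the set: here only 'a' or 'o'.
char_set = re.findall(r"c[ao]t", sample)

# '*' allows zero or more 'a' characters.
zero_or_more = re.findall(r"ca*t", sample)

# '+' requires at least one 'a'.
one_or_more = re.findall(r"ca+t", sample)

# '?' allows zero or one 'a'.
optional = re.findall(r"ca?t", sample)

# '()' groups "ab" so that '+' repeats the whole group.
grouped = re.search(r"(ab)+", "ababab!").group()

print(any_char)      # ['cat', 'cot', 'cut']
print(char_set)      # ['cat', 'cot']
print(zero_or_more)  # ['cat', 'ct', 'caat']
print(one_or_more)   # ['cat', 'caat']
print(optional)      # ['cat', 'ct']
print(grouped)       # ababab
```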

Handling Multi-line Strings

By default, the '.' special character does not match newline characters, so a pattern such as .* stops at the end of a line. The re module provides flags that change how special characters behave. With the re.DOTALL flag, '.' also matches newlines, which lets a pattern span multiple lines.
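To see the difference the flag makes, here is a small sketch that runs the same pattern with and without re.DOTALL. The sample string and variable names are illustrative assumptions. The inline form (?s) is an equivalent way to switch the flag on inside the pattern itself.

```python
import re

# Hypothetical three-line string.
text = "start\nmiddle\nend"

# Without a flag, '.' stops at the first newline, so nothing matches.
no_flag = re.search(r"start.*end", text)

# With re.DOTALL (short form: re.S), '.' also matches '\n'.
with_flag = re.search(r"start.*end", text, re.DOTALL)

# The inline flag (?s) has the same effect as passing re.DOTALL.
inline = re.search(r"(?s)start.*end", text)

print(no_flag)                    # None
print(with_flag.group() == text)  # True
print(inline.group() == text)     # True
```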

Example

In the following example, we match a whole paragraph element, including its tags, with a regular expression. We begin by importing the re module.

We then call re.search(), which scans the string for the first location where the pattern matches and returns a match object, or None if there is no match. The group() method returns the part of the string that was matched; group(0) is the entire match.

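import re

# A multi-line string; the triple-quoted literal keeps its newlines.
paragraph = '''
<p>
Tutorials point is a website.
It is a platform to enhance your skills.
</p>
'''
# re.DOTALL lets '.' match newlines, so '.*' can span several lines.
match = re.search(r'<p>.*</p>', paragraph, re.DOTALL)
print(match.group(0))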

The following output is obtained on executing the above program -

<p>
Tutorials point is a website.
It is a platform to enhance your skills.
</p>

More Complex Patterns

You can build more complex patterns as needed. For example, to find every line that starts with "It", continues with any text, and ends with a period, you can combine the ^ anchor with the re.M (re.MULTILINE) flag -

import re

multi_line_text = """Hello world.
It is a beautiful day.
This is a test.
It will work!"""

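# With re.M, ^ matches at the start of each line; \. matches a literal period.
pattern = r'(^It.*\.)'

# re.M is the short form of re.MULTILINE.
matches = re.findall(pattern, multi_line_text, re.M)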
print("Matches found:", matches)

The following output is obtained on executing the above program -

Matches found: ['It is a beautiful day.']  
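Flags can also be combined with the | operator when a pattern needs both behaviours. The sketch below is an illustration under assumed input: the BEGIN and END markers and the variable names are hypothetical. It uses re.M so that ^ and $ anchor to individual lines, and re.S so that '.' can cross the newlines between them.

```python
import re

# Hypothetical text with a block delimited by BEGIN and END lines.
text = "intro\nBEGIN\nline 1\nline 2\nEND\noutro"

# re.M: ^ and $ match at line boundaries.
# re.S: '.' matches newlines, so '.*?' can span the lines in between.
# '.*?' is non-greedy, so the match stops at the first END line.
pattern = r"^BEGIN$.*?^END$"
blocks = re.findall(pattern, text, re.M | re.S)

print(blocks)  # ['BEGIN\nline 1\nline 2\nEND']
```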

Using re.MULTILINE

The re.MULTILINE flag is useful when you want ^ and $ to match at the start and end of each line, rather than only at the start and end of the whole string.

import re

text = """Hello world.
It is a beautiful day.
It will work!"""

matches = re.findall(r"^It is.*", text, re.MULTILINE)
print(matches)

The following output is obtained on executing the above program -

['It is a beautiful day.']
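For comparison, the sketch below runs anchored patterns on the same text with and without the flag. The variable names are chosen for illustration only.

```python
import re

text = """Hello world.
It is a beautiful day.
It will work!"""

# Without re.MULTILINE, ^ matches only at the very start of the string,
# and the string starts with "Hello", so nothing is found.
without_flag = re.findall(r"^It.*$", text)

# With re.MULTILINE, ^ and $ match at the start and end of every line.
with_flag = re.findall(r"^It.*$", text, re.MULTILINE)

# $ can be used on its own terms too: lines that end with a period.
ends_with_period = re.findall(r"^.*\.$", text, re.MULTILINE)

print(without_flag)      # []
print(with_flag)         # ['It is a beautiful day.', 'It will work!']
print(ends_with_period)  # ['Hello world.', 'It is a beautiful day.']
```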

Match a multiline HTML block

You can also use the re.DOTALL flag to match a multiline HTML block, such as a div block that contains several paragraphs. In this example, the pattern <div>.*</div> matches the opening div tag, any content (including newlines), and the closing div tag. Because .* is greedy, the match extends to the last </div> in the string.

import re

html = """<div>
<p>Hello</p>
<p>Welcome</p>
</div>"""

pattern = r"<div>.*</div>"
match = re.search(pattern, html, re.DOTALL)
print(match.group())

Following is the output of the above program -

<div>
<p>Hello</p>
<p>Welcome</p>
</div>
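When the text contains more than one div block, a greedy .* combined with re.DOTALL matches from the first opening tag to the last closing tag. A non-greedy .*? is one way to match each block separately. The sketch below assumes a made-up two-block input; for real-world HTML, a dedicated parser is usually a more robust choice than regular expressions.

```python
import re

# Hypothetical input with two separate div blocks.
html = """<div>
<p>First</p>
</div>
<div>
<p>Second</p>
</div>"""

# Greedy: one match that runs from the first <div> to the last </div>.
greedy = re.findall(r"<div>.*</div>", html, re.DOTALL)

# Non-greedy: '.*?' stops at the nearest </div>, giving one match per block.
lazy = re.findall(r"<div>.*?</div>", html, re.DOTALL)

print(len(greedy))  # 1
print(len(lazy))    # 2
print(lazy[1])
```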
Updated on: 2025-04-24T13:09:14+05:30
