Python Regular Expression Syntax Explained Simply

In this blog post, you will learn about regular expressions (RegEx) and use Python's re module to work with them, with the help of examples.

A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. For example,

^a...s$

The code above defines a RegEx pattern. It matches any five-character string that starts with a and ends with s.

Python has a module named re to work with RegEx. Here's an example:

import re
pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)

if result:
   print("Search successful.")
else:
   print("Search unsuccessful.")
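As a quick check of what the pattern accepts and rejects, the sketch below runs it against a few sample strings. The strings in candidates are made up for illustration.

```python
import re

pattern = r'^a...s$'
# Sample inputs chosen for illustration only
candidates = ['abyss', 'alias', 'abs', 'Alias', 'an abacus']

# re.match() returns a match object on success and None otherwise
matched = [s for s in candidates if re.match(pattern, s)]
print(matched)   # ['abyss', 'alias']
```

Only the two five-character strings that begin with a lowercase a and end with s pass; the others are too short, too long, or start with an uppercase letter.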

Commonly used functions in the re module

re.findall()

The re.findall() method returns a list of strings containing all matches.

Example

Program to extract numbers from a string

import re 
string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'
print("Entered String=",string)
result = re.findall(pattern, string)
print("The numbers in the above string",result)

Output

Entered String= hello 12 hi 89. Howdy 34
The numbers in the above string ['12', '89', '34']

If the pattern is not found, re.findall() returns an empty list.
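A minimal sketch of the no-match case, using a made-up input string that contains no digits:

```python
import re

# No digits appear in this string, so there is nothing to collect
result = re.findall(r'\d+', 'hello hi Howdy')
print(result)   # []

# An empty list is falsy, so it can be tested directly
if not result:
    print("No numbers found.")
```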

re.search()

The re.search() method takes two arguments: a pattern and a string. The method looks for the first location where the RegEx pattern produces a match with the string.

If the search is successful, re.search() returns a match object; if not, it returns None.

Syntax

match = re.search(pattern, string)

Example

import re
string = "Python is fun"
#check if 'Python' is at the beginning
match = re.search(r'\APython', string)
if match:
   print("pattern found inside the string")
else:
   print("pattern not found")

Output

pattern found inside the string

Here, match contains a match object.
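The sketch below illustrates both outcomes of re.search() and contrasts it with re.match(), which was used in the first example of this post. The pattern 'fun' is chosen only for illustration.

```python
import re

string = "Python is fun"

# re.search() scans the whole string for the first match
found = re.search(r'fun', string)
print(found.group(), found.start())   # fun 10

# re.match() only tries at the beginning, so the same pattern fails here
at_start = re.match(r'fun', string)
print(at_start)                       # None

# A failed search also returns None
anchored = re.search(r'\Afun', string)
print(anchored)                       # None
```

Because a failed search returns None, it is safest to test the result before calling methods such as group() on it.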

re.subn()

re.subn() is similar to re.sub(), except that it returns a tuple of two items: the new string and the number of substitutions made.

Example

#Program to remove all whitespaces
import re
#multiline string
string = 'abc 12\
de 23 \n f45 6'
print("Original String =",string)
#matches all whitespace characters
pattern = r'\s+'
#empty string
replace = ''
new_string = re.subn(pattern, replace, string)
print("New String=",new_string)

Output

Original String = abc 12de 23 
 f45 6
New String= ('abc12de23f456', 4)
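Since the result is a tuple, it can be unpacked into two names in one step. This is a small sketch with a made-up input string; the variable names are illustrative.

```python
import re

string = 'abc 12 de 23 f45 6'

# Unpack the (new_string, number_of_substitutions) tuple directly
new_string, count = re.subn(r'\s+', '', string)
print(new_string)   # abc12de23f456
print(count)        # 5
```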

re.split()

re.split() splits the string wherever the pattern matches and returns a list of the resulting strings.

Example

import re
string = 'Twelve:12 Eighty nine:89.'
pattern = r'\d+'
result = re.split(pattern, string)
print(result)

Output

['Twelve:', ' Eighty nine:', '.']

If the pattern is not found, re.split() returns a list containing the original string.

You can pass the maxsplit argument to re.split(). It is the maximum number of splits that will occur.

Example

import re

string = 'Twelve:12 Eighty nine:89 Nine:9.'
pattern = r'\d+'

# maxsplit = 1
# split only at the first occurrence
result = re.split(pattern, string, maxsplit=1)
print(result)

Output

['Twelve:', ' Eighty nine:89 Nine:9.']

The default value of maxsplit is 0, which means all possible splits are made.
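To compare the effect of different limits side by side, the sketch below splits the same string with maxsplit set to 0, 1 and 2. The loop variable name limit is illustrative.

```python
import re

string = 'Twelve:12 Eighty nine:89 Nine:9.'

# 0 means no limit; 1 and 2 stop after that many splits
for limit in (0, 1, 2):
    print(limit, re.split(r'\d+', string, maxsplit=limit))

# 0 ['Twelve:', ' Eighty nine:', ' Nine:', '.']
# 1 ['Twelve:', ' Eighty nine:89 Nine:9.']
# 2 ['Twelve:', ' Eighty nine:', ' Nine:9.']
```

Once the limit is reached, the rest of the string is returned unsplit as the last element of the list.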

re.sub()

The syntax of re.sub() is:

re.sub(pattern, replace, string)

The method returns a string in which the matched occurrences are replaced with the content of the replace variable.

Example

#Program to remove all whitespaces
import re
#multiline string
string = 'abc 12\
de 23 \n f45 6'
#matches all whitespace characters
pattern = r'\s+'
#empty string
replace = ''
new_string = re.sub(pattern, replace, string)
print(new_string)

Output

abc12de23f456

If the pattern is not found, re.sub() returns the original string.

You can pass count as a fourth parameter to re.sub(). If omitted, it defaults to 0, which replaces all occurrences.

Example

import re
#multiline string
string = "abc 12\
de 23 \n f45 6"
#matches all whitespace characters
pattern = r'\s+'
replace = ''
#count=1 replaces only the first match
new_string = re.sub(pattern, replace, string, count=1)
print(new_string)

Output

abc12de 23 
 f45 6

Match object

You can list the methods and attributes of a match object using the dir() function.

Some of the commonly used methods and attributes of match objects are:

match.group()

The group() method returns the part of the string where there is a match.

Example

import re
string = '39801 356, 2102 1111'
#Three digit number followed by space followed by two digit number
pattern = r'(\d{3}) (\d{2})'
#match variable contains a Match object.
match = re.search(pattern, string)
if match:
   print(match.group())
else:
   print("pattern not found")

Output

801 35

Here, the match variable contains a match object.

Our pattern (\d{3}) (\d{2}) has two subgroups, (\d{3}) and (\d{2}). You can get the part of the string matched by each of these parenthesized subgroups. Here's how:

>>> match.group(1)
'801'

>>> match.group(2)
'35'

>>> match.group(1, 2)
('801', '35')

>>> match.groups()
('801', '35')
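The interactive calls above can also be written as a short script. This is a sketch that reuses the same pattern and string; the names first and second are illustrative.

```python
import re

match = re.search(r'(\d{3}) (\d{2})', '39801 356, 2102 1111')

if match:
    # group(0) is the whole match, the same as group()
    print(match.group(0))      # 801 35
    # groups() returns every subgroup, so it can be unpacked
    first, second = match.groups()
    print(first, second)       # 801 35
```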

match.start(), match.end() and match.span()

The start() method returns the index at which the matched substring starts. Similarly, end() returns the index just past the end of the matched substring.

>>> match.start()
2

>>> match.end()
8

The span() method returns a tuple containing the start and end index of the matched part.

>>> match.span()
(2, 8)
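One way to see what these indices mean is to slice the original string with them. The sketch below reuses the pattern and string from the group() example.

```python
import re

string = '39801 356, 2102 1111'
match = re.search(r'(\d{3}) (\d{2})', string)

start, end = match.span()
# end is exclusive, so slicing with the span gives back the matched text
print(string[start:end])                    # 801 35
print(string[start:end] == match.group())   # True

# start(), end() and span() also accept a group number
print(match.span(2))                        # (6, 8)
```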

match.re and match.string

The re attribute of a match object returns the regular expression object that produced the match. Similarly, the string attribute returns the string that was passed in.

>>> match.re
re.compile('(\\d{3}) (\\d{2})')

>>> match.string
'39801 356, 2102 1111'
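These two attributes make it possible to rerun the same pattern on the same input without keeping separate references to either. A small sketch, assuming the match from the group() example:

```python
import re

match = re.search(r'(\d{3}) (\d{2})', '39801 356, 2102 1111')

# match.re is the compiled pattern; its pattern attribute holds the source text
print(match.re.pattern)     # (\d{3}) (\d{2})

# The compiled pattern can be reused on the original string.
# With two subgroups in the pattern, findall() returns a list of tuples.
all_pairs = match.re.findall(match.string)
print(all_pairs)            # [('801', '35'), ('102', '11')]
```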

We have covered the commonly used functions defined in the re module. To learn more, see the documentation for Python 3's re module.

Using r prefix before RegEx

When the r or R prefix is used before a regular expression, the string is a raw string. For example, '\n' is a new line, whereas r'\n' means two characters: a backslash \ followed by n.

The backslash \ is used to escape various characters, including all metacharacters. However, with the r prefix, Python treats \ as a normal character and passes it to the regex engine unchanged.

Example

import re
string = '\n and \r are escape sequences.'
result = re.findall(r'[\n\r]', string)
print(result)

Output

['\n', '\r']
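The difference is easiest to see by comparing string lengths, and raw strings are especially handy when the pattern has to match a backslash itself. In the sketch below, the Windows-style path is a hypothetical value used only for illustration.

```python
import re

# '\n' is one character (a newline); r'\n' is two (backslash and n)
lengths = (len('\n'), len(r'\n'))
print(lengths)                        # (1, 2)

# To match a literal backslash, the regex itself needs an escaped backslash
path = r'C:\temp\notes.txt'           # hypothetical path, for illustration
parts = re.split(r'\\', path)
print(parts)                          # ['C:', 'temp', 'notes.txt']
```

Without the r prefix, the same pattern would have to be written as '\\\\', because Python and the regex engine each consume one level of escaping.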

Conclusion

These are the most fundamental regular expression concepts, explained here with examples. Some of the examples were made up, but most came from real problems we encountered while cleaning up our data. If you run into a similar problem in the future, review the examples again; you may find the solution there.

Md Waqar Tabish

Updated on: 2023-04-04T12:27:42+05:30
