Regular Expression 01
Introduction
• If you want to search for 123 in 'upes123python', how would you do it?
• '123' in 'upes123python'
• Now suppose you also want to find the index where it occurs:
• 'upes123python'.find('123')
• 'upes123python'.index('123')
• In the above examples, matching is done character by character.
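A minimal sketch of the three fixed-substring approaches on the slide's example string. The substring 'xyz' is a hypothetical non-matching value, used only to show how find() and index() differ when nothing is found.

```python
text = 'upes123python'

# Membership test: True if '123' occurs anywhere in text
found = '123' in text

# find() returns the index of the first occurrence, or -1 if it is absent
pos_find = text.find('123')
missing_find = text.find('xyz')

# index() also returns the first index, but raises ValueError if absent
pos_index = text.index('123')
try:
    text.index('xyz')
    index_raised = False
except ValueError:
    index_raised = True

print(found, pos_find, missing_find, pos_index, index_raised)
# True 4 -1 4 True
```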
• But rather than searching for a fixed substring like '123', suppose
you wanted to determine whether a string contains any three
consecutive decimal digit characters, as in the strings
'upes123python', 'upes456python', 'upes789python234buzz', and
'upes123pythonbuzz678'. Character-by-character comparison
will not solve this problem.
• This is where regular expressions are used.
• With regexes in Python, you can identify patterns in a string that
you wouldn’t be able to find with the in operator or with string
methods.
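As a sketch of what such a pattern looks like, assuming [0-9]{3} (equivalently \d{3}) as the way to say "three consecutive decimal digits": re.search() returns the first run of three digits or None, and re.findall() returns every run. The string 'upespython' is a hypothetical non-matching example.

```python
import re

strings = ['upes123python', 'upes456python', 'upes789python234buzz',
           'upes123pythonbuzz678', 'upespython']

# [0-9]{3} means "three consecutive decimal digits"
pattern = r'[0-9]{3}'

first_runs = {}
for s in strings:
    m = re.search(pattern, s)              # first match object, or None
    first_runs[s] = m.group() if m else None
    print(f'{s!r:25} -> {first_runs[s]}')

# findall() collects every non-overlapping run, not just the first one
all_runs = re.findall(pattern, 'upes789python234buzz')
print(all_runs)   # ['789', '234']
```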
• A regular expression is a sequence of characters that forms a search
pattern.
• It can be used to check whether or not a string contains the specified
search pattern.
• Python provides a built-in module, re, for working with regular
expressions.
• match = re.method_name(pattern, string)
• For example, re.search() returns a match object if the search is
successful and None otherwise.
To use regular expressions, import Python's re module with the
following command:
import re
Raw strings
The difference between a normal string and a raw string is that escape sequences (such as \n, \t)
in a normal string literal are translated into the characters they stand for, while in a raw string
they are kept as they are. In the following example, \n inside str1 (a normal string) is translated
into a newline, so the rest of the text is printed on the next line. In str2, a raw string, it is
printed literally as \n.
Example: String vs Raw String
str1 = "Hello!\nHow are you?"
print("normal string:", str1)
str2 = r"Hello!\nHow are you?"
print("raw string:", str2)
Output
normal string: Hello!
How are you?
raw string: Hello!\nHow are you?
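Raw strings matter for regular expressions because many regex special sequences start with a backslash. In the short sketch below (the sample text "a cat sat" is made up), a normal string turns "\b" into the backspace character rather than the regex word-boundary sequence \b, so the same-looking pattern behaves differently with and without the r prefix.

```python
import re

normal = "\n"    # one character: a newline
raw = r"\n"      # two characters: a backslash and the letter n
print(len(normal), len(raw))   # 1 2

text = "a cat sat"
# In a normal string "\b" is a backspace, which never occurs in text
without_raw = re.search("\bcat\b", text)
# In a raw string \b reaches the regex engine as the word-boundary sequence
with_raw = re.search(r"\bcat\b", text)
print(without_raw)          # None
print(with_raw.group())     # cat
```

Recent Python versions also emit a warning for unrecognized escapes such as "\d" in a normal string, which is another reason regex patterns are usually written as raw strings.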
Metacharacters
. ^ $ * + ? { } [ ] \ | ( )
Pattern Description
[abc] Matches any one of the characters a, b, or c.
[a-c] Matches the same set of characters, expressed as a range.
The backslash (\) is an escaping metacharacter. Followed by various characters, it
signals various special sequences. To match a literal [ or \, precede it
with a backslash to remove its special meaning: \[ or \\.
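A small sketch of escaping on made-up sample strings: \[ and \] match literal brackets, and \\ in a raw-string pattern matches a single backslash.

```python
import re

text = "a[1] b[2] c"
# \[ and \] match literal square brackets; \d matches one digit
bracketed = re.findall(r"\[\d\]", text)
print(bracketed)   # ['[1]', '[2]']

path = "C:\\Users\\upes"   # the string itself contains single backslashes
# \\ in the pattern matches one literal backslash
parts = re.split(r"\\", path)
print(parts)       # ['C:', 'Users', 'upes']
```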
Metacharacter        Description
. (dot)              Matches any character except a newline.
^ (caret)            Matches the pattern only at the start of the string.
$ (dollar)           Matches the pattern at the end of the string.
* (asterisk)         Matches 0 or more repetitions of the preceding regex.
+ (plus)             Matches 1 or more repetitions of the preceding regex.
? (question mark)    Matches 0 or 1 repetition of the preceding regex.
[] (square brackets) Indicates a set of characters; matches any single character in the brackets.
                     For example, [abc] matches a, b, or c.
| (pipe)             Specifies alternatives. For example, P1|P2 matches either P1 or P2, where
                     P1 and P2 are two different regexes.
\ (backslash)        Escapes special characters or signals a special sequence. For example, to
                     search for one of the special characters, precede it with a \.
[^...]               Matches any single character not in the brackets.
(...)                Matches whatever regular expression is inside the parentheses. For example,
                     (abc) matches the substring 'abc'.
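The sketch below exercises the metacharacters from the table with re.findall() on a made-up sample string. Note that when a pattern contains a group, findall() returns the text captured by the group rather than the whole match.

```python
import re

s = "color colour cat cut cart"

demos = {
    r"c.t":     re.findall(r"c.t", s),      # . : any one character
    r"^color":  re.findall(r"^color", s),   # ^ : only at the start
    r"cart$":   re.findall(r"cart$", s),    # $ : only at the end
    r"colou*r": re.findall(r"colou*r", s),  # * : zero or more u's
    r"colou+r": re.findall(r"colou+r", s),  # + : one or more u's
    r"colou?r": re.findall(r"colou?r", s),  # ? : zero or one u
    r"c[au]t":  re.findall(r"c[au]t", s),   # [] : one character from the set
    r"cat|cut": re.findall(r"cat|cut", s),  # | : either alternative
    r"c(a|u)t": re.findall(r"c(a|u)t", s),  # () : group; findall returns group text
}
for pattern, found in demos.items():
    print(f"{pattern:10} -> {found}")
```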
Example
The caret sign (^) serves two purposes. Inside square brackets, at the start of a set such as [^A-Za-z0-9_ ], it
negates the set: the pattern matches any character that is not an uppercase letter, lowercase letter, digit,
underscore, or space. In short, it matches the special characters in the given string. Outside square brackets,
the caret simply anchors the match to the start of the string.
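A sketch of both uses of the caret. The set [^A-Za-z0-9_ ] is our reading of the pattern described above, and the sample text is made up.

```python
import re

text = "Hello, World! 2024_ok?"

# Inside [], a leading ^ negates the set: anything that is NOT
# a letter, digit, underscore or space counts as a special character
specials = re.findall(r"[^A-Za-z0-9_ ]", text)
print(specials)    # [',', '!', '?']

# Outside [], ^ anchors the pattern to the start of the string
starts_hello = bool(re.search(r"^Hello", text))
starts_world = bool(re.search(r"^World", text))
print(starts_hello, starts_world)   # True False
```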
You can also specify a range of characters using - inside square
brackets.
• [a-e] is the same as [abcde].
• [1-4] is the same as [1234].
• [0-9] is the same as [0123456789].
You can complement (invert) the character set by using the caret ^
symbol right after the opening square bracket.
• [^abc] means any character except a or b or c.
• [^0-9] means any non-digit character.
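A quick sketch of ranges and complemented sets, reusing the string 'upes123python' from the introduction plus a made-up string 'abcxyz':

```python
import re

s = "upes123python"

in_a_to_e = re.findall(r"[a-e]", s)       # same as [abcde]
in_1_to_4 = re.findall(r"[1-4]", s)       # same as [1234]
non_digits = re.findall(r"[^0-9]", s)     # every character that is not a digit
not_abc = re.findall(r"[^abc]", "abcxyz") # every character except a, b, c

print(in_a_to_e)    # ['e']
print(in_1_to_4)    # ['1', '2', '3']
print(non_digits)
print(not_abc)      # ['x', 'y', 'z']
```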
Other Special Sequences
Python's re module provides special sequences, a backslash followed by a character, that make
commonly used patterns easier to write. For example, \d matches any decimal digit, \w any word
character, and \s any whitespace character.
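A sketch exercising a few of the standard special sequences (\d, \w, \s, \S and \b) on a made-up sentence; it is a sample, not the full list.

```python
import re

s = "Room 42 is on floor 7."

digits = re.findall(r"\d+", s)             # \d: decimal digit
words = re.findall(r"\w+", s)              # \w: word character
spaces = re.findall(r"\s", s)              # \s: whitespace character
tokens = re.findall(r"\S+", s)             # \S: non-whitespace character
two_letter = re.findall(r"\b\w{2}\b", s)   # \b: word boundary

print(digits)        # ['42', '7']
print(words)         # ['Room', '42', 'is', 'on', 'floor', '7']
print(len(spaces))   # 5
print(tokens)        # ['Room', '42', 'is', 'on', 'floor', '7.']
print(two_letter)    # ['42', 'is', 'on']
```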
re.match() function
This function in the re module checks whether the specified pattern
matches at the beginning of the given string.
re.match(pattern, string)
This function returns None if no match can be found. If it succeeds,
it returns a match object containing information about the match:
where it starts and ends, the substring it matched, and so on.
>>> import re
>>> string="Simple is better than complex."
>>> obj=re.match("Simple", string)
>>> obj
<re.Match object; span=(0, 6), match='Simple'>
>>> obj.start()
0
>>> obj.end()
6
The match object's start() method returns the index where the match begins in the string,
and end() returns the index just past where it ends.
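A short sketch of the other match-object methods and of the None case, using the same example string: group() returns the matched text, span() returns (start, end), and a pattern that occurs only later in the string makes re.match() return None.

```python
import re

string = "Simple is better than complex."

obj = re.match("Simple", string)
print(obj.group(), obj.span())   # Simple (0, 6)

# re.match() only looks at the start of the string, so a pattern
# that occurs later does not match and None is returned
later = re.match("better", string)
print(later)                     # None

if later is None:
    print("No match at the beginning of the string")
```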
re.search():
This function scans the string for the first position where the pattern
matches, anywhere in the string, and returns a match object for that
first occurrence only (or None if the pattern does not occur).
>>> import re
>>> string="Simple is better than complex."
>>> obj=re.search("is", string)
>>> obj.start()
7
>>> obj.end()
9
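The contrast with re.match() on the same string can be sketched as follows; re.finditer() is used here to show every occurrence, since search() stops at the first one.

```python
import re

string = "Simple is better than complex."

at_start = re.match("is", string)    # match() only tries position 0
anywhere = re.search("is", string)   # search() scans the whole string

print(at_start)          # None
print(anywhere.span())   # (7, 9)

# search() stops at the first occurrence; finditer() yields every one
starts = [m.start() for m in re.finditer("e", string)]
print(starts)            # [5, 11, 14, 27]
```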
re.findall():
This function returns a list of all non-overlapping matches of the pattern in the string.
>>> import re
>>> string="Simple is better than complex."
>>> obj=re.findall("ple", string)
>>> obj
['ple', 'ple']
To obtain a list of all alphabetic characters from the string
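A minimal sketch, assuming "alphabetic characters" means the ASCII letters A-Z and a-z, so the spaces and the final period are dropped:

```python
import re

string = "Simple is better than complex."

# [A-Za-z] matches exactly one letter, so findall() returns single characters
letters = re.findall(r"[A-Za-z]", string)
print(letters)
print(len(letters))   # 25
```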
To obtain a list of words
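A minimal sketch, assuming a "word" is a run of word characters (\w+). The second sample string is made up to show that \w+ keeps digits and underscores, while [A-Za-z]+ keeps only letters.

```python
import re

string = "Simple is better than complex."

# \w+ matches a run of one or more word characters (letters, digits, _)
words = re.findall(r"\w+", string)
print(words)   # ['Simple', 'is', 'better', 'than', 'complex']

# On messier text the choice of character class matters
sample = "It's 2024, upes_python!"
word_runs = re.findall(r"\w+", sample)
letter_runs = re.findall(r"[A-Za-z]+", sample)
print(word_runs)     # ['It', 's', '2024', 'upes_python']
print(letter_runs)   # ['It', 's', 'upes', 'python']
```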
re.split():
>>> import re
>>> string="Simple is better than complex."
>>> obj=re.split(' ',string)
>>> obj
['Simple', 'is', 'better', 'than', 'complex.']
The string is split at each occurrence of a space ' ', returning a
list of substrings, each corresponding to a word. Note that the output is
similar to that of the split() method of the built-in str type.
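Where re.split() goes beyond str.split(' ') is when the separator is a pattern. A sketch on a made-up string with repeated spaces and a tab: \s+ treats any run of whitespace as one separator, and the maxsplit argument limits the number of splits.

```python
import re

string = "Simple   is better\tthan complex."

# str.split(' ') produces empty strings between repeated spaces
plain = string.split(' ')
print(plain)    # ['Simple', '', '', 'is', 'better\tthan', 'complex.']

# \s+ matches any run of spaces/tabs/newlines as a single separator
by_ws = re.split(r'\s+', string)
print(by_ws)    # ['Simple', 'is', 'better', 'than', 'complex.']

# maxsplit=2 stops after two splits and keeps the rest intact
limited = re.split(r'\s+', string, maxsplit=2)
print(limited)  # ['Simple', 'is', 'better\tthan complex.']
```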
re.sub():
This function returns a new string in which occurrences of a pattern are replaced by a substitute string.
Usage of this function is: re.sub(pattern, repl, string, count=0, flags=0), where repl is the replacement.
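A short sketch of re.sub() on the running example and on two made-up strings: a plain word replacement, collapsing whitespace runs into single spaces, and limiting the number of replacements with count.

```python
import re

string = "Simple is better than complex."
replaced = re.sub("complex", "complicated", string)
print(replaced)      # Simple is better than complicated.

messy = "upes   123\tpython"
collapsed = re.sub(r"\s+", " ", messy)   # every whitespace run -> one space
print(collapsed)     # upes 123 python

# count=2 replaces only the first two digits
masked = re.sub(r"\d", "#", "upes123python", count=2)
print(masked)        # upes##3python
```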
Example 1
Write a Python program that matches a string that has
an a followed by zero or more b's.
patterns = 'ab*'
• ? The question mark indicates zero or one occurrence of the
preceding element. For example, colou?r matches both "color"
and "colour".
import re

def text_match(text):
    patterns = 'ab*'
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("ac"))
print(text_match("abc"))
print(text_match("abbc"))

Output:
Found a match!
Found a match!
Found a match!
Example 2
Write a Python program that matches a string that has an a
followed by one or more b's.
patterns = 'ab+'
import re

def text_match(text):
    patterns = 'ab+'
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("ab"))
print(text_match("abc"))

Sample Output:
Found a match!
Found a match!
Example 3
Write a Python program that matches a string that has an a
followed by zero or one 'b'.
import re

def text_match(text):
    patterns = 'ab?'
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("ab"))
print(text_match("abc"))
print(text_match("abbc"))
print(text_match("aabbc"))

Output:
Found a match!
Found a match!
Found a match!
Found a match!
Example 4
Write a Python program that matches a string that has an a
followed by three b's.
patterns = 'ab{3}'
import re

def text_match(text):
    patterns = 'ab{3}'
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'
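A usage sketch for Example 4 with made-up inputs. Because re.search() looks for the pattern anywhere, a longer run such as "abbbb" would also match, since it contains "abbb".

```python
import re

def text_match(text):
    patterns = 'ab{3}'   # an 'a' followed by exactly three 'b's
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("abbb"))    # Found a match!
print(text_match("abbbc"))   # Found a match!
print(text_match("aabbc"))   # Not matched!
```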
Example 5
Write a Python program that matches a string that has
an a followed by two to three b's.
import re

def text_match(text):
    patterns = 'ab{2,3}'
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("ab"))
print(text_match("aabbbbbc"))

Output:
Not matched!
Found a match!
Example 6
Write a Python program to find sequences of lowercase letters
joined with an underscore.
patterns = '^[a-z]+_[a-z]+$'
import re

def text_match(text):
    patterns = '^[a-z]+_[a-z]+$'
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("aab_cbbbc"))
print(text_match("aab_Abbbc"))
print(text_match("Aaab_abbbc"))

Output:
Found a match!
Not matched!
Not matched!
Example 7
Write a Python program to find the sequences of one upper case
letter followed by lower case letters.
import re

def text_match(text):
    # [A-Z]+ accepts one or more uppercase letters before the lowercase run;
    # $ requires the lowercase run to reach the end of the string
    patterns = '[A-Z]+[a-z]+$'
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("AaBbGg"))
print(text_match("Python"))
print(text_match("python"))
print(text_match("PYTHON"))
print(text_match("aA"))
print(text_match("Aa"))

Output:
Found a match!
Found a match!
Not matched!
Not matched!
Not matched!
Found a match!
Example 8
Write a Python program that matches a string that has an 'a'
followed by anything, ending in 'b'.
patterns = 'a.*b$'
import re

def text_match(text):
    patterns = 'a.*b$'
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("aabbbbd"))
print(text_match("aabAbbbc"))
print(text_match("accddbbjjjb"))

Output:
Not matched!
Not matched!
Found a match!
Example 9
Write a Python program that matches a string that has an 'a'
followed by anything, ending in digits.
import re

def text_match(text):
    patterns = r'a.*\d$'   # raw string keeps the \d escape intact
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'
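A usage sketch for Example 9 with made-up inputs: the pattern needs an 'a' somewhere, then any characters, then a digit as the very last character.

```python
import re

def text_match(text):
    patterns = r'a.*\d$'   # 'a', then anything, then a digit at the end
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("abcdef123"))   # Found a match!
print(text_match("abcdef"))      # Not matched! (no digit at the end)
print(text_match("123abc"))      # Not matched! (ends in a letter)
```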
Example 10
Write a Python program that matches a word containing 'z'.
patterns = r'\w*z\w*'
A word character (\w) is a letter a-z or A-Z, a digit 0-9, or the _ (underscore) character. For str
patterns, Python's \w also matches other Unicode letters and digits.
import re

def text_match(text):
    patterns = r'\w*z\w*'   # any word characters, a 'z', then any word characters
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

Output:
Found a match!
Not matched!
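A usage sketch for Example 10 with made-up inputs. It also runs the pattern '\w*z.\w*' on "quiz": the extra . demands one more character after the z, so a word that ends in z is missed, while \w*z\w* finds it.

```python
import re

def text_match(text, patterns=r'\w*z\w*'):
    # \w* on each side allows any word characters around the 'z'
    if re.search(patterns, text):
        return 'Found a match!'
    else:
        return 'Not matched!'

print(text_match("The quick brown fox jumps over the lazy dog."))  # Found a match!
print(text_match("Python Exercises."))                              # Not matched!
print(text_match("quiz"))                                           # Found a match!
# The variant with '.' needs a character after the 'z':
print(text_match("quiz", r'\w*z.\w*'))                              # Not matched!
```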