PYTHON APPLICATION
PROGRAMMING -18EC646
MODULE-3
REGULAR EXPRESSIONS
PROF. KRISHNANANDA L
DEPARTMENT OF ECE
GSKSJTI, BENGALURU
WHAT IS MEANT BY
REGULAR EXPRESSION?
We have seen string/file slicing, searching and parsing, and
built-in methods such as split() and find().
Searching and extracting text has applications in
email classification, web searching, etc.
Python has a powerful standard-library module, re, for regular
expressions, which handles many of these tasks elegantly.
Regular expressions are like a small but powerful programming
language for matching text patterns. They provide a
standardized way of searching, replacing, and parsing text
with complex patterns of characters.
A regular expression can be defined as a sequence of
characters that is used to search for a pattern in a string.
FEATURES OF REGEX
Hundreds of lines of code can often be reduced to a few lines with regular
expressions
Used in constructing compilers, interpreters and text editors
Used to search and match text patterns
The power of regular expressions comes when we add special
characters to the search string that allow us to do sophisticated
matching and extraction with very little code.
Used to validate text data formats, especially input data
A Regular Expression (or Regex) is a pattern (or filter) that describes
the set of strings that match it. A regex consists of a
sequence of ordinary characters, metacharacters and special sequences
(such as . , \d , ? , \W etc.) and operators (such as + , * , ? , | , ^ ).
Popular programming languages such as Python, Perl, JavaScript, Ruby,
Tcl and C# have regex capabilities
GENERAL USES OF REGULAR
EXPRESSIONS
Search a string (search and match)
Replace parts of a string (sub)
Break a string into smaller pieces (split)
Find all occurrences of a pattern in a string (findall)
The re module provides regex support in a
Python program. It raises an exception (re.error) if the
regular expression itself is invalid.
Before using regular expressions in a program, we have to
import the module using "import re"
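As a quick illustration of the four uses listed above, here is a minimal sketch on an in-memory string. The sample text and the variable names are assumptions chosen for illustration only.

```python
import re

text = "hello and welcome to python class"

# search: is the pattern present anywhere in the string?
found = re.search("welcome", text) is not None
# sub: replace every match with another string
replaced = re.sub("python", "regex", text)
# split: break the string at each match (here, at each space)
pieces = re.split(" ", text)
# findall: list every non-overlapping match
all_l = re.findall("l", text)

print(found)
print(replaced)
print(pieces)
print(all_l)
```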
REGEX FUNCTIONS
The re module offers a set of functions:
FUNCTION   DESCRIPTION
findall    Returns a list containing all matches of a pattern in
           the string
search     Returns a Match object if there is a match
           anywhere in the string
split      Returns a list where the string has been split at each
           match
sub        Replaces one or more matches in a string
           (substitutes another string)
match      Checks for a match only at the beginning of the string
           (an optional flags argument is accepted). It returns a
           Match object if a match is found, otherwise None.
EXAMPLE PROGRAM
• We open the file, loop through
each line, and use the regular
expression function search() to print
only the lines that contain the string
"hello". (The same can be done using
line.find().)
# Search for lines that contain 'hello'
import re
fp = open('d:/18ec646/demo1.txt')
for line in fp:
    line = line.rstrip()
    if re.search('hello', line):
        print(line)
Output:
hello and welcome to python class
hello how are you?
# Search for lines that contain 'hello'
import re
fp = open('d:/18ec646/demo2.txt')
for line in fp:
    line = line.rstrip()
    if re.search('hello', line):
        print(line)
Output:
friends,hello and welcome
hello,goodmorning
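The same filtering can be tried without a file on disk. The sketch below uses a hypothetical list of lines as a stand-in for a demo file and compares re.search() with str.find(); the list contents are assumptions, not the actual file.

```python
import re

# Hypothetical stand-in for the contents of a demo file
lines = ["friends,hello and welcome", "good evening", "hello,goodmorning"]

# re.search returns a Match object (truthy) or None (falsy)
with_regex = [ln for ln in lines if re.search("hello", ln)]
# str.find returns -1 when the substring is absent
with_find = [ln for ln in lines if ln.find("hello") != -1]

print(with_regex)
print(with_regex == with_find)
```

For a fixed substring both approaches select the same lines; the regex version becomes useful once metacharacters are added to the pattern.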
EXAMPLE PROGRAM
• To get the full power of regex, we need to use special
characters called 'metacharacters'
# Search for lines that start with 'hello'
import re
fp = open('d:/18ec646/demo1.txt')
for line in fp:
    line = line.rstrip()
    if re.search('^hello', line):  ## note the 'caret' metacharacter before hello
        print(line)
Output:
hello and welcome to python class
hello how are you?
# Search for lines that start with 'hello'
import re
fp = open('d:/18ec646/demo2.txt')
for line in fp:
    line = line.rstrip()
    if re.search('^hello', line):  ## note the 'caret' metacharacter before hello
        print(line)
Output:
hello, goodmorning
METACHARACTERS
Metacharacters are characters that are interpreted in a
special way by a RegEx engine.
Metacharacters are very helpful for parsing/extraction
from the given file/string
Metacharacters allow us to build more powerful regular
expressions.
Table-1 provides a summary of metacharacters and their
meaning in RegEx
Here's a list of metacharacters:
[ ] . ^ $ * + ? { } ( ) \ |
Metacharacter   Description                                     Example
[ ]             Represents a set of characters.                 "[a-z]"
\               Signals a special sequence (can also be         "\r"
                used to escape special characters).
.               Matches any single character at that            "Ja...v."
                position (except the newline character).
^               Matches the pattern at the beginning            "^python"
                of the string ("starts with").
$               Matches the pattern at the end of the           "world$"
                string ("ends with").
*               Zero or more occurrences of the                 "hello*"
                preceding pattern.
+               One or more occurrences of the                  "hello+"
                preceding pattern.
{}              The specified number of occurrences of          "hello{2}"
                the preceding pattern.
|               Either the pattern on the left or the           "hello|hi"
                pattern on the right.
()              Capture and group
[ ] - SQUARE BRACKETS
• Square brackets specify a set of characters you wish to match.
• A set is a group of characters given inside a pair of square brackets; the set
as a whole matches any one of its members.
[abc]       Matches if the string contains any of the specified
            characters (a, b or c).
[a-n]       Matches if the string contains any character from a to n.
[^arn]      Matches if the string contains any character other than a, r and n.
[0123]      Matches if the string contains any of the specified digits.
[0-9]       Matches if the string contains any digit from 0 to 9.
[0-5][0-9]  Matches if the string contains any two-digit number from 00 to 59.
[a-zA-Z]    Matches if the string contains any letter (lower-case or
            upper-case).
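A few of the sets from the table can be checked with findall(). This is only a sketch; the sample strings are assumptions chosen for illustration.

```python
import re

# any of the listed characters
abc = re.findall("[abc]", "back")
# ^ inside the brackets negates the set
not_arn = re.findall("[^arn]", "rain")
# two sets in a row: a digit 0-5 followed by a digit 0-9
two_digit = re.findall("[0-5][0-9]", "time 23:59:60")
# no letter at all in this string
has_letter = re.search("[a-zA-Z]", "1234") is not None

print(abc)         # ['b', 'a', 'c']
print(not_arn)     # ['i']
print(two_digit)   # ['23', '59']
print(has_letter)  # False
```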
CONTD..
### illustrating square brackets
## search for all the lines where w is present and display them
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("[w]", line):
        print(line)
Output:
Hello and welcome
@abhishek,how are you
### illustrating square brackets
### search for lines containing g or e (or both) and display them
import re
fh = open('d:/18ec646/demo3.txt')
for line in fh:
    line = line.rstrip()
    if re.search("[ge]", line):
        print(line)
Output:
Hello and welcome
This is Bangalore
CONTD…
### illustrating square brackets
### search for lines containing t or h
import re
fh = open('d:/18ec646/demo3.txt')
for line in fh:
    line = line.rstrip()
    if re.search("[th]", line):
        print(line)
Output:
This is Bangalore
This is Paris
This is London
### illustrating square brackets
import re
fh = open('d:/18ec646/demo7.txt')
for line in fh:
    line = line.rstrip()
    if re.search("[y]", line):
        print(line)
Output:
johny johny yes papa
open your mouth
### illustrating square brackets
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("[x-z]", line):
        print(line)
Output:
to:abhishek@yahoo.com
@abhishek,how are you
. PERIOD (DOT)
A period matches any single character (except the newline '\n')
Expression     String   Matched?
..             a        No match
(any two       ac       1 match
characters)    acd      1 match
               acde     2 matches (contains 4 characters)
### illustrating dot metacharacter
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("y.", line):
        print(line)
Output:
to: abhishek@yahoo.com
@abhishek,how are you
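The match counts in the dot table can be reproduced with findall(), which returns non-overlapping matches from left to right. A minimal sketch (the dictionary name is an assumption for illustration):

```python
import re

# ".." consumes two characters at a time, so matches never overlap
results = {s: re.findall("..", s) for s in ["a", "ac", "acd", "acde"]}
for s, found in results.items():
    print(s, found)

# By default the dot does not match a newline character
across_newline = re.findall("a.c", "a\nc")
print(across_newline)   # []
```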
CONTD..
### illustrating dot metacharacter
import re
fh = open('d:/18ec646/demo3.txt')
for line in fh:
    line = line.rstrip()
    if re.search("P.", line):
        print(line)
Output:
This is Paris
### illustrating dot metacharacter
## any two characters between T and s
import re
fh = open('d:/18ec646/demo6.txt')
for line in fh:
    line = line.rstrip()
    if re.search("T..s", line):
        print(line)
Output:
This is London
These are beautiful flowers
Thus we see the great London bridge
### illustrating dot metacharacter
import re
fh = open('d:/18ec646/demo6.txt')
for line in fh:
    line = line.rstrip()
    if re.search("L..d", line):
        print(line)
Output:
This is London
Thus we see the great London bridge
^ - CARET
The caret symbol ^ is used to check whether a string starts with a certain
character or pattern
Expression   String   Matched?
^a           a        1 match
             abc      1 match
             bac      No match
^ab          abc      1 match
             acb      No match (starts with a but not followed by b)
### illustrating caret
import re
fh = open('d:/18ec646/demo2.txt')
for line in fh:
    line = line.rstrip()
    if re.search("^h", line):
        print(line)
Output:
hello, goodmorning
### illustrating caret
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("^f", line):
        print(line)
Output:
from:krishna.sksj@gmail.com
$ - DOLLAR
The dollar symbol $ is used to check whether a string ends with a certain
character or pattern.
Expression   String    Matched?
a$           a         1 match
             formula   1 match
             cab       No match
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("m$", line):
        print(line)
Output:
from:krishna.sksj@gmail.com
to: abhishek@yahoo.com
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo7.txt')
for line in fh:
    line = line.rstrip()
    if re.search("papa$", line):
        print(line)
Output:
johny johny yes papa
eating sugar no papa
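The caret and dollar tables can be checked together on a small word list. This is a sketch; the list and variable names are assumptions for illustration.

```python
import re

words = ["a", "abc", "bac", "formula", "cab"]

# ^a : the string must begin with a
starts_with_a = [w for w in words if re.search("^a", w)]
# a$ : the string must end with a
ends_with_a = [w for w in words if re.search("a$", w)]

print(starts_with_a)   # ['a', 'abc']
print(ends_with_a)     # ['a', 'formula']
```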
* - STAR
The star symbol * matches zero or more occurrences of the pattern to its
left.
Expression   String   Matched?
ma*n         mn       1 match
             man      1 match
             maaan    1 match
             main     No match (a is not followed by n)
### illustrating metacharacters
## here * applies only to the final n: "Londo" followed by zero or more n
import re
fh = open('d:/18ec646/demo6.txt')
for line in fh:
    line = line.rstrip()
    if re.search("London*", line):
        print(line)
Output:
This is London
Thus we see the great London bridge
+ - PLUS
The plus symbol + matches one or more occurrences of the pattern to its
left.
Expression   String   Matched?
ma+n         mn       No match (no a character)
             man      1 match
             maaan    1 match
             main     No match (a is not followed by n)
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo6.txt')
for line in fh:
    line = line.rstrip()
    if re.search("see+", line):
        print(line)
Output:
Thus we see the great London bridge
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo6.txt')
for line in fh:
    line = line.rstrip()
    if re.search("ar+", line):
        print(line)
Output:
These are beautiful flowers
? - QUESTION MARK
The question mark symbol ? matches zero or one occurrence of the pattern to its
left.
Expression   String   Matched?
ma?n         mn       1 match
             man      1 match
             maaan    No match (more than one a character)
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("@gmail?", line):
        print(line)
Output:
from:krishna.sksj@gmail.com
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("you?", line):
        print(line)
Output:
@abhishek,how are you
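The three quantifiers *, + and ? differ only in how many repetitions they allow. The sketch below runs the patterns from the tables against the same words; the dictionary name is an assumption for illustration.

```python
import re

words = ["mn", "man", "maaan", "main"]

# For each pattern, collect the words in which it is found
matched = {}
for pattern in ["ma*n", "ma+n", "ma?n"]:
    matched[pattern] = [w for w in words if re.search(pattern, w)]
    print(pattern, matched[pattern])
```

Here "main" never matches, because in every case the run of a characters must be followed directly by n.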
{} - BRACES
Finds the specified number of occurrences of a pattern. Consider {n,m}: this
means at least n and at most m repetitions of the pattern to its left.
If a{2} is given, a must be repeated exactly twice.
Expression   String        Matched?
a{2,3}       abc dat       No match
             abc daat      1 match (at daat)
             aabc daaat    2 matches (at aabc and daaat)
             aabc daaaat   2 matches (at aabc and daaaat)
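The braces table can be verified with findall(). A minimal sketch, with variable names assumed for illustration:

```python
import re

strings = ["abc dat", "abc daat", "aabc daaat", "aabc daaaat"]

# a{2,3}: two or three consecutive a characters (as many as possible, up to 3)
found = {s: re.findall("a{2,3}", s) for s in strings}
for s in strings:
    print(repr(s), found[s])

# a{2}: exactly two a characters per match
exactly_two = re.findall("a{2}", "daaaat")
print(exactly_two)   # ['aa', 'aa']
```

In the last table row, the run of four a characters yields one match of three; the single a left over is too short to match again.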
| - ALTERNATION
The vertical bar | is used for alternation (the "or" operator).
Expression   String   Matched?
a|b          cde      No match
             ade      1 match (at ade)
             acdbea   3 matches (a, b and a in acdbea)
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo7.txt')
for line in fh:
    line = line.rstrip()
    if re.search("yes|no", line):
        print(line)
Output:
johny johny yes papa
eating sugar no papa
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo2.txt')
for line in fh:
    line = line.rstrip()
    if re.search("hello|how", line):
        print(line)
Output:
friends,hello and welcome
hello,goodmorning
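A short sketch of alternation with findall(); the sample strings are taken from the table and the nursery-rhyme lines, and the variable names are assumptions.

```python
import re

# a|b matches either single character
single = {s: re.findall("a|b", s) for s in ["cde", "ade", "acdbea"]}
print(single)

# Alternation also works between whole words
answers = re.findall("yes|no", "johny johny yes papa, eating sugar no papa")
print(answers)   # ['yes', 'no']
```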
() - GROUP
Parentheses () are used to group sub-patterns.
For example, (a|b|c)xz matches any string that contains
a or b or c followed by xz
Expression   String      Matched?
(a|b|c)xz    ab xz       No match
             abxz        1 match (bxz in abxz)
             axz cabxz   2 matches (axz, and bxz in cabxz)
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo5.txt')
for line in fh:
    line = line.rstrip()
    if re.search("(hello|how) are", line):
        print(line)
Output:
@abhishek,how are you
### illustrating metacharacters
import re
fh = open('d:/18ec646/demo2.txt')
for line in fh:
    line = line.rstrip()
    if re.search("(hello and)", line):
        print(line)
Output:
friends,hello and welcome
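Besides grouping, parentheses also capture the text they match. The sketch below shows one consequence worth knowing: when a pattern contains a single group, findall() returns the group contents rather than the whole match. The non-capturing form (?:...) is standard re syntax but is not covered on the slides, so treat it as an aside.

```python
import re

text = "axz cabxz"

m = re.search("(a|b|c)xz", text)
whole = m.group()     # the full match
first = m.group(1)    # only what the parentheses captured

# With one capturing group, findall returns the captured parts
captured = re.findall("(a|b|c)xz", text)
# (?: ) groups without capturing, so findall returns full matches
full = re.findall("(?:a|b|c)xz", text)

print(whole, first)   # axz a
print(captured)       # ['a', 'b']
print(full)           # ['axz', 'bxz']
```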
\ - BACKSLASH
The backslash \ is used to escape various characters, including all
metacharacters.
For example, \$a matches if a string contains $ followed by a.
Here, $ is not interpreted by the RegEx engine in a special way.
If you are unsure whether a punctuation character has a special meaning, you
can put \ in front of it. This makes sure the character is not treated
in a special way.
NOTE: Another way of doing this is to put the special
character inside square brackets [ ]
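Both ways of matching a literal $ can be compared directly. This is a minimal sketch; the sample text is an assumption, and re.escape() is a standard helper from the re module that is not mentioned on the slides.

```python
import re

text = "cost: $a and 5$"

# The backslash removes the special meaning of $
escaped = re.findall(r"\$a", text)
# Inside [ ] the $ is an ordinary character
in_set = re.findall("[$]a", text)

# re.escape() builds an escaped pattern from a literal string
literal = re.escape("a.b*c")
exact = re.findall(literal, "a.b*c axbbc")

print(escaped)   # ['$a']
print(in_set)    # ['$a']
print(exact)     # ['a.b*c']
```

Writing the pattern as a raw string (r"...") keeps Python from interpreting the backslash before the regex engine sees it.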
SPECIAL SEQUENCES
A special sequence is a \ followed by one of the characters
listed in the table below, and it has a special meaning
Special sequences make commonly used patterns easier to
write.
SPECIAL SEQUENCES
Character   Description                                        Example
\A          Matches if the specified characters are            r"\AThe"
            at the beginning of the string.
\b          Matches if the specified characters are            r"\bain"
            at the beginning or the end of a word.             r"ain\b"
\B          Matches if the specified characters are present    r"\Bain"
            but NOT at the beginning (or the end) of a word.   r"ain\B"
\d          Matches if the string contains a digit [0-9].      r"\d"
\D          Matches if the string contains any character       r"\D"
            that is not a digit.
\s          Matches if the string contains any white           r"\s"
            space character.
\S          Matches if the string contains any                 r"\S"
            non-white-space character.
\w          Matches if the string contains any word            r"\w"
            character (A to Z, a to z, 0 to 9 and underscore).
\W          Matches if the string contains any                 r"\W"
            non-word character.
\A - Matches if the specified characters are at the start of a string.
Expression   String       Matched?
\Athe        the sun      Match
             In the sun   No match
\b - Matches if the specified characters are at the beginning or end of a word
Expression   String          Matched?
\bfoo        football        Match
             a football      Match
             afootball       No match
foo\b        football        No match
             the afoo test   Match
             the afootest    No match
\B - Opposite of \b. Matches if the specified characters
are not at the beginning or end of a word.
Expression   String          Matched?
\Bfoo        football        No match
             a football      No match
             afootball       Match
foo\B        the foo         No match
             the afoo test   No match
             the afootest    Match
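The word-boundary tables can be reproduced as below. This is a sketch with assumed variable names; it also shows why raw strings matter here, since in an ordinary Python string "\b" means the backspace character.

```python
import re

strings = ["football", "a football", "afootball"]

# \bfoo : foo must start at a word boundary
at_boundary = [bool(re.search(r"\bfoo", s)) for s in strings]
# \Bfoo : foo must NOT start at a word boundary
not_at_boundary = [bool(re.search(r"\Bfoo", s)) for s in strings]

# Without the r prefix, "\b" is a backspace, so this finds nothing
without_raw = re.search("\bfoo", "football")

print(at_boundary)       # [True, True, False]
print(not_at_boundary)   # [False, False, True]
print(without_raw)       # None
```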
\d - Matches any decimal digit. Equivalent to [0-9]
\D - Matches any character that is not a decimal digit. Equivalent to [^0-9]
Expression   String     Matched?
\d           12abc3     3 matches (1, 2 and 3 in 12abc3)
             Python     No match
Expression   String     Matched?
\D           1ab34"50   3 matches (a, b and " in 1ab34"50)
             1345       No match
\s - Matches any whitespace
character. Equivalent to [ \t\n\r\f\v].
\S - Matches any non-whitespace
character. Equivalent to [^ \t\n\r\f\v].
Expression   String          Matched?
\s           Python RegEx    1 match
             PythonRegEx     No match
Expression   String          Matched?
\S           a b             2 matches (a and b)
             (only spaces)   No match
\w - Matches any alphanumeric character. Equivalent to [a-zA-Z0-9_].
The underscore is also treated as an alphanumeric character
\W - Matches any non-alphanumeric character. Equivalent
to [^a-zA-Z0-9_]
Expression   String    Matched?
\w           12&":;c   3 matches (1, 2 and c in 12&":;c)
             %"> !     No match
Expression   String    Matched?
\W           1a2%c     1 match (% in 1a2%c)
             Python    No match
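The six character-class sequences can be checked against the strings used in the tables. A minimal sketch with assumed variable names:

```python
import re

digits = re.findall(r"\d", "12abc3")
non_digits = re.findall(r"\D", '1ab34"50')
spaces = re.findall(r"\s", "Python RegEx")
non_spaces = re.findall(r"\S", "a b")
word_chars = re.findall(r"\w", '12&":;c')
non_word_chars = re.findall(r"\W", "1a2%c")

print(digits)          # ['1', '2', '3']
print(non_digits)      # ['a', 'b', '"']
print(spaces)          # [' ']
print(non_spaces)      # ['a', 'b']
print(word_chars)      # ['1', '2', 'c']
print(non_word_chars)  # ['%']
```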
\Z - Matches if the specified characters are at the end of a
string.
Expression   String                      Matched?
Python\Z     I like Python               1 match
             I like Python Programming   No match
             Python is fun.              No match
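\Z looks similar to $, but the two differ when the string ends with a newline. The sketch below illustrates this with assumed sample strings.

```python
import re

end_ok = bool(re.search(r"Python\Z", "I like Python"))
end_fail = bool(re.search(r"Python\Z", "Python is fun."))

# $ also matches just before a trailing newline; \Z does not
dollar_newline = bool(re.search(r"Python$", "I like Python\n"))
z_newline = bool(re.search(r"Python\Z", "I like Python\n"))

print(end_ok, end_fail)            # True False
print(dollar_newline, z_newline)   # True False
```

This is one reason the file-based examples call rstrip() on each line before testing it.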
# check whether the specified
# characters are at the end of the string
import re
fp = open('d:/18ec646/demo5.txt')
for x in fp:
    x = x.rstrip()
    if re.findall(r"com\Z", x):
        print(x)
Output:
from:krishna.sksj@gmail.com
to: abhishek@yahoo.com
THE FINDALL() FUNCTION
The findall() function returns a list containing all matches.
The list contains the matches in the order they are found.
If no matches are found, an empty list is returned
Here is the syntax for this function −
re.findall(pattern, string, flags=0)
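A small sketch of the points above, including the optional flags argument. The sample text is an assumption; re.IGNORECASE is a standard flag of the re module.

```python
import re

text = "How are you. how is everything?"

case_sensitive = re.findall("how", text)
# flags=re.IGNORECASE makes the match case-insensitive
any_case = re.findall("how", text, flags=re.IGNORECASE)
# no match at all gives an empty list, not None
nothing = re.findall("why", text)

print(case_sensitive)   # ['how']
print(any_case)         # ['How', 'how']
print(nothing)          # []
```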
import re
text = "How are you. How is everything?"  # 'text' avoids shadowing the built-in str
matches = re.findall("How", text)
print(matches)
Output:
['How', 'How']
# check whether the string starts with How
import re
text = "How are you. How is everything?"
x = re.findall("^How", text)
print(text)
print(x)
if x:
    print("string starts with 'How'")
else:
    print("string does not start with 'How'")
Output:
How are you. How is everything?
['How']
string starts with 'How'
# match all lines that start with 'hello'
import re
fp = open('d:/18ec646/demo1.txt')
for x in fp:
    x = x.rstrip()
    if re.findall('^hello', x):  ## note the 'caret' metacharacter
        print(x)
Output:
hello and welcome to python class
hello how are you?
# match all lines that start with '@'
import re
fp = open('d:/18ec646/demo5.txt')
for x in fp:
    x = x.rstrip()
    if re.findall('^@', x):  ## note the 'caret' metacharacter
        print(x)
Output:
@abhishek,how are you
# check whether the string contains
## non-digit characters
import re
fp = open('d:/18ec646/demo5.txt')
for x in fp:
    x = x.rstrip()
    if re.findall(r"\D", x):  ## special sequence
        print(x)
Output:
from:krishna.sksj@gmail.com
to:abhishek@yahoo.com
Hello and welcome
@abhishek,how are you
THE SEARCH() FUNCTION
The search() function searches the string for a match, and
returns a Match object if there is a match.
If there is more than one match, only the first occurrence
of the match will be returned
If no matches are found, the value None is returned
Here is the syntax for this function −
re.search(pattern, string, flags=0)
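The behaviour described above can be seen in a short sketch. The sample string and variable names are assumptions for illustration.

```python
import re

text = "How are you. How is everything?"

m = re.search("How", text)
# Only the first occurrence is reported, even though "How" appears twice
first_text = m.group()
first_index = m.start()

# No match gives None, so test the result before using it
missing = re.search("why", text)

print(first_text, first_index)   # How 0
print(missing)                   # None
```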
THE SPLIT() FUNCTION
The re.split method splits the string where there is a match
and returns a list of strings where the splits have occurred.
You can pass maxsplit argument to the re.split() method. It's
the maximum number of splits that will occur.
If the pattern is not found, re.split() returns a list containing
the original string.
Here is the syntax for this function −
re.split(pattern, string, maxsplit=0, flags=0)
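A minimal sketch of maxsplit and of the no-match case, using one of the e-mail lines from the examples as the sample string. The variable names are assumptions.

```python
import re

text = "from:krishna.sksj@gmail.com"

# Split at every colon or at-sign
all_parts = re.split("[:@]", text)
# maxsplit=1 stops after the first split
one_split = re.split("[:@]", text, maxsplit=1)
# Pattern not found: the list holds the original string
no_split = re.split(";", text)

print(all_parts)   # ['from', 'krishna.sksj', 'gmail.com']
print(one_split)   # ['from', 'krishna.sksj@gmail.com']
print(no_split)    # ['from:krishna.sksj@gmail.com']
```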
EXAMPLES on split() function:-
# split function
import re
fp = open('d:/18ec646/demo5.txt')
for x in fp:
    x = x.rstrip()
    x = re.split("@", x)
    print(x)
Output:
['from:krishna.sksj', 'gmail.com']
['to: abhishek', 'yahoo.com']
['Hello and welcome']
['', 'abhishek,how are you']
CONTD..
# split function
import re
fp = open('d:/18ec646/demo7.txt')
for x in fp:
    x = x.rstrip()
    x = re.split("e", x)
    print(x)
Output:
['johny johny y', 's papa']
['', 'ating sugar no papa']
['t', 'lling li', 's']
['op', 'n your mouth']
# split function
import re
fp = open('d:/18ec646/demo7.txt')
for x in fp:
    x = x.rstrip()
    x = re.split("papa", x)
    print(x)
Output:
['johny johny yes ', '']
['eating sugar no ', '']
['telling lies']
['open your mouth']
# split function
import re
fp = open('d:/18ec646/demo3.txt')
for x in fp:
    x = x.rstrip()
    x = re.split("is", x)
    print(x)
Output:
['Hello and welcome']
['Th', ' ', ' Bangalore']
['Th', ' ', ' Par', '']
['Th', ' ', ' London']
THE SUB() FUNCTION
The sub() function replaces the matches with the text of your
choice
You can control the number of replacements by specifying
the count parameter
If the pattern is not found, re.sub() returns the original string
Here is the syntax for this function −
re.sub(pattern, repl, string, count=0, flags=0)
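The effect of the count parameter and of a pattern that is not found can be sketched as follows. The sample string is an assumption for illustration.

```python
import re

text = "How are you. How is everything?"

# Default count=0 replaces every match
replace_all = re.sub("How", "Where", text)
# count=1 replaces only the first match
replace_one = re.sub("How", "Where", text, count=1)
# Pattern not found: the string comes back unchanged
unchanged = re.sub("why", "Where", text)

print(replace_all)        # Where are you. Where is everything?
print(replace_one)        # Where are you. How is everything?
print(unchanged == text)  # True
```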
EXAMPLES on sub() function:-
### illustration of substitute (replace)
import re
text = "How are you.How is everything?"
x = re.sub("How", "where", text)
print(x)
Output:
where are you.where is everything?
# sub function
import re
fp = open('d:/18ec646/demo3.txt')
for x in fp:
    x = x.rstrip()
    x = re.sub("This", "Where", x)
    print(x)
Output:
Hello and welcome
Where is Bangalore
Where is Paris
Where is London
THE MATCH() FUNCTION
If zero or more characters at the beginning of the string match
the regular expression, match() returns a corresponding Match object.
It returns None if the string does not match the pattern.
Here is the syntax for this function:
re.match(pattern, string, flags=0)
A compiled pattern object offers the same operation as a method:
Pattern.match(string[, pos[, endpos]])
The optional pos and endpos parameters have the same
meaning as for the Pattern.search() method.
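A minimal sketch of both forms, assuming a sample line from the earlier examples. re.compile() is the standard way to obtain a pattern object; the index 8 is simply where "Paris" starts in this particular string.

```python
import re

line = "This is Paris"

# re.match only looks at the beginning of the string
at_start = re.match("This", line)
not_at_start = re.match("Paris", line)
# re.search looks anywhere in the string
anywhere = re.search("Paris", line)

# With a compiled pattern, pos tells match() where to begin
pattern = re.compile("Paris")
from_pos = pattern.match(line, 8)

print(at_start.group())   # This
print(not_at_start)       # None
print(anywhere.span())    # (8, 13)
print(from_pos.span())    # (8, 13)
```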
search() Vs match()
Python offers two different primitive operations based on
regular expressions:
re.match() checks for a match only at the beginning of the string,
while re.search() checks for a match anywhere in the string
Eg:-
# match function
import re
fp = open('d:/18ec646/demo3.txt')
for x in fp:
    x = x.rstrip()
    if re.match("This", x):
        print(x)
Output:
This is Bangalore
This is Paris
This is London
MATCH OBJECT
A Match object is an object containing information about the
search and its result
If there is no match, the value None is returned instead
of a Match object
Some of the commonly used methods and attributes of match
objects are:
match.group(), match.start(), match.end(), match.span(),
match.string
match.group()
The group() method returns the part of the string where
there is a match
match.start(), match.end()
The start() method returns the index of the start of the
matched substring.
Similarly, end() returns the index just past the end of the matched
substring.
match.string
The string attribute returns the string that was passed to the function.
match.span()
The span() function returns a tuple containing start
and end index of the matched part.
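The methods and attributes above can be tried together in one sketch. The string and pattern below are assumptions chosen for illustration; with this input the matched word begins at index 12 and ends just before index 17.

```python
import re

text = "The rain in Spain"

# \bS\w+ : a word that starts with an upper-case S
m = re.search(r"\bS\w+", text)

print(m.group())           # Spain
print(m.start(), m.end())  # 12 17
print(m.span())            # (12, 17)
print(m.string)            # The rain in Spain
```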
Eg:-
OUTPUT:
(12,17)

Python regular expressions

  • 1.
    PYTHON APPLICATION PROGRAMMING -18EC646 MODULE-3 REGULAREXPRESSIONS PROF. KRISHNANANDA L DEPARTMEN T OF ECE GSKSJTI, BENGALURU
  • 2.
    WHAT IS MEANTBY REGULAR EXPRESSION? We have seen string/file slicing, searching, parsing etc and built-in methods like split, find etc. This task of searching and extracting finds applications in Email classification, Web searching etc. Python has a very powerful library called regularexpressions that handles many of these tasks quite elegantly Regular expressions are like small but powerful programming language, for matching text patterns and provide a standardized way of searching, replacing, and parsing text with complex patterns of characters. Regular expressions can be defined as the sequence of characters which are used to search for a pattern in a string. 2
  • 3.
    FEATURES OF REGEX Hundredsof lines of code could be reduced to few lines with regular expressions Used to construct compilers, interpreters and text editors Used to search and match text patterns The power of the regular expressions comes when we add special characters to the search string that allow us to do sophisticated matching and extraction with very little code. Used to validate text data formats especially input data ARegular Expression (or Regex) is a pattern (or filter) that describes a set of strings that matches the pattern. A regex consists of a sequence of characters, metacharacters (such as . , d , ?, W etc ) and operators (such as + , * , ? , | , ^ ). Popular programming languages like Python, Perl, JavaScript, Ruby, Tcl, C# etc have Regex capabilities 3
  • 4.
    GENERAL USES OFREGULAR EXPRESSIONS Search a string (search and match) Replace parts of a string(sub) Break string into small pieces(split) Finding a string (findall) The module re provides the support to use regex in the python program. The re module throws an exception if there is some error while using the regular expression. Before using the regular expressions in program, we have to import the library using “import re” 4
  • 5.
    REGEX FUNCTIONS The remodule offers a set of functions FUNCTION DESCRIPTION findall Returns a list containing all matches of a pattern in the string search Returns a match Object if there is a match anywhere in the string split Returns a list where the string has been split at each match sub Replaces one or more matches in a string (substitute with another string) match This method matches the regex pattern in the string with the optional flag. It returns true if a match is found in the string, otherwise it returns false. 5
  • 6.
    EXAMPLE PROGRAM • Weopen the file, loop through each line, and use the regular expression search() to only print out lines that contain the string “hello”. (same can be done using “line.find()” also) # Search for lines that contain ‘hello' import re fp = open('d:/18ec646/demo1.txt') for line in fp: line = line.rstrip() if re.search('hello', line): print(line) Output: hello and welcome to python class hello how are you? # Search for lines that contain ‘hello' import re fp = open('d:/18ec646/demo2.txt') for line in fp: line = line.rstrip() if re.search('hello', line): print(line) Output: friends,hello and welcome hello,goodmorning 6
  • 7.
    EXAMPLE PROGRAM • Toget the optimum performance from Regex, we need to use special characters called ‘metacharacters’ # Search for lines that starts with 'hello' import re fp = open('d:/18ec646/demo1.txt') for line in fp: line = line.rstrip() if re.search('^hello', line): ## note 'caret' metacharacter print(line) ## before hello Output: hello and welcome to python class hello how are you? # Search for lines that starts with 'hello' import re fp = open('d:/18ec646/demo2.txt') for line in fp: line = line.rstrip() if re.search('^hello', line): ## note 'caret' metacharacter print(line) ## before hello Output: hello, goodmorning 7
  • 8.
    METACHARACTERS Metacharacters are charactersthat are interpreted in a special way by a RegEx engine. Metacharacters are very helpful for parsing/extraction from the given file/string Metacharacters allow us to build more powerful regular expressions. Table-1 provides a summary of metacharacters and their meaning in RegEx Here's a list of metacharacters: [ ] . ^ $ * + ? { } ( ) | 8
  • 9.
    Metacharacter Description Example [] It represents the set of characters. "[a-z]" It represents the special sequence (can also be used to escape special characters) "r" . It signals that any character is present at some specific place (except newline character) "Ja...v." ^ It represents the pattern present at the beginning of the string (indicates “startswith”) "^python" $ It represents the pattern present at the end of the string. (indicates “endswith”) "world" * It represents zero or more occurrences of a pattern in the string. "hello*" + It represents one or more occurrences of a pattern in the string. "hello+" {} The specified number of occurrences of a pattern the string. “hello{2}" | It represents either this or the other character is present. "hello|hi" () Capture and group 9
  • 10.
    [ ] -SQUARE BRACKETS • Square brackets specifies a set of characters you wish to match. • A set is a group of characters given inside a pair of square brackets. It represents the special meaning. 10 [abc] Returns a match if the string contains any of the specified characters in the set. [a-n] Returns a match if the string contains any of the characters between a to n. [^arn] Returns a match if the string contains the characters except a, r, and n. [0123] Returns a match if the string contains any of the specified digits. [0-9] Returns a match if the string contains any digit between 0 and 9. [0-5][0-9] Returns a match if the string contains any digit between 00 and 59. [a-zA-Z] Returns a match if the string contains any alphabet (lower-case or upper- case).
  • 11.
    CONTD.. ### illustrating squarebrackets import re fh = open('d:/18ec646/demo5.txt') for line in fh: line = line.rstrip() if re.search("[w]", line): print(line) ## search all the lines where w is present and display Output: Hello and welcome @abhishek,how are you ### illustrating square brackets import re fh = open('d:/18ec646/demo3.txt') for line in fh: line = line.rstrip() if re.search("[ge]", line): print(line) ### Search for characters g or e or both and display Output: Hello and welcome This is Bangalore 11
  • 12.
    CONTD… ### illustrating squarebrackets import re fh = open('d:/18ec646/demo3.txt') for line in fh: line = line.rstrip() if re.search("[th]", line): print(line) Ouput: This is Bangalore This is Paris This is London import re fh = open('d:/18ec646/demo7.txt') for line in fh: line = line.rstrip() if re.search("[y]", line): print(line) Ouput: johny johny yes papa open your mouth ### illustratingsquare brackets import re fh = open('d:/18ec646/demo5.txt') for line in fh: line = line.rstrip() if re.search("[x-z]", line): print(line) Output: to:[email protected] @abhishek,how are you 12
  • 13.
    . PERIOD (DOT) Aperiod matches any single character (except newline 'n‘) Expression String Matched? .. (any two characters) a No match ac 1 match acd 1 match acde 2 matches (contains 4 characters) ### illustrating dot metacharacter import re fh = open('d:/18ec646/demo5.txt') for line in fh: line = line.rstrip() if re.search("y.", line): print(line) Output: to: [email protected] @abhishek,how are you 13
  • 14.
    CONTD.. ### illustrating dotmetacharacter import re fh = open('d:/18ec646/demo3.txt') for line in fh: line = line.rstrip() if re.search("P.", line): print(line) Output: This is Paris ### illustrating dot metacharacter import re fh = open('d:/18ec646/demo6.txt') for line in fh: line = line.rstrip() if re.search("T..s", line): print(line) Output: This is London These are beautiful flowers Thus we see the great London bridge ### illustrating dot metacharacter import re fh = open('d:/18ec646/demo6.txt') for line in fh: line = line.rstrip() if re.search("L..d", line): print(line) Output: This is London Thus we see the great London bridge ## any two characters betweenT and s 14
  • 15.
    ^ - CARET Thecaret symbol ^ is used to check if a string starts with a certain character Expression String Matched? ^a a 1 match abc 1 match bac No match ^ab abc 1 match acb No match (starts with a but not followedby b) ### illustrating caret import re fh = open('d:/18ec646/demo2.txt') for line in fh: line = line.rstrip() if re.search("^h",line): print(line) Output: hello, goodmorning ### illustrating caret import re fh = open('d:/18ec646/demo5.txt') for line in fh: line = line.rstrip() if re.search("^f", line): print(line) from:[email protected] 15
  • 16.
    $ - DOLLAR Thedollar symbol $ is used to check if a string ends with a certain character. Expression String Matched? a$ a 1 match formula 1 match cab No match ### illustrating metacharacters import re fh = open('d:/18ec646/demo5.txt') for line in fh: line = line.rstrip() if re.search("m$", line): print(line) Output: from:[email protected] to: [email protected] ### illustrating metacharacters import re fh = open('d:/18ec646/demo7.txt') for line in fh: line = line.rstrip() if re.search("papa$", line): print(line) Output: johny johny yes papa eating sugar no papa 16
  • 17.
    * - STAR Thestar symbol * matches zero or more occurrences of the pattern left to it. Expression String Matched? ma*n mn 1 match man 1 match maaan 1 match main No match (a is not followedby n) ### illustrating metacharacters import re fh = open('d:/18ec646/demo6.txt') for line in fh: line = line.rstrip() if re.search("London*",line): print(line) Output: This is London Thus we see the great London bridge 17
  • 18.
    + - PLUS Theplus symbol + matchesone or more occurrences of the pattern left to it. Expression String Matched? ma+n mn No match (no a character) man 1 match maaan 1 match main No match (a is not followedby n) ### illustrating metacharacters import re fh = open('d:/18ec646/demo6.txt') for line in fh: line = line.rstrip() if re.search("see+", line): print(line) Output: Thus we see the great London bridge ### illustrating metacharacters import re fh = open('d:/18ec646/demo6.txt') for line in fh: line = line.rstrip() if re.search("ar+", line): print(line) Output: These are beautiful flowers 18
  • 19.
    ? - QUESTION MARK
    The question mark symbol ? matches zero or one occurrence of the pattern to its left.

    Expression   String   Matched?
    ma?n         mn       1 match
                 man      1 match
                 maaan    No match (more than one a character)

    ### illustrating metacharacters
    import re
    fh = open('d:/18ec646/demo5.txt')
    for line in fh:
        line = line.rstrip()
        if re.search("@gmail?", line):
            print(line)

    Output:
    from:[email protected]

    ### illustrating metacharacters
    import re
    fh = open('d:/18ec646/demo5.txt')
    for line in fh:
        line = line.rstrip()
        if re.search("you?", line):
            print(line)

    Output:
    @abhishek,how are you
  • 20.
    {} - BRACES
    Braces find the specified number of occurrences of a pattern. Consider {n,m}: this means at least n and at most m repetitions of the pattern to its left. If a{2} is given, a must be repeated exactly twice.

    Expression   String        Matched?
    a{2,3}       abc dat       No match
                 abc daat      1 match (aa in daat)
                 aabc daaat    2 matches (aa in aabc and aaa in daaat)
                 aabc daaaat   2 matches (aa in aabc and aaa in daaaat)
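The table rows above can be checked with a short sketch. It uses re.findall() (covered later in this module) on the same sample strings; the variable names are only illustrative.

```python
import re

# Sample strings taken from the table above.
samples = ["abc dat", "abc daat", "aabc daaat", "aabc daaaat"]

# a{2,3} matches a run of two or three consecutive 'a' characters.
results = {text: re.findall(r"a{2,3}", text) for text in samples}
for text, found in results.items():
    print(repr(text), "->", found)

# a{2} requires exactly two repetitions of 'a'.
exactly_two = re.findall(r"a{2}", "a aa aaa")
print(exactly_two)
```

In the last string of the table, the engine takes aaa greedily and the single a that remains is too short to match, which is why daaaat contributes only one match.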
  • 21.
    | - ALTERNATION
    The vertical bar | is used for alternation (the "or" operator).

    Expression   String   Matched?
    a|b          cde      No match
                 ade      1 match (the a in ade)
                 acdbea   3 matches (a, b and a in acdbea)

    ### illustrating metacharacters
    import re
    fh = open('d:/18ec646/demo7.txt')
    for line in fh:
        line = line.rstrip()
        if re.search("yes|no", line):
            print(line)

    Output:
    johny johny yes papa
    eating sugar no papa

    ### illustrating metacharacters
    import re
    fh = open('d:/18ec646/demo2.txt')
    for line in fh:
        line = line.rstrip()
        if re.search("hello|how", line):
            print(line)

    Output:
    friends,hello and welcome
    hello,goodmorning
  • 22.
    () - GROUP
    Parentheses () are used to group sub-patterns. For example, (a|b|c)xz matches any string that contains either a or b or c followed by xz.

    Expression   String      Matched?
    (a|b|c)xz    ab xz       No match
                 abxz        1 match (bxz in abxz)
                 axz cabxz   2 matches (axz, and bxz in cabxz)

    ### illustrating metacharacters
    import re
    fh = open('d:/18ec646/demo5.txt')
    for line in fh:
        line = line.rstrip()
        if re.search("(hello|how) are", line):
            print(line)

    Output:
    @abhishek,how are you

    ### illustrating metacharacters
    import re
    fh = open('d:/18ec646/demo2.txt')
    for line in fh:
        line = line.rstrip()
        if re.search("(hello and)", line):
            print(line)

    Output:
    friends,hello and welcome
  • 23.
    \ - BACKSLASH
    The backslash is used to escape various characters, including all metacharacters. For example, \$a matches if a string contains $ followed by a. Here, $ is not interpreted by the regex engine in a special way.
    If you are unsure whether a character has a special meaning, you can put \ in front of it. This makes sure the character is not treated in a special way.
    NOTE: Another way of doing this is to put the special character inside square brackets [ ].
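A minimal sketch of the two escaping styles described above, using a made-up sample string. re.escape() is a standard helper in the re module that adds the backslashes for you; it is not mentioned on the slide and is shown here only as a convenience.

```python
import re

text = "price: $a and $b"   # hypothetical sample string

# Without the backslash, $ means "end of string", so "$a" can never match.
unescaped = re.findall(r"$a", text)

# Escaping with a backslash makes $ an ordinary character.
escaped = re.findall(r"\$a", text)

# Inside square brackets, $ also loses its special meaning.
bracketed = re.findall(r"[$]a", text)

# re.escape() builds the escaped pattern from a literal search term.
auto = re.escape("$a")

print(unescaped, escaped, bracketed, auto)
```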
  • 24.
    SPECIAL SEQUENCES A specialsequence is a followed by one of the characters (see Table) and has a special meaning Special sequences make commonly used patterns easier to write. 24
  • 25.
    SPECIAL SEQUENCES

    Character   Description (with example)
    \A          Returns a match if the specified characters are at the beginning of the string.
                Example: r"\AThe"
    \b          Returns a match if the specified characters are at the beginning or the end of a word.
                Example: r"\bain", r"ain\b"
    \B          Returns a match if the specified characters are present, but not at the beginning (or the end) of a word.
                Example: r"\Bain", r"ain\B"
    \d          Returns a match if the string contains a digit [0-9].
                Example: r"\d"
    \D          Returns a match if the string contains a character that is not a digit.
                Example: r"\D"
    \s          Returns a match if the string contains a whitespace character.
                Example: r"\s"
    \S          Returns a match if the string contains a character that is not whitespace.
                Example: r"\S"
    \w          Returns a match if the string contains a word character (A to Z, a to z, 0 to 9 and underscore).
                Example: r"\w"
    \W          Returns a match if the string contains a character that is not a word character.
                Example: r"\W"
  • 26.
    \A - Matches if the specified characters are at the start of a string.

    Expression   String          Matched?
    \Athe        the sun         Match
                 In the sun      No match

    \b - Matches if the specified characters are at the beginning or end of a word.

    Expression   String          Matched?
    \bfoo        football        Match
                 a football      Match
                 afootball       No match
    foo\b        football        No match
                 the afoo test   Match
                 the afootest    No match
  • 27.
    \B - Opposite of \b. Matches if the specified characters are not at the beginning or end of a word.

    Expression   String          Matched?
    \Bfoo        football        No match
                 a football      No match
                 afootball       Match
    foo\B        the foo         No match
                 the afoo test   No match
                 the afootest    Match
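The \b and \B tables can be reproduced with the sketch below, which runs three patterns over some of the sample strings. The list and variable names are only illustrative.

```python
import re

samples = ["football", "a football", "afootball", "the foo", "the afootest"]

# Raw strings (r"...") matter here: in a normal Python string "\b" is a
# backspace character, so the regex engine would never see the sequence.
rows = []
for text in samples:
    rows.append((
        text,
        bool(re.search(r"\bfoo", text)),   # foo at the start of a word
        bool(re.search(r"\Bfoo", text)),   # foo not at the start of a word
        bool(re.search(r"foo\b", text)),   # foo at the end of a word
    ))

for row in rows:
    print(row)
```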
  • 28.
    \d - Matches any decimal digit. Equivalent to [0-9]
    \D - Matches any character that is not a decimal digit. Equivalent to [^0-9]

    Expression   String     Matched?
    \d           12abc3     3 matches (1, 2 and 3)
                 Python     No match

    Expression   String     Matched?
    \D           1ab34"50   3 matches (a, b and ")
                 1345       No match
  • 29.
    \s - Matches where a string contains any whitespace character. Equivalent to [ \t\n\r\f\v].
    \S - Matches where a string contains any non-whitespace character. Equivalent to [^ \t\n\r\f\v].

    Expression   String         Matched?
    \s           Python RegEx   1 match
                 PythonRegEx    No match

    Expression   String                      Matched?
    \S           a b                         2 matches (a and b)
                 (a string of only spaces)   No match
  • 30.
    \w - Matches any alphanumeric character. Equivalent to [a-zA-Z0-9_]. The underscore is also treated as an alphanumeric character.
    \W - Matches any non-alphanumeric character. Equivalent to [^a-zA-Z0-9_]

    Expression   String    Matched?
    \w           12&":;c   3 matches (1, 2 and c)
                 %"> !     No match

    Expression   String    Matched?
    \W           1a2%c     1 match (%)
                 Python    No match
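As a rough illustration of how the \d, \s and \w families behave together, the sketch below applies them to one made-up string. Combining them with + (one or more) collects whole numbers and words instead of single characters.

```python
import re

text = "Room 42, Block_B: call 080 2345"   # hypothetical sample string

digits = re.findall(r"\d", text)      # every single digit
numbers = re.findall(r"\d+", text)    # runs of digits
words = re.findall(r"\w+", text)      # runs of word characters (underscore included)
spaces = re.findall(r"\s", text)      # every whitespace character
non_word = re.findall(r"\W", text)    # spaces and punctuation

print(numbers)
print(words)
print(len(digits), len(spaces), len(non_word))
```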
  • 31.
    \Z - Matches if the specified characters are at the end of a string.

    Expression   String                      Matched?
    Python\Z     I like Python               1 match
                 I like Python Programming   No match
                 Python is fun.              No match

    # check whether the specified
    # characters are at the end of string
    import re
    fp = open('d:/18ec646/demo5.txt')
    for x in fp:
        x = x.rstrip()
        if re.findall(r"com\Z", x):
            print(x)

    Output:
    from:[email protected]
    to: [email protected]
  • 32.
    REGEX FUNCTIONS
    The re module offers a set of functions:

    FUNCTION   DESCRIPTION
    findall    Returns a list containing all matches of a pattern in the string
    search     Returns a Match object if there is a match anywhere in the string
    split      Returns a list where the string has been split at each match
    sub        Replaces one or more matches in a string (substitutes another string)
    match      Checks for a match only at the beginning of the string, with optional flags. Returns a Match object if a match is found, otherwise None.
  • 33.
    THE FINDALL() FUNCTION
    The findall() function returns a list containing all matches.
    The list contains the matches in the order they are found.
    If no matches are found, an empty list is returned.
    Here is the syntax for this function:
    re.findall(pattern, string, flags=0)

    import re
    text = "How are you. How is everything?"
    matches = re.findall("How", text)
    print(matches)

    Output:
    ['How', 'How']
  • 34.
  • 35.
    CONTD..

    # check whether string starts with How
    import re
    text = "How are you. How is everything?"
    x = re.findall("^How", text)
    print(text)
    print(x)
    if x:
        print("string starts with 'How'")
    else:
        print("string does not start with 'How'")

    Output:
    How are you. How is everything?
    ['How']
    string starts with 'How'
  • 36.
    CONTD…

    # match all lines that start with 'hello'
    import re
    fp = open('d:/18ec646/demo1.txt')
    for x in fp:
        x = x.rstrip()
        if re.findall('^hello', x):   ## note 'caret'
            print(x)

    Output:
    hello and welcome to python class
    hello how are you?

    # match all lines that start with '@'
    import re
    fp = open('d:/18ec646/demo5.txt')
    for x in fp:
        x = x.rstrip()
        if re.findall('^@', x):   ## note 'caret' metacharacter
            print(x)

    Output:
    @abhishek,how are you

    # check whether the string contains
    ## non-digit characters
    import re
    fp = open('d:/18ec646/demo5.txt')
    for x in fp:
        x = x.rstrip()
        if re.findall(r"\D", x):   ## special sequence
            print(x)

    Output:
    from:[email protected]
    to:[email protected]
    Hello and welcome
    @abhishek,how are you
  • 37.
    THE SEARCH() FUNCTION
    The search() function searches the string for a match, and returns a Match object if there is a match.
    If there is more than one match, only the first occurrence of the match will be returned.
    If no matches are found, the value None is returned.
    Here is the syntax for this function:
    re.search(pattern, string, flags=0)
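A small sketch of the behaviour described above: search() stops at the first occurrence and returns None when nothing matches. The sample string is reused from the findall() examples; the variable names are illustrative.

```python
import re

text = "How are you. How is everything?"

# "How" occurs twice, but search() reports only the first occurrence.
first = re.search("How", text)
if first:
    print(first.group(), "found at index", first.start())

# No match: search() returns None, so test the result before using it.
missing = re.search("hello", text)
print(missing)
```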
  • 38.
    EXAMPLES on search() function:-
  • 39.
    THE SPLIT() FUNCTION
    The re.split() method splits the string where there is a match and returns a list of strings where the splits have occurred.
    You can pass the maxsplit argument to the re.split() method. It is the maximum number of splits that will occur.
    If the pattern is not found, re.split() returns a list containing the original string.
    Here is the syntax for this function:
    re.split(pattern, string, maxsplit=0, flags=0)
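The effect of maxsplit and of a pattern that is not found can be sketched as follows. The second sample string and the character-class pattern are assumptions chosen for illustration.

```python
import re

text = "johny johny yes papa"

all_parts = re.split(" ", text)               # split at every space
one_split = re.split(" ", text, maxsplit=1)   # stop after the first split
no_match = re.split("@", text)                # pattern absent: original string in a list

# A pattern can describe several separators at once, which str.split() cannot do.
fields = re.split(r"[,;]\s*", "a, b;c")

print(all_parts)
print(one_split)
print(no_match)
print(fields)
```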
  • 40.
    EXAMPLES on split() function:-

    # split function
    import re
    fp = open('d:/18ec646/demo5.txt')
    for x in fp:
        x = x.rstrip()
        x = re.split("@", x)
        print(x)

    Output:
    ['from:krishna.sksj', 'gmail.com']
    ['to: abhishek', 'yahoo.com']
    ['Hello and welcome']
    ['', 'abhishek,how are you']
  • 41.
    CONTD..

    # split function
    import re
    fp = open('d:/18ec646/demo7.txt')
    for x in fp:
        x = x.rstrip()
        x = re.split("e", x)
        print(x)

    Output:
    ['johny johny y', 's papa']
    ['', 'ating sugar no papa']
    ['t', 'lling li', 's']
    ['op', 'n your mouth']

    # split function
    import re
    fp = open('d:/18ec646/demo7.txt')
    for x in fp:
        x = x.rstrip()
        x = re.split("papa", x)
        print(x)

    Output:
    ['johny johny yes ', '']
    ['eating sugar no ', '']
    ['telling lies']
    ['open your mouth']

    # split function
    import re
    fp = open('d:/18ec646/demo3.txt')
    for x in fp:
        x = x.rstrip()
        x = re.split("is", x)
        print(x)

    Output:
    ['Hello and welcome']
    ['Th', ' ', ' Bangalore']
    ['Th', ' ', ' Par', '']
    ['Th', ' ', ' London']
  • 42.
    THE SUB() FUNCTION
    The sub() function replaces the matches with the text of your choice.
    You can control the number of replacements by specifying the count parameter.
    If the pattern is not found, re.sub() returns the original string.
    Here is the syntax for this function:
    re.sub(pattern, repl, string, count=0, flags=0)
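The count parameter and the not-found case can be illustrated with a short sketch. The sample string is reused from the earlier slides; the variable names are illustrative.

```python
import re

text = "How are you. How is everything?"

all_replaced = re.sub("How", "Where", text)          # count=0 replaces every match
first_only = re.sub("How", "Where", text, count=1)   # replace only the first match
unchanged = re.sub("xyz", "Where", text)             # pattern absent: string comes back as-is

print(all_replaced)
print(first_only)
print(unchanged == text)
```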
  • 43.
    EXAMPLES on sub() function:-

    ### illustration of substitute (replace)
    import re
    text = "How are you.How is everything?"
    x = re.sub("How", "where", text)
    print(x)

    Output:
    where are you.where is everything?

    # sub function
    import re
    fp = open('d:/18ec646/demo3.txt')
    for x in fp:
        x = x.rstrip()
        x = re.sub("This", "Where", x)
        print(x)

    Output:
    Hello and welcome
    Where is Bangalore
    Where is Paris
    Where is London
  • 44.
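    THE MATCH() FUNCTION
    If zero or more characters at the beginning of the string match the regular expression, match() returns a corresponding Match object.
    It returns None if the string does not match the pattern.
    Here is the syntax for the module-level function:
    re.match(pattern, string, flags=0)
    For a compiled pattern object, the method form is:
    Pattern.match(string[, pos[, endpos]])
    The optional pos and endpos parameters have the same meaning as for the Pattern.search() method: they give the index in the string where matching starts and where it stops.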
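The sketch below shows both forms of match(). re.compile() builds the compiled pattern object whose match() method accepts pos; the sample strings are made up for illustration.

```python
import re

pattern = re.compile("This")   # compiled pattern object

m1 = pattern.match("This is Paris")       # pattern is at index 0: Match object
m2 = pattern.match("Is This Paris")       # pattern is not at the start: None
m3 = pattern.match("Is This Paris", 3)    # pos=3: matching begins at index 3

# Module-level form, equivalent to compiling and then calling match().
m4 = re.match("This", "This is Paris")

print(m1 is not None, m2, m3.span(), m4.group())
```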
  • 45.
    search() Vs match()
    Python offers two different primitive operations based on regular expressions:
    re.match() checks for a match only at the beginning of the string, while re.search() checks for a match anywhere in the string.
    Eg:-

    # match function
    import re
    fp = open('d:/18ec646/demo3.txt')
    for x in fp:
        x = x.rstrip()
        if re.match("This", x):
            print(x)

    Output:
    This is Bangalore
    This is Paris
    This is London
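To make the difference concrete without the course data files, the sketch below filters a small in-memory list with both functions. The first sample line is an assumption; the other two reuse lines from the earlier outputs.

```python
import re

lines = [
    "London is a city",                       # hypothetical line
    "This is London",
    "Thus we see the great London bridge",
]

# match() succeeds only when the line begins with the pattern.
matched = [ln for ln in lines if re.match("London", ln)]

# search() succeeds when the pattern occurs anywhere in the line.
searched = [ln for ln in lines if re.search("London", ln)]

# Anchoring search() with ^ gives the same selection as match() here.
anchored = [ln for ln in lines if re.search("^London", ln)]

print(matched)
print(searched)
print(anchored)
```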
  • 46.
    MATCH OBJECT
    A Match object is an object containing information about the search and the result.
    If there is no match, the value None is returned instead of a Match object.
    Some of the commonly used methods and attributes of match objects are:
    match.group(), match.start(), match.end(), match.span(), match.string
  • 47.
    match.group()
    The group() method returns the part of the string where there is a match.

    match.start(), match.end()
    The start() function returns the index of the start of the matched substring.
    Similarly, end() returns the end index of the matched substring (the index just past its last character).

    match.string
    The string attribute returns the string that was passed to the function.
  • 48.
    match.span()
    The span() function returns a tuple containing the start and end index of the matched part.
    Eg:-
    OUTPUT:
    (12, 17)
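The original example for this slide is not preserved in the text, so the following is only an illustrative sketch: the sample sentence and pattern are assumptions, chosen because they yield the same span. It exercises each of the Match object members listed above.

```python
import re

text = "The rain in Spain"           # assumed sample string
m = re.search(r"\bS\w+", text)       # a word that starts with capital S

matched_text = m.group()   # the matched substring
start = m.start()          # index where the match begins
end = m.end()              # index just past the last matched character
span = m.span()            # (start, end) as a tuple
source = m.string          # the string passed to search()

print(matched_text, start, end, span)
print(source)
```

Slicing the original string with the span, text[start:end], gives back the same substring as group().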