Lecture 9 Python
When using regular expressions, the first thing to recognize is that everything is
essentially a character, and we are writing patterns to match a specific sequence of
characters, also referred to as a string. ASCII or Latin letters are those on your
keyboard, and Unicode is used to match text in other languages. Characters include
digits, punctuation, and special characters such as $#@!%.
For instance, a regular expression could tell a program to search for specific text in
a string and then print out the result accordingly. Expressions can include
Text matching
Repetition
Branching
Pattern-composition etc.
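The four ingredients listed above can be illustrated with a minimal sketch. The sample string and the variable names are assumptions chosen only for this illustration.

```python
import re

# Hypothetical sample text, not taken from the lecture.
text = "cat bat cart 2024 dog"

# Text matching: a literal pattern matches itself.
literal = re.findall(r"cat", text)

# Repetition: \d+ matches one or more digits in a row.
digits = re.findall(r"\d+", text)

# Branching: | matches either alternative.
either = re.findall(r"cat|dog", text)

# Pattern composition: smaller pieces are combined into one pattern.
word = r"[a-z]+"
number = r"\d+"
pair = re.findall(word + r"\s" + number, text)

print(literal)  # ['cat']
print(digits)   # ['2024']
print(either)   # ['cat', 'dog']
print(pair)     # ['cart 2024']
```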
RE
import re
The "re" module is included with Python and is primarily used for string searching and
manipulation.
It is also used frequently for web page "scraping" (extracting large amounts of data
from websites).
We will begin the tutorial with a simple exercise using the expressions \w+ and ^.
Here ^ anchors the match at the start of the string, and \w+ matches one or more word
characters (letters, digits, or underscore).
For example, for our string "we believe,education is fun", executing the code with
^\w+ gives the output ['we'].
import re
xx = "we believe,education is fun"
r1 = re.findall(r"^\w+",xx)
print(r1)
Remember, if you remove the + sign from \w+, the output will change: it will give only
the first character of the first word, i.e., ['w'].
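To see the effect of the + sign and of the ^ anchor side by side, here is a small sketch that reuses the string from the code above. The variable names are only illustrative.

```python
import re

xx = "we believe,education is fun"

# ^ anchors at the start of the string; \w+ takes the whole run of word characters.
whole_word = re.findall(r"^\w+", xx)
# Without +, only a single word character is matched.
first_char = re.findall(r"^\w", xx)
# Without ^, every run of word characters in the string is returned.
no_anchor = re.findall(r"\w+", xx)

print(whole_word)  # ['we']
print(first_char)  # ['w']
print(no_anchor)   # ['we', 'believe', 'education', 'is', 'fun']
```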
To understand how regular expressions work in Python, we continue with a simple
example of the split function. In the example, we split a sentence into words using the
"re.split" function with the expression \s, which matches a single whitespace character,
so the string is broken at every space.
When you execute this code, it will give you the output ['we', 'are', 'splitting', 'the',
'words'].
Now, let us see what happens if you remove "\" from \s. There is no letter 's' in the
output because, without the backslash, "s" is treated as a regular character, so the
string is split wherever "s" occurs. For 'split the words' the result is
['', 'plit the word', ''].
Similarly, there is a series of other special sequences and characters that you can use
in Python regular expressions, such as \d, \D, $, \., and \b.
import re
# \s splits on whitespace, so each word is returned separately
print(re.split(r'\s', 'we are splitting the words'))
# a plain 's' splits on the letter s instead
print(re.split(r's', 'split the words'))
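The other expressions mentioned above (\d, \D, $, \., \b) can be tried out in the same way. The following is a minimal sketch; the sample strings and variable names are assumptions made for illustration.

```python
import re

# Hypothetical sample text, not taken from the lecture.
sample = "Order 66 shipped. Total: 42.50"

digits = re.findall(r"\d+", sample)             # \d: runs of digits
non_digits = re.findall(r"\D+", "a1b22c")       # \D: runs of non-digits
ends_with_number = re.search(r"\d+$", sample)   # $: anchors at the end of the string
decimal = re.findall(r"\d+\.\d+", sample)       # \.: a literal dot
whole_word = re.findall(r"\bTotal\b", sample)   # \b: a word boundary

print(digits)                    # ['66', '42', '50']
print(non_digits)                # ['a', 'b', 'c']
print(ends_with_number.group())  # 50
print(decimal)                   # ['42.50']
print(whole_word)                # ['Total']
```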
Next, we are going to see the methods that are used with regular expressions.
The "re" package provides several methods to perform queries on an input
string. The methods we are going to see are
re.match()
re.search()
re.findall()
Note: Python offers two different primitive operations based on regular expressions.
The match method checks for a match only at the beginning of the string, while search
checks for a match anywhere in the string.
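The difference between the two operations can be checked with a short sketch. The sample string and variable names are assumptions for illustration only.

```python
import re

text = "education is fun"

# match only looks at the beginning of the string.
at_start = re.match(r"fun", text)
# search scans the whole string for the first occurrence.
anywhere = re.search(r"fun", text)
# match succeeds when the pattern is at the very start.
prefix = re.match(r"education", text)

print(at_start)         # None
print(anywhere.span())  # (13, 16)
print(prefix.group())   # education
```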
Using re.match()
The match function is used to match the RE pattern to a string, with optional flags. In
this method, the expressions "\w+" and "\W" are used to match words starting with the
letter 'g'; anything that does not start with 'g' is not identified. To check for a match
for each element in a list of strings, we run a for loop.
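A possible sketch of such a loop follows. The list contents and the exact pattern are assumptions made for illustration, since the notes describe the idea but do not give the data.

```python
import re

# Hypothetical sample data: two words per element.
items = ["guru99 get", "guru99 give", "guru Selenium"]

matched = []
for element in items:
    # (g\w+) captures a word starting with 'g'; \W consumes the separator between words.
    z = re.match(r"(g\w+)\W(g\w+)", element)
    if z:
        matched.append(z.groups())

print(matched)  # [('guru99', 'get'), ('guru99', 'give')]
```

The last element is skipped because its second word starts with 'S', so the pattern cannot match from the beginning of that string.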
Using re.search()
A regular expression is commonly used to search for a pattern in a text. This method
takes a regular expression pattern and a string and searches for that pattern within the
string.
In order to use the search() function, you need to import re first and then execute the
code. The search() function takes the "pattern" and the "text" to scan, and returns a
match object when the pattern is found, or None when it is not.
For example, here we look for two literal strings, "Software Testing" and "guru99", in
the text string "Software Testing is fun". For "Software Testing" we find a match, so the
output is "found a match", while "guru99" does not occur in the string, so the output is
"No match". Note that matching is case-sensitive by default, so the pattern must use the
same capitalization as the text.
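The search example described above might look like the following sketch. The variable names are assumptions, and the patterns are written with the same capitalization as the text because matching is case-sensitive unless a flag says otherwise.

```python
import re

patterns = ["Software Testing", "guru99"]
text = "Software Testing is fun"

results = {}
for pattern in patterns:
    # re.search returns a match object, or None when nothing is found.
    if re.search(pattern, text):
        results[pattern] = "found a match"
    else:
        results[pattern] = "No match"
    print(f'Looking for "{pattern}" in "{text}": {results[pattern]}')
```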
Using re.findall()
The re.findall() function returns a list of all the matches in a single step, which is
useful, for example, when you want to go through the lines of a file. For example, here we
have a list of e-mail addresses, and we want all of them to be fetched from the list, so
we use the re.findall method. It will find all the e-mail addresses in the list.
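A minimal sketch of the e-mail example follows. The addresses are made up for illustration, and the pattern is deliberately simple; it is not a complete e-mail validator.

```python
import re

# Hypothetical addresses, used only for illustration.
abc = "guru99@google.com, careerguru99@hotmail.com, users@yahoomail.com"

# Word characters, dots, or hyphens on both sides of an '@'.
emails = re.findall(r"[\w.-]+@[\w.-]+", abc)
for email in emails:
    print(email)
```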
Many Python regex methods and functions take an optional argument called flags. These
flags can modify the meaning of the given regex pattern. To understand them, we will see
one or two examples of these flags.
[re.S] Make [ . ] match any character, including a newline
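As a small illustration of re.S, the sketch below runs the same pattern with and without the flag. The sample text and variable names are assumptions.

```python
import re

text = "first line\nsecond line"

# By default . does not match a newline, so the pattern cannot span two lines.
without_flag = re.findall(r"first.*second", text)
# With re.S (also spelled re.DOTALL), . matches the newline as well.
with_flag = re.findall(r"first.*second", text, re.S)

print(without_flag)  # []
print(with_flag)     # ['first line\nsecond']
```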
In multiline mode, the pattern character ^ matches at the beginning of the string and at
the beginning of each line (immediately after each newline). The expression \w matches
a single word character. When you run the code, the first variable "k1" prints only the
character 'g' from the word guru99, while with the multiline flag added, "k2" fetches the
first character of every line in the string: ['g', 'c', 's'].
import re
xx = """guru99
careerguru99
selenium"""
k1 = re.findall(r"^\w", xx)
k2 = re.findall(r"^\w", xx, re.MULTILINE)
print(k1)
print(k2)
Likewise, you can also use other Python flags such as re.U (Unicode), re.L (follow
locale), re.X (allow comments), etc.
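Two of the remaining flags, re.I (ignore case) and re.X (allow comments), can be sketched as follows. The sample strings, the date pattern, and the variable names are assumptions chosen for illustration.

```python
import re

text = "Guru99 and GURU99 and guru99"

# re.I: ignore case while matching.
any_case = re.findall(r"guru99", text, re.I)

# re.X: whitespace and # comments inside the pattern are ignored,
# which lets a longer pattern be laid out and documented line by line.
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.X)
parts = date.search("released on 2024-01-15").groups()

print(any_case)  # ['Guru99', 'GURU99', 'guru99']
print(parts)     # ('2024', '01', '15')
```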
Python 2 Example
The code above is written for Python 3. If you want to run it in Python 2, please
consider the following code.
Summary
Regular expressions can include
Text matching
Repetition
Branching
Pattern-composition etc.
The "re" module is included with Python and is primarily used for string searching and
manipulation
It is also used frequently for web page "scraping" (extracting large amounts of data
from websites)
Regular expression methods include re.match(), re.search(), and re.findall()
Many Python regex methods and functions take an optional argument called flags
These flags can modify the meaning of the given regex pattern
Various Python flags used in regex methods are re.M, re.I, re.S, etc.