Regular Expression Matching
Last Updated: 21 Sep, 2024
Given a text t and a pattern p, where t consists only of lowercase English letters and p consists of lowercase English letters plus the special characters '.' and '*', the task is to implement a function that tests whether p matches the entire text t, where:
- '.' matches any single character.
- '*' matches zero or more of the preceding character.
Note: For each appearance of the character '*', there will be a previous valid character to match.
Examples:
Input: t = "aaa", p = "a"
Output: false
Explanation: "a" does not match the entire string "aaa".
Input: t = "abb", p = "a.*"
Output: true
Explanation: 'a' matches the first character, and ".*" (zero or more of any character) matches the remaining "bb".
Input: t = "", p = "a*b*"
Output: true
Explanation: '*' can also match 0 occurrences, so both "a*" and "b*" match the empty string.
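For patterns restricted to lowercase letters, '.' and '*', these semantics coincide with a whole-string match in Python's built-in `re` module, so the three examples can be cross-checked with `re.fullmatch`. This is only a sanity check under that assumption (`re` supports many more metacharacters), not the implementation the task asks for, and the helper name `reference_match` is made up for illustration.

```python
import re


def reference_match(t: str, p: str) -> bool:
    """Cross-check helper (hypothetical name): whole-string match via re.

    Assumes p contains only lowercase letters, '.' and '*', and that every
    '*' has a valid preceding character, as the problem statement guarantees.
    """
    return re.fullmatch(p, t) is not None


if __name__ == "__main__":
    # The three examples from the problem statement
    for t, p in [("aaa", "a"), ("abb", "a.*"), ("", "a*b*")]:
        print(repr(t), repr(p), reference_match(t, p))
```

Such a helper can be handy for randomized testing of a hand-written matcher against a trusted engine.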
Naive Recursive Solution:
We match the text and the pattern from one end, character by character. Here we start from the right end, so n and m denote the number of characters of t and p that are still to be matched. The following cases arise:
Case 1 (last characters are the same): We move one character back in both text t and pattern p.
Case 2 (last character of the pattern is '.'): We move one character back in both text t and pattern p.
Case 3 (last character of the pattern is '*'): There must be at least two characters in the pattern. If not, we return false. If yes, the following two sub-cases arise.
a) The '*' and the character preceding it match 0 characters of the text. We skip two characters in the pattern and do not move in the text.
b) The '*' and the character preceding it match 1 or more characters of the text. We compare the character preceding '*' with the current character of the text. If they match (or the preceding character is '.'), we move one character back in the text and do not move in the pattern.
We return false if neither a) nor b) holds.
Below is the implementation of the idea.
C++
#include <iostream>
#include <string>
using namespace std;

// n and m are the lengths of the prefixes of t and p still to be matched.
// Strings are passed by const reference to avoid copying on every call.
bool isMatchRec(const string &t, const string &p, int n, int m) {
    // If pattern is empty, then text must also be empty
    if (m == 0) {
        return n == 0;
    }
    // If text is empty, then pattern can only consist of
    // characters followed by *s
    if (n == 0) {
        return (m >= 2 && p[m - 1] == '*') &&
               isMatchRec(t, p, n, m - 2);
    }
    // If last characters of both string and pattern
    // match, or pattern has '.'
    if (t[n - 1] == p[m - 1] || p[m - 1] == '.') {
        return isMatchRec(t, p, n - 1, m - 1);
    }
    // Handle '*' in the pattern
    if (p[m - 1] == '*' && m > 1) {
        // Check if '*' can represent zero occurrences
        // of the preceding character
        bool zero = isMatchRec(t, p, n, m - 2);
        // Check if '*' can represent one or more occurrences
        // of the preceding character
        bool oneOrMore = (p[m - 2] == t[n - 1] || p[m - 2] == '.') &&
                         isMatchRec(t, p, n - 1, m);
        return zero || oneOrMore;
    }
    // If no match
    return false;
}

// Wrapper function
bool isMatch(const string &t, const string &p) {
    return isMatchRec(t, p, t.size(), p.size());
}

int main() {
    cout << boolalpha << isMatch("aab", "a.*") << endl;
    cout << boolalpha << isMatch("aa", "a") << endl;
    cout << boolalpha << isMatch("aa", "a*") << endl;
    cout << boolalpha << isMatch("ab", ".*") << endl;
    cout << boolalpha << isMatch("mississippi", "mis*is*p*.") << endl;
    return 0;
}
Java
public class GfG {
    public static boolean isMatchRec(String t, String p, int n, int m) {
        // If pattern is empty, then text must also be empty
        if (m == 0) {
            return n == 0;
        }
        // If text is empty, then pattern can only consist of
        // characters followed by *s
        if (n == 0) {
            return (m >= 2 && p.charAt(m - 1) == '*') &&
                   isMatchRec(t, p, n, m - 2);
        }
        // If last characters of both string and pattern
        // match, or pattern has '.'
        if (t.charAt(n - 1) == p.charAt(m - 1) || p.charAt(m - 1) == '.') {
            return isMatchRec(t, p, n - 1, m - 1);
        }
        // Handle '*' in the pattern
        if (p.charAt(m - 1) == '*' && m > 1) {
            // Check if '*' can represent zero occurrences
            // of the preceding character
            boolean zero = isMatchRec(t, p, n, m - 2);
            // Check if '*' can represent one or more occurrences
            // of the preceding character
            boolean oneOrMore = (p.charAt(m - 2) == t.charAt(n - 1) || p.charAt(m - 2) == '.') &&
                                isMatchRec(t, p, n - 1, m);
            return zero || oneOrMore;
        }
        // If no match
        return false;
    }

    // Wrapper function
    public static boolean isMatch(String t, String p) {
        return isMatchRec(t, p, t.length(), p.length());
    }

    public static void main(String[] args) {
        System.out.println(isMatch("aab", "a.*"));
        System.out.println(isMatch("aa", "a"));
        System.out.println(isMatch("aa", "a*"));
        System.out.println(isMatch("ab", ".*"));
        System.out.println(isMatch("mississippi", "mis*is*p*."));
    }
}
Python
def is_match_rec(t, p, n, m):
    # If pattern is empty, then text must also be empty
    if m == 0:
        return n == 0
    # If text is empty, then pattern can only consist of characters followed by *s
    if n == 0:
        return (m >= 2 and p[m - 1] == '*') and is_match_rec(t, p, n, m - 2)
    # If last characters of both string and pattern match, or pattern has '.'
    if t[n - 1] == p[m - 1] or p[m - 1] == '.':
        return is_match_rec(t, p, n - 1, m - 1)
    # Handle '*' in the pattern
    if p[m - 1] == '*' and m > 1:
        # Check if '*' can represent zero occurrences of the preceding character
        zero = is_match_rec(t, p, n, m - 2)
        # Check if '*' can represent one or more occurrences of the preceding character
        one_or_more = (p[m - 2] == t[n - 1] or p[m - 2] == '.') and is_match_rec(t, p, n - 1, m)
        return zero or one_or_more
    # If no match
    return False


# Wrapper function
def is_match(t, p):
    return is_match_rec(t, p, len(t), len(p))


# Example usage
print(is_match('aab', 'a.*'))
print(is_match('aa', 'a'))
print(is_match('aa', 'a*'))
print(is_match('ab', '.*'))
print(is_match('mississippi', 'mis*is*p*.'))
JavaScript
function is_match_rec(t, p, n, m) {
    // If pattern is empty, then text must also be empty
    if (m === 0) {
        return n === 0;
    }
    // If text is empty, then pattern can only consist of characters followed by *s
    if (n === 0) {
        return (m >= 2 && p[m - 1] === '*') && is_match_rec(t, p, n, m - 2);
    }
    // If last characters of both string and pattern match, or pattern has '.'
    if (t[n - 1] === p[m - 1] || p[m - 1] === '.') {
        return is_match_rec(t, p, n - 1, m - 1);
    }
    // Handle '*' in the pattern
    if (p[m - 1] === '*' && m > 1) {
        // Check if '*' can represent zero occurrences of the preceding character
        const zero = is_match_rec(t, p, n, m - 2);
        // Check if '*' can represent one or more occurrences of the preceding character
        const one_or_more = (p[m - 2] === t[n - 1] || p[m - 2] === '.') && is_match_rec(t, p, n - 1, m);
        return zero || one_or_more;
    }
    // If no match
    return false;
}

// Wrapper function
function is_match(t, p) {
    return is_match_rec(t, p, t.length, p.length);
}

// Example usage
console.log(is_match('aab', 'a.*'));
console.log(is_match('aa', 'a'));
console.log(is_match('aa', 'a*'));
console.log(is_match('ab', '.*'));
console.log(is_match('mississippi', 'mis*is*p*.'));
Output
true
false
true
true
false
Dynamic Programming Solution
The above recursive solution has exponential time complexity in the worst case. Please note that we make two recursive calls in the last if condition. We can clearly notice overlapping subproblems here as we make calls for (n-1, m-1), (n, m-2) and/or (n-1, m). So we can use Dynamic Programming to solve this problem.
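One way to see the overlap is to count how often the recursive body runs. The sketch below is an illustration in Python, not part of the original solution: the hypothetical helper `match_with_call_count` mirrors the recursive cases above, and when `memoize` is set it wraps the recursion in `functools.lru_cache` so each (n, m) state is evaluated at most once. The test input is chosen for illustration only.

```python
from functools import lru_cache


def match_with_call_count(t: str, p: str, memoize: bool) -> tuple:
    """Return (match result, number of times the recursive body ran)."""
    calls = 0

    def rec(n: int, m: int) -> bool:
        nonlocal calls
        calls += 1
        if m == 0:
            return n == 0
        if n == 0:
            return m >= 2 and p[m - 1] == '*' and rec(n, m - 2)
        if t[n - 1] == p[m - 1] or p[m - 1] == '.':
            return rec(n - 1, m - 1)
        if p[m - 1] == '*' and m > 1:
            # Both branches are explored, exactly as in the naive solution
            zero = rec(n, m - 2)
            one_or_more = (p[m - 2] == t[n - 1] or p[m - 2] == '.') and rec(n - 1, m)
            return zero or one_or_more
        return False

    if memoize:
        # Rebinding the name makes the inner recursive calls go through the cache
        rec = lru_cache(maxsize=None)(rec)

    result = rec(len(t), len(p))
    return result, calls


if __name__ == "__main__":
    text, pattern = "a" * 12, "ba*a*a*a*"  # illustrative worst-case-style input
    print("naive:   ", match_with_call_count(text, pattern, memoize=False))
    print("memoized:", match_with_call_count(text, pattern, memoize=True))
```

With memoization the count is bounded by the number of distinct (n, m) pairs, which is what the bottom-up table below exploits.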
- Create a boolean 2D dp array of size (n + 1) * (m + 1). Note that the values in the recursion range from 0 to the text length n and from 0 to the pattern length m.
- dp[i][j] is true if the first i characters of the text match the first j characters of the pattern.
- If both strings are empty, then it's a match, thus dp[0][0] = true.
- For other cases, we simply follow the above recursive solution.
C++
#include <iostream>
#include <string>
#include <vector>
using namespace std;

bool isMatch(const string &t, const string &p) {
    int n = t.size();
    int m = p.size();
    // DP table where dp[i][j] means whether first i characters in t
    // match the first j characters in p
    vector<vector<bool>> dp(n + 1, vector<bool>(m + 1, false));
    // Empty pattern matches empty text
    dp[0][0] = true;
    // Deals with patterns like a*, a*b*, a*b*c* etc, where '*'
    // can eliminate the preceding character
    for (int j = 1; j <= m; ++j) {
        if (p[j - 1] == '*' && j > 1) {
            dp[0][j] = dp[0][j - 2];
        }
    }
    // Fill the table
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            // Characters match
            if (p[j - 1] == '.' || t[i - 1] == p[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];
            }
            // Handle '*' in the pattern
            else if (p[j - 1] == '*' && j > 1) {
                // Two cases:
                // 1. '*' represents zero occurrences of the preceding character
                // 2. '*' represents one or more occurrences of the preceding character
                dp[i][j] = dp[i][j - 2] ||
                           (dp[i - 1][j] && (p[j - 2] == t[i - 1] || p[j - 2] == '.'));
            }
        }
    }
    return dp[n][m];
}

int main() {
    cout << boolalpha << isMatch("aab", "a.*") << endl;
    cout << boolalpha << isMatch("aa", "a") << endl;
    cout << boolalpha << isMatch("aa", "a*") << endl;
    cout << boolalpha << isMatch("ab", ".*") << endl;
    cout << boolalpha << isMatch("mississippi", "mis*is*p*.") << endl;
    return 0;
}
Java
public class GfG {
    public static boolean isMatch(String t, String p) {
        int n = t.length();
        int m = p.length();
        // DP table where dp[i][j] means whether first i characters in t
        // match the first j characters in p
        boolean[][] dp = new boolean[n + 1][m + 1];
        // Empty pattern matches empty text
        dp[0][0] = true;
        // Deals with patterns like a*, a*b*, a*b*c* etc, where '*'
        // can eliminate the preceding character
        for (int j = 1; j <= m; ++j) {
            if (p.charAt(j - 1) == '*' && j > 1) {
                dp[0][j] = dp[0][j - 2];
            }
        }
        // Fill the table
        for (int i = 1; i <= n; ++i) {
            for (int j = 1; j <= m; ++j) {
                // Characters match
                if (p.charAt(j - 1) == '.' || t.charAt(i - 1) == p.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                }
                // Handle '*' in the pattern
                else if (p.charAt(j - 1) == '*' && j > 1) {
                    // Two cases:
                    // 1. '*' represents zero occurrences of the preceding character
                    // 2. '*' represents one or more occurrences of the preceding character
                    dp[i][j] = dp[i][j - 2] ||
                               (dp[i - 1][j] && (p.charAt(j - 2) == t.charAt(i - 1) ||
                                                 p.charAt(j - 2) == '.'));
                }
            }
        }
        return dp[n][m];
    }

    public static void main(String[] args) {
        System.out.println(isMatch("aab", "a.*"));
        System.out.println(isMatch("aa", "a"));
        System.out.println(isMatch("aa", "a*"));
        System.out.println(isMatch("ab", ".*"));
        System.out.println(isMatch("mississippi", "mis*is*p*."));
    }
}
Python
def isMatch(t: str, p: str) -> bool:
    n = len(t)
    m = len(p)
    # DP table where dp[i][j] means whether first i characters in t
    # match the first j characters in p
    dp = [[False] * (m + 1) for _ in range(n + 1)]
    # Empty pattern matches empty text
    dp[0][0] = True
    # Deals with patterns like a*, a*b*, a*b*c* etc, where '*'
    # can eliminate the preceding character
    for j in range(1, m + 1):
        if p[j - 1] == '*' and j > 1:
            dp[0][j] = dp[0][j - 2]
    # Fill the table
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Characters match
            if p[j - 1] == '.' or t[i - 1] == p[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            # Handle '*' in the pattern
            elif p[j - 1] == '*' and j > 1:
                # Two cases:
                # 1. '*' represents zero occurrences of the preceding character
                # 2. '*' represents one or more occurrences of the preceding character
                dp[i][j] = dp[i][j - 2] or (dp[i - 1][j] and (p[j - 2] == t[i - 1] or p[j - 2] == '.'))
    return dp[n][m]


if __name__ == "__main__":
    print(isMatch("aab", "a.*"))
    print(isMatch("aa", "a"))
    print(isMatch("aa", "a*"))
    print(isMatch("ab", ".*"))
    print(isMatch("mississippi", "mis*is*p*."))
JavaScript
function isMatch(t, p) {
    const n = t.length;
    const m = p.length;
    // DP table where dp[i][j] means whether first i characters in t
    // match the first j characters in p
    const dp = Array.from({ length: n + 1 }, () => Array(m + 1).fill(false));
    // Empty pattern matches empty text
    dp[0][0] = true;
    // Deals with patterns like a*, a*b*, a*b*c* etc, where '*'
    // can eliminate the preceding character
    for (let j = 1; j <= m; ++j) {
        if (p[j - 1] === '*' && j > 1) {
            dp[0][j] = dp[0][j - 2];
        }
    }
    // Fill the table
    for (let i = 1; i <= n; ++i) {
        for (let j = 1; j <= m; ++j) {
            // Characters match
            if (p[j - 1] === '.' || t[i - 1] === p[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];
            }
            // Handle '*' in the pattern
            else if (p[j - 1] === '*' && j > 1) {
                // Two cases:
                // 1. '*' represents zero occurrences of the preceding character
                // 2. '*' represents one or more occurrences of the preceding character
                dp[i][j] = dp[i][j - 2] || (dp[i - 1][j] && (p[j - 2] === t[i - 1] || p[j - 2] === '.'));
            }
        }
    }
    return dp[n][m];
}

// Testing the function
console.log(isMatch("aab", "a.*")); // true
console.log(isMatch("aa", "a")); // false
console.log(isMatch("aa", "a*")); // true
console.log(isMatch("ab", ".*")); // true
console.log(isMatch("mississippi", "mis*is*p*.")); // false
Output
true
false
true
true
false
Illustration
Let's take an example t = "aab" and p = "c*a*b" and create a DP table. Rows are indexed by i (characters of t), columns by j (characters of p); index 0 stands for the empty prefix.

| | | | c | * | a | * | b |
|---|---|---|---|---|---|---|---|
| | | 0 | 1 | 2 | 3 | 4 | 5 |
| | 0 | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE |
| a | 1 | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE |
| a | 2 | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE |
| b | 3 | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE |
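The table can be reproduced programmatically. The sketch below applies the same recurrence as the Python solution above but returns the whole table rather than only dp[n][m]; the helper name `build_table` and the T/F print format are assumptions made for illustration.

```python
def build_table(t: str, p: str) -> list:
    """Return the full (len(t) + 1) x (len(p) + 1) DP table as a list of rows."""
    n, m = len(t), len(p)
    dp = [[False] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = True  # empty pattern matches empty text
    # First row: prefixes such as "c*" or "c*a*" can match the empty text
    for j in range(2, m + 1):
        if p[j - 1] == '*':
            dp[0][j] = dp[0][j - 2]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if p[j - 1] == '.' or p[j - 1] == t[i - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            elif p[j - 1] == '*' and j > 1:
                prev_matches = p[j - 2] == '.' or p[j - 2] == t[i - 1]
                dp[i][j] = dp[i][j - 2] or (dp[i - 1][j] and prev_matches)
    return dp


if __name__ == "__main__":
    t, p = "aab", "c*a*b"
    for i, row in enumerate(build_table(t, p)):
        label = t[i - 1] if i > 0 else " "
        print(label, i, " ".join("T" if cell else "F" for cell in row))
```

The bottom-right cell, `build_table(t, p)[len(t)][len(p)]`, is the final answer for the whole text and pattern.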
- First column (dp[i][0]): p is empty, so it matches t only if t is also empty, which is stored in dp[0][0]. Thus, the remaining values dp[i][0] for i > 0 are false.
- First row (dp[0][j]): this is less obvious. It records which prefixes of p match an empty t. The answer is either an empty pattern or a pattern that can represent an empty string, such as "a*", "x*y*", "l*m*n*" and so on. In the above example, if t = "" and p = "c*", then due to '*', c can be repeated 0 times, which gives an empty string. Hence, dp[0][2] = true.
- For non-empty strings, if t[i - 1] == p[j - 1], the last characters of the two prefixes are the same, so we only have to check whether the remaining prefixes match. If they match, the current prefixes match, otherwise they do not, i.e., dp[i][j] = dp[i - 1][j - 1]. The indices are i - 1 and j - 1 because dp is offset by one to make room for the empty prefix, while the strings themselves are 0-indexed.
- If p[j - 1] == ".", any single character can be matched. Here too, we only have to check whether the remaining prefixes match. Thus, dp[i][j] = dp[i - 1][j - 1].
- If p[j - 1] == "*", there are two options. Either the '*' and its preceding character represent an empty string (0 characters), which gives dp[i][j - 2]; or t[i - 1] == p[j - 2] || p[j - 2] == ".", i.e., the current character of the text matches the character preceding '*' in the pattern, which gives dp[i - 1][j]. dp[i][j] is true if either option is true.
Time Complexity: O(m×n)
Auxiliary Space: O(m×n)
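Each row of the table reads only the row directly above it and earlier cells of the same row, so the auxiliary space can be reduced to O(m) by keeping two rows. The following Python sketch is a variant of the tabulation above, not something given in the original solution; the function name `is_match_two_rows` is hypothetical.

```python
def is_match_two_rows(t: str, p: str) -> bool:
    """Rolling-array variant of the DP: keeps only the previous and current row."""
    n, m = len(t), len(p)
    # prev plays the role of dp[i - 1]; it starts as row 0 (empty text)
    prev = [False] * (m + 1)
    prev[0] = True
    for j in range(2, m + 1):
        if p[j - 1] == '*':
            prev[j] = prev[j - 2]
    for i in range(1, n + 1):
        # cur[0] stays False: non-empty text never matches an empty pattern
        cur = [False] * (m + 1)
        for j in range(1, m + 1):
            if p[j - 1] == '.' or p[j - 1] == t[i - 1]:
                cur[j] = prev[j - 1]
            elif p[j - 1] == '*' and j > 1:
                prev_matches = p[j - 2] == '.' or p[j - 2] == t[i - 1]
                cur[j] = cur[j - 2] or (prev[j] and prev_matches)
        prev = cur
    return prev[m]


if __name__ == "__main__":
    for t, p in [("aab", "a.*"), ("aa", "a"), ("aa", "a*"),
                 ("ab", ".*"), ("mississippi", "mis*is*p*.")]:
        print(is_match_two_rows(t, p))
```

The time complexity remains O(m×n); only the table storage shrinks.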