Token-based matching
Some NLU tasks can be solved without the help of any statistical model. One way to do so is with a regular expression (regex), which we use to match a predefined set of patterns against our text.
A regex is a sequence of characters that specifies a search pattern; in other words, it describes the set of strings that follow that pattern. These patterns can include letters, digits, and characters with special meanings, such as ?, ., and *. Python’s built-in library, re, provides great support for defining and matching regexes.
What does a regex look like, then? The following regex matches the following strings:
"Barack Obama"
"Barack Hussein Obama"
reg = r"Barack\s(Hussein\s)?Obama"
This pattern can be read as follows: the string Barack can be followed optionally by the string Hussein (the ? character in a regex means optional; that is, 0 or 1 occurrence) and should be followed by the string Obama. The inter-word spaces can be a...
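To see the pattern in action, here is a minimal sketch using Python's re module. The list of candidate strings and the names candidates and results are illustrative assumptions, not part of the original example; re.fullmatch is used so that a string counts as a match only when the whole string fits the pattern.

```python
import re

# The pattern from the text: "Barack", one whitespace character,
# an optional "Hussein" plus whitespace, and then "Obama".
reg = r"Barack\s(Hussein\s)?Obama"

# Illustrative test strings (assumed for this sketch).
candidates = [
    "Barack Obama",
    "Barack Hussein Obama",
    "Barack\tObama",     # \s also matches a tab character
    "Barack H. Obama",   # "H." is not covered by the optional group
    "BarackObama",       # the whitespace after "Barack" is required
]

# re.fullmatch returns a match object only if the entire string fits the pattern.
results = {s: re.fullmatch(reg, s) is not None for s in candidates}

for s, matched in results.items():
    print(f"{s!r}: {'match' if matched else 'no match'}")
```

The first three strings match and the last two do not, which shows that the ? makes only the group (Hussein\s) optional, while the whitespace after Barack is always required.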