6/12/24, 1:26 PM Regular Expressions

Regular
Expressions
Regular Expressions are so cool. Knowledge of regexes will allow you to save the day.

CONTENTS
Definitions • Basic Examples • Notation • Using Regular Expressions • Components of Regexes • Performance Pitfalls • Performance Tips • Miscellaneous Language-Specific Notes • Study and Practice

Definitions
In formal language theory, a regular expression (a.k.a. regex, regexp, or r.e.) is a string
that represents a regular (type-3) language.

Huh??

Okay, in many programming languages, a regular expression is a pattern that matches
strings or pieces of strings. The set of strings these patterns can match goes way
beyond what regular expressions from language theory can describe.

Basic Examples
Rather than start with technical details, we’ll start with a bunch of examples.
https://cs.lmu.edu/~ray/notes/regex/ 1/14

Regex                   Matches any string that
hello                   contains {hello}
gray|grey               contains {gray, grey}
gr(a|e)y                contains {gray, grey}
gr[ae]y                 contains {gray, grey}
b[aeiou]bble            contains {babble, bebble, bibble, bobble, bubble}
[b-chm-pP]at|ot         contains {bat, cat, hat, mat, nat, oat, pat, Pat, ot}
colou?r                 contains {color, colour}
rege(x(es)?|xps?)       contains {regex, regexes, regexp, regexps}
go*gle                  contains {ggle, gogle, google, gooogle, goooogle, ...}
go+gle                  contains {gogle, google, gooogle, goooogle, ...}
g(oog)+le               contains {google, googoogle, googoogoogle, googoogoogoogle, ...}
z{3}                    contains {zzz}
z{3,6}                  contains {zzz, zzzz, zzzzz, zzzzzz}
z{3,}                   contains {zzz, zzzz, zzzzz, ...}
[Bb]rainf\*\*k          contains {Brainf**k, brainf**k}
\d                      contains {0,1,2,3,4,5,6,7,8,9}
\d{5}(-\d{4})?          contains a United States zip code
1\d{10}                 contains an 11-digit string starting with a 1
[2-9]|[12]\d|3[0-6]     contains an integer in the range 2..36 inclusive
Hello\nworld            contains Hello followed by a newline followed by world
mi.....ft               contains a nine-character (sub)string beginning with mi and
                        ending with ft (Note: depending on context, the dot stands
                        either for “any character at all” or “any character except a
                        newline”.) Each dot is allowed to match a different character,
                        so both microsoft and minecraft will match.
\d+(\.\d\d)?            contains a nonnegative integer or a decimal number with
                        exactly two digits after the decimal point.
[^i*&2@]                contains any character other than an i, asterisk, ampersand,
                        2, or at-sign.
//[^\r\n]*[\r\n]        contains a Java or C# slash-slash comment
^dog                    begins with "dog"
dog$                    ends with "dog"
^dog$                   is exactly "dog"
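A few of the rows above can be spot-checked in Python, whose re.search looks for the pattern anywhere in the string. This is only a sketch: the helper name contains and the sample strings are made up for illustration.

```python
import re

def contains(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# Alternation, character classes, and optional characters
print(contains(r"gr[ae]y", "a grey cat"))        # True
print(contains(r"colou?r", "no colors here"))    # True
print(contains(r"go+gle", "ggle"))               # False: + needs at least one o

# Anchors: ^dog$ must match the entire string
print(contains(r"^dog$", "dog"))                 # True
print(contains(r"^dog$", "hotdog"))              # False

# | has the lowest precedence, so this means "[b-chm-pP]at" OR "ot"
print(contains(r"[b-chm-pP]at|ot", "a lot"))     # True, because of "ot"
```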

Notation
There are many different syntaxes for regular expressions, but in general you will see that:

Most characters stand for themselves

Certain characters, called metacharacters, have special meaning and must be
escaped (usually with \ ) if you want to use them as characters. In most syntaxes the
metacharacters are:

( ) [ ] { } ^ $ . \ ? * + |

Within square brackets, you only have to escape (1) an initial ^ , (2) a non-initial or
non-final - , (3) a non-initial ] , and (4) a \ .
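To see why escaping matters, here is a small Python sketch. The sample price string is invented; re.escape is the standard-library function that backslash-escapes metacharacters for you.

```python
import re

price = "$4.99 (USD)"

# Unescaped, the $ . ( ) characters are metacharacters, so this pattern
# does not mean the literal text and finds nothing.
print(re.search(price, "Total: $4.99 (USD)"))                  # None

# re.escape produces a pattern that matches the text literally.
literal = re.escape(price)
print(re.search(literal, "Total: $4.99 (USD)") is not None)    # True

# Inside square brackets, most metacharacters stand for themselves.
print(re.findall(r"[.*+?]", "a.b*c+d?"))                       # ['.', '*', '+', '?']
```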

Using Regular Expressions

Many languages allow programmers to define regexes and then use them to:

Validate that a piece of text (or a portion of that text) matches some pattern
Find fragments of some text that match some pattern
Extract fragments of some text
Replace fragments of text with other text

Generally a regex is first compiled into some internal form that can be used for super fast
validation, extraction, and replacing. Sometimes there is an explicit compile function or
method, and sometimes special syntax is used to compile, such as the very common form
/.../ .

Validation
Example: find "color" or "colour" in a given string.

// Java
Pattern p = Pattern.compile("colou?r");
Matcher m = p.matcher("The color green");
m.find(); // returns true
m.start(); // returns 4
m.end(); // returns 9
m = p.matcher("abc");
m.find(); // returns false

# Perl
$p = qr/colou?r/;
"The color green" =~ $p; # returns 1 (Perl has no true literal)
"abc" =~ $p; # returns "" (Perl has no false literal)

# Ruby
p = /colou?r/
"The color green" =~ p # returns 4
"abc" =~ p # returns nil

# Python
import re
p = re.compile("colou?r")
m = p.search("The color green")
m.start() # returns 4
m = p.search("abc") # returns None

// JavaScript
const p = /colou?r/;
"The color green".search(p); // returns 4
"abc".search(p); // returns -1

If you want to know if an entire string matches a pattern, define the pattern with ^ and $, or
with \A and \z (spelled \Z in some engines, such as Python's). In Java, you can call matches() instead of find().
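Python draws the same line between "found somewhere" and "matches the whole string". A minimal sketch, using the color pattern from the examples above:

```python
import re

p = re.compile(r"colou?r")

print(p.search("The color green") is not None)   # True: found somewhere inside
print(p.fullmatch("The color green") is None)    # True: the whole string is not a match
print(p.fullmatch("colour") is not None)         # True

# Anchoring the pattern gives the same effect with search().
anchored = re.compile(r"\Acolou?r\Z")            # in Python, \Z means end of string
print(anchored.search("colour") is not None)     # True
print(anchored.search("colours") is None)        # True
```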

Extraction
After doing a match against a pattern, most regex engines will return you a bundle of
information, including such things as:

the part of the text that matched the pattern

the index within the string where the match begins
each part of the text matching the parenthesized portions within the pattern
(sometimes) the text before the matched text
(sometimes) the text after the matched text

Example in Ruby:

>> pattern = /weighs (\d+(\.\d+)?) (\w+)/

=> /weighs (\d+(\.\d+)?) (\w+)/
>> pattern =~ 'The thing weighs 2.5 kilograms or so.'
=> 10
>> $&
=> "weighs 2.5 kilograms"
>> $1
=> "2.5"
>> $2
=> ".5"
>> $3
=> "kilograms"
>> $`
=> "The thing "
>> $'
=> " or so."

The same thing in JavaScript:

> const pattern = /weighs (\d+(\.\d+)?) (\w+)/
> pattern.exec('The thing weighs 2.5 kilograms or so.')
[ 'weighs 2.5 kilograms',
  '2.5',
  '.5',
  'kilograms',
  index: 10,
  input: 'The thing weighs 2.5 kilograms or so.' ]

Note how in JavaScript, the match result is an array that also carries extra properties such as index and input.

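The so-called group numbers are found by counting the left parentheses in the pattern, from
left to right. In the pattern above, group 1 is (\d+(\.\d+)?), group 2 is (\.\d+), and
group 3 is (\w+). Group 0 is the entire match.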
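For comparison, here is roughly the same extraction in Python. The match object plays the role of Ruby's $ variables, and groups are numbered by their left parentheses; this is a sketch using the same sample sentence.

```python
import re

pattern = re.compile(r"weighs (\d+(\.\d+)?) (\w+)")
text = "The thing weighs 2.5 kilograms or so."
m = pattern.search(text)

print(m.start())          # 10
print(m.group(0))         # weighs 2.5 kilograms
print(m.groups())         # ('2.5', '.5', 'kilograms')
print(text[:m.start()])   # the text before the match
print(text[m.end():])     # the text after the match
```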

Sometimes you need parentheses only for precedence purposes and you don’t want to
incur the cost of extracting a group. We have non-capturing groups for this purpose.

Ruby:

>> phone = /((\d{3})(?:\.|-))?(\d{3})(?:\.|-)(\d{4})/

=> /((\d{3})(?:\.|-))?(\d{3})(?:\.|-)(\d{4})/
>> phone =~ 'Call 555-1212 for info'
=> 5
>> [$`, $&, $', $1, $2, $3, $4, $5]
=> ["Call ", "555-1212", " for info", nil, nil, "555", "1212", nil]
>> phone =~ '800.221.9989'
=> 0
>> [$`, $&, $', $1, $2, $3, $4, $5]
=> ["", "800.221.9989", "", "800.", "800", "221", "9989", nil]
>> phone =~ '1800.221.9989'
=> 1
>> [$`, $&, $', $1, $2, $3, $4, $5]
=> ["1", "800.221.9989", "", "800.", "800", "221", "9989", nil]

JavaScript:

> const r = /((\d{3})(?:\.|-))?(\d{3})(?:\.|-)(\d{4})/g;

> const m = r.exec("Call 1.800.555-1212 for info");
> m.index
7
> JSON.stringify(m);
["800.555-1212","800.","800","555","1212"]

Java:

Pattern phone = Pattern.compile("((\\d{3})(?:\\.|-))?(\\d{3})(?:\\.|-)(\\d{4})");
String[] tests = {"Call 555-1212 for info", "800.221.9989", "1800.221.9989"};
for (String s : tests) {
    Matcher m = phone.matcher(s);
    m.find();
    System.out.println("groupCount = " + m.groupCount());
    System.out.println("group(0) = " + m.group(0));
    System.out.println("group(1) = " + m.group(1));
    System.out.println("group(2) = " + m.group(2));
    System.out.println("group(3) = " + m.group(3));
    System.out.println("group(4) = " + m.group(4));
}

groupCount = 4
group(0) = 555-1212
group(1) = null
group(2) = null
group(3) = 555
group(4) = 1212
groupCount = 4
group(0) = 800.221.9989
group(1) = 800.
group(2) = 800
group(3) = 221
group(4) = 9989
groupCount = 4
group(0) = 800.221.9989
group(1) = 800.
group(2) = 800
group(3) = 221
group(4) = 9989
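The same phone pattern behaves the same way in Python. In this sketch, groups() lists only the capturing groups, so the two (?:...) separators never show up.

```python
import re

phone = re.compile(r"((\d{3})(?:\.|-))?(\d{3})(?:\.|-)(\d{4})")

for s in ["Call 555-1212 for info", "800.221.9989", "1800.221.9989"]:
    m = phone.search(s)
    # start index, whole match, then the four capturing groups
    print(m.start(), m.group(0), m.groups())

print(phone.groups)   # 4: non-capturing groups are not counted
```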

Substitution
Many languages have replace or replaceAll methods that replace the parts of a string
that match a regex. Sometimes you will see a g flag on a regex instead of a replaceAll
function.

alert("Rascally Rabbit".replace(/[RrLl]/g, "w"));
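In Python the equivalent is re.sub, which replaces every match unless told otherwise. A short sketch; the third call, which swaps two captured words, is an extra illustration not taken from the notes.

```python
import re

# re.sub replaces all matches by default, so there is no g flag
print(re.sub(r"[RrLl]", "w", "Rascally Rabbit"))              # wascawwy wabbit

# count=1 limits the substitution to the first match
print(re.sub(r"[RrLl]", "w", "Rascally Rabbit", count=1))     # wascally Rabbit

# \1 and \2 in the replacement refer to captured groups
print(re.sub(r"(\w+), (\w+)", r"\2 \1", "Rabbit, Rascally"))  # Rascally Rabbit
```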

Components of Regexes
Character Classes

Square brackets [ ] — means exactly one character

A leading ^ negates, a non-leading, non-terminal - defines a range:

[abc] a or b or c
[^abc] any character _except_ a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)

If you have a ] in your set, put it first. Use \ to escape.

Java allows crazy extensions:

[a-d[m-p]] [a-dm-p] (union, Java only, I think)

[a-e&&[def]] [de] (intersection, Java only, I think)
[a-r&&[^bq-z]] [ac-p] (subtraction, Java only, I think)

Other ways to say exactly one character from a set are:

\d [0-9]
\D [^\d]
\s [ \t\n\x0B\f\r]
\S [^\s]
\w [a-zA-Z0-9_]
\W [^\w]
. any character at all, except possibly a line terminator
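Here is a quick Python sketch of the shorthand classes and the bracket rules. The sample text is invented for illustration.

```python
import re

text = "Room 42-B, floor_3!"

print(re.findall(r"\d", text))         # ['4', '2', '3']
print(re.findall(r"\w+", text))        # ['Room', '42', 'B', 'floor_3']
print(re.findall(r"[^\w\s]", text))    # ['-', ',', '!']

# A ] placed first in the set is a literal, and so is a - placed last
print(re.findall(r"[]a-]", "a]-b"))    # ['a', ']', '-']
```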

Groups
Defined above, in the section on extraction.

Quantifiers
Generally, 18 types:

                              Eager    Reluctant   Possessive
Zero or one                   ?        ??          ?+
Zero or more                  *        *?          *+
One or more                   +        +?          ++
m times                       {m}      {m}?        {m}+
At least m times              {m,}     {m,}?       {m,}+
At least m, at most n times   {m,n}    {m,n}?      {m,n}+

Eager (Greedy and Generous): match as much as possible, but give characters back if the rest of the pattern needs them

\w+\d\d\w+ // matches abcdef42skjhfskjfhsjdfs
           // but inefficiently

Possessive: match as much as possible, but do NOT give back

\w++\d\d\w+ // does not match abcdef42skjhfskjfhsjdfs
            // but is efficient

Reluctant: match as little as possible

\w+?\d\d\w+ // matches abcdef42skjhfskjfhsjdfs
            // efficiently, yay!
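The difference between eager and reluctant is easiest to see when a string offers more than one way to match. A Python sketch with a made-up string containing two digit pairs; the possessive form is left out because Python's re module only accepts it from version 3.11 on.

```python
import re

s = "ab12cd34ef"

# Eager: \w+ grabs everything, then gives back until \d\d and \w+ can match,
# so it settles on the LAST possible pair of digits.
greedy = re.match(r"(\w+)(\d\d)(\w+)", s)
print(greedy.groups())       # ('ab12cd', '34', 'ef')

# Reluctant: \w+? takes one character at a time,
# so it stops at the FIRST possible pair of digits.
reluctant = re.match(r"(\w+?)(\d\d)(\w+)", s)
print(reluctant.groups())    # ('ab', '12', 'cd34ef')
```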

Backreferences
Things captured can be used later:

<(\w+)>[^<]*</\1>
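A Python sketch of the pattern above in action. The doubled-word pattern at the end is an extra illustration, not from the notes.

```python
import re

tag = re.compile(r"<(\w+)>[^<]*</\1>")

print(tag.search("<b>bold</b>").group(0))   # <b>bold</b>
print(tag.search("<b>bold</i>"))            # None: the closing tag does not repeat "b"

# A doubled word: \1 must repeat exactly what group 1 captured
doubled = re.search(r"\b(\w+) \1\b", "it was the the best")
print(doubled.group(1))                     # the
```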

Anchors, Boundaries, Delimiters

Some regex tokens do not consume characters! They just assert the matching engine is at
a particular place, so to speak.

^ : Beginning of string (or line, depending on the mode)

$ : End of string (or line, depending on the mode)
\A : Beginning of string
\z : End of string
\Z : Varies a lot depending on the engine, so be careful with it
\b : Word boundary
\B : Not a word boundary

Read more about these at Rexegg.
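Here is a Python sketch of how the mode changes ^ and $, and of what \b and \B assert. The three-line sample text is invented.

```python
import re

text = "dog\nhotdog\ndog food"

print(re.findall(r"^dog", text))                  # ['dog']: ^ is the start of the string
print(re.findall(r"^dog", text, re.MULTILINE))    # ['dog', 'dog']: ^ is the start of each line
print(re.findall(r"dog$", text, re.MULTILINE))    # ['dog', 'dog']: the first two lines end in dog

# \b and \B consume nothing; they only check where the engine is standing
print([m.start() for m in re.finditer(r"\bdog\b", text)])   # [0, 11]
print([m.start() for m in re.finditer(r"\Bdog", text)])     # [7]
```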

Negative Lookahead: Matches only if not followed by something

q(?!u)
\b(?:[a-eg-z]|f(?!oo))\w*\b // Word not starting with foo
\b(?:[a-eg-z]|f(?!oo\b))\w*\b // Any word except foo
((?!foo).*)

Positive Lookbehind: Matches only if preceded by something

(?<=-)\p{L}+ // a word following a hyphen

(?<=http://)\S+ // URL not including the http:// part

Negative Lookbehind: Matches only if not preceded by something

(?<![-+\d])(\d+) // Digits not preceded by a digit, +, or -

Lookarounds show up in search and replace applications. This example uses a positive
lookahead, (?=...), which matches only if followed by something:

Pattern p = Pattern.compile("Hillary(?=\\s+Clinton)");
String text = "Once Hillary Clinton was talking about Sir\n" +
"Edmund Hillary to Hillary Makasa and then Hillary\n" +
"Clinton had to run off on important business.";
Matcher m = p.matcher(text);
System.out.println(m.replaceAll("Secretary"));
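The lookaround patterns above carry over to Python more or less unchanged. A sketch using the same Hillary text, plus a few short invented strings for the other three kinds of lookaround.

```python
import re

text = ("Once Hillary Clinton was talking about Sir\n"
        "Edmund Hillary to Hillary Makasa and then Hillary\n"
        "Clinton had to run off on important business.")

# Positive lookahead: replace Hillary only when Clinton follows,
# even across a line break. Clinton itself is not consumed.
replaced, count = re.subn(r"Hillary(?=\s+Clinton)", "Secretary", text)
print(count)      # 2
print(replaced)

# Negative lookahead: a q that is not followed by u
print(re.findall(r"q(?!u)\w*", "quit qatar iraq"))         # ['qatar', 'q']

# Negative lookbehind: digits not preceded by a digit, +, or -
print(re.findall(r"(?<![-+\d])\d+", "7 +12 -5 300"))       # ['7', '300']

# Positive lookbehind: whatever follows http://
print(re.findall(r"(?<=http://)\S+", "see http://example.com now"))   # ['example.com']
```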

Note: Read this awesome article on lookarounds.

Modifiers
A modifier affects the way the rest of the regex is interpreted. Not every language supports
all of the modifiers below. For example, JavaScript originally supported only i, g, and m; newer versions add s, u, and a few others.

Modifier   Meaning
g          global: find all matches rather than stopping at the first
i          ignore case
m          multiple line: ^ and $ match at the start and end of each line
s          single line (DOTALL): Means that the dot matches any character at all. Without
           this modifier, the dot matches any character except a newline.
x          ignore whitespace in the pattern
d          Unix line mode: Considers only U+000A as a line separator, rather than
           U+000D or the U+000D/U+000A combo or even U+2028.
u          Unicode case: in this mode the case-insensitive modifier respects Unicode
           cases; outside of this mode that modifier only consolidates cases of US-ASCII
           characters.
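In Python, modifiers are passed as flags (or written inline), and there is no g flag because findall and sub are already global. A sketch of the i, m, s, and x modifiers; the sample text and the zip code are made up.

```python
import re

text = "Hello\nWORLD"

print(re.findall(r"world", text, re.IGNORECASE))                # ['WORLD']
print(re.findall(r"^\w+$", text, re.MULTILINE))                 # ['Hello', 'WORLD']
print(re.search(r"Hello.WORLD", text))                          # None: dot does not match a newline
print(re.search(r"Hello.WORLD", text, re.DOTALL) is not None)   # True

# re.VERBOSE ignores whitespace in the pattern and allows comments
zip_code = re.compile(r"""
    \d{5}        # five digits
    (-\d{4})?    # optional +4 suffix
""", re.VERBOSE)
print(zip_code.fullmatch("90045-2659") is not None)             # True

# A flag can also be written inline at the start of the pattern
print(re.findall(r"(?i)world", text))                           # ['WORLD']
```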

Performance Pitfalls
You should know some things about how your regex engine works since two "equivalent"
regexes can have drastic differences in processing speed.

It is possible to write regexes that take exponential time to match, but you pretty much
have to TRY to make one (they’re pathological).
It is more common to accidentally create regexes that run in quadratic time.
Main types of problems:
    Recompilation (from forgetting to compile regexes used multiple times)

        // Java shortcut, should not be used in most circumstances
        s.matches("colou?r");

    Dot-star in the Middle (which causes backtracking)
        Solution 1: Use negated character class
        Solution 2: Use reluctant quantifiers
    Nested Repetition
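A Python sketch of the dot-star problem and its two fixes, with each pattern compiled once up front. The constant names and the sample line are invented; on a string this short the interesting difference is what gets matched, while the speed difference only shows up on long inputs.

```python
import re

# Compile once and reuse, rather than rebuilding the pattern on every call
QUOTED = re.compile(r'"[^"]*"')       # Solution 1: negated character class
QUOTED_LAZY = re.compile(r'".*?"')    # Solution 2: reluctant quantifier
QUOTED_GREEDY = re.compile(r'".*"')   # dot-star runs to the end, then backtracks

line = 'say "hi" and "bye" now'

print(QUOTED.findall(line))           # ['"hi"', '"bye"']
print(QUOTED_LAZY.findall(line))      # ['"hi"', '"bye"']
print(QUOTED_GREEDY.findall(line))    # ['"hi" and "bye"']
```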

Performance Tips
Always do the following:

Use non-capturing groups when you need parentheses but not capture.
If the regex is very complex, do a quick spot-check before attempting a match, e.g.
    Does an email address contain '@'?
Present the most likely alternative(s) first, e.g.
    black|white|blue|red|green|metallic seaweed
Reduce the amount of looping the engine has to do (the gain depends on the engine, so measure)
    \d\d\d\d\d can be faster than \d{5}
    aaaa+ can be faster than a{4,}
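Two of these tips, sketched in Python. The email pattern is deliberately simplified and hypothetical, and the names COLOR, EMAIL, and looks_like_email are made up for illustration.

```python
import re

# Non-capturing group for the alternation, most likely alternatives first
COLOR = re.compile(r"\b(?:black|white|blue|red|green)\b")

# Simplified, hypothetical email pattern; real addresses are messier
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def looks_like_email(s: str) -> bool:
    # Cheap spot-check first: skip the regex entirely when there is no @
    if "@" not in s:
        return False
    return EMAIL.fullmatch(s) is not None

print(COLOR.findall("a red and blue flag"))   # ['red', 'blue']
print(looks_like_email("ray@example.com"))    # True
print(looks_like_email("not an email"))       # False
```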

Miscellaneous Language-Specific Notes
A few things that are good to know:

Java’s built-in support for regexes exceeds that of many languages
    Especially good for Unicode and for character classes (has more than Perl)
    Syntax is more cumbersome (string literal support weak, no operator for matching...).
    Live with it!
Perl has nice regexes, too; it even allows you to embed code inside them.
    Great Perl documentation at the perldoc pages perlrequick, perlretut, perlre, and
    perlreref.
JavaScript seems to have less extensive support than other languages, but I think this is
changing.
Python puts regex functions in a module (re).
    Python docs are here.

Study and Practice

Here are some good sources:

The Premier Site for Regexes
Maybe this is a better premier site
Regex 101—Develop regexes online
RegExr (Awesome online tool for Java regexes)
Rubular (Ruby Online Regex Tester)
Perl Regular Expressions Tutorial
Perl Regular Expressions (Manual)
Perl Regular Expressions Quick Reference
Ruby Regexp class documentation
Java Regex Tutorial
Java Regex Optimization Article
Java Pattern class API docs
JavaScript Regular Expressions (at Mozilla Developer Center)
Python Regular Expressions
