java_Java_Regexe_htm
Introduction
Regular expressions (regexe) are extremely useful in programming, especially in processing text files.
Sun's online Java Tutorial trail on "Regular Expressions" is excellently written. Please read it if you are new to regexe.
Regexe by Examples
Example 1: Find Text
For example, given the input "This is an apple. These are 33 (thirty-three) apples", you wish to find all occurrences of the pattern "Th"
(case-sensitive or case-insensitive).
import java.util.regex.Pattern;
import java.util.regex.Matcher;
find() found the pattern "Th" starting at index 0 and ending at index 2
find() found the pattern "Th" starting at index 18 and ending at index 20
matches() found nothing
lookingAt() found the pattern "Th" starting at index 0 and ending at index 2
Explanation:
Java's regexe classes are kept in package java.util.regex. The two main classes in this package are Pattern
and Matcher. You should browse the Javadoc for the Pattern class, followed by the Matcher class.
Three steps are required to perform regexe matching:
1. Allocate a Pattern object. The Pattern class has no public constructor. Instead, you invoke the static method
   Pattern.compile(regexeString) to compile the regexeString, which returns a Pattern instance.
2. Allocate a Matcher object. Again, the Matcher class has no public constructor. Instead, you invoke the
   matcher(inputString) method on the Pattern instance (created in Step 1), which also binds the input sequence to
   this Matcher.
3. Use the Matcher instance (created in Step 2) to perform the matching and process the matching result. The Matcher
   class provides a few boolean methods for performing the matches:
   boolean find(): scans the input sequence to look for the next subsequence that matches the pattern. If a match is
   found, you can use group(), start() and end() to retrieve the matched subsequence and its starting and
   ending indices (the ending index is exclusive), as shown in the above example.
   boolean matches(): tries to match the entire input sequence against the regexe pattern. It returns true only if the
   entire input sequence matches the pattern.
   boolean lookingAt(): tries to match the input sequence, starting from the beginning, against the regexe pattern. It
   returns true if a prefix of the input sequence matches the pattern.
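The three steps and the three matching methods can be sketched as follows. This is a minimal illustration rather than the original example listing; the class name MatchMethodsSketch and its helper methods are hypothetical, while the input and the pattern "Th" are taken from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and helper names, used only for this illustration.
class MatchMethodsSketch {

    // Returns "start-end" for every match reported by find(); end is exclusive.
    static List<String> findSpans(String regexe, String input) {
        Pattern pattern = Pattern.compile(regexe);   // Step 1: compile the regexe
        Matcher matcher = pattern.matcher(input);    // Step 2: bind the input
        List<String> spans = new ArrayList<>();
        while (matcher.find()) {                     // Step 3: scan for each match
            spans.add(matcher.start() + "-" + matcher.end());
        }
        return spans;
    }

    // True only if the entire input matches the regexe.
    static boolean matchesWhole(String regexe, String input) {
        return Pattern.compile(regexe).matcher(input).matches();
    }

    // True if a prefix of the input matches the regexe.
    static boolean matchesPrefix(String regexe, String input) {
        return Pattern.compile(regexe).matcher(input).lookingAt();
    }

    public static void main(String[] args) {
        String input = "This is an apple. These are 33 (thirty-three) apples";
        System.out.println("find():      " + findSpans("Th", input));     // [0-2, 18-20]
        System.out.println("matches():   " + matchesWhole("Th", input));  // false
        System.out.println("lookingAt(): " + matchesPrefix("Th", input)); // true
    }
}
```

The results agree with the output listed for Example 1: find() reports two matches, matches() fails because "Th" does not cover the whole input, and lookingAt() succeeds because the input begins with "Th".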
To perform case-insensitive matching, use Pattern.compile(regexeString, Pattern.CASE_INSENSITIVE) to create the
Pattern instance (as commented out in the above example).
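To see the effect of the flag, the sketch below counts the matches of "Th" in the same input with and without Pattern.CASE_INSENSITIVE. The class name CaseInsensitiveSketch and the helper countMatches are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and helper names, used only for this illustration.
class CaseInsensitiveSketch {

    // Counts how many times find() succeeds; flags is 0 for the default behaviour.
    static int countMatches(String regexe, String input, int flags) {
        Matcher matcher = Pattern.compile(regexe, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "This is an apple. These are 33 (thirty-three) apples";
        // Case-sensitive: only "This" and "These" contain "Th".
        System.out.println(countMatches("Th", input, 0));                        // 2
        // Case-insensitive: "thirty" and "three" are matched as well.
        System.out.println(countMatches("Th", input, Pattern.CASE_INSENSITIVE)); // 4
    }
}
```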
Try changing the regexe pattern of the above example to the following and observe the outputs. Take note that you need to use the
escape sequence '\\' to write a backslash '\' inside a Java string.
String regexe = "\\w+"; // Escape needed for \
String regexe = "\\b[1-9][0-9]+\\b"; // A whole number of two or more digits, with no leading zero
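As a rough guide to what to expect, the sketch below applies both patterns to the input of Example 1 and collects the text of each match via group(). The class name RegexeVariationsSketch and the helper findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and helper names, used only for this illustration.
class RegexeVariationsSketch {

    // Collects the matched text of every match found in the input.
    static List<String> findAll(String regexe, String input) {
        Matcher matcher = Pattern.compile(regexe).matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "This is an apple. These are 33 (thirty-three) apples";
        // "\\w+" in Java source is the regexe \w+ : one or more word characters.
        System.out.println(findAll("\\w+", input));
        // [This, is, an, apple, These, are, 33, thirty, three, apples]

        // A run of digits not starting with 0, delimited by word boundaries.
        System.out.println(findAll("\\b[1-9][0-9]+\\b", input));
        // [33]
    }
}
```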
Read Javadoc for the class java.util.regex.Pattern for the list of regular expression constructs supported by Java.
Read Sun's online Java Tutorial trail on "Regular Expressions" on how to use regular expressions.
// Step 2: Allocate a Matcher object from the pattern, and provide the input
Matcher matcher = pattern.matcher(input);
Explanation:
First, create a Pattern object to compile a regexe pattern. Next, create a Matcher object from the Pattern and specify the
input.
The Matcher class provides the method replaceAll(replacement) to replace all the matched subsequences with the replacement,
and replaceFirst(replacement) to replace the first match only.
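The difference between the two methods can be sketched as below. This is an illustration under assumed values, not the original listing: the class name ReplaceSketch, the pattern "apple" and the replacement "orange" are all chosen here for demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name and sample values, used only for this illustration.
class ReplaceSketch {

    // Replaces every match of the regexe in the input.
    static String replaceEvery(String regexe, String input, String replacement) {
        Matcher matcher = Pattern.compile(regexe).matcher(input);
        return matcher.replaceAll(replacement);
    }

    // Replaces only the first match of the regexe in the input.
    static String replaceOnce(String regexe, String input, String replacement) {
        Matcher matcher = Pattern.compile(regexe).matcher(input);
        return matcher.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        String input = "This is an apple. These are 33 (thirty-three) apples";
        System.out.println(replaceEvery("apple", input, "orange"));
        // This is an orange. These are 33 (thirty-three) oranges
        System.out.println(replaceOnce("apple", input, "orange"));
        // This is an orange. These are 33 (thirty-three) apples
    }
}
```

Both methods return a new String; the input string itself is not modified.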
// Step 1: Allocate a Pattern object to compile a regexe
Pattern pattern = Pattern.compile(regexe, Pattern.CASE_INSENSITIVE);
// Step 2: Allocate a Matcher object from the Pattern, and provide the input
Matcher matcher = pattern.matcher(input);
You can use regexe to specify the pattern, and back references in the replacement, as in the previous example.
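A minimal sketch of a back reference in the replacement, assuming the method in question is String.replaceAll(regexe, replacement): $1 and $2 in the replacement refer to the first and second parenthesized groups of the pattern. The class name BackReferenceSketch and the helper swapHyphenated are hypothetical.

```java
// Hypothetical class and helper names, used only for this illustration.
class BackReferenceSketch {

    // Swaps the two words on either side of each hyphen.
    static String swapHyphenated(String source) {
        // Group 1 is the word before the hyphen, group 2 the word after it.
        return source.replaceAll("(\\w+)-(\\w+)", "$2-$1");
    }

    public static void main(String[] args) {
        String source = "There are thirty-three big-apple";
        System.out.println(swapHyphenated(source));
        // There are three-thirty apple-big
    }
}
```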
The split() method of String also takes a regexe, which it uses as the delimiter. For example,
public class StringSplitTest {
   public static void main(String[] args) {
      String source = "There are thirty-three big-apple";
      String[] tokens = source.split("\\s+|-"); // whitespace(s) or -
      for (String token : tokens) {
         System.out.println(token);
      }
   }
}
There
are
thirty
three
big
apple
The useDelimiter() method of Scanner likewise takes a regexe. For example,
import java.util.Scanner;
public class ScannerUseDelimiterTest {
   public static void main(String[] args) {
      String source = "There are thirty-three big-apple";
      Scanner in = new Scanner(source);
      in.useDelimiter("\\s+|-"); // whitespace(s) or -
      while (in.hasNext()) {
         System.out.println(in.next());
      }
   }
}
Feedback, comments, corrections, and errata can be sent to Chua Hock-Chuan.