Regex Character Classes in Java
Last Updated: 11 Mar, 2022
Regular expressions (also called “regex”) are strings that describe a pattern in text. A regex can be matched against another string to see whether the string fits the pattern. In general, a regex consists of normal characters, character classes, wildcard characters, and quantifiers. This article focuses on character classes. At times we need to match any sequence of one or more characters, in any order, drawn from a set of characters. For example, to match whole words, we want to match any sequence of letters of the alphabet. Character classes come in handy for such use cases. A character class is a set of characters written between square brackets “[” and “]”, and it matches any single character from that set. For example, the class [abc] matches a single “a”, “b”, or “c”. A range of characters can be specified with a hyphen. For example, the class [a-z] matches any single lowercase letter, which makes it the building block for matching words made of lowercase letters.
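As a minimal sketch of the [abc] and [a-z] classes described above, the snippet below uses java.util.regex.Pattern. The class name CharacterClassDemo and the helper containsMatch are illustrative names chosen for this example only.

```java
import java.util.regex.Pattern;

public class CharacterClassDemo {

    // Returns true if the pattern is found anywhere in the input.
    public static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // [abc] matches a single 'a', 'b' or 'c'.
        System.out.println(containsMatch("[abc]", "cup"));   // true: contains 'c'
        System.out.println(containsMatch("[abc]", "dog"));   // false: no a, b or c

        // [a-z] matches a single lowercase letter.
        System.out.println(containsMatch("[a-z]", "HELLo")); // true: contains 'o'
        System.out.println(containsMatch("[a-z]", "1234"));  // false: digits only
    }
}
```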
Note that a character class has no relation to the class construct or class files in Java. Also, the word “match” here means that the pattern occurs somewhere in a string; it does not mean that the whole string matches the pattern. A regex pattern lets us use two types of character classes:
- Predefined character classes and
- Custom character classes
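The distinction made above between a pattern occurring in a string and the whole string fitting the pattern roughly corresponds to Matcher.find() and Matcher.matches() in java.util.regex. The following is a small sketch; the class and helper names (FindVersusMatches, occursIn, fitsWhole) are hypothetical.

```java
import java.util.regex.Pattern;

public class FindVersusMatches {

    // True if the pattern occurs anywhere in the input
    // (the sense of "match" used in this article).
    public static boolean occursIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True only if the entire input fits the pattern.
    public static boolean fitsWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(occursIn("[a-z]", "123x456"));  // true: 'x' is a lowercase letter
        System.out.println(fitsWhole("[a-z]", "123x456")); // false: the digits do not fit
        System.out.println(fitsWhole("[a-z]", "x"));       // true: the whole input is one letter
    }
}
```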
Predefined Character Classes
Some frequently used character classes are predefined in Java; these are listed below. Most of them are written as a backslash “\” followed by a letter, and they need not be enclosed in brackets “[” and “]”.
| Predefined character class | Meaning |
|---|---|
| . (dot) | Matches any single character. One dot matches one character, two dots match two characters, and so on. By default the dot does not match line terminators; it matches them only when the pattern is compiled with the Pattern.DOTALL flag. |
| \d | Matches any digit character. Works the same as the character class [0-9]. |
| \D | Matches any character except a digit. Works the same as the character class [^0-9]. |
| \s | Matches any whitespace character: a space ‘ ’, a tab ‘\t’, a new line ‘\n’, a vertical tab ‘\x0B’, a form feed ‘\f’, or a carriage return ‘\r’. Works the same as the class [ \t\n\x0B\f\r]. |
| \S | Matches any character except the whitespace characters listed above. |
| \w | Matches any word character: an uppercase or lowercase letter, a digit, or the underscore character ‘_’. Works the same as the class [a-zA-Z_0-9]. |
| \W | Matches any character except a word character. Works the same as the class [^a-zA-Z_0-9]. |
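The predefined classes can be checked one character at a time. This is a sketch under the assumption that default flags are used except where Pattern.DOTALL is passed explicitly; PredefinedClassDemo and isSingleMatch are illustrative names. Note that in Java source code each regex backslash has to be written as "\\".

```java
import java.util.regex.Pattern;

public class PredefinedClassDemo {

    // True if the whole input fits the regex (here: a single character class).
    public static boolean isSingleMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(isSingleMatch("\\d", "7"));   // true: digit
        System.out.println(isSingleMatch("\\D", "7"));   // false: \D excludes digits
        System.out.println(isSingleMatch("\\s", "\t"));  // true: tab is whitespace
        System.out.println(isSingleMatch("\\s", "\b"));  // false: backspace is not whitespace
        System.out.println(isSingleMatch("\\w", "_"));   // true: underscore is a word character
        System.out.println(isSingleMatch("\\W", "+"));   // true: '+' is not a word character

        // The dot skips line terminators unless Pattern.DOTALL is set.
        System.out.println(Pattern.compile(".").matcher("\n").matches());                 // false
        System.out.println(Pattern.compile(".", Pattern.DOTALL).matcher("\n").matches()); // true
    }
}
```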
A few example regex patterns using predefined character classes:
| Regex pattern using predefined character classes | Input string – Result | Input string – Result | Input string – Result | Explanation |
|---|---|---|---|---|
| b.r | bar – Match | ab1r – Match | ba1r – Does not match | The “b.r” regex means there can be any one character between “b” and “r”. The pattern is found in “bar” and “ab1r”, but not in “ba1r”, because one dot matches only one character and here there is more than one character between “b” and “r”. |
| \d\d-\d\d-\d\d\d\d | 01-01-2022 – Match | 12-31-2050 – Match | 2022-02-02 – Does not match | The “\d\d-\d\d-\d\d\d\d” regex is a naive regex for a date in the format “DD-MM-YYYY”, in which every day, month, and year character is a digit. The regex is “naive” because it also matches dates in the format “MM-DD-YYYY”, and it does not reject days > 31 or months > 12. |
| \d\d-\D\D\D-\d\d\d\d | 01-JAN-2022 – Match | 31-12-2050 – Does not match | 22-a1B-1234 – Does not match | The “\d\d-\D\D\D-\d\d\d\d” regex is another naive regex for a date in the format “DD-MMM-YYYY”, where the day and year characters are digits and the month characters are anything other than digits. |
| ...\s... | abc xyz – Match | abc_xyz – Does not match | abc <tab_space> xyz – Match | The “...\s...” regex means two groups of any 3 characters separated by any whitespace character. As “_” is not a whitespace character, “abc_xyz” does not match. |
| ...\S... | 123 456 – Does not match | 123+456 – Match | abc_xyz – Match | The “...\S...” regex means two groups of any 3 characters separated by any character other than a whitespace character. As “ ” (space) is a whitespace character, “123 456” does not match. |
| \w\w\w\W\w\w\w | abc xyz – Match | LMN_opq – Does not match | 123+456 – Match | The “\w\w\w\W\w\w\w” regex means two groups of 3 word characters separated by any non-word character. As “_” is a word character, “LMN_opq” does not match. |
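The example patterns above can be tried out with a small program. This sketch assumes “match” is tested with Matcher.find(), which is the meaning the article gives to the word; PredefinedPatternExamples and found are hypothetical names, and the tab example uses a single tab character between the two groups.

```java
import java.util.regex.Pattern;

public class PredefinedPatternExamples {

    // True if the pattern is found anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Regex backslashes are doubled inside Java string literals.
        String numericDate = "\\d\\d-\\d\\d-\\d\\d\\d\\d";
        String namedMonthDate = "\\d\\d-\\D\\D\\D-\\d\\d\\d\\d";

        System.out.println(found("b.r", "bar"));                 // true
        System.out.println(found("b.r", "ab1r"));                // true
        System.out.println(found("b.r", "ba1r"));                // false

        System.out.println(found(numericDate, "12-31-2050"));    // true
        System.out.println(found(numericDate, "2022-02-02"));    // false

        System.out.println(found(namedMonthDate, "01-JAN-2022")); // true
        System.out.println(found(namedMonthDate, "22-a1B-1234")); // false

        System.out.println(found("...\\s...", "abc\txyz"));      // true
        System.out.println(found("...\\s...", "abc_xyz"));       // false

        System.out.println(found("...\\S...", "123+456"));       // true
        System.out.println(found("...\\S...", "123 456"));       // false

        System.out.println(found("\\w\\w\\w\\W\\w\\w\\w", "abc xyz")); // true
        System.out.println(found("\\w\\w\\w\\W\\w\\w\\w", "LMN_opq")); // false
    }
}
```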
Custom Character Classes
Java allows us to define character classes of our own using […]. A few examples of custom character classes are as follows:
| Example of custom character class | Meaning of the custom character class |
|---|---|
| b[aeiou]t | This regex means “b”, followed by any one of the vowels “a”, “e”, “i”, “o”, “u”, followed by “t”. The strings “bat”, “bet”, “bit”, “bot”, “but” match this regex, but “bct”, “bkt”, etc. do not. |
| [bB][aAeEiIoOuU][tT] | Such a regex extends the previous one to allow uppercase letters too, so the strings “bAT”, “BAT”, etc. match the pattern. |
| b[^aeiou]t | A “^” at the beginning of a character class works as negation (complement), so this regex means any character other than a lowercase vowel is allowed between “b” and “t”. The strings “bct”, “bkt”, “b+t”, etc. match the pattern. The “^” has this special meaning only at the beginning of a character class; anywhere else in the class it acts like any other normal character. |
| [a-z][0-3] | Ranges of letters and digits can be specified in character classes using the hyphen “-”. The strings “a1”, “z3”, etc. match the pattern. The strings “k7”, “n9”, etc. do not match. |
| [a-zA-Z][0-9] | More than one range can be used in a class. The strings “A1”, “b2”, etc. match the pattern. |
| [A-F[G-Z]] | Nesting character classes forms their union, so this class is the same as the [A-Z] class. |
| [a-p&&[l-z]] | Intersection of ranges also works in character classes, using “&&”. This class matches only the characters “l”, “m”, “n”, “o”, “p”. |
| [a-z&&[^aeiou]] | Subtraction also works in character classes, as an intersection with a negated class. Here the vowels are subtracted from the range “a-z”, leaving the lowercase consonants. |
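The custom classes listed above, including union, intersection, and subtraction, can be exercised as follows. This is only a sketch; CustomClassDemo and found are illustrative names, and find() is used so that “match” means the pattern occurs in the input.

```java
import java.util.regex.Pattern;

public class CustomClassDemo {

    // True if the pattern is found anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("b[aeiou]t", "bat"));       // true
        System.out.println(found("b[aeiou]t", "bct"));       // false
        System.out.println(found("b[^aeiou]t", "b+t"));      // true: '+' is not a vowel
        System.out.println(found("[a-z][0-3]", "k7"));       // false: 7 is outside 0-3

        // Union: [A-F[G-Z]] behaves like [A-Z].
        System.out.println(found("[A-F[G-Z]]", "Q"));        // true

        // Intersection: only l, m, n, o, p are in both ranges.
        System.out.println(found("[a-p&&[l-z]]", "m"));      // true
        System.out.println(found("[a-p&&[l-z]]", "c"));      // false

        // Subtraction: lowercase letters except vowels.
        System.out.println(found("[a-z&&[^aeiou]]", "x"));   // true
        System.out.println(found("[a-z&&[^aeiou]]", "e"));   // false
    }
}
```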
Regex patterns discussed so far require that each position in the input string match a specific character class. For example, the “[a-z]\s\d” pattern requires a lowercase letter at the first position, a whitespace character at the second position, and a digit at the third position. Such patterns are inflexible, restrictive, and harder to maintain. Quantifiers solve this issue: placed after a character or character class, a quantifier specifies how many times in a row that element may occur.
| Quantifier | Meaning of the quantifier | Description |
|---|---|---|
| * | Zero or more times | Placing an asterisk “*” after a character class allows any number of occurrences of that character class, including none. For example, the “0*\d” regex matches any number of leading zeroes followed by a digit. |
| + | One or more times | The plus sign “+” has the same effect as XX*, meaning a pattern followed by the same pattern with an asterisk. For example, the “0+\d” regex matches at least one leading zero followed by a digit. |
| ? | Zero or one time | The question mark “?” allows either zero or one occurrence. For example, the “\w\w-?\d\d” regex matches two word characters, followed by an optional hyphen, followed by two digit characters. |
| {m} | Exactly “m” times | |
| {m,} | At least “m” times | |
| {m,n} | At least “m” times and at most “n” times | |
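A short sketch of the quantifiers in use is given below. It deliberately uses Pattern.matches(), which tests the whole input, so that the upper bounds of {m} and {m,n} are visible; with find(), a longer input could still contain a shorter matching substring. The patterns “\d{4}”, “[a-z]{2,}” and “[a-z]{2,4}” are illustrative choices that do not come from the table, and QuantifierDemo and fits are hypothetical names.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {

    // True only if the entire input fits the regex.
    public static boolean fits(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fits("0*\\d", "0007"));            // true: zeroes, then a digit
        System.out.println(fits("0*\\d", "7"));               // true: zero occurrences allowed
        System.out.println(fits("0+\\d", "7"));               // false: needs at least one zero
        System.out.println(fits("0+\\d", "07"));              // true

        System.out.println(fits("\\w\\w-?\\d\\d", "AB-12"));  // true: hyphen present
        System.out.println(fits("\\w\\w-?\\d\\d", "AB12"));   // true: hyphen omitted

        System.out.println(fits("\\d{4}", "2022"));           // true: exactly 4 digits
        System.out.println(fits("\\d{4}", "202"));            // false: only 3 digits
        System.out.println(fits("[a-z]{2,}", "a"));           // false: fewer than 2 letters
        System.out.println(fits("[a-z]{2,}", "abc"));         // true
        System.out.println(fits("[a-z]{2,4}", "abcd"));       // true
        System.out.println(fits("[a-z]{2,4}", "abcde"));      // false: more than 4 letters
    }
}
```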