Open In App

Working with Regular Expressions in PostgreSQL

Last Updated : 10 Oct, 2024
Comments
Improve
Suggest changes
Like Article
Like
Report

Regular expressions, often referred to as "regex," are patterns used to match strings. In PostgreSQL, regular expressions allow us to search, validate, and manipulate text data in a powerful way. Regular expressions are helpful whether we need to find patterns in a string, replace parts of a text, or validate the format of an input.

In this article, we will explain PostgreSQL regular expressions, their syntax, and practical examples, making it an essential tool for any database professional looking to optimize text-related queries.

What are Regular Expressions?

Regular expressions are sequences of characters that form a search pattern. we can use these patterns to match specific sequences of characters in a string. PostgreSQL provides various operators and functions that help in pattern matching using regular expressions, making it a flexible tool for text processing.

Syntax:

SELECT column_name
FROM table_name
WHERE column_name ~ 'pattern';

key terms:

  • ~: This operator is used to match the text in the column column_name against the regex pattern provided in the string 'pattern'.

Key Operators and Functions for Regular Expressions in PostgreSQL

The main operators and functions for working with regular expressions in PostgreSQL include:

  • ~ (Matches regular expression): Checks if a string matches a regular expression.
  • ~* (Case-insensitive match): Same as ~, but ignores case.
  • !~ (Does not match regular expression): Checks if a string does not match a regular expression.
  • !~* (Case-insensitive non-match): Same as !~, but ignores case.
  • regexp_replace(): Replaces substring(s) matching a regular expression.
  • regexp_matches(): Returns all matches of a pattern in a string.

Example of Regular Expressions in PostgreSQL

Let’s create a simple table called users, which will store user data like usernames, emails, and phone numbers. This table can now be used for various queries and operations involving regular expressions to match patterns or manipulate the data.

Query:

CREATE TABLE users (
id SERIAL PRIMARY KEY,
username VARCHAR(50),
email VARCHAR(100),
phone VARCHAR(15)
);

INSERT INTO users (username, email, phone) VALUES
('john_doe', '[email protected]', '555-1234'),
('janeDoe22', '[email protected]', '555-4321'),
('BobSmith', '[email protected]', '123-4567');

Output:

idusernameemailphone
1john_doe[email protected]555-1234
2janeDoe22[email protected]555-4321
3BobSmith[email protected]123-4567

Example 1: Matching a Pattern Using Regular Expressions

This example demonstrates how to use regular expressions to search for specific patterns within a column in PostgreSQL. This allows for precise searching and filtering based on text patterns within the data. If we want to find all users whose email addresses contain "example," use the following query

Query:

SELECT username, email
FROM users
WHERE email ~ 'example';

Output:

Example1
Output

Explanation:

In this query, the ~ operator is used to find rows where the email column contains the word "example." The regular expression pattern 'example' is used to match this text in the email addresses.

Example 2: Case-Insensitive Pattern Matching

In many cases, we need to perform searches that ignore case sensitivity, especially when dealing with unstructured data. PostgreSQL provides the ~* operator for this purpose, allowing us to match patterns in a case-insensitive manner. To find all users whose usernames contain "doe", we can use the case-insensitive operator ~*

Query:

SELECT username
FROM users
WHERE username ~* 'doe';

Output:

Example2
Output

Explanation:

The ~* operator performs a case-insensitive match, so it finds both "john_doe" and "janeDoe22," regardless of case. This feature helps in performing flexible searches where we don't want the results to be restricted by exact case matches, making it a powerful tool for improving user experience and data processing.

Example 3: Using regexp_replace() to Clean Data

In PostgreSQL, the regexp_replace() function allows us to efficiently remove or replace unwanted characters using regular expressions. This function is particularly useful for standardizing data formats or cleaning up values before further processing. To clean up phone numbers by removing dashes, use the regexp_replace() function

Query:

SELECT username, regexp_replace(phone, '-', '', 'g') AS cleaned_phone
FROM users;

Output:

Example3
Output

Explanation:

In this query, the regexp_replace() function finds all instances of - in the phone column and replaces them with an empty string '', effectively removing the dashes. The result is a column named cleaned_phone, where the phone numbers have been stripped of dashes.

Example 4: Finding Users with Valid Emails

Regular expressions in PostgreSQL can help validate email formats by checking for specific patterns, such as the presence of an "@" symbol, a domain name, and a valid top-level domain (TLD).

This validation can prevent common errors like missing domains or incorrect email structures. we can use a regular expression to find users with valid email addresses (e.g., those containing an "@" symbol and a domain):

Query:

SELECT username, email
FROM users
WHERE email ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

Output:

Example4
Output

Explanation:

This regular expression checks for valid email addresses by ensuring they have an "@" symbol and a valid domain name structure. All the users in the output have valid email addresses that meet these criteria, making them suitable for further processing, such as sending confirmation emails or verifying user identities.

Conclusion

PostgreSQL's support for regular expressions provides powerful tools for text matching, validation, and transformation. With operators like ~ and functions like regexp_replace(), we can manipulate and filter string data in highly flexible ways. This makes regex important for tasks like data validation, cleaning etc.


Next Article

Similar Reads