Perl | Searching in a File using regex
Last Updated: 07 Jun, 2019
Prerequisite: Perl | Regular Expressions
A regular expression (regex, regexp, or RE) in Perl is a special text string that describes a search pattern within a given text. Regex support is built into the Perl language itself, and its syntax is not identical to that of PHP, Python, and other languages. This syntax is sometimes called “Perl 5 Compatible Regular Expressions”. To apply a regex to a string, Perl uses the binding operators =~ (regex operator) and !~ (negated regex operator).
The binding operators match a string against a regular expression. The left-hand side of the statement holds the string, and the right-hand side holds the pattern to match it against. The negated regex operator checks that the string does not match the pattern on the right-hand side.
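As a minimal sketch of the two binding operators, the snippet below tests one string against two patterns. The sample text and variable names are made up for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration.
my $line = "Perl makes text processing easy";

# =~ is true when the pattern matches somewhere in the string.
my $has_text = ($line =~ /text/) ? 1 : 0;

# !~ is true when the pattern does NOT match anywhere in the string.
my $lacks_java = ($line !~ /Java/) ? 1 : 0;

print "contains 'text': $has_text\n";
print "does not contain 'Java': $lacks_java\n";
```

Both flags come out true here, since the string contains "text" and does not contain "Java".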
Regex operators help in searching for a specific word or group of words in a file, and this can be done in several ways depending on the requirement. Searching in Perl follows a standard sequence: open the file in read mode, read it line by line, and look for the required string or group of strings in each line. When a match is found, the statement that follows the search expression decides what happens next: the matched line can be written to another file specified by the user or simply printed on the console.
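The sequence described above can be sketched roughly as follows, with the matched lines written to a second handle instead of the console. To keep the sketch self-contained, in-memory filehandles stand in for real files, and the sample text is an assumption for illustration; with real files, the scalar references would be replaced by file paths.

```perl
use strict;
use warnings;

# Hypothetical input; an in-memory filehandle stands in for a real file.
my $input  = "the first line\nsecond line\nanother with the word\n";
my $output = '';

open(my $in,  '<', \$input)  or die "Cannot open input: $!";
open(my $out, '>', \$output) or die "Cannot open output: $!";

# Read line by line and copy only the lines that contain the pattern.
while (my $line = <$in>) {
    print {$out} $line if $line =~ /the/;
}

close($in);
close($out);

# Show what was collected in the output handle.
print $output;
```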
The regular expression itself can be written in several ways, depending on how the required string should be matched:
Regular Search:
This is the basic pattern of writing a regular expression which looks for the required string within the specified file. Following is the syntax of such a Regular Expression:
$String =~ /the/
This expression is true for every line that contains the letters ‘the’, whether as a separate word or as part of a longer word. Here $String holds the current line read from the file; the match does not modify it. A matching line can then be copied to a file or simply printed on the console.
Example:
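use strict;
use warnings;

sub main
{
    # Path of the file to search
    my $file = 'C:\Users\GeeksForGeeks\GFG.txt';

    # Open the file for reading; $! holds the reason if this fails
    open(my $fh, '<', $file) or die("Could not open $file: $!");

    # Read the file line by line
    while (my $String = <$fh>)
    {
        # Print every line that contains 'the'
        if ($String =~ /the/)
        {
            print "$String\n";
        }
    }
    close($fh);
}
main();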
Because the pattern is not anchored, the above search also selects lines in which ‘the’ appears only as part of a longer word. To avoid such matches, the regular expression can be changed in the following manner:
$String =~ / the /
With a space before and after the word, the search term is isolated on both sides, so words that merely contain it are no longer matched. This removes the unwanted matches, but it also misses the word when it is immediately followed by a comma or full stop, or when it appears at the very start or end of a line.
To avoid this, there are other ways to limit the search to a specific word. One of them is the word boundary.
Using Word Boundary in Regex Search:
As seen in the above example, a regular search either returns extra words that contain the search term or, when spaces are placed around the term, misses some valid occurrences. To avoid this, a word boundary is used, which is denoted by ‘\b’.
$String =~ /\bthe\b/;
This pattern no longer matches words that merely contain the search term, and it still matches the word when it is followed by a comma or full stop.
Example:
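use strict;
use warnings;

sub main
{
    # Path of the file to search
    my $file = 'C:\Users\GeeksForGeeks\GFG.txt';

    # Open the file for reading; $! holds the reason if this fails
    open(my $fh, '<', $file) or die("Could not open $file: $!");

    # Read the file line by line
    while (my $String = <$fh>)
    {
        # Print every line that contains 'the' as a whole word
        if ($String =~ /\bthe\b/)
        {
            print "$String\n";
        }
    }
    close($fh);
}
main();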
In this example, an occurrence of the word that ends with a full stop is included in the search, while words that merely contain the search term are excluded. Hence, the word boundary overcomes the problems of the Regular Search method.
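To compare the three patterns discussed so far side by side, here is a small sketch. The helper name match_flags and the sample lines are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns three 0/1 flags for one line, in the order
# plain search, search with surrounding spaces, search with word boundaries.
sub match_flags {
    my ($line) = @_;
    return (
        ($line =~ /the/)     ? 1 : 0,
        ($line =~ / the /)   ? 1 : 0,
        ($line =~ /\bthe\b/) ? 1 : 0,
    );
}

# Hypothetical sample lines, chosen only for illustration.
my @lines = (
    "I saw the cat",         # 'the' as a separate word
    "They went there",       # 'the' only inside a longer word
    "Is this the, or not?",  # 'the' followed by a comma
);

for my $line (@lines) {
    my ($plain, $spaced, $bounded) = match_flags($line);
    print "$line | plain: $plain, spaced: $spaced, boundary: $bounded\n";
}
```

With these inputs, only the word-boundary pattern accepts the first and third lines while rejecting the second.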
What if there is a need to find words that start with, end with, or both start and end with specific characters? A literal search, with or without word boundaries, cannot express this. For cases like these, Perl allows the use of wildcards in the regular expression.
Use of Wild Cards in Regular Expression:
Perl can search for a set of words, or for words that follow a specific pattern, by using wildcards in the regular expression. A wildcard is a dot (‘.’) placed within the regex, and each dot matches any single character except a newline. A single pattern with wildcards can therefore match many related words that share some letters, instead of requiring a separate search for each word.
$String =~ /t..s/;
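The above pattern matches any ‘t’ followed by exactly two characters of any kind and then an ‘s’. This covers four-letter words that start with t and end with s, but because the pattern is not anchored, it also matches inside longer words and across a space.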
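The following sketch shows how loosely the dot wildcard matches. The stricter pattern /\bt\w\ws\b/ is not part of the original article; it is an assumed variant that combines word boundaries with \w to accept only whole four-character words. The helper name and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns two 0/1 flags for one string, first for the
# plain wildcard pattern and then for the assumed whole-word variant.
sub wildcard_flags {
    my ($text) = @_;
    return (
        ($text =~ /t..s/)       ? 1 : 0,   # dot: any character except newline
        ($text =~ /\bt\w\ws\b/) ? 1 : 0,   # whole word: t, two word characters, s
    );
}

# Hypothetical sample strings, chosen only for illustration.
for my $text ("this", "statistics", "that is", "ts") {
    my ($loose, $whole) = wildcard_flags($text);
    print "'$text' | wildcard: $loose, whole word: $whole\n";
}
```

Here "statistics" matches the plain wildcard through its ending "tics", and "that is" matches through "t is", while the whole-word variant accepts only "this".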
Example:
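use strict;
use warnings;

sub main
{
    # Path of the file to search
    my $file = 'C:\Users\GeeksForGeeks\GFG.txt';

    # Open the file for reading; $! holds the reason if this fails
    open(my $fh, '<', $file) or die("Could not open $file: $!");

    # Read the file line by line
    while (my $String = <$fh>)
    {
        # Print every line with a 't', any two characters, then an 's'
        if ($String =~ /t..s/)
        {
            print "$String\n";
        }
    }
    close($fh);
}
main();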
The above code prints every line that contains a match for the given pattern.
With this method, the whole line that contains the match gets printed, which makes it difficult to see exactly which word was matched. To avoid this confusion, we can print only the matched text and not the whole line. This is done by grouping the search pattern with parentheses. To print the text captured by such a group, $number variables are used.
$number variables hold the text captured by the groups in the last successful match. For example, if the regular expression contains multiple groups, $1 holds the text matched by the first group, $2 holds the text matched by the second group, and so on.
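As a small illustration of numbered capture variables, the sketch below uses two groups in one pattern. The last part, which collects every match on a line with the /g modifier in list context, goes slightly beyond the article and is included only as an assumption about what a reader might want next. All sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen only for illustration.
my $line = "this and that";

# Two capture groups: $1 holds the first group, $2 the second.
my ($first, $second);
if ($line =~ /(th..) and (th..)/) {
    ($first, $second) = ($1, $2);
    print "first group:  $first\n";
    print "second group: $second\n";
}

# In list context with /g, the match returns every captured substring.
my @all = ("this tips tops" =~ /\b(t\w\ws)\b/g);
print "all matches: @all\n";
```

Copying $1 and $2 into named variables right after the match is a common precaution, because the next successful match overwrites them.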
Given below is the above program transformed using the $number variables to show only the searched words and not the whole sentence:
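use strict;
use warnings;

sub main
{
    # Path of the file to search
    my $file = 'C:\Users\GeeksForGeeks\GFG.txt';

    # Open the file for reading; $! holds the reason if this fails
    open(my $fh, '<', $file) or die("Could not open $file: $!");

    # Read the file line by line
    while (my $String = <$fh>)
    {
        # The parentheses capture the matched text into $1
        if ($String =~ /(t..s)/)
        {
            # Print only the first match on the line, not the whole line
            print "$1\n";
        }
    }
    close($fh);
}
main();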