
Extract Substring Using Linux Bash
Overview
Extracting a substring from a string is a basic and common operation of text processing in Linux.
We're looking at different ways to extract substrings from strings using the Linux command line here.
Extracting an Index-Based Substring
Let's first take a quick look at how to extract an index-based substring using four different methods.
Using the cut command
Using the awk command
Using Bash's substring expansion
Using the expr command
Next, we'll see them in action.
Using The cut Command
We can extract the characters from position N through position M of the input string using the cut command.
In our example, the input is '0123Linux9', and the substring we want, "Linux", occupies the 0-based indexes 4 through 8. Since cut counts positions from 1, we must add 1 to both the starting index and the ending index. Therefore, the range becomes 5-9.
Now, let's see if the cut command solves the problem.
$ cut -c 5-9 <<< '0123Linux9'
Linux
We've got the expected substring, "Linux", so the problem is solved.
We passed the input string to the cut command via a here-string, and cut printed the result.
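The index arithmetic described above can be wrapped in a small helper. The following is only a minimal sketch: the function name substr_cut and its argument order (string, 0-based start index, length) are assumptions made for illustration.

```shell
# substr_cut is a hypothetical helper, not a standard command.
# It converts a 0-based start index and a length into the 1-based,
# inclusive range that cut -c expects.
substr_cut() {
    local str="$1" start="$2" len="$3"
    local first=$(( start + 1 ))     # 0-based index -> 1-based position
    local last=$(( start + len ))    # last position, inclusive
    printf '%s\n' "$str" | cut -c "${first}-${last}"
}

substr_cut '0123Linux9' 4 5    # prints: Linux
```

With a start index of 4 and a length of 5, the helper builds the same 5-9 range that we typed by hand.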
Using The awk Command
When we need to solve text processing problems in Linux, awk is hard to overlook. It has a built-in substr(s, i, n) function for exactly this job.
The substr() function takes three arguments. Let's examine each of them in detail.
s - The input string
i - The start index of the substring (awk uses a 1-based index system)
n - The length of the substring. If it's omitted, awk returns everything from index i to the last character of the input string
Let's now see whether awk's substr() function gives us the desired output.
$ awk '{print substr($0, 5, 5)}' <<< '0123Linux9'
Linux
Our substring starts at 0-based index 4. Since awk counts from 1 instead of 0, we add one to get the start index 5. The second 5 is the length of "Linux".
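To make the role of each substr() argument more visible, here is a short sketch that passes the values in with awk's -v option and also shows what happens when the length is omitted. The variable names start, len, word, and rest are chosen only for this illustration.

```shell
# Pass the 0-based start index and the length as awk variables
# (illustrative names) and let awk add 1 for its 1-based indexing.
word=$(printf '%s\n' '0123Linux9' |
    awk -v start=4 -v len=5 '{ print substr($0, start + 1, len) }')

# Omit the third argument: awk returns everything from index 5 to the end.
rest=$(printf '%s\n' '0123Linux9' | awk '{ print substr($0, 5) }')

echo "$word"    # Linux
echo "$rest"    # Linux9
```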
Using Bash's Substring Expansion
We've seen how cut and awk can easily extract index-based substrings.
However, both are external commands. Bash itself supports substring expansion in the form ${VAR:offset:length}, where the offset is 0-based, so we can solve the problem without starting another process.
Today, Bash is the default shell on most modern Linux distributions. In other words, if we work on the command line, we usually don't need to install anything else.
$ STR="0123Linux9"
$ echo ${STR:4:5}
Linux
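Bash's substring expansion has a couple of variations that are worth a quick look. The sketch below assumes Bash (it isn't portable to plain sh), and the variable names rest and from_end are illustrative.

```shell
#!/usr/bin/env bash
STR="0123Linux9"

# Omitting the length returns everything from offset 4 to the end.
rest="${STR:4}"

# A negative offset counts from the end of the string. The space before
# the minus sign is required; otherwise Bash reads ":-" as the
# "use a default value" expansion.
from_end="${STR: -6:5}"

echo "$rest"        # Linux9
echo "$from_end"    # Linux
```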
Using The expr Command
The expr command is part of the GNU Core Utilities package, which means it's available on virtually all Linux systems.
Further, expr has a substr subcommand that allows us to extract a substring from a string:
expr substr <input_string> <start_index> <length>
It's worth mentioning that expr uses a 1-based index system.
So, to extract "Linux" from our input string, we pass 5 as the start index and 5 as the length:
$ expr substr "0123Linux9" 5 5
Linux
The output above indicates that the expr solution worked.
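In scripts, the arguments usually come from variables rather than literals. The following sketch assumes GNU expr (the substr and length keywords are GNU extensions, so other implementations may not support them), and the variable names are illustrative.

```shell
# Assumes GNU expr; "substr" and "length" are not guaranteed elsewhere.
INPUT="0123Linux9"
START=4    # 0-based start index, as in the earlier examples
LEN=5

# expr is 1-based, so add 1 to the 0-based start index.
word=$(expr substr "$INPUT" $(( START + 1 )) "$LEN")
total=$(expr length "$INPUT")

echo "$word"     # Linux
echo "$total"    # 10
```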
Extracting a Pattern-Based Substring
Now let's explore pattern-based substrings, in addition to the index-based substrings that we've already covered.
We'll discuss two ways to solve our first problem:
Using the cut command
Using the awk command
After that, we'll look at a different kind of pattern-based substring problem.
Using The cut Command
The cut command is a handy tool for working with field-based data.
Let's take a quick look at our problem. We have an input string whose values are separated by commas, and we want to get the third value.
We can tell cut to split the line into fields using the comma as the delimiter (-d ,) and then print the third field (-f 3).
$ cut -d , -f 3 <<< "Eric,Male,28,USA"
28
We achieved our desired results and fixed the issue.
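The -f option also accepts lists and ranges, which helps when we need more than one field. Here is a brief sketch using the same input; the variable names are illustrative.

```shell
LINE="Eric,Male,28,USA"

# A comma-separated list selects several fields at once.
first_and_third=$(printf '%s\n' "$LINE" | cut -d , -f 1,3)

# An open-ended range selects from the second field to the last.
from_second=$(printf '%s\n' "$LINE" | cut -d , -f 2-)

echo "$first_and_third"    # Eric,28
echo "$from_second"        # Male,28,USA
```

Note that cut joins the selected fields with the same delimiter it used to split the line.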
Using the awk Command
awk is also good at handling field-based input. A compact awk one-liner can solve this problem.
$ awk -F',' '{print $3}' <<< "Eric,Male,28,USA"
28
Furthermore, since awk's field separator (FS) can be a regular expression, we can build more generic solutions using awk.
For example, if the values are separated by a comma followed by a space, cut isn't a good choice, because it only supports a single character as the field delimiter.
With awk, it's still easy.
$ awk -F', ' '{print $3}' <<< "Eric, Male, 28, USA"
28
We can even write one awk command that works in both situations by making the space optional in the separator. This could be a handy trick in the real world.
$ awk -F', ?' '{print $3}' <<< "Eric, Male, 28, USA"
28
$ awk -F', ?' '{print $3}' <<< "Eric,Male,28,USA"
28
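The same regex separator also copes with input in which the two styles are mixed. In the sketch below, the second input line ("Anna, ...") is made-up sample data added only for this illustration.

```shell
# Two lines with different separator styles; the second line is
# invented sample data, not part of the original example.
result=$(printf '%s\n' 'Eric,Male,28,USA' 'Anna, Female, 31, UK' |
    awk -F', ?' '{ print $1 ": " $3 }')

echo "$result"
# Eric: 28
# Anna: 31
```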
A Different Pattern-Based Substring Case
We've already dealt with the comma-separated "Eric" problem. Now let's look at another one.
A pattern-based substring isn't always a field in CSV-like input. As a demonstration, let's say we want to extract the text between "BEGIN:" and "END:" from an input string.
The cut command can't solve this problem, since its delimiter must be a single character. awk, however, is an excellent tool for this kind of challenge.
Let's now look at how we solve this problem using awk. We store the input string in a variable called STR so that our commands become easier to read.
$ STR="whatever dataBEGIN:Interesting dataEND:something else"
$ awk -F'BEGIN:|END:' '{print $2}' <<< "$STR"
Interesting data
This first awk command treats either "BEGIN:" or "END:" as the field separator and then takes the second field.
Alternatively, we can use awk's sub() function:
$ awk '{ sub(/.*BEGIN:/, ""); sub(/END:.*/, ""); print }' <<< "$STR"
Interesting data
Here, the first substitution removes everything up to and including "BEGIN:", and the second removes "END:" and everything after it. After executing these two substitutions, only the desired text remains, so we just need to print it.
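For completeness, the same text can also be extracted without awk. The sketch below swaps in the shell's prefix and suffix removal expansions (${VAR#pattern} and ${VAR%%pattern}), a different technique from the awk commands above; the variable names tmp and result are illustrative.

```shell
STR="whatever dataBEGIN:Interesting dataEND:something else"

# Remove the shortest prefix that ends in "BEGIN:".
tmp="${STR#*BEGIN:}"

# Remove the longest suffix that starts with "END:".
result="${tmp%%END:*}"

echo "$result"    # Interesting data
```

These patterns are shell globs, not regular expressions, so this approach suits fixed markers like the ones in our example.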
Conclusion
Text processing is a key part of everyday work on the Linux command line. Depending on our needs, a substring can be located either by its index or by a pattern.
Through examples, we've looked at how to extract both index-based and pattern-based substrings.