
30 Examples For Awk Command in Text Processing

Learn how to take your stream editing to the next level, using a
programming language designed specifically for that purpose.

by seco max · Feb. 23, 17 · Web Dev Zone
In the previous post, we talked about the sed Linux command and looked at some
examples of how to use it in text processing. Nobody can deny that sed is a
very handy tool, but it has some limitations. Sometimes you need a more
advanced tool for manipulating data, one that provides a more programming-like
environment and gives you more control over how data in a file is modified.
This is where the awk command comes in.

The awk command takes stream editing one step further than the sed editor by
providing a programming language instead of just editor commands. Within
the awk programming language, you can do the following:

- Define variables to store data.
- Use arithmetic and string operators to operate on data.
- Use structured programming concepts and control flow, such as if-then statements and loops, to add logic to your text processing.
- Generate formatted reports.

Generating formatted reports comes in handy when working with log files that
contain hundreds or maybe millions of lines, letting you produce a readable
report you can actually use.

Our Main Points Are:

- Command options.
- Reading the program script from the command line.
- Using data field variables.
- Using multiple commands.
- Reading the program from a file.
- Running scripts before processing data.
- Running scripts after processing data.
- Built-in variables.
- Data variables.
- User-defined variables.
- Structured commands.
- Formatted printing.
- Built-in functions.
- User-defined functions.

awk Command Options


The awk command has a basic format as follows:

$ awk options program file
These are some of the awk options you will use most often:

- -F fs - Specifies a field separator for the data fields in a line.
- -f file - Specifies a filename from which to read the program.
- -v var=value - Defines a variable and a default value used in the awk command.
- -mf N - Specifies the maximum number of fields to process in the data file.
- -mr N - Specifies the maximum record size in the data file.
- -W keyword - Specifies the compatibility mode or warning level for awk.

The real power of awk is in the program script. You can write scripts to read
the data within a text line and then manipulate and display the data to create
any type of output report.

Reading the Program Script From the Command Line

An awk program script is defined by opening and closing braces. You must place
script commands between the two braces, and because the awk command line
assumes that the script is a single text string, you must enclose your script in
single quotation marks, like this:

$ awk '{print "Welcome to awk command tutorial"}'


If you run this command, nothing seems to happen at first, because no filename
was defined on the command line.

The awk command retrieves data from STDIN. When you run the program, it
just waits for text to come in via STDIN.

If you type a line of text and press Enter, the awk command runs the text
through the program script. Just like the sed editor, the awk command
executes the program script on each line of text available in the data stream.
As the program script is set to display a fixed text string, you get the same text
output.

$ awk '{print "Welcome to awk command tutorial"}'

Any line of text we type returns the same welcome string.

To terminate the program, we have to send an End-of-File (EOF) character. The
Ctrl+D key combination generates an EOF character in bash. Maybe you're
disappointed by this example, but wait for the awesomeness!

Using Data Field Variables


One of the primary features of awk is its ability to manipulate data in the text
file. It does this by automatically assigning a variable to each data element in a
line. By default, awk assigns the following variables to each data field it detects
in the line of text.

- $0 represents the entire line of text.
- $1 represents the first data field in the line of text.
- $2 represents the second data field in the line of text.
- $n represents the nth data field in the line of text.

Each data field in a text line is determined by a field separation character.
The default field separation character in awk is any whitespace character, such
as a tab or a space.

Look at the following file and see how awk deals with it.

$ awk '{print $1}' myfile

This command uses the $1 field variable to display only the first data
field for each line of text.
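If you don't have a sample file handy, the same behavior shows up with a quick pipe (a minimal sketch):

```shell
# awk splits the line on whitespace and prints the second field
echo "one two three" | awk '{print $2}'
# → two
```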
Sometimes the separator in a file is not a space or a tab but something else.
You can specify it using the -F option.

$ awk -F: '{print $1}' /etc/passwd
This command displays the first data field in the passwd file because the
/etc/passwd file uses a colon to separate the data fields.

Using Multiple Commands


Any programming language wouldn’t be very useful if you could only execute
one command. The awk programming language allows you to combine
commands into a normal program. To use multiple commands on the
command line, just place a semicolon between each command.

$ echo "My name is Tom" | awk '{$4="Adam"; print $0}'

The first command assigns a value to the $4 field variable. The second
command then prints the entire line.
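A detail worth knowing here: assigning to a field variable makes awk rebuild $0 using the output field separator (a space by default), which is why the modified line prints cleanly:

```shell
# Assigning to $4 replaces that field and rebuilds the whole record
echo "My name is Tom" | awk '{$4="Adam"; print $0}'
# → My name is Adam
```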

Reading the Program From a File


As with sed command, the awk command allows you to store your scripts in a
file and refer to them in the command line with the –f option. Our file
contains the following script:

{print $1 " has a home directory at " $6}

$ awk -F: -f testfile /etc/passwd

Here we print the username, which is the first field ($1), and the home path,
which is the sixth field ($6), from /etc/passwd. We specify the file that
contains the script, testfile, with the -f option and make sure the field
separator is specified with the capital -F option.

You can specify multiple commands in the script file. Just place each command
on a separate line; you don't need to use semicolons.

This is our file:

{
  text = " has a home directory at "
  print $1 text $6
}

$ awk -F: -f testfile /etc/passwd
Here we define a variable that holds a text string used in the print command.

Running Scripts Before Processing Data


Sometimes, you may need to run a script before processing data, in order to
create a header section for a report or something similar.

The BEGIN keyword is used to accomplish this. It forces awk to execute the
script specified after the BEGIN keyword and before awk reads the data.

$ awk 'BEGIN {print "Hello World!"}'


Let's apply it to something where we can see the result. After the awk command
executes the BEGIN script, it uses the second script to process any file data.
Be careful when doing this; both of the scripts are still considered one text
string on the awk command line, so you need to place your single quotation
marks accordingly.
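As a minimal sketch of that idea, here the BEGIN block prints a header line before the main script processes the piped-in data:

```shell
# The BEGIN block runs once, before any input is read
echo "first data line" | awk 'BEGIN {print "Report header"} {print $0}'
# → Report header
# → first data line
```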

Running Scripts After Processing Data


The END keyword allows you to specify a program script that the awk command
executes after reading the data.
$ awk 'BEGIN {print "The File Contents:"} {print $0} END {print "End of File"}' myfile

When the awk command is finished printing the file contents, it executes the
commands in the END script. This is useful for adding a footer, for example.

We can put all these elements together into a nice little script file.
BEGIN {
  print "The latest list of users and shells"
  print " UserName \t HomePath"
  print "-------- \t -------"
  FS=":"
}
{ print $1 " \t " $6 }
END { print "The end" }

This script uses the BEGIN script to create a header section for the report. It
also defines the field separator FS and prints the footer at the end. Then we
run it against /etc/passwd:

$ awk -f myscript /etc/passwd

This gives you a small taste of the power available when you use simple awk
scripts.

Built-In Variables
The awk command uses built-in variables to reference specific features within
the program data. We've seen the data field variables $1, $2, and so on, used
to extract data fields. We've also dealt with the field separator FS which, by
default, is a whitespace character, such as a space or a tab. But those are not
the only built-in variables; there are more.

The following list shows some of the built-in variables that awk uses:

- FIELDWIDTHS - A space-separated list of numbers defining the exact width (in spaces) of each data field.
- FS - Input field separator character.
- RS - Input record separator character.
- OFS - Output field separator character.
- ORS - Output record separator character.

By default, awk sets the OFS variable as a space. By setting the OFS variable,
you can use any string to separate data fields in the output.

$ awk 'BEGIN{FS=":"; OFS="-"} {print $1,$6,$7}' /etc/passwd

The FIELDWIDTHS variable allows you to read records without using a field
separator character. In some situations, instead of using a field separator, data
is placed in specific columns within the record. In these instances, you must
set the FIELDWIDTHS variable to match the layout of the data in the records.
After you set the FIELDWIDTHS variable, awk ignores the FS and calculates
data fields based on the provided field width sizes.

Suppose we have this content:

1235.9652147.91
  927-8.365217.27
  36257.8157492.5

$ awk 'BEGIN{FIELDWIDTHS="3 5 2 5"}{print $1,$2,$3,$4}'


Look at the output. The FIELDWIDTHS variable defines four data fields, and
awk command parses the data record accordingly. The string of numbers in
each record is split based on the defined field width values.

The RS and ORS variables define how your awk command handles records in
the data. By default, awk sets the RS and ORS variables to the newline
character which means that each new line of text in the input data stream is a
new record. Sometimes, you run into situations where data fields are spread
across multiple lines in the data stream, like an address and phone number,
each on a separate line.
Person Name
123 High Street
(222) 466-1234

Another person
487 High Street
(523) 643-8754

$ awk 'BEGIN{FS="\n"; RS=""} {print $1,$3}' addresses
If you try to read this data using the default FS and RS variable values, awk
reads each line as a separate record and interprets each space in the record as
a field separator. This is not what you want. To solve this problem, set the FS
variable to the newline character, which indicates that each line in the data
is a separate field, and set the RS variable to an empty string, so that awk
interprets each blank line as a record separator.

Awesome! The awk command interpreted each line in the file as a data field and
the blank lines as record separators.
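To make this reproducible without creating an addresses file, you can feed the same multi-line records to awk with printf (a sketch; the names and numbers are made up):

```shell
# RS="" switches awk to paragraph mode: blank lines separate records,
# and FS="\n" makes each line within a record a separate field
printf 'Person Name\n123 High Street\n(222) 466-1234\n\nAnother person\n487 High Street\n(523) 643-8754\n' |
  awk 'BEGIN{FS="\n"; RS=""} {print $1,$3}'
# → Person Name (222) 466-1234
# → Another person (523) 643-8754
```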

Data Variables
Besides the built-in variables we’ve seen, awk provides some other built-in
variables to help you know what’s going on with your data and extract
information from the shell environment.

- ARGC - The number of command line parameters present.
- ARGIND - The index in ARGV of the current file being processed.
- ARGV - An array of command line parameters.
- ENVIRON - An associative array of the current shell environment variables and their values.
- ERRNO - The system error if an error occurs when reading or closing input files.
- FILENAME - The filename of the data file used for input to the awk command.
- FNR - The current record number in the data file.
- IGNORECASE - If set to a non-zero value, ignores the case of characters in strings used in the awk command.
- NF - The total number of data fields in the current line.
- NR - The number of input records processed.

You should recognize a few of these variables from our shell scripting series.
The ARGC and ARGV variables allow you to retrieve the number of command line
parameters and their values. This can be a little tricky, because the awk
command doesn't count the program script as part of the command line parameters.

$ awk 'BEGIN{print ARGC,ARGV[1]}' myfile

The ENVIRON variable uses an associative array to retrieve shell environment
variables, like this:

$ awk 'BEGIN{ print ENVIRON["HOME"]; print ENVIRON["PATH"] }'

The NF variable allows you to specify the last data field in the record
without having to know its position.
$ awk 'BEGIN{FS=":"; OFS=":"} {print $1,$NF}' /etc/passwd
The NF variable contains the position of the last data field in each record.
You can then use it as a data field variable by placing a dollar sign in front
of it.
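A quick way to see NF and $NF side by side (a minimal sketch):

```shell
# NF is the field count; $NF dereferences it to get the last field
echo "a b c d" | awk '{print NF, $NF}'
# → 4 d
```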

The FNR and NR variables are similar to each other, but with slight
differences. The FNR variable contains the number of records processed in the
current data file. The NR variable contains the total number of records
processed. Let’s take a look at those two examples to illustrate the difference:

$ awk 'BEGIN{FS=","}{print $1,"FNR="FNR}' myfile myfile

In this example, the awk command defines two input files (it specifies the
same input file twice.) The script prints the first data field value and the
current value of the FNR variable. Notice that the FNR value was reset to 1
when the awk command processed the second data file. Now, let’s add the NR
variable and see the difference:
$ awk ' BEGIN {FS=","}
{print $1,"FNR="FNR,"NR="NR}
END{print "There were",NR,"records processed"}' myfile myfile

The FNR variable value was reset when awk processed the second data file,
but the NR variable maintained its count into the second data file.
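You can reproduce the difference with a throwaway two-line file (the /tmp path here is just for illustration):

```shell
# FNR resets for each new file; NR keeps counting across files
printf 'x\ny\n' > /tmp/fnr_demo.txt
awk '{print FNR, NR}' /tmp/fnr_demo.txt /tmp/fnr_demo.txt
# → 1 1
# → 2 2
# → 1 3
# → 2 4
```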

User Defined Variables


Like any other programming language, awk allows you to define your own
variables for use within the script. An awk user-defined variable name can be
any number of letters, digits, and underscores, but it can't begin with a
digit. Like in shell scripting, you can assign a variable like this:

$ awk 'BEGIN{ test="This is a test"; print test }'
Structured Commands
The awk programming language supports the standard if-then-else format of
the if statement. You must define a condition for the if statement to evaluate,
enclosed in parentheses.

Here is an example:

Our testfile contains the following

10

15

33

45

$ awk '{if ($1 > 20) print $1}' testfile

It's just that simple.

You can execute multiple statements in the if statement, but you must enclose
them with braces.
$ awk '{ if ($1 > 20) { x = $1 * 2; print x } }' testfile

The awk if statement also supports the else clause, like this:

$ awk '{ if ($1 > 20) { x = $1 * 2; print x } else { x = $1 / 2; print x } }' testfile

You can use the else clause on a single line, but you must use a semicolon
after the if statement's action:
$ awk '{if ($1 > 20) print $1 * 2; else print $1 / 2}' testfile
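The same one-liner can be run without a file by piping the numbers in (a sketch using the values above):

```shell
# Doubles values over 20, halves the rest
printf '10\n15\n33\n45\n' | awk '{if ($1 > 20) print $1 * 2; else print $1 / 2}'
# → 5
# → 7.5
# → 66
# → 90
```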

While Loop
The while loop allows you to iterate over a set of data, checking a condition
that stops the iteration.

$ cat myfile
124 127 130
112 142 135
175 158 245

$ awk '{
  total = 0
  i = 1
  while (i < 4) {
    total += $i
    i++
  }
  avg = total / 3
  print "Average:", avg
}' myfile
The while statement iterates through the data fields in the record, adding each
value to the total variable and incrementing the counter variable i.

When the counter value equals 4, the while condition becomes FALSE and the loop
terminates, dropping through to the next statement in the script, which
calculates and prints the average.
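Piping a single record in shows the loop at work (a condensed, runnable form of the script above):

```shell
# Sum fields 1..3 with a while loop, then print their average
echo "124 127 130" | awk '{
  total = 0; i = 1
  while (i < 4) { total += $i; i++ }
  print "Average:", total / 3
}'
# → Average: 127
```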

The awk programming language supports using the break and continue
statements in while loops, allowing you to jump out of the middle of the loop.

$ awk '{
  total = 0
  i = 1
  while (i < 4) {
    total += $i
    if (i == 2)
      break
    i++
  }
  avg = total / 2
  print "The average of the first two elements is:", avg
}' testfile
The For Loop
The for loop is a common looping construct in many programming languages, and
the awk programming language supports it too.

$ awk '{
  total = 0
  for (i = 1; i < 4; i++) {
    total += $i
  }
  avg = total / 3
  print "Average:", avg
}' testfile
By defining the iteration counter in the for loop, you don’t have to worry about
incrementing it yourself as you did when using the while statement.
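Here is the same averaging logic as a runnable one-record sketch:

```shell
# The for loop handles initialization, test, and increment in one place
echo "124 127 130" | awk '{
  total = 0
  for (i = 1; i < 4; i++) total += $i
  print "Average:", total / 3
}'
# → Average: 127
```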

Formatted Printing
The printf command in awk allows you to specify detailed instructions on how
to display data. It specifies exactly how the formatted output should appear,
using both text elements and format specifiers. A format specifier is a special
code that indicates what type of variable is displayed and how to display it.
The awk command uses each format specifier as a placeholder for each
variable listed in the command. The first format specifier matches the first
variable listed, the second matches the second variable, and so on.

The format specifiers use the following format:

%[modifier]control-letter

This list is the format specifiers you can use with printf:

- c - Displays a number as an ASCII character.
- d - Displays an integer value.
- i - Displays an integer value (same as d).
- e - Displays a number in scientific notation.
- f - Displays a floating-point value.
- g - Displays either scientific notation or floating point, whichever is shorter.
- o - Displays an octal value.
- s - Displays a text string.

Here we use printf to format our output:

$ awk 'BEGIN{ x = 100 * 100; printf "The result is: %e\n", x }'

Here, as an example, we display a large value using scientific notation (%e).
We are not going to try every format specifier; you get the concept.
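One more quick illustration, combining a string and an integer specifier with a width modifier (a sketch):

```shell
# %-10s left-justifies the string in a 10-character column
awk 'BEGIN{printf "%-10s %d\n", "items:", 42}'
# → items:     42
```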

Built-In Functions
The awk programming language provides quite a few built-in functions that
perform mathematical, string, and time functions. You can utilize these
functions in your awk scripts.

Mathematical Functions
If you love math, these are some of the mathematical functions you can use
with awk:
- cos(x) - The cosine of x, with x specified in radians.
- exp(x) - The exponential of x.
- int(x) - The integer part of x, truncated toward 0.
- log(x) - The natural logarithm of x.
- rand() - A random floating-point value n, where 0 <= n < 1.
- sin(x) - The sine of x, with x specified in radians.
- sqrt(x) - The square root of x.

These mathematical functions can be used normally:

$ awk 'BEGIN{x=exp(5); print x}'
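A couple more, with outputs that are easy to verify by hand:

```shell
# sqrt and int behave as described above
awk 'BEGIN{print sqrt(16), int(3.9)}'
# → 4 3
```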

String Functions
There are many string functions you can use, but we will examine one of them
as an example since they are all guided by the same principles. 

$ awk 'BEGIN{x = "likegeeks"; print toupper(x)}'

The toupper function converts the string passed to it to upper case.
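Two other common string functions are length and substr, which follow the same pattern:

```shell
# length returns the string length; substr(s, start, count) extracts a piece
awk 'BEGIN{s="likegeeks"; print length(s), substr(s, 1, 4)}'
# → 9 like
```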

User Defined Functions

You can create your own functions for use in awk scripts; just define them and
use them.
$ awk ' function myprint()
{ printf "The user %s has home path at %s\n", $1,$6 }
BEGIN{FS=":"} { myprint() }' /etc/passwd

  

Here we define a function called myprint, then use it in our script to print
output using the printf built-in function.
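User-defined functions can also take arguments and return values, just like in other languages (a minimal sketch):

```shell
# A user-defined function with a parameter and a return value
awk 'function double(n) { return n * 2 } BEGIN { print double(21) }'
# → 42
```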
And I'll wrap it up with that last example. 

Thank you!
