awk, sed, tr, cut
Objectives
After studying this lesson, you should
be able to use:
– awk: a pattern scanning and processing
language
– sed: stream editor
– tr: translate characters
– cut: cut specific columns vertically
awk
• awk, a pattern scanning and processing
language, helps to produce reports that
look professional.
• Named after its developers Aho,
Weinberger, and Kernighan.
• Searches files to see if they contain lines
that match specified patterns, and then
performs the associated actions.
awk
awk [-Fsep] 'pattern{action}' filenames
• awk checks to see if the input records in the
specified files satisfy the pattern.
• If they do, awk executes the action associated with
it.
• If no pattern is specified, the action affects every
input record.
awk
awk [-Fsep] 'pattern{action}' filenames
• The -Fsep option lets you specify the field
separator. By default it is whitespace
(SPACE and TAB). -F: means the field separator
is a colon.
• A common use of awk is to process input files by
formatting them, and then output the results in the
chosen form.
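A minimal sketch of the pattern{action} form, using three colon-separated records supplied on standard input instead of a file. The variable names (data, asia, names) are placeholders chosen for this illustration.

```shell
# Three colon-separated records, supplied on standard input for illustration.
data='Brazil:3286:134:South America
Japan:144:120:Asia
Mexico:762:78:North America'

# The pattern /Asia/ selects matching records; the action prints field 1.
asia=$(printf '%s\n' "$data" | awk -F: '/Asia/ { print $1 }')
echo "$asia"

# With no pattern, the action runs for every record.
names=$(printf '%s\n' "$data" | awk -F: '{ print $1 }')
echo "$names"
```

The first command should print Japan; the second should print all three names, one per line.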
Different Way to Run awk
• awk -f awkFile inputFile
– Since awk itself can be a complex
language, you can store all the
commands in a file and run it with the -f
flag.
– We will not cover it in this lecture.
Important awk Concepts
• Record
– Every line of an input file is a record.
• The current record can be referenced with $0.
• awk operates on one record at a time.
• Field
– A record consists of fields, which by default are
separated by any number of SPACES or TABS.
– Each field is numbered and can be referred to by number:
• $1 is the first field, $2 is the second, etc.
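The record and field references above can be tried with a sketch like the following. The input lines are made up for illustration; note that a run of SPACES and TABS counts as a single separator.

```shell
# Two illustrative records; fields are separated by spaces and a tab.
input=$(printf 'alpha  beta\tgamma\none two\n')

# $1 and $2 are the first two fields of each record.
swapped=$(printf '%s\n' "$input" | awk '{ print $2 "," $1 }')
echo "$swapped"

# $0 is the whole record, unchanged.
whole=$(printf 'one two\n' | awk '{ print "[" $0 "]" }')
echo "$whole"
```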
awk Example
• A sample data file named countries.
Canada:…:…:North America
US:…:…:North America
Brazil:3286:134:South America
England:…:…:Europe
France:…:…:Europe
Japan:144:120:Asia
Mexico:762:78:North America
China:…:…:Asia
India:…:…:Asia
• Fields: country name, area (thousands of km^2),
population (millions), continent
awk Example
We could use awk to format it:
awk -F: '{ printf "%-10s\t%d\t%d\t%15s\n",$1,$2,$3,$4 }' countries
Output:
Some Built-in Variables
• NF - Number of fields in current record
• $NF - Last field of current record
• NR - Number of records processed so far
• FILENAME - Name of current input file
• FS - Field separator (default: SPACE or TAB)
• $0 - Entire line
• $1, $2, …, $n - Field 1, 2, …, n
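As a small sketch of NR, NF, and $NF in use, the command below reads two records from standard input. FILENAME is left out because there is no named input file here; the variable name summary is just a placeholder.

```shell
# Print the record number, the field count, and the last field of each record.
summary=$(printf '%s\n' 'Brazil:3286:134:South America' 'Japan:144:120:Asia' |
  awk -F: '{ print NR ": " NF " fields, last = " $NF }')
echo "$summary"
```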
PRINTF - formatting output
• The awk version of printf is similar to that of the
C language.
printf "control-string", arg1, arg2, ... , argn
• The control-string determines how printf will
format arg1 - argn. Within the control-string,
you can use "\n" to indicate a NEWLINE and "\t"
to indicate a TAB.
• The control-string contains conversion
specifications, one for each argument.
PRINTF - formatting output
A conversion specification has the following format:
%[-][x[.y]]conv
- causes printf to left justify the argument.
x is the minimum field width.
.y is the number of places to the right of a decimal
point in a number.
conv is a letter from the following list:
d  decimal
o  unsigned octal
e  exponential notation
s  string of characters
f  floating point number
x  unsigned hexadecimal
g  use f or e, whichever is shorter
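The conversion specifications can be tried out in isolation with a sketch like this one. It uses a BEGIN block, which runs its action before any input is read; BEGIN is not covered in this lesson and is used here only so the example needs no input file. The values are arbitrary.

```shell
# Brackets make the padding produced by each specification visible.
formatted=$(awk 'BEGIN { printf "[%-8s][%5d][%7.2f][%x][%o]\n", "Japan", 144, 3.14159, 255, 8 }')
echo "$formatted"
# Expected: [Japan   ][  144][   3.14][ff][10]
```

%-8s left-justifies in a field of width 8, %5d right-justifies in width 5, and %7.2f keeps two digits after the decimal point in width 7.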
awk Example Revisited
awk -F: '{ printf "%-10s\t%d\t%d\t%15s\n",$1,$2,$3,$4 }' countries
• -F: option instructed awk to separate the input data
into fields delimited by colons.
• No particular pattern was specified so awk performed
the print action for every line of the file.
• $1,$2,$3,$4 means printing four fields. Each field has
a conversion specification, e.g.,
– %-10s\t indicates that field $1 is to appear as a string. The
- flag specifies that the string is to be left-justified. The
minimum field width is 10. \t indicates a TAB.
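To see the effect of this format string, the sketch below runs the same awk program on two records of the countries sample, fed through standard input rather than from the file. The variable name table is a placeholder.

```shell
# Same program as above, applied to two records from the sample data.
table=$(printf '%s\n' 'Brazil:3286:134:South America' 'Japan:144:120:Asia' |
  awk -F: '{ printf "%-10s\t%d\t%d\t%15s\n",$1,$2,$3,$4 }')
echo "$table"

# Replace each TAB with | to make the column boundaries visible.
printf '%s\n' "$table" | tr '\t' '|'
```

In the second listing, the country name is padded on the right to 10 characters and the continent is padded on the left to 15.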
Selecting Records
• awk opens a file and reads it serially, one line at a
time.
• By specifying a pattern, we can select only those
lines that contain a certain string of characters.
• A string of characters placed between forward
slashes (//) is called a regular expression. Any
occurrence of that pattern within a line will cause it to
be selected.
• awk '/Europe/' countries
– display all countries which are situated within
Europe.
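A short sketch of selection by regular expression, using three records from the countries sample on standard input. Because the pattern may match anywhere in the line, /America/ selects both South America and North America. The variable names are placeholders.

```shell
data='Brazil:3286:134:South America
Japan:144:120:Asia
Mexico:762:78:North America'

# No action is given, so each matching record is printed in full.
matches=$(printf '%s\n' "$data" | awk '/America/')
echo "$matches"
```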
Selecting Records (Contd)
• If you want to select records on the basis of
data in a particular field, you can use a
comparison operator such as == (two equal signs).
• awk -F: '$3 == 55' countries
– The third field (which tells us each country's
population) is tested against the value 55, and one
record is selected.
• Comparison operators are:
==  equal to
!=  not equal to
>   greater than
<   less than
>=  greater than or equal to
<=  less than or equal to
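The operators can be tried against individual fields with a sketch like the following, which uses three records from the countries sample on standard input. The thresholds and variable names are arbitrary choices for illustration.

```shell
data='Brazil:3286:134:South America
Japan:144:120:Asia
Mexico:762:78:North America'

# Third field greater than or equal to 120.
big=$(printf '%s\n' "$data" | awk -F: '$3 >= 120 { print $1 }')
echo "$big"

# Second field less than 1000; print the name and the value.
small=$(printf '%s\n' "$data" | awk -F: '$2 < 1000 { print $1, $2 }')
echo "$small"

# Comparison against a string: fourth field is not Asia.
not_asia=$(printf '%s\n' "$data" | awk -F: '$4 != "Asia" { print $1 }')
echo "$not_asia"
```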
Using Logic Operators
• We can use logical operators to
combine several conditions.
– To select records that satisfy more than
one condition, awk uses the symbols && as
the and operator.
– || indicates the or operator.
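A minimal sketch of combining conditions, again on three records from the countries sample supplied on standard input. The conditions themselves are arbitrary and chosen only to show && and ||.

```shell
data='Brazil:3286:134:South America
Japan:144:120:Asia
Mexico:762:78:North America'

# Both conditions must hold: second field above 500 AND third field below 100.
both=$(printf '%s\n' "$data" | awk -F: '$2 > 500 && $3 < 100 { print $1 }')
echo "$both"

# Either condition may hold: name is Japan OR continent is South America.
either=$(printf '%s\n' "$data" | awk -F: '$1 == "Japan" || $4 == "South America" { print $1 }')
echo "$either"
```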
Questions
Sample file named cars:
• Q1: How to select all the cars which were made in
or after 1991 (column 3) and cost less than $6,250
(column 4)?
• Q2: How to select cars made by either ford or buick?
Data processing & Arithmetic
Sample file named wages:
The three field titles are:
Employee, rates of pay per hour, weekly hours
Data processing & Arithmetic
(Contd)
• If the tax is 25%, we can calculate and
display each employee’s GROSS pay and
TAX like this:
awk '{ printf "%-10s\t%.2f \t%d\t%.2f \t%.2f\n",
$1,$2,$3,$2*$3,$2*$3*0.25 }' wages
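The contents of the wages file are not reproduced here, so the sketch below uses two made-up employees (alice and bob, with invented rates and hours) purely to show the arithmetic. It runs the same awk program on standard input.

```shell
# Hypothetical wages data: employee, hourly rate, weekly hours.
wages='alice 12.50 40
bob 10.00 35'

# Columns printed: name, rate, hours, gross pay (rate * hours), tax (25% of gross).
pay=$(printf '%s\n' "$wages" |
  awk '{ printf "%-10s\t%.2f \t%d\t%.2f \t%.2f\n", $1, $2, $3, $2*$3, $2*$3*0.25 }')
echo "$pay"
```

For alice, the gross pay is 12.50 * 40 = 500.00 and the tax is 500.00 * 0.25 = 125.00.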
Questions
What is the output of the following
commands?
• Q3: awk -F: '{ print $1 }' /etc/passwd | sort
• Q4: awk -F: '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd
sed
• sed stands for stream editor; it works as a filter,
processing input line by line.
• sed is a non-interactive editor used to make
global changes to entire files at once.
• An interactive editor like vi would be too
cumbersome for replacing large amounts of
information at once.
• The sed command is primarily used to substitute
one pattern for another.
sed
• Syntax:
sed 'command' file(s)
sed -e 'command' -e 'command' ... file(s)
sed -f scriptfile file(s)
Useful sed Commands
Command  Example     Explanation
d        4,8d        Delete the 4th through 8th lines
s        s/old/new/  Replace old with new
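The two commands, and the -e form for chaining them, can be sketched as follows. The four input lines and the variable names are made up for illustration.

```shell
text='line 1
line 2
line 3
line 4'

# d: delete lines 2 through 3.
deleted=$(printf '%s\n' "$text" | sed '2,3d')
echo "$deleted"

# s: replace the first "line" on each input line with "row".
replaced=$(printf '%s\n' "$text" | sed 's/line/row/')
echo "$replaced"

# Two commands with -e: delete line 1, then substitute on the remaining lines.
chained=$(printf '%s\n' "$text" | sed -e '1d' -e 's/line/row/')
echo "$chained"
```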
Patterns Revisited
^             beginning of the line
$             end of the line
.             any single character
(character)*  arbitrarily many occurrences of (character)
(character)?  0 or 1 instance of (character)
[abcdef]      match any character enclosed in [ ] (in this instance, a b c d e or f)
[^abcdef]     match any character NOT enclosed in [ ] (in this instance, any character other than a b c d e or f)
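The patterns can be explored with a sketch like the one below. It uses awk rather than sed because awk's regular expressions support ? directly, whereas in sed's default (basic) regular expressions ? is an ordinary character. The word list and variable names are invented for this illustration.

```shell
words='cat
coat
cot
ct
dog
cats'

# .  any single character between c and t
dot=$(printf '%s\n' "$words" | awk '/^c.t$/')
# *  zero or more o characters
star=$(printf '%s\n' "$words" | awk '/^co*t$/')
# ?  an optional a
opt=$(printf '%s\n' "$words" | awk '/^coa?t$/')
# [^c]  first character is anything except c
neg=$(printf '%s\n' "$words" | awk '/^[^c]/')
# $  line ends in s
ends=$(printf '%s\n' "$words" | awk '/s$/')

printf '%s\n' "dot: $dot" "star: $star" "opt: $opt" "neg: $neg" "ends: $ends"
```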
sed Substitute
• SUBSTITUTE(s)
[address1[ , address2]]s/pattern/replacement/[flags]
Flags:
n       replace the nth instance of pattern with replacement
g       replace all instances of pattern with replacement
p       write to STDOUT if a successful substitution takes place
w file  write to file if a successful substitution takes place
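A brief sketch of how the flags change the result of a substitution. The input strings are arbitrary. The last command also uses the -n option, which is not covered above: it suppresses sed's normal output so that only lines printed by the p flag appear.

```shell
# No flag: only the first instance on the line is replaced.
first=$(echo 'one one one' | sed 's/one/two/')
echo "$first"

# Numeric flag: replace only the 2nd instance.
second=$(echo 'one one one' | sed 's/one/two/2')
echo "$second"

# g flag: replace every instance.
all=$(echo 'one one one' | sed 's/one/two/g')
echo "$all"

# p flag with -n: print only the lines where a substitution happened.
printed=$(printf 'cat\ndog\n' | sed -n 's/cat/lion/p')
echo "$printed"
```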
sed Substitute (Contd)
[address1[ , address2]]s/pattern/replacement/[flags]
• An address can be
– a regular expression enclosed by forward slashes
/regex/ , or
– a line number .
• The $ symbol can be used to denote the last line.
• If one address is given, the substitution is
applied only to lines that match that address.
• If two addresses are given, separated by a
comma, the substitution is applied to every
line from the first line matching address1 through
the next line matching address2, inclusive.
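The three kinds of address can be sketched as follows, using four made-up input lines. The variable names are placeholders.

```shell
fruit='red apple
green apple
red pear
green pear'

# Line-number address: substitute only on line 2.
by_line=$(printf '%s\n' "$fruit" | sed '2s/apple/plum/')
echo "$by_line"

# Regular-expression address: substitute only on lines containing "red".
by_regex=$(printf '%s\n' "$fruit" | sed '/red/s/apple/plum/')
echo "$by_regex"

# Two addresses: from line 2 through the last line ($).
by_range=$(printf '%s\n' "$fruit" | sed '2,$s/green/blue/')
echo "$by_range"
```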
Questions
• Q5: What does the following command do?
sed 's/Tx/Texas/' foo
• Q6: What is the output of the following
command?
cat animal
I have three dogs and two cats.
sed -e 's/dog/cat/g' -e 's/cat/elephant/g' animal
Questions
• Q7: What is the output of the following
command?
cat animal1
The black cat was chased by the brown dog.
The black cat was not chased by the brown dog.
sed -e '/not/s/black/white/g' animal1
sed Delete
• DELETE(d)
[address1[, address2]]d
• sed 6d foo
– deletes line 6
• Q8: How to delete lines 1-10 from the file
foo?
• Q9: How to delete lines 11 through the end
of the file foo?
Questions
• Q10: What does the following command do?
sed '/^Co*t/,/[0-9]$/d' foo
• Q11: What is the output of the following
command?
cat linefile
line 1 (one)
line 2 (two)
line 3 (three)
sed -e '/^line.*one/s/line/LINE/' -e '/line/d' linefile
Questions
• Q12: How to delete every line in the
file log that contains the string warning?
• sed can also delete just a string rather than
the entire line: substitute the text with nothing.
• Q13: How to remove the string draft
everywhere it occurs in the file foo?
tr
• tr translates characters from stdin to stdout.
• tr [options] string1 [string2]
Options:
-c  complement the set of characters specified by string1.
    The complement is the set of all characters not in string1
-d  delete all occurrences of input characters specified by string1
-s  replace instances of repeated characters with a single character
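A minimal sketch of the three options. The input strings are arbitrary; in the last command, \n is added to the set so that the trailing newline is kept when everything outside the set is deleted.

```shell
# -s: squeeze each run of spaces down to one space.
squeezed=$(echo 'a   b    c' | tr -s ' ')
echo "$squeezed"

# -d: delete every digit.
no_digits=$(echo 'abc123def' | tr -d '0-9')
echo "$no_digits"

# -c with -d: delete every character that is NOT a digit or a newline.
only_digits=$(echo 'abc123def' | tr -cd '0-9\n')
echo "$only_digits"
```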
tr Examples
• tr '[a-z]' '[A-Z]' < trfile
– replaces all lower case characters with
upper case in file trfile
• tr ' ' '\012' < trfile
– turns spaces into newlines (octal ASCII code 012)
• Q14: How to translate only lower case a
through m to upper case A through M in a
file?
Questions
• tr -d string1 lets you delete any
character matched in string1.
• tr -d '[a-z]'
– deletes all lower case characters
• Q15: How to delete all vowels?
• Q16: How to delete all characters
except vowels?
cut
• cut - cut out selected fields of each line
of a file
• The cut command has a very narrow
set of capabilities, but when you’re
extracting specific columns of
information, it’s a winner.
cut Examples
• cut -d: -f1 /etc/passwd
– Extract usernames from /etc/passwd
– The -d option specifies : as the field separator;
the default is TAB
– -f1 selects the first field
– -d may only be used with -f
• Q17: What is the output of the following
command?
who am i | cut -f1 -d' '
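As a sketch of -d and -f working together, the commands below cut fields out of a single passwd-style record. The record (user alice and all of its values) is hypothetical and is supplied on standard input so that the example does not depend on the local /etc/passwd.

```shell
# Hypothetical passwd-style record with seven colon-separated fields.
record='alice:x:1001:1001:Alice Example:/home/alice:/bin/sh'

# A single field.
user=$(echo "$record" | cut -d: -f1)
echo "$user"

# A list of fields; the separator is kept between them in the output.
user_uid=$(echo "$record" | cut -d: -f1,3)
echo "$user_uid"

# A range of fields.
home_shell=$(echo "$record" | cut -d: -f6-7)
echo "$home_shell"
```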
Questions
• cat cutfile
Line number 1
Line number 2
Line number 3
Line number 4
• Q18: How can you get the following output
using cut?
1
2
3
4
Lecture Summary
• awk: a pattern scanning and processing
language
• sed: stream editor
• tr: translate one character to another
• cut: cut specific columns vertically
Next Lecture
• Shell programming (Chap 17)
• Quiz #2