
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Add a Column of Numbers in Bash
Overview
This article examines how to total up numeric columns of data in a bash shell, looking at the bash tools available for the task and comparing their speed.
Using The awk Tool
We'll start by calculating the sum of the values in a particular column using the awk (awk) program.
$ awk '{Total=Total+$1} END{print "Total is: " Total}' numbers.csv Total is: 49471228
Let's now take a look at the timing using the "time" command ?
$ time awk '{Total=Total+$1} END{print "Total is: " Total}' numbers.csv Total is: 49471228 real 0m0.228s user 0m0.141s sys 0m0.047s
When The File Contains Multiple Columns
We've seen how to calculate sums across one row of a file using awk. Now let's see how to calculate sums across multiple rows of a file.
$ cat prices.csv Books,40 Bag,70 Dress,80 Box,10
Here, the file price.csv has two columns. We now want to calculate the sum of the values in the second column.
$ awk -F "," '{Total=Total+$2} END{print "Total is: " Total}' prices.csv Total is: 200
When The File Contains a Header Line
Sometimes, text or CSV file headers also include a row containing column names. We'll use these column names to help us understand the contents of the file. Let's edit our prices.csv file and add a new row at the top ?
$ cat prices.csv Item,Value Books,40 Bag,70 Dress,80 Box,10
When the file includes a title, we'll want to remove the title before the text processing occurs. First, let's add an additional field to our input data set called "title". Then, we'll write a script that reads through each record in the file and adds the value of the new field to the total. Finally, we'll print out the results.
$ awk -F "," 'NR!=1{Total=Total+$2} END{print "Total is: " Total}' prices.csv Total is: 200
We'll then look at some other methods for adding up numbers in a column, and see how the awk approach compares to them.
Iterating With The Bash Loops
Awk is a powerful tool, but we could also use a for loop to iterate over each value in our columns.
Using the expr Command
We're going to run an experiment and see if we can use the expr function to calculate the total of the numbers inside the for loop.
$ time (sum=0;for number in `cat numbers.csv`; do sum=`expr $sum + $number`; done; echo "Total is: $sum") Total is: 49471228 real 212m48.418s user 7m19.375s sys 145m48.203s
Using Arithmetic Expansion
Since the use of "expand" did not help much, we'll try another approach using arithmetic expression ?
$ time (sum=0;for number in `cat numbers.csv`; do sum=$((sum+number)); done; echo "Total is: $sum") Total is: 49471228 real 0m1.961s user 0m1.813s sys 0m0.125s
In order to calculate the total, we'll utilize the arithmetic operator. Unlike the expr operator, the $(...) operator can handle both integers and floats.
Adding Values With the bc Command
The bc command performs calculations on expressions consisting of multiple lines. Therefore, we'll have to concatenate the numbers into one line, separating them with an addition operator. Then we'll pass the resulting string to bc to perform the calculation. Here are a few ways to do so.
Using The Paste Command
To start off, we'll take a look at the "PASET" command to arrange the first ten numbers of our data set on one line.
$ cat numbers.csv| head -10 | paste -sd+ - 2+44+6+15+23+0+15+88+82+1
The option -s makes sure that paste prints out each entry on its own line. We also used the +d option to add the "+" symbol as a separator between each entry.
Now that we've covered the basics, let's move on to providing our input to the bc command.
$ time echo "Total is: $(cat numbers.csv | paste -sd+ - | bc)" Total is: 49471228 real 0m0.244s user 0m0.203s sys 0m0.063s
Using The tr Command
Let's use the tr (transliterate) function again to create a new string from an existing one.
$ cat numbers.csv | head -10 |tr "
" "+" 2+44+6+15+23+0+15+88+82+1+
We added an extra zero at the beginning of the string so that we could use the bc command to perform addition. However, notice the extra '+' at the end of the string. To fix this, we can simply append another zero at the end of the line.
$ cat numbers.csv | head -10 |tr "
" "+" ; echo "0" 2+44+6+15+23+0+15+88+82+1+0
Let's redirect the result from the previous command to the bc command.
$ time ((cat numbers.csv | tr "
" "+" ; echo "0") | bc) 49471228 real 0m0.217s user 0m0.203s sys 0m0.031s
Combining tr and bc commands runs faster than using awk.
Using The sed Command
We'll finally be using the sed command to generate our sequence.
$ cat numbers.csv | head -10 | sed -z 's#
#+#g' 2+44+6+15+23+0+15+88+82+1+
We've replaced new lines ("
") with plus ("+") operators using the search and replacement option of the sed command, and printed zeros at the end to deal wth the extra plus operators.
$ time ((cat numbers.csv | sed -z 's#
#+#g' ; echo "0") | bc) 49471228 real 0m0.343s user 0m0.281s sys 0m0.109s
To change the meaning of the newlines, use the "-z" option. Instead of interpreting the newlines as the start of each input, they will now be treated as the null characters. So, if we replaced the newlines with the plus sign (+), then we would get the following output ?
You may want to use sed instead of tr if you're working with large amounts of text.
Conclusion
We looked at different methods for adding up columns of numbers in a bash shell.
We began by discussing the use of the awk command. We then discussed ways to handle files that contain multiple columns or a single column with a heading.