Awk Find And Replace Fields Values

by on November 26, 2012 · 7 comments· LAST UPDATED November 26, 2012

in , , ,

I have a data as follows :

foo bar 12,300.50
foo bar 2,300.50
abc xyz 1,22,300.50

How do I replace all , from 3rd field using awk and pass output to bc -l in the following format to get sum of all numbers:
12300.50+2300.50+1,22,300.50

Tutorial details
DifficultyEasy (rss)
Root privilegesNo
RequirementsGNU/awk
Estimated completion timeN/A

You can use gsub() funcion as follows. The syntax is:

gsub("find", "replace")
gsub("find-regex", "replace")
gsub("find-regex", "replace", t)
gsub(r, s [, t])

From the awk man page:

For each substring matching the regular expression r in the string t, substitute the string s, and return the number of substitutions. If t is not supplied, use $0. An & in the replacement text is replaced with the text that was actually matched. Use \& to get a literal &.

You can also use the following syntax:

gensub(r, s, h [, t])

From the awk man page:

Search the target string t for matches of the regular expression r. If h is a string beginning with g or G, then replace all matches of r with s. Otherwise, h is a number indicating which match of r to replace. If t is not supplied, $0 is used instead. Within the replacement text s, the sequence \n, where n is a digit from 1 to 9, may be used to indicate just the text that matched the n'th parenthesized subexpression. The sequence \0 represents the entire matched text, as does the character &. Unlike sub() and gsub(), the modified string is returned as the result of the function, and the original target string is not changed.

Example

Create a data file cat /tmp/data.txt

foo 	bar	12,300.50
foo	bar	2,300.50
abc	xyz	1,22,300.50

Type the following awk command:

 
awk '{ gsub(",","",$3); print $3 }' /tmp/data.txt
 

Sample outputs:

12300.50
2300.50
122300.50

You can pass the output to any command or calculate sum of the fields:

 
awk 'BEGIN{ sum=0} { gsub(",","",$3); sum += $3 } END{ printf "%.2f\n", sum}' /tmp/data.txt
 

OR build the list and pass to the bc -l:

awk '{ x=gensub(",","","G",$3); printf x "+" } END{ print "0" }' /tmp/data.txt   | bc -l
TwitterFacebookGoogle+PDF versionFound an error/typo on this page? Help us!

{ 7 comments… read them below or add one }

1 Andres November 27, 2012 at 9:55 pm

Nice tutorial :-)

Reply

2 Iona July 9, 2013 at 6:57 am

gsub doesn’t work for me…

data.txt
user1 123,433,345.55
user3 12.50

awk ‘{gsub(“.”,”,”,$2); print $2}’ ./data.txt

result:

,,,,,,,,,,,,,,
,,,,,

Reply

3 tytt July 24, 2013 at 12:25 pm

Too much commands.
Awk can substitute and delete fields in a string with a //// syntax.

Reply

4 priyabrata May 22, 2014 at 2:10 am

Can you please explain what does it mean by //// syntax

Reply

5 liquiddb May 31, 2014 at 3:06 pm

i am trying to open a virus data file and replace dna strings with number patterns generated from formula in hopes to innoculate the h.i.v. virus.

awk > virus-data ‘{ gsub(“a”, “84″, $1); gsub(“b”, “177″, $2); gsub(“c”, “16984″, $3)}’ >> pattern

liquiddb.net

Reply

6 liquiddb May 31, 2014 at 11:02 pm

Code from the H.I.V. virus can be computed with a formula simplifying strains into segments that would fit into immunity code.

1 2 3 4 5 6 7 8(n)
-4 -4 -4 -4 -4 -4 -4 -4
+1 +1 +1 +1 +1 +1 +1 +1
==========================
-3 3 -1 5 1 7 3 n
2 -2 4 0 6 2 8 4
| | | | | | | |
-1__1__3__5__7__n__n+3_n+4
| | | |
0 + n + 7n + n^2+7

— -1 –
| |- 0
2-| – 1 –
| |
–| 3 —
n-| |- n
–| 5 –
| |
| – 7 –
n^2-| |- 7n
| –n –
| |
–| 3+n–
n^2+4-| |- n^2 +7
–n+4–

0 + n + 7n +n^2 + 7 (virus sequence:01)

01
716
1121691
161627169116016
161627169116016
055545653805615
050011112585154
055010001333441
050511001200103
055540101120113
050014111012102
055013300111112
050512030100001—+
055541233110001—|—+
050013110201001—|—|–+
055012201221101—|—|–|–+
050511021101011—|—|–|–|–+
055540121011110 | | | | |
050014111110001 | | | | |
055013300001001 | | | | |
050512030001101 | | | | |
055541233001011 | | | | |
050013110301110 | | | | |
055012201331001 | | | | |
050511021202101 | | | | |
055540121122111 | | | | |
050014111010100 | | | | |
055013300111110 | | | | |
050512030100001—+ | | | |
055541233110001——-+ | | |
050013110201001———-+ | |
055012201221101————-+ |
050511021101011—————-+

2 + n + n^2 + n^2 +4 (virus sequence:01)

01
69
84177
142429111111
032227800000
031005180000
032105478000
031115131800
032004422780
031204020518
032124422547
031112020313
032001222322
031201100110
032101010101
031111111111
032000000000
031200000000
032120000000
031112000000
031112000000
032001200000
031201120000
032121012000
031111111200
032000000120
031200000112+
032120000101
031112000111
032001200100
031201120110
032121012101

Reply

7 john July 27, 2014 at 10:19 pm

With Python, C and Bash under my belt an extra language made no sense so I’m now using piep a Python pipeline processor for this sort of thing. With piep you do it like this (p is lines, pp is stream):

cat data.txt | piep ‘p.split()[2]‘

Pipe the output of cat to piep, split each line into lists then take the third item (Python lists count from 0). So far it seems a good replacement for awk and sed although I can’t seem to give up on grep and pgrep.

Reply

Leave a Comment

Tagged as: , , ,

Previous Faq:

Next Faq: