≡ Menu

Awk Find And Replace Fields Values

I have a data as follows :

foo bar 12,300.50
foo bar 2,300.50
abc xyz 1,22,300.50

How do I replace all , from 3rd field using awk and pass output to bc -l in the following format to get sum of all numbers:
12300.50+2300.50+1,22,300.50

Tutorial details
DifficultyEasy (rss)
Root privilegesNo
RequirementsGNU/awk
Estimated completion timeN/A

You can use gsub() funcion as follows. The syntax is:

gsub("find", "replace")
gsub("find-regex", "replace")
gsub("find-regex", "replace", t)
gsub(r, s [, t])

From the awk man page:

For each substring matching the regular expression r in the string t, substitute the string s, and return the number of substitutions. If t is not supplied, use $0. An & in the replacement text is replaced with the text that was actually matched. Use \& to get a literal &.

You can also use the following syntax:

gensub(r, s, h [, t])

From the awk man page:

Search the target string t for matches of the regular expression r. If h is a string beginning with g or G, then replace all matches of r with s. Otherwise, h is a number indicating which match of r to replace. If t is not supplied, $0 is used instead. Within the replacement text s, the sequence \n, where n is a digit from 1 to 9, may be used to indicate just the text that matched the n'th parenthesized subexpression. The sequence \0 represents the entire matched text, as does the character &. Unlike sub() and gsub(), the modified string is returned as the result of the function, and the original target string is not changed.

Example

Create a data file cat /tmp/data.txt

foo 	bar	12,300.50
foo	bar	2,300.50
abc	xyz	1,22,300.50

Type the following awk command:

 
awk '{ gsub(",","",$3); print $3 }' /tmp/data.txt
 

Sample outputs:

12300.50
2300.50
122300.50

You can pass the output to any command or calculate sum of the fields:

 
awk 'BEGIN{ sum=0} { gsub(",","",$3); sum += $3 } END{ printf "%.2f\n", sum}' /tmp/data.txt
 

OR build the list and pass to the bc -l:

awk '{ x=gensub(",","","G",$3); printf x "+" } END{ print "0" }' /tmp/data.txt   | bc -l
Tweet itFacebook itGoogle+ itPDF itFound an error/typo on this page?

{ 9 comments… add one }

  • Andres November 27, 2012, 9:55 pm

    Nice tutorial :-)

  • Iona July 9, 2013, 6:57 am

    gsub doesn’t work for me…

    data.txt
    user1 123,433,345.55
    user3 12.50

    awk ‘{gsub(“.”,”,”,$2); print $2}’ ./data.txt

    result:

    ,,,,,,,,,,,,,,
    ,,,,,

  • tytt July 24, 2013, 12:25 pm

    Too much commands.
    Awk can substitute and delete fields in a string with a //// syntax.

    • priyabrata May 22, 2014, 2:10 am

      Can you please explain what does it mean by //// syntax

  • liquiddb May 31, 2014, 3:06 pm

    i am trying to open a virus data file and replace dna strings with number patterns generated from formula in hopes to innoculate the h.i.v. virus.

    awk > virus-data ‘{ gsub(“a”, “84”, $1); gsub(“b”, “177”, $2); gsub(“c”, “16984”, $3)}’ >> pattern

    liquiddb.net

  • liquiddb May 31, 2014, 11:02 pm

    Code from the H.I.V. virus can be computed with a formula simplifying strains into segments that would fit into immunity code.

    1 2 3 4 5 6 7 8(n)
    -4 -4 -4 -4 -4 -4 -4 -4
    +1 +1 +1 +1 +1 +1 +1 +1
    ==========================
    -3 3 -1 5 1 7 3 n
    2 -2 4 0 6 2 8 4
    | | | | | | | |
    -1__1__3__5__7__n__n+3_n+4
    | | | |
    0 + n + 7n + n^2+7

    — -1 —
    | |- 0
    2-| – 1 —
    | |
    –| 3 —
    n-| |- n
    –| 5 —
    | |
    | – 7 —
    n^2-| |- 7n
    | –n —
    | |
    –| 3+n–
    n^2+4-| |- n^2 +7
    –n+4–

    0 + n + 7n +n^2 + 7 (virus sequence:01)

    01
    716
    1121691
    161627169116016
    161627169116016
    055545653805615
    050011112585154
    055010001333441
    050511001200103
    055540101120113
    050014111012102
    055013300111112
    050512030100001—+
    055541233110001—|—+
    050013110201001—|—|–+
    055012201221101—|—|–|–+
    050511021101011—|—|–|–|–+
    055540121011110 | | | | |
    050014111110001 | | | | |
    055013300001001 | | | | |
    050512030001101 | | | | |
    055541233001011 | | | | |
    050013110301110 | | | | |
    055012201331001 | | | | |
    050511021202101 | | | | |
    055540121122111 | | | | |
    050014111010100 | | | | |
    055013300111110 | | | | |
    050512030100001—+ | | | |
    055541233110001——-+ | | |
    050013110201001———-+ | |
    055012201221101————-+ |
    050511021101011—————-+

    2 + n + n^2 + n^2 +4 (virus sequence:01)

    01
    69
    84177
    142429111111
    032227800000
    031005180000
    032105478000
    031115131800
    032004422780
    031204020518
    032124422547
    031112020313
    032001222322
    031201100110
    032101010101
    031111111111
    032000000000
    031200000000
    032120000000
    031112000000
    031112000000
    032001200000
    031201120000
    032121012000
    031111111200
    032000000120
    031200000112+
    032120000101
    031112000111
    032001200100
    031201120110
    032121012101

  • john July 27, 2014, 10:19 pm

    With Python, C and Bash under my belt an extra language made no sense so I’m now using piep a Python pipeline processor for this sort of thing. With piep you do it like this (p is lines, pp is stream):

    cat data.txt | piep ‘p.split()[2]’

    Pipe the output of cat to piep, split each line into lists then take the third item (Python lists count from 0). So far it seems a good replacement for awk and sed although I can’t seem to give up on grep and pgrep.

  • anil September 17, 2014, 2:35 pm

    any one can solve this problem,
    i have one folder, in that many file names are there. i need to print the path of the files by using awk script.

    please tell me this is possible or not.

    • Amani Musomba May 5, 2015, 7:36 am

      Hi Anil,
      it is possible..
      You need to use readlink -f

      You can use this one line script to get what you want, i hope this one helps
      for line in `ls -ltrh|awk -F” ” ‘{print $9}’`; do readlink -f $line; done

Leave a Comment