Awk Find And Replace Fields Values

Posted on in Categories , , , last updated November 26, 2012

I have a data as follows :

foo bar 12,300.50
foo bar 2,300.50
abc xyz 1,22,300.50

How do I replace all , from 3rd field using awk and pass output to bc -l in the following format to get sum of all numbers:
12300.50+2300.50+1,22,300.50


You can use gsub() funcion as follows. The syntax is:

gsub("find", "replace")
gsub("find-regex", "replace")
gsub("find-regex", "replace", t)
gsub(r, s [, t])

From the awk man page:

For each substring matching the regular expression r in the string t, substitute the string s, and return the number of substitutions. If t is not supplied, use $0. An & in the replacement text is replaced with the text that was actually matched. Use \& to get a literal &.

You can also use the following syntax:

gensub(r, s, h [, t])

From the awk man page:

Search the target string t for matches of the regular expression r. If h is a string beginning with g or G, then replace all matches of r with s. Otherwise, h is a number indicating which match of r to replace. If t is not supplied, $0 is used instead. Within the replacement text s, the sequence \n, where n is a digit from 1 to 9, may be used to indicate just the text that matched the n’th parenthesized subexpression. The sequence \0 represents the entire matched text, as does the character &. Unlike sub() and gsub(), the modified string is returned as the result of the function, and the original target string is not changed.

Example

Create a data file cat /tmp/data.txt

foo 	bar	12,300.50
foo	bar	2,300.50
abc	xyz	1,22,300.50

Type the following awk command:

awk '{ gsub(",","",$3); print $3 }' /tmp/data.txt

Sample outputs:

12300.50
2300.50
122300.50

You can pass the output to any command or calculate sum of the fields:

awk 'BEGIN{ sum=0} { gsub(",","",$3); sum += $3 } END{ printf "%.2f\n", sum}' /tmp/data.txt

OR build the list and pass to the bc -l:

awk '{ x=gensub(",","","G",$3); printf x "+" } END{ print "0" }' /tmp/data.txt   | bc -l

12 comment

  1. gsub doesn’t work for me…

    data.txt
    user1 123,433,345.55
    user3 12.50

    awk ‘{gsub(“.”,”,”,$2); print $2}’ ./data.txt

    result:

    ,,,,,,,,,,,,,,
    ,,,,,

    1. the “.” is being evaluated as a wildcard:

      In regular expressions, the period (“.”, also called “dot”) is the wildcard pattern character that matches a single character. Combined with the asterisk operator (.*) it will match any number of characters.

      Add the “\” escape character in front of it so that it will be interpreted literally.

  2. i am trying to open a virus data file and replace dna strings with number patterns generated from formula in hopes to innoculate the h.i.v. virus.

    awk > virus-data ‘{ gsub(“a”, “84”, $1); gsub(“b”, “177”, $2); gsub(“c”, “16984”, $3)}’ >> pattern

    liquiddb.net

  3. Code from the H.I.V. virus can be computed with a formula simplifying strains into segments that would fit into immunity code.

    1 2 3 4 5 6 7 8(n)
    -4 -4 -4 -4 -4 -4 -4 -4
    +1 +1 +1 +1 +1 +1 +1 +1
    ==========================
    -3 3 -1 5 1 7 3 n
    2 -2 4 0 6 2 8 4
    | | | | | | | |
    -1__1__3__5__7__n__n+3_n+4
    | | | |
    0 + n + 7n + n^2+7

    — -1 —
    | |- 0
    2-| – 1 —
    | |
    –| 3 —
    n-| |- n
    –| 5 —
    | |
    | – 7 —
    n^2-| |- 7n
    | –n —
    | |
    –| 3+n–
    n^2+4-| |- n^2 +7
    –n+4–

    0 + n + 7n +n^2 + 7 (virus sequence:01)

    01
    716
    1121691
    161627169116016
    161627169116016
    055545653805615
    050011112585154
    055010001333441
    050511001200103
    055540101120113
    050014111012102
    055013300111112
    050512030100001—+
    055541233110001—|—+
    050013110201001—|—|–+
    055012201221101—|—|–|–+
    050511021101011—|—|–|–|–+
    055540121011110 | | | | |
    050014111110001 | | | | |
    055013300001001 | | | | |
    050512030001101 | | | | |
    055541233001011 | | | | |
    050013110301110 | | | | |
    055012201331001 | | | | |
    050511021202101 | | | | |
    055540121122111 | | | | |
    050014111010100 | | | | |
    055013300111110 | | | | |
    050512030100001—+ | | | |
    055541233110001——-+ | | |
    050013110201001———-+ | |
    055012201221101————-+ |
    050511021101011—————-+

    2 + n + n^2 + n^2 +4 (virus sequence:01)

    01
    69
    84177
    142429111111
    032227800000
    031005180000
    032105478000
    031115131800
    032004422780
    031204020518
    032124422547
    031112020313
    032001222322
    031201100110
    032101010101
    031111111111
    032000000000
    031200000000
    032120000000
    031112000000
    031112000000
    032001200000
    031201120000
    032121012000
    031111111200
    032000000120
    031200000112+
    032120000101
    031112000111
    032001200100
    031201120110
    032121012101

  4. With Python, C and Bash under my belt an extra language made no sense so I’m now using piep a Python pipeline processor for this sort of thing. With piep you do it like this (p is lines, pp is stream):

    cat data.txt | piep ‘p.split()[2]’

    Pipe the output of cat to piep, split each line into lists then take the third item (Python lists count from 0). So far it seems a good replacement for awk and sed although I can’t seem to give up on grep and pgrep.

  5. any one can solve this problem,
    i have one folder, in that many file names are there. i need to print the path of the files by using awk script.

    please tell me this is possible or not.

    1. Hi Anil,
      it is possible..
      You need to use readlink -f

      You can use this one line script to get what you want, i hope this one helps
      for line in `ls -ltrh|awk -F” ” ‘{print $9}’`; do readlink -f $line; done

  6. Didn’t work for me on Mac OS X. I had to use…

    awk '{ gsub("find","replace"); print }' ./file.txt

    instead of the suggested…

    awk '{ gsub("find","replace",$3); print $3 }' ./file.txt

  7. Hi,
    I have a file with a keyword ‘FIX(number number) ‘ in it. This keyword occurs around 1000+ times.
    I want to carry out a multiplication operation on the numbers inside ‘FIX’.
    I could get all the keywords and could carry out the operation using awk, but I need to replace the new result in my original file.
    Is there a way I can awk, process the data and replace it into an original file ?

Leave a Comment