
Processing delimited files using cut and awk

From Wikipedia: "Delimited data uses specific characters (delimiters) to separate its values. Most database and spreadsheet programs are able to read or save data in a delimited format."

So how do you process delimited files at the Linux shell prompt?

Processing delimited files using cut

The cut command prints selected parts of lines from each FILE (or from standard input, such as the contents of a variable piped through echo). In other words, it removes sections from each line.

For example, the fields in the /etc/passwd file are separated by the : (colon) character.

To print a list of all users, type the following command at the shell prompt:

$ cut -d: -f1 /etc/passwd

Output:

root
you
me
vivek
httpd

Where,

  • -d: : Use the : (colon) character as the delimiter
  • -f1 : Print the first field; to print the second field, use -f2, and so on.
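Both options also accept lists. The following is a minimal sketch, run against two hypothetical /etc/passwd-style lines instead of the real file so the result is predictable: a comma selects non-contiguous fields, a dash selects a contiguous range, and cut prints the selected fields in input order, joined by the input delimiter.

```shell
#!/bin/sh
# Two hypothetical /etc/passwd-style lines (sample data, not a real system)
sample='root:x:0:0:root:/root:/bin/bash
vivek:x:1000:1000:Vivek:/home/vivek:/bin/bash'

# Non-contiguous fields: login name (1) and login shell (7)
names_shells=$(printf '%s\n' "$sample" | cut -d: -f1,7)
echo "$names_shells"

# Contiguous range: fields 1 through 3 (name, password placeholder, UID)
range=$(printf '%s\n' "$sample" | cut -d: -f1-3)
echo "$range"

# Listing the fields in another order does not reorder the output
swapped=$(printf '%s\n' "$sample" | cut -d: -f7,1)
echo "$swapped"
```

Because cut never reorders fields, -f7,1 prints the same thing as -f1,7. When you need a different order, awk is the better fit.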

Now consider the variable service. Let us print the word mail using the cut command:
$ service="http mail ssh"
$ echo $service | cut -d' ' -f2

mail

Note that a single space (-d' ') is used as the delimiter.
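One caveat: with -d' ', cut treats every single space as a separator, so two consecutive spaces produce an empty field, while awk's default splitting treats a run of blanks as one separator. Below is a small sketch, assuming a hypothetical value with two spaces after http. The variable is quoted so the shell does not collapse the spaces before cut sees them.

```shell
#!/bin/sh
# Hypothetical value with TWO spaces between "http" and "mail"
service="http  mail ssh"

# cut splits on each single space, so field 2 is the empty string
# between the two spaces
with_cut=$(printf '%s\n' "$service" | cut -d' ' -f2)
printf 'cut: [%s]\n' "$with_cut"

# awk's default field splitting treats a run of blanks as one separator,
# so field 2 is "mail"
with_awk=$(printf '%s\n' "$service" | awk '{ print $2 }')
printf 'awk: [%s]\n' "$with_awk"
```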

Processing delimited files using awk

You can also use the awk command for the same purpose:
$ awk -F':' '{ print $1 }' /etc/passwd

Output:

root
you
me
vivek
httpd

Where,

  • -F: - Use : as the input field separator (FS)
  • print $1 - Print the first field; to print the second field, use $2, and so on.
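awk can also print several fields in any order and lets you choose the output separator. The following is a minimal sketch using two hypothetical /etc/passwd-style lines. OFS sets the string placed between printed fields. The built-in NF holds the number of fields on the current line, so $NF is the last field.

```shell
#!/bin/sh
# Two hypothetical /etc/passwd-style lines (sample data, not a real system)
sample='root:x:0:0:root:/root:/bin/bash
vivek:x:1000:1000:Vivek:/home/vivek:/bin/bash'

# Print the shell ($7) before the login name ($1), joined by OFS
reordered=$(printf '%s\n' "$sample" | awk -F':' -v OFS=' -> ' '{ print $7, $1 }')
echo "$reordered"

# NF is the field count; $NF is the last field
last=$(printf '%s\n' "$sample" | awk -F':' '{ print NF, $NF }')
echo "$last"
```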

Processing delimited files with cut or awk is an essential skill for a sysadmin. Let us say you want to find out whether particular services are active, using a shell script:

#!/bin/bash
# Report whether each service is listening on its expected port.
# The n-th word of $service names the n-th word of $ports,
# so keep both lists in the same order.
ports="22 80 25"
service="SSH WEB MAIL"

# Collect the local listening ports once. netstat's first two lines are
# headers; column 4 is the local address (e.g. 0.0.0.0:22 or :::22), and
# the port is the text after the last colon, i.e. $NF with -F:.
listening=$(/bin/netstat -tuln | awk 'NR > 2 { print $4 }' | awk -F':' '{ print $NF }')

echo "Running services status:"
c=1
for port in $ports
do
 # get the c-th service name; a single space is the delimiter
 sname=$(echo "$service" | cut -d' ' -f"$c")
 # grep -x matches whole lines only, so port 22 does not match 2222
 if echo "$listening" | grep -qx "$port"; then
  echo "$sname: Yes"
 else
  echo "$sname: No"
 fi
 # next service please
 c=$((c + 1))
done

Save the script as script.sh, then make it executable and run it:
$ chmod +x script.sh
$ ./script.sh

Output:

SSH: Yes
WEB: No
MAIL: Yes

As you can see, the above script uses cut and awk to extract one or more contiguous or non-contiguous fields from shell variables and command output. Download the complete working script, which sends an email alert to the admin user.
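The choice of tool matters most when taking the port out of netstat's Local Address column. The following small sketch compares cut -d: -f2 (text after the first colon) with awk -F: '{ print $NF }' (text after the last colon). The addresses are sample values, and show_port is a hypothetical helper defined only for this comparison. For an IPv6 address such as :::80, cut returns an empty string while awk returns the port.

```shell
#!/bin/sh
# Hypothetical helper: show the port each method extracts from one address
show_port() {
  by_cut=$(echo "$1" | cut -d: -f2)                # text after the FIRST colon
  by_awk=$(echo "$1" | awk -F':' '{ print $NF }')  # text after the LAST colon
  echo "$1 cut=[$by_cut] awk=[$by_awk]"
}

# Sample addresses (illustrative only): IPv4 wildcard, IPv6 wildcard, loopback
for addr in 0.0.0.0:22 :::80 127.0.0.1:25
do
  show_port "$addr"
done
```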


Comments on this entry are closed.

  • James D. Keeline August 9, 2006, 10:07 pm

    This article has some bad formatting in the source code. The single (‘) and double quotes (“) are presented as if they were smart quotes. However, if one tries to type in the back-tick (`) when the single quote (‘) was intended, Bash will reject this. The display should use a standard monospaced font with no smart quotes.

  • nixCraft August 9, 2006, 11:12 pm

    James,

    Thanks for the suggestion :) I will update the CSS code responsible for displaying source code with the code class.

    Appreciate your post.

  • steve_w April 4, 2009, 8:52 am

    Nixcraft… I think James meant to say: very good post with very clear examples… You’ve always got to be careful with quotes after they’ve been copied and pasted around a Windows environment or in Word docs, etc. Anyhow…

  • Ahmet Alp April 23, 2009, 6:40 pm

    Thank you so much.

    Super script.

  • ravinder May 17, 2011, 3:33 pm

    Good job. Very neat

  • Eduardo February 28, 2012, 9:41 pm

    Great examples; they helped me solve a simple problem here quickly. Thanks!

  • Kiran March 10, 2012, 12:22 pm

    Hi !!

    This may not be the right place to ask, but I badly need a solution for this.
    I need a script that generates a report of HTTP requests and responses. It should contain:
    1. Date
    2. Time
    3. Number of Packets
    4. Size of Packets
    5. Source IP
    6. Destination IP

    It should capture the HTTP requests for a given URL. Thanks in advance :-) Please send me an email!

    • Simon March 20, 2012, 12:38 pm

      Do your homework Kiran.

      • Kiran March 21, 2012, 6:13 am

        Hey Simon, I tried many things but nothing worked out. Any help, please?

  • Simon March 26, 2012, 3:55 am

    Hello Kiran,

    Have you tried using Wireshark (formerly Ethereal)?
    Try running it by calling it in a script and parse its output any way you’d like.
    You’ll be pleasantly surprised at what you can do with it ;-)

    Give it a whirl and let me know what comes of it.

  • vishal September 21, 2012, 3:30 am

    Hi Guys,

    I have a file with headers and footers, and the data repeats over multiple iterations (each with its own headers and footers). I want to extract the delimited data without the headers and footers. I will be sincerely thankful if someone can help me with this. My objective is data crunching for performance analysis; it is just the parsing piece that is not going well.

    data in file:

    09:31:56 12/13/11 r/w I/O per second KBytes per sec Svt ms IOSz KB
    VVname Cur Avg Max Cur Avg Max Cur Avg Cur Avg Qlen
    admin r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
    admin w 17 17 17 68 68 68 0.2 0.2 4.1 4.1 -
    admin t 17 17 17 68 68 68 0.2 0.2 4.1 4.1 0
    z0001_app0136_s.254 r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
    z0001_app0136_s.254 w 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
    z0001_app0136_s.254 t 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0
    ....................
    ....................
    ssm9903_ASRU.126 t 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0
    -----------------------------------------------------------------------------------
    394 r 87 87 4354 4354 2.7 2.7 49.9 49.9 -
    394 w 126 126 906 906 0.1 0.1 7.2 7.2 -
    394 t 213 213 5260 5260 1.2 1.2 24.7 24.7 0

    09:32:57 12/13/11 r/w I/O per second KBytes per sec Svt ms IOSz KB
    VVname Cur Avg Max Cur Avg Max Cur Avg Cur Avg Qlen
    admin r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
    admin w 32 24 32 130 99 130 0.2 0.2 4.1 4.1 -
    admin t 32 24 32 130 99 130 0.2 0.2 4.1 4.1 0
    z0001_app0136_s.254 r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
    z0001_app0136_s.254 w 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -

    and the data in the file repeats like this again and again.
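One possible approach is the following awk sketch, which assumes the layout shown above. Each block starts with a line whose first field is an hh:mm:ss timestamp, per-volume rows have r, w or t in field 2, and the totals rows start with a bare number. The sketch keeps only the per-volume rows, prefixes each with its block's timestamp, and prints them comma-separated. The report variable is a trimmed, hypothetical copy of the sample; with the real file, give the file name to awk instead of the printf pipe.

```shell
#!/bin/sh
# Trimmed copy of the sample report (hypothetical excerpt, ASCII hyphens);
# the real file repeats this header/data/footer pattern many times.
report='09:31:56 12/13/11 r/w I/O per second KBytes per sec Svt ms IOSz KB
VVname Cur Avg Max Cur Avg Max Cur Avg Cur Avg Qlen
admin r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
admin w 17 17 17 68 68 68 0.2 0.2 4.1 4.1 -
-----------------------------------------------
394 r 87 87 4354 4354 2.7 2.7 49.9 49.9 -'

data=$(printf '%s\n' "$report" | awk -v OFS=',' '
  # Block header: the first field looks like hh:mm:ss, so remember it
  $1 ~ /^[0-9]+:[0-9]+:[0-9]+$/ { ts = $1; next }
  # Per-volume rows: field 2 is r, w or t and field 1 is not a bare number
  # (a bare number such as 394 marks a totals row).
  # $1 = $1 rebuilds the record so its fields are joined with OFS.
  $2 ~ /^[rwt]$/ && $1 !~ /^[0-9]+$/ { $1 = $1; print ts, $0 }
')
echo "$data"
```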

  • Mohammed September 25, 2012, 2:02 pm

    Hi, and peace to everyone,

    If you want to select all the data without the first line and the last line (head and tail):

    The script:

    #!/bin/bash
    ########################################
    n=$(wc -l < ur_file.txt)  # number of lines in your file
    n1=$((n - 1))             # all lines except the last line
    n2=$((n - 2))             # of those n1 lines, all except the first line
    head -n "$n1" ur_file.txt | tail -n "$n2"
    

    That script selects all the data except the first and last lines.

    If I misunderstood your problem, just tell me.

  • Pratham April 19, 2013, 7:51 am

    I am passing a comma-separated parameter to my script and I want to get the count of the comma-separated items. I am using the command below, but it is not working.

    script name: abc.sh
    script content:

    #!\usr\bin\bash
    FILES=$1
    COUNT=$FILES | awk -F, '{ print N }'

    command line: ./abc.sh a,b,c,d

    expected output = 3
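The following is a minimal sketch of one way to count the items, assuming the argument is held in FILES as in the question. In the original line, COUNT=$FILES | awk ... runs the assignment as one side of a pipeline, so awk receives no input and COUNT is not set afterwards. N is also just an uninitialized awk variable. Feeding the string to awk with -F, and printing the built-in NF gives the number of fields. Note that a,b,c,d has four items; 3 is the number of commas, which is NF - 1.

```shell
#!/bin/sh
# Hypothetical stand-in for the script's first argument ($1)
FILES="a,b,c,d"

# With a comma as the field separator, NF is the number of items
items=$(printf '%s\n' "$FILES" | awk -F',' '{ print NF }')
# The number of commas (separators) is one less than the number of items
commas=$(printf '%s\n' "$FILES" | awk -F',' '{ print NF - 1 }')

echo "items=$items commas=$commas"
```

Inside abc.sh, the same idea would be COUNT=$(printf '%s\n' "$1" | awk -F',' '{ print NF }').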

  • AliDsten November 3, 2014, 12:33 pm

    HI

    I have a file named A that contains ab|cd|ef|fgh|ijk

    How can I get output like this:
    this is input1 ab
    this is input2 cd
    this is input3 ef and so on.

    thanks
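The following is a minimal awk sketch. -F'|' makes the pipe character the field separator, and a loop from 1 to NF prints each field with its position. The sample line is piped in here so the example is self-contained; with the real file, pass its name (A) as awk's last argument instead.

```shell
#!/bin/sh
# Sample line from the question, piped in instead of reading file A
out=$(printf 'ab|cd|ef|fgh|ijk\n' |
  awk -F'|' '{ for (i = 1; i <= NF; i++) print "this is input" i, $i }')
echo "$out"
```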