Processing the delimited files using cut and awk

Posted in Shell scripting; last updated July 17, 2006

From Wikipedia: “Delimited data uses specific characters (delimiters) to separate its values. Most database and spreadsheet programs are able to read or save data in a delimited format.”

So how do you process delimited files at the Linux shell prompt?

Processing the delimited files using cut

The cut command prints selected parts of lines from each FILE, or from standard input (for example, the value of a variable piped in with echo). In other words, it removes sections from each line of its input.

For example, the fields in the /etc/passwd file are separated by the colon (:) character.

To print a list of all users, type the following command at the shell prompt:

$ cut -d: -f1 /etc/passwd

Output:

root
you
me
vivek
httpd

Where,

  • -d : Use the colon (:) character as the delimiter
  • -f1 : Print the first field; to print the second field use -f2, and so on.
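The -f option also accepts lists and ranges, which is how cut pulls out several contiguous or non-contiguous fields at once. Here is a small sketch. The record below is made-up sample data rather than a real /etc/passwd entry.

```shell
#!/bin/bash
# A made-up /etc/passwd-style record, for illustration only
line='vivek:x:1000:1000:Vivek Gite:/home/vivek:/bin/bash'

# Non-contiguous fields: user name (1) and login shell (7)
echo "$line" | cut -d: -f1,7

# A contiguous range of fields: 1 through 3
echo "$line" | cut -d: -f1-3

# An open-ended range: field 5 to the end of the line
echo "$line" | cut -d: -f5-
```

These commands print vivek:/bin/bash, vivek:x:1000 and Vivek Gite:/home/vivek:/bin/bash. cut keeps the original delimiter between the selected fields.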

Now consider the variable service. Let us print the word mail using the cut command:
$ service="http mail ssh"
$ echo "$service" | cut -d' ' -f2

mail

Note that a single blank space is used as the delimiter.
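Because cut treats every single space as a separator, two spaces in a row produce an empty field. The following sketch shows the pitfall and one common workaround: squeezing repeated spaces with tr -s before calling cut. The doubled space in the variable is deliberate.

```shell
#!/bin/bash
# Hypothetical input with two spaces between "http" and "mail"
service="http  mail ssh"

# Field 2 is the empty string between the two spaces
echo "$service" | cut -d' ' -f2

# Squeeze runs of spaces into one, then cut as usual
echo "$service" | tr -s ' ' | cut -d' ' -f2
```

The first command prints an empty line and the second prints mail.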

Processing the delimited files using awk

You can also use the awk command for the same purpose:
$ awk -F':' '{ print $1 }' /etc/passwd

Output:

root
you
me
vivek
httpd

Where,

  • -F':' – Use : as the input field separator (FS), i.e. the delimiter
  • print $1 – Print the first field; to print the second field use $2, and so on.
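awk goes further than cut in a few ways. It can print fields with a different output separator (OFS), and it can select lines by testing a field's value. When no -F is given, it also splits on runs of blanks. The following sketch illustrates all three. The records are made-up sample data.

```shell
#!/bin/bash
# Made-up passwd-style records, for illustration only
sample='root:x:0:0:root:/root:/bin/bash
vivek:x:1000:1000:Vivek Gite:/home/vivek:/bin/bash
httpd:x:48:48:Apache:/var/www:/sbin/nologin'

# Print fields 1 and 7 (user name and shell), joined by a comma (OFS)
printf '%s\n' "$sample" | awk -F':' 'BEGIN { OFS = "," } { print $1, $7 }'

# Print only users whose UID (field 3) is 1000 or more
printf '%s\n' "$sample" | awk -F':' '$3 >= 1000 { print $1 }'

# Default splitting treats repeated spaces as one separator
echo "http  mail ssh" | awk '{ print $2 }'
```

The first command prints root,/bin/bash, vivek,/bin/bash and httpd,/sbin/nologin. The second prints vivek, and the third prints mail.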

Processing delimited files with cut or awk is an essential skill for a sysadmin. Say you want to find out whether particular services are active, using a shell script:

#!/bin/bash

# Space-delimited lists: the Nth port belongs to the Nth service name
ports="22 80 25"
service="SSH WEB MAIL"
c=1

echo "Running services status:"

# Local port of every listening TCP/UDP socket: take column 4 (Local Address,
# e.g. 0.0.0.0:22 or :::22) and keep the text after the last colon
listening=$(/bin/netstat -tuln | grep -vE '^Active|^Proto' | awk '{ print $4 }' | awk -F: '{ print $NF }')

for port in $ports
do
 # get the c-th service name from $service, using a blank space as delimiter
 sname=$(echo "$service" | cut -d' ' -f$c)
 sstatus="$sname: No"
 # now compare this port against every listening port
 for t in $listening
 do
  if [ "$t" = "$port" ]; then
   sstatus="$sname: Yes"
   break
  fi
 done
 # display service status as Yes or No
 echo "$sstatus"
 # next service please
 c=$((c + 1))
done

Execute the script:
$ chmod +x script.sh
$ ./script.sh

Output:

SSH: Yes
WEB: No
MAIL: Yes

As you can see, the above script uses cut and awk to extract fields from shell variables and command output. Both tools can extract one or many contiguous or non-contiguous fields. A complete working version of the script, which also sends an email alert to the admin user, is available for download.
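Extracting a port from netstat's Local Address column shows why the field you ask for matters. IPv6 addresses such as :::22 contain several colons, so cut -d: -f2 returns an empty field. Taking the last field with awk's $NF works for both address families. The following sketch compares the two approaches. The addresses are illustrative values typed in by hand, not captured output.

```shell
#!/bin/bash
# Sample "Local Address" values (illustrative only)
addrs='0.0.0.0:22
:::22
127.0.0.1:25'

# Second colon-separated field: empty for the IPv6 address ":::22"
printf '%s\n' "$addrs" | cut -d: -f2

# Last colon-separated field: always the port
printf '%s\n' "$addrs" | awk -F: '{ print $NF }'

# Pure-bash alternative: strip everything up to the last colon
addr=':::22'
echo "${addr##*:}"
```

The cut command prints 22, an empty line and 25. The awk command prints 22, 22 and 25, and the parameter expansion prints 22.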

Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin and a trainer for the Linux operating system/Unix shell scripting. He has worked with global clients and in various industries, including IT, education, defense and space research, and the nonprofit sector.

14 comments

  1. This article has some bad formatting in the source code. The single (‘) and double quotes (“) are presented as if they were smart quotes. However, if one tries to type in the back-tick (`) when the single quote (‘) was intended, Bash will reject this. The display should use a standard monospaced font with no smart quotes.

  2. Nixcraft …I think James meant to say very good post with very clear examples…you’ve always gotta be careful of quotes after they’ve been copied and pasted around a windows environment or in word docs etc anyhow…

  3. Hi !!

    This may not be the right place to ask, but I badly need a solution for this.
    I need a script that generates a report of HTTP requests and responses. It should contain:
    1. Date
    2. Time
    3. Number of Packets
    4. Size of Packets
    5. Source IP
    6. Destination IP

    It should capture the HTTP requests for a given URL. Thanks in advance 🙂 Please send me mail!

  4. Hello Kiran,

    Have you tried using Wireshark (formerly Ethereal)?
    Try calling it from a script and parsing its output any way you’d like.
    You’ll be pleasantly surprised at what you can do with it 😉

    Give it a whirl and let me know what comes of it.

  5. Hi Guys,

    I have a file with headers and footers, and the data repeats over multiple iterations (each with its own headers and footers). I want to delimit the data and drop the headers and footers. I will be sincerely thankful if someone can help me with this. My objective is data crunching for performance analysis; it’s just that the parsing piece is not going well.

    data in file:

    09:31:56 12/13/11 r/w I/O per second KBytes per sec Svt ms IOSz KB
    VVname Cur Avg Max Cur Avg Max Cur Avg Cur Avg Qlen
    admin r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
    admin w 17 17 17 68 68 68 0.2 0.2 4.1 4.1 -
    admin t 17 17 17 68 68 68 0.2 0.2 4.1 4.1 0
    z0001_app0136_s.254 r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
    z0001_app0136_s.254 w 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
    z0001_app0136_s.254 t 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0
    ………………..
    ………………..
    ssm9903_ASRU.126 t 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0
    -----------------------------------------------------------------------------
    394 r 87 87 4354 4354 2.7 2.7 49.9 49.9 -
    394 w 126 126 906 906 0.1 0.1 7.2 7.2 -
    394 t 213 213 5260 5260 1.2 1.2 24.7 24.7 0

    09:32:57 12/13/11 r/w I/O per second KBytes per sec Svt ms IOSz KB
    VVname Cur Avg Max Cur Avg Max Cur Avg Cur Avg Qlen
    admin r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
    admin w 32 24 32 130 99 130 0.2 0.2 4.1 4.1 -
    admin t 32 24 32 130 99 130 0.2 0.2 4.1 4.1 0
    z0001_app0136_s.254 r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -
    z0001_app0136_s.254 w 0 0 0 0 0 0 0.0 0.0 0.0 0.0 -

    and data over in file goes over again and again.

  6. Hi & Peace for everyone,

    If you want to select all the data without the first line and the last line (head and tail):

    The script:

    #!/bin/bash
    ########################################
    n=$(wc -l < ur_file.txt)   ## number of lines in your file
    n1=$((n - 1))              ## keep all lines except the last one
    n2=$((n - 2))              ## from those n1 lines, keep all except the first one
    head -n "$n1" ur_file.txt | tail -n "$n2"
    

    That script selects all data except the first (head) and last (tail) lines.

    If I misunderstood your problem, just tell me.

  7. I am passing a comma-separated parameter to my script, and I want to get the count of the comma-separated items. I am using the command below, but it is not working.

    script name – abc.sh
    script content

    #!/usr/bin/bash
    FILES=$1
    COUNT=$FILES | awk -F, '{ print N }'

    command line – ./abc.sh a,b,c,d

    expected output = 3
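Comment 5 asks how to keep only the data rows from a report that repeats a header, column titles, data rows, a dashed separator and totals. One way is a small awk state machine, sketched below. It is not from the article. The function name extract_rows is hypothetical, and the sketch assumes two things about the real file: each block starts with an HH:MM:SS timestamp, and the separator is a line of ASCII hyphens. Each data row is written out as comma-separated values, prefixed with its block's time and date.

```shell
#!/bin/bash
# extract_rows (hypothetical helper): print data rows only, as CSV,
# each prefixed with the time and date of the block it belongs to.
# Assumes: blocks start with "HH:MM:SS MM/DD/YY ...", the column header
# starts with "VVname", and a line of hyphens separates data from totals.
extract_rows() {
  awk 'BEGIN { OFS = "," }
    # Block header: remember its time and date, start reading data
    $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { ts = $1 OFS $2; in_data = 1; next }
    # Column header line
    $1 == "VVname" { next }
    # Dashed separator: the totals after it are footer, not data
    /^ *-+ *$/ { in_data = 0; next }
    # Data row: "$1 = $1" rebuilds the line using OFS (commas)
    in_data && NF > 0 { $1 = $1; print ts, $0 }
  ' "$@"
}
```

Usage would be something like extract_rows report.txt > report.csv, where report.txt stands for your file. A row such as admin w 17 17 ... under the 09:31:56 header comes out as 09:31:56,12/13/11,admin,w,17,17,...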
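The script in comment 7 has two problems. First, COUNT=$FILES | awk ... does not capture awk's output; command substitution, $(...), is needed for that. Second, N is just an unset awk variable, whereas NF holds the number of fields. Note also that a,b,c,d has 4 items but 3 commas, so the expected output of 3 is the comma count. The following sketch computes both. The function names are hypothetical.

```shell
#!/bin/bash
# count_items (hypothetical): number of comma-separated items in $1.
# awk -F, splits the line on commas; NF is the number of fields.
count_items() {
  echo "$1" | awk -F, '{ print NF }'
}

# count_commas (hypothetical): separators are one fewer than items
count_commas() {
  echo $(( $(count_items "$1") - 1 ))
}

# Example: ./abc.sh a,b,c,d
COUNT=$(count_items "$1")
echo "$COUNT"
```

With ./abc.sh a,b,c,d this prints 4. Calling count_commas a,b,c,d prints 3.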
