Processing the delimited files using cut and awk

Posted on July 17, 2006 · 13 comments · Last updated December 22, 2006

From Wikipedia: "Delimited data uses specific characters (delimiters) to separate its values. Most database and spreadsheet programs are able to read or save data in a delimited format."

So how do you process delimited files at the Linux shell prompt?

Processing the delimited files using cut

The cut command prints selected parts of each line from a file, or from standard input (for example, the contents of a shell variable piped through echo). In other words, it removes the sections you do not want from each line.

For example, the fields in the /etc/passwd file are separated by the : character.

To print a list of all users, type the following command at the shell prompt:

$ cut -d: -f1 /etc/passwd

Output:

root
you
me
vivek
httpd

Where,

  • -d: : Use the : character as the delimiter
  • -f1 : Print the first field; to print the second field use -f2, and so on
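
cut can also select several fields in one call. The sketch below uses made-up entries in /etc/passwd format (assumptions for illustration, not real accounts), so it runs without reading the real file. It shows a non-contiguous list (-f1,6) and a contiguous range (-f3-4):

```shell
#!/bin/bash
# Hypothetical sample data in /etc/passwd format (illustration only)
sample='root:x:0:0:root:/root:/bin/bash
vivek:x:1000:1000:Vivek:/home/vivek:/bin/bash'

# Non-contiguous fields: login name (1) and home directory (6)
name_home=$(printf '%s\n' "$sample" | cut -d: -f1,6)
echo "$name_home"

# Contiguous range: fields 3 through 4 (UID and GID)
uid_gid=$(printf '%s\n' "$sample" | cut -d: -f3-4)
echo "$uid_gid"
```

Note that cut joins the selected fields with the same delimiter it used to split the line.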

Now consider a shell variable named service. Let us print the word mail using the cut command:
$ service="http mail ssh"
$ echo "$service" | cut -d' ' -f2

mail

Note that a blank space is used as delimiter.
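
One caveat with a blank delimiter: cut treats every single space as a separator, so two spaces in a row produce an empty field. A minimal sketch, assuming an input with a doubled space (made up for illustration), that squeezes repeated spaces with tr -s before calling cut:

```shell
#!/bin/bash
# Two spaces between "http" and "mail" (assumed input for illustration)
service="http  mail ssh"

# cut sees an empty field between the two spaces, so field 2 is empty
plain=$(echo "$service" | cut -d' ' -f2)

# tr -s ' ' squeezes each run of spaces into one before cut splits the line
squeezed=$(echo "$service" | tr -s ' ' | cut -d' ' -f2)

echo "plain cut: '$plain'"
echo "with tr -s: '$squeezed'"
```

The quotes around "$service" matter here: without them the shell itself would collapse the spaces before echo ever saw them.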

Processing the delimited files using awk

You can also use the awk command for the same purpose:
$ awk -F':' '{ print $1 }' /etc/passwd

Output:

root
you
me
vivek
httpd

Where,

  • -F':' - Use : as the input field separator (FS)
  • print $1 - Print the first field; to print the second field use $2, and so on
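
awk can also print several fields at once, or print a line only when a field matches a condition. A small sketch, again using made-up passwd-style entries (assumptions for illustration, not real accounts):

```shell
#!/bin/bash
# Hypothetical sample data in /etc/passwd format (illustration only)
sample='root:x:0:0:root:/root:/bin/bash
vivek:x:1000:1000:Vivek:/home/vivek:/bin/bash'

# Print the login name (field 1) and the shell (field 7), separated by a tab
name_shell=$(printf '%s\n' "$sample" | awk -F: '{ print $1 "\t" $7 }')
echo "$name_shell"

# Print only the login names whose UID (field 3) is 1000 or higher
regular=$(printf '%s\n' "$sample" | awk -F: '$3 >= 1000 { print $1 }')
echo "$regular"
```

The pattern before the braces ($3 >= 1000) acts as a filter, which is something cut cannot do on its own.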

Processing delimited files with cut or awk is an essential skill for a sysadmin. Let us say you want to find out whether particular services are active, using a shell script:

#!/bin/bash
# ports to check and the matching service names, in the same order
ports="22 80 25"
service="SSH WEB MAIL"
c=1
echo "Running services status:"
# get listening ports: field 4 is the local address (address:port),
# and the port is the last : delimited piece (also true for IPv6, e.g. :::22)
listening=$(/bin/netstat -tuln | grep -vE '^Active|^Proto' | awk '{ print $4 }' | awk -F: '{ print $NF }')
for i in $ports
do
 # get service name from $service using a blank space as delimiter
 sname=$(echo "$service" | cut -d' ' -f"$c")
 sstatus="$sname: No"
 # now compare the port with every listening port
 for t in $listening
 do
  if [ "$i" = "$t" ]; then
   sstatus="$sname: Yes"
  fi
 done
 # display service status as Yes or No
 echo "$sstatus"
 # next service please
 c=$(( c + 1 ))
done

Run the script:
$ chmod +x script.sh
$ ./script.sh

Output:

Running services status:
SSH: Yes
WEB: No
MAIL: Yes
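
The port extraction step can be tried on its own, without netstat. The sketch below is an illustration, not the script above: the sample lines only imitate netstat -tuln output, and is_listening is a hypothetical helper name. It splits field 4 on : and keeps the last piece, so IPv6-style addresses such as :::22 are handled too:

```shell
#!/bin/bash
# Hypothetical netstat -tuln style lines (illustration only)
sample='tcp   0   0 0.0.0.0:22     0.0.0.0:*   LISTEN
tcp   0   0 127.0.0.1:25   0.0.0.0:*   LISTEN
tcp6  0   0 :::22          :::*        LISTEN'

# Field 4 is the local address; split it on ":" and keep the last piece.
# sort -un removes duplicate ports and orders them numerically.
listening=$(printf '%s\n' "$sample" | awk '{ n = split($4, a, ":"); print a[n] }' | sort -un)

is_listening() {
    # Succeeds when port $1 appears as a whole line in the list of listening ports
    printf '%s\n' "$listening" | grep -qx "$1"
}

for p in 22 80 25; do
    if is_listening "$p"; then echo "$p: Yes"; else echo "$p: No"; fi
done
```

Matching whole lines with grep -x avoids false hits such as port 2 matching inside 22.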

As you can see, the script above uses cut and awk to extract fields from shell variables and command output; both tools can extract one or many contiguous or non-contiguous fields. A complete working script that sends an email alert to the admin user is also available for download.

{ 13 comments… read them below or add one }

1 James D. Keeline August 9, 2006 at 10:07 pm

This article has some bad formatting in the source code. The single (‘) and double quotes (“) are presented as if they were smart quotes. However, if one tries to type in the back-tick (`) when the single quote (‘) was intended, Bash will reject this. The display should use a standard monospaced font with no smart quotes.

2 nixCraft August 9, 2006 at 11:12 pm

James,

Thanks for the suggestion :) I will update the CSS code, which is responsible for displaying source code using the code class.

Appreciate your post.

3 steve_w April 4, 2009 at 8:52 am

Nixcraft …I think James meant to say very good post with very clear examples…you’ve always gotta be careful of quotes after they’ve been copied and pasted around a windows environment or in word docs etc anyhow…

4 Ahmet Alp April 23, 2009 at 6:40 pm

Thank you so much.

Super script.

5 ravinder May 17, 2011 at 3:33 pm

Good job. Very neat

6 Eduardo February 28, 2012 at 9:41 pm

Great examples, helped me a lot here. Fast to solve a simple problem here. Thanks!

7 Kiran March 10, 2012 at 12:22 pm

Hi !!

This may not be the right place to ask, but I badly need a solution for this.
I need a script that generates a report for HTTP requests and responses. It should contain:
1. Date
2. Time
3. Number of Packets
4. Size of Packets
5. Source IP
6. Destination IP

It should capture the HTTP requests on a given URL. Thanks in advance :-) Please send me mail!

8 Simon March 20, 2012 at 12:38 pm

Do your homework Kiran.

9 Kiran March 21, 2012 at 6:13 am

Hey Simon, I tried many things but nothing worked out. Any help, please?

10 Simon March 26, 2012 at 3:55 am

Hello Kiran,

Have you tried using Wireshark (formerly Ethereal)?
Try running it by calling it in a script and parse its output any way you’d like.
You’ll be pleasantly surprised at what you can do with it ;-)

Give it a whirl and let me know what comes of it.

11 vishal September 21, 2012 at 3:30 am

Hi Guys,

I have a file that has headers and footers, and the data repeats over multiple iterations (with headers and footers). I want to delimit the data and I don’t want the headers and footers. I will be sincerely thankful if someone can help me with this. My objective is data crunching for performance analysis; it’s just that the parsing piece is not going well.

data in file:

09:31:56 12/13/11 r/w I/O per second KBytes per sec Svt ms IOSz KB
VVname Cur Avg Max Cur Avg Max Cur Avg Cur Avg Qlen
admin r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 –
admin w 17 17 17 68 68 68 0.2 0.2 4.1 4.1 –
admin t 17 17 17 68 68 68 0.2 0.2 4.1 4.1 0
z0001_app0136_s.254 r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 –
z0001_app0136_s.254 w 0 0 0 0 0 0 0.0 0.0 0.0 0.0 –
z0001_app0136_s.254 t 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0
………………..
………………..
ssm9903_ASRU.126 t 0 0 0 0 0 0 0.0 0.0 0.0 0.0 0
———————————————————————————–
394 r 87 87 4354 4354 2.7 2.7 49.9 49.9 –
394 w 126 126 906 906 0.1 0.1 7.2 7.2 –
394 t 213 213 5260 5260 1.2 1.2 24.7 24.7 0

09:32:57 12/13/11 r/w I/O per second KBytes per sec Svt ms IOSz KB
VVname Cur Avg Max Cur Avg Max Cur Avg Cur Avg Qlen
admin r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 –
admin w 32 24 32 130 99 130 0.2 0.2 4.1 4.1 –
admin t 32 24 32 130 99 130 0.2 0.2 4.1 4.1 0
z0001_app0136_s.254 r 0 0 0 0 0 0 0.0 0.0 0.0 0.0 –
z0001_app0136_s.254 w 0 0 0 0 0 0 0.0 0.0 0.0 0.0 –

and the data in the file repeats like this over and over.

12 Mohammed September 25, 2012 at 2:02 pm

Hi & Peace for everyone,

If you want to select all the data without the first line and the last line (head and tail):

the script:

#!/bin/bash
########################################
n=$(wc -l < ur_file.txt) ## number of lines in your file
n1=$((n - 1)) ## keep all lines except the last line
n2=$((n - 2)) ## from that selection, keep all lines except the first line
head -n "$n1" ur_file.txt | tail -n "$n2"

That script selects all data except the head and tail lines.

If I misunderstood your problem, just tell me.

13 Pratham April 19, 2013 at 7:51 am

I am passing comma-separated parameters to my script and I want to get the count of the comma-separated items. I am using the command below, but without success.

script name – abc.sh
script content

#!\usr\bin\bash
FILES=$1
COUNT=$FILES | awk -F, ‘{ print N }’
:wq

command line – ./abc.sh a,b,c,d

expected output = 3
