HowTo: grep Text Between Two Words in Unix / Linux

Posted on August 12, 2012 · 3 comments · last updated on August 12, 2012

I have hundreds of HTML files in the following format:

 
<HTML>
<HEAD>
 <TITLE>Statistics for ABC LTD - January 2007 - Rang IDXYZZAZZZZ</TITLE>
</HEAD>
 
<BODY BGCOLOR="#E8E8E8" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000">
<H2>Statistics for ABC LTD</H2>
<SMALL><STRONG>
Summary Period: January 2007<BR>
Generated 01-Feb-2007 06:40 CET<BR>
</STRONG></SMALL>
<CENTER>
<HR>
<P>
<FONT SIZE="-1"></CENTER><PRE>
 
my data 1
my data 2
my data 3
my data 10000
my data N times
</PRE></FONT>
</CENTER>
<P>
<HR>
<TABLE WIDTH="100%" CELLPADDING=0 CELLSPACING=0 BORDER=0>
<TR>
<TD ALIGN=left VALIGN=top>
<SMALL>Generated by MyAppDbStatsWriter (UNIX) version 1.9b2</A>
</SMALL>
</TD>
</TR>
</TABLE>
</BODY>
</HTML>
 

How do I extract the text between two words (<PRE> and </PRE>) on Unix or Linux using the grep command?

The grep command matches one line at a time, so it is not well suited to pulling out a block that spans several lines. I suggest that you use the sed command instead. The syntax is:

 
sed -n "/START-WORD-HERE/,/END-WORD-HERE/p" input
sed -n "/START-WORD-HERE/,/END-WORD-HERE/p" input > output
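
To see what the range address does, here is a minimal sketch on a throwaway file. The file name sample.txt and the marker words START and END are made up for illustration; they are not part of the original question.

```shell
# Build a small sample file in a temporary directory (hypothetical name: sample.txt).
workdir=$(mktemp -d)
printf '%s\n' 'header' 'START' 'line one' 'line two' 'END' 'footer' > "$workdir/sample.txt"

# -n suppresses sed's default output; p then prints only the lines in the range
# from the first line matching START through the next line matching END.
result=$(sed -n '/START/,/END/p' "$workdir/sample.txt")
printf '%s\n' "$result"
```

Both marker lines are part of the output. If the end word never appears after the start word, sed keeps printing until the end of the file, so it is worth checking the result on one file before trusting it on many.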
 

In this example, extract the text between <PRE> and </PRE> using the sed command:

 
sed -n "/<PRE>/,/<\/PRE>/p" input.html
sed -n "/<PRE>/,/<\/PRE>/p" input.html > output.html
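
Since the question is about hundreds of files, the same command can be wrapped in a loop. The following is only a sketch: it assumes the reports sit in one directory as *.html files, and the .pre.txt output suffix is an arbitrary choice. It builds two tiny stand-in files first so it can be tried without touching real data.

```shell
# Demo data: two stand-in reports in a temporary directory (hypothetical names).
demo_dir=$(mktemp -d)
printf '%s\n' '<HTML>' '<PRE>' 'my data 1' '</PRE>' '</HTML>' > "$demo_dir/a.html"
printf '%s\n' '<HTML>' '<PRE>' 'my data 2' '</PRE>' '</HTML>' > "$demo_dir/b.html"

for f in "$demo_dir"/*.html; do
    # Write each extract next to its source file, e.g. a.html -> a.pre.txt
    sed -n '/<PRE>/,/<\/PRE>/p' "$f" > "${f%.html}.pre.txt"
done

cat "$demo_dir/a.pre.txt"
```

To run it on real files, replace "$demo_dir" with the directory that holds the reports and drop the two printf lines.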
 



1 Arash August 12, 2012 at 4:27 pm

Easy and practical, thanks.
I just wanted to know: is it possible to use `awk` instead of `sed`?


2 Vivek Gite August 12, 2012 at 9:41 pm
awk '/WORD1/,/WORD2/' /path/to/file
awk '/<PRE>/,/<\/PRE>/' /path/to/file
awk '/<PRE>/,/<\/PRE>/' /path/to/file > output.file


3 Surya December 2, 2012 at 8:34 am

Thanks for a simple and elegant solution. However, it prints START-WORD and END-WORD as well. Is there a simple way to exclude them, for example with something like \zs or \ze (as in Vim) in START-WORD and END-WORD?
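
One possible way to leave out the marker lines is sketched below. This is not from the original post. Both sed and awk work line by line, so rather than a Vim-style \zs or \ze, the sketch simply drops the whole line that contains each marker. The temporary file is made up for illustration.

```shell
# Demo input in a temporary file.
demo=$(mktemp)
printf '%s\n' 'before' '<PRE>' 'my data 1' 'my data 2' '</PRE>' 'after' > "$demo"

# sed: inside the range, delete the two marker lines and print everything else.
sed_out=$(sed -n '/<PRE>/,/<\/PRE>/{/<PRE>/d;/<\/PRE>/d;p;}' "$demo")

# awk: switch a flag off at the end marker, print while the flag is on,
# then switch it on at the start marker. The rule order is what skips both markers.
awk_out=$(awk '/<\/PRE>/{f=0} f; /<PRE>/{f=1}' "$demo")

printf '%s\n' "$sed_out"
```

In the HTML files from the article, the markers share a line with other tags (for example `</PRE></FONT>`), so dropping the whole marker line also removes those tags, which is probably what is wanted here. If real data sits on the same line as a marker, this approach would lose it.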

