HowTo: grep Text Between Two Words in Unix / Linux

I have hundreds of HTML files in the following format:

<HTML>
<HEAD>
 <TITLE>Statistics for ABC LTD - January 2007 - Rang IDXYZZAZZZZ</TITLE>
</HEAD>
 
<BODY BGCOLOR="#E8E8E8" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000">
<H2>Statistics for ABC LTD</H2>
<SMALL><STRONG>
Summary Period: January 2007<BR>
Generated 01-Feb-2007 06:40 CET<BR>
</STRONG></SMALL>
<CENTER>
<HR>
<P>
<FONT SIZE="-1"></CENTER><PRE>
 
my data 1
my data 2
my data 3
my data 10000
my data N times
</PRE>

Generated by MyAppDbStatsWriter (UNIX) version 1.9b2
</BODY>
</HTML>

How do I extract the text between two words (<PRE> and </PRE>) in Unix or Linux using the grep command?

The grep command matches one line at a time, so it is not well suited to extracting a block of text that spans several lines. I suggest that you use the sed command instead. The syntax is:

sed -n "/START-WORD-HERE/,/END-WORD-HERE/p" input
sed -n "/START-WORD-HERE/,/END-WORD-HERE/p" input > output
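The /START/,/END/ address selects every line from a line matching the start pattern through the next line matching the end pattern, inclusive, and it repeats for each such range in the input. Here is a small sketch to see this for yourself; the START and END markers and the sample text are made up for illustration and are not taken from the files above:

```shell
#!/bin/sh
# Hypothetical sample input; START and END stand in for your own words.
sample='before
START
line one
line two
END
after
START
line three
END'

# -n suppresses sed's default output; p prints only the lines inside
# each START...END range. Both delimiter lines are printed, and every
# range in the input is printed, not just the first one.
printf '%s\n' "$sample" | sed -n '/START/,/END/p'
```

The START and END lines themselves are part of the output, and the lines "before" and "after" are not. If a START line is never followed by an END line, sed keeps printing until the end of the input.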

For example, to extract the text between <PRE> and </PRE> using the sed command:

sed -n "/<PRE>/,/<\/PRE>/p" input.html
sed -n "/<PRE>/,/<\/PRE>/p" input.html > output.html
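The commands above also print the <PRE> and </PRE> lines themselves; with the sample file, that means the <FONT SIZE="-1"></CENTER><PRE> line shows up in the output. Below is a minimal sketch that drops the two delimiter lines and processes many files in one go. It assumes the files end in .html and sit in the current directory; the helper name extract_pre and the .txt output naming are hypothetical choices for illustration:

```shell
#!/bin/sh
# extract_pre FILE: print only the lines strictly between <PRE> and </PRE>.
# "extract_pre" is a hypothetical helper name used for this sketch.
extract_pre() {
    # Within each <PRE>...</PRE> range: delete (d) the two delimiter
    # lines and print (p) everything in between. -n stops all other output.
    sed -n '/<PRE>/,/<\/PRE>/{
/<PRE>/d
/<\/PRE>/d
p
}' "$1"
}

# Apply it to every .html file in the current directory, saving each
# result next to its input as a .txt file (the naming is an assumption).
for f in *.html; do
    [ -e "$f" ] || continue    # the glob matched no files
    extract_pre "$f" > "${f%.html}.txt"
done
```

Because the delimiter lines are deleted whole, any other text that shares a line with <PRE> (such as the <FONT SIZE="-1"></CENTER> prefix in the sample) is dropped too. The patterns are case-sensitive, so files that use lowercase <pre> tags would need matching lowercase patterns.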
