HowTo: grep Text Between Two Words in Unix / Linux

I have hundreds of HTML files in the following format:

<HTML>
<HEAD>
 <TITLE>Statistics for ABC LTD - January 2007 - Rang IDXYZZAZZZZ</TITLE>
</HEAD>
 
<BODY BGCOLOR="#E8E8E8" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000">
<H2>Statistics for ABC LTD</H2>
<SMALL><STRONG>
Summary Period: January 2007<BR>
Generated 01-Feb-2007 06:40 CET<BR>
</STRONG></SMALL>
<CENTER>
<HR>
<P>
<FONT SIZE="-1"></CENTER><PRE>
 
my data 1
my data 2
my data 3
my data 10000
my data N times











Generated by MyAppDbStatsWriter (UNIX) version 1.9b2



How do I extract the text between two words (<PRE> and </PRE>) on Unix or Linux using the grep command?

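The grep command is not well suited to this kind of work: it matches line by line and prints only the lines that match a pattern, not a block that spans several lines. I suggest that you use the sed command instead. The syntax is: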

sed -n "/START-WORD-HERE/,/END-WORD-HERE/p" input
sed -n "/START-WORD-HERE/,/END-WORD-HERE/p" input > output
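To see what the range address selects, here is a minimal sketch run against a throwaway file. The file name sample.txt and the words START and END are made up for illustration. Note that the lines containing the start and end words are printed too.

```shell
# Build a small throwaway sample file (hypothetical content, not from the question).
tmpdir=$(mktemp -d)
printf '%s\n' 'header' 'START' 'line 1' 'line 2' 'END' 'footer' > "$tmpdir/sample.txt"

# -n turns off automatic printing. The address /START/,/END/ selects every line
# from the first line matching START through the next line matching END, and
# the p command prints the selected lines.
between=$(sed -n '/START/,/END/p' "$tmpdir/sample.txt")
printf '%s\n' "$between"
```

The header and footer lines fall outside the range, so they are not printed.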

In this example, extract the text between <PRE> and </PRE> using the sed command:

sed -n "/<PRE>/,/<\/PRE>/p" input.html
sed -n "/<PRE>/,/<\/PRE>/p" input.html > output.html
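Since the question mentions hundreds of files, the same command can be wrapped in a loop. The following is only a sketch under some assumptions: the file name report1.html and its contents are invented stand-ins, the tags are assumed to be uppercase and to sit on their own lines apart from the data, and the output is written to a .txt file next to each input. It also drops the two lines that contain the tags themselves, which the plain range command would print.

```shell
# Create a small stand-in for one of the report files (contents are made up).
workdir=$(mktemp -d)
cat > "$workdir/report1.html" <<'EOF'
<HTML>
<BODY>
<FONT SIZE="-1"></CENTER><PRE>
my data 1
my data 2
</PRE>
</BODY>
</HTML>
EOF

# Loop over every .html file and write the extracted block next to it.
for f in "$workdir"/*.html; do
    # Select the <PRE>...</PRE> range, delete the two marker lines,
    # and print whatever is left inside the range.
    sed -n '/<PRE>/,/<\/PRE>/{/<PRE>/d;/<\/PRE>/d;p;}' "$f" > "${f%.html}.txt"
done

cat "$workdir/report1.txt"
```

If the files use lowercase tags, the patterns need to be adjusted, because sed matches case-sensitively by default.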
