I have hundreds of HTML files in the following format:
<HTML>
<HEAD>
<TITLE>Statistics for ABC LTD - January 2007 - Rang IDXYZZAZZZZ</TITLE>
</HEAD>
<BODY BGCOLOR="#E8E8E8" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000">
<H2>Statistics for ABC LTF</H2>
<SMALL><STRONG>
Summary Period: January 2007<BR>
Generated 01-Feb-2007 06:40 CET<BR>
</STRONG></SMALL>
<CENTER>
<HR>
<P>
<FONT SIZE="-1"></CENTER><PRE>
my data 1
my data 2
my data 3
my data 10000
my data N times
</PRE></FONT>
</CENTER>
<P>
<HR>
<TABLE WIDTH="100%" CELLPADDING=0 CELLSPACING=0 BORDER=0>
<TR>
<TD ALIGN=left VALIGN=top>
<SMALL>Generated by MyAppDbStatsWriter (UNIX) version 1.9b2</A>
</SMALL>
</TD>
</TR>
</TABLE>
</BODY>
</HTML>
How do I extract the text between two words (<PRE> and </PRE>) on Unix or Linux using the grep command?
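The grep command is not well suited to this kind of work, because it selects individual matching lines rather than a range of lines between two patterns. I suggest that you use the sed command instead. The syntax is: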
sed -n "/START-WORD-HERE/,/END-WORD-HERE/p" input
sed -n "/START-WORD-HERE/,/END-WORD-HERE/p" input > output
In this example, extract the text between <PRE> and </PRE> using the sed command:
sed -n "/<PRE>/,/<\/PRE>/p" input.html
sed -n "/<PRE>/,/<\/PRE>/p" input.html > output.html
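To see what the range address actually selects, here is a minimal sketch you can run as-is. It assumes a small made-up file (`sample.html`, created in a temporary directory) in which the opening and closing tags sit on separate lines, as in the question. The file name, its contents, and the variable names are illustrative only. Single quotes are used around the sed script here simply to keep the shell from touching it; the double-quoted form above behaves the same for this pattern.

```shell
#!/bin/sh
# Hypothetical sample file: name and contents are invented for this demo.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.html"
cat > "$sample" <<'EOF'
<HTML>
<BODY>
<H2>Statistics for ABC LTD</H2>
<FONT SIZE="-1"><PRE>
my data 1
my data 2
my data 3
</PRE></FONT>
<HR>
</BODY>
</HTML>
EOF

# -n suppresses automatic printing; the p command prints only the lines
# from the first line matching <PRE> through the next line matching </PRE>.
extracted=$(sed -n '/<PRE>/,/<\/PRE>/p' "$sample")
printf '%s\n' "$extracted"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

Note that sed works on whole lines: the two lines containing the tags are printed too, including anything else on those lines (here `<FONT SIZE="-1">` and `</FONT>`).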
{ 3 comments }
Easy and practical, thanks.
I just wanted to know: is it possible to use `awk` instead of `sed`?
Thanks for a simple and elegant solution. However, it prints START-WORD and END-WORD as well. Is there a simple way to exclude them, for example by adding something like \zs or \ze (as in Vim) to START-WORD and END-WORD?
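Both questions above can be approached with a short sketch. awk has a range pattern that mirrors the sed range, and although sed has no direct counterpart to Vim's \zs and \ze as far as I know, the marker lines can be dropped inside the range. The file name, contents, and variable names below are made up for illustration, and the sketch assumes each tag sits on a line of its own (or on a line whose other content you do not need), since both tools discard the whole marker line.

```shell
#!/bin/sh
# Hypothetical sample file: name and contents are invented for this demo.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.html"
cat > "$sample" <<'EOF'
<P>
<PRE>
my data 1
my data 2
</PRE>
<HR>
EOF

# awk range pattern: selects the same lines as the sed range,
# marker lines included (the default action is to print).
with_markers=$(awk '/<PRE>/,/<\/PRE>/' "$sample")

# sed: inside the range, delete the two marker lines and print the rest.
sed_without=$(sed -n '/<PRE>/,/<\/PRE>/{/<PRE>/d;/<\/PRE>/d;p;}' "$sample")

# awk flag idiom: clear the flag on the end marker, print while it is set,
# and set it only after the start marker line has been passed.
awk_without=$(awk '/<\/PRE>/{f=0} f; /<PRE>/{f=1}' "$sample")

printf '%s\n' "$with_markers"
printf '%s\n' "$sed_without"
printf '%s\n' "$awk_without"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

If a marker shares its line with data you want to keep, a line-based approach like this will lose that data, and a real HTML parser is probably the safer choice.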