
Delete text or paragraph between two sections using sed

Q. How do I use sed for selective deletion of certain lines? I have text as follows in file:
Line 1
Line 2
Line 4

I would like to delete all lines between WORD1 and WORD2 to produce final output:
Line 1
Line 2

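A. sed is well suited to selective deletion of lines. To print the whole file EXCEPT the section between WORD1 and WORD2 (two regular expressions), use the command below. The lines that match WORD1 and WORD2 are deleted along with everything between them.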
$ sed '/WORD1/,/WORD2/d' input.txt > output.txt
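To see the range address in action, here is a minimal sketch that runs the same command on a small sample file. The sample contents are an assumption made up for illustration; only the sed command itself comes from the answer above.

```shell
# Build a small sample file (contents are illustrative, not from the original post)
printf '%s\n' 'Line 1' 'Line 2' 'WORD1' 'Line 3' 'WORD2' 'Line 4' > input.txt

# Delete every line from the first line matching WORD1
# through the next line matching WORD2 (both inclusive)
sed '/WORD1/,/WORD2/d' input.txt > output.txt

# Show what is left: Line 1, Line 2 and Line 4
cat output.txt
```

Keep in mind that if WORD1 matches but no later line matches WORD2, sed deletes everything from WORD1 to the end of the file.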

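Shell script to remove JavaScript code

Here is my small script that reads all *.html files in the current directory and removes JavaScript.

#!/bin/bash
FILES="*.html"
# for loop read each file
for f in $FILES
do
    INF="$f"
    OUTF="$f.out.tmp"   # temporary file; the name is illustrative
    # replace javascript
    sed '/<script type="text\/javascript"/,/<\/script>/d' "$INF" > "$OUTF"
    /bin/cp "$OUTF" "$INF"
    /bin/rm -f "$OUTF"
done

The above shell script deletes every line from an opening <script type="text/javascript" tag through the line containing the closing </script> tag. Whole lines are removed, so any other markup on those lines goes too, and script tags written without type="text/javascript" are not matched.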
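The script above only handles the *.html files it is pointed at. As a rough sketch of how the same sed command could be applied to a whole directory tree, the example below uses find to walk subdirectories. The demo directory, the sample page and the variable names (demo_dir, tmp) are hypothetical, and the loop assumes file names do not contain newlines.

```shell
#!/bin/sh
# Create a throwaway directory tree with one sample page (hypothetical contents)
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf '%s\n' '<html>' '<script type="text/javascript">' 'alert("hi");' \
    '</script>' '<p>Hello</p>' '</html>' > "$demo_dir/sub/page.html"

# find descends into subdirectories and prints one path per line
find "$demo_dir" -type f -name '*.html' |
while IFS= read -r f
do
    tmp="$f.out.tmp"
    # Write to a temporary file, then replace the original only if sed succeeded
    sed '/<script type="text\/javascript"/,/<\/script>/d' "$f" > "$tmp" && mv "$tmp" "$f"
done

# The script block is gone; the rest of the page is untouched
cat "$demo_dir/sub/page.html"
```

Replacing the file only when sed exits successfully avoids overwriting a page with an empty temporary file if something goes wrong.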

{ 12 comments… add one }
  • Sam November 27, 2007, 9:44 am

    But the command

    sed -e '/word1/,/word2/d' deletes the whole line when the given text is like this:

    unix word1 java word2

    If only the text between word1 and word2 is to be deleted, then this is not the solution. Can I get some help on this aspect? Thanks in advance.

  • Johan January 8, 2008, 10:22 am

    How can I make the script look in subfolders?

  • tannu September 6, 2008, 1:06 am

    How do I delete the following line from an XML file?
    #vi package.xml
    I tried sed -e '/drivers-3.6.1-${release}.${arch}.rpm /d' /root/package.xml > /root/output.txt

  • Jasan April 29, 2009, 2:45 pm

    tannu: you can also use “grep -v”

  • Gnaural August 16, 2009, 1:53 am

    As for people asking Sam’s question (11.27.07), the problem is twofold. One, using sed’s pattern matching with the “d” (delete) command clobbers whole lines, as you noted. The solution would seem to be to use the “s” (substitute) form instead:
    sed 's/WORD1.*WORD2//g' input.txt
    BUT then you discover the larger problem: sed by default does all its processing one line at a time (i.e., it fills its processing buffer up to each ‘\n’ it encounters), so this only works when WORD1 and WORD2 are on the same line.
    The formal approaches include using sed’s “hold space” and “pattern space” commands, N, H/h, G/g, x, as hinted here:
    I used that route in an email formatting script, but these days if I can get away with quick’n’dirty, I try to just strip all occurrences of ‘\n’ from the stream you give sed, like this:
    cat input.txt | tr -d '\n' | sed 's/WORD1.*WORD2//g'
    Note that this leaves the whole output on a single line. No files I’ve worked on are big enough to blow sed’s buffer so far.
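The pattern-space route mentioned in the comment above can be sketched as follows. This is one possible approach, not necessarily the one the commenter used: it loads the whole file into sed's pattern space with N and then runs the substitution once, so the line breaks outside the deleted span survive. The sample file and the variable names (sample, result) are assumptions for illustration.

```shell
# Sample input where WORD1 and WORD2 sit on different lines (illustrative contents)
sample=$(mktemp)
printf '%s\n' 'unix WORD1 java' 'perl WORD2 linux' > "$sample"

# :a        define a label named a
# $!{ ... } run the enclosed commands on every line except the last
# N         append the next input line to the pattern space
# ba        branch back to label a
# s/...//   with the whole file loaded, delete from WORD1 through WORD2
result=$(sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/WORD1.*WORD2//' "$sample")

echo "$result"
```

Because `.*` is greedy, this deletes through the last WORD2 in the file, and the whole file is held in memory while sed works.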

    • Lea January 14, 2013, 12:53 pm

      What if there are several occurrences of WORD2, but you only want to delete what is between WORD1 and the first WORD2? It seems that it chooses the last occurrence of WORD2 instead. How do I change that?

    • anil October 26, 2015, 8:39 pm

      I’m sorry for asking this here, felt you could help me.

      How do I delete everything between (……), but leaving behind [..], in the below line:

      Expected output: 123[678]0.

      Thanks in advance

  • jaganath February 23, 2011, 6:46 am

    How can I do an electricity bill in MySQL?

  • ahmed July 6, 2011, 12:27 pm

    Great , just what i wanted!!!

  • Anderson Venturini November 11, 2011, 11:48 pm

    Thanks! Very useful! Worked perfectly! Congrats!

  • Saeed Neamati April 15, 2012, 7:15 am

    How to find and remove all the JavaScript code in a given HTML text? I mean, consider that in this comment, I’ve written something like an alert, how do you find it, and remove it?

  • kk May 12, 2016, 5:22 am

    The above code can be made simpler by using the “-i” option of the sed command. With it, sed edits the *.html files in place, so there is no need for an output file (> $OUTF).

    # for loop read each file
    for f in $FILES
    do
        INF="$f"
        # replace javascript
        sed -i '/<script type="text\/javascript"/,/<\/script>/d' "$INF"
    done
