
Unix / Linux: Remove duplicate lines from a text file using awk or perl

I have a text file that contains exact duplicate lines. I need to remove all of those duplicate lines while preserving their order on a Linux or Unix-like system. How do I delete duplicate lines from a text file?

You can use awk, Perl, or Python to delete all duplicate lines from a text file on Linux, OS X, and Unix-like systems.

Sample data file

$ cat data.txt
this is a test
Hi, User!
this is a test
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
Hi, User!
this vs that
call 911

How to remove duplicate lines inside a text file using awk

The syntax is as follows; it preserves the original order of the lines:

awk '!seen[$0]++' input > output

The seen array counts how many times each whole line ($0) has occurred, so the expression is true only for the first occurrence of a line, and only that occurrence is printed. For example:

awk '!seen[$0]++' data.txt > output.txt
more output.txt

Sample outputs:

this is a test
Hi, User!
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that

How to remove duplicate lines from a text file using Perl

The syntax is:

perl -lne '$seen{$_}++ and next or print;' input > output

The %seen hash plays the same role as the awk array: a line that has already been counted is skipped with next, otherwise it is printed. For example:

perl -lne '$seen{$_}++ and next or print;' data.txt > output.txt
more output.txt

Sample outputs:

this is a test
Hi, User!
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
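
How to remove duplicate lines using Python

Python, mentioned above as another option, can do the same job. The short script below is a minimal sketch (the script name dedupe.py is just an example name): like the awk and Perl one-liners, it remembers the lines it has already printed in a set, so the original order is preserved.

#!/usr/bin/env python3
# dedupe.py - print each line of the given file only once, preserving order
import sys

seen = set()
with open(sys.argv[1]) as fh:
    for line in fh:
        if line not in seen:        # only the first occurrence gets printed
            seen.add(line)
            sys.stdout.write(line)

Run it as:
python3 dedupe.py data.txt > output.txt
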
8 comments
  • Yoander March 21, 2016, 5:40 pm

    Another solution:
    $ sort data.txt|uniq > uniq-data.txt

    • Tom March 21, 2016, 6:10 pm

      It will not preserve the order. Am I right?

      • Mark Dhas April 4, 2016, 1:33 pm

        You’re right, the order is lost. Also, it’s unnecessary to pipe the sort command through uniq:

        $ sort -u data.txt > uniq-data.txt

    • Henrik Iivonen March 21, 2016, 6:13 pm

      Yoander: wouldn’t that scramble the line order?

  • Ron Barak March 22, 2016, 11:26 am

    The following uses only nl, sort, and cut:

    $ cat data.txt | nl | sort -k2 -u | sort -n | cut --complement -f1
    this is a test
    Hi, User!
    this is a line
    this is another line
    call 911
    this vs that
    that vs this
    How to Call 911
    that and that
    • Alex March 24, 2016, 10:32 am

      Using ‘sort’, as shown above, is much easier.

    • Mark Dhas April 4, 2016, 1:42 pm

      I really like this solution, but cut --complement -f1 does not always work (it depends on your flavour of *nix), so another option is to use

      $ cat data.txt | nl | sort -k2 -u | sort -n | awk '{ $1=""; print }'

  • Lars March 31, 2016, 10:26 am

    Essentially each line is thrown into an array and checked for duplicates. Did you test it with large files? I really wonder how this would work with millions of lines to check.
