Unix / Linux: Remove duplicate lines from a text file using awk or perl


I have a text file with exact duplicates of lines. I need to remove all those duplicate lines while preserving the order, on a Linux or Unix-like system. How do I delete duplicate lines from a text file?

You can use Perl, awk, or Python to delete all duplicate lines from a text file on Linux, OS X, and other Unix-like systems.

Sample data file

$ cat data.txt
this is a test
Hi, User!
this is a test
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
Hi, User!
this vs that
call 911

How to remove duplicate lines inside a text file using awk

The syntax is as follows to preserve the order of the text file:
awk '!seen[$0]++' input > output
awk '!seen[$0]++' data.txt > output.txt
more output.txt

Sample outputs:

this is a test
Hi, User!
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
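A quick note on why the one-liner works: awk evaluates `!seen[$0]++` as a pattern for every line. `seen[$0]` starts at 0 (false), so the negation is true the first time a line appears and awk's default action prints it; the `++` then bumps the count, so every later copy of the same line tests as false. A minimal sketch you can paste into any shell:

```shell
# !seen[$0]++ is true only on a line's first appearance:
# seen[$0] is 0 (false) at first sight, so !0 is true and the line prints;
# the ++ increments the count, so later duplicates test as false.
printf 'a\nb\na\nb\nc\n' | awk '!seen[$0]++'
# prints:
# a
# b
# c
```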

How to remove duplicate lines from a text file using Perl

The syntax is:

perl -lne '$seen{$_}++ and next or print;' input > output
perl -lne '$seen{$_}++ and next or print;' data.txt > output.txt
more output.txt

Sample outputs:

this is a test
Hi, User!
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
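The intro also mentions Python. A minimal sketch of the same idea, using a `set` of lines seen so far (for a real file, redirect with `< data.txt > output.txt` instead of the `printf` sample shown here):

```shell
# Same idea in Python: a set() remembers lines already seen,
# so only the first occurrence of each line is written out.
printf 'this is a test\nHi, User!\nthis is a test\n' | python3 -c '
import sys
seen = set()
for line in sys.stdin:
    if line not in seen:        # print only the first occurrence
        seen.add(line)
        sys.stdout.write(line)
'
# prints:
# this is a test
# Hi, User!
```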

Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin, DevOps engineer, and a trainer for the Linux operating system/Unix shell scripting. Get the latest tutorials on SysAdmin, Linux/Unix and open source topics via RSS/XML feed or weekly email newsletter.

8 comments

      1. You’re right the order is lost. Also it’s unnecessary to pipe the sort command through uniq(ue).

        $ sort -u data.txt > uniq-data.txt

  1. The following uses only nl, sort, and cut:

    $ cat data.txt | nl | sort -k2 -u | sort -n | cut --complement -f1
    this is a test
    Hi, User!
    this is a line
    this is another line
    call 911
    this vs that
    that vs this
    How to Call 911
    that and that
    1. I really like this solution, but `cut --complement -f1` does not always work (it depends on your flavour of *nix), so another solution would be to use

      $ cat data.txt | nl | sort -k2 -u | sort -n | awk '{ $1=""; print }'

  2. Essentially each line is thrown into an array and checked for duplicates. Did you test it with large files? I really wonder how this would work with millions of lines to check.
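On the memory point: the awk (and Perl) approach holds one hash entry per distinct line, so memory grows with the number of unique lines. The nl/sort pipeline in the earlier comment sidesteps that, because sort(1) does an external merge sort and spills to temporary files on disk; and `cut -f2-` is a portable alternative where `--complement` is unavailable, since nl separates the line number from the text with a tab. A sketch of that variant (assuming GNU coreutils, whose sort keeps the first of an equal run with `-u`):

```shell
# Number lines, keep the first of each duplicate group (-u on the text
# field), restore original order by line number, then strip the numbers.
# sort(1) spills to disk, so this also handles files larger than RAM.
printf 'b\na\nb\nc\na\n' | nl -ba | sort -k2 -u | sort -n | cut -f2-
```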
