Unix / Linux: Remove duplicate lines from a text file using awk or perl

Last updated: March 22, 2016

I have a text file that contains exact duplicate lines. I need to remove all those duplicate lines while preserving the original order on a Linux or Unix-like system. How do I delete duplicate lines from a text file?

You can use Perl, awk, or Python to delete all duplicate lines from a text file on Linux, OS X, and Unix-like systems.

Sample data file

$ cat data.txt
this is a test
Hi, User!
this is a test
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
Hi, User!
this vs that
call 911

How to remove duplicate lines inside a text file using awk

The syntax is as follows; it preserves the order of the lines in the file:
awk '!seen[$0]++' input > output
awk '!seen[$0]++' data.txt > output.txt
more output.txt

Here awk stores each whole line ($0) as a key in the associative array named seen. The expression seen[$0]++ evaluates to 0 (false) the first time a line appears and to a non-zero value on every repeat, so the leading ! makes the pattern true only for first occurrences, and awk's default action prints those lines.

Sample outputs:

this is a test
Hi, User!
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
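
The article also mentions Python as an option; here is a minimal sketch of the same idea (the data.txt and output.txt names simply mirror the awk example above), using a set to remember lines that have already been seen:

#!/usr/bin/env python3
# Keep only the first occurrence of each line, preserving order
# (the Python counterpart of awk '!seen[$0]++').
seen = set()
with open("data.txt") as infile, open("output.txt", "w") as outfile:
    for line in infile:
        if line not in seen:   # first time this exact line appears
            seen.add(line)
            outfile.write(line)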

How to remove duplicate lines from one or more text files using Perl

The syntax is:

perl -lne '$seen{$_}++ and next or print;' input > output
perl -lne '$seen{$_}++ and next or print;' data.txt > output.txt
more output.txt

In this one-liner, $seen{$_}++ returns how many times the current line ($_) has been seen so far (0, i.e. false, the first time) and then increments the count, so "and next" skips repeated lines while "or print" prints each line on its first occurrence. Because the %seen hash persists for the whole run, you can list several input files after the one-liner and duplicates are removed across all of them.

Sample outputs:

this is a test
Hi, User!
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
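
If the input is spread across several files, a similar sketch works with Python's standard fileinput module, which reads every file named on the command line in turn while a single seen set covers them all (dedupe.py, file1.txt, and file2.txt are hypothetical names used only for illustration):

#!/usr/bin/env python3
# dedupe.py - print each line only the first time it appears across
# all files given on the command line (or standard input if none).
import fileinput

seen = set()
for line in fileinput.input():
    if line not in seen:
        seen.add(line)
        print(line, end="")

Run it as: python3 dedupe.py file1.txt file2.txt > output.txt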

Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin and trainer for the Linux operating system and Unix shell scripting. He has worked with global clients and in various industries, including IT, education, defense and space research, and the nonprofit sector.

Comments

  1. The following uses only nl, sort, and cut:

    $ cat data.txt | nl | sort -k2 -u | sort -n | cut --complement -f1
    this is a test
    Hi, User!
    this is a line
    this is another line
    call 911
    this vs that
    that vs this
    How to Call 911
    that and that
    1. I really like this solution, but $ cut --complement -f1 does not always work (it depends on your flavour of *nix), therefore another solution would be to use

      $ cat data.txt | nl | sort -k2 -u | sort -n | awk '{ $1=""; print }'

  2. Essentially each line is thrown into an array and checked for duplicates. Did you test it with large files? I really wonder how this would work with millions of lines to check.
