Unix / Linux: Remove duplicate lines from a text file using awk or perl

I have a text file that contains exact duplicate lines. I need to remove all of those duplicate lines while preserving their order on a Linux or Unix-like system. How do I delete duplicate lines from a text file?

You can use Perl, awk, or Python to delete all duplicate lines from a text file on Linux, OS X, and Unix-like systems.


Sample data file

$ cat data.txt
this is a test
Hi, User!
this is a test
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
Hi, User!
this vs that
call 911

How to remove duplicate lines inside a text file using awk

The following syntax preserves the order of lines in the text file. awk counts every line it has seen in the seen associative array, keyed on the whole line ($0); the expression !seen[$0]++ is true only the first time a line occurs, so only first occurrences are printed:
awk '!seen[$0]++' input > output
awk '!seen[$0]++' data.txt > output.txt
more output.txt

Sample outputs:

this is a test
Hi, User!
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
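
For readers new to awk, the one-liner can also be written in a long form that makes the logic explicit. This expanded version is only an illustrative sketch, but it behaves identically:

awk '{
    # seen[$0] is 0 (false) the first time a line occurs, so print it
    # and bump the counter; later duplicates see a non-zero count and
    # are skipped.
    if (!seen[$0]++) {
        print
    }
}' data.txt > output.txt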

How to remove duplicate lines from a text file using Perl

The syntax is as follows. The %seen hash counts each input line; $seen{$_}++ and next skips a line that has appeared before, while or print prints each line on its first occurrence:

perl -lne '$seen{$_}++ and next or print;' input > output
perl -lne '$seen{$_}++ and next or print;' data.txt > output.txt
more output.txt

Sample outputs:

this is a test
Hi, User!
this is a line
this is another line
call 911
this vs that
that vs this
How to Call 911
that and that
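
The introduction also mentions Python. Here is a minimal sketch of the same first-occurrence filter run from the shell, using a Python set to remember which lines have already been printed (the exact invocation shown is an assumption; any Python 3 will do):

python3 -c '
import sys
seen = set()                  # lines already written out
for line in sys.stdin:
    if line not in seen:      # first occurrence: keep it
        seen.add(line)
        sys.stdout.write(line)
' < data.txt > output.txt
more output.txt

Like the awk and Perl one-liners, this keeps the original line order, at the cost of holding every unique line in memory.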

8 comments
  • Yoander Mar 21, 2016 @ 17:40

    Another solution:
    $ sort data.txt|uniq > uniq-data.txt

    • Tom Mar 21, 2016 @ 18:10

      It will not preserve the order. Am I right?

      • Mark Dhas Apr 4, 2016 @ 13:33

        You’re right, the order is lost. Also, it’s unnecessary to pipe the sort command through uniq:

        $ sort -u data.txt > uniq-data.txt

    • Henrik Iivonen Mar 21, 2016 @ 18:13

      Yoander: wouldn’t that scramble the line order?

  • Ron Barak Mar 22, 2016 @ 11:26

    The following uses only nl, sort, and cut:

    $ cat data.txt | nl | sort -k2 -u | sort -n | cut --complement -f1
    this is a test
    Hi, User!
    this is a line
    this is another line
    call 911
    this vs that
    that vs this
    How to Call 911
    that and that
    • Alex Mar 24, 2016 @ 10:32

      Using ‘sort’, as shown above, is much easier.

    • Mark Dhas Apr 4, 2016 @ 13:42

      I really like this solution, but cut --complement -f1 does not always work (it depends on your flavour of *nix), so another solution is to use

      $ cat data.txt | nl | sort -k2 -u | sort -n | awk '{ $1=""; print }'

  • Lars Mar 31, 2016 @ 10:26

    Essentially each line is thrown into an array and checked for duplicates. Did you test it with large files? I really wonder how this would work with millions of lines to check.
