Linux Shell – How To Remove Duplicate Text Lines

I need to sort data from a log file, but there are too many duplicate lines. How do I remove all duplicate lines from a text file under GNU/Linux?

You need to use shell pipes along with the following two Linux command line utilities to sort and remove duplicate text lines:

  1. sort command – Sorts lines of text files on Linux and Unix-like systems.
  2. uniq command – Reports or omits repeated lines on Linux or Unix.

Removing Duplicate Lines With Sort, Uniq and Shell Pipes

Use the following syntax:
sort {file-name} | uniq -u
sort file.log | uniq -u
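
A related one-liner, if you want to see which lines repeat most often in a log before cleaning it up (a quick sketch assuming GNU sort and uniq):

sort file.log | uniq -c | sort -rn | head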


Here is a sample test file called garbage.txt displayed using the cat command:
cat garbage.txt
Sample outputs:

this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog


Type the following command to get rid of all duplicate lines:
$ sort garbage.txt | uniq -u
Sample outputs:

food that are killing you
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire

Where,

  • -u : prints only unique lines; any line that appears more than once in the sorted input is removed entirely, original included. Drop the -u flag if you want to keep one copy of each duplicated line instead.
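
A quick comparison of the three behaviours on the same sorted input (see the uniq man page for the full option list):

sort garbage.txt | uniq        # keep one copy of every line
sort garbage.txt | uniq -d     # print only the lines that repeat
sort garbage.txt | uniq -u     # print only the lines that never repeat

On garbage.txt above, plain uniq would also print “this is a test”, while uniq -d would print only that line.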

Sort file contents on Linux

Let us say you have a file named users.txt:
cat users.txt
Sample outputs:

Vivek Gite 24/10/72
Martin Lee 12/11/68
Sai Kumar  31/12/84
Marlena Summer 13/05/76
Wendy Lee  04/05/77
Sayali Gite 13/02/76
Vivek Gite 24/10/72

Let us sort it, run:
sort users.txt
Next, sort by last name (the second field), run:
sort -k2 users.txt
Want to sort in reverse order? Try:
sort -r users.txt
You can eliminate any duplicate entries in a file while ordering it, run:
sort -k2 -u users.txt
sort -u users.txt
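
For example, sort -u on the users.txt file above keeps a single copy of the duplicated Vivek Gite line (the exact ordering may vary with your locale):

$ sort -u users.txt
Marlena Summer 13/05/76
Martin Lee 12/11/68
Sai Kumar  31/12/84
Sayali Gite 13/02/76
Vivek Gite 24/10/72
Wendy Lee  04/05/77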

Remove duplicate lines with uniq and sort commands

Without any options, sort compares entire lines in the file and outputs them in plain ASCII (byte) order. You can control the output with options.
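
For example, numeric data sorts by its leading characters unless you pass the -n option:

printf '10\n9\n2\n' | sort     # ASCII order: 10, 2, 9
printf '10\n9\n2\n' | sort -n  # numeric order: 2, 9, 10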

How to remove duplicate lines on Linux with uniq command

Consider the following file:
cat -n telephone.txt
Sample outputs:

     1	99884123
     2	97993431
     3	81234000
     4	02041467
     5	77985508
     6	97993431
     7	77985509
     8	77985509

The uniq command compares adjacent lines only, so it removes the 8th line (a repeat of the 7th) from the file and places the result in a file called output.txt:
uniq telephone.txt output.txt
Verify it:
cat -n output.txt
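Sample outputs (the non-adjacent duplicate 97993431 on line 6 survives because uniq only compares neighbouring lines):

     1	99884123
     2	97993431
     3	81234000
     4	02041467
     5	77985508
     6	97993431
     7	77985509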

How to remove duplicate lines in a .txt file and save the result to a new file

Try any one of the following syntax. The first form keeps one copy of each duplicated line; the second (with -u) drops every line that was duplicated:
sort input_file | uniq > output_file
sort input_file | uniq -u | tee output_file
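
Do not redirect the pipeline back onto its own input file, as the shell truncates the file before sort can read it. Go through a temporary file, or let sort overwrite the file itself with -o, which is documented as safe to use on the input file (the temporary file name below is just an example):

sort input_file | uniq > /tmp/deduped.$$ && mv /tmp/deduped.$$ input_file
sort -u -o input_file input_file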

Conclusion

The sort command is used to order the lines of a text file and uniq filters duplicate adjacent lines from a text file. These commands have many more useful options. I suggest you read the man pages by typing the following man command:
man sort
man uniq


37 comments
  • Bahaa Soliman Sep 20, 2008 @ 7:43

    you can use
    command: sort -u filename
    it gives you the same result

    • Josh Jul 22, 2015 @ 19:36

      This actually leaves one of the duplicates. The command above removes the original *and* the duplicate. Notice in the output “this is a test” doesn’t show up. With ‘sort -u filename’ it will.

      • Rodney Fisk Sep 8, 2015 @ 0:08

        That is the answer I was looking for. Thank you! I need only to remove ONE of the duplicate lines.

  • Martin Sep 20, 2008 @ 8:41

How can I change your example so the output would be as below (no duplicate lines, but one copy of each duplicated line is still there)?


    this is a test
    food that are killing you
    wings of fire
    we hope that the labor spent in creating this software
    unix ips as well as enjoy our blog

    Thank you.

  • Deb Sep 20, 2008 @ 10:12

    uniq -c will do it Martin.

  • Jadu Saikia Sep 21, 2008 @ 3:34

    One more approach keeping the order of lines same as input. The good thing about this is that it can be applied if we need to remove duplicate based on a field or fields.

    $ awk '!x[$0]++' garbage.txt

    Output:
    this is a test
    food that are killing you
    wings of fire
    we hope that the labor spent in creating this software
    unix ips as well as enjoy our blog
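
    For example, to remove duplicates based on the first field only, index the array with $1 instead of the whole line:

    $ awk '!x[$1]++' garbage.txt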

  • LAMP docs Dec 3, 2008 @ 5:51

    What will be the command to save the output of this command to the same file? Or maybe a script?

  • dude Apr 5, 2009 @ 20:46

    @lamp:
    sort -u -o new_file old_file
    e.g. sort -u -o moregarbage.txt garbage.txt

    • ramesh Oct 31, 2012 @ 11:56

      hi,
      just try this:
      sort old_fil | new_file -c

  • dude Apr 5, 2009 @ 20:48

    sorry… was going a bit too fast…
    sort -u -o garbage.txt garbage.txt
    Obviously =)

  • Mikey Jul 10, 2009 @ 0:33

    Read RUTE (Rute User's Tutorial and Exposition). This teaches you many of the basics of Linux system administration.

  • Amber Aug 10, 2009 @ 6:32

    Re: Martin's question above (how to keep one copy of each duplicated line) and Deb's answer that uniq -c will do it:

    uniq -c removes duplicates and leaves one copy of each line that was duplicated, but it also prefixes every line with the number of times it occurred.
    You must use
    uniq
    without any of the options and your job is done.

  • Amber Aug 10, 2009 @ 6:33

    that would be uniq followed by filename.

  • rh Aug 31, 2009 @ 10:50

    Does anyone know how I should remove both duplicated lines and put the list of all duplicates into a text file, so the output would be:

    garbage.txt:
    food that are killing you
    wings of fire
    we hope that the labor spent in creating this software
    unix ips as well as enjoy our blog

    garbage.duplicates.txt:
    this is a test
    ?

  • rh Aug 31, 2009 @ 14:10

    GOT IT:

    # write dup files into text file
    sort file1.txt file2.txt | uniq -d > duplicity.txt
    # remove duplicated records from both files
    for domain in `cat duplicity.txt`; do sed -e "/${domain}/d" -i file1.txt; sed -e "/${domain}/d" -i file2.txt; done
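
    Note: lines containing / or regex metacharacters will break the sed pattern above; treating the duplicates file as fixed whole-line strings with grep is safer:

    grep -vxFf duplicity.txt file1.txt > file1.clean && mv file1.clean file1.txt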

  • Nathan Jun 8, 2010 @ 16:53

    Thank you so much, this has helped me big time!

  • Vivek Jun 27, 2010 @ 6:36

    How to find the duplicates alone & print them in a text file? For example, I have this in abc.txt:

    abc dr545.xml
    dsf fg456.xml
    abc sfg34.xml

    I need a text file with the output:

    abc dr545.xml
    abc sfg34.xml

    • stacy Oct 28, 2010 @ 1:27

      just pipe the output to a new filename:

      sort abc.txt | uniq > abc_1.txt

      • Pranab Rana Jun 6, 2011 @ 12:20

        Thanks for this post.

  • Joseph Jul 30, 2010 @ 20:59

    Thanks for this. Helped me a great deal. For some reason it did leave some duplicates, but it was still hugely helpful; removing a few duplicates manually is much better than trimming a 1483 line file down to the 379 lines it was supposed to be by hand.

  • stacy Oct 28, 2010 @ 1:22

    Thanks for the help!! the uniq command took a 1372 line file down to a 32 line file, which was much less daunting!

  • stacy Oct 28, 2010 @ 1:24

    Also i needed both files so my command was more like this:

    sort garbage.file | uniq > garbage_no_duplicates.file

  • web designing chennai Aug 11, 2011 @ 4:58

    Thanks and it helped me a lot.

  • Rakesh Aug 17, 2011 @ 16:56

    I want to redirect only duplicate rows into another file, can anyone help me please?

  • Rakesh Aug 17, 2011 @ 17:01

    Rows means the duplicate text data is comma separated,
    example:
    this,1,country,567,1,1,1,1
    that,2,country,678,2,2,2,2
    this,1,country,567,3,3,3,3

    From the above data, it should check for duplicate data up to 4 values (this,1,country,567) and also redirect the particular duplicate lines into some other file. Please help me.

  • naph Mar 21, 2012 @ 8:13

    I have many files with duplicate lines, but instead of removing the duplicates I would like to append numbers to make the entries unique… how might I do this? Could I pipe the uniq output into a sed command perhaps?

    cheers,
    naph

  • John McLean Jun 3, 2012 @ 12:48

    This command would have been more logical if it had an ‘-a’ option to print all the lines in a file with the duplicates ‘squashed’ into a single line. Sorry Richard.

    The below, “only print unique lines”, will omit lines that have duplicates:
    uniq -u

    The below, “only print duplicate lines”, will omit lines that are unique:
    uniq -d

    I reverted to the below commands to get my ‘-a’ option:

    cat messy_file.txt | uniq -u > out.txt
    cat messy_file.txt | uniq -d >> out.txt

    The number of lines in out.txt should match the line count output by the below command:

    cat messy_file.txt | uniq -c
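
    That is the same set of lines that plain uniq prints, just grouped differently; you can verify the count with:

    cat messy_file.txt | uniq | wc -l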

  • miezu Jul 11, 2012 @ 10:58

    cat cucu.txt | sort | uniq > cucu2.txt

  • bryan Feb 8, 2013 @ 8:11

    How do you remove duplicate lines without sorting? Example:
    hi
    hello
    hey
    hello
    hay
    hiy

    And I want my output to keep the same format, only deleting the duplicate lines.

    Preferred output:
    hi
    hello
    hey
    hay
    hiy

    • dan Sep 26, 2013 @ 10:44

      Hi Bryan,

      As far as I understood, you need something like:

      [root@test tmp]# cat test.lst
      hi
      hello
      hey
      hello
      hay
      hiy

      [root@test tmp]# cat test.lst | awk '{ if (!seen[$0]++) print}'
      hi
      hello
      hey
      hay
      hiy
      [root@test tmp]#

      Cheers,
      Dan

  • vijayant Jun 26, 2013 @ 7:46

    I want to keep the duplicate rows in the file and add a new column with an MD5 value corresponding to each duplicate record.

  • sakthi Jul 4, 2013 @ 13:20

    Please see the below usage as per your requirement:
    [:ROOT:/home/spachaiy]cat selective.log.Sat -i |grep failed |awk '{print $3 }' | uniq
    dm01/mail/ssamuel.nsf
    dm01/mail/ssoude.nsf
    dm01/mail/stripath1.nsf
    [:ROOT:/home/spachaiy]cat sel.log -i |awk '{print $3 }' | uniq
    dm01/mail/promero.nsf
    dm01/mail/pscnafaxpjsc.nsf
    dm02/mail/pedca/yesalinas.nsf
    [:ROOT:/home/spachaiy]awk '!x[$0]++' sel.log >se.log
    [:ROOT:/home/spachaiy]cat se.log
    Backup of dm01/mail/promero.nsf failed.
    Backup of dm01/mail/pscnafaxpjsc.nsf failed.
    Backup of dm02/mail/pedca/yesalinas.nsf failed.
    [:ROOT:/home/spachaiy]

  • rohit Nov 15, 2013 @ 2:49

    How do I write a command to duplicate each line in a file?

  • Si Shangase Nov 18, 2013 @ 11:29

    Piping the output to a newly created file is usually easiest.

    sort firstFile.txt | uniq -u > secondFile.txt

    or you can rewrite the current file; redirecting the pipeline onto the very file it is reading loses data, so go through a temporary file:

    sort firstFile.txt | uniq -u > tmp.txt && mv tmp.txt firstFile.txt

  • leonffs May 6, 2016 @ 20:15

    you can do this with one line in awk

    awk '!seen[$0]++' input.txt > deduped.txt

  • Katrin Jan 27, 2017 @ 20:35

    How can I delete duplicate lines in Excel? Thanks.

  • sartyaki Feb 18, 2017 @ 15:38

    How can I remove duplicate lines permanently from my Unix file? Using the uniq command they are only removed from the output, not from the file itself.
