
Shell: How To Remove Duplicate Text Lines

Q. I need to sort data from a log file but there are too many duplicate lines. How do I remove all duplicate lines from a text file under GNU/Linux?

A. You need to use shell pipes along with the following two utilities:

a] sort command - sort lines of text files

b] uniq command - report or omit repeated lines

Removing Duplicate Lines With Sort, Uniq and Shell Pipes

Use the following syntax:
sort {file-name} | uniq -u
sort file.log | uniq -u

Here is a sample test file called garbage.txt:

this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog

Type the following command to get rid of all duplicate lines:
$ sort garbage.txt | uniq -u
Sample output:

food that are killing you
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire

Where,

  • -u : only print unique lines; any line that appears more than once is removed entirely.
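
Note that uniq -u drops every copy of a line that occurs more than once, which is why "this is a test" does not appear in the output above. If you want to keep one copy of each duplicated line instead, pipe through uniq with no options, or let sort do it with -u:
$ sort garbage.txt | uniq
$ sort -u garbage.txt
Both commands print each distinct line exactly once.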

Comments

  • Bahaa Soliman September 20, 2008, 7:43 am

    You can use the command:
    sort -u filename
    It gives you the same result.

    • Josh July 22, 2015, 7:36 pm

      This actually leaves one of the duplicates. The command above removes the original *and* the duplicate. Notice in the output “this is a test” doesn’t show up. With ‘sort -u filename’ it will.

  • Martin September 20, 2008, 8:41 am

    How can I change your example so the output would be the following (no duplicate lines, but one copy of each duplicated line is kept)?


    this is a test
    food that are killing you
    wings of fire
    we hope that the labor spent in creating this software
    unix ips as well as enjoy our blog

    Thank you.

  • Deb September 20, 2008, 10:12 am

    uniq -c will do it Martin.
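
    For reference, uniq -c keeps one copy of each line but prefixes it with the number of times it occurred, so on the sample garbage.txt the output would look something like this:

    $ sort garbage.txt | uniq -c
          1 food that are killing you
          2 this is a test
          1 unix ips as well as enjoy our blog
          1 we hope that the labor spent in creating this software
          1 wings of fire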

  • Jadu Saikia September 21, 2008, 3:34 am

    One more approach that keeps the order of lines the same as the input. The good thing about this is that it can also be applied if we need to remove duplicates based on a field or fields.

    $ awk '!x[$0]++' garbage.txt

    Output:
    this is a test
    food that are killing you
    wings of fire
    we hope that the labor spent in creating this software
    unix ips as well as enjoy our blog
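
    The same idea extends to de-duplicating on a field instead of the whole line; a sketch, assuming the key is the first whitespace-separated field:

    $ awk '!seen[$1]++' garbage.txt   # keeps the first line seen for each distinct first field, input order preserved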

  • LAMP docs December 3, 2008, 5:51 am

    What will be the command to save the output of this command to the same file? Or maybe a script?

  • dude April 5, 2009, 8:46 pm

    @lamp:
    sort -u -o new_file old_file
    e.g. sort -u -o moregarbage.txt garbage.txt

    • ramesh October 31, 2012, 11:56 am

      Hi,
      Just try this:
      sort old_file | uniq -c > new_file

  • dude April 5, 2009, 8:48 pm

    sorry… was going a bit too fast…
    sort -u -o garbage.txt garbage.txt
    Obviously =)
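
    This is safe because sort reads all of its input before it writes the file named with -o, so sorting a file onto itself works. A plain shell redirection would not:

    sort -u garbage.txt > garbage.txt    # wrong: the shell truncates the file before sort reads it
    sort -u -o garbage.txt garbage.txt   # safe: the output file is written only after the input has been read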

  • Mikey July 10, 2009, 12:33 am

    Read RUTE (the Rute User's Tutorial and Exposition). It teaches you many of the basics of Linux system administration.

  • Amber August 10, 2009, 6:32 am

    Martin asked (above) how to change the example so that duplicate lines are removed but one copy of each duplicated line is still kept, and Deb suggested uniq -c.

    uniq -c does remove the duplicates and leave one copy of each line that was duplicated, but it also prefixes each line with the number of times that line occurred. You must use
    uniq
    without any of the options and your job is done.

  • Amber August 10, 2009, 6:33 am

    That would be uniq followed by the filename.

  • rh August 31, 2009, 10:50 am

    Does anyone know how I should remove both copies of the duplicated lines and put a list of all the duplicates into a text file, so the output would be:

    garbage.txt:
    food that are killing you
    wings of fire
    we hope that the labor spent in creating this software
    unix ips as well as enjoy our blog

    garbage.duplicates.txt:
    this is a test
    ?

  • rh August 31, 2009, 2:10 pm

    GOT IT:

    # write duplicate lines into a text file
    sort file1.txt file2.txt | uniq -d > duplicity.txt
    # remove duplicated records from both files
    for domain in `cat duplicity.txt`; do sed -e "/${domain}/d" -i file1.txt; sed -e "/${domain}/d" -i file2.txt; done
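
    One caveat with the loop above: it splits duplicity.txt on whitespace and treats each entry as a sed regular expression, so lines containing spaces or regex metacharacters can misbehave. A sketch of an alternative that removes complete matching lines as fixed strings (the *.clean.txt names are just for illustration):

    grep -vxFf duplicity.txt file1.txt > file1.clean.txt
    grep -vxFf duplicity.txt file2.txt > file2.clean.txt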

  • Nathan June 8, 2010, 4:53 pm

    Thank you so much, this has helped me big time!

  • Vivek June 27, 2010, 6:36 am

    How do I find the duplicates alone and print them in a text file? For example, I have this in abc.txt:

    abc dr545.xml
    dsf fg456.xml
    abc sfg34.xml

    I need a text file with this output:

    abc dr545.xml
    abc sfg34.xml

    • stacy October 28, 2010, 1:27 am

      just pipe the output to a new filename:

      sort abc.txt | uniq > abc_1.txt

      • Pranab Rana June 6, 2011, 12:20 pm

        Thanks for this post.
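
    Going back to Vivek's question: uniq only compares whole lines, so to print just the lines whose first field is duplicated you can make two passes over the file with awk (a sketch; dups.txt is an arbitrary output name):

    awk 'NR==FNR { count[$1]++; next } count[$1] > 1' abc.txt abc.txt > dups.txt

    The first pass counts each first field, the second pass prints only the lines whose first field was seen more than once, so both abc lines end up in dups.txt.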

  • Joseph July 30, 2010, 8:59 pm

    Thanks for this. Helped me a great deal. For some reason it did leave a few duplicates, but it was still hugely helpful; removing a few duplicates manually is much better than manually turning a 1483 line file into the 379 line file it was supposed to be without duplicates.

  • stacy October 28, 2010, 1:22 am

    Thanks for the help!! the uniq command took a 1372 line file down to a 32 line file, which was much less daunting!

  • stacy October 28, 2010, 1:24 am

    Also i needed both files so my command was more like this:

    sort garbage.file | uniq > garbage_no_duplicates.file

  • web designing chennai August 11, 2011, 4:58 am

    Thanks and it helped me a lot.

  • Rakesh August 17, 2011, 4:56 pm

    I want to redirect only the duplicate rows into another file, can anyone help me please?

  • Rakesh August 17, 2011, 5:01 pm

    By rows I mean the duplicate text data is comma separated. For example:
    this,1,country,567,1,1,1,1
    that,2,country,678,2,2,2,2
    this,1,country,567,3,3,3,3

    From the above data, it should check for duplicates on the first 4 values (this,1,country,567) and also redirect those particular duplicate lines into another file. Please help me.
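
    A sketch for this case: build a key from the first four comma-separated fields on a first pass, then print the lines whose key occurs more than once on a second pass (data.csv and duplicates.csv are placeholder names):

    awk -F, 'NR==FNR { count[$1 FS $2 FS $3 FS $4]++; next } count[$1 FS $2 FS $3 FS $4] > 1' data.csv data.csv > duplicates.csv

    With the three sample rows above, duplicates.csv would contain both this,1,country,567 lines.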

  • naph March 21, 2012, 8:13 am

    I have many files with duplicate lines, but instead of removing the duplicates I would like to append numbers to make the entries unique… how might I do this? Could I pipe the uniq output into a sed command perhaps?

    cheers,
    naph
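
    One way to do this without sed is to number the repeats as awk sees them, leaving the first occurrence untouched (a sketch; the "." separator and the file names are arbitrary):

    awk '{ if (seen[$0]++) print $0 "." seen[$0]; else print }' input.txt > unique.txt

    The second copy of a line comes out as line.2, the third as line.3, and so on, so every entry ends up unique.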

  • John McLean June 3, 2012, 12:48 pm

    This command would have been more logical if it had an ‘-a’ option to print all the lines in a file with the duplicates ‘squashed’ into a single line. Sorry Richard.

    The first, "only print unique lines", will omit lines that have duplicates:
    uniq -u

    The second, "only print duplicate lines", will omit lines that are unique:
    uniq -d

    I reverted to the below commands to get my ‘-a’ option:

    cat messy_file.txt | uniq -u > out.txt
    cat messy_file.txt | uniq -d >> out.txt

    The number of lines in out.txt should match the below command:

    cat messy_file.txt | uniq -c
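
    For what it's worth, uniq with no options already behaves like the '-a' described here: it prints every line once with runs of duplicates squashed into a single line, provided the input is sorted (uniq only compares adjacent lines):

    sort messy_file.txt | uniq > out.txt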

  • miezu July 11, 2012, 10:58 am

    cat cucu.txt | sort | uniq > cucu2.txt

  • bryan February 8, 2013, 8:11 am

    How do you remove duplicate lines without sorting? For example:
    hi
    hello
    hey
    hello
    hay
    hiy

    And I want my output to remain in the same format, only deleting the duplicate lines.

    Preferred output:
    hi
    hello
    hey
    hay
    hiy

    • dan September 26, 2013, 10:44 am

      Hi Bryan,

      As far as I understood, you need something like:

      [root@test tmp]# cat test.lst
      hi
      hello
      hey
      hello
      hay
      hiy

      [root@test tmp]# cat test.lst | awk '{ if (!seen[$0]++) print}'
      hi
      hello
      hey
      hay
      hiy
      [root@test tmp]#

      Cheers,
      Dan

  • vijayant June 26, 2013, 7:46 am

    I want to keep the duplicate rows in the file and add a new column to the file with the MD5 value corresponding to each duplicate record.
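
    A rough sketch of one way to do that, appending the MD5 of each line as an extra column (assumes md5sum is available; file.txt and file_with_md5.txt are placeholder names):

    while IFS= read -r line; do
      printf '%s %s\n' "$line" "$(printf '%s' "$line" | md5sum | awk '{print $1}')"
    done < file.txt > file_with_md5.txt

    Duplicate rows are kept as they are and simply end up with identical MD5 values in the new column.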

  • sakthi July 4, 2013, 1:20 pm

    Please see the below usage as per your requirement.
    [:ROOT:/home/spachaiy]cat selective.log.Sat -i |grep failed |awk '{print $3 }' | uniq
    dm01/mail/ssamuel.nsf
    dm01/mail/ssoude.nsf
    dm01/mail/stripath1.nsf
    [:ROOT:/home/spachaiy]cat sel.log -i |awk '{print $3 }' | uniq
    dm01/mail/promero.nsf
    dm01/mail/pscnafaxpjsc.nsf
    dm02/mail/pedca/yesalinas.nsf
    [:ROOT:/home/spachaiy]awk '!x[$0]++' sel.log >se.log
    [:ROOT:/home/spachaiy]cat se.log
    Backup of dm01/mail/promero.nsf failed.
    Backup of dm01/mail/pscnafaxpjsc.nsf failed.
    Backup of dm02/mail/pedca/yesalinas.nsf failed.
    [:ROOT:/home/spachaiy]

  • rohit November 15, 2013, 2:49 am

    How do I write a command to duplicate each line in a file?
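
    That is the opposite of de-duplicating: sed's p command prints the pattern space once in addition to the automatic print at the end of each cycle, so every line comes out twice (file.txt and doubled.txt are placeholder names):

    sed 'p' file.txt > doubled.txt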

  • Si Shangase November 18, 2013, 11:29 am

    Piping the output to a newly created file is usually easiest.

    sort firstFile.txt | uniq -u > secondFile.txt

    or you can rewrite the current file:

    sort firstFile.txt | uniq -u >> firstFile.txt
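
    Note that >> appends, so the second form adds the de-duplicated lines to the end of firstFile.txt rather than replacing its contents (and with a single > the shell would truncate the file before sort could read it). To rewrite the file in place, either go through a temporary file (firstFile.tmp is just a scratch name) or use the -o option of sort mentioned earlier if keeping one copy of each line is acceptable:

    sort firstFile.txt | uniq -u > firstFile.tmp && mv firstFile.tmp firstFile.txt
    sort -u -o firstFile.txt firstFile.txt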
