Shell: How To Remove Duplicate Text Lines

Last updated September 19, 2008

Q. I need to sort data from a log file but there are too many duplicate lines. How do I remove all duplicate lines from a text file under GNU/Linux?

A. You need to use shell pipes along with the following two utilities:

a] sort command – sort lines of text files

b] uniq command – report or omit repeated lines

Removing Duplicate Lines With Sort, Uniq and Shell Pipes

Use the following syntax:
sort {file-name} | uniq -u
sort file.log | uniq -u

Here is a sample test file called garbage.txt:

this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog

Type the following command to get rid of all duplicate lines:
$ sort garbage.txt | uniq -u
Sample output:

food that are killing you
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire

Where,

  • -u : only print unique lines. Lines that appear more than once in the sorted input are omitted entirely, so both the original and its duplicates are removed.
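If you want to keep one copy of each duplicated line instead of dropping it entirely, omit the -u option, or pass -u to sort itself:
$ sort garbage.txt | uniq
OR
$ sort -u garbage.txt
Sample output:

food that are killing you
this is a test
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire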

Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin and a trainer for the Linux operating system/Unix shell scripting. He has worked with global clients and in various industries, including IT, education, defense and space research, and the nonprofit sector. Follow him on Twitter, Facebook, Google+.

36 comments

    1. Note that the command above removes the original *and* the duplicate. Notice in the output that “this is a test” doesn’t show up at all. With 'sort -u filename' one copy of it will remain.


  2. One more approach that keeps the order of lines the same as the input. The good thing about this is that it can also be applied if we need to remove duplicates based on a field or fields, as sketched after the output below.

    $ awk '!x[$0]++' garbage.txt

    Output:
    this is a test
    food that are killing you
    wings of fire
    we hope that the labor spent in creating this software
    unix ips as well as enjoy our blog
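    A minimal sketch of that field-based idea, assuming a comma separated file named data.csv and deduplication on the first field only:

    $ awk -F, '!x[$1]++' data.csv

    The expression x[$1]++ is zero (false) the first time a given first field is seen, so only the first line for each key is printed and later lines with the same first field are skipped.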

  3. Martin 09.20.08 at 8:41 am

    How can I change your example so the output would be as follows (without duplicate lines, but one copy of each duplicated line is still there)?

    this is a test
    food that are killing you
    wings of fire
    we hope that the labor spent in creating this software
    unix ips as well as enjoy our blog

    Thank you.
    Deb 09.20.08 at 10:12 am

    uniq -c will do it Martin.

    uniq -c removes duplicates and leaves one copy of each line that was duplicated, but it also prefixes every line with the number of times it occurred.
    You must use
    uniq
    without any of the options and your job is done.
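    For example, with the garbage.txt sample from the article, the counted output looks like this (so plain uniq, with no options, gives the same list without the counts):

    $ sort garbage.txt | uniq -c
          1 food that are killing you
          2 this is a test
          1 unix ips as well as enjoy our blog
          1 we hope that the labor spent in creating this software
          1 wings of fire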

  4. Does anyone know how I should remove the duplicated lines and also put a list of all the duplicates into a separate text file, so that the output would be:

    garbage.txt:
    food that are killing you
    wings of fire
    we hope that the labor spent in creating this software
    unix ips as well as enjoy our blog

    garbage.duplicates.txt:
    this is a test
    ?

  5. GOT IT:

    # write dup files into text file
    sort file1.txt file2.txt | uniq -d > duplicity.txt
    # remove duplicated records from both files
    for domain in `cat duplicity.txt`; do sed -e "/${domain}/d" -i file1.txt; sed -e "/${domain}/d" -i file2.txt; done
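    A shorter sketch for the single-file question above, keeping the non-duplicated lines in one file and writing one copy of each duplicated line to another (the output names are just examples):

    sort garbage.txt | uniq -u > garbage.unique.txt
    sort garbage.txt | uniq -d > garbage.duplicates.txt

    Here -u keeps only the lines that occur exactly once and -d prints a single copy of each line that occurs more than once.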

  6. How to find the duplicates alone & print them in a text file? For example, I have this in abc.txt:

    abc dr545.xml
    dsf fg456.xml
    abc sfg34.xml

    I need a text file with this output (a possible sketch follows below):

    abc dr545.xml
    abc sfg34.xml
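    A possible sketch, assuming the duplicates should be detected on the first field: the file is read twice, once to count each key and once to print the lines whose key occurs more than once, with the result going to a hypothetical dups.txt:

    awk 'NR==FNR { count[$1]++; next } count[$1] > 1' abc.txt abc.txt > dups.txt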

  7. Thanks for this. It helped me a great deal. For some reason it did leave some duplicates, but it was still hugely helpful; removing a few duplicates manually is much better than turning a 1483-line file into the 379-line file it was supposed to be by hand.

  8. My rows of duplicate text data are comma separated.
    example:
    this,1,country,567,1,1,1,1
    that,2,country,678,2,2,2,2
    this,1,country,567,3,3,3,3

    From the above data, it should check for duplicates based on the first 4 values (this,1,country,567) and also redirect those particular duplicate lines into another file. Please help me; one possible sketch follows below.
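    A possible sketch, assuming a comma separated input named data.csv: lines whose first four fields have already been seen go to dupes.txt, everything else goes to clean.csv (both file names are just placeholders).

    awk -F, '{ key = $1 FS $2 FS $3 FS $4 }
             seen[key]++ { print > "dupes.txt"; next }
             { print }' data.csv > clean.csv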

  9. I have many files with duplicate lines, but instead of removing the duplicates I would like to append numbers to make the entries unique… how might I do this? Could I pipe the uniq output into a sed command perhaps? (One possible awk sketch follows below.)

    cheers,
    naph
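    One way to do that with awk instead of uniq and sed, keeping the first occurrence as-is and suffixing later occurrences with their running count (input.txt is a placeholder name):

    awk '{ n = ++count[$0]; if (n == 1) print; else print $0 "_" n }' input.txt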

  10. This command would have been more logical if it had an ‘-a’ option to print all the lines in a file with the duplicates ‘squashed’ into a single line. Sorry Richard.

    The below “only print unique lines” will omit lines that have duplicates:
    uniq -u

    The below “only print duplicate lines” will omit lines that are unique:
    uniq -d

    I reverted to the below commands to get my ‘-a’ option:

    cat messy_file.txt | uniq -u > out.txt
    cat messy_file.txt | uniq -d >> out.txt

    The number of lines in out.txt should match the below command (a one-step alternative follows after it):

    cat messy_file.txt | uniq -c
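    A one-step way to get the same 'squashed' result, sorting first since uniq only collapses adjacent duplicates:

    sort messy_file.txt | uniq > out.txt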

  11. how do you remove duplicate lines without sorting?
    example
    hi
    hello
    hey
    hello
    hay
    hiy

    And I want my output to remain in the same order, only deleting the duplicate lines.

    Preferred output:
    hi
    hello
    hey
    hay
    hiy

    1. Hi Bryan,

      As far as I understood, you need something like:

      # cat test.lst
      hi
      hello
      hey
      hello
      hay
      hiy

      # cat test.lst | awk '{ if (!seen[$0]++) print }'
      hi
      hello
      hey
      hay
      hiy

      Cheers,
      Dan

  12. Please see the usage below; use it as per your requirement.
    [:ROOT:/home/spachaiy]cat selective.log.Sat -i |grep failed |awk '{print $3 }' | uniq
    dm01/mail/ssamuel.nsf
    dm01/mail/ssoude.nsf
    dm01/mail/stripath1.nsf
    [:ROOT:/home/spachaiy]cat sel.log -i |awk '{print $3 }' | uniq
    dm01/mail/promero.nsf
    dm01/mail/pscnafaxpjsc.nsf
    dm02/mail/pedca/yesalinas.nsf
    [:ROOT:/home/spachaiy]awk '!x[$0]++' sel.log >se.log
    [:ROOT:/home/spachaiy]cat se.log
    Backup of dm01/mail/promero.nsf failed.
    Backup of dm01/mail/pscnafaxpjsc.nsf failed.
    Backup of dm02/mail/pedca/yesalinas.nsf failed.
    [:ROOT:/home/spachaiy]

  13. Piping the output to a newly created file is usually easiest.

    sort firstFile.txt | uniq -u > secondFile.txt

    or you can overwrite the original file by writing to a scratch file first, for example tmpFile.txt (appending with >> to the file you are reading will not rewrite it):

    sort firstFile.txt | uniq -u > tmpFile.txt && mv tmpFile.txt firstFile.txt
