Shell: How To Remove Duplicate Text Lines

September 19, 2008 · last updated September 19, 2008


Q. I need to sort data from a log file but there are too many duplicate lines. How do I remove all duplicate lines from a text file under GNU/Linux?

A. You need to use shell pipes along with the following two utilities:

a] sort command - sort lines of text files

b] uniq command - report or omit repeated lines

Removing Duplicate Lines With Sort, Uniq and Shell Pipes

Use the following syntax:
sort {file-name} | uniq -u
sort file.log | uniq -u

Here is a sample test file called garbage.txt:

this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog

Type the following command to get rid of all duplicate lines:
$ sort garbage.txt | uniq -u
Sample output:

food that are killing you
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire

Where,

  • -u : only print unique lines, i.e. remove every line that appears more than once in the input.
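
Note that -u drops every copy of a line that appears more than once. If you instead want to keep a single copy of each line, run uniq without any option (or use sort -u), as shown below:

sort garbage.txt | uniq -u   # removes all copies of duplicated lines
sort garbage.txt | uniq      # keeps one copy of each line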

32 comments

1 Bahaa Soliman September 20, 2008 at 7:43 am

you can use
command: sort -u filename
it gives you the same result
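
Worth noting: on the sample garbage.txt above the result differs slightly, since sort -u keeps one copy of the duplicated line while uniq -u drops both copies (output order may vary with locale):

$ sort -u garbage.txt
food that are killing you
this is a test
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire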

2 Martin September 20, 2008 at 8:41 am

How can I change your example so the output would be without duplicate lines, but with one copy of each duplicated line still there?


this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog

Thank you.

3 Deb September 20, 2008 at 10:12 am

uniq -c will do it Martin.

4 Jadu Saikia September 21, 2008 at 3:34 am

One more approach, keeping the order of lines the same as in the input. The good thing about this is that it can also be applied when we need to remove duplicates based on a field or fields.

$ awk '!x[$0]++' garbage.txt

Output:
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog
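
For example, to de-duplicate on a single field rather than the whole line (a sketch; the field number and file name are placeholders):

awk '!x[$1]++' data.txt   # keep only the first line seen for each value of field 1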

5 LAMP docs December 3, 2008 at 5:51 am

What will be the command to save the output of this command to the same file? Or maybe a script?

6 dude April 5, 2009 at 8:46 pm

@lamp:
sort -u -o new_file old_file
e.g. sort -u -o moregarbage.txt garbage.txt

7 ramesh October 31, 2012 at 11:56 am

Hi,
Just try this:
sort old_file | uniq -c > new_file

8 dude April 5, 2009 at 8:48 pm

sorry… was going a bit too fast…
sort -u -o garbage.txt garbage.txt
Obviously =)
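
The -o option matters here: sort reads all of its input before opening the output file, so naming the input file as the output is safe, while a plain shell redirection would truncate the file before sort reads it:

sort -u -o garbage.txt garbage.txt   # safe: output file opened only after input is read
sort -u garbage.txt > garbage.txt    # NOT safe: the shell empties garbage.txt first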

9 Mikey July 10, 2009 at 12:33 am

Read the RUTE. Root users Tutorial and Exposition. This teaches you many of the basics of Linux system administration.

10 Amber August 10, 2009 at 6:32 am

Re Martin (comment 2), who asked how to keep one copy of each duplicated line, and Deb (comment 3), who suggested uniq -c:

uniq -c removes duplicates and leaves one copy of each line that was duplicated, but it also prefixes every line with the number of times it occurred. Use uniq without any options and your job is done.
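
In other words, both of the following keep one copy of each line; only the first also prefixes each line with its count:

sort garbage.txt | uniq -c
sort garbage.txt | uniq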

11 Amber August 10, 2009 at 6:33 am

That would be uniq followed by the filename.

12 rh August 31, 2009 at 10:50 am

Does anyone know how I should remove the duplicated lines and also put a list of all the duplicates into a text file, so that the output would be:

garbage.txt:
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog

garbage.duplicates.txt:
this is a test
?

13 rh August 31, 2009 at 2:10 pm

GOT IT:

# write dup files into text file
sort file1.txt file2.txt | uniq -d > duplicity.txt
# remove duplicated records from both files
for domain in `cat duplicity.txt`; do sed -e "/${domain}/d" -i file1.txt; sed -e "/${domain}/d" -i file2.txt; done
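
One caveat: the sed loop treats each duplicated line as a regular expression (and the for loop splits on whitespace), so it can misfire on lines containing spaces or characters such as / or . — a fixed-string, whole-line alternative using the same duplicity.txt might look like this:

grep -F -x -v -f duplicity.txt file1.txt > file1.clean && mv file1.clean file1.txt
grep -F -x -v -f duplicity.txt file2.txt > file2.clean && mv file2.clean file2.txt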

14 Nathan June 8, 2010 at 4:53 pm

Thank you so much, this has helped me big time!

15 Vivek June 27, 2010 at 6:36 am

How do I find the duplicates alone and print them to a text file? For example, I have this in abc.txt:

abc dr545.xml
dsf fg456.xml
abc sfg34.xml

I need a text file with this output:

abc dr545.xml
abc sfg34.xml
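
One way to do this is a two-pass awk over the same file (a sketch; the file name and the space-separated first field are assumptions based on the example above):

awk 'NR==FNR { count[$1]++; next } count[$1] > 1' abc.txt abc.txt > dups.txt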

16 stacy October 28, 2010 at 1:27 am

Just redirect the output to a new filename:

sort abc.txt | uniq > abc_1.txt

17 Pranab Rana June 6, 2011 at 12:20 pm

Thanks for this post.

18 Joseph July 30, 2010 at 8:59 pm

Thanks for this. Helped me a great deal. For some reason it did leave a few duplicates, but it was still hugely helpful; removing a few duplicates by hand is much better than manually turning a 1483-line file into the 379-line file it was supposed to be without duplicates.

19 stacy October 28, 2010 at 1:22 am

Thanks for the help!! The uniq command took a 1372 line file down to a 32 line file, which was much less daunting!

20 stacy October 28, 2010 at 1:24 am

Also I needed both files, so my command was more like this:

sort garbage.file | uniq > garbage_no_duplicates.file

21 web designing chennai August 11, 2011 at 4:58 am

Thanks and it helped me a lot.

22 Rakesh August 17, 2011 at 4:56 pm

I want to redirect only duplicate rows into another file, anyone help me pls?

23 Rakesh August 17, 2011 at 5:01 pm

By rows I mean duplicate text data that is comma separated,
example:
this,1,country,567,1,1,1,1
that,2,country,678,2,2,2,2
this,1,country,567,3,3,3,3

From the above data, it should check for duplicate data up to the first 4 values (this,1,country,567) and also redirect those duplicate lines into another file. Please help me.
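
A sketch using a two-pass awk keyed on the first four comma-separated fields (file names are placeholders):

awk -F, 'NR==FNR { k[$1 FS $2 FS $3 FS $4]++; next } k[$1 FS $2 FS $3 FS $4] > 1' data.csv data.csv > dups.csv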

24 naph March 21, 2012 at 8:13 am

I have many files with duplicate lines, but instead of removing the duplicates I would like to append numbers to make the entries unique… how might I do this? Could I pipe the uniq output into a sed command perhaps?

cheers,
naph
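
One possible sketch, leaving the first occurrence untouched and appending a running number to the second and later copies (whether the first copy should also be numbered is an open assumption; file names are placeholders):

awk '{ n = ++seen[$0]; if (n > 1) print $0 "-" n; else print }' input.txt > output.txt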

25 John McLean June 3, 2012 at 12:48 pm

This command would have been more logical if it had an ‘-a’ option to print all the lines in a file with the duplicates ‘squashed’ into a single line. Sorry Richard.

The below (“only print unique lines”) will omit lines that have duplicates.
uniq -u

The below (“only print duplicate lines”) will omit lines that are unique.
uniq -d

I reverted to the below commands to get my ‘-a’ option:

cat messy_file.txt | uniq -u > out.txt
cat messy_file.txt | uniq -d >> out.txt

The number of lines in out.txt should match the number of lines printed by the below command:

cat messy_file.txt | uniq -c
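
For what it's worth, uniq with no options already gives the '-a' behaviour described above, provided the duplicates are adjacent (i.e. the file is sorted): every line is printed once, with duplicates squashed into a single line, and the original order is preserved:

uniq messy_file.txt > out.txt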

26 miezu July 11, 2012 at 10:58 am

cat cucu.txt | sort | uniq > cucu2.txt

27 bryan February 8, 2013 at 8:11 am

how do you remove duplicate lines without sorting?
example
hi
hello
hey
hello
hay
hiy

And I want my output to remain in the same format, only deleting the duplicate lines.

Preferred output:
hi
hello
hey
hay
hiy

28 dan September 26, 2013 at 10:44 am

Hi Bryan,

As far as I understood, you need something like:

[root@test tmp]# cat test.lst
hi
hello
hey
hello
hay
hiy

[root@test tmp]# cat test.lst | awk '{ if (!seen[$0]++) print}'
hi
hello
hey
hay
hiy
[root@test tmp]#

Cheers,
Dan

29 vijayant June 26, 2013 at 7:46 am

I want to keep the duplicate rows in the file and add a new column to the file with the MD5 value corresponding to each duplicate record.
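
A rough sketch that appends an MD5 column to every line (narrowing it to only the duplicated rows, e.g. by extracting them first with sort | uniq -d, is left open since the exact requirement is not clear; file names are placeholders):

while IFS= read -r line; do
  printf '%s %s\n' "$line" "$(printf '%s' "$line" | md5sum | awk '{print $1}')"
done < file.txt > file_with_md5.txt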

30 sakthi July 4, 2013 at 1:20 pm

Please see the below and use as per your requirement.
[:ROOT:/home/spachaiy]cat selective.log.Sat -i |grep failed |awk '{print $3 }' | uniq
dm01/mail/ssamuel.nsf
dm01/mail/ssoude.nsf
dm01/mail/stripath1.nsf
[:ROOT:/home/spachaiy]cat sel.log -i |awk '{print $3 }' | uniq
dm01/mail/promero.nsf
dm01/mail/pscnafaxpjsc.nsf
dm02/mail/pedca/yesalinas.nsf
[:ROOT:/home/spachaiy]awk '!x[$0]++' sel.log >se.log
[:ROOT:/home/spachaiy]cat se.log
Backup of dm01/mail/promero.nsf failed.
Backup of dm01/mail/pscnafaxpjsc.nsf failed.
Backup of dm02/mail/pedca/yesalinas.nsf failed.
[:ROOT:/home/spachaiy]

31 rohit November 15, 2013 at 2:49 am

How do I write a command to duplicate each line in a file?
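
If the goal is to print every line twice, either of these works (file name is a placeholder):

sed 'p' file.txt                  # p prints the pattern space, then sed prints it again automatically
awk '{ print; print }' file.txt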

32 Si Shangase November 18, 2013 at 11:29 am

Redirecting the output to a newly created file is usually easiest.

sort firstFile.txt | uniq -u > secondFile.txt

or you can rewrite the current file via a temporary file:

sort firstFile.txt | uniq -u > tmpFile.txt && mv tmpFile.txt firstFile.txt
