Shell: How To Remove Duplicate Text Lines

by Vivek Gite · 12 comments

Q. I need to sort data from a log file but there are too many duplicate lines. How do I remove all duplicate lines from a text file under GNU/Linux?

A. You need to use shell pipes along with the following two utilities:

a] sort command - sort lines of text files

b] uniq command - report or omit repeated lines

Removing Duplicate Lines With Sort, Uniq and Shell Pipes

Use the following syntax:
sort {file-name} | uniq -u
sort file.log | uniq -u

Here is a sample test file called garbage.txt:

this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog

Type the following command to get rid of every line that is duplicated (all copies of it are removed):
$ sort garbage.txt | uniq -u
Sample output:

food that are killing you
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire

Where,

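  • -u : only print unique lines, i.e. lines that are not repeated in the input. Every copy of a repeated line is dropped, which is why "this is a test" is missing from the output. uniq only compares adjacent lines, so the file is sorted first to bring identical lines together.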
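To see how -u differs from uniq's other modes, here is a small sketch that recreates garbage.txt and runs uniq three ways. It assumes a POSIX shell with the standard sort and uniq utilities; the echo lines are only there to label each result.

```shell
# Recreate the sample file from the article.
printf '%s\n' \
  'this is a test' \
  'food that are killing you' \
  'wings of fire' \
  'we hope that the labor spent in creating this software' \
  'this is a test' \
  'unix ips as well as enjoy our blog' > garbage.txt

# Default: collapse each run of identical lines into a single copy.
echo '--- sort | uniq'
sort garbage.txt | uniq

# -d: print one copy of each line that IS repeated.
echo '--- sort | uniq -d'
sort garbage.txt | uniq -d

# -u: print only the lines that are NOT repeated.
echo '--- sort | uniq -u'
sort garbage.txt | uniq -u
```

In short: plain uniq keeps one copy of everything, -d lists only the duplicated lines, and -u keeps only the lines that never repeat.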

Comments

1 Bahaa Soliman 09.20.08 at 7:43 am

you can use
command: sort -u filename
it gives you the same result
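Note that sort -u is not quite equivalent to the article's sort | uniq -u: sort -u keeps one copy of every line (like sort | uniq), while uniq -u drops every copy of a repeated line. A quick sketch to compare them, assuming the standard sort, uniq and cmp utilities; the output file names are made up for the demo.

```shell
# Recreate the article's sample file.
printf '%s\n' \
  'this is a test' \
  'food that are killing you' \
  'wings of fire' \
  'we hope that the labor spent in creating this software' \
  'this is a test' \
  'unix ips as well as enjoy our blog' > garbage.txt

# sort -u keeps ONE copy of each line, same as sort | uniq.
sort -u garbage.txt > sort_u.txt
sort garbage.txt | uniq > sort_uniq.txt

# sort | uniq -u drops ALL copies of a repeated line.
sort garbage.txt | uniq -u > uniq_u.txt

cmp -s sort_u.txt sort_uniq.txt && echo 'sort -u matches sort | uniq'
cmp -s sort_u.txt uniq_u.txt || echo 'sort -u differs from sort | uniq -u'
```

The difference is the single line "this is a test": sort -u keeps one copy of it, uniq -u removes it entirely.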

2 Martin 09.20.08 at 8:41 am

How can I change your example so the output has no duplicate lines but still keeps one copy of each duplicated line?


this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog

Thank you.

3 Deb 09.20.08 at 10:12 am

uniq -c will do it Martin.

4 Jadu Saikia 09.21.08 at 3:34 am

One more approach, which keeps the lines in the same order as the input. The good thing about it is that it can also be applied if we need to remove duplicates based on a field or fields.

$ awk '!x[$0]++' garbage.txt

Output:
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog
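Jadu's one-liner can be sketched a little further. x[$0]++ evaluates to 0 (false) the first time a line is seen, so !x[$0]++ is true only for the first occurrence and awk prints it. Keying the array on a field instead of the whole line gives the field-based variant; users.csv and its contents are hypothetical sample data made up for this demo.

```shell
# Recreate the article's sample file.
printf '%s\n' \
  'this is a test' \
  'food that are killing you' \
  'wings of fire' \
  'we hope that the labor spent in creating this software' \
  'this is a test' \
  'unix ips as well as enjoy our blog' > garbage.txt

# Keep the first occurrence of each whole line, preserving input order.
awk '!x[$0]++' garbage.txt

# Hypothetical comma-separated file: keep the first row for each value of field 1.
printf '%s\n' 'alice,admin' 'bob,staff' 'alice,staff' > users.csv
awk -F, '!seen[$1]++' users.csv
```

The second command keeps "alice,admin" and "bob,staff" and drops the later "alice,staff" row, since only field 1 is compared.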

5 LAMP docs 12.03.08 at 5:51 am

What command would save the output back to the same file? Or would that need a script?

6 dude 04.05.09 at 8:46 pm

@lamp:
sort -u -o new_file old_file
e.g. sort -u -o moregarbage.txt garbage.txt

7 dude 04.05.09 at 8:48 pm

sorry… was going a bit too fast…
sort -u -o garbage.txt garbage.txt
Obviously =)
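The sort -o trick works because POSIX sort allows the -o output file to be one of its input files. The article's uniq -u version has no such option, and redirecting straight back to the input is unsafe: the shell truncates the file while setting up the redirection, so sort may read an empty file. A minimal sketch using a temporary file instead; the name garbage.txt.new is an arbitrary choice for illustration.

```shell
# Recreate the article's sample file.
printf '%s\n' \
  'this is a test' \
  'food that are killing you' \
  'wings of fire' \
  'we hope that the labor spent in creating this software' \
  'this is a test' \
  'unix ips as well as enjoy our blog' > garbage.txt

# Unsafe: sort garbage.txt | uniq -u > garbage.txt
# Instead, write to a temporary file (hypothetical name) and replace the
# original only if the pipeline succeeded.
sort garbage.txt | uniq -u > garbage.txt.new && mv garbage.txt.new garbage.txt
cat garbage.txt
```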

8 Mikey 07.10.09 at 12:33 am

Read RUTE, the Rute User's Tutorial and Exposition. It teaches many of the basics of Linux system administration.

9 Amber 08.10.09 at 6:32 am

@Martin, @Deb:

uniq -c removes duplicates and leaves one copy of each line that was duplicated, but it also prefixes each line with the number of times it occurred.
You must use
uniq
without any of the options and your job is done.
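For reference, a sketch of what uniq -c actually prints: one copy of each line, prefixed by how many times it occurred in the sorted input. The width of the count column differs between implementations, so the pattern below allows any leading spaces. Piping the result through sort -rn is a common way to list lines by frequency.

```shell
# Recreate the article's sample file.
printf '%s\n' \
  'this is a test' \
  'food that are killing you' \
  'wings of fire' \
  'we hope that the labor spent in creating this software' \
  'this is a test' \
  'unix ips as well as enjoy our blog' > garbage.txt

# One copy of each line, prefixed with its occurrence count.
sort garbage.txt | uniq -c

# Most frequent lines first.
sort garbage.txt | uniq -c | sort -rn
```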

10 Amber 08.10.09 at 6:33 am

that would be uniq followed by filename.
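One caveat with running uniq directly on the file: uniq only compares each line with the line immediately before it. In garbage.txt the two copies of "this is a test" are not adjacent, so both survive unless the file is sorted first, as this sketch shows.

```shell
# Recreate the article's sample file.
printf '%s\n' \
  'this is a test' \
  'food that are killing you' \
  'wings of fire' \
  'we hope that the labor spent in creating this software' \
  'this is a test' \
  'unix ips as well as enjoy our blog' > garbage.txt

# No two identical lines are next to each other, so nothing is removed.
uniq garbage.txt | wc -l          # 6 lines

# Sorting brings the identical lines together, so uniq can collapse them.
sort garbage.txt | uniq | wc -l   # 5 lines
```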

11 rh 08.31.09 at 10:50 am

Does anyone know how I can remove both copies of each duplicated line and put a list of all the duplicates into a text file, so that the output would be:

garbage.txt:
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog

garbage.duplicates.txt:
this is a test
?

12 rh 08.31.09 at 2:10 pm

GOT IT:

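# write duplicate lines into a text file
sort file1.txt file2.txt | uniq -d > duplicity.txt
# remove the duplicated records from both files
# (read whole lines, not words, and anchor the pattern so only exact lines match;
#  each line is still a sed regex, so characters such as . and / are special)
while IFS= read -r domain; do sed -i -e "/^${domain}\$/d" file1.txt file2.txt; done < duplicity.txt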
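A loop like rh's treats each duplicated line as a sed regular expression. A sketch of an alternative that matches whole lines as fixed strings instead, assuming grep supports -v, -x, -F and -f (all POSIX options). The contents of file1.txt and file2.txt are made-up sample data for the demo, and the .new suffix is an arbitrary temporary name.

```shell
# Hypothetical sample inputs that share one line.
printf '%s\n' 'example.com' 'foo.org' > file1.txt
printf '%s\n' 'example.com' 'bar.net' 'www.example.com' > file2.txt

# Lines that appear more than once across both files.
sort file1.txt file2.txt | uniq -d > duplicity.txt

# -F: patterns are fixed strings, -x: match whole lines only,
# -v: keep the lines that do NOT match, -f: read patterns from a file.
for f in file1.txt file2.txt; do
  grep -vxFf duplicity.txt "$f" > "$f.new"
  # grep exits 1 when it prints nothing (not an error here), 2 on real errors.
  [ $? -le 1 ] && mv "$f.new" "$f"
done
```

Because -x requires a whole-line match, www.example.com stays in file2.txt even though it contains example.com.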
