Q. I need to sort data from a log file but there are too many duplicate lines. How do I remove all duplicate lines from a text file under GNU/Linux?
A. You need to use shell pipes along with the following two utilities:
a] sort command - sort lines of text files
b] uniq command - report or omit repeated lines
Removing Duplicate Lines With Sort, Uniq and Shell Pipes
Use the following syntax:
sort {file-name} | uniq -u
sort file.log | uniq -u
Here is a sample test file called garbage.txt:
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog
Type the following command to get rid of all duplicate lines:
$ sort garbage.txt | uniq -u
Sample output:
food that are killing you
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire
Where,
- -u : print only the lines that are not repeated in the sorted input; every copy of a duplicated line is removed.
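To see why "this is a test" disappears from the sample output, it can help to count occurrences first. The following is a minimal sketch that recreates the sample file in a scratch directory; the workdir variable and the unique_only.txt file name are assumptions made for illustration only.

```shell
# Recreate the sample file from the article in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/garbage.txt" <<'EOF'
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog
EOF

# uniq -c prefixes each distinct line with the number of times it occurs.
sort "$workdir/garbage.txt" | uniq -c

# uniq -u keeps only the lines that occur exactly once,
# so both copies of "this is a test" are dropped.
sort "$workdir/garbage.txt" | uniq -u > "$workdir/unique_only.txt"
cat "$workdir/unique_only.txt"
```

The counts make it easy to check which lines uniq -u will drop before running it on a real log file.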
- Last Updated: 09/19/08



{ 12 comments }
you can also use
command: sort -u filename
it gives a similar result, but keeps one copy of each duplicated line instead of dropping them all
How can I change your example so the output would be as follows (without duplicate lines, but with one copy of each duplicate still there)?
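The two commands are easy to confuse, so here is a small side-by-side sketch. The three-line input and the variable names are hypothetical and exist only for this comparison.

```shell
# Hypothetical three-line input with one duplicated line.
tmp=$(mktemp)
printf 'b\na\nb\n' > "$tmp"

# sort -u keeps one copy of every distinct line.
deduped=$(sort -u "$tmp")

# sort | uniq -u keeps only the lines that were never repeated.
never_repeated=$(sort "$tmp" | uniq -u)

printf 'sort -u:\n%s\n' "$deduped"
printf 'sort | uniq -u:\n%s\n' "$never_repeated"
rm -f "$tmp"
```

With this input, sort -u prints a and b, while sort | uniq -u prints only a.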
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog
Thank you.
uniq -c will do it Martin.
One more approach, which keeps the lines in the same order as the input. The good thing about this is that it can also be applied if we need to remove duplicates based on a field or fields.
$ awk '!x[$0]++' garbage.txt
Output:
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog
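The field-based variant mentioned in this comment could look roughly like the sketch below. The comma-separated host,status input and the variable names are assumptions for illustration; only the awk idiom itself comes from the comment.

```shell
# Hypothetical comma-separated log: host,status
tmp=$(mktemp)
printf 'web1,up\ndb1,up\nweb1,down\ncache1,up\ndb1,down\n' > "$tmp"

# Keep the first line seen for each value of field 1; input order is preserved.
# seen[$1]++ is 0 (false) the first time a key appears, so !seen[$1]++ is true only then.
first_per_host=$(awk -F, '!seen[$1]++' "$tmp")

printf '%s\n' "$first_per_host"
rm -f "$tmp"
```

Replacing $1 with another field, or with a combination such as $1 FS $2, changes the key that decides what counts as a duplicate.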
What will be the command to save the output of this command to the same file? Or maybe a script?
@lamp:
sort -u -o new_file old_file
e.g. sort -u -o moregarbage.txt garbage.txt
sorry… was going a bit too fast…
sort -u -o garbage.txt garbage.txt
Obviously =)
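sort can write back to its own input with -o because it reads all of the input before it opens the output file. A plain shell redirect does not offer that guarantee, so an order-preserving awk filter probably needs a temporary file. A minimal sketch, assuming a hypothetical data.txt in a scratch directory:

```shell
# Hypothetical input file in a scratch directory.
workdir=$(mktemp -d)
printf 'b\na\nb\nc\na\n' > "$workdir/data.txt"

# Write to a temporary file first, then replace the original only if awk succeeded.
# Redirecting straight to data.txt would truncate it before awk could read it.
awk '!seen[$0]++' "$workdir/data.txt" > "$workdir/data.txt.tmp" &&
  mv "$workdir/data.txt.tmp" "$workdir/data.txt"

cat "$workdir/data.txt"
```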
Read RUTE, the Rute User's Tutorial and Exposition. It teaches you many of the basics of Linux system administration.
Re: Martin (09.20.08 at 8:41 am) and Deb (09.20.08 at 10:12 am):
uniq -c removes duplicates and leaves one of the lines that have been duplicated. But it also prefixes each line with the number of times that line occurred.
You must use
uniq
without any of the options and your job is done. Note that uniq only collapses duplicates that are next to each other, so the input has to be sorted first.
That would be sort filename | uniq.
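uniq only compares neighbouring lines, which is why it is normally fed sorted input. A small sketch with a hypothetical four-line file shows the difference:

```shell
# Hypothetical input where the duplicates of x are not all adjacent.
tmp=$(mktemp)
printf 'x\ny\nx\nx\n' > "$tmp"

# Unsorted: only the adjacent pair of x lines collapses, so x still appears twice.
unsorted_result=$(uniq "$tmp")

# Sorted first: all copies of x are adjacent, so one x and one y remain.
sorted_result=$(sort "$tmp" | uniq)

printf 'uniq alone:\n%s\n' "$unsorted_result"
printf 'sort | uniq:\n%s\n' "$sorted_result"
rm -f "$tmp"
```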
Does anyone know how I can remove every copy of the duplicated lines and also put a list of all the duplicates into a text file? The output should look like this:
garbage.txt:
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog
garbage.duplicates.txt:
this is a test
GOT IT:
# write the lines that occur more than once across both files into a text file
sort file1.txt file2.txt | uniq -d > duplicity.txt
# remove those records from both files
# (grep is used instead of sed here: -F matches literally, -x matches whole lines only)
for f in file1.txt file2.txt; do
  grep -vxFf duplicity.txt "$f" > "$f.tmp"
  mv "$f.tmp" "$f"
done
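For the single-file case in the question above, a two-pass awk run is one possible way to get both lists while keeping the original line order. This is only a sketch: the workdir variable and the garbage.unique.txt name are assumptions, and the sample lines are the ones from the article.

```shell
# Recreate the sample file in a scratch directory.
workdir=$(mktemp -d)
in="$workdir/garbage.txt"
printf '%s\n' \
  'this is a test' \
  'food that are killing you' \
  'wings of fire' \
  'we hope that the labor spent in creating this software' \
  'this is a test' \
  'unix ips as well as enjoy our blog' > "$in"

# The file is read twice. Pass 1 (NR==FNR) counts every line;
# pass 2 prints the lines that occurred exactly once, in input order.
awk 'NR==FNR { count[$0]++; next } count[$0] == 1' "$in" "$in" \
  > "$workdir/garbage.unique.txt"

# Same two passes, but print each repeated line once.
awk 'NR==FNR { count[$0]++; next } count[$0] > 1 && !printed[$0]++' "$in" "$in" \
  > "$workdir/garbage.duplicates.txt"

cat "$workdir/garbage.unique.txt"
cat "$workdir/garbage.duplicates.txt"
```

Once the result looks right, garbage.unique.txt can be moved over garbage.txt with mv.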