Q. I need to sort data from a log file but there are too many duplicate lines. How do I remove all duplicate lines from a text file under GNU/Linux?
A. You need to use shell pipes along with the following two utilities:
a] sort command - sort lines of text files
b] uniq command - report or omit repeated lines
Removing Duplicate Lines With Sort, Uniq and Shell Pipes
Use the following syntax:
sort {file-name} | uniq -u
sort file.log | uniq -u
Here is a sample test file called garbage.txt:
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
this is a test
unix ips as well as enjoy our blog
Type the following command to get rid of all duplicate lines:
$ sort garbage.txt | uniq -u
Sample output:
food that are killing you
unix ips as well as enjoy our blog
we hope that the labor spent in creating this software
wings of fire
Where,
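- -u : only print unique lines, i.e. lines that are not repeated in the sorted input. Every copy of a duplicated line is removed, which is why "this is a test" does not appear in the output at all.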
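The three common uniq variants behave differently on the sample file, so a side-by-side sketch may help. It assumes GNU coreutils and recreates garbage.txt in a scratch directory made with mktemp (the scratch directory and the variable names are only for this demo):

```shell
# Recreate the sample file in a scratch directory (assumption for this demo).
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'this is a test' \
  'food that are killing you' \
  'wings of fire' \
  'we hope that the labor spent in creating this software' \
  'this is a test' \
  'unix ips as well as enjoy our blog' > garbage.txt

# 1) Drop every line that occurs more than once (the command used above).
only_unique=$(sort garbage.txt | uniq -u)

# 2) Keep exactly one copy of each line.
one_copy=$(sort garbage.txt | uniq)

# 3) Show only the lines that were duplicated, once each.
only_dups=$(sort garbage.txt | uniq -d)

printf '%s\n' '--- uniq -u ---' "$only_unique"
printf '%s\n' '--- uniq ---' "$one_copy"
printf '%s\n' '--- uniq -d ---' "$only_dups"
```

If the goal is to keep one copy of each line rather than to drop duplicated lines entirely, the second form (or sort -u) is probably the one you want.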
{ 27 comments… read them below or add one }
you can use
command: sort -u filename
it gives you the same result
How can I change your example so that the output keeps one copy of each duplicated line, like this?
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog
Thank you.
uniq -c will do it Martin.
One more approach keeping the order of lines same as input. The good thing about this is that it can be applied if we need to remove duplicate based on a field or fields.
$ awk '!x[$0]++' garbage.txt
Output:
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog
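As a rough sketch of the field-based variant mentioned above: using $1 instead of $0 as the array key keeps the first line seen for each value of the first field. The file hits.txt and its contents are made up for illustration.

```shell
# Hypothetical "host path" data (not from the article), in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'alpha /index.html' \
  'beta /about.html' \
  'alpha /contact.html' \
  'gamma /index.html' > hits.txt

# seen[$1]++ evaluates to 0 (false) the first time a key appears, so
# !seen[$1]++ is true only for the first line with that first field.
# Input order is preserved because nothing is sorted.
first_per_host=$(awk '!seen[$1]++' hits.txt)
printf '%s\n' "$first_per_host"
```

To key on several fields, the same idea should work with a combined key such as seen[$1 FS $2]++.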
What will be the command to save the output of this command to the same file? Or maybe a script?
@lamp:
sort -u -o new_file old_file
e.g. sort -u -o moregarbage.txt garbage.txt
hi ,
Just try this….:#
sort old_fil | new_file -c
sorry… was going a bit too fast…
sort -u -o garbage.txt garbage.txt
Obviously =)
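A small sketch of why -o matters here, assuming a throwaway file in a scratch directory: redirecting with > onto the input file lets the shell truncate it before sort reads anything, while -o only writes after all input has been read.

```shell
# Throwaway test data in a scratch directory (assumption for this demo).
cd "$(mktemp -d)" || exit 1
printf '%s\n' b a b c a > garbage.txt
cp garbage.txt copy.txt

# Unsafe: the shell empties copy.txt before sort opens it for reading.
sort -u copy.txt > copy.txt
[ -s copy.txt ] || echo 'copy.txt is now empty'

# Safe: sort reads the whole input first, then overwrites the same file.
sort -u -o garbage.txt garbage.txt
deduped=$(cat garbage.txt)
printf '%s\n' "$deduped"
```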
Read RUTE (Rute User's Tutorial and Exposition). It teaches you many of the basics of Linux system administration.
Martin 09.20.08 at 8:41 am
How can I change your example so that the output keeps one copy of each duplicated line, like this?
this is a test
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog
Thank you.
3 Deb 09.20.08 at 10:12 am
uniq -c will do it Martin.
uniq -c removes duplicates and leaves one of the lines that have been duplicated. But it also prefixes each line with the number of duplicates that have been removed.
You must use
uniq
without any of the options and your job is done.
that would be uniq followed by filename.
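One caveat, shown as a minimal sketch with made-up data: uniq only compares adjacent lines, so running it on an unsorted file leaves non-adjacent duplicates in place. The file name fruit.txt is just an example.

```shell
# Made-up data in a scratch directory (assumption for this demo).
cd "$(mktemp -d)" || exit 1
printf '%s\n' apple pear apple apple > fruit.txt

# Unsorted input: only the adjacent pair of "apple" lines is collapsed,
# so "apple" still appears twice.
unsorted=$(uniq fruit.txt)

# Sorted input: all copies are adjacent, so each line is kept once.
sorted=$(sort fruit.txt | uniq)

printf '%s\n' '--- uniq ---' "$unsorted" '--- sort | uniq ---' "$sorted"
```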
Does anyone know how I can remove every copy of the duplicated lines and write a list of all duplicates to a separate text file, so that the output looks like this:
garbage.txt:
food that are killing you
wings of fire
we hope that the labor spent in creating this software
unix ips as well as enjoy our blog
garbage.duplicates.txt:
this is a test
?
GOT IT:
# write the duplicated lines into a text file
sort file1.txt file2.txt | uniq -d > duplicity.txt
# remove the duplicated records from both files
for domain in `cat duplicity.txt`; do sed -e "/${domain}/d" -i file1.txt; sed -e "/${domain}/d" -i file2.txt; done
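The loop above splits duplicity.txt on whitespace and treats each word as a regular expression, so a dot matches any character and partial matches are deleted too. A possible alternative, sketched here with made-up domain names, is to let grep do whole-line, fixed-string matching. The .tmp file names are just a convention for this example.

```shell
# Made-up sample data in a scratch directory (assumption for this demo).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'example.com' 'only-in-one.org' > file1.txt
printf '%s\n' 'example.com' 'only-in-two.net' > file2.txt

# Lines that occur more than once across both files.
sort file1.txt file2.txt | uniq -d > duplicity.txt

# -F: fixed strings, -x: match whole lines only,
# -v: print non-matching lines, -f: read the patterns from a file.
for f in file1.txt file2.txt; do
  grep -vxFf duplicity.txt "$f" > "$f.tmp"
  mv "$f.tmp" "$f"
done

cat file1.txt file2.txt
```

Note that grep exits with a non-zero status when no lines survive, which matters if the script runs under set -e.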
Thank you so much, this has helped me big time!
How do I find only the duplicates and print them to a text file? For example, I have this in abc.txt:
abc dr545.xml
dsf fg456.xml
abc sfg34.xml
I need a text file with this output:
abc dr545.xml
abc sfg34.xml
just pipe the output to a new filename:
sort abc.txt | uniq > abc_1.txt
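Note that sort | uniq keeps one copy of every line, so it would not produce the output asked for above, where lines count as duplicates when their first field repeats. One way to get that, as a sketch, is a two-pass awk over the same file. The output name abc_dups.txt is made up.

```shell
# Recreate abc.txt from the question in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'abc dr545.xml' 'dsf fg456.xml' 'abc sfg34.xml' > abc.txt

# The file is read twice. While NR==FNR (first pass) we only count how often
# each first field occurs. In the second pass we print the lines whose first
# field was counted more than once.
awk 'NR==FNR { count[$1]++; next } count[$1] > 1' abc.txt abc.txt > abc_dups.txt
cat abc_dups.txt
```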
Thanks for this post.
Thanks for this. Helped me a great deal. For some reason it did leave some duplicates, but it was still hugely helpful: removing a few duplicates manually is much better than manually trimming a 1483-line file down to the 379 lines it was supposed to have without duplicates.
Thanks for the help!! the uniq command took a 1372 line file down to a 32 line file, which was much less daunting!
Also, I needed both files, so my command was more like this:
sort garbage.file | uniq > garbage_no_duplicates.file
Thanks and it helped me a lot.
I want to redirect only the duplicate rows into another file. Can anyone help me, please?
By rows I mean the duplicate text data, which is comma separated.
Example:
this,1,country,567,1,1,1,1
that,2,country,678,2,2,2,2
this,1,country,567,3,3,3,3
From the above data, it should check for duplicates on the first 4 values (this,1,country,567) and redirect those duplicate lines into another file. Please help me.
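A possible single-pass approach with awk, sketched under the assumption that every copy of a duplicated row (including the first one) should land in the other file. The file names data.csv and duplicates.csv are made up.

```shell
# Recreate the sample rows from the question in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' \
  'this,1,country,567,1,1,1,1' \
  'that,2,country,678,2,2,2,2' \
  'this,1,country,567,3,3,3,3' > data.csv

# Build a key from the first four comma-separated fields. The first row for a
# key is held back; it is printed only once a second row with the same key
# shows up, followed by that row and any later ones.
awk -F, '
{
  key = $1 FS $2 FS $3 FS $4
  n = ++count[key]
  if (n == 1) {
    first[key] = $0          # remember it; it may turn out to be unique
  } else {
    if (n == 2) print first[key]
    print
  }
}' data.csv > duplicates.csv

cat duplicates.csv
```

Because it reads the input only once, the same awk program should also work at the end of a pipe.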
I have many files with duplicate lines, but instead of removing the duplicates I would like to append numbers to make the entries unique. How might I do this? Could I pipe the uniq output into a sed command, perhaps?
cheers,
naph
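One way this could be done without uniq or sed is a per-line counter in awk. This is only a sketch: the "_N" suffix format and the file name names.txt are assumptions, and a generated name could still collide with a line that already ends in such a suffix.

```shell
# Made-up entries in a scratch directory (assumption for this demo).
cd "$(mktemp -d)" || exit 1
printf '%s\n' naph cheers naph naph > names.txt

# Count how many times each line has been seen so far. The first occurrence
# is printed unchanged; later ones get "_<occurrence number>" appended.
numbered=$(awk '{ n = ++seen[$0]; if (n > 1) print $0 "_" n; else print }' names.txt)
printf '%s\n' "$numbered"
```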
This command would have been more logical if it had an ‘-a’ option to print all the lines in a file with the duplicates ‘squashed’ into a single line. Sorry Richard.
The below, “only print unique lines”, will omit lines that have duplicates.
uniq -u
The below, “only print duplicate lines”, will omit lines that are unique.
uniq -d
I reverted to the below commands to get my ‘-a’ option:
cat messy_file.txt | uniq -u > out.txt
cat messy_file.txt | uniq -d >> out.txt
The number of lines in out.txt should match the below command:
cat messy_file.txt | uniq -c
cat cucu.txt | sort | uniq > cucu2.txt
How do you remove duplicate lines without sorting?
Example:
hi
hello
hey
hello
hay
hiy
I want my output to keep the same order, with only the duplicate lines deleted.
Preferred output:
hi
hello
hey
hay
hiy
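The awk one-liner posted earlier in this thread appears to cover this case. A minimal check against the example above, with greetings.txt as a made-up file name:

```shell
# Recreate the example input in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' hi hello hey hello hay hiy > greetings.txt

# Print a line only the first time it is seen; no sorting is involved,
# so the original order is kept.
in_order=$(awk '!seen[$0]++' greetings.txt)
printf '%s\n' "$in_order"
```

The trade-off is memory: awk keeps every distinct line in an array, which may matter for very large files.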