Mysql remove duplicate data or rows with DISTINCT

by Vivek Gite · 22 comments

Q. How do I remove duplicate data or rows from a MySQL query?

A. If output from a MySQL query contains duplicate data or row and if you would like to remove the same use DISTINCT.

DISTINCT combined with ORDER BY needs a temporary table in many cases.
Consider following query:
mysql> SELECT firstname FROM address;

If you need unique firstname i.e remove duplicate contains type the command:
mysql> SELECT DISTINCT firstname FROM address;

You can use DISTINCT with multiple column as follows:
mysql> SELECT DISTINCT firstname, city FROM address;

For DISTINCT optimization please see this help page.

Featured Articles:

Want to read Linux tips and tricks, but don't have time to check our blog everyday? Subscribe to our daily email newsletter to make sure you don't miss a single tip/tricks. Subscribe to our weekly newsletter here!

{ 22 comments… read them below or add one }

1 jafar 01.21.08 at 6:03 pm


delete from table1
USING table1, table1 as vtable
WHERE (NOT table1.ID=vtable.ID)
AND (table1.field_name=vtable.field_name)

1. Here you tell mysql that there is a table1.
2. Then you tell it that you will use table1 and a virtual table with the values of table1.
3. This will let mysql not compare a record with itself!
4. Here you tell it that there shouldn’t be records with the same field_name.

2 jafar 01.21.08 at 6:22 pm

The DISTINCT command is not working somehow!
Also the delete rule what I send early is not good enough:
It deletes all values which are duplicate,
instead of leaving one behind!

3 Glen Rubin 02.24.08 at 12:29 am

I have fixed this problem and have a tutorial on my website: http://www.weareallhumansnow.com

4 Amit Vyas 03.03.08 at 10:24 am

select count(*) as num, name,id from table group by name having count(*)>1;

5 mikemon 03.25.08 at 6:05 pm

The first answer by jafar is close, you just need some condition that is true for all except one of the duplicates.
I ended up using:

delete from table1
USING table1, table1 as vtable
WHERE (table1.ID > vtable.ID)
AND (table1.field_name=vtable.field_name)

Used “greater than” instead of “not equal”. This handles both the conditions that the records not be the same and that only one record will return false.
Since we use greater than (or less than), only the first (or last) duplicate returns false, so it is not selected for deletion.

6 SpyGM 08.25.08 at 2:36 pm

here the code bro..

tmp = table name;
a = cols 1
b = cols 2

ALTER IGNORE TABLE tmp ADD UNIQUE INDEX(a,b);

before :

id | a | b
1 | 1 | 2
2 | 1 | 2
3 | 2 | 3
5 | 3 | 4

after:

id | a | b
2 | 1 | 2
3 | 2 | 3
5 | 3 | 4

tested… :D
this not my idea..
but i read it over net..
and i`m used it..

7 bardh7 09.16.08 at 6:18 pm

this could also help

before :

id | name
————-
1 | test |
2 | test |
3 | php |
4 | echo |

after:

id | name|
————
2 | test |
3 | php |
4 | echo |

<?php
$cn= mysql_connect("localhost", "root", "") or die('Could not connect');
$db= mysql_select_db(test, $cn) or die('Could not select database');

    $query="SELECT * FROM name"; 

    $result=mysql_query($query);
    echo mysql_error();

        while($row = mysql_fetch_array($result, MYSQL_BOTH))

        {
                $query1="SELECT * FROM tablename where name = '".$row[1]."'"; 

                $result1=mysql_query($query1);
                $count = mysql_num_rows($result1) - 1;

                mysql_query("DELETE FROM tablename WHERE name='".$row[1]."' LIMIT $count",$link);
                echo "deleted $row[1] ";

}
        echo mysql_error();
    ?>
8 Julia Valencia 10.07.08 at 11:06 am

it is difficult to delete duplicates.

9 Caleb Gray 12.31.08 at 2:04 am

This is the proper format, allowing for easy debugging of errors…
DELETE
FROM table1
USING table1, table1 AS vtable
WHERE vtable.id > table1.id
AND table1.field_name = vtable.field_name

If you’re nervous about deleteing everything, you can make sure it’s working first:
SELECT *
FROM table1, table1 AS vtable
WHERE vtable.id > table1.id
AND vtable.field_name = table1.field_name

I just did this and it worked like a charm on hundreds of thousands of database entries.

- Caleb

10 alex 01.13.09 at 1:56 pm

Probably a stupid question, but what does the ‘id’ in table1.id and vtable.id stand for and does it have to be replaced individually?

11 blingin' 01.15.09 at 6:25 pm

The “id” is the primary key column.

12 Ricardo Furtado 01.20.09 at 7:31 pm

The Caleb Gray solution worked fine !

Nice code ! Thank you for helping !!

13 James 01.28.09 at 1:38 pm

SpyGM is spot on – and saved me a massive amount of time. His solution allows you to delete duplicates based on a combination of columns, rather than just a simple 1 to 1 comparison.

The code I used for my table as follows (based on SpyGM’s comment):

ALTER IGNORE TABLE `referrer` ADD UNIQUE INDEX(user_id,referrer_id);

But as I understand it you could have more than 2 columns in here, or just the 1 if that’s all you need.

Cheers

14 John 02.10.09 at 5:56 am

The following fails if one of the unique index columns is more than 1000 characters long.

ALTER IGNORE TABLE `referrer` ADD UNIQUE INDEX(user_id,referrer_id);

Still useful…

15 Me 04.12.09 at 12:08 am

SpyGM’s solution works for me.

I’d guess it was Mike Chirico’s original he found: http://free.netartmedia.net/Databases/Databases8.html ;o)

16 Ruggie 06.18.09 at 9:10 pm

Be careful using SpyGM’s suggestion, creating unique indexes on multiple columns may get rid of duplicates but may also have undesired results.

For example, imagine a table like this:
ID|Email|AcctID|DateAdded
1|a@a.com|12|1/1/2009
2|a@a.com|13|1/3/2009
3|b@b.com|12|2/2/2009
4|a@a.com|12|3/3/2009

If this table is used to associate an email address to an account (that is, multiple email addresses associated to multiple accounts), you would want duplicates available in the email column and acctid column, but not duplicates where both cells are the same. (Record 4 would be the undesired record in this case)
Forcing a unique index on the email and acctid column would wipe out row 2 and 4, when we only want to get rid of row 4.

In situations like this, its a good idea to write your code to check for an existing record already containing those two pieces of information before you insert a new record.
But if you already have a database full of mixed duplicates like this, then the best bet would be to compare every row like in the php example bardh7 listed above.

Also, Indexes get updated every time a database gets updated, and they take additional space on top of your database. Used correctly, they can speed up searches on a database, used incorrectly they can have a negative impact on server performance.

17 Bruno Correia 08.08.09 at 10:05 pm

This works:

– remove duplicated entries
– —————————————————————————————
– step 1
CREATE TABLE temp_table AS
SELECT * FROM foo WHERE 1 GROUP BY bar;
– step 2
DROP TABLE foo;
– step 3
RENAME TABLE temp_table TO foo;
– —————————————————————————————

18 Kyle 08.20.09 at 6:10 pm

Caleb Gray’s solution also worked fine for me. The only problem I had was that because of the nature of the query a LIMIT clause could be used. I got around this by using a DELETE QUICK and following up after with an OPTIMIZE TABLE table1.

19 Staffo 12.17.09 at 2:43 pm

Ruggie is wrong. I tested the ALTER IGNORE command, and it will not remove row with id 3 in his example – just id=4 will be deleted.

20 Ruggie 12.19.09 at 11:27 pm

Turns out I did make a mistake.
In the past when I create a unique index I’ve always done it per column. Which can cause undesired effects sometimes.
(Example: ALTER IGNORE TABLE `test` ADD UNIQUE INDEX(user_id); )
I didn’t know that by passing two column identifiers it would create a single index containing data from both columns. I assumed that it would create two indexes.

This feature isn’t documented on the mysql ALTER syntax page but I did find it on the CREATE INDEX syntax page. Here is a quote: “A column list of the form (col1,col2,…) creates a multiple-column index. Index values are formed by concatenating the values of the given columns. ”

Thanks Staffo, your response encouraged me take a second look!

21 Disappointed 02.11.10 at 8:51 am

I could not even read the article because a banner ad was in the way that I could not close. This is a lame web site.

22 Ashish 02.12.10 at 11:34 am

Thanks, jafar and mikemon for the right query.

Leave a Comment

You can use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Previous FAQ:

Next FAQ:

nixCraft FAQ PDF Collection Now Available To All