Mysql Remove Duplicate Data or Rows With DISTINCT

by Vivek Gite on December 27, 2006 · 60 comments

How do I remove duplicate data or rows from a MySQL query?

If output from a MySQL query contains duplicate data or row and if you would like to remove the same use DISTINCT.

DISTINCT combined with ORDER BY needs a temporary table in many cases. Consider following query:
mysql> SELECT firstname FROM address;

If you need unique firstname i.e remove duplicate contains type the command:
mysql> SELECT DISTINCT firstname FROM address;

You can use DISTINCT with multiple column as follows:
mysql> SELECT DISTINCT firstname, city FROM address;

For DISTINCT optimization please see this help page.

Featured Articles:

Share this with other sys admins!
Facebook it - Tweet it - Print it -

{ 60 comments… read them below or add one }

1 jafar January 21, 2008


delete from table1
USING table1, table1 as vtable
WHERE (NOT table1.ID=vtable.ID)
AND (table1.field_name=vtable.field_name)

1. Here you tell mysql that there is a table1.
2. Then you tell it that you will use table1 and a virtual table with the values of table1.
3. This will let mysql not compare a record with itself!
4. Here you tell it that there shouldn’t be records with the same field_name.

Reply

2 Susanta Sri October 26, 2010

You saved my day Man!!!

Reply

3 Vikram December 18, 2010

Nice 1 bro…!!

Reply

4 pablo sanchez February 8, 2011

Thank you!

Reply

5 overseer May 16, 2011

Totally awesome.

Reply

6 kalaivani July 7, 2011

Hi frnd,
your query is gud, but i have one doubt while i’m deleting…….
Suppose if i have two or more table and if that table contain more than 80000 records, and i find some duplicate records in my table & try to delete them means the other table will be suffer or not? because i kept all id’s as primary key & set to other table into foreign key, while i’m deleting a primary key, Will foreign key show error? or will other table be suffered?

Reply

7 manish September 4, 2011

thank you jafar bhayi .

Reply

8 kaushik December 1, 2011

thanx jafar..that is gr8 man..:)

Reply

9 jafar January 21, 2008

The DISTINCT command is not working somehow!
Also the delete rule what I send early is not good enough:
It deletes all values which are duplicate,
instead of leaving one behind!

Reply

10 Glen Rubin February 24, 2008

I have fixed this problem and have a tutorial on my website: http://www.weareallhumansnow.com

Reply

11 Amit Vyas March 3, 2008

select count(*) as num, name,id from table group by name having count(*)>1;

Reply

12 mikemon March 25, 2008

The first answer by jafar is close, you just need some condition that is true for all except one of the duplicates.
I ended up using:

delete from table1
USING table1, table1 as vtable
WHERE (table1.ID > vtable.ID)
AND (table1.field_name=vtable.field_name)

Used “greater than” instead of “not equal”. This handles both the conditions that the records not be the same and that only one record will return false.
Since we use greater than (or less than), only the first (or last) duplicate returns false, so it is not selected for deletion.

Reply

13 srsumon May 17, 2010

thx. mikemon. It really helps

Reply

14 Simon November 23, 2010

Thanks Mikemon and Jafar,

It worked for me !

/Simon

Reply

15 dave February 18, 2011

Thanks mikemon and jafar!

Reply

16 SpyGM August 25, 2008

here the code bro..

tmp = table name;
a = cols 1
b = cols 2

ALTER IGNORE TABLE tmp ADD UNIQUE INDEX(a,b);

before :

id | a | b
1 | 1 | 2
2 | 1 | 2
3 | 2 | 3
5 | 3 | 4

after:

id | a | b
2 | 1 | 2
3 | 2 | 3
5 | 3 | 4

tested… :D
this not my idea..
but i read it over net..
and i`m used it..

Reply

17 Anonymous May 17, 2010

Adopted this solution, work like a charm! Thanks a lot

Reply

18 aeon October 31, 2010

You’re a lifesaver man!

Reply

19 bardh7 September 16, 2008

this could also help

before :

id | name
————-
1 | test |
2 | test |
3 | php |
4 | echo |

after:

id | name|
————
2 | test |
3 | php |
4 | echo |

<?php
$cn= mysql_connect("localhost", "root", "") or die('Could not connect');
$db= mysql_select_db(test, $cn) or die('Could not select database');
    $query="SELECT * FROM name";
    $result=mysql_query($query);
    echo mysql_error();
        while($row = mysql_fetch_array($result, MYSQL_BOTH))
        {
                $query1="SELECT * FROM tablename where name = '".$row[1]."'";
                $result1=mysql_query($query1);
                $count = mysql_num_rows($result1) - 1;
                mysql_query("DELETE FROM tablename WHERE name='".$row[1]."' LIMIT $count",$link);
                echo "deleted $row[1] ";
}
        echo mysql_error();
    ?>

Reply

20 MANNULAL August 26, 2011

IT IS THE WAY OF HELPING EACH OTHER

Reply

21 Julia Valencia October 7, 2008

it is difficult to delete duplicates.

Reply

22 Caleb Gray December 31, 2008

This is the proper format, allowing for easy debugging of errors…
DELETE
FROM table1
USING table1, table1 AS vtable
WHERE vtable.id > table1.id
AND table1.field_name = vtable.field_name

If you’re nervous about deleteing everything, you can make sure it’s working first:
SELECT *
FROM table1, table1 AS vtable
WHERE vtable.id > table1.id
AND vtable.field_name = table1.field_name

I just did this and it worked like a charm on hundreds of thousands of database entries.

- Caleb

Reply

23 avo July 12, 2010

it’s been 10 minutes, still “loading” or “processing”… is this normal?

Reply

24 mkquake May 2, 2011

Sure, if you are working with 80286 !

Reply

25 Tunc November 10, 2010

Wow! Saved my life, thanks…

Reply

26 Ricky July 7, 2011

WOW your the MAN!!

I used exactly what you said and then i checked with

SELECT field_name,
COUNT(field_name) AS NumOccurrences
FROM cron_jobs
GROUP BY field_name
HAVING ( COUNT(field_name) > 1 )

Worked like a charm!!

Reply

27 alex January 13, 2009

Probably a stupid question, but what does the ‘id’ in table1.id and vtable.id stand for and does it have to be replaced individually?

Reply

28 blingin' January 15, 2009

The “id” is the primary key column.

Reply

29 Ricardo Furtado January 20, 2009

The Caleb Gray solution worked fine !

Nice code ! Thank you for helping !!

Reply

30 James January 28, 2009

SpyGM is spot on – and saved me a massive amount of time. His solution allows you to delete duplicates based on a combination of columns, rather than just a simple 1 to 1 comparison.

The code I used for my table as follows (based on SpyGM’s comment):

ALTER IGNORE TABLE `referrer` ADD UNIQUE INDEX(user_id,referrer_id);

But as I understand it you could have more than 2 columns in here, or just the 1 if that’s all you need.

Cheers

Reply

31 ganesh December 2, 2011

hey guys this work thanks………..

Reply

32 John February 10, 2009

The following fails if one of the unique index columns is more than 1000 characters long.

ALTER IGNORE TABLE `referrer` ADD UNIQUE INDEX(user_id,referrer_id);

Still useful…

Reply

33 Me April 12, 2009

SpyGM’s solution works for me.

I’d guess it was Mike Chirico’s original he found: http://free.netartmedia.net/Databases/Databases8.html ;o)

Reply

34 Ruggie June 18, 2009

Be careful using SpyGM’s suggestion, creating unique indexes on multiple columns may get rid of duplicates but may also have undesired results.

For example, imagine a table like this:
ID|Email|AcctID|DateAdded
1|a@a.com|12|1/1/2009
2|a@a.com|13|1/3/2009
3|b@b.com|12|2/2/2009
4|a@a.com|12|3/3/2009

If this table is used to associate an email address to an account (that is, multiple email addresses associated to multiple accounts), you would want duplicates available in the email column and acctid column, but not duplicates where both cells are the same. (Record 4 would be the undesired record in this case)
Forcing a unique index on the email and acctid column would wipe out row 2 and 4, when we only want to get rid of row 4.

In situations like this, its a good idea to write your code to check for an existing record already containing those two pieces of information before you insert a new record.
But if you already have a database full of mixed duplicates like this, then the best bet would be to compare every row like in the php example bardh7 listed above.

Also, Indexes get updated every time a database gets updated, and they take additional space on top of your database. Used correctly, they can speed up searches on a database, used incorrectly they can have a negative impact on server performance.

Reply

35 Bruno Correia August 8, 2009

This works:

– remove duplicated entries
– —————————————————————————————
– step 1
CREATE TABLE temp_table AS
SELECT * FROM foo WHERE 1 GROUP BY bar;
– step 2
DROP TABLE foo;
– step 3
RENAME TABLE temp_table TO foo;
– —————————————————————————————

Reply

36 Kyle August 20, 2009

Caleb Gray’s solution also worked fine for me. The only problem I had was that because of the nature of the query a LIMIT clause could be used. I got around this by using a DELETE QUICK and following up after with an OPTIMIZE TABLE table1.

Reply

37 Staffo December 17, 2009

Ruggie is wrong. I tested the ALTER IGNORE command, and it will not remove row with id 3 in his example – just id=4 will be deleted.

Reply

38 Ruggie December 19, 2009

Turns out I did make a mistake.
In the past when I create a unique index I’ve always done it per column. Which can cause undesired effects sometimes.
(Example: ALTER IGNORE TABLE `test` ADD UNIQUE INDEX(user_id); )
I didn’t know that by passing two column identifiers it would create a single index containing data from both columns. I assumed that it would create two indexes.

This feature isn’t documented on the mysql ALTER syntax page but I did find it on the CREATE INDEX syntax page. Here is a quote: “A column list of the form (col1,col2,…) creates a multiple-column index. Index values are formed by concatenating the values of the given columns. ”

Thanks Staffo, your response encouraged me take a second look!

Reply

39 Disappointed February 11, 2010

I could not even read the article because a banner ad was in the way that I could not close. This is a lame web site.

Reply

40 Ashish February 12, 2010

Thanks, jafar and mikemon for the right query.

Reply

41 mohsen March 20, 2010

thank u so mush, this code is a life saver, and u too thanks

Reply

42 dana May 12, 2010

Spygm’s worked great! Thanks,

SpyGM:
ALTER IGNORE TABLE tmp ADD UNIQUE INDEX(a,b);

Reply

43 sa June 5, 2010

qive me mysql query which i delete onlu one duplicate row field thanks

Reply

44 Jim May 6, 2011

TRUNCATE TABLE bar WHERE duplicate IS true;

Reply

45 avo July 12, 2010

how long does it usually take for the query to finish its execution? :S

Reply

46 Gerbald September 12, 2010

This worked for me (no guarantees):

DELETE FROM `MyTable`
WHERE `DuplCol` IN (
SELECT `DuplCol`
FROM (

SELECT `DuplCol` , count( * ) AS c
FROM `MyTable`
GROUP BY `DuplCol`
HAVING c >1
) AS dctbl
)
AND `id` NOT IN (
SELECT `mi`
FROM (
SELECT `DuplCol` , min( `id` ) AS mi
FROM `MyTable`
GROUP BY `DuplCol`
) AS idtbl
);

before:

id | DuplCol
———–
1 | aaa
2 | bbb
3 | bbb
4 | aaa
5 | ccc

after:

id | DuplCol
———–
1 | aaa
2 | bbb
5 | ccc

Reply

47 John Komla December 8, 2011

This saved my day. I even customized this to fit my project, as different users insert into that same table, only records attached to a particular logging session are removed from the table.

This works great !

Thank you again !

Reply

48 Dj November 18, 2010

thanx bro

Reply

49 INDRAJITH December 20, 2010

thanz bro………………………………………………….u saved me

Reply

50 Rangga December 24, 2010

Wow it’s really work . .

Reply

51 brij March 16, 2011

DELETE FROM vehicle_makes USING vehicle_makes,
vehicle_makes AS vtable WHERE (
NOT vehicle_makes.id = vtable.id
) AND (
vehicle_makes.name = vtable.name
)

=====================
id name
=====================
2 Ottawa
3 AM General/Hummer
5 John Deere
4 WRV 6 WRV
17 Blue Bird
20 Chance
21 Crane Carrier
22 Crane Carrier
======================

Reply

52 choudhury March 18, 2011

wow! This really worked for me…Thanks a lot Gerald…

DELETE FROM `MyTable`
WHERE `DuplCol` IN (
SELECT `DuplCol`
FROM (

SELECT `DuplCol` , count( * ) AS c
FROM `MyTable`
GROUP BY `DuplCol`
HAVING c >1
) AS dctbl
)
AND `id` NOT IN (
SELECT `mi`
FROM (
SELECT `DuplCol` , min( `id` ) AS mi
FROM `MyTable`
GROUP BY `DuplCol`
) AS idtbl
);

Reply

53 vikingu October 13, 2011

i’ll test at first, but looks like should workout!!

Reply

54 sane November 10, 2011

!!!!! I need synatax in mysql !!!!!!

i create a table called studentaccount
if i create a row at the end one column called “status” should hav value boolean value as true

if i delete a row the status should be false

what is the syntax plzzzz help me

Reply

55 Afshin November 17, 2011

All of you assume that there is a unique ID or we can create a new table!
Let’s say there is no ID and there is just one or two columns, and we don’t want to create a new table!

any idea?

Reply

56 bob December 13, 2011
DELETE FROM table
 WHERE ID NOT IN (SELECT *
                    FROM (SELECT MIN(n.ID)
                            FROM table n
                        GROUP BY n.FIELDNAME) x)

Reply

57 John Ortiz December 22, 2011

Thanks a lot for this mini-guide. It was useful for me.

Reply

58 Shairyar December 29, 2011

great thanks, exactly what i was looking for.

Reply

59 neeraj January 16, 2012

I like to below query if there is need to remove duplicate rows of table that does not contain primary key:

DELETE FROM items WHERE GREATEST(0,@num := IF(NAME = @NAME, @num + 1, 0),LEAST(0, LENGTH(@NAME := NAME)))>0

In my example items table just has one column “name” and above query removes the duplicate rows

Reply

60 Chicago Web January 17, 2012

Here is a much faster solution for large tables…. The other methods will take hours on a very large table.

First we make a new table with distinct values for the field we are trying to remove duplicates for.

CREATE TABLE new_table as
SELECT * FROM old_table WHERE 1 GROUP BY column_to_remove_duplicates;

Step 2: remove old table

DROP TABLE old_table;

Step 3: mv the new table back to the old one

RENAME TABLE new_table TO old_table;

Done!

Reply

Leave a Comment

You can use these HTML tags and attributes for your code and commands: <strong> <em> <ol> <li> <u> <ul> <blockquote> <pre> <a href="" title="">
What is 10 + 3 ?
Please leave these two fields as-is:
Solve the simple math so we know that you are a human and not a bot.




Previous post:

Next post: