SQL script delete duplicate rows in the table

In the data world, duplicated data is a very common issue. It may duplicated for few fields or even worse you have more then two identical rows in the table and there is no way to distinguish those rows. So how to clean those duplicated data?

DELETE Duplicate Records Using TOP

Since SQL Server 2005 we can use the TOP command when we issue the delete, such as the following.

CREATE TABLE dbo.duplicateTest (
	[id] [INT] NULL,
	[name] [VARCHAR](100) 
)

INSERT INTO dbo.duplicateTest VALUES (1, 'a')
INSERT INTO dbo.duplicateTest VALUES (1, 'a')
INSERT INTO dbo.duplicateTest VALUES (1, 'a')
INSERT INTO dbo.duplicateTest VALUES (2, 'b')
INSERT INTO dbo.duplicateTest VALUES (3, 'c')

DECLARE @id int
WHILE EXISTS (SELECT COUNT(id) FROM dbo.duplicateTest  GROUP BY id HAVING COUNT(id) > 1 ) 
BEGIN
	SELECT TOP 1  @id = id
	FROM dbo.duplicateTest
	GROUP BY id 
	HAVING COUNT(id) > 1 

	DELETE TOP(1) FROM dbo.duplicateTest WHERE id [email protected] 
END

So as you can see using the top in while loop we can remove multiple duplicated records even you are not sure which id is duplicated.

DELETE Duplicate Records Using CTE

another way we can use to remove duplicated records is by using CTE.

TRUNCATE TABLE dbo.duplicateTest
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob') 
INSERT INTO dbo.duplicateTest VALUES(2, 'Dave') 
INSERT INTO dbo.duplicateTest VALUES(3, 'Karen') 
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob')
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob')	


WITH cte1
AS (
   SELECT ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY ID) rank1,*
   FROM dbo.duplicateTest 
)
DELETE FROM cte1 WHERE rank1 > 1