Finding duplicate rows in SQL Server

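I have a SQL Server database of organizations, and there are many duplicate rows. I want to run a select statement to grab all of these and the number of dupes, but also return the ids associated with each organization.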

A statement like:

SELECT     orgName, COUNT(*) AS dupes  
FROM         organizations  
GROUP BY orgName  
HAVING      (COUNT(*) > 1)

will return something like:

orgName        | dupes  
ABC Corp       | 7  
Foo Federation | 5  
Widget Company | 2

But I'd also like to grab their IDs. Is there any way to do this? Maybe something like:

orgName        | dupeCount | id  
ABC Corp       | 1         | 34  
ABC Corp       | 2         | 5  
...  
Widget Company | 1         | 10  
Widget Company | 2         | 2

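The reason is that there is also a separate table of users that link to these organizations, and I would like to unify them (that is, remove the dupes so the users link to the same organization instead of to dupe orgs). I would like to do the reassignment manually so I don't screw anything up, but I still need a statement that returns the IDs of all the dupe orgs so I can go through the list of users.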


SELECT o.orgName, oc.dupeCount, o.id
FROM organizations o
INNER JOIN (
    SELECT orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc ON o.orgName = oc.orgName


You can run the following query to find the duplicates together with max(id), and then delete those rows.

SELECT orgName, COUNT(*) AS dupes, MAX(id) AS maxId
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)

But you will have to run this query several times, because each run only identifies one row per duplicated name.
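
To illustrate why several passes are needed, here is a minimal sketch that runs against a throwaway table variable rather than the real table. The sample rows and the variable names `@passes` and `@deleted` are made up for the example, and the inline `DECLARE ... = ...` and multi-row `VALUES` syntax assume SQL Server 2008 or later.

```sql
-- Hypothetical sample data: one name appears three times, another twice.
DECLARE @organizations TABLE (id INT PRIMARY KEY, orgName NVARCHAR(100));

INSERT INTO @organizations (id, orgName)
VALUES (1, N'ABC Corp'), (2, N'ABC Corp'), (3, N'ABC Corp'),
       (4, N'Widget Company'), (5, N'Widget Company');

DECLARE @passes INT = 0, @deleted INT = 1;

-- Each pass removes only the highest id of every duplicated name,
-- so a name that occurs N times needs N - 1 passes.
WHILE @deleted > 0
BEGIN
    DELETE o
    FROM @organizations AS o
    INNER JOIN (
        SELECT orgName, MAX(id) AS maxId
        FROM @organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) AS d ON d.maxId = o.id;

    SET @deleted = @@ROWCOUNT;            -- rows removed by this pass
    IF @deleted > 0 SET @passes = @passes + 1;
END;

SELECT @passes AS passesNeeded;           -- 2 for the sample data above
SELECT id, orgName FROM @organizations ORDER BY id;  -- ids 1 and 4 remain
```

With the three 'ABC Corp' rows, the first pass removes ids 3 and 5 and the second removes id 2, which is the repeated running described above.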


You can do it like this:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) AS intCount
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName

If you want to return only the records that can be deleted (leaving one of each), you can use:

SELECT
    id, orgName
FROM (
     SELECT
         orgName, id,
         ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
     FROM organizations
) AS d
WHERE intRow != 1

Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) AS intCount, MIN(id) AS minId
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id


The solution marked as correct didn't work for me, but I found this answer very useful: Get list of duplicate rows in MySql

SELECT n1.*
FROM myTable n1
INNER JOIN myTable n2
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id


You can try this; it should work well for you:

 WITH CTE AS
    (
    SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations
    )
    SELECT * FROM CTE WHERE RN>1
    GO


SELECT * FROM [Employees]

To find duplicate records:

1) Using a CTE

WITH mycte
AS
(
SELECT Name,EmailId,ROW_NUMBER() OVER(partition BY Name,EmailId ORDER BY id) AS Duplicate FROM [Employees]
)
SELECT * FROM mycte WHERE Duplicate > 1 -- rows numbered 2 and up are the repeats

2) Using GROUP BY

SELECT Name, EmailId, COUNT(name) AS Duplicate FROM [Employees] GROUP BY Name, EmailId HAVING COUNT(name) > 1


If you want to delete the duplicates:

WITH CTE AS(
   SELECT orgName,id,
       RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
   FROM organizations
)
DELETE FROM CTE WHERE RN > 1
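
Deleting straight away would orphan any users that still point at the rows being removed, or fail outright if a foreign key is in place. The following is only a sketch of how the users mentioned in the question could be repointed first. The table name `users` and its column `orgId` are assumptions, since the question never shows that schema, and keeping the lowest id per orgName is just one possible choice.

```sql
-- Assumption: users.orgId references organizations.id (hypothetical names).
BEGIN TRANSACTION;

WITH survivors AS (
    SELECT id,
           MIN(id) OVER (PARTITION BY orgName) AS keepId  -- lowest id per name survives
    FROM organizations
)
UPDATE u
SET orgId = s.keepId
FROM users AS u
INNER JOIN survivors AS s ON s.id = u.orgId
WHERE s.id <> s.keepId;   -- only users attached to a duplicate row are touched

-- Inspect the result before making it permanent.
SELECT u.orgId, COUNT(*) AS userCount
FROM users AS u
GROUP BY u.orgId;

ROLLBACK TRANSACTION;  -- change to COMMIT TRANSACTION once the counts look right
```

Because the script ends in ROLLBACK, it acts as a dry run until that last line is changed.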

SELECT * FROM (SELECT orgName,id,
ROW_NUMBER() OVER(Partition BY OrgName ORDER BY id DESC) Rownum
FROM organizations )tbl WHERE Rownum>1

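So the records with Rownum > 1 are the duplicate records in your table. PARTITION BY first groups the records by orgName, and ROW_NUMBER() then numbers the rows within each group, so every row with Rownum > 1 is a duplicate that could be deleted.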
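
To see the numbering concretely, here is a small self-contained sketch. The table variable and its five rows are made up for illustration (the ids echo the ones in the question), and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data; the real organizations table is not touched.
DECLARE @organizations TABLE (id INT PRIMARY KEY, orgName NVARCHAR(100));

INSERT INTO @organizations (id, orgName)
VALUES (2, N'Widget Company'), (5, N'ABC Corp'), (10, N'Widget Company'),
       (34, N'ABC Corp'), (41, N'Solo Ltd');

-- Numbering restarts at 1 for every orgName; within a name, the highest id gets 1.
SELECT orgName, id,
       ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id DESC) AS Rownum
FROM @organizations
ORDER BY orgName, Rownum;

-- Rows returned for the sample data:
-- ABC Corp       | 34 | 1
-- ABC Corp       |  5 | 2
-- Solo Ltd       | 41 | 1
-- Widget Company | 10 | 1
-- Widget Company |  2 | 2
```

Filtering this result on Rownum > 1 leaves ids 5 and 2, one surplus row for each duplicated name, while 'Solo Ltd' never appears because its only row is numbered 1.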


SELECT a.orgName, b.duplicate, a.id
FROM organizations a
INNER JOIN (
    SELECT orgName, COUNT(*) AS duplicate
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) b ON a.orgName = b.orgName
GROUP BY a.orgName, b.duplicate, a.id

SELECT column_name, COUNT(column_name)
FROM TABLE_NAME
GROUP BY column_name
HAVING COUNT (column_name) > 1;

Source: https://stackoverflow.com/a/59242/1465252


SELECT orgname, COUNT(*) AS dupes, id
FROM organizations
WHERE orgname IN (
    SELECT orgname
    FROM organizations
    GROUP BY orgname
    HAVING (COUNT(*) > 1)
)
GROUP BY orgname, id


You can select duplicate rows in several ways.

For my solutions, first consider this table:

CREATE TABLE #Employee
(
ID          INT,
FIRST_NAME  NVARCHAR(100),
LAST_NAME   NVARCHAR(300)
)

INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );

First solution:

SELECT DISTINCT *
FROM   #Employee;

WITH #DeleteEmployee AS (
                     SELECT ROW_NUMBER()
                            OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
                            RNUM
                     FROM   #Employee
                 )

SELECT *
FROM   #DeleteEmployee
WHERE  RNUM > 1

SELECT DISTINCT *
FROM   #Employee

Second solution: use an identity field

SELECT DISTINCT *
FROM   #Employee;

ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)

SELECT *
FROM   #Employee
WHERE  UNIQ_ID < (
    SELECT MAX(UNIQ_ID)
    FROM   #Employee a2
    WHERE  #Employee.ID = a2.ID
           AND #Employee.FIRST_NAME = a2.FIRST_NAME
           AND #Employee.LAST_NAME = a2.LAST_NAME
)

ALTER TABLE #Employee DROP COLUMN UNIQ_ID

SELECT DISTINCT *
FROM   #Employee

At the end of either solution, run this command:

DROP TABLE #Employee

I think I know what you need.
I had to mix the answers together, and I think I got the solution he wanted:

SELECT o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
FROM organizations o
INNER JOIN (
    SELECT MAX(id) AS id, orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc ON o.orgName = oc.orgName

Having the max id gives you the id of the duplicate alongside the id of the original, which is what he asked for:

id, org name, duplicate count (left out in this case)
id of duplicate, org name of duplicate, duplicate count (left out again because it does not help in this case)

The only sad thing is that the result comes out in this form:

id, name, dupId, name

Hope it still helps.


 /*To get duplicate data in table */

 SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE STATUS=1
  GROUP BY EmpCode HAVING COUNT(EmpCode) > 1

Suppose we have a table 'Student' with 2 columns:

  • student_id int
  • student_name varchar

    Records:
    +------------+---------------------+
    | student_id | student_name        |
    +------------+---------------------+
    |        101 | usman               |
    |        101 | usman               |
    |        101 | usman               |
    |        102 | usmanyaqoob         |
    |        103 | muhammadusmanyaqoob |
    |        103 | muhammadusmanyaqoob |
    +------------+---------------------+

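Now we want to see the duplicate records.
Use this query (SQL Server does not accept a column alias in HAVING, so the aggregate is repeated):

SELECT student_name, student_id, COUNT(*) AS c FROM student GROUP BY student_id, student_name HAVING COUNT(*) > 1;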

+---------------------+------------+---+
| student_name        | student_id | c |
+---------------------+------------+---+
| usman               |        101 | 3 |
| muhammadusmanyaqoob |        103 | 2 |
+---------------------+------------+---+

I have a better option for getting the duplicate records in a table:

SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname

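The result of the above query shows every duplicated name together with each student's unique ID and the number of times the name occurs.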



Try:

SELECT orgName, id, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName, id
HAVING COUNT(*) > 1;