Google Cloud Platform: Remove duplicate values between arrays in BigQuery

Suppose I have the following arrays:

SELECT ['A', 'B', 'C', 'A', 'A', 'A'] AS origin_array
UNION ALL
SELECT ['A', 'A', 'B'] AS secondary_array

I want to remove all duplicate values between the arrays (not within an array), so that the final result is:

SELECT ['C', 'A', 'A'] AS result_array
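One way to read the expected output is as a multiset difference, which is an interpretation assumed here. For every element of secondary_array, one matching element is removed from origin_array, scanning from the left, and the survivors keep their original order. Under that reading, A, B, C, A, A, A minus A, A, B gives exactly C, A, A. The minimal Python sketch below makes the intended semantics concrete; the function name `remove_matched` is hypothetical.

```python
from collections import Counter


def remove_matched(origin, secondary):
    """Hypothetical helper: drop one element of `origin` for every matching
    element of `secondary`, scanning left to right, keeping survivors in order."""
    remaining = Counter(secondary)  # copies of each value still to be removed
    result = []
    for item in origin:
        if remaining[item] > 0:
            remaining[item] -= 1  # consume one match from secondary
        else:
            result.append(item)  # no match left for this value: keep it
    return result


print(remove_matched(['A', 'B', 'C', 'A', 'A', 'A'], ['A', 'A', 'B']))
# prints ['C', 'A', 'A']
```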

Any ideas how to accomplish this?


Below is for BigQuery Standard SQL:

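#standardSQL
-- Returns the elements of arr1 and arr2 that have no counterpart in the other array
CREATE TEMP FUNCTION DEDUP_ARRAYS(arr1 ANY TYPE, arr2 ANY TYPE) AS ((ARRAY(
  SELECT item FROM (
    -- Number the copies of each value within its own array: 1st A -> 1, 2nd A -> 2, ...
    SELECT item, ROW_NUMBER() OVER(PARTITION BY item) pos FROM UNNEST(arr1) item UNION ALL
    SELECT item, ROW_NUMBER() OVER(PARTITION BY item) pos FROM UNNEST(arr2) item
  )
  -- An (item, pos) pair present in both arrays has COUNT 2 and is dropped
  GROUP BY item, pos
  HAVING COUNT(1) = 1
)));
-- Sample data: one row holding both arrays
WITH `project.dataset.table` AS (
  SELECT ['A', 'B', 'C', 'A', 'A', 'A'] AS origin_array, ['A', 'A', 'B'] AS secondary_array
)
SELECT DEDUP_ARRAYS(origin_array, secondary_array) AS result_array
FROM `project.dataset.table`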

Result:

Row  result_array
1    A
     A
     C

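This contains the same elements as SELECT ['C', 'A', 'A'] AS result_array. The ARRAY subquery has no ORDER BY, so element order is not guaranteed.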
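How DEDUP_ARRAYS works: ROW_NUMBER() OVER(PARTITION BY item) numbers the copies of each value within its own array, so the n-th A in arr1 lines up with the n-th A in arr2. Grouping on (item, pos) and keeping only groups with COUNT(1) = 1 then discards every matched pair. The Python sketch below mirrors those steps so the logic can be checked outside BigQuery; the names are illustrative only. One consequence is that unmatched elements of the second array are kept too, so the function computes a symmetric difference. For the sample data this changes nothing, because every element of secondary_array has a partner in origin_array.

```python
from collections import Counter


def dedup_arrays(arr1, arr2):
    """Illustrative mirror of DEDUP_ARRAYS: tag each element with its
    occurrence number, then keep the (item, pos) pairs seen exactly once."""

    def tag(arr):
        # Like ROW_NUMBER() OVER(PARTITION BY item): 1st A -> 1, 2nd A -> 2, ...
        seen = Counter()
        tagged = []
        for item in arr:
            seen[item] += 1
            tagged.append((item, seen[item]))
        return tagged

    # UNION ALL of both tagged lists, then GROUP BY item, pos HAVING COUNT(1) = 1
    counts = Counter(tag(arr1) + tag(arr2))
    return [item for (item, pos), n in counts.items() if n == 1]


# sorted() because the SQL version does not guarantee element order
print(sorted(dedup_arrays(['A', 'B', 'C', 'A', 'A', 'A'], ['A', 'A', 'B'])))
# prints ['A', 'A', 'C']
print(dedup_arrays(['A'], ['A', 'D']))
# prints ['D']  (the unmatched D from the second array survives)
```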


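If you write UNION instead of UNION ALL (BigQuery Standard SQL requires spelling it UNION DISTINCT), the duplicate values should not be included.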
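The two operators differ in bag versus set semantics. UNION ALL keeps every input row, while UNION DISTINCT keeps one copy of each distinct row. The small Python illustration below uses (item, pos) rows like the ones DEDUP_ARRAYS builds; the function names are hypothetical.

```python
def union_all(rows1, rows2):
    # UNION ALL: bag semantics, every row from both inputs is kept
    return rows1 + rows2


def union_distinct(rows1, rows2):
    # UNION DISTINCT: set semantics, one copy of each distinct row
    # (first-seen order here; SQL itself guarantees no order)
    return list(dict.fromkeys(rows1 + rows2))


a = [('A', 1), ('A', 2)]
b = [('A', 1), ('B', 1)]
print(union_all(a, b))       # prints [('A', 1), ('A', 2), ('A', 1), ('B', 1)]
print(union_distinct(a, b))  # prints [('A', 1), ('A', 2), ('B', 1)]
```

Inside DEDUP_ARRAYS specifically, the matched pairs have to survive the union so that COUNT(1) can reach 2. With DISTINCT, every (item, pos) group would have a count of 1 and nothing would be filtered out.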