How to aggregate multiple rows into one in BigQuery?
Suppose you have a denormalized schema with multiple rows per entity, like this:
    uuid | property   | value
    ------------------------------------------
    abc  | first_name | John
    abc  | last_name  | Connor
    abc  | age        | 26
    ...
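Every uuid has the same set of properties, and the rows are not necessarily sorted. How can I create a table like the one below using BigQuery alone (i.e. without a client)?
Table user_properties: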
    uuid | first_name | last_name | age
    --------------------------------------------------------
    abc  | John       | Connor    | 26
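In traditional SQL, the "STUFF" keyword is used for this purpose.
It would be easier if I could at least get the results ORDERED by uuid, so that the client would not need to load the whole table (4GB) to sort it: each entity could then be merged by scanning the rows with the same uuid sequentially. However, a query like this: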
    SELECT * FROM user_properties ORDER BY uuid;
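exceeds the available resources in BigQuery (using allowLargeResults forbids ORDER BY). It almost seems that I cannot sort a large table (4GB) in BigQuery unless I subscribe to a high-end machine. Any ideas?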
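The sequential merge described in the question could look roughly like the sketch below. It assumes the rows already arrive sorted by uuid as `(uuid, property, value)` tuples; the function name `merge_sorted_rows` and the sample rows (including the second uuid) are hypothetical and only illustrate the idea.

```python
from itertools import groupby
from operator import itemgetter


def merge_sorted_rows(rows):
    """Yield one dict per uuid from (uuid, property, value) rows sorted by uuid."""
    # groupby only merges adjacent rows, which is why the input must be sorted by uuid.
    for uuid, group in groupby(rows, key=itemgetter(0)):
        entity = {"uuid": uuid}
        for _, prop, value in group:
            entity[prop] = value
        yield entity


# Hypothetical sample input, already ordered by uuid.
sample_rows = [
    ("abc", "first_name", "John"),
    ("abc", "last_name", "Connor"),
    ("abc", "age", "26"),
    ("xyz", "age", "31"),
    ("xyz", "first_name", "Sarah"),
]

merged = list(merge_sorted_rows(sample_rows))
```

Because only one group is held in memory at a time, this approach would not need the whole table loaded at once, but it does depend on getting sorted output in the first place.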
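    -- Conditional aggregation: each MAX(IF(...)) sees only the value of one
    -- property per uuid (NULL otherwise), so no sorting is needed.
    SELECT
      uuid,
      MAX(IF(property = 'first_name', VALUE, NULL)) AS first_name,
      MAX(IF(property = 'last_name', VALUE, NULL)) AS last_name,
      MAX(IF(property = 'age', VALUE, NULL)) AS age
    FROM user_properties
    GROUP BY uuid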
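To try the conditional-aggregation idea locally, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for BigQuery. This is an assumption made purely so the example runs without a BigQuery project: `CASE WHEN ... END` replaces `IF(..., ..., NULL)`, and the rows for the second uuid are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_properties (uuid TEXT, property TEXT, value TEXT)")

# Rows are deliberately interleaved: the pivot does not rely on input order.
conn.executemany(
    "INSERT INTO user_properties VALUES (?, ?, ?)",
    [
        ("abc", "first_name", "John"),
        ("xyz", "age", "31"),
        ("abc", "last_name", "Connor"),
        ("xyz", "first_name", "Sarah"),
        ("abc", "age", "26"),
        ("xyz", "last_name", "Reese"),
    ],
)

# CASE without ELSE yields NULL, and MAX ignores NULLs, so each column
# ends up holding the single matching value for that uuid.
pivot_sql = """
SELECT
  uuid,
  MAX(CASE WHEN property = 'first_name' THEN value END) AS first_name,
  MAX(CASE WHEN property = 'last_name'  THEN value END) AS last_name,
  MAX(CASE WHEN property = 'age'        THEN value END) AS age
FROM user_properties
GROUP BY uuid
ORDER BY uuid
"""
pivoted = conn.execute(pivot_sql).fetchall()
conn.close()
```

The final `ORDER BY uuid` is there only to make the small result deterministic; it is not part of the pivot itself.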
Another option - no GROUP'ing involved
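    SELECT uuid, first_name, last_name, age
    FROM (
      SELECT
        uuid,
        -- ORDER BY property sorts the names alphabetically (age, first_name,
        -- last_name), so on the 'age' row the next two rows hold the names.
        LEAD(VALUE, 1) OVER(PARTITION BY uuid ORDER BY property) AS first_name,
        LEAD(VALUE, 2) OVER(PARTITION BY uuid ORDER BY property) AS last_name,
        VALUE AS age,
        property = 'age' AS anchor
      FROM user_properties
    )
    -- Keep only the 'age' row of each uuid, which now carries all three values.
    WHERE anchor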
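The offsets 1 and 2 in the `LEAD` calls work only because of how the property names sort. The sketch below emulates a single uuid partition in plain Python to make that dependency visible; it is an illustration of the ordering logic, not of how BigQuery executes window functions, and the variable names are hypothetical.

```python
# One uuid partition, in arbitrary input order.
partition = [
    ("abc", "first_name", "John"),
    ("abc", "last_name", "Connor"),
    ("abc", "age", "26"),
]

# Equivalent of ORDER BY property inside the window.
ordered = sorted(partition, key=lambda row: row[1])
property_order = [prop for _, prop, _ in ordered]
values = [value for _, _, value in ordered]

# The anchor row is the one where property = 'age'.
anchor = property_order.index("age")

# LEAD(value, 1) and LEAD(value, 2) read one and two rows after the anchor.
lead_result = {
    "age": values[anchor],
    "first_name": values[anchor + 1],
    "last_name": values[anchor + 2],
}
```

If a property were added or renamed so that the alphabetical order changed, the offsets would need to be adjusted, whereas the `GROUP BY` version is unaffected by naming.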