Using KeyValueGroupedDataset cogroup in Spark
I want to use the `cogroup` method on a `KeyValueGroupedDataset` in Spark. Here is my Scala attempt, but it fails to compile:
```scala
import org.apache.spark.sql.functions._

val x1 = Seq(("a", 36), ("b", 33), ("c", 40), ("a", 38), ("c", 39)).toDS
val g1 = x1.groupByKey(_._1)
val x2 = Seq(("a", "ali"), ("b", "bob"), ("c", "celine"), ("a", "amin"), ("c", "cecile")).toDS
val g2 = x2.groupByKey(_._1)
val cog = g1.cogroup(g2, (k: Long, iter1: Iterator[(String, Int)], iter2: Iterator[(String, String)]) => iter1)
```
But I get this error:
```
<console>:34: error: overloaded method value cogroup with alternatives:
  [U, R](other: org.apache.spark.sql.KeyValueGroupedDataset[String,U], f: org.apache.spark.api.java.function.CoGroupFunction[String,(String, Int),U,R], encoder: org.apache.spark.sql.Encoder[R])org.apache.spark.sql.Dataset[R]
  [U, R](other: org.apache.spark.sql.KeyValueGroupedDataset[String,U])(f: (String, Iterator[(String, Int)], Iterator[U]) => TraversableOnce[R])(implicit evidence$11: org.apache.spark.sql.Encoder[R])org.apache.spark.sql.Dataset[R]
 cannot be applied to (org.apache.spark.sql.KeyValueGroupedDataset[String,(String, String)], (Long, Iterator[(String, Int)], Iterator[(String, String)]) => Iterator[(String, Int)])
       val cog = g1.cogroup(g2, (k: Long, iter1:Iterator[(String, Int)], iter2:Iterator[(String, String)]) => iter1);
```
I get the same error in Java.
The Scala overload of `cogroup` you are trying to call takes its arguments in two parameter lists (the other dataset first, then the function), and the key type is `String`, not `Long`:
```scala
g1.cogroup(g2)(
  (k: String, it1: Iterator[(String, Int)], it2: Iterator[(String, String)]) => it1
)
```
or simply:
```scala
g1.cogroup(g2)((_, it1, _) => it1)
```
In Java, I would use the first overload listed in the error message, passing a `CoGroupFunction` together with an explicit `Encoder` for the result (a sketch; adjust the encoder to your actual result type):

```java
Dataset<Tuple2<String, Integer>> cog = g1.cogroup(
    g2,
    (CoGroupFunction<String, Tuple2<String, Integer>, Tuple2<String, String>, Tuple2<String, Integer>>)
        (key, it1, it2) -> it1,
    Encoders.tuple(Encoders.STRING(), Encoders.INT()));
```

where `g1` and `g2` are the Java-side `KeyValueGroupedDataset` values built with `groupByKey`, and `Encoders.tuple(...)` tells Spark how to serialize the resulting tuples.
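In either language, `cogroup` hands your function, per key, the two iterators of rows that share that key, including keys present on only one side. To make that semantics concrete without a Spark cluster, here is a minimal plain-Java sketch with a hypothetical `cogroup` helper over in-memory maps (this is an illustration of what the operation computes, not the Spark API):

```java
import java.util.*;

public class CogroupSketch {
    // Hypothetical helper: for every key present on either side, pair up the
    // left-side rows and right-side rows for that key. Missing sides become
    // empty lists, mirroring the empty iterators Spark's cogroup passes in.
    static <K, V, U> Map<K, Map.Entry<List<V>, List<U>>> cogroup(
            Map<K, List<V>> left, Map<K, List<U>> right) {
        Set<K> keys = new HashSet<>(left.keySet());
        keys.addAll(right.keySet());
        Map<K, Map.Entry<List<V>, List<U>>> out = new HashMap<>();
        for (K k : keys) {
            out.put(k, Map.entry(
                left.getOrDefault(k, List.of()),
                right.getOrDefault(k, List.of())));
        }
        return out;
    }

    public static void main(String[] args) {
        // Same shape of data as the question: ages on the left, names on the right.
        Map<String, List<Integer>> ages = Map.of("a", List.of(36, 38), "b", List.of(33));
        Map<String, List<String>> names = Map.of("a", List.of("ali", "amin"), "c", List.of("celine"));
        // For key "a" both groups are non-empty; for "b" and "c" one side is empty.
        System.out.println(cogroup(ages, names));
    }
}
```

The example in the question keeps only `it1`, which corresponds to reading just the left list of each pair here.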