Generic T as Spark Dataset[T] constructor
In the following snippet, the tryParquet function tries to load a Dataset[T] from a Parquet file at the given path. If the file cannot be read, it writes out the Dataset it was given and returns it unchanged:

    import scala.util.{Try, Success, Failure}
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.Dataset

    sealed trait CustomRow

    case class MyRow(
      id: Int,
      name: String
    ) extends CustomRow

    val ds: Dataset[MyRow] =
      Seq((1, "foo"), (2, "bar"), (3, "baz")).toDF("id", "name").as[MyRow]

    def tryParquet[T <: CustomRow](session: SparkSession, path: String, target: Dataset[T]): Dataset[T] =
      Try(session.read.parquet(path)) match {
        case Success(df) => df.as[T] // <---- compile error here
        case Failure(_) => {
          target.write.parquet(path)
          target
        }
      }

    val readyDS: Dataset[MyRow] =
      tryParquet(spark, "/path/to/file.parq", ds)
However, this produces a compile error on df.as[T]:
Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._
Support for serializing other types will be added in future releases.
case Success(df) => df.as[T]
One could work around this by having tryParquet return an untyped DataFrame and letting the caller cast the result to the desired row type, as in the sketch below.
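For illustration, a minimal sketch of that workaround; the name tryParquetRaw is hypothetical, and the caller is responsible for re-typing the result with .as[MyRow]:

    import scala.util.{Try, Success, Failure}
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Hypothetical helper: works purely on untyped DataFrames, so no Encoder is needed inside.
    def tryParquetRaw(session: SparkSession, path: String, target: DataFrame): DataFrame =
      Try(session.read.parquet(path)) match {
        case Success(df) => df
        case Failure(_) =>
          target.write.parquet(path)
          target
      }

    // The caller re-applies the row type (spark.implicits._ supplies Encoder[MyRow]):
    // val readyDS: Dataset[MyRow] = tryParquetRaw(spark, "/path/to/file.parq", ds.toDF()).as[MyRow]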
However, is there a way for the function to set the type internally? It looks like this could be done by requiring an Encoder in the type parameter.
That way, the compiler could prove that an Encoder[T] is available whenever df.as[T] is constructed.
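A minimal sketch of how that could look, assuming an Encoder context bound on T is acceptable for this use case:

    import scala.util.{Try, Success, Failure}
    import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

    // The context bound T : Encoder asks callers to provide an implicit Encoder[T],
    // which is what df.as[T] needs to resolve inside the function body.
    def tryParquet[T <: CustomRow : Encoder](session: SparkSession, path: String, target: Dataset[T]): Dataset[T] =
      Try(session.read.parquet(path)) match {
        case Success(df) => df.as[T]
        case Failure(_) =>
          target.write.parquet(path)
          target
      }

    // At the call site, import spark.implicits._ derives Encoder[MyRow] for the case class:
    // val readyDS: Dataset[MyRow] = tryParquet(spark, "/path/to/file.parq", ds)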