Python: How to save the returned values of a UDF into two columns?

My function get_data returns a tuple of two integer values.

get_data_udf = udf(lambda id: get_data(spark, id), (IntegerType(), IntegerType()))

I need to split them into two columns, val1 and val2. How can I do that?

dfnew = df \
    .withColumn("val", get_data_udf(col("id")))

Should I save the tuple in a single column, e.g. val, and then somehow split it into two columns? Or is there a shorter way?

You can declare the UDF's return type as a StructType made of StructFields, so that each field can be accessed by name later.

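from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType, StructField, StructType

# Return type: a struct with two named integer fields.
schema = StructType([StructField('first', IntegerType()), StructField('second', IntegerType())])
get_data_udf = udf(lambda id: get_data(spark, id), schema)

# Store the struct in "val", then pull each field out into its own column.
dfnew = df \
    .withColumn("val", get_data_udf(col("id"))) \
    .select('*', col('val.first').alias('first'), col('val.second').alias('second'))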

For example, suppose you have a sample DataFrame with a single column, like this:

val df = sc.parallelize(Seq(3)).toDF()
df.show()

def tupleFunction(): (Int, Int) = (1, 2)

// We will create two new columns based on the UDF above

df.withColumn("newCol", typedLit(tupleFunction().toString.replace("(", "").replace(")", "").split(",")))
  .select((0 to 1).map(i => col("newCol").getItem(i).alias(s"newColFromTuple$i")): _*)
  .show

A tuple can be indexed just like a list, so you can use get_data()[0] as the value for the first column and get_data()[1] as the value for the second column.

You can also write v1, v2 = get_data(), which assigns the two values of the returned tuple to the variables v1 and v2.

See this question for further clarification.
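
The indexing and unpacking described in this answer can be sketched in plain Python. The get_data below is a hypothetical stand-in that takes a single id and returns two integers; the real function from the question takes a Spark session as well, and its logic is not shown in the thread.

```python
def get_data(record_id):
    # Hypothetical stand-in for the question's get_data:
    # returns a tuple of two integers derived from the id.
    return record_id * 2, record_id + 1

# Option 1: index into the returned tuple.
result = get_data(3)
val1 = result[0]
val2 = result[1]

# Option 2: unpack the tuple directly into two variables.
v1, v2 = get_data(3)
```

Both forms operate on an ordinary Python tuple. Inside a Spark job the values still have to end up in DataFrame columns, so this only covers the Python side of the problem.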
