Pandas DataFrame.assign arguments

Question

How can assign be used to return a copy of the original DataFrame with multiple new columns added?

Desired result

>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
>>> df.assign({'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})
   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28

Attempt

The example above results in:

ValueError: Wrong number of items passed 2, placement implies 1

Background

The assign function in pandas returns a copy of the DataFrame with the newly assigned column added, e.g.

>>> df = df.assign(C=df.B * 2)
>>> df
   A   B   C
0  1  11  22
1  2  12  24
2  3  13  26
3  4  14  28
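
The copy behaviour described above can be checked with a minimal sketch. The variable name with_c is made up for illustration; the point is only that assign leaves the original DataFrame untouched unless the result is bound back to the same name.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# assign returns a new DataFrame; df itself is not modified.
with_c = df.assign(C=df['B'] * 2)

print(list(df.columns))      # ['A', 'B']
print(list(with_c.columns))  # ['A', 'B', 'C']
```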

The 0.19.2 documentation for this function implies that more than one column can be added to the DataFrame.

Assigning multiple columns within the same assign is possible, but you cannot reference other columns created within the same assign call.

In addition:

Parameters:
kwargs : keyword, value pairs

keywords are the column names.

The source code for the function states that it accepts a dictionary:

def assign(self, **kwargs):
    """
    .. versionadded:: 0.16.0
    Parameters
    ----------
    kwargs : keyword, value pairs
        keywords are the column names. If the values are callable, they are computed
        on the DataFrame and assigned to the new columns. If the values are not callable,
        (e.g. a Series, scalar, or array), they are simply assigned.

    Notes
    -----
    Since ``kwargs`` is a dictionary, the order of your
    arguments may not be preserved. The make things predicatable,
    the columns are inserted in alphabetical order, at the end of
    your DataFrame. Assigning multiple columns within the same
    ``assign`` is possible, but you cannot reference other columns
    created within the same ``assign`` call.
    """

    data = self.copy()

    # do all calculations first...
    results = {}
    for k, v in kwargs.items():

        if callable(v):
            results[k] = v(data)
        else:
            results[k] = v

    # ... and then assign
    for k, v in sorted(results.items()):
        data[k] = v

    return data
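
The docstring above distinguishes callable from non-callable values. As a small sketch of that distinction (the column names C, D and E and the variable names are chosen only for illustration), a callable is invoked with the DataFrame and its return value becomes the column, while anything else is assigned directly:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})

# Callable values: each lambda receives the DataFrame and returns the new column.
computed = df.assign(C=lambda d: d['A'] ** 2, D=lambda d: d['B'] * 2)

# Non-callable value: a scalar is simply broadcast to every row.
constant = df.assign(E=0)

print(computed)
print(constant)
```

Using callables is mostly useful in method chains, where the intermediate DataFrame has no name of its own to refer to.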


You can create multiple columns by supplying each new column as a keyword argument:

df = df.assign(C=df['A']**2, D=df.B*2)

I got your example dictionary to work by unpacking it into keyword arguments with **:

df = df.assign(**{'C': df.A.apply(lambda x: x ** 2), 'D': df.B * 2})

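It seems like assign should be able to take a dictionary, but based on the source code you posted, that does not currently appear to be supported.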
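
If the columns naturally live in a dictionary, the unpacking can be wrapped once. The sketch below uses a hypothetical helper, assign_from_dict, which is not part of pandas; it only forwards the dictionary with **. It also shows that the positional form is rejected, without asserting the exception type, since that may differ between pandas versions.

```python
import pandas as pd


def assign_from_dict(frame, new_columns):
    """Hypothetical helper: forward a dict of columns to DataFrame.assign."""
    # ** unpacking requires every key to be a string.
    return frame.assign(**new_columns)


df = pd.DataFrame({'A': range(1, 5), 'B': range(11, 15)})
new_columns = {'C': df['A'] ** 2, 'D': df['B'] * 2}

result = assign_from_dict(df, new_columns)
print(result)

# Passing the dict positionally fails because assign only accepts keyword arguments.
try:
    df.assign(new_columns)
    positional_ok = True
except Exception as exc:
    positional_ok = False
    print(type(exc).__name__, exc)
```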

Resulting output:

   A   B   C   D
0  1  11   1  22
1  2  12   4  24
2  3  13   9  26
3  4  14  16  28