Using Numpy Vectorize on Functions that Return Vectors

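numpy.vectorize takes a function f: a -> b and turns it into g: a[] -> b[].

This works fine when a and b are scalars, but I can't think of a reason why it wouldn't work with b as an ndarray or list, i.e. f: a -> b[] and g: a[] -> b[][].

For example: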

    import numpy as np

    def f(x):
        return x * np.array([1,1,1,1,1], dtype=np.float32)

    g = np.vectorize(f, otypes=[np.ndarray])
    a = np.arange(4)
    print(g(a))

This yields:

    array([[ 0.  0.  0.  0.  0.],
           [ 1.  1.  1.  1.  1.],
           [ 2.  2.  2.  2.  2.],
           [ 3.  3.  3.  3.  3.]], dtype=object)

OK, so that gives the right values, but the wrong dtype. And even worse:

    g(a).shape

yields:

    (4,)

So this array is pretty much useless. I know I can convert it with:

    np.array(list(map(list, g(a))), dtype=np.float32)

which gives me what I want:

    array([[ 0.,  0.,  0.,  0.,  0.],
           [ 1.,  1.,  1.,  1.,  1.],
           [ 2.,  2.,  2.,  2.,  2.],
           [ 3.,  3.,  3.,  3.,  3.]], dtype=float32)

But this is neither efficient nor Pythonic. Can any of you find a cleaner way to do this?

Thanks in advance!

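np.vectorize is just a convenience function. It doesn't actually make code run any faster. If using np.vectorize isn't convenient, simply write your own function that works the way you want.

The purpose of np.vectorize is to turn functions that are not numpy-aware (e.g. ones that take floats as input and return floats as output) into functions that can operate on (and return) numpy arrays.

Your function f is already numpy-aware: it uses a numpy array in its definition and returns a numpy array. So np.vectorize is not a good fit for your use case.

The solution, therefore, is to roll your own function f that works the way you desire.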
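As a rough illustration of "rolling your own", here is a minimal sketch that calls f once per input element and writes each result into a preallocated array. The helper name apply_rowwise and its out_len and dtype parameters are made up for this example; they are not part of NumPy.

```python
import numpy as np

def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

def apply_rowwise(func, values, out_len, dtype=np.float32):
    """Hypothetical helper: call func on each scalar and pack the results."""
    values = np.asarray(values)
    # One extra trailing axis of length out_len holds each returned vector.
    out = np.empty(values.shape + (out_len,), dtype=dtype)
    for index in np.ndindex(values.shape):
        out[index] = func(values[index])
    return out

result = apply_rowwise(f, np.arange(4), out_len=5)
print(result.shape)  # (4, 5)
print(result.dtype)  # float32
```

This is still a Python-level loop, so it is no faster than np.vectorize; it just gives you a proper 2-D array with the dtype you asked for.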

The new signature parameter added in NumPy 1.12.0 does exactly what you want.

    def f(x):
        return x * np.array([1,1,1,1,1], dtype=np.float32)

    g = np.vectorize(f, signature='()->(n)')

Then g(np.arange(4)).shape gives (4, 5) (displayed as (4L, 5L) on some Python 2 builds).

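Here the signature of f is specified. The (n) is the shape of the return value, and the () is the shape of the parameter, which is a scalar. The parameters can be arrays too. For more complex signatures, see the Generalized Universal Function API.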
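To make the signature notation a bit more concrete, here is a small sketch (it needs NumPy 1.12 or later). The second function, span, is invented for this example to show the reverse case, a vector argument with a scalar result.

```python
import numpy as np

def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

# Scalar in, length-n vector out: the output gains one trailing axis.
g = np.vectorize(f, signature='()->(n)')
out_1d = g(np.arange(4))
out_2d = g(np.ones((2, 3)))
print(out_1d.shape)  # (4, 5)
print(out_2d.shape)  # (2, 3, 5)

# Length-n vector in, scalar out: the last input axis is consumed.
def span(v):
    return v.max() - v.min()

h = np.vectorize(span, signature='(n)->()')
rows = np.array([[1, 5, 2], [10, 10, 7]])
print(h(rows))  # [4 3]
```

The loop dimensions of the input are kept and the core dimensions named in the signature are appended (or removed), which is why the 2-D input yields a 3-D result.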

    import numpy as np

    def f(x):
        return x * np.array([1,1,1,1,1], dtype=np.float32)

    g = np.vectorize(f, otypes=[np.ndarray])
    a = np.arange(4)
    b = g(a)
    b = np.array(b.tolist())
    print(b)  # b.shape = (4,5)
    c = np.ones((2,3,4))
    d = g(c)
    d = np.array(d.tolist())
    print(d)  # d.shape = (2,3,4,5)

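This should fix the problem, and it works regardless of the input's shape. "map" only works for one-dimensional inputs. Using ".tolist()" and creating a new ndarray solves the problem more completely and more cleanly (I believe). Hope this helps.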
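If you'd rather not round-trip through nested Python lists, a similar unpacking can be sketched with np.stack. The helper name stack_object_array is hypothetical, and the sketch assumes a non-empty object array whose elements all have the same shape.

```python
import numpy as np

def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

def stack_object_array(obj_arr):
    """Hypothetical helper: turn an object array of equal-shaped arrays
    into one regular ndarray with the element axes appended."""
    flat = [np.asarray(item) for item in obj_arr.ravel()]
    stacked = np.stack(flat)  # shape: (number of elements,) + element shape
    return stacked.reshape(obj_arr.shape + stacked.shape[1:])

g = np.vectorize(f, otypes=[object])
d = stack_object_array(g(np.ones((2, 3, 4))))
print(d.shape)  # (2, 3, 4, 5)
```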

I've written a function that seems to fit your needs.

    import numpy as np

    def amap(func, *args):
        '''array version of built-in map
        amap(function, sequence[, sequence, ...]) -> array
        Examples
        --------
        >>> amap(lambda x: x**2, 1)
        array(1)
        >>> amap(lambda x: x**2, [1, 2])
        array([1, 4])
        >>> amap(lambda x,y: y**2 + x**2, 1, [1, 2])
        array([2, 5])
        >>> amap(lambda x: (x, x), 1)
        array([1, 1])
        >>> amap(lambda x,y: [x**2, y**2], [1,2], [3,4])
        array([[1, 9], [4, 16]])
        '''
        # The leading None is a dummy operand; arg[1:] drops it again below.
        args = np.broadcast(None, *args)
        res = np.array([func(*arg[1:]) for arg in args])
        # Broadcast shape of the inputs, followed by the shape of one result.
        shape = args.shape + res.shape[1:]
        return res.reshape(shape)

Let's try it:

    def f(x):
        return x * np.array([1,1,1,1,1], dtype=np.float32)

    amap(f, np.arange(4))

Output:

    array([[ 0.,  0.,  0.,  0.,  0.],
           [ 1.,  1.,  1.,  1.,  1.],
           [ 2.,  2.,  2.,  2.,  2.],
           [ 3.,  3.,  3.,  3.,  3.]], dtype=float32)

You may also wrap it with a lambda or functools.partial for convenience:

    g = lambda x: amap(f, x)
    g(np.arange(4))

Note that the docstring of vectorize says:

The vectorize function is provided primarily for convenience, not for
performance. The implementation is essentially a for loop.

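Thus we would expect amap here to have performance similar to vectorize. I didn't check it; any performance tests are welcome.

If performance really matters, you should consider something else, e.g. doing the array calculation directly in pure numpy, using reshape and broadcasting, to avoid a loop in pure Python (both vectorize and amap are the latter case).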
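For anyone who wants to run that performance test, here is a minimal timing sketch using timeit. The input size and repeat count are arbitrary choices, amap is copied from above in compact form, and no timings are claimed here since they depend on the machine and NumPy version.

```python
import timeit
import numpy as np

ONES = np.array([1, 1, 1, 1, 1], dtype=np.float32)

def f(x):
    return x * ONES

def amap(func, *args):  # compact copy of the function above
    args = np.broadcast(None, *args)
    res = np.array([func(*arg[1:]) for arg in args])
    return res.reshape(args.shape + res.shape[1:])

g = np.vectorize(f, signature='()->(n)')
a = np.arange(200)

candidates = {
    'vectorize': lambda: g(a),                  # Python-level loop
    'amap': lambda: amap(f, a),                 # Python-level loop
    'broadcast': lambda: f(a[:, np.newaxis]),   # single array operation
}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=5)
    print(f"{name:10s} {seconds:.4f} s for 5 calls")
```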

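The best way to solve this is to use a 2-D NumPy array (in this case a column array) as the input to the original function, which will then produce a 2-D output with the results I believe you were expecting.

Here is what it might look like in code:

    import numpy as np

    def f(x):
        return x*np.array([1, 1, 1, 1, 1], dtype=np.float32)

    a = np.arange(4).reshape((4, 1))
    b = f(a)
    # b is a 2-D array with shape (4, 5)
    print(b)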

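This is a much simpler and less error-prone way to complete the operation. Rather than trying to transform the function with numpy.vectorize, this method relies on NumPy's natural ability to broadcast arrays. The trick is to make sure the shapes are compatible: comparing dimensions from the last one backwards, each pair must either be equal or have one of them equal to 1.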
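The same idea can be extended to inputs of any dimension by appending a length-1 axis instead of reshaping to a fixed shape. This is only a sketch: the wrapper name f_elementwise is invented here, and np.broadcast_shapes needs NumPy 1.20 or later.

```python
import numpy as np

def f(x):
    return x * np.array([1, 1, 1, 1, 1], dtype=np.float32)

def f_elementwise(x):
    """Hypothetical wrapper: add a trailing length-1 axis so that every
    input element broadcasts against the length-5 vector inside f."""
    x = np.asarray(x)
    return f(x[..., np.newaxis])

print(f_elementwise(np.arange(4)).shape)        # (4, 5)
print(f_elementwise(np.ones((2, 3, 4))).shape)  # (2, 3, 4, 5)

# The broadcasting rule itself: (4, 1) combined with (5,) gives (4, 5).
print(np.broadcast_shapes((4, 1), (5,)))        # (4, 5)
```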