Mapping a NumPy array in place
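Is it possible to map a NumPy array in place? If so, how?
Given a 2D array a_values, this is the code that does the job for me at the moment: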
    for row in range(len(a_values)):
        for col in range(len(a_values[0])):
            a_values[row][col] = dim(a_values[row][col])
But it is so ugly that I suspect there must be a function somewhere in NumPy that does the same thing with something like:
    a_values.map_in_place(dim)
If something like that exists, I have been unable to find it.
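Doing this in place is only worth it if you are under significant space constraints. If you are, you can speed the code up a little by iterating over a flattened view of the array. Since reshape(-1) returns a view whenever it can, the data itself is not copied.
I don't know of a better way to get a genuinely in-place application of an arbitrary Python function.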
    >>> import numpy
    >>> def flat_for(a, f):
    ...     a = a.reshape(-1)
    ...     for i, v in enumerate(a):
    ...         a[i] = f(v)
    ...
    >>> a = numpy.arange(25).reshape(5, 5)
    >>> flat_for(a, lambda x: x + 5)
    >>> a
    array([[ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14],
           [15, 16, 17, 18, 19],
           [20, 21, 22, 23, 24],
           [25, 26, 27, 28, 29]])
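One caveat with the flattened-view approach: for a non-contiguous array (a strided slice, for example) reshape(-1) has to return a copy, and the loop then updates the copy rather than the original. A possible alternative, sketched here as an illustration and not taken from the thread, is numpy.nditer with the readwrite flag, which writes through to the original buffer in both cases. The helper name nditer_map is made up for this example.

```python
import numpy as np

def nditer_map(a, f):
    """Apply f to every element of a, writing each result back into a."""
    # op_flags=["readwrite"] makes each yielded 0-d array writable;
    # the context manager form needs NumPy 1.15 or later.
    with np.nditer(a, op_flags=["readwrite"]) as it:
        for x in it:
            x[...] = f(x.item())  # x.item() hands f a plain Python scalar

a = np.arange(25).reshape(5, 5)
nditer_map(a, lambda v: v + 5)      # contiguous array

cols = a[:, ::2]                    # non-contiguous view of columns 0, 2, 4
nditer_map(cols, lambda v: -v)      # still modifies a itself
print(a[0])
```

This is still a Python-level loop, so it should not be expected to beat flat_for on speed; the point is only that it stays in place for views.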
Some timings:
    >>> a = numpy.arange(2500).reshape(50, 50)
    >>> f = lambda x: x + 5
    >>> %timeit flat_for(a, f)
    1000 loops, best of 3: 1.86 ms per loop
That is about twice as fast as the nested-loop version:
    >>> a = numpy.arange(2500).reshape(50, 50)
    >>> def nested_for(a, f):
    ...     for i in range(len(a)):
    ...         for j in range(len(a[0])):
    ...             a[i][j] = f(a[i][j])
    ...
    >>> %timeit nested_for(a, f)
    100 loops, best of 3: 3.79 ms per loop
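The %timeit lines above are IPython magics. Outside IPython, roughly the same comparison can be made with the standard timeit module. This is a minimal sketch; the repeat and number values are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

def flat_for(a, f):
    a = a.reshape(-1)              # flattened view of a contiguous array
    for i, v in enumerate(a):
        a[i] = f(v)

def nested_for(a, f):
    for i in range(len(a)):
        for j in range(len(a[0])):
            a[i][j] = f(a[i][j])

f = lambda x: x + 5
a = np.arange(2500).reshape(50, 50)

# Best of 3 runs of 10 calls each, reported per call.
t_flat = min(timeit.repeat(lambda: flat_for(a, f), number=10, repeat=3)) / 10
t_nested = min(timeit.repeat(lambda: nested_for(a, f), number=10, repeat=3)) / 10

print("flat_for:   %.3f ms per loop" % (t_flat * 1e3))
print("nested_for: %.3f ms per loop" % (t_nested * 1e3))
```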
Of course vectorize is still faster, so if you can afford to make a copy, use that:
    >>> a = numpy.arange(2500).reshape(50, 50)
    >>> g = numpy.vectorize(lambda x: x + 5)
    >>> %timeit g(a)
    1000 loops, best of 3: 584 us per loop
And if the function can be rewritten using built-in ufuncs, do that instead of using vectorize:
    >>> a = numpy.arange(2500).reshape(50, 50)
    >>> %timeit a + 5
    100000 loops, best of 3: 4.66 us per loop
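As you would expect, NumPy performs operations such as += in place, so you can get the speed of a ufunc together with in-place application at no extra cost.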
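To make the in-place ufunc route concrete, here is a short sketch, added for illustration. It compares the address of the array's buffer before and after the update to show that no new array is allocated for the result.

```python
import numpy as np

a = np.arange(2500).reshape(50, 50)
before = a.__array_interface__["data"][0]   # address of a's data buffer

a += 5                 # augmented assignment writes into a's own buffer
np.add(a, 5, out=a)    # the same operation with an explicit out= argument

after = a.__array_interface__["data"][0]
print(before == after, a[0, 0], a[-1, -1])
```

Note that a = a + 5, by contrast, builds a new array and rebinds the name, so it is not in place.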
By the way, my original answer to this question was ridiculous; it involved vectorizing over indices into the array.
This is a write-up of contributions scattered across answers and
comments, which I wrote after accepting the answer to the question.
Upvotes are always welcome, but if you upvote this answer, please
also upvote those of senderle and (if (s)he writes one) eryksun,
who suggested the methods below.
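Q: Is it possible to map a numpy array in place?
A: Yes, but not with a single array method. You have to write your own code.
The script below compares the various implementations discussed in the thread: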
    import timeit
    from numpy import array, arange, vectorize, rint

    # SETUP
    get_array = lambda side : arange(side**2).reshape(side, side) * 30
    dim = lambda x : int(round(x * 0.67328))

    # TIMER
    def best(fname, reps, side):
        global a
        a = get_array(side)
        t = timeit.Timer('%s(a)' % fname,
                         setup='from __main__ import %s, a' % fname)
        return min(t.repeat(reps, 3))  #low num as in place --> converge to 1

    # FUNCTIONS
    def mac(array_):
        for row in range(len(array_)):
            for col in range(len(array_[0])):
                array_[row][col] = dim(array_[row][col])

    def mac_two(array_):
        li = range(len(array_[0]))
        for row in range(len(array_)):
            for col in li:
                array_[row][col] = int(round(array_[row][col] * 0.67328))

    def mac_three(array_):
        for i, row in enumerate(array_):
            array_[i][:] = [int(round(v * 0.67328)) for v in row]

    def senderle(array_):
        array_ = array_.reshape(-1)
        for i, v in enumerate(array_):
            array_[i] = dim(v)

    def eryksun(array_):
        array_[:] = vectorize(dim)(array_)

    def ufunc_ed(array_):
        multiplied = array_ * 0.67328
        array_[:] = rint(multiplied)

    # MAIN
    r = []
    for fname in ('mac', 'mac_two', 'mac_three', 'senderle', 'eryksun', 'ufunc_ed'):
        print('\nTesting `%s`...' % fname)
        r.append(best(fname, reps=50, side=50))
        # The following is for visually checking the functions returns same results
        tmp = get_array(3)
        eval('%s(tmp)' % fname)
        print(tmp)
    tmp = min(r)/100
    # The format operator is applied inside print() so that the script
    # behaves the same under Python 2 and Python 3.
    print('\n===== ...AND THE WINNER IS... =========================')
    print('  mac (as in question)       :  %.4fms [%.0f%%]' % (r[0]*1000,r[0]/tmp))
    print('  mac (optimised)            :  %.4fms [%.0f%%]' % (r[1]*1000,r[1]/tmp))
    print('  mac (slice-assignment)     :  %.4fms [%.0f%%]' % (r[2]*1000,r[2]/tmp))
    print('  senderle                   :  %.4fms [%.0f%%]' % (r[3]*1000,r[3]/tmp))
    print('  eryksun                    :  %.4fms [%.0f%%]' % (r[4]*1000,r[4]/tmp))
    print('  slice-assignment w/ ufunc  :  %.4fms [%.0f%%]' % (r[5]*1000,r[5]/tmp))
    print('=======================================================\n')
On my system at least, the output of the above script is:
      mac (as in question)       :  88.7411ms [74591%]
      mac (optimised)            :  86.4639ms [72677%]
      mac (slice-assignment)     :  79.8671ms [67132%]
      senderle                   :  85.4590ms [71832%]
      eryksun                    :  13.8662ms [11655%]
      slice-assignment w/ ufunc  :  0.1190ms [100%]
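As you can see, using numpy's ufuncs is more than two orders of magnitude faster than the second-best alternative and almost three orders of magnitude faster than the worst one.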
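The speed comparison only matters if the ufunc version computes the same values as the element-by-element loop. A small self-contained check along these lines can confirm that for the test data. It is a sketch written for this purpose: loop_map and ufunc_map are stand-in names mirroring senderle and ufunc_ed from the script.

```python
import numpy as np

def dim(x):
    return int(round(x * 0.67328))

def loop_map(a):
    flat = a.reshape(-1)            # view of a contiguous array
    for i, v in enumerate(flat):
        flat[i] = dim(v)

def ufunc_map(a):
    a[:] = np.rint(a * 0.67328)     # slice assignment keeps a's buffer and dtype

x = np.arange(9).reshape(3, 3) * 30
y = x.copy()
loop_map(x)
ufunc_map(y)

same = np.array_equal(x, y)
print(same)
print(y)
```

One thing to keep in mind: numpy.rint rounds exact halves to the nearest even value, as does round in Python 3, whereas round in Python 2 rounds halves away from zero. The test data here contains no exact halves, so the distinction does not show up.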
If using a ufunc is not an option, here is a comparison of the other alternatives only:
      mac (as in question)       :  91.5761ms [672%]
      mac (optimised)            :  88.9449ms [653%]
      mac (slice-assignment)     :  80.1032ms [588%]
      senderle                   :  86.3919ms [634%]
      eryksun                    :  13.6259ms [100%]
HTH!
Why not use the numpy implementation together with the out= trick?
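    from numpy import array, arange, vectorize, rint, multiply, round as np_round

    def fmilo(array_):
        # casting='unsafe' is needed on current NumPy to write the float product
        # into the integer array; the product is truncated on the way in.
        np_round(multiply(array_, 0.67328, out=array_, casting='unsafe'), out=array_)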
This gave:
    ===== ...AND THE WINNER IS... =========================
      mac (as in question)       :  80.8470ms [130422%]
      mac (optimised)            :  80.2400ms [129443%]
      mac (slice-assignment)     :  75.5181ms [121825%]
      senderle                   :  78.9380ms [127342%]
      eryksun                    :  11.0800ms [17874%]
      slice-assignment w/ ufunc  :  0.0899ms [145%]
      fmilo                      :  0.0620ms [100%]
    =======================================================
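The out= approach avoids the temporary array that array_ * 0.67328 would create. When the target is an integer array, however, the float product has to be cast on its way into the buffer, before any rounding is applied. A variant that sidesteps this, assuming a float array is acceptable for your data, is sketched below; scale_round_inplace is a name invented for the example.

```python
import numpy as np

def scale_round_inplace(a, factor=0.67328):
    """Scale and round a float array without allocating a temporary."""
    np.multiply(a, factor, out=a)   # product written straight into a
    np.rint(a, out=a)               # rounding written straight into a

a = np.arange(4, dtype=np.float64) * 30   # [0., 30., 60., 90.]
scale_round_inplace(a)
print(a)
```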
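This is just an updated version of mac's write-up, made to run on Python 3.x, with numba and numpy.frompyfunc added.
numpy.frompyfunc takes an arbitrary Python function and returns a function which, when called on a numpy.array, applies the original function element by element.
However, it changes the datatype of the array to object, so it is not in place, and future calculations on that array will be slower.
To avoid this drawback, the test calls numpy.ndarray.astype to convert the datatype back to int.
As a side note:
Numba is not part of Python's standard library and has to be installed separately if you want to test it. In this test it actually does nothing, and if it had been called with @jit(nopython=True) it would have raised an error saying that it cannot optimize anything there. However, since numba can often speed up code written in a functional style, it is included for completeness.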
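The dtype behaviour described above is easy to see in isolation. The following sketch, added for illustration, applies a frompyfunc wrapper, inspects the result, and then writes the values back into the original integer array with slice assignment so that the caller's buffer is the one that gets updated.

```python
import numpy as np

def dim(x):
    return int(round(x * 0.67328))

udim = np.frompyfunc(dim, 1, 1)     # one input, one output

a = np.arange(4) * 30               # integer array [0, 30, 60, 90]
result = udim(a)                    # new array with dtype=object
print(result.dtype)

# Convert back to the original dtype and copy into a's existing buffer.
a[:] = result.astype(a.dtype)
print(a)
```

The intermediate object array is still allocated, so this saves no memory compared with vectorize; it only keeps the original array object and dtype.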
    import timeit
    from numpy import array, arange, vectorize, rint, frompyfunc
    # Requires an older numba release that still provides autojit.
    from numba import autojit

    # SETUP
    get_array = lambda side : arange(side**2).reshape(side, side) * 30
    dim = lambda x : int(round(x * 0.67328))

    # TIMER
    def best(fname, reps, side):
        global a
        a = get_array(side)
        t = timeit.Timer('%s(a)' % fname,
                         setup='from __main__ import %s, a' % fname)
        return min(t.repeat(reps, 3))  #low num as in place --> converge to 1

    # FUNCTIONS
    def mac(array_):
        for row in range(len(array_)):
            for col in range(len(array_[0])):
                array_[row][col] = dim(array_[row][col])

    def mac_two(array_):
        li = range(len(array_[0]))
        for row in range(len(array_)):
            for col in li:
                array_[row][col] = int(round(array_[row][col] * 0.67328))

    def mac_three(array_):
        for i, row in enumerate(array_):
            array_[i][:] = [int(round(v * 0.67328)) for v in row]

    def senderle(array_):
        array_ = array_.reshape(-1)
        for i, v in enumerate(array_):
            array_[i] = dim(v)

    def eryksun(array_):
        array_[:] = vectorize(dim)(array_)

    @autojit
    def numba(array_):
        for row in range(len(array_)):
            for col in range(len(array_[0])):
                array_[row][col] = dim(array_[row][col])

    def ufunc_ed(array_):
        multiplied = array_ * 0.67328
        array_[:] = rint(multiplied)

    def ufunc_frompyfunc(array_):
        udim = frompyfunc(dim,1,1)
        # Rebinds the local name to a new object array; the caller's array
        # is left unchanged and the astype result below is discarded.
        array_ = udim(array_)
        array_.astype("int")

    # MAIN
    r = []
    totest = ('mac', 'mac_two', 'mac_three', 'senderle', 'eryksun', 'numba','ufunc_ed','ufunc_frompyfunc')
    for fname in totest:
        print('\nTesting `%s`...' % fname)
        r.append(best(fname, reps=50, side=50))
        # The following is for visually checking the functions returns same results
        tmp = get_array(3)
        eval('%s(tmp)' % fname)
        print (tmp)
    tmp = min(r)/100
    results = list(zip(totest,r))
    results.sort(key=lambda x: x[1])

    print('\n===== ...AND THE WINNER IS... =========================')
    for name,time in results:
        Out = '{:<34}: {:8.4f}ms [{:5.0f}%]'.format(name,time*1000,time/tmp)
        print(Out)
    print('=======================================================\n')
And finally, the results:
    ===== ...AND THE WINNER IS... =========================
    ufunc_ed                          :   0.3205ms [  100%]
    ufunc_frompyfunc                  :   3.8280ms [ 1194%]
    eryksun                           :   3.8989ms [ 1217%]
    mac_three                         :  21.4538ms [ 6694%]
    senderle                          :  22.6421ms [ 7065%]
    mac_two                           :  24.6230ms [ 7683%]
    mac                               :  26.1463ms [ 8158%]
    numba                             :  27.5041ms [ 8582%]
    =======================================================
If you can't use ufuncs, you should consider using cython.
It is easy to integrate and gives large speed-ups for specific uses of numpy arrays.