Alternatives/faster ways to list.extend in Python?
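I have a large amount of data that I need to extend a list with.
I would like to know what alternative or faster ways there are to do this.
I have already tried both `__iadd__` and `extend`, and each takes quite a while to produce the output.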
```python
from timeit import timeit

raw_data = []
raw_data2 = []
# list(...) so that added_data * i also works on Python 3,
# where range() no longer returns a list
added_data = list(range(100000))


# .__iadd__
def test1():
    for i in range(10):
        raw_data.__iadd__(added_data * i)


# extend
def test2():
    for i in range(10):
        raw_data2.extend(added_data * i)


print(timeit(test1, number=2))
print(timeit(test2, number=2))
```
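I have a feeling that a list comprehension or an array map might be the answer to my question...
I'm not sure whether there is a better way to do this, but using `numpy` and `ctypes` you can preallocate memory for the whole array and then copy each chunk into it with `ctypes.memmove`: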
```python
from timeit import timeit
import ctypes
import numpy


def test_iadd():
    raw_data = []
    added_data = range(1000000)
    for i in range(10):
        raw_data.__iadd__(added_data)


def test_extend():
    raw_data = []
    added_data = range(1000000)
    for i in range(10):
        raw_data.extend(added_data)


def test_memmove():
    # numpy equivalent of range; the dtype is pinned to C long so the
    # element size matches the ctypes array below on every platform
    added_data = numpy.arange(1000000, dtype=ctypes.c_long)
    # make a ctypes array to contain elements
    raw_data = (ctypes.c_long * (len(added_data) * 10))()

    # the address to copy to
    raw_data_addr = ctypes.addressof(raw_data)
    # the length of added_data in bytes
    added_data_len = len(added_data) * ctypes.sizeof(ctypes.c_long)

    for i in range(10):
        # copy data for one section
        ctypes.memmove(raw_data_addr, added_data.ctypes.data, added_data_len)
        # update address to copy to
        raw_data_addr += added_data_len


tests = [test_iadd, test_extend, test_memmove]

for test in tests:
    print('{} {}'.format(test.__name__, timeit(test, number=5)))
```
This code produced the following results on my PC:
```
test_iadd 0.648954868317
test_extend 0.640357971191
test_memmove 0.201567173004
```
This seems to indicate that using `ctypes.memmove` is roughly three times faster in this test.
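To make the preallocate-and-copy idea easier to check on a small input, here is a minimal sketch that uses only `ctypes` (no numpy). The helper name `repeat_with_memmove` is hypothetical and not part of the answer above; it assumes the data are integers that fit in a C `long`.

```python
import ctypes


def repeat_with_memmove(chunk, times):
    """Copy `chunk` (a sequence of ints) `times` times into one preallocated ctypes array."""
    n = len(chunk)
    src = (ctypes.c_long * n)(*chunk)        # source buffer holding one copy
    dst = (ctypes.c_long * (n * times))()    # zero-initialised destination
    nbytes = ctypes.sizeof(src)              # size of one chunk in bytes
    addr = ctypes.addressof(dst)             # where the next chunk is written
    for _ in range(times):
        ctypes.memmove(addr, src, nbytes)    # raw byte copy of one chunk
        addr += nbytes
    return dst


result = repeat_with_memmove(range(5), 3)
print(list(result))  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```

Note that the result is a ctypes array rather than a list; converting it back with `list(...)` costs time again, so the speed-up mainly helps when the typed buffer itself is what you need.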
If you need the data as a list, there is not much to gain:
```python
import timeit
from itertools import repeat, chain

raw_data = []
added_data = range(100000)  # verify data : uncomment: range(5)


def iadd():
    raw_data = []
    for i in range(10):
        raw_data.__iadd__(added_data)
    # print(raw_data)


def extend():
    raw_data = []
    for i in range(10):
        raw_data.extend(added_data)
    # print(raw_data)


def tricked():
    raw_data = list(chain.from_iterable(repeat(added_data, 10)))
    # print(raw_data)


for w, c in (("__iadd__", iadd), ("  extend", extend), (" tricked", tricked)):
    print(w, end=" :")
    print("{:08.8f}".format(timeit.timeit(c, number=200)))
```
Output:
```
# number = 20
__iadd__ : 0.69766775
  extend : 0.69303196  # "fastest"
 tricked : 0.74638002

# number = 200
__iadd__ : 6.94286992  # "fastest"
  extend : 6.96098415
 tricked : 7.46355973
```
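The benchmark's "verify data" comment suggests checking the results on a small input. A minimal sketch of that check follows; the `build_*` function names are hypothetical, and each one returns its list so the three approaches can be compared directly.

```python
from itertools import chain, repeat

added_data = range(5)  # small input so the result is easy to inspect


def build_iadd():
    out = []
    for _ in range(10):
        out += added_data  # in-place add, equivalent to out.__iadd__(added_data)
    return out


def build_extend():
    out = []
    for _ in range(10):
        out.extend(added_data)
    return out


def build_chain():
    return list(chain.from_iterable(repeat(added_data, 10)))


# All three approaches should produce the same 50-element list.
print(build_iadd() == build_extend() == build_chain())  # True
```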
If you do not need all of that, you are better off using ...
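One possible reading, and this is an assumption rather than something the answer states, is that when the combined data only has to be iterated over and never stored, the `chain.from_iterable(repeat(...))` expression from the `tricked` function can be left lazy instead of being wrapped in `list(...)`. A minimal sketch:

```python
from itertools import chain, repeat

added_data = range(100000)

# Lazy: no combined list is built; items are produced one at a time on demand.
lazy = chain.from_iterable(repeat(added_data, 10))

# Consume the iterator once, here by summing every element.
total = sum(lazy)
print(total)  # 49999500000
```

The iterator can only be consumed once, so this fits a single pass over the data; anything that needs indexing or repeated passes still needs a real list.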
Related:
- Answer by Martijn Pieters
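```python
import time

added_data = range(1000000)

tic = time.time()
# nested comprehension: repeat added_data ten times in one flat list
raw_data = [i for x in range(10) for i in added_data]
elapsed_ms = (time.time() - tic) * 1000  # stop the clock before printing

print(len(raw_data))  # printing all 10,000,000 items would swamp the output
print(str(elapsed_ms) + ' ms')
```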