Python generator that groups another iterable into groups of N

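I'm looking for a function that takes an iterable i and a size n and yields tuples of length n that are sequential values from i:

    x = [1,2,3,4,5,6,7,8,9,0]
    [z for z in TheFunc(x,3)]

gives

    [(1,2,3),(4,5,6),(7,8,9),(0,)]

Does such a function exist in the standard library?

If it exists as part of the standard library, I can't seem to find it, and I've run out of terms to search for. I could write my own, but I'd rather not.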

If you want to group an iterator into chunks of n without padding the final group with a fill value, use iter(lambda: list(IT.islice(iterable, n)), []):

    import itertools as IT

    def grouper(n, iterable):
        """
        >>> list(grouper(3, 'ABCDEFG'))
        [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
        """
        iterable = iter(iterable)
        return iter(lambda: list(IT.islice(iterable, n)), [])

    seq = [1,2,3,4,5,6,7]
    print(list(grouper(3, seq)))

yields

    [[1, 2, 3], [4, 5, 6], [7]]

There is an explanation of how it works in the second half of this answer.
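
To make the mechanics a little more concrete, here is a rough sketch of what the two-argument form iter(callable, sentinel) does in this recipe: it calls the callable repeatedly and stops as soon as the result equals the sentinel, which here is the empty list that islice produces once the iterator is exhausted. The name grouper_explicit is made up for this illustration and is not part of the original answer.

```python
import itertools


def grouper_explicit(n, iterable):
    """Hypothetical spelled-out equivalent of
    iter(lambda: list(itertools.islice(iterable, n)), [])."""
    # One shared iterator, so each slice resumes where the previous one stopped.
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, n))  # take up to n items
        if chunk == []:  # the sentinel: nothing was left to take
            return
        yield chunk


result = list(grouper_explicit(3, 'ABCDEFG'))
print(result)
```

The explicit loop is longer, but it shows that the last group is simply whatever remained, with no padding.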

If you want to group an iterator into chunks of n and pad the final group with a fill value, use the grouper recipe zip_longest(*[iterator]*n):

For example, in Python 2:

    >>> list(IT.izip_longest(*[iter(seq)]*3, fillvalue='x'))
    [(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

In Python 3, what was izip_longest is now named zip_longest:

    >>> list(IT.zip_longest(*[iter(seq)]*3, fillvalue='x'))
    [(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]
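
The reason this works is that [iter(seq)]*n is a list holding the same iterator object n times, so zip_longest pulls n consecutive items from it for each output tuple. A small sketch (Python 3; the variable names are illustrative only) contrasting that with n independent iterators:

```python
from itertools import zip_longest

seq = [1, 2, 3, 4, 5, 6, 7]

shared = iter(seq)
args = [shared] * 3  # three references to ONE iterator object
same_object = all(arg is shared for arg in args)
grouped = list(zip_longest(*args, fillvalue='x'))

# With three separate iterators, each one starts from the beginning,
# so every tuple just repeats a single element.
independent = list(zip_longest(*[iter(seq) for _ in range(3)]))

print(same_object)
print(grouped)
print(independent[:2])
```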

When you want to group a sequence into chunks of n, you can use the chunks recipe:

    def chunks(seq, n):
        # https://stackoverflow.com/a/312464/190597 (Ned Batchelder)
        """ Yield successive n-sized chunks from seq."""
        for i in range(0, len(seq), n):  # use xrange in Python 2
            yield seq[i:i + n]

Note that, unlike iterators in general, sequences by definition have a length (i.e. __len__ is defined).
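
As a small illustration of that distinction, the slicing-based recipe works on a list but fails on a generator, because a generator has no len(). This sketch uses range so that it runs on Python 3; the helper is repeated here only to keep the example self-contained.

```python
def chunks(seq, n):
    """Yield successive n-sized chunks from a sequence (needs len() and slicing)."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]


from_list = list(chunks([1, 2, 3, 4, 5, 6, 7], 3))

try:
    list(chunks((x for x in range(7)), 3))
    generator_error = None
except TypeError as exc:
    # A generator object does not define __len__, so len(seq) fails.
    generator_error = exc

print(from_list)
print(type(generator_error).__name__)
```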

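See the grouper recipe in the docs for the itertools package:

    from itertools import zip_longest  # izip_longest in Python 2

    def grouper(n, iterable, fillvalue=None):
        "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
        args = [iter(iterable)] * n
        return zip_longest(fillvalue=fillvalue, *args)

(However, this is a duplicate of quite a few questions.)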

I use the chunked function from the more-itertools package.

    $ pip install more_itertools
    $ python
    >>> import more_itertools
    >>> x = [1,2,3,4,5,6,7,8,9,0]
    >>> [tuple(z) for z in more_itertools.more.chunked(x, 3)]
    [(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]

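This is a very common request in Python, common enough that it made it into the boltons unified utility package. First, there is extensive documentation. In addition, the module is designed and tested to rely only on the standard library (it is compatible with Python 2 and 3), which means you can simply download the file directly into your project.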

    # if you downloaded/embedded, try:
    # from iterutils import chunked

    # with `pip install boltons` use:

    from boltons.iterutils import chunked

    print(chunked(range(10), 3))
    # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

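There is also an iterator/generator form for indefinite/long sequences:

    from boltons.iterutils import chunked_iter

    print(list(chunked_iter(range(10), 3, fill=None)))
    # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]

As you can see, you can also fill out the sequence with a value of your choosing. Finally, as the maintainer, I can assure you that, while the code has been downloaded and tested by thousands of developers, if you do run into any issues you'll get the fastest support possible through the boltons GitHub issues page. Hope this (and/or any of the other 150+ boltons recipes) helps!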

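This is a very old question, but I think the following approach is useful for the general case. Its main advantage is that it only needs to iterate over the data once, so it will work with database cursors or other sequences that can only be consumed once. I also find it more readable.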

    def chunks(n, iterator):
        out = []
        for elem in iterator:
            out.append(elem)
            if len(out) == n:
                yield out
                out = []
        if out:  # avoid an empty trailing chunk when the length is a multiple of n
            yield out
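
To illustrate the single-pass property, here is a sketch that feeds in a one-shot generator standing in for a database cursor. fake_cursor and batched_rows are hypothetical names invented for this example; batched_rows follows the same append-and-yield pattern and only emits the final chunk when it is non-empty.

```python
def fake_cursor(row_count):
    """Hypothetical stand-in for a database cursor: can be iterated only once."""
    for row_id in range(row_count):
        yield {"id": row_id}


def batched_rows(n, iterator):
    """Collect items into lists of up to n, touching each item exactly once."""
    out = []
    for elem in iterator:
        out.append(elem)
        if len(out) == n:
            yield out
            out = []
    if out:  # emit the shorter final batch, if any
        yield out


cursor = fake_cursor(5)
batch_sizes = [len(batch) for batch in batched_rows(2, cursor)]
leftover = list(cursor)  # nothing remains: the cursor was consumed in one pass

print(batch_sizes)
print(leftover)
```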

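Here is a different solution that does not use itertools. Even though it has a few more lines, it apparently outperforms the given answers when the chunks are much shorter than the iterable. For large chunks, however, the other answers are much faster.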

    def batchiter(iterable, batch_size):
        """
        >>> list(batchiter('ABCDEFG', 3))
        [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
        """
        next_batch = []
        for element in iterable:
            next_batch.append(element)
            if len(next_batch) == batch_size:
                batch, next_batch = next_batch, []
                yield batch
        if next_batch:
            yield next_batch

    In [19]: %timeit [b for b in batchiter(range(1000), 3)]
    1000 loops, best of 3: 644 μs per loop

    In [20]: %timeit [b for b in grouper(3, range(1000))]
    1000 loops, best of 3: 897 μs per loop

    In [21]: %timeit [b for b in partition(range(1000), 3)]
    1000 loops, best of 3: 890 μs per loop

    In [22]: %timeit [b for b in batchiter(range(1000), 333)]
    1000 loops, best of 3: 540 μs per loop

    In [23]: %timeit [b for b in grouper(333, range(1000))]
    10000 loops, best of 3: 81.7 μs per loop

    In [24]: %timeit [b for b in partition(range(1000), 333)]
    10000 loops, best of 3: 80.1 μs per loop
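
Timings like these depend heavily on the machine and the Python version, so it is worth re-running them. Below is a minimal sketch using the standard timeit module instead of IPython's %timeit. It assumes that grouper refers to the islice-based version shown earlier in this thread, and it leaves out partition because that function is not shown here. The helper name best_time is made up for the example.

```python
import itertools
import timeit


def batchiter(iterable, batch_size):
    next_batch = []
    for element in iterable:
        next_batch.append(element)
        if len(next_batch) == batch_size:
            batch, next_batch = next_batch, []
            yield batch
    if next_batch:
        yield next_batch


def grouper(n, iterable):
    # Assumption: the islice-based grouper; the original timings may have
    # used a different implementation.
    iterable = iter(iterable)
    return iter(lambda: list(itertools.islice(iterable, n)), [])


def best_time(func, repeat=3, number=100):
    """Best average time per call, in seconds (hypothetical helper)."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


for size in (3, 333):
    t_batch = best_time(lambda: list(batchiter(range(1000), size)))
    t_group = best_time(lambda: list(grouper(size, range(1000))))
    print(f"chunk size {size}: batchiter {t_batch * 1e6:.1f} us, "
          f"grouper {t_group * 1e6:.1f} us")
```

No expected numbers are given here on purpose, since only the relative ordering on your own machine is meaningful.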

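I know this has been answered several times, but I'm adding my solution, which should improve on several points: general applicability to both sequences and iterators, readability (no invisible loop exit condition via a StopIteration exception), and performance compared to the grouper recipe. It is most similar to the last answer by Svein.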

    import itertools

    def chunkify(iterable, n):
        iterable = iter(iterable)
        n_rest = n - 1

        for item in iterable:
            rest = itertools.islice(iterable, n_rest)
            yield itertools.chain((item,), rest)
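
One thing to keep in mind with this approach: each chunk is a lazy iterator that shares the underlying iterator with the generator, so a chunk has to be consumed before the next one is requested. A short sketch of the difference (the function is repeated so that the example is self-contained):

```python
import itertools


def chunkify(iterable, n):
    iterable = iter(iterable)
    for item in iterable:
        rest = itertools.islice(iterable, n - 1)
        yield itertools.chain((item,), rest)


# Consuming each chunk as it arrives gives the expected grouping.
consumed_in_order = [tuple(chunk) for chunk in chunkify(range(7), 3)]

# Collecting the chunk objects first advances the shared iterator by one item
# per chunk, so every chunk ends up holding a single element.
collected_first = [tuple(chunk) for chunk in list(chunkify(range(7), 3))]

print(consumed_in_order)
print(collected_first)
```

If the chunks need to outlive the loop, converting each one to a tuple or list as it is produced avoids the problem.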

    import itertools

    def grouper(iterable, n):
        while True:
            try:
                yield itertools.chain((next(iterable),), itertools.islice(iterable, n-1))
            except StopIteration:  # must not escape a generator (PEP 479, Python 3.7+)
                return