The order of nested list comprehensions and nested generator expressions in Python

I'm new to Python and am confused by a piece of code in the official Python documentation.

unique_words = set(word for line in page for word in line.split())

It seems to me that this is equivalent to:

unique_words = set()
for word in line.split():
    for line in page:
        unique_words.add(word)

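How can line be used in the first loop before it is defined in the nested loop? And yet the code actually works. I think this suggests that the for clauses in nested list comprehensions and generator expressions are read from left to right, which contradicts my previous understanding.

Can anyone clarify the correct order for me?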


word for line in page for word in line.split()

This part works as follows:

for line in page:
    for word in line.split():
        print(word)

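The enclosing () make it a generator expression, which behaves like a generator function, so the overall statement works like this: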

def solve():
    for line in page:
        for word in line.split():
            yield word

set() is used to drop duplicates of the same word, since the code is meant to collect "unique words".
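To check the equivalence described above, here is a minimal sketch. The contents of page are made up for illustration (the original post does not show them), and the helper name iter_words is hypothetical.

```python
# Sample data: an assumption, the original post does not show `page`.
page = ["the quick brown fox", "the lazy dog", "quick quick fox"]

def iter_words(lines):
    # Same loop order as the generator expression: leftmost for clause outermost.
    for line in lines:
        for word in line.split():
            yield word

from_genexp = set(word for line in page for word in line.split())
from_function = set(iter_words(page))

print(from_genexp == from_function)  # True
print(sorted(from_genexp))  # ['brown', 'dog', 'fox', 'lazy', 'quick', 'the']
```

Both forms walk the lines in the same order and yield the same words, so the resulting sets are equal.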


From the tutorial in the official documentation:

A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal:

>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

and it’s equivalent to:

>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

Note how the order of the for and if statements is the same in both these snippets.

See the last sentence of the quote above.
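One way to see that the for clauses are read left to right is to swap them and watch the lookup of line fail. This is a small sketch assuming Python 3 (in Python 2, list comprehension variables leak into the enclosing scope, so the result can differ); the sample page is invented for illustration.

```python
page = ["a b", "c d"]  # sample data, assumed for illustration

# Clauses in the same order as the nested for statements: works.
words = [word for line in page for word in line.split()]
print(words)  # ['a', 'b', 'c', 'd']

# Clauses swapped: `line` is needed by the first clause,
# but no earlier clause has bound it yet.
try:
    swapped = [word for word in line.split() for line in page]
    raised = False
except NameError:
    raised = True
print(raised)  # True under Python 3
```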

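Also note that the construct you describe is not (formally) called a "nested list comprehension". A nested list comprehension is a list comprehension inside another list comprehension, for example (again from the tutorial):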

[[row[i] for row in matrix] for i in range(4)]

What you are asking about is simply a list comprehension with multiple for clauses.
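The difference between the two constructs can be sketched side by side. The 3x4 matrix follows the tutorial's example; the names transposed and flat are chosen here for illustration.

```python
# 3x4 matrix, as used in the tutorial's nested list comprehension example.
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Nested list comprehension: one comprehension inside another.
# The inner one is evaluated once per i, so the result is a list of lists.
transposed = [[row[i] for row in matrix] for i in range(4)]
print(transposed)  # [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

# Single comprehension with two for clauses: one flat list,
# with the leftmost clause acting as the outer loop.
flat = [value for row in matrix for value in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```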


You have the nested loops mixed up. What the code actually does is:

unique_words = set()  # not {}, which would create an empty dict
for line in page:
    for word in line.split():
        unique_words.add(word)

You got the loops the wrong way round. Use this:

unique_words = set(word for line in page for word in line.split())
print(unique_words)

# The same thing written as explicit nested loops.
l = []
for line in page:
    for word in line.split():
        l.append(word)
print(set(l))

Output:

C:\...>python test.py
set(['sdaf', 'sadfa', 'sfsf', 'fsdf', 'fa', 'sdf', 'asd', 'asdf'])
set(['sdaf', 'sadfa', 'sfsf', 'fsdf', 'fa', 'sdf', 'asd', 'asdf'])


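In addition to the correct answers that stress the point about order, I would add that we use set to remove duplicates across the lines in order to produce "unique words". See this and this thread.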

unique_words = set(word for line in page for word in line.split())
print(unique_words)

l = set()  # {} would create an empty dict, which has no add()
for line in page:
    for word in line.split():
        l.add(word)
print(l)
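As a side note, the same result can be written as a set comprehension, which uses the same left-to-right clause order. The sketch below assumes Python 2.7 or later and uses made-up sample lines. It also shows why an explicit loop has to start from set(): {} creates an empty dict, which has no add method.

```python
page = ["asd asdf", "asd sdf"]  # sample data, assumed for illustration

# Set comprehension: braces instead of set(...), same clause order.
unique_words = {word for line in page for word in line.split()}
print(sorted(unique_words))  # ['asd', 'asdf', 'sdf']

# Empty braces are a dict literal, so an empty set must be spelled set().
empty_braces_type = type({}).__name__
empty_set_type = type(set()).__name__
print(empty_braces_type)  # dict
print(empty_set_type)     # set
```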