Python: Best way to find the intersection of multiple sets?

I have a list of sets:

setlist = [s1,s2,s3...]

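I want s1 ∩ s2 ∩ s3 ...

I can write a function to do it by performing a series of pairwise s1.intersection(s2) calls, and so on.

Is there a recommended, better, or built-in way?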


From Python version 2.6 on, you can pass multiple arguments to set.intersection(), for example:

u = set.intersection(s1, s2, s3)

If the sets are in a list, this translates to:

u = set.intersection(*setlist)

where *setlist is list expansion: it unpacks the list into separate positional arguments.
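A minimal sketch of the unpacking call, using a hypothetical helper name `intersect_all` that is not part of the answer above. Because `set.intersection` is called here through the class, the first set in the list acts as the object the method runs on, so an empty list leaves it with no arguments and raises `TypeError`. Returning an empty set in that case is an assumption about what the caller wants.

```python
def intersect_all(setlist):
    """Intersect every set in setlist (hypothetical helper for illustration)."""
    if not setlist:
        # Assumption: an empty input yields an empty set.
        # set.intersection(*[]) would raise TypeError instead.
        return set()
    # *setlist unpacks the list into separate positional arguments,
    # so this is the same as set.intersection(s1, s2, s3, ...).
    return set.intersection(*setlist)


s1, s2, s3 = {1, 2, 3}, {2, 3, 4}, {2, 4, 6}
result = intersect_all([s1, s2, s3])
print(result)
```

The call returns a new set, so s1, s2 and s3 are left unchanged.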


As of 2.6, set.intersection takes arbitrarily many iterables.

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s3 = set([2, 4, 6])
>>> s1 & s2 & s3
set([2])
>>> s1.intersection(s2, s3)
set([2])
>>> sets = [s1, s2, s3]
>>> set.intersection(*sets)
set([2])
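To illustrate the word "iterables": the method form accepts any iterable for its arguments, while the `&` operator needs a set on both sides. A small sketch with made-up values:

```python
s1 = {1, 2, 3}

# The method accepts lists, tuples, ranges or any other iterable.
via_method = s1.intersection([2, 3, 4], (2, 4, 6), range(0, 10))

# The & operator only works between sets, so a list operand fails.
try:
    s1 & [2, 3, 4]
    operator_ok = True
except TypeError:
    operator_ok = False

print(via_method, operator_ok)
```

If the inputs are not already sets, the method form saves converting each one first.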


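Clearly set.intersection is what you want here, but in case you ever need a generalisation of "take the sum of all these", "take the product of all these", or "take the xor of all these", what you are looking for is the reduce function: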

from operator import and_
from functools import reduce
print(reduce(and_, [{1,2,3},{2,3,4},{3,4,5}])) # = {3}

or, written with a lambda:
print(reduce((lambda x,y: x&y), [{1,2,3},{2,3,4},{3,4,5}])) # = {3}
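The same pattern, sketched for the other operations mentioned in this answer, using functions from the standard `operator` module. The input values are made up for illustration.

```python
from functools import reduce
from operator import add, mul, xor, and_

numbers = [1, 2, 3, 4]
total = reduce(add, numbers)      # ((1 + 2) + 3) + 4
product = reduce(mul, numbers)    # ((1 * 2) * 3) * 4
parity = reduce(xor, numbers)     # ((1 ^ 2) ^ 3) ^ 4

sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
common = reduce(and_, sets)       # ({1, 2, 3} & {2, 3, 4}) & {3, 4, 5}

print(total, product, parity, common)
```

Note that reduce raises TypeError on an empty sequence unless an initial value is passed as a third argument.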

If you don't have Python 2.6 or higher, the alternative is to write an explicit for loop:

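def set_list_intersection(set_list):
  """Return the intersection of all sets in set_list."""
  if not set_list:
    return set()
  # Copy the first set so that &= does not modify the caller's data.
  result = set(set_list[0])
  for s in set_list[1:]:
    result &= s
  return result

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print(set_list_intersection(set_list))
# Output: set([1]) on Python 2, {1} on Python 3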
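One detail worth checking in an accumulation loop like this: `&=` on a set updates that set in place, so if the accumulator is the first element itself rather than a copy, the caller's first set gets modified. A small sketch of the difference, using hypothetical function names:

```python
def intersect_aliasing(set_list):
    """Accumulate directly into the first set (mutates the caller's data)."""
    result = set_list[0]
    for s in set_list[1:]:
        result &= s  # in-place update of set_list[0]
    return result


def intersect_copying(set_list):
    """Accumulate into a copy so the inputs stay untouched."""
    result = set(set_list[0])
    for s in set_list[1:]:
        result &= s
    return result


a = [{1, 2}, {1, 3}, {1, 4}]
b = [{1, 2}, {1, 3}, {1, 4}]
r_alias = intersect_aliasing(a)
r_copy = intersect_copying(b)
print(a[0], b[0])
```

Both functions return the same intersection; only the first one changes its input along the way.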

You can also use reduce:

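# reduce is a builtin on Python 2; on Python 3, import it from functools first.
set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print(reduce(lambda s1, s2: s1 & s2, set_list))
# Output: set([1]) on Python 2, {1} on Python 3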

However, many Python programmers dislike it, including Guido himself:

About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches. But, despite of the PR value, I think these features should be cut from Python 3000.

So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.


Here I offer a generic function for multiple set intersection that tries to take advantage of the best method available:

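try:
    from functools import reduce  # Python 2.6+ and Python 3
except ImportError:
    pass  # older Python 2: reduce is a builtin


def multiple_set_intersection(*sets):
    """Return multiple set intersection."""
    try:
        return set.intersection(*sets)
    except TypeError:  # this is Python < 2.6 or no arguments
        pass

    try:
        a_set = sets[0]
    except IndexError:  # no arguments
        return set()  # return empty set

    # Pairwise fallback: apply the one-argument intersection to each
    # remaining set in turn, starting from the first set.
    return reduce(lambda acc, s: acc.intersection(s), sets[1:], a_set)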

Guido might dislike reduce, but I'm kind of fond of it.