Python: Avoiding repeat of code after loop?

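When I use loops, I often end up writing the same code twice. For example, while taking the Udacity computer science course, I wrote this code (a function that finds the element with the longest run of consecutive repeats):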

def longest_repetition(l):
    if not l:
        return None
    most_reps = count = 0
    longest = prv = None
    for i in l:
        if i == prv:
            count += 1
        else:
            if count > most_reps:
                longest = prv
                most_reps = count
            count = 1
        prv = i
    if count > most_reps:
        longest = prv
    return longest

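In this example, I check twice whether count is greater than the repeat count of the most-repeated element so far: once when the current element differs from the previous one, and again when I reach the end of the list.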

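I also run into this when processing a string character by character. Sometimes the repeated part has grown to about 5 lines of code. Is this common, or is it a result of the way I think/code? What should I do?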

Edit: Likewise, in a contrived string-splitting example:

def split_by(string, delimiter):
    rtn = []
    tmp = ''
    for i in string:
        if i == delimiter:
            if tmp != '':
                rtn.append(tmp)
            tmp = ''
        else:
            tmp += i
    if tmp != '':
        rtn.append(tmp)
    return rtn

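Edit: This exam was written for students of the course, who are not expected to have any outside knowledge of Python, only what was taught in previous units. Although I have a fair amount of Python experience, I'm trying to stick to these restrictions to get the most out of the course. Things like str.split, lists, and many Python basics were taught, but nothing about imports yet, and especially nothing like groupby. In other words, how should this be written without any language features that an introductory programming course might not teach?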


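Since you tagged this language-agnostic, I assume you aren't interested in Python-specific features that could make the code more efficient, compact, and readable. For the same reason, I won't show how nicely the code could be written in Python.

In some cases, depending on your algorithm, you can avoid the extra if at the end, but most of the time the rule is: "if it exists, it should be significant and/or efficient." I don't know how the Python interpreter works, but in compiled languages such as C/C++, the compiler performs various loop optimizations, including moving an if block out of the loop when it does the same work on every iteration.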

I ran the various code snippets and compared their running times:

@J.F. Sebastian - 8.9939801693
@srgerg - 3.13302302361
Yours - 2.8182990551

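The trailing if gives the best time here, though that result shouldn't be generalized. My point is: just follow your algorithm and try to optimize it. There is nothing wrong with an if at the end. The alternative solutions may well be more expensive.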

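About the second example you added: the tmp != '' check ensures that only non-empty strings are returned. That is really an additional condition on the splitting algorithm. In any case, you need an extra rtn.append after the loop, because something may remain after the last delimiter. You could always push an if condition into the loop, such as if curCharIndex == lastIndex: push items to list, but it would run on every iteration, and you'd be in the same situation again.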
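A minimal sketch of that in-loop variant, assuming the same semantics as the question's split_by (empty pieces are dropped). The name split_by_index_check is hypothetical. The end-of-input test now runs on every character instead of once after the loop, which is exactly the trade-off described above:

```python
def split_by_index_check(string, delimiter):
    """Split string on delimiter, dropping empty pieces, with no code after the loop."""
    rtn = []
    tmp = ''
    last_index = len(string) - 1
    for index, ch in enumerate(string):
        if ch != delimiter:
            tmp += ch
        # Evaluated for every character, not just once after the loop.
        if ch == delimiter or index == last_index:
            if tmp != '':
                rtn.append(tmp)
            tmp = ''
    return rtn


print(split_by_index_check("a,,bc,d", ","))  # ['a', 'bc', 'd']
```

The duplicated append is gone, but only by paying for an extra comparison on every iteration.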

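My answer, in short:

Your code is as efficient as the algorithm you have in mind.
In many cases you will end up with an if at the end. Don't worry about it; it may make the code more efficient than alternatives without such an if (the example is right here).
Compilers can also detect blocks in your code and modify or move them.
If there is a language feature or library that makes your code fast and readable at the same time, use it. (The other answers here point out what Python offers. :)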


Take a look at the implementation of itertools.groupby; it does almost exactly what you need. http://docs.python.org/library/itertools.html#itertools.groupby

Here is the algorithm using it:

from itertools import groupby

string = "AAABBCCDDDD"

maximum = 0
max_char = ""

# groupby yields (key, group) pairs, one per run of equal characters
for i in groupby(string):
    x, xs = i
    n = len(list(xs))
    if n > maximum:
        max_char = x
        maximum = n

print(max_char)

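My advice is that when you write algorithms like this in the future, try not to do everything in one function. Think of smaller functions that solve parts of the problem you're trying to solve, for example "group each run of equal items in a sequence into a smaller sequence".

Of course, in the algorithm above the items don't have to be characters; they can be anything groupable.

Edit: In response to the OP's edit: I realize you may not be allowed to use or know about libraries like itertools in a class setting, but I'm not suggesting that you rely on external libraries. I'm suggesting that you think about problems by splitting them into smaller subproblems. So in this case, you would implement your own groupby and use it.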
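Following that advice without any imports, a hand-written stand-in for groupby might look like the sketch below. The generator name runs and its (item, run_length) output are illustrative choices, not the itertools API (itertools.groupby yields (key, group_iterator) pairs). Note that the generator still has to flush the final run after its loop; the point is that this now happens once, in a small reusable helper:

```python
def runs(iterable):
    """Yield (item, run_length) for each run of equal consecutive items."""
    started = False
    current = None
    length = 0
    for item in iterable:
        if started and item == current:
            length += 1
        else:
            if started:
                yield current, length
            current = item
            length = 1
            started = True
    if started:
        yield current, length  # flush the final run (the one end-of-loop step)


def longest_repetition(l):
    """Return the item with the longest run, or None for an empty sequence."""
    longest = None
    most_reps = 0
    for item, length in runs(l):
        if length > most_reps:
            longest = item
            most_reps = length
    return longest


print(longest_repetition([1, 2, 2, 3, 3, 3, 2]))  # 3
```

With the run detection factored out, longest_repetition itself no longer repeats any logic after its loop.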


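A language-agnostic technique for avoiding a repeated condition after the loop is to append a sentinel value to the input data. For example, if delimiter is appended to the end of string, the final check in split_by() is no longer needed. The classic example is linear search, where you can append the needle to the haystack to avoid the end-of-sequence check.

Another option is to delegate some of the work to a separate function, e.g. one function counts the repetitions and another finds the maximum, as in this longest_repetition():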
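Both sentinel ideas can be sketched as follows, assuming a single-character delimiter (as in the question's split_by) and a list haystack. The linear_search helper is hypothetical. In Python, copying the list to append the sentinel costs O(n) by itself, so this is illustrative only; in C you would write the needle into a reserved slot at the end of the array instead:

```python
def split_by(string, delimiter):
    """Split on a single-character delimiter, dropping empty pieces."""
    rtn = []
    tmp = ''
    # Sentinel: one extra delimiter flushes the last piece inside the loop.
    for ch in string + delimiter:
        if ch == delimiter:
            if tmp != '':
                rtn.append(tmp)
            tmp = ''
        else:
            tmp += ch
    return rtn


def linear_search(haystack, needle):
    """Return the index of needle in the list haystack, or -1 if absent."""
    items = list(haystack) + [needle]  # sentinel guarantees the loop stops
    index = 0
    while items[index] != needle:  # no separate end-of-list check needed
        index += 1
    return index if index < len(haystack) else -1
```

For example, split_by("a,,bc,d", ",") returns ['a', 'bc', 'd'], and linear_search([4, 7, 9], 5) returns -1 because the only match is the sentinel.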

from itertools import groupby

def longest_repetition(iterable):
    return max(groupby(iterable), key=lambda x: sum(1 for _ in x[1]))[0]

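If the repeated code is trivial, this may not be worth the effort.

I see three general approaches that can help you avoid repeating code at the end of a loop. For all three, I'll use an example slightly different from yours: counting the words in a string. Here is a "default" version that, like your code, repeats some logic at the end of the loop: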

from collections import Counter

def countWords0(text):
    counts = Counter()
    word = ""

    for c in text.lower():
        if c not in "abcdefghijklmnopqrstuvwxyz'-":
            if word:
                counts[word] += 1
            word = ""
        else:
            word += c

    if word:
        counts[word] += 1  # repeated code at end of loop

    return counts

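The first approach is to do (part of) the "end of subsequence" processing after every character, so that the bookkeeping is correct if the sequence ends right after that character. In your example, you could eliminate your "else" condition and run the code inside it every time. (This is srgerg's answer.)

For some kinds of checks, though, this may not be easy. For counting words, you need to add some extra logic to avoid accumulating cruft from the "partial" subsequences you process. Here is the code: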

def countWords1(text):
    counts = Counter()
    word = ""

    for c in text.lower():
        if c not in "abcdefghijklmnopqrstuvwxyz'-":
            word = ""
        else:
            if word:
                counts[word] -= 1  # new extra logic
            word += c
            counts[word] += 1  # this line was moved from above

    return counts + Counter()  # more new stuff, to remove crufty zero-count items

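The second option is to append a sentinel value to the end of the sequence that triggers the desired "end of subsequence" behavior. This is tricky if you need to keep the sentinel from polluting your data (especially with things like numbers). For the longest-consecutive-run problem, you can append any value that is not equal to the last item in the sequence; None might be a good choice. For my word-counting example, a non-word character (such as a newline) will do: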

def countWords2(text):
    counts = Counter()
    word = ""

    for c in text.lower() + "\n":  # NOTE: added a sentinel to the string!
        if c not in "abcdefghijklmnopqrstuvwxyz'-":
            if word:
                counts[word] += 1
            word = ""
        else:
            word += c

    # no need to recheck at the end, since we know we ended with a non-word character

    return counts
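For the longest-repetition problem itself, the same sentinel idea might be sketched like this. Rather than None, this swaps in a fresh object(), which compares equal only to itself and so cannot collide with any item in the data (including None). It assumes l is finite, since it is copied into a list:

```python
def longest_repetition(l):
    """Return the item with the longest consecutive run, or None if l is empty."""
    sentinel = object()  # equal to nothing but itself
    most_reps = count = 0
    longest = prv = sentinel
    # The trailing sentinel differs from the last item, so the final run
    # is closed by the else branch inside the loop.
    for i in list(l) + [sentinel]:
        if i == prv:
            count += 1
        else:
            if count > most_reps:
                longest = prv
                most_reps = count
            count = 1
        prv = i
    return None if longest is sentinel else longest
```

Here longest_repetition([1, 2, 2, 3, 3, 3, 2]) returns 3 with no check after the loop.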

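The third approach is to restructure your code so that you don't iterate over a sequence that might end unexpectedly. You can preprocess the sequence with a generator, as the other answers that use groupby from itertools do. (Of course, if you have to write the generator function yourself, it may run into similar problems.)

For example, I could use a regular expression from the re module to find the words: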

from re import finditer

def countWords3(text):
    return Counter(match.group() for match in
                   finditer(r"[\w'-]+", text.lower()))

Output when given a suitably Pythonic text (the same for all four versions of countWords):

>>> text = """Well, there's egg and bacon; egg sausage and bacon;
egg and spam; egg bacon and spam; egg bacon sausage and spam;
spam bacon sausage and spam; spam egg spam spam bacon and spam;
spam sausage spam spam bacon spam tomato and spam;
spam spam spam egg and spam; spam spam spam spam spam spam
baked beans spam spam spam; or Lobster Thermidor a Crevette
with a mornay sauce served in a Provencale manner with shallots
and aubergines garnished with truffle pate, brandy and with a
fried egg on top and spam."""

>>> countWords0(text)
Counter({'spam': 28, 'and': 12, 'egg': 8, 'bacon': 7, 'sausage': 4, 'a': 4,
         'with': 4, 'well': 1, 'lobster': 1, 'manner': 1, 'in': 1, 'top': 1,
         'thermidor': 1, "there's": 1, 'truffle': 1, 'provencale': 1,
         'sauce': 1, 'brandy': 1, 'pate': 1, 'shallots': 1, 'garnished': 1,
         'tomato': 1, 'on': 1, 'baked': 1, 'aubergines': 1, 'mornay': 1,
         'beans': 1, 'served': 1, 'fried': 1, 'crevette': 1, 'or': 1})


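It is common to have to recheck, at the end of a loop, a condition that was also checked inside it. If you are prepared to sacrifice a little efficiency, one way to avoid the repeated check is to over-check inside the loop. For example: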

def my_longest_repetition(l):
    if not l:
        return None
    most_reps = count = 0
    longest = prv = None
    for i in l:
        count = (count + 1) if i == prv else 1
        if count > most_reps:
            longest = i  # i, not prv: on the first item prv is still None
            most_reps = count
        prv = i
    return longest

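This code checks count > most_reps more often than necessary, but it avoids having to check it again after the loop.

Unfortunately, this kind of change doesn't work in every case.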

Iterators provide a nice way to break up loops:

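def longest_repetition(l):
    end = object()  # private end marker; unlike None, it can't appear in l
    i = iter(l)
    n = next(i, end)
    longest = None
    most_reps = 0
    while n is not end:
        p = n
        count = 0
        while p == n:  # consume one whole run of equal items
            n = next(i, end)
            count += 1
        if count > most_reps:
            most_reps = count
            longest = p
    return longest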

Many languages have similar concepts.
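As a sketch of how the same nested-loop iterator pattern might carry over to the question's split_by (assuming a single-character delimiter, as in the original), each piece is appended exactly once, inside the loop, so nothing needs to be flushed afterwards. The private end marker object() is an illustrative choice so that any character may appear in the input:

```python
def split_by(string, delimiter):
    """Split on delimiter, dropping empty pieces, using nested loops over one iterator."""
    end = object()  # private end marker
    it = iter(string)
    ch = next(it, end)
    rtn = []
    while ch is not end:
        if ch == delimiter:
            ch = next(it, end)  # skip delimiters between pieces
            continue
        tmp = ''
        while ch is not end and ch != delimiter:  # consume one whole piece
            tmp += ch
            ch = next(it, end)
        rtn.append(tmp)  # tmp is non-empty here, so no extra check is needed
    return rtn
```

For example, split_by("a,,bc,d", ",") returns ['a', 'bc', 'd'].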