Match lines between two patterns in Python with regular expressions

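I am parsing log files that contain lines about events for many jobs, each identified by a job ID. I am trying to get all the lines in a log file that fall between two patterns, using Python.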

I have read the very useful post "How to select lines between two patterns?" and have already solved the problem with awk, like this:

awk '/pattern1/,/pattern2/' file

Since I process the log information in a Python script, I run that awk command through subprocess.Popen(). My program works, but I would like to solve this with Python alone.

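I know about the re module, but I do not really understand how to use it. The log files are compressed with bz2, so this is my code for opening a .bz2 file and finding the lines between the two patterns: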

import bz2
import re

logfile = '/some/log/file.bz2'

PATTERN = r"/{0}/,/{1}/".format('pattern1', 'pattern2')
# example: PATTERN = r"/0001.server;Considering job to run/,/0040;pbs_sched;Job;0001.server/"
re.compile(PATTERN)

with bz2.BZ2File(logfile) as fh:
    match = re.findall(PATTERN, fh.read())

However, match is empty (fh.read() is not!). Using re.findall(PATTERN, fh.read(), re.MULTILINE) makes no difference.
Using re.DEBUG with re.compile() shows many lines such as

literal 47
literal 50
literal 48
literal 49
literal 57

and two that say

any None
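
A note on reading that output: the numbers after `literal` are character codes, so 47 is `/`. This suggests that the awk delimiters and the comma are treated as ordinary characters rather than as a range operator, and the two `any None` entries most likely correspond to the two unescaped `.` characters in the example pattern. A minimal sketch to check this, assuming a made-up three-line sample text in place of the real log:

```python
import re

# Character codes taken from the re.DEBUG output quoted above.
codes = [47, 50, 48, 49, 57]
decoded = ''.join(chr(c) for c in codes)

# The awk-style range expression, used verbatim as a Python regex.
PATTERN = r"/{0}/,/{1}/".format('pattern1', 'pattern2')

# Hypothetical sample text standing in for the decompressed log.
sample = "pattern1 starts here\nmiddle line\npattern2 ends here\n"

# The slashes and the comma are literals, so nothing in the sample matches.
no_match = re.findall(PATTERN, sample)

# The pattern only matches text that literally contains the awk expression.
literal_match = re.findall(PATTERN, "/pattern1/,/pattern2/")
```

In other words, `re` has no equivalent of awk's `/start/,/stop/` range syntax; the range has to be expressed either in the regex itself or in the surrounding code.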

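I could solve the problem with a loop like the one in "Python print between two patterns, including lines containing patterns", but I avoid nested for-if loops as much as possible. I believe the re module can produce the result I want, but I am no expert in using it.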

I am using Python 2.7.9.

Related discussion

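Reading an entire log file into memory is usually a bad idea, so I will give you a line-by-line solution. I assume that the dots in your example are the only varying parts of the pattern. I also assume that you want to collect the groups of lines in a list of lists.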

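import bz2
import re

with_delimiting_lines = True  # also keep the start and stop lines in each group
logfile = '/some/log/file.bz2'
# The awk-style /.../ delimiters are dropped: in a Python regex they would be
# literal slashes. The unescaped dots stay as wildcards, per the assumption above.
group_start_regex = re.compile(r'0001.server;Considering job to run')
group_stop_regex = re.compile(r'0040;pbs_sched;Job;0001.server')
group_list = []  # list of groups, each group a list of lines
with (bz2.BZ2File(logfile) if logfile.endswith('.bz2') else open(logfile)) as fh:
    inside_group = False
    for line_with_nl in fh:
        line = line_with_nl.rstrip()
        if inside_group:
            if group_stop_regex.match(line):
                # Stop line reached: close the current group.
                inside_group = False
                if with_delimiting_lines:
                    group.append(line)
                group_list.append(group)
            else:
                group.append(line)
        elif group_start_regex.match(line):
            # Start line found: open a new group.
            inside_group = True
            group = []
            if with_delimiting_lines:
                group.append(line)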
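
To try the same logic without a log file, the loop can be wrapped in a generator and fed a list of lines. This is only a sketch: the function name `iter_groups` and the sample lines are hypothetical, and the sample patterns sit at the start of each line so that match() finds them.

```python
import re


def iter_groups(lines, start_regex, stop_regex, with_delimiting_lines=True):
    """Yield lists of lines found between a start line and a stop line."""
    group = None  # None means we are currently outside a group
    for raw_line in lines:
        line = raw_line.rstrip()
        if group is not None:
            if stop_regex.match(line):
                if with_delimiting_lines:
                    group.append(line)
                yield group
                group = None
            else:
                group.append(line)
        elif start_regex.match(line):
            group = [line] if with_delimiting_lines else []


# Hypothetical sample data, not taken from a real log.
sample_lines = [
    "noise\n",
    "START job 1\n",
    "event a\n",
    "STOP job 1\n",
    "more noise\n",
    "START job 2\n",
    "STOP job 2\n",
]
start = re.compile(r'START')
stop = re.compile(r'STOP')
groups = list(iter_groups(sample_lines, start, stop))
inner_only = list(iter_groups(sample_lines, start, stop, with_delimiting_lines=False))
```

Because it yields one group at a time, this variant never holds more than the current group in memory, which fits the line-by-line intent of the answer.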

Note that match() matches from the beginning of the line (as if the pattern started with ^ when re.MULTILINE mode is off).
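
A small sketch of the difference, using a made-up log line in which the text of interest is preceded by a timestamp (the line format is an assumption for illustration only). If the pattern can appear anywhere in the line, search() is the call that finds it:

```python
import re

# Hypothetical log line: the interesting text is not at the start.
line = "01/01/2015 12:00:00;0040;pbs_sched;Job;0001.server;Considering job to run"
regex = re.compile(r'0001.server;Considering job to run')

anchored = regex.match(line)    # only tries to match at position 0
anywhere = regex.search(line)   # scans the whole line for a match

found_by_match = anchored is not None
found_by_search = anywhere is not None

# match() also succeeds once the leading part of the line is consumed explicitly.
found_with_prefix = re.match(r'.*0001.server;Considering job to run', line) is not None
```

Which call is appropriate depends on where the delimiting text sits in the actual log lines.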