Python: How to run a script on all *.txt files in the current directory?


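I am trying to run the script below on all *.txt files in the current directory. At the moment it only processes the file test.txt and prints blocks of text based on a regular expression. What is the fastest way to scan the current directory for *.txt files and run the script below on every *.txt file found? Also, how can I include the lines containing "word1" and "word3", since the script currently prints only the content between them? I want to print the whole block.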

#!/usr/bin/env python
import os, re
file = 'test.txt'
with open(file) as fp:
    for result in re.findall('word1(.*?)word3', fp.read(), re.S):
        print result

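I would appreciate any advice or suggestions on how to improve the code above, for example its speed when run on a large set of text files. Thank you.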

Related discussion

Inspired by another answer, I rewrote my code to make it more generic.

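The files to search:

can be described either by a string passed as the second argument, which glob() will use as its pattern, or by a function written specifically for this purpose, in case the desired set of files cannot be described by a glob pattern;
are looked for in the current directory if no third argument is passed, or in the specified directory if its path is passed as the third argument.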


# Python 2 code (uses itertools.ifilter and the print statement)
import re,glob
from itertools import ifilter
from os import getcwd,listdir,path
from inspect import isfunction

# Matches from the start of the line containing word1
# to the end of the line containing word3
regx = re.compile('^[^\n]*word1.*?word3.*?$',re.S|re.M)

# Report template: directory, file name, separator, joined matches
G = '\n\n'\
    'MWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMWMW\n'\
    'MWMWMW %s\n'\
    'MWMWMW %s\n'\
    '%s%s'

def search(REGX, how_to_find_files, dirpath='',
           G=G,sepm = '\n======================\n'):
    if dirpath=='':
        dirpath = getcwd()

    if isfunction(how_to_find_files):
        # Keep full paths so isfile() and open() also work
        # when dirpath is not the current directory
        names = (path.join(dirpath,n) for n in listdir(dirpath))
        gen = ifilter(how_to_find_files,
                      ifilter(path.isfile,names))
    elif isinstance(how_to_find_files,str):
        gen = glob.glob(path.join(dirpath,
                                  how_to_find_files))

    for fn in gen:
        with open(fn) as fp:
            found = REGX.findall(fp.read())
            if found:
                yield G % (dirpath,path.basename(fn),
                           sepm,sepm.join(found))

# Example of searching in .txt files

#============ one use ===================
def select(fn):
    return fn[-4:]=='.txt'
print ''.join(search(regx, select))

#============= another use ==============
print ''.join(search(regx,'*.txt'))
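
To see why the answer's pattern returns the whole block, including the lines that contain word1 and word3, it may help to compare it with the question's pattern on a small sample. This is only a minimal sketch, assuming Python 3 (the code above is Python 2). The sample text and the names SAMPLE, inner_only and whole_lines are made up for illustration.

```python
import re

# Hypothetical sample text, invented for this illustration.
SAMPLE = (
    "header line\n"
    "start word1 here\n"
    "middle line\n"
    "end word3 there\n"
    "footer line\n"
)

# Pattern from the question: findall() returns only the capture group,
# i.e. the text strictly between the two markers.
inner_only = re.findall(r'word1(.*?)word3', SAMPLE, re.S)

# Pattern from the answer: no capture group, and re.M lets ^ and $ anchor
# at line boundaries, so the match runs from the start of the line holding
# word1 to the end of the line holding word3.
whole_lines = re.findall(r'^[^\n]*word1.*?word3.*?$', SAMPLE, re.S | re.M)

print(inner_only)
print(whole_lines)
```

The first list holds the text between the markers only, while the second holds the three full lines from "start word1 here" to "end word3 there".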

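The advantage of processing several files by chaining them through successive generators is that the final concatenation with ''.join() creates a single string that can be written in one go. Otherwise, printing several individual strings one after another takes longer because of the interruptions in the display (am I making myself clear?).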
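
The idea in the paragraph above, collecting the per-file pieces from a generator and emitting them with one join, could be sketched as follows. This is a minimal sketch assuming Python 3 and pathlib. The names iter_reports and build_report, the separator and the report layout are hypothetical and are not the answer's exact format. Whether a single write is noticeably faster than printing piece by piece probably depends on the output device.

```python
import re
from pathlib import Path

# Same pattern as in the answer: whole lines from word1 through word3.
BLOCK = re.compile(r'^[^\n]*word1.*?word3.*?$', re.S | re.M)

def iter_reports(pattern='*.txt', dirpath='.', regex=BLOCK,
                 sep='\n' + '=' * 22 + '\n'):
    """Yield one report string for each matching file that contains a block."""
    for fn in sorted(Path(dirpath).glob(pattern)):
        if not fn.is_file():
            continue
        found = regex.findall(fn.read_text())
        if found:
            # File name, then every block found, each preceded by the separator.
            yield '\n\n%s%s%s' % (fn.name, sep, sep.join(found))

def build_report(**kwargs):
    """Join all per-file pieces into one string for a single write."""
    return ''.join(iter_reports(**kwargs))
```

A caller could then emit everything at once, for example with sys.stdout.write(build_report(pattern='*.txt')).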