About Python: how do I list all files in a directory?

How do I list all files of a directory?

How can I list all the files of a directory in Python and add them to a list?

os.listdir() will give you everything in a directory, both files and directories.

If you only want files, you can filter the result with os.path:

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

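Alternatively, you can use os.walk(), which yields two lists for each directory it visits, splitting the entries into files and directories for you. If you only want the top directory, you can break the first time it yields:

from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break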

Finally, as that example shows, to add one list to another you can either use .extend() or the + operator:

>>> q = [1, 2, 3]
>>> w = [4, 5, 6]
>>> q = q + w
>>> q
[1, 2, 3, 4, 5, 6]

Personally, I prefer .extend().
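The practical difference between the two is that .extend() changes the existing list in place, while + builds a new list and rebinds the name. A minimal sketch of that difference (the variable names alias, alias_r and extended_in_place are illustrative only):

```python
q = [1, 2, 3]
w = [4, 5, 6]

alias = q            # a second name bound to the same list object
q.extend(w)          # mutates the existing list in place
extended_in_place = alias is q and alias == [1, 2, 3, 4, 5, 6]

r = [1, 2, 3]
alias_r = r
r = r + w            # builds a new list and rebinds r; alias_r keeps the old one
```

This matters when another part of the program holds a reference to the same list: with .extend() it sees the new items, with + it does not.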

I prefer using the glob module, as it does pattern matching and expansion.

import glob
print(glob.glob("/home/adam/*.txt"))

It will return a list with the queried files:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]

import os
os.listdir("somedirectory")

will return a list of all files and directories in "somedirectory".

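Get a list of files with Python 2 and 3

I have also made a short video: Python: how to get a list of files in a directory

os.listdir()

or... how to get all the files (and directories) in the current directory (Python 3)

In Python 3, the simplest way to get the files in the current directory is this. It is really simple: use the os module and its listdir() function and you will have the files in that directory (along with any folders in the directory, but not the files in subdirectories; for those you can use walk, which I will discuss later).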

>>> import os
>>> arr = os.listdir()
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Using glob

I find glob easier for selecting files of the same type or with something in common. Look at the following example:

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

Using a list comprehension

import glob

mylist = [f for f in glob.glob("*.txt")]

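Getting the full path name with os.path.abspath

As you noticed, the code above does not give the full path of the file. If you need the absolute path, you can use another function of the os.path module, called _getfullpathname, passing it the file you get from os.listdir() as an argument. There are other ways to get the full path, as we will see later (I replaced _getfullpathname with abspath, as suggested by mexmex).

>>> import os
>>> files_path = [os.path.abspath(x) for x in os.listdir()]
>>> files_path
['F:\\documenti\\applications.txt', 'F:\\documenti\\collections.txt']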
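One caveat: os.path.abspath() resolves a bare name against the current working directory, so the comprehension above only produces correct paths when the directory being listed is the current one. A hedged sketch of a variant that works for any directory (the helper name absolute_entries is made up for illustration):

```python
import os


def absolute_entries(directory="."):
    """Return absolute paths of the entries in `directory` (hypothetical helper)."""
    base = os.path.abspath(directory)
    # Join each name with the directory it came from, not with the cwd.
    return [os.path.join(base, name) for name in os.listdir(base)]
```

Calling absolute_entries("some/other/dir") then returns paths under that directory, regardless of where the script was started.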

Getting the full path name of a type of file in all subdirectories with walk

I find this very useful for finding things in many directories, and it helped me find a file whose name I did not remember:

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if ".docx" in file:
            print(os.path.join(r, file))

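os.listdir(): get the files in the current directory (Python 2 requires the path argument)

import os

mylist = ""
# The encoding argument to open() requires Python 3.
with open("filelist.txt", "w", encoding="utf-8") as file:
    # Pass the path explicitly: Python 2 requires it, Python 3 defaults to ".".
    for eachfile in os.listdir("."):
        mylist += eachfile + "\n"
    file.write(mylist)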

Example: a txt file with all the files of a hard drive

"""We are going to save a txt file with all the files in your directory.
We will use the function walk()

"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

# On Windows, these open the generated files with their default application.
os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

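All the files of C:\ in one text file

This is a shorter version of the previous code. Change the folder where the search starts if you need to begin from another location. On my computer this code generated a text file of about 50 MB, with a little under 500,000 lines of files with their complete paths.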

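Related discussion

You should include the path argument to listdir.
I think this is for the case where you have opened a command prompt or terminal and want to 'explore' a directory... but I will add what you suggest...
Of course, you are encouraged to add some context/explanation to the code, as that makes the answer more useful.
I agree, but I also had not noticed that Python 2 requires the argument while in Python 3 it is optional. It would be nice if you improved the answer for both Python versions :)
OK, I went into Python 2, found the differences, and edited the post.
There is no reason to do [f for f in os.listdir()]; os.listdir() already returns a list, so you are needlessly copying the original list before throwing it away.
You may be right, but I thought someone might want to store the names of the files under a label ('x') in order to process the items of the list later. You could say that I should just write x = os.listdir() and be done, but the comprehension is useful if you want to select, e.g., x = [f for f in os.listdir('f./python') if f.endswith('.txt')]. You cannot get that with x = os.listdir() alone, unless you add x = [f for f in x if f.endswith('.txt')] after that line. But you are right, it is not necessary... if you do not want to store and process the items of the list.
os.walk('.') also works on Python 2.
@Misha Rybak OK, I will add it to the section dedicated to Python 2.
This answer needs to be clearer about whether the directory-listing functions return a list of file names only, or a list of absolute or relative paths. If a function is then used to test whether a name is a file or a directory, it needs a usable path, which may not be what the listing function returned. For example, listoffiles = [f for f in os.listdir() if os.path.isfile(f)] will not work if the programmer supplies a path to listdir(), as one normally would.
I will look into your suggestion, gwideman, and make it clearer whether a function returns an absolute path or just the file name, and how to tell directories and files apart.
What does "fastest" mean in this context? Why would anyone do it this way?
Oh no, I did not mean the code is the fastest way (in terms of the time the computer takes to complete the task); I meant the way to do the job with a few lines of code, just to get it done, because after the first example I gave some longer ones. But maybe I can add something about "really" faster ways among the alternatives in the next update.
Replace os.path._getfullpathname() with os.path.abspath(path)?
@GiovanniGianni - have you checked that abspath works properly for you? I am no longer entirely sure what I was doing, or whether my use case was the same as the one in this thread. Thanks! I think it only returns the path, so it works as expected if you are in the directory from which you retrieve the file list.
@Mexmex I tried it and, from what I have seen, it works just like _getfullpathname; since it is also more readable, I changed it as you suggested.
pathlib.Path() is equivalent to pathlib.Path('.'). Also, pathlib.Path has a glob method (which works regardless of whether the path is a file or a directory).
Well... this is an example... it is fine for me... After all, there is never just one way in programming, which shows why Python is a nice language as well as a powerful one. +1 for the effort put into this post over time.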

A one-line solution to get only the list of files (no subdirectories):

filenames = next(os.walk(path))[2]

or absolute pathnames:

paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]
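One thing to watch with the one-liners above: os.walk() yields nothing for a directory that does not exist or cannot be read, so next() raises StopIteration in that case. A minimal sketch that supplies a default instead (the function name top_level_files is an assumption for illustration):

```python
import os


def top_level_files(path):
    """Return the file names directly inside `path`, or [] if it cannot be walked."""
    # os.walk() silently yields nothing for a missing or unreadable directory,
    # so the default tuple keeps next() from raising StopIteration.
    _dirpath, _dirnames, filenames = next(os.walk(path), (path, [], []))
    return filenames
```

Whether an empty list or an exception is the better response to a missing directory depends on the caller; this sketch simply shows where the choice is made.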

import os

def get_filepaths(directory):
    """
    This function will generate the file names in a directory
    tree by walking the tree either top-down or bottom-up. For each
    directory in the tree rooted at directory top (including top itself),
    it yields a 3-tuple (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.

# Run the above function and store its results in a variable.
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

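The path I provided in the function above contained 3 files: two of them in the root directory, and another in a subfolder called "SUBFOLDER". You can now do things like:

print(full_file_paths), which will print the list:

['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

If you wish, you can open and read the contents, or focus only on files with the extension ".dat", as in the code below:

for f in full_file_paths:
    if f.endswith(".dat"):
        print(f)

/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat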

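Since version 3.4 there are built-in iterators for this which are much more efficient than os.listdir():

pathlib: New in version 3.4.

>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]

According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.

os.scandir(): New in version 3.5.

>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]

Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5 on, and according to PEP 471 its speed increased by 2-20 times.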
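As a small illustration of the os.scandir() approach, here is a sketch that wraps it in a function and closes the iterator explicitly. The function name scandir_files is hypothetical, and the with-statement form assumes Python 3.6 or later:

```python
import os


def scandir_files(directory="."):
    """Return the sorted names of regular files directly inside `directory`."""
    # Used as a context manager (Python 3.6+), scandir closes its iterator promptly.
    with os.scandir(directory) as entries:
        # DirEntry.is_file() can usually answer from information gathered
        # during the directory scan, without an extra stat() call per entry.
        return sorted(entry.name for entry in entries if entry.is_file())
```

Each DirEntry also exposes entry.path, which can be used instead of entry.name when paths rather than bare names are needed.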

Let me also recommend reading ShadowRanger's comment below.

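Preliminary notes

Although there is a clear distinction between the terms file and directory in the question text, some may argue that directories are actually special files.
The statement "all files of a directory" can be interpreted in two ways:
- All direct (level 1) descendants only
- All descendants in the whole directory tree (including the ones in subdirectories)

When the question was asked, I imagine that Python 2 was the LTS version; however, the code samples will be run by Python 3(.5) (I will keep them as Python 2 compliant as possible; also, any code belonging to Python that I am going to post is from v3.5.4, unless otherwise specified). That has consequences for another keyword in the question, "add them into a list":

- In versions before Python 2.2, sequences (iterables) were mostly represented by lists (tuples, sets, ...)
- In Python 2.2, the concept of the generator was introduced, courtesy of the yield statement. As time passed, generator counterparts started to appear for functions that returned or worked with lists
- In Python 3, generators are the default behavior
- I am not sure whether returning a list is still mandatory (or whether a generator would do as well), but passing a generator to the list constructor will create a list out of it (and also consume it). The example below illustrates the differences using map(function, iterable, ...)

>>> import sys
>>> sys.version
'2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
>>> m = map(lambda x: x, [1, 2, 3]) # Just a dummy lambda function
>>> m, type(m)
([1, 2, 3], <type 'list'>)
>>> len(m)
3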

>>> import sys
>>> sys.version
'3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
>>> m = map(lambda x: x, [1, 2, 3])
>>> m, type(m)
(<map object at 0x000001B4257342B0>, <class 'map'>)
>>> len(m)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'map' has no len()
>>> lm0 = list(m) # Build a list from the generator
>>> lm0, type(lm0)
([1, 2, 3], <class 'list'>)
>>>
>>> lm1 = list(m) # Build a list from the same generator
>>> lm1, type(lm1) # Empty list now - generator already consumed
([], <class 'list'>)

The examples will be based on a directory called root_dir with the following structure (this example is for Win, but I am using the same tree on Lnx as well):

E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
Folder PATH listing for volume Work
Volume serial number is 00000029 3655:6FED
E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
|   file0
|   file1
|
+---dir0
|   +---dir00
|   |   |   file000
|   |   |
|   |   +---dir000
|   |           file0000
|   |
|   +---dir01
|   |       file010
|   |       file011
|   |
|   +---dir02
|       +---dir020
|           +---dir0200
+---dir1
|       file10
|       file11
|       file12
|
+---dir2
|   |   file20
|   |
|   +---dir20
|           file200
|
+---dir3

Solutions:

os.listdir(path='.')

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' ...

>>> import os
>>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
>>>
>>> os.listdir(root_dir) # List all the items in root_dir
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))] # Filter items and only keep files (strip out directories)
['file0', 'file1']

A more elaborate example (code_os_listdir.py):

import os
from pprint import pformat


def _get_dir_content(path, include_folders, recursive):
    entries = os.listdir(path)
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                yield entry_with_path
            if recursive:
                for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                    yield sub_entry
        else:
            yield entry_with_path


def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    for item in _get_dir_content(path, include_folders, recursive):
        yield item if prepend_folder_name else item[path_len:]


def _get_dir_content_old(path, include_folders, recursive):
    entries = os.listdir(path)
    ret = list()
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                ret.append(entry_with_path)
            if recursive:
                ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
        else:
            ret.append(entry_with_path)
    return ret


def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]


def main():
    root_dir = "root_dir"
    ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
    lret0 = list(ret0)
    print(ret0, len(lret0), pformat(lret0))
    ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
    print(len(ret1), pformat(ret1))


if __name__ == "__main__":
    main()

os.walk(top, topdown=True, onerror=None, followlinks=False)

Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).

>>> import os
>>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
>>> root_dir
'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
>>>
>>> walk_generator = os.walk(root_dir)
>>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
>>> root_dir_entry
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
>>>
>>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
>>>
>>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
...     print(entry)
...
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])

Notes:

- Under the hood, it uses os.scandir (os.listdir on older versions)
- It does the heavy lifting by recursing into subfolders
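Because os.walk() performs the recursion itself, the usual way to steer it is to edit the dirnames list in place while walking top-down. A minimal sketch, assuming we want to skip directories whose name starts with a dot (both the helper name walk_visible_files and that rule are illustrative, not part of the answer above):

```python
import os


def walk_visible_files(top):
    """Yield paths of files under `top`, skipping directories whose name starts with '.'."""
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        # Slice assignment mutates the very list os.walk() is holding, so the
        # removed directories are never descended into.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for filename in filenames:
            yield os.path.join(dirpath, filename)
```

Note that this only works with topdown=True (the default); in bottom-up mode the subdirectories have already been visited by the time dirnames is seen.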

glob.glob(pathname, *, recursive=False) (glob.iglob(pathname, *, recursive=False))

Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell). ... Changed in version 3.5: Support for recursive globs using "**".

>>> import glob, os
>>> wildcard_pattern = "*"
>>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
>>> root_dir
'root_dir\\*'
>>>
>>> glob_list = glob.glob(root_dir)
>>> glob_list
['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
>>>
>>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from the beginning
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> for entry in glob.iglob(root_dir + "*", recursive=True):
...     print(entry)
...
root_dir\
root_dir\dir0
root_dir\dir0\dir00
root_dir\dir0\dir00\dir000
root_dir\dir0\dir00\dir000\file0000
root_dir\dir0\dir00\file000
root_dir\dir0\dir01
root_dir\dir0\dir01\file010
root_dir\dir0\dir01\file011
root_dir\dir0\dir02
root_dir\dir0\dir02\dir020
root_dir\dir0\dir02\dir020\dir0200
root_dir\dir1
root_dir\dir1\file10
root_dir\dir1\file11
root_dir\dir1\file12
root_dir\dir2
root_dir\dir2\dir20
root_dir\dir2\dir20\file200
root_dir\dir2\file20
root_dir\dir3
root_dir\file0
root_dir\file1

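Notes:

- Uses os.listdir
- For large trees (especially if recursive is on), iglob is preferred
- Allows advanced filtering based on name (thanks to the wildcards)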
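To illustrate the last two notes together, here is a hedged sketch of a lazy, recursive, name-filtered search built on glob.iglob. The function name iter_files_recursive and its parameters are assumptions made for this example; recursive "**" patterns need Python 3.5 or later:

```python
import glob
import os


def iter_files_recursive(root, pattern="*"):
    """Lazily yield files under `root` (at any depth) whose name matches `pattern`."""
    # glob.escape() protects any wildcard characters that occur in `root` itself.
    # "**" matches zero or more nested directories when recursive=True.
    full_pattern = os.path.join(glob.escape(root), "**", pattern)
    for path in glob.iglob(full_pattern, recursive=True):
        if os.path.isfile(path):
            yield path
```

Since iglob returns an iterator, results are produced one at a time instead of building the whole list first, which is the reason it is preferred for large trees.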

class pathlib.Path(*pathsegments) (Python 3.4+, backport: [PyPI]: pathlib2)

>>> import os, pathlib
>>> root_dir = "root_dir"
>>> root_dir_instance = pathlib.Path(root_dir)
>>> root_dir_instance
WindowsPath('root_dir')
>>> root_dir_instance.name
'root_dir'
>>> root_dir_instance.is_dir()
True
>>>
>>> [item.name for item in root_dir_instance.glob("*")] # Wildcard searching for all direct descendants
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()] # Display paths (including parent) for files only
['root_dir\\file0', 'root_dir\\file1']

Notes:

- This is one way of achieving our goal
- It is the OOP style of handling paths
- It offers lots of functionality

dircache.listdir(path) (Python 2 only)

But, according to [GitHub]: python/cpython - (2.7) cpython/Lib/dircache.py, it is just a (thin) wrapper over os.listdir with caching

def listdir(path):
    """List directory contents, using cache."""
    try:
        cached_mtime, list = cache[path]
        del cache[path]
    except KeyError:
        cached_mtime, list = -1, []
    mtime = os.stat(path).st_mtime
    if mtime != cached_mtime:
        list = os.listdir(path)
        list.sort()
    cache[path] = mtime, list
    return list

Use opendir / readdir / closedir ([MS.Docs]: FindFirstFileW function / [MS.Docs]: FindNextFileW function / [MS.Docs]: FindClose function) (via [GitHub]: python/cpython - (master) cpython/Modules/posixmodule.c)

pywin32's win32file module uses those (Win-specific) functions as well (via [GitHub]: mhammond/pywin32 - (master) pywin32/win32/src/win32file.i)

_get_dir_content (from the os.listdir example above) can be implemented using any of these approaches (some will require more work and some less)

Some advanced filtering (instead of just file vs. dir) could be done: e.g. the include_folders argument could be replaced by another one (e.g. filter_func), which would be a function that takes a path as an argument. A filter that accepts everything does not strip out anything, while inside _get_dir_content an entry for which the function fails is skipped. However, the more complex the code becomes, the longer it takes to execute.
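The filter-function idea can be sketched roughly as follows. This is a simplified, hypothetical variant of the earlier helper rather than the author's code: the signature, the default filter, and the choice to recurse into directories that the filter rejects are all assumptions.

```python
import os


def get_dir_content(path, filter_func=lambda entry_path: True, recursive=True):
    """Yield paths under `path` for which `filter_func(entry_path)` is true."""
    for entry in os.listdir(path):
        entry_with_path = os.path.join(path, entry)
        if filter_func(entry_with_path):
            yield entry_with_path
        if recursive and os.path.isdir(entry_with_path):
            # Assumption: recurse regardless of the filter result, so matching
            # entries nested inside a rejected directory are still found.
            for sub_entry in get_dir_content(entry_with_path, filter_func, recursive):
                yield sub_entry
```

For example, passing filter_func=lambda p: p.endswith(".txt") keeps only text files, while the default keeps every entry.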

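Nota bene! Since recursion is used, I must mention that I did some tests on my laptop (Win 10 x64), totally unrelated to this problem, and when the recursion level reached values somewhere in the 990s (recursion limit: 1000, the default), I got a stack overflow :). If the directory tree is deeper than that limit (I am not a filesystem expert, so I do not know whether that is even possible), that could be a problem. I must also mention that I did not try to increase the recursion limit, because I have no experience in that area (how far can it be raised before the stack at the OS level also has to be increased?), but in theory there will always be the possibility of failure if the directory depth is larger than the highest possible recursion limit (on that machine).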
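One way to sidestep the recursion limit altogether is to keep the pending directories on an explicit stack. The following is only a sketch under that assumption (the name iter_files_iterative is made up, and the traversal order differs from the recursive helpers shown earlier):

```python
import os


def iter_files_iterative(top):
    """Yield paths of non-directory entries under `top` without using recursion."""
    pending = [top]  # directories that still have to be listed
    while pending:
        current = pending.pop()
        for name in os.listdir(current):
            full_path = os.path.join(current, name)
            # Symlinked directories are yielded as entries rather than
            # followed, which avoids endless loops on cyclic links.
            if os.path.isdir(full_path) and not os.path.islink(full_path):
                pending.append(full_path)
            else:
                yield full_path
```

Memory use then grows with the number of directories waiting to be visited rather than with the depth of the Python call stack.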

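The code samples are for demonstrative purposes only. That means that I did not take error handling into account (I do not think there is any try / except / else / finally block), so the code is not robust (the reason is to keep it as simple and short as possible). For production, error handling should be added as well.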
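As one possible starting point for such error handling, os.walk() accepts an onerror callback that receives the OSError raised while listing a directory; by default those errors are silently ignored. A minimal sketch (the function name and the decision to collect errors in a list are assumptions):

```python
import os


def list_files_with_errors(top):
    """Return (files, errors): file paths under `top` and the OSErrors met on the way."""
    errors = []
    files = []
    # onerror is called with the OSError instance for each directory that
    # cannot be listed; here the errors are simply collected for the caller.
    for dirpath, _dirnames, filenames in os.walk(top, onerror=errors.append):
        for filename in filenames:
            files.append(os.path.join(dirpath, filename))
    return files, errors
```

The caller can then decide whether to log the collected errors, retry, or raise the first one.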

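I really liked adamk's answer, suggesting that you use glob(), from the module of the same name. This allows you to have pattern matching with *s.

But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module as well.

As examples:

from glob import glob

# Return everything under C:\Users\admin that contains a folder called wlp.
# A raw string keeps the backslashes from being read as escape sequences.
glob(r'C:\Users\admin\*\wlp')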

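The above is terrible: the path is hardcoded, and it will only ever work on Windows because of the drive name and the backslashes hardcoded into the path.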

from glob import glob
from os.path import join

# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))

The above works better, but it relies on the folder name Users, which is common on Windows and not so common on other OSs. It also relies on the user having a specific name, admin.

from glob import glob
from os.path import expanduser, join

# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))

This works perfectly across all platforms.

Another great example that works perfectly across platforms and does something a bit different:

from glob import glob
from os import getcwd
from os.path import join

# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))

I hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.

import os

def list_files(path):
    # returns a list of names (with extension, without full path) of all files
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files

If you are looking for a Python implementation of find, this is a recipe I use rather frequently:

from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in /usr/bin
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print(found_file)

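So I made a PyPI package out of it, and there is also a GitHub repository. I hope someone finds this code useful.

Returning a list of absolute file paths, without recursing into subdirectories:

L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]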

import os
import os.path

def get_files(target_dir):
    item_list = os.listdir(target_dir)

    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir, item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list

Here I use a recursive structure.

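I am assuming that all your files are in *.txt format and are stored inside a directory with the path data/.

One can use the glob module of Python to list all files of the directory and add them to a list named fnames, in the following manner: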

import glob

fnames = glob.glob("data/*.txt") #fnames: list data type

For better results, you can use the listdir() method of the os module along with a generator (a generator is a powerful iterator that keeps its state). The following code works with both Python 2 and Python 3.

Here is the code:

import os

def files(path):
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file

for file in files("."):
    print(file)

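The listdir() method returns the list of entries for the given directory. The method os.path.isfile() returns True if the given entry is a file. The yield operator leaves the function but keeps its current state, and it returns only the name of an entry detected as a file. All of this allows us to loop over the generator function.

Hope this helps.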

# -*- coding: utf-8 -*-
import os
import traceback

print('\n\n')

def start():
    address = "/home/ubuntu/Desktop"
    try:
        Folders = []
        Id = 1
        for item in os.listdir(address):
            endaddress = address + "/" + item
            Folders.append({'Id': Id, 'TopId': 0, 'Name': item, 'Address': endaddress })
            Id += 1

            # Only descend into non-empty directories; os.listdir() raises on a plain file.
            if os.path.isdir(endaddress) and os.listdir(endaddress):
                Id = FolderToList(endaddress, Id, Id - 1, Folders)
        return Folders
    except Exception:
        print("___________________________ ERROR ___________________________\n" + traceback.format_exc())

def FolderToList(address, Id, TopId, Folders):
    for item in os.listdir(address):
        endaddress = address + "/" + item
        Folders.append({'Id': Id, 'TopId': TopId, 'Name': item, 'Address': endaddress })
        Id += 1

        # Same guard as in start(): recurse only into non-empty directories.
        if os.path.isdir(endaddress) and os.listdir(endaddress):
            Id = FolderToList(endaddress, Id, Id - 1, Folders)
    return Id

print(start())

Using generators

import os

def get_files(search_path):
    for (dirpath, _, filenames) in os.walk(search_path):
        for filename in filenames:
            yield os.path.join(dirpath, filename)

list_files = get_files('.')
for filename in list_files:
    print(filename)

You can use this code to get an iterator that runs over the full paths of all files (directory + file name).

import os

def get_iterator_all_files_name(dir_path):
    for (dirpath, dirnames, filenames) in os.walk(dir_path):
        for f in filenames:
            yield os.path.join(dirpath, f)

Or use this one to get them in a list.

import os

def get_list_all_files_name(dir_path):
    all_files_path = []

    for (dirpath, dirnames, filenames) in os.walk(dir_path):
        for f in filenames:
            all_files_path.append(os.path.join(dirpath, f))

    return all_files_path

Another very readable variant for Python 3.4+ is using pathlib.Path.glob:

from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]

It is simple to make it more specific, e.g. only look for Python source files which are not symbolic links, also in all subdirectories:

[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]
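A small function-shaped sketch of the same recursive search, using Path.rglob, which pathlib documents as equivalent to calling glob with "**/" prepended to the pattern (the function name python_sources is an assumption for this example):

```python
from pathlib import Path


def python_sources(folder):
    """Return non-symlink *.py files in `folder` and all of its subdirectories."""
    # Path.rglob("*.py") behaves like Path.glob("**/*.py").
    return sorted(p for p in Path(folder).rglob("*.py")
                  if p.is_file() and not p.is_symlink())
```

The result is a sorted list of Path objects; use str(p) or p.name if plain strings are needed.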

If you want a different file type or want to get the full directory, use this function:

import os

def createList(foldername, fulldir = True, suffix=".jpg"):
    file_list_tmp = os.listdir(foldername)
    #print len(file_list_tmp)
    file_list = []
    if fulldir:
        for item in file_list_tmp:
            if item.endswith(suffix):
                file_list.append(os.path.join(foldername, item))
    else:
        for item in file_list_tmp:
            if item.endswith(suffix):
                file_list.append(item)
    return file_list

# dircache exists only in Python 2 (it was removed in Python 3).
import dircache
list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
    if len(list[i]) != check:
        temp.append(list[i-1])
        check = len(list[i])
    else:
        i = i + 1
        count = count - 1

print temp

Here is my general-purpose function for this. It returns a list of file paths rather than filenames, since I found that to be more useful. It has a few optional arguments that make it versatile. For instance, I often use it with arguments like pattern='*.txt' or subfolders=True.

import os
import fnmatch

def list_paths(folder='.', pattern='*', case_sensitive=False, subfolders=False):
    """Return a list of the file paths matching the pattern in the specified
    folder, optionally including files inside subfolders.
    """
    match = fnmatch.fnmatchcase if case_sensitive else fnmatch.fnmatch
    walked = os.walk(folder) if subfolders else [next(os.walk(folder))]
    return [os.path.join(root, f)
            for root, dirnames, filenames in walked
            for f in filenames if match(f, pattern)]

For Python 2: pip install rglob

import rglob
file_list=rglob.rglob("/home/base/dir/","*")
print file_list

A wise teacher once told me:

When there are several established ways to do something, none of them is good for all cases.

I will thus add a solution for a subset of the problem: quite often, we only want to check whether a file matches a start string and an end string, without going into subdirectories. We would thus like a function that returns a list of filenames, like:

filenames = dir_filter('foo/baz', radical='radical', extension='.txt')

If you care to first declare two functions, this can be done:

import os

def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return(filename.startswith(radical) and filename.endswith(extension))

def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
            if file_filter(filename, radical, extension)]

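This solution could easily be generalized with regular expressions (and you might want to add a pattern argument if you do not want your patterns to always stick to the start or end of the filename).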
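A hedged sketch of what such a regular-expression generalization might look like; the function name dir_filter_regex and its pattern parameter are assumptions, not part of the answer above:

```python
import os
import re


def dir_filter_regex(dirname=".", pattern=r".*"):
    """Return the names in `dirname` that contain a match for the regex `pattern`."""
    regex = re.compile(pattern)
    # search() matches anywhere in the name; anchor with ^ and $ when the
    # pattern must cover the start or the end of the filename.
    return [name for name in os.listdir(dirname) if regex.search(name)]
```

For instance, the pattern r"^radical.*\.txt$" reproduces the start-string and end-string check of dir_filter, while r"\d{4}" would pick names containing four consecutive digits anywhere.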

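I will provide a sample one-liner where the source path and file type can be given as input. The code returns a list of filenames with the csv extension. Use a wildcard pattern such as '*' if all files need to be returned. This also scans the subdirectories recursively.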

[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

Modify the file extension and source path as needed.

Get all files from the specified folder, including subdirectories.

import glob
import os

print([entry for entry in glob.iglob("{}/**".format("FILE_PATH"), recursive=True) if os.path.isfile(entry)])

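To display full paths and filter by extension, use:

import os
from os.path import isfile, join
onlyfiles = [f for f in os.listdir(file) if len(f) >= 5 and f[-5:] == ".json" and isfile(join(file, f))]

Change the number 5 according to the number of characters in the "." plus extension of the file type.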
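To avoid adjusting the number by hand for each extension, os.path.splitext can split off the extension whatever its length. A minimal sketch under that assumption (the function name files_with_extension is hypothetical); unlike the one-liner above, it returns full paths:

```python
import os


def files_with_extension(directory, extension=".json"):
    """Return full paths of files in `directory` whose extension equals `extension`."""
    matches = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # splitext() splits off the last ".ext", whatever its length.
        if os.path.splitext(name)[1] == extension and os.path.isfile(full_path):
            matches.append(full_path)
    return matches
```

The comparison is case-sensitive; lower-casing both sides would be one way to also accept names such as DATA.JSON.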