Reading XML using Python minidom and iterating over each node

I have an XML structure that looks like the following, but on a much larger scale:

<root>
    <conference name='1'>
        <author>
            Bob
        </author>
        <author>
            Nigel
        </author>
    </conference>
    <conference name='2'>
        <author>
            Alice
        </author>
        <author>
            Mary
        </author>
    </conference>
</root>

To read it, I used the following code:

from xml.dom.minidom import parse

dom = parse(filepath)
conference = dom.getElementsByTagName('conference')
for node in conference:
    conf_name = node.getAttribute('name')
    print(conf_name)
    alist = node.getElementsByTagName('author')
    for a in alist:
        authortext = a.nodeValue
        print(authortext)

However, the authortext that gets printed is 'None'. I tried messing around with variations like the one below, but that breaks my program.

authortext = a[0].nodeValue

The correct output should be:

1
Bob
Nigel
2
Alice
Mary

But what I get is:

1
None
None
2
None
None

Any suggestions on how to fix this?

The node you read authortext from is of type 1 (ELEMENT_NODE); normally you need a TEXT_NODE to get a string. This will work:

a.childNodes[0].nodeValue

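Element nodes don't have a nodeValue. You have to look at the Text nodes inside them. If you know there is always exactly one text node inside, you can use element.firstChild.data (for a text node, data is the same as nodeValue).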
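A minimal sketch of that distinction, assuming a small inline document (the XML string and variable names are illustrative, not taken from the question):

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Illustrative one-author document; the question's real file is larger.
dom = parseString("<root><author>Bob</author></root>")
author = dom.getElementsByTagName("author")[0]

# The element itself carries no value.
print(author.nodeType == Node.ELEMENT_NODE)   # True
print(author.nodeValue)                       # None

# The string lives in the element's child Text node.
text_node = author.firstChild
print(text_node.nodeType == Node.TEXT_NODE)   # True
print(text_node.data)                         # Bob
print(text_node.data == text_node.nodeValue)  # True
```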

Be careful: if there is no text content, there will be no child Text node and element.firstChild will be None, so accessing .data will fail.
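One way to guard against that case is a small helper. This is only a sketch: first_text and its default parameter are hypothetical names introduced here, not part of minidom.

```python
from xml.dom.minidom import parseString


def first_text(element, default=""):
    """Return the data of the element's first child if it is a Text node, else default."""
    child = element.firstChild
    if child is None or child.nodeType != child.TEXT_NODE:
        return default
    return child.data


# Illustrative document: the second author is empty and has no child nodes.
dom = parseString("<root><author>Bob</author><author/></root>")
for author in dom.getElementsByTagName("author"):
    print(repr(first_text(author)))
```

For the empty element the helper returns the default instead of raising AttributeError on None.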

A quick way to get the content of the direct child text nodes:

text = ''.join(child.data for child in element.childNodes if child.nodeType == child.TEXT_NODE)

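In DOM Level 3 Core you get the textContent property, which can be used to get the text inside an Element recursively, but minidom doesn't support it (some other Python DOM implementations do).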
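If recursive text is needed with minidom, a rough stand-in can be written by hand. The sketch below assumes that concatenating Text and CDATA descendants in document order is good enough; text_content is a hypothetical helper name, not a minidom API.

```python
from xml.dom.minidom import parseString


def text_content(node):
    """Concatenate the data of all Text and CDATA descendants in document order."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    # Elements and documents recurse into their children; comments and
    # processing instructions have no children, so they contribute nothing.
    return "".join(text_content(child) for child in node.childNodes)


# Illustrative mixed-content document.
dom = parseString("<author>Bob <b>the</b> Builder</author>")
print(text_content(dom.documentElement))  # Bob the Builder
```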

Since each author always has exactly one text value, you can use element.firstChild.data:

from xml.dom.minidom import parseString

# `document` is expected to hold the XML text as a string
dom = parseString(document)
conferences = dom.getElementsByTagName("conference")

# Each conference here is a node
for conference in conferences:
    conference_name = conference.getAttribute("name")
    print()
    print(conference_name.upper() + " -")

    authors = conference.getElementsByTagName("author")
    for author in authors:
        print(" ", author.firstChild.data)

    print()

Quick access:

node.getElementsByTagName('author')[0].childNodes[0].nodeValue
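Note that the expression above reads only the first author and raises IndexError when a conference has no author at all. A sketch that collects every author per conference, assuming the XML from the question (inlined here so the example is self-contained) and stripping the whitespace that surrounds each name:

```python
from xml.dom.minidom import parseString

# XML from the question, inlined for the sake of a runnable example.
document = """<root>
  <conference name='1'>
    <author>
      Bob
    </author>
    <author>
      Nigel
    </author>
  </conference>
  <conference name='2'>
    <author>
      Alice
    </author>
    <author>
      Mary
    </author>
  </conference>
</root>"""

dom = parseString(document)

# Map each conference name to the list of its authors' names.
authors_by_conference = {
    conf.getAttribute("name"): [
        author.firstChild.data.strip()
        for author in conf.getElementsByTagName("author")
    ]
    for conf in dom.getElementsByTagName("conference")
}
print(authors_by_conference)
```

The strip() call matters here because each name sits on its own indented line, so the raw text node also contains the newlines and spaces around it.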

I played around with it a bit, and this is what I got to work:

# ...
authortext = a.childNodes[0].nodeValue
print(authortext)

This results in the output:

C:\temp\py>xml2.py
1
Bob
Nigel
2
Alice
Mary

I can't say exactly why you have to access the child node to get the inner text, but at least this gives you what you were looking for.