Python: LXML XPath does not seem to return the full path

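OK, I'll be the first to admit that it does return a path, just not the path I want, and I don't know how to get it.

I'm using Python 3.3 in Eclipse with the PyDev plugin on Windows 7, and on Ubuntu 13.04 at home. I'm new to Python and have limited programming experience.

I'm trying to write a script that takes in an XML Lloyd's market insurance message, finds all the tags, and dumps them into a .csv where we can easily update them, then re-imports them to create an updated XML.

I've managed to do all of that, except that when I get all the tags it gives only the tag name, not the tags above it.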

<TechAccount Sender="broker" Receiver="insurer">
  <UUId>2EF40080-F618-4FF7-833C-A34EA6A57B73</UUId>
  <BrokerReference>HOY123/456</BrokerReference>
  <ServiceProviderReference>2012080921401A1</ServiceProviderReference>
  <CreationDate>2012-08-10</CreationDate>
  <AccountTransactionType>premium</AccountTransactionType>
  <GroupReference>2012080921401A1</GroupReference>
  <ItemsInGroupTotal>
    <Count>1</Count>
  </ItemsInGroupTotal>
  <ServiceProviderGroupReference>8-2012-08-10</ServiceProviderGroupReference>
  <ServiceProviderGroupItemsTotal>
    <Count>13</Count>
  </ServiceProviderGroupItemsTotal>

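That is a snippet of the XML. What I want is to find all the tags and their paths. For example, for Count I want to show it as ItemsInGroupTotal/Count, but I can only get it as Count.

Here is my code: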

from lxml import etree

xml = etree.parse(fullpath)
print(xml.xpath('.//*'))
all_xpath = xml.xpath('.//*')
every_tag = []
for i in all_xpath:
    # i.tag holds only the element's own name, not its ancestors
    single_tag = '%s,%s' % (i.tag, i.text)
    every_tag.append(single_tag)
print(every_tag)

This gives:

'{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}ServiceProviderGroupReference,8-2012-08-10', '{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}ServiceProviderGroupItemsTotal,\n ', '{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}Count,13',

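As you can see, Count is shown as {namespace}Count,13 rather than {namespace}ItemsInGroupTotal/Count,13.

Can anyone point me towards what I need?

Thanks (I hope my first post is OK)

Adam

EDIT:

This is my code now:
with open(fullpath, 'rb') as xmlFilepath:
    xmlfile = xmlFilepath.read()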

fulltext = '%s' % xmlfile
text = fulltext[2:]
print(text)

xml = etree.fromstring(fulltext)
tree = etree.ElementTree(xml)

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
print(every_tag)

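But this returns an error:
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

I removed the first two characters, since they were b', but then it complained that the input did not start with a tag.

UPDATE:

I have been playing around with this, and if I remove the xis:xxx tags and the namespace at the top, it works as expected. I need to keep the xis tags and be able to identify them as xis tags, so I can't just remove them.

Any help on how I can achieve this?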

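getpath() does indeed return an XPath that is not meant for human consumption. From that XPath you can build a more useful one, for example with this quick-and-dirty method: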

def human_xpath(element):
    # Assumes every element on the path is namespaced ('{uri}tag').
    full_xpath = element.getroottree().getpath(element)
    xpath = ''
    human_xpath = ''
    for i, node in enumerate(full_xpath.split('/')[1:]):
        # Rebuild the absolute path one step at a time and resolve it.
        xpath += '/' + node
        element = element.xpath(xpath)[0]
        namespace, tag = element.tag[1:].split('}', 1)
        if element.getparent() is not None:
            nsmap = {'ns': namespace}
            same_name = element.getparent().xpath('./ns:' + tag,
                                                  namespaces=nsmap)
            # Add a positional index only when siblings share the name.
            if len(same_name) > 1:
                tag += '[{}]'.format(same_name.index(element) + 1)
        human_xpath += '/' + tag
    return human_xpath
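
If the namespace prefix (such as xis:) has to stay visible in the path, a different approach may be simpler: walk up from each element with getparent() and join prefix:localname at every step. This is only a rough sketch, not a drop-in replacement for the function above. The name prefixed_path is made up, and the sample document and its namespace URIs are placeholders rather than the real ACORD ones.

```python
from lxml import etree

# Made-up sample; the namespace URIs are placeholders for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Message xmlns="http://example.com/default" xmlns:xis="http://example.com/xis">
  <TechAccount>
    <ItemsInGroupTotal><Count>1</Count></ItemsInGroupTotal>
    <xis:Status>ok</xis:Status>
  </TechAccount>
</Message>"""


def prefixed_path(element):
    """Join prefix:localname for the element and each of its ancestors."""
    steps = []
    node = element
    while node is not None:
        local = etree.QName(node).localname
        # node.prefix is None for elements in the default namespace.
        steps.append('%s:%s' % (node.prefix, local) if node.prefix else local)
        node = node.getparent()
    return '/' + '/'.join(reversed(steps))


root = etree.fromstring(SAMPLE)
# iter('*') visits elements only, skipping comments and processing instructions.
paths = [prefixed_path(e) for e in root.iter('*')]
```

Unlike human_xpath, this sketch adds no positional index, so repeated siblings with the same name end up with identical paths.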

ElementTree objects have a method getpath(element), which returns a
structural, absolute XPath expression to find that element

Calling getpath on each element in an iter() loop should work for you:

from pprint import pprint
from lxml import etree

text = """
<TechAccount Sender="broker" Receiver="insurer">
<UUId>2EF40080-F618-4FF7-833C-A34EA6A57B73</UUId>
<BrokerReference>HOY123/456</BrokerReference>
<ServiceProviderReference>2012080921401A1</ServiceProviderReference>
<CreationDate>2012-08-10</CreationDate>
<AccountTransactionType>premium</AccountTransactionType>
<GroupReference>2012080921401A1</GroupReference>
<ItemsInGroupTotal>
<Count>1</Count>
</ItemsInGroupTotal>
<ServiceProviderGroupReference>8-2012-08-10</ServiceProviderGroupReference>
<ServiceProviderGroupItemsTotal>
<Count>13</Count>
</ServiceProviderGroupItemsTotal>
</TechAccount>
"""

xml = etree.fromstring(text)
tree = etree.ElementTree(xml)

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
pprint(every_tag)

Prints:

['/TechAccount, \n',
 '/TechAccount/UUId, 2EF40080-F618-4FF7-833C-A34EA6A57B73',
 '/TechAccount/BrokerReference, HOY123/456',
 '/TechAccount/ServiceProviderReference, 2012080921401A1',
 '/TechAccount/CreationDate, 2012-08-10',
 '/TechAccount/AccountTransactionType, premium',
 '/TechAccount/GroupReference, 2012080921401A1',
 '/TechAccount/ItemsInGroupTotal, \n',
 '/TechAccount/ItemsInGroupTotal/Count, 1',
 '/TechAccount/ServiceProviderGroupReference, 8-2012-08-10',
 '/TechAccount/ServiceProviderGroupItemsTotal, \n',
 '/TechAccount/ServiceProviderGroupItemsTotal/Count, 13']
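
Since the goal is a .csv file, it is probably safer to hand the path and the text to the csv module as separate columns than to join them with '%s, %s', because values that contain commas then get quoted properly. A minimal sketch, assuming a shortened made-up sample and an in-memory buffer standing in for the real output file:

```python
import csv
import io

from lxml import etree

# Shortened, made-up sample based on the snippet in the question.
text = """<TechAccount>
<BrokerReference>HOY123/456</BrokerReference>
<ItemsInGroupTotal>
<Count>1</Count>
</ItemsInGroupTotal>
</TechAccount>"""

root = etree.fromstring(text)
tree = etree.ElementTree(root)

buffer = io.StringIO()  # stands in for an open .csv file
writer = csv.writer(buffer)
writer.writerow(['path', 'text'])
for element in root.iter('*'):
    value = (element.text or '').strip()
    if value:  # skip container elements whose text is only whitespace
        writer.writerow([tree.getpath(element), value])

csv_text = buffer.getvalue()
```

When writing to a real file, open it with newline='' as the csv documentation recommends.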

UPD:
If your XML data is in a file called test.xml, the code would look like this:

from pprint import pprint
from lxml import etree

xml = etree.parse('test.xml').getroot()
tree = etree.ElementTree(xml)

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
pprint(every_tag)
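
If you would rather read the file yourself with open(fullpath, 'rb'), the ValueError from your edit most likely comes from turning the bytes into a str. '%s' % xmlfile produces the repr of the bytes object (b'...'), not the decoded document, and lxml refuses str input that carries an encoding declaration. Passing the bytes straight to etree.fromstring should avoid both problems. A small sketch, with made-up content standing in for the file:

```python
from lxml import etree

# Stand-in for the bytes returned by open(fullpath, 'rb').read();
# the content is made up for illustration.
raw = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
       b'<TechAccount><ItemsInGroupTotal><Count>1</Count>'
       b'</ItemsInGroupTotal></TechAccount>')

root = etree.fromstring(raw)  # bytes are accepted as-is
tree = etree.ElementTree(root)
every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in root.iter()]

# Decoding to str first reproduces the error when a declaration is present.
try:
    etree.fromstring(raw.decode('utf-8'))
    str_input_failed = False
except ValueError:
    str_input_failed = True
```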

Hope that helps.