LXML Xpath does not seem to return full path
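OK, I'll be the first to admit that it does return a path, just not the path I want, and I don't know how to get the one I want.
I'm using Python 3.3 in Eclipse with the Pydev plugin on Windows 7, and on Ubuntu 13.04 at home. I'm new to Python and have limited programming experience.
I'm trying to write a script that takes in an XML Lloyd's market insurance message, finds all the tags and dumps them into a .csv, where we can easily update them and then re-import them to create an updated XML.
I've managed to do all of that, except that when I collect the tags I only get each tag's own name, not the tags above it.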
```xml
<TechAccount Sender="broker" Receiver="insurer">
<UUId>2EF40080-F618-4FF7-833C-A34EA6A57B73</UUId>
<BrokerReference>HOY123/456</BrokerReference>
<ServiceProviderReference>2012080921401A1</ServiceProviderReference>
<CreationDate>2012-08-10</CreationDate>
<AccountTransactionType>premium</AccountTransactionType>
<GroupReference>2012080921401A1</GroupReference>
<ItemsInGroupTotal>
<Count>1</Count>
</ItemsInGroupTotal>
<ServiceProviderGroupReference>8-2012-08-10</ServiceProviderGroupReference>
<ServiceProviderGroupItemsTotal>
<Count>13</Count>
</ServiceProviderGroupItemsTotal>
```
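That is a fragment of the XML. What I want is to find every tag together with its path. For example, for Count I want to show it as ItemsInGroupTotal/Count, but I can only get it to show as Count.
Here is my code: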
```python
xml = etree.parse(fullpath)
print(xml.xpath('.//*'))
all_xpath = xml.xpath('.//*')
every_tag = []
for i in all_xpath:
    single_tag = '%s,%s' % (i.tag, i.text)
    every_tag.append(single_tag)
print(every_tag)
```
This gives:
```
'{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}ServiceProviderGroupReference,8-2012-08-10',
'{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}ServiceProviderGroupItemsTotal,\n',
'{http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1}Count,13',
```
As you can see, Count is shown as {namespace}Count,13 and not as {namespace}ItemsInGroupTotal/Count,13.
Can anyone point me towards what I need?
Thanks (I hope my first post is OK)
Adam
EDIT:
This is now my code:
```python
with open(fullpath, 'rb') as xmlFilepath:
    xmlfile = xmlFilepath.read()

fulltext = '%s' % xmlfile
text = fulltext[2:]
print(text)

xml = etree.fromstring(fulltext)
tree = etree.ElementTree(xml)

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
print(every_tag)
```
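But this returns an error:
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
I removed the first two characters because they were b', but then it complained that the input did not start with a tag.
UPDATE:
I've been playing around with this, and if I remove the xis:xxx tags and the namespace declarations at the top, it works as expected. I need to keep the xis tags and be able to identify them as xis tags, so I can't just delete them.
Any help on how I can achieve this?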
```python
def human_xpath(element):
    # Structural path from lxml, which may contain positional steps
    full_xpath = element.getroottree().getpath(element)
    xpath = ''
    human_xpath = ''
    for i, node in enumerate(full_xpath.split('/')[1:]):
        # Resolve the element at each depth by evaluating the path so far
        xpath += '/' + node
        element = element.xpath(xpath)[0]
        # Split the '{namespace}tag' form; assumes every element is namespaced
        namespace, tag = element.tag[1:].split('}', 1)
        if element.getparent() is not None:
            nsmap = {'ns': namespace}
            same_name = element.getparent().xpath('./ns:' + tag, namespaces=nsmap)
            # Add a 1-based index only when same-named siblings exist
            if len(same_name) > 1:
                tag += '[{}]'.format(same_name.index(element) + 1)
        human_xpath += '/' + tag
    return human_xpath
```
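A smaller variant of the same idea, offered only as a sketch: instead of re-evaluating XPath at every level, walk up the ancestors and join the local tag names, keeping the prefix where an element has one so that xis tags stay recognisable. The helper name tag_path, the urn:example:xis namespace URI and the xis:Note element are made up for illustration, since the real xis declarations are not shown in the question.

```python
from lxml import etree


def tag_path(element):
    """Join the tag names from the root down to `element` with '/'."""
    # iterancestors() yields parent, grandparent, ...; reverse to root-first.
    chain = list(element.iterancestors())[::-1] + [element]
    parts = []
    for node in chain:
        local = etree.QName(node).localname  # tag without '{namespace}'
        # node.prefix is None for the default namespace or no namespace.
        parts.append('%s:%s' % (node.prefix, local) if node.prefix else local)
    return '/'.join(parts)


# Trimmed sample; the xis URI and <xis:Note> are hypothetical placeholders.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<TechAccount xmlns="http://www.ACORD.org/standards/Jv-Ins-Reinsurance/1"
             xmlns:xis="urn:example:xis">
  <ItemsInGroupTotal>
    <Count>1</Count>
  </ItemsInGroupTotal>
  <xis:Note>hypothetical</xis:Note>
</TechAccount>"""

root = etree.fromstring(sample)
# iter('*') visits elements only, skipping comments and processing instructions.
paths = [tag_path(el) for el in root.iter('*')]
print(paths)
```

Unlike human_xpath above, this sketch does not add [n] indexes for repeated siblings, so it suits labelling columns more than addressing one specific element.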
ElementTree objects have a method getpath(element), which returns a
structural, absolute XPath expression to find that element
Call it on each element in an iter() loop:
```python
from pprint import pprint

from lxml import etree


text = """
<TechAccount Sender="broker" Receiver="insurer">
<UUId>2EF40080-F618-4FF7-833C-A34EA6A57B73</UUId>
<BrokerReference>HOY123/456</BrokerReference>
<ServiceProviderReference>2012080921401A1</ServiceProviderReference>
<CreationDate>2012-08-10</CreationDate>
<AccountTransactionType>premium</AccountTransactionType>
<GroupReference>2012080921401A1</GroupReference>
<ItemsInGroupTotal>
<Count>1</Count>
</ItemsInGroupTotal>
<ServiceProviderGroupReference>8-2012-08-10</ServiceProviderGroupReference>
<ServiceProviderGroupItemsTotal>
<Count>13</Count>
</ServiceProviderGroupItemsTotal>
</TechAccount>
"""

xml = etree.fromstring(text)
tree = etree.ElementTree(xml)

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
pprint(every_tag)
```
Prints:
```
['/TechAccount, \n',
 '/TechAccount/UUId, 2EF40080-F618-4FF7-833C-A34EA6A57B73',
 '/TechAccount/BrokerReference, HOY123/456',
 '/TechAccount/ServiceProviderReference, 2012080921401A1',
 '/TechAccount/CreationDate, 2012-08-10',
 '/TechAccount/AccountTransactionType, premium',
 '/TechAccount/GroupReference, 2012080921401A1',
 '/TechAccount/ItemsInGroupTotal, \n',
 '/TechAccount/ItemsInGroupTotal/Count, 1',
 '/TechAccount/ServiceProviderGroupReference, 8-2012-08-10',
 '/TechAccount/ServiceProviderGroupItemsTotal, \n',
 '/TechAccount/ServiceProviderGroupItemsTotal/Count, 13']
```
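Since the stated goal is a .csv, here is a rough sketch of writing path/text pairs with the standard csv module rather than joining them with '%s, %s', which would break as soon as a value contains a comma. The column names, the in-memory buffer and the trimmed sample are my own assumptions; swap io.StringIO for a real file opened with newline='' if that suits you better.

```python
import csv
import io

from lxml import etree

# Trimmed version of the sample document, without namespaces.
text = """<TechAccount>
  <BrokerReference>HOY123/456</BrokerReference>
  <ItemsInGroupTotal>
    <Count>1</Count>
  </ItemsInGroupTotal>
</TechAccount>"""

root = etree.fromstring(text)
tree = etree.ElementTree(root)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['path', 'text'])  # header row; column names are arbitrary
for element in root.iter('*'):
    # Container elements only hold whitespace, so store an empty value.
    value = (element.text or '').strip()
    writer.writerow([tree.getpath(element), value])

csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the full path in the first column means each row can be matched back to its element with xml.xpath(path) when the edited file is re-imported.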
UPD:
If your XML data is in a file (test.xml in this example), the code would look like this:
```python
from pprint import pprint

from lxml import etree

xml = etree.parse('test.xml').getroot()
tree = etree.ElementTree(xml)

every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
pprint(every_tag)
```
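On the ValueError in the question's edit: formatting the bytes with '%s' produces a str that starts with b', which is why slicing off two characters did not help. As far as I can tell, the bytes read in 'rb' mode can be handed to etree.fromstring() as they are, encoding declaration included. A minimal sketch, where the temporary file and its contents only stand in for the real message:

```python
import os
import tempfile

from lxml import etree

# Stand-in document with an encoding declaration, like the real message.
data = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
        b'<TechAccount><ItemsInGroupTotal><Count>1</Count>'
        b'</ItemsInGroupTotal></TechAccount>')

# Write it to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix='.xml', delete=False) as handle:
    handle.write(data)
    sample_path = handle.name

try:
    with open(sample_path, 'rb') as xml_file:
        raw = xml_file.read()  # bytes; do not convert to str

    root = etree.fromstring(raw)  # bytes input accepts the declaration
    tree = etree.ElementTree(root)
    every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in root.iter()]
finally:
    os.remove(sample_path)

print(every_tag)
```

etree.parse(path), as in the snippet above, avoids the question entirely because lxml reads the file itself.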
Hope that helps.