How to select nodes between flags when flags repeat and are not consecutive
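Given an input document that is a series of same-level nodes, I want to find the nodes that occur between two flags (which are themselves nodes). The flags can be used multiple times, and the end result should group together everything between the same flags. I'm partway there.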
Given this input document:
```xml
<root>
  <p class="text">Hello world 1.</p>
  <p class="text">Hello world 2.</p>
  <p class="text">Hello world 3.</p>
  <p class="excerptstartone">Dummy text</p> <!-- this flag identifies the start of the nodes I want to select -->
  <p class="text">Hello world 4.</p>
  <p class="text">Hello world 5.</p>
  <p class="text">Hello world 6.</p>
  <p class="excerptendone">Dummy text</p> <!-- this flag identifies the end of the nodes I want to select -->
  <p class="text">Hello world 7.</p>
  <p class="excerptstarttwo">Dummy text</p> <!-- this flag identifies the start of the nodes I want to select -->
  <p class="text">Hello world 8.</p>
  <p class="excerptendtwo">Dummy text</p> <!-- this flag identifies the end of the nodes I want to select -->
  <p class="text">Hello world 9.</p>
  <p class="excerptstartone">Dummy text for starting a new excerpt</p> <!-- this flag identifies the start of the nodes I want to select -->
  <p class="text">Hello world 10.</p>
  <p class="text">Hello world 11.</p>
  <p class="excerptendone">Dummy text</p> <!-- this flag identifies the end of the nodes I want to select -->
  <p class="text">Hello world 12.</p>
  <p class="text">Hello world 13.</p>
  <p class="text">Hello world 14.</p>
  <p class="text">Hello world 15.</p>
  <p class="text">Hello world 16.</p>
  <p class="text">Hello world 17.</p>
</root>
```
I want this output:
```xml
<root>
  <p class="excerptstartone">Dummy text</p>
  <p class="text">Hello world 4.</p>
  <p class="text">Hello world 5.</p>
  <p class="text">Hello world 6.</p>
  <p class="text">Hello world 10.</p>
  <p class="text">Hello world 11.</p>
  <p class="excerptendone">Dummy text</p>
  <p class="excerptstarttwo">Dummy text</p>
  <p class="text">Hello world 8.</p>
  <p class="excerptendtwo">Dummy text</p>
</root>
```
Note: the flags always begin with "excerptstart" and "excerptend", and their suffixes always match (that is, business rules guarantee there will always be an "excerptendone" if there is an "excerptstartone").
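Here is what I have so far. I can find the set I want as long as I hard-code the excerpt-start suffix (i.e. 'one', 'two'). I'm stuck trying to generalize it so that the suffix doesn't have to be hard-coded. (I should also say I don't care about retaining the start/end paragraph "flags" in the result tree; I've hard-coded them here for convenience in assessing the result tree.)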
```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="root">
    <root>
      <p class="excerptstartone">Dummy text</p>
      <xsl:for-each select="p[@class='excerptstartone']">
        <xsl:sequence select="following-sibling::node()
          intersect following-sibling::p[@class='excerptendone'][1]/preceding-sibling::node()"/>
      </xsl:for-each>
      <p class="excerptendone">Dummy text</p>
      <p class="excerptstarttwo">Dummy text</p>
      <xsl:for-each select="p[@class='excerptstarttwo']">
        <xsl:sequence select="following-sibling::node()
          intersect following-sibling::p[@class='excerptendtwo'][1]/preceding-sibling::node()"/>
      </xsl:for-each>
      <p class="excerptendtwo">Dummy text</p>
    </root>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
```
Take a look at, for example, the Kayessian method.
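XPath 1.0 has no `intersect` operator. The Kayessian method (named after Michael Kay) works around this with `$ns1[count(. | $ns2) = count($ns2)]`: a node of `$ns1` is also in `$ns2` exactly when adding it to `$ns2` does not change the count. A minimal XSLT 1.0 sketch, assuming the input document from the question and, for illustration only, looking at just the first `excerptstartone`/`excerptendone` pair (the variable names are hypothetical):

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/*">
    <!-- Illustration only: use the first start flag and the end flag that closes it -->
    <xsl:variable name="start" select="p[@class = 'excerptstartone'][1]"/>
    <xsl:variable name="end"
      select="$start/following-sibling::p[@class = 'excerptendone'][1]"/>

    <!-- $ns1: elements after the start flag; $ns2: elements before the end flag -->
    <xsl:variable name="ns1" select="$start/following-sibling::*"/>
    <xsl:variable name="ns2" select="$end/preceding-sibling::*"/>

    <root>
      <!-- Kayessian intersection: keep the nodes of $ns1 that are already in $ns2 -->
      <xsl:copy-of select="$ns1[count(. | $ns2) = count($ns2)]"/>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the question's input, this should return only the "Hello world 4." to "Hello world 6." paragraphs. Because `*` selects elements only, the annotation comments in the input are not copied. Repeating the same intersection for every start flag of a given class gives the grouped result.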
Or try this:
```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Note: this key is not used by the templates below -->
  <xsl:key name="kFollowing" match="p"
    use="generate-id(preceding-sibling::p[starts-with(@class, 'excerptstart')][1])"/>

  <!-- All start flags, indexed by their full class (i.e. by suffix) -->
  <xsl:key name="kExcerptstart" match="p[starts-with(@class, 'excerptstart')]" use="@class"/>

  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="p"/>
    </xsl:copy>
  </xsl:template>

  <!-- By default, paragraphs produce no output -->
  <xsl:template match="p"/>

  <!-- Muenchian grouping: fires only for the first start flag of each class -->
  <xsl:template match="p[generate-id() = generate-id(key('kExcerptstart', @class)[1])]">
    <xsl:copy-of select="."/>
    <xsl:variable name="start" select="@class"/>
    <xsl:for-each select="key('kExcerptstart', $start)">
      <xsl:variable name="end" select="following-sibling::p[starts-with(@class, 'excerptend')][1]"/>
      <xsl:variable name="ns1" select="following-sibling::*"/>
      <xsl:variable name="ns2" select="$end/preceding-sibling::*"/>
      <!-- Kayessian intersection: the elements strictly between this start flag and its end flag -->
      <xsl:copy-of select="$ns1[count(.|$ns2) = count($ns2)]"/>
    </xsl:for-each>
    <xsl:copy-of select="following-sibling::p[starts-with(@class, 'excerptend')][1]"/>
  </xsl:template>
</xsl:stylesheet>
```
This produces the following output:
```xml
<root>
  <p class="excerptstartone">Dummy text</p>
  <p class="text">Hello world 4.</p>
  <p class="text">Hello world 5.</p>
  <p class="text">Hello world 6.</p>
  <p class="text">Hello world 10.</p>
  <p class="text">Hello world 11.</p>
  <p class="excerptendone">Dummy text</p>
  <p class="excerptstarttwo">Dummy text</p>
  <p class="text">Hello world 8.</p>
  <p class="excerptendtwo">Dummy text</p>
</root>
```
> (I should also say I don't care about retaining the start/end paragraph "flags" in the result tree; I've hard-coded those here for convenience in assessing the result tree)
Here is a simple solution that uses only grouping:
```xslt
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/*">
    <root>
      <!-- Select the "text" paragraphs whose nearest preceding flag is a start flag,
           and group them by that flag's class (i.e. by its suffix) -->
      <xsl:for-each-group select=
        "p[@class eq 'text']
          [preceding-sibling::p[starts-with(@class, 'excerpt')][1]
                               [starts-with(@class, 'excerptstart')]
          ]"
        group-by="preceding-sibling::p[starts-with(@class, 'excerpt')][1]/@class">
        <xsl:sequence select="current-group()"/>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```
When this transformation is applied to the XML document provided in the question, the wanted, correct result is produced:
```xml
<root>
  <p class="text">Hello world 4.</p>
  <p class="text">Hello world 5.</p>
  <p class="text">Hello world 6.</p>
  <p class="text">Hello world 10.</p>
  <p class="text">Hello world 11.</p>
  <p class="text">Hello world 8.</p>
</root>
```
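The question's desired output also keeps the flags. If they are wanted, the grouping can be extended to copy, around each group, the start flag that precedes the group's first paragraph and the end flag that closes that excerpt. A sketch in XSLT 2.0, assuming the same input document; when a suffix is used more than once, only the flags of its first excerpt are emitted, which matches the desired output in the question:

```xslt
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/*">
    <root>
      <xsl:for-each-group select=
        "p[@class eq 'text']
          [preceding-sibling::p[starts-with(@class, 'excerpt')][1]
                               [starts-with(@class, 'excerptstart')]]"
        group-by="preceding-sibling::p[starts-with(@class, 'excerpt')][1]/@class">
        <!-- The context item is the group's first paragraph, so this is the
             start flag of the first excerpt that uses this suffix -->
        <xsl:variable name="start"
          select="preceding-sibling::p[starts-with(@class, 'excerpt')][1]"/>
        <xsl:sequence select="$start"/>
        <xsl:sequence select="current-group()"/>
        <!-- The end flag that closes that first excerpt -->
        <xsl:sequence
          select="$start/following-sibling::p[starts-with(@class, 'excerptend')][1]"/>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

An excerpt whose start flag is immediately followed by its end flag contains no "text" paragraphs, so it forms no group and its flags are not emitted.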
This gives a general solution for what I want to do, although it's a bit clunky (because it uses two for-each loops):
```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="root">
    <root>
      <xsl:variable name="uniqueExcerptClasses"
        select="distinct-values(//@class[starts-with(., 'excerptstart')])"/>
      <!-- Inside the outer for-each the context item is a string, so keep a reference to root -->
      <xsl:variable name="context" select="."/>
      <xsl:for-each select="$uniqueExcerptClasses">
        <xsl:text>&#10;</xsl:text><p>start excerpt</p><xsl:text>&#10;</xsl:text>
        <xsl:variable name="curExcerpt" select="."/>
        <xsl:for-each select="$context/p[@class = $curExcerpt]">
          <!-- The end class is derived from the start class: "excerptstartone" -> "excerptendone" -->
          <xsl:sequence select="following-sibling::node()
            intersect following-sibling::p[@class = replace($curExcerpt, 'start', 'end')][1]/preceding-sibling::node()"/>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text><p>end excerpt</p><xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </root>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>
```
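The two nested loops and the `$context` variable can probably be avoided with `xsl:for-each-group`, which groups the start flags by class directly while the context stays on the document's nodes. The sketch below (XSLT 2.0, not tested against any particular processor) derives the end class with `concat`/`substring-after` instead of `replace`, relying on the business rule that suffixes always match, and selects `p` elements rather than `node()` so that the annotation comments in the sample input are not copied:

```xslt
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/*">
    <root>
      <!-- One group per distinct start-flag class, in order of first appearance -->
      <xsl:for-each-group select="p[starts-with(@class, 'excerptstart')]" group-by="@class">
        <!-- "excerptstartone" -> "excerptendone" (assumes matching suffixes) -->
        <xsl:variable name="endClass"
          select="concat('excerptend', substring-after(current-grouping-key(), 'excerptstart'))"/>
        <!-- For every start flag in the group, take the p elements strictly between
             it and its matching end flag; "/" merges them in document order -->
        <xsl:sequence select="current-group()/(following-sibling::p
          intersect following-sibling::p[@class = $endClass][1]/preceding-sibling::p)"/>
      </xsl:for-each-group>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

Because `/` returns its result in document order without duplicates, the paragraphs for a suffix used more than once come out in the order they appear in the document (4, 5, 6, 10, 11 for `one`, then 8 for `two`). The "start excerpt"/"end excerpt" markers can be added back around the `xsl:sequence` if they are still needed.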