Convert BNF grammar to pyparsing
For the scripting language shown below (in Backus–Naur form), how can I describe the grammar with regular expressions (or would pyparsing be a better fit)?

    <root> := <tree> | <leaves>
    <tree> := <group> [* <group>]
    <group> := "{" <leaves>"}" | <leaf>;
    <leaves> := {<leaf>;} leaf
    <leaf> := <name> = <expression>{;}
    <name> := <string_without_spaces_and_tabs>
    <expression> := <string_without_spaces_and_tabs>
Sample script:

    {
     stage = 3;
     some.param1 = [10, 20];
    } *
    {
     stage = 4;
     param3 = [100,150,200,250,300]
    } *
     endparam = [0, 1]
I am using Python's re.compile and want to split everything into groups, like this:

    [ [ 'stage', '3'],
      [ 'some.param1', '[10, 20]'] ],
    [ ['stage', '4'],
      ['param3', '[100,150,200,250,300]'] ],
    [ ['endparam', '[0, 1]'] ]
Update:
I found that pyparsing is a better solution than regex for this.
Pyparsing lets you simplify some of these kinds of constructs, for example from

    leaves :: {leaf} leaf

to

    OneOrMore(leaf)
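As a minimal sketch of that equivalence, the snippet below uses a hypothetical stand-in for `<leaf>` (a bare alphabetic word, not the real leaf rule) to show that `OneOrMore` accepts one item as well as several:

```python
from pyparsing import OneOrMore, Word, alphas

# Hypothetical stand-in for <leaf>: a bare alphabetic word.
leaf = Word(alphas)

# BNF "{leaf} leaf" (zero or more leaves, then one more) means one-or-more.
leaves = OneOrMore(leaf)

single = leaves.parseString("a").asList()
several = leaves.parseString("a b c").asList()
print(single)   # ['a']
print(several)  # ['a', 'b', 'c']
```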
So one form of your BNF in pyparsing would look something like this:
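
    from pyparsing import *

    # punctuation is matched but suppressed from the results
    LBRACE,RBRACE,EQ,SEMI = map(Suppress,"{}=;")
    name = Word(printables, excludeChars="{}=;")
    # quotedString is listed first because `|` takes the first alternative that
    # matches, and the Word would otherwise consume an opening quote character
    expr = quotedString | Word(printables, excludeChars="{}=;")

    leaf = Group(name + EQ + expr + SEMI)
    group = Group(LBRACE + ZeroOrMore(leaf) + RBRACE) | leaf
    tree = OneOrMore(group)
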
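I added quotedString as an alternative expr, in case you want a value that does contain one of the excluded characters. Wrapping leaf and group in Group preserves the brace structure in the parsed results.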
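To illustrate the quoted alternative, here is a small sketch that parses a single leaf whose value contains `;`, `=` and braces. The input string is made up for the example. Note that the quoted alternative is tried before the plain `Word` here: `|` picks the first alternative that matches, and a `Word` built from `printables` would otherwise start matching at the opening quote.

```python
from pyparsing import Group, Suppress, Word, printables, quotedString

EQ, SEMI = map(Suppress, "=;")
name = Word(printables, excludeChars="{}=;")
# Quoted values are tried first so they can contain the excluded characters.
expr = quotedString | Word(printables, excludeChars="{}=;")
leaf = Group(name + EQ + expr + SEMI)

# Made-up input: the value contains ';', '=', '{' and '}'.
quoted = leaf.parseString('label = "a;b = {c}";').asList()
plain = leaf.parseString('stage = 3;').asList()
print(quoted)
print(plain)
```

The quoted token keeps its surrounding quote characters in the result.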
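Unfortunately, your sample does not quite conform to this BNF:
The spaces inside values such as [10, 20] and [0, 1] make them invalid exprs.
Some leaves have no terminating ; character.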
The lone * characters between groups are not handled by the parser above.
This adjusted sample does parse successfully with the `tree` parser above:

    sample = """
    {
        stage = 3;
        some.param1 = [10,20];
    }
    {
        stage = 4;
        param3 = [100,150,200,250,300];
    }
    endparam = [0,1];
    """

    parsed = tree.parseString(sample)
    parsed.pprint()
Giving:

    [[['stage', '3'], ['some.param1', '[10,20]']],
     [['stage', '4'], ['param3', '[100,150,200,250,300]']],
     ['endparam', '[0,1]']]
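If the script from the question has to be accepted unchanged, the grammar could instead be loosened. The following is only a sketch under stated assumptions, not the grammar from the answer above: the names `bracketed`, `loose_expr`, `loose_leaf`, `loose_group` and `loose_tree` are introduced here for illustration, a value with spaces is assumed to always be an un-nested `[...]` list, the terminating `;` is treated as optional, and groups are assumed to be separated by `*` as in `<tree> := <group> [* <group>]`.

```python
from pyparsing import (Group, Optional, Regex, Suppress, Word,
                       ZeroOrMore, printables)

LBRACE, RBRACE, EQ, SEMI, STAR = map(Suppress, "{}=;*")

name = Word(printables, excludeChars="{}=;*")
# Assumption: a value containing spaces is always a [...] list without nesting.
bracketed = Regex(r"\[[^\]]*\]")
loose_expr = bracketed | Word(printables, excludeChars="{}=;*")

# The terminating ';' becomes optional.
loose_leaf = Group(name + EQ + loose_expr + Optional(SEMI))
# A leaf outside braces is wrapped in its own Group, so every top-level
# entry is a list of [name, value] pairs.
loose_group = Group(LBRACE + ZeroOrMore(loose_leaf) + RBRACE) | Group(loose_leaf)
# Groups are separated by '*'.
loose_tree = loose_group + ZeroOrMore(STAR + loose_group)

original_sample = """
{
    stage = 3;
    some.param1 = [10, 20];
} *
{
    stage = 4;
    param3 = [100,150,200,250,300]
} *
endparam = [0, 1]
"""

result = loose_tree.parseString(original_sample, parseAll=True).asList()
for entry in result:
    print(entry)
```

Wrapping the bare leaf in an extra `Group` yields the nesting requested in the question, where `endparam` sits in its own one-element list.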