Antlr4 parser ends prematurely on misplaced token in Python 3.7
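I have a problem: if my parser finds a token that it cannot place in any rule, it ends without explicitly reporting an error, even though more tokens remain to be placed. To be precise, the token is recognized by the lexer (I have a rule that is almost a catch-all), but it is misplaced and cannot be covered by any parser rule. In this situation my parser finishes successfully without reporting any error (at least not loudly).
This is what I see:
Code to parse: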
    .class public final Ld;
    .super Ljava/lang/Object;
    .source "java-style lambda group"

    # interfaces
    .implements Landroid/content/DialogInterface$OnClickListener;
    <misplaced-tokens>

    # static fields
    .field public static final f:Ld;

    .field public static final g:Ld;

    ...
(Note the <misplaced-tokens> line in the input.)
Tokens produced by the lexer:
    [@0,0:5='.class',<'.class'>,1:0]
    [@1,7:12='public',<'public'>,1:7]
    [@2,14:18='final',<'final'>,1:14]
    [@3,20:22='Ld;',<QUALIFIED_TYPE_NAME>,1:20]
    [@4,24:29='.super',<'.super'>,2:0]
    [@5,31:48='Ljava/lang/Object;',<QUALIFIED_TYPE_NAME>,2:7]
    [@6,50:56='.source',<'.source'>,3:0]
    [@7,58:82='"java-style lambda group"',<STRING_LITERAL>,3:8]
    [@8,85:96='# interfaces',<LINE_COMMENT>,channel=1,5:0]
    [@9,98:108='.implements',<'.implements'>,6:0]
    [@10,110:158='Landroid/content/DialogInterface$OnClickListener;',<QUALIFIED_TYPE_NAME>,6:12]
    [@11,160:160='<',<'<'>,7:0]
    [@12,161:169='misplaced',<IDENTIFIER>,7:1]
    [@13,170:170='-',<'-'>,7:10]
    [@14,171:176='tokens',<IDENTIFIER>,7:11]
    [@15,177:177='>',<'>'>,7:17]
    [@16,180:194='# static fields',<LINE_COMMENT>,channel=1,9:0]
    [@17,196:201='.field',<'.field'>,10:0]
    ...
Parse trace:
    enter parse, LT(1)=.class
    enter statement, LT(1)=.class
    enter classDirective, LT(1)=.class
    consume [@0,0:5='.class',<30>,1:0] rule classDirective
    enter classModifier, LT(1)=public
    consume [@1,7:12='public',<53>,1:7] rule classModifier
    exit classModifier, LT(1)=final
    enter classModifier, LT(1)=final
    consume [@2,14:18='final',<56>,1:14] rule classModifier
    exit classModifier, LT(1)=Ld;
    enter className, LT(1)=Ld;
    enter referenceType, LT(1)=Ld;
    consume [@3,20:22='Ld;',<1>,1:20] rule referenceType
    exit referenceType, LT(1)=.super
    exit className, LT(1)=.super
    exit classDirective, LT(1)=.super
    exit statement, LT(1)=.super
    enter statement, LT(1)=.super
    enter superDirective, LT(1)=.super
    consume [@4,24:29='.super',<33>,2:0] rule superDirective
    enter superName, LT(1)=Ljava/lang/Object;
    enter referenceType, LT(1)=Ljava/lang/Object;
    consume [@5,31:48='Ljava/lang/Object;',<1>,2:7] rule referenceType
    exit referenceType, LT(1)=.source
    exit superName, LT(1)=.source
    exit superDirective, LT(1)=.source
    exit statement, LT(1)=.source
    enter statement, LT(1)=.source
    enter sourceDirective, LT(1)=.source
    consume [@6,50:56='.source',<32>,3:0] rule sourceDirective
    enter sourceName, LT(1)="java-style lambda group"
    enter stringLiteral, LT(1)="java-style lambda group"
    consume [@7,58:82='"java-style lambda group"',<304>,3:8] rule stringLiteral
    exit stringLiteral, LT(1)=.implements
    exit sourceName, LT(1)=.implements
    exit sourceDirective, LT(1)=.implements
    exit statement, LT(1)=.implements
    enter statement, LT(1)=.implements
    enter implementsDirective, LT(1)=.implements
    consume [@9,98:108='.implements',<31>,6:0] rule implementsDirective
    enter implementsName, LT(1)=Landroid/content/DialogInterface$OnClickListener;
    enter referenceType, LT(1)=Landroid/content/DialogInterface$OnClickListener;
    consume [@10,110:158='Landroid/content/DialogInterface$OnClickListener;',<1>,6:12] rule referenceType
    exit referenceType, LT(1)=<
    exit implementsName, LT(1)=<
    exit implementsDirective, LT(1)=<
    exit statement, LT(1)=<
    exit parse, LT(1)=<
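(Observe how parse, the main rule, exits at this point even though many more tokens remain in the stream.)
What I tried
I tried reimplementing the default error strategy and error listener and adding them to both the lexer and the parser, just to see whether their breakpoints would be hit. With every method overridden, none of the breakpoints is hit (sometimes ...
This is how I add the overrides: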
    def parseFile(self, filePath):
        errorListener = MyErrorListener()
        strategy = MyErrorStrategy()
        file = FileStream("file.smali")
        lexer = SmaliLexer(file)
        lexer.removeErrorListeners()
        lexer.addErrorListener(errorListener)
        lexer.addErrorListener(strategy)
        stream = CommonTokenStream(lexer)
        parser = SmaliParser(stream)
        parser.removeErrorListeners()
        parser.addErrorListener(errorListener)
        parser.addErrorListener(strategy)
        tree = parser.parse()
        ...
My setup is as follows:
    Windows 10 OS
    Python 3.7
    Antlr4 v4.8 - antlr-4.8-complete.jar
    pip-installed runtime: antlr4_python3_runtime-4.8-py3-none-any.whl
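I would appreciate any help in getting Antlr4 to actually use the overridden listener and strategy, so that I can both report errors for debugging and handle them differently. Thanks!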
Antlr4 parser ends prematurely
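When the rule you invoke (in your case, parse) does not end with EOF, the parser is free to stop as soon as it can no longer match the next token, and it does not have to report an error for the input it leaves unread. Take this grammar: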
    parse
     : expression
     ;

    expression
     : expression '+' expression
     | NUMBER
     ;
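In the case above, when the input starts with a valid expression and is followed by tokens that cannot continue it, the parser typically matches only the leading expression and returns, leaving the remaining tokens unread.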
If you want to force the parser to consume all tokens in the input stream, add EOF at the end of your start rule:
    parse
     : expression EOF
     ;
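The difference between the two start rules can be illustrated without ANTLR. What follows is only a minimal hand-written sketch in plain Python, not ANTLR's actual algorithm: the helper names tokenize, parse_expression and parse, the require_eof flag and the sample input "1 + 2 3" are all made up for this illustration. The left-recursive expression rule is written as the equivalent loop NUMBER ('+' NUMBER)*.

```python
import re

# Hypothetical mini-lexer for the toy grammar: NUMBER and '+' only.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\+))")


def tokenize(text):
    """Split text into (type, text) pairs and append an EOF marker."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if match.group(1) is not None:
            tokens.append(("NUMBER", match.group(1)))
        else:
            tokens.append(("+", "+"))
        pos = match.end()
    tokens.append(("EOF", "<EOF>"))
    return tokens


def parse_expression(tokens, pos):
    """expression : NUMBER ('+' NUMBER)* ; return index of first unread token."""
    if tokens[pos][0] != "NUMBER":
        raise SyntaxError(f"expected NUMBER, got {tokens[pos][1]!r}")
    pos += 1
    while tokens[pos][0] == "+":
        if tokens[pos + 1][0] != "NUMBER":
            raise SyntaxError(f"expected NUMBER, got {tokens[pos + 1][1]!r}")
        pos += 2
    return pos


def parse(tokens, require_eof):
    """Start rule: 'expression' or, with require_eof=True, 'expression EOF'."""
    pos = parse_expression(tokens, 0)
    if require_eof and tokens[pos][0] != "EOF":
        # Only the EOF-terminated start rule notices the leftover input.
        raise SyntaxError(f"extraneous input {tokens[pos][1]!r}, expecting <EOF>")
    return pos  # number of tokens consumed before stopping


if __name__ == "__main__":
    toks = tokenize("1 + 2 3")
    print("without EOF: consumed", parse(toks, require_eof=False), "tokens, no error")
    try:
        parse(toks, require_eof=True)
    except SyntaxError as exc:
        print("with EOF:", exc)
```

Without the EOF check, the sketch stops quietly after "1 + 2" and never looks at the trailing "3", which mirrors the "exit parse, LT(1)=<" line in the trace from the question. With the check, the leftover token becomes an error. In ANTLR itself, such an error would be expected to reach the error listeners registered on the parser once the start rule ends with EOF.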