The meaning of the \1* operator in Java regexes

I'm learning Java regexes, and I noticed the following operator:

    \\1*

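I'm having trouble understanding what it means (searching online didn't help). For example, what is the difference between these two options: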

    Pattern p1 = Pattern.compile("(a)\\1*"); // option 1
    Pattern p2 = Pattern.compile("(a)");     // option 2

    Matcher m1 = p1.matcher("a");
    Matcher m2 = p2.matcher("a");

    // group() throws IllegalStateException unless a match has been attempted first
    m1.find();
    m2.find();
    System.out.println(m1.group(0));
    System.out.println(m2.group(0));

Result:

    a
    a

Thanks!


Here, \\1 is a backreference to the first capturing group, which is (a).

So in this case, (a)\\1* is equivalent to (a)a*.

Here is an example that shows the difference:

    Pattern p1 = Pattern.compile("(a)\\1*");
    Pattern p2 = Pattern.compile("(a)");

    Matcher m1 = p1.matcher("aa");
    Matcher m2 = p2.matcher("aa");

    m1.find();
    System.out.println(m1.group());
    m2.find();
    System.out.println(m2.group());

Output:

    aa
    a

As you can see, when there are several consecutive a's, the first regex matches all of them, while the second matches only the first one.
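One caveat on the equivalence of (a)\\1* and (a)a*: it holds because the group contains a single literal. A backreference repeats the text the group actually captured, not the group's sub-pattern. The sketch below illustrates this with a character class; the class name BackrefVsSubpattern and the helper firstMatch are made up for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class BackrefVsSubpattern {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \1 repeats the captured text ("a"), so the match stops before 'b'.
        System.out.println(firstMatch("([ab])\\1*", "aab"));  // aa
        // Repeating the sub-pattern accepts any mix of 'a' and 'b'.
        System.out.println(firstMatch("([ab])[ab]*", "aab")); // aab
    }
}
```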


\\1* looks for a again, zero or more times. An example that may be easier to follow uses (a)\\1+, which looks for at least 2 a's:

    Pattern p1 = Pattern.compile("(a)\\1+");
    Matcher m1 = p1.matcher("aaaaabbaaabbba");
    while (m1.find()) System.out.println(m1.group());

The output will be:

    aaaaa
    aaa

The last a, however, is not matched, because it is not repeated.
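The same idea generalizes if the literal a is replaced by a dot, so that the group captures whatever character starts the run. This is a minimal sketch of that variation, not part of the original answer; RepeatedRuns and runs are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class RepeatedRuns {
    // Collects every run of two or more identical characters in input.
    static List<String> runs(String input) {
        // (.) captures one character; \1+ requires that same character at least once more.
        Matcher m = Pattern.compile("(.)\\1+").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The trailing single 'a' is skipped because it is not repeated.
        System.out.println(runs("aaaaabbaaabbba")); // [aaaaa, bb, aaa, bbb]
    }
}
```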


In Perl, \1 through \9 are always interpreted as back references; a backslash-escaped number greater than 9 is treated as a back reference if at least that many subexpressions exist, otherwise it is interpreted, if possible, as an octal escape. In this class octal escapes must always begin with a zero. In this class, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many subexpressions exist at that point in the regular expression, otherwise the parser will drop digits until the number is smaller or equal to the existing number of groups or it is one digit.

From the Pattern documentation.
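The digit-dropping rule in that passage can be checked with a small sketch. The patterns and the class name BackrefDigits are chosen for illustration only: with a single group, \11 is read as \1 followed by a literal 1; with eleven groups, \11 refers to group 11; and an octal escape needs its leading zero.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class BackrefDigits {
    // True if the whole input matches regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // One group exists, so \11 is parsed as \1 followed by the literal '1'.
        System.out.println(fullMatch("(a)\\11", "aa1")); // true

        // Eleven groups exist, so \11 is a backreference to group 11 ("k").
        String elevenGroups = "(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\\11";
        System.out.println(fullMatch(elevenGroups, "abcdefghijkk")); // true

        // Octal escapes must start with a zero: \0141 is the character 'a'.
        System.out.println(fullMatch("\\0141", "a")); // true
    }
}
```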

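So p2 is good for exactly one "a", while p1 is good for any number of "a"s as long as there is at least one. The star is the quantifier the documentation lists as "X*: X, zero or more times". It is known as the Kleene star.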
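To make the "exactly one" versus "one or more" distinction concrete, here is a short sketch that uses matches(), which requires the whole input to match, instead of find(). The class name KleeneStarDemo is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class KleeneStarDemo {
    static final Pattern P1 = Pattern.compile("(a)\\1*"); // one 'a', then zero or more further 'a's
    static final Pattern P2 = Pattern.compile("(a)");     // exactly one 'a'

    // True if the whole input matches the pattern.
    static boolean whole(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(whole(P1, "a"));    // true
        System.out.println(whole(P1, "aaaa")); // true
        System.out.println(whole(P1, ""));     // false: the group itself needs one 'a'
        System.out.println(whole(P2, "a"));    // true
        System.out.println(whole(P2, "aaaa")); // false: p2 accepts a single 'a' only
    }
}
```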