JavaScript: RegEx for matching Taskwarrior data format

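I'm trying to parse strings of the following kind:

[key:"val" key2:"val2"]

where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value.
For those curious, I'm trying to parse the database format of Taskwarrior.

Here is my test string:

[description:"aoeu" uuid:"123sth"]

which is meant to highlight that anything other than a space can appear in a key or value, that there are no spaces around the colons, and that values are always in double quotes.

In node, this is my output: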

[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',
'uuid',
'123sth',
index: 0,
input: '[description:"aoeu" uuid:"123sth"]' ]

But description:"aoeu" also matches this pattern. How can I get all matches back?

Continue calling re.exec(s) in a loop to obtain all the matches:

var re = /\s*([^[:]+):"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;

do {
    m = re.exec(s);
    if (m) {
        console.log(m[1], m[2]);
    }
} while (m);

Try it with this JSFiddle: https://jsfiddle.net/7yS2V/
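Since the question is about grabbing key names and values, the same loop can collect the pairs into an object instead of logging them. This is only a minimal sketch: `parseTaskLine` is a hypothetical helper name, and it assumes keys are unique within a line.

```javascript
// Hypothetical helper (not from the original answer): collects every
// key:"value" pair of one bracketed line into a plain object.
function parseTaskLine(line) {
    const re = /\s*([^[:]+):"([^"]+)"/g; // same pattern as in the answer
    const task = {};
    let m;
    // With the g flag, each exec() call resumes where the previous match ended.
    while ((m = re.exec(line)) !== null) {
        task[m[1]] = m[2];
    }
    return task;
}

const parsed = parseTaskLine('[description:"aoeu" uuid:"123sth"]');
console.log(parsed); // { description: 'aoeu', uuid: '123sth' }
```

If a key could repeat, the last occurrence wins in this sketch; pushing the pairs onto an array would keep them all.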

str.match(pattern) returns all matches as an array if pattern has the global flag g.

For example:

const str = 'All of us except @Emran, @Raju and @Noman was there';
console.log(
str.match(/@\w*/g)
);
// Will log ["@Emran","@Raju","@Noman"]

To iterate over all the matches, you can use the replace function:

var re = /\s*([^[:]+):"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';

s.replace(re, function(match, g1, g2) { console.log(g1, g2); });

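Related discussion

I think it's just too complicated. However, it's nice to know about different ways of doing a simple thing (I upvoted your answer).
It's counterintuitive code. You're not "replacing" anything in any meaningful sense. It's just exploiting some function for a different purpose.
@LukeMaurer is right. This is using a JavaScript tool that's meant for string replacement to perform a string search. Sure, it's simpler, but did NASA take the "simple" route, or the "right" route, to slingshot New Horizons around Pluto? We're engineers, and should respect that the simple way isn't always the right way. Shying away from learning regex because it's hard is just lazy programming. I don't mean to be rude, just honest. It shows in important ways when other developers waste half an hour trying to figure out what the third line of this example does.
@dudewad If engineers just followed the rules without thinking outside the box, we wouldn't even be thinking about visiting other planets right now ;-)
Yes, but this isn't creative thinking, it's lazy thinking. You can know when there is a better solution, and this is one of those times.
@dudewad Sorry, I fail to see the lazy part here. If the exact same method were called "process" instead of "replace", you would be fine with it. I'm afraid you're just stuck on terminology.
@Christophe I am definitely not stuck on terminology. I'm stuck on clean code. Using things that are intended for one purpose for a different purpose is called "hacky" for a reason. It creates confusing code that is difficult to understand and that more often than not suffers performance-wise. The fact that you answered this question without a regex in and of itself makes it an invalid answer, since the OP is asking how to do it with regex. I find it important, however, to hold this community to a high standard, which is why I stand by what I said above.
@dudewad What do you mean, without a regex?
Sry - to clarify, I mean not doing it on the regex object, but using an extra layer rather than using the regex itself.
@dudewad Fair enough. How does that make it an invalid answer? I'm fine with you disagreeing and even downvoting, but let's do it in good faith.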

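Here is a solution:

var s = '[description:"aoeu" uuid:"123sth"]';

var re = /\s*([^[:]+):"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
    console.log(m[1], m[2]);
}

This is based on lawnsea's answer, but shorter.

Note that the `g` flag must be set so that the regex's internal pointer (lastIndex) moves forward between calls; without it, exec keeps returning the first match and the loop never ends.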
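The effect of the flag can be observed through the regex's `lastIndex` property. The following is a small illustrative sketch; the variable names are made up for this example.

```javascript
const input = '[description:"aoeu" uuid:"123sth"]';

// With the g flag, exec() starts searching at re.lastIndex and updates it
// after every successful match.
const withG = /\s*([^[:]+):"([^"]+)"/g;
const first = withG.exec(input);
const afterFirst = withG.lastIndex;
const second = withG.exec(input);
console.log(first[1], afterFirst, second[1]); // description 19 uuid

// Without the g flag, lastIndex stays at 0, so every call returns the first
// match again and `while (m = re.exec(s))` would loop forever.
const withoutG = /\s*([^[:]+):"([^"]+)"/;
const again1 = withoutG.exec(input);
const again2 = withoutG.exec(input);
console.log(again1[1], again2[1], withoutG.lastIndex); // description description 0
```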

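str.match(/regex/g)

returns all matches as an array.

If, for some mysterious reason, you need the additional information that comes with exec, then as an alternative to the previous answers you can use a recursive function instead of a loop, as follows (it also looks cooler).

function findMatches(regex, str, matches = []) {
    const res = regex.exec(str)
    res && matches.push(res) && findMatches(regex, str, matches)
    return matches
}

// Usage
const matches = findMatches(/regex/g, str)

As stated in the earlier comments, it's important to have g at the end of the regex definition so that the pointer moves forward on each execution.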

Based on Agus's function, but I prefer returning just the matched values:
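As a rough sketch of what such a variant could look like (this is not the answer's own code; `findMatchedValues` is a hypothetical name, and "matched values" is assumed to mean the full matched text, `res[0]`):

```javascript
// Hypothetical variant of the recursive helper: collects only the matched
// text of each match instead of the whole exec() result.
function findMatchedValues(regex, str, values = []) {
    if (!regex.global) {
        throw new Error('regex needs the g flag, otherwise exec() never advances');
    }
    const res = regex.exec(str);
    if (res === null) {
        return values;
    }
    values.push(res[0]);
    if (res[0] === '') {
        regex.lastIndex++; // step past zero-length matches to avoid endless recursion
    }
    return findMatchedValues(regex, str, values);
}

const pairTexts = findMatchedValues(/[^\s[:]+:"[^"]*"/g, '[description:"aoeu" uuid:"123sth"]');
console.log(pairTexts); // [ 'description:"aoeu"', 'uuid:"123sth"' ]
```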

Iterables are nicer:
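A generator is one way to get such an iterable. The sketch below is an assumption about the shape of the helper, using the name `matches(text, pattern)` that appears in the usage examples of this answer; it is not guaranteed to be the answer's original code.

```javascript
// Sketch of a generator that yields every exec() result for `pattern` in `text`.
function* matches(text, pattern) {
    // Work on a copy so the caller's regex (and its lastIndex) is left untouched.
    const clone = new RegExp(pattern.source, pattern.flags);
    let match;
    while ((match = clone.exec(text)) !== null) {
        yield match;
        if (!clone.global) {
            return; // without g, exec() would return the same match forever
        }
        if (match[0] === '') {
            clone.lastIndex++; // step past zero-length matches
        }
    }
}

const offsets = [...matches('abcdefabcdef', /ab/g)].map((m) => m.index);
console.log(offsets); // [ 0, 6 ]
```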

Usage in a loop:

for (const match of matches('abcdefabcdef', /ab/g)) {
    console.log(match);
}

Or if you want an array:

[ ...matches('abcdefabcdef', /ab/g) ]

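We're finally starting to see a built-in matchAll function; see its description and compatibility table. As of April 2019 it looks like Chrome and Firefox support it, but IE, Edge, Opera, and Node.js do not. It seems to have been drafted in December 2018, so give it some time to reach all browsers, but I trust it will get there.

The built-in matchAll function is nice because it returns an iterable. It also returns the capture groups for every match! So you can do things like: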

// get the letters before and after "o"
const matches = "stackoverflow".matchAll(/(\w)o(\w)/g);

for (const match of matches) {
    console.log("letter before: " + match[1]);
    console.log("letter after: " + match[2]);
}

// the iterator above is exhausted once the loop finishes,
// so call matchAll again to turn the matches into an array
const arrayOfAllMatches = [..."stackoverflow".matchAll(/(\w)o(\w)/g)];

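It also seems that every match object uses the same format as match(). So each object is an array of the match and the capture groups, along with three additional properties: index, input, and groups. So it looks like:

[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]

There is also a Google developers page with more information about matchAll. Polyfills/shims are available as well.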
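Applied to the string from the question, matchAll together with named capture groups might look like the following sketch. The group names `key` and `value` and the slightly stricter key pattern are choices made for this illustration, and the code assumes a runtime that supports both features.

```javascript
const line = '[description:"aoeu" uuid:"123sth"]';

// Named groups make the intent of each capture explicit.
const pairRe = /(?<key>[^\s[:]+):"(?<value>[^"]*)"/g;

const pairs = {};
for (const m of line.matchAll(pairRe)) {
    // m.groups holds the named captures; m.index is the offset of the match
    pairs[m.groups.key] = m.groups.value;
}
console.log(pairs); // { description: 'aoeu', uuid: '123sth' }
```

Note that matchAll throws a TypeError when the regex lacks the g flag, so the flag cannot be forgotten silently.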

Here is my function for getting the matches:
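A function of this kind could be sketched as follows. The name `getAllMatches`, the argument check, and the handling of non-global patterns are all assumptions made for illustration rather than the answer's own code.

```javascript
// Hypothetical sketch: returns every exec() result for `regex` in `text`.
function getAllMatches(regex, text) {
    if (!(regex instanceof RegExp)) {
        throw new TypeError('expected a RegExp');
    }
    const results = [];
    if (!regex.global) {
        // A non-global regex can only ever report its first match.
        const single = regex.exec(text);
        if (single !== null) {
            results.push(single);
        }
        return results;
    }
    regex.lastIndex = 0; // start from the beginning even if the regex was used before
    let match;
    while ((match = regex.exec(text)) !== null) {
        results.push(match);
        if (match[0] === '') {
            regex.lastIndex++; // step past zero-length matches
        }
    }
    return results;
}

const found = getAllMatches(/abc|def|ghi/g, 'abcdefghi');
console.log(found.map((m) => m[0])); // [ 'abc', 'def', 'ghi' ]
```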

If your system (Chrome / Node.js / Firefox) supports String.prototype.matchAll (standardized in ES2020), use the new a_string.matchAll(regex). If you have an older system, here is a function that is easy to copy and paste:

function findAll(regexPattern, sourceString) {
    let output = []
    let match
    // make sure the pattern has the global flag
    let regexPatternWithGlobal = RegExp(regexPattern, "g")
    while (match = regexPatternWithGlobal.exec(sourceString)) {
        // get rid of the string copy
        delete match.input
        // store the match data
        output.push(match)
    }
    return output
}

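Example usage:

console.log( findAll(/blah/g,'blah1 blah2') )

Output:

[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]

With String.prototype.matchAll (standardized in ES2020), there is now a simpler, better way to get all the matches, together with information about the capture groups and their index: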

const string = 'Mice like to dice rice';
const regex = /.ice/gu;
for (const match of string.matchAll(regex)) {
    console.log(match);
}

// ["Mice", index: 0, input: "Mice like to dice rice", groups: undefined]
// ["dice", index: 13, input: "Mice like to dice rice", groups: undefined]
// ["rice", index: 18, input: "Mice like to dice rice", groups: undefined]

It is currently supported in Chrome, Firefox, and Opera. Depending on when you read this, check this link to see its current support.

My guess is that if there are edge cases, such as extra or missing spaces, this expression with looser boundaries might also be an option:

^\s*\[\s*([^\s:]+)\s*:\s*"([^"]*)"\s*([^\s:]+)\s*:\s*"([^"]*)"\s*\]\s*$

If you wish to explore, simplify, or modify the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.

Test
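A quick check of the expression could look like the sketch below. The sample inputs are made up for illustration, and the character classes are written as `[^\s:]`, which is assumed to be equivalent to the expression given in this answer.

```javascript
const lineRe = /^\s*\[\s*([^\s:]+)\s*:\s*"([^"]*)"\s*([^\s:]+)\s*:\s*"([^"]*)"\s*\]\s*$/;

const samples = [
    '[description:"aoeu" uuid:"123sth"]',            // the question's test string
    '  [ description : "aoeu"   uuid : "123sth" ]  ', // extra spaces everywhere
    '[description:"aoeu"]',                          // only one pair
];

const results = samples.map((sample) => {
    const m = lineRe.exec(sample);
    return m ? m.slice(1) : null; // keep only the four capture groups
});
console.log(results);
```

The first two samples yield the four captured values, while the third yields null: the expression is anchored and hard-codes exactly two pairs, so it does not generalize to an arbitrary number of keys.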

RegEx Circuit

jex.im visualizes regular expressions.

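I would definitely recommend using the String.match() function and creating a suitable RegEx for it. My example uses a list of strings, which is often necessary when scanning user input for keywords and phrases.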

// 1) Define keywords
var keywords = ['apple', 'orange', 'banana'];

// 2) Create regex, pass "i" for case-insensitive and "g" for global search
var regex = new RegExp("(" + keywords.join('|') + ")", "ig");
// => /(apple|orange|banana)/gi

// 3) Match it against any string to get all matches
"Test string for ORANGE's or apples were mentioned".match(regex);
// => ["ORANGE", "apple"]

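Hope this helps!

This isn't really going to help with your more complex problem, but I'm posting it anyway because it is a simple solution for people who, unlike you, aren't doing a global search.

I've simplified the regex in the answer to make it clearer (this is not a solution to your exact problem).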

var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);

// We only want the group matches in the array
function purify_regex(reResult){

    // Removes the Regex specific values and clones the array to prevent mutation
    let purifiedArray = [...reResult];

    // Removes the full match value at position 0
    purifiedArray.shift();

    // Returns a pure array without mutating the original regex result
    return purifiedArray;
}

// purifiedResult = ["description", "aoeu"]

That looks more verbose than it is because of the comments; this is what it looks like without them:

var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);

function purify_regex(reResult){
    let purifiedArray = [...reResult];
    purifiedArray.shift();
    return purifiedArray;
}

Note that any groups that do not match will be listed in the array as undefined values.
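A small illustration of that behavior, using a made-up pattern whose second group is optional:

```javascript
// Hypothetical pattern: the :"value" part (and with it group 2) is optional.
const optionalRe = /^(\w+)(?::"([^"]*)")?$/;

const onlyGroups = [...optionalRe.exec('description')].slice(1);
console.log(onlyGroups); // [ 'description', undefined ]
```

Code that consumes the array should therefore be prepared for undefined entries.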

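This solution uses the ES6 spread operator to strip the regex-specific values from the array. If you need IE11 support, you will have to run your code through Babel.

Here is a one-line solution without a while loop.

The order is preserved in the resulting list.

The potential downsides are:

It clones the regex for every match.

The result is in a different form than the expected solutions, so you will need to process it once more.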

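let re = /\s*([^[:]+):"([^"]+)"/g;
let str = '[description:"aoeu" uuid:"123sth"]';

// the semicolon above matters: without it, the parenthesized expression below
// would be parsed as a call on the string literal
(str.match(re) || []).map(e => RegExp(re.source, re.flags).exec(e));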

[ [ 'description:"aoeu"',
'description',
'aoeu',
index: 0,
input: 'description:"aoeu"',
groups: undefined ],
[ ' uuid:"123sth"',
'uuid',
'123sth',
index: 0,
input: ' uuid:"123sth"',
groups: undefined ] ]

Use this:

var all_matches = your_string.match(re);
console.log(all_matches)

It will return an array of all the matches, and it works fine.
But remember that it won't capture groups; it only returns the full matches.
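The difference is easy to see on the question's string. The sketch below contrasts match() with matchAll(); the key pattern is slightly stricter than the one used earlier on this page, and the variable names are illustrative.

```javascript
const pairPattern = /([^\s[:]+):"([^"]*)"/g;
const taskLine = '[description:"aoeu" uuid:"123sth"]';

// match() with the g flag: full matches only, the capture groups are dropped.
const fullMatches = taskLine.match(pairPattern);
console.log(fullMatches); // [ 'description:"aoeu"', 'uuid:"123sth"' ]

// matchAll() keeps the groups of every match (assumes a runtime that supports it).
const keyValuePairs = Array.from(taskLine.matchAll(pairPattern), (m) => [m[1], m[2]]);
console.log(keyValuePairs); // [ [ 'description', 'aoeu' ], [ 'uuid', '123sth' ] ]
```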

Here is my answer:

var str = '[me nombre es] : My name is. [Yo puedo] is the right word';

var reg = /\[(.*?)\]/g;

var a = str.match(reg);

// join the matches, strip the brackets, then split back into an array
a = a.toString().replace(/[\[\]]/g, "").split(',');
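One caveat with the toString/split round trip is that bracketed text containing a comma gets split in two. A possible alternative, sketched here with a modified sample string and assuming matchAll is available, reads the capture group directly:

```javascript
// Sample text changed for illustration: the second bracket contains a comma.
const text = '[me nombre es] : My name is. [Yo puedo, tal vez] is the right word';

// m[1] is the text between the brackets, so no stripping or splitting is needed.
const inner = Array.from(text.matchAll(/\[(.*?)\]/g), (m) => m[1]);
console.log(inner); // [ 'me nombre es', 'Yo puedo, tal vez' ]
```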
