Javascript Split Array

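I'm trying to write a custom string-splitting function, and it's harder than I expected.

Basically, I pass in a string and an array of values to split the string on, and it returns an array of substrings, removing empty strings and including the values it split on. If the string can be split at the same position by two different values, the longer value takes precedence.

That is,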

split("Go ye away, I want some peace && quiet. & Thanks.", ["Go",",","&&","&","."]);

should return

["Go"," ye away",","," I want some peace ","&&"," quiet","."," ","&"," Thanks","."]

Can you think of a reasonably simple algorithm for this? If there is a built-in way to do it in JavaScript (I don't think there is), even better.


Something like this?

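function mySplit(input, delimiters) {

    // Work on a copy so the caller's array is not reordered, skip empty
    // delimiters (they would match everywhere), and sort longest first so
    // that "&&" wins over "&"
    var sorted = delimiters
        .filter(function(d) { return d.length > 0; })
        .sort(function(a, b) { return b.length - a.length; });

    var result = [];

    // Examine input one character at a time
    for (var i = 0; i < input.length; i++) {
        for (var j = 0; j < sorted.length; j++) {
            if (input.slice(i, i + sorted[j].length) === sorted[j]) {

                // Add the chunk before the delimiter, then the delimiter itself
                if (i > 0) {
                    result.push(input.slice(0, i));
                }
                result.push(sorted[j]);

                // Drop the consumed prefix and restart the scan at index 0
                // (i is set to -1 because the outer loop increments it next)
                input = input.slice(i + sorted[j].length);
                i = -1;
                break;
            }
        }
    }

    // Keep whatever follows the last delimiter
    if (input.length > 0) {
        result.push(input);
    }

    return result;
}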

var input      = "Go ye away, I want some peace && quiet. & Thanks.";
var delimiters = ["Go",",","&&","&","."];

console.log(mySplit(input, delimiters));
// Output: ["Go"," ye away",","," I want some peace ",
//         "&&"," quiet","."," ","&"," Thanks","."]
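
On the question's aside about a built-in: `String.prototype.split` accepts a regular expression, and when that expression contains a capturing group, the matched delimiters are included in the result. The following is only a sketch under that approach, not part of the answer above; `escapeRegExp` and `regexSplit` are hypothetical helper names, and the delimiters are assumed to be plain strings that should be matched literally.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: split on any delimiter, keep the delimiters in the
// result, and drop empty strings.
function regexSplit(input, delimiters) {
    var alternatives = delimiters
        .filter(function (d) { return d.length > 0; })  // an empty delimiter would match everywhere
        .sort(function (a, b) { return b.length - a.length; })  // longest first, so "&&" beats "&"
        .map(escapeRegExp);

    if (alternatives.length === 0) {
        return input ? [input] : [];
    }

    // The capturing group makes split() include each matched delimiter.
    var pattern = new RegExp("(" + alternatives.join("|") + ")");
    return input.split(pattern).filter(Boolean);
}

console.log(regexSplit(
    "Go ye away, I want some peace && quiet. & Thanks.",
    ["Go", ",", "&&", "&", "."]
));
// ["Go"," ye away",","," I want some peace ","&&"," quiet","."," ","&"," Thanks","."]
```

Sorting longest first matters because a regex alternation takes the first alternative that matches at a given position, not the longest one.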


The exact solution asked for:

function megasplit(toSplit, splitters) {
    // Sort by length, longest first, so "&&" is tried before "&".
    // Kept inline for readability; the rest of the function could
    // trivially be moved into a helper.
    var sorted = splitters.sorted(function(a, b) { return b.length - a.length; });
    if (!sorted.length)
        return toSplit;
    else {
        var token = sorted[0];
        return toSplit
            .split(token)             // split on token
            .map(function(segment) {  // recurse on segments
                 return megasplit(segment, sorted.slice(1));
             })
            .intersperse(token)       // re-insert token
            .flatten()                // rejoin segments
            .filter(Boolean);         // drop empty strings
    }
}

Demo:

> megasplit(
     "Go ye away, I want some peace && quiet. & Thanks.",
      ["Go",",","&&","&","."]
  )
["Go"," ye away",","," I want some peace ","&&"," quiet","."," ","&"," Thanks","."]

Machinery (reusable!):

Array.prototype.copy = function() {
    return this.slice()
}
Array.prototype.sorted = function() {
    var copy = this.copy();
    copy.sort.apply(copy, arguments);
    return copy;
}
Array.prototype.flatten = function() {
    return [].concat.apply([], this)
}
Array.prototype.mapFlatten = function() {
    return this.map.apply(this,arguments).flatten()
}
Array.prototype.intersperse = function(token) {
    // [1,2,3].intersperse('x') -> [1,'x',2,'x',3]
    return this.mapFlatten(function(x){return [token,x]}).slice(1)
}
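
The helpers above extend `Array.prototype`, which can clash with other code on the same page. As a rough sketch of the same recursive idea with standalone functions, the version below avoids touching any prototype. The names `flatten`, `intersperse`, `splitSorted` and `splitKeepingTokens` are illustrative and not from the answer, and tokens are assumed to be non-empty strings.

```javascript
// Flatten one level: [[1], 2, [3, 4]] -> [1, 2, 3, 4]
function flatten(arrays) {
    return [].concat.apply([], arrays);
}

// Put `token` between every pair of neighbouring items:
// intersperse([1, 2, 3], 'x') -> [1, 'x', 2, 'x', 3]
function intersperse(items, token) {
    return flatten(items.map(function (x) { return [token, x]; })).slice(1);
}

// Recursive step: `sorted` must already be ordered longest first.
function splitSorted(text, sorted) {
    if (sorted.length === 0) {
        return text ? [text] : [];   // base case: drop empty strings
    }
    var token = sorted[0];
    var rest = sorted.slice(1);
    var pieces = text.split(token).map(function (segment) {
        return splitSorted(segment, rest);   // each piece is an array
    });
    // [piece, token, piece, ...] -> one flat array of strings
    return flatten(intersperse(pieces, token));
}

function splitKeepingTokens(text, tokens) {
    var sorted = tokens
        .filter(function (t) { return t.length > 0; })   // assumption: ignore empty tokens
        .sort(function (a, b) { return b.length - a.length; });
    return splitSorted(text, sorted);
}

console.log(splitKeepingTokens(
    "Go ye away, I want some peace && quiet. & Thanks.",
    ["Go", ",", "&&", "&", "."]
));
// ["Go"," ye away",","," I want some peace ","&&"," quiet","."," ","&"," Thanks","."]
```

In engines that support ES2019, the built-in `Array.prototype.flat()` could replace the hand-written `flatten`.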

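Notes:

  • This took a lot of research to do elegantly:
    • (Deep) copying an array using jQuery
    • What is the most efficient way to concatenate N arrays in JavaScript? (ended up writing my own, less ugly method)
    • How do I split text on commas that are not within double quotes, while keeping the quotes? (junk answers; again wrote my own method)
  • It is further complicated by the fact that the specification requires that tokens (though they are left in the string) must not themselves be split (otherwise you would get "&","&"). This makes using EDOCX1[1] impossible and requires recursion.
  • Personally, I also wouldn't discard the empty strings produced by splitting. I can understand not wanting to recursively split the tokens, but I would simplify the function so that the output behaves like a normal .split: ["","Go"," ye away",","," I want some peace ","&&"," quiet","."," ","&"," Thanks",".",""]
  • I should point out that if you are willing to relax your requirements slightly, this goes from a 15/20-liner to a 1/3-liner: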

A 1-liner, if following canonical splitting behavior:

Array.prototype.mapFlatten = function() {
    ...
}
function megasplit(toSplit, splitters) {
    return splitters.sorted(...).reduce(function(strings, token) {
        return strings.mapFlatten(function(s){return s.split(token)});
    }, [toSplit]);
}

A 3-liner, if the above is hard to read:

Array.prototype.mapFlatten = function() {
    ...
}
function megasplit(toSplit, splitters) {
    var strings = [toSplit];
    splitters.sorted(...).forEach(function(token) {
        strings = strings.mapFlatten(function(s){return s.split(token)});
    });
    return strings;
}
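
To make the relaxed variants above concrete, here is a self-contained sketch without prototype extensions. The comparator is elided in the snippets above, so sorting longest first is an assumption carried over from the full solution, and `relaxedSplit` is a hypothetical name. Note how the behavior differs from the original request: the tokens are dropped from the output and empty strings are kept, as with a normal `split`.

```javascript
function relaxedSplit(toSplit, splitters) {
    // Assumption: longest tokens first, as in the full solution.
    var sorted = splitters.slice().sort(function (a, b) {
        return b.length - a.length;
    });

    return sorted.reduce(function (strings, token) {
        // Split every current piece on this token, then flatten one level.
        var pieces = strings.map(function (s) { return s.split(token); });
        return [].concat.apply([], pieces);
    }, [toSplit]);
}

console.log(relaxedSplit("a,b&&c&d", [",", "&", "&&"]));
// ["a", "b", "c", "d"]

console.log(relaxedSplit(",a", [","]));
// ["", "a"]
```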