C#: How would you count occurrences of a string (actually a char) within a string?

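I am doing something where I realised I wanted to count how many /s I could find in a string, and then it struck me that there were several ways to do it, but I couldn't decide on what the best (or easiest) was.

At the moment I'm going with something like:

string source ="/once/upon/a/time/";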
int count = source.Length - source.Replace("/","").Length;

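But I don't like it at all. Any takers?

I don't really want to dig out RegEx for this, do I?

I know my string is going to have the term I'm searching for, so you can assume that...

Of course, for strings where the length is greater than 1:

string haystack ="/once/upon/a/time";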
string needle ="/";
int needleCount = ( haystack.Length - haystack.Replace(needle,"").Length ) / needle.Length;
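
The length-difference trick above divides the number of removed characters by the needle length. Here is a minimal sketch that wraps it in a helper. The name CountByReplace is hypothetical and the guards are an addition, since string.Replace throws an ArgumentException for an empty search string:

```csharp
using System;

public static class ReplaceCount
{
    // Hypothetical helper (not from the original post): counts non-overlapping
    // occurrences of needle by measuring how much shorter the haystack gets
    // once every occurrence has been removed.
    public static int CountByReplace(string haystack, string needle)
    {
        // string.Replace rejects an empty oldValue, and dividing by
        // needle.Length would fail for a zero-length needle.
        if (string.IsNullOrEmpty(haystack) || string.IsNullOrEmpty(needle))
            return 0;

        int removed = haystack.Length - haystack.Replace(needle, "").Length;
        return removed / needle.Length;
    }
}
```

Called as ReplaceCount.CountByReplace(haystack, needle), it allocates one temporary string per call, which is the cost the benchmarks further down are measuring.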


If you're using .NET 3.5 you can do this in a one-liner with LINQ:

int count = source.Count(f => f == '/');

If you don't want to use LINQ you can do it with:

int count = source.Split('/').Length - 1;

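You might be surprised to learn that your original technique seems to be about 30% faster than either of these! I've just done a quick benchmark with "/once/upon/a/time/" and the results are as follows: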

Your original = 12s
source.Count = 19s
source.Split = 17s
foreach (from bobwienholt's answer) = 10s

(The times are for 50,000,000 iterations, so you're unlikely to notice much difference in the real world.)
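
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a timing harness. It is not the code used for the numbers above. The class and method names are hypothetical, and the results will depend on the machine, the runtime, and whether you build in Release mode:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class CountBenchmark
{
    // Hypothetical harness: runs the candidate `iterations` times and
    // returns the elapsed wall-clock time in milliseconds.
    public static double TimeMs(Func<int> candidate, int iterations)
    {
        candidate(); // warm-up call so JIT compilation is not timed
        var timer = Stopwatch.StartNew();
        int sink = 0;
        for (int i = 0; i < iterations; i++)
            sink += candidate();
        timer.Stop();
        GC.KeepAlive(sink); // keep the accumulated result observable
        return timer.Elapsed.TotalMilliseconds;
    }

    // The four techniques compared above, each counting '/' in s.
    public static int ViaReplace(string s) => s.Length - s.Replace("/", "").Length;
    public static int ViaLinq(string s) => s.Count(c => c == '/');
    public static int ViaSplit(string s) => s.Split('/').Length - 1;
    public static int ViaForeach(string s)
    {
        int count = 0;
        foreach (char c in s)
            if (c == '/') count++;
        return count;
    }
}
```

A call such as CountBenchmark.TimeMs(() => CountBenchmark.ViaSplit(source), 1000000) gives one data point. Run each candidate several times before drawing conclusions.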


string source ="/once/upon/a/time/";
int count = 0;
foreach (char c in source)
  if (c == '/') count++;

This has to be faster than the source.Replace() by itself.


int count = new Regex(Regex.Escape(needle)).Matches(haystack).Count;
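
The Regex.Escape call matters because the needle is interpreted as a pattern: an unescaped "." would match every character. Here is a minimal sketch, assuming a hypothetical helper name CountLiteral and adding null/empty guards that are not in the one-liner above:

```csharp
using System.Text.RegularExpressions;

public static class RegexCount
{
    // Hypothetical helper: counts non-overlapping literal occurrences of
    // needle in haystack. Regex.Escape turns metacharacters such as '.'
    // or '{' into literals before the pattern is compiled.
    public static int CountLiteral(string haystack, string needle)
    {
        if (string.IsNullOrEmpty(haystack) || string.IsNullOrEmpty(needle))
            return 0;

        return Regex.Matches(haystack, Regex.Escape(needle)).Count;
    }
}
```

With the input "a.b.c" and the needle ".", the escaped version counts the two dots, whereas the unescaped pattern "." would match all five characters.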


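If you want to be able to search for whole strings, and not just characters:

src.Select((c, i) => src.Substring(i)).Count(sub => sub.StartsWith(target))

Read as "for each character in the string, take the rest of the string starting from that character as a substring; count it if it starts with the target string."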
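
Because that expression tests every start index, it counts overlapping matches, and it allocates one substring per index. StartsWith(string) is also culture-sensitive by default. Here is a minimal sketch of the same idea without the allocations, assuming ordinal comparison is what you want (the name CountStartsAt is hypothetical):

```csharp
using System;

public static class OverlappingCount
{
    // Hypothetical helper: counts every index at which target begins in src,
    // so overlapping occurrences are all counted.
    public static int CountStartsAt(string src, string target)
    {
        if (string.IsNullOrEmpty(src) || string.IsNullOrEmpty(target) || target.Length > src.Length)
            return 0;

        int count = 0;
        int lastStart = src.Length - target.Length; // last index where target can still fit
        for (int i = 0; i <= lastStart; i++)
        {
            // Compare target against the window of src starting at i, char by char.
            if (string.CompareOrdinal(src, i, target, 0, target.Length) == 0)
                count++;
        }
        return count;
    }
}
```

For "aaaa" and "aa" this returns 3 (starts at indexes 0, 1 and 2), whereas the Replace and Split approaches count 2 non-overlapping occurrences.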


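I did some research and found that Richard Watson's solution is the fastest in most cases. Here is a table with the results of every solution in the post (except those that use Regex, because it throws exceptions while parsing strings like "test test").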

Name           | Short/char | Long/char | Short/short | Long/short | Long/long |
Inspite        |        134 |      1853 |          95 |       1146 |       671 |
LukeH_1        |        346 |      4490 |         N/A |        N/A |       N/A |
LukeH_2        |        152 |      1569 |         197 |       2425 |      2171 |
Bobwienholt    |        230 |      3269 |         N/A |        N/A |       N/A |
Richard Watson |         33 |       298 |         146 |        737 |       543 |
StefanosKargas |        N/A |       N/A |         681 |      11884 |     12486 |

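You can see that when counting occurrences of short substrings (1-5 characters) in a short string (10-50 characters), the original algorithm is preferred.

Also, for multi-character substrings you should use the following code (based on Richard Watson's solution):

int count = 0, n = 0;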

if(substring !="")
{
    while ((n = source.IndexOf(substring, n, StringComparison.InvariantCulture)) != -1)
    {
        n += substring.Length;
        ++count;
    }
}


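LINQ works on all collections, and since strings are just a collection of characters, how about this nice little one-liner:

var count = source.Count(c => c == '/');

Make sure you have using System.Linq; at the top of your code file, as .Count is an extension method from that namespace.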


string source ="/once/upon/a/time/";
int count = 0;
int n = 0;

while ((n = source.IndexOf('/', n)) != -1)
{
   n++;
   count++;
}

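On my computer it's about 2 seconds faster than the for-every-character solution for 50 million iterations.

2013 revision:

Change the string to a char[] and iterate through that. This cuts a further second or two off the total time for 50 million iterations!

char[] testchars = source.ToCharArray();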
foreach (char c in testchars)
{
     if (c == '/')
         count++;
}

This is quicker still:

char[] testchars = source.ToCharArray();
int length = testchars.Length;
for (int n = 0; n < length; n++)
{
    if (testchars[n] == '/')
        count++;
}

For good measure, iterating from the end of the array to 0 seems to be the fastest, by about 5%.

int length = testchars.Length;
for (int n = length-1; n >= 0; n--)
{
    if (testchars[n] == '/')
        count++;
}

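I was wondering why this could be and was Googling around (I recall something about reverse iterating being quicker), and came upon this question, which annoyingly uses the string-to-char[] technique already. I think the reversal trick is new in this context, though.

What is the fastest way to iterate through individual characters in a string in C#?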


These both only work for single-character search terms...

countOccurences("the","the answer is the answer");

int countOccurences(string needle, string haystack)
{
    return (haystack.Length - haystack.Replace(needle,"").Length) / needle.Length;
}

This may turn out to be better for longer needles...

But there has to be a more elegant way. :)


Edit:

source.Split('/').Length-1


Regex.Matches(input, Regex.Escape("stringToMatch")).Count


In C#, a nice substring counter is this unexpectedly tricky fellow:

public static int CCount(String haystack, String needle)
{
    return haystack.Split(new[] { needle }, StringSplitOptions.None).Length - 1;
}


string s ="65 fght 6565 4665 hjk";
int count = 0;
foreach (Match m in Regex.Matches(s,"65"))
  count++;


private int CountWords(string text, string word) {
    int count = (text.Length - text.Replace(word,"").Length) / word.Length;
    return count;
}

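Because the original solution was the fastest for chars, I suppose it will also be for strings. So here is my contribution.

For context: I was looking for words like "failed" and "succeeded" in a log file.

Gr, Ben


For anyone wanting a ready-to-use String extension method,

here is what I use, which was based on the best of the posted answers:

public static class StringExtension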
{    
    /// <summary> Returns the number of occurrences of a string within a string, optional comparison allows case and culture control. </summary>
    public static int Occurrences(this System.String input, string value, StringComparison stringComparisonType = StringComparison.Ordinal)
    {
        if (String.IsNullOrEmpty(value)) return 0;

        int count    = 0;
        int position = 0;

        while ((position = input.IndexOf(value, position, stringComparisonType)) != -1)
        {
            position += value.Length;
            count    += 1;
        }

        return count;
    }

    /// <summary> Returns the number of occurrences of a single character within a string. </summary>
    public static int Occurrences(this System.String input, char value)
    {
        int count = 0;
        foreach (char c in input) if (c == value) count += 1;
        return count;
    }
}


public static int GetNumSubstringOccurrences(string text, string search)
{
    int num = 0;
    int pos = 0;

    if (!string.IsNullOrEmpty(text) && !string.IsNullOrEmpty(search))
    {
        while ((pos = text.IndexOf(search, pos)) > -1)
        {
            num ++;
            pos += search.Length;
        }
    }
    return num;
}

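I think the easiest way to do this is to use regular expressions. This way you can get the same split count as you could using myVar.Split('x'), but in a multiple-character setting.

string myVar ="do this to count the number of words in my wording so that I can word it up!";
// Split yields one more piece than there are matches, so subtract 1 to get the occurrence count
int count = Regex.Split(myVar,"word").Length - 1;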

A generic function for occurrences of strings:

public int getNumberOfOccurencies(String inputString, String checkString)
{
    if (checkString.Length > inputString.Length || checkString.Equals("")) { return 0; }
    int lengthDifference = inputString.Length - checkString.Length;
    int occurencies = 0;
    for (int i = 0; i <= lengthDifference; i++) { // <= so that a match at the very end of inputString is not missed
        if (inputString.Substring(i, checkString.Length).Equals(checkString)) { occurencies++; i += checkString.Length - 1; } }
    return occurencies;
}


String in string:

Find "etc" in " .. JD JD JD JD etc. and etc. JDJDJDJDJDJDJDJD and etc."

var strOrigin =" .. JD JD JD JD etc. and etc. JDJDJDJDJDJDJDJD and etc.";
var searchStr ="etc";
int count = (strOrigin.Length - strOrigin.Replace(searchStr,"").Length)/searchStr.Length;

Check performance before discarding this one as unsound or clumsy...


string search ="/string";
var occurrences = Regex.Matches(search, @"\/").Count;

This counts every "/" the program finds (the match is case sensitive), and the number of occurrences is stored in the variable "occurrences".


string source ="/once/upon/a/time/";
int count = 0, n = 0;
while ((n = source.IndexOf('/', n) + 1) != 0) count++;

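A variation on Richard Watson's answer: slightly faster, with efficiency improving the more times the char occurs in the string, and less code!

Though I must say, without extensively testing every scenario, I did see a very significant speed improvement by using:

int count = 0;
for (int n = 0; n < source.Length; n++) if (source[n] == '/') count++;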

I felt that we were lacking certain kinds of substring counting, like unsafe byte-by-byte comparisons. I put together the original poster's method and any methods I could think of.

These are the string extensions I made.

namespace Example
{
    using System;
    using System.Text;

    public static class StringExtensions
    {
        public static int CountSubstr(this string str, string substr)
        {
            return (str.Length - str.Replace(substr,"").Length) / substr.Length;
        }

        public static int CountSubstr(this string str, char substr)
        {
            return (str.Length - str.Replace(substr.ToString(),"").Length);
        }

        public static int CountSubstr2(this string str, string substr)
        {
            int substrlen = substr.Length;
            int lastIndex = str.IndexOf(substr, 0, StringComparison.Ordinal);
            int count = 0;
            while (lastIndex != -1)
            {
                ++count;
                lastIndex = str.IndexOf(substr, lastIndex + substrlen, StringComparison.Ordinal);
            }

            return count;
        }

        public static int CountSubstr2(this string str, char substr)
        {
            int lastIndex = str.IndexOf(substr, 0);
            int count = 0;
            while (lastIndex != -1)
            {
                ++count;
                lastIndex = str.IndexOf(substr, lastIndex + 1);
            }

            return count;
        }

        public static int CountChar(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            for (int i = 0; i < length; ++i)
                if (str[i] == substr)
                    ++count;

            return count;
        }

        public static int CountChar2(this string str, char substr)
        {
            int count = 0;
            foreach (var c in str)
                if (c == substr)
                    ++count;

            return count;
        }

        public static unsafe int CountChar3(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            fixed (char* chars = str)
            {
                for (int i = 0; i < length; ++i)
                    if (*(chars + i) == substr)
                        ++count;
            }

            return count;
        }

        public static unsafe int CountChar4(this string str, char substr)
        {
            int length = str.Length;
            int count = 0;
            fixed (char* chars = str)
            {
                for (int i = length - 1; i >= 0; --i)
                    if (*(chars + i) == substr)
                        ++count;
            }

            return count;
        }

        public static unsafe int CountSubstr3(this string str, string substr)
        {
            int length = str.Length;
            int substrlen = substr.Length;
            int count = 0;
            fixed (char* strc = str)
            {
                fixed (char* substrc = substr)
                {
                    int n = 0;

                    for (int i = 0; i < length; ++i)
                    {
                        if (*(strc + i) == *(substrc + n))
                        {
                            ++n;
                            if (n == substrlen)
                            {
                                ++count;
                                n = 0;
                            }
                        }
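                        else
                        {
                            // Mismatch: rewind to just after the start of the partial match,
                            // otherwise "ab" would be missed in "aab"
                            i -= n;
                            n = 0;
                        }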
                    }
                }
            }

            return count;
        }

        public static int CountSubstr3(this string str, char substr)
        {
            return CountSubstr3(str, substr.ToString());
        }

        public static unsafe int CountSubstr4(this string str, string substr)
        {
            int length = str.Length;
            int substrLastIndex = substr.Length - 1;
            int count = 0;
            fixed (char* strc = str)
            {
                fixed (char* substrc = substr)
                {
                    int n = substrLastIndex;

                    for (int i = length - 1; i >= 0; --i)
                    {
                        if (*(strc + i) == *(substrc + n))
                        {
                            if (--n == -1)
                            {
                                ++count;
                                n = substrLastIndex;
                            }
                        }
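                        else
                        {
                            // Mismatch: rewind to just before the start of the partial match,
                            // otherwise "ba" would be missed in "baa"
                            i += substrLastIndex - n;
                            n = substrLastIndex;
                        }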
                    }
                }
            }

            return count;
        }

        public static int CountSubstr4(this string str, char substr)
        {
            return CountSubstr4(str, substr.ToString());
        }
    }
}

And then the test code...

// Requires using System.Diagnostics; for Stopwatch
static void Main()
{
    const char matchA = '_';
    const string matchB ="and";
    const string matchC ="muchlongerword";
    const string testStrA ="_and_d_e_banna_i_o___pfasd__and_d_e_banna_i_o___pfasd_";
    const string testStrB ="and sdf and ans andeians andano ip and and sdf and ans andeians andano ip and";
    const string testStrC =
       "muchlongerword amuchlongerworsdfmuchlongerwordsdf jmuchlongerworijv muchlongerword sdmuchlongerword dsmuchlongerword";
    const int testSize = 1000000;
    Console.WriteLine(testStrA.CountSubstr('_'));
    Console.WriteLine(testStrA.CountSubstr2('_'));
    Console.WriteLine(testStrA.CountSubstr3('_'));
    Console.WriteLine(testStrA.CountSubstr4('_'));
    Console.WriteLine(testStrA.CountChar('_'));
    Console.WriteLine(testStrA.CountChar2('_'));
    Console.WriteLine(testStrA.CountChar3('_'));
    Console.WriteLine(testStrA.CountChar4('_'));
    Console.WriteLine(testStrB.CountSubstr("and"));
    Console.WriteLine(testStrB.CountSubstr2("and"));
    Console.WriteLine(testStrB.CountSubstr3("and"));
    Console.WriteLine(testStrB.CountSubstr4("and"));
    Console.WriteLine(testStrC.CountSubstr("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr2("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr3("muchlongerword"));
    Console.WriteLine(testStrC.CountSubstr4("muchlongerword"));
    var timer = new Stopwatch();
    timer.Start();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr(matchA);
    timer.Stop();
    Console.WriteLine("CS1 chr:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr(matchB);
    timer.Stop();
    Console.WriteLine("CS1 and:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr(matchC);
    timer.Stop();
    Console.WriteLine("CS1 mlw:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr2(matchA);
    timer.Stop();
    Console.WriteLine("CS2 chr:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr2(matchB);
    timer.Stop();
    Console.WriteLine("CS2 and:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr2(matchC);
    timer.Stop();
    Console.WriteLine("CS2 mlw:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr3(matchA);
    timer.Stop();
    Console.WriteLine("CS3 chr:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr3(matchB);
    timer.Stop();
    Console.WriteLine("CS3 and:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr3(matchC);
    timer.Stop();
    Console.WriteLine("CS3 mlw:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountSubstr4(matchA);
    timer.Stop();
    Console.WriteLine("CS4 chr:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrB.CountSubstr4(matchB);
    timer.Stop();
    Console.WriteLine("CS4 and:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrC.CountSubstr4(matchC);
    timer.Stop();
    Console.WriteLine("CS4 mlw:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar(matchA);
    timer.Stop();
    Console.WriteLine("CC1 chr:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar2(matchA);
    timer.Stop();
    Console.WriteLine("CC2 chr:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar3(matchA);
    timer.Stop();
    Console.WriteLine("CC3 chr:" + timer.Elapsed.TotalMilliseconds +"ms");

    timer.Restart();
    for (int i = 0; i < testSize; ++i)
        testStrA.CountChar4(matchA);
    timer.Stop();
    Console.WriteLine("CC4 chr:" + timer.Elapsed.TotalMilliseconds +"ms");
}

Results: CSx corresponds to CountSubstrX and CCx corresponds to CountCharX. "chr" searches a string for '_', "and" searches a string for "and", and "mlw" searches a string for "muchlongerword".

CS1 chr: 824.123ms
CS1 and: 586.1893ms
CS1 mlw: 486.5414ms
CS2 chr: 127.8941ms
CS2 and: 806.3918ms
CS2 mlw: 497.318ms
CS3 chr: 201.8896ms
CS3 and: 124.0675ms
CS3 mlw: 212.8341ms
CS4 chr: 81.5183ms
CS4 and: 92.0615ms
CS4 mlw: 116.2197ms
CC1 chr: 66.4078ms
CC2 chr: 64.0161ms
CC3 chr: 65.9013ms
CC4 chr: 65.8206ms

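And finally, I had a file with 3.6 million characters. It was "derp adfderdserp dfaerpderp deasderp" repeated 100,000 times. I searched for "derp" in the file with the above methods, 100 times each, with these results:

CS1Derp: 1501.3444ms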
CS2Derp: 1585.797ms
CS3Derp: 376.0937ms
CS4Derp: 271.1663ms

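So my fourth method is definitely the winner, but realistically, if searching a 3.6 million character file 100 times only takes 1586 ms in the worst case, then all of this is quite negligible.

By the way, I also scanned for the 'd' char in the 3.6 million character file, 100 times with each of the CountSubstr and CountChar methods. Results...

CS1  d : 2606.9513ms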
CS2  d : 339.7942ms
CS3  d : 960.281ms
CS4  d : 233.3442ms
CC1  d : 302.4122ms
CC2  d : 280.7719ms
CC3  d : 299.1125ms
CC4  d : 292.9365ms

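According to this, the original poster's method is very bad for single-character needles in a large haystack.

Note: All values were updated to Release build output. I accidentally forgot to build in Release mode the first time I posted this. Some of my statements have been amended.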


string Name ="Very good nice one is very good but is very good nice one this is called the term";
bool valid=true;
int count = 0;
int k=0;
int m = 0;
while (valid)
{
    k = Name.Substring(m,Name.Length-m).IndexOf("good");
    if (k != -1)
    {
        count++;
        m = m + k + 4;
    }
    else
        valid = false;
}
Console.WriteLine(count + " times it occurs");


string str = "aaabbbbjjja";
int count = 0;
int size = str.Length;

string[] strarray = new string[size];
for (int i = 0; i < str.Length; i++)
{
    strarray[i] = str.Substring(i, 1);
}
Array.Sort(strarray);
str ="";
for (int i = 0; i < strarray.Length - 1; i++)
{

    if (strarray[i] == strarray[i + 1])
    {

        count++;
    }
    else
    {
        count++;
        str = str + strarray[i] + count;
        count = 0;
    }

}
count++;
str = str + strarray[strarray.Length - 1] + count;

This one is for counting character occurrences. For this example the output will be "a4b4j3".


            var conditionalStatement = conditionSetting.Value;

            // Order of replacements matters: handle the two-character operators (==, !=, >=, <=) before the single characters, so that ">=" collapses to one '~' while "===" still yields two
            conditionalStatement = conditionalStatement.Replace("==","~").Replace("!=","~").Replace(">=","~").Replace("<=","~").Replace('=', '~').Replace('!', '~').Replace('>', '~').Replace('<', '~');

            var listOfValidConditions = new List<string>() {"!=","==",">","<",">=","<=" };

            if (conditionalStatement.Count(x => x == '~') != 1)
            {
                result.InvalidFieldList.Add(new KeyFieldData(batch.DECurrentField,"The IsDoubleKeyCondition does not contain a supported conditional statement. Contact System Administrator."));
                result.Status = ValidatorStatus.Fail;
                return result;
            }

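I needed to do something similar to test conditional statements from a string.

I replaced what I was looking for with a single character and counted the instances of that single character.

Obviously, before doing this you need to check that the single character you're using does not already exist in the string, to avoid incorrect counts.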
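
The sentinel check described above can be sketched as follows. The helper name CountTokens, the default '~' sentinel, and the decision to throw are all assumptions for illustration, not part of the code above. Tokens are assumed to be non-empty:

```csharp
using System;
using System.Linq;

public static class SentinelCount
{
    // Hypothetical helper: replaces every token with a sentinel character
    // and counts the sentinels. Tokens must be non-empty strings.
    public static int CountTokens(string text, string[] tokens, char sentinel = '~')
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // If the sentinel already occurs in the text the count would be inflated.
        if (text.IndexOf(sentinel) >= 0)
            throw new ArgumentException("Sentinel already present in text.", nameof(sentinel));

        // Longest tokens first, so ">=" is consumed before ">" gets a chance.
        foreach (string token in tokens.OrderByDescending(t => t.Length))
            text = text.Replace(token, sentinel.ToString());

        return text.Count(c => c == sentinel);
    }
}
```

With the operator list {"!=", "==", ">", "<", ">=", "<="}, the input "a >= b" yields 1 and "a == b != c" yields 2.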


string s ="HOWLYH THIS ACTUALLY WORKSH WOWH";
int count = 0;
for (int i = 0; i < s.Length; i++)
   if (s[i] == 'H') count++;

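It just checks every character in the string; if the character is the one you are searching for, it adds one to count.


If you check out this webpage, 15 different ways of doing this are benchmarked, including using parallel loops.

The fastest way appears to be either a single-threaded for loop (if you have a .NET version below 4.0) or a Parallel.For loop (if you are using .NET 4.0 or later with thousands of checks).

Assuming "ss" is your search string (the code below indexes it as an array of strings) and "ch" is your character array (if you have more than one char you're looking for), here's the basic gist of the code that had the fastest single-threaded run time: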

for (int x = 0; x < ss.Length; x++)
{
    for (int y = 0; y < ch.Length; y++)
    {
        for (int a = 0; a < ss[x].Length; a++ )
        {
            if (ss[x][a] == ch[y])
            {
                // It's found. Do what you need to here.
            }
        }
    }
}

The benchmark source code is provided too, so you can run your own tests.
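
The Parallel.For variant mentioned above is not reproduced in this answer. The following is only a rough sketch of how such a loop could look, not the code from the linked benchmark. It assumes an array of non-null strings and a single target character, and the names are hypothetical:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelCount
{
    // Hypothetical helper: counts target across many strings in parallel.
    // Each worker keeps a private subtotal so the shared total is only
    // touched once per worker, via Interlocked.Add.
    public static int CountChar(string[] lines, char target)
    {
        int total = 0;
        Parallel.For(0, lines.Length,
            () => 0,                                 // per-worker subtotal starts at 0
            (i, state, local) =>
            {
                string s = lines[i];
                for (int j = 0; j < s.Length; j++)
                    if (s[j] == target) local++;
                return local;                        // carried into the next iteration on this worker
            },
            local => Interlocked.Add(ref total, local)); // merge each worker's subtotal
        return total;
    }
}
```

For a handful of short strings the scheduling overhead will likely outweigh any gain, which matches the "thousands of checks" caveat above.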


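Thought I would throw my extension method into the ring (see comments for more info). I have not done any formal benchmarking, but I think it has to be very fast for most scenarios.

Edit: OK, so this question got me wondering how the performance of our current implementation would stack up against some of the solutions presented here. I decided to do a little benchmarking and found that our solution was very much in line with the performance of the solution provided by Richard Watson, up until you do aggressive searching with large strings (100 KB+), large substrings (32 KB+) and many embedded repetitions (10K+). At that point our solution was around 2x to 4x slower. Given this, and the fact that we really like the solution presented by Richard Watson, we have refactored our solution accordingly. I just wanted to make this available for anyone who might benefit from it.

Our original solution:

    /// <summary>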
    /// Counts the number of occurrences of the specified substring within
    /// the current string.
    /// </summary>
    /// <param name="s">The current string.</param>
    /// <param name="substring">The substring we are searching for.</param>
    /// <param name="aggressiveSearch">Indicates whether or not the algorithm
    /// should be aggressive in its search behavior (see Remarks). Default
    /// behavior is non-aggressive.</param>
    /// <remarks>This algorithm has two search modes - aggressive and
    /// non-aggressive. When in aggressive search mode (aggressiveSearch =
    /// true), the algorithm will try to match at every possible starting
    /// character index within the string. When false, all subsequent
    /// character indexes within a substring match will not be evaluated.
    /// For example, if the string was 'abbbc' and we were searching for
    /// the substring 'bb', then aggressive search would find 2 matches
    /// with starting indexes of 1 and 2. Non aggressive search would find
    /// just 1 match with starting index at 1. After the match was made,
    /// the non aggressive search would attempt to make its next match
    /// starting at index 3 instead of 2.</remarks>
    /// <returns>The count of occurrences of the substring within the string.</returns>
    public static int CountOccurrences(this string s, string substring,
        bool aggressiveSearch = false)
    {
        // if s or substring is null or empty, substring cannot be found in s
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(substring))
            return 0;

        // if the length of substring is greater than the length of s,
        // substring cannot be found in s
        if (substring.Length > s.Length)
            return 0;

        var sChars = s.ToCharArray();
        var substringChars = substring.ToCharArray();
        var count = 0;
        var sCharsIndex = 0;

        // substring cannot start in s beyond following index
        var lastStartIndex = sChars.Length - substringChars.Length;

        while (sCharsIndex <= lastStartIndex)
        {
            if (sChars[sCharsIndex] == substringChars[0])
            {
                // potential match checking
                var match = true;
                var offset = 1;
                while (offset < substringChars.Length)
                {
                    if (sChars[sCharsIndex + offset] != substringChars[offset])
                    {
                        match = false;
                        break;
                    }
                    offset++;
                }
                if (match)
                {
                    count++;
                    // if aggressive, just advance to next char in s, otherwise,
                    // skip past the match just found in s
                    sCharsIndex += aggressiveSearch ? 1 : substringChars.Length;
                }
                else
                {
                    // no match found, just move to next char in s
                    sCharsIndex++;
                }
            }
            else
            {
                // no match at current index, move along
                sCharsIndex++;
            }
        }

        return count;
    }

Here is our revised solution:

    /// <summary>
    /// Counts the number of occurrences of the specified substring within
    /// the current string.
    /// </summary>
    /// <param name="s">The current string.</param>
    /// <param name="substring">The substring we are searching for.</param>
    /// <param name="aggressiveSearch">Indicates whether or not the algorithm
    /// should be aggressive in its search behavior (see Remarks). Default
    /// behavior is non-aggressive.</param>
    /// <remarks>This algorithm has two search modes - aggressive and
    /// non-aggressive. When in aggressive search mode (aggressiveSearch =
    /// true), the algorithm will try to match at every possible starting
    /// character index within the string. When false, all subsequent
    /// character indexes within a substring match will not be evaluated.
    /// For example, if the string was 'abbbc' and we were searching for
    /// the substring 'bb', then aggressive search would find 2 matches
    /// with starting indexes of 1 and 2. Non aggressive search would find
    /// just 1 match with starting index at 1. After the match was made,
    /// the non aggressive search would attempt to make its next match
    /// starting at index 3 instead of 2.</remarks>
    /// <returns>The count of occurrences of the substring within the string.</returns>
    public static int CountOccurrences(this string s, string substring,
        bool aggressiveSearch = false)
    {
        // if s or substring is null or empty, substring cannot be found in s
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(substring))
            return 0;

        // if the length of substring is greater than the length of s,
        // substring cannot be found in s
        if (substring.Length > s.Length)
            return 0;

        int count = 0, n = 0;
        while ((n = s.IndexOf(substring, n, StringComparison.InvariantCulture)) != -1)
        {
            if (aggressiveSearch)
                n++;
            else
                n += substring.Length;
            count++;
        }

        return count;
    }
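
To make the two search modes concrete, here is a condensed, standalone sketch of the same IndexOf loop. It assumes ordinal comparison instead of InvariantCulture, and the names OverlapModes.Count and overlapping are hypothetical stand-ins for the extension method and its aggressiveSearch flag:

```csharp
using System;

public static class OverlapModes
{
    // Hypothetical condensed version of the loop above.
    // overlapping == true  : resume one char after each match start (aggressive)
    // overlapping == false : resume after the end of each match (non-aggressive)
    public static int Count(string s, string sub, bool overlapping)
    {
        if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(sub))
            return 0;

        int count = 0, n = 0;
        while ((n = s.IndexOf(sub, n, StringComparison.Ordinal)) != -1)
        {
            n += overlapping ? 1 : sub.Length;
            count++;
        }
        return count;
    }
}
```

For the 'abbbc' / 'bb' example from the remarks, Count("abbbc", "bb", true) returns 2 and Count("abbbc", "bb", false) returns 1.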

My initial take gave me something like:

public static int CountOccurrences(string original, string substring)
{
    if (string.IsNullOrEmpty(substring))
        return 0;
    if (substring.Length == 1)
        return CountOccurrences(original, substring[0]);
    if (string.IsNullOrEmpty(original) ||
        substring.Length > original.Length)
        return 0;
    int substringCount = 0;
    for (int charIndex = 0; charIndex < original.Length; charIndex++)
    {
        for (int subCharIndex = 0, secondaryCharIndex = charIndex; subCharIndex < substring.Length && secondaryCharIndex < original.Length; subCharIndex++, secondaryCharIndex++)
        {
            if (substring[subCharIndex] != original[secondaryCharIndex])
                goto continueOuter;
        }
        if (charIndex + substring.Length > original.Length)
            break;
        charIndex += substring.Length - 1;
        substringCount++;
    continueOuter:
        ;
    }
    return substringCount;
}

public static int CountOccurrences(string original, char @char)
{
    if (string.IsNullOrEmpty(original))
        return 0;
    int substringCount = 0;
    for (int charIndex = 0; charIndex < original.Length; charIndex++)
        if (@char == original[charIndex])
            substringCount++;
    return substringCount;
}

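The needle-in-a-haystack approach using replace and division takes 21+ seconds, whereas this takes about 15.2.

Edit: after adding a bit that adds substring.Length - 1 to charIndex (like it should), it's at 11.6 seconds.

Edit 2: I used a string that contained 26 two-character strings. Here are the times updated to the same sample texts:

Needle in a haystack (OP's version): 7.8 seconds

Suggested mechanism: 4.6 seconds.

Edit 3: After adding the single-character corner case, it went to 1.2 seconds.

Edit 4: For context, 50 million iterations were used.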


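For the case of a string delimiter (not the char case, as the subject says):
string source = "@@@once@@@upon@@@a@@@time@@@";
// StringSplitOptions.None keeps the empty pieces at both ends, so Length - 1 equals the number of delimiters
int count = source.Split(new[] { "@@@" }, StringSplitOptions.None).Length - 1;
The poster's original source value ("/once/upon/a/time/") has a natural delimiter that is the single char '/', and the responses do explain the source.Split(char[]) option, but…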