C++: fork() vs. Thread

Fair comparison of fork() Vs Thread

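I am having a discussion about the relative cost of fork() vs. thread() for parallelizing a task.

We understand the basic differences between processes and threads:

Threads:

Easy to communicate between threads.
Fast context switching.

Processes:

Fault tolerance.
Communicating with the parent is not a real problem (open a pipe).
Communicating with other child processes is hard.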

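But we disagree on the start-up cost of processes vs. threads.
So, to test the theories, I wrote the following code. My question: is this a valid test for measuring the start-up cost, or am I missing something? I would also be interested in how each test performs on different platforms.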

fork.cpp


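#include <boost/lexical_cast.hpp>
#include <vector>
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>     // fork()
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid()

// The "work" each child does: nothing, so only start-up cost is measured.
extern "C" int threadStart(void* threadData)
{
    return 0;
}

int main(int argc, char* argv[])
{
    if (argc < 2)
    {
        std::cerr << "usage: " << argv[0] << " <count>\n";
        return 1;
    }
    int threadCount = boost::lexical_cast<int>(argv[1]);

    std::vector<pid_t> data(threadCount);
    clock_t start = clock();
    for (int loop = 0; loop < threadCount; ++loop)
    {
        data[loop] = fork();
        if (data[loop] == -1)
        {
            std::cout << "Abort\n";
            exit(1);
        }
        if (data[loop] == 0)
        {
            // Child process: run the empty task and exit with its result.
            exit(threadStart(NULL));
        }
    }
    clock_t middle = clock();

    // Parent: reap every child.
    for (int loop = 0; loop < threadCount; ++loop)
    {
        int result;
        waitpid(data[loop], &result, 0);
    }
    clock_t end = clock();

    // Columns: count, creation time, wait time, total (all in clock() ticks).
    std::cout << threadCount << "\t" << middle - start << "\t" << end - middle << "\t" << end - start << "\n";
}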

thread.cpp


#include <boost/lexical_cast.hpp>
#include <vector>
#include <iostream>
#include <stdlib.h>
#include <pthread.h>
#include <time.h>

// The "work" each thread does: nothing, so only start-up cost is measured.
extern "C" void* threadStart(void* threadData)
{
    return NULL;
}

int main(int argc, char* argv[])
{
    if (argc < 2)
    {
        std::cerr << "usage: " << argv[0] << " <count>\n";
        return 1;
    }
    int threadCount = boost::lexical_cast<int>(argv[1]);

    std::vector<pthread_t> data(threadCount);

    clock_t start = clock();
    for (int loop = 0; loop < threadCount; ++loop)
    {
        if (pthread_create(&data[loop], NULL, threadStart, NULL) != 0)
        {
            std::cout << "Abort\n";
            exit(1);
        }
    }
    clock_t middle = clock();

    // Join every thread.
    for (int loop = 0; loop < threadCount; ++loop)
    {
        void* result;
        pthread_join(data[loop], &result);
    }
    clock_t end = clock();

    // Columns: count, creation time, join time, total (all in clock() ticks).
    std::cout << threadCount << "\t" << middle - start << "\t" << end - middle << "\t" << end - start << "\n";
}

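I expect Windows to do worse at process creation.
But I would expect modern Unix-like systems to have a fairly light fork cost, at least comparable to a thread. On older Unix-style systems (before fork() was implemented using copy-on-write pages) it would be worse.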

Anyway, my timing results are:


> uname -a
Darwin Alpha.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386
> gcc --version | grep GCC
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659)
> g++ thread.cpp -o thread -I~/include
> g++ fork.cpp -o fork -I~/include
> foreach a ( 1 2 3 4 5 6 7 8 9 10 12 15 20 30 40 50 60 70 80 90 100 )
foreach? ./thread ${a} >> A
foreach? end
> foreach a ( 1 2 3 4 5 6 7 8 9 10 12 15 20 30 40 50 60 70 80 90 100 )
foreach? ./fork ${a} >> A
foreach? end
vi A

(C = number of threads or child processes. Start = creation loop, Wait = join/wait loop, Total = both; all values are clock() differences.)

Thread:                        Fork:
  C   Start   Wait   Total       C   Start   Wait   Total
=========================================================
  1      26    145     171       1     160     37     197
  2      44    198     242       2     290     37     327
  3      62    234     296       3     413     41     454
  4      77    275     352       4     499     59     558
  5      91    107   10808       5     599     57     656
  6      99    332     431       6     665     52     717
  7     130    388     518       7     741     69     810
  8     204    468     672       8     833     56     889
  9     164    469     633       9    1067     76    1143
 10     165    450     615      10    1147     64    1211
 12     343    585     928      12    1213     71    1284
 15     232    647     879      15    1360    203    1563
 20     319    921    1240      20    2161     96    2257
 30     461   1243    1704      30    3005    129    3134
 40     559   1487    2046      40    4466    166    4632
 50     686   1912    2598      50    4591    292    4883
 60     827   2208    3035      60    5234    317    5551
 70     973   2885    3858      70    7003    416    7419
 80    3545   2738    6283      80    7735    293    8028
 90    1392   3497    4889      90    7869    463    8332
100    3917   4180    8097     100    8974    436    9410

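Edit:

Running 1000 children caused the fork version to fail.
So I reduced the number of children. But running a single test also seemed unfair, so here is a range of values.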

Related discussion

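Your microbenchmarks show that creating and joining a thread (there were no fork results when I wrote this) takes tens or hundreds of microseconds (assuming your system has CLOCKS_PER_SEC = 1000000, which it probably does, since that is an XSI requirement).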

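Since you say fork() costs three times as much as a thread, it is still only about a tenth of a millisecond in the worst case. If that is noticeable in your application, you can use a pool of processes or threads, as Apache 1.3 does. Either way, I'd say the start-up time is moot.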

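The important difference between threads and processes (on Linux and most Unix-likes) is that with processes you explicitly choose what to share: IPC, shared memory (SysV or mmap style), pipes, sockets (you can send file descriptors over AF_UNIX sockets, which means you can choose which fds to share), and so on. With threads, almost everything is shared by default, whether it needs to be or not. In fact, this is why Plan 9 has rfork() and Linux has clone() (and, more recently, unshare()): so you can choose what to share.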

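On Linux, fork is a special case of sys_clone, invoked either from the library or inside the kernel. Clone has many switches that can be turned on and off, and each of them affects the start-up cost.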

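The actual library function clone may be more expensive than fork because it does more, although most of that extra work happens on the child side (switching stacks and calling a function through a pointer).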