C++: How does the compilation/linking process work?

How does the compilation/linking process work?

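How does the compilation and linking process work?

(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)

Related discussion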

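The compilation of a C++ program involves three steps:

Preprocessing: the preprocessor takes a C++ source code file and deals with the #includes, #defines and other preprocessor directives. The output of this step is a "pure" C++ file without preprocessor directives.

Compilation: the compiler takes the preprocessor's output and produces an object file from it.

Linking: the linker takes the object files produced by the compiler and produces either a library or an executable file.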

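Preprocessing

The preprocessor handles the preprocessor directives, like #include and #define. It is agnostic of the syntax of C++, which is why it must be used with care.

It works on one C++ source file at a time by replacing #include directives with the content of the respective files (which is usually just declarations), doing replacement of macros (#define), and selecting different portions of text depending on #if, #ifdef and #ifndef directives.

The preprocessor works on a stream of preprocessing tokens. Macro substitution is defined as replacing tokens with other tokens (the operator ## enables merging two tokens when it makes sense).

After all this, the preprocessor produces a single output that is a stream of tokens resulting from the transformations described above. It also adds some special markers that tell the compiler where each line came from, so that it can use those to produce sensible error messages.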

Some errors can be produced at this stage with clever use of the #if and #error directives.

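Compilation

The compilation step is performed on each output of the preprocessor. The compiler parses the pure C++ source code (now without any preprocessor directives) and converts it into assembly code. It then invokes the underlying back end (the assembler in the toolchain), which assembles that code into machine code and produces an actual binary file in some format (ELF, COFF, a.out, ...). This object file contains the compiled code (in binary form) of the symbols defined in the input. Symbols in object files are referred to by name.

Object files can refer to symbols that are not defined. This is the case when you use a declaration and don't provide a definition for it. The compiler doesn't mind this, and will happily produce the object file as long as the source code is well-formed.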

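Compilers usually let you stop compilation at this point. This is very useful because with it you can compile each source code file separately. The advantage this provides is that you don't need to recompile everything if you only change a single file.

The produced object files can be put in special archives called static libraries, for easier reuse later on.

It is at this stage that "regular" compiler errors, like syntax errors or failed overload resolution errors, are reported.

Linking

The linker is what produces the final compilation output from the object files the compiler produced. This output can be either a shared (or dynamic) library (and while the name is similar, these don't have much in common with the static libraries mentioned earlier) or an executable.

It links all the object files by replacing the references to undefined symbols with the correct addresses. Each of these symbols can be defined in other object files or in libraries. If they are defined in libraries other than the standard library, you need to tell the linker about them.

At this stage the most common errors are missing definitions or duplicate definitions. The former means that either the definitions don't exist (i.e. they were never written), or that the object files or libraries where they reside were not given to the linker. The latter is obvious: the same symbol was defined in two different object files or libraries.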

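Related discussion

The compilation phase also calls the assembler before converting to an object file.
Where are optimizations applied? At first glance it would seem that they are done in the compilation step, but on the other hand I can imagine that proper optimization can only be done after linking.
@Bartvanheukelom Traditionally it was done during compilation, but modern compilers support so-called "link-time optimization", which has the advantage of being able to optimize across translation units.
To be clear, link-time optimization does not prevent optimization during compilation. What it does is use the additional information available at link time to perform more powerful optimizations.
Does C have the same steps?
If the linker converts symbols referring to classes/methods in libraries into addresses, does that mean library binaries are stored at memory addresses that the OS keeps constant? I'm just confused about how the linker would know the exact address of, say, the stdio binary for all target systems. The file path would always be the same, but the exact address can change, right?
@Dancarter I am wondering about that too. Hopefully these are not runtime memory addresses, unless someone clarifies otherwise.
The process of compiling and then linking is called building.
Yes, @kevinzhu, the steps are the same for C.
@Dancarter It depends on the platform and the linker, but in general the linker only generates relative addresses. That means it might put main() at 0 and myFunction() at 100. Then, when the operating system actually loads the executable to run it, it loads the code at some address, and all the addresses are offset by whatever address the executable's code was loaded at (a number is simply added).
However, some platforms have addresses predetermined by the compiler. For example, on many Unixes every application has "virtual addresses", which means the CPU has a special part, the memory management unit (MMU), that translates all addresses from the "fake" addresses in the executable to the actual addresses in real memory (this is also used for swapping). In that case there is usually a range reserved for the operating system, e.g. 0...10000, and your application then lives at address 10000 and up.
Between the second and third paragraphs of the "Linking" section, does the word "library" refer to "static library"? I ask because as a newbie I am a bit confused. If you answer my question, please ping me.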

This topic is discussed at cprogramming.com: https://www.cprogramming.com/compilengandlinking.html

Here is what the author wrote there:

Compiling isn't quite the same as creating an executable file!
Instead, creating an executable is a multistage process divided into
two components: compilation and linking. In reality, even if a program
"compiles fine" it might not actually work because of errors during
the linking phase. The total process of going from source code files
to an executable might better be referred to as a build.

Compilation

Compilation refers to the processing of source code files (.c, .cc, or
.cpp) and the creation of an 'object' file. This step doesn't create
anything the user can actually run. Instead, the compiler merely
produces the machine language instructions that correspond to the
source code file that was compiled. For instance, if you compile (but
don't link) three separate files, you will have three object files
created as output, each with the name <filename>.o or <filename>.obj
(the extension will depend on your compiler). Each of these files
contains a translation of your source code file into a machine
language file -- but you can't run them yet! You need to turn them
into executables your operating system can use. That's where the
linker comes in.

Linking

Linking refers to the creation of a single executable file from
multiple object files. In this step, it is common that the linker will
complain about undefined functions (commonly, main itself). During
compilation, if the compiler could not find the definition for a
particular function, it would just assume that the function was
defined in another file. If this isn't the case, there's no way the
compiler would know -- it doesn't look at the contents of more than
one file at a time. The linker, on the other hand, may look at
multiple files and try to find references for the functions that
weren't mentioned.

You might ask why there are separate compilation and linking steps.
First, it's probably easier to implement things that way. The compiler
does its thing, and the linker does its thing -- by keeping the
functions separate, the complexity of the program is reduced. Another
(more obvious) advantage is that this allows the creation of large
programs without having to redo the compilation step every time a file
is changed. Instead, using so-called "conditional compilation", it is
necessary to compile only those source files that have changed; for
the rest, the object files are sufficient input for the linker.
Finally, this makes it simple to implement libraries of pre-compiled
code: just create object files and link them just like any other
object file. (The fact that each file is compiled separately from
information contained in other files, incidentally, is called the
"separate compilation model".)

To get the full benefits of conditional compilation, it's probably
easier to get a program to help you than to try and remember which
files you've changed since you last compiled. (You could, of course,
just recompile every file that has a timestamp greater than the
timestamp of the corresponding object file.) If you're working with an
integrated development environment (IDE) it may already take care of
this for you. If you're using command line tools, there's a nifty
utility called make that comes with most *nix distributions. Along
with conditional compilation, it has several other nice features for
programming, such as allowing different compilations of your program
-- for instance, if you have a version producing verbose output for debugging.

Knowing the difference between the compilation phase and the link
phase can make it easier to hunt for bugs. Compiler errors are usually
syntactic in nature -- a missing semicolon, an extra parenthesis.
Linking errors usually have to do with missing or multiple
definitions. If you get an error that a function or variable is
defined multiple times from the linker, that's a good indication that
the error is that two of your source code files have the same function
or variable.

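Related discussion

On the standard front:

A translation unit is the combination of a source file, the included headers and source files, less any source lines skipped by conditional inclusion preprocessor directives.
The standard defines 9 phases in the translation. The first four correspond to preprocessing, the next three are the compilation, the next one is the instantiation of templates (producing instantiation units) and the last one is the linking.

In practice the eighth phase (the instantiation of templates) is often done during the compilation process, but some compilers delay it to the linking phase and some spread it across the two.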

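Related discussion

The skinny is that a CPU loads data from memory addresses, stores data to memory addresses, and executes instructions sequentially out of memory addresses, with some conditional jumps in the sequence of instructions processed. Each of these three categories of instructions involves computing an address of a memory cell to be used in the machine instruction. Because machine instructions are of variable length depending on the particular instruction involved, and because we string a variable number of them together as we build our machine code, there is a two-step process involved in calculating and building any addresses.

First we lay out the allocation of memory as best we can before we know exactly what goes in each cell. We figure out the bytes, or words, or whatever forms the instructions, literals and any data. We just start allocating memory and building the values that will make up the program as we go, and note down every place where we need to go back and fix an address. In that place we put a dummy to pad the location so we can continue to calculate the memory size. For example, our first machine code might take one cell. The next machine code might take 3 cells, involving one machine code cell and two address cells. Now our address pointer is 4. We know what goes in the machine code cell, which is the opcode, but we have to wait to calculate what goes in the address cells until we know where that data will be located, i.e. what the machine address of that data will be.

If there were just one source file, a compiler could theoretically produce fully executable machine code without a linker. In a two-pass process it could calculate all of the actual addresses of all of the data cells referenced by any machine load or store instructions. And it could calculate all of the absolute addresses referenced by any absolute jump instructions. This is how simpler compilers, like the one in Forth, work with no linker.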

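The linker allows blocks of code to be compiled separately. This can speed up the overall process of building code, and it allows some flexibility in how the blocks are used later. In other words, they can be relocated in memory, for example by adding 1000 to every address to move the block up by 1000 address cells.

So what the compiler outputs is rough machine code that is not yet fully built, but is laid out so that we know the size of everything; in other words, so that we can start to calculate where all of the absolute addresses will be located. The compiler also outputs a list of symbols, which are name/address pairs. The symbols relate a memory offset in the machine code of the module to a name. The offset is the absolute distance to the memory location of the symbol within the module.

That's where we get to the linker. The linker first slaps all of these blocks of machine code together end to end and notes down where each one starts. Then it calculates the addresses to be fixed by adding together the relative offset within a module and the absolute position of the module in the bigger layout.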

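Obviously I have oversimplified this so you can try to grasp it, and I have deliberately not used the jargon of object files, symbol tables and so on, which to me is part of the confusion.

Look at this URL: http://faculty.cs.niu.edu/~mcmahon/cs241/notes/compile.html
The complete compilation process of C++ is introduced clearly there.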

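Related discussion

GCC compiles a C/C++ program into an executable in 4 steps.

For example, "gcc -o hello.exe hello.c" is carried out as follows:

1. Preprocessing

Preprocessing is done by the GNU C Preprocessor (cpp.exe), which includes the headers (#include) and expands the macros (#define).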

cpp hello.c > hello.i

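The resulting intermediate file "hello.i" contains the expanded source code.

2. Compilation

The compiler compiles the preprocessed source code into assembly code for a specific processor.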

gcc -S hello.i

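The -S option specifies that assembly code should be produced instead of object code. The resulting assembly file is "hello.s".

3. Assembly

The assembler (as.exe) converts the assembly code into machine code in the object file "hello.o".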

as -o hello.o hello.s

4. Linker

Finally, the linker (ld.exe) links the object code with the library code to produce an executable file "hello.exe".

ld -o hello.exe hello.o ...libraries...