Multithreading: Does the C++ volatile keyword introduce a memory fence?

I understand that volatile informs the compiler that the value may be changed, but in order to accomplish this functionality, does the compiler need to introduce a memory fence to make it work?

From what I understand, the sequence of operations on volatile objects cannot be reordered and must be preserved. This seems to imply some memory fences are necessary and that there isn't really a way around it. Am I right?

There's an interesting discussion on this related question.

Jonathan Wakely writes:

... Accesses to distinct volatile variables cannot be reordered by the
compiler as long as they occur in separate full expressions ... right
that volatile is useless for thread-safety, but not for the reasons he
gives. It's not because the compiler might reorder accesses to
volatile objects, but because the CPU might reorder them. Atomic
operations and memory barriers prevent the compiler and the CPU from
reordering

To which David Schwartz replies in the comments:

... There's no difference, from the point of view of the C++ standard,
between the compiler doing something and the compiler emitting
instructions that cause the hardware to do something. If the CPU may
reorder accesses to volatiles, then the standard doesn't require that
their order be preserved. ...

... The C++ standard doesn't make any distinction about what does the
reordering. And you can't argue that the CPU can reorder them with no
observable effect so that's okay -- the C++ standard defines their
order as observable. A compiler is compliant with the C++ standard on
a platform if it generates code that makes the platform do what the
standard requires. If the standard requires accesses to volatiles not
be reordered, then a platform that reorders them isn't compliant. ...

My point is that if the C++ standard prohibits the compiler from
reordering accesses to distinct volatiles, on the theory that the
order of such accesses is part of the program's observable behavior,
then it also requires the compiler to emit code that prohibits the CPU
from doing so. The standard does not differentiate between what the
compiler does and what the compiler's generated code makes the CPU do.

Which raises two questions: Is either of them "right"? What do actual implementations really do?


Rather than explaining what volatile does, allow me to explain when you should use volatile:

  • When inside a signal handler. That's because writing to a volatile variable is pretty much the only thing the standard allows you to do from within a signal handler. Since C++11 you can use std::atomic for that purpose, but only if the atomic is lock-free.
  • When dealing with setjmp, according to Intel.
  • When dealing directly with hardware and you want to ensure that the compiler does not optimize your reads or writes away.

For example:

volatile int *foo = some_memory_mapped_device;
while (*foo)
    ; // wait until *foo turns false

Without the volatile specifier, the compiler is allowed to completely optimize the loop away. The volatile specifier tells the compiler that it may not assume that two subsequent reads return the same value.

Note that volatile has nothing to do with threads. The above example does not work if there is a different thread writing to *foo, because there is no acquire operation involved.

In all other cases, usage of volatile should be considered non-portable and should not pass code review anymore, except when dealing with pre-C++11 compilers and compiler extensions (such as MSVC's /volatile:ms switch, which is enabled by default under x86/i64).


Does the C++ volatile keyword introduce a memory fence?

A conforming C++ compiler is not required to introduce a memory fence. Your particular compiler might; direct your question to the authors of your compiler.

The "volatile" feature in C++ has nothing to do with threading. Remember, the purpose of volatile is to disable compiler optimizations so that a read from a register that is changing due to exogenous conditions is not optimized away. Is a memory address being written to by a different thread on a different CPU a register changing due to exogenous conditions? No. Again, if some compiler authors choose to treat memory addresses written to by different threads on different CPUs as registers changing due to exogenous conditions, that's their business; they are not required to do so. Nor are they required, even if the compiler does introduce a memory fence, to, for instance, ensure that every thread sees a consistent ordering of volatile reads and writes.

In practice, volatile is almost useless for threading in C/C++. Best practice is to avoid it.

Moreover: memory fences are an implementation detail of particular processor architectures. In C#, where volatile is explicitly designed to be used with multithreading, the specification does not say that half fences will be introduced, because the program might run on an architecture that doesn't have fences in the first place. Rather, again, the specification makes certain (extremely weak) guarantees about which optimizations will be eschewed by the compiler, runtime and CPU, which puts certain (extremely weak) constraints on how some side effects will be ordered. In practice those optimizations are eliminated by use of half fences, but that's an implementation detail subject to change in the future.

The fact that you care about the semantics of volatile in any language as they pertain to multithreading indicates that you're thinking about sharing memory across threads. Consider simply not doing that. It makes your program far harder to understand and far more likely to contain subtle, impossible-to-reproduce bugs.


What David is overlooking is the fact that the C++ standard specifies the behavior of multiple threads interacting only in specific situations; everything else results in undefined behavior. A race condition involving at least one write is undefined if you don't use atomic variables.

Consequently, the compiler is perfectly within its rights to forgo any synchronization instructions, since your CPU will only notice the difference in a program that exhibits undefined behavior due to missing synchronization.


First of all, the C++ standard does not guarantee the memory barriers needed for properly ordering reads/writes that are non-atomic. volatile variables are recommended for use with MMIO, signal handling, etc. On most implementations volatile is not useful for multithreading, and it's generally not recommended.

Regarding the implementation of volatile accesses, this is the compiler's choice.

This article, describing gcc behavior, shows that you cannot use a volatile object as a memory barrier to order a sequence of writes to volatile memory.

Regarding icc behavior, I found this source which also tells that volatile does not guarantee ordering memory accesses.

The Microsoft VS2013 compiler has a different behavior. This documentation explains how volatile enforces release/acquire semantics and enables volatile objects to be used in locks/releases on multithreaded applications.

Another aspect to consider is that the same compiler may have different behavior with respect to volatile depending on the targeted hardware architecture. This post regarding the MSVS 2013 compiler clearly states the specifics of compiling with volatile for ARM platforms.

So my answer is:

Does the C++ volatile keyword introduce a memory fence?

Probably: there is no guarantee, probably not, but some compilers might do it. You should not rely on the fact that it does.


As far as I know, the compiler only inserts a memory fence on the Itanium architecture.

The volatile keyword is really best used for asynchronous changes, e.g., signal handlers and memory-mapped registers; it is usually the wrong tool to use for multithreaded programming.


It depends on which compiler "the compiler" is. Visual C++ does, since 2005. But the Standard does not require it, so some other compilers do not.


This is mostly from memory, and based on pre-C++11, without threads. But having participated in committee discussions on threading, I can say that there was never an intent by the committee that volatile could be used for synchronization between threads. Microsoft proposed it, but the proposal didn't carry.

The key specification of volatile is that an access to a volatile represents "observable behavior", just like IO. In the same way that the compiler cannot reorder or remove specific IO, it cannot reorder or remove accesses to a volatile object (or more correctly, accesses through an lvalue expression of volatile-qualified type). The original intent of volatile was, in fact, to support memory-mapped IO. The "problem" with this, however, is that it is implementation-defined what constitutes a "volatile access". And many compilers implement it as if the definition were "an instruction which reads or writes to memory has been executed". Which is a legal, albeit useless definition, if the implementation specifies it. (I've yet to find the actual specification for any compiler.)

Arguably (and it's an argument I accept), this violates the intent of the standard, since unless the hardware recognizes the addresses as memory-mapped IO and inhibits any reordering, etc., you can't even use volatile for memory-mapped IO, at least on Sparc or Intel architectures. Nevertheless, none of the compilers I've looked at (Sun CC, g++ and MSC) output any fence or membar instructions. (Around the time Microsoft proposed extending the rules for volatile, I think some of their compilers implemented the proposal, and emitted fence instructions for volatile accesses. I haven't verified what recent compilers do, but it wouldn't surprise me if it depended on some compiler option. The version I checked (I think it was VS 6.0) didn't emit fences, however.)


It doesn't have to. volatile is not a synchronization primitive. It just disables optimizations, so that within a thread you get a predictable sequence of reads and writes in the same order as prescribed by the abstract machine. But reads and writes in different threads have no order to begin with, so it makes no sense to speak of preserving or not preserving their order. The order between threads can be established by synchronization primitives; without them you get UB.

A bit of explanation about what memory barriers are. A typical CPU has several levels of memory access: there is the memory pipeline, several levels of cache, then the RAM, etc.

Membar instructions flush the pipeline. They don't change the order in which reads and writes are executed; they just force outstanding ones to be executed at a given moment. They are useful for multithreaded programs, but not much otherwise.

The cache(s) are normally automatically coherent between CPUs. If one wants to make sure the cache is in sync with RAM, a cache flush is needed. That is something very different from a membar.


The compiler needs to introduce a memory fence around volatile accesses if, and only if, that is necessary to make the uses of volatile specified in the standard (setjmp, signal handlers, and so on) work on that particular platform.

Note that some compilers do go way beyond what's required by the C++ standard in order to make volatile more powerful or useful on those platforms. Portable code shouldn't rely on volatile doing anything beyond what is specified in the C++ standard.


I always use volatile in interrupt service routines: e.g. the ISR (often assembly code) modifies some memory location, and the higher-level code that runs outside of the interrupt context accesses the memory location through a pointer to volatile.

I do this for RAM as well as memory-mapped IO.

Based on the discussion here, it seems this is still a valid use of volatile, but has nothing to do with multiple threads or CPUs. If the compiler for a microcontroller "knows" that there can't be any other accesses (e.g. everything is on-chip, there is no cache and there's only one core), I would assume that a memory fence is not implied at all; the compiler just needs to prevent certain optimizations.

As we pile more stuff into the "system" that executes the object code, almost all bets are off; at least that's how I read this discussion. How could a compiler ever cover all bases?


The keyword volatile essentially means that reads and writes on an object should be performed exactly as written by the program, and not optimized in any way. Binary code should follow the C or C++ code: a load where it is read, a store where there is a write.

It also implies that no read should be expected to result in a predictable value: the compiler shouldn't assume anything about a read, even one immediately following a write to the same volatile object:

volatile int i;
i = 1;
int j = i;
if (j == 1) // not assumed to be true

volatile may be the most important tool in the "C is a high-level assembly language" toolbox.

Whether declaring an object volatile is sufficient to ensure the behavior of code that deals with asynchronous changes depends on the platform: different CPUs give different levels of guaranteed synchronization for normal memory reads and writes. You probably shouldn't try to write such low-level multithreading code unless you are an expert in the area.

Atomic primitives provide a nicer, higher-level view of objects for multithreading that makes it easier to reason about code. Almost all programmers should use either atomic primitives or primitives that provide mutual exclusion, such as mutexes, read-write locks, semaphores, or other blocking primitives.


While I was working through an online downloadable video tutorial for 3D graphics and game engine development, we did use volatile within one of the classes. The tutorial website can be found here, and the video working with the volatile keyword is found within the Shader Engine series, video 98. These works are not my own but are accredited to Marek A. Krzeminski, MASc; this is an excerpt from the video download page.

"Since we can now have our games run in multiple threads it is important to synchronize data between threads properly. In this video I show how to create a volitile locking class to ensure volitile variables are properly synchronized..."

If you are subscribed to his website and have access to his videos, within this video he references this article concerning the use of volatile with multithreaded programming.

Here is the article from the link above: http://www.drdobbs.com/cpp/volatile-the-multithread-programmers-b/184403766

volatile: The Multithreaded Programmer's Best Friend

By Andrei Alexandrescu, February 01, 2001

The volatile keyword was devised to prevent compiler optimizations that might render code incorrect in the presence of certain asynchronous events.

I don't want to spoil your mood, but this column addresses the dreaded topic of multithreaded programming. If — as the previous installment of Generic says — exception-safe programming is hard, it's child's play compared to multithreaded programming.

Programs using multiple threads are notoriously hard to write, prove correct, debug, maintain, and tame in general. Incorrect multithreaded programs might run for years without a glitch, only to unexpectedly run amok because some critical timing condition has been met.

Needless to say, a programmer writing multithreaded code needs all the help she can get. This column focuses on race conditions — a common source of trouble in multithreaded programs — and provides you with insights and tools on how to avoid them and, amazingly enough, have the compiler work hard at helping you with that.

Just a Little Keyword

Although both C and C++ Standards are conspicuously silent when it comes to threads, they do make a little concession to multithreading, in the form of the volatile keyword.

Just like its better-known counterpart const, volatile is a type modifier. It's intended to be used in conjunction with variables that are accessed and modified in different threads. Basically, without volatile, either writing multithreaded programs becomes impossible, or the compiler wastes vast optimization opportunities. An explanation is in order.

Consider the following code:

class Gadget {
public:
    void Wait() {
        while (!flag_) {
            Sleep(1000); // sleeps for 1000 milliseconds
        }
    }
    void Wakeup() {
        flag_ = true;
    }
    ...
private:
    bool flag_;
};

The purpose of Gadget::Wait above is to check the flag_ member variable every second and return when that variable has been set to true by another thread. At least that's what its programmer intended, but, alas, Wait is incorrect.

Suppose the compiler figures out that Sleep(1000) is a call into an external library that cannot possibly modify the member variable flag_. Then the compiler concludes that it can cache flag_ in a register and use that register instead of accessing the slower on-board memory. This is an excellent optimization for single-threaded code, but in this case, it harms correctness: after you call Wait for some Gadget object, although another thread calls Wakeup, Wait will loop forever. This is because the change of flag_ will not be reflected in the register that caches flag_. The optimization is too ... optimistic.

Caching variables in registers is a very valuable optimization that applies most of the time, so it would be a pity to waste it. C and C++ give you the chance to explicitly disable such caching. If you use the volatile modifier on a variable, the compiler won't cache that variable in registers — each access will hit the actual memory location of that variable. So all you have to do to make Gadget's Wait/Wakeup combo work is to qualify flag_ appropriately:

class Gadget {
public:
    ... as above ...
private:
    volatile bool flag_;
};

Most explanations of the rationale and usage of volatile stop here and advise you to volatile-qualify the primitive types that you use in multiple threads. However, there is much more you can do with volatile, because it is part of C++'s wonderful type system.

Using volatile with User-Defined Types

You can volatile-qualify not only primitive types, but also user-defined types. In that case, volatile modifies the type in a way similar to const. (You can also apply const and volatile to the same type simultaneously.)

Unlike const, volatile discriminates between primitive types and user-defined types. Namely, unlike classes, primitive types still support all of their operations (addition, multiplication, assignment, etc.) when volatile-qualified. For example, you can assign a non-volatile int to a volatile int, but you cannot assign a non-volatile object to a volatile object.

Let's illustrate how volatile works on user-defined types on an example.

class Gadget {
public:
    void Foo() volatile;
    void Bar();
    ...
private:
    String name_;
    int state_;
};
...
Gadget regularGadget;
volatile Gadget volatileGadget;

If you think volatile is not that useful with objects, prepare for some surprise.

volatileGadget.Foo(); // ok, volatile fun called for
                  // volatile object
regularGadget.Foo();  // ok, volatile fun called for
                  // non-volatile object
volatileGadget.Bar(); // error! Non-volatile function called for
                  // volatile object!

The conversion from a non-qualified type to its volatile counterpart is trivial. However, just as with const, you cannot make the trip back from volatile to non-qualified. You must use a cast:

Gadget& ref = const_cast<Gadget&>(volatileGadget);
ref.Bar(); // ok

A volatile-qualified class gives access only to a subset of its interface, a subset that is under the control of the class implementer. Users can gain full access to that type's interface only by using a const_cast. In addition, just like constness, volatileness propagates from the class to its members (for example, volatileGadget.name_ and volatileGadget.state_ are volatile variables).

volatile, Critical Sections, and Race Conditions

The simplest and the most often-used synchronization device in multithreaded programs is the mutex. A mutex exposes the Acquire and Release primitives. Once you call Acquire in some thread, any other thread calling Acquire will block. Later, when that thread calls Release, precisely one thread blocked in an Acquire call will be released. In other words, for a given mutex, only one thread can get processor time in between a call to Acquire and a call to Release. The executing code between a call to Acquire and a call to Release is called a critical section. (Windows terminology is a bit confusing because it calls the mutex itself a critical section, while "mutex" is actually an inter-process mutex. It would have been nice if they were called thread mutex and process mutex.)

Mutexes are used to protect data against race conditions. By definition, a race condition occurs when the effect of more threads on data depends on how threads are scheduled. Race conditions appear when two or more threads compete for using the same data. Because threads can interrupt each other at arbitrary moments in time, data can be corrupted or misinterpreted. Consequently, changes and sometimes accesses to data must be carefully protected with critical sections. In object-oriented programming, this usually means that you store a mutex in a class as a member variable and use it whenever you access that class' state.

Experienced multithreaded programmers might have yawned reading the two paragraphs above, but their purpose is to provide an intellectual workout, because now we will link with the volatile connection. We do this by drawing a parallel between the C++ types' world and the threading semantics world.

  • Outside a critical section, any thread might interrupt any other at any time; there is no control, so consequently variables accessible from multiple threads are volatile. This is in keeping with the original intent of volatile — that of preventing the compiler from unwittingly caching values used by multiple threads at once.
  • Inside a critical section defined by a mutex, only one thread has access. Consequently, inside a critical section, the executing code has single-threaded semantics. The controlled variable is not volatile anymore — you can remove the volatile qualifier.

In short, data shared between threads is conceptually volatile outside a critical section, and non-volatile inside a critical section.

You enter a critical section by locking a mutex. You remove the volatile qualifier from a type by applying a const_cast. If we manage to put these two operations together, we create a connection between C++'s type system and an application's threading semantics. We can make the compiler check race conditions for us.

LockingPtr

We need a tool that collects a mutex acquisition and a const_cast. Let's develop a LockingPtr class template that you initialize with a volatile object obj and a mutex mtx. During its lifetime, a LockingPtr keeps mtx acquired. Also, LockingPtr offers access to the volatile-stripped obj. The access is offered in a smart pointer fashion, through operator-> and operator*. The const_cast is performed inside LockingPtr. The cast is semantically valid because LockingPtr keeps the mutex acquired for its lifetime.

First, let's define the skeleton of a class Mutex with which LockingPtr will work:

class Mutex {
public:
    void Acquire();
    void Release();
    ...    
};

To use LockingPtr, you implement Mutex using your operating system's native data structures and primitive functions.

LockingPtr is templated with the type of the controlled variable. For example, if you want to control a Widget, you use a LockingPtr that you initialize with a variable of type volatile Widget.

LockingPtr's definition is very simple. LockingPtr implements an unsophisticated smart pointer. It focuses solely on collecting a const_cast and a critical section.

template <typename T>
class LockingPtr {
public:
    // Constructors/destructors
    LockingPtr(volatile T& obj, Mutex& mtx)
      : pObj_(const_cast<T*>(&obj)), pMtx_(&mtx) {
        mtx.Acquire();
    }
    ~LockingPtr() {
        pMtx_->Release();
    }
    // Pointer behavior
    T& operator*() {
        return *pObj_;
    }
    T* operator->() {
        return pObj_;
    }
private:
    T* pObj_;
    Mutex* pMtx_;
    LockingPtr(const LockingPtr&);
    LockingPtr& operator=(const LockingPtr&);
};

In spite of its simplicity, LockingPtr is a very useful aid in writing correct multithreaded code. You should define objects that are shared between threads as volatile and never use const_cast with them — always use LockingPtr automatic objects. Let's illustrate this with an example.

Say you have two threads that share a vector object:

class SyncBuf {
public:
    void Thread1();
    void Thread2();
private:
    typedef vector<char> BufT;
    volatile BufT buffer_;
    Mutex mtx_; // controls access to buffer_
};

Inside a thread function, you simply use a LockingPtr to get controlled access to the buffer_ member variable:

void SyncBuf::Thread1() {
    LockingPtr<BufT> lpBuf(buffer_, mtx_);
    BufT::iterator i = lpBuf->begin();
    for (; i != lpBuf->end(); ++i) {
        ... use *i ...
    }
}

The code is very easy to write and understand — whenever you need to use buffer_, you must create a LockingPtr pointing to it. Once you do that, you have access to vector's entire interface.

The nice part is that if you make a mistake, the compiler will point it out:

void SyncBuf::Thread2() {
    // Error! Cannot access 'begin' for a volatile object
    BufT::iterator i = buffer_.begin();
    // Error! Cannot access 'end' for a volatile object
    for ( ; i != buffer_.end(); ++i ) {
        ... use *i ...
    }
}

You cannot access any function of buffer_ until you either apply a const_cast or use LockingPtr. The difference is that LockingPtr offers an ordered way of applying const_cast to volatile variables.

LockingPtr is remarkably expressive. If you only need to call one function, you can create an unnamed temporary LockingPtr object and use it directly:

unsigned int SyncBuf::Size() {
    return LockingPtr<BufT>(buffer_, mtx_)->size();
}

Back to Primitive Types

We saw how nicely volatile protects objects against uncontrolled access and how LockingPtr provides a simple and effective way of writing thread-safe code. Let's now return to primitive types, which are treated differently by volatile.

Let's consider an example where multiple threads share a variable of type int.

class Counter {
public:
    ...
    void Increment() { ++ctr_; }
    void Decrement() { --ctr_; }
private:
    int ctr_;
};

If Increment and Decrement are to be called from different threads, the fragment above is buggy. First, ctr_ must be volatile. Second, even a seemingly atomic operation such as ++ctr_ is actually a three-stage operation. Memory itself has no arithmetic capabilities. When incrementing a variable, the processor:

  • Reads that variable in a register
  • Increments the value in the register
  • Writes the result back to memory

This three-step operation is called RMW (Read-Modify-Write). During the Modify part of an RMW operation, most processors free the memory bus in order to give other processors access to the memory.

If at that time another processor performs a RMW operation on the same variable, we have a race condition: the second write overwrites the effect of the first.

To avoid that, you can rely, again, on LockingPtr:

class Counter {
public:
    ...
    void Increment() { ++*LockingPtr<int>(ctr_, mtx_); }
    void Decrement() { --*LockingPtr<int>(ctr_, mtx_); }
private:
    volatile int ctr_;
    Mutex mtx_;
};

Now the code is correct, but its quality is inferior when compared to SyncBuf's code. Why? Because with Counter, the compiler will not warn you if you mistakenly access ctr_ directly (without locking it). The compiler compiles ++ctr_ if ctr_ is volatile, although the generated code is simply incorrect. The compiler is not your ally anymore, and only your attention can help you avoid race conditions.

What should you do then? Simply encapsulate the primitive data that you use in higher-level structures and use volatile with those structures. Paradoxically, it's worse to use volatile directly with built-ins, in spite of the fact that initially this was the usage intent of volatile!

volatile Member Functions

So far, we've had classes that aggregate volatile data members; now let's think of designing classes that in turn will be part of larger objects and shared between threads. Here is where volatile member functions can be of great help.

When designing your class, you volatile-qualify only those member functions that are thread safe. You must assume that code from the outside will call the volatile functions from any code at any time. Don't forget: volatile equals free multithreaded code and no critical section; non-volatile equals single-threaded scenario or inside a critical section.

For example, you define a class Widget that implements an operation in two variants — a thread-safe one and a fast, unprotected one.

class Widget {
public:
    void Operation() volatile;
    void Operation();
    ...
private:
    Mutex mtx_;
};

Notice the use of overloading. Now Widget's user can invoke Operation using a uniform syntax either for volatile objects and get thread safety, or for regular objects and get speed. The user must be careful about defining the shared Widget objects as volatile.

When implementing a volatile member function, the first operation is usually to lock this with a LockingPtr. Then the work is done by using the non- volatile sibling:

void Widget::Operation() volatile {
    LockingPtr<Widget> lpThis(*this, mtx_);
    lpThis->Operation(); // invokes the non-volatile function
}

Summary

When writing multithreaded programs, you can use volatile to your advantage. You must stick to the following rules:

  • Define all shared objects as volatile.
  • Don't use volatile directly with primitive types.
  • When defining shared classes, use volatile member functions to express thread safety.

If you do this, and if you use the simple generic component LockingPtr, you can write thread-safe code and worry much less about race conditions, because the compiler will worry for you and will diligently point out the spots where you are wrong.

A couple of projects I've been involved with use volatile and LockingPtr to great effect. The code is clean and understandable. I recall a couple of deadlocks, but I prefer deadlocks to race conditions because they are so much easier to debug. There were virtually no problems related to race conditions. But then you never know.

Acknowledgements

Many thanks to James Kanze and Sorin Jianu who helped with insightful ideas.

Andrei Alexandrescu is a Development Manager at RealNetworks Inc. (www.realnetworks.com), based in Seattle, WA, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar (www.gotw.ca/cpp_seminar).

This article may be a bit dated, but it does give good insight into an excellent use of the volatile modifier in multithreaded programming: helping to keep events asynchronous while having the compiler check for race conditions for us. This may not directly answer the original question about introducing a memory fence, but I chose to post this as an answer for others, as an excellent reference for a good use of volatile when working with multithreaded applications.


I think the confusion about volatile and instruction reordering stems from two notions of reordering that CPUs do:

  • Out-of-order execution.
  • The sequence of memory reads/writes as seen by other CPUs (reordering, in the sense that each CPU might see a different sequence).

volatile affects how the compiler generates code under the assumption of single-threaded execution (this includes interrupts). It doesn't imply anything about memory barrier instructions, but it does preclude the compiler from performing certain optimizations related to memory accesses. A typical example is re-fetching a value from memory instead of using one cached in a register.

Out-of-order execution

The CPU can execute instructions out of order/speculatively, provided the end result could have happened in the original code. The CPU can perform transformations that are disallowed in the compiler, because the compiler can only perform transformations that are correct in all circumstances; the CPU, in contrast, can check the validity of these optimizations and back out of them if they turn out to be incorrect.

The sequence of memory reads/writes as seen by other CPUs

The end result of a sequence of instructions, the effective order, must agree with the semantics of the code generated by the compiler. However, the actual execution order chosen by the CPU may be different. The effective order as seen by other CPUs (every CPU can have a different view) can be constrained by memory barriers. I'm not sure how much the actual and effective orders can differ, because I don't know to what extent memory barriers preclude the CPU from performing out-of-order execution.

Sources:

  • Memory barriers
  • LLVM: Atomics
  • ACCESS_ONCE() and compiler bugs