CUDA conditional thread synchronization
The CUDA Programming Guide states:
__syncthreads() is allowed in conditional code but only if the
conditional evaluates identically
across the entire thread block,
otherwise the code execution is likely
to hang or produce unintended side
effects.
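So, if I need to synchronize threads around a conditional branch within a block, where some threads may or may not take the branch that contains …
I imagine there are all sorts of situations where this would be needed. For example, if you have a binary mask and need to apply some operation to pixels conditionally. Suppose …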
If you need to go this route, you can split the body into two phases:
    if (condition)
    {
        // code before sync
    }
    __syncthreads();
    if (condition) // or remember a flag or whatever
    {
        // code after sync
    }
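To make the two-phase split concrete, here is a minimal sketch for the binary-mask case. The kernel name `maskedNeighbourSum`, the host helper `runMaskedNeighbourSum`, the block size of 128, and the "add the left neighbour" operation are all assumptions made up for illustration; only the placement of `__syncthreads()` outside the branch reflects the suggestion above.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 128;  // assumed block size for this sketch

// Hypothetical kernel: masked threads stage a pixel in shared memory, every
// thread reaches the barrier, then masked threads combine their pixel with
// the value staged by their left neighbour in the block.
__global__ void maskedNeighbourSum(const int* in, const unsigned char* mask,
                                   int* out, int n)
{
    __shared__ int tile[BLOCK];
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    bool active = (gid < n) && mask[gid];  // condition remembered as a flag

    tile[tid] = 0;                  // unmasked slots contribute zero
    if (active) {
        tile[tid] = in[gid];        // code before sync
    }
    __syncthreads();                // outside the branch: all threads arrive
    if (active) {
        int left = (tid > 0) ? tile[tid - 1] : 0;
        out[gid] = tile[tid] + left;  // code after sync
    }
}

// Hypothetical host helper: runs the kernel; unmasked outputs stay 0.
std::vector<int> runMaskedNeighbourSum(const std::vector<int>& in,
                                       const std::vector<unsigned char>& mask)
{
    int n = static_cast<int>(in.size());
    int *dIn = nullptr, *dOut = nullptr;
    unsigned char* dMask = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMalloc(&dMask, n);
    cudaMemcpy(dIn, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dMask, mask.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, n * sizeof(int));

    maskedNeighbourSum<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(dIn, dMask, dOut, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;

    std::vector<int> out(n);
    cudaMemcpy(out.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dMask);
    return out;
}
```

Calling `runMaskedNeighbourSum` with a non-empty input and a mask of the same length exercises both phases; threads whose mask bit is clear skip both conditional bodies but still hit the barrier.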
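Alternatively, you can use the condition to set a flag that disables certain operations. For example, if you are computing a delta update, you could do the following: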
    // *ALL* compute a delta update, those threads that would have failed the condition
    // simply compute garbage.
    // This can include syncthreads
    if (condition)
        // apply update
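A possible shape for the "everyone computes, only some apply" variant is sketched below. The delta here is assumed to be the sum of the block's elements, computed with a shared-memory tree reduction that contains `__syncthreads()`; the names `maskedAddBlockSum` and `runMaskedAddBlockSum` and the power-of-two block size are illustrative assumptions, not something the answer specifies.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 128;  // assumed block size; must be a power of two here

// Hypothetical kernel: ALL threads take part in the reduction (and its
// barriers); the condition only gates the final write.
__global__ void maskedAddBlockSum(int* data, const unsigned char* mask, int n)
{
    __shared__ int partial[BLOCK];
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;
    bool inRange = gid < n;
    bool apply = inRange && mask[gid];     // flag that disables the update

    partial[tid] = inRange ? data[gid] : 0;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();                   // reached by every thread
    }
    int delta = partial[0];                // block-wide sum

    if (apply) {
        data[gid] += delta;                // apply update
    }
}

// Hypothetical host helper: returns the updated array.
std::vector<int> runMaskedAddBlockSum(std::vector<int> data,
                                      const std::vector<unsigned char>& mask)
{
    int n = static_cast<int>(data.size());
    int* dData = nullptr;
    unsigned char* dMask = nullptr;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dMask, n);
    cudaMemcpy(dData, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dMask, mask.data(), n, cudaMemcpyHostToDevice);

    maskedAddBlockSum<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(dData, dMask, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;

    cudaMemcpy(data.data(), dData, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dMask);
    return data;
}
```

The trade-off is some wasted work in the threads that fail the condition, in exchange for control flow that is identical across the block wherever a barrier appears.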
Since 3.0, you can use the warp vote functions to accomplish what __syncthreads cannot:
Warp vote functions are only supported by devices of compute capability 1.2 and higher.

int __all(int predicate); evaluates predicate for all threads of the warp and returns non-zero if and only if predicate evaluates to non-zero for all of them.

int __any(int predicate); evaluates predicate for all threads of the warp and returns non-zero if and only if predicate evaluates to non-zero for any of them.

unsigned int __ballot(int predicate); evaluates predicate for all threads of the warp and returns an integer whose Nth bit is set if and only if predicate evaluates to non-zero for the Nth thread of the warp. This function is only supported by devices of compute capability 2.x.
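As a rough illustration of the vote functions, the sketch below records one ballot, one "any" and one "all" result per warp. Note one substitution: it uses `__ballot_sync`, `__any_sync` and `__all_sync`, the mask-taking variants introduced in CUDA 9, because the legacy `__ballot`, `__any` and `__all` quoted above were deprecated at that point and are not available on compute capability 7.0 and newer. The kernel name `warpVote`, the `VoteResult` struct, the helper `runWarpVote` and the block size of 64 are assumptions for this example.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 64;  // assumed: a multiple of the warp size (32)

struct VoteResult {        // hypothetical container, one entry per warp
    std::vector<unsigned> ballot;
    std::vector<int> any, all;
};

__global__ void warpVote(const unsigned char* mask, int n,
                         unsigned* ballot, int* any, int* all)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int pred = (gid < n) && mask[gid];   // out-of-range lanes vote "false"

    // Every lane of the warp calls the vote, so the full mask is valid.
    unsigned b = __ballot_sync(0xffffffffu, pred);
    int anySet = __any_sync(0xffffffffu, pred);
    int allSet = __all_sync(0xffffffffu, pred);

    if ((threadIdx.x & 31) == 0) {       // lane 0 writes the warp's result
        int warp = gid / 32;
        ballot[warp] = b;
        any[warp] = (anySet != 0);
        all[warp] = (allSet != 0);
    }
}

// Hypothetical host helper: returns per-warp vote results.
VoteResult runWarpVote(const std::vector<unsigned char>& mask)
{
    int n = static_cast<int>(mask.size());
    int blocks = (n + BLOCK - 1) / BLOCK;
    int warps = blocks * (BLOCK / 32);

    unsigned char* dMask = nullptr;
    unsigned* dBallot = nullptr;
    int *dAny = nullptr, *dAll = nullptr;
    cudaMalloc(&dMask, n);
    cudaMalloc(&dBallot, warps * sizeof(unsigned));
    cudaMalloc(&dAny, warps * sizeof(int));
    cudaMalloc(&dAll, warps * sizeof(int));
    cudaMemcpy(dMask, mask.data(), n, cudaMemcpyHostToDevice);

    warpVote<<<blocks, BLOCK>>>(dMask, n, dBallot, dAny, dAll);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;

    VoteResult r;
    r.ballot.resize(warps);
    r.any.resize(warps);
    r.all.resize(warps);
    cudaMemcpy(r.ballot.data(), dBallot, warps * sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaMemcpy(r.any.data(), dAny, warps * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(r.all.data(), dAll, warps * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dMask);
    cudaFree(dBallot);
    cudaFree(dAny);
    cudaFree(dAll);
    return r;
}
```

These votes only cover a single warp; agreeing on a condition across a whole block still needs a block-level mechanism on top.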
Otherwise, there are also the atomic bitwise functions
atomicAnd, atomicOr, atomicXor
See section B.11 of the CUDA Programming Guide.
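One way the atomic bitwise functions could help here is to fold each thread's condition into a single shared flag, so that the whole block then branches on the same value. The sketch below does this with `atomicOr` on a shared `int`; the kernel name `blockAnyMasked`, the helper `runBlockAnyMasked` and the block size of 64 are assumptions for illustration, and the answer itself does not spell out this pattern.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 64;  // assumed block size for this sketch

// Hypothetical kernel: reports, per block, whether any masked pixel exists.
__global__ void blockAnyMasked(const unsigned char* mask, int n, int* blockFlags)
{
    __shared__ int anySet;
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    if (tid == 0) {
        anySet = 0;
    }
    __syncthreads();                     // flag initialised before any OR

    if (gid < n && mask[gid]) {
        atomicOr(&anySet, 1);            // divergent, but contains no barrier
    }
    __syncthreads();                     // all contributions are now visible

    // anySet is identical for every thread in the block, so a barrier
    // inside this branch is reached by all threads or by none.
    if (anySet) {
        __syncthreads();
        if (tid == 0) blockFlags[blockIdx.x] = 1;
    } else {
        if (tid == 0) blockFlags[blockIdx.x] = 0;
    }
}

// Hypothetical host helper: returns one flag per block.
std::vector<int> runBlockAnyMasked(const std::vector<unsigned char>& mask)
{
    int n = static_cast<int>(mask.size());
    int blocks = (n + BLOCK - 1) / BLOCK;

    unsigned char* dMask = nullptr;
    int* dFlags = nullptr;
    cudaMalloc(&dMask, n);
    cudaMalloc(&dFlags, blocks * sizeof(int));
    cudaMemcpy(dMask, mask.data(), n, cudaMemcpyHostToDevice);

    blockAnyMasked<<<blocks, BLOCK>>>(dMask, n, dFlags);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;

    std::vector<int> flags(blocks);
    cudaMemcpy(flags.data(), dFlags, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dMask);
    cudaFree(dFlags);
    return flags;
}
```

The design point is that the divergent part (the `atomicOr`) never contains a barrier, while the part that does contain one is guarded by a value every thread in the block agrees on.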