CUDA conditional thread synchronization

The CUDA Programming Guide states:

__syncthreads() is allowed in conditional code but only if the
conditional evaluates identically
across the entire thread block,
otherwise the code execution is likely
to hang or produce unintended side
effects.

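So, if I need to synchronize the threads of a block around a conditional branch, where some threads may or may not take the branch that contains the __syncthreads() call, does this mean that it won't work?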

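I imagine there could be all sorts of cases in which you might need to do this. For example, suppose you have a binary mask and need to apply a certain operation to pixels conditionally: if (mask(x, y) != 0), execute the code that contains __syncthreads(), otherwise do nothing. How would that be done?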


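If you need to go down this route, you can split the body into two phases, so that the barrier itself sits outside the conditional and every thread reaches it: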

if (condition)
{
    // code before sync
}
__syncthreads();
if (condition) // or remember a flag or whatever
{
    // code after sync
}
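
To make the two-phase split concrete for the binary-mask case from the question, here is a minimal sketch. The kernel name maskedNeighbourSum, the block size kBlock, and the host helper runMaskedNeighbourSum are all hypothetical names chosen for illustration, and the "add the left neighbour's staged value" operation is only a stand-in for whatever per-pixel work is actually needed.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

constexpr int kBlock = 128;  // hypothetical block size

// Phase 1: masked threads stage their value in shared memory.
// Phase 2: masked threads add their left neighbour's staged value.
// The barrier sits between the phases, outside any conditional.
__global__ void maskedNeighbourSum(const int* in, const unsigned char* mask,
                                   int* out, int n)
{
    __shared__ int tile[kBlock];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool active = (i < n) && (mask[i] != 0);

    // Phase 1: conditional work before the barrier (inactive threads store 0).
    tile[threadIdx.x] = active ? in[i] : 0;

    // Every thread of the block reaches this barrier, active or not.
    __syncthreads();

    // Phase 2: conditional work after the barrier.
    if (active) {
        int left = (threadIdx.x > 0) ? tile[threadIdx.x - 1] : 0;
        out[i] = in[i] + left;
    }
}

// Host helper (hypothetical): elements that are masked out keep the value -1.
std::vector<int> runMaskedNeighbourSum(const std::vector<int>& in,
                                       const std::vector<unsigned char>& mask)
{
    assert(in.size() == mask.size() && !in.empty());
    int n = static_cast<int>(in.size());
    std::vector<int> out(n, -1);

    int* dIn = nullptr;
    int* dOut = nullptr;
    unsigned char* dMask = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMalloc(&dMask, n * sizeof(unsigned char));
    cudaMemcpy(dIn, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dOut, out.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dMask, mask.data(), n * sizeof(unsigned char), cudaMemcpyHostToDevice);

    int blocks = (n + kBlock - 1) / kBlock;
    maskedNeighbourSum<<<blocks, kBlock>>>(dIn, dMask, dOut, n);

    cudaMemcpy(out.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dMask);
    return out;
}
```

Recomputing the condition into a local flag (active) before the barrier is usually cheaper than re-evaluating the mask lookup in the second phase.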

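Alternatively, you can use the condition to set a flag that disables certain operations. For example, if you are computing a delta update, you could do the following: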

// *ALL* compute a delta update, those threads that would have failed the condition
// simply compute garbage.
// This can include syncthreads
if (condition)
    // apply update
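
As a rough illustration of this "everyone computes, only some apply" pattern, the sketch below lets every thread of the block take part in a shared-memory tree reduction (which needs several barriers) and then applies the resulting delta only where the mask is set. The names maskedDeltaUpdate, kThreads and runMaskedDeltaUpdate are hypothetical, the "block sum minus own value" delta is an arbitrary example, and the reduction assumes the block size is a power of two.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

constexpr int kThreads = 128;  // hypothetical block size; must be a power of two

__global__ void maskedDeltaUpdate(int* data, const unsigned char* mask, int n)
{
    __shared__ int partial[kThreads];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool inRange = (i < n);
    bool apply = inRange && (mask[i] != 0);  // the flag; used only at the very end
    int own = inRange ? data[i] : 0;

    // ALL threads take part in the reduction, so every barrier is reached
    // by the whole block regardless of the mask.
    partial[threadIdx.x] = own;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();  // outside the if: uniform across the block
    }

    // Threads with apply == false compute a delta that is simply discarded.
    int delta = partial[0] - own;
    if (apply) {
        data[i] += delta;
    }
}

// Host helper (hypothetical): returns the updated array.
std::vector<int> runMaskedDeltaUpdate(std::vector<int> data,
                                      const std::vector<unsigned char>& mask)
{
    assert(data.size() == mask.size() && !data.empty());
    int n = static_cast<int>(data.size());

    int* dData = nullptr;
    unsigned char* dMask = nullptr;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dMask, n * sizeof(unsigned char));
    cudaMemcpy(dData, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dMask, mask.data(), n * sizeof(unsigned char), cudaMemcpyHostToDevice);

    int blocks = (n + kThreads - 1) / kThreads;
    maskedDeltaUpdate<<<blocks, kThreads>>>(dData, dMask, n);

    cudaMemcpy(data.data(), dData, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dMask);
    return data;
}
```

The cost of this approach is the wasted work in threads whose result is thrown away, which is usually acceptable because those threads would otherwise sit idle within their warp.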


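Since 3.0, you can use the warp vote functions to accomplish what __syncthreads cannot. Note that they share a predicate among the threads of a single warp only, not across the whole block: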

Warp vote functions are only supported
by devices of compute capability 1.2
and higher.
int __all(int predicate);
evaluates predicate for
all threads of the warp and
returns non-zero if and only if
predicate evaluates to non-zero for
all of them.

int __any(int predicate);
evaluates predicate for
all threads of the warp and returns
non-zero if and only if predicate
evaluates to non-zero for any of them.

unsigned int __ballot(int predicate);
evaluates predicate for all threads of
the warp and returns an integer whose
Nth bit is set if and only if
predicate evaluates to non-zero for
the Nth thread of the warp. This
function is only supported by devices
of compute capability 2.x.
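
A small sketch of a warp vote applied to the mask example follows. One caveat: the intrinsics quoted above are the legacy forms. CUDA 9 and later provide __all_sync, __any_sync and __ballot_sync, which take an explicit lane mask as their first argument, and the sketch uses those. The names countMaskedPerWarp, kWarpBlock and runCountMaskedPerWarp are hypothetical, and the code assumes a one-dimensional block whose size is a multiple of 32 so that every warp is full.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

constexpr int kWarpBlock = 64;  // hypothetical block size; multiple of the warp size (32)

// Each warp counts how many of its threads see a non-zero mask entry.
__global__ void countMaskedPerWarp(const unsigned char* mask, int* warpCounts, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int predicate = (i < n) && (mask[i] != 0);

    // All 32 lanes execute the vote unconditionally; bit k of the result
    // is set if and only if lane k's predicate is non-zero.
    unsigned int ballot = __ballot_sync(0xffffffffu, predicate);

    // Lane 0 of each warp records the number of set bits.
    if (threadIdx.x % 32 == 0) {
        warpCounts[i / 32] = __popc(ballot);
    }
}

// Host helper (hypothetical): returns one count per launched warp.
std::vector<int> runCountMaskedPerWarp(const std::vector<unsigned char>& mask)
{
    assert(!mask.empty());
    int n = static_cast<int>(mask.size());
    int blocks = (n + kWarpBlock - 1) / kWarpBlock;
    int warps = blocks * kWarpBlock / 32;
    std::vector<int> counts(warps, 0);

    unsigned char* dMask = nullptr;
    int* dCounts = nullptr;
    cudaMalloc(&dMask, n * sizeof(unsigned char));
    cudaMalloc(&dCounts, warps * sizeof(int));
    cudaMemcpy(dMask, mask.data(), n * sizeof(unsigned char), cudaMemcpyHostToDevice);

    countMaskedPerWarp<<<blocks, kWarpBlock>>>(dMask, dCounts, n);

    cudaMemcpy(counts.data(), dCounts, warps * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dMask);
    cudaFree(dCounts);
    return counts;
}
```

Because the vote only covers one warp, it can make a decision warp-uniform but does not by itself make a condition uniform across the block, which is what __syncthreads() requires.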

Otherwise, there are also the atomic bitwise functions:

atomicAnd, atomicOr, atomicXor

See section B.11 of the CUDA Programming Guide.
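
One way the atomic bitwise functions might be used here is to turn a per-thread condition into a block-uniform one: each thread ORs its predicate into a shared word, an unconditional barrier follows, and every thread then branches on the same value, so a __syncthreads() inside that branch is legal. The sketch below assumes this interpretation. The names blockAnyMasked and runBlockAnyMasked are hypothetical, and the work inside the uniform branch is reduced to writing a per-block flag.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Writes 1 to blockFlags[b] if any thread of block b sees a non-zero mask entry.
__global__ void blockAnyMasked(const unsigned char* mask, int* blockFlags, int n)
{
    __shared__ unsigned int anyMasked;
    if (threadIdx.x == 0) {
        anyMasked = 0u;
    }
    __syncthreads();  // make the initial value visible to the whole block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && mask[i] != 0) {
        atomicOr(&anyMasked, 1u);  // divergent, but contains no barrier
    }
    __syncthreads();  // unconditional: every thread reaches it

    // anyMasked is now identical for all threads of the block, so this
    // branch is taken by all of them or by none of them.
    if (anyMasked != 0u) {
        __syncthreads();  // legal: the condition is block-uniform
        if (threadIdx.x == 0) blockFlags[blockIdx.x] = 1;
    } else {
        if (threadIdx.x == 0) blockFlags[blockIdx.x] = 0;
    }
}

// Host helper (hypothetical): returns one flag per block.
std::vector<int> runBlockAnyMasked(const std::vector<unsigned char>& mask,
                                   int threadsPerBlock)
{
    assert(!mask.empty() && threadsPerBlock > 0);
    int n = static_cast<int>(mask.size());
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    std::vector<int> flags(blocks, -1);

    unsigned char* dMask = nullptr;
    int* dFlags = nullptr;
    cudaMalloc(&dMask, n * sizeof(unsigned char));
    cudaMalloc(&dFlags, blocks * sizeof(int));
    cudaMemcpy(dMask, mask.data(), n * sizeof(unsigned char), cudaMemcpyHostToDevice);

    blockAnyMasked<<<blocks, threadsPerBlock>>>(dMask, dFlags, n);

    cudaMemcpy(flags.data(), dFlags, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dMask);
    cudaFree(dFlags);
    return flags;
}
```

On devices of compute capability 2.x and higher, __syncthreads_or(predicate) combines the barrier and the block-wide OR in a single call and may be simpler than the explicit atomic.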