Can I allocate more memory than necessary with cudaMalloc to avoid reallocating?

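I am writing code that uses cuSparse to run computations on thousands of sparse matrices on the GPU. GPU memory is limited, and most of it is taken up by other GPU variables and dense matrices, so I have to process the sparse matrices one at a time.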

My workflow (in pseudocode) is as follows:

for (int i = 0; i < 1000; i++) {
    // allocate sparse matrix using cudaMalloc
    // copy sparse matrix from host using cudaMemcpy
    // do calculation by calling cuSparse
    // deallocate sparse matrix with cudaFree
}

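In the code above, I allocate and free memory for each sparse matrix in every iteration, because their sparsity varies and so the amount of memory each one needs varies as well.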

Could I instead do something like this:

// allocate buffer once in the beginning using cudaMalloc with some extra space such
// that even the sparse matrix with the highest density would fit.
for (int i = 0; i < 1000; i++) {
    // copy sparse matrix from host using cudaMemcpy to the same buffer
    // do calculation by calling cuSparse
}
// free the buffer once at the end using cudaFree

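The second version avoids having to allocate and free the buffer in every iteration. Would it work? Would it improve performance? Is it good practice, or is there a better way to do this?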
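To make the "allocate once" idea concrete, here is a minimal host-side sketch of how the size of that single buffer could be worked out before the loop. It rests on assumptions that are not stated above: the matrices are stored in CSR format with double values and 32-bit int indices, and the three CSR arrays of each matrix are packed back to back in one buffer. CsrShape, csrBytes and maxCsrBytes are hypothetical names made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side description of one sparse matrix (CSR assumed).
struct CsrShape {
    std::size_t rows;  // number of rows
    std::size_t nnz;   // number of stored nonzeros
};

// Bytes needed to hold one CSR matrix, assuming double values and int indices:
// values (nnz doubles) + column indices (nnz ints) + row offsets (rows + 1 ints).
std::size_t csrBytes(const CsrShape& m) {
    return m.nnz * sizeof(double)
         + m.nnz * sizeof(int)
         + (m.rows + 1) * sizeof(int);
}

// Largest footprint over all matrices. A buffer of this size, allocated once
// with cudaMalloc before the loop, is big enough for every matrix, so each
// iteration only needs the cudaMemcpy calls and the cuSparse call.
std::size_t maxCsrBytes(const std::vector<CsrShape>& shapes) {
    std::size_t worst = 0;
    for (const CsrShape& m : shapes) {
        worst = std::max(worst, csrBytes(m));
    }
    return worst;
}
```

The value returned by maxCsrBytes would be passed to the single cudaMalloc call before the loop, and the matching cudaFree would come after it. If the three arrays are kept in separate allocations instead of one packed buffer, the maximum nnz and the maximum row count would need to be tracked separately, since they may come from different matrices.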