Maximum number of threads for a CUDA kernel on Tesla M2050
I am testing the maximum number of threads for a simple kernel. I find that the total number of threads cannot exceed 4096. The code is as follows:
#include <stdio.h>
#define n 100

__global__ void test(){
    printf("%d %d\n", blockIdx.x, threadIdx.x);
}

int main(void){
    double *p;
    size_t size=n*sizeof(double);
    cudaMalloc(&p, size);
    test<<<64,128>>>();
    //test<<<64,128>>>();
    cudaFree(p);
    return 0;
}
My test environment: CUDA 4.2.9 on a Tesla M2050. The code is compiled with
nvcc -arch=sm_20 test.cu
While checking the output, I found that some combinations are missing. When I run the command
./a.out|wc -l
I get 4096. When I check the CC 2.0 specifications, I find that the maximum number of blocks in the x, y, z dimensions is (1024,1024,512), and the maximum number of threads per block is 1024. My calls to the kernel (either <<<64,128>>> or <<<128,64>>>) are well within those limits. Any idea?
NB: The CUDA memory operations are there to block the code so that the output of the kernel is shown.
You are abusing kernel printf, and using it to judge how many threads can run is a nonsensical idea. The runtime has a limited buffer size for printf output, and you are overflowing it when you run enough threads. There is an API to query and set the printf buffer size, using cudaDeviceGetLimit and cudaDeviceSetLimit (thanks to Robert Crovella for the link to the printf documentation in the comments).
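As a minimal sketch of that API, the program below queries the printf buffer size and enlarges it before launching the kernel from the question. The 8 MiB value is an assumption chosen only for illustration, not a recommended setting; cudaLimitPrintfFifoSize is the limit name the CUDA runtime uses for this buffer. With a large enough buffer, all 64 * 128 = 8192 lines should appear.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void test() {
    printf("%d %d\n", blockIdx.x, threadIdx.x);
}

int main(void) {
    // Query the current size of the device-side printf buffer.
    size_t fifoBytes = 0;
    cudaError_t err = cudaDeviceGetLimit(&fifoBytes, cudaLimitPrintfFifoSize);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }
    fprintf(stderr, "printf buffer before: %zu bytes\n", fifoBytes);

    // Assumption: 8 MiB is an illustrative size, not a tuned value.
    // The limit has to be set before the kernel that calls printf is launched.
    err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8 * 1024 * 1024);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    test<<<64, 128>>>();

    // Synchronizing waits for the kernel and flushes the printf buffer.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The diagnostic messages go to stderr so that piping stdout through wc -l still counts only the kernel's lines.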
You can find the maximum number of threads a given kernel can run by looking here in the documentation.
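The same limits can also be read at run time instead of from the documentation. The sketch below assumes device 0 and a reasonably recent toolkit in which cudaFuncGetAttributes accepts the kernel symbol directly; the empty kernel is a hypothetical stand-in for whatever kernel is being checked.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; substitute the kernel of interest.
__global__ void test() {}

int main(void) {
    // Device-wide limits (assumption: the GPU of interest is device 0).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("max block dimensions: (%d, %d, %d)\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("max grid dimensions: (%d, %d, %d)\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // Per-kernel limit, which can be lower than the device limit when the
    // kernel uses many registers or much shared memory.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, test);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("max threads per block for this kernel: %d\n", attr.maxThreadsPerBlock);
    return 0;
}
```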