multithreading - Maximum number of threads for a CUDA kernel on Tesla M2050

I am testing the maximum number of threads for a simple kernel. I find that the total number of threads cannot exceed 4096. The code is as follows:

    #include <stdio.h>
    #define n 100

    // Each thread prints its block index and thread index.
    __global__ void test(){
        printf("%d %d\n", blockIdx.x, threadIdx.x);
    }

    int main(void){
        double *p;
        size_t size=n*sizeof(double);
        cudaMalloc(&p, size);
        test<<<64,128>>>();
        //test<<<64,128>>>();
        cudaFree(p);
        return 0;
    }

My test environment: CUDA 4.2.9 on a Tesla M2050. The code is compiled with

 nvcc -arch=sm_20 

While checking the output, I found that some combinations are missing. When I run the command

./a.out|wc -l 

I get 4096. When I check the CC 2.0 specifications, I find that the maximum number of blocks in the x, y, z dimensions is (1024, 1024, 512), and the maximum number of threads per block is 1024. The calls to the kernel (either <<<64,128>>> or <<<128,64>>>) are well within these limits. Any idea?

NB: The CUDA memory operations are there to block the code so that the output of the kernel is shown.

You are abusing kernel printf, and using it to judge how many threads can run is a nonsensical idea. The runtime has a limited buffer size for printf output, and you are overflowing it when you run enough threads. There is an API to query and set the printf buffer size, using cudaDeviceGetLimit and cudaDeviceSetLimit (thanks to Robert Crovella for the link to the printf documentation in the comments).
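As a rough sketch of that API, the program below queries the printf buffer size with cudaDeviceGetLimit and raises it with cudaDeviceSetLimit before launching the kernel from the question. The 8 MB value is an arbitrary assumption chosen for illustration, not a recommended setting, and the host messages go to stderr so that piping stdout through wc -l counts only kernel output.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Same kernel as in the question: one line of output per thread.
__global__ void test() {
    printf("%d %d\n", blockIdx.x, threadIdx.x);
}

int main(void) {
    // Query the current size of the device-side printf buffer.
    size_t fifoSize = 0;
    cudaError_t err = cudaDeviceGetLimit(&fifoSize, cudaLimitPrintfFifoSize);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }
    fprintf(stderr, "printf buffer before: %lu bytes\n", (unsigned long)fifoSize);

    // Enlarge the buffer. This must happen before the first kernel that
    // uses printf is launched. 8 MB is an assumed, illustrative value.
    err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 8 * 1024 * 1024);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
        return 1;
    }

    test<<<64, 128>>>();

    // Synchronizing flushes the device printf buffer to stdout.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If the enlarged buffer is big enough for all the output, ./a.out | wc -l should report one line per launched thread. If lines are still missing, the buffer is probably still too small for the amount of output.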

You can find the maximum number of threads a given kernel can run by looking here in the documentation.
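The same limits can also be read at run time instead of from the documentation tables. The sketch below assumes device 0 is the GPU of interest and uses an empty placeholder kernel named test. It prints the device-wide limits from cudaGetDeviceProperties and the per-kernel block size limit from cudaFuncGetAttributes, which can be lower than the device limit when a kernel uses many registers.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Placeholder kernel, used only so there is something to query.
__global__ void test() {}

int main(void) {
    // Device-wide limits (device 0 is an assumption).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("device: %s (compute capability %d.%d)\n",
           prop.name, prop.major, prop.minor);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("max block dimensions: (%d, %d, %d)\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("max grid dimensions: (%d, %d, %d)\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // Per-kernel limit, which depends on the kernel's resource usage.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, test);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("max threads per block for test(): %d\n", attr.maxThreadsPerBlock);
    return 0;
}
```

The total number of threads in a launch is bounded by the product of the grid and block limits, so a launch such as <<<64,128>>> is far below anything these queries report for a compute capability 2.0 device.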

