multithreading - Maximum number of threads for a CUDA kernel on Tesla M2050


I am testing the maximum number of threads for a simple kernel. I find that the total number of threads cannot exceed 4096. The code is as follows:

    #include <stdio.h>
    #define N 100

    // Each thread prints its block index and thread index.
    __global__ void test(){
        printf("%d %d\n", blockIdx.x, threadIdx.x);
    }

    int main(void){
        double *p;
        size_t size=N*sizeof(double);
        cudaMalloc(&p, size);
        test<<<64,128>>>();
        //test<<<128,64>>>();
        cudaFree(p);   // blocks until the kernel finishes, so its output is flushed
        return 0;
    }

My test environment: CUDA 4.2.9 on a Tesla M2050. The code is compiled with

 nvcc -arch=sm_20 test.cu 

While checking the output, I found that some combinations are missing. When I run the command

./a.out|wc -l 

I get 4096. When I check the specifications for CC 2.0, I find that the maximum number of blocks in the x, y, z dimensions is (1024,1024,512), and the maximum number of threads per block is 1024. So the kernel calls (either <<<64,128>>> or <<<128,64>>>) are well within the limits. Any idea?

NB: The CUDA memory operations are there to block the code, so that the output of the kernel is shown.

You are abusing kernel printf, and using it to judge how many threads can run is a nonsensical idea. The runtime has a limited buffer size for printf output, and you are simply overflowing it with output when you run enough threads. There is an API to query and set the printf buffer size, using cudaDeviceGetLimit and cudaDeviceSetLimit (thanks to Robert Crovella for the link to the printf documentation in the comments).
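As an illustration of that API, here is a minimal sketch that reads the printf buffer size with cudaDeviceGetLimit and enlarges it with cudaDeviceSetLimit before launching the kernel from the question. The 16 MiB value is an arbitrary assumption chosen for the example, not a recommended setting, and the diagnostics go to stderr so that piping stdout through `wc -l` still counts only the kernel's lines.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Same kernel as in the question: one line of output per thread.
__global__ void test() {
    printf("%d %d\n", blockIdx.x, threadIdx.x);
}

int main(void) {
    size_t fifo_bytes = 0;

    // Query the current size of the device-side printf buffer.
    if (cudaDeviceGetLimit(&fifo_bytes, cudaLimitPrintfFifoSize) != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit failed\n");
        return 1;
    }
    fprintf(stderr, "printf buffer before: %zu bytes\n", fifo_bytes);

    // Assumption for illustration: 16 MiB. The limit must be set
    // before the first kernel that uses printf is launched.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize,
                                         (size_t)16 * 1024 * 1024);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaDeviceGetLimit(&fifo_bytes, cudaLimitPrintfFifoSize);
    fprintf(stderr, "printf buffer after: %zu bytes\n", fifo_bytes);

    test<<<64, 128>>>();

    // Wait for the kernel to finish; this also flushes the printf buffer.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If the buffer is large enough to hold all of the output, `./a.out | wc -l` should then report one line for each of the 64 × 128 = 8192 threads.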

You can find the maximum number of threads a given kernel can run by looking here in the documentation.
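The same limits can also be read at run time instead of from the documentation. The sketch below is not part of the original answer; it assumes device 0 and uses cudaGetDeviceProperties for the device-wide limits and cudaFuncGetAttributes for the per-kernel limit, which can be lower than the device limit depending on the kernel's resource usage.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel from the question, used here only as the target of the query.
__global__ void test() {
    printf("%d %d\n", blockIdx.x, threadIdx.x);
}

int main(void) {
    // Device-wide limits (assumption: the GPU of interest is device 0).
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed\n");
        return 1;
    }
    printf("compute capability:    %d.%d\n", prop.major, prop.minor);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("max block dimensions:  %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("max grid dimensions:   %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // Per-kernel limit for this specific kernel.
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, test) != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed\n");
        return 1;
    }
    printf("max threads per block for test(): %d\n", attr.maxThreadsPerBlock);
    return 0;
}
```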

