I create a CUDA kernel using KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC). Block size is computed from KERN.MaxThreadsPerBlock which may vary based on a function which is used to build the kernel. I presumed MaxThreadsPerBlock is only dependent on gpuDevice properties. So far, it seems there might be some connection to number of function parameters. Can someone explain how this is actually determined or am I missing something?
I'm using Matlab 2019b, GCC 8.3, CUDA Toolkit 10.1 with NVidia V100 (CC 7.0).