Using feval() to launch CUDA kernels from MATLAB

Crni on 13 Dec 2017
Commented: Crni on 14 Dec 2017
I'm using feval() for rapid prototyping of tests for my CUDA kernels, written in CUDA C. It works like a charm, but I have two questions:
1. The explanation of the treatment of input/output kernel arguments in the feval() documentation leaves me wondering whether there is any data-copying overhead involved. Namely, let's say that I have the following kernel implemented in the file addAssign.cu:
__global__ void addAssign(int const n, float* __restrict__ y, float const* __restrict__ x)
{
/* do y[i] += x[i] here */
}
and that I call it from MATLAB as follows:
n = 10000;
x = gpuArray(single(rand(n, 1)));
y = gpuArray(single(rand(n, 1)));
kernel = parallel.gpu.CUDAKernel('addAssign.ptx', 'addAssign.cu');
kernel.ThreadBlockSize = [128, 1, 1];
kernel.GridSize = [ceil(n / 128), 1, 1];
y = feval(kernel, n, y, x);
So, is the MATLAB runtime going to do any data copying for the assignment in the last statement, or is it going to recognize that 'y' appears both as input and output, and that the kernel will update it in place? I'm asking because I've noticed that if I omit the assignment, the values of 'y' don't get updated.
2. How can I use gputimeit() with feval() run as above? It seems like gputimeit() discards the return values from the function passed as its argument.
Thanks.

Answers (1)

Edric Ellis on 14 Dec 2017
1. Input/output variables to CUDAKernel.feval are operated on in place in the usual way for MATLAB data. See this blog entry for more details. In other words, provided the type of y exactly matches the prototype of your kernel (i.e. it doesn't need to be cast to a different numeric type), and provided that it appears on the left-hand side too, it is eligible for in-place operation.
2. You could use the wait method of GPUDevice rather than gputimeit to ensure you're timing exactly what you expect. (Behind the scenes, gputimeit uses the wait method to ensure execution has completed).
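One way to combine gputimeit with feval is to wrap the call in an anonymous function handle; a minimal sketch, reusing the variables from the question (the optional second argument of gputimeit, the number of outputs to request, is assumed to be available in your release):

```matlab
% Time the kernel launch; gputimeit synchronizes the device internally.
% The handle captures kernel, n, y, and x from the workspace.
f = @() feval(kernel, n, y, x);
t = gputimeit(f, 1);  % request 1 output so the assignment path is timed too
fprintf('kernel time: %g s\n', t);
```

Note that gputimeit discards the outputs of f, so this only measures the timing; run feval separately afterwards to obtain the updated y.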
  1 Comment
Crni on 14 Dec 2017
Thanks for your reply. This is the behavior I was hoping for, but I was not aware of the additional requirement that in-place operation is possible only if the given piece of MATLAB code is within a function. Let me present a full example. Here is the kernel code; let's say we put it in the file addAssign.cu:
__global__ void addAssign(const int n, float* __restrict__ y,
                          const float* __restrict__ x)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int incr = gridDim.x * blockDim.x;
    if (idx == 0)
        printf("%p %p\n", x, y);
    for ( ; idx < n; idx += incr)
        y[idx] += x[idx];
}
And here is the MATLAB script:
n = 1000;
x = gpuArray(single(ones(n, 1)));
y = gpuArray(single(ones(n, 1)));
kernel = parallel.gpu.CUDAKernel('addAssign.ptx', 'addAssign.cu');
kernel.ThreadBlockSize = [128, 1, 1];
kernel.GridSize = [ceil(n / 128), 1, 1];
y = feval(kernel, n, y, x);
y = feval(kernel, n, y, x);
The idea is that the kernel prints the pointers, so that across successive kernel calls we can check whether there was any re-allocation. Now, if we compile the kernel with:
nvcc -ptx addAssign.cu
and run the script, it prints something like:
0x7fa6f0a00000 0x7fa6f0a20000
0x7fa6f0a00000 0x7fa6f0a10000
which means that y gets re-allocated. However, if I put the MATLAB code above in a function and call that function, the output is:
0x7fa6f0a20000 0x7fa6f0a30000
0x7fa6f0a20000 0x7fa6f0a30000
which means y is used in place. I don't quite understand why this restriction exists, but in any case it's good to know that in-place operation is possible.
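For completeness, here is a sketch of the function-wrapped version that produced the matching pointer output (the function name runAddAssign is my own choice):

```matlab
function y = runAddAssign()
% Same code as the script above, wrapped in a function. Inside a
% function, y is eligible for in-place updating, so both feval calls
% should reuse the same device buffer (the kernel prints identical
% pointers on both calls).
n = 1000;
x = gpuArray(single(ones(n, 1)));
y = gpuArray(single(ones(n, 1)));
kernel = parallel.gpu.CUDAKernel('addAssign.ptx', 'addAssign.cu');
kernel.ThreadBlockSize = [128, 1, 1];
kernel.GridSize = [ceil(n / 128), 1, 1];
y = feval(kernel, n, y, x);
y = feval(kernel, n, y, x);
end
```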
As far as measuring elapsed time is concerned, I guess what you suggested is replacing the feval() call in the script above with something like the following:
device = gpuDevice;
tic;
y = feval(kernel, n, y, x);
device.wait();
toc;
That's fine, but I was hoping that gputimeit() uses high-precision CUDA timers. Is that correct, and if so, is there a way to use gputimeit() in combination with feval()?

