Is it possible to use cuRAND with feval (Parallel Computing Toolbox)?
Hi,
I am trying to call feval (Parallel Computing Toolbox) on a kernel that uses the cuRAND library (<http://developer.nvidia.com/curand>), and I need to pass feval an argument of type curandState (needed to initialize the random generators in cuRAND).
I have something similar to:
K=parallel.gpu.CUDAKernel('kernel.ptx','kernel.cu');
[arg_out]=feval(K,arg_in, state);
"state" must be a curandState variable.
I tried tricking MATLAB with:
[arg_out]=feval(K,arg_in, 1);
But I got the following error message:
_Error using iParseToken (line 259) Unsupported type in argument specification "curandState * state".
Error in C:\Program Files\MATLAB\R2011b\toolbox\distcomp\gpu\+parallel\+internal\+gpu\handleKernelArgs.p>iParseCPrototype (line 181)
Error in C:\Program Files\MATLAB\R2011b\toolbox\distcomp\gpu\+parallel\+internal\+gpu\handleKernelArgs.p>handleKernelArgs (line 70)_
I have not found any information on Google. Could anyone please help me?
Thank you in advance.
María.
More Answers (2)
Edric Ellis
on 1 Feb 2012
For what it's worth, I have some example CUDA code and MATLAB driving code to show how one might use CURAND. First off, here's the CUDA code:
#include <curand_kernel.h>

// Number of bytes occupied by one curandState.
const size_t stateSize = sizeof( curandState );

// Copy one state, byte by byte, between the opaque byte array and a curandState.
__device__ void copyState( void * out, void const * in ) {
    unsigned char * outc = static_cast< unsigned char * >( out );
    unsigned char const * inc = static_cast< unsigned char const * >( in );
    for ( size_t i = 0; i < stateSize; ++i ) {
        outc[i] = inc[i];
    }
}

// Report the state size so that MATLAB can allocate enough bytes per thread.
__global__ void returnStateSize( unsigned int * value ) {
    value[0] = static_cast< unsigned int >( stateSize );
}

// Initialize one generator state per thread and store it in the byte array.
__global__ void initState( unsigned char * stateArray ) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    curandState state;
    curand_init( 1234, idx, 0, &state );
    copyState( stateArray + idx * stateSize, &state );
}

// Load this thread's state, draw one uniform double, and save the updated state.
__global__ void generate( double * x, unsigned char * stateArray ) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    curandState state;
    copyState( &state, stateArray + idx * stateSize );
    x[idx] = curand_uniform_double( &state );
    copyState( stateArray + idx * stateSize, &state );
}
And here's some MATLAB code which uses that:
import parallel.gpu.GPUArray;
% Get the number of bytes per thread of state.
stateSizeK = parallel.gpu.CUDAKernel( 'userand.ptx', 'userand.cu', 'returnStateSize' );
stateSz = double( gather( feval( stateSizeK, zeros( 'uint32' ) ) ) );
% Set up the random state
initK = parallel.gpu.CUDAKernel( 'userand.ptx', 'userand.cu', 'initState' );
initK.ThreadBlockSize = 256;
initK.GridSize = 10;
randState = feval( initK, GPUArray.zeros( stateSz, 256*10, 'uint8' ) );
genK = parallel.gpu.CUDAKernel( 'userand.ptx', 'userand.cu', 'generate' );
genK.ThreadBlockSize = 256;
genK.GridSize = 10;
% Generate some random numbers
[rand1, randState] = feval( genK, GPUArray.zeros(1, 256*10), randState );
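As a possible variation on the generate kernel above, the byte-by-byte copy can be replaced by a pointer cast inside the kernel. This is only a sketch under a few assumptions: the kernel name generateInPlace is made up for illustration, the state array is assumed to be aligned suitably for curandState (device allocations normally are, but I have not verified this for arrays created by MATLAB), and the launch is assumed to have exactly one thread per element, as in the driving code above. The parameter is still declared as unsigned char * so that the prototype contains only a basic type.

```cuda
#include <curand_kernel.h>

// Hypothetical variant of the generate kernel: same arguments, but the
// byte array is reinterpreted as an array of curandState instead of being
// copied byte by byte.
__global__ void generateInPlace( double * x, unsigned char * stateArray ) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;

    // Assumption: stateArray is suitably aligned for curandState and holds
    // one state per launched thread.
    curandState * states = reinterpret_cast< curandState * >( stateArray );

    // Work on a local copy, then write the advanced state back to global memory.
    curandState local = states[idx];
    x[idx] = curand_uniform_double( &local );
    states[idx] = local;
}
```

If this kernel is added to the same .cu file, it should be loadable in the same way as the others by passing 'generateInPlace' as the entry point name to parallel.gpu.CUDAKernel.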
María
on 31 Jan 2012