mexcuda Compile CUDA code with child kernels
Dear,
I cannot get CUDA MEX code to compile and run when I need to launch a child kernel from a parent kernel (both CUDA threads).
Let me outline the code:
#include "mex.h"
#include <stdlib.h>
#include <stdio.h>
#include <cuda_runtime.h>

#define GRIDx 32
#define BLOCKx 32

/* g_scal scales y = alpha*y */
__global__ void g_scal(double *y, double alpha, int n){
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for(int i = tid; i < n; i += blockDim.x*gridDim.x){
        y[i] = alpha*y[i];
    }
}

__global__ void mainKernel(double *x, double a, int n){
    //this kernel call is always invoked as <<<1,1,0,streamId>>>, so threadIdx.x = 0
    g_scal<<<GRIDx, BLOCKx>>>(x, a, n);
    cudaDeviceSynchronize();
    //yes for the full example, I need device synchronize mid kernel call. I am aware of the deprecated issue
}

extern "C" void launchKernel(double *x, double a, int n){
    //allocate gpu memory, memcpy
    //invoke, lets leave the streams alone for now
    mainKernel<<<1,1>>>(g_x, a, n);
    cudaDeviceSynchronize();
    //free memory
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    //Get pointers and cast, call launchKernel and define outputs
}
Everything compiles just fine in my own Windows terminal (omitting the mexFunction, since I cannot link mex.h correctly). However, no matter what I do, in the end I always get this error:
error LNK2019: unresolved external symbol
__fatbinwrap_e05b72bc_22_cuda_device_runtime_cu_66a51b0c_63316 referenced in function
__cudaRegisterLinkedBinary_e05b72bc_22_cuda_device_runtime_cu_66a51b0c_63316
cuADMMsolver.mexw64 : fatal error LNK1120: 1 unresolved externals
Here is the mex line (in MATLAB):
mexcuda -v -dynamic -DCUDA_FORCE_CDP1_IF_SUPPORTED cuADMMsolver.cu '-LC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\lib\x64' -lcudadevrt -lcudart_static -lcuda NVCC_FLAGS=-rdc=true NVCC_FLAGS=-arch=compute_50
Online, I found that many of these problems are related to not linking -lcudadevrt, which I am doing very explicitly. Does anyone know what is going wrong?
Thanks in advance
-Peter
Accepted Answer
Joss Knight
on 5 Mar 2025
Edited: Joss Knight
on 5 Mar 2025
I think the problem might be that you're trying to do the device binary linking yourself, whereas mexcuda is taking care of that for you. Plus you are trying to add your own toolkit library to the search path instead of letting MATLAB use its version. Remove all that stuff and just use the simple command
mexcuda -v -dynamic cuADMMsolver.cu
You can add back CUDA_FORCE_CDP1_IF_SUPPORTED and NVCC_FLAGS if you think you need them.
If this doesn't work, show me the verbose output printed by mexcuda. With a couple of tweaks I was able to get your code to compile.
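For anyone who wants a complete file to try the simple command on, here is a minimal sketch of how the placeholders in the question's outline might be filled in. It is not the tweaked version mentioned above. The calling convention y = cuADMMsolver(x, a), the checkCuda helper and the error identifier are assumptions made for illustration. The mid-kernel cudaDeviceSynchronize() is left out of the sketch: a parent grid does not complete until its child grids have completed, so the host-side synchronize is enough here.

```cuda
#include "mex.h"
#include <cuda_runtime.h>

#define GRIDx 32
#define BLOCKx 32

/* Child kernel: scales y = alpha*y with a grid-stride loop. */
__global__ void g_scal(double *y, double alpha, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = tid; i < n; i += blockDim.x * gridDim.x) {
        y[i] = alpha * y[i];
    }
}

/* Parent kernel: launched as <<<1,1>>>, starts the child grid. */
__global__ void mainKernel(double *x, double a, int n) {
    g_scal<<<GRIDx, BLOCKx>>>(x, a, n);
}

/* Hypothetical helper: turn a CUDA error into a MATLAB error.
   Note: mexErrMsgIdAndTxt does not return, so device memory
   allocated before a failure is not freed in this sketch. */
static void checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        mexErrMsgIdAndTxt("cuADMMsolver:cuda", "%s failed: %s",
                          what, cudaGetErrorString(err));
    }
}

/* Scales the host array x in place by a, using the GPU. */
extern "C" void launchKernel(double *x, double a, int n) {
    double *g_x = NULL;
    size_t bytes = (size_t)n * sizeof(double);

    checkCuda(cudaMalloc((void **)&g_x, bytes), "cudaMalloc");
    checkCuda(cudaMemcpy(g_x, x, bytes, cudaMemcpyHostToDevice),
              "cudaMemcpy (host to device)");

    mainKernel<<<1, 1>>>(g_x, a, n);
    checkCuda(cudaGetLastError(), "mainKernel launch");
    /* Waits for the parent grid, and therefore for the child grid too. */
    checkCuda(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    checkCuda(cudaMemcpy(x, g_x, bytes, cudaMemcpyDeviceToHost),
              "cudaMemcpy (device to host)");
    checkCuda(cudaFree(g_x), "cudaFree");
}

/* Assumed interface: y = cuADMMsolver(x, a), x real double, a real scalar. */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    if (nrhs != 2 || nlhs > 1) {
        mexErrMsgIdAndTxt("cuADMMsolver:usage", "Usage: y = cuADMMsolver(x, a)");
    }
    if (!mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]) ||
        !mxIsDouble(prhs[1]) || mxIsComplex(prhs[1]) ||
        mxGetNumberOfElements(prhs[1]) != 1) {
        mexErrMsgIdAndTxt("cuADMMsolver:type",
                          "x must be a real double array and a a real double scalar.");
    }

    /* Copy the input so the caller's array is not modified in place. */
    plhs[0] = mxDuplicateArray(prhs[0]);
    double a = mxGetScalar(prhs[1]);
    int n = (int)mxGetNumberOfElements(plhs[0]);

    launchKernel(mxGetPr(plhs[0]), a, n);
}
```

Under those assumptions, after building with mexcuda -v -dynamic cuADMMsolver.cu, a call such as y = cuADMMsolver(rand(1000,1), 2) would be expected to return the input scaled by 2. If the full code really does need to read the child kernel's results inside the parent, the device-side synchronize and CUDA_FORCE_CDP1_IF_SUPPORTED would have to be added back as described above.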