MATLAB Answers


Faster interp1 and indexing on GPU

Asked by mengya hu on 18 Nov 2019 at 2:03
Latest activity Commented on by mengya hu on 26 Nov 2019 at 0:08
Dear all,
This is my first time using MATLAB on a GPU.
I ran the benchmark code to test my GPU. For double precision, my GPU is around 50 times faster than my CPU.
I changed my input arrays into gpuArrays. The performance is shown in the figures. test_bi_grlt_pat*.m calls Bi_GLRT_patch1_1.m, which in turn calls Dnoisefun.m (Dnoisefun.m and noisefun.m are similar).
I am doing image processing. Bi_GLRT_patch1_1.m is basically gradient descent on each pixel. Dnoisefun.m calculates the gradient on each pixel. noisefun.m calculates the value on each pixel.
For CPU: [figure not available]
For GPU: [figure not available]
As we can see, the GPU is much slower than the CPU. The reasons appear to be: Dnoisefun.m and noisefun.m are called many times; 'interp1' should be faster on the GPU but does not seem to be; and the indexing operation 'result(result<0)' is very slow on the GPU.
Any advice on how to improve this?
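One pattern often suggested for this kind of bottleneck is to call interp1 once for all pixels rather than once per pixel, and to replace the logical-indexing step with an element-wise operation. The sketch below is only an illustration under stated assumptions: it assumes the indexing step sets negative values to zero (the question does not show the assignment), and Inten, DProb, and img are made-up stand-ins, not the data or code from the attached files.

```matlab
% Hypothetical stand-in data (not the contents of the asker's files).
Inten = linspace(0, 1, 256);          % sample points
DProb = sin(2*pi*Inten);              % sample values (go negative)
img   = rand(512, 512);               % one query value per pixel

% Use the GPU only when one is available; otherwise the same code runs on the CPU.
if canUseGPU()
    Inten = gpuArray(Inten);
    DProb = gpuArray(DProb);
    img   = gpuArray(img);
end

% One vectorized interp1 call for all pixels instead of one call per pixel.
result = interp1(Inten, DProb, img, 'linear', 'extrap');

% Assumed intent of result(result<0): clamp negatives to zero.
% max is element-wise, so no logical mask has to be built and applied.
clamped = max(result, 0);

% Reference version with logical indexing, for comparison.
ref = result;
ref(ref < 0) = 0;

disp(isequal(gather(clamped), gather(ref)));
```

Whether this helps in the real code depends on how much of the per-pixel gradient descent can be expressed on whole arrays; it is worth profiling both versions.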
Furthermore, I wrote a simple script to compare GPU and CPU performance for arrays of different sizes and orientations, where Inten and DProb are the x and y data for the interpolation:
gridSize = 1000000;
x =linspace(min(Inten),max(Inten),gridSize);
disp(size(x));
xg= gpuArray(x);
tic
result1=interp1(Inten,DProb,x,'linear','extrap' );
time1 = toc;
disp(time1)
x1=x';
tic
result2=interp1(Inten,DProb,x1,'linear','extrap' );
time2 = toc;
disp(time2)
tic
result3=interp1(Inten,DProb,xg,'linear','extrap' );
time3 = toc;
disp(time3)
xg1=xg';
tic
result4=interp1(Inten,DProb,xg1,'linear','extrap' );
time4 = toc;
disp(time4)
The performance is not very consistent for different trials. Here are some of the trials' results:
test_gpu
1 10000
8.0200e-04
2.8000e-04
3.2500e-04
1.2600e-04
>> clear
>> test_gpu
1 100000
9.7700e-04
8.8300e-04
0.0011
1.6100e-04
>> clear
>> test_gpu
1 1000000
0.0055
0.0048
5.1600e-04
9.3200e-04
>> clear
>> test_gpu
1 1000000
0.0051
0.0046
3.5500e-04
1.1500e-04
>> clear
>> test_gpu
1 1000000
0.0059
0.0043
3.7100e-04
1.1600e-04
>> clear
>> test_gpu
1 1000000
0.0058
0.0046
3.6500e-04
1.1900e-04
>> clear
>> test_gpu
1 1000000
0.0057
0.0047
6.5600e-04
0.0011
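Part of the run-to-run variation above may be a measurement artifact: gpuArray operations can run asynchronously, so a plain tic/toc pair may stop the clock before the GPU has finished, and a first call can include one-time overhead. A minimal sketch of a more stable measurement follows, using timeit and gputimeit; Inten and DProb here are synthetic stand-ins, not the asker's data.

```matlab
% Synthetic stand-ins for the asker's Inten/DProb (assumption, for illustration).
Inten = linspace(0, 1, 256);
DProb = Inten.^2;
gridSize = 1000000;
x = linspace(min(Inten), max(Inten), gridSize);

% CPU timing: timeit calls the function several times and reports a typical time.
tCpu = timeit(@() interp1(Inten, DProb, x, 'linear', 'extrap'));
fprintf('CPU interp1: %.3g s\n', tCpu);

if canUseGPU()
    % Move all inputs to the GPU once, outside the timed region.
    IntenG = gpuArray(Inten);
    DProbG = gpuArray(DProb);
    xg     = gpuArray(x);

    % gputimeit waits for the GPU to finish before reading the clock.
    tGpu = gputimeit(@() interp1(IntenG, DProbG, xg, 'linear', 'extrap'));
    fprintf('GPU interp1 (gputimeit): %.3g s\n', tGpu);

    % Manual pattern with tic/toc and explicit synchronization.
    dev = gpuDevice;
    wait(dev); tic
    resultG = interp1(IntenG, DProbG, xg, 'linear', 'extrap');
    wait(dev); tManual = toc;
    fprintf('GPU interp1 (tic/toc + wait): %.3g s\n', tManual);
end
```

Note that in the original test Inten and DProb stay on the CPU while only the query points are a gpuArray; keeping all three on the GPU, as in this sketch, avoids mixing host and device inputs inside the timed call.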
Similarly, the indexing performance is not consistent either:
clear
load('DDetectorProb.mat')
gridSize = 1000000;
x =linspace(min(Inten),max(Inten),gridSize);
xs=x;
ban = (min(Inten)+max(Inten))/2;
disp(size(x));
xg= gpuArray(x);
xgs = xg;
tic
xs(x>ban)=1;
time1 = toc;
disp(time1)
x1=x';
xs = x1;
tic
xs(x1>ban)=1;
time2 = toc;
disp(time2)
tic
xgs(xg>ban)=1;
time3 = toc;
disp(time3)
xg1=xg';
xg1s = xg1;
tic
xg1s(xg1>ban)=1;
time4 = toc;
disp(time4)
Results:
1 1000000
0.0031
0.0034
0.0010
0.0014
1 1000000
0.0032
0.0030
7.6000e-04
8.7700e-04
1 1000000
0.0032
0.0031
7.2500e-04
0.0021
1 1000000
0.0030
0.0031
7.7100e-04
0.0019
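For the indexed assignment itself, one alternative that is sometimes tried on the GPU is to blend with the mask using element-wise arithmetic instead of indexing. The sketch below only checks that the two forms give the same result on synthetic data (x and ban are stand-ins chosen for illustration); whether the arithmetic form is actually faster depends on the hardware and MATLAB release, so it should be measured rather than assumed.

```matlab
% Synthetic data standing in for the asker's grid (assumption).
gridSize = 1000000;
x   = linspace(0, 1, gridSize);
ban = 0.5;                         % threshold, playing the role of ban above

if canUseGPU()
    x = gpuArray(x);               % otherwise the same code runs on the CPU
end

% Original pattern: logical-indexed assignment.
xs = x;
xs(x > ban) = 1;

% Element-wise alternative: x where the mask is false, 1 where it is true.
% Valid only for finite x, since Inf .* 0 would give NaN.
mask  = x > ban;
xsAlt = x .* ~mask + mask;

disp(isequal(gather(xs), gather(xsAlt)));
```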

  4 Comments

mengya hu on 21 Nov 2019 at 16:59
Thanks. Just attached it.
Joss Knight on 25 Nov 2019 at 10:27
Hopefully your response through technical support was sufficient?
mengya hu on 26 Nov 2019 at 0:08
Thanks. Yes. Should I post the answers I received, or will you, for other users who may find this post later?



0 Answers