Faster interp1 and indexing on GPU

mengya hu on 18 Nov 2019
Commented: Walter Roberson on 12 Mar 2020
Dear all,
This is my first time using MATLAB on a GPU.
I ran the benchmark code to test my GPU. In double precision, my GPU is around 50 times faster than the CPU.
I converted my input array to a gpuArray. The performance is shown in the figures. test_bi_grlt_pat*.m calls Bi_GLRT_patch1_1.m, which in turn calls Dnoisefun.m (Dnoisefun.m and noisefun.m are similar).
I am doing image processing. Bi_GLRT_patch1_1.m is basically gradient descent on each pixel. Dnoisefun.m calculates the gradient at each pixel, and noisefun.m calculates the value at each pixel.
For CPU: [figure]
For GPU: [figure]
As we can see, the GPU is much slower than the CPU. The reasons appear to be: Dnoisefun.m and noisefun.m are called many times; 'interp1' should be faster on the GPU but does not seem to be; and the indexing operation 'result(result<0)' is very slow on the GPU.
Any advice on how to improve this?
Furthermore, I wrote a simple script to compare GPU and CPU performance for arrays of different sizes and orientations, where Inten and DProb are the x and y sample points for the interpolation:
gridSize = 1000000;
x = linspace(min(Inten), max(Inten), gridSize);
disp(size(x));
xg = gpuArray(x);
% CPU, row vector
tic
result1 = interp1(Inten, DProb, x, 'linear', 'extrap');
time1 = toc;
disp(time1)
% CPU, column vector
x1 = x';
tic
result2 = interp1(Inten, DProb, x1, 'linear', 'extrap');
time2 = toc;
disp(time2)
% GPU, row vector
tic
result3 = interp1(Inten, DProb, xg, 'linear', 'extrap');
time3 = toc;
disp(time3)
% GPU, column vector
xg1 = xg';
tic
result4 = interp1(Inten, DProb, xg1, 'linear', 'extrap');
time4 = toc;
disp(time4)
The performance is not very consistent across trials. Here are the results of some of the trials:
test_gpu
1 10000
8.0200e-04
2.8000e-04
3.2500e-04
1.2600e-04
>> clear
>> test_gpu
1 100000
9.7700e-04
8.8300e-04
0.0011
1.6100e-04
>> clear
>> test_gpu
1 1000000
0.0055
0.0048
5.1600e-04
9.3200e-04
>> clear
>> test_gpu
1 1000000
0.0051
0.0046
3.5500e-04
1.1500e-04
>> clear
>> test_gpu
1 1000000
0.0059
0.0043
3.7100e-04
1.1600e-04
>> clear
>> test_gpu
1 1000000
0.0058
0.0046
3.6500e-04
1.1900e-04
>> clear
>> test_gpu
1 1000000
0.0057
0.0047
6.5600e-04
0.0011
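One factor worth ruling out in the timings above: gpuArray operations can return control to MATLAB before the GPU has finished, so tic/toc may capture only part of the work unless wait(gpuDevice) is called before toc. A minimal sketch using timeit and gputimeit, which repeat the call and handle synchronization, follows. Inten and DProb here are synthetic stand-ins (an assumption, since the real data is not shown in the post), and canUseGPU needs R2019b or later.

```matlab
% Synthetic stand-ins for the poster's data (assumption, not the real values).
Inten = linspace(0, 1, 256);
DProb = exp(-Inten);

gridSize = 1000000;
x = linspace(min(Inten), max(Inten), gridSize);

% timeit calls the function several times and returns a typical run time.
tCpu = timeit(@() interp1(Inten, DProb, x, 'linear', 'extrap'));
fprintf('CPU interp1: %.3g s\n', tCpu);

if canUseGPU()
    % Move all inputs to the GPU once, outside the timed region.
    IntenG = gpuArray(Inten);
    DProbG = gpuArray(DProb);
    xg = gpuArray(x);

    % gputimeit waits for the GPU to finish before stopping the clock.
    tGpu = gputimeit(@() interp1(IntenG, DProbG, xg, 'linear', 'extrap'));
    fprintf('GPU interp1: %.3g s\n', tGpu);
else
    disp('No supported GPU found; skipping the GPU timing.');
end
```

If the numbers from this kind of measurement are stable from run to run, the variation in the trials above is probably a timing artifact and not a property of interp1 itself.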
Similarly, the indexing performance is not consistent either:
clear
load('DDetectorProb.mat')
gridSize = 1000000;
x = linspace(min(Inten), max(Inten), gridSize);
xs = x;
ban = (min(Inten) + max(Inten)) / 2;   % threshold at the midpoint of the range
disp(size(x));
xg = gpuArray(x);
xgs = xg;
% CPU, row vector
tic
xs(x > ban) = 1;
time1 = toc;
disp(time1)
% CPU, column vector
x1 = x';
xs = x1;
tic
xs(x1 > ban) = 1;
time2 = toc;
disp(time2)
% GPU, row vector
tic
xgs(xg > ban) = 1;
time3 = toc;
disp(time3)
% GPU, column vector
xg1 = xg';
xg1s = xg1;
tic
xg1s(xg1 > ban) = 1;
time4 = toc;
disp(time4)
Results:
1 1000000
0.0031
0.0034
0.0010
0.0014
1 1000000
0.0032
0.0030
7.6000e-04
8.7700e-04
1 1000000
0.0032
0.0031
7.2500e-04
0.0021
1 1000000
0.0030
0.0031
7.7100e-04
0.0019
  5 Comments
mengya hu on 26 Nov 2019
Thanks. Yes. Should I post the answers I receive, or will you, so that other users who find this post later can benefit?
Kyle Steiner on 12 Mar 2020
I'd be interested to see your response from technical support - would you be able to post?
Thanks!
Answers (1)

Walter Roberson on 12 Mar 2020
Instead of indexing, modify your lower boundary slightly and use min and max:
result = min(0.8, max(realmin, result));
The difference is that in your original code any value that was exactly 0 was left exactly 0 and negative values were changed to realmin (which is positive), whereas in this revised code values that are exactly 0 are changed to realmin as well.
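The difference described above can be checked on a small vector. This is a sketch only: the original clamping code is not shown in the thread, so the indexing form below (negative values to realmin, values above 0.8 to 0.8) is assumed from the description, and the sample values are made up.

```matlab
% Made-up sample values: one negative, one exact zero, one in range, one too large.
result = [-0.5, 0, 0.3, 1.2];

% Assumed form of the original code: clamp by logical indexing.
a = result;
a(a < 0) = realmin;
a(a > 0.8) = 0.8;

% Form suggested in the answer: clamp with min and max, no indexing.
b = min(0.8, max(realmin, result));

% The two versions agree everywhere except at the exact zero.
disp(a == b)   % displays 1 0 1 1
```

Note also that min and max ignore NaN by default, so a NaN in result would come out as realmin in the min/max version but would stay NaN with the indexing version.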
  1 Comment
Walter Roberson on 12 Mar 2020
Which is to say: don't do your own indexing on GPUs if you can avoid it. The architecture of NVIDIA GPUs makes indexing inefficient.
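To see whether this holds on a particular machine, the two clamping styles can be timed side by side. The sketch below uses random test data (an assumption, not the poster's images) and synchronizes with wait(gpuDevice) before each toc; it falls back to the CPU when no GPU is available, and canUseGPU needs R2019b or later. The indexing loop also pays for one array copy per repetition, so treat the result as a rough comparison only.

```matlab
useGpu = canUseGPU();
r = rand(1, 1e6) * 2 - 0.5;      % random test values between -0.5 and 1.5
if useGpu
    dev = gpuDevice;
    r = gpuArray(r);
end
nRep = 20;                        % repetitions to average over

% Style 1: clamp by logical indexing (includes a copy of r each pass).
if useGpu, wait(dev); end
tic
for k = 1:nRep
    a = r;
    a(a < 0) = realmin;
    a(a > 0.8) = 0.8;
end
if useGpu, wait(dev); end         % make sure the GPU has finished
tIdx = toc / nRep;

% Style 2: clamp with min and max, as suggested in the answer.
if useGpu, wait(dev); end
tic
for k = 1:nRep
    b = min(0.8, max(realmin, r));
end
if useGpu, wait(dev); end
tClamp = toc / nRep;

fprintf('indexing: %.3g s per pass, min/max: %.3g s per pass\n', tIdx, tClamp);
```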