How do I optimize this code to run efficiently on the GPU?

1 view (last 30 days)
Dear Matlab user,
I try making optimization of function listed below for GPU computing. I try many version of GPU algorithm but look for me that always is GPU slower. I really appreciate any suggestion or help.
%%Declaration of variables
K=4;
C11n = rand(K,508032);
[x1, x2] = size(C11n);
C22b = zeros(x1,x1*(x2/2),'double');
C2 = zeros(K,K,x2/2,'double');
E=eye(x1);
A=reshape(C11n,K,2,x2/2);
AT=permute(A,[2 1 3]);
%%CPU code
tic
for k=1:x2/2
C2(:,:,k)=E-(A(:,:,k)*inv(AT(:,:,k)*A(:,:,k))*AT(:,:,k));
end
toc
%%GPU code
% Declaration of variables
C22 = gpuArray(zeros(K,K,x2/2,'double'));
E=gpuArray(eye(x1));
A=gpuArray(reshape(C11n,K,2,x2/2));
tic
for k=1:x2/2
C22(:,:,k)=E-(A(:,:,k)*inv(AT(:,:,k)*A(:,:,k))*AT(:,:,k));
end
toc
with best regards
Jan

Answers (2)

Ashish Uthama
Ashish Uthama on 27 Nov 2013
A quick 'air' code using pagefun:
tic
M = pagefun(@mtimes, A(:,:,1:x2/2), AT(:,:,1:x2/2));
M = pagefun(@mtimes, M, M);
C22 = repmat(E,[1 1 x2/2])-M;
toc
I would be curious to know if this works for you, and what times you get on your hardware.
  1 Comment
Jan
Jan on 27 Nov 2013
Edited: Jan on 27 Nov 2013
Thank you for idea, I will try and let you know...
I apologise, but first time I wrote bad code, I forgot for inversion of matrix, now is code corrected.
BTW, pagefun, help me, it is 10x times speed up (M = pagefun(@mtimes, A(:,:,1:x2/2), AT(:,:,1:x2/2)); ). Now I need figure out how do it quick inversion on every page of 3D matrix. I will inform you.

Sign in to comment.


Joss Knight
Joss Knight on 28 Nov 2013
Edited: Joss Knight on 28 Nov 2013
Are your matrices always 4x2? This results in AT*A being 2x2, so you can just calculate your inverses manually:
function Ainv = batch2x2inv(A)
% Grab each matrix element as a vector
a = A(1,1,:);
b = A(1,2,:);
c = A(2,1,:);
d = A(2,2,:);
% Compute determinants
det = a.*d - b.*c;
% Construct inverse
Ainv = bsxfun(@rdivide, [d -b; -c a], det);
end
...and the relevant chunk of your code also uses pagefun as Ashish suggests:
AT = pagefun(@transpose, A);
ATA = pagefun(@mtimes, AT, A);
invATA = batch2x2inv(ATA);
pinvA = pagefun(@mtimes, invATA, AT);
residual = pagefun(@mtimes, A, pinvA);
C22 = bsxfun(@minus, E, residual);
Your code now runs 6x faster than the CPU on my machine.

Categories

Find more on Parallel Computing Fundamentals in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!