MATLAB Answers

How do I optimize this code to run efficiently on the GPU?

Asked by Jan on 27 Nov 2013
Latest activity: edited by Joss Knight on 28 Nov 2013
Dear MATLAB users,
I am trying to optimize the function listed below for GPU computing. I have tried many versions of the GPU algorithm, but the GPU always seems to be slower than the CPU. I would really appreciate any suggestions or help.
%% Declaration of variables
K=4;
C11n = rand(K,508032);
[x1, x2] = size(C11n);
C22b = zeros(x1,x1*(x2/2),'double');
C2 = zeros(K,K,x2/2,'double');
E=eye(x1);
A=reshape(C11n,K,2,x2/2);
AT=permute(A,[2 1 3]);
%% CPU code
tic
for k=1:x2/2
C2(:,:,k)=E-(A(:,:,k)*inv(AT(:,:,k)*A(:,:,k))*AT(:,:,k));
end
toc
%% GPU code
% Declaration of variables
C22 = gpuArray(zeros(K,K,x2/2,'double'));
E=gpuArray(eye(x1));
A=gpuArray(reshape(C11n,K,2,x2/2));
tic
for k=1:x2/2
C22(:,:,k)=E-(A(:,:,k)*inv(AT(:,:,k)*A(:,:,k))*AT(:,:,k));
end
toc
With best regards,
Jan

  1 Comment

What GPU do you have?


2 Answers

Answer by Ashish Uthama on 27 Nov 2013

A quick 'air' code using pagefun:
tic
M = pagefun(@mtimes, A(:,:,1:x2/2), AT(:,:,1:x2/2));
M = pagefun(@mtimes, M, M);
C22 = repmat(E,[1 1 x2/2])-M;
toc
I would be curious to know if this works for you, and what times you get on your hardware.

  1 Comment

Thank you for the idea, I will try it and let you know...
I apologise: the first version of my code was wrong because I forgot the matrix inversion. The code is now corrected.
BTW, pagefun helps: this line gives a 10x speedup (M = pagefun(@mtimes, A(:,:,1:x2/2), AT(:,:,1:x2/2)); ). Now I need to figure out how to quickly invert every page of a 3D matrix. I will let you know.



Answer by Joss Knight on 28 Nov 2013 (edited by Joss Knight on 28 Nov 2013)

Are your matrices always 4x2? This results in AT*A being 2x2, so you can just calculate your inverses manually:
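function Ainv = batch2x2inv(A)
% Invert every 2x2 page of the 2x2xN array A using the closed-form formula
% Grab each matrix element as a 1x1xN vector
a = A(1,1,:);
b = A(1,2,:);
c = A(2,1,:);
d = A(2,2,:);
% Compute determinants (named detA to avoid shadowing the built-in det)
detA = a.*d - b.*c;
% Construct inverse: [d -b; -c a] divided by the determinant, page by page
Ainv = bsxfun(@rdivide, [d -b; -c a], detA);
end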
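If you want to convince yourself that the closed-form 2x2 inverse is safe to use, a small CPU-only check along these lines might help. It is only a sketch: the batch size N and the variable names are made up for illustration, and the formula is inlined so that the snippet does not depend on batch2x2inv being on the path.

```matlab
% Compare the closed-form 2x2 inverse with inv() on a small random batch (CPU only).
K = 4;
N = 5;                                % illustrative batch size, not from the thread
A = rand(K, 2, N);
maxErr = 0;
for k = 1:N
    ATA = A(:,:,k).' * A(:,:,k);      % 2x2 Gram matrix for page k
    a = ATA(1,1); b = ATA(1,2);
    c = ATA(2,1); d = ATA(2,2);
    closedForm = [d -b; -c a] / (a*d - b*c);
    pageErr = max(max(abs(closedForm - inv(ATA))));
    maxErr = max(maxErr, pageErr);
end
fprintf('Largest difference from inv: %g\n', maxErr);
```

For well-conditioned pages the difference should be down at rounding-error level. If AT*A is close to singular for some pages, both approaches will struggle.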
The relevant chunk of your code then uses pagefun, as Ashish suggests:
AT = pagefun(@transpose, A);
ATA = pagefun(@mtimes, AT, A);
invATA = batch2x2inv(ATA);
pinvA = pagefun(@mtimes, invATA, AT);
residual = pagefun(@mtimes, A, pinvA);
C22 = bsxfun(@minus, E, residual);
Your code now runs 6x faster than the CPU on my machine.
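To compare timings on your own hardware, something like the sketch below may be useful. It checks the batched GPU result against the original CPU loop on a smaller batch and calls wait(gpuDevice) around tic/toc, since GPU operations can return before the work has finished and plain tic/toc may then under-report. It assumes Parallel Computing Toolbox and a supported GPU. The batch size N and the names Acpu, Cref, Egpu, gpuTime and maxDiff are illustrative and not from the original code, and the 2x2 inverse is inlined so the snippet is self-contained.

```matlab
% Sketch only: validate the batched GPU computation against the CPU loop, then time it.
K = 4;
N = 1000;                             % illustrative batch size, not from the thread
Acpu = rand(K, 2, N);
E = eye(K);

% CPU reference, same formula as the original loop
Cref = zeros(K, K, N);
for k = 1:N
    Ak = Acpu(:,:,k);
    Cref(:,:,k) = E - Ak * inv(Ak.' * Ak) * Ak.';
end

% Batched GPU version, timed with explicit synchronisation
dev = gpuDevice;
A = gpuArray(Acpu);
Egpu = gpuArray(E);
wait(dev); tic
AT = pagefun(@transpose, A);
ATA = pagefun(@mtimes, AT, A);
a = ATA(1,1,:); b = ATA(1,2,:);
c = ATA(2,1,:); d = ATA(2,2,:);
invATA = bsxfun(@rdivide, [d -b; -c a], a.*d - b.*c);   % closed-form 2x2 inverse per page
pinvA = pagefun(@mtimes, invATA, AT);
C22 = bsxfun(@minus, Egpu, pagefun(@mtimes, A, pinvA));
wait(dev); gpuTime = toc;

% Bring the result back to the CPU and compare
maxDiff = max(abs(gather(C22(:)) - Cref(:)));
fprintf('GPU time %.4f s, max abs difference from CPU loop %g\n', gpuTime, maxDiff);
```

The first GPU call in a session usually includes one-off start-up cost, so it is worth running the timed section twice and looking at the second number.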
