- Reimplement with a loop instead of arrayfun. For 3x3 calls to EIG, however, I do not expect a speedup with gpuArray.
- Implement the algorithm for 3x3 eigenvalue decomposition manually to use inside your arrayfun call. You can probably find one on the internet. In this question someone claimed to have written it, you could try asking them for their code.
- Vectorize your algorithm rather than using arrayfun. The first part would assemble all the H matrices as a 3-by-3-by-numpoints array, the middle part would compute the eigen-decompositions of each page, the final part would extract your results. Problem is, I don't know how to do the middle part efficiently because pagefun doesn't support eig. Once again, implementing your own 3x3 version would help.
Is there any work around the error "Array concatenation is not supported"?
    6 views (last 30 days)
  
       Show older comments
    
I have a function which take two inputs (x,y) and return three outputs (out1,out2,out3). I want to evaluate this function for 2D grid of (x,y) points. To speed up, I want to use gpuarray(). My function evaluate eigenvalues of a 3 by 3 matrix at some point, so, it writes a 3-by-3 matrix. gpuarray() gives error that "Array concatenation is not supported.". Is there any way around it?
For simplicity, I have wrote a simple code and function here:
main code:
clear; clc;
N = 100;
%==== points on which I want to evaluate fun ======
a = 1;
xmin = -2*pi/(3*a); 
xmax = 4*pi/(3*a);
ymin = -2*pi/(sqrt(3)*a); 
ymax = 2*pi/(sqrt(3)*a);
xs = gpuArray.linspace(xmin,xmax,N+1); %array of x points
ys = gpuArray.linspace(ymin,ymax,N+1); %array of y points
xs(end)=[]; ys(end)=[];
[xGrid,yGrid] = meshgrid( xs, ys );
% ===== calling fun  =====
[A,B,C] = arrayfun( @fun,xGrid, yGrid );
output:
Array concatenation is not supported.
For more information see Tips.
 Error in 'fun' (line: 16)
function:
function [out1,out2,out3] = fun(kx,ky)
JNN = 0; 
JN = 4;
D = 1;
s = 1;
h11 = 0;
h12 = -(JN+1i*D)*s*(1+exp(1i*(-kx-sqrt(3)*ky)/2))...
    -JNN*s*(exp(-1i*kx)+exp(1i*(+kx-sqrt(3)*ky)/2));
h13 = -(JN-1i*D)*s*(1+exp(1i*(+kx-sqrt(3)*ky)/2))...
    -JNN*s*(exp(1i*kx)+exp(1i*(-kx-sqrt(3)*ky)/2));
h23 = -(JN+1i*D)*s*(1+exp(1i*kx))-2*JNN*s*exp(1i*kx/2)*cos(sqrt(3)/2*ky);
%it is where the error comes:
h = [h11 h12 h13;
    conj(h12) h11 h23;
    conj(h13) conj(h23) h11];
[evecs, evals] = eig(h);
v1 = evecs(:,1);
v2 = evecs(:,2);
v3 = evecs(:,3);
e1 = evals(1,1);
e2 = evals(2,2);
e3 = evals(3,3);
X = [                                                                                                                                                                            0, (exp(- (kx*1i)/2 - (3^(1/2)*ky*1i)/2)*(D*1i + JN)*1i)/2 + JNN*(exp(-kx*1i)*1i - (exp((kx*1i)/2 - (3^(1/2)*ky*1i)/2)*1i)/2), - JNN*(exp(kx*1i)*1i - (exp(- (kx*1i)/2 - (3^(1/2)*ky*1i)/2)*1i)/2) + (exp((kx*1i)/2 - (3^(1/2)*ky*1i)/2)*(D*1i - JN)*1i)/2
  conj(JNN)*((exp(- (conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*1i)/2 - exp(conj(kx)*1i)*1i) + (exp((conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*(- conj(JN) + conj(D)*1i)*1i)/2,                                                                                                                          0,                                                     - exp(kx*1i)*(D*1i + JN)*1i - JNN*exp((kx*1i)/2)*cos((3^(1/2)*ky)/2)*1i
 - conj(JNN)*((exp((conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*1i)/2 - exp(-conj(kx)*1i)*1i) + (exp(- (conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*(conj(JN) + conj(D)*1i)*1i)/2,            - exp(-conj(kx)*1i)*(- conj(JN) + conj(D)*1i)*1i + exp(-(conj(kx)*1i)/2)*conj(JNN)*cos((3^(1/2)*conj(ky))/2)*1i,                                                                                                                           0];
Y = [                                                                                                                                                                   0, (3^(1/2)*JNN*exp((kx*1i)/2 - (3^(1/2)*ky*1i)/2)*1i)/2 + (3^(1/2)*exp(- (kx*1i)/2 - (3^(1/2)*ky*1i)/2)*(D*1i + JN)*1i)/2, - (3^(1/2)*exp((kx*1i)/2 - (3^(1/2)*ky*1i)/2)*(D*1i - JN)*1i)/2 + (3^(1/2)*JNN*exp(- (kx*1i)/2 - (3^(1/2)*ky*1i)/2)*1i)/2
 (3^(1/2)*exp((conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*(- conj(JN) + conj(D)*1i)*1i)/2 - (3^(1/2)*exp(- (conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*conj(JNN)*1i)/2,                                                                                                                       0,                                                                            3^(1/2)*JNN*exp((kx*1i)/2)*sin((3^(1/2)*ky)/2)
 - (3^(1/2)*exp(- (conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*(conj(JN) + conj(D)*1i)*1i)/2 - (3^(1/2)*exp((conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*conj(JNN)*1i)/2,                                                       3^(1/2)*sin((3^(1/2)*conj(ky))/2)*exp(-(conj(kx)*1i)/2)*conj(JNN),                                                                                                                         0];
o1a = ( ((v2'*X*v1)*(v1'*Y*v2)) - ((v2'*Y*v1)*(v1'*X*v2)))/((e2-e1)^2);
o1b = ( ((v3'*X*v1)*(v1'*Y*v3)) - ((v3'*Y*v1)*(v1'*X*v3)))/((e3-e1)^2);
o1 = 1i*(o1a+o1b);
o2a = ( ((v1'*X*v2)*(v2'*Y*v1)) - ((v1'*Y*v2)*(v2'*X*v1)))/((e1-e2)^2);
o2b = ( ((v3'*X*v2)*(v2'*Y*v3)) - ((v3'*Y*v2)*(v2'*X*v3)))/((e3-e2)^2);
o2 = 1i*(o2a+o2b);
o3a = ( ((v1'*X*v3)*(v3'*Y*v1)) - ((v1'*Y*v3)*(v3'*X*v1)))/((e1-e3)^2);
o3b = ( ((v2'*X*v3)*(v3'*Y*v2)) - ((v2'*Y*v3)*(v3'*X*v2)))/((e2-e3)^2);
o3 = 1i*(o3a+o3b);
out1 = -real(o1); 
out2 = -real(o2);
out3 = -real(o3);
end
0 Comments
Answers (2)
  Joss Knight
    
 on 4 Nov 2020
        Yeah, you can't do anything with arrays in GPU arrayfun, only scalars. You have some options:
0 Comments
  Michael Croucher
      
 on 7 Nov 2020
        
      Edited: Michael Croucher
      
 on 7 Nov 2020
  
      As Joss said, what you want to do is outside the scope of what arrayfun can do but all is not lost if what you want is speed-up and you are happy with using the CPU.  For a start, there are some optimisations you can do with your function, fun.  Here, I have done a few for you:
function [A,B,C] = fun(kx,ky)
JNN = 0; 
JN = 4;
D = 1;
s = 1;
h11 = 0;
h12 = -(JN+1i*D)*s*(1+exp(1i*(-kx-sqrt(3)*ky)/2))...
    -JNN*s*(exp(-1i*kx)+exp(1i*(+kx-sqrt(3)*ky)/2));
h13 = -(JN-1i*D)*s*(1+exp(1i*(+kx-sqrt(3)*ky)/2))...
    -JNN*s*(exp(1i*kx)+exp(1i*(-kx-sqrt(3)*ky)/2));
h23 = -(JN+1i*D)*s*(1+exp(1i*kx))-2*JNN*s*exp(1i*kx/2)*cos(sqrt(3)/2*ky);
%it is where the error comes:
h = [h11      h12       h13;
    conj(h12) h11       h23;
    conj(h13) conj(h23) h11];
[evecs, evals] = eig(h);
v1 = evecs(:,1);
v2 = evecs(:,2);
v3 = evecs(:,3);
e1 = evals(1,1);
e2 = evals(2,2);
e3 = evals(3,3);
x11 = 0;
x21 =   conj(JNN)*((exp(- (conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*1i)/2 - exp(conj(kx)*1i)*1i) + (exp((conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*(- conj(JN) + conj(D)*1i)*1i)/2;
x31 = - conj(JNN)*((exp((conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*1i)/2 - exp(-conj(kx)*1i)*1i) + (exp(- (conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*(conj(JN) + conj(D)*1i)*1i)/2;
x32 = - exp(-conj(kx)*1i)*(- conj(JN) + conj(D)*1i)*1i + exp(-(conj(kx)*1i)/2)*conj(JNN)*cos((3^(1/2)*conj(ky))/2)*1i;
X = [x11,conj(x21),conj(x31);
     x21,x11,     conj(x32);
     x31,x32,     x11];
y11 = 0;
y21 = (3^(1/2)*exp((conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*(- conj(JN) + conj(D)*1i)*1i)/2 - (3^(1/2)*exp(- (conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*conj(JNN)*1i)/2;
y31 = - (3^(1/2)*exp(- (conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*(conj(JN) + conj(D)*1i)*1i)/2 - (3^(1/2)*exp((conj(kx)*1i)/2 + (3^(1/2)*conj(ky)*1i)/2)*conj(JNN)*1i)/2;
y32 = 3^(1/2)*sin((3^(1/2)*conj(ky))/2)*exp(-(conj(kx)*1i)/2)*conj(JNN);
Y = [y11,conj(y21),conj(y31);
     y21,y11,     conj(y32);
     y31,y32,     y11];
Xv1 = X*v1;
Xv2 = X*v2;
Xv3 = X*v3;
Yv1 = Y*v1;
Yv2 = Y*v2;
Yv3 = Y*v3;
o1a = ( ((v2'*Xv1)*(v1'*Yv2)) - ((v2'*Yv1)*(v1'*Xv2)))/((e2-e1)^2);
o1b = ( ((v3'*Xv1)*(v1'*Yv3)) - ((v3'*Yv1)*(v1'*Xv3)))/((e3-e1)^2);
o1 = 1i*(o1a+o1b);
o2a = ( ((v1'*Xv2)*(v2'*Yv1)) - ((v1'*Yv2)*(v2'*Xv1)))/((e1-e2)^2);
o2b = ( ((v3'*Xv2)*(v2'*Yv3)) - ((v3'*Yv2)*(v2'*Xv3)))/((e3-e2)^2);
o2 = 1i*(o2a+o2b);
o3a = ( ((v1'*Xv3)*(v3'*Yv1)) - ((v1'*Yv3)*(v3'*Xv1)))/((e1-e3)^2);
o3b = ( ((v2'*Xv3)*(v3'*Yv2)) - ((v2'*Yv3)*(v3'*Xv2)))/((e2-e3)^2);
o3 = 1i*(o3a+o3b);
A = -real(o1); 
B = -real(o2);
C = -real(o3);
end
I think there are more there too but this shows the idea. I changed arrayfun to a 1d for-loop:
clear; clc;
N = 100;
%==== points on which I want to evaluate fun ======
a = 1;
xmin = -2*pi/(3*a); 
xmax = 4*pi/(3*a);
ymin = -2*pi/(sqrt(3)*a); 
ymax = 2*pi/(sqrt(3)*a);
xs = linspace(xmin,xmax,N+1); %array of x points
ys = linspace(ymin,ymax,N+1); %array of y points
xs(end)=[]; ys(end)=[];
[xGrid,yGrid] = meshgrid( xs, ys );
% ===== calling fun  =====
disp(numel(xGrid));
A = zeros(size(xGrid));
B = zeros(size(xGrid));
C = zeros(size(xGrid));
tic
for i=1:numel(xGrid);
    [A(i) B(i) C(i)] = fun(xGrid(i),yGrid(i));
end
new=toc
Using my version of fun for N=100 takes 0.4165 seconds on my laptop whereas your version of fun took 0.6817 seconds.  
The next thing we can do is change that for-loop to a parfor-loop:
clear; clc;
N = 100;
%==== points on which I want to evaluate fun ======
a = 1;
xmin = -2*pi/(3*a); 
xmax = 4*pi/(3*a);
ymin = -2*pi/(sqrt(3)*a); 
ymax = 2*pi/(sqrt(3)*a);
xs = linspace(xmin,xmax,N+1); %array of x points
ys = linspace(ymin,ymax,N+1); %array of y points
xs(end)=[]; ys(end)=[];
[xGrid,yGrid] = meshgrid( xs, ys );
% ===== calling fun  =====
disp(numel(xGrid));
A = zeros(size(xGrid));
B = zeros(size(xGrid));
C = zeros(size(xGrid));
tic
parfor i=1:numel(xGrid);
    [A(i) B(i) C(i)] = fun(xGrid(i),yGrid(i));
end
new=toc
On my 4-core laptop I set up a 4 core pool 
parpool(4)
and now when I run the script, the runtime is down to 0.2647 seconds which gives a total speed-up of 2.5x which isn't quite as nice as I'd like.  However, I assume that you want a larger value of N for your real runs. Setting
N = 1000
will increase your problem size by a factor of 100.  Now, I get the following times on my laptop
- Your original fun but with the 1d for-loop instead of array fun: 68.147 seconds
- My fun and the 1d parfor-loop: 14.32 seconds
Which gives 4.76x speed-up.  I suspect that this will scale quite nicely with higher core counts.
0 Comments
See Also
Categories
				Find more on Parallel Computing Fundamentals in Help Center and File Exchange
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

