Inexplicable different execution speeds when filling a matrix with complex entries

4 views (last 30 days)
Hello everyone,
In my code, I got something disturbing regarding the execution speed. Here is the toy model where I isolated my issue:
n = 100000;
m = 10;
Me = ones(2,1);
M = zeros(m,n); % create a table M of length n that we partially fill with Me. The issue is the same for all m>1
count = 1;
% I want to fill the to first rows of each column of M with coef*Me:
for i = 1 : n
% Counter to see progress
if i == floor(count * n/100) + 1
disp(['Done ', num2str(count), '/100']);
count = count + 1;
end
% Filling with coef*M where coef can be either real or complex (arbitrary criterion n/2 just to highlight the issue)
if i < n/2
coef = 1;
else
coef = 1i;
end
M(1:2,i) = coef * Me; % Fill in
end
The issue is the following: when , we see that the execution speed is considerably decreased, due to the fact that becomes complex.
However, if we take the opposite criterion
if i > n/2
then, the speed is constant, without issue. So why, when filling with complex numbers, do we encounter such a phenomenon direction?
I observed that when we fill with a complex, all the entries of M take the complex format "_ + _i". Hence, if we start filling with complex coefficients, all the entries of M are rewritten into the new format, whatever if we keep filling with complex or reals. But if we start with reals, then the first complex filling will retroactively convert the format of all previous entries.
If is a scalar, there is no issue.
If is complex at the beginning, then, as we would expect, it is the converse: the criterion
if i < n/2
works fine and
if i > n/2
leads to the issue.
I also tried to have when odd i and when even i (or the converse, just to switch frequently) and then, no issue. Or I think that the issue is somehow "dissolved" through the processing and we don't clearly observe the slowdown.
The fact is that my code deals with a million entries table M. It works well with the second criterion, but with the first I am finished. It seems to be very basic so I feel a bit puzzled. Any idea of what is the issue?
  4 Comments
Bruno Luong
Bruno Luong on 12 Dec 2020
Edited: Bruno Luong on 12 Dec 2020
If the speed is important, then don't create a full temporary matrix. This is a waste of time. Create directly your sparse FEM matrix.
david gasperini
david gasperini on 12 Dec 2020
The fact is that I build with the full matrix all the entries of my elementary fem matrices (hence this matrix is almost full, but some coefs, maybe one half, will remain zero). Then I spread it to my sparse global matrix using sparse. It is the best way to write it, since otherwise I would have to reallocate n a loop each entry of the global sparse matrix, which is the worst thing to do regarding time

Sign in to comment.

Accepted Answer

Bruno Luong
Bruno Luong on 13 Dec 2020
Edited: Bruno Luong on 13 Dec 2020
I also dig deeper into this "issue" of slowness.
The strange issue I discover is that when filling with real values into a complex array, the runtime seems to depend on the density of complex numbers in the preallocation array, as showed as this code and plot below.
The code initialize M with zero(m,n) and distribute p complex numbers. The begining and end are forced to be complexes. So the array should remain complex in the entire for-loop regardless the direction of scan/populate.
I bench for various values of the density (~1/p in the plot) and provide the result on my 2-year old laptop, R2020b update 3.
[ptab,t]= fillM(10,100000);
loglog(ptab,t)
xlabel('p number of complexes when array is initialized');
ylabel('execution time [s]')
grid on
title('real filling')
%%
function [ptab,t] = fillM(m,n)
%
dummynan = nan*1i;
ptab = 2.^(0:floor(log2(m*n)));
ptab = union(ptab,m*n); % the last point, who array is complex
t = zeros(size(ptab));
for k=1:length(ptab) % loop on density
p = ptab(k);
M = zeros(m,n);
Me = ones(2,1);
count = 1;
% distribute p complex numbers in the allocate array
inan = round(linspace(1,m*n,p));
M(inan) = dummynan;
tic
% I want to fill the to first rows of each column of M with coef*Me:
for i = 1 : n
% Counter to see progress
if i == floor(count * n/100) + 1
disp(['Done ', num2str(count), '/100']);
count = count + 1;
end
% Filling with coef*M where coef is real
M(1:2,i) = 1*Me; % Fill in, change 1 to 1i to switch to complex filling
end
% replace the dummmy nan by 0
b = isnan(M(inan));
M(inan(b)) = 0;
t(k) = toc;
end % for
end
If we fill with complex
M(1:2,i) = 1i*Me; % instead of M(1:2,i) = 1*Me
the CPU time is fast and does not depend on the density of complex numbers distributed in the preallocation array.
It looks to me that when filling with REAL on COMPLEX array, the data deepcoying is (bi)multi-level. It proceed by block of around 100-1000 elements, then progressivey gather together to the entire array,
Not of my trick of filling NaN complex then undo later does not have significant speed penalty.
So there we go. It seems a candidate for workaround.
  2 Comments
Walter Roberson
Walter Roberson on 13 Dec 2020
Bruno, is it possible that the block size is 128 ? That would correspond to 1024 bytes of real-only, 2048 bytes for inline-complex

Sign in to comment.

More Answers (4)

Bruno Luong
Bruno Luong on 11 Dec 2020
Edited: Bruno Luong on 12 Dec 2020
You should preallocate M as complex
M = 1i+zeros(m,n);
if you intend to populate it with complex number later in the for-loop.
Everytime a real array becomes complex it will take time. The opposite is also true.
I think it's better keep the array complex from start to finish.
  14 Comments
Bruno Luong
Bruno Luong on 12 Dec 2020
So if I understand correctly the 0s in M are due to PDEs that are not directly coupled?

Sign in to comment.


James Tursa
James Tursa on 12 Dec 2020
Edited: James Tursa on 12 Dec 2020
When an array switches from real to complex in R2018a or later, a new memory allocation and deep data copy must occur to change the data from packed real-only to interleaved complex. So, extra memory gets allocated and all of the current values must get copied over as well. If you start the variable as complex to begin with, as Bruno suggests, this extra allocation and deep copy can be avoided. I would need to research what the fastest way to pre-allocate a complex variable is. Basically need to find a syntax that the MATLAB parser understands to avoid unnecessary copies etc.
I don't know the behind-the-scenes rules for when MATLAB checks to see if a variable should switch from one format to the other, or even the method/order it uses for the checking. Maybe by forcing a complex number to be in the first and last spots until the very end of your algorithm you might be able to short-circuit this checking midstream and avoid midstream changes. E.g.,
>> tic;C = 1i+zeros(m);toc
Elapsed time is 0.070991 seconds.
>> tic;D = zeros(m,m,'like',1i);toc
Elapsed time is 0.008061 seconds.
>> tic;clear M;M(m,m) = 1i;toc
Elapsed time is 0.304390 seconds.
C has complex 1i in every spot, but D and M do not. If we then do this:
>> tic;C(1,1)=0;toc % C remains complex
Elapsed time is 0.000113 seconds.
>> tic;D(1,1)=0;toc % D gets turned into packed real-only, so deep data copy
Elapsed time is 0.320612 seconds.
If we get rid of all of the complex values except the last one in C, watch what happens when we get rid of the last one:
>> tic;C(1:end-1)=0;toc % the last element is still complex
Elapsed time is 0.080873 seconds.
>> tic;C(end)=0;toc % the last element (and all of C) is now pure real
Elapsed time is 0.118038 seconds.
That last action on C(end) caused a deep data copy when it got converted from complex to pure real. There may or may not be an additional memory allocation when shrinking from complex to pure real ... again these rules are not published.
  10 Comments
david gasperini
david gasperini on 12 Dec 2020
I trust you
But then I don't understand why, with large number of columns and small number of rows, when I preallocate a complex row with a complex, I get correct speed, whereas when I preallocate a column, it slows down
Paul
Paul on 12 Dec 2020
James,
Referring to the code in the original question, doesn't the deep copy from real to complex occur only once, i.e., the first time that i > n/2? But when I run the code I see it slow down for all iterations where i > n/2. So it seems that this issue relates more to how elements of M are accessed and assigned to, rather than whether or not M is preallocated real or complex?
There's a lot to digest in this thread, I'm just trying to understand it all ....

Sign in to comment.


david gasperini
david gasperini on 12 Dec 2020
Edited: david gasperini on 12 Dec 2020
As far I understand, Bruno's and James's solution work, but would need to replace NaN entries by 0, or every small entries if the goal is to sparsify.
Walter's one seems to be a good trade off: my table is of large length and relatively small width so I create an additional last row with 1i, which seems to be enough to force the format along the whole matrix, then I populate as wanted, and sparsify everything but the last row
M = [zeros(m,n,'like',1i); 1i + zeros(n,1)];
%apply the fill in
% then sparsify M(1:end-1,:) according to the expected size
Here we hope it will be sufficient for the different widths of my matrices, since we see that it forces the complex format only for a certain block around the complex entry. If to large, then the same issue appears.
It seems that if we put a complex entry A at (a1,a2), it forces the format for a block like (a1-50 : a1+10, a2-50 : a2+10)
It is definitely weird
We would need something to force in the whole matrix, regardless to its size...
  2 Comments
Bruno Luong
Bruno Luong on 12 Dec 2020
Edited: Bruno Luong on 12 Dec 2020
"We would need something to force in the whole matrix, regardless to its size..."
I second that. Currently there is no-way to control the complexity and leave it persistent for a given array.
The automatic data deep copying sometime is anoying, command such as COMPLEX(A) is useless due to dynamic MATLAB decision to allocate internal data as real or complex. The matter of efficiency gets worse with recent complex data interleave.
James Tursa
James Tursa on 14 Dec 2020
Edited: James Tursa on 14 Dec 2020
Really, the only guaranteed way to keep a variable complex when assigning elements is to essentially bypass MATLAB processing and modify variables inplace via a mex routine. Not official, not recommended, has potential side effects, etc. etc. etc.
There are plenty of unused bits left in the flags int of the mxArray for a new "persistently complex" bit. This is where the "is 2D", "is complex", "is numeric", etc. bits are stored. But it would take an enhancement request to TMW to accomplish this of course.

Sign in to comment.


Walter Roberson
Walter Roberson on 13 Dec 2020
Test code included.
On the first line, set only_do_subset to false to run the larger suite of tests.
[1:2]no complex: real half min = 0.00167084, max = 0.0968766, mean = 0.00534692
[1:2]no complex: complex half min = 0.457809, max = 0.540868, mean = 0.486138
[1:2](1,:): real half min = 0.0214277, max = 0.539432, mean = 0.21429
[1:2](1,:): complex half min = 0.45741, max = 0.569145, mean = 0.489417
[1:2](:,1): real half min = 0.00175333, max = 0.12962, mean = 0.00614126
[1:2](:,1): complex half min = 0.00141391, max = 0.00303835, mean = 0.00224259
[1:2](end,:): real half min = 0.0014808, max = 0.090944, mean = 0.00523808
[1:2](end,:): complex half min = 0.00156304, max = 0.0023197, mean = 0.00202867
[1:2](:,end): real half min = 1.11345, max = 1.55058, mean = 1.17526
[1:2](:,end): complex half min = 0.458782, max = 0.709971, mean = 0.494252
[1](end,:): real half min = 0.00172286, max = 0.149381, mean = 0.00841403
[1](end,:): complex half min = 0.00109197, max = 0.00278375, mean = 0.00192156
[1,4](end,:): real half min = 0.0019306, max = 0.0885242, mean = 0.00633371
[1,4](end,:): complex half min = 0.00206219, max = 0.00495296, mean = 0.00286406
[1:4](end,:): real half min = 0.00207948, max = 0.0889896, mean = 0.00749856
[1:4](end,:): complex half min = 0.00206808, max = 0.0133852, mean = 0.00356371
[1:2](diagonal): real half min = 0.00146681, max = 0.0696898, mean = 0.00450054
[1:2](diagonal): complex half min = 0.00196183, max = 0.00275398, mean = 0.00212438
[1:2](end,1): real half min = 0.00156037, max = 0.0687157, mean = 0.00459935
[1:2](end,1): complex half min = 0.0014585, max = 0.00262812, mean = 0.00203858
The notation here is that the leading [] shows the row indices of each column that are written to. The part immediately after that talks which parts of M were pre-initialized to be complex-valued -- for example (end,:) meaning M(end,:)=1i was used for initialization. (diagonal) was linear (m+1:m+1:end) which runs down the +1 diagonal but wrapping as often as needed
The "real half" is the segments up to n/2 that have m1 written into the array locations. The "complex half" is the segments beyond that have m2 written into the array locations.
The "min", "max", and "mean" refer to timings in second for each of the groups of n/100. With n = 100000, each segment corresponds to 1000 columns.
Row index [1:2] corresponds to the original test M(1:2,i) = coef * Me where the first two rows of each column were written into. The subset also shows [1] and [1:4] to explore how much the number of rows written to affects the results, and shows [1,4] to compare to [1:2] to explore how much effect there is if the locations being written to are not consecutive rows.
Discussion
The worst case by far is [1:2](:,end) which is the case where each row has a complex value in it. Writing real values into slots that are 0 inside such an array is the slowest operation -- over 1 second per 1000 columns. Any discussion that claims that the writing speed should depend only on the global properties cannot account for this case.
Agggh... I just added a test that appears to account for major aspects of the time cases (though not necessarily the fine details)
After each potential assignment, MATLAB needs to test to see whether the new matrix is functionally complex-valued or not, in order to know whether to convert to real-only or complex-only. Imagine it does that by starting at the beginning of the matrix and scanning through the complex components in linear order until it encounters a non-zero complex value, or gets to the end. If it gets to the end and no values had non-zero complex component, it needs to convert the matrix to real-valued; otherwise if it was real-valued and a complex component is encountered, it needs to convert the array to complex-valued.
Now, if this is correct... then the portion of the array it has to travel through before it finds the non-zero complex value could have a major impact on the speed of the assignments.
It just so happens that the first half of the columns, David happened to write only real values. With M initialized to all 0, and with the case involving arrays both marked as real optimized, writing to the first half of the columns is fast. Then when you start writing the complex values in, each time the assignment hypothesizes that maybe the array is no longer complex valued, and starts scanning through it... and in this particular code, has to get about half way through the array before it finds a non-zero imaginary component and says "Okay, guess it is still complex-valued".
It is clear to me that there are heuristics that could be used to optimize the process. If MATLAB kept an approximate count of non-zero imaginary components, and decremented it once for each real-only value assigned in, but did not increment it for each imaginary component assigned in, and re-scanned the array each time the approximate count reached zero, then you could get away with a lot fewer checks.
My first thought had been to increase the approximate count for each complex value written in, but I realized that if you kept assigning to the same location, you could increase the count arbitrarily; if it was the only location that was complex valued then even just a single assignment of a real value could turn the entire array real-valued. But if you know you have 10 non-zero components imaginary components, and you have only written 5 real-only components so far, then no matter whether you wrote those to 5 different locations or to the same location, you would know logically that there must be at least 5 non-zero components out there somewhere. You might have more than 5 -- A(1)=0;A(1)=1i repeated 5 times would leave either 10 or 11 locations with non-zero components, so maybe there are more efficient ways that do not involve having to track each individual location, but this heuristic should be fairly cheap.
  1 Comment
Paul
Paul on 13 Dec 2020
Wouldn't it be quite surprising if Matlab actually scanned through all of the imaginary components either before or after the assignment to determine if the result needs to be converted from complex to double? I mean, wouldn't it be better to keep track of whether or not the imaginary part has any non-zero element as the matrix is created and then operated on?
On a related note, is there any way to make a complex matrix with zero imaginay part other than
M = complex(realpart,0)
M = complex(realpart)
In other words, if M already exists and is complex is there any way to force the imaginary part of M to zero but have M maintain its complex attribute w/o rewiting M?

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!