Inexplicable different execution speeds when filling a matrix with complex entries

Question

2 votes

Hello everyone,

In my code, I got something disturbing regarding the execution speed. Here is the toy model where I isolated my issue:

n = 100000;
m = 10;
Me = ones(2,1); 
M = zeros(m,n);  % create a table M of length n that we partially fill with Me. The issue is the same for all m>1
count = 1; 
% I want to fill the to first rows of each column of M with coef*Me:
for i = 1 : n
    % Counter to see progress
    if i == floor(count * n/100) + 1   
        disp(['Done ', num2str(count), '/100']);
        count = count + 1;
    end
    % Filling with coef*M where coef can be either real or complex (arbitrary criterion n/2 just to highlight the issue)
    if i < n/2
        coef = 1;
    else
        coef = 1i;
    end
    M(1:2,i) = coef * Me;  % Fill in
end

The issue is the following: when

, we see that the execution speed is considerably decreased, due to the fact that

becomes complex.

However, if we take the opposite criterion

if i > n/2

then, the speed is constant, without issue. So why, when filling with complex numbers, do we encounter such a phenomenon direction?

I observed that when we fill with a complex, all the entries of M take the complex format "_ + _i". Hence, if we start filling with complex coefficients, all the entries of M are rewritten into the new format, whatever if we keep filling with complex or reals. But if we start with reals, then the first complex filling will retroactively convert the format of all previous entries.

If

is a scalar, there is no issue.

If

is complex at the beginning, then, as we would expect, it is the converse: the criterion

if i < n/2

works fine and

if i > n/2

leads to the issue.

I also tried to have

when odd i and

when even i (or the converse, just to switch frequently) and then, no issue. Or I think that the issue is somehow "dissolved" through the processing and we don't clearly observe the slowdown.

The fact is that my code deals with a million entries table M. It works well with the second criterion, but with the first I am finished. It seems to be very basic so I feel a bit puzzled. Any idea of what is the issue?

4 Comments
Show 2 older comments Hide 2 older comments

Walter Roberson on 12 Dec 2020

Open in MATLAB Online

Interesting!!

Further testing shows that the speed depends upon the proportion of individual column that are being written to has a complex value with non-zero term -- even when the rows being written into are not the ones that have the complex values!

If you test the below code, and put in r = 1 instead of the random value, you can get times for the first half that exceed 1 second per iteration, but then switch to about 0.4 seconds once the complex values start getting written.

As you put in smaller r (so filling the bottom of more and more columns) the code gets faster per iteration -- even when not dealing with the columns that have had a complex value written to them. So for example for r = .9 then it will take about .9 seconds per block instead of over 1 second per iteration.

If you put in r about 1/1000 the code zips along.

n = 100000;
m = 10;
Me = ones(2,1);
Me1 = Me;
Me2 = 1i*Me;
M = zeros(m,n);  % create a table M of length n that we partially fill with Me. The issue is the same for all m>1
r = rand / 5;
fprintf('last row UNfilled fraction: %g\n', r);
M(end,floor(r*end):end) = 1i;
count = 1; 
start_time = tic;
last_loop_start_time = start_time;
% I want to fill the to first rows of each column of M with coef*Me:
for i = 1 : n
    % Counter to see progress
    if i == floor(count * n/100) + 1
        since_beginning = toc(start_time);
        since_last = toc(last_loop_start_time);
        last_loop_start_time = tic;
        fprintf('Done %d/100, this section %g seconds, total since beginning %g seconds\n', count, since_last, since_beginning);
        count = count + 1;
    end
    % Filling with coef*M where coef can be either real or complex (arbitrary criterion n/2 just to highlight the issue)
    if i < n/2
        coef = Me1;
    else
        coef = Me2;
    end
    M(1:2,i) = coef;  % Fill in
end

Bruno Luong on 12 Dec 2020

Edited: Bruno Luong on 12 Dec 2020

If the speed is important, then don't create a full temporary matrix. This is a waste of time. Create directly your sparse FEM matrix.

david gasperini on 12 Dec 2020

The fact is that I build with the full matrix all the entries of my elementary fem matrices (hence this matrix is almost full, but some coefs, maybe one half, will remain zero). Then I spread it to my sparse global matrix using sparse. It is the best way to write it, since otherwise I would have to reallocate n a loop each entry of the global sparse matrix, which is the worst thing to do regarding time

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Bruno Luong on 13 Dec 2020

Edited: Bruno Luong on 13 Dec 2020

Open in MATLAB Online

0 votes

I also dig deeper into this "issue" of slowness.

The strange issue I discover is that when filling with real values into a complex array, the runtime seems to depend on the density of complex numbers in the preallocation array, as showed as this code and plot below.

The code initialize M with zero(m,n) and distribute p complex numbers. The begining and end are forced to be complexes. So the array should remain complex in the entire for-loop regardless the direction of scan/populate.

I bench for various values of the density (~1/p in the plot) and provide the result on my 2-year old laptop, R2020b update 3.

[ptab,t]= fillM(10,100000);
loglog(ptab,t)
xlabel('p number of complexes when array is initialized');
ylabel('execution time [s]')
grid on
title('real filling')
%%
function [ptab,t] = fillM(m,n)
%
dummynan = nan*1i;
ptab = 2.^(0:floor(log2(m*n)));
ptab = union(ptab,m*n); % the last point, who array is complex
t = zeros(size(ptab));
for k=1:length(ptab) % loop on density
    p = ptab(k);
    M = zeros(m,n);
    Me = ones(2,1);
    count = 1;
    
    % distribute p complex numbers in the allocate array
    inan = round(linspace(1,m*n,p));
    M(inan) = dummynan;
    
    tic
    % I want to fill the to first rows of each column of M with coef*Me:
    for i = 1 : n
        % Counter to see progress
        if i == floor(count * n/100) + 1
            disp(['Done ', num2str(count), '/100']);
            count = count + 1;
        end
        % Filling with coef*M where coef is real
        M(1:2,i) = 1*Me;  % Fill in, change 1 to 1i to switch to complex filling
    end
    % replace the dummmy nan by 0
    b = isnan(M(inan));
    M(inan(b)) = 0;
    t(k) = toc;
end % for
end

If we fill with complex

M(1:2,i) = 1i*Me; % instead of M(1:2,i) = 1*Me

the CPU time is fast and does not depend on the density of complex numbers distributed in the preallocation array.

It looks to me that when filling with REAL on COMPLEX array, the data deepcoying is (bi)multi-level. It proceed by block of around 100-1000 elements, then progressivey gather together to the entire array,

Not of my trick of filling NaN complex then undo later does not have significant speed penalty.

So there we go. It seems a candidate for workaround.

2 Comments
Show None Hide None

Walter Roberson on 13 Dec 2020

Bruno, is it possible that the block size is 128 ? That would correspond to 1024 bytes of real-only, 2048 bytes for inline-complex

Bruno Luong on 13 Dec 2020

Possibly Walter. 100 is really a gross estimation, 128 makes more sense.

Sign in to comment.

Answer 2

Bruno Luong on 11 Dec 2020

Edited: Bruno Luong on 12 Dec 2020

Open in MATLAB Online

0 votes

You should preallocate M as complex

M = 1i+zeros(m,n);

if you intend to populate it with complex number later in the for-loop.

Everytime a real array becomes complex it will take time. The opposite is also true.

I think it's better keep the array complex from start to finish.

14 Comments
Show 12 older comments Hide 12 older comments

Bruno Luong on 12 Dec 2020

Edited: Bruno Luong on 12 Dec 2020

Open in MATLAB Online

Walter:

"

M = complex(zeros(m,n), 0);

would be enough."

No, as soon as a real number is set to single element M becomes real. Useless.

To David : you can populate M with 1i*NaN then later replace them with 0s.

EDIT: I see James already propose this solution.

Another suggestion is populate the array with 1i*realmin and leave it as it is. The appreciation whereas 1i*realmin can be considered as practically 0s is leave to you to decide because you know how they will be used downstream.

Another option is to allocate M wih one more column with some dummy complex value, but never use it.

There is no really a good solution.

>> M=1i*realmin+zeros(5)
M =
  1.0e-307 *
   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i
   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i
   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i
   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i
   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i   0.0000 + 0.2225i
>> M(1)=1
M =
   1.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i
   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i
   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i
   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i
   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i   0.0000 + 0.0000i
>> 

david gasperini on 12 Dec 2020

Edited: david gasperini on 12 Dec 2020

Open in MATLAB Online

Ah yes exactly, i and j gives the position where to spread (so local to global for the nodes) and s my table M of all elementary matrices

, right?

If I well understood, it is already how I proceed:

M = zeros(9,msh.nb_TRI); % Initialize tables of elementary matrices
for i = 1 : msh.nb_TRI
    Me = getMassElem(i); % Get the elementary for element i
    M(1 : 9,i) = Me;    % Load it in the table
end
I_glob = msh.TRIANGLES(:,[1 2 3 1 2 3 1 2 3]).';  % Connectivity list of triangles, rows
J_glob = msh.TRIANGLES(:,[1 1 1 2 2 2 3 3 3]).';  % Connectivity list of triangles, columns
M_glob = sparse(I_glob,J_glob,M,msh.nbNod,msh.nbNod); % Build global

Is it what you mean? I precise that I cannot vectorize everything since I have to consider some complex coefs

that have to be computed for each element, in a way that I cannot avoid the loop. And the

's, which are sometimes complex, will be multiply, for i, to the corresponding

.

The fact is that in reality, I have for example up to N "other" sparse mass matrices

where the elementary ones are differently coefficiented (due to a coupling in my PDE). Then instead of creating N tables M, I create a big one of size (9N,n) and use sparsify on a part of M, like for instance for the 2nd table which gives the second mass

:

M_glob2 = sparse(I_glob,J_glob,M(1+9*2 : 9*3,:))

N could be 3 or either 20, so in this last case it is too large and Walter's technique does not work

Bruno Luong on 12 Dec 2020

So if I understand correctly the 0s in M are due to PDEs that are not directly coupled?

david gasperini on 12 Dec 2020

Yes, exactly

Sign in to comment.

Answer 3

James Tursa on 12 Dec 2020

Edited: James Tursa on 12 Dec 2020

Open in MATLAB Online

0 votes

When an array switches from real to complex in R2018a or later, a new memory allocation and deep data copy must occur to change the data from packed real-only to interleaved complex. So, extra memory gets allocated and all of the current values must get copied over as well. If you start the variable as complex to begin with, as Bruno suggests, this extra allocation and deep copy can be avoided. I would need to research what the fastest way to pre-allocate a complex variable is. Basically need to find a syntax that the MATLAB parser understands to avoid unnecessary copies etc.

I don't know the behind-the-scenes rules for when MATLAB checks to see if a variable should switch from one format to the other, or even the method/order it uses for the checking. Maybe by forcing a complex number to be in the first and last spots until the very end of your algorithm you might be able to short-circuit this checking midstream and avoid midstream changes. E.g.,

>> tic;C = 1i+zeros(m);toc
Elapsed time is 0.070991 seconds.
>> tic;D = zeros(m,m,'like',1i);toc
Elapsed time is 0.008061 seconds.
>> tic;clear M;M(m,m) = 1i;toc
Elapsed time is 0.304390 seconds.

C has complex 1i in every spot, but D and M do not. If we then do this:

>> tic;C(1,1)=0;toc % C remains complex
Elapsed time is 0.000113 seconds.
>> tic;D(1,1)=0;toc % D gets turned into packed real-only, so deep data copy
Elapsed time is 0.320612 seconds.

If we get rid of all of the complex values except the last one in C, watch what happens when we get rid of the last one:

>> tic;C(1:end-1)=0;toc % the last element is still complex
Elapsed time is 0.080873 seconds.
>> tic;C(end)=0;toc % the last element (and all of C) is now pure real
Elapsed time is 0.118038 seconds.

That last action on C(end) caused a deep data copy when it got converted from complex to pure real. There may or may not be an additional memory allocation when shrinking from complex to pure real ... again these rules are not published.

10 Comments
Show 8 older comments Hide 8 older comments

david gasperini on 12 Dec 2020

I trust you

But then I don't understand why, with large number of columns and small number of rows, when I preallocate a complex row with a complex, I get correct speed, whereas when I preallocate a column, it slows down

Paul on 12 Dec 2020

James,

Referring to the code in the original question, doesn't the deep copy from real to complex occur only once, i.e., the first time that i > n/2? But when I run the code I see it slow down for all iterations where i > n/2. So it seems that this issue relates more to how elements of M are accessed and assigned to, rather than whether or not M is preallocated real or complex?

There's a lot to digest in this thread, I'm just trying to understand it all ....

Sign in to comment.

Answer 4

david gasperini on 12 Dec 2020

Edited: david gasperini on 12 Dec 2020

Open in MATLAB Online

0 votes

As far I understand, Bruno's and James's solution work, but would need to replace NaN entries by 0, or every small entries if the goal is to sparsify.

Walter's one seems to be a good trade off: my table is of large length and relatively small width so I create an additional last row with 1i, which seems to be enough to force the format along the whole matrix, then I populate as wanted, and sparsify everything but the last row

M = [zeros(m,n,'like',1i); 1i + zeros(n,1)];
%apply the fill in
% then sparsify M(1:end-1,:) according to the expected size

Here we hope it will be sufficient for the different widths of my matrices, since we see that it forces the complex format only for a certain block around the complex entry. If to large, then the same issue appears.

It seems that if we put a complex entry A at (a1,a2), it forces the format for a block like (a1-50 : a1+10, a2-50 : a2+10)

It is definitely weird

We would need something to force in the whole matrix, regardless to its size...

2 Comments
Show None Hide None

Bruno Luong on 12 Dec 2020

Edited: Bruno Luong on 12 Dec 2020

"We would need something to force in the whole matrix, regardless to its size..."

I second that. Currently there is no-way to control the complexity and leave it persistent for a given array.

The automatic data deep copying sometime is anoying, command such as COMPLEX(A) is useless due to dynamic MATLAB decision to allocate internal data as real or complex. The matter of efficiency gets worse with recent complex data interleave.

James Tursa on 14 Dec 2020

Edited: James Tursa on 14 Dec 2020

Really, the only guaranteed way to keep a variable complex when assigning elements is to essentially bypass MATLAB processing and modify variables inplace via a mex routine. Not official, not recommended, has potential side effects, etc. etc. etc.

There are plenty of unused bits left in the flags int of the mxArray for a new "persistently complex" bit. This is where the "is 2D", "is complex", "is numeric", etc. bits are stored. But it would take an enhancement request to TMW to accomplish this of course.

Sign in to comment.

Answer 5

Walter Roberson on 13 Dec 2020

0 votes

test_complex.m

Test code included.

On the first line, set only_do_subset to false to run the larger suite of tests.

[1:2]no complex: real half min = 0.00167084, max = 0.0968766, mean = 0.00534692

[1:2]no complex: complex half min = 0.457809, max = 0.540868, mean = 0.486138

[1:2](1,:): real half min = 0.0214277, max = 0.539432, mean = 0.21429

[1:2](1,:): complex half min = 0.45741, max = 0.569145, mean = 0.489417

[1:2](:,1): real half min = 0.00175333, max = 0.12962, mean = 0.00614126

[1:2](:,1): complex half min = 0.00141391, max = 0.00303835, mean = 0.00224259

[1:2](end,:): real half min = 0.0014808, max = 0.090944, mean = 0.00523808

[1:2](end,:): complex half min = 0.00156304, max = 0.0023197, mean = 0.00202867

[1:2](:,end): real half min = 1.11345, max = 1.55058, mean = 1.17526

[1:2](:,end): complex half min = 0.458782, max = 0.709971, mean = 0.494252

[1](end,:): real half min = 0.00172286, max = 0.149381, mean = 0.00841403

[1](end,:): complex half min = 0.00109197, max = 0.00278375, mean = 0.00192156

[1,4](end,:): real half min = 0.0019306, max = 0.0885242, mean = 0.00633371

[1,4](end,:): complex half min = 0.00206219, max = 0.00495296, mean = 0.00286406

[1:4](end,:): real half min = 0.00207948, max = 0.0889896, mean = 0.00749856

[1:4](end,:): complex half min = 0.00206808, max = 0.0133852, mean = 0.00356371

[1:2](diagonal): real half min = 0.00146681, max = 0.0696898, mean = 0.00450054

[1:2](diagonal): complex half min = 0.00196183, max = 0.00275398, mean = 0.00212438

[1:2](end,1): real half min = 0.00156037, max = 0.0687157, mean = 0.00459935

[1:2](end,1): complex half min = 0.0014585, max = 0.00262812, mean = 0.00203858

The notation here is that the leading [] shows the row indices of each column that are written to. The part immediately after that talks which parts of M were pre-initialized to be complex-valued -- for example (end,:) meaning M(end,:)=1i was used for initialization. (diagonal) was linear (m+1:m+1:end) which runs down the +1 diagonal but wrapping as often as needed

The "real half" is the segments up to n/2 that have m1 written into the array locations. The "complex half" is the segments beyond that have m2 written into the array locations.

The "min", "max", and "mean" refer to timings in second for each of the groups of n/100. With n = 100000, each segment corresponds to 1000 columns.

Row index [1:2] corresponds to the original test M(1:2,i) = coef * Me where the first two rows of each column were written into. The subset also shows [1] and [1:4] to explore how much the number of rows written to affects the results, and shows [1,4] to compare to [1:2] to explore how much effect there is if the locations being written to are not consecutive rows.

Discussion

The worst case by far is [1:2](:,end) which is the case where each row has a complex value in it. Writing real values into slots that are 0 inside such an array is the slowest operation -- over 1 second per 1000 columns. Any discussion that claims that the writing speed should depend only on the global properties cannot account for this case.

Agggh... I just added a test that appears to account for major aspects of the time cases (though not necessarily the fine details)

After each potential assignment, MATLAB needs to test to see whether the new matrix is functionally complex-valued or not, in order to know whether to convert to real-only or complex-only. Imagine it does that by starting at the beginning of the matrix and scanning through the complex components in linear order until it encounters a non-zero complex value, or gets to the end. If it gets to the end and no values had non-zero complex component, it needs to convert the matrix to real-valued; otherwise if it was real-valued and a complex component is encountered, it needs to convert the array to complex-valued.

Now, if this is correct... then the portion of the array it has to travel through before it finds the non-zero complex value could have a major impact on the speed of the assignments.

It just so happens that the first half of the columns, David happened to write only real values. With M initialized to all 0, and with the case involving arrays both marked as real optimized, writing to the first half of the columns is fast. Then when you start writing the complex values in, each time the assignment hypothesizes that maybe the array is no longer complex valued, and starts scanning through it... and in this particular code, has to get about half way through the array before it finds a non-zero imaginary component and says "Okay, guess it is still complex-valued".

It is clear to me that there are heuristics that could be used to optimize the process. If MATLAB kept an approximate count of non-zero imaginary components, and decremented it once for each real-only value assigned in, but did not increment it for each imaginary component assigned in, and re-scanned the array each time the approximate count reached zero, then you could get away with a lot fewer checks.

My first thought had been to increase the approximate count for each complex value written in, but I realized that if you kept assigning to the same location, you could increase the count arbitrarily; if it was the only location that was complex valued then even just a single assignment of a real value could turn the entire array real-valued. But if you know you have 10 non-zero components imaginary components, and you have only written 5 real-only components so far, then no matter whether you wrote those to 5 different locations or to the same location, you would know logically that there must be at least 5 non-zero components out there somewhere. You might have more than 5 -- A(1)=0;A(1)=1i repeated 5 times would leave either 10 or 11 locations with non-zero components, so maybe there are more efficient ways that do not involve having to track each individual location, but this heuristic should be fairly cheap.

1 Comment
Show -1 older comments Hide -1 older comments

Paul on 13 Dec 2020

Open in MATLAB Online

Wouldn't it be quite surprising if Matlab actually scanned through all of the imaginary components either before or after the assignment to determine if the result needs to be converted from complex to double? I mean, wouldn't it be better to keep track of whether or not the imaginary part has any non-zero element as the matrix is created and then operated on?

On a related note, is there any way to make a complex matrix with zero imaginay part other than

M = complex(realpart,0)
M = complex(realpart)

In other words, if M already exists and is complex is there any way to force the imaginary part of M to zero but have M maintain its complex attribute w/o rewiting M?

Sign in to comment.

Inexplicable different execution speeds when filling a matrix with complex entries

4 Comments
Show 2 older comments Hide 2 older comments

Accepted Answer

2 Comments
Show None Hide None

More Answers (4)

14 Comments
Show 12 older comments Hide 12 older comments

10 Comments
Show 8 older comments Hide 8 older comments

2 Comments
Show None Hide None

1 Comment
Show -1 older comments Hide -1 older comments

Categories

Tags

Community Treasure Hunt

Inexplicable different execution speeds when filling a matrix with complex entries

4 Comments Show 2 older comments Hide 2 older comments

Accepted Answer

2 Comments Show None Hide None

More Answers (4)

14 Comments Show 12 older comments Hide 12 older comments

10 Comments Show 8 older comments Hide 8 older comments

2 Comments Show None Hide None

1 Comment Show -1 older comments Hide -1 older comments

Categories

Tags

See Also

Community Treasure Hunt

4 Comments
Show 2 older comments Hide 2 older comments

2 Comments
Show None Hide None

14 Comments
Show 12 older comments Hide 12 older comments

10 Comments
Show 8 older comments Hide 8 older comments

2 Comments
Show None Hide None

1 Comment
Show -1 older comments Hide -1 older comments