Inefficient or slow use of colon-operator within a matrix

Dear all,
I've been struggling with the performance of my code for a while, and I think I've just found the big issue.
It appears that using the colon operator inside a matrix subscript (to include all subscripts in a particular array dimension) is incredibly slow or inefficient:
Data = ones(500,500,500);
clear Data2 Data3;
tic
Data2 = Data;
toc; %--> Elapsed time is 0.006142 seconds.
tic
Data3 = Data(:,:,:,1);
toc; %--> Elapsed time is 0.430237 seconds.
Why is using ":" (colons) 70 times (!!) slower? Is there a way to speed this up? I need the flexibility of "Data(:,:,:,d)" because in some cases my data has more dimensions.
Many thanks in advance. Looking forward to your response.
Kees

4 Comments

"It appeared that the use of colon-operator ... is incredibly slow or inefficient"

Unfortunately those timings are meaningless as they compare apples with oranges.

"Why is the use of ":" / Colons 70 times (!!) slower"

Slower than what: not doing that operation? Not doing an operation takes zero seconds, so the factor is infinity.

Note that you are comparing two totally different sequences of operations:

  • assignment of a variable to a new variable, which uses no new memory (it just adds a reference to the same array).
  • subscript indexing into a variable, which creates a new array in memory for the extracted elements (all 125 million of them), followed by assignment of that array to a new variable.

So really all you are measuring is the time required to create a 125 million element array.
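The 125 million figure is just 500 × 500 × 500, and at 8 bytes per double that is 1e9 bytes (about 1 GB) that the indexed version has to allocate and fill. A minimal sketch to check this, assuming the same 500×500×500 array as in the question: the fourth subscript addresses a trailing singleton dimension, so Data(:,:,:,1) selects every element and the result is a full duplicate.

```matlab
% Sketch: confirm that Data(:,:,:,1) selects every element of Data,
% so the indexing expression has to build a complete copy.
Data = ones(500,500,500);        % 500^3 = 125e6 doubles

Data3 = Data(:,:,:,1);           % 4th subscript hits the trailing singleton dimension
assert(isequal(size(Data3), size(Data)));   % same shape: nothing was dropped

info = whos('Data3');            % whos reports the memory used by Data3
fprintf('%d elements, %d bytes\n', numel(Data3), info.bytes);
```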

@Stephen, since the last statement in the OP's code, Data3 = Data(:,:,:,1);, also assigns the complete matrix Data to Data3, it is reasonable to expect the MATLAB execution engine to see this in advance and optimize the execution.

"Indexing takes time, because operations take time."

Then how do you explain the following execution times? Here I am also indexing Data:

Data = ones(500,500,500);
clear Data2 Data3;
tic
Data2 = Data;
toc; %--> Elapsed time is 0.000034 seconds.
tic
Data3 = reshape(Data(:), size(Data));
toc; %--> Elapsed time is 0.000073 seconds.

@Ameer,

In terms of memory, Data(:) is just a reshape, which means that the only thing that needs copying into new memory is the matrix header (which specifies the size of the matrix). The matrix content itself does not need copying. The explicit reshape is the same: only the header needs copying. So Data2, Data, the temporary Data(:) and Data3 all point to the same chunk of memory, which is why it takes no time.

You're going to incur the copying penalty as soon as you change just one value in any of these shared copies.
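A small sketch of that deferred cost, assuming the same 500×500×500 array: the reshape only creates a new header, and the full copy is paid on the first write. Absolute timings will vary by machine; only the relative difference matters.

```matlab
% Sketch: reshape(Data(:), size(Data)) shares Data's element storage,
% so the expensive copy is deferred until the first write (copy-on-write).
Data = ones(500,500,500);

tic
Data3 = reshape(Data(:), size(Data));   % new header only, data still shared
tShare = toc;

tic
Data3(1) = 2;                           % first write: the whole array is copied first
tWrite = toc;

fprintf('reshape: %.6f s, first write: %.6f s\n', tShare, tWrite);
assert(Data(1) == 1 && Data3(1) == 2);  % the original Data is unaffected
```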

Note: you can use James Tursa's isshared to verify this.

Thanks @Guillaume, copy-on-write explains the execution times completely. However, it is still not clear when MATLAB will use copy-on-write. As you mentioned, Data(:) is just a reshape; similarly, the colon operator documentation page describes Data(:,:) as a reshape operation, but executing the following lines shows that this does not create a reference to Data; instead, a copy is created immediately:

tic
  Data3 = Data(:, :);
toc  % Elapsed time is 0.600342 seconds.


 Accepted Answer

Yes, as Stephen says, your comparison is not valid. In particular, because of MATLAB's copy-on-write behaviour, the two operations are completely different.
Data2 = Data;
In the background, all that does is create a new variable Data2 and tell it to find its content where the content of Data is stored. No copying occurs (yet).
Data3 = Data(:, :, :, 1);
This also creates a new variable Data3 (same step as before). But now, since you're taking a slice of Data, MATLAB can't just tell it to find its content where Data is stored. Instead, it has to copy that slice into newly allocated memory. That takes time.
Now after that, if you did:
Data3(1) = 2;
That would be pretty much instant. Data3 is the only variable pointing to its chunk of memory, since the data copy has already occurred. Hence changing one value in that memory is quick.
Data2(1) = 2;
Now, since Data2 and Data share the same memory, before MATLAB can change that Data2 value it must first copy the whole array to a new memory location so that Data2 and Data no longer share it. That single line takes about as long as your slice operation:
>> tic; Data2(1) = 2; toc
Elapsed time is 0.478432 seconds.
>> tic;Data3(1) = 2; toc
Elapsed time is 0.000584 seconds.

6 Comments

After the explanation of the copy-on-write nature of the MATLAB execution engine, the execution times make complete sense. But it is still not clear when MATLAB will copy on write and when it will create a copy immediately, since in my code snippet in the question's comments it again uses copy-on-write despite the Data(:) indexing.

I'm so happy to see others are struggling with similar things as well. And thanks for the explanation; I now understand better why there is such a significant difference in execution times.

However, if I did something similar in C, for example, I might use pointers to the part of the data I want to read. Is there nothing similar available in MATLAB? Or is there any workaround/solution to retrieve a "concatenated data part from the memory"?

Thanks again!

@Kees, if you need C-like reference semantics, you can define your own handle class. Variables of that class behave much like references (pointers) in C.
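A minimal handle-class sketch of that idea. The class name DataRef and its property Data are hypothetical, and, like any classdef, it has to live in its own file (DataRef.m). It only provides reference semantics for the object as a whole; it is not a pointer into a slice.

```matlab
% DataRef.m  (hypothetical class name; a classdef must live in its own file)
% Minimal handle class: copying a DataRef variable copies the handle,
% not the wrapped array, so every copy refers to the same object.
classdef DataRef < handle
    properties
        Data   % the wrapped array
    end
    methods
        function obj = DataRef(data)
            obj.Data = data;   % store the array once, inside the object
        end
    end
end
```

With this on the path, r = DataRef(ones(500,500,500)); r2 = r; r2.Data(1) = 2; leaves r.Data(1) equal to 2, because r and r2 are the same object. Indexing such as r.Data(:,:,:,1) still builds a new array, exactly as with an ordinary variable.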

Hi Ameer, yes I understand, but I only mentioned C as an example. I have to "stay within Matlab".
But I don't want to get into all kinds of inconvenient uses of cell arrays like Data{1}, etc.

For your reference, handle classes are part of the core MATLAB language and are not related to C; they just behave similarly.

Also, it seems that @Guillaume has answered your question. Please accept the answer so that other people searching for the same problem can benefit from it.

MathWorks does not document the memory management aspects of MATLAB. Yes, Data(:, :) could be implemented with copy-on-write, but for some reason it isn't. Even the slice operation could be implemented with c-o-w (similar to C++ slice arrays).

Should you care about it? Generally, no. Sooner or later you're probably going to trigger the c-o-w, since it's unlikely that you copy a whole array and then never modify any of the copies. Once you've triggered c-o-w, all the examples reduce to roughly the same timing.
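One way to check that claim is to time each approach including its first write, so that both variables have paid for a private copy of the 125 million elements. A sketch, assuming the 500×500×500 array from the question (it holds three 1 GB arrays at once, and timings will vary by machine):

```matlab
% Sketch: include the first write in each timing, so both approaches
% end up owning a full private copy of the data.
Data = ones(500,500,500);

tic
Data2 = Data;            % shared with Data, no copy yet
Data2(1) = 2;            % copy-on-write: the full copy happens here
tAssign = toc;

tic
Data3 = Data(:,:,:,1);   % copy made during indexing
Data3(1) = 2;            % already private, so this write is cheap
tSlice = toc;

fprintf('assign + write: %.3f s, slice + write: %.3f s\n', tAssign, tSlice);
```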

"Or is there any workaround/solution to retrieve a "concatenated data part from the memory"

I think you mean retrieving a pointer to a slice of a larger block of memory. No, MATLAB has no mechanism for that. You could probably hack something together using MEX, but that would be very fragile and likely to break in a future version (e.g. R2018a completely changed the way complex arrays are stored in memory).
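Within plain MATLAB, one hedged workaround for the specific case in the question is to skip the indexing whenever the requested slice is the entire array, so the result shares memory with the input until one of them is modified. A minimal sketch, assuming slices are always taken along the fourth dimension as in Data(:,:,:,d); the helper name sliceLast is hypothetical, not a MATLAB built-in:

```matlab
% sliceLast.m  (hypothetical helper, not a MATLAB built-in)
% Returns Data(:,:,:,d). When that subscript would select the whole array
% (Data has at most 3 dimensions and d == 1), it returns Data itself, so
% the result shares memory with the input until either one is written to.
function out = sliceLast(Data, d)
    if ndims(Data) <= 3 && d == 1
        out = Data;               % no indexing: shared via copy-on-write
    else
        out = Data(:,:,:,d);      % a genuine slice has to be copied
    end
end
```

The ndims check matters: for an array with more than four dimensions, Data(:,:,:,1) folds the trailing dimensions into the fourth subscript and returns only part of the data, so the shortcut must not apply there. For real 4-D data with d > 1 the slice is still copied.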
