Hi Guys,

Imagine you have an x-vector

x = [1 1 2 2 1 1];

as well as a y-vector of equal length:

y = [1 2 3 4 5 6];

You want to find out which values the y-vector has at the indices where the x-vector equals 1, and which values it has at the indices where the x-vector equals 2. For this example, the solution would be

solution = {[1 2 5 6], [3 4]}

That is, the first cell of 'solution' contains all y-values that belong to the first unique x-value (1), and the second cell contains all y-values that belong to the second unique x-value (2).

My ideas on how to implement this for vectors of any length are given below.

Does anybody have an idea how to avoid the for-loop in Approach A?

x = randi(10, 1, 100000);
y = randi(5, 1, 100000);

% Approach A:
tic
xVals = unique(x); % get the unique x-values (sorted)
tf = x == xVals'; % logical matrix via implicit expansion (R2016b+): row n marks where xVals(n) occurs in x
yVals = cell(1, numel(xVals)); % preallocate yVals
for i = 1:numel(xVals)
    yVals{i} = y(tf(i,:)); % all y-values belonging to xVals(i) go into one cell
end
toc

% Approach B:
tic
[xs, id] = sort(x); % sorted x; id is the sorting permutation (sort is stable)
ys = y(id); % reorder y the same way x was sorted
[xVals, ia] = unique(xs); % ia(n) = first index of xVals(n) in xs
% Split ys into sections whose corresponding xs-entries share the same value.
yVals = mat2cell(ys, 1, [diff(ia)', length(ys)-sum(diff(ia))]);
toc

% Runtimes:
% x = randi(10, 1, 1000000);
% y = randi(5, 1, 1000000);
% A: Elapsed time is 0.100635 seconds.
% B: Elapsed time is 0.149284 seconds.

% x = randi(100, 1, 1000000);
% y = randi(5, 1, 1000000);
% A: Elapsed time is 0.886396 seconds.
% B: Elapsed time is 0.139918 seconds.

% Comparison:
% A is faster than B if there are few distinct x-values (about 10-20).
% The number of distinct x-values has a strong impact on the speed of A.
% A is faster than B if the x- and y-vectors are extremely long (> 100000).
% The vector length has little impact on the speed of B.
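One way to drop the loop, sketched here under the assumption that x and y are row vectors: take a group number for every element from the third output of unique, regroup y with a stable sort of those numbers, and get the group sizes from accumarray instead of from diff(ia). This is a variant of Approach B rather than a new algorithm, and because it never uses the x-values themselves as subscripts it also works for non-integer x. The names idx, p and counts are illustrative.

```matlab
x = [1 1 2 2 1 1];
y = [1 2 3 4 5 6];

% idx(k) is the position of x(k) within xVals, so it is a valid
% positive-integer group number even when x holds arbitrary reals.
[xVals, ~, idx] = unique(x);

% sort is stable, so y keeps its original order inside each group.
[~, p] = sort(idx);

% Number of elements in each group (one count per unique x-value).
counts = accumarray(idx(:), 1)';

% Cut the regrouped y into one cell per unique x-value.
yVals = mat2cell(y(p), 1, counts);
```

For the example above this yields {[1 2 5 6], [3 4]}. Whether it beats Approach A or B on a given machine is something to measure; the sketch only shows that the loop is not needed.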

## 7 Comments

## Direct link to this comment

https://uk.mathworks.com/matlabcentral/answers/396100-speed-performance-find-all-y-vector-entries-that-have-the-same-value-in-an-x-vector-of-equal-length#comment_558727

## Direct link to this comment

https://uk.mathworks.com/matlabcentral/answers/396100-speed-performance-find-all-y-vector-entries-that-have-the-same-value-in-an-x-vector-of-equal-length#comment_558745

## Direct link to this comment

https://uk.mathworks.com/matlabcentral/answers/396100-speed-performance-find-all-y-vector-entries-that-have-the-same-value-in-an-x-vector-of-equal-length#comment_558818

@Star Strider

Could you please share your code?

Using accumarray for the above example works perfectly and is by far the fastest solution (see code below). The problem with accumarray is that this code only works if the x-vector contains only positive integers. Do you have any idea how accumarray could be used with arbitrary real numbers in the x-vector?
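A sketch of how the positive-integer restriction is commonly lifted (an assumption, not the code referred to in this thread): map every x-value to its position among the unique values using the third output of unique, and pass those positions to accumarray as subscripts. The subscripts are sorted first because the accumarray documentation only guarantees that values reach the function in their original order when the subscripts are sorted. The example x-values are made up.

```matlab
x = [0.5 0.5 -3.2 -3.2 0.5 0.5];   % arbitrary real values (made-up example)
y = [1 2 3 4 5 6];

% Group number for every element: xVals(idx) reproduces x.
[xVals, ~, idx] = unique(x);
idx = idx(:);                      % accumarray expects column subscripts

% Sort the subscripts (stable) so each group's y-values keep their order.
[idxSorted, p] = sort(idx);
ySorted = y(p);

% Collect each group's values into a cell, as a row vector per cell,
% then transpose the resulting column cell array into a row.
yVals = accumarray(idxSorted, ySorted(:), [], @(v) {v.'})';
```

xVals(k) tells you which x-value the k-th cell of yVals belongs to; here xVals is [-3.2 0.5] and yVals is {[3 4], [1 2 5 6]}.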

@KSSV

I've read the documentation, but unfortunately I don't understand what you're getting at.

## Direct link to this comment

https://uk.mathworks.com/matlabcentral/answers/396100-speed-performance-find-all-y-vector-entries-that-have-the-same-value-in-an-x-vector-of-equal-length#comment_558827

@Zwithouta: Yours is essentially the same as mine, except that I included a unique call: …

The timings for ‘Approach A’ and accumarray on my laptop are typically: … respectively. (The timings earlier today were different for some reason, although the approximate 2/1 ratio held for both runs.)

## Direct link to this comment

https://uk.mathworks.com/matlabcentral/answers/396100-speed-performance-find-all-y-vector-entries-that-have-the-same-value-in-an-x-vector-of-equal-length#comment_559059

@Star Strider

Thanks for sharing!

I think the combination of unique and accumarray is the most elegant approach, and I will use it from now on. For extremely long vectors with many unique values, I tend to use Approach B.

## Direct link to this comment

https://uk.mathworks.com/matlabcentral/answers/396100-speed-performance-find-all-y-vector-entries-that-have-the-same-value-in-an-x-vector-of-equal-length#comment_559061

My pleasure!

I was surprised that accumarray was slower than the loop, even though it makes for neater code. I didn’t test ‘Approach B’, since ‘Approach A’ (chosen randomly) was significantly faster than my accumarray call.

## Direct link to this comment

https://uk.mathworks.com/matlabcentral/answers/396100-speed-performance-find-all-y-vector-entries-that-have-the-same-value-in-an-x-vector-of-equal-length#comment_559074

Enjoy it, but with caution! Approach A suffers badly as the number of unique values in the x-vector grows, since every unique value adds one for-loop iteration and one row to the logical matrix tf. Approach B, in contrast, shows very stable performance.
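To check this scaling claim on your own machine, here is a rough sketch that times Approach A and Approach B (essentially as posted in the question) for a growing number of distinct x-values and asserts that both return the same cells. The vector length and the list of distinct-value counts are arbitrary choices; unique is called with 'first' explicitly, which is already the default in current MATLAB releases. Single tic/toc runs are noisy, so treat the numbers as indicative only.

```matlab
n = 1e5;                       % vector length (arbitrary choice)
nUnique = [10 50 200];         % numbers of distinct x-values to try (arbitrary)
tA = zeros(size(nUnique));
tB = zeros(size(nUnique));

for k = 1:numel(nUnique)
    x = randi(nUnique(k), 1, n);
    y = randi(5, 1, n);

    % Approach A: one logical row per unique x-value, then a loop.
    tic
    xVals = unique(x);
    tf = x == xVals';          % implicit expansion (R2016b or later)
    yA = cell(1, numel(xVals));
    for i = 1:numel(xVals)
        yA{i} = y(tf(i, :));
    end
    tA(k) = toc;

    % Approach B: stable sort, then cut at the first index of each value.
    tic
    [xs, id] = sort(x);
    ys = y(id);
    [~, ia] = unique(xs, 'first');
    yB = mat2cell(ys, 1, diff([ia(:); numel(ys) + 1]).');
    tB(k) = toc;

    % Both approaches should produce identical groups.
    assert(isequal(yA, yB));
end

% Print the timings (single runs, so expect noise).
for k = 1:numel(nUnique)
    fprintf('%4d distinct values: A %.4f s, B %.4f s\n', nUnique(k), tA(k), tB(k));
end
```

Memory is worth watching as well: tf holds numel(xVals) times n logical values, which is one reason Approach A degrades as the number of distinct x-values grows.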
