Speed up distance calculation in N-dimensional space
5 views (last 30 days)
Show older comments
Hi all,
I have a function I'd like your help to speed up. The function takes two vectors x and Y of points in N-dimensional space as inputs (x and Y have the same size, typically N=500 by m=50,000). For every point in x I want to find the closest (according to squared difference) point in Y and return its number. The inputs are almost random. The function is :
function I = findnearest(x,Y)
I = zeros(size(x,1),1);
o = ones(size(Y,1),1);
for i = 1:size(x,1)
sqdiff = (o*x(i,:) - Y).^2;
xdiff = sum(sqdiff,2);
[~,I(i)] = min(xdiff);
end
end
I've tried to vectorize it but it turns out even slower than the current version. Any ideas on how to improve performance would be greatly appreciated! The function is called upon many times and as such takes immense computational time.
Thanks in advance!
Edit: I just realized that the part of the code that takes 99% of the time I only have N=5 to N=20... I'm terribly sorry for this confusion! The parts where N>=400 are not run that often so they are not that big a deal.
7 Comments
John D'Errico
on 28 Feb 2015
An important question is if BOTH sets of points change with each evaluation. If Y stays fixed for example, then one of the tree schemes might be very good, as then the cost of building the tree will be spread out over the many times it will be used.
Answers (1)
Matt J
on 28 Feb 2015
5 Comments
Matt J
on 28 Feb 2015
Edited: Matt J
on 28 Feb 2015
@Anders,
Matt - in really impressed by your use of bsxfun in my other question!
Perhaps you could Accept-click the answer then :D ...
As for the rest of your last post, IPDM does use bsxfun, so if John doesn't recommend it, I'm not sure how much to hope for. Arrayfun is not designed for high speed computation, so I wouldn't recommend it for what you are trying to do.
The only thing I can think of at this point is to divide the task with parfor in combination with Worker Object Wrapper ( Download ). The Worker Object Wrapper will allow you to make Y persist on the parallel workers across multiple calls to parfor, so you don't have to rebroadcast it each time you repeat the parfor loop. Since you say Y changes rarely, this will save you some overhead.
See Also
Categories
Find more on Loops and Conditional Statements in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!