Speed up distance calculation in N-dimensional space

5 views (last 30 days)
Hi all,
I have a function I'd like your help to speed up. The function takes two vectors x and Y of points in N-dimensional space as inputs (x and Y have the same size, typically N=500 by m=50,000). For every point in x I want to find the closest (according to squared difference) point in Y and return its number. The inputs are almost random. The function is :
function I = findnearest(x,Y)
I = zeros(size(x,1),1);
o = ones(size(Y,1),1);
for i = 1:size(x,1)
sqdiff = (o*x(i,:) - Y).^2;
xdiff = sum(sqdiff,2);
[~,I(i)] = min(xdiff);
end
end
I've tried to vectorize it but it turns out even slower than the current version. Any ideas on how to improve performance would be greatly appreciated! The function is called upon many times and as such takes immense computational time.
Thanks in advance!
Edit: I just realized that the part of the code that takes 99% of the time I only have N=5 to N=20... I'm terribly sorry for this confusion! The parts where N>=400 are not run that often so they are not that big a deal.
  7 Comments
John D'Errico
John D'Errico on 28 Feb 2015
An important question is if BOTH sets of points change with each evaluation. If Y stays fixed for example, then one of the tree schemes might be very good, as then the cost of building the tree will be spread out over the many times it will be used.
Anders Österling
Anders Österling on 28 Feb 2015

Hi again! The Y will sometimes be constant 1,000's of times, but will change every now and then. So then the trees might be good after all.

Sign in to comment.

Answers (1)

Matt J
Matt J on 28 Feb 2015
See IPDM ( Download ) and maybe also the files it acknowledges and the files it inspired.
  5 Comments
Matt J
Matt J on 28 Feb 2015
Edited: Matt J on 28 Feb 2015
@Anders,
Matt - in really impressed by your use of bsxfun in my other question!
Perhaps you could Accept-click the answer then :D ...
As for the rest of your last post, IPDM does use bsxfun, so if John doesn't recommend it, I'm not sure how much to hope for. Arrayfun is not designed for high speed computation, so I wouldn't recommend it for what you are trying to do.
The only thing I can think of at this point is to divide the task with parfor in combination with Worker Object Wrapper ( Download ). The Worker Object Wrapper will allow you to make Y persist on the parallel workers across multiple calls to parfor, so you don't have to rebroadcast it each time you repeat the parfor loop. Since you say Y changes rarely, this will save you some overhead.
Anders Österling
Anders Österling on 6 Mar 2015
Hi again,
I'm sorry for a late reply - something else came up that I needed to deal with before this. Thank you soo much John and Matt! Its getting better, but I'm not yet there.
I've tried a bunch of different things but still haven't found anything that is quite as fast as I'd like it to be. My homemade findnearest is an obviously bad first attempt. I also tried John's ipdm, as well as the built in functions createns/knnsearch combo. Please see below for computational comparisons. I'll keep looking and will report back if I find anything faster than the createns/knnsearch. If anyone has additional ideas on how to improve this, feel free to comment below. [Approximate nearest neighbour might be good enough for me, so I might try to compile FLANN (Link)]. The 2.5seconds might not seem that bad at first, but they are very unwelcome when repeated hundreds of thousands of times!
disp('50000 by 1')
A=rand(50000,1); B = rand(50000,1);
tic;
disp('Built in: createns/knnsearch')
disp(' -- Grow tree --')
kdtree=createns(B,'NSMethod','kdtree','Distance','Euclidean');
toc
tic
disp(' -- Search --')
res1=knnsearch(kdtree,A)';
toc
tic;
disp('Johns IPDM')
res2=ipdm(A,B,'Subset','nearest','result','struct')';
toc
tic;
disp('My findnearest')
res3=findnearest(A,B);
toc
disp(' ')
disp('50000 by 8')
A=rand(50000,8); B = rand(50000,8);
tic;
disp('Built in: createns/knnsearch')
disp(' -- Grow tree --')
kdtree=createns(B,'NSMethod','kdtree','Distance','Euclidean');
toc
tic
disp(' -- Search --')
res1=knnsearch(kdtree,A)';
toc
tic;
disp('Johns IPDM')
res2=ipdm(A,B,'Subset','nearest','result','struct')';
toc
tic;
disp('My findnearest')
res3=findnearest(A,B);
toc
Results
50000 by 1
Built in: createns/knnsearch
-- Grow tree --
Elapsed time is 0.069353 seconds.
-- Search --
Elapsed time is 0.055465 seconds.
Johns IPDM
Elapsed time is 0.029600 seconds.
My findnearest
Elapsed time is 12.559964 seconds.
50000 by 8
Built in: createns/knnsearch
-- Grow tree --
Elapsed time is 0.128941 seconds.
-- Search --
Elapsed time is 2.504368 seconds.
Johns IPDM
Elapsed time is 175.717344 seconds.
My findnearest
Elapsed time is 295.582135 seconds.|

Sign in to comment.

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!