How to speed up Bag of Visual Words Calculation ?
Show older comments
I am building an image retrieval system where I would give an input image and the output would be the most similar images from the dataset. I have created a bag of visual word from my dataset with 2000 words. I followed the following process:
1. Extracted SIFT features from the images in the dataset and performed k-means clustering with k = 2000. Then I stored the centroids in a mat file.
2. For an input image, I am creating its bovw as follows:
indices_f=[];
for i=1:size(features,1) % this loop is taking too much time.
for j=1:size(centroids,1)
d(j)=pdist2(features(i,:),centroids(j,:)); % Compute distance from cluster centroids
end
[~,indices_f(i)]=min(d); %Compute Minimum Distance
end
sum=1;
feature_hist_local=zeros(1,2000);
temp=indices_f;
%Compute histogram
for m=1:length(temp)
feature_hist_local(temp(m))=feature_hist_local(temp(m))+1;
end
For my specific use case, the features matrix contains 3127 key points of dimension 128 each. The code is converting it into a bovw representation of size 1x2000. But this code is taking approximately 40 minutes to calculate the bovw representation on a machine with 32 GB RAM and 6 cores.
Is there a better way to do so ? Also, are there alternate approaches improvising upon bag of visual word approaches, since I plan to work with large datasets now.
3 Comments
dpb
on 21 Feb 2015
...
dist=[];
for j=1:size(centroids,1)
d(j)=pdist2(features(i,:),centroids(j,:));
...
What's dist? It doesn't appear to be used; eliminate it.
Have you preallocated d??? If not, that dynamic reallocation will be a bottleneck.
sid
on 21 Feb 2015
dpb
on 21 Feb 2015
"... does not pre-allocating d become such a high bottleneck..."
NO. It's quite quick and efficient to allocate a fixed array; what is the bottleneck is the necessity to reallocate it on every pass thru a loop. That entails saving the present to a temporary, reallocating the existing memory of the target and then copying the content to that location. Since you've got counted loops and you know the size of the return vector, you can allocate the space required a priori.
You can also experiment with the use of the optional parameters in pidst2 and see if computing fewer is quicker there or whether the overhead internally is more than you save; I guess it probably depends on the size; if long maybe, if not so much, probably not.
Accepted Answer
More Answers (0)
Categories
Find more on Image Category Classification in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!