rangesearch

Find all neighbors within specified distance using input data

Description

Idx = rangesearch(X,Y,r) finds all the X points that are within distance r of the Y points. The rows of X and Y correspond to observations, and the columns correspond to variables.

example

[Idx,D] = rangesearch(X,Y,r) also returns the distances between the Y points and the X points that are within a distance of r.

example

[Idx,D] = rangesearch(X,Y,r,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify the nearest neighbor search method and the distance metric used in the search.

Examples

collapse all

Find the X points that are within a Euclidean distance 1.5 of each Y point. Both X and Y are samples of five-dimensional normally distributed variables.

rng('default') % For reproducibility
X = randn(100,5);
Y = randn(10,5);
[Idx,D] = rangesearch(X,Y,1.5)
Idx=10×1 cell
    {1x7  double}
    {1x2  double}
    {1x11 double}
    {1x2  double}
    {1x12 double}
    {1x9  double}
    {[       89]}
    {1x0  double}
    {1x0  double}
    {1x0  double}

D=10×1 cell
    {1x7  double}
    {1x2  double}
    {1x11 double}
    {1x2  double}
    {1x12 double}
    {1x9  double}
    {[   1.1739]}
    {1x0  double}
    {1x0  double}
    {1x0  double}

In this case, the last three Y points are more than 1.5 distance away from any X point. X(89,:) is 1.1739 distance away from Y(7,:), and no other X point is within distance 1.5 of Y(7,:). X contains 12 points within distance 1.5 of Y(5,:).

Generate 5000 random points from each of three distinct multivariate normal distributions. Shift the means of the distributions so that the randomly generated points are likely to form three separate clusters.

rng('default')  % For reproducibility
N = 5000;
dist = 10;
X = [mvnrnd([0 0],eye(2),N);
     mvnrnd(dist*[1 1],eye(2),N);
     mvnrnd(dist*[-1 -1],eye(2),N)];

For each point in X, find the points in X that are within a radius dist away from the point. For faster computation, specify to keep the indices of the nearest neighbors unsorted. Select the first point in X, and find its nearest neighbors.

Idx = rangesearch(X,X,dist,'SortIndices',false);
x = X(1,:);
nearestPoints = X(Idx{1},:);

Find the values in X that are not the nearest neighbors of x. Display those points in one color and the nearest neighbors of x in a different color. Label the point x with a black, filled circle.

nonNearestIdx = true(size(X,1),1);
nonNearestIdx(Idx{1}) = false;

scatter(X(nonNearestIdx,1),X(nonNearestIdx,2))
hold on
scatter(nearestPoints(:,1),nearestPoints(:,2))
scatter(x(1),x(2),'black','filled')
hold off

Find the patients in the patients data set that are within a certain age and weight range of the patients in Y.

Load the patients data set. The Age values are in years, and the Weight values are in pounds.

load patients
X = [Age Weight];
Y = [20 162; 30 169; 40 168];   % New patients

Create a custom distance function distfun that determines the distance between patients in terms of age and weight. For example, according to distfun, two patients that are one year apart in age and have the same weight are one distance unit apart. Similarly, two patients that have the same age and are five pounds apart in weight are also one distance unit apart.

type distfun.m % Display contents of distfun.m file
function D2 = distfun(ZI,ZJ)
ageDifference = abs(ZI(1)-ZJ(:,1));
weightDifference = abs(ZI(2)-ZJ(:,2));
D2 = ageDifference + 0.2*weightDifference;
end

Note: If you click the button located in the upper-right section of this example and open the example in MATLAB®, then MATLAB opens the example folder. This folder includes the function file distfun.m.

Find the patients in X that are within the distance 2 of the patients in Y.

[Idx,D] = rangesearch(X,Y,2,'Distance',@distfun)
Idx=3×1 cell
    {1x0 double}
    {1x0 double}
    {[      41]}

D=3×1 cell
    {1x0 double}
    {1x0 double}
    {[  1.8000]}

The third patient in Y is the only one to have a patient in X within a distance of 2.

Display the Age and Weight values for the nearest patient in X to the patient with age 40 and weight 168.

X(Idx{3},:)
ans = 1×2

    39   164

Input Arguments

collapse all

Input data, specified as an mx-by-n numeric matrix, where each row represents one n-dimensional point. The number of columns n must equal the number of columns in Y.

Data Types: single | double

Query points, specified as an my-by-n numeric matrix, where each row represents one n-dimensional point. The number of columns n must equal the number of columns in X.

Data Types: single | double

Search radius around each query point, specified as a nonnegative scalar. rangesearch finds all X points (rows) that are within distance r of each Y point. The meaning of distance depends on the 'Distance' name-value pair argument.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: rangesearch(X,Y,1.4,'Distance','seuclidean','Scale',iqr(X)) specifies to find all the observations in X within distance 1.4 of each observation in Y, using a standardized Euclidean distance scaled by the interquartile range of X.

Nearest neighbor search method, specified as the comma-separated pair consisting of 'NSMethod' and one of these values.

ValueDescription
'kdtree'

Create and use a Kd-tree to find nearest neighbors. 'kdtree' is valid only when the distance metric is one of these options:

  • 'chebychev'

  • 'cityblock'

  • 'euclidean'

  • 'minkowski'

'exhaustive'Use the exhaustive search algorithm. The software computes the distances from all X points to each Y point to find nearest neighbors.

'kdtree' is the default value when the number of columns in X is less than or equal to 10, X is not sparse, and the distance metric is one of the valid 'kdtree' metrics. Otherwise, the default value is 'exhaustive'.

Example: 'NSMethod','exhaustive'

Distance metric that rangesearch uses, specified as the comma-separated pair consisting of 'Distance' and one of the values in this table.

ValueDescription
'euclidean'Euclidean distance.
'seuclidean'Standardized Euclidean distance. Each coordinate difference between a row in X and a query point is scaled by dividing by the corresponding element of the standard deviation computed from X, nanstd(X). To specify another scaling, use the 'Scale' name-value pair argument.
'mahalanobis'Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by nancov(X). To specify a different value for C, use the 'Cov' name-value pair argument.
'cityblock'City block distance.
'minkowski'Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'P' name-value pair argument.
'chebychev'Chebychev distance (maximum coordinate difference).
'cosine'One minus the cosine of the included angle between observations (treated as vectors).
'correlation'One minus the sample linear correlation between observations (treated as sequences of values).
'hamming'Hamming distance, the percentage of coordinates that differ.
'jaccard'One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
'spearman'One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
@distfun

Custom distance function handle. A distance function has the form

function D2 = distfun(ZI,ZJ)
% calculation of distance
...
where

  • ZI is a 1-by-n vector containing one row of X or Y.

  • ZJ is an m-by-n matrix containing multiple rows of X or Y.

  • D2 is an m-by-1 vector of distances, and D2(j) is the distance between the observations ZI and ZJ(j,:).

For more information, see Distance Metrics.

Example: 'Distance','minkowski'

Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of 'P' and a positive scalar. This argument is valid only if 'Distance' is 'minkowski'.

Example: 'P',4

Data Types: single | double

Covariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair consisting of 'Cov' and a positive definite matrix. This argument is valid only when 'Distance' is 'mahalanobis'.

Example: 'Cov',eye(4)

Data Types: single | double

Scale parameter value for the standardized Euclidean distance metric, specified as the comma-separated pair consisting of 'Scale' and a nonnegative vector. Scale has length equal to the number of columns in X. Each coordinate difference between a row in X and a query point is scaled by the corresponding element of Scale. This argument is valid only when 'Distance' is 'seuclidean'.

Example: 'Scale',iqr(X)

Data Types: single | double

Maximum number of data points in the leaf node of the Kd-tree, specified as the comma-separated pair consisting of 'BucketSize' and a positive integer scalar. This argument is valid only when NSMethod is 'kdtree'.

Example: 'BucketSize',20

Data Types: single | double

Flag to sort returned indices according to distance, specified as the comma-separated pair consisting of 'SortIndices' and either true (1) or false (0).

For faster performance when Y contains many observations that have many nearest points in X, you can set SortIndices to false. In this case, rangesearch returns the indices of the nearest points in no particular order. When SortIndices is true, the function arranges the indices of the nearest points in ascending order by distance.

Example: 'SortIndices',false

Data Types: logical

Output Arguments

collapse all

Indices of nearest points, returned as a cell array of numeric vectors.

Idx is an my-by-1 cell array, where my is the number of rows in Y. The vector Idx{j} contains the indices of points (rows) in X whose distances to Y(j,:) are not greater than r. If SortIndices is true, then rangesearch arranges the indices in ascending order by distance.

Distances of the nearest points to the query points, returned as a cell array of numeric vectors.

D is an my-by-1 cell array, where my is the number of rows in Y. D{j} contains the distance values between Y(j,:) and the points (rows) in X(Idx{j},:). If SortIndices is true, then rangesearch arranges the distances in ascending order.

Tips

  • For a fixed positive real value r, rangesearch finds all the X points that are within a distance r of each Y point. To find the k points in X that are nearest to each Y point, for a fixed positive integer k, use knnsearch.

  • rangesearch does not save a search object. To create a search object, use createns.

Algorithms

Alternative Functionality

If you set the rangesearch function 'NSMethod' name-value pair argument to the appropriate value ('exhaustive' for an exhaustive search algorithm or 'kdtree' for a Kd-tree algorithm), then the search results are equivalent to the results obtained by conducting a distance search using the rangesearch object function. Unlike the rangesearch function, the rangesearch object function requires an ExhaustiveSearcher or KDTreeSearcher model object.

Extended Capabilities

Introduced in R2011b