Deleting X-Y points that are not near other points on a field of data points

I have a set of data with around 900 rows and two columns. Each row holds an X and a Y value that specify a point on the X-Y plane, and both X and Y range from 0 to 100. The points are randomly scattered throughout the plane. My problem is that there are too many X-Y points cluttering up the scatter plot. So what I want to do is have MATLAB look at each point and ask: is this point within a distance of 10 of at least one other point? If it is, keep it. If it isn't, delete the row containing that X, Y value. A shortened example of my data:
X=[1 2 3 4 20];
Y=[1 3 4 3 59];
Since (20,59) is more than a distance of 10 away from the other points, delete it and return the following:
X2=[1 2 3 4];
Y2=[1 3 4 3];
If anyone knows how I could do this, it would be a great help.

Accepted Answer

per isakson on 6 Jun 2012
See Doug's video "Advanced: making a 2d or 3d histogram to visualize data density" and search the FEX for "hist2".
I failed to find a solution in the FEX. Here is some naive code. The distance threshold of 10 is hard-coded as the magic number 100, because the comparison is done on squared distances.
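X = [1,2,3,4,20];
Y = [1,3,4,3,59];
to_be_removed = false(size(X));
for ii = 1 : length(X)
    % Squared distance from point ii to every point, compared with 10^2
    is_near = (X-X(ii)).^2 + (Y-Y(ii)).^2 <= 100;
    is_near(ii) = false;   % a point does not count as its own neighbor
    if not( any( is_near ) )
        to_be_removed(ii) = true;   % no other point within a distance of 10
    end
end
% Delete the isolated points from both vectors
X(to_be_removed) = [];
Y(to_be_removed) = [];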
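For reference, the same test can be written without the loop. This is only a sketch, not part of the original answer: the variable names r, D2 and has_neighbor are made up for illustration, and the implicit expansion in the subtraction assumes R2016b or later (older releases would need bsxfun). It builds the full N-by-N matrix of squared distances, which is fine for about 900 points.

```matlab
X = [1 2 3 4 20];
Y = [1 3 4 3 59];
r = 10;                                % distance threshold (illustrative name)

% Pairwise squared distances between all points (N-by-N matrix).
% Implicit expansion requires R2016b or later.
D2 = (X(:) - X(:).').^2 + (Y(:) - Y(:).').^2;
D2(1:numel(X)+1:end) = Inf;            % ignore each point's distance to itself

% A point is kept if at least one other point lies within r of it.
has_neighbor = any(D2 <= r^2, 2).';
X2 = X(has_neighbor);
Y2 = Y(has_neighbor);
```

Keeping the threshold in a variable and comparing against r^2 avoids having to remember that 100 stands for a distance of 10.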

More Answers (2)

Geoff on 6 Jun 2012
The naive (brute-force) implementation given by per isakson looks sufficient for this problem. O(N^2) is okay for 900 rows. For larger sets, I'd consider partitioning the points into a quadtree.
However, without making things complicated, I would say that the number of candidates for removal will be small due to your X and Y range. You could easily speed up the naive algorithm by first approximating the local point density in a 21x21 array (cell size of 5, with one extra cell for the ends) and then only doing a search on points that are alone in their cell. Two points that share a 5x5 cell are at most 5*sqrt(2), about 7.1, apart, so they are already within 10 of each other and can be kept without a search.
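A rough sketch of that cell-based pre-filter might look like the following. It is one reading of the suggestion rather than Geoff's own code; the names cellSize, counts, alone and keep are invented here, and it assumes all coordinates lie in 0 to 100.

```matlab
X = [1 2 3 4 20];
Y = [1 3 4 3 59];
r = 10;                                   % distance threshold
cellSize = 5;                             % cell diagonal is 5*sqrt(2) < r

% Cell address of every point (1..21 for coordinates in 0..100).
cx = floor(X / cellSize) + 1;
cy = floor(Y / cellSize) + 1;
counts = accumarray([cx(:) cy(:)], 1, [21 21]);   % points per cell

% Points that share a cell are within r of each other, so keep them.
alone = counts(sub2ind(size(counts), cx, cy)) == 1;
keep = ~alone;

% Brute-force search only for the points that are alone in their cell.
for ii = find(alone)
    d2 = (X - X(ii)).^2 + (Y - Y(ii)).^2;
    d2(ii) = Inf;                         % skip the point itself
    keep(ii) = any(d2 <= r^2);
end
X2 = X(keep);
Y2 = Y(keep);
```

The shortcut only holds while the cell diagonal stays below the threshold, so cellSize would have to shrink if r does.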
  2 Comments
charles atlas on 7 Jun 2012
Sorry, I haven't been able to get into the office and test the code until today.
The real data is latitudes and longitudes, but for simplicity's sake I said it was 0 to 100 on the X and Y axes (which would actually be the longitude and latitude axes respectively).
The code did what it was supposed to do when I tested it, but it neglected half the values that were jumbled together (that is, at a distance of about 600 yards, aka <= .005 when the differences in lat and long are squared, added and then square rooted).



Image Analyst on 7 Jun 2012
If you displayed it as an image instead of a scatterplot, you wouldn't have that problem. Why not give it a try?
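One way to try this, sketched under a few assumptions: the points are binned into 5-unit squares (the bin width and the names binSize, nBins and density are illustrative choices, not from the answer), and the counts are shown with imagesc so that crowded areas appear as brighter cells instead of overlapping markers.

```matlab
X = [1 2 3 4 20];
Y = [1 3 4 3 59];
binSize = 5;                               % assumed bin width
nBins = 100 / binSize;                     % 20 bins per axis for 0..100

% Bin index per point; min() folds a coordinate of exactly 100 into the last bin.
bx = min(floor(X / binSize) + 1, nBins);
by = min(floor(Y / binSize) + 1, nBins);
density = accumarray([by(:) bx(:)], 1, [nBins nBins]);   % rows = Y, columns = X

% The two-element vectors give the centers of the first and last pixels.
c = [binSize/2, 100 - binSize/2];
imagesc(c, c, density);
axis xy;                                   % put Y = 0 at the bottom
colorbar;
xlabel('X'); ylabel('Y');
```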
