Maximum separation of certain points on a 2d plot
1 view (last 30 days)
Show older comments
Hello matlab users,
I'm trying to solve the following problem. Suppose we have a number of point on a 2d plot with X&Y coordinates. Each point has it's own label (e.g. 1 or 2). I want to draw a line, which will separate a region of plot with maximum number of points of certain label (e.g. 2). Here is a sample code for the first part of the problem, which plots points and assigns labels:
X = [0.5588;0.3603;0.3309;0.3603;0.1985];
Y = [1.9234;0.3406;1.1295;0.8748;0.2416];
scatter(X,Y);
a = [1;2;2;2;1];
b = num2str(a); c = cellstr(b);
%displacement so the text does not overlay the data points
dx = 0.01; dy = 0.01;
text(X+dx,Y+dy,c);
As I see it the brut force approach would be to write a cycle which would iterate coefficients k and b of linear line equation (y=kx+b) and calculate number of points with different labels to each side of the line trying to maximize number of points with given label (note: the optimum kx+b line should separate points in such a manner so that there are no points with label 1 in region of points with label 2). But I am afraid that this approach will take lots of machine time, since I will be trying different combinations of X&Y and for each X&Y combination I'll have to find it's optimal separating line.
May be someone cane suggest more elegant and less time consuming approach? May be there is an analytical approach to solve this problem (instead of graphical).
I'll appreciate your help.
0 Comments
Answers (3)
Walter Roberson
on 6 Jul 2012
If you are looking at straight lines to divide clusters, then consider using svm.
0 Comments
Matt Kindig
on 5 Jul 2012
You write that "the optimum kx+b line should separate points in such a manner so that there are no points with label 1 in region of points with label 2". Can you guarantee that such a perfect discrimination exists? It seems highly likely that the points will be distributed such that a single straight line is insufficient to completely isolate the label 1 points from the label 2 points.
That said, if you have the Optimization Toolbox, you should be able to code up your algorithm fairly easily. I suspect that given that you only have two variables, your algorithm will converge fairly quickly.
2 Comments
Matt Kindig
on 5 Jul 2012
So if I understand you correctly, you basically want a line that separates the largest cluster of uniform points. On one side of this line can be any number of points, as long as they are all labeled as 2.
Hmmm... this isn't really my field of expertise, but I still think this sounds like a clustering problem. I know there are some tools in the Statistics Toolbox to do clustering -- kmeans comes to mind. Maybe start there?
Walter Roberson
on 6 Jul 2012
The natural search space for an algorithm like this involves both slope and intercept. Both are important to get the best answer, but we found in practice that getting "close" to the best was somewhat insensitive to slope but was fairly sensitive to intercept.
Starting from "transvariation" techniques suggested by my co-workers, I found a way to optimize the search so that the calculation of best intercept for any one slope, became linear in the number of points. Latching on to the absolute best slope could still require a fairly fined-grained search, but in practice it was amenable to bracketing and refinement, that if you went more than a little way away from "close" to the peak, that the counts would fall fast enough that managing to dig out a tiny tiny angle of separation would not be enough to counter the grosser-level losses. For our purposes, worrying about angles that narrow implied that we were overtraining.
Unfortunately, the project that this was all implemented for was permanently shut down this week, so the code is no longer available.
0 Comments
See Also
Categories
Find more on Random Number Generation in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!