How to get a smooth boundary between clusters?

Question

0 votes

I am running K-means clustering on my dataset (Fig. 1) and then smoothing the result (Fig. 2). Next, I would like to define the boundaries between the clusters and export them to a .csv file so I can import them into Fusion360 for CAM machining. Currently I detect the boundaries using "KDTreeSearcher" and "knnsearch", but the results are not perfect (Fig. 3). The first problem is that I get double boundaries for each cluster (this I can fix somehow); the second is that I would like smoother lines/boundaries between the clusters.

I am attaching my cluster dataset with the xyz positions of the points and the smoothed cluster data. I can also attach my existing code for boundary detection and smoothing if necessary.

Edit: Thank you for all the answers, and I apologize for my delayed response. I have now also added the sample MATLAB program ("clustering forum") with my dataset.

1 Comment

Garmit Pant on 30 Jan 2024

Hello RoboTomo

Can you please share the code for boundary detection and smoothing that you are using?

Answer 1

Image Analyst on 1 Feb 2024

0 votes

Minimum Perimeter Polygon.pdf

I doubt you need to smooth the boundary. That would just give you as many coordinates as you had before and most likely more coordinates since you'd have to have extra points in between the grid points to make sure it was smooth. What I think you want (and I'm not even sure that is necessary to do before importation into your other software) is to reduce the number of points. So I think you may want the "minimum perimeter polygon" -- Google it. What this will do is to take long stretches like your roughly linear run of jaggies and replace that run with just two points - the endpoints of a line running through them. But in areas where there is a lot of change, like turning corners or tight curves, it will keep as many coordinates as it needs to follow the curve there.

See attached tutorial paper.
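
As a rough illustration of the point-reduction idea, here is a minimal sketch. Note that it does not compute the minimum perimeter polygon itself: it swaps in "reducepoly" from the Image Processing Toolbox, which simplifies a polyline with a Douglas-Peucker-style algorithm. The jagged test boundary and the tolerance value are made up for the demo, so treat them as assumptions and tune them on your own data.

```matlab
% Hypothetical jagged boundary: two long, roughly straight runs joined at a corner.
k1 = (0:100)';
run1 = [k1, 0.5*k1 + 0.3*mod(k1, 2)];              % rising run with small jaggies
k2 = (1:100)';
run2 = [100 + k2, 50 - 0.5*k2 + 0.3*mod(k2, 2)];   % falling run with small jaggies
P = [run1; run2];                                  % N-by-2 list of (x,y) points

% Tolerance is a value in [0,1]; larger values remove more points (assumed value).
tolerance = 0.02;
Preduced = reducepoly(P, tolerance);

fprintf('Points before: %d, after: %d\n', size(P, 1), size(Preduced, 1));

plot(P(:,1), P(:,2), 'b.-', Preduced(:,1), Preduced(:,2), 'ro-');
legend('Original boundary', 'Reduced boundary');
```

The long jagged runs should collapse to a few points, while the corner is kept because removing it would move the line by more than the tolerance.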

6 Comments

Image Analyst on 30 Mar 2024

I'm not sure if you're dealing with 2 features or 3 features. Honestly, when I'm doing classification (both training and prediction), either supervised (like discriminant analysis, for example) or unsupervised (like kmeans), I don't really care what the boundary looks like. It doesn't matter unless you have dense test data that falls into a region not spanned by your training features. But if that's the case, you'll notice the misclassification in your confusion matrix. So it's the confusion matrix that one should really care about, not the shape of the classified feature space. What you really care about is how many it got right and how many it got wrong, and that information is in the confusion matrix. What if you had, say, 4 classes and 5 or 10 or 20 features for predicting those 4 classes? You can't visualize 20 dimensions to see the boundaries, but you can visualize a 4-by-4 confusion matrix to see how many samples it got right and how many it got wrong, and which classes get confused the most (like it commonly confuses classes 1 and 4 but never 1 and 2 or 1 and 3, for example). So I am and was really hesitant to tell you how to smooth boundaries because I'm just not convinced it's necessary. In my opinion you'd be chasing wild geese down a rabbit hole.
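
To make the confusion-matrix point concrete, here is a small sketch using "confusionmat" from the Statistics and Machine Learning Toolbox. The class labels are invented for illustration (they are not from the attached dataset), and it assumes you have reference labels to compare against.

```matlab
% Hypothetical reference and predicted class labels for 10 samples, 4 classes.
trueClass = [1 1 2 2 3 3 4 4 1 4];
predClass = [1 4 2 2 3 3 4 1 1 4];

% Rows are the true class, columns are the predicted class.
C = confusionmat(trueClass, predClass);
disp(C);

% Overall accuracy: correctly classified samples over all samples.
accuracy = trace(C) / sum(C(:));
fprintf('Accuracy: %.2f\n', accuracy);
```

Here the off-diagonal entries C(1,4) and C(4,1) show that classes 1 and 4 are the ones being confused, which is the kind of information the boundary shape alone does not give you.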

And kmeans is unsupervised so even if you did it on one set of data that densely represented every single possibility of feature combinations (like your dot grid) then the boundaries you get from that probably will not be the same for a different set of data that you plug into kmeans() the next time, especially if the data covers bigger or smaller regions of feature space.

If you insist on this route, yes, you could convert your feature space into an image, especially if you had only 2 features. But every pixel in the image would have to have a class number associated with it. So it'd be a uint8 or double image with values like 1, 2, 3, or 4 (if you had 4 classes). Then to get the boundaries of just one class you'd extract that one class from the classification image (that contains all classes) like this:

class3 = classificationImage == 3; % Extract class 3 pixels ONLY into a binary (logical image)

Then you'd get a list of boundary coordinates like this:

boundaries = bwboundaries(class3);

It will be a cell array with each cell giving an N-by-2 list of (row, column) coordinates, i.e. (y,x), of just one of the regions. Then you could smooth it. But when you smooth the boundaries, realize you are changing the class of some pixel/location from what it thought was the best prediction to some other class that may not be anything like the original class. For example, let's say a pixel was originally class 2 with the original "boundaries" but now you smoothed the boundaries and that pixel is no longer in the region for class 2 because the boundary got tightened or pulled in there. So what class will that pixel be? Let's say originally that region was next to class 4. Does that pixel now belong to class 4? But what if you smooth class 4 pixels and it pulls in, instead of expanding out to "capture" that pixel that was originally in class 2? So now where does that pixel belong? Not in class 2 anymore, and not in class 4. So it's in no man's land with an undefined class.

Realize that we're dealing with classes and feature space, and it's not the same as gray levels and physical locations like an image is. So it's not like two classes might be similar (like two image pixels are similar) just because they're next to each other when you mapped the feature space onto an image. Class 2 and class 4 might be totally different even though they're next to each other in this space where you artificially mapped the feature space into physical space (an image).
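
Putting the two steps together, a minimal sketch could look like the following. The classification image here is synthetic (made up for the demo), and "bwboundaries" requires the Image Processing Toolbox. The 'noholes' option is my own addition so that only outer boundaries are traced.

```matlab
% Hypothetical classification image: every pixel holds a class number (1, 2 or 3).
[X, Y] = meshgrid(1:60, 1:40);
classificationImage = ones(40, 60);
classificationImage(X > 20) = 2;
classificationImage((X - 45).^2 + (Y - 20).^2 < 100) = 3;   % a disc of class 3

% Extract class 3 pixels ONLY into a binary (logical) image.
class3 = classificationImage == 3;

% Trace the outer boundary of each connected region of class 3.
boundaries = bwboundaries(class3, 'noholes');

% Each cell is N-by-2 with columns [row, column], so swap them to get (x,y).
b = boundaries{1};
x = b(:, 2);
y = b(:, 1);

imagesc(classificationImage); axis image; hold on;
plot(x, y, 'r-', 'LineWidth', 2);
hold off;
```

The traced boundary runs through pixel centers of the class itself, which is one reason two neighbouring classes each produce their own boundary line.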

RoboTomo on 2 Apr 2024

Edited: RoboTomo on 8 Apr 2024

Thank you for your extensive comment. Yes, I am already facing the smoothing problem you mentioned at the end. When I initially run the K-means clustering with directions as input, I get a lot of scattered data, which is correct but not suitable for my application, so I need to find a proper threshold criterion to reorganize the points. The same thing happens with the boundaries: I can't export zig-zag straight lines between boundary points; I need a nice curve. I will try all of the methods provided in the answers again and give feedback.

RoboTomo on 8 Apr 2024

With your help I have managed to better plot the boundaries and also smooth them, so thank you!

I have one bonus question if you'd be willing to give some suggestion. What would be the best way to plot a single boundary line between different clusters, or fit a new line between existing boundary lines? The result should look something like the red line below which is hand drawn.

Answer 2

Matt J on 30 Jan 2024

1 vote

Once you've extracted the boundary points, you could use sgolayfilt to smooth them.
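
For reference, a minimal sketch of that idea is below. It assumes the boundary points are already ordered along the curve and form a closed loop. The noisy circle, polynomial order, and frame length are made up for the demo, and "sgolayfilt" requires the Signal Processing Toolbox.

```matlab
rng(0);                                    % reproducible noise for the demo
theta = linspace(0, 2*pi, 201)';
theta(end) = [];                           % drop the duplicated closing point
r = 1 + 0.05*randn(size(theta));           % hypothetical jagged closed boundary
xy = [r.*cos(theta), r.*sin(theta)];       % N-by-2 ordered boundary points

order = 3;                                 % polynomial order (assumed value)
framelen = 21;                             % frame length, must be odd (assumed value)

% Wrap the closed boundary around so the filter has neighbours at both ends.
pad = framelen;
xyPadded = [xy(end-pad+1:end, :); xy; xy(1:pad, :)];

% sgolayfilt works down each column, so x and y are smoothed independently.
xySmoothPadded = sgolayfilt(xyPadded, order, framelen);
xySmooth = xySmoothPadded(pad+1:end-pad, :);

plot(xy(:,1), xy(:,2), 'b.', xySmooth(:,1), xySmooth(:,2), 'r-');
axis equal;
legend('Boundary points', 'Smoothed boundary');
```

The filter runs along the row order of the array, so points that come out of a nearest-neighbour search in arbitrary order would need to be sorted along the boundary first. Otherwise the result can look worse than the input.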

1 Comment

RoboTomo on 29 Mar 2024

Thank you. I used the function, but did not get good results.

Answer 3

Garmit Pant on 1 Feb 2024

1 vote

Hello RoboTomo

From what I gather, you have a dataset and you have clustered the data points into 3 clusters using the “kmeans” function. You have also smoothed the data and found the boundaries between the clusters using “KDTreeSearcher” and “knnsearch”. Now you need to smooth the boundaries and store them in a CSV file.

You can consider the following techniques to smooth the boundary points.

Smoothing the data using “smoothdata”: Use the “smoothdata” function to smooth the single boundary points that you have extracted. Use “gaussian” smoothing method. The code snippet compares the effects of the “window” input argument. A smaller window size retains details while a larger window size produces smoother data.

% Sample data: a sine wave with a noisy stretch
x = linspace(0, 10, 100);
y = sin(x);
% Add random noise between x=4 and x=6
y_noisy = y + ((x > 4) & (x < 6)) .* (0.5 - rand(1,100));
% Smooth the data using a small Gaussian window
smallWindowSize = 5; % Small Gaussian window size
y_smooth_small = smoothdata(y_noisy, 'gaussian', smallWindowSize);
% Smooth the data using a large Gaussian window
largeWindowSize = 21; % Large Gaussian window size
y_smooth_large = smoothdata(y_noisy, 'gaussian', largeWindowSize);
% Plot the results
figure;
plot(x, y_noisy, 'b', 'DisplayName', 'Noisy Data');
hold on;
plot(x, y_smooth_small, 'g--', 'DisplayName', 'Smoothed Data (Small Gaussian)');
plot(x, y_smooth_large, 'm--', 'DisplayName', 'Smoothed Data (Large Gaussian)');
legend;
title('Data Smoothing using smoothdata with Small vs Large Gaussian window');
xlabel('x');
ylabel('y');

Use the “smooth” function to smooth the data: The boundary extracted can also be smoothed using the Curve Fitting Toolbox function “smooth”. You can smooth the boundaries between different endpoints to get smoother boundaries. The function has multiple filters. The following code snippet shows how to smooth just a segment of the data.

% Sample data: a noisy sine wave with two outliers
x = (0:0.1:15)';
y = sin(x) + 0.5*(rand(size(x))-0.5);
y([90,110]) = 3;
% Define the segment to smooth
segmentIndices = (x >= 5) & (x <= 10);
% Smooth only the segment with the loess and rloess methods
yy1_segment = smooth(x(segmentIndices),y(segmentIndices),0.1,'loess');
yy2_segment = smooth(x(segmentIndices),y(segmentIndices),0.1,'rloess');
% Plot the original data over the entire range and the smoothed segment
figure;
subplot(2,1,1)
plot(x,y,'b.',x(segmentIndices),yy1_segment,'r-')
set(gca,'YLim',[-1.5 3.5])
legend('Original data','Smoothed data using ''loess''',...
    'Location','NW')
subplot(2,1,2)
plot(x,y,'b.',x(segmentIndices),yy2_segment,'r-')
set(gca,'YLim',[-1.5 3.5])
legend('Original data','Smoothed data using ''rloess''',...
    'Location','NW')

To save the smoothed data to a CSV file, you can convert the data to a table and then write it to the file using the “writetable” function.
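
A minimal sketch of the export step is shown below. The boundary points, the column names X, Y, Z, and the output file location are placeholders chosen for the demo, so adjust them to whatever your CAM import expects.

```matlab
% Hypothetical smoothed boundary: N-by-3 array of xyz points (placeholder data).
t = linspace(0, 2*pi, 50)';
boundaryXYZ = [cos(t), sin(t), zeros(size(t))];

% Convert to a table with named columns (names are an assumption).
T = array2table(boundaryXYZ, 'VariableNames', {'X', 'Y', 'Z'});

% Write to a CSV file; a temporary file name is used here for the demo.
csvFile = [tempname '.csv'];
writetable(T, csvFile);

% Read it back to confirm the round trip.
Tcheck = readtable(csvFile);
fprintf('Wrote %d rows to %s\n', height(Tcheck), csvFile);
```

If you have several boundaries, one option is to write one file per boundary, or to add a column holding the boundary index.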

For further understanding on the methods mentioned above, you can refer to the following MATLAB Documentation:

“smoothdata” function: https://www.mathworks.com/help/matlab/ref/smoothdata.html
“smooth” function: https://www.mathworks.com/help/curvefit/smooth.html
“writetable” function: https://www.mathworks.com/help/matlab/ref/writetable.html

I hope you find the above explanation and suggestions useful!

1 Comment

RoboTomo on 29 Mar 2024

Thank you for your answer. I did some testing but, as with sgolayfilt, I did not get good results, so maybe I am doing something wrong. I attached the original code.
