How to get a smooth boundary between clusters?

For my dataset I am running K-means clustering (Fig. 1) followed by smoothing (Fig. 2). I would then like to define the boundaries between the clusters and export them to a .csv file that I can import into Fusion360 for CAM machining. Currently I detect the boundaries with "KDTreeSearcher" and "knnsearch", but the results are not perfect (Fig. 3). The first problem is that I get double boundaries for each cluster (this I can fix somehow); the second is that I would like smoother lines / boundaries between the clusters.
I am attaching my cluster dataset with the xyz positions of the points and the smoothed cluster data. I can also attach my existing code for boundary detection and smoothing if necessary.
Edit: Thank you for all the answers, and I apologize for my delayed response. I have now also added the sample MATLAB program ("clustering forum") with my dataset.

1 Comment

Hello RoboTomo
Can you please share the code for boundary detection and smoothing that you are using?


 Accepted Answer

I doubt you need to smooth the boundary. That would just give you as many coordinates as you had before and most likely more coordinates since you'd have to have extra points in between the grid points to make sure it was smooth. What I think you want (and I'm not even sure that is necessary to do before importation into your other software) is to reduce the number of points. So I think you may want the "minimum perimeter polygon" -- Google it. What this will do is to take long stretches like your roughly linear run of jaggies and replace that run with just two points - the endpoints of a line running through them. But in areas where there is a lot of change, like turning corners or tight curves, it will keep as many coordinates as it needs to follow the curve there.
See attached tutorial paper.
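As a rough illustration of the point-reduction idea above, here is a minimal sketch. It assumes the Image Processing Toolbox function reducepoly is available (R2019b or later). Note that reducepoly uses the Ramer-Douglas-Peucker algorithm, not the minimum perimeter polygon itself, so this swaps in a related simplification technique. The staircase data and the tolerance value are made up for the example.

```matlab
% Hypothetical jagged boundary: a staircase running along a roughly straight line
x = (0:40)';
P = [x, floor(x/2)];            % 41 points, one (x,y) pair per row

% Simplify with Ramer-Douglas-Peucker; the tolerance (between 0 and 1) is an assumed value
tolerance = 0.05;
Preduced = reducepoly(P, tolerance);

fprintf('Points before: %d, after: %d\n', size(P,1), size(Preduced,1));
plot(P(:,1), P(:,2), 'b.-', Preduced(:,1), Preduced(:,2), 'ro-');
legend('Original', 'Reduced', 'Location', 'NW');
```

A larger tolerance removes more points; tight curves keep more vertices than straight runs, which is the behavior described above.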

6 Comments

Thank you for your answer. The paper you sent me is similar to what I am trying to do. I want to reproduce some of the results on region-based machining from these articles (https://link.springer.com/article/10.1007/s00170-018-1982-1 https://www.mdpi.com/2072-666X/13/12/2163). My goal is to extract a smooth boundary curve from my cluster grid, which can then be exported and projected onto the surface.
Well glad I could help, though again, I don't think the boundaries need to be smoothed.
If you have the (x,y) coordinates you can smooth the curve using my attached demo.
Boundaries don't necessarily need to be smoothed; however, their extraction must be significantly improved compared to the original code. Extracted boundaries from your demo are looking great.
Ok, I can transform the 3D points to 2D, but then I would have to create an image from those points so they could be processed with the "bwboundaries" command and later "sgolayfilt". This "image" would probably have to be some kind of matrix of 1s and 0s (a binary image) where, let's say, 1 would mark my existing boundary points. Is this the correct way?
I'm not sure if you're dealing with 2 features or 3 features. Honestly, when I'm doing classification (both training and prediction), either supervised (like discriminant analysis for example) or unsupervised (like kmeans), I don't care what the boundary looks like. It doesn't matter unless you have dense test data that falls into a region not spanned by your training features. But if that's the case you'll notice the misclassification in your confusion matrix. So it's the confusion matrix that one should really care about, not the shape of the classified feature space. It's how many it got right and how many it got wrong that is what you really care about, and that information is in the confusion matrix. What if you had, say, 4 classes and 5 or 10 or 20 features for predicting those 4 classes? You can't visualize 20 dimensions to see the boundaries, but you can visualize a 4-by-4 confusion matrix to see how many samples it got right and how many it got wrong, and which classes get confused the most (like it commonly confuses classes 1 and 4 but never 1 and 2 or 1 and 3, for example). So I am and was really hesitant to tell you how to smooth boundaries because I just am not convinced it's necessary. In my opinion you'd be chasing wild geese down a rabbit hole.
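To make the confusion-matrix point concrete, here is a small sketch in base MATLAB. The true and predicted labels are invented for illustration, and accumarray is used only to avoid assuming a particular toolbox; it is not necessarily how one would do it in practice.

```matlab
% Hypothetical true and predicted class labels (4 classes, 10 samples)
trueClass = [1 1 1 2 2 3 3 4 4 4]';
predClass = [1 1 4 2 2 3 3 4 1 4]';

% Rows are true classes, columns are predicted classes
C = accumarray([trueClass, predClass], 1, [4 4]);
disp(C);

% Fraction of samples on the diagonal, i.e. classified correctly
accuracy = trace(C) / sum(C(:));
fprintf('Accuracy: %.2f\n', accuracy);
```

In this made-up data the only off-diagonal entries are C(1,4) and C(4,1), i.e. classes 1 and 4 get confused with each other while the other pairs never do.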
And kmeans is unsupervised so even if you did it on one set of data that densely represented every single possibility of feature combinations (like your dot grid) then the boundaries you get from that probably will not be the same for a different set of data that you plug into kmeans() the next time, especially if the data covers bigger or smaller regions of feature space.
If you insist on this route, yes, you could convert your feature space into an image, especially if you had only 2 features. But every pixel in the image would have to have a class number associated with it. So it'd be a uint8 or double image with values like 1, 2, 3 or 4 (if you had 4 classes). Then to get the boundaries of just one class you'd extract that one class from the classification image (that contains all classes) like this:
class3 = classificationImage == 3; % Extract class 3 pixels ONLY into a binary (logical image)
Then you'd get a list of boundary coordinates like this:
boundaries = bwboundaries(class3); % Cell array; each cell is an N-by-2 list of [row, column] coordinates
It will be a cell array with each cell giving an N-by-2 list of (row, column) coordinates, that is (y,x) order, of just one of the regions. Then you could smooth it. But when you smooth the boundaries, realize you are changing the class of some pixel/location from what it thought was the best prediction to some other class that may not be anything like the original class. For example, let's say a pixel was originally class 2 with the original "boundaries" but now you smoothed the boundaries and that pixel is no longer in the region for class 2 because the boundary got tightened or pulled in there. So what class will that pixel be? Let's say originally that region was next to class 4. Does that pixel now belong to class 4? But what if you smooth class 4 pixels and it pulls in, instead of expanding out to "capture" that pixel that was originally in class 2? So now where does that pixel belong? Not in class 2 anymore, and not in class 4. So it's in no man's land with an undefined class.
Realize that we're dealing with classes and feature space, and it's not the same as gray levels and physical locations like an image is. So it's not like two classes might be similar (like two image pixels are similar) just because they're next to each other when you mapped the feature space onto an image. Class 2 and class 4 might be totally different even though they're next to each other in this space where you artificially mapped the feature space into physical space (an image).
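Putting the steps above together, a minimal sketch might look like the following. The label image here is synthetic (two classes split along a wavy line), the window length is an arbitrary choice, and smoothdata with the 'sgolay' method stands in for sgolayfilt so that only bwboundaries needs the Image Processing Toolbox.

```matlab
% Hypothetical 60-by-80 label image: class 1 on top, class 2 below a wavy line
[X, Y] = meshgrid(1:80, 1:60);
classificationImage = ones(60, 80);
classificationImage(Y > 30 + 8*sin(X/10)) = 2;

% Binary mask for one class, then trace its outer boundary
class2 = classificationImage == 2;
B = bwboundaries(class2, 'noholes');   % cell array of N-by-2 [row, column] lists
rc = B{1};
x = rc(:, 2);                          % column index becomes x
y = rc(:, 1);                          % row index becomes y

% The traced boundary is a closed loop (first point repeated at the end),
% so pad with wrapped samples before filtering and trim afterwards
win = 11;                              % assumed window length, in points
pad = win;
xPad = [x(end-pad:end-1); x; x(2:pad+1)];
yPad = [y(end-pad:end-1); y; y(2:pad+1)];
xs = smoothdata(xPad, 'sgolay', win);
ys = smoothdata(yPad, 'sgolay', win);
xs = xs(pad+1:end-pad);
ys = ys(pad+1:end-pad);

plot(x, y, 'b.-', xs, ys, 'r-');
axis ij equal;                         % image convention: row index increases downward
legend('Traced boundary', 'Smoothed boundary');
```

Note that this only smooths the exported curve. It does not relabel any pixels, so the caveat above about locations ending up on the "wrong" side of a smoothed boundary still applies.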
Thank you for your extensive comment. Yes, I am already facing the smoothing problem you mentioned at the end. When I initially run the K-means clustering with directions as input I get a lot of scattered data, which is correct but not suitable for my application, so I need to find a proper threshold criterion to reorganize the points. The same thing happens with the boundaries: I can't export zig-zag straight lines between boundary points; I need a nice curve. I will try all of the methods provided in the answers again and give feedback.
With your help I have managed to better plot the boundaries and also smooth them, so thank you!
I have one bonus question if you'd be willing to give some suggestion. What would be the best way to plot a single boundary line between different clusters, or fit a new line between existing boundary lines? The result should look something like the red line below which is hand drawn.


More Answers (2)

Once you've extracted the boundary points, you could use sgolayfilt to smooth them.
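A minimal sketch of that suggestion, assuming the Signal Processing Toolbox is installed and that the boundary points are already ordered along the curve (points returned in nearest-neighbor search order generally are not, and filtering unordered points will not give a sensible curve). The jagged test curve, polynomial order and frame length are illustrative values only.

```matlab
% Hypothetical ordered boundary: a sine curve quantized into steps
t = (0:0.1:10)';
x = t;
y = round(2*sin(t)) / 2;

% Filter x and y separately along the point order
order = 3;        % polynomial order (must be less than framelen)
framelen = 21;    % frame length in points (must be odd)
xs = sgolayfilt(x, order, framelen);
ys = sgolayfilt(y, order, framelen);

plot(x, y, 'b.-', xs, ys, 'r-');
legend('Jagged boundary', 'Savitzky-Golay smoothed');
```

A longer frame gives a smoother curve but rounds off corners more, so the frame length usually needs tuning against the point spacing.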

1 Comment

Thank you. I used the function, but did not get good results.


Hello RoboTomo
From what I gather, you have a dataset and you have clustered the data points into 3 clusters using the “kmeans” function. You have also smoothed the data and found the boundaries between the clusters using “KDTreeSearcher” and “knnsearch”. Now you need to smooth the boundaries and store them in a CSV file.
You can consider the following techniques to smooth the boundary points.
  • Smoothing the data using “smoothdata”: Use the “smoothdata” function to smooth the boundary points that you have extracted, with the “gaussian” smoothing method. The code snippet below compares the effect of the “window” input argument: a smaller window retains detail, while a larger window produces smoother data.
% Sample data: a sine wave with random noise added between x = 4 and x = 6
x = linspace(0, 10, 100);
y = sin(x);
y_noisy = y + ((x > 4) & (x < 6)) .* (0.5 - rand(1,100));
% Smooth the data using a small Gaussian window
smallWindowSize = 5; % Small Gaussian window size
y_smooth_small = smoothdata(y_noisy, 'gaussian', smallWindowSize);
% Smooth the data using a large Gaussian window
largeWindowSize = 21; % Large Gaussian window size
y_smooth_large = smoothdata(y_noisy, 'gaussian', largeWindowSize);
% Plot the results
figure;
plot(x, y_noisy, 'b', 'DisplayName', 'Noisy Data');
hold on;
plot(x, y_smooth_small, 'g--', 'DisplayName', 'Smoothed Data (Small Gaussian)');
plot(x, y_smooth_large, 'm--', 'DisplayName', 'Smoothed Data (Large Gaussian)');
legend;
title('Data Smoothing using smoothdata with Small vs Large Gaussian window');
xlabel('x');
ylabel('y');
  • Smoothing the data using “smooth”: The extracted boundary can also be smoothed using the Curve Fitting Toolbox function “smooth”, which supports several methods (for example “loess” and “rloess”). You can also smooth only part of a boundary, between chosen endpoints. The following code snippet shows how to smooth just a segment of the data.
x = (0:0.1:15)';
y = sin(x) + 0.5*(rand(size(x))-0.5);
y([90,110]) = 3;
% Define the segment to smooth
segmentIndices = (x >= 5) & (x <= 10);
% Smooth only the segment with the loess and rloess methods
yy1_segment = smooth(x(segmentIndices),y(segmentIndices),0.1,'loess');
yy2_segment = smooth(x(segmentIndices),y(segmentIndices),0.1,'rloess');
% Plot the original data over the full range with the smoothed segment overlaid
subplot(2,1,1)
plot(x,y,'b.',x(segmentIndices),yy1_segment,'r-')
set(gca,'YLim',[-1.5 3.5])
legend('Original data','Smoothed data using ''loess''',...
'Location','NW')
subplot(2,1,2)
plot(x,y,'b.',x(segmentIndices),yy2_segment,'r-')
set(gca,'YLim',[-1.5 3.5])
legend('Original data','Smoothed data using ''rloess''',...
'Location','NW')
To save the smoothed data to a CSV file, you can convert the data to a table and then write it to the file using the “writetable” function (for a purely numeric array, “writematrix” also works).
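For example, a minimal sketch of the export step could look like this. The coordinates, the column names X, Y, Z and the file name are placeholders; what Fusion360 expects for column order and units is not covered here and would need to be checked separately.

```matlab
% Hypothetical smoothed boundary points (one point per row)
x = linspace(0, 10, 50)';
y = sin(x);
z = zeros(size(x));

% Pack into a table with assumed column names and write to a CSV file
T = table(x, y, z, 'VariableNames', {'X', 'Y', 'Z'});
outFile = fullfile(tempdir, 'boundary_cluster1.csv');
writetable(T, outFile);

% Read the file back to confirm what was written
Tcheck = readtable(outFile);
disp(head(Tcheck));
```

If the target program does not want a header row, writematrix([x y z], outFile) writes the numbers only.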
For further understanding on the methods mentioned above, you can refer to the following MATLAB Documentation:
  1. “smoothdata” function: https://www.mathworks.com/help/matlab/ref/smoothdata.html
  2. “smooth” function: https://www.mathworks.com/help/curvefit/smooth.html
  3. “writetable” function: https://www.mathworks.com/help/matlab/ref/writetable.html
I hope you find the above explanation and suggestions useful!

1 Comment

Thank you for your answer. I did some testing, but as with sgolayfilt I did not get good results, so maybe I am doing something wrong. I attached the original code.


Release: R2023b
Asked: 22 Jan 2024
Edited: 8 Apr 2024
