Hi everyone, can someone show me how to use k-means clustering, or share suitable code, to cluster wind speed data? I have wind speed data in the form Latitude, Longitude, Wind Speed. I want to cluster the data into 3 groups.

 Accepted Answer

If you have all the lat and lon values, then just put each into kmeans separately:
numColumns = 26; % Or however many columns you know there to be.
[xIndexes, xCentroids] = kmeans(lon, numColumns);
numRows = 50; % Or however many rows you know there to be.
[yIndexes, yCentroids] = kmeans(lat, numRows);
The values of the columns (x or longitude values) will be in xCentroids.
The values of the rows (y or lat values) will be in yCentroids.

16 Comments

So there will be different centroids based on the Lat Long ? or it will produce the same centroids ?
You will have different locations. Imagine that you ran lines through every column and every row of your spots. Doing this will get you the x locations of every column, and the y location of every row. Isn't that what you want to achieve with kmeans?
It doesn't make sense. Why is Y1 random? And why is Y1 a row vector while X1 is a column vector? Even if Y1 was also a column vector, it doesn't make sense to cluster random data.
And where is K in your kmeans() call? You read in the badly-named "k" but don't even consider it when you're doing kmeans? Did you realize you're calling kmeans without your data???
I would have fixed it for you but I realized I don't know what each row of k represents.
% Demo by Image Analyst
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 22;
% Read in data
k = readmatrix('WIND_26YEARS.csv');
% Plot raw data
subplot(3, 1, 1);
plot(k, 'b-')
grid on;
xlabel('index', 'FontSize',fontSize);
ylabel('Value of k', 'FontSize',fontSize)
title('All the k Values', 'FontSize',fontSize)
% Plot histogram of k data.
subplot(3, 1, 2);
histogram(k);
grid on;
xlabel('k', 'FontSize',fontSize);
ylabel('Count', 'FontSize',fontSize)
title('Distribution of k. Note no clusters!', 'FontSize',fontSize)
% Original poster's (bad) code below:
subplot(3, 1, 3);
X1=(1:6943)';
Y1=randn(6943,1);
numClusters=3;
idx1=kmeans([X1, Y1],numClusters,'Replicates',5);
pointclust=repmat(idx1,1,numClusters)==repmat(1:numClusters,numel(idx1),1);
colors=hsv(numClusters);
for j=1:numClusters
plot(X1(pointclust(:,j)),Y1(pointclust(:,j)),'Color',colors(j,:));
if j==1
hold on;
end
end
hold off;
xlabel('X1', 'FontSize',fontSize);
ylabel('Y1', 'FontSize',fontSize)
title('Clusters are in different colors', 'FontSize',fontSize)
grid on;
g = gcf;
g.WindowState = 'maximized'
Okay, thanks a lot for your help. I tried to modify the parts that I want to use and that suit my data. I tried a few pieces of code because this is also my first time using MATLAB for clustering.
@MAT NIZAM UTI, I can continue to help but you'd have to tell me how to split apart your data. There might be clusters but until your data is organized correctly they may not be evident. Again, what does each row of your data represent? Can it be divided evenly into a number of subsets? Like morning and evening windspeeds, or by month or something?
MAT NIZAM UTI
MAT NIZAM UTI on 16 Nov 2021
Edited: MAT NIZAM UTI on 16 Nov 2021
Since my data is gridded, the actual format of my wind speed data is
Lat1, Long1, Wind Speed Value1
Lat2, Long2, Wind Speed Value2
and so on until the last latitude and longitude.
The wind speed value is the monthly average wind speed at each latitude and longitude during the northeast monsoon. So each location (latitude, longitude) has only one wind speed value, as shown in the Excel data. However, I only provided the wind speed values.
So the idea of the k-means clustering is to produce 3 clusters based on the wind speed value.
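A minimal sketch of that idea, assuming the data sits in three equal-length column vectors for latitude, longitude, and wind speed. The synthetic grid below is only a stand-in for the real file, and the variable names are illustrative. `kmeans` requires the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-in for the real data: a 0.25-degree grid, one speed per point.
rng(0);                                   % Reproducible random numbers.
[lonGrid, latGrid] = meshgrid(100:0.25:105, 0:0.25:8);
lats = latGrid(:);
lons = lonGrid(:);
speeds = 2 + 0.5 * lats + 0.3 * randn(size(lats));   % Made-up speeds.

% Cluster on wind speed only; lat and lon are kept just for plotting.
numClusters = 3;
[idx, centroids] = kmeans(speeds, numClusters, 'Replicates', 5);

% Color each grid point by the cluster it was assigned to.
scatter(lons, lats, 20, idx, 'filled');
xlabel('Longitude');
ylabel('Latitude');
title('Wind speed clusters (synthetic data)');
```

Here `centroids` holds the three mean wind speeds, one per cluster, and `idx` gives the cluster number of each grid point.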
Then why is the data not a multiple of 3?
k = readmatrix('WIND_26YEARS.csv');
lats = k(1:3:end); % 2315 long
lons = k(2:3:end); % 2314 long
speeds = k(3:3:end); % 2314 long
By 'multiple of 3', do you mean the wind speed value times 3 (wind speed x 3), or do you mean the 3 columns?
If the format is as you said, all arrays (lats, lons, and speeds) should be the same length, right? Why is lats one element longer?
Based on the data I shared with you, latitude and longitude both have the same number of rows, which is 3527.
I think you uploaded a different data file than you think. Look what happens when I run this code:
% Read in data
k = readmatrix('WIND_26YEARS.csv');
lats = k(1:3:end); % 2315 long
lons = k(2:3:end); % 2314 long
speeds = k(3:3:end); % 2314 long
whos k
whos lats
whos lons
whos speeds
  Name        Size       Bytes  Class     Attributes
  k           6943x1     55544  double

  Name        Size       Bytes  Class     Attributes
  lats        2315x1     18520  double

  Name        Size       Bytes  Class     Attributes
  lons        2314x1     18512  double

  Name        Size       Bytes  Class     Attributes
  speeds      2314x1     18512  double
As you can see, k is not a multiple of 3 so lats is one element longer than the other two. Why is that?
Moreover, the wind speeds are practically the same value as lats and lons (they are all around values 0-8), which is suspicious unless you measured the wind near the north pole. Please attach the actual data.
MAT NIZAM UTI
MAT NIZAM UTI on 18 Nov 2021
Edited: MAT NIZAM UTI on 18 Nov 2021
https://drive.google.com/drive/folders/1tFOl0ZHQo4XzB-VGvi-LBLPg_lERG98u?usp=sharing The wind speed values are approximately 0-8 because the location is in the equatorial region, which generally receives less wind than the northern and southern hemispheres. Here I attach the actual data.
But what about this question that you didn't answer:
As you can see, k is not a multiple of 3 so lats is one element longer than the other two. Why is that?
MAT NIZAM UTI
MAT NIZAM UTI on 18 Nov 2021
Edited: MAT NIZAM UTI on 18 Nov 2021
I tried running this code:
% Read in data
%k = xlsread('ACTUAL DATA_WIND SPEED.csv');
k = xlsread('AVERAGE_WINDSPEEDS_1.csv');
lats = k(1:end,1); % 2315 long
lons = k(2:end,2); % 2314 long
speeds = k(3:end,3); % 2314 long
whos k
whos lats
whos lons
whos speeds
But when I compare the values of lats, lons, and speeds with the actual data, I get this:
1) The number of elements in lats (or LATITUDE_AFTER READ) is the same as in the actual latitude.
2) But for lons (or LONGITUDE_AFTER READ) and speeds (or WIND SPEEDS_AFTER READ), both the values and the number of elements differ from the actual longitude and wind speeds.
As you can also see, at the end of each column the number of elements in lons and speeds is not the same as in the actual data, so maybe this is why lats is longer than lons and speeds.
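One likely cause, judging only from the code above: `k(2:end,2)` and `k(3:end,3)` skip the first one and two rows, so `lons` and `speeds` come out shorter than `lats` and shifted relative to it. A small sketch, with a made-up matrix standing in for the CSV file:

```matlab
% Made-up 5-by-3 matrix standing in for the file: [lat, lon, speed] per row.
k = [1.00 100.00 3.1
     1.25 100.00 3.4
     1.50 100.00 2.9
     1.75 100.00 3.8
     2.00 100.00 4.0];

% Take whole columns so all three vectors stay the same length and aligned.
lats   = k(:, 1);
lons   = k(:, 2);
speeds = k(:, 3);

% For comparison, the indexing used above drops leading rows.
lonsShort   = k(2:end, 2);   % 4 elements, first row missing.
speedsShort = k(3:end, 3);   % 3 elements, first two rows missing.
```

With `k(:, n)` every row contributes one latitude, one longitude, and one speed, so the three vectors line up element by element.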
So can we just take the first 2314 values and ignore the extra lat?
MAT NIZAM UTI
MAT NIZAM UTI on 18 Nov 2021
Edited: MAT NIZAM UTI on 18 Nov 2021
Sure. Well, I don't really know how MATLAB works, because when I compared the actual values with the values after reading, both lons and speeds were different from the actual data.
https://drive.google.com/drive/folders/1tFOl0ZHQo4XzB-VGvi-LBLPg_lERG98u (this is my very actual data) Column C until LB is the wind speeds values.

More Answers (1)

H R
H R on 9 Nov 2021
If your data is in a matrix format X, then you can use the following:
[idx,C] = kmeans(X,3,'Distance','cityblock','Replicates',5);

6 Comments

H R
H R on 9 Nov 2021
Make sure to use a 'Distance' that makes sense to you. Try different distances and observe the results.
My wind speed data is in grid format: the data is arranged on a 0.25 degree grid in latitude and longitude, and each point has one wind speed value. In your opinion, what is a suitable 'Distance' for clustering the wind speed data?
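One possible way to compare distances, offered as a sketch rather than a recommendation: run `kmeans` once per distance and look at the mean silhouette value for each. The data below is synthetic and only stands in for the real matrix X. Both `kmeans` and `silhouette` come from the Statistics and Machine Learning Toolbox.

```matlab
rng(0);                                   % Reproducible random numbers.
% Synthetic 2-D data with three loose groups; a stand-in for the real X.
X = [randn(50, 2)
     randn(50, 2) + 4
     randn(50, 2) + repmat([8 0], 50, 1)];

distances = {'sqeuclidean', 'cityblock'};
for d = 1:numel(distances)
    % Cluster with this distance, then score the result with the same distance.
    idx = kmeans(X, 3, 'Distance', distances{d}, 'Replicates', 5);
    s = silhouette(X, idx, distances{d});
    fprintf('%-12s mean silhouette = %.3f\n', distances{d}, mean(s));
end
```

A higher mean silhouette suggests tighter, better separated clusters for that distance, but the result still has to make sense for the data at hand.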
H R
H R on 12 Nov 2021
Edited: H R on 12 Nov 2021
It seems you basically have v=f(x,y). So, I don't think you can mix dependent and independent variables in clustering i.e. (x,y,v) to gain useful information (if this is the case). I think it's better to use a supervised method instead. Alternatively, you may try to perform clustering using (x,y) and then see if the outcome of the clustering can be useful to give information about v. In doing so, I guess Euclidean distance would be enough. It seems there is a relevant paper on your subject as well: https://ieeexplore.ieee.org/document/7884477.
Let's say I remove the lat/long data, use only the wind speed data, and arrange the wind speed data into one matrix (100:1). Is that possible?
H R
H R on 12 Nov 2021
Yes, everything is possible (even using 1D data), but in the end you have to decide what you are looking for from the clustering task and check whether the outcome makes sense to you.
Here is my code, and it gives me this error:
Error using horzcat
Dimensions of matrices being concatenated are not consistent.
Error in k_mean (line 7)
idx1=kmeans([X1 Y1],numClusters,'Replicates',5);
This is the code:
k = xlsread('WIND_26YEARS.csv');
X1=(1:6943);
Y1=randn(6943,1);
numClusters=3;
idx1=kmeans([X1 Y1],numClusters,'Replicates',5);
pointclust=repmat(idx1,1,numClusters)==repmat(1:numClusters,numel(idx1),1);
colors=hsv(numClusters);
for j=1:numClusters,
plot(X1(pointclust(:,j)),Y1(pointclust(:,j)),'Color',colors(j,:));
if j==1,
hold on;
end;
end,
hold off;
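The horzcat error arises because `X1=(1:6943)` is a 1-by-6943 row vector while `Y1` is 6943-by-1, so `[X1 Y1]` cannot be formed. A sketch of the fix, which transposes `X1`. Note that `Y1` is still random noise here, as in the code above, so the clusters carry no meaning until the real wind speeds are used instead.

```matlab
rng(0);                         % Reproducible random numbers.
n = 6943;
X1 = (1:n)';                    % Transpose to get an n-by-1 column vector.
Y1 = randn(n, 1);               % Placeholder data, as in the code above.
numClusters = 3;
idx1 = kmeans([X1, Y1], numClusters, 'Replicates', 5);

% Plot each cluster in its own color using logical indexing.
colors = hsv(numClusters);
hold on;
for j = 1:numClusters
    inCluster = (idx1 == j);
    plot(X1(inCluster), Y1(inCluster), '.', 'Color', colors(j, :));
end
hold off;
```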
