How to group different sensors' data based on their similarities?

I have data from multiple sensors covering one year, and I would like to know whether there are unsupervised methods for grouping sensors whose data show similar characteristics or behavior.
For example, if I have electricity consumption data for 1000 buildings stored in a table with 1000 columns, how can I cluster these columns so that those with similar characteristics end up in the same group?
Thank you in advance for your time.
Time                         D1         D2         D3         D4    D5    ...    Dn
____________________    _______    _______    _______    _______    __           __
01-Jan-2020 00:00:00     2.9675     32.502     23.454     3.5067     .            .
01-Jan-2020 00:01:00     -6.298    -96.793    -64.711    -9.9581     .            .
01-Jan-2020 00:02:00    -5.5285    -75.355     -54.29     -8.215     .            .
01-Jan-2020 00:03:00    -1.4514    -34.475    -24.879     -3.468     .            .
01-Jan-2020 00:04:00     3.9736     66.112     42.284      6.639     .            .
01-Jan-2020 00:05:00     3.1481     64.577     41.262     6.9614     .            .
01-Jan-2020 00:06:00    -44.042    -699.24    -414.33    -75.339     .            .
01-Jan-2020 00:07:00     4.4172     69.015     37.355     6.6763     .            .
01-Jan-2020 00:08:00     23.509      284.8     186.89     32.597     .            .
01-Jan-2020 00:09:00     17.329     214.71     124.45     20.634     .            .

6 Comments

Well, I guess that depends on what your definition of "group" or "cluster" really means... You can address the variables as
tTable(:,1:2:end)
or
tTable(:,"D"+[1:2:size(tTable,2)])
which would address D1, D3, D5, ... up to however many columns are in the timetable tTable, assuming the same naming convention.
But that still leaves unanswered just what you really want/mean...
Thank you dpb for your comment. I rephrased my post and I hope it makes more sense now.
What I meant by grouping is this: if I have a table with 1000 columns, how can I separate the columns that have similar characteristics or behavior? For example, if I have the electricity consumption of 1000 buildings (which can differ in size, application, etc.), is there any unsupervised method that can look at all the datasets and divide the columns into groups such as group-1, group-2, …, group-n?
The Statistics and Machine Learning Toolbox has several cluster analysis tools you could try.
I've not researched nor used the ML Neural Net TB enough to know its capabilities in this area.
Sure, I will look at them.
Thank you for the suggestions.
Principal component analysis and cross-correlation might help.
Thank you @Walter Roberson for your suggestions. I will try corr(x) to see their correlation and perhaps find those that are close to each other.
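The correlation idea from the comments can be sketched roughly as follows. This is only an illustration on synthetic data: the matrix X, the three made-up daily shapes, and the 12-column size are assumptions standing in for the real timetable (for real data, something like X = tTable.Variables would supply the numeric matrix). corr requires Statistics and Machine Learning Toolbox; corrcoef(X) in base MATLAB returns the same matrix.

```matlab
% Synthetic stand-in for the sensor table (assumption, not the poster's data):
% 12 sensors, 4 per underlying daily shape, each with its own scale.
rng(0);                                    % reproducible random numbers
t = (0:1439)';                             % one day of 1-minute samples
shapes = [sin(2*pi*t/1440), sin(2*pi*t/720), sin(2*pi*t/360)];
trueGroup = repelem((1:3)', 4);            % hidden group of each column
X = zeros(numel(t), 12);
for j = 1:12
    scale = 1 + 10*rand;                   % sensors differ in magnitude
    X(:,j) = scale*shapes(:,trueGroup(j)) + 0.1*randn(numel(t),1);
end
% For a real timetable: X = tTable.Variables;

R = corr(X);                               % 12-by-12 Pearson correlation between columns

% For every sensor, find the other sensor it correlates with most strongly.
Roff = R;
Roff(logical(eye(size(R)))) = -Inf;        % mask the diagonal (self-correlation)
[bestR, bestMatch] = max(Roff, [], 2);

disp(table((1:12)', bestMatch, bestR, ...
    'VariableNames', {'Sensor','ClosestSensor','Correlation'}));
```

Because correlation ignores scale and offset, a large and a small building with the same consumption shape will still show up as close to each other.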


Answers (1)

Abhas on 28 May 2025
Edited: Abhas on 28 May 2025
Hi @smoa,
You can use several unsupervised learning methods in MATLAB to cluster your building electricity consumption data by similar characteristics. Here are some approaches that fit your scenario:
1. K-means Clustering: This is a good starting point for your use case (you choose the number of clusters in advance), as it:
  • Groups buildings with similar consumption patterns
  • Identifies representative centroids for each cluster
  • Is efficient for large datasets (1000 buildings)
  • Provides clear membership assignments
2. Hierarchical Clustering: This creates a dendrogram that shows:
  • Relationships between all buildings
  • How clusters merge at different similarity levels
  • Flexibility to choose the number of clusters after analysis
  • Good for exploring the natural grouping structure
3. PCA + Clustering: This two-step approach will:
  • Reduce the dimensionality of your time series data
  • Identify the most important consumption patterns
  • Make clustering more effective by removing noise
  • Improve visualization of the clusters
4. Dynamic Time Warping (DTW): Particularly useful for energy data because:
  • It handles temporal shifts in consumption patterns
  • Buildings with similar patterns but different peak times can be grouped
  • It's more robust to phase differences than Euclidean distance
5. Spectral Clustering: Good for identifying complex relationships:
  • Can find non-convex cluster shapes
  • Can perform better than k-means when the groups are not compact, roughly spherical clusters
  • Considers the global structure of your dataset
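As a rough illustration of approaches 1 to 3, the sketch below clusters the columns of a synthetic matrix. Everything about the data is an assumption made for the example (12 sensors, three made-up daily shapes, k = 3); only kmeans, pdist, linkage, cluster, pca and zscore are actual Statistics and Machine Learning Toolbox functions. The key step is the transpose: these functions treat each row as one item to cluster, so the sensors must be moved from columns to rows.

```matlab
% Synthetic stand-in for the sensor table (assumption, not the poster's data).
rng(1);                                    % reproducible random numbers
t = (0:1439)';                             % one day of 1-minute samples
shapes = [sin(2*pi*t/1440), sin(2*pi*t/720), sin(2*pi*t/360)];
trueGroup = repelem((1:3)', 4);            % hidden group of each column
X = zeros(numel(t), 12);
for j = 1:12
    X(:,j) = (1 + 10*rand)*shapes(:,trueGroup(j)) + 0.1*randn(numel(t),1);
end
% For a real timetable: X = tTable.Variables;

Z = zscore(X);          % standardize each sensor so magnitude does not dominate
F = Z';                 % one ROW per sensor, one column per time step
k = 3;                  % number of groups (assumed known here)

% 1. K-means; several replicates reduce the risk of a poor random start.
idxKm = kmeans(F, k, 'Replicates', 10);

% 2. Hierarchical clustering on correlation distance (1 - correlation).
D = pdist(F, 'correlation');
tree = linkage(D, 'average');
idxHc = cluster(tree, 'MaxClust', k);      % cut the tree into k groups
% dendrogram(tree) would plot the merge structure.

% 3. PCA first, then k-means on the leading component scores.
[~, score, ~, ~, explained] = pca(F);
nPC = find(cumsum(explained) >= 90, 1);    % fewest components covering 90% of variance
idxPca = kmeans(score(:, 1:nPC), k, 'Replicates', 10);

disp(table((1:12)', idxKm, idxHc, idxPca, ...
    'VariableNames', {'Sensor','KMeans','Hierarchical','PCAKMeans'}));
```

Cluster labels are arbitrary, so group 1 from one method need not be group 1 from another; what matters is which sensors share a label. On real data the number of groups is rarely known up front, and evalclusters or the dendrogram can help in choosing it. With a full year of 1-minute samples per building, it may also be worth summarizing each column first (for example as an average daily profile) before clustering.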
You can refer to the MathWorks documentation links below to learn more about each method:
  1. K-Means: https://www.mathworks.com/help/stats/kmeans.html
  2. Hierarchical: https://www.mathworks.com/help/stats/hierarchical-clustering.html
  3. PCA: https://www.mathworks.com/help/stats/pca.html
  4. DTW: https://www.mathworks.com/help/signal/ref/dtw.html
  5. Spectral Clustering: https://www.mathworks.com/help/stats/spectral-clustering.html
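To make the DTW point (approach 4) concrete, here is a small sketch comparing a point-by-point distance with the dtw distance from Signal Processing Toolbox. The three signals are invented for the example: x and y are the same single-peak profile shifted in time, and z is a different, two-peak profile.

```matlab
% Hypothetical load profiles (assumption, for illustration only).
t = 1:300;
x = exp(-((t - 100)/15).^2);                                 % one peak at sample 100
y = exp(-((t - 130)/15).^2);                                 % same shape, 30 samples later
z = exp(-((t - 100)/15).^2) + exp(-((t - 200)/15).^2);       % two peaks: a different shape

dPointwise = sum(abs(x - y));   % sample-by-sample distance, penalizes the shift
dDtwShift  = dtw(x, y);         % DTW stretches time, so the shift costs almost nothing
dDtwShape  = dtw(x, z);         % a genuinely different shape still has a large distance

fprintf('pointwise x-y: %.3f\n', dPointwise);
fprintf('DTW       x-y: %.3g\n', dDtwShift);
fprintf('DTW       x-z: %.3f\n', dDtwShape);
```

A matrix of pairwise dtw distances could then be passed to linkage for hierarchical clustering. Note that the cost of DTW grows with the product of the two series lengths, so for a year of 1-minute data it is probably only practical on shortened or downsampled profiles.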
I hope this helps!

Release: R2022a
Asked: on 23 Jun 2022
Edited: on 28 May 2025
