Subsets of uncorrelated features
Given an N-by-N correlation matrix of N features, how can I find ALL subsets of pairwise uncorrelated features, assuming two features are uncorrelated if their correlation score is less than a threshold Alpha? There is no restriction on the number of features in a subset; all features in a subset must be pairwise uncorrelated.
Accepted Answer
More Answers (2)
Let R be the pairwise correlation matrix:
N = 10;
R = rand(N);
R(logical(eye(N))) = 1;
for i = 1:size(R, 1) - 1
    for j = i+1:size(R, 1)
        R(j, i) = R(i, j);
    end
end
disp(R)
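cutoff = 0.4; % pairs with a correlation below this are treated as uncorrelated
idx = R < cutoff; % use abs(R) < cutoff if correlations can be negative
idx = triu(idx); % R(i, j) == R(j, i), so the upper triangle covers each pair once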
features = "feature" + (1:N); % feature names
% there may be a simpler way to do this
indepFeatures = [];
for i = 1:N
    indepFeatures = [indepFeatures, arrayfun(@(x)[x, features(i)], features(idx(i, :)), 'UniformOutput', false)];
end
indepFeatures = vertcat(indepFeatures{:});
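% find all maximal cliques of the graph whose edges are the uncorrelated pairs
nodes = zeros(size(indepFeatures, 1), 2);
[~, nodes(:, 1)] = ismember(indepFeatures(:, 1), features);
[~, nodes(:, 2)] = ismember(indepFeatures(:, 2), features);
G = graph(nodes(:, 1), nodes(:, 2), [], N); % keep all N features as nodes, even isolated ones
M = maximalCliques(adjacency(G)); % maximalCliques is not built in; presumably a File Exchange submission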
indepSets = cell(size(M, 2), 1);
for i = 1:numel(indepSets)
    indepSets{i} = features(M(:, i) ~= 0);
end
indepSets(cellfun(@numel, indepSets) < 2) = []; % this can be further unified with indepFeatures
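The code above returns maximal cliques, while the question asks for all subsets. As a rough sketch only (not the answerer's method), every pairwise uncorrelated subset can also be enumerated in base MATLAB by growing subsets one feature at a time. The matrix R, the threshold alpha and the variable names below are hypothetical, and using abs(R) is an assumption about how negative correlations should be handled.

```matlab
% Small hypothetical correlation matrix (values chosen for illustration only)
R = [1.0 0.1 0.2 0.9;
     0.1 1.0 0.3 0.2;
     0.2 0.3 1.0 0.8;
     0.9 0.2 0.8 1.0];
alpha = 0.4;                    % threshold below which a pair counts as uncorrelated
N = size(R, 1);
A = abs(R) < alpha;             % adjacency: true where a pair is uncorrelated
A(logical(eye(N))) = false;     % ignore self-pairs

current = num2cell((1:N)');     % start from every single feature
allSets = {};                   % collects every subset with two or more features
while ~isempty(current)
    next = {};
    for k = 1:numel(current)
        s = current{k};
        % only try larger indices so that each subset is generated exactly once
        for v = s(end)+1:N
            if all(A(v, s))     % v must be uncorrelated with every member of s
                next{end+1, 1} = [s, v];
            end
        end
    end
    allSets = [allSets; next];
    current = next;
end
cellfun(@(s) disp(s), allSets); % one subset of feature indices per line
```

The number of subsets can grow exponentially with N, so this is only practical for a modest number of features.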
12 Comments
Kais
on 11 Jul 2021
Image Analyst
on 11 Jul 2021
@Kais, why do you need this? What is the use case? What will you do with the information after this? Have you considered principal components analysis?
Kais
on 11 Jul 2021
Ive J
on 11 Jul 2021
@Kais So, have you looked at feature selection? There are quite a few approaches, especially for clinical/medical applications (I assume you'll use the features in a regression analysis afterwards). For instance, you can use a simple F-test approach such as fsrftest, or penalized regression (e.g. lasso or ridge).
Kais
on 12 Jul 2021
Ive J
on 12 Jul 2021
@Kais So, you can try my modified answer; it should do the job (but note the problem may become expensive with a large number of features). I'm sure you are aware of the drawbacks of this approach, though: the simplest scenario is a regression analysis where you don't know which of the [correlated] features better explains the response variable, so you may incorrectly exclude the better one. Say A and B are highly correlated and A is the better predictor of the response, but you select B (simply because you never check how much of the response variance A or B explains).
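To make the A-versus-B scenario concrete, here is a minimal sketch with hand-made data. The vectors A, B and y are hypothetical, and picking the feature with the larger absolute correlation with the response is just one simple heuristic, not something proposed in this thread.

```matlab
% Hypothetical data, chosen by hand for illustration
A = [1 2 3 4 5 6 7 8]';
B = [1 3 2 5 4 7 6 8]';                            % strongly correlated with A
y = 2*A + [0.1 -0.1 0.2 -0.2 0.1 -0.1 0.2 -0.2]';  % response driven by A

c = corrcoef(A, B);  rAB = c(1, 2);   % correlation between the two features
c = corrcoef(A, y);  rAy = c(1, 2);   % correlation of A with the response
c = corrcoef(B, y);  rBy = c(1, 2);   % correlation of B with the response

% Keep the feature that is more strongly correlated with the response
if abs(rAy) >= abs(rBy)
    keep = 'A';
else
    keep = 'B';
end
disp(keep)
```

Choosing between A and B from the feature-feature correlation alone cannot make this distinction, because rAB says nothing about y.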
Kais
on 12 Jul 2021
Kais
on 12 Jul 2021
As I commented on the last line, indepSets and indepFeatures (length = 22 with your data) should be merged, and in your example there are 7 (not 5) triples. So, if you only keep sets with a length > 2, you can then merge them with indepFeatures, which has already been generated:
indepSets(cellfun(@numel, indepSets) < 3) = []; % my original example is < 2
So, as I said there are 7 triples:
"feature1" "feature6" "feature9"
"feature2" "feature6" "feature9"
"feature4" "feature6" "feature9"
"feature5" "feature6" "feature9"
"feature6" "feature7" "feature9"
"feature6" "feature8" "feature9"
"feature6" "feature9" "feature10"
Kais
on 15 Jul 2021
Image Analyst
on 11 Jul 2021
Would stepwise regression be of any help?
Otherwise, just make an N-by-N table of correlation coefficients by correlating every feature with every other feature.
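A minimal sketch of that table, assuming the data sit in a matrix X with one row per observation and one column per feature (X, alpha and the values below are made up for illustration):

```matlab
% Hypothetical data: 6 observations (rows) of 3 features (columns)
X = [1  2 9;
     2  4 7;
     3  6 8;
     4  8 3;
     5 10 5;
     6 12 1];
R = corrcoef(X);                 % N-by-N correlation matrix; columns are the variables
alpha = 0.4;                     % hypothetical threshold
uncorrelated = abs(R) < alpha;   % true where a pair of features counts as uncorrelated
disp(R)
```

Here the second column is exactly twice the first, so R(1, 2) comes out as 1 up to rounding.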