Why is Matlab relativeEntropy Function inconsistent with Manual Method?

Dear Matlab Experts,
When I use the relativeEntropy function (R2020a), its results appear inconsistent with a manual calculation.
Four questions:
1) Can I use real numbers with this function?
2) Can I use different sample sizes (p vs. q sample sizes)?
3) Any idea why the function and the manual method do not give the same top-10 features with the highest entropies?
4) Why do the function and the manual method differ by orders of magnitude for all calculations?
Thank you for any insights.
Best Regards,
--Allen
% RELATIVE ENTROPY
% dataForRelEnt, 10 x 75 table
% dataForRelEnt values: 0.000 to 1.000
% 1:6 rows are controlled samples
% 7:10 rows are random samples
% 1:75 cols are features
% Sample Labels: Controlled=1, Random=0
%
% GOAL: Compute Relative Entropy for each feature (cols 1:75) to evaluate separation
% of controlled samples (rows 1:6) from random samples (rows 7:10)
% MATLAB FUNCTION
% https://www.mathworks.com/help/predmaint/ref/relativeentropy.html
relEntMatlab = relativeEntropy(dataForRelEnt{1:10,:},logical([1,1,1,1,1,1,0,0,0,0]));
[~, idxMaxMat] = maxk(~isinf(relEntMatlab),10); % 10 highest Entropies, Function
% MANUAL METHOD = sum(p(x)*ln(p(x)/q(x)))
% p(x) true event prob; q(x) random or estimated event prob
% Since X values range from 0.000 to 1.000,
% let p(x for i-th feature)=mean(rows 1:6) for i-th col
% q(x for i-th feature)=mean(rows 7:10) for i-th col
% --> one controlled and random sample for each feature
for i=1:75
    p_event = mean(dataForRelEnt{1:6,i}); % p(x)
    q_event = mean(dataForRelEnt{7:10,i}); % q(x)
    relEntManual(i)= p_event*log(p_event/q_event);
end
[~, idxMaxMan] = maxk(relEntManual,10); % 10 highest Entropies, Manual
% COMPARE FUNCTION vs MANUAL METHOD
compareTopFeatures=[idxMaxMat;idxMaxMan]';

Accepted Answer

Abhishek Krishna on 6 Jul 2023
Hi,
1) Yes, you can use real numbers with the `relativeEntropy` function in MATLAB R2020a.
2) The `relativeEntropy` function allows for different sample sizes. You can use different sample sizes for the "p" and "q" samples.
3) The two methods do not compute the same quantity, so their rankings need not agree. Your manual method reduces each group to a single mean and evaluates one term, p*log(p/q), treating that mean as a probability. `relativeEntropy`, by contrast, works on the two groups of samples for each feature (its documentation describes the result as a one-dimensional Kullback-Leibler divergence between the labeled subsets). In addition, `maxk(~isinf(relEntMatlab),10)` ranks the logical mask returned by `~isinf`, not the entropy values themselves, so `idxMaxMat` does not hold the 10 highest entropies. Differences in how missing values or edge cases are handled could contribute as well.
4) The difference in orders of magnitude has the same origin: the two approaches evaluate different formulas on different inputs (group means only versus the full samples in each group), so there is no reason for the values to share a scale. For a meaningful comparison, both methods must use the same formula and handle the data consistently.
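To make the formula mismatch concrete, here is a minimal sketch that puts a distribution-based divergence next to the mean-ratio term from the question. It assumes each group is modeled as a one-dimensional Gaussian and uses the symmetric Kullback-Leibler divergence between the two Gaussians; whether `relativeEntropy` uses exactly this form should be checked against its documentation. The data `X` is random stand-in data, and the names `isControlled`, `klGauss`, and `klManual` are hypothetical.

```matlab
% Hypothetical stand-in data: 10 samples x 75 features with values in [0, 1].
rng(0);                                  % fixed seed so the sketch is repeatable
X = rand(10, 75);
isControlled = logical([1 1 1 1 1 1 0 0 0 0]).';

Xp = X(isControlled, :);                 % controlled group (6 rows)
Xq = X(~isControlled, :);                % random group (4 rows)

% Per-feature sample mean and variance of each group.
muP = mean(Xp, 1);   varP = var(Xp, 0, 1);
muQ = mean(Xq, 1);   varQ = var(Xq, 0, 1);

% ASSUMPTION: symmetric KL divergence between two 1-D Gaussians,
% KL(p||q) + KL(q||p). This is not confirmed to be what relativeEntropy does.
klGauss = 0.5 * (varP ./ varQ + varQ ./ varP - 2) ...
        + 0.5 * (muP - muQ).^2 .* (1 ./ varP + 1 ./ varQ);

% The manual method from the question: a single term built from group means.
klManual = muP .* log(muP ./ muQ);

% Rank the values themselves to get the 10 largest per method.
[~, idxGauss]  = maxk(klGauss, 10);
[~, idxManual] = maxk(klManual, 10);
compareTopFeatures = [idxGauss; idxManual].';
```

Note that `klManual` is negative whenever the controlled mean is smaller than the random mean, whereas the Gaussian divergence is never negative and grows as either group variance becomes small. That alone is enough to produce different rankings and different scales.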
To further investigate the inconsistencies, you can compare the calculations step by step and check for any differences in the implementation or assumptions made. Additionally, it might be helpful to verify the inputs and ensure that the data is correctly processed in both cases.
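One step worth checking in isolation is the top-k selection. The sketch below uses a short hypothetical vector `z` in place of `relEntMatlab` and contrasts ranking the `~isinf` mask, as in the question, with ranking the values after pushing non-finite entries to the bottom. It assumes the intent of `~isinf` was to exclude infinite entropies from the ranking.

```matlab
% Hypothetical entropy values, including one Inf, standing in for relEntMatlab.
z = [0.2 Inf 3.1 0.7 1.5 0.05];
k = 3;

% As in the question: this ranks the 0/1 mask, so the magnitudes in z
% play no role beyond being finite or not.
[~, idxMask] = maxk(~isinf(z), k);

% Alternative: rank the values, with non-finite entries excluded.
zFinite = z;
zFinite(~isfinite(zFinite)) = -Inf;      % Inf and NaN drop to the bottom
[~, idxValues] = maxk(zFinite, k);       % indices of the k largest finite values
```

Here `idxValues` is `[3 5 4]`, the positions of 3.1, 1.5, and 0.7. `idxMask` can only point to entries where the mask is true, in an order unrelated to the entropy values.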
I hope this helps!

Release

R2022a
