Generating PDF and Overplotting from Subset of Data Using Gaussian Mixture Model
3 views (last 30 days)
Show older comments
Assume there is a data set that contains n-measurments (In this case n=2) and that overlap. I wish to discriminate a subset of the total set by identifying the greater proportion and then applying a probability density function fit onto the distribution. I tried doing just that by applying the fitgmdist function onto the data set, knowing n=2 and then chose the higher "componentproportion" as the true subset I wish to keep the fit for. I thought what I could do was apply makedist using the mu and sigma from the distirbution I chose (TDist), then create a probability density function using pdf so I can overplot it on top of the histogram data.
clc; clear;
DataSize = 10000;
linLength = 1000;
Data1 = normrnd(-2,1,[1,DataSize]);
Data2 = normrnd(3,2,[1,DataSize]);
Data = [Data1,Data2]';
Dist = fitgmdist(Data,2); %Create fits for both distributions
DistNdex = find(Dist.ComponentProportion==max(Dist.ComponentProportion)); %Find distribuition with greater contribution
TDist.Mu = Dist.Mu(DistNdex); %Average
TDist.Sigma = Dist.Sigma(DistNdex); %Std. Dev
PD = makedist('Normal','mu',TDist.Mu,'sigma',TDist.Sigma); %Create normal distribution using mu/sigma
xPD = linspace(TDist.Mu - 3*TDist.Sigma,TDist.Mu + 3*TDist.Sigma,linLength); %Create linspace that spans 3 sigma
pdfValues = pdf('Normal',xPD,TDist.Mu,TDist.Sigma); %Create non-normalized pdf over defined linspace
NormPdfValues = normpdf(xPD,TDist.Mu,TDist.Sigma); %Create normalized pdf over defined linspace
%Plotting
figure
histogram(Data)
hold on
plot(xPD,pdfValues,'r','LineWidth',5)
plot(xPD,NormPdfValues,'g','LineWidth',5)
hold off
But my issue here is that the max y-value for the fit is incredibly small wrt the data (Regardless of whether it is normalized or not). Why is this and how do I specify what the max y-value should be for it's associated component distribution? I'm thinking I'm lacking a piece of stats knowledge here rather than having a Matlab issue.
PS - I know this code gives an error when I ran it within the browser. Not sure why it doesn't recognize the substructure "Mu" but it works just fine and runs on my local without issues.
1 Comment
Accepted Answer
Chris
on 1 Dec 2022
Edited: Chris
on 1 Dec 2022
I believe the area under the curve of these PDFs is 1.
One way to work around that (though probably not the most correct way) would be to scale the pdf by its max value, to the maximum of the histogram.
DataSize = 10000;
linLength = 1000;
Data1 = normrnd(-2,1,[1,DataSize]);
Data2 = normrnd(3,2,[1,DataSize]);
Data = [Data1,Data2]';
Dist = fitgmdist(Data,2); %Create fits for both distributions
DistNdex = find(Dist.ComponentProportion==max(Dist.ComponentProportion)); %Find distribuition with greater contribution
TDist.Mu = Dist.mu(DistNdex); %Average
TDist.Sigma = Dist.Sigma(DistNdex); %Std. Dev
PD = makedist('Normal','mu',TDist.Mu,'sigma',TDist.Sigma); %Create normal distribution using mu/sigma
xPD = linspace(TDist.Mu - 3*TDist.Sigma,TDist.Mu + 3*TDist.Sigma,linLength); %Create linspace that spans 3 sigma
pdfValues = pdf('Normal',xPD,TDist.Mu,TDist.Sigma); %Create non-normalized pdf over defined linspace
% NormPdfValues = normpdf(xPD,TDist.Mu,TDist.Sigma); %Create normalized pdf over defined linspace
%Plotting
figure
h = histogram(Data);
hold on
mult = max(h.BinCounts)/max(pdfValues);
plot(xPD, mult*pdfValues,'k--','LineWidth',2)
2 Comments
More Answers (0)
See Also
Categories
Find more on Startup and Shutdown in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!