Mean and 3-sgima for Lognormal distributions
Show older comments
Hello,
I am trying to plot the lognormal distribution over 10 iterations and would like to see the mean and 3 sigma outliers. Somehow, doing this for lognormal plots does not look easy. Also, is it possible that I can skip the histogram in my plots and only look at the lognormal curves? This is the code snippet I am using. Thanks!
figure(113);
for i = 1:10
norm=histfit(Current_c(i,:),10,'lognormal'); %%Current_c is a 10*100 matrix %%
[muHat, sigmaHat] = lognfit(Current_c(i,:));
% Plot bounds at +- 3 * sigma.
lowBound = muHat - 3 * sigmaHat;
highBound = muHat + 3 * sigmaHat;
yl = ylim;
%line([lowBound, lowBound], yl, 'Color', [0, .6, 0], 'LineWidth', 3);
%line([highBound, highBound], yl, 'Color', [0, .6, 0], 'LineWidth', 3);
line([muHat, muHat], yl, 'Color', [0, .6, 0], 'LineWidth', 3);
grid on;
set(gcf, 'Toolbar', 'none', 'Menu', 'none');
% Give a name to the title bar.
set(gcf, 'Name', 'Line segmentation', 'NumberTitle', 'Off')
hold on;
end
Accepted Answer
More Answers (2)
histfit with option dist does not really mean that the data has a lognormal distribution.
You expect it to be lognormal, so you tell histfit to fit a lognormal curve as reasonably tight as possible, over the bins of the histogram of the data you feed histfit with.
So, you feed your data in histfit, and with
rng default;
b = betarnd(3,10,100,1);
h1=histfit(b,20,'lognormal')
b_histogram=h1(1)
b_pdf_lognormal_fit=h1(2)
you catch the curve of the probability density function that histfit has applied
b_histogram contains the histogram, but you want the pdf:
b_pdf_lognormal_fit =
Line with properties:
Color: [1.00 0 0]
LineStyle: '-'
LineWidth: 2.00
Marker: 'none'
MarkerSize: 6.00
MarkerFaceColor: 'none'
XData: [1x100 double]
YData: [1x100 double]
ZData: [1x0 double]
Show all properties
AlignVertexCenters: 'off'
Annotation: [1x1 matlab.graphics.eventdata.Annotation]
BeingDeleted: 'off'
BusyAction: 'queue'
ButtonDownFcn: ''
Children: [0x0 GraphicsPlaceholder]
Clipping: 'on'
Color: [1.00 0 0]
CreateFcn: ''
DeleteFcn: ''
DisplayName: ''
HandleVisibility: 'on'
HitTest: 'on'
Interruptible: 'on'
LineJoin: 'round'
LineStyle: '-'
LineWidth: 2.00
Marker: 'none'
MarkerEdgeColor: 'auto'
MarkerFaceColor: 'none'
MarkerSize: 6.00
Parent: [1x1 Axes]
PickableParts: 'visible'
Selected: 'off'
SelectionHighlight: 'on'
Tag: ''
Type: 'line'
UIContextMenu: [0x0 GraphicsPlaceholder]
UserData: []
Visible: 'on'
XData: [1x100 double]
XDataMode: 'manual'
XDataSource: ''
YData: [1x100 double]
YDataSource: ''
ZData: [1x0 double]
ZDataSource: ''
let's focus on the probability distribution:
x=b_pdf_lognormal_fit.XData;
y=b_pdf_lognormal_fit.YData;
figure(2);
plot(x,y);
grid on;
the mean value is located somewhere between the 33rd and 34th elements of y
mean(y)
=
3.021789201683596
find(y>mean(y))
ans =
Columns 1 through 13
3 4 5 6 7 8 9 10 11 12 13 14 15
Columns 14 through 26
16 17 18 19 20 21 22 23 24 25 26 27 28
Columns 27 through 31
29 30 31 32 33
There are standard expressions for mean and variance of basic probability distributions. For lognormal I found, X be the data
- mean: E(X)=expected(mu+sigma^2/2)
- variance: V(X)=expected(2*mu+sigma^2/2)*(expected(sigma^2-1))
but I am not sure about this expression of variance, and since we have already obtained the pdf curve, we don't need to resort to cooking recipes, just apply
mean (1st statistical moment )
my=mean(y)
=
3.021789201683596
variance (2nd statistical moment)
vy=var(y)
=
17.421143231291438
standard deviation, the sigma you are after, from
is:
sy=(mean((y-my).^2)).^.5
=
4.152942547035599
standard deviation (your sigma): .. the standard deviation represents the root mean square value of the random variable around its mean ..
you may want to find the corresponding x coordinates, I can assist on this too.
If you find this answer of any help solving your question,
please click on the thumbs-up vote link,
thanks in advance
John
2 Comments
Kash022
on 1 Jun 2016
Kash022
on 2 Jun 2016
1 vote
4 Comments
John BG
on 3 Jun 2016
No worries ksnf3000, it's ok,
I give in that the explanation is a bit elaborate, but sometimes people go the other way around, that because not detailing they missed the thread.
The key points are
1.- after getting the probability distribution, interpolate to increase accuracy.
2.- the skewed asymmetric distribution with large amount of small samples has a mean value away from the central point of interest, where you want the +- sigma and +- 3sigma centred.
3.- to find the mu to center the sigmas just integrate manually: mu is located where 50% probability, 50% area of the probability function is reached.
The last plot shows mu with a o marker, +-sigma with red diamond markers, and +-3sigma with green diamond markers.
Regards
John
Kash022
on 15 Jun 2016
John BG
on 15 Jun 2016
I keep a copy of script lines only for some answers, for this one I didn't.
The key lines, for any probability distribution y over x, are the following ones:
1. find mu, where 50% probability is located:
pc_target=50
n=2
pc=sum(y([1:n]))/sum(y)*100
while pc<pc_target
n=n+1
pc=sum(y([1:n]))/sum(y)*100
endmind the gap: n is the vector reference for both y and x. You want x be the reference of y, so be it, but to integrate it is easier to use n= 1 2 3 .. do you understand this?
the loop stops on n=1601, mu is located on y(1600) that happens for x(1600)
2. for instance to find -sigma, -34% below mu:
pc_target=34.1
n=1
pc=sum(y2([1600-n:1600]))/sum(y2)*100
while pc<pc_target
n=n+1
pc=sum(y2([1600-n:1600]))/sum(y2)*100
endthis loop stops when the integration value is 34% below the previously found n=1600 the 50%
I named the variable after 'percentage target'
Regards
John BG
John BG
on 12 Mar 2017
Hi Kash022
I have abridged the above explanation in a function that may be of use, attached to this comment: lognormal_pdf_123sigma_locations.m
call example, replace b with data:
b = betarnd(3,10,100,1); % simulated data, replace betarand() with real data
[mu0,sm1,sp1,sm2,sp2,sm3,sp3,y2,ny2]=lognormal_pdf_123sigma_locations(b)
this produces
1.
mu0: mean
sm1: sigma1 minus
sp1: sigma1 plus
sm2: sigma2 minus
sp2: sigma2 plus
sm3: sigma3 minus
sp3: sigma3 plus
the lognormal probability distribution density approximation to the data, the 10^-10 can be scaled, do not pass the Armstrong order of magnitude to this function, scale it accordingly so that the x, the reference vector has a similar range to that of a betarnd(3,10,100,1).
2.
couple figures showing data, pdf approximation, and the location of the requested points:

.

.
3.
Copy of the function
function [mu0,sm1,sp1,sm2,sp2,sm3,sp3,y2,ny2]=lognormal_pdf_123sigma_locations(b)
% mu0: mean
% sm1: sigma 1 minus
% sp1: sigma 1 plus
% sm2: sigma 2 minus
% sp2: sigma2 plus
% sm3: sigma 3 minus
% sp3: sigma 3 plus
% author: John Bofarull Guix, jgb2012@sky.com
% b = betarnd(3,10,100,1); % simulated data, replace betarand() with real data
figure(1);
% h1=histfit(b,20,'lognormal') % test
h1=histfit(b,20,'lognormal')
b_histogram=h1(1);grid on;
b_pdf_lognormal_fit=h1(2)
y=b_pdf_lognormal_fit.YData;
sumy=sum(y);y=y/sumy;
ny=b_pdf_lognormal_fit.XData;
N=100;
y2=interp(y,N);y2=y2/sum(y2);
ny2=[1:1:numel(y)]*N;
pc_target=50; % mu: distribution mean value
n=2
pc=sum(y2([1:n])) *100
while pc<pc_target
n=n+1;pc=sum(y2([1:n])) *100;
end
mu=n;
figure(2);plot(y2);hold all
figure(2);plot(mu,y2(n),'bo');grid on
pc_target=34; [s1_min,s1_plus]=go_get_it(pc_target,mu,y2); % +- sigma
figure(2);plot(s1_min,y2(s1_min),'ro');
figure(2);plot(s1_plus,y2(s1_plus),'ro');
pc_target=47.5; [s2_min,s2_plus]=go_get_it(pc_target,mu,y2); % +-2*sigma
figure(2);plot(s2_min,y2(s2_min),'ro')
figure(2);plot(s2_plus,y2(s2_plus),'ro');
pc_target=49.7;[s3_min,s3_plus]=go_get_it(pc_target,mu,y2); % +-3*sigma
figure(2);plot(s3_min,y2(s3_min),'ro');
figure(2);plot(s3_plus,y2(s3_plus),'ro');
mu0=mu;
sm1=s1_min;
sp1=s1_plus;
sm2=s2_min;
sp2=s2_plus;
sm3=s3_min;
sp3=s3_plus;
function [sdown,sup]=go_get_it(pc_target,mu,y2)
% support function finds t1 t2 such sum(y([t1:mu]))=sum(y([mu:t2]))
n=1;pc=sum(y2([mu-n:mu])) *100; % search - s
while pc<pc_target
n=n+1;pc=sum(y2([mu-n:mu])) *100;
end
sdown=mu-n;
% figure(h);
% plot(h,sdown,y2(sdown),'ro');hold all;
n=1;pc=sum(y2([mu:mu+n])) *100; % search + s
while pc<pc_target
n=n+1;pc=sum(y2([mu:mu+n])) *100 ;
end
sup=n+mu;
% figure(2);
% plot(h,sup,y2(sup),'ro');
end
end
4.
Explanation:
in last year conversation the generation of the pdf curve was limited to the same x range of the generated data, that introduced error caused by step not small enough.
Reproducing the same code used earlier on
pc_target=50; % mu: distribution mean value
n=2
pc=sum(y([1:n])) *100
while pc<pc_target
n=n+1;pc=sum(y([1:n])) *100;
end
mu=n;
figure(2);plot(y);hold all
figure(2);plot(mu,y(n),'bo');grid on
pc_target=34; % -sigma
n=1;
pc=sum(y([mu-n:mu])) *100;
while pc<pc_target
n=n+1;pc=sum(y([mu-n:mu])) *100;
end
sigma1_min=mu-n;
figure(2);plot(sigma1_min,y(sigma1_min),'ro');
pc_target=34 ; % +sigma
n=1;
pc=sum(y([mu:mu+n])) *100
while pc<pc_target
n=n+1;pc=sum(y([mu:mu+n])) *100 ;
end
sigma1_plus=n+mu;
figure(2);plot(sigma1_plus,y(sigma1_plus),'ro');
pc_target=47.5; % -2*sigma
n=1;
pc=sum(y([mu-n:mu])) *100;
while pc<pc_target
n=n+1;pc=sum(y([mu-n:mu])) *100;
end
sigma2_min=mu-n;
figure(2);plot(sigma2_min,y(sigma2_min),'go');
pc_target=47.5 ; % +2*sigma
n=1;
pc=sum(y([mu:mu+n])) *100
while pc<pc_target
n=n+1;pc=sum(y([mu:mu+n])) *100 ;
end
sigma2_plus=n+mu;
figure(2);plot(sigma2_plus,y(sigma2_plus),'go');
pc_target=49.7; % -3*sigma
n=1;
pc=sum(y([mu-n:mu])) *100;
while pc<pc_target
n=n+1;pc=sum(y([mu-n:mu])) *100;
end
sigma3_min=mu-n;
figure(2);plot(sigma3_min,y(sigma3_min),'ko');
pc_target=49.7 ; % +3*sigma
n=1;
pc=sum(y([mu:mu+n])) *100
while pc<pc_target
n=n+1;pc=sum(y([mu:mu+n])) *100 ;
end
sigma3_plus=n+mu;
figure(2);plot(sigma3_plus,y(sigma3_plus),'ko');
.
the points are not wrongly located from the data point of view, the data is not accurate enough, and the rectangles approximating the integration are too wide.
.

.
checks
sum(y([mu:sigma1_plus]))
sum(y([sigma1_min:mu]))
sum(y([mu:sigma2_plus]))
sum(y([sigma2_min:mu]))
sum(y([mu:sigma3_plus]))
sum(y([sigma3_min:mu]))
sum(y([sigma3_plus:end]))
sum(y([1:sigma3_min]))
ans =
0.343910033120536
ans =
0.376779316893812
ans =
0.475739109483067
ans =
0.486722066501548
ans =
0.499152952111189
ans =
0.498547671119994
ans =
0.034710670665522
ans =
0.020295937115606
the first sum seems ok, 34% but the second should be 34% too, the 4th should be 47% and the last 2 tails should be 2.1% according to
.

.
hope it helps, and having a look to the other question you mentioned
John BG
Categories
Find more on Exploration and Visualization in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!


