Mean and 3-sgima for Lognormal distributions

Question

0 votes

Hello,

I am trying to plot the lognormal distribution over 10 iterations and would like to see the mean and 3 sigma outliers. Somehow, doing this for lognormal plots does not look easy. Also, is it possible that I can skip the histogram in my plots and only look at the lognormal curves? This is the code snippet I am using. Thanks!

figure(113);
for i = 1:10
norm=histfit(Current_c(i,:),10,'lognormal'); %%Current_c is a 10*100 matrix %%
[muHat, sigmaHat] = lognfit(Current_c(i,:));
% Plot bounds at +- 3 * sigma.
lowBound = muHat - 3 * sigmaHat;
highBound = muHat + 3 * sigmaHat;
yl = ylim;
%line([lowBound, lowBound], yl, 'Color', [0, .6, 0], 'LineWidth', 3);
%line([highBound, highBound], yl, 'Color', [0, .6, 0], 'LineWidth', 3);
line([muHat, muHat], yl, 'Color', [0, .6, 0], 'LineWidth', 3);
grid on;
set(gcf, 'Toolbar', 'none', 'Menu', 'none');
% Give a name to the title bar.
set(gcf, 'Name', 'Line segmentation', 'NumberTitle', 'Off')
hold on;
end

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

John BG on 1 Jun 2016

Edited: John BG on 1 Jun 2016

Open in MATLAB Online

2 votes

ok no worries

to visualize the mean, sigma and 3 sigma on the curve of the probability distribution that approximates your data you have to consider that the bins of the histogram and too wide to directly get accurate marks on the curve, let me explain:

If you try

find(y==my)
 =
   Empty matrix: 1-by-0

no match

But you can find the histogram bin where the mean is contained in:

find(y>my)
 =
  Columns 1 through 14
     3     4     5     6     7     8     9    10    11    12    13    14    15    16
  Columns 15 through 28
    17    18    19    20    21    22    23    24    25    26    27    28    29    30
  Columns 29 through 31
    31    32    33
find(y<=my)
 =
  Columns 1 through 14
     1     2    34    35    36    37    38    39    40    41    42    43    44    45
  Columns 15 through 28
    46    47    48    49    50    51    52    53    54    55    56    57    58    59
  Columns 29 through 42
    60    61    62    63    64    65    66    67    68    69    70    71    72    73
  Columns 43 through 56
    74    75    76    77    78    79    80    81    82    83    84    85    86    87
  Columns 57 through 69
    88    89    90    91    92    93    94    95    96    97    98    99   100

Let be ny the reference vector of y:

ny=[1:1:length(y)]

then the mean value is somewhere within the interval

[33 34]

So, to accurately plot my and 3*sy you first have to decide how accurate you need be.

While the y step is, and it varies along the curve

abs(y(33)-y(34))
ans =
   0.258998840336178

the x step is constant

abs(x(33)-x(34))
 =
   0.011562155305573
mean(diff(x)) 
=   0.011562155305573
max(diff(x))   
=   0.011562155305573
min(diff(x))    
=   0.011562155305573

So if you decide that 2 decimals precision (on y) is enough, we have to refine x small enough so that at least one point of y interpolated has the value my truncated after 2nd decimal:

3.02xxxx

for 0.2589 go down to 0.0001 the y step has to be fractioned at least

0.2589/dy=0.0001 hence dy=258.9 , let's take dy=259

the angle

alpha=atand(abs(y(33)-y(34))/abs(x(33)-x(34)))
=
 87.443914593184061

since

    dx=.0001/atand(alpha)
    =
         1.119259323130124e-06
x_step=abs(x(33)-x(34))/dx
   =
       1.033018449490165e+04

make it

x_step=10331

then

x2=linspace(x(1),x(end),x_step);
y2=interp1(x,y,x2);

overlap, check both y and y2 are the same:

figure(1);plot(x,y,x2,y2);grid on;grid minor

where in y2 is my located?

find(y2>my)
=
..
..
      3396        3397        3398        3399        3400        3401        3402
  Columns 3228 through 3234
        3403        3404        3405        3406        3407        3408        3409
  Columns 3235 through 3238
        3410        3411        3412        3413
x_mean=max(find(y2>my))

the actual mean value is going to be approximated with y2(3413) = 3.021925056586602

and this error is acceptable

 abs(my-y2(3413))
 =
    1.358549030059386e-04

put what looks like the mean on the curve:

hold on;plot(x2(x_mean),y2(x_mean),'bo')

. . but when checking

sum(y)
 =
     3.021789201683596e+02
length(y)
 =
   100
mean(y)
=
   3.021789201683596

and already found mean located between [33 34]

the upper tail cannot even accommodate half single sigma (34.1%)

sum(y([33:end]))/sum(y)*100
 =
  13.483260209259832

Let's find mu summing samples:

pc_target=50
n=2
pc=sum(y([1:n]))/sum(y)*100
while pc<pc_target
  n=n+1
  pc=sum(y([1:n]))/sum(y)*100
end
..
..
n =    14
pc =  42.855348381237192
n =    15
pc =  46.872419796584815
n =    16
pc =  50.701800361715335

So it turns out mu is within [15 16]

the x boundaries for y~mu

x([15 16])
=
0.201607447590281   0.213169602895854

Using y2 for more detail

pc_target=50
n=2
pc=sum(y2([1:n]))/sum(y2)*100
while pc<pc_target
  n=n+1
  pc=sum(y2([1:n]))/sum(y2)*100
end
..
..
n =  1600
pc = 49.982072607170089
n =  1601
pc = 50.018166984408886

for y2, mu is within [1600 1601]

n_mu=1600
and x2 boundaries
x2([1600 1601])
ans =
   0.216920307874458   0.217031116526467
n_mu=1600

the upper y2 boundary for 1 sigma (+34.1%)

pc_target=34.1
n=1
pc=sum(y2([1600:1600+n]))/sum(y2)*100
while pc<pc_target
  n=n+1
  pc=sum(y2([1600:1600+n]))/sum(y2)*100
end
..
..
n =
        1475
pc =
  34.089262560230956
n =
        1476
pc =
  34.101853576106748

the numeral distance for +1sigma is

n_up_1sigma=1475;

And the location on x2 of +1sigma is:

x2_up_1sigma=x2(1600+n_up_1sigma)
x2_up_1sigma =
   0.380363069587555

the lower y2 boundary for 1 sigma (-34.1%)

pc_target=34.1
n=1
pc=sum(y2([1600-n:1600]))/sum(y2)*100
while pc<pc_target
  n=n+1
  pc=sum(y2([1600-n:1600]))/sum(y2)*100
end
..
..
n=
842
pc =
  34.078560325462846
n =
   843
pc =
  34.117107246027864

the numeral distance for -1sigma is

n_down_1sigma=842;

And the location on x2 of -1sigma is:

x2_down_1sigma=x2(1600-n_down_1sigma)
 =
   0.123619422882982

Repeating for 3 sigma, understanding that all +- sigma interval cover 89%, then up it is:

pc_target=44.5
n=1
pc=sum(y2([1600:1600+n]))/sum(y2)*100
while pc<pc_target
  n=n+1
  pc=sum(y2([1600:1600+n]))/sum(y2)*100
end
..
..
n =
        2836
pc =
  44.496938162892050
n =
        2837
pc =
  44.501155111466957
n_up_3sigma=2836;
   x2(n_up_3sigma)
  =
     0.353879801757433

and down we go:

pc_target=44.5
n=1
pc=sum(y2([1600-n:1600]))/sum(y2)*100
while pc<pc_target
  n=n+1
  pc=sum(y2([1600-n:1600]))/sum(y2)*100
end
..
..
n =
        1161
pc =
  44.484883787669325
n =
        1162
pc =
  44.509871734900422
n_down_3sigma=1161;
x2(n_down_3sigma)
ans =
   0.168275309642560

and the plot you asked for:

plot(x2,y2,x2(1600),y2(1600),'ro',...
x2(n_mu+n_up_1sigma),y2(n_mu+n_up_1sigma),'rd',x2(n_mu-n_down_1sigma),y2(n_mu-n_down_1sigma),'rd',...
x2(n_mu+n_up_3sigma),y2(n_mu+n_up_3sigma),'gd',x2(n_mu-n_down_3sigma),y2(n_mu-n_down_3sigma),'gd')
grid on;grid minor;

additional comments:

1. do not mix lognormal mu sigma and normal mu sigma parameters: mind the notation gap. Your x is a reference vector and your y is the actual distribution.

However, in

http://uk.mathworks.com/help/stats/lognpdf.html?searchHighlight=lognpdf

Y=lognpdf(X,mu,sigma)

MATLAB function lognpdf calculates the lognormal distribution Y out of the normal distribution X, where X has mean mu and standard variance sigma. In lognpdf help page, both Y and X are random variables, data, each with their respective reference vectors.

As explained in lognpdf help page, the 'mu' and 'sigma' of Y are related to those belonging to X by:

m=exp(mu+sigma^2/2)
v=exp(2*mu+sigma^2)*(exp(sigma^2)-1)

and

mu=log(m^2/(v+m^2)^.5)
sigma=(log(v/m^2)+1)^.5

2.- while this one got it right http://keisan.casio.com/exec/system/1180573214

this one may be having same problem of displaced mean for assymetric distribution https://667fdf68dbe4f0e67f750fedc96d9bd17eba3000.googledrive.com/host/0B2GQktu-wcTiVFpoRThHN3Z3cUE/

You may find Mr Wheeler remarks about lognormals interesting http://www.qualitydigest.com/inside/six-sigma-column/100515-properties-probability-models-part-3.html

bankers seem to have a lot of interest on lognormal distributions

MIT: https://www.youtube.com/watch?v=92WaNz9mPeY

would you mind to both vote on the thumbs-up and

mark my answer as accepted?

thanks in advance

John

jgb2012@sky.com

1 Comment
Show -1 older comments Hide -1 older comments

Jan on 24 Feb 2017

Edited: Jan on 25 Feb 2017

John BG has accepted this answer by his own. According to his comment it is not clear, if it was useful.

Sign in to comment.

Answer 2

John BG on 23 May 2016

Edited: John BG on 1 Jun 2016

Open in MATLAB Online

1 vote

histfit with option dist does not really mean that the data has a lognormal distribution.

You expect it to be lognormal, so you tell histfit to fit a lognormal curve as reasonably tight as possible, over the bins of the histogram of the data you feed histfit with.

So, you feed your data in histfit, and with

    rng default;  
    b = betarnd(3,10,100,1);
    h1=histfit(b,20,'lognormal')
    b_histogram=h1(1)
    b_pdf_lognormal_fit=h1(2)

you catch the curve of the probability density function that histfit has applied

b_histogram contains the histogram, but you want the pdf:

b_pdf_lognormal_fit = 
  Line with properties:
              Color: [1.00 0 0]
          LineStyle: '-'
          LineWidth: 2.00
             Marker: 'none'
         MarkerSize: 6.00
    MarkerFaceColor: 'none'
              XData: [1x100 double]
              YData: [1x100 double]
              ZData: [1x0 double]
    Show all properties
      AlignVertexCenters: 'off'
              Annotation: [1x1 matlab.graphics.eventdata.Annotation]
            BeingDeleted: 'off'
              BusyAction: 'queue'
           ButtonDownFcn: ''
                Children: [0x0 GraphicsPlaceholder]
                Clipping: 'on'
                   Color: [1.00 0 0]
               CreateFcn: ''
               DeleteFcn: ''
             DisplayName: ''
        HandleVisibility: 'on'
                 HitTest: 'on'
           Interruptible: 'on'
                LineJoin: 'round'
               LineStyle: '-'
               LineWidth: 2.00
                  Marker: 'none'
         MarkerEdgeColor: 'auto'
         MarkerFaceColor: 'none'
              MarkerSize: 6.00
                  Parent: [1x1 Axes]
           PickableParts: 'visible'
                Selected: 'off'
      SelectionHighlight: 'on'
                     Tag: ''
                    Type: 'line'
           UIContextMenu: [0x0 GraphicsPlaceholder]
                UserData: []
                 Visible: 'on'
                   XData: [1x100 double]
               XDataMode: 'manual'
             XDataSource: ''
                   YData: [1x100 double]
             YDataSource: ''
                   ZData: [1x0 double]
             ZDataSource: ''

let's focus on the probability distribution:

x=b_pdf_lognormal_fit.XData;
y=b_pdf_lognormal_fit.YData;
figure(2);
plot(x,y);
grid on;

the mean value is located somewhere between the 33rd and 34th elements of y

mean(y)
 =
   3.021789201683596
find(y>mean(y))
ans =
  Columns 1 through 13
     3     4     5     6     7     8     9    10    11    12    13    14    15
  Columns 14 through 26
    16    17    18    19    20    21    22    23    24    25    26    27    28
  Columns 27 through 31
    29    30    31    32    33

There are standard expressions for mean and variance of basic probability distributions. For lognormal I found, X be the data

mean: E(X)=expected(mu+sigma^2/2)
variance: V(X)=expected(2*mu+sigma^2/2)*(expected(sigma^2-1))

but I am not sure about this expression of variance, and since we have already obtained the pdf curve, we don't need to resort to cooking recipes, just apply

mean (1st statistical moment )

my=mean(y)
 =
   3.021789201683596

variance (2nd statistical moment)

vy=var(y)
 =
  17.421143231291438

standard deviation, the sigma you are after, from

http://www.amazon.com/Probability-Random-Variables-Stochastic-Processes/dp/0071226613/ref=sr_1_1?ie=UTF8&qid=1464030723&sr=8-1&keywords=Papoulis page 144

is:

   sy=(mean((y-my).^2)).^.5
     =
       4.152942547035599

standard deviation (your sigma): .. the standard deviation represents the root mean square value of the random variable around its mean ..

you may want to find the corresponding x coordinates, I can assist on this too.

If you find this answer of any help solving your question,

please click on the thumbs-up vote link,

thanks in advance

John

jgb2012@sky.com

2 Comments
Show None Hide None

Kash022 on 1 Jun 2016

@John: thanks a lot for the great response! Yes, the curves now seem to come out much better...however, a couple of things, 1. Yes, I would like to know the corresponding x values as they will tell me the actual mean of my variable and the 3-sgima distributions.. 2. It would be great if we could see these (mean and 3sigma) on the histogram plot! Thanks!

Kash022 on 1 Jun 2016

Edited: Kash022 on 1 Jun 2016

@John..please have a look at the attached image showing the lognormal plot for my distribution..for this I need the mean and 3 sigma values and if possible, it could be shown on the graph. Thanks! Please note...x axis has value in the xE-10 range

Sign in to comment.

Answer 3

Kash022 on 2 Jun 2016

1 vote

Sorry @John..your answer is way too big and confusing..honestly now I have no idea what to plot..would have been great if your answer would have been concise and to the point rather than so many lines where I seem to be lost..thanks for the effort anyways!

4 Comments
Show 2 older comments Hide 2 older comments

John BG on 15 Jun 2016

I keep a copy of script lines only for some answers, for this one I didn't.

The key lines, for any probability distribution y over x, are the following ones:

1. find mu, where 50% probability is located:

      pc_target=50
      n=2
      pc=sum(y([1:n]))/sum(y)*100
      while pc<pc_target
      	n=n+1
      	pc=sum(y([1:n]))/sum(y)*100
      end

mind the gap: n is the vector reference for both y and x. You want x be the reference of y, so be it, but to integrate it is easier to use n= 1 2 3 .. do you understand this?

the loop stops on n=1601, mu is located on y(1600) that happens for x(1600)

2. for instance to find -sigma, -34% below mu:

      pc_target=34.1
      n=1
      pc=sum(y2([1600-n:1600]))/sum(y2)*100
      while pc<pc_target
      	n=n+1
      	pc=sum(y2([1600-n:1600]))/sum(y2)*100
      end

this loop stops when the integration value is 34% below the previously found n=1600 the 50%

I named the variable after 'percentage target'

Regards

John BG

jgb2012@sky.com

John BG on 12 Mar 2017

Open in MATLAB Online

lognormal_pdf_123sigma_locations.m

Hi Kash022

I have abridged the above explanation in a function that may be of use, attached to this comment: lognormal_pdf_123sigma_locations.m

call example, replace b with data:

b = betarnd(3,10,100,1);  % simulated data, replace betarand() with real data
[mu0,sm1,sp1,sm2,sp2,sm3,sp3,y2,ny2]=lognormal_pdf_123sigma_locations(b)

this produces

1.

mu0: mean
sm1: sigma1 minus
sp1: sigma1 plus
sm2: sigma2 minus 
sp2: sigma2 plus
sm3: sigma3 minus
sp3: sigma3 plus

the lognormal probability distribution density approximation to the data, the 10^-10 can be scaled, do not pass the Armstrong order of magnitude to this function, scale it accordingly so that the x, the reference vector has a similar range to that of a betarnd(3,10,100,1).

2.

couple figures showing data, pdf approximation, and the location of the requested points:

.

3.

Copy of the function

function [mu0,sm1,sp1,sm2,sp2,sm3,sp3,y2,ny2]=lognormal_pdf_123sigma_locations(b)
% mu0: mean
% sm1: sigma 1 minus
% sp1: sigma 1 plus
% sm2: sigma 2 minus 
% sp2: sigma2 plus
% sm3: sigma 3 minus
% sp3: sigma 3 plus
% author: John Bofarull Guix, jgb2012@sky.com
% b = betarnd(3,10,100,1);  % simulated data, replace betarand() with real data
figure(1);
% h1=histfit(b,20,'lognormal') % test
h1=histfit(b,20,'lognormal')
b_histogram=h1(1);grid on;
b_pdf_lognormal_fit=h1(2)
y=b_pdf_lognormal_fit.YData;
sumy=sum(y);y=y/sumy;
ny=b_pdf_lognormal_fit.XData;
N=100;
y2=interp(y,N);y2=y2/sum(y2);
ny2=[1:1:numel(y)]*N;
pc_target=50;                                      %  mu: distribution mean  value
n=2
pc=sum(y2([1:n])) *100
while pc<pc_target
    n=n+1;pc=sum(y2([1:n])) *100;
end
mu=n;  
figure(2);plot(y2);hold all
figure(2);plot(mu,y2(n),'bo');grid on
pc_target=34; [s1_min,s1_plus]=go_get_it(pc_target,mu,y2);               % +- sigma
figure(2);plot(s1_min,y2(s1_min),'ro');
figure(2);plot(s1_plus,y2(s1_plus),'ro');
pc_target=47.5; [s2_min,s2_plus]=go_get_it(pc_target,mu,y2);          %  +-2*sigma
figure(2);plot(s2_min,y2(s2_min),'ro')
figure(2);plot(s2_plus,y2(s2_plus),'ro');
pc_target=49.7;[s3_min,s3_plus]=go_get_it(pc_target,mu,y2);            %  +-3*sigma
figure(2);plot(s3_min,y2(s3_min),'ro');
figure(2);plot(s3_plus,y2(s3_plus),'ro');
mu0=mu;
sm1=s1_min;
sp1=s1_plus;
sm2=s2_min;
sp2=s2_plus;
sm3=s3_min;
sp3=s3_plus;
function [sdown,sup]=go_get_it(pc_target,mu,y2)
% support function finds t1 t2 such sum(y([t1:mu]))=sum(y([mu:t2]))
n=1;pc=sum(y2([mu-n:mu])) *100;  % search - s
while pc<pc_target
        n=n+1;pc=sum(y2([mu-n:mu])) *100;
end
sdown=mu-n;
% figure(h);
% plot(h,sdown,y2(sdown),'ro');hold all;
n=1;pc=sum(y2([mu:mu+n])) *100;   % search + s
while pc<pc_target
        n=n+1;pc=sum(y2([mu:mu+n])) *100 ;
end
sup=n+mu;
% figure(2);
% plot(h,sup,y2(sup),'ro');
end
end

4.

Explanation:

in last year conversation the generation of the pdf curve was limited to the same x range of the generated data, that introduced error caused by step not small enough.

Reproducing the same code used earlier on

pc_target=50;                                      %  mu: distribution mean  value
n=2
pc=sum(y([1:n])) *100
while pc<pc_target
    n=n+1;pc=sum(y([1:n])) *100;
end
mu=n;  
figure(2);plot(y);hold all
figure(2);plot(mu,y(n),'bo');grid on
pc_target=34;                                                                        %  -sigma
n=1;
pc=sum(y([mu-n:mu])) *100;
while pc<pc_target
        n=n+1;pc=sum(y([mu-n:mu])) *100;
end
sigma1_min=mu-n;
figure(2);plot(sigma1_min,y(sigma1_min),'ro');
pc_target=34 ;                                                                    %  +sigma
n=1;
pc=sum(y([mu:mu+n])) *100 
while pc<pc_target
        n=n+1;pc=sum(y([mu:mu+n])) *100 ;
end
sigma1_plus=n+mu;
figure(2);plot(sigma1_plus,y(sigma1_plus),'ro');
pc_target=47.5;                                                                        %  -2*sigma
n=1;
pc=sum(y([mu-n:mu])) *100;
while pc<pc_target
        n=n+1;pc=sum(y([mu-n:mu])) *100;
end
sigma2_min=mu-n;
figure(2);plot(sigma2_min,y(sigma2_min),'go');
pc_target=47.5 ;                                                                    %  +2*sigma
n=1;
pc=sum(y([mu:mu+n])) *100 
while pc<pc_target
        n=n+1;pc=sum(y([mu:mu+n])) *100 ;
end
sigma2_plus=n+mu;
figure(2);plot(sigma2_plus,y(sigma2_plus),'go');
pc_target=49.7;                                                                        %  -3*sigma
n=1;
pc=sum(y([mu-n:mu])) *100;
while pc<pc_target
        n=n+1;pc=sum(y([mu-n:mu])) *100;
end
sigma3_min=mu-n;
figure(2);plot(sigma3_min,y(sigma3_min),'ko');
pc_target=49.7 ;                                                                    %  +3*sigma
n=1;
pc=sum(y([mu:mu+n])) *100 
while pc<pc_target
        n=n+1;pc=sum(y([mu:mu+n])) *100 ;
end
sigma3_plus=n+mu;
figure(2);plot(sigma3_plus,y(sigma3_plus),'ko');

.

the points are not wrongly located from the data point of view, the data is not accurate enough, and the rectangles approximating the integration are too wide.

.

checks

sum(y([mu:sigma1_plus]))
sum(y([sigma1_min:mu]))
sum(y([mu:sigma2_plus]))
sum(y([sigma2_min:mu]))
sum(y([mu:sigma3_plus]))
sum(y([sigma3_min:mu]))
sum(y([sigma3_plus:end]))
sum(y([1:sigma3_min]))
ans =
   0.343910033120536
ans =
   0.376779316893812
ans =
   0.475739109483067
ans =
   0.486722066501548
ans =
   0.499152952111189
ans =
   0.498547671119994
ans =
   0.034710670665522
ans =
   0.020295937115606

the first sum seems ok, 34% but the second should be 34% too, the 4th should be 47% and the last 2 tails should be 2.1% according to

.

hope it helps, and having a look to the other question you mentioned

John BG

<mailto:jgb2012@sky.com jgb2012@sky.com>

Sign in to comment.

Mean and 3-sgima for Lognormal distributions

0 Comments
Show -2 older comments Hide -2 older comments

Accepted Answer

1 Comment
Show -1 older comments Hide -1 older comments

More Answers (2)

2 Comments
Show None Hide None

4 Comments
Show 2 older comments Hide 2 older comments

Categories

Tags

Community Treasure Hunt

Mean and 3-sgima for Lognormal distributions

0 Comments Show -2 older comments Hide -2 older comments

Accepted Answer

1 Comment Show -1 older comments Hide -1 older comments

More Answers (2)

2 Comments Show None Hide None

4 Comments Show 2 older comments Hide 2 older comments

Categories

Tags

See Also

Community Treasure Hunt

0 Comments
Show -2 older comments Hide -2 older comments

1 Comment
Show -1 older comments Hide -1 older comments

2 Comments
Show None Hide None

4 Comments
Show 2 older comments Hide 2 older comments