generate a random variable that is log normal distributed and has a correlation coefficent

6 views (last 30 days)
I am trying to generate random shear wave velocity profiles. I have the mean and standard deviation for several different layers and the correlation coefficient between them. for example the top layer has a mean of 230 a standard deviation of 40. The second layer has a mean of 290 and stdev of 20. The correlation coefficient between layers is 0.7. How can I generate random profiles that include the correlation.
  1 Comment
Image Analyst
Image Analyst on 1 Mar 2016
Edited: Image Analyst on 1 Mar 2016
Are you using a normal distribution? Or log-normal like your tag? Are you using random() or randn()? Good question. I know how to get a set of random numbers, but not how to make sure that that first set has a specified correlation with some second set of random numbers.

Sign in to comment.

Accepted Answer

John D'Errico
John D'Errico on 1 Mar 2016
Edited: John D'Errico on 1 Mar 2016
Not at all trivial to do. I assume that the mean and standard deviation that you refer to are the actual mean and std, not the mean and std of the underlying logged variable. In fact, it is common to parameterize the lognormal in terms of the mean and standard deviation of the logged variable. Then the lognormal variate ends up as
exp(mu + sigma*Z)
where Z is a standard normal variate. For example, see this link:
You can go backwards. Thus, given the mean and standard deviation of the lognormal, you can recover the effective underlying parameters in the logged domain. For example, given a lognormal that has mean 230, and standard deviation of 40, solve for sigma here:
40/230 = sqrt(exp(sigma^2) - 1)
so
sigma = sqrt(log(545/529))
sigma =
0.17262
Then we compute mu, the mean in the logged domain as:
230 = exp(mu + sigma^2/2)
Solving for mu, we get mu = 5.4232.
We can verify that the above values do indeed generate the desired distribution statistics.
mean(exp(randn(1,1e7)*sigma + mu))
ans =
229.99
std(exp(randn(1,1e7)*sigma + mu))
ans =
39.996
As for generating multiple lognormals that have a given correlation coefficient, this will be less than trivial.
You might try starting with correlated unit normals, then convert to lognormal as I describe above. The correlations won't be correct, but they may not be too terribly far off.
As an example, with two layers (as I point out, you have offered insufficient information if there are more than two laters) here is a simple way to generate samples that have the APPROXIMATE distribution that you specify.
These functions convert actual distribution mean and variance into the lognormal parameters, mu, sigma.
muconv = @(m,v) log(m/sqrt(1+v/m^2))
sigmaconv = @(m,v) sqrt(log(1+v/m^2))
% sample correlation matrix
C = [1 .7;.7 1]
mu1 = muconv(230,40^2);
mu2 = muconv(290,20^2);
sigma1 = sigmaconv(230,40^2);
sigma2 = sigmaconv(290,20^2);
Note that use of mvnrnd is valid here, since a correlation matrix may be treated as simply a covariance matrix with unit variances.
sample = mvnrnd([0 0],C,1000000);
sample = bsxfun(@plus,bsxfun(@times,sample, [sigma1,sigma2]),[mu1,mu2]);
sample = exp(sample);
How well did we do?
mean(sample)
ans =
230.02 289.98
std(sample)
ans =
40.037 20.018
corrcoef(sample)
ans =
1 0.6966
0.6966 1
% Histograms of the two layers hist(sample,100)
So, the sample as generated did indeed have (roughly) the desired parameters.
  2 Comments
John D'Errico
John D'Errico on 1 Mar 2016
Edited: John D'Errico on 1 Mar 2016
Note: I cannot even show you how to generate numbers as you wish if there are more than two layers, because your information is insufficient in that case. You tell us that the correlation between adjacent layers is 0.7, but you fail to tell us the remainder of the correlations.
C(1,2) = 0.7
C(2,3) = 0.7
But what is the correlation coefficient for
C(1,3) = ???
In fact, this number may take on a variety of values, and may be any number between the limits [-1/50,1]. Thus if we do...
% u must lie in [-1,1] as a correlation coef
syms u
C = [1 .7 u;.7 1 .7;u .7 1]
C =
[ 1, 7/10, u]
[ 7/10, 1, 7/10]
[ u, 7/10, 1]
E = eig(C)
E =
1 - u
u/2 - (u^2 + 98/25)^(1/2)/2 + 1
u/2 + (u^2 + 98/25)^(1/2)/2 + 1
Those eigenvalues MUST be non-negative, if C is a valid correlation matrix. That happens only when u lies in the interval [-1/50,1].
Thomas Barham
Thomas Barham on 1 Mar 2016
Thanks John this is helpful. The correlation between 1,3 and any subsequent layer is the square of the previous correlation. so for between layer 1 and 2 0.7, and between 1 and 3 0.49. Thanks

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!