Normality test of the Matlab function 'randn'
Show older comments
Hi Matlab community,
I have generated 2000 samples using Matlab function 'randn', and was trying to test it for normality. The h-value return 0 which is expected, however, the p-value is too high! What is going wrong here? Please see the below simple example:
n = randn(1,2000);
[h,p] = lillietest(n)
h =
0
p =
0.4115 %%% Why it is too high?
[h,p] = kstest(n)
h =
logical
0
p =
0.8158 % Same here!
Accepted Answer
More Answers (2)
I always use the Wilks-Shapiro test(*) and have little experience with Lillifors test, but some trials with various N have convinced me the MATLAB PRNG for normal isn't much good...especially in the tails. Without more digging than had time for right now, it's not clear which generator it is using by default to get a normal; it says it uses the Mersene Twister for the default uniform stream; it doesn't easily relate to the actual normal generation algorithm that I found quickly, anyways.
I increased the sample size to 10K and it's obvious even from just a histogram the distribution is skewed in the tails...

One observes a lot of white under the theoretical on the left, not nearly so much on the right...for this particular instantiation.
Looking at the probability plot it's a lot more obvious, but the tails caught my eye just from the observed histogram w/o the pdf overlaid on it. As John d' Errico noted just a week or so ago in reply to a question raised about similar observation of unusual result from a chi-square test, "when you have a lot of data, it's far easier to reject the hypothesis" because then the data have to be really, really normal. These aren't quite enough to actually reject the hypothesis, but they're not all that close -- and, in fact, the MATLAB implementation just caps the return p value at 0.5 in this case.

This really shows up the shortness of observation in the LH tails region and, to a lesser extent a slight overage in the right. But, the deviation really begins to show up at about the 0.5% tails regions, not just in the extreme tails as the bulk of the asterisks by that point are above/below the line.
(*) A paper on power of various tests for normality I was made aware of via my past reactor background and interaction with NRC is
And, my link back to the NIST discussion and the other respondents Q? is https://www.mathworks.com/matlabcentral/answers/850710-confusino-regarding-statistical-tests-for-given-distribution?s_tid=srchtitle
3 Comments
Is 10000 points enough to see the tails? I tried a bigger number and it looks better, at least to my eye:
rng(100);
data = randn(1e7,1);
x = -4:.001:4; p = normpdf(x,0,1);
figure;
histogram(data,'Normalization','pdf');
hold on
plot(x,p,'r');
axis([-4 -3 0 0.002])
figure;
histogram(data,'Normalization','pdf');
hold on
plot(x,p,'r');
axis([3 4 0 0.002])
figure
normplot(data)
MahdiH
on 23 Jun 2021
Peter Perkins
on 24 Jun 2021
Madhi, as I said in my reply below, your expectations of what value the p-value "should have" seem misguided. Expect to see p-values uniformly dist'd between 0 and 1 when drawing from the null hypothesis.
MahdiH
on 22 Jun 2021
3 Comments
dpb
on 22 Jun 2021
I'm not sure without more digging what would be better options -- I did find a blog by Cleve that talks about the MATLAB algorithm implementing George Marsaglia's ziggurat generator, but it doesn't address just how reliable it really is in practice.
I've really been out of the simulation racket for 20 years so have lost track of what is recent.
I am surprised to see as much deviance in these estimates as are; seems like used to get better plots years ago from the IMSL libraries that used on the mainframes back when that was something was doing routinely. But, it's been a long, long time now since I did MC simulation "in anger"...
dpb
on 23 Jun 2021
Mayhaps John d' Errico will stumble across this thread -- he's a far better mathematician than I and while retired is quite a bit more recently active than I.
He probably would have some salient input...
Peter Perkins
on 24 Jun 2021
Depending on when way back actually was, IMSL on mainframes likely used a multiplicative congruential generator; I don't think we want to go back to those days.
Categories
Find more on Hypothesis Tests in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!



