How to check if data is normally distributed
Hi all,
I want to run an F-test on two samples to see if their variances are independent. Wikipedia says that the F-test is sensitive to non-normality of the samples (http://en.wikipedia.org/wiki/F-test). How can I check whether my samples are normally distributed?
I read some forums which said I can use kstest and lillietest. When can I use either? I get an answer h=0. Does that mean my data is normally distributed?
Thanks. Nancy
Accepted Answer
More Answers (2)
Sean
on 7 Aug 2012
Hello Nancy,
You cannot tell from only 2 samples whether they are normally distributed or not. If you have a larger sample set and you are only testing them in pairs, then you could use the larger sample set to test for a particular distribution.
For example: (simple q-q plot)
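data = randn(100);   % data under test: 100x100 matrix of normally distributed values
ref1 = randn(100);   % reference 1: 100x100 matrix of normally distributed values
ref2 = rand(100);    % reference 2: 100x100 matrix of uniformly distributed values
x  = sort(data(:));  % sort each set so that matching quantiles line up
y1 = sort(ref1(:));
y2 = sort(ref2(:));
subplot(1,2,1); plot(x,y1); title('data vs. normal reference');
subplot(1,2,2); plot(x,y2); title('data vs. uniform reference');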
The first plot should be a straight line (indicating that the data distribution matches the reference distribution). The second plot isn't a straight line, indicating that the distributions do not match.
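As a variant of the same idea, and only as a sketch: instead of a second random sample, the sorted data can be plotted against the theoretical quantiles of a standard normal distribution, and the straightness of the line can be summarized by a correlation coefficient. The plotting positions (i-0.5)/n, the seed, and the variable names are assumptions chosen for illustration. The quantiles are computed with base MATLAB's erfcinv, so no toolbox is needed.

```matlab
rng(0);                              % fix the seed so the run is repeatable
n = 1000;
xNorm = sort(randn(n,1));            % sample that really is normal
xUnif = sort(rand(n,1));             % sample that is uniform, not normal

p = ((1:n)' - 0.5) / n;              % plotting positions (an assumed convention)
q = -sqrt(2) * erfcinv(2*p);         % standard normal quantiles at p

% Correlation between sorted data and normal quantiles:
% the closer to 1, the straighter the q-q line.
cN = corrcoef(q, xNorm);  rNorm = cN(1,2);
cU = corrcoef(q, xUnif);  rUnif = cU(1,2);

subplot(1,2,1); plot(q, xNorm, '.'); title(sprintf('normal sample, r = %.4f', rNorm));
subplot(1,2,2); plot(q, xUnif, '.'); title(sprintf('uniform sample, r = %.4f', rUnif));
```

The normal sample should give a correlation very close to 1, while the uniform sample bends away from the line at both ends and gives a visibly lower value.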
3 Comments
Nancy
on 7 Aug 2012
The fewer points you have available, the less definitive the test is. If you run the previous set of sample code for a smaller set of data and reference points you should see what I mean (e.g. the shape of the lines is less well defined and more affected by random noise with a smaller sample set).
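A rough way to see this effect numerically, assuming the straightness of a normal q-q line is measured by the correlation between the sorted data and standard normal quantiles. The sample sizes, the number of repetitions, and the seed below are arbitrary choices for illustration, and the data is truly normal in every repetition.

```matlab
rng(1);                               % fix the seed so the run is repeatable
sizes = [10 100 1000];                % sample sizes to compare
nRep  = 200;                          % repetitions per sample size (arbitrary)
rMin  = zeros(size(sizes));
rStd  = zeros(size(sizes));
for k = 1:numel(sizes)
    n = sizes(k);
    p = ((1:n)' - 0.5) / n;           % plotting positions (an assumed convention)
    q = -sqrt(2) * erfcinv(2*p);      % standard normal quantiles
    r = zeros(nRep,1);
    for j = 1:nRep
        c = corrcoef(q, sort(randn(n,1)));   % normal data every time
        r(j) = c(1,2);
    end
    rMin(k) = min(r);                 % least straight line seen at this size
    rStd(k) = std(r);                 % run-to-run spread at this size
end
disp([sizes; rMin; rStd]);            % rows: sample size, worst r, spread of r
```

Even though every sample is normal, the smallest sample size should show a noticeably lower worst-case correlation and a larger spread, which is why a q-q plot of a handful of points is hard to interpret.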
Regarding a test for independence... you might try scatter plotting them with respect to each other.
For example:
data1=randn([100,1]);
data2=(data1.^2-3*data1+5)+0.01*randn([100,1]);
%data2 is a function of data1 + noise
ref=randn([100,1]);
subplot(1,2,1);scatter(data1(:),ref(:));
subplot(1,2,2);scatter(data1(:),data2(:));
As you can see, the points for the independent reference variable are scattered across the plot with no pattern, whereas the relationship between the two data samples is clearly evident.
Another way to look at this would be:
subplot(1,2,1);plot(conv(data1,data2))
subplot(1,2,2);plot(conv(data1,ref))
Note: I have not vetted/proved these methods in a rigorous way, so I would use them with the understanding that they MAY reveal some dependencies, but this isn't guaranteed, especially if there is a real but weak relationship or a time-delayed relationship.
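A numeric complement to the scatter plots, as a sketch: base MATLAB's corrcoef returns the linear correlation coefficient together with a p-value for the hypothesis of no correlation. The seed and variable names are assumptions for illustration. The same caveat applies here: correlation only measures linear association, so a purely quadratic relationship can give a value near zero even though the variables are fully dependent.

```matlab
rng(2);                                               % repeatable run
n = 100;
data1 = randn(n,1);
data2 = (data1.^2 - 3*data1 + 5) + 0.01*randn(n,1);   % a function of data1 + noise
ref   = randn(n,1);                                   % generated independently of data1

[Rdep, Pdep] = corrcoef(data1, data2);                % 2x2 matrices; off-diagonal is the pair
[Rref, Pref] = corrcoef(data1, ref);
rDep = Rdep(1,2);  pDep = Pdep(1,2);
rRef = Rref(1,2);  pRef = Pref(1,2);

fprintf('data1 vs data2: r = %.3f, p = %.3g\n', rDep, pDep);
fprintf('data1 vs ref:   r = %.3f, p = %.3g\n', rRef, pRef);
```

The dependent pair should show a strong negative correlation (driven by the -3*data1 term) with a very small p-value, while the independent pair should give a correlation near zero.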
Nancy
on 7 Aug 2012
Sarutahiko
on 11 Dec 2013
1 vote
Assuming you agree with the Anderson-Darling test for normality, I'd just use MATLAB's prebuilt function for that, adtest: http://www.mathworks.com/help/stats/adtest.html
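A minimal sketch of how these tests might be called, assuming the Statistics Toolbox is installed. The samples x, y and w are made up for illustration. In all of these functions h = 0 means the test failed to reject the null hypothesis at the default 5% level, which is weaker than proof that the data is normal. kstest compares against a fully specified distribution (a standard normal by default), while lillietest and adtest estimate the mean and variance from the data, which is usually the situation with real samples.

```matlab
rng(3);                          % repeatable run
x = 5 + 2*randn(200,1);          % hypothetical sample 1 (normal)
y = 1 + 3*randn(200,1);          % hypothetical sample 2 (normal, larger variance)
w = -log(rand(200,1));           % hypothetical exponential (non-normal) sample

[hAD, pAD] = adtest(x);          % Anderson-Darling, parameters estimated from x
[hL,  pL ] = lillietest(x);      % Lilliefors, parameters estimated from x
z = (x - mean(x)) / std(x);      % kstest needs standardized data by default
[hKS, pKS] = kstest(z);          % conservative here, since parameters were estimated
hW = adtest(w);                  % should reject normality for the exponential sample

% If both samples look normal, the two-sample F-test compares their variances.
[hF, pF] = vartest2(x, y);       % h = 1 rejects the hypothesis of equal variances

fprintf('adtest h=%d p=%.3g, lillietest h=%d, kstest h=%d\n', hAD, pAD, hL, hKS);
fprintf('adtest on exponential h=%d, vartest2 h=%d p=%.3g\n', hW, hF, pF);
```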