How to best determine the probability of a distribution given an outlying observation?
Show older comments
Hi,
I have a classification problem. I have a set of data from a reference process (let's call that "known") and a set of data from a second process (let's call that "test").
Hypothesis 0 is that the test sample came from an identical process as the "known", and will therefore have the same distribution.
Hypothesis 1 is that the test sample came from a different process. However, here is the catch: for all but one sample, this process has an identical distribution to the "known". Just one sample will be "suspiciously" low.
I will add a picture to better explain:

In this case, the red histogram is the reference "known" distribution. The blue histogram is the questioned "test" distribution. In this case, I already know that the test came from a different process. It might not be completely clear due to the overlaying, but it can be seen that the distributions pretty well match, except for a single blue sample which is suspiciously low.
What I need now is to take each distribution and work out some method of returning a probability that the extremely low blue value would be observed given the distribution is the "known" distribution. I know how to calculate the probability of a particular single observation, but how do I properly balance this with the number of observations? Would just a KS test be appropriate? It strikes me as stats 101, but it's been a while, and I don't want to get this wrong.
Thanks in advance.
Accepted Answer
More Answers (1)
Categories
Find more on Hypothesis Tests in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!