How to best determine the probability of a distribution given an outlying observation?

Question

Hi,

I have a classification problem. I have a set of data from a reference process (let's call that "known") and a set of data from a second process (let's call that "test").

Hypothesis 0 is that the test sample came from the same process as the "known" data, and therefore has the same distribution.

Hypothesis 1 is that the test sample came from a different process. However, here is the catch: for all but one sample, this process has the same distribution as the "known" one. Just one sample will be "suspiciously" low.

I will add a picture to better explain:

Here, the red histogram is the reference "known" distribution and the blue histogram is the questioned "test" distribution. In this case, I already know that the test data came from a different process. Because of the overlap it may not be entirely clear, but the two distributions match quite well, except for a single blue sample that is suspiciously low.

What I need now is a method that takes the two distributions and returns the probability of observing a value as low as the extreme blue one, given that the data follow the "known" distribution. I know how to calculate the probability of a single particular observation, but how do I properly account for the number of observations? Would a KS test alone be appropriate? It strikes me as stats 101, but it's been a while, and I don't want to get this wrong.

Thanks in advance.

Answer 1

Ilya on 12 Sep 2012

Edited: Ilya on 12 Sep 2012

If you know the reference distribution analytically, you can compute its cdf at the smallest observed value. Call this cdf value p. The p-value for your test is then one minus the binomial probability of observing no successes in N trials, where N is the sample size and p is the success probability. That is, it is 1-(1-p)^N: the probability that the minimum of N independent draws from the reference distribution is at or below the observed minimum.
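Here is a minimal Python sketch of this calculation. It assumes the reference is a standard normal, and Python's `statistics.NormalDist` stands in for whatever analytic cdf you actually have. `min_pvalue` is a hypothetical helper name. Only the sample minimum and the sample size N enter the formula, and this is how the number of observations is accounted for. When p is small, 1-(1-p)^N is close to N*p, so a value that is rare for one draw is about N times more likely to show up somewhere in a sample of N.

```python
from statistics import NormalDist


def min_pvalue(sample, cdf):
    """P(min of N iid reference draws <= observed min) = 1 - (1 - F(min))**N."""
    n = len(sample)
    p = cdf(min(sample))          # chance that a single draw is this low or lower
    return 1.0 - (1.0 - p) ** n   # chance that at least one of n draws is this low


# Assumption: the reference distribution is a standard normal.
reference = NormalDist(mu=0.0, sigma=1.0)

# Only the minimum (-4) and the length (100) matter for this statistic.
sample = [-4.0] + [0.0] * 99
print("P(one draw <= -4):        ", reference.cdf(-4.0))
print("P(min of 100 draws <= -4):", min_pvalue(sample, reference.cdf))
```

For N = 100, a minimum of -4 gives a p-value of about 0.003, compared with about 3e-5 for a single draw.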

Tim on 19 Sep 2012

Oh, so obvious now! Thank you. I was over-thinking it with the variance of the variance and all that jazz. My only excuses are lack of sleep and rusty stats - honestly, I avoid them when I can.

Answer 2

per isakson on 12 Sep 2012

See: FBD - "Find the Best Distribution" tool in the File Exchange

Tim on 12 Sep 2012

Thanks for your answer, per, but I'm not sure this is what I'm looking for. I'll try to clarify with a simple code example.

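KnownSet = randn(1000,1);       % reference sample: 1000 draws from N(0,1)
TestSet1 = randn(100,1);        % 100 draws from the same distribution
TestSet2 = [randn(99,1); -4];   % 99 draws from N(0,1) plus one outlier at -4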

In this case, I know that all three sets of data are mostly drawn from the same Gaussian distribution. However, TestSet2 has an outlier. The value -4 is very unlikely, and I'm hoping to use that single outlying value to give a probability that each TestSet comes purely from the same distribution as KnownSet. Here, TestSet1 should have a high "p-value", and TestSet2 should have a low "p-value" and be rejected. I use the term p-value, but there may be a more appropriate term.

FBD would help me determine the distribution of KnownSet (which I can assume is, at least for the most part, the same as that of the TestSets), but that is only the first step. How do I go from there to determining how likely or unlikely the set of observations is, given the distribution and given the outlier?
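Below is a sketch of the full workflow on the example above, ported to Python under two assumptions. First, the reference is Gaussian, so fitting its mean and standard deviation with `statistics.NormalDist.from_samples` substitutes for a distribution-fitting tool such as FBD. Second, the test statistic is the minimum-based p-value 1-(1-p)^N from Answer 1. The variable names mirror the MATLAB ones, and the seed is arbitrary. `min_pvalue` is a hypothetical helper.

```python
import random
from statistics import NormalDist


def min_pvalue(sample, ref):
    """1 - (1 - F(min))**N, where F is the cdf of the reference distribution."""
    return 1.0 - (1.0 - ref.cdf(min(sample))) ** len(sample)


random.seed(0)  # fixed seed so the run is repeatable
known_set = [random.gauss(0.0, 1.0) for _ in range(1000)]
test_set1 = [random.gauss(0.0, 1.0) for _ in range(100)]
test_set2 = [random.gauss(0.0, 1.0) for _ in range(99)] + [-4.0]

# Step 1: estimate the reference distribution from KnownSet.
# Assumption: it is Gaussian, so a mean/stdev fit stands in for FBD.
reference = NormalDist.from_samples(known_set)

# Step 2: ask how surprising each test set's smallest value is under that fit.
for name, s in [("TestSet1", test_set1), ("TestSet2", test_set2)]:
    print(f"{name}: min = {min(s):.2f}, p = {min_pvalue(s, reference):.4f}")
```

Because the reference parameters are estimated from KnownSet, the p-value is approximate. Under H0 it is also roughly uniformly distributed, so a clean set like TestSet1 will fall below 0.05 about 5% of the time rather than always scoring "high".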
