Binary classification
Hey all!
My question: is it possible to use classification methods to determine whether an unknown sample fits the distribution of known samples?
I have a known dataset that describes a distribution of object parameters (various circles with properties such as circularity, area, perimeter, solidity, etc.). Each row is an independent sample, and each column is a parameter. I need a function that determines whether a new sample is a circle or not. From what I have seen of classification, you need to specify every class; there is no "everything else" class. What would be the best way to decide whether a new object is a circle (circle is really just an example here), with an error or confidence measure on the decision?
Regards,
Olivier
Accepted Answer
Ilya
on 24 May 2012
You might want to start here: http://en.wikipedia.org/wiki/One-class_classification The first reference (a PhD thesis) gives an overview of methods.
There are no utilities in the official MATLAB release you could use right away, but it would be fairly easy to code some of the reviewed methods. For example, in ascending order of complexity:
- Assume that the predictors (columns) are uncorrelated and compute the distance between a new sample (row) and the mean of the training set (the set of known samples). Compare it with the reference distribution obtained by taking the distance between every row in the training set and the mean of all the other rows.
- Assume that the known samples come from a mixture of Gaussian distributions. Fit this mixture using gmdistribution from Statistics Toolbox. Compute the Mahalanobis distance between the new sample and every Gaussian component, and estimate a probability by assuming a chi-squared distribution for the squared Mahalanobis distance.
- Find the k nearest neighbors of every sample in the training set using knnsearch, and compute the distribution of the average distance between each sample and its k nearest neighbors. Then find the k nearest neighbors of the new sample in the training set, take the average of those distances, and compare it to the reference distribution.
And so on. If your training set is pure (all objects really are circles) and your data are low-dimensional, you have plenty of methods at your disposal. Without purity, or in high dimensions, the problem becomes substantially harder.
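The first method (distance to the training mean, compared against a leave-one-out reference distribution) could be sketched as follows. This is only a sketch: `X` (the n-by-p training matrix), `xnew` (the new 1-by-p sample), and the 0.05 cutoff are illustrative, not from the thread.

```matlab
% Method 1 sketch: Euclidean distance to the training mean, with a
% leave-one-out reference distribution. X is n-by-p, xnew is 1-by-p.
mu = mean(X, 1);
dNew = norm(xnew - mu);

n = size(X, 1);
dRef = zeros(n, 1);
for i = 1:n
    others = X([1:i-1, i+1:n], :);        % every row except row i
    dRef(i) = norm(X(i,:) - mean(others, 1));
end

% Empirical p-value: fraction of training distances at least as large
pval = mean(dRef >= dNew);
% A small pval (say < 0.05) suggests xnew does not fit the known class.
```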
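The Gaussian-mixture method could look roughly like this. The number of components `k` is a modeling choice, and the variable names are illustrative; `gmdistribution.fit` was the fitting entry point in 2012-era releases (`fitgmdist` in newer ones).

```matlab
% Method 2 sketch: fit a Gaussian mixture, then test the new sample
% against each component via its squared Mahalanobis distance.
k = 2;                            % number of components (a choice)
gm = gmdistribution.fit(X, k);    % fitgmdist(X, k) in newer releases
d2 = mahal(gm, xnew);             % 1-by-k squared Mahalanobis distances
p = size(X, 2);                   % number of predictors
% For a p-dimensional Gaussian, squared Mahalanobis distance ~ chi2(p)
pvals = 1 - chi2cdf(d2, p);
% Accept xnew if it is plausible under at least one component
isCircle = any(pvals > 0.05);
```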
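And the kNN method, sketched with the same illustrative names (`X`, `xnew`) and an arbitrary `k`. Note the `k+1` when searching the training set against itself: each row's nearest neighbor is itself, so the self-match column is dropped.

```matlab
% Method 3 sketch: average distance to k nearest neighbors, compared
% against the same statistic computed over the training set.
k = 5;                                     % number of neighbors (a choice)
[~, dTrain] = knnsearch(X, X, 'K', k+1);   % k+1: first match is the row itself
refAvg = mean(dTrain(:, 2:end), 2);        % drop the self-match column

[~, dNew] = knnsearch(X, xnew, 'K', k);
newAvg = mean(dNew);

pval = mean(refAvg >= newAvg);             % empirical p-value, as above
```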
More Answers (1)
Walter Roberson
on 24 May 2012
It might be possible with some classifiers, but not with most.
Some classifiers simply split the space into two half-spaces with a hyperplane, and assign the class according to which side of the hyperplane a sample falls on.
Other classifiers provide a probability of belonging to a particular class, but those probabilities are never exactly 0. You could, of course, arbitrarily declare that a sample belongs to neither class if the probability of membership is "small enough" for both classes.
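That rejection idea could be sketched like this, assuming some trained classification model `mdl` whose `predict` returns per-class posterior scores (e.g. a naive Bayes or kNN model); the 0.9 threshold is arbitrary and would need tuning.

```matlab
% Sketch: reject a sample as "unknown" when no class posterior is high
% enough. mdl is an already-trained classifier; xnew is a 1-by-p sample.
[label, score] = predict(mdl, xnew);   % score: posterior per known class
if max(score) < 0.9                    % arbitrary confidence threshold
    label = {'unknown'};               % confident in neither class
end
```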