How to detect and remove outliers?

Hi,
I have a problem detecting outliers in a set of data. Let's say I have two arrays x and y, and y is a quadratic function of x. Some of the values of y do not follow this function. How can I detect them?
I had a look at the `rmoutliers` function, but it doesn't seem to solve this problem since it only deals with normally distributed data.

4 Comments

"but it doesn't seem to solve this problem since it only deals with normally distributed data"
That doesn't seem true to me. There are several method options, some of which don't assume normality.
What does "have a look at" mean? Did you actually try it or did you just look at the help? And it can handle data that is not normally distributed. Did you try it and it didn't work? What options did you try? Attach your data if you can't find the proper options to get it to work. Point out the outliers, unless they're really obvious.
Yes, I've already tried all the methods of rmoutliers and they didn't work. This is my data set:
x = [-485.35 144.42 623.97 1178.63 1733.29 2287.95 2842.61]
y = [47.85 190.25 164.98 196.69 206.16 186.53 154.81]
The outlier is obviously the 2nd element in y.
x = [-485.35, 144.42, 623.97, 1178.63, 1733.29, 2287.95, 2842.61]
y = [47.85 190.25 164.98 196.69 206.16 186.53 154.81]
plot(x, y, 'b.-', 'LineWidth', 2, 'MarkerSize', 30);
grid on;
No, not obviously. The first, leftmost point could just as well be an outlier as the second point. If your data are all expected to be 170 +/- 50, then the first point is an outlier.
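
The point above can be made concrete with a minimal sketch. If the model is simply "all values lie within 170 +/- 50" (the figures from the comment, used here purely as an illustrative assumption), a logical test flags the first point rather than the second. The names `center`, `halfWidth`, `isOut`, and `outlierIdx` are made up for this sketch.

```matlab
% Data posted earlier in the thread
x = [-485.35 144.42 623.97 1178.63 1733.29 2287.95 2842.61];
y = [47.85 190.25 164.98 196.69 206.16 186.53 154.81];

% Assumed model: every value is expected to lie within center +/- halfWidth
center = 170;      % illustrative value taken from the comment above
halfWidth = 50;    % illustrative tolerance taken from the comment above

% Logical mask of points outside the expected band
isOut = abs(y - center) > halfWidth;

% Indices of the flagged points; here only index 1 (y = 47.85) is flagged
outlierIdx = find(isOut);
```

Which point counts as an outlier therefore depends entirely on the model the data are compared against.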


Answers (1)

Cris LaPierre on 31 Dec 2020
You can interactively explore many of the options using the Clean Outlier Data task in a live script.
Another option is to apply your function to X and then take the absolute difference between the result and Y. You can then select a threshold and use logical indexing to identify any values that deviate by more than that threshold.
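
A minimal sketch of this second option, assuming the model is known in advance. The quadratic `f`, its coefficients, the synthetic data, and the threshold of 10 are all hypothetical values chosen for illustration; they do not come from the question.

```matlab
% Hypothetical known model (coefficients are made up for illustration)
f = @(x) -2*x.^2 + 3*x + 5;

% Synthetic data that follows the model exactly, except for one corrupted point
x = 0:6;
y = f(x);
y(4) = y(4) + 50;          % deliberately push the 4th value off the curve

% Absolute difference between the model prediction and the data
dev = abs(f(x) - y);

% Flag values that deviate by more than the chosen threshold
threshold = 10;            % hypothetical; choose it from the expected noise level
isOut = dev > threshold;

% Logical indexing keeps only the points that were not flagged
xClean = x(~isOut);
yClean = y(~isOut);
```

With real measurements the threshold has to be set from what is known about the noise, since every point will deviate from the model by some amount.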

2 Comments

Thanks for your reply
I've tried the Clean Outlier Data task and none of the available outlier detection methods worked for me. Basically, I am trying to study some material behavior, which is represented by "y". I use an optimization method to get the values of y as a function of x. Because the optimization sometimes doesn't yield accurate results, I get outliers. The relationship I am expecting should follow a nearly quadratic function, but the coefficients of this function vary with the provided data set, so I can't use your second suggestion.
You need to have some sort of model to compare against to determine an outlier. Without that, I don't see any way to automate the process.
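
If the only prior knowledge is that the relationship is nearly quadratic, one possible model is a quadratic fitted to the data themselves. The sketch below is one way to try that, not a method proposed in this thread: it leaves out each point in turn, fits a second-order polynomial to the rest with `polyfit`, and records how well the remaining points agree with that fit. The point whose removal gives the smallest residual is a candidate outlier. It assumes at most one outlier and that a quadratic is an adequate model; the variable names are illustrative.

```matlab
% Data posted in the question's comments
x = [-485.35 144.42 623.97 1178.63 1733.29 2287.95 2842.61];
y = [47.85 190.25 164.98 196.69 206.16 186.53 154.81];

n = numel(x);
rmsWithout = zeros(1, n);   % fit quality when point k is left out

for k = 1:n
    keep = true(1, n);
    keep(k) = false;

    % Quadratic fit to the remaining points; mu centers and scales x
    % to keep the fit well conditioned
    [p, ~, mu] = polyfit(x(keep), y(keep), 2);

    % RMS residual of the remaining points about their own fit
    res = y(keep) - polyval(p, x(keep), [], mu);
    rmsWithout(k) = sqrt(mean(res.^2));
end

% Candidate outlier: leaving it out gives the most consistent quadratic
[~, candidate] = min(rmsWithout);

% Refit without the candidate and measure how far it sits from that curve
keep = true(1, n);
keep(candidate) = false;
[p, ~, mu] = polyfit(x(keep), y(keep), 2);
deviation = abs(y(candidate) - polyval(p, x(candidate), [], mu));
```

Whether `deviation` is large enough to call the point an outlier still needs a threshold tied to the expected noise, and with only seven points the result should be checked by eye against a plot.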


Asked: 31 Dec 2020

Commented: 1 Jan 2021
