How can I run a program using a dataset with missing values?

Hello all, I have a problem with my code. I get incorrect results when the dataset contains some 'inevitable' missing values. What can I do? I don't want to remove them. Do you have any ideas? Thank you, Doriana

8 Comments

Hi Doriana, please give a bit more information about the type of analysis which you are doing.
What exactly do you mean by missing values? Maybe setting the 'inevitable' values to NaN would help? If not, please provide more information.
Thank you Amir and Michael.
When I test my code on a dataset without missing values, it works correctly and the output contains n outliers. But when I build the input dataset from the database (which has many missing values), the program doesn't work correctly: the output contains no outliers.
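ds_double = double(ds(:,1:end));        % convert the dataset to a numeric matrix
MU = mean(ds_double);                   % column means (NaN if a column contains a NaN)
SIGMA = cov(ds_double);                 % covariance matrix
v = size(ds_double,2);                  % number of variables
Ytilde = bsxfun(@minus,ds_double, MU);  % centered data
d = sum((Ytilde/SIGMA).*Ytilde,2);      % squared Mahalanobis distance of each row
%UpperLimit = v+k*std(k);
r_out = find(d > UpperLimit);           % rows flagged as outliers
r_good = setdiff(1:size(ds_double,1),r_out);  % remaining rows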
Thank you
Still, we can only guess what you mean. Did you notice that you commented out the line that sets UpperLimit? As a result, UpperLimit keeps its old value even when ds changes. If that's not the origin of your problem, please tell us what "missing values" means. Is ds smaller than it's supposed to be? Is ds full of NaNs? Is there a possibility to estimate the missing values, e.g. by interpolation?
The problem isn't the UpperLimit.
The problem is that ds is full of NaN values.
OK, so it's NaNs. Would it then be sufficient to simply filter out the NaNs? If so, please have a look at the function nanmean and the related functions.
If not, do you have a model to estimate the missing values? Interpolation, in the simplest case?
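To illustrate why the NaNs lead to an empty outlier list, and what filtering them out could look like, here is a minimal sketch. The matrix X, the limit of 3, and the variable names are made up for illustration and are not from the original code. It assumes that dropping incomplete rows is acceptable; nanmean (Statistics Toolbox) or mean(X,'omitnan') in newer MATLAB releases are alternatives for the mean alone.

```matlab
% Made-up data: 5 observations, 2 variables, one missing value.
X = [1 2; 2 4; 3 NaN; 4 8; 50 9];

% Default behaviour: one NaN makes the column mean NaN, so the centered
% second column is NaN in every row and every distance becomes NaN.
Y_default = bsxfun(@minus, X, mean(X));
d_default = sum((Y_default / cov(X)) .* Y_default, 2);

% Any comparison with NaN is false, so no row is ever flagged.
r_out_default = find(d_default > 3);   % hypothetical limit of 3

% Filtering: keep only the rows that contain no NaN at all.
complete = ~any(isnan(X), 2);
Xc = X(complete, :);
Yc = bsxfun(@minus, Xc, mean(Xc));
d_clean = sum((Yc / cov(Xc)) .* Yc, 2);  % finite distances for the complete rows
```

The first computation may print a warning about a badly conditioned matrix, because the covariance itself contains NaN. The indices in d_clean refer to the rows of Xc, so find(complete) is needed to map them back to rows of X.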
I can't filter out the NaNs. I have to estimate the missing values. Can I use interpolation?
Only you can answer that question. You seem to be computing statistics, and I'm not sure interpolation makes much sense in this context. If you think it does, you can use the following lines (1D only):
function y = interp1_nan(x,y)
% Replace NaNs in y by linear interpolation (and extrapolation) over x.
y(isnan(y))=interp1(x(~isnan(y)),y(~isnan(y)),x(isnan(y)),'linear','extrap');
end
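As a quick check of what this does, here is the same interpolation step written inline on a made-up vector. The values of x and y are assumptions chosen only so the result is easy to verify by hand, and the names missing and y_filled are introduced for this illustration.

```matlab
% Made-up 1D example: y follows 2*x, with two values missing.
x = (1:6)';
y = [2; NaN; 6; 8; NaN; 12];

missing = isnan(y);          % logical mask of the gaps
y_filled = y;

% Interpolate linearly at the missing positions, using only the known points.
y_filled(missing) = interp1(x(~missing), y(~missing), x(missing), 'linear', 'extrap');
```

For a matrix such as ds_double, the same step would have to be applied column by column, and it needs at least two known values per column. Whether filling values this way is statistically sensible for outlier detection remains the open question raised above.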

Answers (0)

This question is closed.

Asked: 5 Aug 2014

Closed: 20 Aug 2021
