Finding the predominant value in multiple sets of data

I have multiple sets of XY data that look like the following graph, where each set has a different color. The maximum of the blue line is an error; it should be at the same level as the black and red lines (i.e. around 1.9). The same may apply to the minimum of the green line, which is erroneous, but it is not as critical as the blue line since at least it shows a monotonic increase. If I calculate the average of all values, I will be introducing some unwanted error into the whole set, and the maximum of the blue curve will influence that average. How can I make the program automatically bring the blue line to the same level as the red and black ones?

6 Comments

You can replace the outlier using filloutliers, which can fill it with the previous, next or nearest non-outlier value, or with the center value (mean or median) used to detect it.
How is the data stored? If possible, please attach your data using the paperclip button.
What's the determination of what is an outlier/error -- the population of points at a given C value or the linearity (or lack thereof) of the trend line for each set of observations?
I know it is a bit difficult to answer that question, but something like the peak in the dataset that I have shown is definitely an outlier.
That's not a distinct determination; you are relying on visuals, not mathematics.
Say the data point was near 2 instead of near 2.2. Would it be considered an outlier? If yes, on what basis? If no, on what basis?
You will have to define how much deviation makes a data point an outlier.
As I suggested earlier, experiment with filloutliers.
I already did, and it works for this case. Thank you!
The question is a fundamental one: is an outlier defined by the population of points at each level of C, or by the correlation of A vs C for each unique set?
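
For readers following the filloutliers suggestion, here is a minimal sketch of the "population of points at each level" interpretation. It is not the asker's actual code: the matrix Y, its column order and the numbers in it are made up for illustration, and it assumes every curve is sampled at the same x positions.

```matlab
% Hypothetical data: rows are x positions, columns are [blue red black green].
base = [1.0; 1.2; 1.4; 1.6; 1.8; 1.9];
Y = base + [0 0.01 -0.01 0.02];   % implicit expansion (R2016b or later)
Y(6,1) = 2.2;                     % simulated erroneous peak in the blue curve

% Work along dimension 2, i.e. compare the curves with each other at every x.
% 'median' flags values more than three scaled MADs from the row median;
% 'center' replaces each flagged value with that row median.
[Yfilled, wasOutlier] = filloutliers(Y, 'center', 'median', 2);
```

With only a handful of curves the MAD estimate is crude, so the default threshold may flag too much or too little on real data. The 'ThresholdFactor' name-value argument of filloutliers can be used to loosen or tighten it.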

Answers (1)

If I have interpreted the request correctly:
% Assumes all the lines have the same number of elements, i.e.
% numel(blueLine), numel(redLine) and numel(blackLine) are equal.
% Locate the position of the incorrect value in the blue line.
[~, idxMaxBlueLine] = max(blueLine);
% Assign the corrected value using ONE of the following options.
blueLine(idxMaxBlueLine) = redLine(idxMaxBlueLine);
% blueLine(idxMaxBlueLine) = blackLine(idxMaxBlueLine);
% blueLine(idxMaxBlueLine) = mean([redLine(idxMaxBlueLine) blackLine(idxMaxBlueLine)]);
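
The answer above assumes you already know that the blue line is the faulty one and that its error is its maximum. A possible generalization, sketched below, checks every point against the median of all curves at the same position and replaces those that deviate by more than a tolerance. The matrix Y, its values and the tolerance tol are hypothetical; tol in particular is the "how much deviation" decision raised in the comments and has to come from your own data.

```matlab
% Hypothetical data: rows are x positions, columns are [blue red black green].
base = [1.0; 1.2; 1.4; 1.6; 1.8; 1.9];
Y = base + [0 0.01 -0.01 0.02];    % implicit expansion (R2016b or later)
Y(6,1) = 2.2;                      % simulated erroneous peak in the blue curve

tol = 0.1;                         % assumed tolerance, choose it from your data
rowMedian = median(Y, 2);          % consensus value at each x position
isBad = abs(Y - rowMedian) > tol;  % logical mask, same size as Y

% Replace each flagged point with the consensus value of its row.
consensus = repmat(rowMedian, 1, size(Y, 2));
Yfixed = Y;
Yfixed(isBad) = consensus(isBad);
```

The median is used rather than the mean so that the erroneous peak itself has little influence on the value it gets compared against, which was the concern about averaging in the question.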

Release: R2023a

Asked: 20 Aug 2023

Commented: dpb on 22 Aug 2023
