How to fit a line through this data "smartly" ?
Show older comments
I have the x-y data shown in the figure below (for the curious, it is the logarithm of the amplitude of a Hilbert transform).

I'm trying to write an algorithm to automatically get the slope of the initial portion of the plot, i.e. before the noise overcomes the signal.
What's causing the task to be even more challenging, are the small saw-tooth jumps that you can see in the zoomed-in portion on the right.
I tried several approaches, but could not find a reliable way to automatically fit a line only of the left-hand side linear portion.
Do you have any suggestion?
PS: I know it would help, but I don't (and cannot have) the curve fitting toolbox.
4 Comments
Image Analyst
on 13 Jan 2019
You forgot to attach your data, so we can't try anything. What I'd do is to plot increasing amounts of data, and keep track of the residuals. Just when the residuals start to increase a lot, then you've found the number of elements to use.
Rik
on 13 Jan 2019
Can you post your data (by attaching a mat file)?
And is your data garanteed to start with the sloping portion?
Luca Amerio
on 13 Jan 2019
Luca Amerio
on 13 Jan 2019
Accepted Answer
More Answers (2)
John D'Errico
on 13 Jan 2019
Edited: John D'Errico
on 13 Jan 2019
A serious problem is your data also starts with a strongly nonlinear portion.

You essentially need to trim it at both ends. I'd suggest the trick is to use a Savitsky-Golay style filter, combined with a simple call to median. Median will implicitly trim those ends for you, and because we use a filter of a fairly short span, that allows median to do some smoothing too. And the linear fits are short enough they will give you sufficient points inside that quasi-linear region.
The step is constant at 0.002.
dx = 0.002;
nfilt = 20;
M = [dx*(0:nfilt-1)',ones(nfilt,1)];
Mp = pinv(M);
slopeFilter = Mp(1,:);
allSlopes = conv(Y,slopeFilter,'valid');
histogram(allSlopes,1000)

linearSegmentSlope = median(allSlopes)
linearSegmentSlope =
0.15569
As you see, the histogram has a very high spike, as we would expect. A simple median will give us a good estimate now. You could change the filter length, and the result should be fairly robust to that value, because the essentially linear segment isquite long.
4 Comments
Image Analyst
on 13 Jan 2019
John, since he wants the slope of the left hand, linear, fairly noise-free section, shouldn't we just get the mean of only the negative slopes, not all of them including the positive slopes?
Luca Amerio
on 13 Jan 2019
John D'Errico
on 13 Jan 2019
Yes. That makes some sense. If you know the quasi-linear segment will have a negative slope, I suppose it makes sense to exclude the positive slopes. I might still use median to do the final part though, not mean. At the least, I would use a trimmed mean. Discard say the lowest and highest 25% of the slope estimates, then compute the mean of those that remain.
Luca Amerio
on 13 Jan 2019
Image Analyst
on 13 Jan 2019
I plotted the data then it looked like the squirrely stuff stopped around element 2300 or so, and started to get fairly linear after that. So I found the slopes from element 2300 to the end of the data and plotted it.
fontSize = 15;
s = load('data.mat')
x = s.X;
y = s.Y;
subplot(1, 2, 1);
plot(x, y, 'b-');
grid on;
title('Original Data', 'FontSize', fontSize);
xlabel('x', 'FontSize', fontSize);
ylabel('y', 'FontSize', fontSize);
startingElement = 2310;
for k = 2310 : 100 : length(x) - 1
thisX = x((startingElement-10):k);
thisY = y((startingElement-10):k);
coefficients = polyfit(thisX, thisY, 1);
slope(k) = coefficients(1);
end
slope(1) = slope(2);
subplot(1, 2, 2);
mask = slope < 0;
x2 = x(mask);
y2 = slope(mask);
plot(x2, y2, 'b-');
grid on;
caption = sprintf('Slope from element %d to x', startingElement);
title(caption, 'FontSize', fontSize);
xlabel('x', 'FontSize', fontSize);
ylabel('Slope', 'FontSize', fontSize);

As you can see there is sort of a flat place for slopes, where they're fairly constant, but not quite. So it kind of becomes a judgment call as to what slope you want to use, or where you want to stipulate that the slopes are starting to move away from a "constant" value. There is no clear winner - it's not like the slopes are totally constant and then abruptly switch to a different value. There is more of a gradual change of slope as you progress into the noisy area. Since there is no clear winner, you can just pick wherever you want to stop. How about around x=30? Anything wrong with that value of -0.165?
1 Comment
Luca Amerio
on 13 Jan 2019
Categories
Find more on Descriptive Statistics in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!