Why does my nonlinear (exponential or trigonometric) model fit poorly?

This is a question I see often online from users of curve fitting tools. Typically, someone tries to use a tool like the Curve Fitting Toolbox to fit some data, but what comes out the other end is something strange.
Assuming the user has chosen a valid model for their data, a common problem is simply poor starting values. So my answer here will attempt to explain what happens, to show why things sometimes go astray.
I'll use the Curve Fitting Toolbox for my examples, mainly because it is a nice tool that handles this sort of problem well and makes plots easily, but the Statistics or Optimization Toolboxes can do just as well.
(If others wish to add their own answer to this question, feel free.)

Accepted Answer

John D'Errico
John D'Errico on 10 Oct 2021
Suppose I wish to fit some data to a simple exponential model. I'll make up some data, then add moderate noise to the data.
X = rand(50,1)/10;
a0 = 3; % ground truth coefficients
b0 = 0.5;
c0 = -20;
Y = a0 + b0*exp(c0*X) + randn(size(X))/40;
plot(X,Y,'o')
grid on
xlabel X
ylabel Y
Note that data with high noise is often a difficult problem. Large-residual nonlinear least squares problems will often be poorly convergent. You will then need GREAT starting values to have a chance at a fit, but robust methods may also help, perhaps an iterative scheme that downweights the data points with the largest residuals (commonly known as iteratively reweighted nonlinear least squares).
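A minimal sketch of that reweighting idea (the weight formula and the fixed iteration count here are just illustrative choices, and the data is freshly generated, not the data above):

```matlab
% Iteratively reweighted nonlinear least squares: weights shrink toward
% zero for points with large residuals, so outliers lose influence.
X = rand(50,1)/10;
Y = 3 + 0.5*exp(-20*X) + randn(size(X))/40;
mdl = fittype('a + b*exp(c*x)', 'indep', 'x');
w = ones(size(Y));                        % start with uniform weights
for iter = 1:5
    f = fit(X, Y, mdl, 'StartPoint', [1 1 -1], 'Weights', w);
    r = Y - f(X);                         % residuals at the current fit
    s = median(abs(r - median(r)))/0.6745; % robust residual scale (MAD)
    w = 1 ./ (1 + (r/(3*s)).^2);          % downweight the largest residuals
end
```

Note that fit also offers a 'Robust' option ('LAR' or 'Bisquare') that automates this sort of reweighting for you.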
The data I show above is not too bad. There is a reasonable signal present, certainly in the context of the amount of data. Having too little data is a VERY common problem; in many years of consulting, I can remember only one client who willingly gave me more data than I wanted.
So imagine we wish to fit the curve above using the Curve Fitting Toolbox. Surely this is a simple problem, and the CFTB should have no problem at all with the fit.
mdl = fittype('a + b*exp(c*x)','indep','x');
fittedmdl = fit(X,Y,mdl)
Warning: Start point not provided, choosing random start point.
fittedmdl =
     General model:
     fittedmdl(x) = a + b*exp(c*x)
     Coefficients (with 95% confidence bounds):
       a = 205.5  (-8.241e+04, 8.282e+04)
       b = -202.1  (-8.282e+04, 8.241e+04)
       c = 0.01938  (-7.894, 7.933)
Hmm. WHAT THE HECK HAPPENED? I expected something close to the parameters [3, 0.5, -20] for [a,b,c]. Where did I go wrong?
plot(fittedmdl)
hold on
plot(X,Y,'o')
grid on
xlabel X
ylabel Y
hold off
The problem is I gave it no starting values for the parameters. Worse, I implicitly gave it the WRONG sign on the exponential rate parameter c. We would expect to see a model with a NEGATIVE parameter c, but the default starting value for c was positive. When given no information about starting values, the CFTB uses a random start point, in the interval [0,1] as I recall.
But surely fit is smart enough to find -20 for c? In fact, what you should see is that fit chose a very small, but still positive, value for c. The problem is that the fundamental shape of that exponential term changes as we swap the sign of c.
fplot(@(x) exp(-x),[-1,1])
hold on
fplot(@(x) exp(x),[-1,1])
legend('Negative exponential','Positive exponential')
Those two curves have fundamentally different shapes. An optimizer is a dumb thing. It does not understand that as soon as you swap the sign inside that exponential, the curve changes shape. So it tries to make the exponential rate parameter fit the data as well as it can, but it never tries to change the sign! (Some days you might get lucky, but not here.)
The trick, of course, is just to give it a starting point that gives it a chance to converge to a reasonable solution.
fittedmdl2 = fit(X,Y,mdl,'start',[1 1 -1])
fittedmdl2 =
     General model:
     fittedmdl2(x) = a + b*exp(c*x)
     Coefficients (with 95% confidence bounds):
       a = 3.02  (2.98, 3.06)
       b = 0.4877  (0.4537, 0.5218)
       c = -21.61  (-26.12, -17.11)
That seems to have worked amazingly well. As you can see, I did not try very hard to give it intelligent starting coefficients, but I did give it the correct sign for c. Had I gotten the signs of a or b wrong, that probably would not have mattered.
figure
plot(fittedmdl2)
hold on
plot(X,Y,'o')
grid on
xlabel X
ylabel Y
hold off
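Another way to guard against a wrong sign on c is to bound the parameters, so the solver can never wander into positive-c territory. A sketch, reusing the X, Y from above (the specific bound values are just illustrative):

```matlab
% Constrain c <= 0 via Lower/Upper bounds. Bounds are given in the order
% the coefficients appear in the model: a, b, c.
mdl = fittype('a + b*exp(c*x)', 'indep', 'x');
fittedB = fit(X, Y, mdl, 'StartPoint', [1 1 -1], ...
    'Lower', [-Inf, -Inf, -Inf], 'Upper', [Inf, Inf, 0]);
```

Bounds are no substitute for a sensible start point, but they do keep the solver inside the region where the model shape is the one you intended.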
There are similar problems that arise for trigonometric models. Again, it is the rate parameters that seem to be most problematic. For example...
X2 = rand(250,1)*10;
abcd0 = 2 + 3*rand(1,4) % this time, I'll use random ground truth parameters
abcd0 = 1×4
3.5357 4.6951 3.8206 4.3103
Y2 = abcd0(1) + abcd0(2)*sin(abcd0(3)*X2 + abcd0(4)) + randn(size(X2))/10;
figure
plot(X2,Y2,'o')
grid on
Honestly, that seems like pretty easy data to fit, right? This is a clear sine wave. I have lots of data. What could possibly go wrong?
mdlS = fittype('a + b*sin(c*X + d)','indep','X')
mdlS =
General model: mdlS(a,b,c,d,X) = a + b*sin(c*X + d)
fittedmdlS = fit(X2,Y2,mdlS)
Warning: Start point not provided, choosing random start point.
fittedmdlS =
     General model:
     fittedmdlS(X) = a + b*sin(c*X + d)
     Coefficients (with 95% confidence bounds):
       a = -31.73  (-1.439e+05, 1.438e+05)
       b = 34.94  (-1.438e+05, 1.439e+05)
       c = 0.02722  (-56.15, 56.2)
       d = 7.676  (-359.1, 374.4)
figure
plot(fittedmdlS)
hold on
plot(X2,Y2,'o')
And those coefficient estimates are clearly crap. What went wrong again? When we change the rate parameter c here, the sine wave undergoes fundamental changes in shape, ones that fit simply does not understand. So if we start c in the wrong place, fit is never able to escape a locally sub-optimal solution. But if I start the fit with something reasonable, then it has a chance to converge.
abcdstart = [mean(Y2),(max(Y2) - min(Y2))/2,2*pi/10*6,pi] % offset, amplitude, frequency (~6 cycles over [0,10]), phase
abcdstart = 1×4
3.0789 4.8770 3.7699 3.1416
fittedmdlS2 = fit(X2,Y2,mdlS,'start',abcdstart)
fittedmdlS2 =
     General model:
     fittedmdlS2(X) = a + b*sin(c*X + d)
     Coefficients (with 95% confidence bounds):
       a = 3.533  (3.52, 3.546)
       b = 4.687  (4.67, 4.705)
       c = 3.82  (3.819, 3.821)
       d = 4.312  (4.304, 4.319)
figure
plot(fittedmdlS2)
hold on
plot(X2,Y2,'o')
xlabel 'X2'
ylabel 'Y2'
grid on
Better starting values often help a huge amount.
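When no single good frequency guess is obvious, a crude multi-start scheme often suffices: try a handful of guesses and keep the best resulting fit. A sketch, reusing the X2, Y2, and mdlS from above (the guess range and count are arbitrary choices):

```matlab
% Try several frequency guesses; keep the fit with the smallest SSE.
cGuesses = linspace(0.5, 6, 12);
bestFit = [];
bestSSE = Inf;
for c = cGuesses
    start = [mean(Y2), (max(Y2) - min(Y2))/2, c, pi];
    [f, gof] = fit(X2, Y2, mdlS, 'StartPoint', start);
    if gof.sse < bestSSE    % gof.sse is the sum of squared residuals
        bestSSE = gof.sse;
        bestFit = f;
    end
end
```

This is brute force, but for a one-dimensional troublesome parameter like a frequency, it is cheap insurance.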
I often compare an optimizer to setting a blind person down on the face of the earth and asking them to find the point with the lowest (or highest) elevation. Given only a cane, and the ability to know only the local shape of the surface, will they have any chance of success if you start them out in some strange, random place?

More Answers (2)

Matt J
Matt J on 10 Oct 2021
Edited: Matt J on 10 Oct 2021
Here are the most frequent pitfalls that I see:
(1) Bad Data
Either the data are too noisy or there are too few data points. Garbage in, garbage out.
(2) Custom Model
The Curve Fitting Toolbox has a built-in library of commonly encountered curve models. If your model is in the built-in library, fit() has a smart way of generating an initial guess for the iterative parameter search. However, for custom models, fit() has no choice but to make a random initial guess, which can lead to poor results. The user should provide an educated initial guess using the 'StartPoint' fit option.
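For example (a sketch, assuming column vectors x and y holding roughly sinusoidal data), the library model 'sin1' needs no start point, while the equivalent custom model does:

```matlab
% 'sin1' is a library model of the form a1*sin(b1*x + c1); fit() computes
% its own starting values for library models.
f1 = fit(x, y, 'sin1');

% The equivalent custom model gets a random start unless you supply one.
mdlC = fittype('a*sin(b*x + c)', 'indep', 'x');
f2 = fit(x, y, mdlC, 'StartPoint', [1, 1, 0]);
```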
(3) Bad Units
Some users make unnatural choices of units for the x,y data and/or for the unknown parameters, like measuring the distance between atoms in miles instead of nanometers. This can create several problems...
First, MathWorks has defined default algorithm parameters for the iterative search, like TolX, TolFun, and DiffMinChange, with the assumption that 1e-6 represents a small change in your parameter values. With the wrong units, these defaults are completely inappropriate and can result in premature stopping of the iterations. You can change all the stopping parameters to suit your choice of units if you wish, but using more natural units saves you the hassle.
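Adjusting the stopping criteria might look like this (a sketch; the tolerance values are arbitrary, and x, y, and the model are placeholders):

```matlab
% Override the default stopping tolerances to suit unusually scaled
% parameters, then pass the options object to fit().
opts = fitoptions('Method', 'NonlinearLeastSquares', ...
    'StartPoint', [1 1 -1], ...
    'TolX', 1e-12, 'TolFun', 1e-12, 'DiffMinChange', 1e-10);
mdl = fittype('a + b*exp(c*x)', 'indep', 'x');
f = fit(x, y, mdl, opts);
```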
Second, it can make the iterative search ill-conditioned, because the predicted curve can change much more sharply in response to some parameters than others.
Third, if the algorithm is forced to add/subtract large numbers with very small numbers, double-precision floating point limitations may produce inaccurate results.
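Rescaling by hand, or letting the toolbox center and scale x for you, is usually easier than fighting the defaults. A sketch (x, y, and mdl are placeholders; with 'Normalize' on, the fitted coefficients refer to the scaled x):

```matlab
% Option 1: rescale manually, then fit in the natural units.
xs = (x - mean(x)) / std(x);
f1 = fit(xs, y, mdl, 'StartPoint', [1 1 -1]);

% Option 2: let fit() center and scale x itself.
f2 = fit(x, y, mdl, 'StartPoint', [1 1 -1], 'Normalize', 'on');
```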

John D'Errico
John D'Errico on 11 Oct 2021
BASINS OF ATTRACTION:
To avoid making my first answer overly long, let me expand on the issue of poor starting values in a second answer.
I have often used the metaphor of a nonlinear solver as a blind man, placed on the surface of the earth. Now, using only a cane, we will task this poor fellow with finding the point of absolutely lowest elevation, over the entire surface of the earth. (Don't worry, he will be very well paid, and even given scuba gear if that proves necessary. No blind people were harmed in this thought experiment.)
But now suppose you set him down in the vicinity of the Dead Sea, or any other place from which he cannot escape merely by walking downhill? This gets into the idea of a basin of attraction. Every solver can be thought of as having basins of attraction, and even on the same objective function, two different solvers may have different ones.
A basin of attraction need not even be a convex set. Newton-type solvers may have very strange looking basins of attraction, since from some starting values the first step may wander a huge distance away. Consider the following simple problem:
x = linspace(0,6*pi).';
y = sin(2*x) + randn(size(x))/100; % just a tiny amount of noise
plot(x,y,'o')
Now we wish to solve for the coefficient of the function y = sin(a*x). So all I need to solve for is a. However, I will use different starting values for a, and see which ones converge to a == 2 (roughly, since at best they will only converge approximately to 2, within a tolerance). I could use any optimizer. fit is one, but I might also try fminsearch or fminbnd.
mdl = fittype('sin(a*x)','indep','x')
mdl =
General model: mdl(a,x) = sin(a*x)
fittedmdl = fit(x,y,mdl)
Warning: Start point not provided, choosing random start point.
fittedmdl =
     General model:
     fittedmdl(x) = sin(a*x)
     Coefficients (with 95% confidence bounds):
       a = 0.8975  (0.8732, 0.9218)
So, when I allowed fit to use its own random start point for a, the default is to pick a uniform random number between 0 and 1. That may be as good as anything else, but not for this problem.
a0 = linspace(-1,10,100);
afinal = NaN(size(a0));
for n = 1:numel(a0)
fittedmdl = fit(x,y,mdl,'start',a0(n));
afinal(n) = fittedmdl.a;
end
plot(a0,afinal,'.')
grid on
xlabel a(start)
ylabel a(final)
And that seems a bit strange. If I start out in the vicinity of 2, the solver seems to arrive at 2 as the solution, but if I start too far away, it simply does not converge to a == 2. Let me redo that with a narrower range for a0.
a0 = linspace(1,3,50);
afinal = NaN(size(a0));
for n = 1:numel(a0)
fittedmdl = fit(x,y,mdl,'start',a0(n));
afinal(n) = fittedmdl.a;
end
plot(a0,afinal,'.')
grid on
xlabel a(start)
ylabel a(final)
Here we see that only if I managed to start out between 1.75 and 2.25, did the solver actually find the known correct value for a. For any other start points, the solver seems to get stuck in a local (non-global) solution.
In fact, I would expect the same thing to be true for other solvers. I'll try the same thing for fminsearch. I'll use a finer sampling on a0, because it shows something interesting. I'll also show the final sum of squares, just to show that it did find the global solution for that set of start points.
a0 = linspace(1,3,250);
afinal = NaN(size(a0));
fval = NaN(size(a0));
obj = @(a) sum((y - sin(a*x)).^2);
for n = 1:numel(a0)
[afinal(n),fval(n)] = fminsearch(obj,a0(n));
end
plot(a0,afinal,'.')
grid on
xlabel a(start)
ylabel a(final)
plot(a0,fval,'.')
grid on
xlabel a(start)
ylabel 'final sum of squares'
Do you see that the basin of attraction for fminsearch is subtly different from that for fit? In fact, this is because of the initial simplex that fminsearch builds around the start point.
The concept will be subtly different for fminbnd, because fminbnd requires an upper and lower bound for the parameter.
[a0lo, a0hi] = meshgrid(linspace(0,10,100));
k = (a0lo < 2) & (a0hi > 2);
a0lo = a0lo(k);
a0hi = a0hi(k);
size(a0lo)
ans = 1×2
1600 1
Essentially, I've chosen a large set of bounds for the search, all of which include the known ground truth solution. There are 1600 possible sets of boundaries in that set.
plot(a0lo,a0hi,'.')
Now, on how many of those sets will fminbnd converge to a == 2?
afinal = NaN(size(a0lo));
fval = NaN(size(a0lo));
obj = @(a) sum((y - sin(a*x)).^2);
for n = 1:numel(a0lo)
[afinal(n),fval(n)] = fminbnd(obj,a0lo(n),a0hi(n));
end
tol = 1e-3;
successind = abs(afinal - 2) < tol;
plot(a0lo(successind),a0hi(successind),'.')
grid on
xlabel a0lo(start)
ylabel a0hi(start)
Every point in that plot represents a pair of lower and upper bounds for which fminbnd converged (approximately) to the known solution. So the basin of attraction for fminbnd is a highly discontinuous thing. Depending on the lower and upper bounds given to the solver, fminbnd will only SOMETIMES converge to a == 2 here.
