Ezyfit: Single vs. Two-Step Fitting Problem

Ezyfit (I am actually using the updated version from Walter Roberson) fails to accurately determine all 4 parameters in a single curve fit (see Method A). A workaround that fixes 1 parameter first and then fits the remaining 3 in a second step (Method B) succeeds instead.
Is there a method or setting within Ezyfit to obtain accurate 4-parameter results in a single fitting run (Method A)?
% (1) Input Data
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
% (2) Method A: Perform a simple curve fitting with Ezfit
ft1 = ezfit(x,log(y),'log(a*x^(-b) + c*exp(-d*x))','options',optimset('MaxFunEvals', 1e6, 'MaxIter', 1e6));
array2table([ft1.m(1) ft1.m(2) ft1.m(3) ft1.m(4) (ft1.r)^2],'VariableNames',{'a','b','c','d','R2'})
% a b c d R2
% _____ ______ ______ _____ _______
%
% 56106 2.2236 -85026 49582 0.85991
% (3) Method B: Perform curve fitting by using Ezfit 2 times
% (3.1) With the first Ezfit pass, find the exponent "b" (scanning a range of values between 1.5 and 2.5)
% that maximizes the coefficient of determination R^2 (capped here at R^2 = 0.99).
% The resulting exponent is "b_range(idx_max_b) = 1.86".
b_range = 1.5 : 0.01 : 2.5;
ftr = zeros(size(b_range)); % preallocate the R^2 values
for i = 1:numel(b_range)
    ft2 = ezfit(x,log(y),sprintf('log(a*x^(-%.2f) + c*exp(-d*x))',b_range(i)),'options',optimset('MaxFunEvals', 1e6, 'MaxIter', 1e6));
    ftr(i) = (ft2.r)^2;
end
idx_max_b = find(ftr == max(ftr(ftr < 0.99)));
% (3.2) Following the initial determination of the optimal exponent "b" (i.e. "b_range(idx_max_b)"),
% the second Ezfit step finds the remaining parameters
ft3 = ezfit(x,log(y),sprintf('log(a*x^(-%.2f) + c*exp(-d*x))',b_range(idx_max_b)),'options',optimset('MaxFunEvals', 1e6, 'MaxIter', 1e6));
array2table([ft3.m(1) b_range(idx_max_b) ft3.m(2) ft3.m(3) (ft3.r)^2],'VariableNames',{'a','b','c','d','R2'})
% a b c d R2
% ______ ____ _____ _________ _______
%
% 2084.4 1.86 4.806 0.0064196 0.98323
% (4) Plot
hold on
plot(x, y, 'o', 'MarkerFaceColor',[0.7 0.7 0.7],'DisplayName', 'Input Data');
xx = 10 : 10000;
plot(xx, ft1.m(1) .* xx.^(-ft1.m(2)) + ft1.m(3) .* exp(-ft1.m(4) .* xx), 'Color','blue', 'LineWidth',2, 'DisplayName', 'Method A');
plot(xx, ft3.m(1) .* xx.^(-b_range(idx_max_b)) + ft3.m(2) .* exp(-ft3.m(3) .* xx), 'Color','red', 'LineWidth',2, 'DisplayName', 'Method B')
xlabel('x');
ylabel('y');
legend('show');
set(gca, 'XScale', 'log', 'YScale', 'log')

7 Comments

As a note regarding Method B, this workaround was simply the first approach I thought of to achieve a successful fit. However, I'm unsure whether this way of determining the exponent "b" that maximizes the coefficient of determination (R²) is appropriate.
It appears that as the exponent "b" increases from 1.5 to 2.5, the R² values grow and even exceed 1 (see the figure below), which is problematic since R² should always fall between 0 and 1. Therefore, in this method, I defined the "best" exponent "b" as the one that brings R² as close as possible to a maximum of 0.99.
>> idx_max_b
37
>> plot(ftr); ylabel('R^2'); yline(1,'--');
Torsten on 13 Mar 2025
Edited: Torsten on 13 Mar 2025
Why do you fit (x,log(y)) to (x,log(f(x,p))) instead of (x,y) to (x,f(x,p))? The fitting parameters will be distorted.
Sim on 14 Mar 2025
Edited: Sim on 14 Mar 2025
Hi @Torsten. The short answer is "I do not know". The long answer follows. Initially, I fitted (x,y) to (x,f(x,p)), but the resulting fit did not look good to me (I initially thought it was a quirk of ezyfit not handling the data well, however unlikely that seemed). Since I wanted to explore the data on a log scale anyway, I decided to transform the data into log form first; the resulting fit looked more consistent to me. I feel like I am navigating troubled waters :-)
The solution with "fminspleas" from the File Exchange looks quite acceptable for the original data:
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
funlist = {@(coef,xdata) xdata.^(-coef(1)),@(coef,xdata) exp(-coef(2)*xdata )};
NLPstart = [1,1];
options = optimset('disp','iter');
[INLP,ILP] = fminspleas(funlist,NLPstart,x,y,[],[],[],options)
hold on
plot(x,ILP(1)*x.^(-INLP(1))+ILP(2)*exp(-INLP(2)*x))
plot(x,y,'*')
hold off
Thanks @Torsten :-) With that code, I get the following on a log-log scale (it looks like it does not fit well for middle-to-large values of "x"):
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
funlist = {@(coef,xdata) xdata.^(-coef(1)),@(coef,xdata) exp(-coef(2)*xdata )};
NLPstart = [1,1];
options = optimset('disp','iter');
[INLP,ILP] = fminspleas(funlist,NLPstart,x,y,[],[],[],options)
hold on
plot(x,ILP(1)*x.^(-INLP(1))+ILP(2)*exp(-INLP(2)*x))
plot(x,y,'*')
hold off
set(gca, 'XScale', 'log','YScale', 'log')
Torsten on 14 Mar 2025
Edited: Torsten on 14 Mar 2025
As said: you must decide whether you want to fit log(y) against log(x) using the log-transformed function, or y against x directly. The two give different parameters for the fitting process. Usually, fitting y against x is intended.
Say you have y = a*x^b as the fitting function. Taking the log on both sides gives log(y) = log(a) + b*log(x), a function in which log(y) depends linearly on log(x). The latter is numerically simpler, but the results for a and b will differ from the nonlinear fit of y against x.
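This point can be sketched numerically for the pure power-law case (my illustration; polyfit performs the log-space linear fit):

```matlab
% Log-linearization sketch: for y = a*x^b, log(y) = log(a) + b*log(x),
% so a straight-line fit in log-log coordinates recovers the log-space
% parameters. These generally differ from a nonlinear fit of y against x,
% because log-space fitting weights small y values much more heavily.
rng(0);                                    % reproducible demo
a0 = 2;  b0 = -1.5;                        % "true" parameters
xs = linspace(1, 100, 50);
ys = a0*xs.^b0 .* exp(0.1*randn(1, 50));   % multiplicative noise
p  = polyfit(log(xs), log(ys), 1);         % linear fit in log space
b_fit = p(1);                              % estimate of b0
a_fit = exp(p(2));                         % estimate of a0
```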
Thanks @Torsten for your clarifications :-)


 Accepted Answer

Is there a method or setting within Ezfit to obtain accurate 4-parameter results in a single fitting run (Method A)?
No.

8 Comments

Setting appropriate initial guesses for the parameters does not work?
ft1 = ezfit(x,log(y),'log(a*x^(-b) + c*exp(-d*x))',[a0,b0,c0,d0],'MaxFunEvals', 1e6, 'MaxIter', 1e6);
You would have to know the initial guesses to good precision
Indeed, it is almost impossible to guess appropriate initial start values for the parameters manually.
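One data-driven way to obtain rough starting values (my own sketch, with hypothetical region boundaries): estimate b from the small-x region, where the power-law term dominates, and d from an intermediate region, where the exponential term dominates. Demo on synthetic data generated from the same model form (note the demo overwrites x and y):

```matlab
% Rough starting values read off the data itself. The region boundaries
% 15, 200 and 800 are assumptions chosen for this demo, not fitted values.
a = 2000;  b = 1.9;  c = 5;  d = 0.006;          % "true" demo parameters
x = round(logspace(log10(6), log10(7000), 80));
y = a*x.^(-b) + c*exp(-d*x);
ip = x < 15;                         % power-law-dominated region (assumed)
ph = polyfit(log(x(ip)), log(y(ip)), 1);
b0 = -ph(1);   a0 = exp(ph(2));      % rough power-law guesses
ie = x >= 200 & x <= 800;            % exponential-dominated region (assumed)
pt = polyfit(x(ie), log(y(ie)), 1);
d0 = -pt(1);   c0 = exp(pt(2));      % rough exponential guesses
```

The guesses are biased by the term that is neglected in each region, but they only need to land in the right basin of attraction for the optimizer.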
The global solution looks like this:
Sum Squared Error (SSE): 74.2851718632016
Root of Mean Square Error (RMSE): 0.951796580178267
Correlation Coef. (R): 0.964863277180578
R-Square: 0.930961143651644
Parameter Best Estimate
--------- -------------
a 60599268711757.3
b 6.13357573642796
c 1.26384716322201
d 7.17292632604614E-5
A local solution that is very easy to fall into is:
Sum Squared Error (SSE): 125.938976894999
Root of Mean Square Error (RMSE): 1.239290596126
Correlation Coef. (R): 0.939653058841955
R-Square: 0.882947870991043
Parameter Best Estimate
--------- -------------
a 26.8488231438062
b 0.430598448947845
c 79639344.4918304
d 0.187017446608759
Hi @Sim
It should be noted that although @Alex Sha's global solution yields the highest R-squared value for your chosen model and the model appears to fit the exponential decay region (the first 10% of the data range) reasonably well, it does not fit the remaining 90% of the data range. The data converges to zero, while the model continues to decrease as x increases.
In standard curve fitting practice, you should first analyze whether the chosen model theoretically fits the data.
%% Data
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
%% Alex Sha's Global Solution
a = 60599268711757.3;
b = 6.13357573642796;
c = 1.26384716322201;
d = 7.17292632604614E-5;
%% Model
xf = linspace(x(1), x(end), numel(x)*1e3+1);
yf = log(a*xf.^(-b) + c*exp(-d*xf));
%% Plot results
subplot(211)
plot(x, y, 'o'), hold on
plot(xf, yf), xlim([0 700]), grid on
legend('data', 'model')
xlabel('x'), ylabel('y')
title({'Displaying model fitness in interval $x \in [0, 700]$'}, 'interpreter', 'latex')
subplot(212)
plot(x, y, 'o'), hold on
plot(xf, yf), xlim([700 7000]), grid on
legend('data', 'model')
xlabel('x'), ylabel('y')
title({'Displaying model fitness in interval $x \in [700, 7000]$'}, 'interpreter', 'latex')
Analysis
syms f(x) a b c d
assume(a > 0)
assume(b > 0)
assume(c > 0)
assume(d > 0)
%% Chosen Model, f(x)
f = log(a*x^(-b) + c*exp(-d*x))
f = log(c*exp(-d*x) + a/x^b)
%% Derivative of the model, df/dx
df = diff(f, x)
df = -(c*d*exp(-d*x) + (a*b)/x^(b+1))/(c*exp(-d*x) + a/x^b)
%% Limit of df/dx as x approaches 1 million
ldf = limit(df, x, sym(1e6))
ldf = (lengthy symbolic expression)
%% Check whether the limit of df/dx is negative
tf = isAlways(ldf < sym(0))
tf = logical
1
Sim on 14 Mar 2025
Edited: Sim on 14 Mar 2025
Hi @Torsten, thanks for your comment about the initial guesses :-)
Hi @Alex Sha, thanks for your comment and for the global and local solutions. Just one doubt: did you use ezfit or another function to obtain them? Perhaps @Torsten's line of code with the initial guesses? I see the results, but not the function/code used to produce them.
Hi @Sam Chak, thanks for your comment. As far as I understand, you analyze the region of large x using @Alex Sha's global solution. By taking the limit of the derivative as x approaches 10^6, you check whether the rate of change of the model remains negative ("lim(x-->10^6) df/dx < 0"). The negative slope ensures that the model is consistently decreasing as x increases towards 10^6, which is expected for a valid decay model, right?
In summary, my understanding of your comment is the following. With @Alex Sha's global solution, the model works for around 10% of the data, i.e. for very small x, but it does not fit the remaining 90% of the data well. However, the theoretical analysis of the model's negative slope for very large x (i.e. "lim(x-->10^6) df/dx < 0") ensures that the model behaves correctly, i.e. y continues to decrease as x approaches 10^6. I hope my understanding is correct. By the way, your code only shows data, model and result; I do not see the code used to arrive at those results. Did you use @Torsten's line of code or other functions?
Thanks to everyone for your interesting comments! :-)
@Alex Sha used a third-party optimizer named 1stOpt. It is a somewhat expensive tool (about US$2000 per user), but from what I have seen, it produces excellent results much faster than competitors. I would consider buying it myself, but I cannot currently justify the price.
Thanks @Walter Roberson for sharing! :-) I have checked it on the web, and it looks very interesting and promising (though academics might find the cost prohibitive if the price is the same for them).


More Answers (2)

Hi @Sim
You can verify this. The model proposed by @Torsten fits quite well within the first 10% of the data range. However, beyond that point the model decays much more slowly than the data, although, in theory, it will eventually converge to zero. Inspired by that model, I suggest implementing a nonlinear power rate, as illustrated below. You could also look for a better nonlinear function, preferably a bounded one.
format long g
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
funlist = {@(coef, xdata) xdata.^(-coef(1)*(xdata).^(0.4)), @(coef, xdata) exp(-coef(2)*xdata)};
NLPstart = [1, 1];
options = optimset('disp', 'none');
[INLP, ILP] = fminspleas(funlist, NLPstart, x, y, [], [], [], options)
Warning: Rank deficient, rank = 1, tol = 1.957112e+05.
Warning: Rank deficient, rank = 1, tol = 6.998085e-03.
Warning: Rank deficient, rank = 1, tol = 6.998085e-03.
INLP = 1×2
0.0594819972748915 3.50973243531023
ILP = 2×1
1.0e+00 * 22.0279801300898 7302263919.12418
xf = linspace(x(1), x(end), numel(x)*1e3+1);
yf = ILP(1)*xf.^(-INLP(1)*(xf).^(0.4)) + ILP(2)*exp(-INLP(2)*xf);
%% Plot results
figure
plot(x, y, 'o'), hold on
plot(xf, yf), grid on, hold off
legend('data', 'model')
xlabel('x'), ylabel('y')
set(gca, 'XScale', 'log','YScale', 'log')
figure
subplot(211)
plot(x, y, 'o'), hold on
plot(xf, yf), xlim([0 700]), grid on, hold off
legend('data', 'model')
xlabel('x'), ylabel('y')
title({'Displaying model fitness in interval $x \in [0, 700]$'}, 'interpreter', 'latex')
subplot(212)
plot(x, y, 'o'), hold on
plot(xf, yf), xlim([700 7000]), grid on, hold off
legend('data', 'model')
xlabel('x'), ylabel('y')
title({'Displaying model fitness in interval $x \in [700, 7000]$'}, 'interpreter', 'latex')

3 Comments

@Sim: The short answer is "I do not know". The long answer follows. Initially, I used (x,y) to (x,f(x,p)), but the resulting fit did not look good to me.
Hi Sim,
There is no simple model for that type of curve. However, since the curve exhibits a form of exponential decay that asymptotically converges to zero, I'd suggest an exponential function-based model whose rate combines a nonlinear power function and a polynomial. If unsatisfactory, I'd add a reciprocal function term, as @Torsten did, to add some controllability to the rate of decay.
Nevertheless, you should consult @Alex Sha, as he is very knowledgeable in curve fitting and optimization and possesses an extensive library of nonlinear functions.
Hi @Sam Chak, thanks for your nice answer :-)
I do not know how to get R-squared with "fminspleas", but it looks like a nice fit!
It looks like you used a "power tower", or "tetration", plus the exponential decay... this makes things a bit more complicated :-) I was thinking that, with the "power tower", the exponential decay is probably not necessary anymore (?):
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
funlist = {@(coef, xdata) xdata.^(-coef(1)*(xdata).^(coef(2)))};
NLPstart = [1, 1];
options = optimset('disp', 'none');
[INLP, ILP] = fminspleas(funlist, NLPstart, x, y, [], [], [], options)
xf = linspace(x(1), x(end), numel(x)*1e3+1);
yf = ILP(1)*xf.^(-INLP(1)*(xf).^(INLP(2)));
hold on
plot(x, y, 'o')
plot(xf, yf)
set(gca, 'XScale', 'log','YScale', 'log')
However, the idea of using a "simple" power law plus an exponential decay is supported by some theoretical argument (in particular the power-law term)... with a power tower instead, the theoretical argument becomes more complicated.... :-)
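Regarding R-squared with fminspleas: it returns only the fitted parameters (nonlinear INLP, linear ILP), so a goodness-of-fit measure has to be computed afterwards from the residuals. A self-contained sketch using the power-tower model form with stand-in synthetic data (for the real fit, substitute the thread's x, y and the returned INLP, ILP):

```matlab
% R^2 from residuals after an fminspleas-style fit. The data and the
% INLP/ILP values below are stand-ins for illustration only.
rng(0);                                        % reproducible demo
x = logspace(log10(6), log10(7000), 60)';
INLP = [0.06, 0.4];                            % stand-in nonlinear params
ILP  = 22;                                     % stand-in linear param
y = ILP(1)*x.^(-INLP(1)*x.^INLP(2)) .* exp(0.05*randn(60,1)); % noisy "data"
yhat  = ILP(1)*x.^(-INLP(1)*x.^INLP(2));       % model prediction
SSres = sum((y - yhat).^2);
SStot = sum((y - mean(y)).^2);
R2 = 1 - SSres/SStot                           % coefficient of determination
```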
Sim on 14 Mar 2025
Edited: Sim on 14 Mar 2025
Thanks @Sam Chak! I have just read your comment, just a second after I posted mine...! Yes, it could also be an exponential function-based model. I will look into it further. However, the sum of a simple power law and a simple exponential decay can be justified through a theoretical argument (in particular the power-law term). Using nonlinear power rates, whether power-law-based or exponential-based, could make the results much harder to interpret.... :-)
...I can keep an eye on this webpage to see if @Alex Sha might be interested in adding more on this topic. By the way, there are some great fitting tools I wasn’t aware of, like "fminspleas", which looks quite impressive! :-)


Alex Sha on 15 Mar 2025
Moved: Sam Chak on 15 Mar 2025
Hi all, if we add one more parameter to the original fitting function, i.e. y = ln(a*x^(-b) - c*exp(-d*x))*f, the result improves:
Sum Squared Error (SSE): 62.0296923820521
Root of Mean Square Error (RMSE): 0.869746896054109
Correlation Coef. (R): 0.970745675455549
R-Square: 0.94234716641565
Parameter Best Estimate
--------- -------------
a 0.97506498591964
b -0.0034886080404169
c 1.00586174849318
d 0.0069594641193838
f -5.56698125787981

4 Comments

I believe that your solution deserves recognition. It appears that @Sim wants the fitting model to follow the trend of the data distribution on a base-10 logarithmic scale for both the x-axis and the y-axis.
In fact, @Sim is open to suggestions for other, more appropriate power-law or exponential-based functions, whose derivative should eventually converge to zero, or at least to a very small negative value at x = 7000.
%% Data
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
%% Alex Sha's New Global Solution
a = 0.97506498591964;
b = -0.0034886080404169;
c = 1.00586174849318;
d = 0.0069594641193838;
f = -5.56698125787981;
%% Model
xf = linspace(x(1), x(end), numel(x)*1e3+1);
yf = log(a*xf.^(-b) - c*exp(-d*xf))*f;
%% Plot results
figure
subplot(211)
plot(x, y, 'o'), hold on
plot(xf, yf), xlim([0 700]), grid on
legend('data', 'model')
xlabel('x'), ylabel('y')
title({'Displaying model fitness in interval $x \in [0, 700]$'}, 'interpreter', 'latex')
subplot(212)
plot(x, y, 'o'), hold on
plot(xf, yf), xlim([700 7000]), grid on
legend('data', 'model')
xlabel('x'), ylabel('y')
title({'Displaying model fitness in interval $x \in [700, 7000]$'}, 'interpreter', 'latex')
figure
plot(x, y, 'o'), hold on
plot(xf, yf), grid on, hold off
legend('data', 'model')
xlabel('x'), ylabel('y')
set(gca, 'XScale', 'log','YScale', 'log')
Warning: Negative data ignored
Thanks a lot to @Alex Sha for your solution, which fits the data very nicely!
And thanks to @Sam Chak for further comments and analysis!
I was focusing on @Alex Sha's improved model, i.e.
y = ln(a*x^(-b) - c*exp(-d*x)) * f
that can be rewritten as:
y = f * ln(a*x^(-b) - c*exp(-d*x))
with the following parameters:
%% Alex Sha's New Global Solution
a = 0.97506498591964;
b = -0.0034886080404169;
c = 1.00586174849318;
d = 0.0069594641193838;
f = -5.56698125787981;
I was thinking that, maybe, given "a" and "c" being very close, or basically equal, to 1, the equation can be simplified as follows:
y = f * ln(x^(-b) - exp(-d*x))
(However, as shown by @Sam Chak, the model drops much faster than the data after around x = 10^3.)
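A quick numeric check of this simplification (my sketch, using the parameter values quoted above): because the argument of the logarithm is close to zero at small x, dropping a and c can shift the model noticeably there even though both are close to 1, so the simplification is worth verifying before adopting it.

```matlab
% Compare the full model f*log(a*x^(-b) - c*exp(-d*x)) against the
% simplified one with a = c = 1, using Alex Sha's fitted values.
a = 0.97506498591964;   b = -0.0034886080404169;
c = 1.00586174849318;   d = 0.0069594641193838;
f = -5.56698125787981;
xs = logspace(log10(6), log10(6966), 200);
yfull = f*log(a*xs.^(-b) - c*exp(-d*xs));   % full model
ysimp = f*log(xs.^(-b) - exp(-d*xs));       % simplified (a = c = 1)
maxdiff = max(abs(yfull - ysimp))           % largest discrepancy
```

The discrepancy is largest at small x, where the log argument is nearest zero and therefore most sensitive to a and c.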
Yes, these data are quite challenging to describe, and I truly appreciate your insightful comments and analysis!
This discussion on methods and approaches has raised a few doubts, which I believe could apply to other problems and analyses, not just this one:
  • From a curve-fitting or regression perspective, does the "Method B" I initially described, which seems to fit the data reasonably (at least to me, to some extent), actually make sense?
  • Why does "Method B" appear to offer a generally good representation of the data?
I think these are valid concerns from both computational and mathematical perspectives, even though I may never get a definitive answer. A more general discussion might suit Stack Overflow, but given the typical reactions from that community, the question would likely be closed immediately for trivial reasons (I much prefer this forum!).
Thanks again to everyone for your thoughtful and engaging comments!
I'm not sure whether my understanding is correct, but if you want the fitting result to look good on a log() scale, is it reasonable to first transform the y data to y1 = log(y)? The fitting function then becomes "y1 = log(a*x^(-b) - c*exp(-d*x))", and the result is:
Sum Squared Error (SSE): 3.16256795387294
Root of Mean Square Error (RMSE): 0.196387122481336
Correlation Coef. (R): 0.991266174106857
R-Square: 0.982608627928446
Parameter Best Estimate
--------- -------------
a 10.6590120083535
b 0.69245173591868
c -1.65676470503943
d 0.00313457062827038
Taking the function as "y1 = f*log(a*x^(-b) - c*exp(-d*x))" instead gives:
Sum Squared Error (SSE): 2.95997975085014
Root of Mean Square Error (RMSE): 0.189992931538933
Correlation Coef. (R): 0.99182732299325
R-Square: 0.983721438635958
Parameter Best Estimate
--------- -------------
f 4.78460068319797
a 0.918658120426212
b 0.0761834706279075
c -0.540969703020373
d 0.00132426521534454
Thanks a lot @Alex Sha! Now it is clear! :-)
Therefore, we get an R-square of 0.982608627928446 with
y1 = log(a*x^(-b) - c*exp(-d*x))
and an R-square of 0.983721438635958 with
y1 = f * log(a*x^(-b) - c*exp(-d*x))
Multiplying by the factor "f" improves the fit further, but even without it the fit is already quite good. :-)
Thanks a lot!


Asked: Sim on 13 Mar 2025
Edited: Sim on 17 Mar 2025

Start Hunting!