how to find the best curve fit for a set of data

Maryam Mapar on 21 Jul 2017
Answered: Bhimrao on 28 Mar 2024 at 6:27
I have a set of data that I want to curve fit. How can I determine the best curve fit for it using MATLAB and cftool? Do I have to write a script, or what?
  4 Comments
Maryam Mapar on 24 Jul 2017
Let me give some details. I have two sets of data that are the outputs of an equation, and I want to recover the original equation or function, or at least the best guess at it, using MATLAB. I want to use cftool to find the curve fit with the lowest SSE and RMSE. Thank you in advance.
prabha verma on 25 Apr 2019
Edited: prabha verma on 25 Apr 2019
% Type your data in this way, then call cftool
X = [1 2 3 5 6 8 9];
Y = [4 6 7 9 12 15 20];
cftool
% This opens a GUI window.
% Select X under "X data" and Y under "Y data". Choose the type of fit you want and click "Fit".
% It reports goodness-of-fit statistics such as SSE, R-square, and RMSE, which you can use to decide which fit is best for your data set.
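For anyone who prefers a script to the GUI, a minimal sketch of the same comparison might look like the following. It assumes the Curve Fitting Toolbox is installed (the same toolbox that provides cftool), and the short list of candidate models is only an illustration, not a recommendation.

```matlab
% Requires the Curve Fitting Toolbox.
X = [1 2 3 5 6 8 9];
Y = [4 6 7 9 12 15 20];

% Candidate library models; this list is an assumption made for illustration.
models = {'poly1', 'poly2', 'exp1'};

sse  = zeros(size(models));
rmse = zeros(size(models));
for k = 1:numel(models)
    % fit expects column vectors; gof holds the goodness-of-fit statistics.
    [~, gof] = fit(X(:), Y(:), models{k});
    sse(k)  = gof.sse;
    rmse(k) = gof.rmse;
    fprintf('%-6s SSE = %8.3f  R^2 = %.4f  RMSE = %.3f\n', ...
        models{k}, gof.sse, gof.rsquare, gof.rmse);
end
```

A lower SSE only says that a model follows these particular points more closely, so the numbers are a starting point for judgment rather than a verdict.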


Answers (9)

John D'Errico on 25 Jul 2017
Edited: John D'Errico on 25 Jul 2017
I tried not to wade in here, because every time someone asks a question like this, it means they know virtually nothing at all about modeling.
You cannot use the curve fitting toolbox, or ANY such toolbox to know the best fitting curve, IF you are not willing to provide a model form. The curve fitting toolbox is not a magic tool that can look at your data, and somehow know what the underlying model should have been. There is no such tool, although I have heard of tools that try to do so. They cannot.
The best possible fitting model is an exact interpolant. It has ZERO residuals, so it will be the best. And there are infinitely many such exact interpolants. Start with polynomials - infinitely many of them that will fit exactly. (Although a sufficiently high order polynomial to interpolate the data will produce complete garbage here.) Or start with splines, again, infinitely many of them. But they too will produce interpolatory garbage.
So, now let's look at your data.
plot(Var1,Var2,'-o')
Then, just for kicks, I added an interpolating spline.
As you can see, it fits perfectly. But I seriously doubt it is what would be the best fit. Nor would a Fourier interpolant be of any value here. You asked about that, and rational models. Don't waste your time.
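A rough sketch of what "fits perfectly" means for an interpolating spline, using only base MATLAB. The data values are the ones posted later in this thread; the grid size of 200 is an arbitrary choice.

```matlab
% Data values as posted later in this thread.
Var1 = 0:0.5:7.5;
Var2 = [84.766716 85.775747 89.227803 94.347621 98.11531 94.175022 ...
        98.449363 108.917848 114.173113 133.533924 140.114545 145.876143 ...
        148.447757 165.888946 169.258454 173.680863];

% Evaluate the interpolating cubic spline on a fine grid for plotting.
xx = linspace(min(Var1), max(Var1), 200);
yy = spline(Var1, Var2, xx);

% Evaluate it again at the data points: the residuals are zero up to rounding.
residAtData = Var2 - spline(Var1, Var2, Var1);
fprintf('Largest residual at the data points: %g\n', max(abs(residAtData)));

plot(Var1, Var2, 'o', xx, yy, '-')
```

Zero residuals here are a property of interpolation, not evidence that the curve is a good model.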
Anyway, you are kidding me, right? I doubt your data is good enough to be worth anything more than a cubic polynomial fit. Pretty noisy, not much signal there beyond a straight line fit. Maybe you can convince me there is some curvature in there, so I'd accept a cubic polynomial.
If you managed to convince me that the cubic fit is not adequate, because it is not monotonic over the support of the data, I'd tell you to use my SLM toolbox. You can find it at that link on the File Exchange. (It will use the optimization toolbox though.)
slm = slmengine(Var1,Var2,'knots',5,'increasing','on','reg','cr','plot','on')
A careful choice of the parameters for SLM gives me what I'd call a pleasing, monotonic curve fit. Is this the best possible? Arguably it is about as good as you can do. I used cross validation to choose the extent of regularization, over the set of monotonic splines on that support.
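Whether a plain cubic is monotonic over the support of the data can be checked directly before reaching for a constrained tool. This is a minimal sketch in base MATLAB, using the data values posted later in this thread; it does not use or reproduce SLM, and the 500-point grid is an arbitrary choice.

```matlab
% Data values as posted later in this thread.
Var1 = 0:0.5:7.5;
Var2 = [84.766716 85.775747 89.227803 94.347621 98.11531 94.175022 ...
        98.449363 108.917848 114.173113 133.533924 140.114545 145.876143 ...
        148.447757 165.888946 169.258454 173.680863];

p  = polyfit(Var1, Var2, 3);   % least-squares cubic
dp = polyder(p);               % coefficients of its derivative (a quadratic)

% Sample the slope over the support of the data.
xs    = linspace(min(Var1), max(Var1), 500);
slope = polyval(dp, xs);

isMonotonic = all(slope >= 0);
fprintf('Cubic fit is nondecreasing on [%g, %g]: %d\n', min(Var1), max(Var1), isMonotonic);
fprintf('Smallest slope on that interval: %.4f\n', min(slope));
```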
  3 Comments
John D'Errico on 25 Jul 2017
So what? Sorry, but this type of analysis is not that productive here. Deciding to use a given degree polynomial is not going to give you the "best" fit, because you already know that a least squares quadratic or cubic is not the underlying model for that curve. It is surely something nonlinear. The point is, a polynomial fit here, with an order based on statistical heuristics about whether a coefficient may or may not be zero, is essentially never a good choice.
And this is also why in the beginning of my response I said that I really did not want to touch this post with a 10 foot pole. Sorry, but true, because you appear to know just enough about curve fitting to be able to parrot back what you read, but not understand what was done there, AND why. That means this can turn into a lengthy class taught by me to you in the comments.
In general, the best fit for something like this is arguably that fit that yields what you expect to see. This is because you, as the creator of the data (i.e., scientist, engineer, technician, analyst, whatever) usually have knowledge about the system. You may know things like having an expectation of monotonicity. You may have some understanding to know how much of those bumps in the curve are probably real, and how much is just noise.
None of that information is available to simple modeling tools. Only you have that knowledge, and it is all in your head. Heuristics that try to determine a polynomial order are meaningless here, again because they lack that information, and because the true model was never a low order polynomial at all.
I will say only one thing more here, that the entire purpose of SLM is to allow the user to translate the information they hold in their brain into a description of the fundamental shape of the curve, and then allow them to produce a model that fits the data.
The best fit is the one that satisfies your eyes, the fit that makes sense to your brain, that is consistent with your expectations about the process.
Image Analyst on 25 Jul 2017
Maryam, is this homework? Because it looks like it has a fairly specific set of steps it wants you to follow. If so, I'd advise you to do that, because it's what your instructor/grader will expect and what your grade will depend on.

Amin Oroji on 3 Jul 2018
Edited: Walter Roberson on 2 Aug 2019
One of the most interesting problems in supervised machine learning is finding the best fit for existing data, which is called regression. Suppose you have three points. A polynomial of degree two will pass through all of them exactly, but in general this is not a sound approach because it can cause overfitting: the fit ignores generalization, so the error of the chosen model will be high when it is applied to new data. Validation techniques such as cross validation are used to address this problem. In k-fold cross validation, the data is split into k folds (e.g. 10 folds). The model is fitted to all folds except the first, which is used to validate it; then the next fold is held out for validation, and so on. Finally, the mean of the errors is calculated, and the model with the lowest mean error is chosen. For more details, see:
  1. https://stanford.edu/class/ee103/lectures/ls-fitting_slides.pdf
  2. https://towardsdatascience.com/overfitting-vs-underfitting-a-complete-example-d05dd7e19765?gi=7dbe4692ec9b
  3. http://scikit-learn.org/stable/modules/learning_curve.html
Keywords: validation techniques, cross validation, overfitting
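The cross-validation procedure described above can be sketched in base MATLAB as follows. The data values are the ones posted elsewhere in this thread; the choice of 4 folds, the fixed random seed, and the range of polynomial degrees are all assumptions made for illustration.

```matlab
% Data values as posted elsewhere in this thread.
x = 0:0.5:7.5;
y = [84.766716 85.775747 89.227803 94.347621 98.11531 94.175022 ...
     98.449363 108.917848 114.173113 133.533924 140.114545 145.876143 ...
     148.447757 165.888946 169.258454 173.680863];

k = 4;                                % number of folds (assumption)
n = numel(x);
rng(0);                               % fixed seed so the split is repeatable
foldId = mod(randperm(n), k) + 1;     % random fold label in 1..k for each point

degrees = 1:5;                        % candidate polynomial degrees (assumption)
cvRmse  = zeros(size(degrees));
for d = 1:numel(degrees)
    sqErr = 0;
    for f = 1:k
        held = (foldId == f);         % points held out for validation
        % Fit on the remaining folds; mu centers and scales x for conditioning.
        [p, ~, mu] = polyfit(x(~held), y(~held), degrees(d));
        yHat  = polyval(p, x(held), [], mu);
        sqErr = sqErr + sum((y(held) - yHat).^2);
    end
    cvRmse(d) = sqrt(sqErr / n);      % RMS prediction error over all held-out points
end

[~, best] = min(cvRmse);
fprintf('Degree with the lowest cross-validated RMSE: %d\n', degrees(best));
```

With only 16 points the result can change with the random split, so it is worth rerunning with a few different seeds before trusting the chosen degree.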

Chad Greene on 21 Jul 2017
Often the term "best fit" refers to linear regression, so perhaps you want polyfit. Here's an example of fitting a line to some scattered data:
x = 100*rand(20,1);
y = 3*x + 5 + 15*randn(size(x));
plot(x,y,'bo')
P = polyfit(x,y,1);
xfit = 0:100;
yfit = polyval(P,xfit);
hold on
plot(xfit,yfit,'-')
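Since the question asks about SSE and RMSE, here is a hedged sketch of how those statistics could be computed by hand from a polyfit result, without the Curve Fitting Toolbox. The fixed random seed is an assumption added for repeatability, and the RMSE below divides by the residual degrees of freedom; some tools divide by the number of points instead.

```matlab
rng(1);                                   % fixed seed (assumption, for repeatability)
x = 100*rand(20,1);
y = 3*x + 5 + 15*randn(size(x));

P   = polyfit(x, y, 1);                   % straight-line least-squares fit
res = y - polyval(P, x);                  % residuals at the data points

SSE  = sum(res.^2);                       % sum of squared errors
SST  = sum((y - mean(y)).^2);             % total sum of squares about the mean
R2   = 1 - SSE/SST;                       % R-square
RMSE = sqrt(SSE / (numel(x) - numel(P))); % divides by degrees of freedom (n - 2 here)

fprintf('SSE = %.2f, R^2 = %.4f, RMSE = %.3f\n', SSE, R2, RMSE);
```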
  3 Comments
Maryam Mapar on 23 Jul 2017
I have some data that comes from an equation. I don't know the equation, and I want to use MATLAB to determine it, or at least its general form. How can I use the MATLAB cftool to estimate the best curve fit and, finally, the equation?
Maryam Mapar on 24 Jul 2017
Let me give some details. I have two sets of data that are the outputs of an equation, and I want to recover the original equation or function, or at least the best guess at it, using MATLAB. I want to use cftool to find the curve fit with the lowest SSE and RMSE. Thank you in advance.

Image Analyst on 23 Jul 2017
I agree 100% with John D'Errico. If you're willing to have a polynomial, you can get an EXACT fit with zero error if you use a Lagrange interpolating polynomial: https://en.wikipedia.org/wiki/Lagrange_polynomial It will go through all your points exactly, but at locations other than those training points you'll get wildly oscillating values. So a zero-error "fit" is pretty much useless in the vast majority of situations.
I'm not familiar with the cftool function in the curve fitting toolbox so I can't help you with that.
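For illustration, the Lagrange interpolating polynomial mentioned above can be evaluated directly from its textbook formula. This is a sketch in base MATLAB using the data values posted further down in this thread; the 400-point grid is an arbitrary choice.

```matlab
% Data values as posted further down in this thread.
Var1 = 0:0.5:7.5;
Var2 = [84.766716 85.775747 89.227803 94.347621 98.11531 94.175022 ...
        98.449363 108.917848 114.173113 133.533924 140.114545 145.876143 ...
        148.447757 165.888946 169.258454 173.680863];
n = numel(Var1);

% Evaluate at the data points themselves and on a fine grid in between.
xq   = linspace(min(Var1), max(Var1), 400);
xAll = [Var1, xq];
yAll = zeros(size(xAll));
for i = 1:n
    % Basis polynomial L_i: equals 1 at Var1(i) and 0 at every other data point.
    Li = ones(size(xAll));
    for j = [1:i-1, i+1:n]
        Li = Li .* (xAll - Var1(j)) / (Var1(i) - Var1(j));
    end
    yAll = yAll + Var2(i) * Li;
end
yNodes = yAll(1:n);        % values at the data points
yq     = yAll(n+1:end);    % values on the fine grid

fprintf('Largest residual at the data points: %g\n', max(abs(yNodes - Var2)));
fprintf('Data range:                    [%.1f, %.1f]\n', min(Var2), max(Var2));
fprintf('Interpolant range on the grid: [%.1f, %.1f]\n', min(yq), max(yq));
```

Comparing the two printed ranges gives a quick sense of how far the interpolant strays between the points.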
  6 Comments
Image Analyst on 25 Jul 2017
Sorry, I don't have the Curve Fitting Toolbox, and it looks like no one that does have it is answering you. All I can offer is polyfit(). It looks like a polynomial of order 2 or 3 should be good. If you really want a model with residuals of zero, see this code:
Var1 = [0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5];
Var2 = [84.766716 85.775747 89.227803 94.347621 98.11531 94.175022 98.449363 108.917848 114.173113 133.533924 140.114545 145.876143 148.447757 165.888946 169.258454 173.680863];
plot(Var1, Var2, 'b*-', 'LineWidth', 2);
grid on;
% Requesting mu makes polyfit center and scale Var1 before fitting.
[coeffs, s, mu] = polyfit(Var1, Var2, length(Var1)-1);
xFit = linspace(min(Var1), max(Var1), 100);
% Pass mu back to polyval so xFit is centered and scaled the same way.
yFit = polyval(coeffs, xFit, [], mu);
hold on;
plot(xFit, yFit, 'r.-', 'LineWidth', 2, 'MarkerSize', 13);
However you should not think that it's a good model just because the residuals are zero. Try passing in 2 or 3 instead of "length(Var1)-1" to polyfit and see what you get.
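A short sketch of that experiment, comparing degrees 1 to 3 on the same data. The RMSE here divides by the residual degrees of freedom, which is one common convention and an assumption on my part about what you want to report.

```matlab
Var1 = 0:0.5:7.5;
Var2 = [84.766716 85.775747 89.227803 94.347621 98.11531 94.175022 ...
        98.449363 108.917848 114.173113 133.533924 140.114545 145.876143 ...
        148.447757 165.888946 169.258454 173.680863];

degrees = [1 2 3];
sse = zeros(size(degrees));
for k = 1:numel(degrees)
    % mu centers and scales Var1; it must be passed back to polyval.
    [c, ~, mu] = polyfit(Var1, Var2, degrees(k));
    res    = Var2 - polyval(c, Var1, [], mu);
    sse(k) = sum(res.^2);
    dof    = numel(Var1) - (degrees(k) + 1);   % points minus fitted coefficients
    fprintf('degree %d: SSE = %9.3f, RMSE = %.3f\n', degrees(k), sse(k), sqrt(sse(k)/dof));
end
```

The SSE can only go down as the degree goes up, so a smaller SSE alone does not show that the higher degree is the better model.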
Maryam Mapar on 25 Jul 2017
Edited: Maryam Mapar on 25 Jul 2017
Thank you for your answer.
What if I want to use a Fourier model or a rational model? Is it possible to do that in the same way? I also have another function, called chevron, that I want to use to estimate the best fit from the output numbers.

marwan mokbil on 17 Apr 2018
Mr. John D'Errico, I have read your answer regarding the best curve fit. What I understood is that we will never get the exact equation from known (x,y) points. That means no one can recover the exact equation from known (x,y) points, even if those points come from a real equation. Is that true?
  1 Comment
John D'Errico on 19 Apr 2018
Yes. That is generally true. You can find an infinite number of "equations" that represent any set of data, even if the data is "exact". You can never infer the true equation that generated a set of data.
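One way to see this concretely: take any curve through the points and add a multiple of a function that is zero at every data point. The sketch below does this with polynomials; the three points and the multipliers are arbitrary, hypothetical values chosen only for illustration.

```matlab
% Three arbitrary points (hypothetical values).
xi = [1 2 3];
yi = [2 3 5];

p = polyfit(xi, yi, 2);   % the quadratic that passes exactly through all three
w = poly(xi);             % cubic (x-1)(x-2)(x-3), which is zero at every xi

cs   = [-2 0.5 3];        % any multipliers work; these are arbitrary
miss = zeros(size(cs));
for k = 1:numel(cs)
    q = [0, p] + cs(k)*w;                         % a different cubic for each multiplier
    miss(k) = max(abs(polyval(q, xi) - yi));      % still passes through the points
    fprintf('c = %4.1f: largest miss at the data points = %g\n', cs(k), miss(k));
end
```

Every value of the multiplier gives a different equation with the same zero residuals, and nothing in the data prefers one over another.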

marwan mokbil on 17 Apr 2018
Mr. John, I am still looking for a way to get the exact equation for these known points. Take into consideration that these known (x,y) points are not random; they come from an equation, and I want to find that exact equation. Please let me know if you or anyone else knows a way to get this equation.
  1 Comment
Image Analyst on 18 Apr 2018
If you've read the above answers you know that there is an unlimited number of equations that can fit a finite set of points. One guaranteed way to get an equation is to use the Lagrange interpolating polynomial, like I gave in my answer https://ww2.mathworks.cn/matlabcentral/answers/349776-how-to-find-the-best-curve-fit-for-a-set-of-data#answer_275220 though I don't think you'd want that method. Otherwise you should make some best guess at the equation using your knowledge of the physics of the situation.

marwan mokbil on 18 Apr 2018
Mr. John, I am still looking for a way to get the exact equation for these known points. Take into consideration that these known (x,y) points are not random; they come from an equation, and I want to find that exact equation. Please let me know if you or anyone else knows a way to get this equation.
  5 Comments
John D'Errico on 19 Apr 2018
You are wrong. Flat out wrong. Period. Wrong. Did I mention that you are wrong?
I picked 3 points before. What stopped me from picking sets of 6 arbitrary points? Nothing. I could have picked sets of 127000 points. Still the theory is the same. There are infinitely many curves that pass through any set of points.
You keep wanting to see some magical solution, where the magic wand of mathematics will give you the function you need. Sorry, but it does not exist.
There are infinitely many ways to pass a curve exactly through ANY set of points. All are equally as valid as any other.
If you want more, then you need to invest the effort:
1. Choose a model, a family of functions that are consistent with your process. This choice should usually be driven by physical modeling considerations. So you need to understand your process. Or, if the model family is some well known family, i.e., splines, polynomials from one of the many families, trig functions, Bessel functions, etc., then you need to know how to work with that set of functions and how to use them, what properties they have.
2. Learn how to fit the model to your data. Different function families are defined in different ways.
3. Learn to use the mathematical tools to help you to fit the model. It may involve linear or nonlinear regression. It may involve spline modeling. It may involve Fourier transforms. Learn sufficient mathematics behind the fitting tools to be able to use those tools intelligently.
But I'm sorry, there is no magic wand you can just wave and find the equation that generated any set of data points.
Walter Roberson on 19 Apr 2018
I gave a constructive proof that there really are an infinite number of equations that fit at https://www.mathworks.com/matlabcentral/answers/347829-how-to-fit-curve-using-derivatives-as-a-constraint#comment_467765 . I also show there that it is not sufficient to match derivatives, by showing how to construct a new equation whose derivative is the same as the original equation at each of the given points.
The argument I gave does not depend upon the number of points (though I suppose it could be argued that it needs to be improved for the degenerate case of there being only one specified point.)

Alex Sha on 1 Aug 2019
this function is good enough:
y = p1/(1+p2*exp(-p3*x))+p4;
Root of Mean Square Error (RMSE): 3.46432941573129
Sum of Squared Residual: 192.025252811218
Correlation Coef. (R): 0.993674523512372
R-Square: 0.98738905867754
Adjusted R-Square: 0.9854489138587
Determination Coef. (DC): 0.98738905867754
Chi-Square: 0.80116205907639
F-Statistic: 313.184887131818
Parameter    Best Estimate
---------    -----------------
p1           99.6836025681754
p2           49.1906740359158
p3           0.802021238782871
p4           84.3099000840935
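The reported statistics can be checked against the data posted earlier in this thread, and a fit of the same model form can be attempted with base MATLAB. This is only a sketch: it is not necessarily how the parameters above were obtained, the use of fminsearch and the rough starting guess are my assumptions, and a different starting guess may lead to a different local minimum.

```matlab
% Data values as posted earlier in this thread.
x = 0:0.5:7.5;
y = [84.766716 85.775747 89.227803 94.347621 98.11531 94.175022 ...
     98.449363 108.917848 114.173113 133.533924 140.114545 145.876143 ...
     148.447757 165.888946 169.258454 173.680863];

% Model form from the answer above: a logistic curve plus an offset.
model  = @(p, x) p(1) ./ (1 + p(2)*exp(-p(3)*x)) + p(4);
sseFun = @(p) sum((y - model(p, x)).^2);

% Evaluate the reported parameters on the data.
pReported    = [99.6836025681754 49.1906740359158 0.802021238782871 84.3099000840935];
sseReported  = sseFun(pReported);
rmseReported = sqrt(sseReported / numel(x));   % divides by n, which matches the figures above
fprintf('Reported parameters: SSE = %.3f, RMSE = %.4f\n', sseReported, rmseReported);

% Least-squares fit from a rough starting guess (assumption, read off the data range).
p0   = [max(y) - min(y), 50, 1, min(y)];
opts = optimset('MaxFunEvals', 5000, 'MaxIter', 5000);
[pFit, sseFit] = fminsearch(sseFun, p0, opts);
fprintf('fminsearch result:   SSE = %.3f, p = [%s]\n', sseFit, num2str(pFit, '%.4g '));
```

Choosing the logistic form in the first place is still a modeling decision; the optimizer only finds parameters for whatever form it is given.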
  1 Comment
Victor Zoratti Ferreira on 30 Jul 2020
Alex Sha, why did you choose exactly this type of equation and how did you find the parameters?

Bhimrao on 28 Mar 2024 at 6:27
For a linear equation, use the polyfit function for linear curve fitting. The polyfit function fits a polynomial of a specified degree to the data using the method of least squares.
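As a small illustration of the least-squares point, a degree-1 polyfit should give the same slope and intercept as solving the overdetermined linear system directly with the backslash operator. The data values are the ones posted earlier in this thread.

```matlab
% Data values as posted earlier in this thread, as column vectors.
x = (0:0.5:7.5)';
y = [84.766716 85.775747 89.227803 94.347621 98.11531 94.175022 ...
     98.449363 108.917848 114.173113 133.533924 140.114545 145.876143 ...
     148.447757 165.888946 169.258454 173.680863]';

% Straight-line fit with polyfit: returns [slope, intercept].
pPolyfit = polyfit(x, y, 1);

% The same least-squares problem written as A*[slope; intercept] ~ y.
A = [x, ones(size(x))];
pBackslash = (A \ y)';

fprintf('polyfit:   slope = %.6f, intercept = %.6f\n', pPolyfit(1), pPolyfit(2));
fprintf('backslash: slope = %.6f, intercept = %.6f\n', pBackslash(1), pBackslash(2));
```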
