Fitting a nonlinear curve to a small dataset

My data is
Data = ...
[2.5 -14.741408
3.0 -14.765364
4.0 -15.854609
5.0 -16.058246
6.0 -16.103032
7.0 -16.595257];
and looks like this
I want to fit a single curve to this and get the equation of that curve. How may I do this?

 Accepted Answer

Image Analyst on 15 Sep 2020
Edited: Image Analyst on 15 Sep 2020
Any idea of what curve you want to fit it to? Like a polynomial, or an exponential decay (demo attached), or something else?
That said, the formula you get, whatever it is, will be virtually worthless in its ability to predict points not in your training set. With so few points and so much noise, whatever parameters you come up with could be vastly different with a different training set. You need a lot more points. For example, if I put in 3.5, I could get almost anything between -15 and -15.6 depending on the formula. In other words, you train with that set and you might get -15, but if you then take more measurements that are nominally the same, the high noise would give you a different formula, and now you might get -15.3 or -15.6. You couldn't really trust the prediction. Again, get more points!
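To make that instability concrete, here is a minimal leave-one-out sketch. It assumes a quadratic model (polyfit with degree 2) purely for illustration, since no model has been agreed on, and the variable names are my own. It refits with each point dropped in turn and looks at how far the prediction at x = 3.5 moves.

```matlab
% Leave-one-out sketch: how much does the prediction at x = 3.5 move
% when a single training point is removed? The quadratic model is an
% assumption made for illustration only.
Data = [2.5 -14.741408
        3.0 -14.765364
        4.0 -15.854609
        5.0 -16.058246
        6.0 -16.103032
        7.0 -16.595257];
x = Data(:, 1);
y = Data(:, 2);
xQuery = 3.5;
predictions = zeros(numel(x), 1);
for k = 1 : numel(x)
    keep = true(size(x));
    keep(k) = false;                     % Drop the k-th point.
    p = polyfit(x(keep), y(keep), 2);    % Refit on the remaining five points.
    predictions(k) = polyval(p, xQuery); % Predict at the query location.
end
spread = max(predictions) - min(predictions);
fprintf('Predictions at x = %.1f range from %.3f to %.3f (spread %.3f)\n', ...
    xQuery, min(predictions), max(predictions), spread);
```

A large spread relative to the noise you can tolerate means the fitted formula should not be trusted for prediction.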
Test4.m is the attached demo with your data plugged in, and it gives this:

10 Comments

This is a Fourier fit, and it works well on [2.5, 7]. pchip works better, but it does not give a single equation. A polynomial approach is weak for this data.
But as you said, a better fit needs better data.
Fourier equation:
y = -15.79 - 0.6927*cos(x*1.016) + 0.07262*sin(x*1.016) + 0.2647*cos(2*x*1.016) - 0.3676*sin(2*x*1.016)
Here's a full demo of your code:
Data = ...
[2.5 -14.741408
3.0 -14.765364
4.0 -15.854609
5.0 -16.058246
6.0 -16.103032
7.0 -16.595257];
x = Data(:, 1);
y1 = Data(:, 2);
plot(x, y1, 'b.', 'MarkerSize', 50, 'LineWidth', 2);
grid on;
hold on;
a0 = -15.79
a1 = -0.6927
b1 = 0.07262
a2 = 0.2647
b2 = -0.3676
w = 1.016
x2 = linspace(2.5, 7, 1000);
y2 = a0 + a1*cos(x2*w) + b1*sin(x2*w) + a2*cos(2*x2*w) + b2*sin(2*x2*w); % Semicolon suppresses printing all 1000 values.
plot(x2, y2, 'r-', 'LineWidth', 2);
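As a rough check on how closely that formula tracks the data, the sketch below evaluates it at the six measured x values and reports the residuals. Note that this two-term Fourier model has six parameters (a0, a1, b1, a2, b2, w), the same as the number of data points, so small residuals are expected and say little about how well it would predict new measurements. The coefficients are the rounded values quoted above.

```matlab
% Residuals of the quoted two-term Fourier formula at the six data points.
Data = [2.5 -14.741408
        3.0 -14.765364
        4.0 -15.854609
        5.0 -16.058246
        6.0 -16.103032
        7.0 -16.595257];
x = Data(:, 1);
y = Data(:, 2);
% Rounded coefficients as quoted in the comment above.
a0 = -15.79;  a1 = -0.6927;  b1 = 0.07262;
a2 = 0.2647;  b2 = -0.3676;  w = 1.016;
yModel = a0 + a1*cos(x*w) + b1*sin(x*w) + a2*cos(2*x*w) + b2*sin(2*x*w);
residuals = y - yModel;
rmse = sqrt(mean(residuals .^ 2));
disp(table(x, y, yModel, residuals));
fprintf('RMSE = %.4f\n', rmse);
```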
We could get it to go exactly through the points if we use a Lagrange Interpolating Polynomial, which would be a 5th order polynomial in this case of 6 points.
Data = ...
[2.5 -14.741408
3.0 -14.765364
4.0 -15.854609
5.0 -16.058246
6.0 -16.103032
7.0 -16.595257];
x1 = Data(:, 1);
y1 = Data(:, 2);
plot(x1, y1, 'm.', 'MarkerSize', 40, 'LineWidth', 2);
grid on;
hold on;
coefficients = polyfit(x1, y1, 5)
x2 = linspace(2.5, 7, 1000);
y2= polyval(coefficients, x2);
plot(x2, y2, 'r-', 'LineWidth', 2);
You get
y2 = 0.037683 * x^5 - 0.93705 * x^4 + 9.0208 * x^3 - 41.784 * x^2 + 92.18 * x - 92.07;
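Here is a quick sketch to confirm that a 5th order fit really does pass through all six points, and to see what it does one unit outside the data. The query point x = 8 is an arbitrary choice for illustration. I use the three-output form of polyfit, which centers and scales x to improve conditioning, so the coefficients it returns are in terms of (x - mu(1)) / mu(2) and will not match the ones printed above.

```matlab
% Check that the degree-5 polynomial interpolates the six points,
% then evaluate it outside the data range.
Data = [2.5 -14.741408
        3.0 -14.765364
        4.0 -15.854609
        5.0 -16.058246
        6.0 -16.103032
        7.0 -16.595257];
x = Data(:, 1);
y = Data(:, 2);
[p, ~, mu] = polyfit(x, y, 5);         % Centered and scaled fit.
yAtData = polyval(p, x, [], mu);       % Evaluate at the training points.
maxError = max(abs(yAtData - y));      % Should be at round-off level.
yOutside = polyval(p, 8, [], mu);      % Extrapolate to x = 8 (arbitrary).
fprintf('Largest error at the data points: %g\n', maxError);
fprintf('Value extrapolated to x = 8: %g\n', yOutside);
```

Going exactly through the points is not the same as being a good model: compare the extrapolated value with the trend of the measurements before trusting it.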
But we don't know what's really needed. What does "fit a single curve to this" mean exactly? That's ambiguous/vague. Does he need a regression/fit or does he need some ad hoc formula to describe just the few points given? Or does Imran even want anything anymore, considering that he has not responded to either of us?
I am really sorry for not replying earlier. It was late night here.
I wanted to fit a curve like tanh(x) (hyperbolic tangent). The answer from 'esat gulhan' gives different equations for different intervals. By 'fit a single curve to this', I meant whether it is possible to have one equation for the whole curve.
Try fitnlm(). If you can't figure it out by (my) tomorrow, then tomorrow I can try it for you. In the meantime, attached are some fitnlm() examples for you to adapt.
Thank you a lot. I will try and inform you inshaAllah.
Imran on 16 Sep 2020
Edited: Imran on 16 Sep 2020
Image Analyst, could you please tell me your time zone? That will help me know when to contact you if I can't figure it out by then.
I'm in the Eastern USA time zone.
tanh() is a sigmoid shape that looks like the rate equation demo I gave. Did you try anything in the last 10 hours? Like the rate equation demo? You forgot to upload your code.
It was pretty easy so I assume you did it and you got this:
% Uses fitnlm() to fit a non-linear model (a hyperbolic tangent) through noisy data.
% Requires the Statistics and Machine Learning Toolbox, which is where fitnlm() is contained.
% By Image Analyst
% Initialization steps.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 20;
Data = ...
[2.5 -14.741408
3.0 -14.765364
4.0 -15.854609
5.0 -16.058246
6.0 -16.103032
7.0 -16.595257];
% Extract the X and Y columns from the data.
X = Data(:, 1);
Y = Data(:, 2);
% Plot the training data -- what we're going to fit
hFig2 = figure;
plot(X, Y, 'b.-', 'MarkerSize', 15);
grid on;
xlabel('X (Time)', 'FontSize', fontSize);
ylabel('Y', 'FontSize', fontSize);
title('Data to fit tanh to', 'FontSize', fontSize);
drawnow;
% Convert X and Y into a table, which is the form fitnlm() likes the input data to be in.
tableXY = table(X, Y);
%-----------------------------------------------------------------------------------------------------------------------------------
% Define the model: a scaled and shifted hyperbolic tangent plus a vertical offset, Y = b(1) * tanh(b(2) * X + b(3)) + b(4).
modelFunction = @(b, tbl) b(1) * tanh(b(2) * tbl(:, 1) + b(3)) + b(4);
beta0 = [1, -1, 6, -16]; % Guess values to start with. Just make your best guess.
% Once you get close and see what the coefficients are, you can set beta0 to those values and
% Make another run to get better coefficients.
% Now the next line is where the actual model computation is done.
model1 = fitnlm(tableXY, modelFunction, beta0);
% Now the model creation is done and the coefficients have been determined.
% YAY!!!!
coefficients = model1.Coefficients{:, 'Estimate'}
% Create smoothed/regressed data using the model:
xForFit = linspace(min(X), max(X), 1920); % HDTV resolution so curve will be smooth all the way across the screen.
yFitted = coefficients(1) * tanh(coefficients(2) * xForFit + coefficients(3)) + coefficients(4);
% Now we're done and we can plot the smooth model as a red line going through the noisy blue markers.
hold on;
plot(xForFit, yFitted, 'r-', 'LineWidth', 2);
grid on;
title('Fitting Noisy Data to a tanh Model', 'FontSize', fontSize);
drawnow;
%------------------------------------------------------------------------------
% Set up figure properties:
% Enlarge figure to full screen.
set(gcf, 'Units', 'Normalized', 'OuterPosition', [0, 0.04, 1, 0.96]);
% Get rid of tool bar and pulldown menus that are along top of figure.
% set(gcf, 'Toolbar', 'none', 'Menu', 'none');
% Give a name to the title bar.
set(gcf, 'Name', 'Demo by ImageAnalyst', 'NumberTitle', 'Off')
xticks(0:ceil(max(X))); % Put up tick mark every 1
xl = xlim; % Get the limits of the axes.
yl = ylim;
% Put the equation up on the figure.
xt = xl(1) + 0.31 * (xl(2) - xl(1));
yt = yl(1) + 0.54 * (yl(2) - yl(1));
message1 = sprintf('Model Equation : Y = %.3f * tanh(%.3f * X + %.3f) + %.3f', ...
coefficients(1), coefficients(2), coefficients(3), coefficients(4));
text(xt, yt, message1, 'Color', 'r', 'FontSize', fontSize);
% Put up a legend.
legend('Noisy data', 'Model', 'Location', 'north');
% Enlarge the tick labels.
ax = gca;
ax.FontSize = fontSize;
Is that what you got?
Why do you think your data theoretically follows that curve? Is there any theory or physical justification that the sample data would lie along a tanh function?
I didn't try the rate equation or the Gaussian demo. I tried others, but the results were not that good. I had planned to contact you around 3:00 am GMT.
To my knowledge, no theory has yet been developed relating the x values and y values, which come from my work on a biosensor. I guessed that the data might follow a tanh curve after visualizing them in the plot. I also have some test data, from which I thought a tanh curve might work well for the range 2.5-6.0.
Is it possible to make the curve move downward at 6.0? That is, from 6.0 to 7.0 it would behave similarly to the range 3.0-4.0.
The red curve is the best fit possible. If you lower it, which you can do by just subtracting some value from the coefficients(4) value (this is the vertical offset), then it will no longer be the best possible fit.
Of course you can lower the blue curve if you want by subtracting a value from all of your training points' y values, but I don't know the point of that. In fact I don't know why you want to lower either curve. Your measurements are what they are - why change them? And the best fit is just that - the best fit given the training data you provided. Why change that? If you want, you can change your model from tanh to something else. You could even use splines for an empirical (numeric) fit: there is no equation, but you just plug your x value into spline() instead of some formula from fitnlm(), so it's pretty much the same from an end user perspective.
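The "best fit" point can be checked numerically. The sketch below is not the fitnlm() demo: it swaps in fminsearch (base MATLAB, no toolbox needed) to minimize the sum of squared errors of the same tanh model from the same starting guess, so its coefficients may differ somewhat from what fitnlm() reports. It then lowers the vertical offset by 0.2, an arbitrary amount chosen for illustration, and compares the two errors.

```matlab
% Least-squares tanh fit with fminsearch, then lower the offset and
% compare the sum of squared errors (SSE). Assumption: fminsearch is
% used here in place of fitnlm(), so coefficients may not match exactly.
Data = [2.5 -14.741408
        3.0 -14.765364
        4.0 -15.854609
        5.0 -16.058246
        6.0 -16.103032
        7.0 -16.595257];
X = Data(:, 1);
Y = Data(:, 2);
model = @(b, x) b(1) * tanh(b(2) * x + b(3)) + b(4);
sse = @(b) sum((Y - model(b, X)) .^ 2);   % Objective to minimize.
beta0 = [1, -1, 6, -16];                  % Same starting guess as the demo.
opts = optimset('MaxFunEvals', 5000, 'MaxIter', 5000);
bFit = fminsearch(sse, beta0, opts);
bShifted = bFit;
bShifted(4) = bFit(4) - 0.2;              % Lower the curve by 0.2 (arbitrary).
sseFit = sse(bFit);
sseShifted = sse(bShifted);
fprintf('SSE at the fitted coefficients: %.5f\n', sseFit);
fprintf('SSE after lowering the offset:  %.5f\n', sseShifted);
```

If the second number is larger than the first, moving the curve down has made it a worse description of the training data.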
Hmm, I understand that I can change neither my training set nor the best fit curve. You have helped a lot. Thank you.

More Answers (1)

x=[2.5 3.0 4.0 5.0 6.0 7.0];
y=[-14.741408 -14.765364 -15.854609 -16.058246 -16.103032 -16.595257];
s=pchip(x,y); % piecewise cubic interpolant (spline(x,y) could be used instead of pchip)
xx=linspace(2.5,7,100);yy=ppval(s,xx); % evaluate the interpolant on a fine grid
plot(xx,yy,'LineWidth',1.5);grid on;hold on;plot(x,y,'o')
ppval(s,3) % returns y at x = 3
ppval(s,5) % returns y at x = 5; pass whatever x value you need
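pchip does produce equations, just one cubic per interval rather than a single formula. As a sketch, unmkpp() pulls them out of the structure that pchip returns: each row of coefs holds the coefficients of a cubic in (x - breaks(k)), highest power first. The hand evaluation at x = 3.5 is only there to show how a row is used.

```matlab
% List the piecewise cubic that pchip builds for each interval.
x = [2.5 3.0 4.0 5.0 6.0 7.0];
y = [-14.741408 -14.765364 -15.854609 -16.058246 -16.103032 -16.595257];
s = pchip(x, y);
[breaks, coefs] = unmkpp(s);    % breaks: interval edges, coefs: one row per interval
for k = 1 : size(coefs, 1)
    fprintf('%.1f <= x <= %.1f: y = %.5f*d^3 + %.5f*d^2 + %.5f*d + %.5f, where d = x - %.1f\n', ...
        breaks(k), breaks(k+1), coefs(k, 1), coefs(k, 2), coefs(k, 3), coefs(k, 4), breaks(k));
end
% Evaluate the second piece (3.0 to 4.0) by hand at x = 3.5.
xq = 3.5;
k = 2;
yManual = polyval(coefs(k, :), xq - breaks(k));
fprintf('Manual: %.6f, ppval: %.6f\n', yManual, ppval(s, xq));
```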

3 Comments

How can I get the function that fits the data?
You can get a function from the Curve Fitting tool:
a0 = -15.79
a1 = -0.6927
b1 = 0.07262
a2 = 0.2647
b2 = -0.3676
w = 1.016
y= a0 + a1*cos(x*w) + b1*sin(x*w) + a2*cos(2*x*w) + b2*sin(2*x*w)
You can get a Fourier function like that, but pchip is better to use.
If it works, please accept the answer.

Release: R2015a

Asked: 15 Sep 2020

Commented: 17 Sep 2020
