Fitting a data set by optimising multiple parameters

Question

Mohamed Elasmar on 17 May 2024

0
Link

Direct link to this question

https://uk.mathworks.com/matlabcentral/answers/2119986-fitting-a-data-set-by-optimising-multiple-parameters

Commented: Star Strider on 21 May 2024

Hello,

I am developing a model for simulating experimental data sets. Each data set corresponds to two specific variables (P and T). The model is generally represented by the following equation:

V_total = V1 + V2 + A log (a/a1) + B log (a/a2)

where a = 0:1.5:700;

The terms of V1, V2, A and B are calculated through specific equations so you can consider them constants. The coefficients of a1 and a2 are functions of P and T as follows:

a1 = x1 * P^0.5 * exp(16.95 - (5052 / T))

a2 = x2 * P^0.5 * exp(8.47 - (2526 / T))

For each data set, I want to obtain the best fitting by optimising coefficients x1 and x2.

For better understanding, here is a plot of the data set of P = 5 & T = 90. I randomly used x1 = 0.1 and x2 = 0.005.

I would be grateful if someone helps me in solving this issue.

Thank you in advance.

2 Comments
Show NoneHide None

the cyclist on 17 May 2024

Can you upload one or two example datasets? You can use the paper clip icon in the INSERT section of the toolbar.

Mohamed Elasmar on 17 May 2024

Edited: Mohamed Elasmar on 17 May 2024

Ask_17.05.2024.xlsx

Thank you @the cyclist for your answer. Here is the dataset of P = 5 & T = 90.

Please consider T = 363 instead of 90 as it is the temperature and should be used in Kelvin in the equations of a1 and a2.

Sign in to comment.

Sign in to answer this question.

Answer 1

Star Strider on 17 May 2024

0
Link

Direct link to this answer

https://uk.mathworks.com/matlabcentral/answers/2119986-fitting-a-data-set-by-optimising-multiple-parameters#answer_1459611

Open in MATLAB Online

Ask_17.05.2024.xlsx

Perhaps something like this —

T1 = readtable('Ask_17.05.2024.xlsx')

T1 = 431x2 table

a V_total _______ _______ 0 1.32 0.80556 1.33 2.4444 1.34 4.0833 1.36 5.6944 1.38 5.6944 1.39 8.1389 1.42 9.7778 1.44 10.583 1.46 13.028 1.48 14.639 1.48 16.278 1.5 17.083 1.51 18.722 1.52 21.167 1.53 21.167 1.54

a = T1.a;

a(1) = 1E-4; % Please Avoid Calculating 'log(0)'

V_total = T1.V_total;

P = 5;

T = 90 + 273

T = 363

V1 = rand; % Provide Missing Value

V2 = rand; % Provide Missing Value

A = rand; % Provide Missing Value

B = rand; % Provide Missing Value

a1 = @(x,a) x(1) * sqrt(P) * exp(16.95 - (5052 / T));

a2 = @(x,a) x(2) * sqrt(P) * exp(8.47 - (2526 / T));

V_total_fcn = @(x,a) V1 + V2 + A*log(a/a1(x,a)) + B*log(a/a2(x,a))

V_total_fcn = function_handle with value:

@(x,a)V1+V2+A*log(a/a1(x,a))+B*log(a/a2(x,a))

x0 = rand(2,1);

X = lsqcurvefit(V_total_fcn, x0, a, V_total)

Local minimum possible. lsqcurvefit stopped because the final change in the sum of squares relative to its initial value is less than the value of the function tolerance.

X = 2x1

5.4249 0.5849

figure

plot(a, V_total, '.', 'DisplayName','Data')

hold on

plot(a, V_total_fcn(X,a), '-r', 'DisplayName','Regression Fit')

hold off

grid

xlabel('a')

ylabel('V\_total')

legend('Location','best')

This may work without further changes with the correct values for ‘V1’, ‘v2’, ‘A’, and ‘B’, however it may be necessary to use the real, imag, or abs functions to deal with the complex results if they persist after that.

.

5 Comments
Show 3 older commentsHide 3 older comments

Mohamed Elasmar on 21 May 2024

Edited: Mohamed Elasmar on 21 May 2024

Thank you very much @Star Strider for your answer. The code already works, but with unstable behaviour. The final code that I have developed using your answer and including the values of V1, V2, A and B is as follows:

clc

clear

close all

Table1 = readtable('Ask_17.05.2024.xlsx');

a = Table1.a(1:15:end,:); % Reduction of the exp. data

V_total = Table1.V_total(1:15:end,:); % Reduction of the exp. data

a(1) = 1E-2; % An arbitrary starting value

P = 5;

T = 90 + 273;

A = 0.0379;

B = 0.0338;

V1 = 1.2148;

V2 = a .* 0.36 .* (6.7399e-04 + R); % where R is a function of 'a' and is defined through a specific equation.

R = ...;

a1 = @(x,a) x(1) * sqrt(4.2575) * 20.8113;

a2 = @(x,a) x(2) * sqrt(4.2575) * 4.5619;

V_total_fcn = @(x,a) V1 + V2 + (A .* log(a./(a1(x,a) .* (1 - G)))) + (B .* log(a./a2(x,a) .* (1 - G))); % Final equation of V_total, where G is also a function of 'a' and is defined through a specific equation.

G = ...;

x0 = rand(2,1);

X = lsqcurvefit(V_total_fcn, x0, a, V_total)

figure

plot(a, V_total_fcn(X,a),'Color','#D95319','LineWidth',1.3)

hold on

plot(a, V_total,'o','MarkerEdgeColor','#D95319','LineWidth',1.3)

hold off

lgd = legend('Simulation','Experiment','Location','southeast');

I ran the code three times with the same value of a(1) = 1E-2 and I had three different fitting values as follows:

%% 1st trial

x(1) = 0.003

x(2) = 0.9482

%% 2nd trial

x(1) = 0.6424 - 0.0038i

x(2) = 0.0001 + 5.2915e-7i

%% 3rd trial

x(1) = 0.0003

x(2) = 0.4734

Are there any ideas to overcome this unstable behaviour?

Star Strider on 21 May 2024

Edited: Star Strider on 21 May 2024

Open in MATLAB Online

Ask_17.05.2024.xlsx

My pleasure!

I cannot run this because ‘R’ and ‘G’ are missing. (They are required for ‘V2’ and ‘V_total_fcn’.)

The problem with nonlinear parameter estimation is that the results are sensitive (extremely sensitive in some instances) to the initial parameter estimates, especially if there are multiple local minima in the response surface. The genetic algorithm optimisation should work, and should give the global values for the ‘x’ parameter vector that give the best overall fit. (I am also not constrainiing ‘x’ in this instance.)

I will run this when the required paramters appear.

Table1 = readtable('Ask_17.05.2024.xlsx');

a = Table1.a(1:15:end,:); % Reduction of the exp. data

V_total = Table1.V_total(1:15:end,:); % Reduction of the exp. data

a(1) = 1E-2; % An arbitrary starting value

P = 5;

T = 90 + 273;

A = 0.0379;

B = 0.0338;

V1 = 1.2148;

R = 0.02777 .* ((1./(143.5860 .* (1 - ((1.1261e-08 .* a) ./ (3e-5 + (1.1261e-08 .* a)))).^1.5)) + (1./(143.5860 .* (1 - ((5.6303e-09 .* a) ./ (3e-5 + (5.6303e-09 .* a)))).^1.5)));

G = 8.9717e-3 .* (a .* 10).^0.3;

V2 = a .* 0.36 .* (6.7399e-04 + R); % where R is a function of 'a' and is defined through a specific equation.

a1 = @(x,a) x(1) * sqrt(4.2575) * 20.8113;

a2 = @(x,a) x(2) * sqrt(4.2575) * 4.5619;

V_total_fcn = @(x,a) V1 + V2 + (A .* log(a./(a1(x,a) .* (1 - G)))) + (B .* log(a./a2(x,a) .* (1 - G))); % Final equation of V_total, where G is also a function of 'a' and is defined through a specific equation.

% x0 = rand(2,1);

% X = lsqcurvefit(V_total_fcn, x0, a, V_total)

ftns = @(x) norm(V_total - V_total_fcn(x,a));

PopSz = 500;

Parms = 2;

optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems

tic

% [theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);

[X,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],[],[],[],[],optsAns);

ga stopped because the average change in the fitness value is less than options.FunctionTolerance.

toc

Elapsed time is 0.482099 seconds.

fprintf('\nParameters —\n\t\tx(1) = %.6f\n\t\tx(2) = %.6f\n\n',X)

Parameters — x(1) = 0.017415 x(2) = 0.004000

NrGenerations = output.generations

NrGenerations = 66

FitnessValue = fval

FitnessValue = 0.3318

V_t_mdl = fitnlm(a,V_total, V_total_fcn, X)

Warning: The Jacobian at the solution is ill-conditioned, and some model parameters may not be estimated well (they are not identifiable). Use caution in making predictions.

V_t_mdl =

Nonlinear regression model: y ~ F(x,a) Estimated Coefficients: Estimate SE tStat pValue _________ __________ ______ __________ b1 0.018479 0.00035689 51.778 2.3849e-29 b2 0.0042741 0.0013761 3.106 0.0043143 Number of observations: 29, Error degrees of freedom: 28 Root Mean Squared Error: 0.0625 R-Squared: 0.863, Adjusted R-Squared 0.863 F-statistic vs. zero model: 2.48e+04, p-value = 8.05e-43

X = V_t_mdl.Coefficients.Estimate;

fprintf('\nTweaked Parameters —\n\t\tx(1) = %.6f\n\t\tx(2) = %.6f\n\n',X)

Tweaked Parameters — x(1) = 0.018479 x(2) = 0.004274

figure

plot(a, V_total_fcn(X,a),'Color','#D95319','LineWidth',1.3)

hold on

plot(a, V_total,'o','MarkerEdgeColor','#D95319','LineWidth',1.3)

hold off

lgd = legend('Simulation','Experiment','Location','southeast');

EDIT — (21 May 2024 at 17:01)

Changed code to include definitions for the missing parameters and the fitnlm call to provide statistics and to tweak the parameters returned by ga.

.

Mohamed Elasmar on 21 May 2024

I actually could not understand the following part of the code:

ftns = @(theta) norm(V_total - V_total_fcn(x,a));

PopSz = 500;

Parms = 2;

optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems

tic

% [theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);

[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],[],[],[],[],optsAns);

toc

NrGenerations = output.generations

Nevertheless, I have added it to my code and have commented the following two lines:

x0 = rand(2,1);

X = lsqcurvefit(V_total_fcn, x0, a, V_total)

In addition, I have defined the equations of R and G. Hence, the error was: Unrecognized function or variable 'x'.

You can try yourself by defining R and G as follows:

R = 0.02777 .* ((1./(143.5860 .* (1 - ((1.1261e-08 .* a) ./ (3e-5 + (1.1261e-08 .* a)))).^1.5)) + (1./(143.5860 .* (1 - ((5.6303e-09 .* a) ./ (3e-5 + (5.6303e-09 .* a)))).^1.5)));

G = 8.9717e-3 .* (a .* 10).^0.3;

Star Strider on 21 May 2024

My revised code (edited a few minutes ago in my previous Comment) implements the genetic algorithm (ga), and the additional assignments define the fitness function ‘ftns’ and the options structure I use with my ga calls.

I did not previously correct my code because I could not run it, so there are several obvious errors.. The current version runs and gives a reasonable result. I added a fitnlm call both to provide statistics on the parameters and the fit, and to tweak the parameters to give the best result. (The ga function actually has a version of this as part of its options and refers to it as a hybrid parameter estimation.)

There may still be some differences in the estiamted parameters between the runs, however they should be within the confidence intervals for any set of estimated parameters. The fit in general is quite good.

.

Sign in to comment.

Fitting a data set by optimising multiple parameters

2 Comments
Show NoneHide None

Answers (1)

5 Comments
Show 3 older commentsHide 3 older comments

See Also

Categories

Tags

Community Treasure Hunt

Fitting a data set by optimising multiple parameters

2 Comments Show NoneHide None

Answers (1)

5 Comments Show 3 older commentsHide 3 older comments

See Also

Categories

Tags

Community Treasure Hunt

2 Comments
Show NoneHide None

5 Comments
Show 3 older commentsHide 3 older comments