Fitting a Bimodal pore size distribution does not give the expected results, how to solve for correct solution?

1 view (last 30 days)
Ultimately, I would like to determine two different pore porosities (and radii) of a perfusion chromatography resin, using retention volume of a range of dextran molecules with different sizes.
Equations involved are described as this (See literature below):
where, n = 1 for macropores and n = 2 for micropores, rh = hydrodynamic radius [nm], rpore,n is the pore size and epsilons are porosities. Eq. 3 is used to calculate the dissociation constant (Kd) of dextrans which are then fitted to equation 1 and 2. However, doing this directly did not give the correct results (gave a parabolic function). Thus a bimodal pore size distribution was attempted.
I have some example data from literature, which can be fitted with a bimodal pore size distribution to obtain a number of parameters for the specific liquid chromatography resin that was used. The resulting pore radii should be around 8.2-11 nm and 370.5-470 based on literature. Even if I use those values as initial guess, I do not arrive at the expected solution. I tried different methods from different papers (case switch in the Kd2 function), which also do not produce the expected solution.
Did I use the incorrect model or assumptions? Is there something regarding my code that is wrong?
See below the code that I used:
rh = [1 2 3 7 19 36]; % Hydrodynamic radii dextrans [nm]
Kd_dex = [0.59 0.5 0.45 0.36 0.28 0.27]; % Dissociation constant dextrans [-]
xdata = rh;
ydata = Kd_dex;
% Initial guesses
rp0 = 100; % pore radius 1
sp0 = 1; % standard deviation 1
rp20 = 1; % pore radius 2
sp20 = 0.1; % standard deviation 2
f0 = 1; % parameter taking into account area contribution of 2 species
theta0 = [rp0;sp0;rp20;sp20;f0]; % Use this for bimodal distribution
% theta0 = [rp0;sp0]; % Use this for unimodal distribution
% Settings curve fitting
lb = [0,0,0,0,0];
ub = [1000,100,1000,100,1];
options = optimset('MaxFunEvals',Inf,'MaxIter',200,'Display', 'iter', 'TolX', 1e-12, 'TolFun', 1e-12);
% Perform fit
[F,rn,resid,~,~,~,jac] = lsqcurvefit(@Kdvec,theta0,xdata,ydata,lb,ub,options);
% Calculate statistics
JJ = full(jac);
corr = size(JJ,2)-rank(JJ'*JJ);
sd = sqrt(diag(pinv(JJ'*JJ)*rn/(size(JJ,1)-size(JJ,2)))); % standard deviation fit
se = sd./sqrt(5); % standard error fit
% Plot result
figure(1)
plot(xdata,ydata,'or')
hold on
xplot = linspace(min(xdata),max(xdata));
plot(xplot,Kdvec(F,xplot))
% hold off
function [Kvec] = Kdvec(theta,xdata)
% Vector of Kd for every xdata:
Kvec = zeros(size(xdata));
for i = 1:length(xdata)
[Kvec(i)] = Kd2(theta,xdata(i));
end
end
function [K,f] = Kd2(theta,rm)
% Calculate integral
method = 'Harlan_bi'; % select equations to be used for fitting
switch method
case {'Harlan_bi'}
% Bi-modal distribution (J.E. Harlan et al. 1995)
rp1 = theta(1);
sp1 = theta(2);
rp2 = theta(3);
sp2 = theta(4);
f = theta(5);
A = (sqrt(2/pi())/(sp1*(erf(rp1/(sqrt(2)*sp1))+1)+f*(sp2*(erf(rp2/(sqrt(2)*sp2))+1)))); % Coefficient of gauss
gauss = @(r) A.*(exp(-0.5.*((-r+rp1)./sp1).^2)+f.*exp(-0.5.*((-r+rp2)./sp2).^2)); % bi-modal gaussian distribution
K = 1 - integral(gauss,0,rm); % calculate dissociation constant
case {'Harlan_uni'}
% Uni-modal distribution (personal adaptation from J.E. Harlan et al. 1995)
rp1 = theta(1);
sp1 = theta(2);
A2 = (sqrt(2/pi())/(sp1*(erf(rp1/(sqrt(2)*sp1))+1)));
gauss = @(r) (A2./r).*exp(-0.5.*(log(r./rp1)./sp1).^2);
K = 1 - integral(gauss,0,rm); % calculate dissociation constant
case {'Yao_bi'}
% Bi-modal distribution (personal adaptation from Y. Yao, A.M. Lenhoff 2006)
rp1 = theta(1);
sp1 = theta(2);
rp2 = theta(3);
sp2 = theta(4);
f = theta(5);
A = 1; % Coefficient of gauss
gauss = @(r) (A./r).*(exp(-0.5.*(log(r./rp1)./sp1).^2)+f.*exp(-0.5.*(log(r./rp2)./sp2).^2)); %Method A - log
% gauss = @(r) A.*(exp(-0.5.*((r-rp1)./sp1).^2)+f.*exp(-0.5.*((r-rp2)./sp2).^2)); %Method B - normal
integrand = @(r) gauss(r).*(1-(rm./r).^2);
K = integral(integrand,rm,1000)/integral(gauss,0,1000);
case {'Yao_uni'}
% Uni-modal distribution (Y. Yao, A.M. Lenhoff 2006)
rp1 = theta(1);
sp1 = theta(2);
A = 1; % Coefficient of gauss
gauss = @(r) (A./r).*(exp(-0.5.*(log(r./rp1)./sp1).^2)); %Method A - log
% gauss = @(r) A.*(exp(-0.5.*((r-rp1)./sp1).^2)); %Method B - normal
integrand = @(r) gauss(r).*(1-(rm./r).^2);
K = integral(integrand,rm,1000)/integral(gauss,0,1000);
end
end
References
Data on pore sizes from these papers: (as well as fitting, but unknown how exactly they managed to do it)
Papers on bimodal PSD functions are:

Accepted Answer

John D'Errico
John D'Errico on 26 Jul 2019
Edited: John D'Errico on 26 Jul 2019
Almost always, when an optimization like this fails, the problem arises from one of several circumstances. Sadly, it is impossible to know which is the case here, at least without spending perhaps several hours of my time. The obvious possibilities are:
  1. You made a mistake in the code. That is entirely possible, given the complexity of this code. It is also virtually impossible for me to check, without a thorough read of your references, without actively testing your code.
  2. Your starting values are poor. This is very common, especially with any model that includes exponentials.
  3. You data may be insufficient for estimation, perhaps noisy.
  4. You have just chosen a poor model for this specific data.
We cannot know which, if any, of the above are the case, since you don't even provide the data!!
What should you do?
  1. First, plot your data. Look at it. Decide from the plot, what reasonable starting values might be for the patrameters in the model. If you cannot do this, then you don't understand your model. If you do not understand what the parameters mean in your model, then why in the name of blazes are you doing this in the first place?
  2. Next, evaluate the objective function at the starting values you have chosen. Does it produce reasonable predictions? Are the results normal numbers, or do they look like garbage? If they look like strange garbage, then either your model is worthless, or your implementation is the same, or your starting values are complete crapola. Look at what you get. THINK ABOUT WHAT YOU SEE.
  3. Next, tweak the starting values by a small amount, near the start point. Do the predictions for your model change? If not, then something is wrong in your model or in your starting values.
Do steps 1, 2, 3 as I indicate. In fact, do this for almost ANY optimization problem, but especially it is important when you don't know what you are doing and don't really understand the optimization process.
If everything still seems reasonable from steps 1, 2, 3 above, then decide if the model is simply incapable of fitting your data. I see people provide crap data all the time, and they are so often surprised when any arbitrarily random model does not fit it perfectly. If the model still seems reasonable, then I would do things like look at the steps the optimizer is following. Is it diverging, perhaps because the problem is a nasty one, and your starting values are still insufficient? A good idea is to then test the code on made up data, that you know follows the given model.
Edit:
After a re-read of your code, I see that you do provide data in the beginning of the code. 6 data points????????? Sorry, but this is insane. You are trying to estimate 5 parameters!!!!!!!!!!
Sigh.
  1 Comment
J Hofstede
J Hofstede on 28 Jul 2019
Hi, thanks for your answer! I see now that I indeed made a very basic mistake in trying to fit too many parameters with a limited amount of data points. I will revisit the data and model and follow your advise to come up with better results.

Sign in to comment.

More Answers (0)

Categories

Find more on Particle & Nuclear Physics in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!