Make curve fitting faster

Marek Dostal on 24 Nov 2018
Edited: Bruno Luong on 10 Nov 2022
Hi,
I would like to make my curve fitting faster by using the GPU. Is it possible? If so, how?
Thanks
  1 Comment
Matt J on 25 Mar 2019
Marek's problem description copied here for reference:
I have MRI data of the brain (each subject has approx. 0.5 million voxels) with 10 data points per voxel, and in each voxel I want to fit these 10 points with a specific biexponential curve (so that I obtain parametric maps of the brain). I tried to optimize the script before using parallel computing (1 brain ~4 hours), but so far I have only used parfor (~1.25 hours). Because I am still a beginner with MATLAB, I was curious whether other users have experience with GPUs and could help me make this faster than parfor.
Thanks
Marek


Accepted Answer

Matt J on 26 Nov 2018
Edited: Matt J on 21 Jun 2019
In each voxel I want to fit these 10 points with a specific biexponential curve (so that I obtain parametric maps of the brain)
@Marek,
Do not fit one voxel at a time in a loop. Instead, write the model as an N-dimensional curve where N is the number of voxels and apply lsqcurvefit to this model. Do not let lsqcurvefit use its default finite difference derivative calculations. Instead, supply your own Jacobian*x operator using the JacobianMultiplyFcn option (EDIT: or provide a sparse block diagonal Jacobian matrix computation).
Your problem is a good example of what I was discussing with Joss: both the model function and the JacobianMultiplyFcn could be GPU-optimized if lsqcurvefit could work with gpuArray variables without constantly doing GPU/CPU transfers. However, I expect the fit will still be reasonably fast if you use appropriate Matlab vectorization techniques.
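To make the idea concrete, here is a minimal sketch (not Marek's exact model or data) of fitting K voxels in a single lsqcurvefit call. It assumes a hypothetical biexponential form y = a*exp(-b*t) + c*exp(-d*t) with 4 parameters per voxel, and supplies the sparse block-diagonal Jacobian so no finite differencing is needed; biexpAll is an illustrative helper name, not something from the toolbox:
% Minimal sketch (hypothetical model and data): fit K voxels in one call,
% 4 parameters per voxel, with a sparse block-diagonal Jacobian.
K  = 1000;                        % number of voxels (~5e5 in the real problem)
t  = linspace(0, 1, 10).';        % 10 sample points per voxel
Y  = rand(10, K);                 % measured data, one column per voxel
p0 = repmat([1; 1; 1; 1], 1, K);  % initial guess, columns are [a; b; c; d]
opts = optimoptions('lsqcurvefit', 'SpecifyObjectiveGradient', true);
p = lsqcurvefit(@(p,tt) biexpAll(p, tt, K), p0, t, Y, [], [], opts);
function [F, J] = biexpAll(p, t, K)
a = p(1,:); b = p(2,:); c = p(3,:); d = p(4,:);
E1 = exp(-t*b);                   % 10-by-K via the outer product t*b
E2 = exp(-t*d);
F  = a.*E1 + c.*E2;               % same size as Y
if nargout > 1
    % Each voxel's 10 residuals depend only on its own 4 parameters,
    % so the Jacobian of F(:) with respect to p(:) is block diagonal.
    n    = numel(t);
    rows = repmat(reshape(1:n*K, n, K), 4, 1);
    cols = repelem(reshape(1:4*K, 4, K), n, 1);
    vals = [E1; -a.*(t.*E1); E2; -c.*(t.*E2)];
    J    = sparse(rows(:), cols(:), vals(:), n*K, 4*K);
end
end
The same pattern works with a JacobianMultiplyFcn instead of an explicit sparse J; with blocks as small as 10-by-4 the explicit sparse form is usually the simpler option.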
  16 Comments
Matt J on 11 Apr 2019
Edited: Matt J on 11 Apr 2019
1) The error is larger when the Jacobian multiply function is supplied. I do not think there is an error in the derivative or in the indexing.
Assuming you're right, I would say the resnorm differences seen here are just "stopping noise": the iterations of the different computation methods take different paths toward the solution and stop wherever the stopping tolerances happen to be satisfied. The resnorm is not uniform over the range of these different possible stopping points, so you see essentially random differences in the resnorm between methods. Of course, it may be worth taking a look at the exitflag in each case...
In any event, it doesn't appear that JacobMult was ever capable of doing better in terms of accuracy than the default method. The worry that SpecifyObjectiveGradient tries to address, aside from speed, is that the default finite difference calculations may limit the accuracy of the fit in some cases. But your tests show that wasn't the case here. The finite-difference derivative calculations were good enough to reach an essentially zero resnorm, so there was no real room for improvement.
2) The time for the Jacobian multiply is slower than for supplying the exact sparse Jacobian.
Your Jacobian multiply code has various inefficiencies in it: for-looping, quantities like fp(:).*qp(:) or xp(:)-p(1) that are computed multiple times per k when they could be computed only once, subsref operations like fp=f(ff,:) which force repeated memory allocations, etc. (a rough loop-free sketch is given at the end of this comment).
3) What would you recommend for N fits that do not have the same number of data points?
You may find that it is not an issue once you start optimizing your computations and getting rid of your for-loops.
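For reference, here is a rough loop-free pattern for the Jacobian multiply itself. It assumes (hypothetically) that the n-by-p derivative blocks of all K fits have been precomputed into an n-by-p-by-K array D, with residuals and parameters stacked fit after fit rather than interleaved across fits as in your code, so the reshapes would need adjusting for your layout; in practice D would be passed in via Jinfo or captured in an anonymous function, and pagemtimes requires R2020b or later:
% Sketch of a loop-free Jacobian multiply for a block-diagonal Jacobian,
% derivative blocks stored in the n-by-p-by-K array D.
function w = jmfcn(D, Y, flag)
[n, p, K] = size(D);
m = size(Y, 2);            % lsqcurvefit may pass Y with several columns
if flag > 0                % w = J*Y
    Yk = permute(reshape(Y, p, K, m), [1 3 2]);            % p-by-m-by-K
    w  = reshape(permute(pagemtimes(D, Yk), [1 3 2]), n*K, m);
elseif flag < 0            % w = J'*Y
    Yk = permute(reshape(Y, n, K, m), [1 3 2]);            % n-by-m-by-K
    w  = reshape(permute(pagemtimes(D, 'transpose', Yk, 'none'), [1 3 2]), p*K, m);
else                       % w = J'*(J*Y)
    w  = jmfcn(D, jmfcn(D, Y, 1), -1);
end
end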
sarah goldberg on 6 May 2019
Hi Matt,
I've been trying to generalize the N-D Jacobian solution, comparing three variants: Jacobian not calculated, Jacobian calculated explicitly, and only a Jacobian multiply function supplied. The Jacobian multiply option is currently not working.
Here k is the number of 1-D fits, each with n points and p parameters, so N = n x k and P = p x k. The Jacobian size is indeed (N x P) - I checked the output of lsqcurvefit.
I'm running into the following issue with the Jacobian multiply function (myJ): the size of the input vector Y is either (Nx1) or (Px1) or (Px2). How is this possible?
Code below. I am simply looking at
size(Y)
Thanks,
Sarah
clear all; close all; fclose all; clc;
s=rng('shuffle');
rng(828016308,'twister');
global history;
history=[];
% plot
doPlot=false; %true;
num_plot=4; % how many fits to plot
% fit params
alg='trust-region-reflective' ; % 'trust-region-reflective' or 'levenberg-marquardt'
displaySetting='off'; % 'iter', 'off', 'final', etc.
maxIter=1e5;
maxFunEvals=1e5;
tolfun=1e-6;
% options
opts = optimoptions('lsqcurvefit','MaxIter',maxIter,'MaxFunEvals',maxFunEvals,'OutputFcn', @myoutput,'display',displaySetting,'Algorithm',alg,'tolfun',tolfun);
optsJ = optimoptions('lsqcurvefit','MaxIter',maxIter,'MaxFunEvals',maxFunEvals,'OutputFcn', @myoutput,'display',displaySetting,'Algorithm',alg,'tolfun',tolfun,'Jacobian','on');
optsJm = optimoptions('lsqcurvefit','MaxIter',maxIter,'MaxFunEvals',maxFunEvals,'OutputFcn', @myoutput,'display',displaySetting,'Algorithm',alg,'tolfun',tolfun,'SpecifyObjectiveGradient',true,'JacobianMultiplyFcn',@(Jinfo,Y,flag,xdata)myJ(Jinfo,Y,flag));
% THE MODEL. generate function handles for evaluating f (f_func) and its partial derivatives (dfdp_func).
[f_func,dfdp_func,num_params]=sym_model_derivatives();
% general 2D model
num_g=2; % number of gaussians
npoints=[3,5]; % image size [r,c]=[y,x], ROI intentionally not symmetric for debugging
min_width=1; % so we don't have sigma==0
a0=30;
a_std=1;
mu_std=0.1;
sigma0=[2,1]; % be careful, do not make these too small!
sigma_std=0.1;
theta=0*(pi/180); % in radians
theta_std=10*(pi/180);
bg=10;
bg_std=5;
mu0=npoints/2; % place gaussians roughly at center
im=zeros(npoints);
% generate data: num_g images of size(im)
ims=zeros(num_g,size(im,1),size(im,2)); % image set size
mus=mu0'*ones(1,num_g) + mu_std*randn(2,num_g); % 1 per dimension
sigmas=sigma0'*ones(1,num_g) + sigma_std*randn(2,num_g); % 1 per dimension
as=a0 + a_std*randn(1,num_g); % 1 per gaussian
sigmas(sigmas<0.1)=min_width; % to avoid division by 0
thetas=theta+theta_std*randn(1,num_g);
bgs=bg+bg_std*randn(1,num_g);
[X,Y]=meshgrid(1:size(im,2),1:size(im,1)); % x-y coordinates - weird order so that data can be reshaped correctly
x_ind=X(:); % x, column vector
y_ind=Y(:); % y, column vector
for mm=1:num_g
% parameter order: %Amplitude % Row (y) % Col (x) % Theta -3* 3* % Sigma_row (y) % Sigma_col (x) % Background
c0s(mm,:)=[as(mm) mus(2,mm) mus(1,mm) thetas(mm) sigmas(2,mm) sigmas(1,mm) bgs(mm)];
ims(mm,:)=model_func(c0s(mm,:),x_ind,y_ind,f_func); % 1D vector per 2D plot
end
% guesses
guesses=zeros(num_g,num_params);
% parameter order: %Amplitude % Row % Col % Theta -3* 3* % Sigma_row % Sigma_col % Background
for mm=1:num_g
[maxval,~]=max(ims(mm,:));
tmp_im=reshape(ims(mm,:)',size(im,1),size(im,2));
[xpos,ypos]=find(tmp_im==maxval);
[~,half_x]=min(abs(tmp_im(ypos,:) - maxval/2));
[~,half_y]=min(abs(tmp_im(:,xpos) - maxval/2));
guesses(mm,:)=[maxval xpos ypos 0 (abs(half_x(1)-xpos)+min_width) (abs(half_y(1)-ypos)+min_width) 0]; % adding something small to avoid 0
end
%% N x 2D - setup
im_data=zeros(num_g,prod(size(im)));
xdata=kron(ones(1,num_g),x_ind)';
ydata=kron(ones(1,num_g),y_ind)';
guesses_ND=zeros(num_g,num_params);
% images
for mm=1:num_g
im_data(mm,:)=ims(mm,:);
end
% guesses
guesses_ND=guesses;
%% N x 2D
history=[];
results_ND=zeros(num_g,num_params);
fprintf(1,['\n']);
tic
f_ND = @(c,xdata)model_func_ND(c,xdata,ydata,f_func); % need to pass x_ind, y_ind to model_func
[results_ND,resnormN,residualN,exitflagN,outputN,lambdaN,jacobianN]=lsqcurvefit(f_ND,guesses_ND,xdata,im_data,[],[],opts); % each row is 1 object
toc
fprintf(1,['Nx2D\n']);
resnormN
historyN=history;
exitflagN
%% N x 2D + Jacobian
results_ND_J=zeros(num_g,num_params);
fprintf(1,['\n']);
tic
f_ND = @(c,xdata)model_func_ND_J(c,xdata,ydata,f_func,dfdp_func); % need to pass args x_ind, y_ind, etc. to model_func
[results_ND_J,resnormN_J,residualN_J,exitflagN_J,outputN_J,lambdaN_J,jacobianN_J]=lsqcurvefit(f_ND,guesses_ND,xdata,im_data,[],[],optsJ); % each row is 1 object
toc
fprintf(1,['Nx2D + Jacobian\n']);
resnormN_J
exitflagN_J
%% N x 2D + Jacobian multiply
history=[];
results_ND_J=zeros(num_g,num_params);
fprintf(1,['\n']);
tic
f_ND = @(c,xdata)model_func_ND_Jm(c,xdata,ydata,f_func,dfdp_func); % need to pass args x_ind, y_ind, etc. to model_func
[results_ND_Jm,resnormN_Jm,residualN_Jm,exitflagN_Jm,outputN_Jm,lambdaN_Jm,jacobianN_Jm]=lsqcurvefit(f_ND,guesses_ND,xdata,im_data,[],[],optsJm); % each row is 1 object
toc
fprintf(1,['Nx2D + Jacobian multiply\n']);
resnormN_Jm
historyNm=history;
exitflagN_Jm
return
%% ----- functions -------
% model:
% f = Amplitude*exp(- ( A*(x-Col).^2 + 2*B*(x-Col).*(y-Row) + C*(y-Row).^2 ) ) + Background
% A = cos(Theta)^2/(2*Sigma_col^2) + sin(Theta)^2/(2*Sigma_row^2)
% B = sin(2*Theta) * (1/Sigma_row^2 - 1/Sigma_col^2)/4;
% C = sin(Theta)^2/(2*Sigma_col^2) + cos(Theta)^2/(2*Sigma_row^2);
% parameter order: %Amplitude % Row % Col % Theta -3* 3* % Sigma_row % Sigma_col % Background
function [f_func,df,num_params]=sym_model_derivatives()
num_params=7;
p=sym('p',[1 num_params]);
syms f x y A B C tmp_df;
% model definition HERE
A=cos(p(4))^2/(2*p(6)^2) + sin(p(4))^2/(2*p(5)^2);
B=sin(2*p(4)) * (1/p(5)^2 - 1/p(6)^2)/4;
C=sin(p(4))^2/(2*p(6)^2) + cos(p(4))^2/(2*p(5)^2);
f = p(1)*exp(- ( A*(x-p(3))^2 + 2*B*(x-p(3))*(y-p(2)) + C*(y-p(2))^2 ) ) + p(7);
% end of model definition
vars=[p x y];
for ii=1:length(p)
tmp_df=diff(f,p(ii));
% convert dfdp to matlab function
df{ii}=matlabFunction(tmp_df,'vars',vars);
end
% convert f to matlab function
f_func=matlabFunction(f,'vars',vars);
end
% 2D model - used to generate data
function [f]=model_func(p,x,y,f_func)
p_list=num2cell(p); % to avoid p(1),p(2),...,p(n) argument passing
f=f_func(p_list{:},x,y);
end
% Nx2D model
function [f]=model_func_ND(c,x,y,f_func)
G_n=size(x,2); % number of data points
G_k=size(x,1); % number of fits
f = zeros(G_k, G_n);
for ff=1:G_k
xp=x(ff,:); % points for this fit (n)
yp=y(ff,:);
cp=c(ff,:); % params for this fit (m)
cp_list=num2cell(cp);
% f
f(ff,:)=f_func(cp_list{:},xp,yp);
end
end
% Nx2D + Jinfo model
function [f,Jinfo]=model_func_ND_info(c,x,y,f_func)
G_n=size(x,2); % number of data points
G_k=size(x,1); % number of fits
f = zeros(G_k, G_n);
for ff=1:G_k
xp=x(ff,:); % points for this fit (n)
yp=y(ff,:);
cp=c(ff,:); % params for this fit (m)
cp_list=num2cell(cp);
% f
f(ff,:)=f_func(cp_list{:},xp,yp);
end
G_p=numel(c)/G_k; % number of parameters per fit
Jinfo = spalloc(G_n*G_k, G_p*G_k, 0);
end
% Nx2D + Jacobian
function [f,J]=model_func_ND_J(c,x,y,f_func,dfdp_func)
G_n=size(x,2); % number of data points
G_k=size(x,1); % number of fits
G_p=numel(dfdp_func); % number of parameters per fit
f = zeros(G_k, G_n);
J = spalloc(G_n*G_k, G_p*G_k, G_n*G_k*G_p);
% calculate fit values and partial derivatives for each fit
for ff=1:G_k
xp=x(ff,:); % points for this fit (n)
yp=y(ff,:);
cp=c(ff,:); % params for this fit (m)
cp_list=num2cell(cp);
% f
f(ff,:)=f_func(cp_list{:},xp,yp);
% J
for pp=1:length(dfdp_func)
df=dfdp_func{pp}; % function handle of this partial derivative
J(ff:G_k:end , (pp-1)*G_k + ff) = df(cp_list{:},xp,yp);
% full(J)
end
%size(J)
end
end
% Nx2D + Jacobian multiply
function [f,Jinfo] = model_func_ND_Jm(c,xdata,ydata,f_func,dfdp_func)
global G_df_func G_P G_n G_k G_xd G_yd; % needed for jacobian multiply function
G_n=size(xdata,2); % number of data points
G_k=size(xdata,1); % number of fits
f = zeros(G_k, G_n);
for ff=1:G_k
xp=xdata(ff,:); % points for this fit (n)
yp=ydata(ff,:);
cp=c(ff,:); % params for this fit (m)
cp_list=num2cell(cp);
% f
f(ff,:)=f_func(cp_list{:},xp,yp);
end
% update globals
G_P=c; % parameters
G_p=numel(dfdp_func);% number of parameters per fit
G_xd=xdata; % indices
G_yd=ydata;
G_df_func=dfdp_func; % partial derivative function handles
% Jinfo sparse matrix for preconditioning, required even when jacobian
% multiply is used. size should equal the size of the jacobian.
Jinfo = spalloc(G_n*G_k, G_p*G_k, G_n*G_k*G_p); % all zeros
end
% compute Jacobian multiply functions.
% note: Jacobian C is sparse. the only nonzero elements are
% derivatives of points xi belonging to fit j
% with respect to their own fitting parameters. Their positions within the
% jacobian can be viewed by looking at the jacobian generated in model_func_ND_J
% using the command full(J).
function w = myJ(Jinfo,Y,flag)
size(Y) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
global G_df_func G_P G_n G_k G_xd G_yd;
P=G_P;
n=G_n;
k=G_k;
xd=G_xd;
yd=G_yd;
df_func=G_df_func;
if flag > 0
w = Jpositive(Y, df_func,P,n,k,xd,yd);
elseif flag < 0
w = Jnegative(Y, df_func,P,n,k,xd,yd);
else
w1= Jpositive(Y, df_func,P,n,k,xd,yd);
w = Jnegative(w1,df_func,P,n,k,xd,yd);
end
function a = Jpositive(q,df_func,P,n,k,xd,yd)
% Calculate C*q (dimensions NxP, Px1 => Nx1)
a = zeros(n*k,1);
for ff=1:k % loop over fits, calculate n sums for each fit
xp=xd(ff,:); % points for this fit (n)
yp=yd(ff,:);
p=P(ff,:); % params for this fit (m)
qp=q(ff:k:end); % values of q for this fit (m)
sump=zeros(1,n);
for pp=1:numel(df_func)
df=df_func{pp}; % 1 of m function handles
p_list=num2cell(p);
dfp=df(p_list{:},xp,yp); % n partial derivatives (may have 1 value if derivative is a const)
sump=sump+qp(pp)*dfp; % add n terms for each of m params
end
a(ff:k:end)=sump'; % n terms
end
end
function a = Jnegative(q,df_func,P,n,k,xd,yd)
% Calculate C'*q (dimensions PxN, Nx1 => Px1)
a = zeros(prod(size(P)),1); % Px1
for ff=1:k % loop over fits, calculate 3 sums for 3 fit parameters
xp=xd(ff,:); % points for this fit (n)
yp=yd(ff,:);
p=P(ff,:); % params for this fit (m)
qp=q(ff:k:end); % part of input vector q that contributes for this fit (n)
for pp=1:numel(df_func)
df=df_func{pp}; % 1 of m function handles
p_list=num2cell(p);
dfp=df(p_list{:},xp,yp); % n partial derivatives
a(ff+(pp-1)*k)=sum(qp'.*dfp);
end
end
end
end
% append history if state=='iter', useful for checking convergence
function stop = myoutput(x,optimvalues,state)
global history;
stop = false;
if isequal(state,'iter')
history = [history; x];
end
end


More Answers (1)

Joss Knight on 24 Nov 2018
Edited: Joss Knight on 24 Nov 2018
It does rather depend on what you're doing. The functions polyfit and interp1 work with gpuArray inputs.
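For example, a polynomial fit runs on the GPU simply by passing gpuArray data (a minimal sketch with synthetic data; requires Parallel Computing Toolbox and a supported GPU):
x = gpuArray.linspace(0, 1, 1e6);
y = 2*x.^3 - x + 0.01*randn(1, 1e6, 'gpuArray');   % noisy data generated on the GPU
p = polyfit(x, y, 3);                              % coefficients computed on the GPU
p = gather(p);                                     % bring the result back to the CPU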
  14 Comments
Emiliano Rosso on 10 Nov 2022
Can we modify the MATLAB code, or is it p-coded?
I mean, to step through the routine and eliminate the errors, type mismatches, etc.
Thanks!
Bruno Luong on 10 Nov 2022
Edited: Bruno Luong on 10 Nov 2022
Just to chime in on the discussion about Matt's request to enable the optimization functions on the GPU.
I think the feature would be most interesting for people working on (2D/3D) image processing: denoising and deblurring, where the number of parameters is the number of pixels. The cost function is a sum of local operators, so it is very well suited to the GPU architecture.
If data transfer between the CPU and GPU is NOT required at each iteration or each gradient computation, the run time will benefit.
There might be a new class of optimization methods still being developed by researchers, but I think one should think about better exploiting the GPU architecture in TMW's Optimization Toolbox.
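As a small illustration of the kind of objective Bruno describes, a denoising cost built from local operators can be evaluated entirely on the GPU (a hypothetical sketch; today the result would still have to be gathered before being handed to an Optimization Toolbox solver):
y      = gpuArray(rand(512));                 % noisy image kept on the GPU
lambda = 0.1;
L      = [0 -1 0; -1 4 -1; 0 -1 0];           % discrete Laplacian stencil (local operator)
cost = @(x) 0.5*sum((x - y).^2, 'all') + ...
       0.5*lambda*sum(conv2(x, L, 'same').^2, 'all');
grad = @(x) (x - y) + lambda*conv2(conv2(x, L, 'same'), L, 'same');
% (the stencil is symmetric, so it is its own adjoint; boundary effects
%  are ignored in this sketch)
f = gather(cost(y));                          % only the scalar cost crosses back to the CPU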

