Not able to calculate gradient of loss function in a neural network program

Hi,
I am trying to solve a physics-informed neural network problem, for which I constructed a loss function as follows:
function [loss,gradients] = loss_fun(parameters,x,C,alpha)
% C is a complex-valued constant
% alpha is a real-valued constant
NN = model(parameters,x); % Feedforward neural network
f = C*NN; % Intermediate function
% fxx is the second derivative of f with respect to x, computed with
% dlgradient (see the sketch after this code)
g = fxx+alpha*f; % Objective function
gr = real(g); % Real-part of g
gi = imag(g); % Imaginary-part of g
zeroTarget_r = zeros(size(gr),"like",gr); % Zero targets for the real-part
loss_r = l2loss(gr, zeroTarget_r); % Real-part loss function
zeroTarget_i = zeros(size(gi),"like",gi); % Zero targets for the imaginary-part
loss_i = l2loss(gi, zeroTarget_i); % Imaginary-part loss function
loss = loss_r+loss_i; % Total loss function (real-valued)
gradients = dlgradient(loss,parameters); % Loss function gradients with respect to parameters
end
The function 'model' returns a feedforward neural network. I would like to minimize the function g with respect to the parameters θ. The input variable x and the parameters θ of the neural network are real-valued. Here fxx, the second derivative of f with respect to x, is calculated with dlgradient. The presence of the complex-valued constant C makes the objective function g complex-valued. Hence, I split it into real and imaginary parts, calculated the individual loss functions, and added them.
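For concreteness, here is a minimal sketch of how fxx can be computed with dlgradient; the intermediate names NNx and NNxx are illustrative, and since C is a constant, fxx = C*NNxx:
NN = model(parameters,x); % Real-valued network output
NNx = dlgradient(sum(NN,"all"),x,EnableHigherDerivatives=true); % First derivative of NN with respect to x
NNxx = dlgradient(sum(NNx,"all"),x,EnableHigherDerivatives=true); % Second derivative of NN with respect to x
fxx = C*NNxx; % f = C*NN with C constant, so fxx = C*NNxx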
While calculating the gradients, I encounter the following error:
"Encountered complex value when computing gradient with respect to an input to fullyconnect. Convert all inputs to fullyconnect to real".
I checked the individual loss values and the parameter values. They are purely real.
I would be grateful if you could suggest possible reasons for the error and steps to resolve it.
I am using fmincon with the 'lbfgs' Hessian approximation for the optimization.

2 Comments

I posted an answer regarding the complex value issue, but as an aside, you might be interested in the lbfgsupdate function, which was recently added to Deep Learning Toolbox in R2023a.
Thank you, Richard. The addition of the lbfgsupdate function is a great help for researchers working on physics-informed neural networks.
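As a minimal sketch (assuming R2023a or later), a training loop built around lbfgsupdate looks like the following; modelLoss2, net, X, X0, U0, and k are the names used later in this thread, and maxIterations is an assumed loop bound:
solverState = lbfgsState; % Initial L-BFGS solver state
lossFcn = @(net) dlfeval(@modelLoss2,net,X,X0,U0,k); % Handle returning loss and gradients
for iteration = 1:maxIterations
    [net,solverState] = lbfgsupdate(net,lossFcn,solverState); % One L-BFGS update step
end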


 Accepted Answer

I think this may be due to your introduction of the complex value into the output of the model, NN. Even though you later split this into two real halves, the backward gradient computation will step back through the (complex C) * (real NN) operation, which reintroduces a complex gradient during the backward pass.
Try calling NN = real(NN) before this step to insulate the real-valued model from the complex part of the calculation:
NN = model(parameters,x); % Feedforward neural network
NN = real(NN);
f = C*NN; % Intermediate function
It may seem counter-intuitive to apply this before the complex values are created, and indeed in the forward computation it has no effect because NN is already real. But in the backward pass the computation flows in the other direction through the code, so the backward step for real(NN) runs after the backward step for C*NN. It discards the imaginary parts of the gradient, which at this point have no meaning because the NN value has no imaginary part.
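As a toy illustration of this ordering (a scalar stand-in, not the thread's model), the real() call below does nothing in the forward pass but strips the imaginary part of the incoming gradient in the backward pass:
C = 2 + 3i;
w = dlarray(0.5); % Stand-in for a network parameter
[loss,grad] = dlfeval(@toyLoss,w,C); % grad comes back real

function [loss,grad] = toyLoss(w,C)
    NN = 2*w; % Stand-in for model(parameters,x)
    NN = real(NN); % Forward no-op; backward discards the imaginary gradient part
    g = C*NN; % Complex intermediate
    loss = real(g)^2 + imag(g)^2; % Real-valued loss
    grad = dlgradient(loss,w);
end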

4 Comments

Thanks for your response.
I tried to implement your suggestion, as it seems logically correct per your argument. However, it does not resolve the error.
I am attaching the code as a live script (a modification of the existing Burgers' equation example to the Helmholtz equation) using the new lbfgsupdate in R2023a.
There are two loss functions: modelLoss1 and modelLoss2. modelLoss1 is without the complex number, and modelLoss2 is with the complex number.
I hope this will be helpful for the verification at your end.
Thanks for the code, I do reproduce the same issue. The reason that one call to real(U) did not fix it is that the dlgradient calls for Ux and Uxx create additional backward passes through U after your real(U) call. This means they come before the real(U) when the final backward pass is performed by the last dlgradient call.
The solution is to move/add more real(...) "assertions" so that they cover the Uxx output as well. You can either add a Uxx = real(Uxx) call after the dlgradient line that creates Uxx, or perform the real() call after U and Uxx are combined, i.e. on the f variable, right before it is multiplied by the complex constant:
function [loss,gradients] = modelLoss2(net,X,X0,U0,k)
C = 2+3j;
% Make predictions with the initial conditions.
U = forward(net,X);
% Calculate derivatives with respect to X.
Ux = dlgradient(sum(U,"all"),X,EnableHigherDerivatives=true);
% Calculate second-order derivatives with respect to X.
Uxx = dlgradient(sum(Ux,"all"),X,EnableHigherDerivatives=true);
% Calculate mseF. Enforce Helmholtz equation.
f = Uxx + k^2*U;
% Enforce that f is real during forwards and backwards calculations
f = real(f);
f = f*C;
f_r = real(f);
f_i = imag(f);
zeroTarget_r = zeros(size(f_r),"like",f_r);
loss_r = l2loss(f_r,zeroTarget_r);
zeroTarget_i = zeros(size(f_i),"like",f_i);
loss_i = l2loss(f_i,zeroTarget_i);
U0Pred = forward(net,X0);
loss_b = l2loss(U0Pred,U0);
loss = loss_r + loss_i + loss_b;
% Calculate gradients with respect to the learnable parameters.
gradients = dlgradient(loss,net.Learnables);
end
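As with any dlarray automatic-differentiation code, this loss function is evaluated through dlfeval, e.g.:
[loss,gradients] = dlfeval(@modelLoss2,net,X,X0,U0,k);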
This answer is really appreciated. Thanks so much! It helped me during a late-night debugging session!


More Answers (1)

Hi,
The error message suggests that there is a complex value in the input to the fully connected layer of your neural network model. This could be due to the fact that the output of the intermediate function "f" includes a complex constant "C" multiplied by the neural network output "NN". If "C" is complex, then "f" will be complex-valued as well, and the subsequent computations involving "f" may introduce complex values.
To resolve this error and perform backpropagation through your neural network, you need to ensure that all inputs to fullyconnect are real-valued. One way to do this is to separate the real and imaginary parts of the complex input to the fully connected layer and pass them as a single concatenated real-valued input. You can use the "real" and "imag" functions to extract the two parts of "f":
NN = model(parameters,x); % Feedforward neural network
f = C*NN; % Intermediate function
f_real = real(f); % Real part of f
f_imag = imag(f); % Imaginary part of f
fc_in = [f_real; f_imag]; % Concatenate f_real and f_imag
fc_out = fullyconnect(fc_in, weights_fc, bias_fc); % Fully connected layer output (weights_fc and bias_fc are its learnable weights and bias)
Here, the "fc_in" matrix is formed by concatenating the real and imaginary parts of "f", and then passed to the fully connected layer.
Refer to the MathWorks documentation for fullyconnect and dlgradient for more information.

3 Comments

Thank you, Karthik.
Are you implying that my neural network should give 2N outputs for N inputs, with the first N rows for the real part and the remaining N rows for the imaginary part?
Yes, that can be a possible workaround: separate the real and imaginary parts and perform all the other calculations, like the loss calculation and gradient descent, on them separately.
Okay. Splitting the real and imaginary parts results in several loss functions that must be optimized simultaneously, which may pose a problem during training. But I will give it a try and get back to you.
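For reference, a sketch of that workaround, assuming the network is built to produce 2N rows for N inputs (sizes and names here are illustrative):
UU = forward(net,X); % 2N-by-numObservations real output
N = size(UU,1)/2;
U_real = UU(1:N,:); % First N rows: real part
U_imag = UU(N+1:end,:); % Remaining N rows: imaginary part
U = complex(U_real,U_imag); % Assemble the complex prediction when needed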

