# Solve Partial Differential Equation with L-BFGS Method and Deep Learning

This example shows how to train a physics informed neural network (PINN) to numerically compute the solution of the Burger's equation by using the limited-memory BFGS (L-BFGS) algorithm.

The Burger's equation is a partial differential equation (PDE) that arises in different areas of applied mathematics. In particular, fluid mechanics, nonlinear acoustics, gas dynamics, and traffic flows. The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.

Given the computational domain$$[-1,1]\times [0,1]$$, this example uses a physics informed neural network (PINN) [2] and trains a multilayer perceptron neural network that takes samples $$(x,t)$$ as input, where $$x\in [-1,1]$$ is the spatial variable, and $$t\in [0,1]$$ is the time variable, and returns $$u(x,t)$$, where u is the solution of the Burger's equation:

$$\frac{\partial u}{\partial t}+u\frac{\partial u}{\partial x}-\frac{0.01}{\pi}\frac{{\partial}^{2}u}{\partial {x}^{2}}=0,$$

with $$u(x,t=0)=-sin(\pi x)$$as the initial condition, and $$u(x=-1,t)=0$$ and $$u(x=1,t)=0$$ as the boundary conditions.

The example trains the model by enforcing that given an input $$(x,t)$$, the output of the network $$u(x,t)$$ fulfills the Burger's equation, the boundary conditions, and the initial condition. Training this model does not require collecting data in advance. You can generate data using the definition of the PDE and the constraints.

### Generate Training Data

Training the model requires a data set of collocation points that enforce the boundary conditions, enforce the initial conditions, and fulfill the Burger's equation.

Select 25 equally spaced time points to enforce each of the boundary conditions $$u(x=-1,t)=0$$ and $$u(x=1,t)=0$$.

numBoundaryConditionPoints = [25 25]; x0BC1 = -1*ones(1,numBoundaryConditionPoints(1)); x0BC2 = ones(1,numBoundaryConditionPoints(2)); t0BC1 = linspace(0,1,numBoundaryConditionPoints(1)); t0BC2 = linspace(0,1,numBoundaryConditionPoints(2)); u0BC1 = zeros(1,numBoundaryConditionPoints(1)); u0BC2 = zeros(1,numBoundaryConditionPoints(2));

Select 50 equally spaced spatial points to enforce the initial condition $$u(x,t=0)=-sin(\pi x)$$.

numInitialConditionPoints = 50; x0IC = linspace(-1,1,numInitialConditionPoints); t0IC = zeros(1,numInitialConditionPoints); u0IC = -sin(pi*x0IC);

Group together the data for initial and boundary conditions.

X0 = [x0IC x0BC1 x0BC2]; T0 = [t0IC t0BC1 t0BC2]; U0 = [u0IC u0BC1 u0BC2];

Select 10,000 points to enforce the output of the network to fulfill the Burger's equation.

numInternalCollocationPoints = 10000; points = rand(numInternalCollocationPoints,2); dataX = 2*points(:,1)-1; dataT = points(:,2);

### Define Neural Network Architecture

Define a multilayer perceptron neural network architecture with 9 fully connect operations with 20 hidden neurons each. The first fully connect operation has two input channels corresponding to the inputs $$x$$ and $$t$$. The last fully connect operation has one output $$u(x,t)$$.

Specify the network hyperparameters.

numLayers = 9; numNeurons = 20;

Create the neural network.

layers = featureInputLayer(2); for i = 1:numLayers-1 layers = [ layers fullyConnectedLayer(numNeurons) tanhLayer]; end layers = [ layers fullyConnectedLayer(1)]

layers = 18×1 Layer array with layers: 1 '' Feature Input 2 features 2 '' Fully Connected 20 fully connected layer 3 '' Tanh Hyperbolic tangent 4 '' Fully Connected 20 fully connected layer 5 '' Tanh Hyperbolic tangent 6 '' Fully Connected 20 fully connected layer 7 '' Tanh Hyperbolic tangent 8 '' Fully Connected 20 fully connected layer 9 '' Tanh Hyperbolic tangent 10 '' Fully Connected 20 fully connected layer 11 '' Tanh Hyperbolic tangent 12 '' Fully Connected 20 fully connected layer 13 '' Tanh Hyperbolic tangent 14 '' Fully Connected 20 fully connected layer 15 '' Tanh Hyperbolic tangent 16 '' Fully Connected 20 fully connected layer 17 '' Tanh Hyperbolic tangent 18 '' Fully Connected 1 fully connected layer

Convert the layer array to a `dlnetwork`

object.

net = dlnetwork(layers)

net = dlnetwork with properties: Layers: [18×1 nnet.cnn.layer.Layer] Connections: [17×2 table] Learnables: [18×3 table] State: [0×3 table] InputNames: {'input'} OutputNames: {'fc_9'} Initialized: 1 View summary with summary.

Training a PINN can result in better accuracy when the learnable parameters have data type double. Convert the network learnables to double using the `dlupdate`

function. Note that not all neural networks support learnables of type double, for example, networks that use GPU optimizations that rely on learnables with type single.

net = dlupdate(@double,net);

### Define Model Loss Function

Create the function `modelLoss`

, listed in the Model Loss Function section at the end of the example, that takes as input the neural network, the network inputs, and the initial and boundary conditions, and returns the gradients of the loss with respect to the learnable parameters and the corresponding loss.

### Specify Training Options

Specify the training options:

Train for 1500 iterations

Stop training early when the norm of the gradients or steps are smaller than 0.00001.

Use the default options for the L-BFGS solver state.

maxIterations = 1500; gradientTolerance = 1e-5; stepTolerance = 1e-5; solverState = lbfgsState;

### Train Neural Network

Convert the training data to `dlarray`

objects. Specify that the inputs `X`

and `T`

have format `"BC"`

(batch, channel) and that the initial conditions have format `"CB"`

(channel, batch).

X = dlarray(dataX,"BC"); T = dlarray(dataT,"BC"); X0 = dlarray(X0,"CB"); T0 = dlarray(T0,"CB"); U0 = dlarray(U0,"CB");

Accelerate the loss function using `dlaccelerate`

.

accfun = dlaccelerate(@modelLoss);

Create a function handle containing the loss function for the L-BFGS update. In order to evaluate the `dlgradient`

function inside the `modelLoss`

function using automatic differentiation, use the `dlfeval`

function.

lossFcn = @(net) dlfeval(accfun,net,X,T,X0,T0,U0);

Initialize the `TrainingProgressMonitor`

object. At each iteration, plot the loss and monitor the norm of the gradients and steps. Because the timer starts when you create the monitor object, make sure that you create the object close to the training loop.

monitor = trainingProgressMonitor( ... Metrics="TrainingLoss", ... Info=["Iteration" "GradientsNorm" "StepNorm"], ... XLabel="Iteration");

Train the network using a custom training loop using the full data set at each iteration. For each iteration:

Update the network learnable parameters and solver state using the

`lbfgsupdate`

function.Update the training progress monitor.

Stop training early if the norm of the gradients are less the specified tolerances or if the line search fails.

iteration = 0; while iteration < maxIterations && ~monitor.Stop iteration = iteration + 1; [net, solverState] = lbfgsupdate(net,lossFcn,solverState); updateInfo(monitor, ... Iteration=iteration, ... GradientsNorm=solverState.GradientsNorm, ... StepNorm=solverState.StepNorm); recordMetrics(monitor,iteration,TrainingLoss=solverState.Loss); monitor.Progress = 100 * iteration/maxIterations; if solverState.GradientsNorm < gradientTolerance || ... solverState.StepNorm < stepTolerance || ... solverState.LineSearchStatus == "failed" break end end

### Evaluate Model Accuracy

For values of $$t$$ at 0.25, 0.5, 0.75, and 1, compare the predicted values of the deep learning model with the true solutions of the Burger's equation using the relative $${l}^{2}$$ error.

Set the target times to test the model at. For each time, calculate the solution at 1001 equally spaced points in the range [-1,1].

```
tTest = [0.25 0.5 0.75 1];
numPredictions = 1001;
XTest = linspace(-1,1,numPredictions);
XTest = dlarray(XTest,"CB");
```

Test the model.

figure tiledlayout("flow") for i=1:numel(tTest) t = tTest(i); TTest = t*ones(1,numPredictions); TTest = dlarray(TTest,"CB"); % Make predictions. XTTest = cat(1,XTest,TTest); UPred = forward(net,XTTest); % Calculate target. UTest = solveBurgers(extractdata(XTest),t,0.01/pi); % Calculate error. UPred = extractdata(UPred); err = norm(UPred - UTest) / norm(UTest); % Plot prediction. nexttile plot(XTest,UPred,"-",LineWidth=2); ylim([-1.1, 1.1]) % Plot target. hold on plot(XTest, UTest,"--",LineWidth=2) hold off title("t = " + t + ", Error = " + gather(err)); end legend(["Prediction" "Target"])

### Supporting Functions

#### Solve Burger's Equation Function

The `solveBurgers`

function returns the true solution of Burger's equation at times `t`

as outlined in [3].

function U = solveBurgers(X,t,nu) % Define functions. f = @(y) exp(-cos(pi*y)/(2*pi*nu)); g = @(y) exp(-(y.^2)/(4*nu*t)); % Initialize solutions. U = zeros(size(X)); % Loop over x values. for i = 1:numel(X) x = X(i); % Calculate the solutions using the integral function. The boundary % conditions in x = -1 and x = 1 are known, so leave 0 as they are % given by initialization of U. if abs(x) ~= 1 fun = @(eta) sin(pi*(x-eta)) .* f(x-eta) .* g(eta); uxt = -integral(fun,-inf,inf); fun = @(eta) f(x-eta) .* g(eta); U(i) = uxt / integral(fun,-inf,inf); end end end

#### Model Loss Function

The model is trained by enforcing that given an input $$(x,t)$$ the output of the network $$u(x,t)$$ fulfills the Burger's equation, the boundary conditions, and the initial condition. In particular, two quantities contribute to the loss to be minimized:

$$\text{loss}={\text{MSE}}_{f}+{\text{MSE}}_{u}$$,

where $${\text{MSE}}_{f}=\frac{1}{{N}_{f}}\sum _{i=1}^{{N}_{f}}{|f({x}_{f}^{i},{t}_{f}^{i})|}^{2}$$ and $${\text{MSE}}_{u}=\frac{1}{{N}_{u}}\sum _{i=1}^{{N}_{u}}{|u({x}_{u}^{i},{t}_{u}^{i})-{u}^{i}|}^{2}$$.

Here, $$\{{x}_{u}^{i},{t}_{u}^{i}{\}}_{i=1}^{{N}_{u}}$$ correspond to collocation points on the boundary of the computational domain and account for both boundary and initial condition. $$\{{x}_{f}^{i},{t}_{f}^{i}{\}}_{i=1}^{{N}_{f}}$$ are points in the interior of the domain.

Calculating $${\text{MSE}}_{f}$$ requires the derivatives $$\frac{\partial u}{\partial t},\frac{\partial u}{\partial x},\frac{{\partial}^{2}u}{\partial {x}^{2}}$$ of the output $$u$$ of the model.

The function `modelLoss`

takes as input, the network `net`

, the network inputs `X`

and `T`

, the initial and boundary conditions `X0`

, `T0`

, and `U0`

, and returns the loss and the gradients of the loss with respect to the learnable parameters.

function [loss,gradients] = modelLoss(net,X,T,X0,T0,U0) % Make predictions with the initial conditions. XT = cat(1,X,T); U = forward(net,XT); % Calculate derivatives with respect to X and T. gradientsU = dlgradient(sum(U,"all"),{X,T},EnableHigherDerivatives=true); Ux = gradientsU{1}; Ut = gradientsU{2}; % Calculate second-order derivatives with respect to X. Uxx = dlgradient(sum(Ux,"all"),X,EnableHigherDerivatives=true); % Calculate mseF. Enforce Burger's equation. f = Ut + U.*Ux - (0.01./pi).*Uxx; zeroTarget = zeros(size(f),"like",f); mseF = l2loss(f,zeroTarget); % Calculate mseU. Enforce initial and boundary conditions. XT0 = cat(1,X0,T0); U0Pred = forward(net,XT0); mseU = l2loss(U0Pred,U0); % Calculated loss to be minimized by combining errors. loss = mseF + mseU; % Calculate gradients with respect to the learnable parameters. gradients = dlgradient(loss,net.Learnables); end

### References

Liu, Dong C., and Jorge Nocedal. "On the limited memory BFGS method for large scale optimization."

*Mathematical programming*45, no. 1 (1989): 503-528.Maziar Raissi, Paris Perdikaris, and George Em Karniadakis, Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations https://arxiv.org/abs/1711.10561

C. Basdevant, M. Deville, P. Haldenwang, J. Lacroix, J. Ouazzani, R. Peyret, P. Orlandi, A. Patera, Spectral and finite difference solutions of the Burgers equation, Computers & fluids 14 (1986) 23–41.

## See Also

`dlarray`

| `dlfeval`

| `dlgradient`