- If Y, X1 and X2 have compatible sizes, you can concatenate them before customLossLayerMultiInput and pass them in as a single input to the loss.
- Use dlnetwork and a custom training loop. In this case you can write a much more flexible loss function instead of a custom loss layer, but you need to write the training loop yourself, following an example like this.
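A rough sketch of how the first option could be wired up, assuming the fully connected output and the sequence input share the same batch and time dimensions so they can be stacked along the channel dimension. The sizes, the layer name 'concat', and the stand-in regressionLayer are illustrative assumptions, not from the original post; a single-input custom loss layer would take the stand-in's place and split the channels apart inside forwardLoss. The targets would presumably also have to match the concatenated size, which is not handled here.

```matlab
% Sketch only: sizes and names below are illustrative assumptions.
numFeatures = 3;        % hypothetical number of input features
numHiddenUnits = 16;    % hypothetical LSTM width

layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer')
    lstmLayer(numHiddenUnits, 'OutputMode', 'sequence', 'Name', 'lstmLayer')
    fullyConnectedLayer(1, 'Name', 'fullyConnectedLayer')
    concatenationLayer(1, 2, 'Name', 'concat')   % stack along channels, 2 inputs
    regressionLayer('Name', 'lossLayer')         % stand-in for a single-input custom loss layer
    ];

% layerGraph connects the array in order, so fullyConnectedLayer feeds concat/in1.
lgraph = layerGraph(layers);

% Input ports are addressed as 'layerName/portName' (forward slash).
lgraph = connectLayers(lgraph, 'inputLayer', 'concat/in2');

% Inside forwardLoss(layer, Y, T) of the custom layer, Y would then hold
% 1 + numFeatures channels: Y(1,:,:) is the prediction and
% Y(2:end,:,:) is the original sequence input.
disp(lgraph.Connections)
```

The second input of the concatenation layer carries the raw sequence data to the loss without requiring the output layer itself to have more than one input.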
How to create Custom Regression Output Layer with multiple inputs for training sequence-to-sequence LSTM model?
Shubham Baisthakur on 16 Jun 2023
Commented: Shubham Baisthakur on 23 Jun 2023
For the neural network architecture I am using, I would like to define a Regression Output Layer with a custom loss function. For this, the regression layer needs two inputs, one from a fully connected layer and the other from the sequenceInput layer, but I am not able to achieve that. How do I get around this?
Following is the definition of the custom layer:
classdef customLossLayerMultiInput < nnet.layer.RegressionLayer & nnet.layer.Acceleratable
    % Custom regression layer with a physics-informed residual plus RMSE loss and additional properties.
    properties
        node_properties
        numFeature
    end
    methods
        function layer = customLossLayerMultiInput(name, node_properties, numFeature)
            % Constructor
            layer.Name = name;
            layer.Description = 'Physics-Informed loss function for LSTM training';
            layer.node_properties = node_properties;
            layer.numFeature = numFeature;
        end
        function loss = forwardLoss(layer, Y, T, varargin)
            % Calculate the forward loss
            % Reshape predictions and targets
            Y = reshape(Y, [], 1);
            T = reshape(T, [], 1);
            X1 = varargin{1};
            X2 = varargin{2}; 
            % Sequence input data
            sequence_input_data = reshape(X1, [], layer.numFeature);
            % Calculate mean residue
            mean_residue = PI_BEM_Residue(Y, T, sequence_input_data, layer.node_properties);
            % Calculate RMSE loss
            rmse_loss = rmse(Y, T);
            % Total loss
            loss = mean_residue + rmse_loss;
        end
    end
end
And this is the network architecture:
layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer') % Define the sequence input layer and name it
    lstmLayer(num_hidden_units, 'OutputMode', 'sequence', 'Name', 'lstmLayer') % Define the LSTM layer and name it
    fullyConnectedLayer(1, 'Name', 'fullyConnectedLayer') % Define the fully connected layer and name it
    dropoutLayer(x.dropout_rate, 'Name', 'dropoutLayer') % Define the dropout layer and name it
    customLossLayerMultiInput(LayerName, node_properties,numFeatures)
    ];
% Create a layer graph
lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph,"inputLayer",strcat(LayerName,'\in2'));
For this setup, I am getting the following error:
Error using nnet.cnn.LayerGraph>iValidateLayerName
Layer 'RegressionLayer_Node2\in2' does not exist.
Accepted Answer
Ben on 20 Jun 2023
Unfortunately, it's not possible to define a custom multi-input loss layer.
The possible options are:
7 Comments
Ben on 23 Jun 2023
It's true that for dlgradient(loss,learnables) to work, loss must be computed from learnables using only dlarray methods; otherwise the derivatives can't be computed automatically.
Since Residue depends on Y, which is the output of forward(net,X), you can't use a MEX function on Y and still get a traced output that gradients can be computed from. The extractdata calls break the tracing that is used to compute automatic derivatives.
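To illustrate the tracing point, here is a small sketch contrasting a loss built only from dlarray operations with one that passes through extractdata. The function names tracedLoss and untracedLoss are hypothetical, and the second call is expected to error because its output no longer has any link back to x.

```matlab
% Sketch: a traced computation versus one broken by extractdata.
x = dlarray([1 2 3]);

% Works: every operation on x is a dlarray operation.
[y, g] = dlfeval(@tracedLoss, x);
disp(extractdata(g))    % gradient of sum(x.^2) is 2*x

% Expected to fail: extractdata returns a plain numeric array,
% so the value passed to dlgradient is untraced.
try
    dlfeval(@untracedLoss, x);
catch err
    disp(err.message)
end

function [y, g] = tracedLoss(x)
    y = sum(x.^2);
    g = dlgradient(y, x);
end

function [y, g] = untracedLoss(x)
    xRaw = extractdata(x);         % leaves the trace, as a MEX call would require
    y = dlarray(sum(xRaw.^2));     % new dlarray with no link back to x
    g = dlgradient(y, x);          % y is not traced, so this should error
end
```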
At a glance it looks like CallStateResidual_ANN_mex is not vectorized over the batch size, so that's the first thing I'd suggest, though it's hard to know if that's plausible since I can't see the implementation of that function.
After that, note that dlaccelerate can help optimize modelLoss if it's being called multiple times, as long as the only inputs to modelLoss that vary frequently are dlarrays.
It's worth noting that the computation in modelLoss would have to happen for trainNetwork too if this was allowed as a custom loss layer, so it's not really avoidable. I would expect dlaccelerate to make up for most of the difference between the speed of trainNetwork and custom training loops. As for convergence, this should be the same if your custom training loop implements everything that your trainingOptions do in trainNetwork. Note that trainingOptions includes non-zero L2Regularization by default.
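The pieces mentioned above can be put together roughly as follows. This is a minimal sketch, not the asker's actual model: the sizes, the random data, and the placeholder physics term are illustrative assumptions, and the real residual would need to be rewritten with dlarray operations to be usable here. The weight decay step mirrors the default L2Regularization of 1e-4 in trainingOptions.

```matlab
% Minimal sketch of a custom training loop with a two-term loss.
% Sizes, data, and the physics term are illustrative assumptions.
numFeatures = 3;  numHiddenUnits = 16;
numObservations = 8;  numTimeSteps = 20;

layers = [
    sequenceInputLayer(numFeatures, 'Name', 'inputLayer')
    lstmLayer(numHiddenUnits, 'OutputMode', 'sequence', 'Name', 'lstmLayer')
    fullyConnectedLayer(1, 'Name', 'fullyConnectedLayer')];
net = dlnetwork(layers);   % a dlnetwork has no output (loss) layer

% Random stand-in data in channel-by-batch-by-time ("CBT") format.
X = dlarray(rand(numFeatures, numObservations, numTimeSteps, 'single'), 'CBT');
T = dlarray(rand(1, numObservations, numTimeSteps, 'single'), 'CBT');

accelLoss = dlaccelerate(@modelLoss);   % reuses the trace across calls
avgGrad = [];  avgSqGrad = [];
learnRate = 1e-3;
l2Factor = 1e-4;   % matches the trainingOptions default for L2Regularization

for iteration = 1:50
    [loss, gradients] = dlfeval(accelLoss, net, X, T);

    % Weight decay on weight parameters only (not biases).
    idx = endsWith(net.Learnables.Parameter, "Weights");
    gradients(idx, :) = dlupdate(@(g, w) g + l2Factor * w, ...
        gradients(idx, :), net.Learnables(idx, :));

    [net, avgGrad, avgSqGrad] = adamupdate(net, gradients, ...
        avgGrad, avgSqGrad, iteration, learnRate);
end
disp(extractdata(loss))

function [loss, gradients] = modelLoss(net, X, T)
    % The loss can use the prediction, the target, and the network input.
    Y = stripdims(forward(net, X));   % stays traced; only the labels are removed
    T = stripdims(T);
    X = stripdims(X);

    dataLoss = sqrt(mean((Y - T).^2, 'all'));   % RMSE from dlarray operations

    % Hypothetical placeholder for a physics residual that depends on X.
    residual = Y - mean(X, 1);
    physicsLoss = mean(abs(residual), 'all');

    loss = dataLoss + physicsLoss;
    gradients = dlgradient(loss, net.Learnables);
end
```

Because modelLoss receives X directly, no multi-input output layer is needed; anything computable with dlarray operations can enter the loss.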

