# Train Sequence Classification Network Using Custom Training Loop

This example shows how to train a network that classifies sequences with a custom learning rate schedule.

You can train most types of neural networks using the `trainnet` and `trainingOptions` functions. If the `trainingOptions` function does not provide the options you need (for example, a custom learning rate schedule), then you can define your own custom training loop using `dlarray` and `dlnetwork` objects for automatic differentiation. For an example showing how to train a convolutional neural network for sequence classification using the `trainnet` function, see Sequence Classification Using 1-D Convolutions.

Training a network in a custom training loop with sequence data requires some additional processing steps when compared with image or feature data. Most deep learning functions require data passed as numeric arrays with a fixed sequence length. If you have sequence data where observations have varying lengths, then you must pad or truncate the sequences in each mini-batch so that they have the same length.
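As a rough illustration of what padding does, the following sketch right-pads a few toy sequences with zeros so that they share the length of the longest one, and then stacks them into a time-by-channel-by-batch array. The variable names and the toy data are hypothetical, and the loop is a hand-written stand-in for illustration only; the example itself uses the `padsequences` function.

```matlab
% Toy mini-batch: three sequences with 3 channels and different lengths.
% (Hypothetical data, for illustration only.)
sequences = {ones(4,3); 2*ones(6,3); 3*ones(5,3)};

% Find the longest sequence in the mini-batch.
lengths = cellfun(@(s) size(s,1), sequences);
maxLength = max(lengths);
numChannelsToy = size(sequences{1},2);

% Right-pad each sequence with zeros along the time dimension and stack
% the results along the third (batch) dimension.
XPadded = zeros(maxLength,numChannelsToy,numel(sequences));
for n = 1:numel(sequences)
    XPadded(1:lengths(n),:,n) = sequences{n};
end

% The result is a time-by-channel-by-batch array.
sizePadded = size(XPadded);
```

Padding to the longest sequence in each mini-batch, rather than in the whole data set, keeps the amount of padding small when sequence lengths vary widely.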

This example trains a network to classify sequences with the *time-based decay* learning rate schedule: for each iteration, the solver uses the learning rate given by $\rho_t = \frac{\rho_0}{1 + k t}$, where $t$ is the iteration number, $\rho_0$ is the initial learning rate, and $k$ is the decay.
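To get a feel for how quickly this schedule decays, the following sketch evaluates it at a few iteration numbers. It uses the initial learning rate (0.005) and decay (0.01) that this example specifies later; the choice of iteration numbers is arbitrary.

```matlab
% Time-based decay: rho_t = rho_0 / (1 + k*t).
initialLearnRate = 0.005;   % rho_0, the value used later in this example
learnRateDecay = 0.01;      % k, the value used later in this example

% Evaluate the schedule at a few (arbitrarily chosen) iteration numbers.
t = [1 100 1000];
learnRate = initialLearnRate ./ (1 + learnRateDecay .* t);

disp(learnRate)
```

With these values, the learning rate halves after 100 iterations, because the denominator is then 1 + 0.01·100 = 2.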

### Load Training Data

Load the Waveform data set from `WaveformData.mat`. The observations are `numTimeSteps`-by-`numChannels` arrays, where `numTimeSteps` and `numChannels` are the number of time steps and channels of the sequence, respectively. The sequences have different lengths.

```
load WaveformData
```

View the sizes of the first few sequences.

```
data(1:5)
```

```
ans=5×1 cell array
    {103×3 double}
    {136×3 double}
    {140×3 double}
    {124×3 double}
    {127×3 double}
```

View the number of channels. To train the network, each sequence must have the same number of channels.

```
numChannels = size(data{1},2)
```

```
numChannels = 3
```

Visualize the first few sequences in a plot.

```
figure
tiledlayout(2,2)
for i = 1:4
    nexttile
    stackedplot(data{i},DisplayLabels="Channel " + (1:numChannels));
    title("Observation " + i + newline + "Class: " + string(labels(i)))
    xlabel("Time Step")
end
```

Determine the number of classes in the training data.

```
classNames = categories(labels);
numClasses = numel(classNames);
```

Partition the data into training and test partitions. Train the network using 90% of the data and set aside the remaining 10% for testing.

```
numObservations = numel(data);

idxTrain = 1:floor(0.9*numObservations);
XTrain = data(idxTrain);
TTrain = labels(idxTrain);

idxTest = floor(0.9*numObservations)+1:numObservations;
XTest = data(idxTest);
TTest = labels(idxTest);
```
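The index arithmetic of this split can be checked on its own. The following sketch assumes a data set of 1000 observations (a hypothetical size chosen for illustration; the variable names with the `Toy` suffix are not part of the example) and confirms that the two index ranges are disjoint and together cover every observation.

```matlab
% Hypothetical data set size, assumed here for illustration only.
numObservationsToy = 1000;

% Same index arithmetic as the partition in this example.
idxTrainToy = 1:floor(0.9*numObservationsToy);
idxTestToy = floor(0.9*numObservationsToy)+1:numObservationsToy;

numTrain = numel(idxTrainToy);
numTest = numel(idxTestToy);

% The partitions do not overlap and together cover all observations.
isDisjoint = isempty(intersect(idxTrainToy,idxTestToy));
coversAll = isequal([idxTrainToy idxTestToy],1:numObservationsToy);
```

Because the split is by position rather than random, it relies on the observations not being ordered by class.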

### Define Network

Define the network for sequence classification.

- For the sequence input, specify a sequence input layer with input size matching the number of channels of the training data.
- Specify three convolution-layernorm-ReLU blocks.
- Pad the input to the convolution layers such that the output has the same size by setting the `Padding` option to `"same"`.
- For each convolution layer, specify 20 filters of size 5.
- Pool the time steps to a single value using a 1-D global average pooling layer.
- For classification, specify a fully connected layer with size matching the number of classes.
- To map the output to probabilities, include a softmax layer.
- When training a network using a custom training loop, do not include an output layer.

```
layers = [
    sequenceInputLayer(numChannels)
    convolution1dLayer(5,20,Padding="same")
    layerNormalizationLayer
    reluLayer
    convolution1dLayer(5,20,Padding="same")
    layerNormalizationLayer
    reluLayer
    convolution1dLayer(5,20,Padding="same")
    layerNormalizationLayer
    reluLayer
    globalAveragePooling1dLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];
```

Create a `dlnetwork` object from the layer array.

```
net = dlnetwork(layers)
```

```
net = 
  dlnetwork with properties:

         Layers: [13×1 nnet.cnn.layer.Layer]
    Connections: [12×2 table]
     Learnables: [14×3 table]
          State: [0×3 table]
     InputNames: {'sequenceinput'}
    OutputNames: {'softmax'}
    Initialized: 1

  View summary with summary.
```
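The 14 rows of the `Learnables` table can be accounted for by hand: each convolution and fully connected layer contributes a weights array and a bias array, and each layer normalization layer contributes an offset and a scale. The following sketch tallies the arrays and the number of scalar parameters. The number of classes (`numClassesAssumed = 4`) is a hypothetical value chosen for illustration, so the parameter total is only indicative.

```matlab
filterSize = 5;
numFilters = 20;
numChannels = 3;        % as displayed earlier in this example
numClassesAssumed = 4;  % hypothetical value, for illustration only

% Weights plus biases for each 1-D convolution layer.
conv1 = filterSize*numChannels*numFilters + numFilters;
conv2 = filterSize*numFilters*numFilters + numFilters;
conv3 = conv2;

% Offset plus scale for the three layer normalization layers.
layerNorm = 3*(2*numFilters);

% Weights plus biases for the fully connected layer.
fc = numFilters*numClassesAssumed + numClassesAssumed;

% Two learnable arrays for each of the 3 + 3 + 1 layers with parameters.
numLearnableArrays = 2*(3 + 3 + 1);
numParameters = conv1 + conv2 + conv3 + layerNorm + fc;
```

The ReLU, global average pooling, and softmax layers have no learnable parameters, which is why they do not appear in the tally.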

### Define Model Loss Function

Training a deep neural network is an optimization task. By considering a neural network as a function $f(X;\theta)$, where $X$ is the network input and $\theta$ is the set of learnable parameters, you can optimize $\theta$ so that it minimizes some loss value based on the training data. For example, optimize the learnable parameters $\theta$ such that, for given inputs $X$ with corresponding targets $T$, they minimize the error between the predictions $Y=f(X;\theta)$ and $T$.
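For classification, a common choice of loss is the cross-entropy between the predicted probabilities and the one-hot targets. The following sketch computes it by hand for a toy mini-batch, assuming the loss is averaged over the observations in the mini-batch. The values of `Y` and `T` are hypothetical, and the calculation is an illustration rather than the implementation of the `crossentropy` function, which offers further normalization and weighting options.

```matlab
% Toy predictions for 2 observations and 3 classes (classes-by-batch).
% Each column sums to 1, as a softmax output would. (Hypothetical values.)
Y = [0.7 0.2
     0.2 0.5
     0.1 0.3];

% One-hot targets: observation 1 is class 1, observation 2 is class 2.
T = [1 0
     0 1
     0 0];

% Cross-entropy averaged over the observations in the mini-batch.
numObs = size(Y,2);
loss = -sum(sum(T .* log(Y))) / numObs;
```

Only the probability assigned to the correct class of each observation contributes to the loss, so the loss falls toward zero as those probabilities approach 1.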

Create the function `modelLoss`, listed in the Model Loss Function section of the example, that takes as input the `dlnetwork` object and a mini-batch of input data with corresponding targets, and returns the loss and the gradients of the loss with respect to the learnable parameters.

### Specify Training Options

Train for 60 epochs with a mini-batch size of 128.

```
numEpochs = 60;
miniBatchSize = 128;
```

Specify the options for Adam optimization. Specify an initial learn rate of 0.005 with a decay of 0.01.

```
initialLearnRate = 0.005;
learnRateDecay = 0.01;
```

### Train Model

Mini-batch queue objects require data specified as datastores. Convert the sequences and labels to array datastores and combine them using the `combine` function. To output sequences as a cell array of numeric arrays, specify an output type of `"same"` for the sequence data.

```
adsXTrain = arrayDatastore(XTrain,OutputType="same");
adsTTrain = arrayDatastore(TTrain);
cdsTrain = combine(adsXTrain,adsTTrain);
```

Create a `minibatchqueue` object that processes and manages mini-batches of data during training. For each mini-batch:

- Use the custom mini-batch preprocessing function `preprocessMiniBatch` (defined at the end of this example) to pad the sequences to have the same length and convert the labels to one-hot encoded variables.
- Because the data has rows and columns that correspond to time steps and channels, respectively, format the sequence data with the dimension labels `"TCB"` (time, channel, batch). By default, the `minibatchqueue` object converts the data to `dlarray` objects with underlying type `single`. Do not format the class labels.
- Train on a GPU if one is available. By default, the `minibatchqueue` object converts each output to a `gpuArray` if a GPU is available. Using a GPU requires Parallel Computing Toolbox™ and a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox).

```
mbq = minibatchqueue(cdsTrain, ...
    MiniBatchSize=miniBatchSize, ...
    MiniBatchFcn=@preprocessMiniBatch, ...
    MiniBatchFormat=["TCB" ""]);
```

Initialize the average gradient and average square gradient parameters for the Adam solver.

```
averageGrad = [];
averageSqGrad = [];
```

Calculate the total number of iterations for the training progress monitor.

```
numObservationsTrain = size(XTrain,1);
numIterationsPerEpoch = ceil(numObservationsTrain / miniBatchSize);
numIterations = numEpochs * numIterationsPerEpoch;
```
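As a worked instance of this calculation, the following sketch assumes 900 training observations (a hypothetical count chosen for illustration) together with the mini-batch size, number of epochs, initial learning rate, and decay specified in this example. It also evaluates the learning rate that the time-based decay schedule reaches at the final iteration.

```matlab
numObservationsTrainAssumed = 900;  % hypothetical, for illustration only
miniBatchSize = 128;
numEpochs = 60;

% Iterations per epoch, counting a final partial mini-batch.
numIterationsPerEpoch = ceil(numObservationsTrainAssumed / miniBatchSize);
numIterations = numEpochs * numIterationsPerEpoch;

% Number of observations left over for the last mini-batch of each epoch.
lastBatchSize = numObservationsTrainAssumed ...
    - (numIterationsPerEpoch - 1)*miniBatchSize;

% Learning rate at the final iteration under time-based decay.
initialLearnRate = 0.005;
learnRateDecay = 0.01;
finalLearnRate = initialLearnRate/(1 + learnRateDecay*numIterations);
```

Under these assumptions, each epoch has 8 iterations, the last of which contains only 4 observations, and the learning rate shrinks by a factor of 5.8 over the 480 iterations. The `ceil` in the calculation presumes that the queue returns the final partial mini-batch rather than discarding it.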

Initialize the `TrainingProgressMonitor` object. Because the timer starts when you create the monitor object, make sure that you create the object close to the training loop.

```
monitor = trainingProgressMonitor( ...
    Metrics="Loss", ...
    Info=["Epoch" "LearnRate"], ...
    XLabel="Iteration");
```

Train the network using a custom training loop. For each epoch, shuffle the data and loop over mini-batches of data. For each mini-batch:

- Evaluate the model loss and gradients using the `dlfeval` and `modelLoss` functions.
- Determine the learning rate for the time-based decay learning rate schedule.
- Update the network parameters using the `adamupdate` function.
- Update the loss, learn rate, and epoch values in the training progress monitor.
- Stop if the `Stop` property is true. The `Stop` property value of the `TrainingProgressMonitor` object changes to true when you click the **Stop** button.

```
epoch = 0;
iteration = 0;

% Loop over epochs.
while epoch < numEpochs && ~monitor.Stop

    epoch = epoch + 1;

    % Shuffle data.
    shuffle(mbq);

    % Loop over mini-batches.
    while hasdata(mbq) && ~monitor.Stop

        iteration = iteration + 1;

        % Read mini-batch of data.
        [X,T] = next(mbq);

        % Evaluate the model gradients and loss using dlfeval and the
        % modelLoss function.
        [loss,gradients] = dlfeval(@modelLoss,net,X,T);

        % Determine learning rate for time-based decay learning rate schedule.
        learnRate = initialLearnRate/(1 + learnRateDecay*iteration);

        % Update the network parameters using the Adam optimizer.
        [net,averageGrad,averageSqGrad] = adamupdate(net,gradients, ...
            averageGrad,averageSqGrad,iteration,learnRate);

        % Update the training progress monitor.
        recordMetrics(monitor,iteration,Loss=loss);
        updateInfo(monitor,Epoch=epoch,LearnRate=learnRate);
        monitor.Progress = 100 * iteration/numIterations;
    end
end
```
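To see how the scheduled learning rate enters the parameter update, the following sketch applies one Adam step to a single scalar parameter. It is the textbook Adam rule written out by hand, with commonly used hyperparameter values that are assumptions of this sketch; it is not taken from the `adamupdate` implementation, whose details may differ. The parameter and gradient values are hypothetical.

```matlab
% Textbook Adam hyperparameters (assumed for this sketch).
beta1 = 0.9;
beta2 = 0.999;
epsilon = 1e-8;

theta = 1.0;   % a single toy parameter
grad = 0.5;    % its toy gradient
m = 0;         % moving average of the gradient
v = 0;         % moving average of the squared gradient

% Learning rate for the first iteration under time-based decay.
iteration = 1;
learnRate = 0.005/(1 + 0.01*iteration);

% Update the moving averages of the gradient and squared gradient.
m = beta1*m + (1 - beta1)*grad;
v = beta2*v + (1 - beta2)*grad^2;

% Bias-correct the averages, then take the step.
mHat = m/(1 - beta1^iteration);
vHat = v/(1 - beta2^iteration);
theta = theta - learnRate*mHat/(sqrt(vHat) + epsilon);
```

On the first step, the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by approximately the learning rate itself. This is why the schedule directly controls the size of the early updates.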

### Test Model

Test the classification accuracy of the model by comparing the predictions on the test set with the targets.

Make predictions using the `minibatchpredict` function and convert the scores to labels using the `scores2label` function. By default, the `minibatchpredict` function uses a GPU if one is available. Otherwise, the function uses the CPU. To specify the execution environment, use the `ExecutionEnvironment` option.

```
scores = minibatchpredict(net,XTest);
YTest = scores2label(scores,classNames);
```

Evaluate the classification accuracy.

```
accuracy = mean(TTest == YTest)
```

```
accuracy = 0.8200
```

Visualize the predictions in a confusion chart.

```
figure
confusionchart(TTest,YTest)
```

Large values on the diagonal indicate accurate predictions for the corresponding class. Large values on the off-diagonal indicate strong confusion between the corresponding classes.
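The counts behind such a chart, and their relationship to the accuracy, can be reproduced by hand. The following sketch builds a confusion matrix from toy class indices using `accumarray`, with rows for true classes and columns for predicted classes. The indices are hypothetical and unrelated to the results of this example.

```matlab
% Toy true and predicted class indices for 3 classes (hypothetical).
trueIdx = [1 1 2 2 3 3 3]';
predIdx = [1 2 2 2 3 1 3]';
numClassesToy = 3;

% Count each (true, predicted) pair. Rows correspond to true classes
% and columns to predicted classes.
C = accumarray([trueIdx predIdx],1,[numClassesToy numClassesToy]);

% Accuracy is the fraction of observations on the diagonal.
accuracyToy = sum(diag(C))/sum(C(:));
```

Here five of the seven observations fall on the diagonal, so the accuracy is 5/7. The off-diagonal entry in row 1, column 2 records one observation of class 1 predicted as class 2.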

### Supporting Functions

#### Model Loss Function

The `modelLoss` function takes a `dlnetwork` object `net` and a mini-batch of input data `X` with corresponding targets `T`, and returns the loss and the gradients of the loss with respect to the learnable parameters in `net`. To compute the gradients automatically, use the `dlgradient` function.

```
function [loss,gradients] = modelLoss(net,X,T)

% Forward data through network.
Y = forward(net,X);

% Calculate cross-entropy loss.
loss = crossentropy(Y,T);

% Calculate gradients of loss with respect to learnable parameters.
gradients = dlgradient(loss,net.Learnables);

end
```

#### Mini-Batch Preprocessing Function

The `preprocessMiniBatch` function preprocesses a mini-batch of predictors and labels using the following steps:

- Pad the sequence data in the input cell array over the first dimension (the time dimension) using the `padsequences` function. The function returns the data as a `numTimeSteps`-by-`numChannels`-by-`numObservations` array. To pass this information to downstream functions, specify that this data has a format of `"TCB"` (time, channel, batch).
- Extract the label data from the incoming cell array and concatenate into a categorical array along the second dimension.
- One-hot encode the categorical labels into numeric arrays. Encoding into the first dimension produces an encoded array that matches the shape of the network output.

```
function [X,T] = preprocessMiniBatch(dataX,dataT)

% Pad sequences.
X = padsequences(dataX,1);

% Extract label data from cell and concatenate.
T = cat(2,dataT{1:end});

% One-hot encode labels.
T = onehotencode(T,1);

end
```
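The shape that results from encoding into the first dimension can be illustrated without the toolbox function. The following sketch one-hot encodes toy class indices by comparison, producing one row per class and one column per observation. The indices and variable names are hypothetical, and the comparison is a hand-written stand-in for illustration, not the `onehotencode` implementation.

```matlab
% Toy class indices for 5 observations and 4 classes (hypothetical).
classIdx = [2 4 1 1 3];
numClassesToy = 4;

% Compare a column of class numbers against the row of class indices.
% The result has one row per class and one column per observation.
TOneHot = double((1:numClassesToy)' == classIdx);

sizeOneHot = size(TOneHot);
```

Each column contains a single 1 in the row of its class, which matches the classes-by-batch layout of the softmax output that the loss function compares it against.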

## See Also

`trainingProgressMonitor` | `dlarray` | `dlgradient` | `dlfeval` | `dlnetwork` | `forward` | `adamupdate` | `predict` | `minibatchqueue` | `onehotencode` | `onehotdecode`

## Related Topics

- Train Generative Adversarial Network (GAN)
- Define Model Loss Function for Custom Training Loop
- Update Batch Normalization Statistics in Custom Training Loop
- Define Custom Training Loops, Loss Functions, and Networks
- Specify Training Options in Custom Training Loop
- Monitor Custom Training Loop Progress
- List of Deep Learning Layers
- List of Functions with dlarray Support