Generate bfloat16 Code for Deep Learning Networks
Deep learning networks use the single-precision floating-point data type to store information such as inputs, weights, and activations. Each element stored in single-precision format occupies 32 bits in computer memory, so the memory footprint required to store a deep learning network is very large. The Brain Floating Point format (bfloat16) is a truncated version of the single-precision floating-point format that occupies only 16 bits in computer memory. bfloat16 preserves approximately the same number range as single precision because it retains the same number of exponent bits (8 bits). However, bfloat16 has only 7 fraction bits, so it represents values with reduced precision.
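To see the effect of the truncation, you can emulate bfloat16 in MATLAB by zeroing the lower 16 bits of a single-precision value. The following sketch uses only base MATLAB functions and is purely illustrative; it is not part of the code generation workflow:
x = single(pi);                                         % 32 bits: 1 sign, 8 exponent, 23 fraction bits
bits = typecast(x, 'uint32');                           % reinterpret the bit pattern as an integer
bf = typecast(bitand(bits, 0xFFFF0000u32), 'single');   % keep only the upper 16 bits (bfloat16 truncation)
relErr = abs(double(x) - double(bf)) / double(x)        % relative error stays below 2^-7 (about 0.8%)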
For deep learning models that are resilient to precision loss, compressing learnables from single precision to bfloat16 greatly reduces memory usage with little change in accuracy. The compression does not require data or a preprocessing step, and it also improves inference speed. This enables deployment of large deep learning networks to devices with low computational power and limited memory. Any hardware that supports the single-precision floating-point data type can use bfloat16 learnables compression; the processor does not need native bfloat16 support. For example, you can use bfloat16 learnables compression on ARM Cortex-M, ARM Cortex-A, and Intel processors.
Learnables compression in bfloat16 format is supported only when you generate generic C/C++ code that does not depend on third-party libraries.
Supported Layers and Classes
You can perform learnables compression in bfloat16 format and generate generic C/C++ code for these layers (a sketch of a network built from supported layers follows this list):
- Bidirectional LSTM layer (bilstmLayer (Deep Learning Toolbox))
- Fully connected layer (fullyConnectedLayer (Deep Learning Toolbox))
- Channel-wise convolution layer (groupedConvolution2dLayer (Deep Learning Toolbox))
- Gated recurrent unit (GRU) layer (gruLayer (Deep Learning Toolbox))
- Gated recurrent unit (GRU) projected layer (gruProjectedLayer (Deep Learning Toolbox))
- Long short-term memory (LSTM) layer (lstmLayer (Deep Learning Toolbox))
- LSTM projected layer (lstmProjectedLayer (Deep Learning Toolbox))
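For example, a small sequence network that uses only supported layers might look like the following sketch. The layer sizes are arbitrary placeholders; only the LSTM and fully connected layers carry learnables, and layers without learnables are unaffected by compression.
layers = [
    sequenceInputLayer(12)                  % placeholder: 12 input features
    lstmLayer(100, OutputMode="last")       % LSTM learnables can be stored as bfloat16
    fullyConnectedLayer(9)                  % fully connected learnables can be stored as bfloat16
    softmaxLayer];
After training, save the network to a MAT-file (for example, save('myNet.mat', 'net')) so that an entry-point function can load it with coder.loadDeepLearningNetwork, as sketched in the Generate Code section.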
Generate Code
Generate code with learnables compression in bfloat16 format by setting the LearnablesCompression property of your coder.DeepLearningCodeConfig object dlcfg:
dlcfg = coder.DeepLearningConfig(TargetLibrary = 'none');
dlcfg.LearnablesCompression = 'bfloat16';
Alternatively, in the MATLAB® Coder™ app or the Configuration Parameters dialog box, on the Deep Learning tab, set Target library to none. Then set the Learnables Compression property to bfloat16.
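The deep learning configuration object then plugs into a standard codegen workflow. The following sketch is illustrative only; the entry-point function name myNetPredict, the MAT-file myNet.mat, the dlarray format 'CT', and the input size are placeholders, not requirements of bfloat16 compression:
% Hypothetical entry-point function, saved in its own file myNetPredict.m
function out = myNetPredict(in)
%#codegen
persistent net
if isempty(net)
    net = coder.loadDeepLearningNetwork('myNet.mat');   % placeholder MAT-file containing the network
end
dlIn = dlarray(in, 'CT');                               % assumes a dlnetwork with channel-by-time input
dlOut = predict(net, dlIn);
out = extractdata(dlOut);                               % return numeric output
end
At the MATLAB command line, attach the deep learning configuration to a code generation configuration and generate code:
cfg = coder.config('lib');          % generate a static library
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = dlcfg;     % configuration with bfloat16 learnables compression
codegen -config cfg myNetPredict -args {ones(12, 100, 'single')} -report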
Usage Notes and Limitations
convolution2dLayer
(Deep Learning Toolbox)
and channel-wise groupedConvolution2dLayer
(Deep Learning Toolbox) do not support bfloat16
compression. Learnables are still stored in the single-precision data format and do not
contribute to memory compression. However, when bfloat16
compression is
enabled, the least significant 16 bits of the learnables of these layers are set to zero.
This behavior causes the results of the inference computation involving these learnables to
mimic bfloat16
precision.
Related Topics
- Specify Configuration Parameters in Command-Line Workflow Interactively
- Code Generation for Sequence-to-Sequence Classification with Learnables Compression
- Optimize C/C++ Code Performance for Deep Learning Applications without Deep Learning Libraries