Classify sounds in audio signal
Audio Toolbox / Deep Learning
The Sound Classifier block uses YAMNet to classify audio segments into sound classes described by the AudioSet ontology. The block combines the required audio preprocessing with YAMNet network inference, and returns the predicted sound label, the prediction scores for each class, and the class labels that correspond to those scores.
audioIn — Sound data
Sound data to classify, specified as a one-channel signal (column vector). If Sample rate of input signal (Hz) is 16e3, there are no restrictions on the input frame length. If Sample rate of input signal (Hz) differs from 16e3, then the input frame length must be a multiple of the decimation factor of the resampling operation that the block performs. If the input frame length does not satisfy this condition, the block returns an error that states the required decimation factor.
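The decimation factor follows from expressing the rational resampling ratio 16e3/fs in lowest terms. This Python sketch illustrates the arithmetic (the function names are illustrative and not part of the block's interface):

```python
from math import gcd

def decimation_factor(fs_in, fs_out=16000):
    # Resampling by the rational factor fs_out/fs_in in lowest terms
    # decimates by fs_in/gcd(fs_in, fs_out).
    return int(fs_in) // gcd(int(fs_in), int(fs_out))

def check_frame_length(frame_len, fs_in):
    # Mimics the restriction described above: the input frame length
    # must be a multiple of the decimation factor.
    q = decimation_factor(fs_in)
    if frame_len % q != 0:
        raise ValueError(f"frame length must be a multiple of {q}")
    return True
```

For example, resampling from 44100 Hz to 16000 Hz reduces to 160/441, so the input frame length must be a multiple of 441; from 48000 Hz the factor is 3.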
sound — Predicted sound label
Predicted sound label, returned as an enumerated scalar.
scores — Predicted activations or scores
Predicted activation or score values for each supported sound label, returned as a 1-by-521 vector, where 521 is the number of classes in YAMNet.
labels — Class labels for predicted scores
Class labels for predicted scores, returned as a 1-by-521 vector.
Sample rate of input signal (Hz) — Sample rate of input signal in Hz
16e3 (default) | positive scalar
Specify the sample rate of the input signal as a positive scalar in Hz. If the sample rate is different from 16e3, then the block resamples the signal to 16e3, which is the sample rate that YAMNet supports.
Overlap percentage (%) — Overlap percentage between consecutive mel spectrograms
50 (default) | [0 100)
Specify the overlap percentage between consecutive mel spectrograms as a scalar in the range [0 100).
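The overlap percentage determines how far the block advances between successive 96-frame mel spectrograms. This Python sketch shows the arithmetic; the exact rounding rule the block applies is an assumption here:

```python
def spectrogram_hop(overlap_pct, frames_per_spec=96, hop_samples_per_frame=160):
    # Advance, in mel-spectrogram frames and in 16 kHz samples, between
    # successive 96-by-64 spectrograms. Rounding behavior is assumed.
    hop_frames = max(1, round(frames_per_spec * (1 - overlap_pct / 100)))
    return hop_frames, hop_frames * hop_samples_per_frame
```

With the default 50% overlap, successive spectrograms advance by 48 frames, which corresponds to 7680 samples at 16 kHz.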
Classification — Select to output sound classification
Enable the output port sound, which outputs the classified sound.
Predictions — Output all scores and associated labels
Enable the output ports scores and labels, which output all predicted scores and associated class labels.
The Sound Classifier block algorithm consists of two steps:
Preprocessing –– YAMNet-specific preprocessing that generates mel spectrograms.
Prediction –– Predicting the sounds, scores, and labels of the input signal using the YAMNet sound classification network.
Cast audioIn to single and resample to 16 kHz.
Compute the one-sided short-time Fourier transform (STFT) using a 25 ms periodic Hann window (400 samples) with a 10 ms hop (160 samples) and a 512-point DFT.
Convert the complex spectral values to magnitude and discard phase information.
Pass the one-sided magnitude STFTs through a 64-band mel-spaced filter bank. Doing so converts the 257-length STFT vectors to 64-length vectors in the mel scale.
Convert the 64-length vectors to a log scale.
Buffer the vectors into outputs of size 96-by-64, where 96 is the number of 10 ms frames in each mel spectrogram and 64 is the number of mel bands. The overlap between consecutive 96-by-64 mel spectrograms is determined by the value of the Overlap percentage (%) parameter.
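The frame and spectrogram counts implied by the steps above can be sketched in Python (assuming no signal padding; the block's exact edge handling may differ):

```python
def num_stft_frames(n_samples, win=400, hop=160):
    # Number of one-sided STFT frames from a 16 kHz signal using a
    # 25 ms (400-sample) window and a 10 ms (160-sample) hop.
    return 1 + (n_samples - win) // hop if n_samples >= win else 0

def num_spectrograms(n_frames, frames_per_spec=96, overlap_pct=50):
    # Number of 96-by-64 mel spectrograms, given the overlap percentage.
    # The rounding rule for the hop is an assumption.
    hop = max(1, round(frames_per_spec * (1 - overlap_pct / 100)))
    if n_frames < frames_per_spec:
        return 0
    return 1 + (n_frames - frames_per_spec) // hop
```

Under these assumptions, 0.975 s of audio at 16 kHz (15600 samples) yields exactly 96 STFT frames, that is, one 96-by-64 spectrogram.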
These 96-by-64 spectrograms are passed to the YAMNet network. The Sound Classifier block has a maximum of three outputs:
sound: The label of the most likely sound. The block outputs one sound label for each 96-by-64 spectrogram input.
scores: A 1-by-521 vector with a score value for each supported sound label.
labels: A 1-by-521 vector containing the sound labels.
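The relationship between the three outputs can be illustrated with a short Python sketch: sound is the label whose score is highest. Toy values are used here; the real block returns 521 scores and labels:

```python
def predicted_sound(scores, labels):
    # Pick the label with the highest activation, mirroring how the
    # sound output relates to the scores and labels outputs.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]
```

For example, with scores [0.1, 0.7, 0.2] over labels ["Speech", "Music", "Dog"], the predicted sound is "Music".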
Usage notes and limitations:
The Language parameter in the Configuration Parameters > Code Generation general category must be set to C++.
For ERT-based targets, the Support: variable-size signals parameter in the Code Generation > Interface pane must be enabled.
For a list of networks and layers supported for code generation, see Networks and Layers Supported for Code Generation (MATLAB Coder).