YAMNet Preprocess

Preprocess audio for YAMNet classification

  • Library: Audio Toolbox / Deep Learning

Description

The YAMNet Preprocess block generates mel spectrograms from audio input. The spectrograms can be passed to the YAMNet pretrained network or to any network that accepts the same inputs as YAMNet.

Ports

Input

audioIn: Sound data to classify, specified as a one-channel signal (column vector). If Sample rate of input signal (Hz) is 16e3, the input frame length is unrestricted. Otherwise, the block resamples the input, and the input frame length must be a multiple of the decimation factor of that resampling operation. If the frame length does not satisfy this condition, the block reports an error that includes information on the decimation factor.
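The frame-length condition can be checked before running a model. The Python sketch below assumes the block resamples to 16 kHz by a rational factor in lowest terms, so that the decimation factor equals the input sample rate divided by gcd(input sample rate, 16000). That is an assumption about the resampler, not something this page states, and the function names are hypothetical. The factor reported in the block's own error message takes precedence.

```python
from math import gcd

# Rate at which no resampling is needed, per the port description.
TARGET_RATE_HZ = 16000

def decimation_factor(sample_rate_hz: int) -> int:
    """Decimation factor of a lowest-terms rational resampler (assumed design)."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return sample_rate_hz // gcd(sample_rate_hz, TARGET_RATE_HZ)

def frame_length_is_valid(frame_length: int, sample_rate_hz: int) -> bool:
    """True if frame_length is a positive multiple of the decimation factor."""
    factor = decimation_factor(sample_rate_hz)
    return frame_length > 0 and frame_length % factor == 0

if __name__ == "__main__":
    for rate in (16000, 44100, 48000):
        print(rate, decimation_factor(rate))
```

Under this assumption, a 44.1 kHz input needs frame lengths that are multiples of 441 (for example 882), while a 48 kHz input needs multiples of 3.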

Data Types: single | double

Output

Mel spectrograms generated from audioIn, returned as a 96-by-64 matrix, where:

  • 96: the number of 10 ms frames in each mel spectrogram

  • 64: the number of mel bands spanning 125 Hz to 7.5 kHz

The overlap between consecutive 96-by-64 mel spectrograms is determined by the value of the Overlap percentage (%) parameter.
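To make the effect of the overlap setting concrete, the Python sketch below converts an overlap percentage into the number of 10 ms frames by which consecutive spectrograms advance. The rounding rule and the function name are assumptions for illustration; this page does not say how the block rounds fractional frame counts.

```python
FRAMES_PER_SPECTROGRAM = 96   # rows of each output matrix
FRAME_HOP_SECONDS = 0.010     # each row represents a 10 ms frame

def spectrogram_hop_frames(overlap_percent: float) -> int:
    """Frames between the starts of consecutive spectrograms (hypothetical helper)."""
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap percentage must be in [0, 100)")
    # Rounding to the nearest whole frame is an assumption.
    overlap_frames = round(FRAMES_PER_SPECTROGRAM * overlap_percent / 100)
    # Always advance by at least one frame.
    return max(FRAMES_PER_SPECTROGRAM - overlap_frames, 1)

if __name__ == "__main__":
    for overlap in (0, 50, 75):
        hop = spectrogram_hop_frames(overlap)
        print(overlap, hop, hop * FRAME_HOP_SECONDS)
```

With 50% overlap, each new spectrogram starts 48 frames (about 0.48 s) after the previous one; with 0% overlap, spectrograms do not share any frames.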

Each 96-by-64 matrix represents a single mel spectrogram. For more details on how this block generates mel spectrograms, see Algorithms.

Data Types: single

Parameters

Sample rate of input signal (Hz): Sample rate of the input signal, in Hz, specified as a positive scalar.

Data Types: single | double

Overlap percentage (%): Overlap between consecutive mel spectrograms, as a percentage, specified as a scalar in the range [0, 100).

Data Types: single | double

Block Characteristics

Data Types: double | single

Direct Feedthrough: no

Multidimensional Signals: no

Variable-Size Signals: no

Zero-Crossing Detection: no

Algorithms

The YAMNet Preprocess block generates mel spectrograms from audio input. These mel spectrograms can be fed to a YAMNet pretrained network or to a network that accepts the same inputs as YAMNet.
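As a rough illustration of the 64 mel bands spanning 125 Hz to 7.5 kHz listed under Output, the Python sketch below spaces band edges uniformly on a mel scale. The mel formula used here, mel = 2595 log10(1 + f/700), is one common convention and is an assumption; this page does not state which mel formula, analysis window, or filter shapes the block uses. All function names are hypothetical.

```python
import math

def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mel using one common convention (assumed)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(num_bands: int = 64,
                   low_hz: float = 125.0,
                   high_hz: float = 7500.0) -> list[float]:
    """Return num_bands + 2 edge frequencies, uniformly spaced in mel."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_bands + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(num_bands + 2)]

if __name__ == "__main__":
    edges = mel_band_edges()
    # Under a triangular-filter assumption, band k spans edges[k] to edges[k + 2].
    print(len(edges), edges[0], edges[-1])
```

Because the spacing is uniform in mel rather than in Hz, the bands are narrow at low frequencies and widen toward 7.5 kHz.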

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.

Introduced in R2021b