
Audio Processing

Extend deep learning workflows with audio and speech processing applications

Apply deep learning to audio and speech processing applications by using Deep Learning Toolbox™ together with Audio Toolbox™. For signal processing applications, see Signal Processing. For applications in wireless communications, see Wireless Communications.


Apps

Signal Labeler: Label signal attributes, regions, and points of interest, and extract features


Functions

audioDatastore: Datastore for a collection of audio files
audioDataAugmenter: Augment audio data (Since R2019b)
audioFeatureExtractor: Streamline audio feature extraction (Since R2019b)
openl3Embeddings: Extract OpenL3 feature embeddings (Since R2022a)
pitchnn: Estimate pitch with a deep learning neural network (Since R2021a)
vggishEmbeddings: Extract VGGish feature embeddings (Since R2022a)
audioPretrainedNetwork: Pretrained audio neural networks (Since R2024a)
classifySound: Classify sounds in an audio signal (Since R2020b)
detectspeechnn: Detect boundaries of speech in an audio signal using AI (Since R2023a)
separateSpeakers: Separate a signal by speaker (Since R2023b)


Blocks

VGGish: VGGish embeddings extraction network (Since R2022a)
VGGish Embeddings: Extract VGGish embeddings (Since R2022a)
YAMNet: YAMNet sound classification network (Since R2021b)
Sound Classifier: Classify sounds in an audio signal (Since R2021b)
OpenL3: OpenL3 embeddings extraction network (Since R2022b)
OpenL3 Embeddings: Extract OpenL3 embeddings (Since R2022b)
CREPE: CREPE deep pitch estimation neural network (Since R2023a)
Deep Pitch Estimator: Estimate pitch with the CREPE deep learning neural network (Since R2023a)
