Label Spoken Words in Audio Signals
This example shows how to label spoken words in Signal Labeler. The example uses the wav2vec 2.0 pretrained network, which requires Deep Learning Toolbox™ and installation of the pretrained model. For more information on downloading and installing the wav2vec 2.0 pretrained model, see speech2text (Audio Toolbox).
Read in Audio File
Load an audio data file containing speech with the sentence "The discrete Fourier transform of a real valued signal is conjugate symmetric".
[y,fs] = audioread("speech_dft.wav"); % To hear, type soundsc(y,fs)
Define Label
Open Signal Labeler and define a label to attach to the signal. Click Add on the Labeler tab, then Add Label Definition. Specify the Label Name as Words, select a Label Type of ROI, and select a Data Type of string.
Import Speech Data
Import the signal into the app.
- On the Labeler tab, click Import and select From Workspace in the Members list. In the dialog box, select the signal y.
- Add time information. Select Time from the drop-down list and specify fs as the sample rate, which is measured in Hz.
- Click Import and Close. The signal appears in the Labeled Signal Set Members browser.
To hear the signal, select the check box next to its name in the browser, navigate to the Audio tab, and click on the Play icon.
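Specifying a sample rate is what lets the app place each sample on a time axis. The sketch below illustrates the same idea at the command line with a timetable. It uses a synthetic one-second tone as a stand-in for the speech signal, and the 8000 Hz sample rate is a hypothetical value chosen for illustration, not the rate of speech_dft.wav.

```matlab
% Stand-in data: a synthetic tone, not the speech recording.
fs = 8000;                          % hypothetical sample rate in Hz
y = sin(2*pi*440*(0:fs-1)'/fs);     % one second of a 440 Hz tone

% Attach time information: row k is stamped at (k-1)/fs seconds.
tt = timetable(y, SampleRate=fs);

% Total duration follows directly from the sample count and rate.
durationInSeconds = numel(y)/fs;
```

With the real recording, the same call on the y and fs returned by audioread would give a timetable whose Time column matches the time axis shown in the app.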
Locate and Identify Spoken Words
Locate and identify the words spoken in the input signal.
- Select Words in the Label Definitions browser.
- In the Automated Value gallery, select Speech to Text.
- Click Auto-Label and select Auto-Label All Signals.
- In the dialog box, select wav2vec 2.0 from the Service Name list and select Word for Segmentation.
- Click OK.
Signal Labeler locates and labels the spoken words. In the Labeled Signal Set Members browser, select the check box next to y to plot the signal. Expand Words and select the check box next to each word to visualize the corresponding labeled region. Notice that the word "discrete" is incorrectly labeled as "discreet". You can correct this by right-clicking discreet in the Value column, selecting Edit, and entering the correct value, "discrete".
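If a misspelling like this is noticed only after the labels have been read into a MATLAB table, a similar correction can be made with logical indexing. The sketch below is a minimal illustration that assumes a small hand-built table with the same two variables (ROILimits and Value) as the label table. The rounded limits are placeholders, and editing the table does not change the labeled signal set itself.

```matlab
% Hypothetical stand-in for a table of word labels.
v = table([0.34 0.42; 0.46 0.82; 0.88 1.28], ...
          ["the"; "discreet"; "fourier"], ...
          VariableNames=["ROILimits" "Value"]);

% Find every row holding the misrecognized word and overwrite it.
isWrong = v.Value == "discreet";
v.Value(isWrong) = "discrete";
```

Because the comparison is applied to the whole column, repeated occurrences of the same misrecognized word would all be fixed in one step.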
Export Labeled Signal
Export the labeled signal. On the Labeler tab, click Export and select To File from the Labeled Signal Set list. In the dialog box that appears, give the name Transcription.mat to the labeled signal set and add an optional short description. Click Export.
Go back to the MATLAB Command Window. Load the labeled signal set. The set has only one member. Get the names of the labels, and use the name to obtain and display the transcribed words.
load Transcription
ln = getLabelNames(ls);
v = getLabelValues(ls,1,ln)
v=12×2 table
         ROILimits            Value
    __________________    ___________
    0.34063     0.4208    "the"
    0.46088    0.82161    "discrete"
    0.88174     1.2826    "fourier"
     1.3427      1.984    "transform"
     2.0441     2.1042    "of"
     2.1644     2.2044    "a"
     2.2846      2.485    "real"
     2.5251     2.8658    "valued"
     2.9259     3.3067    "signal"
     3.3869      3.467    "is"
     3.5271     4.0081    "conjugate"
     4.0482     4.5091    "symmetric"
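Each row of ROILimits holds the start and end time of one word in seconds, so word durations and the pauses between words follow from simple differences. The sketch below works on the first three rows of the table above, retyped by hand so that it runs without the exported file. The variable names Duration, gaps, and longestWord are introduced here for illustration.

```matlab
% First three rows of the label table, entered manually.
lims  = [0.34063 0.4208; 0.46088 0.82161; 0.88174 1.2826];
words = ["the"; "discrete"; "fourier"];
v = table(lims, words, VariableNames=["ROILimits" "Value"]);

% Duration of each word: end time minus start time, row by row.
v.Duration = diff(v.ROILimits, 1, 2);

% Pause between consecutive words: next start minus previous end.
gaps = v.ROILimits(2:end,1) - v.ROILimits(1:end-1,2);

% Word with the longest labeled region.
[~, iLongest] = max(v.Duration);
longestWord = v.Value(iLongest);
```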
Change the label values from strings to categories. Use a signalMask object to plot the signal using a different color for each word.
v.Value = categorical(v.Value,v.Value);
msk = signalMask(v,SampleRate=fs);
s = getSignal(ls,1);
plotsigroi(msk,s.y)
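Passing the words a second time as the value set makes the categories follow the spoken order, whereas the one-argument form sorts them alphabetically. The sketch below shows the difference on a short hand-typed word list. It also shows one way to handle repeated words, which is an assumption beyond this example: the value set must contain unique values, and the sentence used here happens to have no repeats.

```matlab
words = ["the"; "discrete"; "fourier"; "transform"];

% Default: categories are sorted alphabetically.
cSorted = categorical(words);

% Explicit value set: categories keep the order in which they were spoken.
cSpoken = categorical(words, words);

% For sentences with repeated words, deduplicate the value set while
% preserving first-occurrence order.
repeated = ["the"; "signal"; "the"; "transform"];
cRepeated = categorical(repeated, unique(repeated, "stable"));
```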
Create a logical vector of the same length as the audio signal. Set the vector to true where the corresponding part of the speech signal contains the word "conjugate".
bsl = binmask(msk,height(s));
plot(s.Time,[s.y bsl(:,v.Value=="conjugate")])
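To see what such a mask contains, the following sketch builds a comparable logical vector by hand from the ROI limits of "conjugate" listed in the table. It assumes a hypothetical sample rate of 1000 Hz and a 5-second signal purely for illustration, and it treats both limits as inclusive. The binmask function may handle the edge samples differently, so this is an approximation of the idea rather than a reimplementation.

```matlab
fs = 1000;                     % hypothetical sample rate in Hz
n  = 5000;                     % hypothetical signal length in samples
t  = (0:n-1)'/fs;              % time stamp of each sample in seconds

roi = [3.5271 4.0081];         % limits of "conjugate" from the label table

% True for samples whose time stamps fall inside the region of interest.
mask = t >= roi(1) & t <= roi(2);

% Sample indices where the word starts and ends.
firstSample = find(mask, 1, "first");
lastSample  = find(mask, 1, "last");
```

A mask like this can be used to index the signal directly, for example to extract or play only the samples that belong to one word.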
See Also
Related Examples
- Label Signal Attributes, Regions of Interest, and Points
- Label ECG Signals and Track Progress
- Examine Labeled Signal Set
- Automate Signal Labeling with Custom Functions
More About
- Use Signal Labeler App
- Import Data into Signal Labeler
- Create or Import Signal Label Definitions
- Label Signals Interactively or Automatically
- Custom Labeling Functions
- Customize Labeling View
- Feature Extraction Using Signal Labeler
- Dashboard
- Export Labeled Signal Sets and Signal Label Definitions
- Signal Labeler Usage Tips