Storing 200GB audio spectrograms in a tall table, is this possible?

Hi,
I'm processing 200 GB of 1-minute audio files. For each file I store in a table the filename, a timestamp and the spectrum (1x64000) for each of the 60 seconds. Then I save each table to a MAT-file:
for f=1:length(totFiles)
%Audio data
File=tot(f).name(1:end-4);
Fecha=datetime(str2double(File(9:12)),str2double(File(13:14)),...
str2double(File(15:16)),str2double(File(18:19)),...
str2double(File(20:21)),str2double(File(22:23)));
%Audio read
[x,fs]=audioread(strcat(tot(f).folder,'/',tot(f).name));
long=length(x)/fs;%audio length in s
%Spectrum calculation each second
xf=reshape(x,1*fs,[]);
sp=pwelch(xf,fs,fs/2,fs,fs,'power');%Careful if wlen ~= 1*fs
%Table creation
T(1:60,:)=table((Fecha+seconds(1:60))',...
strcat(repmat(File,60,1),suff),sp',...
'VariableNames',{'Fecha','File','sp'});
location=('/Volumes/Almacén/matlab/espectrosCortegada/');
save(strcat(location,'espectrosFile_',num2str(f),'.mat'),'T');
clear T;clear x;
end
The problem is that when I try to load all these files into a tall array through a datastore, I get this error:
ds=datastore('/Volumes/Almacén/matlab/espectrosCortegada/*.mat')
Error using datastore
Cannot determine the datastore type for the specified location.
Specify the 'Type' name-value pair argument to indicate the type of datastore to create.
>> ds=datastore('/Volumes/Almacén/matlab/espectrosCortegada/*.mat','Type','file')
Error using datastore
Incorrect number of input arguments. Specify a function handle with the 'ReadFcn' parameter.
Any clue on how to approach this problem, or whether this is even possible?

 Accepted Answer

Hi David,
Please find below a possible solution that uses Audio Toolbox functionality. It uses a sample dataset as an example.
% Download the Free Spoken Digit Data Set (FSDD).
% FSDD consists of 2000 recordings of four speakers saying the numbers 0
% through 9 in English.
downloadFolder = matlab.internal.examples.downloadSupportFile("audio","FSDD.zip");
dataFolder = tempdir;
unzip(downloadFolder,dataFolder)
dataset = fullfile(dataFolder,"FSDD");
% Create an audioDatastore that points to the dataset.
ads = audioDatastore(dataset,IncludeSubfolders=true);
% Create a transformed datastore that computes spectra from audio data.
% Here, use pwelch.
adsSpec = transform(ads,@(x)pwelch(x,'power'));
% Use writeall to write spectra to disk. Set UseParallel to
% true to perform writing in parallel.
outputLocation = fullfile(tempdir,"MyFeatures");
writeall(adsSpec,outputLocation,WriteFcn=@myCustomWriter,UseParallel=true);
% Create a signalDatastore that points to the out-of-memory features. The
% read function returns a spectrum/timestamp pair.
sds = signalDatastore(outputLocation,IncludeSubfolders=true, ...
SignalVariableNames=["spec","timestamp"],ReadOutputOrientation="row");
% Read one pair of spectrum/timestamp
y = read(sds)
% Create a tall table
t = tall(sds);
function myCustomWriter(spec,writeInfo,~)
% myCustomWriter(spec,writeInfo,~) writes spectra/time stamps
% pair to MAT files.
filename = strrep(writeInfo.SuggestedOutputName,".wav",".mat");
% also write a time stamp as an example
timestamp = datetime('now');
save(filename,"spec","timestamp");
end
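For MAT-files that have already been written with a table variable T, as in the loop in the question, a fileDatastore with a custom ReadFcn may also work. The following is only a minimal sketch under assumptions: the demo folder, the file contents, and the helper loadT are hypothetical stand-ins, and the spectra are shortened to 8 columns to keep it small.

```matlab
% Hypothetical demo folder with two small MAT-files, each holding a table T
% laid out like the one in the question (all names here are illustrative).
demoFolder = fullfile(tempdir,"spectraDemo");
if ~isfolder(demoFolder), mkdir(demoFolder); end
for k = 1:2
    Fecha = datetime(2021,9,15,8,41+k,0) + seconds(1:3)';
    File  = repmat("file" + k,3,1);
    sp    = rand(3,8);              % stand-in for the 1x64000 spectra
    T = table(Fecha,File,sp);
    save(fullfile(demoFolder,"espectrosFile_" + k + ".mat"),"T");
end

% fileDatastore requires a ReadFcn. With UniformRead set to true, the
% tables returned for each file are vertically concatenated, so tall()
% yields a single tall table instead of a tall cell array.
fds = fileDatastore(demoFolder, ...
    "ReadFcn",@loadT, ...
    "FileExtensions",".mat", ...
    "UniformRead",true);
tt = tall(fds);

% Deferred computation: mean spectrum over all rows of all files.
meanSp = gather(mean(tt.sp,1));

function T = loadT(filename)
% loadT (hypothetical helper) returns the table saved as variable T.
s = load(filename,"T");
T = s.T;
end
```

This relies on every file storing its table under the same variable name and with the same variable names and column widths; otherwise the vertical concatenation would fail.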

7 Comments

What if I want a more complex transform function, something like:
function [Kurtosis,Entropy]=Procesado(x)
fs=128000;
Kurtosis = spectralKurtosis(x,fs, ...
Window=hamming(round(0.03*fs)), ...
OverlapLength=round(0.006*fs), ...
Range=[1000,fs/2]);
Entropy=spectralEntropy(x,fs, ...
Window=hamming(round(0.03*fs)), ...
OverlapLength=round(0.006*fs), ...
Range=[1000,fs/2]);
end
I can't manage to get it to work... how should I change myCustomWriter?
Thanks in advance
The solution should work for more complex feature extraction. Here is an example below. Note I changed the sample rate in Procesado to 8000 Hz to match the data in the example.
% Download the Free Spoken Digit Data Set (FSDD).
% FSDD consists of 2000 recordings of four speakers saying the numbers 0
% through 9 in English.
downloadFolder = matlab.internal.examples.downloadSupportFile("audio","FSDD.zip");
dataFolder = tempdir;
unzip(downloadFolder,dataFolder)
dataset = fullfile(dataFolder,"FSDD");
% Create an audioDatastore that points to the dataset.
ads = audioDatastore(dataset,IncludeSubfolders=true);
% Create a transformed datastore that computes features from audio data.
% Here, use Procesado.
adsSpec = transform(ads,@(x)Procesado(x));
% Use writeall to write features to disk. Set UseParallel to
% true to perform writing in parallel.
outputLocation = fullfile(tempdir,"MyFeatures");
writeall(adsSpec,outputLocation,WriteFcn=@myCustomWriter,UseParallel=true);
% Create a signalDatastore that points to the out-of-memory features. The
% read function returns a features/timestamp pair.
sds = signalDatastore(outputLocation,IncludeSubfolders=true, ...
SignalVariableNames=["features","timestamp"],ReadOutputOrientation="row");
% Read one features/timestamp pair
y = read(sds)
% Create a tall table
t = tall(sds);
function myCustomWriter(features,writeInfo,~)
% myCustomWriter(features,writeInfo,~) writes features/time stamp
% pairs to MAT files.
filename = strrep(writeInfo.SuggestedOutputName,".wav",".mat");
% also write a time stamp as an example
timestamp = datetime('now');
save(filename,"features","timestamp");
end
function features = Procesado(x)
fs=8000; % I changed this sample rate to match my data
Kurtosis = spectralKurtosis(x,fs, ...
Window=hamming(round(0.03*fs)), ...
OverlapLength=round(0.006*fs), ...
Range=[1000,fs/2]);
Entropy=spectralEntropy(x,fs, ...
Window=hamming(round(0.03*fs)), ...
OverlapLength=round(0.006*fs), ...
Range=[1000,fs/2]);
features = {Kurtosis,Entropy};
end
Thanks again!!
This solution works, but the result is a tall cell array whose data I can't access directly (Indexing expressions of the form T{...,...} are not supported for tall arrays.) like this:
t{1,1}(:,1) % Accessing the first file, first feature
So I guess it is mandatory to use auxiliary variables (I didn't find another way) to access the data, like this:
aux=t{1,1}(:,1)
But this way I'm losing the power of the tall array, because I can't batch the calculations. For example, if I want to take the mean of all the features (columns) I would do:
mean(vertcat(t{:,1}))
But this won't work with this kind of tall cell array.
The best option would be to create a table with contiguous rows holding the features of each file (with all their windows), instead of a single cell row for each file. Is this even possible? I couldn't make it work this way.
You cannot use a single-subscript {} index to index a table or tall table. t = tall(sds); is not creating a cell array of anything: it is creating a tall table (in this case), which is a single continuous table that you index with two () subscripts to get a sub-table, with two {} subscripts to get at the content of the subset, or with dot indexing like t.VARIABLENAME or t.VARIABLENAME(SUBSCRIPT).
Thanks for your answer @Walter Roberson !
Just to clarify:
  • My question relates to the second reply from @jibrahim
  • I wrote t{1}(:,1) but I meant to write t{1,1}(:,1) (fixed)
With that code, MATLAB reports that t is a tall cell array, which can't be indexed as t{...,...}:
>> t
t =
6701×2 tall cell array
{2499×6 double} {'20210915_084200'}
{2499×6 double} {'20210915_084300'}
{2499×6 double} {'20210915_084400'}
{2499×6 double} {'20210915_084500'}
{2499×6 double} {'20210915_084600'}
{2499×6 double} {'20210915_084700'}
{2499×6 double} {'20210915_084800'}
{2499×6 double} {'20210915_084900'}
: : :
: : :
>> t{1,1}
Indexing expressions of the form T{...,...} are not supported for tall arrays.
So if I want to access the content of each feature (the columns of the matrices in the first column of the tall cell array), I have to use an auxiliary variable, because I can't use what you propose. Can you follow me now?
Not sure if this is what you want, but if you change one line of code in Procesado to:
features = [Kurtosis,Entropy];
then this works:
t = tall(sds);
Y = mean(cell2mat(t(:,1)));
Y = gather(Y)
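To illustrate why this helps, here is a small sketch on in-memory data, assuming two hypothetical "files" whose features are numeric matrices with the same number of columns. cell2mat stacks the per-file matrices into one tall numeric array, so column statistics can be computed without curly-brace indexing.

```matlab
% Two hypothetical per-file feature matrices (windows-by-features).
c = {[1 2; 3 4]; [5 6]};
tc = tall(c);                         % 2x1 tall cell array
allRows = cell2mat(tc);               % tall 3x2 double, rows stacked across files
colMeans = gather(mean(allRows,1))    % column means over all windows: [3 4]
```

This only works when each cell holds a plain numeric matrix, which is why Procesado has to return [Kurtosis,Entropy] rather than {Kurtosis,Entropy}.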
