Deep learning with partitionable datastores on a cluster
Hello,
I have a datastore containing 1000 .mat files. Each file contains an X-by-4 table in the following format (see attached), where X is typically 700-900. The tf_ridge column is my data; for this study, "sleep stage" is my label of interest.
MATLAB deep learning expects an n-by-2 table as input, so I created a custom read function that reads in the data, strips out the two extra columns, and makes my label data categorical, as shown in my custom read function below.
% Calling ds as shown
ds = fileDatastore('C:\mydata',"ReadFcn",@custom_load_FN,"FileExtensions",".mat");
I also create subsets of the data for training and testing:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Create a subset of the datastore for test train val purposes
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% [Train, val, test] as whole percentages, e.g. [60,20,20]
split = [90,0,10];
% Number of files in each split
split_idx = round(numel(ds.Files)*(split/100));
% Start each range one past the end of the previous one so that no file
% appears in two splits
train_idx = 1:split_idx(1);
val_idx = split_idx(1)+1:split_idx(1)+split_idx(2);
test_idx = split_idx(1)+split_idx(2)+1:sum(split_idx);
% Generate subsets for the train/val/test split; each subset inherits the
% partitionability of ds
dstrain = subset(ds,train_idx);
dsval = subset(ds,val_idx);
dstest = subset(ds,test_idx);
% Custom read function to strip out the Arousal and Epoch columns and make
% the label categorical
function data = custom_load_FN(filename)
disp('In load function')
% Load into a struct so no variables are created implicitly; each file is
% assumed to store its table in a variable named "data"
s = load(filename);
data = removevars(s.data,{'Arousal','Epoch'});
valSet = {'N1' 'N2' 'N3' 'W' 'R'};
data.Sleep_Stage = categorical(data.Sleep_Stage,valSet);
end
When I test this with:
tf = isPartitionable(ds)
MATLAB returns a logical 1, so the datastore is partitionable. On the cluster, however, I get the following error saying that the datastore is not partitionable:
The input datastore is not Partitionable and does not support parallel operations.
As a workaround, I have also tried using the @load handle together with a transform datastore function that is just a rehash of my custom_load_FN, but this has been unsuccessful. I am aware of this post, and this one. However, it seems like there should be an easier solution in my case; I just don't have enough experience working with datastores to know what it is.
If anyone has advice on how to make this type of datastore partitionable for deep learning with the ExecutionEnvironment="parallel" option, I would appreciate it!
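For readers unfamiliar with the transform approach mentioned above, here is a minimal sketch of what it might look like. It is not necessarily the code used in this thread. The temporary folder, the file names rec1.mat and rec2.mat, the helper name stripColumns, and the stand-in table contents (in particular the numeric tf_ridge values) are all assumptions made for illustration; only the column names and the label set come from the question. The sketch shows how to build and check such a datastore locally. It does not guarantee that parallel training accepts it on a given release.

```matlab
% Hypothetical stand-in data: two small .mat files in a temporary folder,
% each holding a table named "data" with the four columns from the question.
folder = tempname;
mkdir(folder);
for k = 1:2
    data = table(rand(5,1), repmat("N2",5,1), zeros(5,1), (1:5)', ...
        'VariableNames', {'tf_ridge','Sleep_Stage','Arousal','Epoch'});
    save(fullfile(folder, sprintf('rec%d.mat',k)), 'data');
end

% Read each file with the built-in load, which returns a struct whose
% fields are the variables stored in the file.
dsRaw = fileDatastore(folder, "ReadFcn", @load, "FileExtensions", ".mat");

% Do the clean-up in a transform instead of in the read function.
dsClean = transform(dsRaw, @stripColumns);

% Check partitionability of the transformed datastore and read one file.
tf = isPartitionable(dsClean)
t = read(dsClean);

function out = stripColumns(s)
    % s is the struct returned by load; the table is assumed to be s.data
    out = removevars(s.data, {'Arousal','Epoch'});
    valSet = {'N1' 'N2' 'N3' 'W' 'R'};
    out.Sleep_Stage = categorical(out.Sleep_Stage, valSet);
end
```

Keeping the read function as plain @load and moving the column handling into transform separates file I/O from preprocessing, which makes each step easier to test in isolation.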
options = trainingOptions("adam", ...
ExecutionEnvironment="parallel", ...
...
)
Kind regards,
Christopher
3 Comments
Joss Knight
on 14 Mar 2023
That's an odd one. Does it work to type getReport(MException.last.UnderlyingCause)?
Christopher McCausland
on 14 Mar 2023
Joss Knight
on 16 Mar 2023
Ah yes, this is just an incorrect error message that was fixed in R2022a. I will answer now.
Accepted Answer