Deep learning with partitionable datastores on a cluster
Christopher McCausland on 10 Mar 2023
Answered: Joss Knight on 16 Mar 2023
Hello,
I have a datastore which contains 1000 .mat files. Each file contains an X*4 table with the following format (see attached). 'X' is typically 700-900. The tf_ridge column is my data; for this study, "sleep stage" is my label of interest.
MATLAB deep learning expects an n*2 table input; therefore I created a custom read function to read in the data, strip out the extra two columns and make my label data categorical, as shown below in my custom read function;
% Calling ds as shown 
ds = fileDatastore('C:\mydata',"ReadFcn",@custom_load_FN,"FileExtensions",".mat");
I also make a subset of the data for training and test purposes;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   % Create a subset of the datastore for test train val purposes
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% [Train, val, test] as whole percentages, e.g. [60,20,20]
split = [90,0,10];
split_idx = round(numel(ds.Files)*(split/100));
% Offset each range by one so that no file falls into two subsets
train_idx = 1:split_idx(1);
val_idx = split_idx(1)+1:split_idx(1)+split_idx(2);
test_idx = split_idx(1)+split_idx(2)+1:split_idx(1)+split_idx(2)+split_idx(3);
% Generate subset for train/test split; will inherit ds properties of
% isPartitionable
dstrain = subset(ds,train_idx);
dsval = subset(ds,val_idx);
dstest = subset(ds,test_idx); 
% Custom read function to strip out the Arousal and Epoch columns, and make
% the label categorical
function a = custom_load_FN(l)
    disp('In load function')
    % Load into a struct so that no variables are created implicitly
    s = load(l);
    data = removevars(s.data,{'Arousal','Epoch'});
    valSet = {'N1' 'N2' 'N3' 'W' 'R'};
    data.Sleep_Stage = categorical(data.Sleep_Stage,valSet);
    a = data;
end
When I test this with;
tf = isPartitionable(ds)
MATLAB returns a logical 1, so the datastore is partitionable. However, on the cluster I get the following error saying that the datastore is not partitionable.
The input datastore is not Partitionable and does not support parallel operations.
As a workaround, I have also tried to use the @load handle and a transform datastore function which is just a rehash of my custom_load_FN; however, this has been unsuccessful. I am aware of this post, and this one. However, it seems like there should be an easier solution in my case. I just don't have enough experience of working with datastores to know what this is.
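For reference, the @load-plus-transform variant described above might look roughly like the sketch below. The temporary folder, the file names and the random placeholder data are assumptions made only so that the snippet runs on its own; only the column names come from the post. It illustrates the shape of the approach and is not expected to resolve the cluster error by itself.

```matlab
% Build a tiny placeholder dataset (assumed layout: the four columns named in the post).
folder = tempname;
mkdir(folder);
stages = ["N1"; "N2"; "N3"; "W"; "R"];
for k = 1:3
    data = table(rand(5,1), stages, zeros(5,1), (1:5)', ...
        'VariableNames', {'tf_ridge','Sleep_Stage','Arousal','Epoch'});
    save(fullfile(folder, sprintf('rec%d.mat', k)), 'data');
end

% Read each file with the built-in load, which returns a struct with a field "data".
ds = fileDatastore(folder, "ReadFcn", @load, "FileExtensions", ".mat");

% Keep only the data and label columns, and make the label categorical.
valSet = {'N1','N2','N3','W','R'};
keepTwo = @(s) table(s.data.tf_ridge, categorical(s.data.Sleep_Stage, valSet), ...
    'VariableNames', {'tf_ridge','Sleep_Stage'});
tds = transform(ds, keepTwo);

sample = read(tds);   % table with the two columns tf_ridge and Sleep_Stage
```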
If anyone has advice on how to make this type of datastore into a partitionable datastore that works with the ExecutionEnvironment="parallel" option for deep learning, I would appreciate it!
options = trainingOptions("adam", ...
    ExecutionEnvironment="parallel", 
    ...
    )
Kind regards,
Christopher
3 Comments
Joss Knight on 16 Mar 2023
Ah yes, this is just an incorrect error message that was fixed in R2022a. I will answer now.
Accepted Answer
Joss Knight on 16 Mar 2023
This error message is incorrect. It should say that your datastore is not PartitionableByIndex. This was fixed in R2022a.
As long as your datastore is Subsettable you can now (since R2022b) work around this issue by using this Adapter I knocked together. No promises but it's mostly worked so far.
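A quick way to see which of these capabilities a datastore reports is sketched below. It assumes R2022b or later (for isSubsettable) and assumes that the mixin meant above is matlab.io.datastore.PartitionableByIndex; the temporary folder and placeholder files exist only to make the snippet self-contained.

```matlab
% Create a few placeholder .mat files so the check can run anywhere.
folder = tempname;
mkdir(folder);
for k = 1:4
    data = table(rand(5,1), 'VariableNames', {'tf_ridge'});
    save(fullfile(folder, sprintf('file%d.mat', k)), 'data');
end

ds = fileDatastore(folder, "ReadFcn", @load, "FileExtensions", ".mat");

% Capabilities reported by the datastore itself.
tfPartition = isPartitionable(ds)   % supports partition(ds, n, index)
tfSubset    = isSubsettable(ds)     % supports subset(ds, indices)

% Whether the datastore class derives from the by-index mixin (assumed class name).
tfByIndex = isa(ds, "matlab.io.datastore.PartitionableByIndex")
```

Running the same three checks on the datastore that is actually passed to training should show which capability is missing.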