How to send big data (loaded into a datastore object) to a classifier in MATLAB?
This is my first experience working with datastores in `MATLAB`, and I am hoping to get some guidance here. I have a large data set whose features and corresponding row labels are saved in two `txt` files: `data.txt` and `label.txt`. Each file has `264e6` rows. I did the following steps:
%creating datastore objects
datafile='data.txt';
ds=datastore(datafile,'TreatAsMissing','NA');
labelfile='label.txt';
ds_lbl=datastore(labelfile,'TreatAsMissing','NA');
When I pass the data to the classifier, I get the following error:
Mdl=fitcnb(read(ds),read(ds_lbl));
Error using classreg.learning.FullClassificationRegressionModel.prepareDataCR (line 201)
X and Y do not have the same number of observations.
Error in classreg.learning.classif.FullClassificationModel.prepareData (line 487)
classreg.learning.FullClassificationRegressionModel.prepareDataCR(...
Error in ClassificationNaiveBayes.prepareData (line 143)
prepareData@classreg.learning.classif.FullClassificationModel(X,Y,varargin{:},'OrdinalIsCategorical',true);
Error in classreg.learning.FitTemplate/fit (line 213)
this.PrepareData(X,Y,this.BaseFitObjectArgs{:});
Error in ClassificationNaiveBayes.fit (line 132)
this = fit(temp,X,Y);
Error in fitcnb (line 307)
this = ClassificationNaiveBayes.fit(X,Y,RemainingArgs{:});
With the default `ReadSize` of `20000`, the classifier works. But whenever I change `ReadSize` to `1e6`, it shows the same error. The other issue is that with the default `ReadSize`, the classifier only processes `20000` records, while I have `264e6` records.
I would really appreciate it if you could suggest a solution. How can I send a datastore to the classifier?
Answers (1)
Don Mathis
on 30 May 2017
I think you need to pass tall arrays or a tall table to fitcnb. See the documentation here: http://www.mathworks.com/help/stats/fitcnb.html?searchHighlight=fitcnb&s_tid=doc_srchtitle#bvnjlgv
and here:
You can get a tall table from a datastore like this:
tt = tall(ds)
3 Comments
Don Mathis
on 5 Jun 2017
Edited: Don Mathis
on 5 Jun 2017
I have not tried to do this myself, but from the error message it looks like you need to create your two tall arrays from the same datastore. So you'll need to put your labels in the same datastore as your features. I guess you could concatenate your two txt files "side by side", and then create your single datastore. After that, I think you would create a single tall array from that datastore, and then pass the 'features' columns of that as X and the 'label' column as Y, using the syntax fitcnb(X,Y).
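A minimal sketch of that suggestion, not tested against the original data. It assumes the two files have already been merged into one file with the label as the last column. The file name `combined_demo.txt`, the variable names `f1`, `f2`, `label`, and the four toy rows are placeholders made up for illustration, and tall-array support in `fitcnb` depends on the MATLAB release.

```matlab
% Toy stand-in for the merged file (hypothetical name and columns).
% The real file would hold all feature columns plus the label column.
combinedFile = fullfile(tempdir, 'combined_demo.txt');
demo = table([1.0; 1.2; 3.1; 3.3], [0.5; 0.4; 2.0; 2.2], [1; 1; 2; 2], ...
    'VariableNames', {'f1', 'f2', 'label'});
writetable(demo, combinedFile);

% One datastore for features and labels, so X and Y come from the same source.
ds = datastore(combinedFile, 'TreatAsMissing', 'NA');
tt = tall(ds);                           % tall table backed by the datastore

% Assumption: the label is the last variable in the file.
featureNames = ds.VariableNames(1:end-1);
labelName    = ds.VariableNames{end};

X = tt(:, featureNames);                 % tall table of predictors
Y = tt.(labelName);                      % tall array of labels

Mdl = fitcnb(X, Y);                      % evaluation of the tall data is triggered here
```

Because `X` and `Y` are both derived from the same tall table, they always have matching numbers of observations, which is what the original error was complaining about.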