Saving results from loop with intervals
Hi everyone, I am applying Naive Bayes classification by hand, using the formula, to a data set. My data has 45,000 columns and no class information, and I would like to separate it into classes. I will have nearly 100 classes, but I don't want to write them out by hand. I want to handle it with a loop; I have been looking for hours but couldn't figure it out.
veri = readtable("2018.csv");
maxgh = max(veri.GlobalHoriz_W_m_2_);
mingh = min(veri.GlobalHoriz_W_m_2_);
meangh = mean(veri.GlobalHoriz_W_m_2_);
stdgh = std(veri.GlobalHoriz_W_m_2_);
vericell = table2cell(veri);
ghcolumn = cell2mat(vericell(:,3));
class1 = vericell(ghcolumn>=mingh & ghcolumn <=95,:);
class2 = vericell(ghcolumn>95 & ghcolumn <=195,:);
class3 = vericell(ghcolumn>195 & ghcolumn <=295,:);
class4 = vericell(ghcolumn>295 & ghcolumn <=395,:);
class5 = vericell(ghcolumn>395 & ghcolumn <=495,:);
class6 = vericell(ghcolumn>495 & ghcolumn <=595,:);
class7 = vericell(ghcolumn>595 & ghcolumn <=695,:);
class8 = vericell(ghcolumn>695 & ghcolumn <=795,:);
class9 = vericell(ghcolumn>795 & ghcolumn <=895,:);
class10 = vericell(ghcolumn>895 & ghcolumn <=995,:);
class11 = vericell(ghcolumn>995 & ghcolumn <=maxgh,:);
I would like to create separate tables for the values that fall between -5 and 95, 95 and 195, 195 and 295, etc., with a loop of course.
For now I increase the bounds in steps of 100, but I would like to increase them in steps of 10, and I don't want to do it one by one by hand. Any help would be much appreciated. Thanks in advance.
4 Comments
Your whole code shows that you don't know how to manipulate tables. For example, your inefficient table2cell + cell2mat is simply
ghcolumn = veri{:, 3};
Creating numbered variables, manually or by code, is always a bad idea. You should have a single array/cell array/whatever container instead of these numbered variables.
It is possible to split your tables into a cell array of tables with just a few lines of code (loop not needed). However, it is just as likely that you don't need to do that and that whatever you want to do afterward will be much easier if you don't split your table at all. Therefore, can you explain why you want to do it?
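To make the "single container" point concrete, here is a minimal sketch of the loop the question asks for, storing each interval in one cell array instead of in class1, class2, and so on. The values in gh are made up for illustration, and the names edges, nBins and classes are hypothetical; the bins are half-open, which differs slightly from the >/<= bounds in the question.

```matlab
% Made-up values standing in for the GlobalHoriz column
gh = [0; 5; 12; 18; 25; 31];

% Bin edges in steps of 10; adding one step guarantees the last edge lies above max(gh)
edges = min(gh):10:max(gh)+10;
nBins = numel(edges) - 1;

% One cell array holds every interval, so no numbered variables are needed
classes = cell(nBins, 1);
for k = 1:nBins
    % Half-open interval [edges(k), edges(k+1))
    classes{k} = gh(gh >= edges(k) & gh < edges(k+1));
end
```

classes{k} then plays the role of the k-th numbered variable, and changing the step only means changing the 10 in edges.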
Stephen23
on 28 Feb 2018
@Sukru Yavuz: in general you should keep data together as much as possible, because this usually makes processing it easier and means that you do not create unnecessary duplicates of the same data in memory (what a waste!).
Most likely the table class already has some grouping method that would help you to achieve your goal, in which case that would be a much simpler solution: so what do you actually want to do with this data?
Note that creating variable names like that in a loop is very inefficient, and is not recommended:
Sukru Yavuz
on 28 Feb 2018
Edited: Sukru Yavuz
on 28 Feb 2018
Answers (1)
As Stephen and I said, splitting your table into multiple tables is probably going to complicate things for you. I would recommend you create a new column instead which tells you which class each row belongs to:
veri = readtable("2018.csv");
veri.Class = discretize(veri{:, 3}, min(veri{:, 3}):20:max(veri{:, 3})+20);
That's all that is needed. No loop required. You can then do calculations by class using rowfun or varfun with the 'GroupingVariables', 'Class' option. For example, to get the mean of column 4 for each class:
varfun(@mean, veri, 'InputVariables', 4, 'GroupingVariables', 'Class')
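As a small worked illustration of the two calls above, the sketch below runs them on a synthetic table. The table contents and the variable names Hour, GH and Temp are invented stand-ins for 2018.csv, and the bin width is set to 10 as the question asks rather than 20.

```matlab
% Synthetic stand-in for 2018.csv; names and values are assumptions
veri = table([1; 2; 3; 4; 5; 6], [0; 5; 12; 18; 25; 31], [10; 20; 30; 40; 50; 60], ...
    'VariableNames', {'Hour', 'GH', 'Temp'});

% Edges in steps of 10; the extra step keeps the maximum inside the last bin
edges = min(veri.GH):10:max(veri.GH)+10;

% Bin index of each row: rows with GH in [0,10) get 1, [10,20) get 2, and so on
veri.Class = discretize(veri.GH, edges);

% Mean of Temp per class; the result also carries a GroupCount column
stats = varfun(@mean, veri, 'InputVariables', 'Temp', 'GroupingVariables', 'Class');
```

With these made-up numbers, veri.Class is [1; 1; 2; 2; 3; 4] and stats.mean_Temp is [15; 35; 50; 60].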
If you really want to split the table:
splittables = splitapply(@(rows) {veri(rows, :)}, (1:height(veri))', veri.Class)
splittables{i} is the table of class i. But again, this should not be needed. Calculating the above mean is more complicated once it's split.
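One caveat worth hedging: as far as I know, splitapply expects group numbers without gaps, so the split can error if some bin contains no rows. A possible workaround, sketched on made-up data with hypothetical names GH, Temp, g and classIds, is to renumber the classes with findgroups first:

```matlab
% Made-up data in which bins 3 to 5 are empty
veri = table([0; 5; 12; 18; 55], [10; 20; 30; 40; 50], ...
    'VariableNames', {'GH', 'Temp'});
edges = min(veri.GH):10:max(veri.GH)+10;
veri.Class = discretize(veri.GH, edges);   % classes 1, 1, 2, 2, 6

% findgroups maps the classes that actually occur to consecutive numbers 1..N
[g, classIds] = findgroups(veri.Class);

% Split by the renumbered groups; classIds(k) is the original class of splittables{k}
splittables = splitapply(@(rows) {veri(rows, :)}, (1:height(veri))', g);
```

Here splittables has three cells rather than six, and classIds records which original class each one came from, so splittables{3} holds the rows of class 6.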
1 Comment
Sukru Yavuz
on 28 Feb 2018