I am using the readmatrix function to read 90 rows and 13 columns of data from a .csv file; each column is a measurement such as age, gender, etc. I then need to run if statements on each row to follow a decision tree and get an outcome for that row. I have been told that a for loop will help me apply the decision tree to the data, but I haven't been told how.
This is what I have so far:
myData = readmatrix('BME501_Coursework_Testdata.csv');
for i= 1:90
Val1 = Column1(1);
Val2 = Column2(1);
Val3 = Column3(1);
Val4 = Column4(1);
Val5 = Column5(1);
Val6 = Column6(1);
Val7 = Column7(1);
Val8 = Column8(1);
Val9 = Column9(1);
Val10 = Column10(1);
Val11 = Column11(1);
Val12 = Column12(1);
Val13 = Column13(1);
end
As you can see I have gotten the file to be read in and then have defined each of the values in row 1 so that they can be used in the decision tree.
My question is: how can I do this for all 90 rows, so that each row is treated as one set of data and the decision tree can use the values in it to give me an outcome for that row?

4 Comments

dpb on 6 Dec 2020
"file to be read in and then have defined each of the values in row 1 ..."
Well, you read the data in... :)
There is no variable ColumnN in sight, however, so those are going to be undefined variables and will die.
However, you absolutely do NOT want to build such sequentially named variables in MATLAB; this way leads to impossible code to read and even more impossible to debug when it inevitably fails to work as intended.
You have an array myData; everything from here on should refer to it.
Without some idea of what this decision tree is to be doing, it's not possible to give any further pointers...show us the problem statement.
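As a rough illustration of referring to the array directly, here is a minimal sketch. The numbers are made up and stand in for the contents of the CSV file, and the names val3, thisRow and isLow are only for illustration:

```matlab
% Made-up 3-by-13 stand-in for myData; in the question it comes from the CSV file.
myData = zeros(3, 13);
myData(:, 1) = [40; 60; 70];   % column 1 (Val1 in the question)
myData(:, 3) = [1; 4; 2];      % column 3 (Val3 in the question)

i = 2;                         % row (patient) of interest
val3 = myData(i, 3);           % one value: row i, column 3
thisRow = myData(i, :);        % all 13 measurements for row i
isLow = myData(:, 3) <= 3;     % one test applied to every row at once
```

Indexing with (row, column) means no ColumnN or ValN variables are needed at all.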
"However, you absolutely do NOT want to build such sequentially named variables in MATLAB; this way leads to impossible code to read and even more impossible to debug when it inevitably fails to work as intended."
Yeah, I've noticed that too; it's something I'm going to change to make diagnosing problems easier.
The data are different pieces of patient data that I want to read in in bulk and analyse to show whether each patient has a certain condition (Patient0utput = 1 is true and = 0 is false).
The decision tree is quite complex and is shown below (I have tested it using very basic manual input of data and it seems to work as it should):
if Val3 <=3 && Val9 <=0 && Val1 <=55 && Val3 <=1 && Val2 <=0
Patient0utput = 0;
elseif Val3 <=3 && Val9 <=0 && Val1 <=55 && Val3 <=1 && Val2 >0 && Val7 <=1 && Val3 <=46
Patient0utput = 1;
elseif Val3 <=3 && Val9 <=0 && Val1 <=55 && Val3 <=1 && Val2 >0 && Val7 <=1 && Val3 >46
Patient0utput = 0;
elseif Val3 <=3 && Val9 <=0 && Val1 <=55 && Val3 <=1 && Val2 >0 && Val7 >1
Patient0utput = 0;
elseif Val3 <=3 && Val9 <=0 && Val1 <=55 && Val3 >1
Patient0utput = 0;
elseif Val3 <=3 && Val9 <=0 && Val1 >55 && Val7 <=0
Patient0utput = 1;
elseif Val3 <=3 && Val9 <=0 && Val1 >55 && Val7 >0 && Val2 <=0
Patient0utput = 0;
elseif Val3 <=3 && Val9 <=0 && Val1 >55 && Val7 >0 && Val2 >0 && Val7 <=0 && Val3 <=1
Patient0utput = 1;
elseif Val3 <=3 && Val9 <=0 && Val1 >55 && Val7 >0 && Val2 >0 && Val7 <=0 && Val3 >1 && Val4 <=128 && Val8 <=142
Patient0utput = 1;
elseif Val3 <=3 && Val9 <=0 && Val1 >55 && Val7 >0 && Val2 >0 && Val7 <=0 && Val3 >1 && Val4 <=128 && Val8 >142
Patient0utput = 0;
elseif Val3 <=3 && Val9 <=0 && Val1 >55 && Val7 >0 && Val2 >0 && Val7 <=0 && Val3 >1 && Val4 >128
Patient0utput = 0;
elseif Val3 <=3 && Val9 <=0 && Val1 >55 && Val7 >0 && Val2 >0 && Val7 >0 && Val7 <=1
Patient0utput = 1;
elseif Val3 <=3 && Val9 <=0 && Val1 >55 && Val7 >0 && Val2 >0 && Val7 >0 && Val7 >1 && Val12 <=0 && Val7 <=271
Patient0utput = 0;
elseif Val3 <=3 && Val9 <=0 && Val1 >55 && Val7 >0 && Val2 >0 && Val7 >0 && Val7 >1 && Val12 <=0 && Val7 >271
Patient0utput = 1;
elseif Val3 <=3 && Val9 <=0 && Val1 >55 && Val7 >0 && Val2 >0 && Val7 >0 && Val7 >1 && Val12 >0
Patient0utput = 1;
elseif Val3 <=3 && Val9 >0 && Val1 <=1
Patient0utput = 0;
elseif Val3 <=3 && Val9 >0 && Val1 >1
Patient0utput = 1;
elseif Val3 >3 && Val5 <=0
Patient0utput = 1;
elseif Val3 >3 && Val5 >0 && Val10 <=0.8 && Val2 <=0 && Val9 <=0
Patient0utput = 0;
elseif Val3 >3 && Val5 >0 && Val10 <=0.8 && Val2 <=0 && Val9 >0 && Val7 <=0
Patient0utput = 1;
elseif Val3 >3 && Val5 >0 && Val10 <=0.8 && Val2 <=0 && Val9 >0 && Val7 >0
Patient0utput = 0;
elseif Val3 >3 && Val5 >0 && Val10 <=0.8 && Val2 >0 && Val12 <=0 && Val13 <=3
Patient0utput = 0;
elseif Val3 >3 && Val5 >0 && Val10 <=0.8 && Val2 >0 && Val12 <=0 && Val13 >3
Patient0utput = 1;
elseif Val3 >3 && Val5 >0 && Val10 <=0.8 && Val2 >0 && Val12 >0
Patient0utput = 1;
elseif Val3 >3 && Val5 >0 && Val10 >0.8 && Val2 <=0 && Val13 <=3 && Val9 <=0
Patient0utput = 0;
elseif Val3 >3 && Val5 >0 && Val10 >0.8 && Val2 <=0 && Val13 <=3 && Val9 >0
Patient0utput = 1;
elseif Val3 >3 && Val5 >0 && Val10 >0.8 && Val2 <=0 && Val13 >3
Patient0utput = 1;
else
Patient0utput = 1;
end
Is there a way I can get the programme to read each row as a set of data and assign each of the variables to the correct value so that it can then work through the decision tree to give me a Patient0utput value?
Well, that's totally illegible to try to decipher; one would have to work back from it to derive the actual logic from which to write the algorithm.
It starts off with
data(:,3)<=3
as the first decision point and goes from there.
Where's the problem definition you were given?
But, that aside, the way to code it, even if you were to keep the compound if...elseif...end block, is to substitute indices into the array variable for the numbered variables.
% X is the data array (myData in the question); i is the row index of the for loop.
Patient0utput = 1; % Global "else" if nothing matches
if X(i,3) <= 3 % first level on parameter 3 <= 3
    if X(i,9) <= 0 % second level on parameter 9 <= 0
        if X(i,1) <= 55 % third level on parameter 1 <= 55
            if X(i,3) <= 1 % fourth level on parameter 3 <= 1
                if X(i,2) <= 0 % fifth level on parameter 2 <= 0
                    Patient0utput = 0;
                elseif X(i,7) <= 1 && X(i,3) <= 46 % Val2 > 0 from here down
                    Patient0utput = 1;
                elseif X(i,7) <= 1 && X(i,3) > 46
                    Patient0utput = 0;
                elseif X(i,7) > 1
                    Patient0utput = 0;
                end % Val2 level 5
            else % Val3 > 1
                Patient0utput = 0;
            end % Val3 level 4
        else % Val1 > 55; the tests below are copied over, not yet factored
            if X(i,7) <= 0
                Patient0utput = 1;
            elseif X(i,7) > 0 && X(i,2) <= 0
                Patient0utput = 0;
            elseif X(i,7) > 0 && X(i,2) > 0 && X(i,7) <= 0 && X(i,3) <= 1
                Patient0utput = 1;
            elseif X(i,7) > 0 && X(i,2) > 0 && X(i,7) <= 0 && X(i,3) > 1 && X(i,4) <= 128 && X(i,8) <= 142
                Patient0utput = 1;
            elseif X(i,7) > 0 && X(i,2) > 0 && X(i,7) <= 0 && X(i,3) > 1 && X(i,4) <= 128 && X(i,8) > 142
                Patient0utput = 0;
            elseif X(i,7) > 0 && X(i,2) > 0 && X(i,7) <= 0 && X(i,3) > 1 && X(i,4) > 128
                Patient0utput = 0;
            elseif X(i,7) > 0 && X(i,2) > 0 && X(i,7) > 0 && X(i,7) <= 1
                Patient0utput = 1;
            elseif X(i,7) > 0 && X(i,2) > 0 && X(i,7) > 0 && X(i,7) > 1 && X(i,12) <= 0 && X(i,7) <= 271
                Patient0utput = 0;
            elseif X(i,7) > 0 && X(i,2) > 0 && X(i,7) > 0 && X(i,7) > 1 && X(i,12) <= 0 && X(i,7) > 271
                Patient0utput = 1;
            elseif X(i,7) > 0 && X(i,2) > 0 && X(i,7) > 0 && X(i,7) > 1 && X(i,12) > 0
                Patient0utput = 1;
            end % Val1 > 55
        end % Val1 level 3
    else % Val9 > 0
        if X(i,1) <= 1
            Patient0utput = 0;
        else % Val1 > 1
            Patient0utput = 1;
        end
    end % Val9 level 2
else % Val3 > 3; the tests below are copied over, not yet factored
    if X(i,5) <= 0
        Patient0utput = 1;
    elseif X(i,5) > 0 && X(i,10) <= 0.8 && X(i,2) <= 0 && X(i,9) <= 0
        Patient0utput = 0;
    elseif X(i,5) > 0 && X(i,10) <= 0.8 && X(i,2) <= 0 && X(i,9) > 0 && X(i,7) <= 0
        Patient0utput = 1;
    elseif X(i,5) > 0 && X(i,10) <= 0.8 && X(i,2) <= 0 && X(i,9) > 0 && X(i,7) > 0
        Patient0utput = 0;
    elseif X(i,5) > 0 && X(i,10) <= 0.8 && X(i,2) > 0 && X(i,12) <= 0 && X(i,13) <= 3
        Patient0utput = 0;
    elseif X(i,5) > 0 && X(i,10) <= 0.8 && X(i,2) > 0 && X(i,12) <= 0 && X(i,13) > 3
        Patient0utput = 1;
    elseif X(i,5) > 0 && X(i,10) <= 0.8 && X(i,2) > 0 && X(i,12) > 0
        Patient0utput = 1;
    elseif X(i,5) > 0 && X(i,10) > 0.8 && X(i,2) <= 0 && X(i,13) <= 3 && X(i,9) <= 0
        Patient0utput = 0;
    elseif X(i,5) > 0 && X(i,10) > 0.8 && X(i,2) <= 0 && X(i,13) <= 3 && X(i,9) > 0
        Patient0utput = 1;
    elseif X(i,5) > 0 && X(i,10) > 0.8 && X(i,2) <= 0 && X(i,13) > 3
        Patient0utput = 1;
    end
end % Val3 level 1
A start at factoring the conditions...
Your conditions sometimes clash. For example your first condition
if Val3 <=3 && Val9 <=0 && Val1 <=55 && Val3 <=1 && Val2 <=0
Val3 must be <= 3 (first part) but also <= 1 (fourth part), so the first part adds nothing.
You use that same pairing in tests 2, 3 and 4. But then in test 5 you have
elseif Val3 <=3 && Val9 <=0 && Val1 <=55 && Val3 >1
Val3 must be <= 3 (first part) but also > 1 (fourth part). That pair makes more sense to test together.
But look at test 3:
elseif Val3 <=3 && Val9 <=0 && Val1 <=55 && Val3 <=1 && Val2 >0 && Val7 <=1 && Val3 >46
Val3 must be <= 3 (first part) but also <= 1 (4th part) but also > 46 (7th part), which no value can satisfy.
Consider going through your tests, and for each variable, make a list of the used conditions, each in numeric sorted order within the test, such as
val1 <= 1, val1 <= 55, val1 > 55
val2 <= 0, val2 > 0
val3 <= 1 & <= 3, val3 <= 1 & <= 3 & <= 46, val3 <= 1 & <= 3 & > 46, val3 > 1 & <= 3, val3 <= 3
Logically, val3 <= 1 & <= 3 & <= 46 could be simplified to val3 <= 1, but you need to review the medical part to see whether that makes sense, or whether you instead named the wrong val* in one of the tests. But val3 <= 1 & <= 3 & > 46 is just plain false, so you need to review the medical part to see whether you named the wrong variable or whether the test truly can never succeed.
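One quick way to sanity-check a compound test like these is to evaluate it over a grid of candidate values. This is only a sketch: the grid v is made up, and a grid check is a spot check rather than a proof.

```matlab
v = -10:0.5:100;   % made-up grid of candidate values for Val3

test2 = v <= 3 & v <= 1 & v <= 46;   % the Val3 part of test 2
test3 = v <= 3 & v <= 1 & v > 46;    % the Val3 part of test 3

sameAsLE1 = isequal(test2, v <= 1);  % true if test 2 behaves like Val3 <= 1 on the grid
neverTrue = ~any(test3);             % true if no grid value satisfies test 3
```

If neverTrue comes out true for a branch, that branch of the tree can never fire for values on the grid and the test is worth re-reading against the original specification.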
After all tests are resolved and simplified, for each variable you will end up with a list of breakpoints, such as
val1 <= 1, val1 <= 55, val1 > 55
and you can process that test by discretizing the values into exclusive domains
val1 < 1, val1 = 1, 1 < val1 < 55, val1 = 55, val1 > 55
If you assign a number (possibly categorical or enumeration) to each, then in your tests you can code something like
ismember(val1idx, [EQ1, LT55])
which would encode val1 == 1 or (val1 > 1 & val1 < 55).
It is a bit of a nuisance to have to code the == separately from the < or >, but at the moment it looks to me as if some of your ranges are valN >= A & valN <= B, others are valN > A & valN <= B, and others are valN >= A & valN < B.
With this kind of coding, if you wanted to express a strict < 55 including <= 1, you would code the entire list up to that point,
ismember(val1idx, [LT1, EQ1, LT55])
This kind of setup helps to think of the conditions more methodically, and can even express "or" for disjoint ranges. But for your purposes it might turn out that you do not need disjoint ranges, and you might just need to know starting and ending condition numbers, like
ismember(val1idx, EQ1:LT55)
That would then potentially lend itself to encoding in a table (variable number, starting condition, ending condition), like
[1, EQ1, LT55; %val1 [1, 55)
2, -inf, LT0; %val2 (-inf, 0)
3, GT3, EQ46] %val3 (3, 46]
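A minimal sketch of that discretization for val1 follows. The numeric codes 1 to 5 and the sample values are assumptions made for illustration; any increasing set of codes would do.

```matlab
% Hypothetical category codes for val1, in increasing order of the value ranges.
LT1 = 1; EQ1 = 2; LT55 = 3; EQ55 = 4; GT55 = 5;

val1 = [0; 1; 30; 55; 80];           % made-up sample values

% Assign each value to exactly one exclusive domain.
val1idx = zeros(size(val1));
val1idx(val1 < 1)             = LT1;
val1idx(val1 == 1)            = EQ1;
val1idx(val1 > 1 & val1 < 55) = LT55;
val1idx(val1 == 55)           = EQ55;
val1idx(val1 > 55)            = GT55;

% val1 == 1 or 1 < val1 < 55, i.e. the half-open range [1, 55)
inRange = ismember(val1idx, [EQ1, LT55]);

% Strict val1 < 55: every category up to and including LT55
below55 = ismember(val1idx, LT1:LT55);
```

Because the codes increase with the value ranges, a contiguous range of conditions can be written as start:stop, which is what makes the table form possible.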

 Accepted Answer

myData = readmatrix('BME501_Coursework_Testdata.csv');
for i = 1 : size(myData,1)
    thisrow = myData(i,:);
    % now use thisrow in your decision tree
end
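To show what using thisrow inside the loop might look like, here is a minimal sketch with a deliberately tiny, made-up tree. The data values, the two-level tree and the variable name outcome are all assumptions for illustration and are not the tree from the question.

```matlab
% Made-up stand-in data: 4 rows, 13 columns (the real data would come from the CSV file).
myData = zeros(4, 13);
myData(:, 3) = [2; 2; 5; 5];    % column 3
myData(:, 5) = [0; 0; 0; 1];    % column 5
myData(:, 9) = [0; 1; 0; 0];    % column 9

nRows = size(myData, 1);
outcome = zeros(nRows, 1);      % one result per row, preallocated

for i = 1:nRows
    thisrow = myData(i, :);     % the 13 values for row i
    if thisrow(3) <= 3          % toy first split on column 3
        if thisrow(9) <= 0      % toy second split on column 9
            outcome(i) = 0;
        else
            outcome(i) = 1;
        end
    else
        if thisrow(5) <= 0      % toy second split on column 5
            outcome(i) = 1;
        else
            outcome(i) = 0;
        end
    end
end
```

Storing the result in outcome(i) rather than in a single scalar keeps one answer per row, so nothing is overwritten on the next pass through the loop.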

2 Comments

Thank you, this looks like it'll work.
I'm quite new to MATLAB, so I'm not sure exactly how it will look for my decision tree. Could you give me an example of how I would use thisrow in the decision tree? Even a very basic decision tree that I could use as a reference would help.
Thanks for your help!
You can just feed it sample data and associated class labels, and it will automatically figure out what the tree looks like.
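The reply above does not say which function it means. One MATLAB function that fits the description is fitctree from the Statistics and Machine Learning Toolbox; treating it as the intended one is an assumption. A minimal sketch with made-up training data:

```matlab
% Requires the Statistics and Machine Learning Toolbox.
% Made-up training data: 8 rows, 2 predictors, with known class labels.
X = [1 0; 2 0; 3 1; 2 1; 5 0; 6 1; 7 0; 8 1];
Y = [0; 0; 0; 0; 1; 1; 1; 1];

% MinParentSize is lowered because the default would not split so few rows.
tree = fitctree(X, Y, 'MinParentSize', 2);   % learn the splits from the data
predicted = predict(tree, [2 1; 7 0]);       % classify two new rows
view(tree, 'Mode', 'text');                  % print the learned rules
```

This learns a tree from labelled examples instead of hand-coding one, so it only applies if such labelled training data is available.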

Release

R2018b
