classify row numbers of a table

Dear experts,
I would like to classify this data by rows, for example:
For every x and y point if the median of that row is T <= -12 classify it as "sunny area"
if the median of that row is -12< T <= -14 classify it as "sun-shade area"
finally, if the median of that row is T> -14 classify it as "shaded area"
I have no idea how to generate a matlab function to do that. Please, can you help with that?
Thank you!

 Accepted Answer

Already a couple different solid answers here, with slightly different methods, but I'm going to post it anyway. One difference with mine is that it keeps everything in a single table:
IMPORTANT NOTE: I did not correct your inequalities as @Voss did. But I believe the same thing he does, that you mixed them up.
% Read the table
tbl = readtable("T.xlsx",'VariableNamingRule','preserve');
% Calculate the median T from the columns, and add it as a column
tbl.medianT = median(tbl{:,["T0","T1","T2","T3","T4","T5","T6"]},2);
% Add an empty column for the sunniness
tbl = addvars(tbl,strings(height(tbl),1),'NewVariableNames','Sunniness');
% Fill in the sunniness for each median range
tbl{tbl.medianT <= -12, "Sunniness"} = "sunny area";
tbl{tbl.medianT > -12 & tbl.medianT <= -14,"Sunniness"} = "sun-shade area";
tbl{tbl.medianT > -14, "Sunniness"} = "shaded area";
disp(tbl)
Distance from CR29 (ft) X (ft) Y (ft) T0 T1 T2 T3 T4 T5 T6 medianT Sunniness _______________________ __________ __________ ______ ______ ______ ______ ______ ______ ______ _______ _____________ 0 2.2645e+06 1.0002e+06 -14.44 -14.44 -14.39 -14.3 -14.5 -14.41 -14.3 -14.41 "sunny area" 1.122 2.2645e+06 1.0002e+06 -14.47 -14.5 -14.41 -14.33 -14.44 -14.39 -14.36 -14.41 "sunny area" 1.6667 2.2645e+06 1.0002e+06 -13.96 -14.33 -14.19 -14.1 -14.59 -14.27 -14.19 -14.19 "sunny area" 6.4009 2.2645e+06 1.0002e+06 -13.99 -14.39 -14.27 -14.24 -14.47 -14.33 -14.24 -14.27 "sunny area" 8.1988 2.2645e+06 1.0002e+06 -14.01 -14.21 -14.16 -14.33 -14.44 -14.36 -14.19 -14.21 "sunny area" 9.6358 2.2645e+06 1.0002e+06 -14.01 -14.16 -13.79 -13.87 -14.36 -14.36 -14.24 -14.16 "sunny area" 11.434 2.2645e+06 1.0002e+06 -14.19 -14.33 -13.79 -14.19 -14.3 -14.24 -14.19 -14.19 "sunny area" 12.867 2.2645e+06 1.0002e+06 -14.19 -14.24 -13.76 -13.53 -14.24 -14.24 -14.21 -14.21 "sunny area" 14.669 2.2645e+06 1.0002e+06 -14.04 -14.04 -13.87 -13.9 -13.93 -14.13 -14.13 -14.04 "sunny area" 16.102 2.2645e+06 1.0002e+06 -14.04 -14.01 -13.73 -13.79 -13.79 -13.73 -13.9 -13.79 "shaded area" 17.402 2.2645e+06 1.0002e+06 -13.82 -14.04 -13.82 -13.79 -13.73 -13.9 -13.87 -13.82 "shaded area" 19.465 2.2645e+06 1.0002e+06 -13.87 -14.01 -13.87 -13.84 -13.67 -13.76 -13.82 -13.84 "shaded area" 21.017 2.2645e+06 1.0002e+06 -14.1 -14.01 -13.87 -13.93 -13.73 -13.45 -13.62 -13.87 "shaded area" 22.569 2.2645e+06 1.0002e+06 -14.01 -14.07 -13.96 -13.99 -13.87 -13.59 -13.64 -13.96 "shaded area" 24.121 2.2645e+06 1.0002e+06 -14.13 -14.19 -14.04 -13.96 -14.1 -13.84 -13.47 -14.04 "sunny area" 26.201 2.2645e+06 1.0002e+06 -14.21 -14.24 -14.19 -14.07 -14.16 -14.07 -13.79 -14.16 "sunny area" 27.74 2.2645e+06 1.0002e+06 -14.39 -14.3 -14.33 -14.13 -14.53 -14.07 -13.82 -14.3 "sunny area" 28.763 2.2645e+06 1.0002e+06 -14.27 -14.41 -14.44 -14.21 -14.59 -14.27 -13.99 -14.27 "sunny area" 30.843 2.2645e+06 1.0002e+06 -14.3 -14.44 -14.41 -14.33 -14.5 -14.39 -14.13 -14.39 "sunny area" 32.395 2.2645e+06 1.0002e+06 -14.36 -14.53 -14.47 -14.3 -14.44 -14.44 -14.1 -14.44 "sunny area" 34.308 2.2645e+06 1.0002e+06 -14.21 -14.3 -14.3 -14.04 -14.39 -14.33 -14.04 -14.3 "sunny area" 35.623 2.2645e+06 1.0002e+06 -14.19 -14.24 -14.19 -14.04 -14.44 -14.36 -14.07 -14.19 "sunny area" 36.919 2.2645e+06 1.0002e+06 -14.24 -14.24 -14.24 -14.24 -14.39 -14.33 -13.99 -14.24 "sunny area" 38.881 2.2645e+06 1.0002e+06 -14.21 -14.3 -14.27 -14.3 -14.53 -14.39 -14.13 -14.3 "sunny area" 40.843 2.2645e+06 1.0002e+06 -14.41 -14.53 -14.41 -14.36 -14.36 -14.27 -14.27 -14.36 "sunny area" 42.159 2.2645e+06 1.0002e+06 -14.44 -14.53 -14.39 -14.33 -14.36 -14.3 -14.27 -14.36 "sunny area" 44.121 2.2645e+06 1.0002e+06 -14.47 -14.5 -14.36 -14.33 -14.47 -14.41 -14.36 -14.41 "sunny area" 44.787 2.2645e+06 1.0002e+06 -14.47 -14.47 -14.3 -14.27 -14.5 -14.44 -14.41 -14.44 "sunny area" 47.398 2.2645e+06 1.0002e+06 -14.41 -14.5 -14.33 -14.3 -14.47 -14.44 -14.27 -14.41 "sunny area"

10 Comments

Incorporating @Steven Lord's use of discretize into my solution makes it much cleaner.
% Read the table
tbl = readtable("T.xlsx","VariableNamingRule","preserve");
% Calculate the median T from the columns, and add it as a column
tbl.medianT = median(tbl{:,["T0","T1","T2","T3","T4","T5","T6"]},2);
% Define the categories and their boundaries
categories = ["shaded", "sun-shade", "sunny"];
boundaries = [-Inf, -14, -12, Inf];
% Add the sunniness to the table, based on the median T
tbl.Sunniness = discretize(tbl.medianT, boundaries, 'categorical', categories)
tbl = 29×12 table
Distance from CR29 (ft) X (ft) Y (ft) T0 T1 T2 T3 T4 T5 T6 medianT Sunniness _______________________ __________ __________ ______ ______ ______ ______ ______ ______ ______ _______ _________ 0 2.2645e+06 1.0002e+06 -14.44 -14.44 -14.39 -14.3 -14.5 -14.41 -14.3 -14.41 shaded 1.122 2.2645e+06 1.0002e+06 -14.47 -14.5 -14.41 -14.33 -14.44 -14.39 -14.36 -14.41 shaded 1.6667 2.2645e+06 1.0002e+06 -13.96 -14.33 -14.19 -14.1 -14.59 -14.27 -14.19 -14.19 shaded 6.4009 2.2645e+06 1.0002e+06 -13.99 -14.39 -14.27 -14.24 -14.47 -14.33 -14.24 -14.27 shaded 8.1988 2.2645e+06 1.0002e+06 -14.01 -14.21 -14.16 -14.33 -14.44 -14.36 -14.19 -14.21 shaded 9.6358 2.2645e+06 1.0002e+06 -14.01 -14.16 -13.79 -13.87 -14.36 -14.36 -14.24 -14.16 shaded 11.434 2.2645e+06 1.0002e+06 -14.19 -14.33 -13.79 -14.19 -14.3 -14.24 -14.19 -14.19 shaded 12.867 2.2645e+06 1.0002e+06 -14.19 -14.24 -13.76 -13.53 -14.24 -14.24 -14.21 -14.21 shaded 14.669 2.2645e+06 1.0002e+06 -14.04 -14.04 -13.87 -13.9 -13.93 -14.13 -14.13 -14.04 shaded 16.102 2.2645e+06 1.0002e+06 -14.04 -14.01 -13.73 -13.79 -13.79 -13.73 -13.9 -13.79 sun-shade 17.402 2.2645e+06 1.0002e+06 -13.82 -14.04 -13.82 -13.79 -13.73 -13.9 -13.87 -13.82 sun-shade 19.465 2.2645e+06 1.0002e+06 -13.87 -14.01 -13.87 -13.84 -13.67 -13.76 -13.82 -13.84 sun-shade 21.017 2.2645e+06 1.0002e+06 -14.1 -14.01 -13.87 -13.93 -13.73 -13.45 -13.62 -13.87 sun-shade 22.569 2.2645e+06 1.0002e+06 -14.01 -14.07 -13.96 -13.99 -13.87 -13.59 -13.64 -13.96 sun-shade 24.121 2.2645e+06 1.0002e+06 -14.13 -14.19 -14.04 -13.96 -14.1 -13.84 -13.47 -14.04 shaded 26.201 2.2645e+06 1.0002e+06 -14.21 -14.24 -14.19 -14.07 -14.16 -14.07 -13.79 -14.16 shaded
Thanks the cyclist;
You did exactly what I was looking for.
Appriciate it!
Thank you for pointing that out. You guys are right I mixed the inequalities up
You're welcome. Here is yet another version, which finds the columns with names that have the pattern "T"+numeric digits, rather than needing to type them out. It's slightly more complicated, but more general.
% Read the table
tbl = readtable("T.xlsx","VariableNamingRule","preserve");
% Calculate the median T from the columns, and add it as a column
tbl.medianT = median(tbl{:,contains(tbl.Properties.VariableNames,regexpPattern("T\d*"))},2);
boundaries = [-Inf, -14, -12, Inf];
categories = ["shaded", "sun-shade", "sunny"];
tbl.Sunniness = discretize(tbl.medianT, boundaries, 'categorical', categories);
Your help is very much appriciated!
I tried to generate a new column named standard deviation using this command:
tbl.STD_T = std(tbl{:,4:end},2); % it did not work.
Error using var
W must be a double or single vector of nonnegative weights, or a scalar 0 or 1.
Error in std (line 66)
y = sqrt(var(varargin{:}));
For the std function, the second argument is not the dimension, but the weighting scheme (which has to do with sample vs. population). Try
std(tbl{:,4:end},0,2) % Note the new second argument before the dimension
That makes sense.
Is it gonna be the same procedure if I want to generate a column "mean"?
For any function you use, you need to understand the proper syntax for that function. If you don't know, then you need to read the documentation.
In the specific case of mean, the second argument is the dimension along which the mean is taken. (It is equivalent to how the median is done, not how the standard deviation is done).

Sign in to comment.

More Answers (1)

Another way to do this, which could be simpler if you have more than 3 categories, is to use discretize.
data = randi([-16, -10], 10, 1); % 10-by-1 array of integers in range [-16, -10]
boundaries = [-Inf, -14, -12, Inf];
categories = ["shaded", "sun-shade", "sunny"];
D = discretize(data, boundaries, 'categorical', categories);
result = table(data, D)
result = 10×2 table
data D ____ _________ -10 sunny -10 sunny -14 sun-shade -14 sun-shade -13 sun-shade -13 sun-shade -15 shaded -13 sun-shade -11 sunny -14 sun-shade

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!