How to output specific rows from tables depending on values within the table?

63 views (last 30 days)
Sean Byrne
Sean Byrne on 5 Jun 2018
Answered: Peter Perkins on 3 Jul 2018
I have a table of variable I have pulled form an excel spread sheet (the actual file is 45 columns X 2000 rows). But this gives the idea of what I am trying to achieve.
I would like to find each separate participants (identified by their 'ID') maximum jump height for that 'season' of testing and remove the other rows.
The Table I'm working with is like this (but extended):
[Season] [TrialNo] [ID] [AgeGroup] [bodyMass] [JumpHeight] [Force] [FlighTime] [LandingForce]
'Pre' 1 0001 U14 40 35 685 0.3 1100
'Pre' 2 0001 U14 40 32 630 0.25 1200
'Pre' 1 0002 U14 42 40 750 0.42 1000
'Pre' 2 0002 U14 42 36 700 0.4 1300
'Pre' 1 0003 U14 45 32 610 0.3 1111
'Pre' 2 0003 U14 45 28 600 0.3 1600
'Post' 1 0001 U14 40 35 685 0.3 1100
'Post' 2 0001 U14 40 32 630 0.25 1200
'Post' 1 0002 U14 42 40 750 0.42 1000
'Post' 2 0002 U14 42 36 700 0.4 1300
'Post' 1 0003 U14 45 32 610 0.3 1111
'Post' 2 0003 U14 45 28 600 0.3 1600
What I aim to end up with is something more like
[Season] [TrialNo] [ID] [AgeGroup] [bodyMass] [JumpHeight] [Force] [FlighTime] [LandingForce]
'Pre' 1 0001 U14 40 35 685 0.3 1100
'Pre' 1 0002 U14 42 40 750 0.42 1000
'Pre' 1 0003 U14 45 32 610 0.3 1111
'Post' 1 0001 U14 40 35 685 0.3 1100
'Post' 1 0002 U14 42 40 750 0.42 1000
'Post' 1 0003 U14 45 32 610 0.3 1111
  2 Comments
Sean Byrne
Sean Byrne on 5 Jun 2018
Attached is a sample of the data I'm working with. As you can see jump height is random within the three trials.

Sign in to comment.

Answers (3)

Razvan Carbunescu
Razvan Carbunescu on 5 Jun 2018
Edited: Razvan Carbunescu on 5 Jun 2018
If you're using R2018a and only interested in maximum JumpHeight can use groupsummary on table T:
>> GT = groupsummary (T,{'Season','ID'},'max','JumpHeight')
GT =
6×4 table
Season ID GroupCount max_JumpHeight
______ __ __________ ______________
'Post' 1 2 35
'Post' 2 2 40
'Post' 3 2 32
'Pre' 1 2 35
'Pre' 2 2 40
'Pre' 3 2 32
If you want to get all the row information or on an earlier release can use findgroups / splitapply workflow
idx = findgroups(T.Season,T.ID);
GT = splitapply(@maxidx,T,idx);
GT.Properties.VariableNames = T.Properties.VariableNames
function T = maxidx(varargin)
[~,i] = max(varargin{6});
tmpvarargout = cellfun(@(x) x(i,:),varargin,'UniformOutput',false);
T = table(tmpvarargout{:});
end
Sample Output
GT =
6×9 table
Season TrialNo ID AgeGroup bodyMass JumpHeight Force FlighTime LandingForce
______ _______ __ ________ ________ __________ _____ _________ ____________
'Post' 1 1 14 40 35 685 0.3 1100
'Post' 1 2 14 42 40 750 0.42 1000
'Post' 1 3 14 45 32 610 0.3 1111
'Pre' 1 1 14 40 35 685 0.3 1100
'Pre' 1 2 14 42 40 750 0.42 1000
'Pre' 1 3 14 45 32 610 0.3 1111
Edit: Script assumes JumpHeight is 6th column in table, might have to modify for correct position
  6 Comments
Razvan Carbunescu
Razvan Carbunescu on 7 Jun 2018
I had missed the fact that you're on R2014a. findgroups/splitapply were introduced in R2016b.
I think the way to try to get it in R2014a is to use sortrows and unique with the rows flag to find the indexing to the first sorted highest value.
ST = sortrows(T,{'Season' 'ID' 'JumpHeight'},{'ascend' 'ascend' 'descend'});
% taking advantage here of the fact that ST is sorted by JumpHeight and unique returns first element
[~,idx] = unique([double(categorical(ST.Season)) ST.ID],'rows');
GT = ST(idx,:)

Sign in to comment.


Are Mjaavatten
Are Mjaavatten on 5 Jun 2018
Edited: Are Mjaavatten on 11 Jun 2018
I am a little uncertain about the type of data structure you use. For completeness I therefore entered your data in an Excel workbook that I read using readtable.
If there are always exactly two trials per ID and season:
T0 = readtable('Byrne.xlsx');
rows = [];
for i = 1:2:size(T0,1)-1
[~,j] = max(T0.JumpHeight(i:i+1));
rows = [rows;i+j-1];
end
T2 = T0(rows,:);
If the number of trials may vary:
T0 = sortrows(T0,'ID');
T0 = sortrows(T0,'Season','descend');
J = [find(diff([0;T0.ID])~=0);size(T0,1)]; % Indices for each ID change
rows = [];
for i = 1:length(J)-1
[~,j] = max(T0.JumpHeight(J(i):J(i+1)-1));
rows = [rows;J(i)+j-1];
end
T2 = T0(rows,:);

Peter Perkins
Peter Perkins on 3 Jul 2018
In more recent versions of MATLAB there are several ways to do this. In R2014a, do a grouped varfun, using @max as the function to apply, ID and Season as the grouping Variables, and JumpHight as the InputVariable.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!