Combining multiple test results based on ID’s and categories

7 views (last 30 days)
Hello,
I have a 2400x10 table (excel file) that I need help to work with. The first 3 columns are categorical while the rest are numerical (double) arrays. The table contains multiple combinations of the categorical A (20 categories/IDs/persons), B (4 categories/test areas) and C (20 test#) while the rest are different test results, etc.
Using table2struct, I have converted it to a 2400x1 structure with 10 fields. I wonder if I can:
a) organise a new 20x1 structure with the unique A categories as rows and then have means of the 20 tests for each B category as a column?
b) get means for all the different combinations of A,B while maintaining the entire structure
I have tried finding unique values of A but can’t seem to find out how exactly to do this.
Thanks for your help!
- Danny
  5 Comments
Star Strider
Star Strider on 30 Sep 2018
I agree that retaining your data as a table is likely the best option.
Then, I would begin by using the findgroups (link) function, then use splitapply and other functions as necessary. These are all linked to in and at the end of the findgroups documentation page.
We can’t provide more definitive help without more definitive information (and ideally your Excel file).

Sign in to comment.

Answers (1)

jonas
jonas on 1 Oct 2018
Edited: jonas on 1 Oct 2018
As previous posters already pointed out, you will want to keep your data in a table which are ideal for dealing with categorical data. The function grpstats is your best friend here.
"get means for all the different combinations of A,B while maintaining the entire structure"
Consider T to be your table. If I understand correctly, you want the first two columns as grouping variables when calculating the mean of the last 7 (?) columns. This line should output a nice little summary table:
grpstats(T,[1 2],'mean','DataVars',[4:10])
where the second arguments is an array of column numbers of your grouping variables, the third argument is the statistics of your choice (multiple inputs possible, e.g. {'mean','std','max'}) and the last name-value pair specifies the column numbers that you want to apply those statistics on (those columns must contain numerical data).

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!