Combining multiple test results based on ID’s and categories
Show older comments
Hello,
I have a 2400x10 table (excel file) that I need help to work with. The first 3 columns are categorical while the rest are numerical (double) arrays. The table contains multiple combinations of the categorical A (20 categories/IDs/persons), B (4 categories/test areas) and C (20 test#) while the rest are different test results, etc.
Using table2struct, I have converted it to a 2400x1 structure with 10 fields. I wonder if I can:
a) organise a new 20x1 structure with the unique A categories as rows and then have means of the 20 tests for each B category as a column?
b) get means for all the different combinations of A,B while maintaining the entire structure
I have tried finding unique values of A but can’t seem to find out how exactly to do this.
Thanks for your help!
- Danny
5 Comments
@Danny: tables are perfect for grouping mixed data, applying functions to those groups, merging, and those kind of things (much like R). Sadly I don't have much experience with them, but if you have some patience (weekends are a bit slow), I am sure that someone will show you how.
Danny
on 30 Sep 2018
Star Strider
on 30 Sep 2018
I agree that retaining your data as a table is likely the best option.
Then, I would begin by using the findgroups (link) function, then use splitapply and other functions as necessary. These are all linked to in and at the end of the findgroups documentation page.
We can’t provide more definitive help without more definitive information (and ideally your Excel file).
Answers (1)
As previous posters already pointed out, you will want to keep your data in a table which are ideal for dealing with categorical data. The function grpstats is your best friend here.
"get means for all the different combinations of A,B while maintaining the entire structure"
Consider T to be your table. If I understand correctly, you want the first two columns as grouping variables when calculating the mean of the last 7 (?) columns. This line should output a nice little summary table:
grpstats(T,[1 2],'mean','DataVars',[4:10])
where the second arguments is an array of column numbers of your grouping variables, the third argument is the statistics of your choice (multiple inputs possible, e.g. {'mean','std','max'}) and the last name-value pair specifies the column numbers that you want to apply those statistics on (those columns must contain numerical data).
Categories
Find more on Workspace Variables and MAT Files in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!