Combining multiple test results based on ID’s and categories

Hello,
I have a 2400x10 table (excel file) that I need help to work with. The first 3 columns are categorical while the rest are numerical (double) arrays. The table contains multiple combinations of the categorical A (20 categories/IDs/persons), B (4 categories/test areas) and C (20 test#) while the rest are different test results, etc.
Using table2struct, I have converted it to a 2400x1 structure with 10 fields. I wonder if I can:
a) organise a new 20x1 structure with the unique A categories as rows and then have means of the 20 tests for each B category as a column?
b) get means for all the different combinations of A,B while maintaining the entire structure
I have tried finding unique values of A but can’t seem to find out how exactly to do this.
Thanks for your help!
- Danny

5 Comments

Using a table would make a) much easier, and possibly b) too. Is there a particular reason why you need to store the data in a structure?
None at all. I started with a table but thought it would be easier to work with if I had structures with IDs consisting of different test subjects and then different tests much inspired by this . So that's why I was aiming for a).
I'm open to any suggestions and I'm eager to learn. I've tried grouping the categories in the table but haven't successfully done so so I could retrieve the means as I wanted.
Thanks again!
@Danny: tables are perfect for grouping mixed data, applying functions to those groups, merging, and those kind of things (much like R). Sadly I don't have much experience with them, but if you have some patience (weekends are a bit slow), I am sure that someone will show you how.
I agree that retaining your data as a table is likely the best option.
Then, I would begin by using the findgroups (link) function, then use splitapply and other functions as necessary. These are all linked to in and at the end of the findgroups documentation page.
We can’t provide more definitive help without more definitive information (and ideally your Excel file).

Sign in to comment.

Answers (1)

As previous posters already pointed out, you will want to keep your data in a table which are ideal for dealing with categorical data. The function grpstats is your best friend here.
"get means for all the different combinations of A,B while maintaining the entire structure"
Consider T to be your table. If I understand correctly, you want the first two columns as grouping variables when calculating the mean of the last 7 (?) columns. This line should output a nice little summary table:
grpstats(T,[1 2],'mean','DataVars',[4:10])
where the second arguments is an array of column numbers of your grouping variables, the third argument is the statistics of your choice (multiple inputs possible, e.g. {'mean','std','max'}) and the last name-value pair specifies the column numbers that you want to apply those statistics on (those columns must contain numerical data).

Products

Asked:

on 29 Sep 2018

Edited:

on 1 Oct 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!