Is There a Way to Execute splitapply Functionality on Subtables of Master Table?

Suppose I have a function that operates on a table and returns a row vector:
function rowvec = myfunc(Table)
Suppose I have a master table T, one of whose variables is Name. I'd like to group by Name and vertically concatenate the rowvec computed from each subgroup, something like the following:
G = findgroups(T.Name);
R = splitapply(@myfunc,T,G);
This won't work because splitapply passes each variable of T to myfunc as a separate input argument (one group's worth of rows at a time), rather than passing the subtable of T defined by G.
Is there already a function that does what I'm trying to do?
Or do I have to use the code here: https://www.mathworks.com/matlabcentral/answers/457422-separate-table-data-in-to-sub-tables to generate the cell array of subtables, loop over the subtables with a call to myfunc, and then concatenate the rows myself? Or maybe use cellfun on the cell array of subtables?

2 Comments

I don't follow what's wrong with splitapply?
Show us an example that doesn't do the right thing.
>> x1=rand(4,1);x2 = rand(4,1);colors=["red"; "green"; "red"; "green"]; % create some data
>> T=table(colors,x1,x2); % create a table
>> myfunc = @(T) ([sum(T.x1) sum(T.x2)]); % function that returns the sums of the data columns as row vector
>> myfunc(T) % show that it works
ans =
2.28 2.44
>> [G,Vars]=findgroups(T.colors); % group by color
>> splitapply(myfunc,T,G) % try to sum the data for each color
Error using splitapply (line 132)
Applying the function '@(T)([sum(T.x1),sum(T.x2)])' to the 1st group of data generated the following error:
Too many input arguments.
According to the doc, this workflow should not work because myfunc is expecting one input, but splitapply is trying to pass all three variables in T to myfunc. So no complaints there. I was hoping that there is a built-in that would give me the desired output:
>> [myfunc(T([1 3],:)) ; myfunc(T([2 4],:))] % desired result
ans =
1.22 1.45
1.06 1.00
I was able to accomplish this by using Walter Roberson's code from the link in the OP to generate a cell array of subtables grouped by color, and then calling cellfun with myfunc on the resulting cell array. So it wasn't too bad; I was just looking for something built-in.
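For readers who want to see the shape of that workaround, here is a minimal sketch. It does not reproduce the code from the linked answer; the arrayfun line is just one simple way to build the subtables, and the data values are made up so the result is easy to check by hand.

```matlab
% Made-up data with the same layout as the example table above.
colors = ["red"; "green"; "red"; "green"];
x1 = [1; 2; 3; 4];
x2 = [10; 20; 30; 40];
T = table(colors, x1, x2);

% Same example function: takes a table, returns a 1-by-2 row vector.
myfunc = @(T) [sum(T.x1) sum(T.x2)];

% Group by color; names holds the group labels in group-number order.
[G, names] = findgroups(T.colors);

% One subtable per group, collected in a column cell array.
subtables = arrayfun(@(k) T(G == k, :), (1:max(G))', 'UniformOutput', false);

% Apply myfunc to each subtable, then stack the row vectors.
rows = cellfun(myfunc, subtables, 'UniformOutput', false);
R = vertcat(rows{:});
```

Note that findgroups sorts the group labels, so here the first row of R belongs to "green" and the second to "red", which is the reverse of the hand-built "desired result" above.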


Answers (1)

You'd be better off doing it this way:
splitapply(@(x) sum(x, 1), T{:, 2:3}, G);
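As a quick check of this suggestion, the sketch below runs it on made-up data (the values are mine, not from the thread). T{:, 2:3} extracts the two numeric variables as one matrix, so the function receives a single matrix per group rather than a table.

```matlab
% Made-up data with the same layout as the example table above.
colors = ["red"; "green"; "red"; "green"];
x1 = [1; 2; 3; 4];
x2 = [10; 20; 30; 40];
T = table(colors, x1, x2);
G = findgroups(T.colors);

% Each call gets the rows of the numeric matrix belonging to one group.
% Summing along dimension 1 keeps the result a 1-by-2 row vector even
% when a group contains only a single row.
R = splitapply(@(x) sum(x, 1), T{:, 2:3}, G);
```

This only works when the columns of interest share a type that can be concatenated into one matrix, and the function loses access to the variable names.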

5 Comments

I was using a simple function to illustrate the problem. In reality, myfunc is more complicated and expects a table T with a certain structure. As an alternative example, maybe myfunc is supposed to dispatch based on the color of the input:
function rowvec = myfunc(T)
% Dispatch on the color of the first row; all rows in a group share it.
if T.colors{1} == "green"
    rowvec = dogreenfunction(T);
elseif T.colors{1} == "red"
    rowvec = doredfunction(T);
end
You shouldn't use splitapply this way. It works on matrices or cell arrays, and it will not hand a table through to your function as a table.
Alternatively, you can feed splitapply a matrix or cell array extracted from your table. As long as you know which input corresponds to, for example, the color property, you can adapt your function accordingly. I will give an example later; I'm on my phone right now.
"myfunc is more complicated and is expecting a table T with a certain structure"
That's not the form in which splitapply passes a table to the function. From the doc:
T — Data variables
table
Data variables, specified as a table. splitapply treats each table variable as a separate data variable.
So, if your table has N variables, the function must accept N input arguments, one per table variable and each of the type of the corresponding variable.
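To make that concrete, here is a small sketch on made-up data (values and the name perVariable are mine). The table has three variables, so the function handle takes three inputs, even though it ignores the first.

```matlab
% Made-up data with the same layout as the example table above.
colors = ["red"; "green"; "red"; "green"];
x1 = [1; 2; 3; 4];
x2 = [10; 20; 30; 40];
T = table(colors, x1, x2);
G = findgroups(T.colors);

% One input per table variable, in table order. Each input holds only
% the rows of that variable that belong to the current group.
perVariable = @(colors, x1, x2) [sum(x1) sum(x2)];
R = splitapply(perVariable, T, G);
```

The function sees plain arrays here, not a table, so code written against T.x1 and T.x2 would have to be rewritten to take the variables as separate arguments.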
Based on my original post, one of my comments, and both of your comments, we all agree that splitapply doesn't work for the workflow I want to execute, which is:
1. generate a group vector for a table
2. generate subtables from that table based on the group vector
3. apply a function to each subtable
4. concatenate the results from that function as applied to each subtable.
What I was asking, or trying to ask if I wasn't clear, is whether there is a different built-in function that can accomplish this workflow. I couldn't find one in the doc, but thought I'd ask here.
For now, I have an approach that works well for my application, which is to use Walter Roberson's code to generate a cell array of the subtables and then use cellfun to apply myfunc to each subtable.
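One variation that nobody in this thread proposed, but that may be worth trying, is to let splitapply group the row numbers instead of the table itself and index into T inside the function handle. The sketch below uses made-up data, and rowIdx is a name introduced here for illustration.

```matlab
% Made-up data with the same layout as the example table above.
colors = ["red"; "green"; "red"; "green"];
x1 = [1; 2; 3; 4];
x2 = [10; 20; 30; 40];
T = table(colors, x1, x2);

% Function that expects a whole (sub)table.
myfunc = @(T) [sum(T.x1) sum(T.x2)];

G = findgroups(T.colors);

% Split the row numbers by group; each call receives the row indices of
% one group and uses them to cut the matching subtable out of T.
rowIdx = (1:height(T))';
R = splitapply(@(idx) myfunc(T(idx, :)), rowIdx, G);
```

This keeps the four steps listed above in a single call: splitapply does the grouping and the concatenation, while T(idx, :) builds each subtable on the fly. It assumes myfunc returns a row vector of the same length for every group so that the results can be stacked.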
It is not clear why you want to split your table into subtables before calling splitapply; splitapply is supposed to do that job for you. Anyway, I'm glad you found a solution to your problem!


Release: R2019b
Asked: 9 May 2020
Commented: 11 May 2020
