Group data in one table to analyze data by set intervals in another table

Hello all,
I would like to create a script that would perform different statistical calculation on one table based on ranges given in another table.
Table A has two columns that include the intervals ("Start" in the first column and "Finish" in the second column). The number of rows of Start-Finish intervals can vary, based on data to be analyzed in the second table). Example of the first table (Table A):
Start Finish
80.6 82
106.5 108.9
134.2 136.5
157.2 159.6
201.3 205.2
...
I would like to create a script that would perform different statistical calculation on the second table (Table B) based on ranges given in Table A, e.g. analyze "Data1", "Data2", "Data3", "Data4" by looking up the ranges within "length" column. The example of the second table (Table B) is as follows :
Length,Data1,Data2,Data3,Data4
...
144.135,0.00,103.00,100.00,68.00,0.00
144.137,0.00,104.00,100.00,67.00,0.00
144.140,0.00,105.00,100.00,67.00,0.00
144.142,0.00,107.00,100.00,67.00,0.00
144.145,0.00,108.00,100.00,66.00,0.00
144.202,0.00,165.00,69.00,52.00,0.00
144.205,0.00,176.00,83.00,51.00,0.00
...
Table B can include as much as few millions of rows with data.
So, in the end, I would like to be able to make a table that would look like Table C below (example) with added columns of calculated statistics for given intervals:
Start Finish Data1SNR Data2SNR Data3Mean Data4Mean Data1STD ...ect
80.6 82
106.5 108.9
134.2 136.5
157.2 159.6
201.3 205.2
...
This seems to be a simple task, and I am beginning to learn Matlab, so I would greatly appreciate your help. Thanks.

 Accepted Answer

The fact that you have discontiguous length bins makes this more complicated than it otherwise would be. There are several ways to handle this, here's one:
Create a new variable in B, maybe called LengthBin, by calling discretize on B.Length, using a merged combination of A.Start and A.Finish as bin edges. Perhaps
edges = sort([A.Start; A.Finish]);
B.LengthBin = discretize(B.Length,edges);
You probably want to do something to account for the fact that discretize expects contiguous bins, and thus takes a list of "edges", whereas you have discontiguous bins and two lists of "lower edges" and "upper edges", but that's a minor detail. Maybe just throw out the rows in B where B.LengthBin is not an odd number. Perhaps
B(mod(B.BinLength,2)==0,:) = []
Then call varfun on B, using LengthBin as your grouping variable.
BStats = varfun(@mean,B,'GroupingVariable','LengthBin','InputVariables',{'Data1' 'Data2' 'Data3' 'Data4'})
Now add the column vector (1:n)' to A as a new variable called A.LengthBin, and join A and B, throwing away the LengthBin variables you no longer need.
BStats = join(A,BStats,'Key,'LengthBin','LeftVariables',{'Start' 'Finish'},'RightVariables',{<your stats variables>})
You'll probably call varfun more than once because you want to compute multiple stats on each variable. Just horzcat all the results. Hope this helps.

More Answers (0)

Asked:

on 29 Jun 2017

Commented:

on 29 Jun 2017

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!