Splitting up large arrays based on datetimes without using loops

Hi, I have a large dataset of 10-minute samples and an accompanying datetime array spanning many years, and I want to apply certain functions to each month. Is there a way to operate on each individual month without using nested loops? I want to calculate the skewness and kurtosis for every month and every column in the dataset, then store the results so I can run control charts on them and update them at a later date. Thanks in advance!

3 Comments

What's wrong with nested loops? Without knowing how the data are represented in your "dataset", it is hard to suggest code for processing it. I'd expect findgroups and splitapply to solve this problem without creating explicit loops.
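As a rough illustration of the findgroups/splitapply suggestion (both need R2015b or later), here is a minimal sketch on synthetic data. The names d, X, G and the anonymous skew function are assumptions for illustration, not the asker's variables, and skew is a toolbox-free stand-in for skewness.

```matlab
% Synthetic stand-in data: 10-minute samples for Jan-Mar 2016, two columns.
d = (datetime(2016,1,1):minutes(10):datetime(2016,3,31,23,50,0))';
X = randn(numel(d), 2);

% One group number per (year, month) pair.
G = findgroups(year(d), month(d));

% Column-wise sample skewness; bsxfun keeps it working on older releases.
skew = @(x) mean(bsxfun(@minus, x, mean(x,1)).^3, 1) ./ ...
            mean(bsxfun(@minus, x, mean(x,1)).^2, 1).^1.5;

% One row per month, one column per data column.
monthlySkew = splitapply(skew, X, G);
```

splitapply passes the rows of X belonging to each group to the function and stacks the returned row vectors, so no explicit loop over months or columns is needed.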
Hi Jan. Primarily, nested loops are slow and cumbersome to code. I tried changing the datetime format to "yyyyMM" and using accumarray, but it only returns zeros! See the code snippet below:
d = temp_struct.timestamps.(turbines{i_wec});
d.Format = 'yyyyMM';
temp_subs = datenum(d);
temp_vals = temp_struct.(atrib_name{i_atrib}).(turbines{i_wec})';
test = accumarray(temp_subs, temp_vals,[], @kurtosis);
I think this should work, but I'm not sure why it doesn't. Note that temp_vals is a 52704x1 array of doubles. In this instance "test" is a 736696x1 array of doubles, all zero! I'm not sure why it's so much bigger either.
Apparently you already have a kurtosis function. One way to debug calls to ACCUMARRAY (assuming you have already checked that the indices are fine) is to output a cell array of the grouped values:
groups = accumarray(temp_subs, temp_vals,[], @(x){x});
so you can check what is passed to your aggregation function. If all groups are empty, there is an issue with your IND and/or VAL inputs. If the groups make sense, the issue is with your aggregation function.
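A note on the symptom described above: datenum returns serial date numbers (days counted from year 0) regardless of the datetime's Format property, so the subscripts passed to accumarray are around 736,000 rather than month indices. accumarray sizes its output to the largest subscript and fills unused positions with zero, which would explain the 736696x1 result. Below is a minimal sketch of one way to build month subscripts instead. It uses synthetic data in place of temp_struct; the names d, vals, ym, subs and the anonymous kurt function are assumptions for illustration, and kurt is a toolbox-free stand-in for kurtosis.

```matlab
% Synthetic stand-in data: one year of 10-minute samples (52704 points).
d = (datetime(2016,1,1):minutes(10):datetime(2016,12,31,23,50,0))';
vals = randn(numel(d), 1);

% Encode each timestamp as a year-month key, e.g. 201603 for March 2016.
ym = year(d)*100 + month(d);

% Map the keys to consecutive group indices 1..nMonths for accumarray.
[monthKeys, ~, subs] = unique(ym);

% Sample kurtosis (fourth central moment over squared variance).
kurt = @(x) mean((x - mean(x)).^4) / mean((x - mean(x)).^2)^2;

% One result per month, in the same order as monthKeys.
monthlyKurt = accumarray(subs, vals, [], kurt);
```

Because subs runs from 1 to the number of distinct months, the output has exactly one element per month and monthKeys records which month each element belongs to.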

 Accepted Answer

Here's how you would do this using a table and varfun:
>> t = table(datetime(2017,1,randi(365,20,1)),randn(20,1),'VariableNames',{'Date' 'Value'})
t =
  20×2 table
       Date         Value
    ___________    ________
    05-Mar-2017      2.1778
    23-May-2017      1.1385
    31-Oct-2017     -2.4969
    21-Oct-2017     0.44133
    23-Jan-2017     -1.3981
    [snip]
>> t.Month = month(t.Date)
t =
  20×3 table
       Date         Value      Month
    ___________    ________    _____
    05-Mar-2017      2.1778      3
    23-May-2017      1.1385      5
    31-Oct-2017     -2.4969     10
    21-Oct-2017     0.44133     10
    23-Jan-2017     -1.3981      1
    [snip]
>> varfun(@mean,t,'GroupingVariable','Month','InputVariables','Value')
ans =
  10×3 table
    Month    GroupCount    mean_Value
    _____    __________    __________
      1          2           -0.3667
      2          1           0.32321
      3          3           0.41779
      4          1          -0.48094
      5          4          -0.12632
      6          3           0.97795
      7          1            0.1644
      8          2           0.65163
     10          2           -1.0278
     12          1          0.085189
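Note that month(t.Date) returns 1 to 12, so for data spanning several years the same calendar month from different years would land in one group. A hedged sketch of how the same varfun approach might be extended to group by year and month and to cover several columns follows. The table contents, the column names A and B, and the anonymous skew function are made up for illustration; skew is a toolbox-free stand-in for skewness.

```matlab
% Synthetic two-year table with two data columns (hypothetical names A, B).
t = table(datetime(2016,1,randi(730,200,1)), randn(200,1), randn(200,1), ...
    'VariableNames', {'Date','A','B'});

% Separate grouping variables so months from different years stay apart.
t.Year  = year(t.Date);
t.Month = month(t.Date);

% Sample skewness of a column vector (NaN for single-element groups).
skew = @(x) mean((x - mean(x)).^3) / mean((x - mean(x)).^2)^1.5;

% One row per (Year, Month) group, one output column per input variable.
stats = varfun(skew, t, 'GroupingVariables', {'Year','Month'}, ...
    'InputVariables', {'A','B'});
```

The result is itself a table, so it can be saved and appended to later when new months of data arrive.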

More Answers (1)

If you have your data stored in a timetable, use retime. Specify @skewness or @kurtosis as the aggregation method, assuming you have Statistics and Machine Learning Toolbox available. If you don't, you will need to write your own functions to compute those statistics and specify those as the aggregation method when you call retime.
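For readers on R2016b or later, where timetable and retime are available, a minimal sketch of this approach could look like the following. The data are synthetic, the variable names A and B are placeholders, and the anonymous kurt function is an assumed stand-in for kurtosis when the toolbox is not available.

```matlab
% Synthetic stand-in data: 10-minute samples for Jan-Mar 2016.
d  = (datetime(2016,1,1):minutes(10):datetime(2016,3,31,23,50,0))';
tt = timetable(d, randn(numel(d),1), randn(numel(d),1), ...
    'VariableNames', {'A','B'});

% Sample kurtosis of a column vector.
kurt = @(x) mean((x - mean(x)).^4) / mean((x - mean(x)).^2)^2;

% Aggregate every variable into calendar-month bins with the custom function.
monthlyKurt = retime(tt, 'monthly', kurt);
```

Each row of monthlyKurt is stamped with the start of its month, which keeps months from different years distinct without any extra grouping variables.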

1 Comment

@Steven I only have MATLAB 2015, so that solution won't work. :(
