Calculate average for each hour in day

I have a bunch of data in datetime format similar to this:
2010-Nov-30 18:00.00 0 8.7 17.05 61.38
I have summed all the data into hours, but I would like to calculate the average at every hour 00:00, 01:00, 02:00... 23:00 for every day in the year.
I have data for a whole year, so I'm interested in seeing the average values in a day, per hour, with the data from the whole year.
I've tried to use retime but I can't really get it to function correctly. Maybe it's the wrong function for this purpose.

3 Comments

retime is definitely the way to go. Why didn't it work for you?
I can't get it to group all the hours from the whole year into on matrix that's 24-by-x with the mean from every hour.
Do you have a hint? :)
Could you show us what you tried and describe the input variables used? It will be easier to help you tweek that instead of creating a different example out of our imaginations that may not even be suitable with your data.

Sign in to comment.

 Accepted Answer

dpb
dpb on 20 Jun 2020
IIUC, retime is not the tool for this; it bins data into desired time segments such as hourly but averages within those bins.
To compute an average of all observations in the dataset at each exact hour, use a grouping variable and varfun or groupsummary or grpsummary (latter requires Statistics Toolbox and has essentially been replaced by former in base MATLAB).
You will have to group by sufficiently-fine resolution to isolate the one actual top-oif-the-hour reading by hours(), minutes(), seconds(), ... depending upon the time-of-day readings in the dataset.

6 Comments

I may have misinterpretted the question.
If the goal is to compute the hourly avg across some time frame, let's say a year, so that there are 8760 mean values, one for each hour of the year, then you can use
% Create timetable
TT = timetable(datetime(2019,6,20)+minutes(linspace(0,525600,50000)'), rand(50000,1), 'VariableNames',{'Data'});
% Hourly avg
TT_hourly = retime(TT,'hourly', 'mean');
If the goal is to compute the average by hour-of-day combining all days, so that you have 24 mean values across the entire year,
% Create timetable
TT = timetable(datetime(2019,6,20)+minutes(linspace(0,525600,50000)'), rand(50000,1), 'VariableNames',{'Data'});
% Identify the hour of each datetime value. This shifts all datetime
% values to the *start* of the hour.
hourNumber = hour(dateshift(TT.Time,'Start','Hour'));
% If you have the Stats & ML toolbox,
hourlyMeans = grpstats(TT.Data,hourNumber);
T = table(unique(hourNumber), hourlyMeans, 'VariableNames', {'Hour', 'Mean_Data'})
% Same thing without S&ML toolbox
[hourIdx, hourGroups] = findgroups(hourNumber);
hourlyMeans = splitapply(@mean, TT.Data, hourIdx);
T = table(hourGroups, hourlyMeans, 'VariableNames', {'Hour', 'Mean_Data'})
% Result
% T =
% 24×2 table
% Hour Mean_Data
% ____ _________
% 0 0.49151
% 1 0.49628
% 2 0.49473
% 3 0.49118
% 4 0.51048
% 5 0.49791
% 6 0.49877
% 7 0.501
% 8 0.50227
% 9 0.50334
% 10 0.49643
% 11 0.50641
% 12 0.49736
% 13 0.50344
% 14 0.50906
% 15 0.50823
% 16 0.50117
% 17 0.49397
% 18 0.509
% 19 0.49521
% 20 0.49733
% 21 0.5005
% 22 0.49892
% 23 0.49418
Thanks for the inputs!
I have tried to use varfun and GroupingVariables and here is a snippet of my output. I was expecting column 2 to consist of about 300 in each from 00:00:00 to 23:00:00 and calculated the mean of all the values that has, as an example, 13:00:00.
A=varfun(@mean,TT,'GroupingVariables', 'Time')
TT is a timetable that basicly looks the same as what's beneath here, but without the column of 1s.
A=
'00:00:00' 1 0 0 17.90 26.6150000000000
'01:00:00' 1 0 0.4500 17.800 26.9866666666667
'02:00:00' 1 0 0 18.066 41.2250000000000
'03:00:00' 1 0 0.4666 17.95 45.5250000000000
'04:00:00' 1 0 0 17.81 40.520
'05:00:00' 1 0 0.03333 17.750 41.3300000000000
'06:00:00' 1 0 0.4333 5.68 28.1700000000000
'07:00:00' 1 0 0 0 22.2033333333333
'08:00:00' 1 0 0.4666 0 22.9183333333333
'09:00:00' 1 0 0 0 17.5100000000000
'10:00:00' 1 0 0.116 0 9.76833333333334
'11:00:00' 1 9.033 6.966 13.583 44.2016666666667
'12:00:00' 1 9.7000 0.5166 9.4166 34.3300000000000
'13:00:00' 1 0 29.316 1.933 49.1333333333333
'14:00:00' 1 0 4.433 18.133 30.3250000000000
'15:00:00' 1 0 1.0500 10.116 17.4916666666667
'16:00:00' 1 0 0 0 14.0966666666667
'17:00:00' 1 0 0.466 0 34.4500000000000
'18:00:00' 1 0 0 0 48.0066666666667
'19:00:00' 1 0 0.2 7.666 50.1200000000000
'20:00:00' 1 0 0.183 17.31 47.8850000000000
'21:00:00' 1 11.48 0 14.783 45.0766666666667
'22:00:00' 1 10.800 0.466 0 22.1066666666667
'23:00:00' 1 0 0 0 6.59666666666667
'00:00:00' 1 0 0.466 0 6.31000000000000
'01:00:00' 1 0 0 0 5.92166666666666
'02:00:00' 1 0 0 0 5.43166666666666
'03:00:00' 1 0 0.46 0 6.11666666666667
'04:00:00' 1 0 0 8.6833 14.4850000000000
'05:00:00' 1 0 0.4500 0 6.04833333333333
'06:00:00' 1 0 0 0 18.8566666666667
'07:00:00' 1 0 0 10.53 35.6866666666667
and about 80000 more rows similar to this.
So, my goal is to only have 24 rows, each consisting of the time and the the mean of the variables.
Hope it clears it up, and again, thanks for the input!
dpb
dpb on 20 Jun 2020
Edited: dpb on 20 Jun 2020
That's 'cuz you grouped on the Time vector itself -- which is unique for each element -- NOT unique for each hour for the collection of days. With a datetime variable, even if the display format shows only the time, the date portion is still associated with the value itself, it's only the output format that was changed.
You can see this two ways -- first, indirectly by the observation that there's only a single value in each group returned from varfun; secondly, directly by setting the .Format property for the date back to 'default' which will show the full datetime value, not just the output format form requested.
Since you have hourly observations, the above solution by AD will work -- I didn't check for sure, but I'm pretty sure if you had observations every 15 minutes, say, the use of hour(dateshift()) would have the effect of averaging all four of the observations at each hour (0, 15, 30, 45 min), not just the 0 minutes at top of the hour observation.
That might also be the desired way to treat the data, too, ... just noting a possible behavior if you were to have another set of data besides the specific shown.
Ohhh I see! Thanks for pointing that out!
My solution is now simply to set the date to 0 when I'm making my datetime.
t = datetime(0,0,0,dataAR(:,4),dataAR(:,5),dataAR(:,6),'Format','HH:mm:ss');
This seems to do the job!
Thanks alot for the explanation!
Have a good evening (depending on your timezone) :)
The better solution is probably to just use hour() of the real datetime -- that will return a double from 0-23 for each.
Since you have only the one hourly observation, you won't need the retime step.

Sign in to comment.

More Answers (1)

A very simple way to solve this if you have whole days starting at 00:00 is to use reshape. Its a bit old school but I like it! However I have come onto this page to solve this problem for data which doesn't start at 00:00 (not my choice!)
NHours=size(VarA,1);
NDays=NHours/24;
VarA_AH=nanmean(reshape(VarA,[24,NDays]),2);

Categories

Products

Release

R2020a

Asked:

on 20 Jun 2020

Answered:

on 22 Jul 2020

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!