MATLAB Answers


Trouble using retime to obtain 1 year of hourly average values from 15 years of hourly average values

Asked by Osnofa
on 10 Apr 2019
Latest activity Commented on by Peter Perkins
on 9 May 2019
I have a long dataset (15 years of data) for several location. for the sake of the example I'm attaching a file with only 4 locations.
What is my problem? Well, I need to obtain the hourly average values for 1 year (letscall it a typical year). I have 15 years of hourly average values, and I want to average those 15 years (365 or 366 days * 24 hours) to 1 year of hourly average values (366*24 - including the non-regular years).
I was trying retime(TT,'hourly,'mean'), being TT the timetable obtained from the attached file. But this won't do the trick... I've been reading about retime and other possibilites and still did not managed to do it.
How should I proceed? or there is a better way to do this?


Sign in to comment.

3 Answers

Answer by Peter Perkins
on 10 Apr 2019
 Accepted Answer

Here's a sol'n that is along the lines of what Steve and Cris suggest, but uses varfun to create a table. I think groupsummary or findgroups/splitapply could be used to do the same thing.
>> tt = readtimetable('test123.txt');
tt.DoY = day(tt.Time,'dayofyear');
>> tt.HoD = hour(tt.Time);
>> head(tt)
ans =
8×6 timetable
Time Data1 Data2 Data3 Data4 DoY HoD
____________________ _____ _____ _____ _____ ___ ___
31-Dec-2000 23:30:00 0 0 0 0 366 23
01-Jan-2001 00:30:00 0 0 0 0 1 0
01-Jan-2001 01:30:00 0 0 0 0 1 1
01-Jan-2001 02:30:00 0 0 0 0 1 2
01-Jan-2001 03:30:00 0 0 0 0 1 3
01-Jan-2001 04:30:00 0 0 0 0 1 4
01-Jan-2001 05:30:00 0 0 0 0 1 5
01-Jan-2001 06:30:00 0 0 0 0 1 6
>> t = varfun(@mean,tt,'GroupingVariables',{'DoY' 'HoD'},'OutputFormat','table');
>> head(t)
ans =
8×7 table
DoY HoD GroupCount mean_Data1 mean_Data2 mean_Data3 mean_Data4
___ ___ __________ __________ __________ __________ __________
1 0 15 0 0 0 0
1 1 15 0 0 0 0
1 2 15 0 0 0 0
1 3 15 0 0 0 0
1 4 15 0 0 0 0
1 5 15 0 0 0 0
1 6 15 0 0 0 0
1 7 15 0 0 0 0


Show 1 older comment
Not 100% sure what you mean by "take 29th of february out of the equation" and where you would go from there, but perhaps
i = (tt.Month ~= 2) | (tt.Day ~= 29);
tt = tt(i,:)
Well, the phrasing was no the best.
I've tried:
tt.DoY(month(tt.Time)==2 & day(tt.Time)==29) = []
Still, I get hourly data from DoY 1 to DoY 366.
I wanted to remove/ignore february 29th before averaging everything and obtain averages from DoY 1 to DoY 365.
I don't know what this part
is supposed to indicate. You can't delete rows of one variable in a timetable, so as typed, that should be an error.
The Gregorian calendar is what it is. datetime does not support a version of it that doesn't have leap days. Maybe you want to delete all the 29-Feb rows from your timetable and add an additional variable for day of year that runs 1:365 for all years. Maybe that's what you've done already. Beyond that, I don't know what you are asking for.

Sign in to comment.

Answer by Cris LaPierre
on 10 Apr 2019
Edited by Cris LaPierre
on 10 Apr 2019

There are several options to try if you want to explore. The functions varfun and rowfun work on all variables in a timetable. It just might take a little to figure out what the appropriate grouping should be.
The other option is to use findgroups and splitapply. The challenge here is splitapply does not work on all variables in a table. Just on one. I fully admit I lost my motivation and just merged all the locations (separate variables when using readtimetable), into a single variable. Then it works.
The trick is to find the appropriate way to goup you data. With findgroups, I can use month, day, and hour from the Time data.
data = readtimetable('test123.txt');
% Found the simplest way was to recreate the matrix and then take mean
data = mergevars(data,1:width(data));
% Make sure all rows are hourly increments.
data = retime(data,data.Time(1):hours(1):data.Time(end));
% identify groups
G = findgroups(month(data.Time),day(data.Time),hour(data.Time));
% Find the average by hour and location for all data in group
func = @(x) mean(x,1,'omitnan');
hrAvgAll = splitapply(func,data,G);
I had to create a custom function for mean to include the options needed to handle your data.
I also admit I have not verified the result is correct. I leave that for you.


I assume it would also be nice to have the results in a time table. Add this bit of code to the end for that.
Note that I had to specify a year that is a leap year to create dates that included 29-Feb. I've set the display format to not show year.
Also, I'm not sure if you want to ignore the NANs so I've preceeded the new code with a version of splitapply that keeps them by using nanmean.
G = findgroups(month(data.Time),day(data.Time),hour(data.Time));
% Find the average by hour and location for all data in group. Include NaNs
func = @(x) nanmean(x,1);
hrAvgAll = splitapply(func,data,G);
tmFunc = @(x) unique(datetime(2000,month(x),day(x),hour(x),minute(x),second(x),'Format',"dd-MMM HH:mm:ss"));
tm = splitapply(tmFunc,data.Time,G);
hrAvgAllTT = timetable(tm,hrAvgAll);
If I understand the question correctly, it's NOT strictly possible to have the results in a timetable, because the groups are "unanchored" time. I think you've sort of cheated your way around that, which may or may not be a good thing. I've posted a sol'n that creates a table below.

Sign in to comment.

Answer by Steven Lord
on 10 Apr 2019

So if I understand correctly, you want to take the mean of (for example) your data points for 1 AM January 1st 2018, 1 AM January 1st 2017, 1 AM January 1st 2016, ... and have that be the final result for 1 AM January 1st, then do the same for 2 AM January 1st, 3 AM January 1st, etc.?
I think you can do this using groupsummary as long as you have grouping variables containing the hour of the day (computed using hour) and the day of the year (computed using day with the 'dayofyear' flag) in your timetable. See the "Multiple Grouping Variables" example on the groupsummary documentation page for an example that does the type of summarizing I think you want.


Hi Steven,
Yes, that is it. 366*24 rows of hour data.
Will check that example.
thank you.
Steven is correct.
retime is about re timing along a continuous timeline. What you're doing is grouping by hour within day, ignoring year. So you need some grouping variables as he and Cris suggest.
Yeah that is what i was starting to think but wanted to make it clear and understand which other option would exist. thanks.

Sign in to comment.