Building a "Typical Year" from averaging daily-hourly data over several years

2 views (last 30 days)
Ia ora na,
I have to build a "Typical Year" of solar GHI data. For that, I got an hourly dataset, spanning over almost 5 years. What I need would be to average GHI value for any Hour in a Day, of any Day in any Year (not Day in Month). For exemple, if I have :
DATE_TIME GHI
2016-01-01 09:00:00 150
2016-01-01 10:00:00 300
2016-02-01 09:00:00 400
2017-01-01 09:00:00 250
2018-01-01 09:00:00 200
I should get something like :
DAY_OF_YEAR_AND_TIME GHI
01 09:00:00 200
01 10:00:00 300
32 09:00:00 400
Because Jannuary 1st of 2016, 2017 and 2018 are the "same" day, whereas 1st of February isn't.
In the end, I should get a 365*24 rows (or more likely : 366*24 rows as I got in my dataset years 2016 and 2020 which are leap years with Feb. 29th in it).
I arranged my dataset in a matLab timetable format with variable named DATE_TIME and GHI. Attached file provided for exeample has been cutted down from 2016/01 to 2018/06.

Accepted Answer

dpb
dpb on 9 May 2021
Edited: dpb on 10 May 2021
OK, the conversation in the other thread turned into more interesting one than I initially anticipated; I thought it was just a very simple syntax of using grouping variables; turns out it's a little more in a couple of areas.
First, I was thinking there was a DOY() builtin function with the datetime class to return the serial calendar day of the year; turns out there's not at least thru R2020b.
So, first one has to generate an appropriate grouping variable; then there's the Q? of how to handle leapyears to get a 365-day year summary instead of 366. There's more in that in some ways, too...
My initial answer was (using variable names supplied by Paul in the other thread comment first posted)--
MUS_TT.Hour = hour(MUS_TT.DATE_TIME); % hour of day grouping variable
MUS_TT.DOY =(MUS_TT.DATE_TIME); MUS_TT.DOY.Format='DDD'; % duplicate time; format as DOY
MUS_TT.DOY = str2double(string(MUS_TT.DOY)); % convert display DOY to double
TYP_YEAR = varfun(@mean,MUS_TT,"InputVariables",'GHI',"GroupingVariables",{'DOY','Hour'});
The above returns the right(*) averages, but over a 366-day year because the data table includes leap years.
Paul then, at my recommendation, cleaned out all the Feb-29 data from the original table and then regenerated the DOY variable. But, of course (after reflection and being kicked in the shins with reality) this still generated a 366-day result because, just because there is no data there, the DATE_TIME variable was still the actual datetime value of the corresponding date and so the resulting DOY reflects that date--changing the format doesn't change the value. Obvious in retrospect, but I overlooked it initially thinking would eliminate by not having the data. Wrong! of course.
The solution is one of two options -- the simplest coding-wise, and that also uses the Feb-29 data by "smooshing" it in with Feb-28 is to add/subtract calyears(1) to/from each year that is a leap year in the data--
MUS_TT.DOY =(MUS_TT.DATE_TIME); MUS_TT.DOY.Format='DDD'; % duplicate time; format as DOY
MUS_TT.DOY(isleapyr(year(MUS_TT.DOY)))=MUS_TT.DOY(isleapyr(year(MUS_TT.DOY)))+calyears(1); % make non-leap year year
MUS_TT.DOY = str2double(string(MUS_TT.DOY)); % convert display DOY to double
Now DOY will range 1:365 as adding a calyear to a datetime date of 29-Feb will keep the time field unchanged but limit the day value to the maximum allowable number of days in the month for the year. IOW, 29-Feb-2016 HH:MM:SS --> 2-Feb-2017 HH:MM:SS and datetime then "knows" there are only 28 days in Feb that year so March 1 is day 59, not day 60.
If don't want to include the data for the leap years at all, then still must do the above to get the right DOY count to have only 365 days in the grouping variable as well as remove the data from the table before applying varfun.
IFF one had complete data records for every year and all began at the first hour of the year, then one could just populate the DOY grouping variable with 1:365 for each repeated by the number of samples per day, but that's fraught with special cases if any data are missing or the sample rate isn't precisely uniform, etc., etc. While a little of a pain to have to make the above fixup, it's not dependent upon anything except getting the non-leapyear year into the datetime field first.
(*) For a paricular definition of "right". The two leap years will push all their data past Feb 28 down one day from the non-leap years so the averages, while technically correct given the real DOY for every year, aren't precisely what is wanted which is that March 1 should line up for all years, whether leay year or not.
  1 Comment
dpb
dpb on 9 May 2021
OBTW, another inexplicably missing utility...
>> type isleapyr
function is=isleapyr(yr)
% returns T for input year being a leapyear
% dpozarth -- from dark ages; modified to handle new datetime class
if isdatetime(yr), yr=year(yr); end
is=eomday(yr,2)==29;
>>

Sign in to comment.

More Answers (0)

Products


Release

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!