Filtering max values within every 3 hours of data
Show older comments
Hey everyone,
I attached the first 50 rows of a sample dataset. My time data is in the format of '2000-01-01 01:00' where the last 5 characters hold the hour (HH) and minute (MM) values respectively. I would like to have a code that goes through the time column, and assigns the maximum value of the second column between the 01,02,03 hours to the 03 hour timestep. I need to then have this done for the 04 - 06 time period and so forth until 22 - 00.
At the end the cell array would not need to keep data from other columns and should just be XXXX x 2 cell array and just reassigns the maximums to the 03,06,09 ...... 00 timesteps. There may also be missing hours, so the script would need to accommodate for that.
example:
'1984-01-01 00:00' 10.80 240 20.06 231 0.8
'1984-01-01 01:00' 11.83 230 20.06 6 0.6
'1984-01-01 02:00' 11.31 230 18.51 100 0.6
'1984-01-01 03:00' 10.80 230 18.51 212 0.7
'1984-01-01 04:00' 9.77 240 18.65 348 0.8
'1984-01-01 05:00' 9.25 240 16.97 406 0.8
'1984-01-01 06:00' 9.77 230 17.48 524 0.7
'1984-01-01 07:00' 9.77 230 16.97 612 0.7
would then become:
'1984-01-01 03:00' 11.83
'1984-01-01 06:00' 9.77
I am completely lost in writing this and any help would be appreciated!
1 Comment
Cedric
on 11 May 2014
Well, not all answers below will work with missing data, but if you work for understanding them all (with their limitations), you'll learn a lot about data manipulation in MATLAB ;-)
Accepted Answer
More Answers (2)
dpb
on 11 May 2014
An alternate approach...
>> dat=circshift(dat(:,1:2),-1); % place the first of groups of three on first row
>> ddat=cell2mat(dat(:,2)); % convert numeric to array for convenience(+)
>> [~,imx]=max(reshape(ddat(1:fix(length(dat)/3)*3),3,[])); % find max for each group
>> res=dat(imx+[0:3:(length(imx)-1)*3],1:2) % select those from full array
res =
'1984-01-01 01:00' [11.8300]
'1984-01-01 04:00' [ 9.7700]
>>
I didn't read Cedric's answer carefully enough to unequivocally answer your question re: missing values; I'd suggest trying it for a sample dataset and see how it works.
For correct operation of the above you will need to fill in missing data "holes" if they do exist similarly as to how Cedric built a date vector from your first to last to ensure the "divide by 3" works to build the right number of columns.
Andrei Bobrov
on 11 May 2014
Edited: Andrei Bobrov
on 11 May 2014
load upload.mat
a = datatoupload;
[~,~,~,hh] = datevec(a(:,1));
hh(hh==0) = 24;
h2 = (rem(hh-1,3)+1)==1;
h2(1) = 1;
ii = cumsum(h2);
[~,b] = unique(ii,'last');
out = [a(b,1) accumarray(ii,cat(1,a{:,2}),[],@(x){max(x)})];
add variant
t = datenum(1984,1,1,3*(0:ceil(diff(datenum(a([1,end],1)))*24/3))',0,0);
t0 = datenum(a(:,1));
t1=t+eps(1e6);
[~,ii] = histc(t0,t1);
out = [cellstr(datestr(t,'yyyy-mm-dd HH:MM')),...
accumarray(ii+1,cat(1,a{:,2}),[],@(x){max(x)})];
Categories
Find more on Statistics and Machine Learning Toolbox in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!