Matching intermittent data to regular time base

I've written code that extracts meteorological data from the somewhat obscure NCDC format and separates it into individual variables. I need these variables to be on a consistent hourly basis, but the time base of the input data varies. So I'm looking for a function that will match the input time base to a new one:
[oldtime olddata],[newtime] -> [newtime newdata]
So for every value in newtime, if there's a matching value in oldtime, then newdata gets populated with the corresponding value in olddata. There are times when oldtime is coarser than newtime, and times when it's finer.
Right now I'm doing this by brute force with a for loop stepping through the ~500,000 values of newtime, but this is hugely computationally inefficient, and it seems altogether likely that there's a simple way to do this if I only knew how to search for it.
Any suggestions would be appreciated!
  2 Comments
Walter Roberson on 9 May 2012
How would it be decided whether "there's a matching value in oldtime" ?
Geoff on 10 May 2012
When the granularity is too fine, is it okay to discard readings? I guess I'm asking if it's acceptable to simply remove any time values from your data set that are not precisely on-the-hour (as I presume your target times are). This typically wouldn't be permissible for cumulative measurements like rainfall or insolation.
Do you collect many weather elements at once? For example, if you have already mapped your input dates to your output dates for temperature, can you use the same mapping for wind speed and so on?
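To illustrate the point about cumulative measurements: when the input is finer than the target time base, the readings inside each hour can be summed instead of discarded. The following is only a minimal sketch using accumarray. The sample values are made up, the times are assumed to be whole seconds, and labelling each bin by the start of its hour is an assumption, not something stated in the question.

```matlab
% Hypothetical data: rainfall increments (mm) logged at irregular times,
% with times stored as whole seconds.
oldtime  = [600 1800 3000 4200 9000]';
rainfall = [0.2 0.0 0.5 0.1 1.0]';
newtime  = (0:3600:7200)';                % start of each hourly bin

% Assign each reading to the hourly bin that contains it.
% Bin k covers newtime(k) up to, but not including, newtime(k) + 3600.
bin = floor((oldtime - newtime(1)) / 3600) + 1;
inRange = bin >= 1 & bin <= numel(newtime);

% Sum the readings within each bin; hours with no readings total 0.
hourly = accumarray(bin(inRange), rainfall(inRange), [numel(newtime) 1]);
```

Note that an hour with no readings comes out as 0 here, which is indistinguishable from an hour with no rain, so missing hours may need to be flagged separately.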


Answers (2)

bym on 9 May 2012
Have you tried:
interp1 %?
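A minimal sketch of what the interp1 suggestion might look like, assuming the times are numeric and strictly increasing. The sample values are made up for illustration. Note that this interpolates between readings instead of matching them exactly, so it fills gaps of any length, which may or may not be what is wanted.

```matlab
% Hypothetical data: readings on an irregular time base (whole seconds).
oldtime = [0 1800 3600 9000 10800]';
olddata = [10.1 10.4 10.9 12.0 12.3]';
newtime = (0:3600:10800)';                % regular hourly target time base

% Linear interpolation onto the new time base. Query points outside the
% range of oldtime return NaN by default.
newdata = interp1(oldtime, olddata, newtime, 'linear');
```

Passing 'nearest' instead of 'linear' picks the closest reading in time, which avoids inventing intermediate values but still does not tell you how far away that reading was.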

Geoff on 10 May 2012
I'm just throwing this out there... Here's what I do to match my weather data to a specified time scale. It discards any reading that doesn't fall exactly on a target time (in practice I never lose anything, because my data are guaranteed to be at the required granularity or coarser), so this might not be useful to you.
function merged = AddToTimeSeries ( masterEpoch, epoch, data )
    % Merge the data with the time series. Iterate through the main epoch
    % vector, and splice in values matching the supplied epoch.
    % Both masterEpoch and epoch are assumed to be sorted ascending.
    tIt = 1;
    eN = length(masterEpoch);
    tN = length(epoch);
    if iscell(data)
        merged = cell(eN,1);
    else
        merged = nan(eN,1);
    end
    for eIt = 1:eN
        % Find the next epoch that is greater than or equal to the master.
        while tIt <= tN
            if epoch(tIt) >= masterEpoch(eIt)
                break;
            end
            tIt = tIt + 1;
        end
        % If the epoch matches the master, include the corresponding value.
        if tIt <= tN && epoch(tIt) == masterEpoch(eIt)
            merged(eIt) = data(tIt);
            tIt = tIt + 1;
        end
    end
end
The time values I use are UNIX time values. I generate my target set in half-hour (1800-second) increments:
T = beginEpoch:1800:endEpoch;
And then, assuming I have a database query result that looks like this:
w.T [58786x1 double]
w.temperature [58786x1 double]
w.windspeed [58786x1 double]
w.rainfall [58786x1 double]
I can do this:
ts.temperature = AddToTimeSeries( T, w.T, w.temperature );
ts.windspeed = AddToTimeSeries( T, w.T, w.windspeed );
ts.rainfall = AddToTimeSeries( T, w.T, w.rainfall );
But actually, since my data shares a common epoch, I only have to map it once:
ts.temperature = nan( numel(T), 1 );
ts.windspeed = nan( numel(T), 1 );
ts.rainfall = nan( numel(T), 1 );
idx = AddToTimeSeries( T, w.T, 1:numel(w.T) );
valid = ~isnan(idx);
idx = idx(valid);
ts.temperature(valid) = w.temperature(idx);
ts.windspeed(valid) = w.windspeed(idx);
ts.rainfall(valid) = w.rainfall(idx);
So yes, it's a loop. Maybe there's a clever MATLABby way to do this stuff, but I never bothered to figure it out. This is about the first piece of MATLAB code I ever wrote, and I regularly call it to map about 10 million rows.
In fact, I just tested it on my machine, despite there being another MATLAB process training models right now and utilising all my cores to 98%. The two tests below took 0.45 seconds and 2 seconds respectively:
T = 1:500000;
tt = 1:2:500000;
tic;
AddToTimeSeries( T, tt, tt );
toc
tt = (1:4000000) / 8;
tic;
AddToTimeSeries( T, tt, tt );
toc
So, a second or two shouldn't break the bank, considering that the kinds of things you do with weather data take monumentally more time.
Anyway, I realise that this entire answer may be of no use to you whatsoever... =) I'll get back to my work!
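For comparison, the same exact-match mapping can also be sketched without a loop using ismember. This is not the code from the answer above, just a vectorized alternative under the same assumption that times are whole numbers (such as UNIX seconds), so that exact equality is safe. The sample values are made up.

```matlab
% Hypothetical data: times in whole seconds, so == comparisons are exact.
oldtime = [0 1800 3600 9000 10800]';      % irregular input time base
olddata = [10.1 10.4 10.9 12.0 12.3]';    % readings at oldtime
newtime = (0:3600:10800)';                % regular hourly target time base

% tf(k) is true where newtime(k) occurs in oldtime;
% loc(k) is the index of that match within oldtime.
[tf, loc] = ismember(newtime, oldtime);

newdata = nan(size(newtime));             % NaN marks hours with no reading
newdata(tf) = olddata(loc(tf));
```

As with the shared-epoch trick above, tf and loc only need to be computed once and can then be reused for every variable on the same time base. If the times are stored as fractional days (for example datenum values), rounding both vectors to whole seconds or minutes first is probably necessary, because floating-point values that look equal may not compare equal.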
