Linking variable name/symbol to sequential retrieval of FRED data files.

As a relatively new user of Matlab I would greatly appreciate suggestions on how to: 1) Assign a unique variable name to data (values) retrieved for lists of multiple items from the FRED data base and 2) Save each item as a separate .mat file. FRED delivers the data for each item as a table without its identifier. I am currently retrieving the information using the following loop:
url = 'https://fred.stlouisfed.org/';
c=fred(url);
c.DataReturnFormat='table';
c.DatetimeType='datetime';
batch = {"WPU1141","WPU1144","WPU1174","WPU3021","WPS302201","WPS511104",...
"WPU512101","WPU067904","WPU106","WPU1423"};
for ind = 1:length(batch)
batchud = batch{ind}
%startdate='01-01-2020';
%enddate='01-01-2022';
d=fetch(c,batchud);
data=d.Data{1};
TT=table2timetable(data)
recent=tail(TT,12)
end
close(c)
The output confirms that the loop is correctly reading the batch list and converting each entry to a timetable. Since each one is in turn labeled "TT", the variable name needs to be changed in order to save separately in the loop. I have tried a number of possibilities but obviously have not located the correct function, syntax, or punctuation. Any thoughts would be most welcome. Virtually all of the raw data I work with will be timetables.
Chris

15 Comments

These all a monthly series over same time frame, by any chance?
If so, I'd probably combine the data into one table as columns using the batch name as the variable name.
If they're not commensurate in time, couple of choices, but simplest would be to just save each as a file with the batch name as the root filename.
I manually downloaded one; don't have the Datafeed TB and wasn't aware of the data source, either.
url = 'https://fred.stlouisfed.org/';
c=fred(url);
c.DataReturnFormat='table';
c.DatetimeType='datetime';
batch = {"WPU1141","WPU1144","WPU1174","WPU3021","WPS302201","WPS511104",...
"WPU512101","WPU067904","WPU106","WPU1423"};
for ind = 1:length(batch)
d=fetch(c,batch{ind});
TT=table2timetable(d.Data{1});
TT.Properties.VariableNames=batch{ind};
save(TT,batch{ind})
end
close(c)
will put the batch in as variable name and write each in turn to a .mat file by that name as well.
We could be more sophisticated if knew answer to first Q?
dpb--the answer to your q is yes, this particular series are all on monthly freq and return the same dates. I hadn't tried the specific indexing syntax you propose but looks like it should work better than my earlier versions--will try this afternoon. The challenge with FRED data is that the value columns for each item are all represented (delivered) as Var2. So most aggregation methods fail. That is one other reason I'd like to attach a series identifier to each iteration before saving as *.mat, e.g., to avoid separately renaming each downloaded file. Other data vendors deliver in a format that transfers the identifier to Matlab. There will be hundreds of files in each data class that I use. Thank you very much for the input.
dpb--we have a partial solution, still working on it. Your second proposal does convert the first Var2 to the first item identifier. The first line of the"for" loop moves to item 2 but hangs on the TT.(batch{})... statement saying "index exceeds the number of array elements (1). Haven't found a solution to this. The series title can be extracted (I now discover) by summoning d.Title{}. Possibly easier to convert that to the timetable name.
Hmmm....set a breakpoint there on that line
TT.(batch{ind})=d.Data{1};
and show what
whos TT d
returns as well as
head(TT)
head(d)
(I'm assuming d is a table???)
Too bad don't have the toolbox here; remote debugging is much tougher when can't replicate the system.
Yes, d is always a 1x11 table. Output doesn't copy so here is the info requested:
TT 888x2 timetable
d 1x11 table
head TT returns first 8 entries Var1 (i.e.,time) followed by data for WPU1141 and "1x2 table", not data, in every cell for WPU1144. Head d returns the structural array (not correctly formatted below). These are the first four cells of 11:
Title SeriesID Source Release
_______________________________________________________________________________________________________ ____________ ____________________________________ _________________________
{' Producer Price Index by Commodity: Machinery and Equipment: Industrial Material Handling Equipment'} {' WPU1144'} {' U.S. Bureau of Labor Statistics'} {' Producer Price Index'}
Hope this helps. Thanks a million.
Stop in debugger at the line and
save 'state'
will save the entire context of the function at that point to a .mat file "state.mat"
Attach that file with the paperclip icon...then I can poke directly at what d is to figure out the dereferencing issues...it's not clear to me how the table inside a table is getting generated by the above...in particular since it is in a loop what is different the second pass from the first.
Attach your actual m-file as well so have an accurate copy of it as well.
The information we need is "indexable" via FRED whether we use seriesID (the characters in the batch variable) or Title (the plain language names in d). I can extract contents of Title from FRED but have not found a syntax that ML will accept as a variable name to replace the TT designation. It's the same issue initially encountered with the items in batch. Back shortly with *m.
Possible insight. The Data{1} syntax refers to a column in the Data field, which has 2 according to output in my workspace. I can extract the contents of Title by referencing Title alone.
OK, I see. Having the actual returned table d helps. What I'm still uncertain about is why the first worked as intended since (I think?) the code was in a loop so the same dereferencing should have happened both passes.
Anyways, try the following for the assignment -- BTW, the returned d.SeriesID series name string contains a leading blank that will be significant in the column names if used as is; since you've got to have the list of desired series anyway, I think I'd just continue to use it as the name generator.
The following should then work--and I see from the above code why the first is different -- thought you were using a version I posted later that has the first case already handled:
url = 'https://fred.stlouisfed.org/';
c=fred(url);
c.DataReturnFormat='table';
c.DatetimeType='datetime';
batch = {"WPU1141","WPU1144","WPU1174","WPU3021","WPS302201","WPS511104",...
"WPU512101","WPU067904","WPU106","WPU1423"};
d=fetch(c,batch{1}); % get the first series only
TT=table2timetable(d.Data{1}); % create the timetable
TT.Properties.VariableNames=batch{1}; % set first variable name
for ind = 2:length(batch) % now get the rest of the series, append to TT
d=fetch(c,batch{ind}); % get the first series only
TT.(batch{ind})=d.Data{:}.Var2; % dereference the cell containing the table; get the Var2 data
end
close(c)
save(TT,'WHATEVERGLOBALNAMEWANT')
I should have recognized what d was sooner, but sometimes hard to wrap head around... :)
d.Data{:}
returns the table; you can reference the return value of it same as if saved as a variable so the dot addressing to the internal variable name works. It is returned as a default table with VarN as the variables names so just hardcode 'Var2'
I probably forgot to restore the script to that last version we discussed. I was experimenting with syntax and the structure of the program to figure out what was different about the last line of the loop. The output from one of those runs revealed that the Data field did contain 2 columns, while the other fields in d obviously didn't need them or the {1} argument. So, yeah, that was not something you see every day (head scratch). The documentation for retrieving the data values doesn't note this.
Good catch on the empty space...actually every header in that table has the same gap after the first apostrophe. Weird. Also I had earlier tried a statement similar to the last line in the loop above but the syntax wasn't valid.
I just ran this loop and it looks like it will probably work. I say probably because the loop pauses while processing item#4. The error message is "to assign or create a variable...the # of rows must match the height of the table." Turns out item #4 is shorter than the first four. Unfortunately, this will often be the case with fundamental series. But, clearly your script works. Excellent work! Learned a lot.
""to assign or create a variable...the # of rows must match the height of the table." Turns out item #4 is shorter than the first four. "
Well, in that case, the fundamental assumption that each of these series is commensurate in the time vector is not upheld and so incorporating them together in one timetable isn't appropriate after all. That was why asked the Q? earlier.
In this case you'll need to either
1. Create the initial time series that is the maximum number that is in the whole series and then place the data from each in turn at the correct location in the series with a missing value indicator for those not present; or
2. Revert to the earlier path shown of saving each as its own timetable and either writing each as a separate file or create a cell array of tables.
My original trial code was run on data with consistent time vectors but I was designing it for the reality of data files of different length. This was why the save was inside the loop. The error message(s) flagged the syntax I used to try to assign variable names or series ID to the time table. This was my larger problem considering that it is very easy to aggregate tables, etc., of equal height. So, the major obstacle was resolved--much appreciated.
At this point I am only trying to repopulate data bases that I have used on a different software platform. Creating uniform column heights w/ NAs and other grooming can wait for another day. This loop will be a huge time saver as my data needs are substantial!
OK. However, I will note that I think you could get that aggregation while doing this by only one or two more lines of code. If you were to go ahead and create a timetable for each as were on download, then retime it to the full time interval in the one big TT, you'd have both worlds...the data series all in one convenient file to be obtained at will plus the commensurate time lengths for all.
Just a thought -- the future may not be so far away or such a stretch as it seems... :)
Will give it a shot once I get all the data into ML. As it happens, the initial stages of my analysis are within each series and only later across/between. I have used retime for simpler loops and will be sure to enlist when I get to it. BTW, that batch of series IDs was originally limited to the first 3 items because they were = height. I had started adding the remaining items in anticipation of expanding my database once the syntax issues were solved--didn't clarify that at all. Great job dpb :-) .
OK, you know your needs best, of course.
Just didn't want my previous comment to sound more complicated than it really is and thereby discourage you from getting something sooner rather than later in thinking would be difficult to match series.

Sign in to comment.

 Accepted Answer

"-the answer to your q is yes, this particular series are all on monthly freq and return the same dates."
Ah! Was hoping for that to be the case -- then do something more like
url = 'https://fred.stlouisfed.org/';
c=fred(url);
c.DataReturnFormat='table';
c.DatetimeType='datetime';
batch = {"WPU1141","WPU1144","WPU1174","WPU3021","WPS302201","WPS511104",...
"WPU512101","WPU067904","WPU106","WPU1423"};
d=fetch(c,batch{1});
TT=table2timetable(d.Data{1});
TT.Properties.VariableNames=batch{ind};
for ind = 2:length(batch)
d=fetch(c,batch{ind});
TT.(batch{ind})=d.Data{1};
end
close(c)
save(TT,'GLOBALBATCHDATA')
The above will return one timetable with the variables catenated by column and by name...I'd think much more convenient than a whole bunch of files.
NB: You can use dynamic variable names with the table dot notation by surrounding the variable name char() string in (). Look at the table provided in the link for the table class at the link about accessing data from a table for a full expose of all the addressing modes available and syntax for each. It can get somewhat convoluted, but there's a way to do almost anything needed.
CAUTION: Air code, check syntax carefully... :)

4 Comments

Does
d=fetch(c,batch{1});
fetch return a table with two variables, 'Var1', 'Var2'?
If so, you can change the variable names right there through the d.Properties.VariableNames cell array.
Or, by any chance is the connection object smart enough to have something like an import options object that would let you set variable names in it on the fetch, maybe?
d=fetch... returns a 1x11 structural array describing what is in the data series feed. The data is in cell labeled 'Data'. The data is delivered by the second line of code indexing into the cell/column. Time is denoted as Var1 and values are Var2. I just learned that d.Title{1} will return a character vector of the data name in the first portion of the proposed script.
The second line inside the "for" loop correctly returns the structural array for the next item in the batch. ML is balking at the TT.(batch{ind})=d.Data{ind} statement. The message is:
Index exceeds the number of array elements (1).
Yeah, it's the detail of that content of d that need to see to understand how to dereference it.
the LHS TT.(batch{ind}) is creating a new column variable in the TT table with the name of the current batch ID. The RHS should be the data array to go there, however it is referenced.
While stopped in the debugger at that line, you should be able to write at the command line
TT.(batch{ind})=nan(height(TTT,1));
and then
head(TT)
should show the new variable in the table with value of NaN. What need is to replace the array of NaN with whatever dereference of the returned data is. I can't see that from here, unfortunately...need you to do the probing for me. :)
That's what would be helpful in the output of the whos commands I gave before...shows me the precise storage and types as opposed to a verbal description.
We can get this ironed out, it's just a little syntax/data storage issue.

Sign in to comment.

More Answers (0)

Categories

Asked:

on 15 Feb 2021

Commented:

dpb
on 18 Feb 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!