Appending to a saved dataset
I'm trying to read data from a text file, do some data analysis, save the results in a dataset, and export my dataset into a .dat file using the export function.
The problem arises when I have several text files and end up with well over 100,000 observations and about 200 parameters. My current approach is to read the data from each text file, store the analysis results in an interim dataset, concatenate the interim dataset onto the complete dataset, and call export once at the very end. So my code looks something like:
complete_ds = [];
for i = 1:length(textfiles)
    current_file = textfiles{i};   % assumes textfiles is a cell array of file names
    fid = fopen(current_file);
    data = ReadFile(fid);          % my own reader function
    fclose(fid);
    interim_ds = AnalyzeData(data);                   % my own analysis function
    complete_ds = vertcat(complete_ds, interim_ds);   % grows on every iteration
end
export(complete_ds, 'file', 'Allmydata.dat');
This is taking a lot of time and I'd like to be able to append to the exported dataset instead. Any suggestions? Also, I know that preallocating may help, but it is difficult to predict how much memory I want to set aside for the dataset since each text file may have a different number of observations.
3 Comments
Image Analyst
on 21 Jun 2011
How many text files? How much time: minutes, or hours? What is the difference between observations and parameters (if that matters)? You can take a guess at preallocating by looking at the file size. If you estimate 50,000 lines from a file size of, say, 50 kb, then preallocating 40 or 50 thousand rows in the array would be faster than preallocating none at all, even if you have to extend it by a few rows or truncate the few rows you didn't use. Inside AnalyzeData(), can you possibly estimate the number of rows that interim_ds will need?
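A minimal sketch of the file-size guess described above, using a plain numeric array rather than a dataset. The demo file, the variable names, and the bytesPerLine value are all assumptions for illustration. In practice you would measure an average line length from one of your own files.

```matlab
% Sketch: preallocate from a file-size estimate, then trim the unused rows.
% Assumption: one numeric observation per line of text.

% Create a small demo file so the example is self-contained.
demoFile = [tempname '.txt'];
fid = fopen(demoFile, 'wt');
fprintf(fid, '%d\n', 1:1000);
fclose(fid);

% Estimate the number of rows from the file size on disk.
info = dir(demoFile);
bytesPerLine = 4;                            % hypothetical average line length
estRows = ceil(info.bytes / bytesPerLine);

% Preallocate once, fill, and count how many rows are actually used.
values = zeros(estRows, 1);
nUsed = 0;
fid = fopen(demoFile, 'rt');
txt = fgetl(fid);
while ischar(txt)
    nUsed = nUsed + 1;
    values(nUsed) = str2double(txt);         % array extends itself if the guess was low
    txt = fgetl(fid);
end
fclose(fid);

values = values(1:nUsed);                    % truncate rows that were never filled
delete(demoFile);
```

If the guess is too low, MATLAB still extends the array for the last few rows. If it is too high, the final indexing step discards the excess. Either way, most of the per-iteration reallocation is avoided.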
Answers (1)
Matt Tearle
on 21 Jun 2011
If it just comes down to "I'd like to be able to append to the exported dataset instead", then here's one way to do it, but it's a bit of a nasty hack...
- Find the directory $MATLAB\toolbox\shared\statslib\@dataset (where $MATLAB is your installation directory, e.g. C:\Program Files\MATLAB\R2011a).
- Copy the entire @dataset directory to somewhere local.
- Inside @dataset, make a copy of export.m and call it export_app.m (or whatever).
- Edit export_app.m. On line 1, change export to export_app. Change line 169 (in R2011a, at least; it might be slightly different in other releases) from fid = fopen(filename,'wt'); to fid = fopen(filename,'at'); so that the file is opened in append mode instead of being overwritten. Save the file.
Then
>> export(x1,'file','testappend.dat')
>> export_app(x2,'file','testappend.dat','WriteVarNames',false)
should work for you.
Note, though, that you're now using a local version of the dataset class, so funky instabilities may ensue... Use with caution! Probably best to hide it away in a directory somewhere and go into that directory only for this purpose!
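If you would rather not carry a modified copy of the dataset class, the same append-mode idea can be sketched with fopen and fprintf directly. This is only an illustration under some assumptions: the results are purely numeric, the output is tab-delimited, and the variable names and the block matrix are stand-ins for whatever AnalyzeData returns. A dataset with mixed types would need a different format string.

```matlab
% Sketch: write the header once, then append each file's rows in 'at' mode.
outFile = [tempname '.dat'];                 % stand-in for 'Allmydata.dat'
varNames = {'a', 'b', 'c'};                  % hypothetical variable names

% Write the header line once, overwriting any existing file.
fid = fopen(outFile, 'wt');
fprintf(fid, '%s\t%s\t%s\n', varNames{:});
fclose(fid);

for k = 1:3
    block = k * ones(2, 3);                  % stand-in for one file's numeric results

    % Open in append mode so earlier rows are kept.
    fid = fopen(outFile, 'at');
    fprintf(fid, '%g\t%g\t%g\n', block.');   % transpose: fprintf reads column by column
    fclose(fid);
end
% The combined output is left in outFile for later use.
```

Because each block goes to disk as soon as it is computed, the complete dataset never has to be held (or grown) in memory.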