Using save with -v7.3 takes a long time and the MAT-file size is enormous
I tried saving with -v7 and the file size was 18 MB, while with -v7.3 it is 6 GB!
Adam
on 10 Nov 2016
save what?
Omar Abdelkader
on 10 Nov 2016
Walter Roberson
on 10 Nov 2016
Can you make the 18 megabyte version available through something like Google Drive?
Accepted Answer
More Answers (1)
Rik van der Weij
on 8 Jun 2020
Edited: Walter Roberson
on 8 Jun 2020
I tried the following:
a = ones(15000);
save('a.mat', 'a');          % ~800 KB file
save('b.mat', 'a', '-v7.3'); % ~11 MB file
I have the same problem with real data. My file gets flagged for the 2 GB limit, although any file I actually save is much smaller, so I'm forced to save with -v7.3, and then the file size gets really, really large.
Walter Roberson
on 8 Jun 2020
-v7 MAT files have 32-bit size counters. For any particular variable, the process is to generate the uncompressed serialized form (which must therefore stay within the limits of the 32-bit counters) and then run a compression routine on it and store the compressed version. There is no clever algorithm that does piecewise packing into segments that each individually fit into 2 GB or 4 GB compressed; a -v7 file holds just the raw (uncompressed, not-clever) serialized representation and the deflate-compressed (gzip-style) version of that.
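To make that limit concrete, here is a small arithmetic sketch. It is written in Python only because the check itself is language-agnostic; the 2 GB cap is an assumption drawn from the signed interpretation of the 32-bit counters described above.

```python
MAX_V7_BYTES = 2**31  # assumed per-variable cap from 32-bit signed size counters


def fits_in_v7(shape, bytes_per_element=8):
    """Rough check: does the uncompressed serialized size fit under the -v7 cap?"""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_element < MAX_V7_BYTES


print(fits_in_v7((15000, 15000)))  # 1.8 GB of doubles: under the cap
print(fits_in_v7((20000, 20000)))  # 3.2 GB of doubles: over the cap
```

This is why a 15000-by-15000 double matrix still saves as -v7, while a somewhat larger variable forces the switch to -v7.3.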
Unfortunately, yes: -v7.3 HDF5 files are not nearly as compact as one might hope.
Poking at b.mat with an HDF5 viewer, I see that it was created with GZIP level 3 compression at a 169.972:1 compression ratio, i.e. 99.4%. When I write those 1s out in binary with no overhead (just the double-precision numbers), I find that gzip -3 does indeed compress to 99.4% (though to a smaller file than the .mat). Even gzip -9 only compresses to 99.8%, leaving a file that is over 2.5 megabytes.
Now, if I take that gzip -9 result and pass it through gzip -9 again, I get a very small file of only 8553 bytes, so there is still a lot of redundant information left after the 99.4% or 99.8% compression; gzip -3 and gzip -9 just cannot find it in one pass.
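The double-gzip effect above is easy to reproduce outside MATLAB with any gzip implementation. A scaled-down sketch using Python's standard library (a 1500-by-1500 matrix of ones instead of 15000-by-15000, to keep it quick):

```python
import gzip
import struct

# A 1500 x 1500 matrix of 1.0 as raw little-endian doubles (18 MB).
raw = struct.pack('<d', 1.0) * (1500 * 1500)

once = gzip.compress(raw, compresslevel=9)    # one pass, as in the .mat file
twice = gzip.compress(once, compresslevel=9)  # a second pass over the result

# One pass already removes well over 99% of the bytes, but its output is
# still so regular that a second pass shrinks it substantially again.
print(len(raw), len(once), len(twice))
```

The single-pass result is the structural limit of what deflate can do in one scan; the second pass succeeds only because the first pass's output is itself a short repeating pattern.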
It looks to me as if the HDF5 specification permits a couple of compression options that could sometimes be more effective, but what MATLAB invokes is not unreasonable -- it is not MathWorks' fault that zlib's gzip -3, or even gzip -9, does not do nearly as well as one might hope.
Thanks for your comments. Do you know if there is a workaround for this? In my case I have six columns that are 920 MB each before compression (5.5 GB total), 75 MB per column (450 MB total) if saved individually with standard compression, but 17 GB if saved as a -v7.3 .mat. Is my only other option to save the columns (or batches of rows) separately and reconstruct the table after loading? Thanks.
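The split-save idea in that comment can be sketched as follows. This is a Python-stdlib illustration with hypothetical file names, not MATLAB code; in MATLAB the same shape would be a per-column save/load loop. Each column is compressed to its own file and the set is reassembled on load:

```python
import gzip
import os
import struct
import tempfile


def save_columns(columns, outdir):
    """Sketch: gzip each column of doubles to its own file; return the paths."""
    paths = []
    for i, col in enumerate(columns):
        p = os.path.join(outdir, f"col{i}.bin.gz")  # hypothetical naming scheme
        with gzip.open(p, "wb", compresslevel=9) as f:
            f.write(struct.pack(f"<{len(col)}d", *col))
        paths.append(p)
    return paths


def load_columns(paths):
    """Sketch: read the per-column files back and reconstruct the table."""
    cols = []
    for p in paths:
        with gzip.open(p, "rb") as f:
            data = f.read()
        cols.append(list(struct.unpack(f"<{len(data) // 8}d", data)))
    return cols


with tempfile.TemporaryDirectory() as d:
    cols = [[1.0] * 1000, [2.0] * 1000]
    restored = load_columns(save_columns(cols, d))
    print(restored == cols)  # round trip preserves the data
```

Because each column is compressed independently, each file stays small and no single variable has to pass through the -v7.3 path; the cost is the extra bookkeeping to reconstruct the table after loading.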