Big 3D gridded data set

Belinda Finlay on 1 Sep 2020
Edited: Adam Wyatt on 9 Sep 2020
I have 20 years of 3-hourly gridded ocean temperature data in three dimensions (latitude, longitude and depth). To make the data easier to use, I am creating daily averages, each of which is a 563*1001*40 matrix. I was then creating a cell array for each year holding the 365 daily averages, but the cell arrays quickly become very large.
I have read about datastore and tall arrays; however, all the examples I can find are for tabular data. I am going to end up with ~7300 matrices of size 563*1001*40 (one for each day over the 20 years). What is the best tool for managing such a large data set?
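A rough size estimate shows why all the daily means cannot sit in memory at once. This assumes double precision (MATLAB's default, 8 bytes per value), no compression, and only the daily means, not the raw 3-hourly data:

```latex
\begin{aligned}
563 \times 1001 \times 40 &= 22\,542\,520 \ \text{values per day}\\
22\,542\,520 \times 8\ \text{bytes} &\approx 180\ \text{MB per day}\\
180\ \text{MB} \times 7300\ \text{days} &\approx 1.3\ \text{TB}
\end{aligned}
```

Storing the daily means as single precision (4 bytes per value) halves this to roughly 90 MB per day and about 0.66 TB in total, which is still far more than typical RAM.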
I will be extracting sections of the data by latitude, longitude and time to do composite analysis, but I can't do that until I get the data into a workable format.
Thanks in advance,
Belinda
Comments
Belinda Finlay on 1 Sep 2020
I would like to be able to run a script that accesses areas of the grid over the 20 years to develop composite plots. Does that make sense?
Madhav Thakker on 9 Sep 2020
  1. Read data for 1 day (or smaller duration).
  2. Do some analysis.
  3. Remove the variable from RAM.
  4. Read for next day.
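The read/process/discard loop above keeps memory use bounded by one day of data plus whatever result is being accumulated. Below is a minimal sketch on a small synthetic grid. readOneDay is a hypothetical stand-in for however the raw 3-hourly files are actually read (here it just returns random numbers), and the grid sizes are shrunk from 563 x 1001 x 40 so the example runs quickly.

```matlab
% Read one day, process it, discard it, repeat (small synthetic grid).
nLat = 5; nLon = 6; nDepth = 3; nDays = 10;

% Hypothetical reader: returns one day of 3-hourly data (8 time steps).
readOneDay = @(d) rand(nLat, nLon, nDepth, 8, 'single');

runningSum = zeros(nLat, nLon, nDepth);      % only this persists between days
for d = 1:nDays
    raw = readOneDay(d);                     % 1. read one day
    daily = mean(raw, 4);                    % 2. analyse: mean over the 8 time steps
    runningSum = runningSum + double(daily); %    accumulate for a composite
    clear raw daily                          % 3. free the day's data
end                                          % 4. move on to the next day
compositeMean = runningSum / nDays;          % mean over all processed days
```

The same pattern works for any statistic that can be updated incrementally (sums, counts, minima, maxima); statistics such as medians need either all values at once or a different approach.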

Answers (1)

Adam Wyatt on 9 Sep 2020
Edited: Adam Wyatt on 9 Sep 2020
If you really do need all the data, then you can also use "matfile". You basically end up with a cell array of matfile objects and can access the variables within each file programmatically. I recommend using v7.3 files, since loading only part of a variable requires that format:
  1. Load daily data
  2. Process daily data
  3. Save daily data as *.mat file
  4. Repeat steps 1-3 for each day, saving each result to a new *.mat file; I recommend using numeric suffixes for the filenames
  5. Create cell array of matfile objects - call that m
  6. Access data via m{indx}.variablename
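Steps 1-6 can be sketched as follows. This is a minimal illustration on a small synthetic grid, not Adam's actual code: the variable name T, the folder, and the day_NNNNN.mat naming scheme are hypothetical, and rand stands in for the real 3-hourly input.

```matlab
% Steps 1-6 on a small synthetic grid (real grid: 563 x 1001 x 40).
nLat = 5; nLon = 6; nDepth = 3; nDays = 4;
outDir = fullfile(tempdir, 'daily_means');          % hypothetical folder
if ~exist(outDir, 'dir'), mkdir(outDir); end

for k = 1:nDays
    raw = rand(nLat, nLon, nDepth, 8, 'single');    % 1. stand-in for one day of 3-hourly data
    T = mean(raw, 4);                               % 2. daily average
    fname = fullfile(outDir, sprintf('day_%05d.mat', k));  % numeric suffix
    save(fname, 'T', '-v7.3');                      % 3. one v7.3 file per day
end                                                 % 4. repeat for each day

% 5. One matfile object per daily file; nothing is loaded yet.
m = cell(1, nDays);
for k = 1:nDays
    m{k} = matfile(fullfile(outDir, sprintf('day_%05d.mat', k)));
end

% 6. Access a variable in one file (this reads the whole 3-D array).
day3 = m{3}.T;
```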
You can use index notation to access part of the array within the file (with some restrictions).
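As a sketch of that partial access, the example below writes one small file and then reads only a sub-block of it; size(mf, 'T') reports the dimensions without loading the data. The file name and variable name T are hypothetical, and the indices here are simple ranges and colons, which stay within the documented restrictions.

```matlab
% Write one small demo file with known values (synthetic data).
fname = fullfile(tempdir, 'partial_demo.mat');
T = reshape(single(1:5*6*3), 5, 6, 3);   % value = linear index
save(fname, 'T', '-v7.3');
clear T

mf = matfile(fname);
sz = size(mf, 'T');               % dimensions, without reading the data
box = mf.T(2:4, 3:5, 1:2);        % lat rows 2-4, lon cols 3-5, depths 1-2 only
topLayer = mf.T(:, :, 1);         % full lat-lon slice at the first depth level
```

With the real data, the same call reads just the requested lat/lon/depth box from each daily file, which is what makes regional composites over many files practical.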
I've successfully used this method to access and process tens of GB of MAT-file data (that is the compressed file size; the actual data size was of a similar order to yours).
I even created a class that enabled me to access the data more easily and perform other operations.
Do you really have a different number of data points for each year, i.e. do you really need cell arrays? Try to avoid cells if possible.
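One way to avoid per-year cell arrays is to keep a single flat list of daily files indexed by a datetime vector; leap years then simply contribute one extra entry. The sketch below assumes a hypothetical naming scheme (one file per day named T_yyyymmdd.mat, variable T), uses five synthetic days in which day k holds the value k everywhere, and builds a composite of a sub-region over all January days.

```matlab
nLat = 5; nLon = 6; nDepth = 3;
dates = datetime(2000,12,30) + caldays((0:4)');   % 30 Dec 2000 ... 3 Jan 2001
outDir = fullfile(tempdir, 'dated_means');        % hypothetical folder
if ~exist(outDir, 'dir'), mkdir(outDir); end
% Hypothetical naming scheme: one file per day, named by its date.
fname = @(d) fullfile(outDir, sprintf('T_%04d%02d%02d.mat', year(d), month(d), day(d)));

for k = 1:numel(dates)
    T = k * ones(nLat, nLon, nDepth, 'single');   % synthetic: day k holds the value k
    save(fname(dates(k)), 'T', '-v7.3');
end

% Composite of a sub-region over every January day in the record.
sel = find(month(dates) == 1);
latIdx = 2:4;  lonIdx = 3:5;                       % hypothetical region of interest
compSum = zeros(numel(latIdx), numel(lonIdx), nDepth);
for k = sel'
    mf = matfile(fname(dates(k)));
    compSum = compSum + double(mf.T(latIdx, lonIdx, :));  % read only the box
end
composite = compSum / numel(sel);
```

Any time-based selection (a season, a set of event dates, specific years) then reduces to a logical or index vector over dates, with no year boundaries to manage.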

Release: R2019b
