Reading and storing data efficiently - from spreadsheet to cell array

2 views (last 30 days)
Hello everybody,
I am wondering what is the most efficient way to read data from a spreadsheet file and store it in the most efficient data structure. The question is general, but I also provide the way I do to welcome any suggestion.
I read data from very long spreadsheet (hundred-of-thousands rows for each sheet), and I need to store only one column for each sheet.
for j = 1:12
Data = readtable(filename, 'Sheet', j+1, 'ReadVariableNames', false);
Data = table2array(Data(:,3));
Data = rmmissing(Data);
filenameOutput = [outputDir, indexName{i}, monthName{j}, '.mat'];
save(filenameOutput, 'Data');
end
What is the most efficient way to do it?
And what about if I want to store the data in a cell format (that is, a N(i)x1 cell where N(i) is the variable length of each column)?
Thank you very much in advance

Answers (1)

Deepak
Deepak on 16 Jan 2025
We can optimize the process of reading and storing data from spreadsheets by using "readmatrix" to directly fetch only the required column, reducing memory usage and improving speed compared to "readtable". By specifying the column range, we avoid loading unnecessary data. Additionally, storing the data in a preallocated cell array minimizes dynamic resizing during the loop, enhancing efficiency. Saving each dataset immediately after processing reduces memory overhead. These optimizations streamline file I/O operations and improve overall performance when handling large datasets in MATLAB.
Below is an optimized MATLAB code for the same:
% Predefine cell array for storing column data
numSheets = 12;
columnData = cell(numSheets, 1);
for j = 1:numSheets
% Read only the third column from the current sheet
data = readmatrix(filename, 'Sheet', j+1, 'Range', 'C:C', 'OutputType', 'double');
% Remove missing values
data = rmmissing(data);
% Store the data in a cell array
columnData{j} = data;
% Save the data to a .mat file
filenameOutput = fullfile(outputDir, [indexName{i}, monthName{j}, '.mat']);
save(filenameOutput, 'data');
end
Please find attached the documentation of functions used for reference:
I hope this helps in resolving the issue.

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!