Reading and storing data efficiently - from spreadsheet to cell array
2 views (last 30 days)
Show older comments
Hello everybody,
I am wondering what is the most efficient way to read data from a spreadsheet file and store it in the most efficient data structure. The question is general, but I also provide the way I do to welcome any suggestion.
I read data from very long spreadsheet (hundred-of-thousands rows for each sheet), and I need to store only one column for each sheet.
for j = 1:12
Data = readtable(filename, 'Sheet', j+1, 'ReadVariableNames', false);
Data = table2array(Data(:,3));
Data = rmmissing(Data);
filenameOutput = [outputDir, indexName{i}, monthName{j}, '.mat'];
save(filenameOutput, 'Data');
end
What is the most efficient way to do it?
And what about if I want to store the data in a cell format (that is, a N(i)x1 cell where N(i) is the variable length of each column)?
Thank you very much in advance
0 Comments
Answers (1)
Deepak
on 16 Jan 2025
We can optimize the process of reading and storing data from spreadsheets by using "readmatrix" to directly fetch only the required column, reducing memory usage and improving speed compared to "readtable". By specifying the column range, we avoid loading unnecessary data. Additionally, storing the data in a preallocated cell array minimizes dynamic resizing during the loop, enhancing efficiency. Saving each dataset immediately after processing reduces memory overhead. These optimizations streamline file I/O operations and improve overall performance when handling large datasets in MATLAB.
Below is an optimized MATLAB code for the same:
% Predefine cell array for storing column data
numSheets = 12;
columnData = cell(numSheets, 1);
for j = 1:numSheets
% Read only the third column from the current sheet
data = readmatrix(filename, 'Sheet', j+1, 'Range', 'C:C', 'OutputType', 'double');
% Remove missing values
data = rmmissing(data);
% Store the data in a cell array
columnData{j} = data;
% Save the data to a .mat file
filenameOutput = fullfile(outputDir, [indexName{i}, monthName{j}, '.mat']);
save(filenameOutput, 'data');
end
Please find attached the documentation of functions used for reference:
I hope this helps in resolving the issue.
0 Comments
See Also
Categories
Find more on Spreadsheets in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!