Find faster way than compose to format table
9 views (last 30 days)
Show older comments
Catriona Fyffe
on 9 Apr 2024
Answered: Catriona Fyffe
on 28 Apr 2024
I have a large table of data which needs to be output as a csv and then read into a model. For some reason (which I cannot solve) the model will not take in values with scientific number formating (e.g. very small or large numbers, so 8.8817e-04, rather than 0.000888). To get around this I turn the table into text and use compose, but this is slow as the data are large. I have tried using round, which is fast, but Matlab still displays in scientific notation for some values and these remain in the csv, which does not solve the problem. Any faster ideas welcome!
Here's an example. Note I know you don't need all the DateTime info, but it gives the form of the data. In reality I have 30 years of data at least, which is why it is a problem. It also need to use a seperate workstation or HPC as the string/compose lines lead to out of memory problems on my own laptop, even if it can store this size of table ok.
Many thanks!
StartDate=datetime('01/01/2019 00:00','InputFormat','dd/MM/yyyy HH:mm');
EndDate=datetime('31/10/2020 23:00','InputFormat','dd/MM/yyyy HH:mm');
DateTime = StartDate:hours(1):EndDate;
DateTime = DateTime';
datasize =size(DateTime,1);
No_St = 307;
Data = rand(datasize,No_St);
Test_table = array2table(Data);
Test_table.Time = DateTime;
Test_table = movevars(Test_table,"Time","Before",1);
Table_str = string(Test_table{:,2:end}); %convert to string
Table_str = compose('%.6f',Table_str); %Reduce precision
3 Comments
Accepted Answer
Athanasios Paraskevopoulos
on 9 Apr 2024
Here’s an approach you can try:
1. Instead of converting the whole table to strings at once, process the data in smaller chunks. This can help manage memory usage more effectively.
2. Use MATLAB’s `fprintf` function to write directly to a CSV file in the desired format. This avoids creating large temporary string arrays.
Here’s an example of how you can implement this:
matlab
StartDate = datetime('01/01/2019 00:00', 'InputFormat', 'dd/MM/yyyy HH:mm');
EndDate = datetime('31/10/2020 23:00', 'InputFormat', 'dd/MM/yyyy HH:mm');
DateTime = StartDate:hours(1):EndDate;
DateTime = DateTime';
datasize = size(DateTime, 1);
No_St = 307;
Data = rand(datasize, No_St);
Test_table = array2table(Data);
Test_table.Time = DateTime;
Test_table = movevars(Test_table, "Time", "Before", 1);
% Define chunk size
chunkSize = 10000; % Adjust this value based on available memory
% Open file for writing
fid = fopen('output.csv', 'w');
% Write header
fprintf(fid, 'Time,');
fprintf(fid, '%s,', string(Test_table.Properties.VariableNames(2:end-1)));
fprintf(fid, '%s\n', string(Test_table.Properties.VariableNames(end)));
% Write data in chunks
for i = 1:chunkSize:height(Test_table)
endIndex = min(i + chunkSize - 1, height(Test_table));
chunk = Test_table(i:endIndex, :);
% Write each row
for j = 1:size(chunk, 1)
fprintf(fid, '%s,', datestr(chunk.Time(j), 'dd/mm/yyyy HH:MM'));
fprintf(fid, '%.6f,', chunk{j, 2:end-1});
fprintf(fid, '%.6f\n', chunk{j, end});
end
end
% Close file
fclose(fid);
This script processes the table in chunks and writes directly to a CSV file, formatting the numeric values to avoid scientific notation. The `chunkSize` variable can be adjusted based on your system’s memory capacity to optimize performance.
I hope that will help you
2 Comments
Stephen23
on 10 Apr 2024
" It is maybe a bit slow to write every row, but I can simplify this."
You do not need to write each row in a loop, Voss showed how to use FPRINTF effectively.
Avoid deprecated DATESTR.
More Answers (2)
Voss
on 9 Apr 2024
Rather than trying to find a faster alternative to compose to convert table data to appropriately formatting strings, the real problem you want to solve is: How to write a csv file with appropriately formatted numbers. Right?
Your code doesn't show how you are writing the file or what you are writing to it (i.e., does it include the table variable names? does it include the datetime column?), so I'll assume you only need the data in Test_table{:,2:end} to be written to the file.
% Creating the table
StartDate=datetime('01/01/2019 00:00','InputFormat','dd/MM/yyyy HH:mm');
EndDate=datetime('31/10/2020 23:00','InputFormat','dd/MM/yyyy HH:mm');
DateTime = StartDate:hours(1):EndDate;
DateTime = DateTime';
datasize =size(DateTime,1);
No_St = 307;
Data = rand(datasize,No_St);
Data(:,1) = Data(:,1)*1e-6; % make some data small, for testing
Test_table = array2table(Data);
Test_table.Time = DateTime;
Test_table = movevars(Test_table,"Time","Before",1);
Here's one thing you can try, which is to use fprintf with the desired precision:
tic
M = Test_table{:,2:end}.';
fmt = [repmat('%.6f,',1,size(M,1)) '\n'];
fid = fopen('test_matrix.csv','w');
fprintf(fid,fmt,M);
fclose(fid);
toc
For timing comparison, using compose and writecell:
tic
Table_str = compose('%.6f',Test_table{:,2:end}); %Reduce precision
writecell(Table_str,'test_cell.csv');
toc
Note that if you open these files in Excel, some data may show up in scientific notation. It's not really in scientific notation; that's just Excel doing its thing. To see the actual data as it really is stored in the file, use another program to view it, such as Notepad++.
See Also
Categories
Find more on Structures in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!