How can I import multiple .CSV files in MATLAB and process the data of each file?
Hi all. I have a folder containing 1488 CSV files named in the form 2016-05-day-T-hour-h-minute.csv (each has 3 columns and 2304 rows, with a single numerical value in each cell). I want MATLAB to read every cell of every CSV file so that I can process and analyse these numbers. I have a problem with the variable F: when I check what F is, it shows the name of the last file in the folder. As a consequence, the analysis of the data (eta, north and west) is carried out using only the numbers in the last file.
After running the code, typing F gives the following:
F=
'Files\2016\May\2016-05-31T23h36.csv'
which is my last file. How can I make MATLAB read the cells in ALL the files so that I can create a vector to use in my analysis?
In other words, I need to apply the eta,north,west equations to all files and not just the last one.
Thanks
P = 'Files\2016\May';
S = dir(fullfile(P,'*.csv'));
for k = 1:numel(S)
F = fullfile(P,S(k).name);
S(k).data = csvread(F,1);
end
%This analyses all the data
eta = S(k).data(:,1)/100; % Displacement 1
north = S(k).data(:,2)/100; % Displacement 2
west = S(k).data(:,3)/100; % Displacement 3
end
Accepted Answer
It looks like you are on the right track with your looping on the file names.
It is a little confusing adding the data as an additional field to your existing file list structure S, but it isn't wrong.
I'm not sure what csvread(F,1) does. The documentation describes using either just one argument, or additional arguments for the starting row and column; I don't know what it does with only two arguments.
After the loop completes, k will equal the number of files, so your lines, eta = S(k).data ... west = S(k).data will only compute values for the last file.
You could either use a loop to assign those or I think this would work
% put all of the data into a n by 3 matrix
data = [vertcat(S(:).data)]
% extract the column data
eta = data(:,1)/100
north = data(:,2)/100
west = data(:,3)/100
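As a small sketch of the point about k, using made-up data in place of the imported files (the struct contents and file names here are hypothetical): after the loop finishes, k keeps its final value, so S(k) refers to the last element only, whereas vertcat over the comma-separated list stacks the rows from every file.

```matlab
% Hypothetical stand-in for the imported data: three "files" of 2 rows x 3 columns.
S = struct('name', {'a.csv','b.csv','c.csv'});
for k = 1:numel(S)
    S(k).data = k*100*ones(2,3);   % made-up values, not real measurements
end

% After the loop k still holds its final value, so this uses the last file only.
lastOnly = S(k).data(:,1)/100;

% Stacking the comma-separated list S.data gives one matrix with all rows.
data = vertcat(S.data);
eta  = data(:,1)/100;
```

Here lastOnly has 2 rows while eta has 6, one pair of rows per file.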
11 Comments
Christian Scalia
on 22 Jan 2021
I think I was making my life more complicated with the loop. Your answer gives me exactly what I needed, thanks a lot!
PS: I was using csvread(F,1) because I wanted the script to read from the second row onwards, as row 1 is a text row.
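For reference, a minimal sketch of skipping one text header row, assuming a small temporary file with made-up values. It uses the documented three-argument form of csvread (zero-based row and column offsets) and, as an alternative, readmatrix, which is available from R2019a and is what the documentation recommends in place of csvread.

```matlab
% Write a small temporary CSV with one text header row (made-up values).
F = [tempname '.csv'];
fid = fopen(F,'w');
fprintf(fid,'eta,north,west\n');
fprintf(fid,'%d,%d,%d\n',[10 20 30; 40 50 60].');
fclose(fid);

% Documented three-argument form: start at row offset 1, column offset 0.
M1 = csvread(F,1,0);

% readmatrix (R2019a and later) with an explicit header-line count.
M2 = readmatrix(F,'NumHeaderLines',1);

delete(F);   % clean up the temporary file
```

Both calls should return the same 2-by-3 numeric matrix for this file.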
"It is a little confusing adding the data as an additional field to your existing file list structure S, but it isn't wrong."
Many of my answers over many years use this approach. I often use this approach as it is simpler and easier to implement than creating an entirely new structure or cell array for storing the imported data. Not only that, it keeps the filenames and filedata together in one structure element (very handy when sorting).
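A short sketch of the sorting benefit, assuming a hypothetical file list whose data field is already filled with made-up values: one index vector reorders the names and the data together.

```matlab
% Hypothetical file list with data already attached (values are made up).
S = struct('name', {'2016-05-02T00h06.csv','2016-05-01T12h36.csv','2016-05-01T01h06.csv'}, ...
           'data', {[3 3 3],[2 2 2],[1 1 1]});

% Sort by name; indexing the structure array keeps each name with its data.
[~,idx] = sort({S.name});
S = S(idx);
```

With separate arrays for names and data, the same index would have to be applied to each array by hand.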
Jon
on 22 Jan 2021
Good points, makes perfect sense. Thanks for helping me understand the benefits of adding the data to the existing structure.
@Jon I realised that the problem I'm having could be fixed by modifying your answer. The code runs through all my files and extracts the data (I'm happy with that); however, it is the way it stores the data that I don't like. If I create an n by 3 matrix, as you suggested, by using
data = [vertcat(S(:).data)]
then all the data from all of the files will be inserted into just one n by 3 matrix. The point of my analysis is that I need to analyse the file one by one and keep the information from each file separated. So instead of having one n by 3 matrix, I need a separate n by 3 matrix for each file, and I need to run the script on each matrix in turn. This is the script I modified, but it still has the problem of storing all the information in one matrix.
Anything you have in mind that could help me with this problem?
P = 'Files\2016\May';
S = dir(fullfile(P,'*.csv'));
for k = 1:numel(S)
F = fullfile(P,S(k).name);
S(k).data = csvread(F,1);
data=[vertcat(S(:).data)];
eta =data(:,1)/100
north = data(:,2)/100
west = data(:,3)/100
end
@Stephen Cobeldick as @Jon says in his answer, "After the loop completes, k will equal the number of files, so your lines, eta = S(k).data ... west = S(k).data will only compute values for the last file". So by applying your approach the script will end up analysing the data from the last file, hence I get results from just the last file. Is there any chance, perhaps also having a look at what I wrote above quoting Jon, that you could help me?
Thanks a lot!
"...by applying your approach the script will end up analysing the data from the last file..."
No, that is your approach from your question, which uses the single index k after the loop has finished to access the single last structure element (I don't recall ever doing that). My approach would be to process all of the structure elements, either using a loop or a comma-separated list:
"The point of my analysis is that I need to analyse the file one by one and keep the information from each file separated"
Sure, then just loop over the elements of S (from your original question):
%This analyses all the data
for k = 1:numel(S)
eta = S(k).data(:,1)/100; % Displacement 1
north = S(k).data(:,2)/100; % Displacement 2
west = S(k).data(:,3)/100; % Displacement 3
... whatever you want to do with these arrays.
end
If you don't want to merge all of the data together (as Jon's answer shows), then don't merge it together.
Is it required to have all of the data from all files simultaneously in memory? Would it be possible to process the data as soon as it is imported?
@Stephen Cobeldick I get this error:
Reference to non-existent field 'data'.
Error in script_to_export (line 8)
eta = S(k).data(:,1)/100; % Displacement 1
For some reason, you removed the code that actually imports the files. File data does not import itself.
P = 'Files\2016\May';
S = dir(fullfile(P,'*.csv'));
for k = 1:numel(S) % why did you remove this?
F = fullfile(P,S(k).name); % why did you remove this?
S(k).data = csvread(F,1); % why did you remove this?
end
%
for k = 1:numel(S) % this loop is actually the only change from your question.
eta = S(k).data(:,1)/100; % Displacement 1
north = S(k).data(:,2)/100; % Displacement 2
west = S(k).data(:,3)/100; % Displacement 3
... whatever you want to do with these arrays.
end
It is not clear to me why you need all file data to be in memory at once. While you certainly can, do you need to?
P = 'Files\2016\May';
S = dir(fullfile(P,'*.csv'));
for k = 1:numel(S)
F = fullfile(P,S(k).name);
M = csvread(F,1);
eta = M(:,1)/100; % Displacement 1
north = M(:,2)/100; % Displacement 2
west = M(:,3)/100; % Displacement 3
... whatever you want to do with these arrays.
end
Christian Scalia
on 25 Jan 2021
"Is it required to have all of the data from all files simultaneously in memory? Would it be possible to process the data as soon as it is imported?"
No, it is not required to have all the data from all the files simultaneously in memory. Ideally, processing the data as soon as it is imported is what I need. So the general idea is to import the data from the 1st file, process it, store it, import the data from the 2nd file, process it, store it, and so on...
Using the script above I have the same problem I had before: it reads all the files but it processes the data from just the last file.
"Using the script above I have the same problem I had before: it reads all the files but it processes the data from just the last file."
I doubt that. If you followed my example and completed the line marked "whatever you want to do with these arrays" then my code processes all of the data from all of the files. Most likely your processing does not allocate its outputs to any arrays (or save the data file), e.g. you will need something like this:
N = numel(S);
myoutput = nan(1,N); % preallocate
for k = 1:N
.. % importing etc.
myoutput(k) = .. % !!! store the output of your calculation !!!
end
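One way the template above might be filled in, as a sketch only: the temporary folder, the file names, the values and the per-file statistic (the maximum of eta) are all assumptions made for illustration, not part of the original data set.

```matlab
% Create a temporary folder with three small CSV files (made-up values).
P = tempname;
mkdir(P);
for k = 1:3
    fid = fopen(fullfile(P,sprintf('file%d.csv',k)),'w');
    fprintf(fid,'eta,north,west\n');
    fprintf(fid,'%d,%d,%d\n',k*[100 200 300; 400 500 600].');
    fclose(fid);
end

S = dir(fullfile(P,'*.csv'));
N = numel(S);
maxEta = nan(1,N);              % preallocate one slot per file
for k = 1:N
    F = fullfile(P,S(k).name);
    M = csvread(F,1,0);         % skip the header row
    eta = M(:,1)/100;
    maxEta(k) = max(eta);       % hypothetical per-file result, stored by index
end

rmdir(P,'s');                   % remove the temporary folder and its files
```

After the loop, maxEta holds one value per file rather than only the value from the last iteration.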
If you do not use indexing to allocate to an output array then of course all of your loop iterations will just overwrite the previous one. Using indexing to store data is an important basic MATLAB concept:
What size array, what indexing to use, etc. depends on the class and size of the output data, which so far you have not told us anything about. You might find this useful information too:
P = 'Files\2016\May';
S = dir(fullfile(P,'*.csv'));
for k = 1:numel(S)
F = fullfile(P,S(k).name);
M = csvread(F,1);
eta = M(:,1)/100; % Displacement 1
north = M(:,2)/100; % Displacement 2
west = M(:,3)/100; % Displacement 3
... whatever you want to do with these arrays.
end
@Stephen Cobeldick this only gives me the results for the last file contained in the folder. By looking at the workspace I can see that
F='Files\2016\May\2016-05-01T01h06.csv'
which is the last file contained in the folder.
"...this only gives me the results for the last file contained in the folder."
Yes, because you have not made any attempt to store your data in the loop. My last comment explained that.
"By looking at the workspace I can see that ... which is the last file contained in the folder."
That is exactly what is expected: on each loop iteration F contains the name of the current file being processed. After the loop has completed, it contains the name of the last file processed. That is how loops work.
It is irrelevant to the issue that you are having, which is that you are not storing your data during the loop.
Have a look at the code in your original question. Note how the index k is used to store the imported file data on each loop iteration. Thus after the loop all of the data is stored and can be used later.
Question: does your code do anything like that with its output/results? (hint: no).
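When the result for each file is not a single number, a cell array or an extra structure field can hold it. A minimal sketch, assuming made-up data of different lengths and a hypothetical calculation (removing the mean from eta):

```matlab
% Hypothetical stand-in for imported files of different lengths (made-up values).
S = struct('name', {'a.csv','b.csv'}, 'data', {[100 0 0; 300 0 0], [500 0 0]});

results = cell(1,numel(S));          % one cell per file
for k = 1:numel(S)
    eta = S(k).data(:,1)/100;
    results{k} = eta - mean(eta);    % hypothetical calculation; size varies per file
    S(k).etaMean = mean(eta);        % alternatively, store next to the file entry
end
```

Either way, the index k decides where each iteration's result is stored, so nothing is overwritten.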