How can I load multiple dat files with different name patterns and from different directories one after another to process them

Hi, I have some dat files, each of size rows*columns (9999*10), with names:
dir1/a_0.1_1
dir1/b_0.1_2
dir1/c_0.3_1
dir1/c_1.5_1
dir2/a_0.1_1
dir2/a_0.5_1
How can I load them sequentially and process each one? Is there a short for loop for this, especially when the files are in different folders? Thank you

 Accepted Answer

You can use this FEX submission. Download it and place it on the MATLAB path. To get the list of all .dat files, use it as follows
files = subdir('*.dat');
This works if your current folder is the top folder containing dir1, dir2, etc. It gives you the name and path of all the '.dat' files in the subfolders. Then read and process them like this
data = cell(1, length(files));
for i = 1:length(files)
    filename = files(i).name;
    % Load your file here with load(), readtable() or any other function that suits the data in the .dat files.
    % load() is only an assumption; it works for plain numeric text files.
    readData = load(filename);
    data{i} = readData; % save the data you read, or do any other processing
end
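If installing the FEX file is a hurdle, the built-in dir() accepts a recursive '**' wildcard from R2016b onwards and can serve the same purpose. The following is a minimal sketch, not the method from the answer above: it first creates a throwaway folder tree with small random files (the folder names, file names and sizes are made up for the demonstration), then lists and loads every .dat file below the top folder.

```matlab
% Build a small throwaway folder tree so the sketch is self-contained.
topDir   = tempname;                          % hypothetical top folder
subDirs  = {'dir1', 'dir2'};
datNames = {'a_0.1_1.dat', 'b_0.1_2.dat'};
for d = 1:numel(subDirs)
    mkdir(fullfile(topDir, subDirs{d}));
    for n = 1:numel(datNames)
        A = rand(4, 3);                       % stand-in for the 9999x10 data
        save(fullfile(topDir, subDirs{d}, datNames{n}), 'A', '-ascii');
    end
end

% List every .dat file in topDir and all of its subfolders (R2016b or later).
files = dir(fullfile(topDir, '**', '*.dat'));

% Load the files one after another; the folder field keeps same-named
% files from different subfolders apart.
data = cell(1, numel(files));
for i = 1:numel(files)
    fullName = fullfile(files(i).folder, files(i).name);
    data{i}  = load(fullName);                % numeric text file -> matrix
end
```

Because the full path is rebuilt from files(i).folder and files(i).name, the script can stay in the top folder while the data stays in its subfolders.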

7 Comments

Hello Mr. Hamza, thank you very much for your effort to solve my problem. It did not work for some reason, even though I tried many ways. Maybe the way I name the files is causing the problem, but these are the shortest names I can use to identify each file. Previously you answered how to locate all the dat files in different folders. I have downloaded that FEX submission but do not know how to add it to the MATLAB path, so for now I have put all the files in one folder. There are 20 dat files with consecutively numbered names, and the code I am running is also in this folder. Since I am not able to load one file after another sequentially and process it, I am repeating the code 20 times, and at the end MATLAB shows ''Error using horzcat Out of memory. Type HELP MEMORY for your options.'' If you look at my code, lines 80 to 92 are the only processing I need to do for each data file in turn, and lines 389 to 395, which bring the results into one data table to plot, are the only output I need from all the data files. Could you please help? I am really new to MATLAB coding.
@Mat1, the coding style you were using is very prone to bugs. It is best to use a for loop rather than copying and pasting each portion of code several times. I have updated the code and removed or commented out the previous unnecessary code. Even after optimization, your code will run a bit slowly because you are trying to interpolate 10^8 points, which requires a lot of memory. Consider reducing this number to a smaller value, e.g. 10^6 or 10^7.
Thank you very much Mr. Hamza for being so patient in modifying the code and making it clear and simple. It is still not loading all 20 files sequentially. Say on line 22 of your code I now put
files = dir('B:/event_time_13_0.1_300_1.dat');
and then repeat it with 2, 3, 4, ... at the end, so that I am loading the files one by one. My question is: when I load the files manually one after another and run the code, will lines 108 and 113 of your code
ave_time = ave_time + X';
ave_radius = ave_radius + Y';
keep the previous X and Y values in memory, so that the plot I am seeing
semilogx((ave_time_extr)',(ave_radius_extr)','b');
shows the mean of all the X, Y values of the 20 runs? From what I have found, it is just plotting the X, Y values of the current file that I load on line 22. Your code is versatile, but it is still not storing all the previous X, Y values to take their mean for the plot. Or perhaps I am making a mistake on line 22 by not loading the file name properly. Please advise. I am working in MATLAB R2018a on a Windows 10 PC. Thank you.
Why are you loading the files one by one? You should place all files in one folder and load them all sequentially using
files = dir('B:/*.dat');
The code will automatically read all the files one by one and apply the algorithm.
Hello Mr. Hamza, it Works !!!!!
files = dir('B:/et/*.dat');
Thank you so much. With your code it now takes just several minutes, where mine took several hours. God bless you for helping me learn.
You are welcome. Yes, I noticed that your code required a huge amount of memory, several tens of GB. It would have taken a long time. You can reduce the run time further by lowering the interpolation resolution from 10^8 to a smaller value, but this depends on your requirements.
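To see why the interpolation resolution matters so much, a rough estimate of the storage for a single double-precision vector is enough. This is only back-of-the-envelope arithmetic; the variable names are made up, and the real footprint of the script depends on how many such vectors and temporaries exist at the same time.

```matlab
% Each double-precision value occupies 8 bytes.
bytesPerDouble = 8;

nPointsLarge = 1e8;   % resolution discussed in the thread
nPointsSmall = 1e6;   % suggested smaller resolution

gibPerVectorLarge = nPointsLarge * bytesPerDouble / 2^30;   % roughly 0.75 GiB
mibPerVectorSmall = nPointsSmall * bytesPerDouble / 2^20;   % roughly 7.6 MiB

fprintf('1e8 points: about %.2f GiB per vector\n', gibPerVectorLarge);
fprintf('1e6 points: about %.2f MiB per vector\n', mibPerVectorSmall);
```

With X, Y, their running sums and the temporaries created by transposes and concatenation, a handful of 10^8-point vectors quickly adds up to several GB.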
In case you haven't noticed, the last 2 lines should be
ave_time_extr = ave_time/length(files);
ave_radius_extr = ave_radius/length(files);
I forgot to change those lines. Right now you are just summing the X and Y columns over all files. To get the mean, you need to divide the sums by length(files).
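The accumulate-then-divide pattern described here can be sketched on its own with a few short made-up vectors standing in for the interpolated Y values of each file. The cell array runs and the name ave_radius_mean are assumptions for illustration.

```matlab
% Three hypothetical runs, already interpolated to the same length.
runs = {[1 2 3], [3 4 5], [5 6 7]};

ave_radius = zeros(3, 1);              % running sum, one row per point
for i = 1:numel(runs)
    Y = runs{i};
    ave_radius = ave_radius + Y';      % accumulate, as in the loop body
end

% Dividing the sum by the number of runs turns it into the mean.
ave_radius_mean = ave_radius / numel(runs);
```

Without the final division the result is the sum over all runs, which is what the plot showed before the correction.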

More Answers (5)

Yes Mr. Hamza, you are exactly right. On the first day you advised me to download this FEX submission in case the .dat files are in different folders. I have downloaded it but do not know how to put that code on the MATLAB path, so I have to spend a lot of time bringing the files into the same folder before I can execute your code. Your code is excellent for this job, but my .dat files are generated on a computer cluster, where they all come out with the same name 'event_time'. When I put them in one folder to plot in MATLAB, I have to rename them every time or they overwrite each other. So it would be much more robust if I could load the .dat files from different subfolders while keeping the code in the main folder. Thank you very much.

3 Comments

You need to download the files from FEX and place them in the current folder of MATLAB. The current folder is the folder displayed in the Current Folder window in MATLAB.
Now, suppose that you have a folder structure like this
B:                              <- B drive of your computer
    other folders in the path
        folderTop               <--- this folder contains all the .dat files
            folder1
                file1.dat
                file2.dat
                ...
            folder2
                file1.dat
                file2.dat
                ...
            folder3
                file1.dat
                file2.dat
                ...
You can use subdir() as follows
files = subdir('B:/other folders/folderTop/*.dat');
It will give you a struct similar to the one returned by dir, but containing the files in all the subfolders. You can use the remaining code as before.
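As an alternative to copying the downloaded file into the current folder, a folder can be put on the MATLAB path with addpath(). The sketch below is self-contained and does not use the real FEX download: it writes a tiny stand-in function, fexDemoHello, into a temporary folder (both names are hypothetical), adds that folder to the path, and then calls the function from wherever the current folder happens to be.

```matlab
% Hypothetical folder standing in for wherever the FEX download was unzipped.
fexDir = fullfile(tempdir, 'fex_demo');
if ~exist(fexDir, 'dir')
    mkdir(fexDir);
end

% Write a tiny stand-in function file into that folder.
fid = fopen(fullfile(fexDir, 'fexDemoHello.m'), 'w');
fprintf(fid, 'function s = fexDemoHello()\ns = ''found on path'';\nend\n');
fclose(fid);

% Put the folder on the search path for this MATLAB session.
addpath(fexDir);
rehash;                                 % refresh the function cache

location = which('fexDemoHello');       % shows where MATLAB found it
msg = fexDemoHello();                   % callable from any current folder
```

For a real download, the same addpath() call with the unzipped folder makes its functions available for the session; savepath can keep the change for later sessions.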
Wow!! It works like a magic. Thank you so much Mr. Hamza.
Had I known this, or understood it when you first mentioned it, it would have saved me a lot of time. Now it seems I am learning something about MATLAB.
Now one last question: in line 89, where you put this command
index = find(isinf(x)); x(index) = x(index-1);
it works much better. The reason I used this was to remove Inf from the data, because interpolation does not work if there are NaN or Inf values, or the same value repeated in consecutive rows. Is there a general command to remove that kind of value from the table entirely, so that the number of rows remains the same for all the columns?
For NaN you can use fillmissing() to replace values. Since you also want to deal with Inf in the same way, first replace all Inf values with NaN and then use fillmissing(). For example
x = [1 2 inf 8 9 nan 15];
x(isinf(x)) = nan;
fillmissing(x, 'linear') % using linear will avoid repeated values.
ans =
1 2 5 8 9 12 15
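The question also mentions repeated values in consecutive rows, which interp1() rejects as non-unique sample points. One possible way to handle them, sketched here under the assumption that dropping the whole row (both x and y) is acceptable, is to keep only the first occurrence of each x with unique(). The data values below are made up.

```matlab
% Made-up data: x contains an Inf, a NaN and a repeated value.
x = [1 2 Inf 8 8 NaN 15];
y = [10 20 30 40 50 60 70];

% Treat Inf like NaN, then fill both by linear interpolation.
x(isinf(x)) = NaN;
x = fillmissing(x, 'linear');           % x is now [1 2 5 8 8 11.5 15]

% Keep the first occurrence of each x and the matching y, so that the
% two vectors stay aligned row for row.
[xu, iu] = unique(x, 'stable');
yu = y(iu);

% interp1 now accepts the sample points.
yq = interp1(xu, yu, [1.5 6.5]);
```

Note that this shortens x and y together; if every column must keep the original row count, the duplicates would have to be replaced rather than removed.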

Hello Mr. Hamza, one thing: in your corrected code, if in line 98 I use the m1 values of line 92 instead of 'interploation_points', the 20 interpolated X, Y pairs end up with unequal numbers of rows. The problem is that MATLAB then cannot take the average of columns with unequal lengths.
Is there any way this can be done in MATLAB with unequal matrix dimensions? If not, could the shorter columns be padded with zeros, so that the matrix takes the dimension of the longest column and the shorter ones are filled with zeros?

1 Comment

Can you explain again which variables have different lengths and which function is causing the error?

In this attachment, 'm' in line 34 is the value I am using to interpolate. Everything else remains as in your corrected code, except that I am using the m values instead of interploation_points. m changes each time a data file is loaded, so at the end ave_time and ave_radius are not calculated: because m varies, the number of interpolation points differs and every X, Y pair has a different number of rows.
I am sending the whole folder with the code and files. Thank you

4 Comments

First, you are not replacing the missing values, because the command
fillmissing(x, 'linear');
will not change x. You will need to assign its value
x = fillmissing(x, 'linear');
Also from line 19
m = (1* 10^3/((max(x)- min(x))/9999))
I get
m =
0.0014
If you want to use it as the number of points in linspace(), it must be a positive integer.
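If the intention is to derive the number of points from a desired spacing, the count can be rounded to a positive integer before it is passed to linspace(). The following is a sketch under that assumption only; the data values and the spacing step are made up and are not taken from the attached code.

```matlab
x    = [0.5 1.2 3.7 10.0];     % made-up sample positions
step = 0.25;                   % hypothetical desired spacing

% Number of points needed to cover [min(x), max(x)] with that spacing,
% rounded to an integer and never smaller than 2.
nPoints = max(2, round((max(x) - min(x)) / step) + 1);

xq = linspace(min(x), max(x), nPoints);   % evenly spaced query points
```

Here the range 9.5 divided by the spacing 0.25 gives 38 intervals, hence 39 points.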
Yes, you are right. I think I have made the code complicated. If, instead of a fixed interploation_points, I want to use m, a variable whose value changes inside the loop for every data file, then the ave_time matrix will consist of column vectors of different lengths. Will it then be possible to take the mean?
No. All columns must have equal length to take the mean; the mean only makes sense if they do. That is why you need to use a fixed number of interpolation points.
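A minimal sketch of the fixed-grid idea: each data set, whatever its own length, is interpolated onto the same query points, so every result has the same number of rows and can be averaged. The two data sets, the grid size N and all variable names here are made up for illustration.

```matlab
% Two made-up data sets with different numbers of samples.
t = {[1 2 4 8], [1 3 9]};
r = {[1 2 4 8], [2 6 18]};          % r{1} = t{1}, r{2} = 2*t{2}

% One fixed grid shared by all data sets, restricted to the range that
% every data set covers so interp1 never has to extrapolate.
N  = 5;
lo = max(cellfun(@min, t));
hi = min(cellfun(@max, t));
tq = linspace(lo, hi, N).';         % column vector of query points

% Interpolate every data set onto the same grid: one column per data set.
R = zeros(N, numel(t));
for i = 1:numel(t)
    R(:, i) = interp1(t{i}, r{i}, tq);
end

aveR = mean(R, 2);                  % row-wise mean over the data sets
```

Because every column of R has exactly N rows, mean() works directly, which would not be possible if each file used its own m.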

Asked:

az
on 16 May 2018

Commented:

az
on 23 May 2018
