Run statistical tests on multiple csvs
I have many days' worth of heart rate data stored in separate csvs, and I'm currently running a t-test on the data from the 3 hrs before and after 12am. I was going to run this manually on all 30 different days, but I was wondering whether there is a way to loop through all the days and return all the p-values at once.
1 Comment
Rik
on 20 Apr 2021
That seems very likely. Once you create a file list (e.g. with dir) you should be able to do the processing in a loop.
Do you have a specific question about implementing this?
Answers (1)
Manash Sahoo
on 20 Apr 2021
Edited: Manash Sahoo
on 20 Apr 2021
Store your data in a folder, use the "dir" command to return the filenames, and loop through them.
For example:
% Load your heart rate data. You can get the names of files and folders
% using the "dir" command.
files = dir(strcat(filepath,"\*.csv")) % Filepath would be the path where your csvs are located.
pval = {};
for i = 1:length(files)
    HRDat = readmatrix(files(i).name); % You may need to edit this per your filepath.
    % Do your analysis here, and return your pvalue to pval{i}.
end
Your pvalues in the cell array "pval" will thus correspond to the files in the struct array "files.name". This is usually the way I do things with heart rate data. Let me know if you have any further questions!
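To make that correspondence explicit, one option is to put the file names and p-values side by side in a table. This is only a sketch: the struct and the p-values below are hypothetical stand-ins for what dir and your loop would produce, and it assumes every pval{i} holds a single scalar.

```matlab
% Hypothetical stand-ins for the output of dir and of the loop above.
files = struct('name', {'day01.csv','day02.csv','day03.csv'});
pval  = {0.04, 0.51, 0.002};            % illustrative values only, not real results

% Turn both into column vectors and pair them row by row.
names   = string({files.name})';        % 3x1 string column of file names
p       = cell2mat(pval)';              % 3x1 double column of p-values
results = table(names, p, 'VariableNames', {'file','p'});
```

Each row of results then holds one file name next to its p-value, which is easier to check by eye than two separate arrays.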
EDIT: Fixed the code.
MS
7 Comments
Rik
on 20 Apr 2021
- Pre-allocation tends to be faster, and since a p-value is a numeric value, using a double array is probably fine as well.
- You're using the length function, which returns the size of the largest dimension. Consider using numel or size instead.
- I personally prefer avoiding i as a variable name, so I changed that to n.
- Try to avoid using strcat to create a path. Using fullfile gives you the same flexibility, without having to worry about the correct filesep.
- As a last point: you forgot to index the struct inside the loop.
% Load your heart rate data. You can get the names of files and folders
% using the "dir" command.
files = dir(fullfile(filepath,'*.csv')); % Filepath would be the path where your csvs are located.
pval = NaN(numel(files),1);
for n = 1:numel(files)
    % dir also returns the folder, so build the full path to each file.
    HRDat = readmatrix(fullfile(files(n).folder,files(n).name));
    % Do your analysis here, and return your pvalue to pval(n).
end
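A small illustration of two of the points above, in case it helps. The folder names 'data' and 'hr' are made up for the example.

```matlab
% length returns the size of the largest dimension, numel counts all elements.
A = zeros(3,5);
nLongest  = length(A);    % 5
nElements = numel(A);     % 15
nRows     = size(A,1);    % 3

% fullfile inserts the platform's file separator for you.
% 'data' and 'hr' are hypothetical folder names.
p = fullfile('data','hr','*.csv');   % 'data\hr\*.csv' on Windows, 'data/hr/*.csv' elsewhere
```

For the struct array returned by dir the two functions happen to agree, because it is a column vector, but numel states the intent more clearly.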
Manash Sahoo
on 20 Apr 2021
Ah. Thanks for the pointers! This is indeed a much better solution.
Ross Thompson
on 20 Apr 2021
Rik
on 20 Apr 2021
That is a very low value: 0.4e-11. You could conclude that all your analyses have a p-value of effectively 0.
Otherwise you will have to look at the data you're using each iteration. When you do that, you will notice that you aren't actually using HRDat in the rest of your loop, so each iteration is using the exact same data.
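As a rough sketch of a loop body that does use HRDat in every iteration: the column layout below is an assumption (column 1 = time in hours relative to midnight, column 2 = heart rate), and so is the choice of ttest2 from the Statistics and Machine Learning Toolbox. Swap in whatever indexing and test match your files.

```matlab
% Assumed layout (hypothetical): column 1 = time in hours relative to
% midnight (negative = before 12am), column 2 = heart rate.
folder = pwd;                                % stand-in for the folder holding your csvs
files  = dir(fullfile(folder,'*.csv'));
pval   = NaN(numel(files),1);                % one p-value per file
for n = 1:numel(files)
    % Read this iteration's file, so each pass works on different data.
    HRDat = readmatrix(fullfile(files(n).folder,files(n).name));
    t  = HRDat(:,1);
    hr = HRDat(:,2);
    before = hr(t >= -3 & t < 0);            % the 3 hrs before midnight
    after  = hr(t >= 0  & t < 3);            % the 3 hrs after midnight
    % Two-sample t-test; the second output is the p-value.
    [~,pval(n)] = ttest2(before,after);
end
```

If the p-values still come out identical for every file, checking that before and after really change between iterations is a good first step.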
Ross Thompson
on 20 Apr 2021