Boxplot and mean for selected range of temporal datasets

Hello there,
I have a dataset containing three variables: z_i (depth), HOUR (time, in numeric format), and EP, the parameter whose variability over depth and time I want to examine. I would like to compute the median, min-max, and mean of EP over a selected range of z_i for each time, and then plot them all together as box plots with the mean overlaid (in one figure), with time on the x-axis and EP on the y-axis (preferably on a log-10 scale, as the differences are very small).
Please see the attached file and the illustration of the workflow described above:
Thank you!

1 Comment

Hello,
Have you started on something? What issue are you facing?

Answers (1)

Hi Adi
As per my understanding, you would like to filter a dataset based on a specified depth range (‘z_i’) and calculate statistical measures (median, min, max, mean) of the parameter ‘EP’ for each hour (‘HOUR’).
From the ‘data_w.mat’ file, I see that ‘EP’ and ‘HOUR’ are matrices with 6 rows (possibly representing different observations or repetitions) and 9991 columns, while ‘z_i’ is a vector with 9991 elements representing depth.
You can process this data in MATLAB as follows –
1. Load the data
load('data_w.mat'); % This loads EP, HOUR, and z_i
2. Filter the data
filtered_indices = (z_i >= z_min) & (z_i <= z_max); % z_min and z_max are the bounds of the selected depth range and must be defined first
filtered_EP = EP(:, filtered_indices);
filtered_HOUR = HOUR(:, filtered_indices);
filtered_EP = filtered_EP(:);% Flattening the matrices
filtered_HOUR = filtered_HOUR(:);
3. Compute the statistics
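unique_hours = unique(filtered_HOUR); % distinct sampling times
unique_hours(isnan(unique_hours)) = []; % drop NaN entries, if any
[medians, mins, maxs, means] = deal(zeros(1, numel(unique_hours))); % preallocate the outputs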
for i = 1:length(unique_hours)
hour_data = filtered_EP(filtered_HOUR == unique_hours(i));
medians(i) = median(hour_data);
mins(i) = min(hour_data);
maxs(i) = max(hour_data);
means(i) = mean(hour_data);
end
4. Plotting
figure;
hold on;
boxplot(filtered_EP, filtered_HOUR, 'Colors', [0.7 0.7 0.7], 'Symbol', '');
plot(1:numel(unique_hours), means, 'ro-', 'LineWidth', 1.5, 'DisplayName', 'Mean'); % boxplot draws group k at x = k
set(gca, 'YScale', 'log');
For more information regarding the ‘boxplot’ function, kindly refer to the following documentation - https://www.mathworks.com/help/stats/boxplot.html
I hope this helps.
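As a side note, the per-time statistics in step 3 can also be computed without an explicit loop. The following is only a minimal sketch on synthetic stand-in values (the variables h and ep and their contents are hypothetical, not taken from ‘data_w.mat’); it uses the third output of unique as a group index and accumarray to apply each statistic per group.

```matlab
% Synthetic stand-ins (hypothetical values, not the data from data_w.mat)
h  = [1; 1; 1; 2; 2; 2];                      % group label (e.g. sampling time) per sample
ep = [1e-9; 3e-9; 5e-9; 2e-8; 4e-8; 6e-8];    % one EP-like value per sample

% Map every sample to the index of its group
[unique_h, ~, idx] = unique(h);

% One statistic per group, without an explicit loop
means   = accumarray(idx, ep, [], @mean);
medians = accumarray(idx, ep, [], @median);
mins    = accumarray(idx, ep, [], @min);
maxs    = accumarray(idx, ep, [], @max);
```

Each output has one entry per distinct value in h, in the same sorted order that boxplot uses for a numeric grouping variable. If the real data contain NaN, they would need to be removed first or handled inside the function handles.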

8 Comments

Thank you @Shishir Reddy for your suggestion. The six rows in the dataset I attached represent six sampling times, which is why the HOUR matrix also has six rows. The columns correspond to the variation of EP over depth z_i. I'm sorry, but I don't understand what you mean by ‘unique_hours’. My apologies if my case was not clear enough.
Warm regards
load('data_w.mat'); % This loads EP, HOUR, and z_i
z_min = min(z_i);
z_max = max(z_i);
filtered_indices = (z_i >= z_min) & (z_i <= z_max); % z_min, z_max, depends on the selected range
filtered_EP = EP(:, filtered_indices);
filtered_HOUR = HOUR(:, filtered_indices);
filtered_EP = filtered_EP(:);% Flattening the matrices
filtered_HOUR = filtered_HOUR(:);
unique_hours = unique(filtered_HOUR);
unique_hours(isnan(unique_hours)) = [];
NH = numel(unique_hours);
medians = zeros(1,NH);
mins = zeros(1,NH);
maxs = zeros(1,NH);
means = zeros(1,NH);
exp_log_means = zeros(1,NH);
for i = 1:NH
hour_data = filtered_EP(filtered_HOUR == unique_hours(i));
medians(i) = median(hour_data,'omitnan'); % skip NaN, consistent with the mean below
mins(i) = min(hour_data);
maxs(i) = max(hour_data);
means(i) = mean(hour_data,'omitnan');
exp_log_means(i) = exp(mean(log(hour_data),'omitnan'));
end
figure;
hold on;
boxplot(filtered_EP, filtered_HOUR);
plot(1:NH, means, 'go-', 'LineWidth', 1.5);
plot(1:NH, exp_log_means, 'ko-', 'LineWidth', 1.5);
set(gca, 'YScale', 'log');
[hh,mm,ss] = hms(hours(unique_hours));
xticklabels(string(datetime(0,0,0,hh,mm,round(ss),'Format','HH:mm:ss')))
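The comment does not say why exp(mean(log(...))) is plotted next to the plain mean. One plausible reading is that it is the geometric mean, which is the average taken in log space and therefore sits more naturally on a log-scaled y-axis. A minimal sketch with made-up values (x is hypothetical):

```matlab
% Hypothetical values spanning two decades
x = [1e-9, 1e-8, 1e-7];

arith_mean = mean(x);              % dominated by the largest value
geo_mean   = exp(mean(log(x)));    % average in log space, mapped back

% On a log axis geo_mean sits at the visual centre of the three points,
% while arith_mean lies closer to the largest one.
```

This only works for strictly positive data, since log of zero or of a negative value does not give a usable real number.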
Thank you @Voss! That really helps. By the way, which lines should I change to show the x-axis in datetime format?
Maybe something like this at the end, assuming you meant x-axis:
[hh,mm,ss] = hms(hours(unique_hours));
xticklabels(string(datetime(0,0,0,hh,mm,round(ss),'Format','HH:mm:ss')))
The zeros in the datetime call represent year, month, day; if your hour data (~40725 hours) represent an offset from some date, you can include that information instead of using zeros, and modify the 'Format' accordingly. See datetime.
(I've modified my previous comment to include these lines of code to show the effect.)
Thank you @Voss. Actually, the x-axis is the sampling time, which comes from the first column of the HOUR matrix only (the second and all following columns are identical to column 1). Sorry for the confusion that led you to define unique_hours, which in turn confused me, but it's fine since the values are the same.
So, my intention is to have the x-axis tick labels show the hour and date of each sampling (something like 01:00, 11/07/2011; 02:00, 11/07/2011; etc.). I don't know whether that is possible.
PS. Taking the mean of EP over the selected depth range gives 6 values each for the mean, min, max, and median. Those 6 outputs correspond to the 6 sampling times I described above.
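Since each row of EP is one sampling time, the statistics described here can also be taken row by row along the depth dimension. This is only a sketch under that assumption, using small synthetic arrays (the values, z_min, and z_max below are hypothetical; the real EP is 6-by-9991):

```matlab
% Synthetic stand-ins (hypothetical values)
z_i = [10 20 30 40 50];                       % depth of each column
EP  = [1 2 3 4 5; 10 20 30 40 50] * 1e-9;     % 2 sampling times x 5 depths
z_min = 20;  z_max = 40;                      % assumed depth range

cols   = (z_i >= z_min) & (z_i <= z_max);     % columns inside the depth range
EP_sel = EP(:, cols);

% Dimension 2 runs along depth, so each result has one value per row
row_means   = mean(EP_sel, 2, 'omitnan');
row_medians = median(EP_sel, 2, 'omitnan');
row_mins    = min(EP_sel, [], 2);             % min and max skip NaN by default
row_maxs    = max(EP_sel, [], 2);
```

With the real data this would give four 6-by-1 vectors, one entry per sampling time.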
Anyway @Voss, do you know why the plotted time differs from the time read by Excel? Is there any adjustment I should make?
Solved by adding these lines:
datetimeDate = datetime(unique_hours, 'ConvertFrom', 'datenum','Format','dd/MM/yy HH:mm');
xticklabels(string(datetimeDate))
Ah, so HOUR is actually days.
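To illustrate the point about units, here is a small sketch with a made-up timestamp (11 July 2011, 01:00 is hypothetical, chosen to match the label format mentioned above). A serial date number counts days, so one hour is 1/24 of a unit. The last line is an additional assumption on my part: if the values are Excel serial dates rather than MATLAB date numbers (a value near 40725 would fall in mid-2011 in Excel's system), 'ConvertFrom','excel' would be the matching option.

```matlab
% Hypothetical sampling time: 11 July 2011, 01:00
d = datenum(2011, 7, 11, 1, 0, 0);    % serial date number, in days

% Interpreting d as a date number recovers the timestamp
t = datetime(d, 'ConvertFrom', 'datenum', 'Format', 'dd/MM/yy HH:mm');

% One hour later is 1/24 of a day further on, not 1 unit
d_next = d + 1/24;
t_next = datetime(d_next, 'ConvertFrom', 'datenum', 'Format', 'dd/MM/yy HH:mm');

% Assumption: an Excel serial date uses a different epoch (1900 date system)
t_xl = datetime(40725, 'ConvertFrom', 'excel');
```

Comparing the output of both conversions against the dates shown in Excel should show which epoch the HOUR values actually use.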

Release

R2022a

Asked:

on 5 Nov 2024

Commented:

on 7 Nov 2024
