Plot exceeding time limit due to large dataset

I have a dataset of UV levels recorded by a satellite, and I need to categorize how those levels rise and fall.
As the satellite orbits the Earth, there are periods when it cannot measure the UV levels.
The raw data looks close to a square wave.
I have to categorize the data into three classes: rising, where the satellite comes out of the influence of the Earth;
OnDuty, where the satellite is able to record data properly; and falling, where it is going behind the Earth.
The data is sampled at 1 Hz, so one value is recorded every second.
For now, I want to color these three classes red, green, and blue.
Each row of the dataset contains the timestamp and the corresponding UV value.
I am currently using a threshold to categorize the data. This works for a smaller dataset, but with a dataset of 80000 entries my code does not run efficiently.
Here is the code, where states stores the category of the ith value. The problem lies in the for loop that does the plotting.
function categorize_square_wave()
y_foo = readtable("80000entries.csv", Range='uv_values');
y_values = table2array(y_foo);
k = 1; % Sensitivity parameter
differences = diff(y_values);
% comments have been put to remove statistical computation
mu = mean(differences);
sigma = std(differences);
% Thresholds
T_high = mu + k * sigma;
T_low = mu - k * sigma;
% coded directly to reduce computation
% T_high = 500;
% T_low = -100;
% Categorizing based on threshold values
% Initialize states with zeros (default to dutiful)
states = zeros(1, length(differences));
% Assign states using vectorization
states(differences > T_high) = 1; % Rising
states(differences < T_low) = -1; % Falling
% Collect dutiful values
% dutiful_values = y_values(states == 0);
figure;
hold on;
for i = 1:length(differences)
    if states(i) == 1
        plot([i, i+1], [y_values(i), y_values(i+1)], 'g', 'LineWidth', 2); % Green for Rising
    elseif states(i) == -1
        plot([i, i+1], [y_values(i), y_values(i+1)], 'r', 'LineWidth', 2); % Red for Falling
    else
        plot([i, i+1], [y_values(i), y_values(i+1)], 'b', 'LineWidth', 2); % Blue for Dutiful
        % dutiful_values = [dutiful_values, y_values(i)]; % Collect Dutiful values
    end
end
% Plot original values in dashed lines
plot(y_values, 'k--', 'LineWidth', 1); % Black dashed line for original values
yline(T_high, 'r--', 'T_{high}', 'LabelVerticalAlignment', 'bottom', 'LabelHorizontalAlignment', 'right'); % Dashed red line for T_high
xlabel('Sample Number');
ylabel('y(t)');
title('Classification of Changes in Square Wave');
hold off;
end
categorize_square_wave();
% we have a good estimation of what values are
% is there a way to use flags on each state so that we can just plot
% without checking each value?
% implement flags
  2 Comments
Saurav on 4 Oct 2024
Can you attach the .csv file for better debugging?
Aditya on 4 Oct 2024
I'm sorry that I couldn't attach it. Basically, each row has a timestamp and the corresponding UV irradiance level.


Accepted Answer

Voss on 4 Oct 2024
You can replace those ~80000 plotted red, green, and blue lines with 3 lines: one red, one green, and one blue.
Use NaNs in the plotted lines where the data is not pertinent to that line, e.g., the green "rising" line will have NaNs wherever the data is not "rising". NaNs don't render on a plotted line so can be used to create gaps in a line.
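As a very small illustration of the NaN idea (a sketch with made-up values; the names is_rising and y_masked are hypothetical and not part of the original code), a single plotted line can show two separate pieces:

```matlab
% Made-up data: six samples, of which the first two and last two are "rising"
y = [1 2 3 4 5 6];
is_rising = logical([1 1 0 0 1 1]);   % hypothetical mask, for illustration only

% Start from all NaNs, then copy in only the samples that belong to this line
y_masked = NaN(size(y));
y_masked(is_rising) = y(is_rising);

% One line object: segments 1-2 and 5-6 are drawn, samples 3-4 leave a gap
figure
plot(y_masked, 'g', 'LineWidth', 2)
```

The plot call creates one graphics object no matter how many gaps the vector contains, which is why this scales much better than plotting each segment separately.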
Here's an example with made-up data consisting of 80000 datapoints:
y_values = min(0.5,max(-0.5,sin(linspace(0,4*pi,80000))));
k = 1; % Sensitivity parameter
differences = diff(y_values);
% comments have been put to remove statistical computation
mu = mean(differences);
sigma = std(differences);
% Thresholds
T_high = mu + k * sigma;
T_low = mu - k * sigma;
% Categorizing based on threshold values
% Initialize states with zeros (default to dutiful)
states = zeros(1, length(differences));
% Assign states using vectorization
states(differences > T_high) = 1; % Rising
states(differences < T_low) = -1; % Falling
% Initialize three vectors of NaNs
N = numel(y_values);
y_rising = NaN(1,N);
y_falling = NaN(1,N);
y_on = NaN(1,N);
% populate the rising data line with y_values where the data is rising
idx = find(states == 1);
y_rising(idx) = y_values(idx);
y_rising(idx+1) = y_values(idx+1);
% populate the falling data line with y_values where the data is falling
idx = find(states == -1);
y_falling(idx) = y_values(idx);
y_falling(idx+1) = y_values(idx+1);
% populate the on-duty data line with y_values where the data is on-duty
idx = find(states == 0);
y_on(idx) = y_values(idx);
y_on(idx+1) = y_values(idx+1);
% plot the three lines
figure
hold on
plot(y_rising,'g','LineWidth',2)
plot(y_falling,'r','LineWidth',2)
plot(y_on,'b','LineWidth',2)
% Plot original values in dashed lines
plot(y_values, 'k--', 'LineWidth', 1); % Black dashed line for original values
yline(T_high, 'r--', 'T_{high}', 'LabelVerticalAlignment', 'bottom', 'LabelHorizontalAlignment', 'right'); % Dashed red line for T_high
xlabel('Sample Number');
ylabel('y(t)');
title('Classification of Changes in Square Wave');
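The three populate-and-plot steps above follow the same pattern, so they could also be written as one short loop over the state codes. This is only a sketch of an equivalent formulation, using the same made-up data; the names state_codes, line_colors, lines_y, seg, and pts are hypothetical, and it assumes y_values is a row vector.

```matlab
% Same made-up data and thresholds as in the example above
y_values = min(0.5, max(-0.5, sin(linspace(0, 4*pi, 80000))));
differences = diff(y_values);
k = 1; % Sensitivity parameter
T_high = mean(differences) + k * std(differences);
T_low  = mean(differences) - k * std(differences);
states = zeros(1, numel(differences));
states(differences > T_high) = 1;   % Rising
states(differences < T_low) = -1;   % Falling

% One row of NaN-padded data per class (hypothetical helper variables)
state_codes = [1, -1, 0];           % rising, falling, on-duty
line_colors = {'g', 'r', 'b'};
N = numel(y_values);
lines_y = NaN(numel(state_codes), N);
for c = 1:numel(state_codes)
    seg = (states == state_codes(c));    % segments i -> i+1 in this class
    pts = [seg, false] | [false, seg];   % both endpoints of each such segment
    lines_y(c, pts) = y_values(pts);
end

% Three line objects in total, regardless of the number of samples
figure
hold on
for c = 1:numel(state_codes)
    plot(lines_y(c, :), line_colors{c}, 'LineWidth', 2)
end
hold off
```

Because states(i) describes the segment between samples i and i+1, each class line needs both endpoints, which is what the shifted mask pts provides. A sample on the boundary between two classes therefore appears in both lines.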
  4 Comments
Aditya on 5 Oct 2024
Edited: Aditya on 5 Oct 2024
Hey, there appears to be an issue with a few datasets where the cycles are not exactly square.
Because of the thresholding, kinks are being classified as duty states when they really belong to the rising edge.
The timestamps are the x values, and a typical orbit takes roughly 90 minutes.
The logic I am thinking of is this: after a sequence of consecutive rising states, I will relabel the next 15 minutes (15*60 samples) of data as rising edge to account for the kink, and do the same for the kink on the trailing edge. I hope that is clear; I will try attaching an image of the exact issue.
Here are the new x and y values.
y_foo = readtable("aug33_orbit236_31aug2024_UVA.csv");
% count_uva = table2array(y_foo.Counts);
counts_uva = y_foo.Counts;
timestamp = datetime(y_foo.Time, 'InputFormat', 'dd MMMM yyyy, HH:mm:ss.SS');
k = 1; % Sensitivity parameter
differences = diff(counts_uva);
[Figure: kinks of duty cycles]
I am so sorry for taking up your time again. Also, is there any other approach to categorize such values?
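The relabeling idea described in this comment (extend each run of rising or falling states by a fixed number of samples) could be sketched roughly as follows. This is an illustration on made-up states, not a tested solution for the real data: the names hold_len, extended, and run_ends are hypothetical, the window is applied after each run for both edges (if the trailing-edge kink comes before the drop, the window would need to go before the run instead), and only samples still labeled on-duty are overwritten.

```matlab
% Made-up states: 0 = on-duty, 1 = rising, -1 = falling
states = [0 0 1 1 0 0 0 0 0 -1 -1 0 0 0 0 0];
states = states(:).';   % force a row vector (diff of a table column is a column)
hold_len = 3;           % hypothetical; 15*60 = 900 samples for 15 minutes at 1 Hz

extended = states;
for s = [1, -1]
    is_s = (states == s);
    % Last sample of every run of consecutive s values
    run_ends = find(is_s & ~[is_s(2:end), false]);
    for e = run_ends
        last = min(e + hold_len, numel(states));   % do not run past the end
        window = (e+1):last;
        % Only relabel samples that are currently on-duty
        window = window(extended(window) == 0);
        extended(window) = s;
    end
end

disp(extended)
```

With hold_len = 3, the three on-duty samples after each run take on that run's label. The extended vector can then be passed to the same three-line NaN plotting approach in place of states.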
Voss on 5 Oct 2024
Whatever method you decide to use to categorize the data into the three groups, you can use the approach in my answer to plot three lines.
You might want to post a new question about selecting the best method for categorizing that data.


More Answers (0)

Release

R2024a
