Efficient and quick way to summation of large data points

I have a large data points of about 1e7 (voltage with respect to time) of which I need to measure allan deviation. I averaged out and skipped the datapoints 2e5. I used about 90 discrete values of averaging time (tau) using the log space. The code is given below. Yet it take 14 minutes to run the program. Is there anything that I could do to reduce the running time?
Kindly suggest any improvement in the program. Thanks in advance.
% Data Pre-processing (DC Zero level in window of 33 data points)
% Data Pre-processing (Skip data points - first data point in each window of 33)
V = floor(length(voltage)/33);
c = zeros(1, V);
t = zeros(1, V);
d = zeros(1, V);
l=1;
for i=1:33:33*V
k=0;
for j=1:1+32
k=k+voltage(j);
end
k=k/33;
for j=i:i+32
c(j)=voltage(j)-k;
end
t(l)=time(i);
d(l)=c(i);
l=l+1;
end
t = transpose(t);
d = transpose(d);
% Voltage (V) to Omega (deg/hr)
% Omega = Voltage / Sensitivity, where Sensitivity is 9.27 mV/deg/s
omega = (d/9.27e-3)*3600;
N = length(omega);
% Setting Averaging time
t0 = t(2)-t(1);
n = unique(ceil(logspace(log10(1), log10((N-1)/2), 100).')); % n is the selected discrete values of cluster size
tau = n*t0;
% Computing Allan Deviation
% Allan Deviation Equation DOI 10.1109/TIM.2007.908635
adev = 0;
Y = [];
for i=1:numel(n)
a = 0;
ni = n(i);
for k=1:(N-(2*n(i))+1)
a = a + (((sum(omega(k+n(i):k+n(i)+n(i)-1)/(n(i)*t0)))-(sum(omega(k:k+n(i)-1)/(n(i)*t0))))^2);
end
adev = sqrt((1/(2*(N-(2*n(i))+1)))*a);
Y = [Y; adev];
end

4 Comments

I recommend that instead of that way of handling i and l and d and t and k -- that you simply switch to using 2D arrays. You can always reshape() to a vector afterwards if you need to.
(In addition to @Bruno Luong's answer)
Define t and d outside the loop and vectorize the first loop -
voltage = reshape(voltage,33,[]);
k = mean(voltage);
c = reshape(voltage - k,1,[]);
idx = 1:33:33*V;
t=time(idx)
d=c(idx);
Any particular reason you are performing transpose of t and d? Seems redundant to me (from a coding perspective). And as you are dealing with a large array, doing transpose might take a toll on time. If it is necessary, a better alternative would be reshape().
And you should preallocate Y as well.
" transpose might take a toll on time."
0.2 ms
A=rand(1,1e7); % I believe OP array size is 33 time smaller
tic; A=A.'; toc
Elapsed time is 0.000162 seconds.
Well, I did say might. But yeah, it's not the case here.

Sign in to comment.

Answers (1)

instead of doing over and over (in the loop on the bottom) such calculation
sum(omega(i1:i2));
you could do once the cumulative sum
I = cumsum([0; omega(:)]);
and then replace sum(omega(i1:i2)) by
I(i2+1)-I(i1) % == sum(omega(i1:i2));
Your loop becomes (EDIT)
Y = zeros(numel(n),1);
I = cumsum([0; omega(:)]);
for i=1:numel(n)
a = 0;
ni = n(i);
for k=1:(N-(2*n(i))+1)
a = a + ((I(k+2*ni)-I(k+ni))/(ni*t0) - (I(k+ni)-I(k))/(n(i)*t0))^2;
end
adev = sqrt((1/(2*(N-(2*ni)+1)))*a);
Y(i) = adev;
end
which in term can be further vectorized (EDIT)
Y = zeros(numel(n),1);
I = cumsum([0; omega(:)]);
for i=1:numel(n)
ni = n(i);
p = N-2*ni+1;
K = 1:p;
Y(i) = 1/(ni*t0*sqrt(2*p)) * norm(I(K+2*ni) - 2*I(K+ni) + I(K));
end

7 Comments

Check the correctness of the bottom loop with random data.
Your loop
N = 100;
omega = rand(N,1);
n = randi(N,10,1);
t0 = rand;
Y = [];
for i=1:numel(n)
a = 0;
ni = n(i);
for k=1:(N-2*ni+1)
a = a + (((sum(omega(k+n(i):k+n(i)+n(i)-1)/(n(i)*t0)))-(sum(omega(k:k+n(i)-1)/(n(i)*t0))))^2);
end
adev = sqrt((1/(2*(N-(2*n(i))+1)))*a);
Y = [Y; adev];
end
Y
Y = 10×1
0 0 0.1058 0.0964 0.1520 0 0.0839 0.1223 0.0990 0
New method
Y = zeros(numel(n),1);
I = cumsum([0; omega(:)]);
for i=1:numel(n)
ni = n(i);
p = N-2*ni+1;
K = 1:p;
Y(i) = 1/(ni*t0*sqrt(2*p)) * norm(I(K+2*ni) - 2*I(K+ni) + I(K));
end
Y
Y = 10×1
0 0 0.1058 0.0964 0.1520 0 0.0839 0.1223 0.0990 0
Here is the complete code, lasts about half second (> 1000 times faster than 14 minutes)
% Fake data for testing
voltage = rand(1e7,1);
dt = rand;
m = length(voltage);
time = dt*(1:m);
tic
q = 33;
V = floor(m/q);
% Data Pre-processing (DC Zero level in window of 33 data points)
% Data Pre-processing (Skip data points - first data point in each window of 33)
k = mean(voltage(1:q));
c = voltage - k;
i = 1:q:q*V;
t = time(i);
d = c(i);
t = t(:);
d = d(:);
% Voltage (V) to Omega (deg/hr)
% Omega = Voltage / Sensitivity, where Sensitivity is 9.27 mV/deg/s
omega = d*(3600/9.27e-3);
N = length(omega);
% Setting Averaging time
t0 = t(2)-t(1);
n = unique(ceil(logspace(log10(1), log10((N-1)/2), 100).')); % n is the selected discrete values of cluster size
tau = n*t0;
% Computing Allan Deviation
% Allan Deviation Equation DOI 10.1109/TIM.2007.908635
Y = zeros(numel(n),1);
I = cumsum([0; omega(:)]);
for i=1:numel(n)
ni = n(i);
p = N-2*ni+1;
K = 1:p;
Y(i) = 1/(ni*t0*sqrt(2*p)) * norm(I(K+2*ni) - 2*I(K+ni) + I(K));
end
toc
Elapsed time is 0.406899 seconds.
@Bruno Luong, c is incorrectly computed in your code -
% Fake data for testing
voltage = rand(1e7,1);
dt = rand;
m = length(voltage);
time = dt*(1:m);
tic
q = 33;
V = floor(m/q);
% Data Pre-processing (DC Zero level in window of 33 data points)
% Data Pre-processing (Skip data points - first data point in each window of 33)
k1 = mean(voltage(1:q));
c1 = voltage - k1;
%V = floor(length(voltage)/33);
c2 = zeros(33*V, 1);
l=1;
for i=1:33:33*V
k2=0;
for j=1:1+32
k2=k2+voltage(j);
end
k2=k2/33;
for j=i:i+32
c2(j)=voltage(j)-k2;
end
l=l+1;
end
isequal(c1,c2)
ans = logical
0
Indeed the OP original code creates (rather growing) C of length 33*V and not m.
Only some of the c are used to build d to compute Allan deviation, it doesn't matter the last exceeding tail of few elements of c.
You should not compare numerical with ISEQUAL.
Thank you @Bruno Luong @Walter Roberson @Dyuman Joshi for your replies, time and effort. It begins to make sense not to use loops generously.
I have a question @Bruno Luong in your complete code at the data pre-processing regarding variables k and c.
My intention is to split the original data points in various windows of size 33. Each window has a average value, k which need to be subtracted from each element in that window. However, I use only the first element of each window and the rest is discarded.
But in your code I understand that k is a scalar (mean of the first window) which is subtracted from all the elements in voltage array. k is not same for all the elements of voltage, rather only to the particular window, In that case, wouldn't it be necessary to use a 'for loop' to find k in every window?
Thanks.
@Pradeep Chandran "But in your code I understand that k is a scalar (mean of the first window"
I simply duplicate what your code does.
k=0;
for j=1:1+32
k=k+voltage(j);
end
k=k/33;
Your code compute k as mean of the first window, look at j index of your loop.
If you want the mean of the current windows you should make
for j=i:i+32
...
end
I believe few people ask you about this oddness and potential bug, including me but I delete later the question since you did not answer.
My other question (was deleted so I reapeat here) is that why you use the first value of c in the window to set d
d(l)=c(i);
Is it intentional, and the other value of voltages are used only to compute the mean value k?
If I had to chose one c per window do nuild d, I would rather select the middle point
d(l)=c(i+16);

Sign in to comment.

Categories

Find more on MATLAB in Help Center and File Exchange

Asked:

on 6 Aug 2023

Moved:

on 8 Sep 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!