How to average over different length vectors without excessive for loops?

Hi there,
My problem involves running lots of different stochastic simulations (imagine some sort of Brownian motion) and then averaging over all of these histories to compute quantities such as the mean, variance, etc.
At the moment, each run produces an output vector that could be, e.g.,
X1 = [0 1 4 6 8]
where each new entry in the vector represents the position of a particle after a standard time increment. Here the vector has 5 elements, so there have been 4 time increments, although in practice the vectors would be much longer. The problem is that each run ends when a certain condition is met (say X = 8), and this generally happens after a different time in each run. This means the next run might be something like
X2 = [0 4 8]
which is only 3 elements long and thus covers only 2 time increments. I have done this for R runs. If every Xi vector had the same length, I know I could simply collect them in one object X like so:
X = [X1; X2; ... XR]
and then compute the mean using the mean function in the appropriate direction. However unfortunately this wouldn't work in this case as the vectors are of different lengths.
For example if all I had was X1 and X2 I want some process that would calculate the mean at each timestep like so
mean1 = (X1(1)+X2(1))/2; mean2 = (X1(2)+X2(2))/2; mean3 = (X1(3)+X2(3))/2; %data at each timestep for X1 and X2 runs so average over both
mean4 = X1(4); mean5 = X1(5); %no X2 data for these timesteps so only averaging over X1 run
meanX = [mean1 mean2 mean3 mean4 mean5]
But obviously this needs to be done in a way that is scalable, without repeating this process thousands of times using lots of for loops. In my actual code I have several thousand runs, each with several hundred elements, so this needs to be reasonably scalable.
Thanks for any help people can offer and I'm obviously happy to try and clarify anything I have poorly explained

Accepted Answer

Adam Danz on 1 Dec 2020
Edited: Adam Danz on 1 Dec 2020
I suggest collecting all of the variable-length row vectors in a cell array, then arranging them in a matrix padded with NaN where values are missing. You can then use the "omitnan" flag of mean() to average down each column while ignoring the NaNs.
Demo:
a{1} = [1 2 5];
a{2} = [5 1 3 5];
a{3} = [9 0 2 1 8];
a{4} = [4 2];
% Vertically concatenate, pad with NaNs
maxNumCol = max(cellfun(@(c) size(c,2), a)); % max number of columns
aMat = cell2mat(cellfun(@(c){padarray(c,[0,maxNumCol-size(c,2)],NaN,'Post')}, a)')
aMat = 4×5
     1     2     5   NaN   NaN
     5     1     3     5   NaN
     9     0     2     1     8
     4     2   NaN   NaN   NaN
colMeans = mean(aMat,1,'omitnan')
colMeans = 1×5
4.7500 1.2500 3.3333 3.0000 8.0000
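Since the question also mentions variance, the same NaN-padded matrix can be reused for other per-timestep statistics. The following is only a minimal sketch under a few assumptions: it rebuilds the matrix with base MATLAB (a preallocated NaN matrix and a single loop over runs) rather than padarray, and the names colVars and nContrib are illustrative, not part of the answer above.

```matlab
% Same demo data as above, stored in a cell array
a = {[1 2 5], [5 1 3 5], [9 0 2 1 8], [4 2]};

% Preallocate a NaN matrix: one row per run, one column per timestep
maxNumCol = max(cellfun(@numel, a));      % length of the longest run
aMat = nan(numel(a), maxNumCol);
for k = 1:numel(a)
    aMat(k, 1:numel(a{k})) = a{k};        % copy run k into row k, rest stays NaN
end

colMeans = mean(aMat, 1, 'omitnan');      % mean at each timestep
colVars  = var(aMat, 0, 1, 'omitnan');    % sample variance at each timestep
nContrib = sum(~isnan(aMat), 1);          % number of runs contributing per timestep
```

Later timesteps are averaged over fewer runs, so it may be worth checking nContrib before trusting the statistics in the last columns.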
  5 Comments
Ashfaq Ahmed on 4 Apr 2023
@Adam Danz this is a brilliant approach. Can you please help me to write the code as a function in a way that we only need to input the variables (of different lengths) and it will do the mean of them?
dpb on 4 Apr 2023
Edited: dpb on 5 Apr 2023
What do you want the footprint of the function to be -- any number of vectors of variable length?
If so, then use varargin and you'll have the cell array automagically. All you'll have to do is ensure they're all oriented the same direction first; Adam's solution above assumes they're row vectors--
function colMeans=avgVecs(varargin)
a=varargin; % use Adam's internal variable; could change a-->varargin
% Vertically concatenate, pad with NaNs
maxNumCol = max(cellfun(@(c) size(c,2), a)); % max number of columns
aMat = cell2mat(cellfun(@(c){[c nan(1,maxNumCol-numel(c))]}, a)');
colMeans = mean(aMat,1,'omitnan');
end
Running the above locally, with the same four input vectors stored as separate variables a, b, c, and d:
>> avgVecs(a,b,c,d)
ans =
4.7500 1.2500 3.3333 3.0000 8.0000
>>
I don't have Image Processing TB so replaced padarray with base MATLAB code.
In general, I wouldn't recommend creating multiple named variables in the first place; it would be better to store the runs in a cell array from the start and avoid the conversion entirely. In that case, you would simply pass in the cell array itself. varargin does the dirty work of building a cell array out of multiple inputs when used in a function signature as shown. I'm not aware of an equally neat syntax that does this directly at the command line or inside a script or function without calling a lower-level function. You could, of course, simply write the one-liner function
function c=vecs2cell(varargin)
c=varargin; % return the inputs as a single 1xN cell array
end
With four inputs, the output would be a 1x4 cell array. At this point the vectors would not yet be padded to a common length, but that is the form Adam's code expects as input.
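To tie together the two points above (consistent orientation, and starting from a cell array), here is a rough sketch rather than a definitive implementation. It assumes the runs are already in a cell array with mixed row and column vectors; the names runs and rowRuns are made up for illustration.

```matlab
% Runs stored in a cell array; some are rows, some are columns
runs = {[1 2 5], [5; 1; 3; 5], [9 0 2 1 8], [4; 2]};

% c(:).' turns any vector into a row vector, whatever its orientation
rowRuns = cellfun(@(c) c(:).', runs, 'UniformOutput', false);

% Same padding and averaging steps as in the function above
maxNumCol = max(cellfun(@numel, rowRuns));
aMat = cell2mat(cellfun(@(c) {[c nan(1, maxNumCol - numel(c))]}, rowRuns)');
colMeans = mean(aMat, 1, 'omitnan');
```

If avgVecs from the comment above is on the path, avgVecs(rowRuns{:}) should give the same result, because rowRuns{:} expands the cell array into a comma-separated list of inputs.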


More Answers (2)

dpb on 1 Dec 2020
Use a cell array to store the results of each trial instead of individual named variables; then
means=cellfun(@mean,x);

David Hill on 1 Dec 2020
I would use a cell array.
for k=1:100
x{k}=randi(100,1,randi(1000));%simulate your outputs
end
Mean=zeros(1,100);
for k=1:100
Mean(k)=mean(x{k});%calculate the mean and whatever else you want
end

Release

R2020b
