How do I apply the same operation on vectors of different length but of similar name?
Accepted Answer
Hi @Henning,
I wanted to address your question about calculating means for your 50 column vectors and clear up some confusion. First, Stephen23 is correct that cell arrays do not require equal-length vectors, so the limitation you assumed does not exist. The MATLAB documentation confirms that each cell can hold a vector of any length, which makes cell arrays a good fit for your situation. The real issue Stephen23 is pointing out is that having 50 separate variables like A_1, A_2, A_3, etc. in your workspace is the root problem, and that is where you should focus your fix. The approach you have in mind, a loop with a startsWith condition, will technically work, but as the tutorial Stephen23 linked explains, dynamically named variables lead to slow, complex, and inefficient code. What you really want is to go back to where you import the data and change that step so everything is loaded into a single container from the start: a cell array, a structure, or a containers.Map object.
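To make "load everything into a single container from the start" concrete, here is a minimal sketch. I don't know how your data is actually stored, so I am assuming one CSV file per vector; the file names A_1.csv, A_2.csv, ... and the temporary folder are hypothetical, and the sketch writes its own small example files so it runs as-is. It uses writematrix and readmatrix, which need R2019a or later.

```matlab
% Sketch: import each file straight into one cell array instead of
% creating A_1, A_2, ... (folder and file names are hypothetical).
dataDir = tempname;              % temporary folder standing in for your data folder
mkdir(dataDir);

% Write three small example files of different lengths so the sketch is runnable.
exampleLengths = [5 12 8];
nFiles = numel(exampleLengths);
for k = 1:nFiles
    writematrix((1:exampleLengths(k))', fullfile(dataDir, sprintf('A_%d.csv', k)));
end

% Import loop: one cell per file, no dynamically named variables.
A = cell(nFiles, 1);
for k = 1:nFiles
    A{k} = readmatrix(fullfile(dataDir, sprintf('A_%d.csv', k)));
end

meansA = cellfun(@mean, A);      % one mean per vector; lengths may differ
disp(meansA.');

rmdir(dataDir, 's');             % remove the temporary folder and its files
```

With the data in A, the k-th vector is simply A{k}, so any later per-vector operation becomes an ordinary loop or a cellfun call.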
I've put together a complete working example with synthetic sensor data that shows you four different solutions. Solution 1 uses a structure where you store each vector as a field like dataStruct.Sensor_1, dataStruct.Sensor_2, etc., and then you can loop through the field names, filter with startsWith, and calculate means easily. Solution 2 uses a cell array where all your vectors go into sensorData{1}, sensorData{2}, etc., and you can calculate all the means in literally one line using cellfun(@mean, sensorData), which is incredibly clean and fast. Solution 3 shows you how to work with your existing scattered variables using who() to get all variable names, filtering them with startsWith, and then using eval() in a loop to access each one and calculate its mean, but this is the slowest approach and should only be used if you absolutely cannot refactor your import code. Solution 4 demonstrates containers.Map which is a modern key-value storage approach that's very flexible for lookups, though it's about 4 times slower than the cell array method based on my benchmarking.
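One detail to watch with the name-based approaches: as far as I know, who() and the keys of a containers.Map come back in text (character) order, so Sensor_10 is listed before Sensor_2. If you need the results in numeric order, a sketch like the following can reorder them. The names are made up for illustration and the 'Sensor_%d' pattern is an assumption about how your names look.

```matlab
% Sketch: put names such as Sensor_1, Sensor_10, Sensor_2 into numeric order.
names = {'Sensor_1'; 'Sensor_10'; 'Sensor_11'; 'Sensor_2'; 'Sensor_3'};  % text order

% Pull the trailing number out of each name, then sort by that number.
idx = cellfun(@(s) sscanf(s, 'Sensor_%d'), names);
[~, order] = sort(idx);
namesSorted = names(order);

disp(namesSorted);
```

The same order vector can be applied to the means, standard deviations, and counts so that all result columns stay aligned.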
The code I've included generates 50 synthetic temperature sensors with varying numbers of samples (5 to 200 readings each), calculates the mean, standard deviation, and sample count for each sensor using all four methods, and creates visualizations showing sensor means with error bars, the distribution of sample counts, a histogram of mean temperatures, and raw data traces from the first three sensors. When I ran the performance test over 100 iterations, the structure approach took 0.0136 seconds, the cell array took 0.0163 seconds, and containers.Map took 0.0634 seconds, so in this test the structure and cell array approaches were roughly four to five times faster than containers.Map (the eval-based Solution 3 is not included in the timing). The visualizations show an upward temperature trend across sensors from about 22 to 25 degrees Celsius with realistic noise, and the histogram shows most sensor means cluster around 23.5 degrees, which confirms the synthetic data is behaving as expected.
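If you want to repeat the timing yourself, timeit is usually more robust than a single tic/toc pair because it handles warm-up and repetition for you. The sketch below is not the benchmark I ran: it builds its own small synthetic data set, and it uses structfun in place of the explicit loop over field names, so the numbers will not match the ones quoted above.

```matlab
% Sketch: compare the cell array and structure approaches with timeit.
rng(42);                                  % reproducible synthetic data
nVec = 50;
C = cell(nVec, 1);
S = struct();
for k = 1:nVec
    C{k} = randn(randi([5, 200]), 1);     % vectors of different lengths
    S.(sprintf('Sensor_%d', k)) = C{k};   % same data, stored as struct fields
end

% timeit takes a function handle with no inputs and returns a typical run time.
tCell   = timeit(@() cellfun(@mean, C));
tStruct = timeit(@() structfun(@mean, S));

fprintf('cellfun:   %.3g s per call\n', tCell);
fprintf('structfun: %.3g s per call\n', tStruct);
```

Absolute times depend on the machine and MATLAB release, so treat the ratio between the two as the useful number.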
My recommendation is this: if you can modify your import code, absolutely do that and use either a cell array or structure from the start, with cell arrays being slightly preferable for your use case since cellfun makes calculations so elegant. If you're completely stuck with existing variables and cannot change the import process, then yes, use the who() and eval() approach I showed in Solution 3, but understand this is a workaround for a bad situation, not a best practice. The key takeaway Stephen23 is emphasizing is that you should never create dynamically named variables in the first place because it makes everything harder down the line, and the time you invest now in fixing your import process will save you countless hours of frustration later. I've attached the complete code with all four solutions, performance benchmarking, and visualizations so you can see exactly how each approach works with real data, and you can adapt whichever solution fits your current situation best.
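If the variables already exist and you would rather avoid an eval loop, one alternative (not one of the four solutions in my script) is to round-trip them through a MAT-file: save accepts a '-regexp' option to select variables by name, and load with an output argument returns them as fields of a single structure. A minimal sketch, with three made-up vectors standing in for your A_1 ... A_50:

```matlab
% Sketch: gather existing A_1, A_2, ... into one struct without an eval loop.
A_1 = [1; 2; 3];                 % made-up example vectors of different lengths
A_2 = [10; 20];
A_3 = [4; 4; 4; 4];

tmpFile = [tempname '.mat'];
save(tmpFile, '-regexp', '^A_\d+$');   % save only variables named A_<number>
S = load(tmpFile);                     % load returns them as fields of one struct
delete(tmpFile);                       % remove the temporary file

names  = fieldnames(S);
meansS = structfun(@mean, S);          % one mean per field, same order as names
disp(table(names, meansS, 'VariableNames', {'Variable', 'Mean'}));
```

After this, S.A_1, S.A_2, and so on can be processed exactly like dataStruct in Solution 1, and the original scattered variables can be cleared.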
%% BETTER SOLUTIONS FOR CALCULATING MEANS OF MULTIPLE VECTORS
% This script demonstrates improved approaches using realistic synthetic data
%% Generate realistic synthetic sensor data
fprintf('Generating synthetic sensor data...\n');
rng(42); % For reproducibility
% Create synthetic data that mimics real-world scenario
numSensors = 50;
sensorData = cell(numSensors, 1);
sensorNames = cell(numSensors, 1);
sensorLengths = zeros(numSensors, 1);
for i = 1:numSensors
    % Each sensor has different number of readings (5 to 200 samples)
    nSamples = randi([5, 200]);
    % Generate temperature-like data: baseline around 22°C with noise
    baseline = 22;
    trend = (i/numSensors) * 3; % Slight trend across sensors
    noise = randn(nSamples, 1) * 2; % 2°C standard deviation
    sensorData{i} = baseline + trend + noise;
    sensorNames{i} = sprintf('Sensor_%d', i);
    sensorLengths(i) = nSamples;
end
fprintf('Generated %d sensors with %d to %d samples each\n\n', ...
    numSensors, min(sensorLengths), max(sensorLengths));
%% SOLUTION 1: Structure-based approach (VERY GOOD)
fprintf('=== SOLUTION 1: Structure (VERY GOOD) ===\n');
dataStruct = struct();
for i = 1:numSensors
dataStruct.(sprintf('Sensor_%d', i)) = sensorData{i};
end
fieldNames = fieldnames(dataStruct);
sensorFields = fieldNames(startsWith(fieldNames, 'Sensor_'));
means1 = zeros(length(sensorFields), 1);
stdDevs1 = zeros(length(sensorFields), 1);
counts1 = zeros(length(sensorFields), 1);
for i = 1:length(sensorFields)
data = dataStruct.(sensorFields{i});
means1(i) = mean(data);
stdDevs1(i) = std(data);
counts1(i) = length(data);
end
results1 = table(sensorFields, means1, stdDevs1, counts1, ...
'VariableNames', {'Sensor', 'Mean', 'StdDev', 'NumSamples'});
fprintf('First 10 sensors:\n');
disp(results1(1:10,:));
fprintf('Summary: Mean temperature across all sensors: %.2f°C\n', mean(means1));
fprintf('Temperature range: %.2f°C to %.2f°C\n\n', min(means1), max(means1));
%% SOLUTION 2: Cell Array (BEST)
fprintf('=== SOLUTION 2: Cell Array (BEST) ===\n');
means2 = cellfun(@mean, sensorData);
stdDevs2 = cellfun(@std, sensorData);
counts2 = cellfun(@length, sensorData);
results2 = table(sensorNames, means2, stdDevs2, counts2, ...
'VariableNames', {'Sensor', 'Mean', 'StdDev', 'NumSamples'});
fprintf('First 10 sensors:\n');
disp(results2(1:10,:));
fprintf('This approach is clean, fast, and ideal for varying-length vectors\n\n');
%% SOLUTION 3: Working with existing workspace variables (IF NEEDED)
fprintf('=== SOLUTION 3: Workspace Variables (ACCEPTABLE) ===\n');
fprintf('Simulating scattered variables in workspace...\n');
clearvars Sensor_*
for i = 1:numSensors
eval(sprintf('Sensor_%d = sensorData{%d};', i, i));
end
allVars = who('Sensor_*');
numVars = length(allVars);
means3 = zeros(numVars, 1);
stdDevs3 = zeros(numVars, 1);
counts3 = zeros(numVars, 1);
for i = 1:numVars
data = eval(allVars{i});
means3(i) = mean(data);
stdDevs3(i) = std(data);
counts3(i) = length(data);
end
results3 = table(allVars, means3, stdDevs3, counts3, ...
'VariableNames', {'Sensor', 'Mean', 'StdDev', 'NumSamples'});
fprintf('First 10 sensors:\n');
disp(results3(1:10,:));
fprintf('Note: This works but is slower. Use only if variables already exist.\n\n');
%% SOLUTION 4: containers.Map (MODERN)
fprintf('=== SOLUTION 4: containers.Map (MODERN) ===\n');
dataMap = containers.Map();
for i = 1:numSensors
key = sprintf('Sensor_%d', i);
dataMap(key) = sensorData{i};
end
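allKeys = keys(dataMap);
sensorKeys = allKeys(startsWith(allKeys, 'Sensor_'));
% Note: sort orders the keys as text (Sensor_1, Sensor_10, Sensor_11, ...),
% not numerically, so the rows below are in a different order than Solutions 1 and 2
sensorKeys = sort(sensorKeys);
sensorKeysCol = sensorKeys(:);
means4 = cellfun(@(k) mean(dataMap(k)), sensorKeysCol);
stdDevs4 = cellfun(@(k) std(dataMap(k)), sensorKeysCol);
counts4 = cellfun(@(k) length(dataMap(k)), sensorKeysCol);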
results4 = table(sensorKeysCol, means4, stdDevs4, counts4, ...
'VariableNames', {'Sensor', 'Mean', 'StdDev', 'NumSamples'});
fprintf('First 10 sensors:\n');
disp(results4(1:10,:));
fprintf('Great for dynamic key-value storage and lookups\n\n');
%% BONUS: Matrix approach (if equal length)
fprintf('=== BONUS: Matrix Approach (if equal-length vectors) ===\n');
matrixData = randn(100, numSensors) * 2 + 22 + (1:numSensors)/numSensors * 3;
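columnMeans = mean(matrixData, 1);
% std takes the weight flag as its second argument and the dimension as its third:
% std(X, 0, 1) is the N-1 normalized standard deviation of each column
columnStdDevs = std(matrixData, 0, 1);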
fprintf('Calculated means for %d sensors in one line!\n', numSensors);
fprintf('First 10 means: ');
fprintf('%.2f ', columnMeans(1:10));
fprintf('\n');
fprintf('This is the fastest approach when vectors have equal length.\n\n');
%% Visualization
fprintf('=== CREATING VISUALIZATION ===\n');
figure('Position', [100, 100, 1200, 600]);
subplot(2,2,1);
errorbar(1:numSensors, means2, stdDevs2, 'o-', 'LineWidth', 1.5);
xlabel('Sensor Number');
ylabel('Temperature (°C)');
title('Sensor Measurements: Mean ± Std Dev');
grid on;
subplot(2,2,2);
bar(counts2);
xlabel('Sensor Number');
ylabel('Number of Samples');
title('Sample Count per Sensor');
grid on;
subplot(2,2,3);
histogram(means2, 20, 'FaceColor', [0.3 0.6 0.9]);
xlabel('Mean Temperature (°C)');
ylabel('Frequency');
title('Distribution of Sensor Means');
grid on;
subplot(2,2,4);
hold on;
colors = lines(3);
for i = 1:3
plot(sensorData{i}, 'Color', colors(i,:), 'DisplayName', sensorNames{i});
end
xlabel('Sample Index');
ylabel('Temperature (°C)');
title('Raw Data: First 3 Sensors');
legend('Location', 'best');
grid on;
hold off;
fprintf('Visualization complete!\n\n');
%% Performance comparison
fprintf('=== PERFORMANCE COMPARISON ===\n');
nRuns = 100;
% Cell array (loop variable is iRun so the built-in function run is not shadowed)
tic;
for iRun = 1:nRuns
    temp = cellfun(@mean, sensorData);
end
time2 = toc;
% Structure
tic;
for iRun = 1:nRuns
    for i = 1:length(sensorFields)
        temp = mean(dataStruct.(sensorFields{i}));
    end
end
time1 = toc;
% containers.Map
tic;
for iRun = 1:nRuns
    temp = cellfun(@(k) mean(dataMap(k)), sensorKeysCol);
end
time4 = toc;
fprintf('Time for %d runs:\n', nRuns);
fprintf('  Cell array:     %.4f sec (baseline)\n', time2);
fprintf('  Structure:      %.4f sec (%.2fx the cell array time)\n', time1, time1/time2);
fprintf('  containers.Map: %.4f sec (%.2fx the cell array time)\n', time4, time4/time2);
fprintf('\n');
%% Recommendations
fprintf('=== FINAL RECOMMENDATIONS ===\n');
fprintf('1. BEST: Cell array (Solution 2) - fast, clean, handles varying lengths\n');
fprintf('2. VERY GOOD: Structure (Solution 1) - organized, readable, easy to maintain\n');
fprintf('3. MODERN: containers.Map (Solution 4) - flexible, good for lookups\n');
fprintf('4. LAST RESORT: Workspace variables (Solution 3) - slow, only if the variables already exist\n');
fprintf('5. IDEAL: Matrix - if all vectors have equal length, use this!\n');
fprintf('\n');
fprintf('KEY TAKEAWAY: Import your data into a container (cell/struct/map) from the start!\n');
fprintf(' Never create separate variables like A_1, A_2, A_3...\n');
Note: please see attached.
Hope this helps clarify things and gives you a clear path forward. Let me know if you have any questions about implementing any of these solutions.