How can I make a combination/permutation of all possible values with a given subset of data?

Hello, I'm having trouble putting this into words so I'll give an example and hopefully someone can help.
To make it simple, let's say I have a 200-second time series (200x1 array) from 3 regions (A, B, C). Each region has different types, so for A there's A1, A2, A3, etc. This also applies to B and C. However, the number of types differs for each region. So if A has A1 - A5, B might have B1 - B9, etc.
I want to build arrays that take one type from each region: [A1 B1 C1], [A2 B1 C1], [A3 B1 C1], etc. So with 3 regions, I want every possible 200 x 3 array that uses one type from each region.
My question is, currently, I have all the types and regions in one array (200 x 164). So A1:A5 B1:B11 C1:C20 D1:D5 etc. In total, I have 54 regions, so I would want to make all possible combinations of a 200 x 54 array.
Is there a way to do this with how my data is currently organized? Thanks for any suggestions.

2 Comments

I doubt that your computer would be able to store all of those combinations in memory at once. Would it be sufficient to generate them one at a time?
In any case, the problem may be intractable due to the total number of combinations required.
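
To put rough numbers on this, here is a small sketch that uses only the four region sizes listed in the question (5, 11, 20, and 5 types) and assumes double-precision storage. The variable names are illustrative, not from the original post.

```matlab
% Region sizes taken from the question's example (A1:A5, B1:B11, C1:C20, D1:D5)
typesPerRegion = [5 11 20 5];
nTime = 200;                                    % samples per time series

nCombinations = prod(typesPerRegion);           % one type chosen per region
bytesPerCombination = nTime * numel(typesPerRegion) * 8;  % 8 bytes per double
totalMB = nCombinations * bytesPerCombination / 1e6;

fprintf('%d combinations, about %.1f MB in total\n', nCombinations, totalMB);
```

Four regions already give 5500 combinations (about 35.2 MB). Every additional region multiplies the count by its number of types, so the total grows very quickly toward 54 regions.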

Hi @Andrew You,

As a first step toward building 200 x 54 combinations from your current 200 x 164 array, you can extract the block of columns that belongs to each region. Here's a sample code snippet that shows the slicing:

% Sample data (replace this with your actual data)
data = rand(200, 164); % Assuming your data is stored in a variable named 'data'

% Extract regions A1:A5, B1:B11, C1:C20, D1:D5 (adjust the indices accordingly)
regions_A = data(:, 1:5);
regions_B = data(:, 6:16);
regions_C = data(:, 17:36);
regions_D = data(:, 37:41);

% Concatenate the extracted regions (200 x 41 for these four regions)
combined_array = [regions_A, regions_B, regions_C, regions_D]; % Add more regions as needed

% Display the size of the combined array
size(combined_array)

So, extracting the regions of interest this way gives you one block of columns per region. Make sure to adjust the indices and add more regions as necessary to cover all 54 regions in your data; choosing one column from each block then gives a single 200 x 54 combination.

Please let me know if you have any further questions.
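
The slicing above can also be done without hard-coded indices. This is a minimal sketch, assuming a hypothetical vector typesPerRegion that lists how many columns belong to each region, in order. mat2cell splits the columns into one block per region, and picking one column from each block yields a single combination.

```matlab
% Hypothetical sizes for four regions; the full problem would list all 54
typesPerRegion = [5 11 20 5];
data = rand(200, sum(typesPerRegion));          % placeholder for the real data

% One cell per region, each 200 x (number of types in that region)
regionBlocks = mat2cell(data, size(data, 1), typesPerRegion);

% Choose one type per region, e.g. A2, B1, C7, D5
choice = [2 1 7 5];
oneCombination = zeros(size(data, 1), numel(typesPerRegion));
for r = 1:numel(typesPerRegion)
    oneCombination(:, r) = regionBlocks{r}(:, choice(r));
end

size(oneCombination)   % 200 x 4 here; 200 x 54 once all regions are listed
```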


Answers (1)

Below is example code that runs through all combinations for a simpler problem with 9 types across 4 regions (A1:A3, B1:B2, C1, D1:D3). You can update the parameter settings for your full problem. dataCombinations stores all the combinations in a single variable, with the third index iterating over the combinations. But as Stephen23 remarked, storing all the combinations may require too much memory, so it would be more efficient to process each combination as it's generated.
% using smaller values for testing and demonstration
nTime = 1; % 200 in full problem
nRegionClass = 4; % 54 in full problem
nRegionClassSize = [3 2 1 3]; % to be updated for full problem
nRegionTotal = sum(nRegionClassSize);
data = rand(nTime, nRegionTotal); % dummy values for testing
nCombinations = prod(nRegionClassSize);
iRegionStart = cumsum([0 nRegionClassSize(1:end-1)]); % index of region just before each class
dataCombinations = zeros(nTime, nRegionClass, nCombinations);
combCounters = ones(1, nRegionClass);
for i = 1:nCombinations
    regionSubset = combCounters + iRegionStart;
    disp("Combination #" + num2str(i) + ": " + num2str(regionSubset));
    dataCombinations(:, :, i) = data(:, regionSubset); % extracts data for region combinations
    for j = 1:nRegionClass
        if combCounters(j) < nRegionClassSize(j)
            combCounters(j) = combCounters(j) + 1;
            break;
        else
            combCounters(j) = 1;
        end
    end
end
Combination #1: 1 4 6 7
Combination #2: 2 4 6 7
Combination #3: 3 4 6 7
Combination #4: 1 5 6 7
Combination #5: 2 5 6 7
Combination #6: 3 5 6 7
Combination #7: 1 4 6 8
Combination #8: 2 4 6 8
Combination #9: 3 4 6 8
Combination #10: 1 5 6 8
Combination #11: 2 5 6 8
Combination #12: 3 5 6 8
Combination #13: 1 4 6 9
Combination #14: 2 4 6 9
Combination #15: 3 4 6 9
Combination #16: 1 5 6 9
Combination #17: 2 5 6 9
Combination #18: 3 5 6 9
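
As an alternative to the nested counter, the combination index can be decoded directly. This is a sketch and not part of the original answer: it passes the class sizes to ind2sub as if they were array dimensions, so the first class varies fastest, which is the same order as the output above. The variables are redefined so the snippet runs on its own.

```matlab
nRegionClassSize = [3 2 1 3];
nRegionClass = numel(nRegionClassSize);
iRegionStart = cumsum([0 nRegionClassSize(1:end-1)]);
nCombinations = prod(nRegionClassSize);

counters = cell(1, nRegionClass);               % one output slot per class
allSubsets = zeros(nCombinations, nRegionClass);
for i = 1:nCombinations
    % Decode linear index i into one counter per region class
    [counters{:}] = ind2sub(nRegionClassSize, i);
    allSubsets(i, :) = [counters{:}] + iRegionStart;
end
disp(allSubsets(1:4, :))
```

Because each index decodes independently of the previous one, a range of indices could be split across workers or sampled at random instead of being visited in order.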

3 Comments

Thank you very much for the fast answer and code!
I see that memory will be the issue, as the number of combinations would be 4.2797e+15 with my current data. The next plan was to calculate some metric on each combination to find the one that gives the best value. Would this still be possible, since MATLAB would still have to store that many calculated metrics to compare with one another? If so, where should I apply the metric calculations?
After each 200 x 54 combination is built, I plan to calculate a linear correlation (corr) on it and compute several metrics from the resulting rho. So in theory I would have 4.2797e+15 values for each metric, and I want to find the combination with the highest value. Alternatively, I could apply a threshold and discard any combination that doesn't pass it. Would this be more feasible?
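
One way to avoid storing every metric is to compute it inside the loop, right after each combination is extracted, and keep only the best value seen so far. The sketch below is illustrative only: the metric (mean absolute off-diagonal correlation) is a placeholder assumption, and corrcoef from base MATLAB stands in for corr. It reduces the memory needed, not the number of combinations to visit.

```matlab
rng(0);                                         % reproducible dummy data
nTime = 200;
nRegionClassSize = [3 2 1 3];                   % small test case
nRegionClass = numel(nRegionClassSize);
iRegionStart = cumsum([0 nRegionClassSize(1:end-1)]);
data = rand(nTime, sum(nRegionClassSize));

offDiag = ~eye(nRegionClass);                   % mask for off-diagonal entries
bestMetric = -Inf;
bestSubset = [];
combCounters = ones(1, nRegionClass);
for i = 1:prod(nRegionClassSize)
    regionSubset = combCounters + iRegionStart;
    rho = corrcoef(data(:, regionSubset));      % nRegionClass x nRegionClass
    metric = mean(abs(rho(offDiag)));           % placeholder metric (assumption)
    if metric > bestMetric                      % keep only the running best
        bestMetric = metric;
        bestSubset = regionSubset;
    end
    % Advance the counters like an odometer
    for j = 1:nRegionClass
        if combCounters(j) < nRegionClassSize(j)
            combCounters(j) = combCounters(j) + 1;
            break;
        else
            combCounters(j) = 1;
        end
    end
end
fprintf('Best metric %.4f for columns %s\n', bestMetric, mat2str(bestSubset));
```

A threshold test would go in the same place as the comparison, for example by appending regionSubset to a list only when the metric passes.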
So you have 4.2797e+15 combinations... let's assume that your code can process them at a rate of one million combinations per second, then you will only need to wait:
4.2797e+15 / (1e6 * 60*60*24*365)
ans = 135.7084
one hundred and thirty-six years for the results.
You might need to think about your approach a bit more, e.g. perhaps use dynamic programming.
FYI you can perform this computation without the "magic numbers" 60, 24, and 365 using some duration functions.
numCombinations = 4.2797e15;
Y = years(seconds(numCombinations/1e6))
Y = 135.6183
This matches the computations with "magic numbers" if you use 365.2425 instead of 365.
4.2797e+15 / (1e6 * 60*60*24*365.2425)
ans = 135.6183
It doesn't make a lot of difference in this case, shaving off a mere 0.1 year, but IMO the intent of the years and seconds calls is a little clearer.
I agree with your last statement; brute-forcing this problem is probably not the best approach. Without knowing the problem the original poster wants to solve, offering specific suggestions for a different approach doesn't seem possible.


Asked: 29 Jul 2024
Commented: 31 Jul 2024
