How to divide a vector randomly into 3 groups?

Hello everyone,
I have the following problem: I have a column vector A = [1:1000]' and I would like to divide it randomly into 3 groups (for example, A1=[1:200], A2=[1:450], A3=[1:350]). It doesn't matter if the groups are contiguous. That is, if the first group is made up of samples 1 to 200, the second group can be made up of samples 201 to 650, etc.
The only requirement is that each group must contain between 10% and 80% of the data. That is, there cannot be a group with 2 samples while the other two have 499 and 499.
Thanks in advance,
J.F.

2 Comments

Each group contains 10% to 80% of the data, and you allow overlaps?
Yes. The goal is really to know how many samples each group has (number), so even if they overlap it would not be a problem.
Thanks for the reply.

 Accepted Answer

Based on my current understanding, this rejection method might do what you want. Since there are only three groups, the rejection rate of about 23% might be tolerable to you.
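A = (1:1000)';              % your vector (example from the question)
n = numel(A);
n10 = floor(0.10*n);        % minimum group size: 10% of the data
while( true )
    % draw two distinct cut points between n10 and n-n10
    p = n10 + sort(randperm(n-2*n10+1,2)-1);
    % accept only if all three groups have at least n10 elements
    if( p(1) >= n10 && p(2)-p(1) >= n10 && n-p(2) >= n10 )
        break;
    end
end
% contiguous groups that together cover all of A
G = {A(1:p(1)),A(p(1)+1:p(2)),A(p(2)+1:end)};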
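
As a rough check of the "about 23%" figure, here is a minimal sketch that counts the rejected cut-point pairs exactly and then compares the result with a simulation of the same draw. It assumes n = 1000 as in the question; the names m, pExact, pSim and the trial count are illustrative choices, not part of the answer above.

```matlab
n   = 1000;                 % length of the example vector (assumption)
n10 = floor(0.10*n);        % minimum group size (10% of n)
m   = n - 2*n10 + 1;        % number of candidate cut positions (801 here)

% Exact rejection probability: pairs of distinct cut positions that are
% closer together than n10, divided by all pairs of distinct positions.
d      = 1:(n10-1);                       % gaps that are too small
pExact = sum(m - d) / nchoosek(m, 2);     % about 0.232 for n = 1000

% Monte Carlo check using the same draw as the rejection loop
% (the trial count is an arbitrary choice for this sketch).
nTrials  = 20000;
rejected = false(nTrials, 1);
for t = 1:nTrials
    p = n10 + sort(randperm(m, 2) - 1);
    rejected(t) = (p(2) - p(1)) < n10;    % only the middle group can be too small
end
pSim = mean(rejected);

fprintf('exact: %.4f, simulated: %.4f\n', pExact, pSim);
```

The first and last groups always pass the test because the cut points are drawn from n10 to n-n10, so only the middle group can trigger a rejection.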

More Answers (1)

Maybe a simple loop:
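n = numel(A);
n10 = floor(0.10*n)-1;      % minimum group length minus one
n80 = floor(0.80*n);        % maximum group length
G = cell(1,3);              % preallocate the three groups
for k=1:3
    k1 = randi(n-n10);      % random start index
    k2 = k1 + n10 + randi(min(n-n10-k1+1,n80-n10)) - 1;   % random end index
    G{k} = A(k1:k2);        % groups are drawn independently and may overlap
end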

6 Comments

Thanks, I appreciate your intention to help, but the result is not what I would like.
All of the data must end up in one of the three groups. With this code, some samples are not assigned to any group. Any idea how to solve it?
Thanks in advance,
J.F.
I thought you wrote that overlaps were OK. And what do you mean by samples not being "grouped"?
Yes, overlaps are OK. The problem occurs when I run it for a vector of 1:1000 samples. In theory, the sizes of the 3 groups should sum to 1000, and the following screenshots show that this is not the case.
The goal is to divide the vector of samples 1 to 1000 into three different groups, where each group contains at least 10% and at most 80% of the total number of samples. For example, if the first group randomly turns out to contain 80% of the samples, groups 2 and 3 will contain 10% and 10% of the samples. Another example: if the first group contains 50% of the samples, the second can contain 30% and the third 20%.
In the following picture I have tried to illustrate the division of the total samples (samples 1 to 1000 divided into 1st group, 2nd group and 3rd group).
I hope that the problem is now understood!
Well, now it sounds like you want complete coverage of your sample set. This doesn't seem to match the A1=[1:200], A2=[1:450], A3=[1:350] example you originally posted where the apparent indexing all started at 1. But maybe what you meant to convey is just the number of elements in each group, not the indexing. True?
Yes, true, I meant the number of elements in each group in this first example, sorry for the confusion. If you know how to resolve the problem I would appreciate it.
Thank you very much.
Hello again @James Tursa! Sorry for the inconvenience, do you know how I can resolve this problem? I have not managed to fix it ... Thank you again for your selfless help.
Regards,
J.F.
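
For the requirement as clarified in these comments (every sample in exactly one group, each group holding at least 10% and at most 80% of the samples), one possible sketch that avoids a rejection loop is to reserve the 10% minimum for each group and split the remainder at random. This is only an illustration under the assumption n = 1000; the names r, c, extra, sz and Ashuf are hypothetical and do not come from the thread.

```matlab
A   = (1:1000)';            % example vector from the question
n   = numel(A);
n10 = floor(0.10*n);        % minimum size of each group
r   = n - 3*n10;            % samples left after reserving the minimum

% Split r into three non-negative parts: pick 2 "bar" positions among
% r+2 slots; the gaps between the bars are the three parts.
c     = sort(randperm(r + 2, 2));
extra = diff([0, c, r + 3]) - 1;     % three parts with sum(extra) == r
sz    = n10 + extra;                 % group sizes with sum(sz) == n

% Optional: shuffle so that group membership is random as well
% (use Ashuf = A to keep the groups contiguous).
idx   = randperm(n);
Ashuf = A(idx(:));
G     = mat2cell(Ashuf, sz, 1);      % 3x1 cell array of column vectors

disp(sz);                            % sizes of the three groups
```

Because each group gets at least n10 samples, no group can exceed n - 2*n10 samples, which is exactly 80% when n = 1000. If n is not a multiple of 10, the upper limit may need an explicit check.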

Release

R2019b
