Splitting a matrix according to their labels
I have a matrix (1900 x 4 double) whose fourth column contains the labels 3, 2, and 1. I want to split this data in a 20:80 ratio into A and B, where A contains 20% of the rows of each label (3, 2, and 1) and B contains the other 80% of each label, i.e. 80% of label 3, 80% of label 2, and 80% of label 1. How can this be achieved?
Dyuman Joshi
on 10 May 2022
How exactly would you split? By what criteria? Also, what do you mean by labels?
NotA_Programmer
on 10 May 2022
dpb
on 10 May 2022
Dyuman Joshi
on 10 May 2022
By "split", I meant: how exactly should the 20% and 80% be allotted? Randomly, or by some criterion?
NotA_Programmer
on 10 May 2022
dpb
on 10 May 2022
Accepted Answer
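[ix,idx]=findgroups(X(:,4));  % ix = group number of each row; idx = the unique labels
iA=cell(numel(idx),1); iB=iA; % one cell per group for the A and B row indices
for g=1:numel(idx)            % for each group number (works for any numeric labels)
  I=find(ix==g);              % the row indices into X for this group
  N=numel(I);                 % how many in this group
  I=I(randperm(N));           % rearrange randomly the elements of the index vector
  nA=round(0.2*N);            % how many to pick for A: 20% (floor() is the alternative)
  iA{g}=I(1:nA);              % the randomized selection for A
  iB{g}=I(nA+1:end);          % the rest (80%) for B
end
A=X(vertcat(iA{:}),:);        % assemble A from the selected rows
B=X(vertcat(iB{:}),:);        % assemble B from the remaining rows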
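For reference, here is a minimal self-contained sketch of the same steps on made-up data. It assumes A should receive 20% of each label, as stated in the question, and it uses unique instead of findgroups so that it also runs in GNU Octave. The matrix X below is random demo data, not the poster's, and fracA is a name introduced here for illustration.

```matlab
% Minimal sketch: stratified 20/80 split of X by the label in column 4.
% Assumption: X is hypothetical demo data; replace it with your own 1900x4 matrix.
X = [rand(1900,3), randi(3,1900,1)];   % labels 1, 2, 3 in column 4
fracA = 0.2;                           % fraction of each label that goes to A

labels = unique(X(:,4));               % the distinct labels (sorted column vector)
iA = cell(numel(labels),1);            % per-label row indices for A
iB = cell(numel(labels),1);            % per-label row indices for B
for g = 1:numel(labels)                % loop over group numbers, not label values
  I  = find(X(:,4) == labels(g));      % rows of X carrying this label
  N  = numel(I);
  I  = I(randperm(N));                 % shuffle this group's row indices
  nA = round(fracA*N);                 % 20% of this group, rounded to a whole row count
  iA{g} = I(1:nA);
  iB{g} = I(nA+1:end);
end
A = X(vertcat(iA{:}),:);               % about 20% of each label
B = X(vertcat(iB{:}),:);               % the remaining ~80% of each label
```

Because each group is rounded separately, the overall A:B ratio is only approximately 20:80; using floor(fracA*N) instead guarantees that A never gets more than 20% of any group.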
As is, yes; it simply illustrates the steps.
I had the loop in an arrayfun construct locally, so the results were returned as cell arrays automagically. But the anonymous function was complex enough that I figured it wasn't ideal to post, so I converted it to a conventional loop with intermediate variables, but didn't add the explicit indices.
Add an {i} to each to save the subscript arrays, or "do whatever" with them inside the loop before moving on to the next iteration; user's choice.
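A sketch of what an arrayfun/cellfun form might look like, returning the per-label index sets as cell arrays without an explicit loop. This is a hypothetical reconstruction for illustration, not the poster's local code; the helper handles shuffle and nA are names made up here, and X is random demo data.

```matlab
% Hypothetical arrayfun/cellfun version of the stratified 20/80 split.
X = [rand(1900,3), randi(3,1900,1)];        % demo data: labels 1, 2, 3 in column 4
labels = unique(X(:,4));                    % distinct labels (column vector)
shuffle = @(v) v(randperm(numel(v)));       % random reordering of a vector
nA = @(v) round(0.2*numel(v));              % 20% of a group, rounded

% One cell per label: that label's row indices, in random order
groups = arrayfun(@(L) shuffle(find(X(:,4) == L)), labels, 'UniformOutput', false);
% First 20% of each shuffled group goes to A, the rest to B
iA = cellfun(@(v) v(1:nA(v)),     groups, 'UniformOutput', false);
iB = cellfun(@(v) v(nA(v)+1:end), groups, 'UniformOutput', false);
A = X(vertcat(iA{:}),:);
B = X(vertcat(iB{:}),:);
```

The cell arrays iA and iB come out of cellfun directly, which is the "automagic" the comment refers to; the cost is that the logic is spread across several anonymous functions.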
ADDENDUM:
Made a correction to the Answer, including a fix for another variable-name change I missed when converting from the anonymous function...
NotA_Programmer
on 10 May 2022
You're missing the ".'" transpose operator on the for-loop iterator; it must be a row vector. A for loop iterates over the columns of its range expression, so a column vector passes all three values at once in a single iteration. I could have made the code more robust by writing
for i=idx(:).'
instead, in which (:) forces a column vector and ".'" turns it into a row.
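A small demonstration of why the iterator must be a row: a MATLAB for loop steps through the columns of its range expression, so a column vector yields a single pass. The vector v below is made-up data standing in for the idx output of findgroups.

```matlab
v = [1; 2; 3];             % a column vector, like the idx output of findgroups
nCol = 0;
for k = v                  % iterates over COLUMNS: one pass, k is the whole 3x1 vector
  nCol = nCol + 1;
end
nRow = 0;
for k = v(:).'             % (:) forces a column, .' makes it a row: three passes, k = 1, 2, 3
  nRow = nRow + 1;
end
```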
However, I see I missed an important step in the cleanup from the anonymous-function version; the line
I=randperm(N);
needs to be
I=I(randperm(N));
to rearrange the group's subset of indices. The randperm(N) call simply generates the subscripts 1:N in random order, i.e. a vector of the right length; you still need the actual row subscripts from the matching operation that finds the rows in the given group.
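A toy illustration of the difference between randperm(N) and I(randperm(N)); the 5x2 matrix X here is made-up data with the label in column 2.

```matlab
X = [10 1; 20 2; 30 1; 40 2; 50 1];  % toy data: label in column 2
I = find(X(:,2) == 1);               % rows in group 1: [1; 3; 5]
N = numel(I);
p = randperm(N);                     % a permutation of 1:N, e.g. [2 3 1]; NOT rows of X
J = I(p);                            % the group's actual rows of X, in random order
disp(X(J,1).')                       % the values 10, 30, 50 in some random order
```

Using p directly would select rows 1 to 3 of X regardless of their label; indexing I with p keeps only the rows that belong to the group.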
With those corrections, it should work as is. Cleanest would be to copy and paste the actual code instead of retyping it; then you also get the indenting, comments, and all... :)
I did make the above correction in the Answer code; sorry I missed it the first time. Glad you reposted with another issue, so I had the chance to see it! :)
NotA_Programmer
on 10 May 2022