# How to break data in to groups using while loop?

2 views (last 30 days)

Show older comments

matrix = [1 50 60 70 50 40

2 NaN 10 20 10 10

3 NaN 20 NaN NaN NaN

1 NaN 60 30 40 50

2 10 20 10 20 NaN

1 30 20 40 NaN 50

2 NaN 50 50 NaN NaN]

The first column indicates to which group the rows belong to. "1, 2, 3" (row 1, 2, 3) are Group 1, "1, 2" (row 4, 5 ) are Group 2, the next "1, 2" ( row 6, 7) are Group 3.

I am trying to find the number of NaN in each group. The desired result would be:

ans =

5

2

4

Is it possible to do this using a while loop?

eg. In Column 1, while "the current element" is larger than "the element in the previous row", statement

I am sorry if this description is confusing.

##### 0 Comments

### Accepted Answer

dpb
on 31 Jan 2023

Edited: dpb
on 31 Jan 2023

Answer the Q? actually asked -- I'll post this as a separate answer for convenience but not remove the first.

OK, as @Stephen23 pointed out earlier, I didn't read the Q? carefully enough and just assumed the first column values were the group IDs. As the comment below says, I'd first build the grouping variable from the definition of the group; it'll bound to be useful later, anyway. (And, it illustrates another technique worth knowing...)

M= [1 50 60 70 50 40

2 NaN 10 20 10 10

3 NaN 20 NaN NaN NaN

1 NaN 60 30 40 50

2 10 20 10 20 NaN

1 30 20 40 NaN 50

2 NaN 50 50 NaN NaN];

g=nan(size(M,1),1); % first build a grouping variable of right size; missing values

g(M(:,1)==1)=unique(M(:,1)); % populate the first of each group location with its ID/group

g=fillmissing(g,'previous'); % and fill in the rest

nNaN=sum(groupsummary(M,g,@(x)sum(isnan(x))),2)

NOTA BENE: the whole M array can be passed here; the first column is immaterial to the count...

nNaN=splitapply(@(x)sum(isnan(x),'all'),M(:,2:end),g)

Oh. NOTA BENE SECOND:

The above needs modification to build the grouping variable in general -- it's probably just coincidence there are three groups and a maximum of three rows in any one group. So, unique isn't the generic answer for the RHS of the assignment of the initial group indices to the grouping variable -- it would be more like

g=nan(size(M,1),1); % first build a grouping variable of right size; missing values

ix=find(M(:,1)==1); % the locations of each group start

g(ix)=1:numel(ix); % populate the first of each group location with its ID/group

This will count the number of times the first index value occurs and where and generate that many groups irrespective of the number of records/group.

##### 8 Comments

dpb
on 3 Feb 2023

"...elements that are NaN in one column but are non-NaN in its previous column"

That's precisely the result of MM above; that the first column is in there is immaterial; it simply is a placeholder to match the size of M.

That's why the solution above uses either MM(:,2:end) or cleans out MM(:,1) when done; it is known that the first column isn't of interest. But, that doesn't affect the remaining columns.

Oh. Brain freeze -- it's the first two columns of the result that are invalid, not just the first. So, use

M= [1 50 60 70 50 40

2 NaN 10 20 10 10

3 NaN 20 NaN NaN NaN

1 NaN 60 30 40 50

2 10 20 10 20 NaN

1 30 20 40 NaN 50

2 NaN 50 50 NaN NaN];

MM=isfinite(circshift(M,1,2))&isnan(M);

MM(:,1:2)=false

### More Answers (6)

the cyclist
on 31 Jan 2023

Here is one way:

matrix = [1 50 60 70 50 40

2 NaN 10 20 10 10

3 NaN 20 NaN NaN NaN

1 NaN 60 30 40 50

2 10 20 10 20 NaN

1 30 20 40 NaN 50

2 NaN 50 50 NaN NaN];

g = findgroups(matrix(:,1));

out = splitapply(@(x)sum(isnan(x(:))),matrix(:,2:end),g)

##### 2 Comments

dpb
on 31 Jan 2023

the cyclist
on 31 Jan 2023

dpb
on 31 Jan 2023

Edited: dpb
on 31 Jan 2023

No explicit looping construct needed; let MATLAB do it for you...I shortened your variable name to M...

M= [1 50 60 70 50 40

2 NaN 10 20 10 10

3 NaN 20 NaN NaN NaN

1 NaN 60 30 40 50

2 10 20 10 20 NaN

1 30 20 40 NaN 50

2 NaN 50 50 NaN NaN];

nNaN=sum(groupsummary(M(:,2:end),M(:,1),@(x)sum(isnan(x))),2)

See <groupsummary> for the details on using it; in short, the first argument is the array to compute the summary over (in your case everything except the first column) while the second is the grouping variable (your first column). Then the function to apply can be a builtin such as 'mean' or as shown above, whatever you want it to be.

groupsummary applies the function to each column of the array by grouping variable so above we first add up how many True values are returned for each column by isnan, that returns an array of counts by column; The total for each group then is simply the sum of those by row (the second, optional argument, 2) applied to the resulting array.

ADDENDUM: (Answer the Q? actually asked)*

OK, as @Stephen23 points out below, I didn't read the Q? carefully enough and just assumed the first column values were the group IDs. As the comment below says, I'd first build the grouping variable from the definition of the group; it'll bound to be useful later, anyway. (And, it illustrates another technique worth knowing...)

g=nan(size(M,1),1); % first build a grouping variable of right size; missing values

g(M(:,1)==1)=unique(M(:,1)); % populate the first of each group location with its ID/group

g=fillmissing(g,'previous'); % and fill in the rest

nNaN=sum(groupsummary(M(:,2:end),g,@(x)sum(isnan(x))),2)

Same solution still works, just use the new grouping variable in place of the column values...

The above can/will work generically for other numbering sequences but one would need to save the result of unique and explicitly code for whatever was the indicator value for starting the new sequence other than 1.

*PS. I wondered why so many people were getting the wrong answer first time... :)

##### 2 Comments

dpb
on 31 Jan 2023

Benjamin Thompson
on 31 Jan 2023

Edited: Jan
on 1 Feb 2023

Even better, use the vectorization features in Matlab to count NaNs in each row all at once.

nanCounts = sum(isnan(matrix),1); % This will count NaNs in each row.

groupCounts = zeros(size(unique(matrix),1),); % Creates a zero vector whose length is equal to number of groups

for i = 1:length(groupCounts)

I = nanCounts(matrix(:,1) == i; % Creates an index vector for group i counts

groupCounts(i) = sum(I);

end

##### 0 Comments

Voss
on 31 Jan 2023

matrix = [1 50 60 70 50 40

2 NaN 10 20 10 10

3 NaN 20 NaN NaN NaN

1 NaN 60 30 40 50

2 10 20 10 20 NaN

1 30 20 40 NaN 50

2 NaN 50 50 NaN NaN];

group_start_idx = find(matrix(:,1) == 1);

group_end_idx = find(diff([matrix(:,1); 1]) <= 0);

n_groups = numel(group_start_idx);

n_nans = zeros(n_groups,1);

for ii = 1:n_groups

n_nans(ii) = nnz(isnan(matrix(group_start_idx(ii):group_end_idx(ii),:)));

end

disp(n_nans);

##### 0 Comments

Voss
on 31 Jan 2023

"Is it possible to do this using a while loop?"

matrix = [1 50 60 70 50 40

2 NaN 10 20 10 10

3 NaN 20 NaN NaN NaN

1 NaN 60 30 40 50

2 10 20 10 20 NaN

1 30 20 40 NaN 50

2 NaN 50 50 NaN NaN];

n_nans = [];

row = 1;

n_rows = size(matrix,1);

while row <= n_rows

group_start_row = row;

while row < n_rows && matrix(row+1,1) > matrix(row,1)

row = row+1;

end

group_end_row = row;

n_nans(end+1,1) = nnz(isnan(matrix(group_start_row:group_end_row,:)));

row = row+1;

end

disp(n_nans);

##### 0 Comments

Jan
on 1 Feb 2023

matrix = [1 50 60 70 50 40; ...

2 NaN 10 20 10 10; ...

3 NaN 20 NaN NaN NaN; ...

1 NaN 60 30 40 50; ...

2 10 20 10 20 NaN; ...

1 30 20 40 NaN 50; ...

2 NaN 50 50 NaN NaN];

group = cumsum([1; diff(matrix(:, 1)) < 0]); % [1 1 1 2 2 3 3].'

data = sum(isnan(matrix), 2);

result = accumarray(group, data)

##### 1 Comment

### See Also

### Categories

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!