Odd behavior of unstack in MATLAB R2025b

Hello,
When calling the unstack function with @sum as the AggregationFunction for numeric values in R2025b, I get the warning below, and the empty groups get a zero (from summing a 0-by-1 input):
Warning: When a group has no rows for a given value of the indicator variable, UNSTACK calls the supplied aggregation function with an input of size 0-by-1 instead of automatically filling the value.
Review the output to ensure desired result is obtained. This warning might be removed in a future release.
However the function documentation states the following:
Missing value of the appropriate data type, such as a NaN, NaT, missing string, or undefined categorical value.
This is the behavior I used to get in previous MATLAB versions (e.g. R2019b), where empty groups produced NaNs, and it is what I would have expected to happen here, since it's what the documentation suggests.
Any thoughts on why this might be happening, or on whether I can change the behavior so that empty groups get NaNs instead of the aggregation function being called on a 0-by-1 input?

Answers (2)

Matt J
Matt J on 20 Mar 2026 at 19:13
Edited: Matt J on 20 Mar 2026 at 19:21
The change occurred in R2020a and was documented,
The workaround would be to stipulate an aggregation function that returns NaN for empty inputs, e.g.,
Date = [repmat(datetime('2008-04-12'),6,1);...
repmat(datetime('2008-04-13'),5,1)];
Stock = categorical({'Stock1';'Stock2';'Stock1';'Stock2';...
'Stock2';'Stock2';'Stock1';'Stock2';...
'Stock2';'Stock1';'Stock2'} ,{'Stock1';'Stock2';'Stock3'});
Price = [60.35;27.68;64.19;25.47;28.11;27.98;...
63.85;27.55;26.43;65.73;25.94];
S = timetable(Date,Stock,Price);
S([8,9,11],:)=[]
S = 8×2 timetable
       Date        Stock     Price
    ___________    ______    _____
    12-Apr-2008    Stock1    60.35
    12-Apr-2008    Stock2    27.68
    12-Apr-2008    Stock1    64.19
    12-Apr-2008    Stock2    25.47
    12-Apr-2008    Stock2    28.11
    12-Apr-2008    Stock2    27.98
    13-Apr-2008    Stock1    63.85
    13-Apr-2008    Stock1    65.73
U = unstack(S,'Price','Stock', AggregationFunction=@(z) sum(z)/~isempty(z))
U = 2×2 timetable
       Date        Stock1    Stock2
    ___________    ______    ______
    12-Apr-2008    124.54    109.24
    13-Apr-2008    129.58       NaN
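To see why this anonymous function yields NaN for empty groups, here is a small sketch that evaluates it directly on a 0-by-1 input, which is the shape the warning says UNSTACK passes for an empty group. The variable names (agg, emptyGroup, fullGroup) are made up for illustration.

```matlab
% Illustrative name; mirrors the aggregation function used above.
agg = @(z) sum(z)/~isempty(z);

emptyGroup = zeros(0,1);      % 0-by-1 input, as passed for an empty group
fullGroup  = [27.68; 25.47];  % an ordinary non-empty group

rEmpty = agg(emptyGroup);     % sum is 0 and ~isempty is false, so 0/0 = NaN
rFull  = agg(fullGroup);      % divides by true (1), so the plain sum comes back

disp(isnan(rEmpty))
disp(rFull == sum(fullGroup))
```

The division trick relies on sum returning 0 for empty input. Applied to prod, which returns 1 for empty input, it would give 1/0 = Inf rather than NaN.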

20 Comments

dpb
dpb on 20 Mar 2026 at 20:11
Edited: dpb on 21 Mar 2026 at 0:39
Date = [repmat(datetime('2008-04-12'),6,1);...
repmat(datetime('2008-04-13'),5,1)];
Stock = categorical({'Stock1';'Stock2';'Stock1';'Stock2';...
'Stock2';'Stock2';'Stock1';'Stock2';...
'Stock2';'Stock1';'Stock2'} ,{'Stock1';'Stock2';'Stock3'});
Price = [60.35;27.68;64.19;25.47;28.11;27.98;...
63.85;27.55;26.43;65.73;25.94];
S = timetable(Date,Stock,Price);
S([8,9,11],:)=[];
U = unstack(S,'Price','Stock')
U = 2×2 timetable
       Date        Stock1    Stock2
    ___________    ______    ______
    12-Apr-2008    124.54    109.24
    13-Apr-2008    129.58       NaN
I was too lazy to try to build an example, @Matt J, so I'll tag on to yours. <grins>
As my earlier supposition noted, omitting the 'Aggregation' function so it defaults to @sum produces the same result.
Matt J
Matt J about 1 hour ago
Edited: Matt J 4 minutes ago
Yes, but that serves the present purposes only because we are aggregating with summation. For other kinds of aggregation, you will need to handle the empty group case explicitly, if you want it to generate a NaN.
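One way to handle the empty case explicitly is sketched below, assuming a NaN fill is what you want for numeric data. The helper name nanIfEmpty and the small table are hypothetical, made up for illustration. In a script, local functions must sit at the end of the file in releases before R2024a.

```matlab
% Small made-up data set: Stock B has no rows on Day 2.
Day   = [1;1;1;2];
Stock = categorical({'A';'B';'A';'A'});
Price = [2;3;4;5];
T = table(Day,Stock,Price);

% Wrap the aggregation so empty groups give NaN instead of prod's 1.
U = unstack(T,'Price','Stock', ...
    'AggregationFunction',@(z) nanIfEmpty(@prod,z));
disp(U)

function out = nanIfEmpty(fun,z)
% NANIFEMPTY  Hypothetical helper: apply FUN to Z unless Z is empty.
    if isempty(z)
        out = NaN;      % explicit fill for groups with no rows
    else
        out = fun(z);   % normal aggregation for non-empty groups
    end
end
```

UNSTACK may still issue its warning here, since the supplied function is still the one being called on the 0-by-1 input; the helper only controls what comes back.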
Yes, I am aware. I'm also not a fan of having different behaviors within unstack when calling the same aggregation function with two different syntaxes
Matt J
Matt J 39 minutes ago
Edited: Matt J 27 minutes ago
Well, sure, but the two aggregation functions are not the same. The default is not @sum. The default is,
AggregationFunction = @(z) sum(z)/~isempty(z)
Hence, my posted answer.
The behavior you are seeing when setting AggregationFunction=@sum is how sum([]) normally behaves in all other contexts in Matlab, which is why I think they changed the behavior of unstack() in R2020a. It was a bug fix.
Oh that changes things, but the documentation for unstack shows @sum as the default aggregation function for numeric data, not sum(z)/~isempty(z).
dpb
dpb about 2 hours ago
Edited: dpb 36 minutes ago
The doc took a little shortcut on the way to the forum.
In actuality, internally the code accumulates zeros and then fixes up the missing locations with the NaN fill value rather than defining a function.
Comments within unstack.m
% If no aggregation function was supplied, let ACCUMARRAY copy/sum data from
% tall to wide. If an aggregation function _was_ supplied, let ACCUMARRAY apply
% that to all non-empty bins. In either case, let ACCUMARRAY fill wide elements
% corresponding to empty bins with zeros, faster. Overwrite those elements with
% the correct fillVal below.
While it is default behavior and convention for sum([]) to return zero, it is creating something out of nothing in doing so and for many cases knowing there was not data in a given bin is crucial rather than that the data in those bins was identically (or at least summed to) zero.
I don't have a release prior to R2017b installed; in the olden days the initial developer was generally listed in the header comments, and very often Cleve was the original designer/implementer. Now there's just the copyright message, so I can't see that history. I'd certainly not classify the behavior as a bug; I think it unfortunate that it doesn't have the switch to return the missing value for any aggregation function, not just the defaulted sum.
Of course, for many common statistics it will do so anyway, because they will either be trying to divide by zero or will return the empty result instead of zero.
Date = [repmat(datetime('2008-04-12'),6,1);...
repmat(datetime('2008-04-13'),5,1)];
Stock = categorical({'Stock1';'Stock2';'Stock1';'Stock2';...
'Stock2';'Stock2';'Stock1';'Stock2';...
'Stock2';'Stock1';'Stock2'} ,{'Stock1';'Stock2';'Stock3'});
Price = [60.35;27.68;64.19;25.47;28.11;27.98;...
63.85;27.55;26.43;65.73;25.94];
S = timetable(Date,Stock,Price);
S([8,9,11],:)=[];
U = unstack(S,'Price','Stock','AggregationFunction',@std)
U = 2×2 timetable
       Date        Stock1    Stock2
    ___________    ______    ______
    12-Apr-2008    2.7153    1.2398
    13-Apr-2008    1.3294       NaN
U = unstack(S,'Price','Stock','AggregationFunction',@max)
U = 2×2 timetable
       Date        Stock1    Stock2
    ___________    ______    ______
    12-Apr-2008     64.19     28.11
    13-Apr-2008     65.73       NaN
In fact, it is @sum that is more or less the "odd man out" in its behavior relative to other functions....otomh I really can't think of another that doesn't just return empty.
So the reason I noticed this was because my team is migrating from R2019b to R2025b. In R2019b, using 'AggregationFunction', @sum in the unstack function syntax would return NaNs for bins with no data.
Matt J
Matt J 16 minutes ago
Edited: Matt J 14 minutes ago
In R2019b, using 'AggregationFunction', @sum in the unstack function syntax would return NaNs for bins with no data.
Yes, as I said, that was (probably) viewed as a bug and fixed in R2020a.
Matt J
Matt J 11 minutes ago
Edited: Matt J 8 minutes ago
I'd certainly not classify the behavior as a bug; I think it unfortunate that it doesn't have the switch to return the missing value for any aggregation function, not just the defaulted sum.
But it would be a bug (or at least a hazard) to leave it so that not specifying an AggregationFunction is the only syntax with that behavior.
I would consider it a feature, not a bug, and it is clearly documented behavior after the change. I didn't go back to the release notes, but I'm sure it would have been noted as a behavior change.
As the other examples I posted probably while you were responding show, it is really more of an aberration in the behavior of sum([]) to convert nothing into something than any other function although it is a general convention to do so, (apparently inherited from set theory?).
Here's a simple wrapper to achieve uniformly NaN-on-empty behavior:
S([8,9,11],:)=[]
S = 8×2 timetable
       Date        Stock     Price
    ___________    ______    _____
    12-Apr-2008    Stock1    60.35
    12-Apr-2008    Stock2    27.68
    12-Apr-2008    Stock1    64.19
    12-Apr-2008    Stock2    25.47
    12-Apr-2008    Stock2    28.11
    12-Apr-2008    Stock2    27.98
    13-Apr-2008    Stock1    63.85
    13-Apr-2008    Stock1    65.73
U = Unstack(S,'Price','Stock' ,AggregationFunction=@sum)
U = 2×2 timetable
       Date        Stock1    Stock2
    ___________    ______    ______
    12-Apr-2008    124.54    109.24
    13-Apr-2008    129.58       NaN
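function [varargout]=Unstack(S,vars,ivar, varargin)
    % Wrapper around unstack that multiplies each result by NaN when the group is empty.
    args=struct(varargin{:});
    if ~isfield(args,'AggregationFunction')
        args.AggregationFunction=@sum;   % fall back to summation if none is given
    end
    f=args.AggregationFunction;
    % numel(z)/numel(z) is 0/0 = NaN for empty z and 1 otherwise
    args.AggregationFunction=@(z)f(z)*(numel(z)/numel(z));
    varargin=namedargs2cell(args);
    [varargout{1:nargout}]=unstack(S,vars,ivar, varargin{:});
end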
In fact, it is @sum that is more or less the "odd man out" in its behavior relative to other functions....otomh I really can't think of another that doesn't just return empty.
It is not aberrant. Here are a few more:
prod([])
ans = 1
any([])
ans = logical
0
all([])
ans = logical
1
numel([])
ans = 0
nnz([])
ans = 0
I left out "computational" in the prior statement, presuming it would be implied/assumed. Of these, prod() is the only one I didn't think of that I would consider pertinent here; it would have been ideal for my prior test/illustration if I had thought of it at the time.
format bank, format compact
Date = [repmat(datetime('2008-04-12'),6,1);...
repmat(datetime('2008-04-13'),5,1)];
Stock = categorical({'Stock1';'Stock2';'Stock1';'Stock2';...
'Stock2';'Stock2';'Stock1';'Stock2';...
'Stock2';'Stock1';'Stock2'} ,{'Stock1';'Stock2';'Stock3'});
Price = [60.35;27.68;64.19;25.47;28.11;27.98;...
63.85;27.55;26.43;65.73;25.94];
S = timetable(Date,Stock,Price);
S([8,9,11],:)=[];
U = unstack(S,'Price','Stock','AggregationFunction',@prod)
Warning: When a group has no rows for a given value of the indicator variable, UNSTACK calls the supplied aggregation function with an input of size 0-by-1 instead of automatically filling the value. Review the output to ensure desired result is obtained. This warning might be removed in a future release.
U = 2×2 timetable
       Date        Stock1      Stock2
    ___________    _______    _________
    12-Apr-2008    3873.87    554502.60
    13-Apr-2008    4196.86         1.00
To me, the above is by far the more hazardous behavior, especially if MathWorks were ever to actually remove the warning and let the above result go silently. My opinion is still that no data is no data, and silently returning a finite value in its place is abadidea™.
Of the informational functions, numel() and nnz() are certainly expected; any() I grok, all() looks aberrant to me despite being documented as so given the description as "Determine if all array elements are nonzero or true". Certainly there are none of either in an empty set so not sure how it would have been determined to return true for it but false for any(). But, they didn't ask... <g>
Anyways, interesting sidebar and I'll add your Unstack to my Utilities directory of generally useful functions.
any() I grok, all() looks aberrant to me despite being documented...so not sure how it would have been determined to return true for it but false for any().
For arbitrary non-empty row vectors A and B, we have,
any([A,B]) = any(A) | any(B)
all([A,B]) = all(A) & all(B)
For this to generalize to the case when B = [], you need any([])=false and all([])=true.
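A quick way to convince yourself is to check both identities numerically with B = []. This is only a sketch; the test vectors and variable names are arbitrary choices for illustration.

```matlab
% Check the concatenation identities for a few row vectors, with B = [].
B = [];
tests = {[1 0 2], [1 2 3], [0 0 0]};
holds = false(numel(tests),2);   % column 1: any identity, column 2: all identity

for k = 1:numel(tests)
    A = tests{k};
    holds(k,1) = isequal(any([A,B]), any(A) | any(B));
    holds(k,2) = isequal(all([A,B]), all(A) & all(B));
end

disp(holds)
```

With the opposite convention (any([]) true, all([]) false), the any identity would fail for A = [0 0 0] and the all identity would fail for A = [1 2 3].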
A=[];
B=[];
A==B
ans = 0×0 empty logical array
isn't symmetric, though.
Nor is
A=rand(5,1);
[all(A) any(A)]
ans = 1×2 logical array
1 1
[all(B) any(B)]
ans = 1×2 logical array
1 0
That any() of the same array that is declared to be full doesn't have any elements is a contradiction.
Illustrates the edge cases are almost always ugly in one way or another. It could end up letting somebody not have to have special handling for a specific application I suppose even though it's nonsensical in isolation.
Of course, you could just as well have defined any([]) as true and all([]) as false and achieved the same result.
I don't have any better solution, just noting the oddities here.
A==B isn't symmetric though
I didn't understand that part, I'm afraid. Where is the asymmetry in A==B? Doesn't this prove that it is symmetric:
A=[]; B=[];
isequal(A==B,B==A)
ans = logical
1
That any() of the same array that is declared to be full doesn't have any elements is a contradiction.
all(B)=true means only that B contains no zeros, which is indeed the case for B=[];
any(B)=false means only that B contains no ones, which is also the case for B=[];
A=[]; B=[];
A==B
ans = 0×0 empty logical array
returning empty isn't symmetric with returning a value as the other examples did.
"all(B)=true means only that B contains no zeros"
The doc for all doesn't say that...it says "Determine if all array elements are nonzero or true". There are no true or nonzero elements in [].
The definition of all([]) being true is simply one of being defined that way in the fine print.
Matt J
Matt J about 1 hour ago
Edited: Matt J 43 minutes ago
The doc for all doesn't say that...it says "Determine if all array elements are nonzero or true". There are no true or nonzero elements in [].
The doc could be worded more transparently, but those two things aren't really logically contradictory. If you don't believe that all elements of the empty matrix are nonzero or true, then point out which element violates this (also known as a vacuous truth).
Another reason that all([])=true is an appropriate convention is to be consistent with De Morgan's Laws. I think you said you believe these are appropriate:
any([])=false
~[] = []
If so, then from De Morgan's Laws,
all([]) = ~any(~[]) = ~any([])=true
Finally, I would note that Matlab is not alone in adopting this convention. It is also pretty consistent with other programming languages.
Matt J is correct, these are not aberrations, these are mathematically consistent results (hint: identity element):
prod([])
ans = 1
sum([])
ans = 0
Let's look in more detail at ANY and ALL. Disregarding dimensions and the like, ALL must satisfy this equivalence:
all([A,B]) = all(A) && all(B)
In essence, this is what ALL means (for any division of some set into some arbitrary subsets A & B), so this equivalence must be true. This includes empty subsets, therefore specify B=[] and we get:
all([A,[]]) = all(A) && all([])
which can only be equivalent when ALL([])==TRUE. Similarly for ANY:
any([A,B]) = any(A) || any(B)
This includes empty subsets, therefore specify B=[] and we get:
any([A,[]]) = any(A) || any([])
which can only be equivalent when ANY([])==FALSE. De Morgan's law also constrains both values jointly:
all(A) = ~any(~A)
given ~[] = []. So the two outputs are not independent: they must be logical negations of each other! MATLAB implements the only mathematically consistent definitions of these operations.
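The identity-element and De Morgan arguments can be checked directly. The snippet below is a minimal sketch with arbitrarily chosen values; the variable names are illustrative only.

```matlab
A = [3 0 5];
E = [];                                      % empty operand

% Identity elements: appending an empty set must not change the result.
sumOK  = sum([A,E])  == sum(A)  + sum(E);    % sum([])  is 0
prodOK = prod([A,E]) == prod(A) * prod(E);   % prod([]) is 1

% De Morgan: all(X) equals ~any(~X), including for X = [].
deMorganOK = isequal(all(A), ~any(~A)) && isequal(all(E), ~any(~E));

disp([sumOK prodOK deMorganOK])
```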

dpb
dpb on 20 Mar 2026 at 19:13
Edited: dpb on 20 Mar 2026 at 20:19
What was the exact syntax you used? The way I interpret unstack doc for the aggregation function result with a missing indicator is that the aggregation function must be defaulted -- specifying the @sum isn't the same as defaulting even though it is the default function. See the followup comment to @Matt J's example.
For anything other than summing, however, you would have to create a function similar to his example.

1 Comment

Thank you all for the quick feedback! @dpb hit the nail on the head on my issue. I was adding 'AggregationFunction', @sum to the syntax of the function; if I leave that out, it defaults to summing and I get the desired behavior, i.e., missing groups show up as NaN instead of the function being called on a 0-by-1 input.

Release

R2025b

Asked:

on 20 Mar 2026 at 18:23

Edited:

39 minutes ago
