
alternative rounding method when using histcounts

My understanding is that histcounts uses a "round up" method when a value lies exactly on a bin edge.
Is it possible to use a different rounding strategy, e.g. banker's rounding (Gaussian rounding), where 4.5 would round to 4 rather than 5?
There doesn't appear to be an optional parameter for this that can be passed to histcounts.
Guillaume on 14 Apr 2020
I'm not sure what you mean. histcounts doesn't round anything. Can you describe your problem in more detail?
Oliver Warlow on 14 Apr 2020
Hi Guillaume,
histcounts makes a rounding decision when a value is exactly on a bin edge.
E.g. if my bin edges are 0, 0.5, 1, 1.5, 2, 2.5, 3,
then with MATLAB's standard "round up" behaviour a value of exactly 2.5 is placed in the bin with edges 2.5 and 3.
However, with banker's/Gaussian rounding a value of exactly 2.5 would be rounded toward the nearest even integer, which would put it in the bin with edges 2 and 2.5.
Hope that makes sense.


Accepted Answer

Guillaume on 14 Apr 2020
"histcounts makes a rounding decision when a value is exactly on a bin edge."
No, you misunderstand how histcounts works. As I said it doesn't do any rounding.
As documented, the value X(i) is in the kth bin if edges(k) ≤ X(i) < edges(k+1), except for the last bin, where the right edge is also part of the bin. So yes, if you have an edge at 2.5, the value 2.5 will be part of the bin that starts at 2.5; no rounding is involved.
Now, I thought there was a way to reverse this so that the right edge, rather than the left edge, is included in bin k, but I was surprised to find that this option is only available for discretize, which is very similar in some ways.
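A minimal sketch of the difference between the two edge policies in discretize, using the edges from the question. The sample values in x are hypothetical and chosen so that each one sits exactly on an edge.

```matlab
edges = [0, 0.5, 1, 1.5, 2, 2.5, 3];
x = [2.5, 0, 3];   % hypothetical values, each exactly on a bin edge

% Default policy: each bin includes its left edge (the last bin includes both edges)
binLeft = discretize(x, edges);                             % [6 1 6]

% Reversed policy: each bin includes its right edge (the first bin includes both edges)
binRight = discretize(x, edges, 'IncludedEdge', 'right');   % [5 1 6]
```

Only the interior value 2.5 changes bin; the values on the outermost edges stay in the first and last bins under both policies.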
So, if you want 2.5 to be included in the [2, 2.5] bin, you have two options:
1. Change your bin definition so that the right edge is not 2.5 but the next representable number up, which is 2.5 + eps(2.5):
edges = [0, 0.5, 1, 1.5, 2, 2.5, 3];
edges(2:end-1) = edges(2:end-1) + eps(edges(2:end-1)); % increase the right edge of each bin (except the last) to the next representable number
% use histcounts as normal
2. Do an indirect trip through discretize:
edges = [0, 0.5, 1, 1.5, 2, 2.5, 3];
bin = discretize(yourvector, edges, 'IncludedEdge', 'right'); % bin index of each value, with right edges included
newedges = 1:numel(edges); % unit-width edges, one bin per bin index
result = histcounts(bin, newedges, ..your_histcounts_options); % count the bin indices; works as long as 'Normalization' doesn't rely on bin width (i.e. not 'pdf' or 'countdensity')
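As a quick check, the two options can be run side by side. This is only a sketch: the data vector x is hypothetical, and the variable names shifted, n1 and n2 are introduced here for illustration.

```matlab
edges = [0, 0.5, 1, 1.5, 2, 2.5, 3];
x = [0.2, 0.5, 2.5, 2.5, 2.7, 3];   % hypothetical sample data

% Option 1: nudge every interior edge up to the next representable number
shifted = edges;
shifted(2:end-1) = shifted(2:end-1) + eps(shifted(2:end-1));
n1 = histcounts(x, shifted);

% Option 2: get right-inclusive bin indices, then count the indices
bin = discretize(x, edges, 'IncludedEdge', 'right');
n2 = histcounts(bin, 1:numel(edges));

% Both give [2 0 0 0 2 2]: 0.5 is counted in [0, 0.5] and both 2.5 values in (2, 2.5]
```

For this data the two approaches agree. Option 1 changes the edges that histcounts returns or that a plot would show, while option 2 keeps the original edges but hands histcounts bin indices rather than data.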
Guillaume on 14 Apr 2020
"But in the case of an ascending set of bin edges it is making decisions consistent with a policy of rounding"
Not really. The decision is simple: is the number greater than or equal to the left edge but strictly smaller than the right edge? If so, it belongs to this bin. That is all.
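The comparison described above can be written out by hand. This is a sketch of the documented rule, not the actual histcounts implementation; the data in x and the names counts and inBin are hypothetical.

```matlab
edges = [0, 0.5, 1, 1.5, 2, 2.5, 3];
x = [0, 0.7, 2.5, 3];   % hypothetical sample data
nBins = numel(edges) - 1;
counts = zeros(1, nBins);
for k = 1:nBins
    % left edge included, right edge excluded
    inBin = x >= edges(k) & x < edges(k+1);
    if k == nBins
        % the last bin also keeps its right edge
        inBin = inBin | x == edges(end);
    end
    counts(k) = nnz(inBin);
end
% counts is [1 1 0 0 0 2], the same result as histcounts(x, edges)
```

There is no rounding step anywhere in the loop: 2.5 ends up in the last bin purely because 2.5 >= edges(6).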
From your description, it sounds to me that you've got a more fluid definition of bin edges, and I think you'd get the behaviour you want if you define the bin edges in a way that matches your definition. This is what I've tried to do with my option 1. If you explain what your pseudo-Gaussian rounding actually means, we can probably come up with the correct edges.
Oliver Warlow on 15 Apr 2020
Hi Guillaume,
Yes, as I say, there isn't any rounding happening, but what I want to do is make a decision on which bin to put an entry into based on a policy similar to what is used in 'Gaussian rounding', i.e. it depends more on the integer component of the bin centre than on the value of the bin edge. You are correct that this is a special case and is not how histcounts fundamentally works, so the best solution will be the workaround you and Steven have suggested.
If you are interested in why you would use Gaussian rounding, there is info here (it is also referred to as "half to even"). For experimental measurements where the results are affected by precision error, you will always skew your data if you use a purely 'left' or 'right' policy; you can observe this by running discretize on a random vector. Gaussian/banker's/round-to-even will skew the data towards bins whose centre has an even integer component. This is obviously not ideal, but in some cases it is preferred to a left/right skew.


More Answers (1)

Steven Lord on 14 Apr 2020
If a value in your data exactly matches one of the elements of the edges vector, that value is counted in the right-hand bin of the two (unless it matches the last element of the edges vector, in which case it's in the last bin). From the histcounts documentation page:
"[N,edges] = histcounts(X,edges) sorts X into bins with the bin edges specified by the vector, edges. The value X(i) is in the kth bin if edges(k) ≤ X(i) < edges(k+1). The last bin also includes the right bin edge, so that it contains X(i) if edges(end-1) ≤ X(i) ≤ edges(end)."
Bins other than the last contain their left edge but not their right, and the last bin contains both edges.
There's no option to change which edge each bin contains (to make the first bin contain both its edges and make all others contain their right edge but not their left.) The discretize function has an option that does this, so asking for a similar option in histcounts and related functions seems to me like a reasonable enhancement request for you to file with Technical Support.
Guillaume on 14 Apr 2020
"seems to me like a reasonable enhancement request for you to file with Technical Support"
FWIW, I've already submitted that as an enhancement request earlier today. It wouldn't hurt for Oliver to request it as well. The more people ask for it, the better the chance it gets implemented.
The rounding business, I'm not so sure it would get implemented though. As I've said already, there's no rounding happening at all. Just plain comparisons to bin edges.
Steven Lord on 14 Apr 2020
So instead of 2.5 being in your edges list, replace it with 2.5+eps(2.5) so the edge is just barely greater than 2.5.
edges = 0:0.5:4;
values = edges;
halfToPad = mod(edges, 1) == 0.5 & mod(ceil(edges), 2) == 1;
edges(halfToPad) = edges(halfToPad) + eps(edges(halfToPad))
h = histogram(values, edges);
Bin [0, 0.5+) is higher because the value 0.5 in values is in that bin rather than in bin [0.5+, 1).
Bin [2, 2.5+) is higher because the value 2.5 in values is in that bin rather than in bin [2.5+, 3).
Bin [3.5, 4] is higher because as the last bin it contains both its edges, so both 3.5 and 4 in values are in that bin.
And if you look closely, two elements of edges (0.5 and 2.5) are just barely greater than the corresponding elements in values.
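To read the bin heights as numbers rather than from the plot, the same edges can be passed to histcounts, which applies the same binning rule as histogram. A sketch; the names N and padded are introduced here for illustration.

```matlab
edges = 0:0.5:4;
values = edges;

% Pad only the half-integer edges whose upper neighbour is odd (0.5 and 2.5),
% so that a value on such an edge falls into the bin below it
halfToPad = mod(edges, 1) == 0.5 & mod(ceil(edges), 2) == 1;
edges(halfToPad) = edges(halfToPad) + eps(edges(halfToPad));

N = histcounts(values, edges);   % [2 0 1 1 2 0 1 2]
padded = values(halfToPad);      % [0.5 2.5], the original values of the nudged edges
```

The two empty bins are [0.5+, 1) and [2.5+, 3), which lost their only value to the bin on their left.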
