## Calculate number of bins for histogram

version 1.1.0.0 (4.16 KB) by Richie Cotton

Automatically calculates the 'best' number of bins for a histogram.

Updated 24 Oct 2008

Two files are included:
CALCNBINS, which calculates the "ideal" number of bins to use in a histogram, using any of three methods: Freedman-Diaconis', Scott's, or Sturges'.

HISTX, a wrapper for MATLAB's own histogram function HIST that uses CALCNBINS to choose the number of bins if none is provided.

Examples:
y = randn(10000,1);
nb = calcnbins(y, 'all')
% nb =
% fd: 57
% scott: 44
% sturges: 15
calcnbins(y) %Uses the middle value from the above
% ans =
% 44
calcnbins(y, 'fd') % Choose your method
% ans =
% 57
histx(y) %Plots a histogram using middle method
histx(y, 'all') %Plots 3 histograms, using each method
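
For readers who want to see roughly what the three rules compute, the following is a minimal sketch based on the textbook forms of Freedman-Diaconis', Scott's and Sturges' rules. It is not the code from CALCNBINS: the quartile estimate, the constant 3.5 in Scott's rule (often quoted as 3.49), the reading of "middle value" as the median of the three counts, and all variable names are assumptions made for illustration.

```matlab
% Example data (an assumption for illustration): evenly spaced values on [0, 1].
x  = linspace(0, 1, 1000)';
n  = numel(x);
xs = sort(x);

% Interquartile range from the medians of the lower and upper halves,
% which avoids any toolbox dependency.
q1      = median(xs(1:floor(n/2)));
q3      = median(xs(ceil(n/2)+1:end));
iqrange = q3 - q1;

% Textbook bin widths.
h_fd    = 2 * iqrange * n^(-1/3);     % Freedman-Diaconis
h_scott = 3.5 * std(x) * n^(-1/3);    % Scott

% Convert widths to bin counts; Sturges depends on n alone.
datarange  = max(x) - min(x);
nb.fd      = ceil(datarange / h_fd);
nb.scott   = ceil(datarange / h_scott);
nb.sturges = ceil(log2(n) + 1);

% "Middle" of the three counts (assumed here to mean the median).
nb_middle = median([nb.fd, nb.scott, nb.sturges]);
disp(nb)
```

Because the first two counts are the data range divided by a bin width and then rounded up, they depend on both the spread and the observed range of the data, whereas the Sturges count depends only on the sample size.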

### Cite As

Richie Cotton (2020). Calculate number of bins for histogram (https://www.mathworks.com/matlabcentral/fileexchange/21033-calculate-number-of-bins-for-histogram), MATLAB Central File Exchange.

Richie Cotton

John,

Thanks for the feedback, it is much appreciated.

The Sturges method returns the value 18 in each of your examples because it is based solely on the length of the input data. I've updated the documentation to explain a little about the methods, and included some references for those wishing to find out more.
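
As a rough check of that explanation, the textbook form of Sturges' rule, ceil(log2(n) + 1), can be evaluated for the sample sizes that appear on this page. This is a sketch under the assumption that CALCNBINS uses this common form; the variable names are illustrative.

```matlab
n = [1000 10000 100000 1000000];   % sample sizes used on this page
k = ceil(log2(n) + 1);             % textbook Sturges rule: depends on n only

for i = 1:numel(n)
    fprintf('n = %7d  ->  %d bins\n', n(i), k(i));
end
```

This gives 11, 15, 18 and 21 bins respectively, which matches the Sturges counts reported in the examples and comments here, regardless of how the data are distributed.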

Also newly included is the histx function which acts as a wrapper for hist, calling calcnbins when no breaks are specified.
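
The fallback logic of such a wrapper can be sketched as below. This is not the HISTX source; it is a script-style illustration that assumes an empty `nbins` means "not specified", and it falls back to the textbook Sturges rule if CALCNBINS is not on the path.

```matlab
y     = randn(1000, 1);   % example data
nbins = [];               % empty means "caller did not specify bins" (assumption)

if isempty(nbins)
    if exist('calcnbins', 'file') == 2
        nbins = calcnbins(y);                % the File Exchange function, if installed
    else
        nbins = ceil(log2(numel(y)) + 1);    % stand-in: textbook Sturges rule
    end
end

% Called with output arguments, hist returns the counts instead of plotting.
[counts, centers] = hist(y, nbins);
```

Calling `hist(y, nbins)` without output arguments would draw the histogram instead.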

John D'Errico

Getting better, but there is still apparently an interesting feature.

n = calcnbins(randn(100000,1),'all')
n =
fd: 152
scott: 117
sturges: 18

n = calcnbins(randn(100000,1),'all')
n =
fd: 149
scott: 115
sturges: 18

n = calcnbins(rand(100000,1),'all')
n =
fd: 47
scott: 46
sturges: 18

n = calcnbins(rand(100000,1).^8,'all')
n =
fd: 233
scott: 62
sturges: 18

n = calcnbins(rand(100000,1).^.5,'all')
n =
fd: 64
scott: 57
sturges: 18

n = calcnbins(rand(100000,1),'all')
n =
fd: 47
scott: 46
sturges: 18

Note that all cases seem to generate exactly 18 bins for the last method, although for smaller samples this is not true. Is 18 the largest number of bins that the Sturges method will return? (No.)

n = calcnbins(randn(1000,1),'all')
n =
fd: 26
scott: 21
sturges: 11

n = calcnbins(rand(1000000,1),'all')
n =
fd: 100
scott: 99
sturges: 21

It might be useful to provide additional information about the methods: why use one over another, and when is one better than the others? For example, if a set of data tends to have outliers, is one method more intelligent in its choice? At the very least, provide a reference that would help a user to understand the methods.
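
On the outlier question, a small experiment with the textbook bin-width formulas gives some intuition. The sketch below is an illustration of a general property (the interquartile range is less sensitive to extreme values than the standard deviation), not a statement about how CALCNBINS itself behaves; the helper names and the test data are assumptions.

```matlab
% Helper handles (names are illustrative, not part of CALCNBINS).
lowerhalf = @(xs) xs(1:floor(numel(xs)/2));
upperhalf = @(xs) xs(ceil(numel(xs)/2)+1:end);
iqrange   = @(x) median(upperhalf(sort(x))) - median(lowerhalf(sort(x)));

width_fd    = @(x) 2 * iqrange(x) * numel(x)^(-1/3);   % Freedman-Diaconis
width_scott = @(x) 3.5 * std(x) * numel(x)^(-1/3);     % Scott

clean    = linspace(0, 1, 1000)';   % well-behaved data
outlying = [clean; 100];            % same data plus one extreme value

fprintf('FD width:    %.4f -> %.4f\n', width_fd(clean), width_fd(outlying));
fprintf('Scott width: %.4f -> %.4f\n', width_scott(clean), width_scott(outlying));
```

The Freedman-Diaconis width barely moves while the Scott width grows by roughly an order of magnitude. Since the bin count is the data range divided by the width, the outlier also stretches the range, so the Freedman-Diaconis count rises sharply in this situation.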

Overall, this is a nice little tool. Were I the author, I might even be tempted to add a new version of hist that would call this code first when the number of bins was not specified; otherwise, it would just call the default hist.

Richie Cotton

The help is now fixed, as is the bug with the Scott method (simply a case of missing brackets), and the dependency on the stats toolbox has been removed.

John D'Errico

Pretty good, with good help in general. Error checks, with a default for the method. The method is tolerant of lower case, etc. All well done.

I'd suggest a couple of minor changes. There is no complete H1 line as the first comment line. The first two lines of the help were:

% NBINS = CALCNBINS(X, METHOD) calculates the "ideal" number of bins to use
% in a histogram, using a choice of methods.

Since the utility of this code is to work with a histogram, you should have that in the H1 line. So a simple re-wording of those first two lines might be:

% Calculate the "ideal" number of bins to use in a histogram, using a choice of methods.
% NBINS = CALCNBINS(X, METHOD)

I did find one interesting result, that seemed less than sterling.

x = rand(1,100000);
n = calcnbins(x,'sc')
n =
1

n = calcnbins(sqrt(x),'sc')
n =
1

n = calcnbins(x.^10,'sc')
n =
1

Surely all of the above examples would not be best served by only one bin?
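
The single-bin results above are what an operator-precedence slip would produce, and the author's reply further up attributes the Scott bug to missing brackets. The sketch below is a guess at that kind of mistake, not the original CALCNBINS code (which is not shown on this page); it uses the textbook Scott width with the constant 3.5 and deterministic data in place of rand.

```matlab
x = linspace(0, 1, 100000);   % deterministic stand-in for rand(1,100000)
n = numel(x);
s = std(x);
r = max(x) - min(x);

% Without brackets, MATLAB evaluates left to right: ((r / 3.5) * s) * n^(-1/3).
nb_unbracketed = ceil(r / 3.5 * s * n^(-1/3));

% With brackets, the range is divided by the whole bin width.
nb_bracketed = ceil(r / (3.5 * s * n^(-1/3)));

fprintf('unbracketed: %d, bracketed: %d\n', nb_unbracketed, nb_bracketed);
```

The unbracketed expression evaluates to a tiny positive number that rounds up to 1, while the bracketed version gives a count in line with the Scott values reported for uniform data elsewhere in these comments.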