Meaning of box plot notches

Yolande Serra on 25 Jul 2021
Commented: the cyclist on 30 Jul 2021
I am confused by the explanation of the notches on box plots. The MATLAB help says the notch limits are at q ± 1.57*iqr/sqrt(n), where q is the median, iqr is the interquartile range, and n is the number of observations. It then states that this is equivalent to the 5% confidence limits on the median. From what I learned in statistics, a multiplier of 1.96 would give the 95% confidence limits on the median, so I'm not sure 1) why MATLAB chose the 1.57 multiplier, or 2) where the 5% confidence limit result comes from with this multiplier. It looks more like 67% confidence limits, or the equivalent of 1 sigma for a normal distribution. Is there a way to get box plots to draw notches at the 95% confidence limits of the median (i.e., using the 1.96 multiplier)?
Thanks for clarifying.

Answers (1)

the cyclist on 26 Jul 2021
The value 1.57 is not something that "MATLAB chose", but rather comes directly from the original research paper that introduced the notched box plot. There is a nice explanation in this CrossValidated answer.
If you want to change the value -- which I would only do if you develop your own rigorous theory of confidence intervals of medians -- you could copy the boxplot.m file to your own directory, and edit the value there.
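For a quick numerical comparison of the two multipliers without touching boxplot.m, here is a minimal sketch in Python (not MATLAB) of the formula quoted in the question, q ± C*iqr/sqrt(n). The function name notch_limits and the sample data are made up for illustration, and Python's "inclusive" quartile rule is an assumption that may not match how boxplot computes quartiles, so the numbers need not agree exactly with a MATLAB plot.

```python
import math
import statistics


def notch_limits(values, multiplier=1.57):
    """Return (lower, upper) = median -/+ multiplier * IQR / sqrt(n).

    Hypothetical helper: it only evaluates the formula quoted in the
    question and is not taken from boxplot.m.
    """
    n = len(values)
    # Quartiles with linear interpolation over the sorted data
    # (assumption: MATLAB's quartile convention may differ slightly).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = multiplier * (q3 - q1) / math.sqrt(n)
    return median - half_width, median + half_width


# Made-up sample, for illustration only.
sample = [2.1, 2.9, 3.4, 3.8, 4.0, 4.4, 4.9, 5.3, 6.0]

default_notch = notch_limits(sample)       # documented multiplier, 1.57
asked_notch = notch_limits(sample, 1.96)   # multiplier asked about in the question

print("1.57:", default_notch)
print("1.96:", asked_notch)
```

With this sample the median is 4.0 and the interquartile range is 1.5, so the limits are roughly 4.0 ± 0.785 for a multiplier of 1.57 and 4.0 ± 0.98 for 1.96.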
  2 Comments
Yolande Serra on 26 Jul 2021
The CrossValidated answer was very helpful, thanks!
Regarding the confidence levels, the link you provide indicates that the 1.57 multiplier is an estimate of the 95% confidence levels. The medians that lie outside of the notched regions are different at the 95% confidence level. The More About section of the boxplot help page states, "The width of a notch is computed so that boxes whose notches do not overlap have different medians at the 5% significance level." However, one of the examples on the same page states, "The boxplot shows that the difference between the medians of the two groups is approximately 1. Since the notches in the box plot do not overlap, you can conclude, with 95% confidence, that the true medians do differ." This is confusing. I would say the medians differ at the 95% significance level, meaning you are fairly certain they are different if they lie outside of the notches.
the cyclist on 30 Jul 2021
I think the language in the original paper is not crystal clear, but here is my interpretation (which aligns with what is stated in the MATLAB documentation). The paper states ...
"Should one desire a notch indicating a 95 percent confidence interval about each median, C = 1.96 would be used." [I added the emphasis.]
To me, this sentence makes it very clear that the notch around a single box is not the 95% confidence interval of that individual median (because C = 1.96 was not chosen).
The next sentence of the paper is ...
" ... a form of 'gap gauge' ... at the 95th percent level was desired."
The exact meaning of "gap gauge" is not perfectly clear to me, but I interpret that to mean that if the notches do not overlap, then the medians are significantly different. This is consistent with both the sentences from the documentation that you quoted in your comment.
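One way to make the "gap gauge" reading concrete is the sketch below. It assumes the usual large-sample normal approximation for the median and is my reconstruction of the reasoning, not a quotation from the paper. The symbols m_i (median of group i), s_i (its approximate standard error), and C (notch half-width in units of s_i) are introduced here for illustration.

```latex
% Approximate standard error of median i for roughly normal data
% (sigma ~ IQR/1.35 and SE(median) ~ 1.25 sigma / sqrt(n)):
s_i \approx \frac{1.25\,\mathrm{IQR}_i}{1.35\,\sqrt{n_i}}

% Notches of half-width C s_i fail to overlap when
|m_1 - m_2| > C\,(s_1 + s_2)

% A two-sided 5% test of equal medians rejects when
|m_1 - m_2| > 1.96\,\sqrt{s_1^2 + s_2^2}

% Matching the two conditions:
%   s_1 = s_2    gives  C = 1.96/\sqrt{2} \approx 1.39
%   s_2 \ll s_1  gives  C \to 1.96
% A compromise value C = 1.7 then gives the multiplier on IQR/\sqrt{n}:
1.7 \times \frac{1.25}{1.35} \approx 1.57
```

On this reading, 1.57 is tuned for comparing two notches at the 5% level. Under the same assumptions, C = 1.96 for a single median would correspond to a multiplier of about 1.96 × 1.25/1.35 ≈ 1.81 on iqr/sqrt(n), rather than 1.96 itself.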

Release

R2021a
