Violinplot extending beyond data range

Angie on 28 Nov 2024 at 13:56
Commented: William Rose on 3 Dec 2024 at 15:28
Hello everyone,
I’m using the violinplot function in MATLAB to create violin plots for some datasets. I am specifying the position and the data as follows:
violinplot(3, data2(5:end));
However, I’ve encountered an issue: the violin plot extends to negative values even though all my data values are positive. For another dataset I saw the same problem: the violin covers values that are negative or larger than the maximum value in my data.
I’ve read that this might be caused by the kernel density estimation (KDE) that violinplot uses to calculate and visualize the data's probability density. KDE smooths the data distribution, and the smoothed density can be nonzero at points outside the actual range of the data.
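A short check of that explanation, as a sketch assuming R2023b or later with the Statistics and Machine Learning Toolbox (which provides kde). It evaluates the default KDE at a few chosen points, some of them outside the data range; the points in pts are arbitrary illustration values. Because the default normal kernel is positive everywhere, the estimated density is nonzero even where there are no data.

```matlab
% Data strictly inside (0,1)
ydata = rand(100,1);
% Arbitrary evaluation points: two below the data, one inside, two above
pts = [-0.2 -0.1 0.5 1.1 1.2];
% Default settings: normal kernel, unbounded support
[f,xf] = kde(ydata, EvaluationPoints=pts);
disp(table(xf(:), f(:), VariableNames=["x","density"]))
% Every density value is > 0, including at x = -0.2 and x = 1.2.
% A violin drawn from this density therefore has tails past min(ydata) and max(ydata).
```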
I’m unsure how to resolve this issue and would greatly appreciate any advice or suggestions.
Thank you!
Angie

Accepted Answer

William Rose on 28 Nov 2024 at 15:56
Edited: William Rose on 28 Nov 2024 at 16:01
[Edit: added ylim() so that all three plots have the same y-axis range.]
You can vary the bandwidth, the kernel function, or both. In the examples below, the data are uniformly distributed on (0,1), which is something of a worst case if you don't want the violin to extend to negative values. The violins still extend beyond the data in these examples, but the options control how far they extend. Experiment to see whether you like the results. Depending on your data, you may not be able to keep the violin from going negative.
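ydata = rand(100,1);   % 100 samples, uniform on (0,1)
figure;
% Default violin: density computed internally with default KDE settings
subplot(131)
violinplot(ydata);
title('Default Violinplot'); ylim([-.5,1.5])
% Narrower bandwidth: less smoothing, so shorter tails beyond the data
[f1,xf1] = kde(ydata,Bandwidth=0.05);
subplot(132)
violinplot(EvaluationPoints=xf1,DensityValues=f1)
title('Bandwidth=0.05'); ylim([-.5,1.5])
% Box kernel: each data point contributes only within a finite window
[f2,xf2] = kde(ydata,Kernel="box");
subplot(133)
violinplot(EvaluationPoints=xf2,DensityValues=f2)
title('Box Kernel'); ylim([-.5,1.5])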
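Another option, not used in the examples above, is to tell kde that the density is bounded. The sketch below assumes kde's Support and BoundaryCorrection name-value arguments as documented for recent releases (check doc kde for yours); it fixes the support to [0 1] for the same uniform data, once with the default boundary correction and once with reflection.

```matlab
ydata = rand(100,1);   % 100 samples, uniform on (0,1)
figure;
% Bounded support: no density is assigned outside [0,1]
[f3,xf3] = kde(ydata, Support=[0 1]);
subplot(121)
violinplot(EvaluationPoints=xf3, DensityValues=f3)
title('Support=[0 1]'); ylim([-.5,1.5])
% Same bounds, but with reflection instead of the default boundary correction
[f4,xf4] = kde(ydata, Support=[0 1], BoundaryCorrection="reflection");
subplot(122)
violinplot(EvaluationPoints=xf4, DensityValues=f4)
title('Support=[0 1], reflection'); ylim([-.5,1.5])
```

For data that are only known to be positive, Support="positive" is the analogous choice. The bounds must enclose all of the data, so [0 1] here relies on rand returning values in (0,1).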
Angie on 3 Dec 2024 at 12:38
Thank you very much! Since a pdf estimated with a kernel distribution extends beyond the most extreme data points in my dataset, which I want to avoid, I was considering fitting other distributions instead. Your examples have been very helpful.
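One way to act on that idea, sketched under stated assumptions: fit a parametric distribution with fitdist and pass its pdf to violinplot, evaluating it only between the smallest and largest data values so the violin stops exactly at the data extremes. The beta distribution is an illustrative choice that assumes data strictly inside (0,1); for data that are only positive, a distribution such as 'Gamma' or 'Lognormal' would be the analogous choice.

```matlab
ydata = rand(100,1);                         % assumed: values strictly in (0,1)
pd = fitdist(ydata, 'Beta');                 % maximum-likelihood beta fit
% Evaluate only over the observed range, so nothing is drawn past the data
xf = linspace(min(ydata), max(ydata), 200);
f = pdf(pd, xf);
figure;
violinplot(EvaluationPoints=xf, DensityValues=f)
title('Beta fit, data range only'); ylim([-.5,1.5])
```

The same range-limited EvaluationPoints can also be passed to kde if you prefer to keep a kernel estimate but cut it off at the data extremes.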


Release

R2024b
