Tiling stacks of boxplots. Each stack contains 5 boxplots
I have generated five boxplots stacked side by side using subplot(1,5,1) ... subplot(1,5,5). I'm attaching a figure for easy reference. I'd like to tile another figure under this one, containing five similar stacked subplots. I've tried various solutions to no avail. The closest to my requirements is the boxplotGroup function, but I'm still unable to get what I want.
I'd be grateful for any help.
the cyclist
on 30 Nov 2024
Would subplot(2,5,1) ... subplot(2,5,10) do what you want?
Also, I would highly recommend using tiledlayout over subplot. It takes a bit of getting used to, if you have been using subplot for a while, but in the long run it is much better.
That would just double the number of rows in the subplot (or tiledlayout) arrangement...
y=randn(100,5).*[3 2 3 4 3]+[30 20 25 45 55]; y=[y fliplr(y)]; y=[y y]; % just some dummy data of same size
g={'labile','stable'};
M=2;
W=width(y);
N=W/numel(g)/M;
for i=1:2:W
j=floor(i/2)+1;
subplot(M,N,j)
boxplot(y(:,[i:i+1]),g)
end
The funky label orientation only shows up on this platform, the axes are all consistent on desktop...
Using tiledlayout instead of subplot would give you some additional features, and boxchart might be worth looking into...
If you're looking for a more sophisticated look, you could probably put each pair in a panel that would separate the pairs visually...I've never messed with them, so will leave as "exercise for Student"...
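For what it's worth, here is a rough sketch of the panel idea, not something from the thread itself. It assumes a 2x5 grid of uipanel containers, each holding one axes with a labile/stable pair; the dummy data, panel titles and positions are made up for illustration.

```matlab
% Sketch only: panel layout, titles and dummy data are assumptions.
rng(0)                                        % reproducible dummy data
y = randn(100,20).*3 + 30;                    % 10 pairs of columns
g = {'labile','stable'};
M = 2;  N = 5;                                % rows and columns of panels
fig = figure('Visible','off');                % set to 'on' to see the result
ax = gobjects(M*N,1);
for k = 1:M*N
    r = ceil(k/N);  c = k - (r-1)*N;          % row and column of the kth panel
    pos = [(c-1)/N, 1 - r/M, 1/N, 1/M];       % normalized [left bottom width height]
    p = uipanel(fig,'Position',pos,'Title',sprintf('Pair %d',k));
    ax(k) = axes(p);                          % axes parented to the panel
    boxplot(ax(k), y(:,2*k-1:2*k), g)         % one labile/stable pair per panel
end
```

Each panel draws its own border and title, which is what separates the pairs visually; whether that looks better than plain tiles is a matter of taste.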
y=randn(100,5).*[3 2 3 4 3]+[30 20 25 45 55]; y=[y fliplr(y)]; y=[y y]; % just some dummy data of same size
g={'labile','stable'};
M=2;
W=width(y);
N=W/numel(g)/M;
tiledlayout(M,N)
for i=1:2:W
j=floor(i/2)+1;
%subplot(M,N,j)
nexttile
boxplot(y(:,[i:i+1]),g)
end
Well, that's kinda' rude...dunno why that might be....looks ok here
George
on 1 Dec 2024
As my example shows, just double the number of rows in the subplot() tiling from 1x5 to 2x5 and use the same logic as you already have--except instead of going from 1:5 you go from 1:10 for the subplot index...that's what the above does if you'll look at the values of M, N, j
L=randi([100 250],20,1); % arbitrary lengths of 20 vectors between 100, 250
N=sum(reshape(L,2,[])).'; % total length of each dataset vector
g=cell2mat(arrayfun(@(l1,l2)[zeros(l1,1);ones(l2,1)],L(1:2:end),L(2:2:end),'UniformOutput',false));
g=categorical(g,unique(g),{'labile','stable'}); % grouping variable
MN=randi([10 65],20,1); % means
SD=randi([ 3 20],20,1); % std dev
y=cell2mat(arrayfun(@(m1,s1,l1,m2,s2,l2)[m1+s1*randn(l1,1);m2+s2*randn(l2,1)],MN(1:2:end),SD(1:2:end),L(1:2:end),MN(2:2:end),SD(2:2:end),L(2:2:end),'UniformOutput',false));
% data generated roughly matching example; draw Box charts...
i1=1; % starting point in combined data vector
for i=1:numel(N) % the unique datasets of varying lengths
subplot(2,5,i) % hard 2x5 arrangement this time...
i2=i1+N(i)-1; % endpoint of ith dataset
boxplot(y(i1:i2),g(i1:i2))
i1=i2+1; % increment for next pass
end
Same issue as before with appearance of axes here, but the general idea still works...
I would again encourage you to investigate boxchart; you could do this in two subplots using the five categories as another grouping variable with its 'GroupByColor' optional named parameter.
I would also suggest loading the data into a table, which would be very useful in further analyses of these data with tools such as groupsummary and/or varfun.
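As a hedged illustration of the table suggestion, the sketch below builds a long-format table from dummy data and summarizes it by class and group. The variable names y, g, c and the class labels follow the other examples in this thread; the data and the choice of statistics are assumptions.

```matlab
% Sketch with made-up data: 5 classes x 2 groups, vectors of varying length.
rng(0)
L = randi([100 250],10,1);                            % lengths of the 10 vectors
CLASSES = {'aliphatic%','acidic%','charged%','polar%','non-polar%'};
GROUPS  = {'labile','stable'};
y = cell2mat(arrayfun(@(l) 40+10*randn(l,1), L, 'UniformOutput',false));
g = categorical(repelem(repmat(GROUPS(:),5,1), L));   % labile/stable per observation
c = categorical(repelem(repelem(CLASSES(:),2), L));   % class per observation
tT = table(y,g,c);
% one row per class/group combination with count, mean, std and median of y
S = groupsummary(tT, {'c','g'}, {'mean','std','median'}, 'y');
```

S then holds the per-group statistics in columns named mean_y, std_y and median_y, alongside GroupCount.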
The key lesson here is to build the dataset with sufficient detail as to be able to handle the lengths generically without hardcoding the numbers into the code as does yours above; that makes it extremely difficult to write the code to handle the varying lengths.
The other way to approach this would be to start with cell data where each dataset is in a cell array; you could then programmatically determine the sizes of those and build the table from them...no "magic numbers" should be needed.
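A minimal sketch of that cell-array route, assuming a hypothetical 5x2 cell array named data in which data{i,j} holds the vector for class i and group j. The lengths come from the cells themselves, so nothing is hardcoded.

```matlab
% Sketch only: "data" is a hypothetical input filled with dummy values here.
rng(0)
CLASSES = {'aliphatic%','acidic%','charged%','polar%','non-polar%'};
GROUPS  = {'labile','stable'};
data = arrayfun(@(n) 40+10*randn(n,1), randi([100 250],5,2), 'UniformOutput',false);
n = cellfun(@numel, data);                  % 5x2 lengths, found programmatically
[ci, gi] = ndgrid(1:5, 1:2);                % class and group index of each cell
y = vertcat(data{:});                       % stack all vectors (column-major order)
c = categorical(repelem(ci(:), n(:)), 1:5, CLASSES);
g = categorical(repelem(gi(:), n(:)), 1:2, GROUPS);
tT = table(y, g, c);                        % long-format table for boxchart etc.
```

The resulting table can be passed to boxchart or groupsummary in the same way as the generated examples.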
ADDENDUM
"...The problem ... lies in the different vector lengths ... I tried to pad each vector with zeros at the beginning or end..."
That is the other way to approach it, except use NaN (or Inf) padding instead of zeros; boxplot ignores NaN. I note, though, that this is not documented, and the only documented example with varying-length vectors uses the combined vector and grouping variable as above, not an array with the non-finite values silently ignored...it's very surprising that hasn't been added to the documentation after all these years.
y=40+randn(100,2).*[10 30];
g=categorical({'labile','stable'});
y(1:10,1)=nan;
boxplot(y,g)
If you run a similar example and look at the data returned by the datatips on the chart, you'll see the data accounts for the missing values for you...
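If the starting point is separate vectors of unequal length, the padding itself might look like the sketch below. The two vectors are dummy data, and the data go at the top of each column with NaN filling the rest.

```matlab
% Sketch only: two made-up vectors of unequal length padded into one array.
rng(0)
v = {40+10*randn(120,1), 40+30*randn(95,1)};
n = cellfun(@numel, v);                     % length of each vector
Y = nan(max(n), numel(v));                  % NaN-filled array, one column per vector
for k = 1:numel(v)
    Y(1:n(k),k) = v{k};                     % data at the top, NaN padding below
end
figure('Visible','off')                     % set to 'on' to see the result
boxplot(Y, {'labile','stable'})
```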
CLASSES={'aliphatic%','acidic%','charged%','polar%','non-polar%'};
L=randi([100 250],10,1); % arbitrary lengths of 10 vectors between 100, 250
N=sum(reshape(L,2,[])).'; % total length of each dataset vector
g=cell2mat(arrayfun(@(l1,l2)[zeros(l1,1);ones(l2,1)],L(1:2:end),L(2:2:end),'UniformOutput',false));
g=categorical(g,unique(g),{'labile','stable'}); % grouping variable
c=arrayfun(@(c,n)repmat(c,n,1),CLASSES',N,'UniformOutput',false);
c=categorical(cat(1,c{:})); % group classes by color
MN=randi([10 65],10,1); % means
SD=randi([ 3 20],10,1); % std dev
y=cell2mat(arrayfun(@(m,s,l)m+s*randn(l,1),MN,SD,L,'UniformOutput',false));
tT=table(y,g,c);
boxchart(tT.g,tT.y,'GroupByColor',tT.c)
legend('Location','eastoutside')
box on
Without the actual data file it's not possible to diagnose what you did wrong, but the example code I posted works provided the data are defined as a column vector; if you convert to a 2D array padded with NaN, then you obviously can't index into it as if it were a 1D vector.
Look at the examples above more closely; look specifically at size(y) in
L=randi([100 250],20,1); % arbitrary lengths of 20 vectors between 100, 250
N=sum(reshape(L,2,[])).'; % total length of each dataset vector
g=cell2mat(arrayfun(@(l1,l2)[zeros(l1,1);ones(l2,1)],L(1:2:end),L(2:2:end),'UniformOutput',false));
g=categorical(g,unique(g),{'labile','stable'}); % grouping variable
MN=randi([10 65],20,1); % means
SD=randi([ 3 20],20,1); % std dev
y=cell2mat(arrayfun(@(m1,s1,l1,m2,s2,l2)[m1+s1*randn(l1,1);m2+s2*randn(l2,1)],MN(1:2:end),SD(1:2:end),L(1:2:end),MN(2:2:end),SD(2:2:end),L(2:2:end),'UniformOutput',false));
...
you'll find that out...well, let's just go for it...
whos g y
You notice these are 1D vectors with subsections of the arbitrary length of the various pieces-parts that were defined by L
N.'
L.'
sum(L)
and you will note that sum(L) == numel(y). Ergo, linear indexing into the vector by the length given by N is the correct indexing in that case; it would NOT be so if you had a full-length, augmented array in which each N would be the same and equal to max(L).
As for the figures, of course if you create a new figure you get two figures, not multiple subplots in one...and, also, if you tell subplot() to divide the figure into two rows by N axes per row, the height of each will be half what it is if you tell it to put only one row of axes in the figure. Read the doc and look at the examples for subplot to see what it does.
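The same bookkeeping can be done up front instead of incrementing i1 and i2 inside the loop. This is only a sketch using the dummy L and N from above; cumsum gives the end index of each dataset in the stacked vector.

```matlab
% Sketch only: start/end indices of each dataset in the stacked 1D vector.
rng(0)
L  = randi([100 250],20,1);        % arbitrary lengths of the 20 vectors
N  = sum(reshape(L,2,[])).';       % total length of each of the 10 datasets
i2 = cumsum(N);                    % end index of each dataset
i1 = i2 - N + 1;                   % start index of each dataset
% dataset k is then y(i1(k):i2(k)) with groups g(i1(k):i2(k))
```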
It all depends upon what your output format is desired to be -- do you want all 10 in one figure as we've been presuming from the wording of the initial question or as two separate figures with only five on each?
Again, if you still can't figure out what is different from what I've shown you, then attach the data file; making up data to try to match is, as always, fraught with misunderstandings when the actual starting point isn't the same and we presume the poster can relate the examples to their situation.
dpb
on 1 Dec 2024
T = readtable('test.xlsx'); % imports table with NaN for missing values
T = table2array(T); % size(T) is 247 20
M=2; W = width(T);
M=2; W=width(y); N=W/numel(g)/M;
for i=1:2:W
j=floor(i/2)+1;
subplot(M,N,j)
boxplot(y(:,[i:i+1]),g)
end
In the above code snippet, g and y are undefined; they did NOT come from reading the test.xlsx file in the preceding code. Ergo, there's no telling what they really were, so it's not surprising that the result doesn't match expectations.
Only if we know the content of the data file itself can we make any judgements on how it should be treated and, "burned once, twice't shy", I'm not going to make any assumptions this time about what it actually looks like. Attach the file...
tT=readtable('test.xlsx');
whos tT
[head(tT,4); tail(tT,4)]
sum(~isfinite(tT{:,:}))
Those data look nothing like the prior examples -- it's clear why the numbers were large now; that's what's in the file.
How were the percentage numbers shown before generated?




