Tiling stacks of boxplots. Each stack contains 5 boxplots

I have generated five boxplots stacked side by side, using subplot(1,5,1) ... subplot(1,5,5). I'm attaching a figure for easy reference. Under this figure I'd like to tile another one containing five similar stacked subplots. I've tried various solutions to no avail. The closest to what I need is the boxplotGroup function, but I still can't get what I want.
I'd be grateful for any help.

1 Comment

I'm attaching the Excel file with the data, as you requested.
For reference: headings starting with g1 refer to "labile" and g2 to "stable" (columns 1-10). Correspondingly, g1c refers to "labile" and g2c to "stable" (columns 11-20). The data in columns 1-10 come from one experiment, and those in columns 11-20 from a second one.
Again, thank you for your patience and suggestions.


 Accepted Answer

This duplicates the prior work with a table instead. The previous code is still correct for the single-vector case, but reading the data as a struct gives you the field names programmatically, so you can move the metadata out of the variable names and into the data, where it belongs...
S=load('datasets'); % read as a struct; variable names can then be handled programmatically
N=max(structfun(@numel,S)); % find longest vector
S=structfun(@(v)[v;nan(N-numel(v),1)],S,'uni',0); % and NaN-pad every vector to that length
experiment=contains(fieldnames(S),'c_')+1; % experiment number: 2 if the name contains "c_", else 1
experiment=cell2mat(arrayfun(@(e)repmat(e,N,1),experiment,'uni',0)); % one value per observation
type=startsWith(fieldnames(S),'g2'); % cell type: g1 = labile, g2 = stable
type=cell2mat(arrayfun(@(t)repmat(t,N,1),type,'uni',0)); % one value per observation
type=categorical(type,unique(type),{'labile','stable'});
class=extractAfter(fieldnames(S),'_'); % classification (note: shadows the built-in class function)
class=arrayfun(@(c)repmat(c,N,1),class,'uni',0); % one label per observation
class=categorical(cat(1,class{:}));
observation=cell2mat(cellfun(@(f)S.(f),fieldnames(S),'uni',0)); % all data stacked into one column
tData=table(experiment,type,class,observation); % and turn into a table
head(tData)
experiment  type    class   observation
1           labile  acidic  11.597
1           labile  acidic  11.94
1           labile  acidic  11.028
1           labile  acidic  11.753
1           labile  acidic  11.864
1           labile  acidic  12.079
1           labile  acidic  11.26
1           labile  acidic  12.181
groupsummary(tData,{'experiment','type','class'},'all')
ans = 20x16 table (only the first 16 rows were displayed)

experiment  type    class      GroupCount  mean_observation  sum_observation  min_observation  max_observation  range_observation  median_observation  mode_observation  var_observation  std_observation  nummissing_observation  nnz_observation  numunique_observation
1           labile  acidic     245         11.954            2881             8.4507           14.444           5.9937             12.024              11.824            0.72844          0.85349          4                       241              198
1           labile  aliphatic  245         28.582            6888.3           25.2             33.712           8.5121             28.571              26.052            3.3274           1.8241           4                       241              214
1           labile  charged    245         26.751            6447.1           22.177           31.434           9.2576             26.679              26.253            1.981            1.4075           4                       241              206
1           labile  npolar     245         54.873            13224            51.073           59.557           8.4844             54.902              55.4              1.6953           1.302            4                       241              203
1           labile  polar      245         45.122            10874            40.443           48.927           8.4844             45.098              44.6              1.7028           1.3049           4                       241              203
1           stable  acidic     245         11.475            2811.5           8.2              14.112           5.9119             11.569              11.776            1.0054           1.0027           0                       245              208
1           stable  aliphatic  245         29.657            7265.9           25.528           34.898           9.3701             29.762              27.6              2.9134           1.7069           0                       245              226
1           stable  charged    245         26.079            6389.3           21.792           29.803           8.0103             26.052              25.149            2.3286           1.526            0                       245              220
1           stable  npolar     245         55.621            13627            51.196           60.196           8.9999             55.666              55.709            3.1817           1.7837           0                       245              227
1           stable  polar      245         44.379            10873            39.804           48.804           8.9999             44.334              44.291            3.1829           1.7841           0                       245              227
2           labile  acidic     245         11.78             1354.6           8.4507           13.878           5.4276             11.753              12.5              0.70648          0.84052          130                     115              110
2           labile  aliphatic  245         28.717            3302.5           25.421           33.531           8.11               28.63               27.921            2.868            1.6935           130                     115              109
2           labile  charged    245         26.503            3047.9           22.485           30.303           7.8178             26.471              25.941            1.972            1.4043           130                     115              111
2           labile  npolar     245         54.979            6322.6           52.523           59.557           7.034              54.91               54.028            1.7804           1.3343           130                     115              111
2           labile  polar      245         45.021            5177.4           40.443           47.477           7.034              45.09               43.843            1.7804           1.3343           130                     115              111
2           stable  acidic     245         11.292            1795.4           8.2              14.111           5.9115             11.446              9.5918            1.023            1.0115           86                      159              143
Removing the metadata from the variable names and converting to a table makes further analyses much simpler, and it also makes the data easier to present...
As for the boxplots, the previous code still works on the vectors; with the table above, varfun with the grouping variables would work as well.
To reproduce the prior result from the table:
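j=0;
for e=unique(tData.experiment).' % row vector: one pass per experiment
  ix=tData.experiment==e;
  for c=categories(tData.class).' % row of 1x1 cells: one pass per class
    j=j+1;
    hAx=subplot(2,5,j); % experiment e fills row e of the 2x5 grid
    iy=ix & tData.class==c;
    boxplot(tData.observation(iy),tData.type(iy)) % labile vs. stable for this class
    title(c{1})
    hAx.XAxis.TickLabelRotation=0; % undo the odd label rotation seen on the online platform
  end
end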
This looks about right, with the same issue as before: the online platform does something odd to the tick labels on the first axes in each row (the TickLabelRotation line works around it; see ADDENDUM THIRD).
Again, this keeps the earlier presumption that you want all ten plots in one figure...
ADDENDUM
Nota Bene: note the orientation of the lists in the for...end loops. MATLAB iterates over the columns of the loop expression, so these must be row vectors; hence the transposes.
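A quick way to see the column-wise iteration is this minimal sketch with a made-up vector v (not from the thread): looping over a column runs the body once with the whole column, while looping over its transpose runs once per element.

```matlab
% Column vs. row iteration in a for-loop
v = [10; 20; 30];          % column vector

nCol = 0;
for x = v                  % iterates over COLUMNS: one pass, x is the whole 3x1 column
    nCol = nCol + 1;
end
xCol = x;

nRow = 0;
for x = v.'                % transposed to a row: three passes, x is a scalar each time
    nRow = nRow + 1;
end
xRow = x;

fprintf('column: %d pass(es); row: %d pass(es)\n', nCol, nRow)
```

The same applies to categories(...), which returns a column cell array; without the transpose the inner loop above would run only once.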
ADDENDUM SECOND
"the previous still is correct for the one vector case,"
Nota Bene: To use the observation variable as the single vector, remember that each dataset has now been padded to the full length N, so each segment spans N elements rather than the varying lengths used in the earlier examples. Alternatively, pull the data from the struct without padding them to the same length; the earlier logic then works as given, provided you compute the length vector L from the actual data instead of making up random lengths as I did in the example...
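A minimal sketch of the unpadded route, assuming a struct shaped like the one load('datasets') returns; the two field names and their random values below are made up for illustration, with lengths borrowed from the thread. structfun gives the true length of each field, struct2cell/cell2mat stack the data without padding, and cumsum gives each segment's start and end, so no lengths are invented.

```matlab
% Hypothetical stand-in for S = load('datasets'); lengths mimic the thread (241/245)
S.g1_acidic = 12 + 0.85*randn(241,1);
S.g2_acidic = 11.5 + 1.0*randn(245,1);

L = structfun(@numel, S);        % actual length of each field, in field order (column)
y = cell2mat(struct2cell(S));    % stack all fields into one column, no NaN padding

% Segment k of y runs from iStart(k) to iEnd(k)
iEnd   = cumsum(L);
iStart = iEnd - L + 1;
seg2   = y(iStart(2):iEnd(2));   % recovers S.g2_acidic
```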
ADDENDUM THIRD
Forcibly setting the XAxis.TickLabelRotation property back to 0 fixes the issue with the first axes on the two rows.

5 Comments

And how much easier it is when you don't have to deal with explicit names and sizes!!!! <VBG>
Apologies for coming back to this issue. Using 'datasets.mat' reproduced the plot, but applying the same script to a different dataset (attached 'datasets5.mat') did not stack the subplots (attached Fig1.png). I'm not all that familiar with MATLAB structures and cannot troubleshoot the problem. Any help would be most welcome.
whos -file datasets5
Name         Size   Bytes  Class   Attributes
g1_genLen    233x1  1864   double
g1_intrLen   233x1  1864   double
g1_numExon   233x1  1864   double
g1c_genLen   112x1  896    double
g1c_intrLen  112x1  896    double
g1c_numExon  112x1  896    double
g2_genLen    240x1  1920   double
g2_intrLen   240x1  1920   double
g2_numExon   240x1  1920   double
g2c_genLen   157x1  1256   double
g2c_intrLen  157x1  1256   double
g2c_numExon  157x1  1256   double
There are only 12 variables instead of 20, and you got 6 subplots of 2 variables each from the 12... what else would you have expected?
The subplots should look like:
g1_genLen/g2_genLen g1_intrLen/g2_intrLen g1_numExon/g2_numExon
g1c_genLen/g2c_genLen g1c_intrLen/g2c_intrLen g1c_numExon/g2c_numExon
Well, then you've got to generalize the tiling shape based on the size of the input data, as I had originally done, instead of using a fixed 2x5 layout. It wasn't stated as a requirement that there would ever be anything but 20 datasets, so I reverted to the hard-coded arrangement.
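A sketch of that generalization, assuming the g1/g2 and g1c/g2c naming convention used in this thread: the number of tile columns comes from the distinct classes after the underscore, and the two rows from the '' / 'c' experiment tags (the tags cell is an assumption tied to that convention). The struct S here is a hypothetical stand-in for load('datasets5'), built with random data of the lengths listed by whos; swap in the real load call.

```matlab
% Hypothetical stand-in for S = load('datasets5'): 2 experiments x 2 types x 3 classes
names = {'g1_genLen','g1_intrLen','g1_numExon','g1c_genLen','g1c_intrLen','g1c_numExon', ...
         'g2_genLen','g2_intrLen','g2_numExon','g2c_genLen','g2c_intrLen','g2c_numExon'};
lens  = [233 233 233 112 112 112 240 240 240 157 157 157];   % lengths as listed by whos
data  = arrayfun(@(n) rand(n,1), lens(:), 'UniformOutput', false);
S     = cell2struct(data, names, 1);

f       = fieldnames(S);
classes = unique(extractAfter(f, '_'), 'stable');  % one tile column per class
tags    = {'', 'c'};                               % '' = experiment 1, 'c' = experiment 2
nRows   = numel(tags);
nCols   = numel(classes);

figure
k = 0;
for r = 1:nRows
    for c = 1:nCols
        k = k + 1;
        subplot(nRows, nCols, k)
        a = S.(['g1' tags{r} '_' classes{c}]);   % labile
        b = S.(['g2' tags{r} '_' classes{c}]);   % stable
        grp = [repmat({'labile'}, numel(a), 1); repmat({'stable'}, numel(b), 1)];
        boxplot([a; b], grp)
        title(classes{c}, 'Interpreter', 'none')
    end
end
```

With datasets5 this gives a 2x3 grid; with the 20-variable datasets the same loop gives 2x5.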


More Answers (3)

Would subplot(2,5,1) ... subplot(2,5,10) do what you want?
Also, I would highly recommend using tiledlayout over subplot. It takes a bit of getting used to, if you have been using subplot for a while, but in the long run it is much better.
That would just double the number of rows in the subplot (or tiledlayout) arrangement...
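y=randn(100,5).*[3 2 3 4 3]+[30 20 25 45 55]; y=[y fliplr(y)]; y=[y y]; % just some dummy data of same size (100x20)
g={'labile','stable'}; % box labels within each subplot
M=2; % rows of subplots
W=size(y,2); % number of data columns
N=W/numel(g)/M; % subplots per row: 20/2/2 = 5
for i=1:2:W % step through the columns in labile/stable pairs
  j=floor(i/2)+1; % subplot index 1..10
  subplot(M,N,j)
  boxplot(y(:,i:i+1),g)
end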
The odd tick-label orientation shows up only on this platform; on the desktop the axes are all consistent...
Using tiledlayout instead of subplot would give you some additional features, and boxchart might be worth looking into...
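A sketch of the tiledlayout + boxchart combination, reusing the same 100x20 dummy data. boxchart (available from R2020a, as far as I know) takes the group as a categorical x variable and the data as y. To my understanding it is a regular chart object and should not trigger the boxplot warning seen in tiled layouts, but verify on your release.

```matlab
y = randn(100,5).*[3 2 3 4 3] + [30 20 25 45 55];   % same dummy data as above
y = [y fliplr(y)];  y = [y y];                       % 100x20
grp = categorical([repmat({'labile'}, 100, 1); repmat({'stable'}, 100, 1)]);

figure
hTL = tiledlayout(2, 5, 'TileSpacing', 'compact');
for i = 1:2:size(y, 2)
    ax = nexttile(hTL);                          % next tile, row by row
    boxchart(ax, grp, [y(:,i); y(:,i+1)])        % x = group, y = stacked pair of columns
end
```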
If you're looking for a more sophisticated look, you could probably put each pair in a panel to separate the pairs visually... I've never messed with them, so I'll leave that as an "exercise for the Student"...

8 Comments

y=randn(100,5).*[3 2 3 4 3]+[30 20 25 45 55]; y=[y fliplr(y)]; y=[y y]; % just some dummy data of same size
g={'labile','stable'};
M=2;
W=size(y,2);
N=W/numel(g)/M;
tiledlayout(M,N) % M x N grid of tiles replaces subplot(M,N,j)
for i=1:2:W
  nexttile % advance to the next tile; no explicit index needed
  boxplot(y(:,i:i+1),g)
end
Warning: boxplot might not be displayed properly in the tiled chart layout.
(The same warning is printed once per tile, ten times in all.)
Well, that's kinda rude... dunno why that might be... it looks OK here.
Thank you for the prompt reply and suggestion. It kind of works, but not quite. My question was not detailed enough, so I'll try to be more specific. Each subplot consists of two boxplots (labile and stable), and the vectors have different lengths. To account for this, for subplot(1, 5, 1) I used the following:
x1 = [g1_aliphatic; g2_aliphatic];
j1 = repmat({'labile'}, 241,1);
j2 = repmat({'stable'}, 245,1);
j = [j1; j2];
boxplot(x1, j)
title('aliphatic%', 'FontSize', 10, 'fontweight', 'bold')
Similarly for the other subplots.
This generated the figure that I attached. Under this figure, I'd like to tile another one (again 5 subplots, each consisting of two boxplots). An example of the first subplot of the second tiled figure would be:
subplot(2, 5, 1);
xc1 = [g1c_aliphatic; g2c_aliphatic];
jc1 = repmat({'labile'}, 115,1);
jc2 = repmat({'stable'}, 159,1);
jc = [jc1; jc2];
boxplot(xc1, jc)
title('aliphatic%', 'FontSize', 10, 'fontweight', 'bold')
The problem with your solution lies in the different vector lengths (the lengths hard-coded in the repmat calls above). I tried padding each vector with zeros at the beginning or end and then used your code. It works, but each subplot is distorted (as expected). A second problem is the max and min values, which differ widely between the subplots.
Once again many thanks for your help
As my example shows, just double the number of rows in the subplot() tiling from 1x5 to 2x5 and use the same logic you already have, except that the subplot index runs over 1:10 instead of 1:5. That's what the code above does, if you look at the values of M, N, and j.
L=randi([100 250],20,1); % arbitrary lengths of 20 vectors between 100, 250
N=sum(reshape(L,2,[])).'; % total length of each dataset vector
g=cell2mat(arrayfun(@(l1,l2)[zeros(l1,1);ones(l2,1)],L(1:2:end),L(2:2:end),'UniformOutput',false));
g=categorical(g,unique(g),{'labile','stable'}); % grouping variable
MN=randi([10 65],20,1); % means
SD=randi([ 3 20],20,1); % std dev
y=cell2mat(arrayfun(@(m1,s1,l1,m2,s2,l2)[m1+s1*randn(l1,1);m2+s2*randn(l2,1)],MN(1:2:end),SD(1:2:end),L(1:2:end),MN(2:2:end),SD(2:2:end),L(2:2:end),'UniformOutput',false));
% data generated roughly matching example; draw Box charts...
i1=1; % starting point in combined data vector
for i=1:numel(N) % the unique datasets of varying lengths
subplot(2,5,i) % hard 2x5 arrangement this time...
i2=i1+N(i)-1; % endpoint of ith dataset
boxplot(y(i1:i2),g(i1:i2))
i1=i2+1; % increment for next pass
end
Same issue as before with appearance of axes here, but the general idea still works...
I would again encourage you to investigate boxchart; you could do this in two subplots by using the five classes as another grouping variable via the 'GroupByColor' name-value argument.
I would also suggest loading the data into a table; that makes further analyses of these data much easier with tools such as groupsummary and/or varfun.
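A minimal sketch of both tools on a tiny made-up table shaped like tData from the accepted answer (the values are invented). One detail worth knowing: groupsummary skips NaN in its statistics, which is why NaN padding does no harm there, whereas varfun hands the raw vector, NaN included, to whatever function you give it, so that function must omit NaN itself.

```matlab
% Small hypothetical table in the same shape as tData
experiment  = [1; 1; 1; 1];
type        = categorical({'labile'; 'labile'; 'stable'; 'stable'});
observation = [1; 3; NaN; 5];          % NaN plays the role of the length padding
T = table(experiment, type, observation);

% groupsummary omits NaN from the statistic
G = groupsummary(T, {'experiment','type'}, 'median', 'observation');

% varfun: the function itself has to omit NaN
V = varfun(@(x) median(x, 'omitnan'), T, ...
           'InputVariables', 'observation', ...
           'GroupingVariables', {'experiment','type'});
disp(G), disp(V)
```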
The key lesson here is to build the dataset with enough detail to handle the lengths generically, without hard-coding the numbers into the code as yours above does; hard-coded lengths make it extremely difficult to write code that handles varying lengths.
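For the OP's snippet, a minimal fix is to take the repmat counts from numel of each vector instead of typing them in. The two vectors below are random stand-ins with the lengths quoted in the thread (241 and 245); everything else mirrors the original code.

```matlab
% Hypothetical stand-ins for two of the OP's vectors
g1_aliphatic = 28.6 + 1.8*randn(241,1);
g2_aliphatic = 29.7 + 1.7*randn(245,1);

x1  = [g1_aliphatic; g2_aliphatic];
grp = [repmat({'labile'}, numel(g1_aliphatic), 1); ...   % length taken from the data
       repmat({'stable'}, numel(g2_aliphatic), 1)];

subplot(2, 5, 1)
boxplot(x1, grp)
title('aliphatic%', 'FontSize', 10, 'FontWeight', 'bold')
```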
The other way to approach this would be to start with cell data where each dataset is in a cell array; you could then programmatically determine the sizes of those and build the table from them...no "magic numbers" should be needed.
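A sketch of the cell-array route, with made-up data: the four vectors and their names are hypothetical. cellfun(@numel, ...) supplies the sizes, vertcat stacks the data, and repelem repeats each dataset's name by its length to build the grouping column, so no magic numbers appear.

```matlab
% Hypothetical datasets held in a cell array, one cell per vector
C     = {randn(241,1), randn(245,1), randn(115,1), randn(159,1)};
names = {'g1_aliphatic', 'g2_aliphatic', 'g1c_aliphatic', 'g2c_aliphatic'};

L   = cellfun(@numel, C);                      % sizes determined programmatically
y   = vertcat(C{:});                           % one long column of observations
src = categorical(repelem(names(:), L(:)));    % label each observation with its dataset
tC  = table(y, src);
```

From here the g1/g2 and c parts of src can be split into type and experiment columns, as in the accepted answer.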
ADDENDUM
"...The problem ... lies in the different vector lengths ... I tried to pad each vector with zeros at the beginning or end..."
That is the other way to approach it, except pad with NaN (or Inf) instead of zeros; boxplot ignores NaN. (I note, though, that this is not documented, and the only documentation example with vectors of differing lengths uses the combined vector plus a grouping variable as above, not an array with the non-finite values silently ignored. It's very surprising that hasn't been added to the documentation after all these years.)
y=40+randn(100,2).*[10 30];
g=categorical({'labile','stable'});
y(1:10,1)=nan;
boxplot(y,g)
If you run a similar example and look at the values shown by the datatips on the chart, you'll see that boxplot accounts for the missing values for you...
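A minimal sketch of NaN padding for two vectors of unequal length (random stand-ins with the thread's lengths 241/245). Each vector fills the top of its own column of a NaN array, and boxplot then draws one box per column; per the observation above, boxplot disregards the NaN padding even though the documentation doesn't say so.

```matlab
a = 28.6 + 1.8*randn(241,1);   % hypothetical "labile" vector
b = 29.7 + 1.7*randn(245,1);   % hypothetical "stable" vector

n = max(numel(a), numel(b));
Y = nan(n, 2);                 % NaN (not zero) padding
Y(1:numel(a), 1) = a;
Y(1:numel(b), 2) = b;

boxplot(Y, {'labile', 'stable'})   % one box per column
```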
CLASSES={'aliphatic%','acidic%','charged%','polar%','non-polar%'};
L=randi([100 250],10,1); % arbitrary lengths of 10 vectors between 100, 250
N=sum(reshape(L,2,[])).'; % total length of each dataset vector
g=cell2mat(arrayfun(@(l1,l2)[zeros(l1,1);ones(l2,1)],L(1:2:end),L(2:2:end),'UniformOutput',false));
g=categorical(g,unique(g),{'labile','stable'}); % grouping variable
c=arrayfun(@(c,n)repmat(c,n,1),CLASSES',N,'UniformOutput',false);
c=categorical(cat(1,c{:})); % group classes by color
MN=randi([10 65],10,1); % means
SD=randi([ 3 20],10,1); % std dev
y=cell2mat(arrayfun(@(m,s,l)m+s*randn(l,1),MN,SD,L,'UniformOutput',false));
tT=table(y,g,c);
boxchart(tT.g,tT.y,'GroupByColor',tT.c)
legend('Location','eastoutside')
box on
Thanks once again. I tried your suggestion. Steps followed and output below.
T = readtable('test.xlsx'); % imports table with NaN for missing values
T = table2array(T); size(T) % ans = 247 20
M=2; W = width(T);
M=2; W=width(y); N=W/numel(g)/M;
for i=1:2:W
j=floor(i/2)+1;
subplot(M,N,j)
boxplot(y(:,[i:i+1]),g)
end
To check the output (attached T.png), I did the following:
figure(1)
subplot(2, 5, 1); x1 = [g1_aliphatic; g2_aliphatic];
j1 = repmat({'labile'}, 241,1); j2 = repmat({'stable'}, 245,1);
j = [j1; j2]; boxplot(x1, j)
title('aliphatic%', 'FontSize', 10, 'fontweight', 'bold')
subplot(2,5,2) - subplot(2,5,5) as above
figure(2)
subplot(2, 5, 1); xc1 = [g1c_aliphatic; g2c_aliphatic];
jc1 = repmat({'labile'}, 115,1); jc2 = repmat({'stable'}, 159,1);
jc = [jc1; jc2]; boxplot(xc1, jc)
title('aliphatic%', 'FontSize', 10, 'fontweight', 'bold')
Subplots for figure(2) as above.
fig1.png and fig2.png attached.
You will notice that T.png is quite different to fig1 and fig2
Additional points:
  1. fig1 and fig2 appear in separate windows. Putting "hold on" between them has no effect.
  2. In figure(1), if instead of subplot(2, 5, 1), I use subplot(1, 5, 1), the size of the figure is doubled vertically.
Without the actual data file it's not possible to diagnose what you did wrong, but the example code I gave works provided the data are held in a single column vector. If you convert to a 2D array that includes the NaN padding, you obviously can't index into it as if it were a 1D vector.
Look at the examples above more closely; look specifically at size(y) in
L=randi([100 250],20,1); % arbitrary lengths of 20 vectors between 100, 250
N=sum(reshape(L,2,[])).'; % total length of each dataset vector
g=cell2mat(arrayfun(@(l1,l2)[zeros(l1,1);ones(l2,1)],L(1:2:end),L(2:2:end),'UniformOutput',false));
g=categorical(g,unique(g),{'labile','stable'}); % grouping variable
MN=randi([10 65],20,1); % means
SD=randi([ 3 20],20,1); % std dev
y=cell2mat(arrayfun(@(m1,s1,l1,m2,s2,l2)[m1+s1*randn(l1,1);m2+s2*randn(l2,1)],MN(1:2:end),SD(1:2:end),L(1:2:end),MN(2:2:end),SD(2:2:end),L(2:2:end),'UniformOutput',false));
...
you'll find that out...well, let's just go for it...
whos g y
Name  Size    Bytes  Class        Attributes
g     3290x1  3556   categorical
y     3290x1  26320  double
You notice these are 1D vectors with subsections of the arbitrary length of the various pieces-parts that were defined by L
N.'
ans = 1×10
442 368 326 252 314 249 366 379 328 266
L.'
ans = 1×20
195 247 198 170 140 186 124 128 112 202 140 109 248 118 179 200 101 227 156 110
sum(L)
ans = 3290
and you will note that sum(L) == numel(y). Ergo, linear indexing into the vector by the lengths given by N is the correct indexing in that case; it would NOT be if you had a full-length, padded array in which every N would be the same and equal to max(L).
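To make the contrast concrete, a small sketch with two hypothetical datasets of lengths 3 and 5: in the unpadded 1-D vector the segments follow one another, but once padded into a max(L)-row array the same linear index range runs into column 1's NaN padding, so the padded layout should be indexed by column instead.

```matlab
L = [3; 5];                          % hypothetical lengths of two datasets
a = (1:L(1)).';
b = (101:100+L(2)).';
y = [a; b];                          % 1-D, unpadded: numel(y) == sum(L)

Ypad = nan(max(L), numel(L));        % padded array: every column has max(L) rows
Ypad(1:L(1), 1) = a;
Ypad(1:L(2), 2) = b;

% Padded layout: get dataset k by column, dropping the NaN padding
bFromPad = Ypad(~isnan(Ypad(:,2)), 2);

% Linear indexing by L into the padded array picks up column 1's padding
wrong = Ypad(L(1)+1 : L(1)+L(2));
```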
As for the figures: of course, if you create a new figure you get two figures, not multiple subplots in one. And if you tell subplot() to divide the figure into two rows of N axes each, each axes will be roughly half as tall as when you tell it to put only one row of axes in the figure. Read the doc and look at the examples for subplot to see what it does.
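A small check of the height difference, comparing the normalized height of the first axes from subplot(1,5,1) and subplot(2,5,1) in two fresh figures (the variable names are only for illustration; the exact heights depend on subplot's default margins).

```matlab
f1  = figure;
ax1 = subplot(1, 5, 1);        % one row of five axes
h1  = ax1.Position(4);         % normalized height

f2  = figure;
ax2 = subplot(2, 5, 1);        % two rows of five axes in the same figure
h2  = ax2.Position(4);

fprintf('1x5 height: %.3f   2x5 height: %.3f\n', h1, h2)
```

To get all ten plots in one figure, keep a single figure and call subplot(2,5,k) for k = 1:10 rather than opening figure(2).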
It all depends upon what your output format is desired to be -- do you want all 10 in one figure as we've been presuming from the wording of the initial question or as two separate figures with only five on each?
Again, if you still can't figure out what is different from what I've shown you, then attach the data file. Making up data to try to match is, as always, fraught with misunderstandings when the actual starting point isn't the same and we presume the poster can relate the examples to their situation.
T = readtable('test.xlsx'); % imports table with NaN for missing values
T = table2array(T); size(T) % ans = 247 20
M=2; W = width(T);
M=2; W=width(y); N=W/numel(g)/M;
for i=1:2:W
j=floor(i/2)+1;
subplot(M,N,j)
boxplot(y(:,[i:i+1]),g)
end
In the above code snippet, g and y are undefined; they did NOT come from reading the test.xlsx file in the preceding code. Ergo, there's no telling what they really were, and it's not surprising that the result doesn't match expectations.
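One way to make that snippet self-consistent is to define y and g from the table that readtable returns. Since test.xlsx isn't available here, T below is a hypothetical 247x20 NaN-padded stand-in with the column names from the file; replace it with T = readtable('test.xlsx'). Because the columns keep their NaN padding, each labile/stable pair is plotted as a two-column matrix rather than indexed as a 1-D vector.

```matlab
% Hypothetical stand-in for T = readtable('test.xlsx'): 247x20, NaN-padded columns
classes = {'aliphatic','acidic','charged','polar','npolar'};
vn = [reshape([strcat('g1_',  classes); strcat('g2_',  classes)], 1, []), ...
      reshape([strcat('g1c_', classes); strcat('g2c_', classes)], 1, [])];
A = 50 + 10*randn(247, 20);
A(242:end, 1:2:10) = NaN;                 % shorter columns padded with NaN
T = array2table(A, 'VariableNames', vn);

y = table2array(T);                       % y and g are now defined from the data
g = {'labile', 'stable'};
M = 2;                                    % rows: one per experiment
W = size(y, 2);
N = W/numel(g)/M;                         % columns: 20/2/2 = 5

figure
for i = 1:2:W
    subplot(M, N, (i+1)/2)
    boxplot(y(:, i:i+1), g)               % NaN padding is ignored by boxplot
    title(extractAfter(vn{i}, '_'))
end
```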
Only if we know the content of the data file itself can we make any judgements on how it should be treated, and "once burned, twice shy": I'm not going to make any assumptions this time about what it actually looks like. Attach the file...


tT=readtable('test.xlsx');
whos tT
Name Size Bytes Class Attributes tT 247x20 45459 table
[head(tT,4); tail(tT,4)]
ans = 8x20 table (shown in two blocks of 10 columns; rows 1-4 are head(tT,4), rows 5-8 are tail(tT,4), i.e. rows 244-247)

Columns 1-10:
g1_aliphatic  g2_aliphatic  g1_acidic     g2_acidic     g1_charged    g2_charged    g1_polar      g2_polar      g1_npolar     g2_npolar
2.7186e+05    3.1731e+05    1.1597e+05    1.1731e+05    2.7376e+05    2.8654e+05    4.7719e+05    4.4231e+05    5.2281e+05    5.5769e+05
2.8358e+05    2.6946e+05    1.194e+05     1.1776e+05    2.7052e+05    2.6148e+05    4.6455e+05    4.6108e+05    5.3545e+05    5.3892e+05
2.5421e+05    2.8911e+05    1.1028e+05    1.1683e+05    2.4673e+05    2.5545e+05    4.7477e+05    4.4554e+05    5.2523e+05    5.5446e+05
2.8486e+05    3.0691e+05    1.1753e+05    1.0976e+05    2.7092e+05    2.8658e+05    4.4821e+05    4.7968e+05    5.5179e+05    5.2032e+05
NaN           2.9231e+05    NaN           1.1429e+05    NaN           2.4396e+05    NaN           4.5934e+05    NaN           5.4066e+05
NaN           3.0485e+05    NaN           1.165e+05     NaN           2.6019e+05    NaN           4.1165e+05    NaN           5.8835e+05
NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN

Columns 11-20:
g1c_aliphatic g2c_aliphatic g1c_acidic    g2c_acidic    g1c_charged   g2c_charged   g1c_polar     g2c_polar     g1c_npolar    g2c_npolar
2.5421e+05    3.1731e+05    1.1028e+05    1.1731e+05    2.4673e+05    2.8654e+05    4.7477e+05    4.4231e+05    5.2523e+05    5.5769e+05
2.8486e+05    2.8911e+05    1.1753e+05    1.1683e+05    2.7092e+05    2.5545e+05    4.4821e+05    4.4554e+05    5.5179e+05    5.5446e+05
2.58e+05      3.0691e+05    1.1864e+05    1.0976e+05    2.7119e+05    2.8658e+05    4.7269e+05    4.7968e+05    5.2731e+05    5.2032e+05
2.5545e+05    3.3676e+05    1.2079e+05    92402         2.6733e+05    2.2587e+05    4.6733e+05    4.271e+05     5.3267e+05    5.729e+05
NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
sum(~isfinite(tT{:,:}))
ans = 1×20
6 2 6 2 6 2 6 2 6 2 132 88 132 88 132 88 132 88 132 88
Those data look nothing like the prior examples -- it's clear why the numbers were large now; that's what's in the file.
How were the percentage numbers shown before generated?

1 Comment

It seems that Excel did one of its tricks and I was fool enough not to check. For some reason Excel has problems with decimals, etc. Anyway, it's not a cheap excuse and I'm sorry about this. I'm attaching a matlab file containing the data of the two datasets as individual vectors (datasets.mat). Thanks again.



Asked: 30 Nov 2024

Commented: dpb on 6 Dec 2024
