Tiling stacks of boxplots. Each stack contains 5 boxplots

Question

0 votes

Fig2b.png

I have generated five boxplots stacked side by side, using subplot(1,5,1) ... subplot(1,5,5). I'm attaching a figure for easy reference. I'd like to tile under this figure another one containing five similar stacked subplots. I've tried various solutions to no avail. The closest to my requirements is the boxplotGroup function, but I'm still unable to get what I want.

https://ch.mathworks.com/matlabcentral/fileexchange/74437-boxplotgroup?s_tid=srchtitle

I'd be grateful for any help

1 Comment

George on 1 Dec 2024

Moved: dpb on 1 Dec 2024

test.xlsx

I'm attaching the Excel file with the data as per your request.

For reference: headings starting with g1 refer to "labile", g2 refer to "stable" (columns 1 to 10). Correspondingly: g1c for "labile" and g2c for "stable" (columns 11-20). Data in columns 1-10 come from one experiment and for columns 11-20 from a second one.

Again, thank you for your patience and suggestions.


Answer 1

dpb on 2 Dec 2024

Edited: dpb on 2 Dec 2024

0 votes

datasets.mat

This duplicates the prior work with a table instead; the previous still is correct for the one vector case, but having the data as the struct allows one the flexibility to get ahold of the field names and then move the metadata out of the variable names and into the data where it belongs...

S=load('datasets');  % read as a struct; can handle variable names that way programmatically
N=max(structfun(@numel,S));                         % find longest vector
S=structfun(@(v)[v;nan(N-numel(v),1)],S,'uni',0);   % and pad to that size
experiment=contains(fieldnames(S),'c_')+1;          % set the experiment number from "c" id in name
experiment=cell2mat(arrayfun(@(e)repmat(e,N,1),experiment,'uni',0));
type=startsWith(fieldnames(S),'g2');                % and the cell type
type=cell2mat(arrayfun(@(t)repmat(t,N,1),type,'uni',0));
type=categorical(type,unique(type),{'labile','stable'});
class=extractAfter(fieldnames(S),'_');              % and the classification
class=arrayfun(@(c)repmat(c,N,1),class,'uni',0);
class=categorical(cat(1,class{:}));
observation=cell2mat(cellfun(@(f)S.(f),fieldnames(S),'uni',0));
tData=table(experiment,type,class,observation);      % and turn into a table
head(tData)
    experiment     type     class     observation
    __________    ______    ______    ___________

        1         labile    acidic      11.597   
        1         labile    acidic       11.94   
        1         labile    acidic      11.028   
        1         labile    acidic      11.753   
        1         labile    acidic      11.864   
        1         labile    acidic      12.079   
        1         labile    acidic       11.26   
        1         labile    acidic      12.181   
groupsummary(tData,{'experiment','type','class'},'all')
ans = 20x16 table
    experiment     type       class      GroupCount    mean_observation    sum_observation    min_observation    max_observation    range_observation    median_observation    mode_observation    var_observation    std_observation    nummissing_observation    nnz_observation    numunique_observation
    __________    ______    _________    __________    ________________    _______________    _______________    _______________    _________________    __________________    ________________    _______________    _______________    ______________________    _______________    _____________________

        1         labile    acidic          245             11.954               2881             8.4507             14.444              5.9937                12.024               11.824             0.72844            0.85349                   4                    241                   198         
        1         labile    aliphatic       245             28.582             6888.3               25.2             33.712              8.5121                28.571               26.052              3.3274             1.8241                   4                    241                   214         
        1         labile    charged         245             26.751             6447.1             22.177             31.434              9.2576                26.679               26.253               1.981             1.4075                   4                    241                   206         
        1         labile    npolar          245             54.873              13224             51.073             59.557              8.4844                54.902                 55.4              1.6953              1.302                   4                    241                   203         
        1         labile    polar           245             45.122              10874             40.443             48.927              8.4844                45.098                 44.6              1.7028             1.3049                   4                    241                   203         
        1         stable    acidic          245             11.475             2811.5                8.2             14.112              5.9119                11.569               11.776              1.0054             1.0027                   0                    245                   208         
        1         stable    aliphatic       245             29.657             7265.9             25.528             34.898              9.3701                29.762                 27.6              2.9134             1.7069                   0                    245                   226         
        1         stable    charged         245             26.079             6389.3             21.792             29.803              8.0103                26.052               25.149              2.3286              1.526                   0                    245                   220         
        1         stable    npolar          245             55.621              13627             51.196             60.196              8.9999                55.666               55.709              3.1817             1.7837                   0                    245                   227         
        1         stable    polar           245             44.379              10873             39.804             48.804              8.9999                44.334               44.291              3.1829             1.7841                   0                    245                   227         
        2         labile    acidic          245              11.78             1354.6             8.4507             13.878              5.4276                11.753                 12.5             0.70648            0.84052                 130                    115                   110         
        2         labile    aliphatic       245             28.717             3302.5             25.421             33.531                8.11                 28.63               27.921               2.868             1.6935                 130                    115                   109         
        2         labile    charged         245             26.503             3047.9             22.485             30.303              7.8178                26.471               25.941               1.972             1.4043                 130                    115                   111         
        2         labile    npolar          245             54.979             6322.6             52.523             59.557               7.034                 54.91               54.028              1.7804             1.3343                 130                    115                   111         
        2         labile    polar           245             45.021             5177.4             40.443             47.477               7.034                 45.09               43.843              1.7804             1.3343                 130                    115                   111         
        2         stable    acidic          245             11.292             1795.4                8.2             14.111              5.9115                11.446               9.5918               1.023             1.0115                  86                    159                   143         

Removing metadata from variable names and converting to a table makes further analyses much simpler and also makes the data easier to present...

As for the boxplots, within the vectors, the previous code would work just fine; with the above table, varfun with the grouping variables would work as well.
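As a hedged illustration of varfun with grouping variables, here is a minimal sketch on a tiny made-up table (tDemo and its values are hypothetical stand-ins for tData, not the real data). It computes a per-group median rather than drawing anything; the anonymous function skips the NaN padding.

```matlab
% Tiny hypothetical table with the same kind of layout as tData
experiment  = [1;1;1;1;2;2;2;2];
type        = categorical({'labile';'labile';'stable';'stable'; ...
                           'labile';'labile';'stable';'stable'});
observation = [11;13;10;NaN;12;14;9;11];    % NaN mimics the padding
tDemo = table(experiment, type, observation);

% One row per (experiment, type) group; NaN entries are ignored
med = varfun(@(x) median(x,'omitnan'), tDemo, ...
    'InputVariables','observation', ...
    'GroupingVariables',{'experiment','type'});
disp(med)
```

Note that GroupCount in the result counts rows including the NaN padding, the same as in the groupsummary output above.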

For the prior result, then

j=0;
for e=unique(tData.experiment).'            % row vector, one pass per experiment
  ix=tData.experiment==e;
  for c=categories(tData.class).'           % row cell array, one pass per class
    j=j+1;
    hAx=subplot(2,5,j);
    iy=ix & tData.class==c;
    boxplot(tData.observation(iy),tData.type(iy))
    hAx.XAxis.TickLabelRotation=0;          % see ADDENDUM THIRD
  end
end

looks about right with the same issue that the online platform does something funky with the first axes on each row.

Again, the above makes the previous presumption that you wanted all of them in one figure...

ADDENDUM

Nota Bene: the orientation of the vectors in the for...end loops; MATLAB iterates over the items in the list by column, so must ensure those are row vectors--hence the transpose.
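A minimal sketch of that column-wise iteration, using a made-up three-element vector:

```matlab
% Iterating over a column vector runs the loop body once,
% with the whole column as the loop variable.
v = [1; 2; 3];
nCol = 0;
for k = v
    nCol = nCol + 1;
end

% Transposing to a row vector gives one pass per element.
nRow = 0;
for k = v.'
    nRow = nRow + 1;
end

fprintf('column: %d pass(es), row: %d passes\n', nCol, nRow)
```

Both unique() on a column variable and categories() return columns, which is why each loop list above is transposed.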

ADDENDUM SECOND

"the previous still is correct for the one vector case,"

Nota Bene: To use the observation variable as the vector, remember it is now padded to full length, so the indexing runs over N elements per variable, not the varying lengths used in the prior examples. Alternatively, pull the data from the struct without padding to a common length; the prior logic will then work as given, provided the length vector L is computed from the actual data instead of being made up, as I did in the example by using random lengths...
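A small sketch of the padding idea with two made-up vectors (a, b and the helper pad are hypothetical names). It shows that NaN padding leaves the statistics alone and lets the actual lengths be recovered, whereas zero padding shifts the result:

```matlab
a = [1;2;3;4];  b = [5;6];              % made-up vectors of unequal length
N = max(numel(a), numel(b));            % longest vector
pad = @(v) [v; nan(N-numel(v),1)];      % pad with NaN, not zeros
M = [pad(a) pad(b)];

nActual = sum(isfinite(M));             % recovers the original lengths
medNaN  = median(M, 'omitnan');         % padding is ignored
medZero = median([a [b;0;0]]);          % zeros are treated as real data
disp([medNaN; medZero])
```

The same reasoning applies to the boxes themselves: boxplot treats NaN as missing, while zeros would be drawn as data.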

ADDENDUM THIRD

Forcibly setting the XAxis.TickLabelRotation property back to 0 fixes the issue with the first axes on the two rows.

5 Comments

George on 6 Dec 2024

The subplots should look like:

g1_genLen/g2_genLen g1_intrLen/g2_intrLen g1_numExon/g2_numExon

g1c_genLen/g2c_genLen g1c_intrLen/g2c_intrLen g1c_numExon/g2c_numExon

dpb on 6 Dec 2024

Well, then you've got to generalize the tiling shape based on the size of the input data as I had originally done instead of using a fixed 5x2, but it wasn't made a requirement that would have anything but 20 sets so reverted back to the hardcoded arrangement.
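One way that generalization could look, as a rough sketch only: take the tiling shape from the data instead of hardcoding it. The table tDemo below is dummy data with the three classes named in the comment above; the values, the group size n, and the use of boxchart and tiledlayout (rather than boxplot and subplot) are my assumptions.

```matlab
rng(0)                                        % reproducible dummy data
[eG,cG,tG] = ndgrid(1:2, 1:3, 1:2);           % all experiment/class/type combinations
n    = 50;                                    % observations per group (made up)
expt = repelem(eG(:), n);
cls  = categorical(repelem(cG(:), n), 1:3, {'genLen','intrLen','numExon'});
typ  = categorical(repelem(tG(:), n), 1:2, {'labile','stable'});
obs  = randn(numel(expt),1) + double(cls);
tDemo = table(expt, typ, cls, obs, ...
    'VariableNames', {'experiment','type','class','observation'});

expts   = unique(tDemo.experiment).';         % row vectors for the for loops
classes = categories(tDemo.class).';
nRows = numel(expts);                         % one row of axes per experiment
nCols = numel(classes);                       % one column of axes per class

figure('Visible','off')
tiledlayout(nRows, nCols)
for e = expts
    for c = classes
        nexttile
        iy = tDemo.experiment==e & tDemo.class==c{1};
        boxchart(tDemo.type(iy), tDemo.observation(iy))
        title(c{1})
    end
end
```

With this shape the same loop handles 2x3, 2x5, or any other arrangement, as long as every experiment has the same set of classes.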


Answer 2

the cyclist on 30 Nov 2024

0 votes

Would subplot(2,5,1) ... subplot(2,5,10) do what you want?

Also, I would highly recommend using tiledlayout over subplot. It takes a bit of getting used to, if you have been using subplot for a while, but in the long run it is much better.


Answer 3

dpb on 30 Nov 2024

Edited: dpb on 30 Nov 2024

0 votes

That would just double the number of rows in the subplot (or tiledlayout) arrangement...

y=randn(100,5).*[3 2 3 4 3]+[30 20 25 45 55]; y=[y fliplr(y)]; y=[y y]; % just some dummy data of same size
g={'labile','stable'};
M=2;
W=width(y);
N=W/numel(g)/M;
for i=1:2:W
  j=floor(i/2)+1;
  subplot(M,N,j)
  boxplot(y(:,[i:i+1]),g)
end

The funky label orientation only shows up on this platform, the axes are all consistent on desktop...

Using the <tiledlayout> instead of subplot would give you some additional features; and boxchart might be worth looking into...

If you're looking for more sophisticated look, you could probably put each pair in a panel that would separate the two visually...I've never messed with them, so will leave as "exercise for Student"...

8 Comments

dpb on 30 Nov 2024

y=randn(100,5).*[3 2 3 4 3]+[30 20 25 45 55]; y=[y fliplr(y)]; y=[y y]; % just some dummy data of same size
g={'labile','stable'};
M=2;
W=width(y);
N=W/numel(g)/M;
tiledlayout(M,N)
for i=1:2:W
  j=floor(i/2)+1;
  %subplot(M,N,j)
  nexttile
  boxplot(y(:,[i:i+1]),g)
end

Warning: boxplot might not be displayed properly in the tiled chart layout.

Well, that's kinda' rude...dunno why that might be....looks ok here

George on 1 Dec 2024

Thank you for the prompt reply and suggestion. It kind of works but not quite. My question was not detailed enough. I'll try to be more specific. Each subplot consists of two boxplots (labile and stable). The length of the vectors is different. To correct for this, for subplot (1, 5, 1), I used the following:

x1 = [g1_aliphatic; g2_aliphatic];
j1 = repmat({'labile'}, 241,1);
j2 = repmat({'stable'}, 245,1);
j = [j1; j2];
boxplot(x1, j)
title('aliphatic%', 'FontSize', 10, 'fontweight', 'bold')

Similarly for the other subplots.

This generated the figure that I attached. Under this figure, I'd like to tile another one (again 5 subplots, each consisting of two boxplots). An example of the first subplot of the second tiled figure would be:

subplot(2, 5, 1);

xc1 = [g1c_aliphatic; g2c_aliphatic];
jc1 = repmat({'labile'}, 115,1);
jc2 = repmat({'stable'}, 159,1);
jc = [jc1; jc2];
boxplot(xc1, jc)
title('aliphatic%', 'FontSize', 10, 'fontweight', 'bold')

The problem with your solution lies in the different vector lengths (see bold typeface above). I tried to pad each vector with zeros at the beginning or end. I then used your code. It works but each subplot is distorted (as expected). A second problem has to do with the max and min values. They differ widely between the subplots.

Once again many thanks for your help

dpb on 1 Dec 2024

Edited: dpb on 1 Dec 2024

As my example shows, just double the number of rows in the subplot() tiling from 1x5 to 2x5 and use the same logic as you already have--except instead of going from 1:5 you go from 1:10 for the subplot index...that's what the above does if you'll look at the values of M, N, j

L=randi([100 250],20,1);                        % arbitrary lengths of 20 vectors between 100, 250
N=sum(reshape(L,2,[])).';                       % total length of each dataset vector
g=cell2mat(arrayfun(@(l1,l2)[zeros(l1,1);ones(l2,1)],L(1:2:end),L(2:2:end),'UniformOutput',false));
g=categorical(g,unique(g),{'labile','stable'}); % grouping variable
MN=randi([10 65],20,1);                         % means
SD=randi([ 3 20],20,1);                         % std dev
y=cell2mat(arrayfun(@(m1,s1,l1,m2,s2,l2)[m1+s1*randn(l1,1);m2+s2*randn(l2,1)],MN(1:2:end),SD(1:2:end),L(1:2:end),MN(2:2:end),SD(2:2:end),L(2:2:end),'UniformOutput',false));
% data generated roughly matching example; draw Box charts...
i1=1;                                           % starting point in combined data vector
for i=1:numel(N)                                % the unique datasets of varying lengths
  subplot(2,5,i)                                % hard 2x5 arrangement this time...
  i2=i1+N(i)-1;                                 % endpoint of ith dataset
  boxplot(y(i1:i2),g(i1:i2))
  i1=i2+1;                                      % increment for next pass
end

Same issue as before with appearance of axes here, but the general idea still works...

I would again encourage you to investigate boxchart; you could do this in two subplots using the five categories as another grouping variable with its optional 'GroupByColor' named parameter.
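A rough sketch of what that could look like, on dummy data only (the variable names cls, typ, expt, obs and all values are made up). Here the five classes go on the x axis and the labile/stable type is passed as 'GroupByColor'; the roles could just as well be swapped.

```matlab
rng(1)                                        % reproducible dummy data
n = 40;                                       % observations per group (made up)
[cG,tG,eG] = ndgrid(1:5, 1:2, 1:2);           % class x type x experiment
cls  = categorical(repelem(cG(:),n), 1:5, ...
       {'acidic','aliphatic','charged','npolar','polar'});
typ  = categorical(repelem(tG(:),n), 1:2, {'labile','stable'});
expt = repelem(eG(:), n);
obs  = 10*double(cls) + randn(numel(expt),1);

figure('Visible','off')
tiledlayout(2,1)                              % one tile per experiment
for e = unique(expt).'
    nexttile
    ix = expt==e;
    boxchart(cls(ix), obs(ix), 'GroupByColor', typ(ix))
    legend                                    % labile/stable color key
    title(sprintf('experiment %d', e))
end
```

The trade-off is that all five classes then share one y axis per tile, so classes with very different ranges get squeezed; separate axes per class avoid that.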

I would also suggest loading the data into a table where the data would be very useful in further analyses of these data using such tools as groupsummary and/or varfun

The key lesson here is to build the dataset with sufficient detail as to be able to handle the lengths generically without hardcoding the numbers into the code as does yours above; that makes it extremely difficult to write the code to handle the varying lengths.

The other way to approach this would be to start with cell data where each dataset is in a cell array; you could then programmatically determine the sizes of those and build the table from them...no "magic numbers" should be needed.
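A minimal sketch of that cell-array route, assuming four dummy vectors whose lengths match the 241/245/115/159 quoted above (the values themselves are random, and the names data, type, expt are hypothetical). The lengths are read from the data, never typed into the code:

```matlab
rng(2)                                         % reproducible dummy data
data = {28+randn(241,1), 29+randn(245,1), ...  % one dataset per cell
        28+randn(115,1), 29+randn(159,1)};
type = {'labile','stable','labile','stable'};  % metadata kept beside the data
expt = [1 1 2 2];

len = cellfun(@numel, data);                   % sizes found programmatically
observation = vertcat(data{:});                % stack into one column
typeCol = categorical(repelem(type(:), len(:)));
exptCol = repelem(expt(:), len(:));
tDemo = table(exptCol, typeCol, observation, ...
    'VariableNames', {'experiment','type','observation'});

figure('Visible','off')
tiledlayout(1,2)
for e = unique(tDemo.experiment).'
    nexttile
    ix = tDemo.experiment==e;
    boxchart(tDemo.type(ix), tDemo.observation(ix))
end
```

No padding is needed with this long-format layout, since each observation carries its own labels.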

George on 1 Dec 2024

Edited: dpb on 1 Dec 2024

Thanks once again. I tried your suggestion. Steps followed and output below.

T = readtable('test.xlsx'); % imports table with NaN for missing values
T = table2array(T); size(T)   % ans = 247    20
M=2; W = width(T); 
M=2; W=width(y); N=W/numel(g)/M;
for i=1:2:W
    j=floor(i/2)+1;
    subplot(M,N,j)
    boxplot(y(:,[i:i+1]),g)
end

To check the output (attached T.png), I did the following:

figure(1)

subplot(2, 5, 1); x1 = [g1_aliphatic; g2_aliphatic];

j1 = repmat({'labile'}, 241,1); j2 = repmat({'stable'}, 245,1);

j = [j1; j2]; boxplot(x1, j)

title('aliphatic%', 'FontSize', 10, 'fontweight', 'bold')

subplot(2,5,2) - subplot(2,5,5) as above

figure(2)

subplot(2, 5, 1); xc1 = [g1c_aliphatic; g2c_aliphatic];

jc1 = repmat({'labile'}, 115,1); jc2 = repmat({'stable'}, 159,1);

jc = [jc1; jc2]; boxplot(xc1, jc)

title('aliphatic%', 'FontSize', 10, 'fontweight', 'bold')

Subplots for figure(2) as above.

fig1.png and fig2.png attached.

You will notice that T.png is quite different to fig1 and fig2

Additional points:

The two fig1 and fig2 are printed in separate screens. Putting "hold on" between has no effect.
In figure(1), if instead of subplot(2, 5, 1), I use subplot(1, 5, 1), the size of the figure is doubled vertically.

dpb on 1 Dec 2024

Edited: dpb on 1 Dec 2024

Without the actual data file it's not possible to diagnose what you did wrong, but the example code I gave works provided the data array is defined as a column vector; if you convert to array format including the NaN, then you obviously can't index into it as if it were a 1D vector without the padding.

Look at the examples above more closely; look specifically at size(y) in

L=randi([100 250],20,1);                        % arbitrary lengths of 20 vectors between 100, 250
N=sum(reshape(L,2,[])).';                       % total length of each dataset vector
g=cell2mat(arrayfun(@(l1,l2)[zeros(l1,1);ones(l2,1)],L(1:2:end),L(2:2:end),'UniformOutput',false));
g=categorical(g,unique(g),{'labile','stable'}); % grouping variable
MN=randi([10 65],20,1);                         % means
SD=randi([ 3 20],20,1);                         % std dev
y=cell2mat(arrayfun(@(m1,s1,l1,m2,s2,l2)[m1+s1*randn(l1,1);m2+s2*randn(l2,1)],MN(1:2:end),SD(1:2:end),L(1:2:end),MN(2:2:end),SD(2:2:end),L(2:2:end),'UniformOutput',false));
...

you'll find that out...well, let's just go for it...

whos g y
  Name         Size            Bytes  Class          Attributes

  g         3290x1              3556  categorical              
  y         3290x1             26320  double                   

You notice these are 1D vectors with subsections of the arbitrary length of the various pieces-parts that were defined by L

N.'
ans = 1×10
   442   368   326   252   314   249   366   379   328   266
L.'
ans = 1×20
   195   247   198   170   140   186   124   128   112   202   140   109   248   118   179   200   101   227   156   110
sum(L)
ans = 3290

and you will note that sum(L) == numel(y). Ergo, linear indexing into the vector by the length given by N is the correct indexing in that case; it would NOT be so if had a full-length, augmented array in which each N would be the same and equal to the max(L).
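The same bookkeeping can be written without a running counter. This is only a sketch with four made-up lengths; in the example above the pieces would be the paired labile+stable totals in N.

```matlab
L  = [3 5 2 4];               % made-up lengths of four pieces
y  = (1:sum(L)).';            % concatenated column vector, numel(y)==sum(L)

i2 = cumsum(L);               % end index of each piece
i1 = i2 - L + 1;              % start index of each piece
pieces = arrayfun(@(a,b) y(a:b), i1, i2, 'UniformOutput', false);

% mat2cell performs the same split in one call
pieces2 = mat2cell(y, L, 1).';
disp(cellfun(@numel, pieces))
```

Either form only makes sense for the concatenated vector; for a padded array every column has the same length and is simply indexed by column.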

As for the figures, of course if you create a new figure you get two figures, not multiple subplots in one. Also, if you tell subplot() to divide the figure into two rows of N axes each, each axes will be about half the height it would be with only one row of axes in the figure. Read the doc and look at the examples for subplot to see what it does.
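A quick way to see the height difference for yourself is to compare the Position of the first axes in each arrangement. This is just a sketch with invisible figures; the exact numbers depend on the default figure settings.

```matlab
f1 = figure('Visible','off');
h1 = subplot(1,5,1);          % one row: axes uses the full plotting height
p1 = h1.Position;             % [left bottom width height], normalized units

f2 = figure('Visible','off');
h2 = subplot(2,5,1);          % two rows: the height is shared between rows
p2 = h2.Position;

fprintf('height 1x5: %.3f, height 2x5: %.3f\n', p1(4), p2(4))
close([f1 f2])
```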

It all depends upon what your output format is desired to be -- do you want all 10 in one figure as we've been presuming from the wording of the initial question or as two separate figures with only five on each?

Again, if you still can't figure out what is different in what I've shown you, then attach the data file; making up data to try to match is, as always, fraught with misunderstandings when the actual starting point isn't the same and we presume the poster can relate the examples to their situation.

dpb on 1 Dec 2024

T = readtable('test.xlsx'); % imports table with NaN for missing values
T = table2array(T); size(T)   % ans = 247    20
M=2; W = width(T); 
M=2; W=width(y); N=W/numel(g)/M;
for i=1:2:W
    j=floor(i/2)+1;
    subplot(M,N,j)
    boxplot(y(:,[i:i+1]),g)
end

In the above code snippet, g and y are undefined; they did NOT come from having read the test.xlsx file by the preceding code; ergo, there's no telling what they really were, and that the result doesn't match expectations is therefore not surprising.

Only if we know the content of the data file itself can we make any judgements on how it should be treated and, "burned once, twice't shy", I'm not going to make any assumptions this time about what it actually does look like. Attach the file...


Answer 4

dpb on 1 Dec 2024

Edited: dpb on 1 Dec 2024

0 votes

test.xlsx

tT=readtable('test.xlsx');
whos tT
  Name        Size            Bytes  Class    Attributes

  tT        247x20            45459  table              
[head(tT,4); tail(tT,4)]
ans = 8x20 table
    g1_aliphatic    g2_aliphatic    g1_acidic     g2_acidic     g1_charged    g2_charged     g1_polar      g2_polar     g1_npolar     g2_npolar     g1c_aliphatic    g2c_aliphatic    g1c_acidic    g2c_acidic    g1c_charged    g2c_charged    g1c_polar     g2c_polar     g1c_npolar    g2c_npolar
    ____________    ____________    __________    __________    __________    __________    __________    __________    __________    __________    _____________    _____________    __________    __________    ___________    ___________    __________    __________    __________    __________

     2.7186e+05      3.1731e+05     1.1597e+05    1.1731e+05    2.7376e+05    2.8654e+05    4.7719e+05    4.4231e+05    5.2281e+05    5.5769e+05     2.5421e+05       3.1731e+05      1.1028e+05    1.1731e+05    2.4673e+05     2.8654e+05     4.7477e+05    4.4231e+05    5.2523e+05    5.5769e+05
     2.8358e+05      2.6946e+05      1.194e+05    1.1776e+05    2.7052e+05    2.6148e+05    4.6455e+05    4.6108e+05    5.3545e+05    5.3892e+05     2.8486e+05       2.8911e+05      1.1753e+05    1.1683e+05    2.7092e+05     2.5545e+05     4.4821e+05    4.4554e+05    5.5179e+05    5.5446e+05
     2.5421e+05      2.8911e+05     1.1028e+05    1.1683e+05    2.4673e+05    2.5545e+05    4.7477e+05    4.4554e+05    5.2523e+05    5.5446e+05       2.58e+05       3.0691e+05      1.1864e+05    1.0976e+05    2.7119e+05     2.8658e+05     4.7269e+05    4.7968e+05    5.2731e+05    5.2032e+05
     2.8486e+05      3.0691e+05     1.1753e+05    1.0976e+05    2.7092e+05    2.8658e+05    4.4821e+05    4.7968e+05    5.5179e+05    5.2032e+05     2.5545e+05       3.3676e+05      1.2079e+05         92402    2.6733e+05     2.2587e+05     4.6733e+05     4.271e+05    5.3267e+05     5.729e+05
            NaN      2.9231e+05            NaN    1.1429e+05           NaN    2.4396e+05           NaN    4.5934e+05           NaN    5.4066e+05            NaN              NaN             NaN           NaN           NaN            NaN            NaN           NaN           NaN           NaN
            NaN      3.0485e+05            NaN     1.165e+05           NaN    2.6019e+05           NaN    4.1165e+05           NaN    5.8835e+05            NaN              NaN             NaN           NaN           NaN            NaN            NaN           NaN           NaN           NaN
            NaN             NaN            NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN            NaN              NaN             NaN           NaN           NaN            NaN            NaN           NaN           NaN           NaN
            NaN             NaN            NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN            NaN              NaN             NaN           NaN           NaN            NaN            NaN           NaN           NaN           NaN
sum(~isfinite(tT{:,:}))
ans = 1×20
     6     2     6     2     6     2     6     2     6     2   132    88   132    88   132    88   132    88   132    88

Those data look nothing like the prior examples -- it's clear why the numbers were large now; that's what's in the file.

How were the percentage numbers shown before generated?

1 Comment

George on 1 Dec 2024

datasets.mat

It seems that Excel did one of its tricks and I was fool enough not to check. For some reason Excel has problems with decimals, etc. Anyway, it's not a cheap excuse and I'm sorry about this. I'm attaching a matlab file containing the data of the two datasets as individual vectors (datasets.mat). Thanks again.
