Which Anova test and how to use it?
Franck paulin Ludovig pehn Mayo
on 12 Jul 2022
Commented: Franck paulin Ludovig pehn Mayo
on 20 Jul 2022
Good afternoon everyone,
I would like to use an ANOVA test, but unfortunately I do not know which one to use.
I have attached an Excel file with the data.
For instance,
I would like to know the relevance when thickness and orientation are involved. These are the data of 9 individuals with 5 repetitions.
The correct/not column represents whether the participants found the correct answer or not: correct = 1 and not = 0.
28 Comments
Adam Danz
on 12 Jul 2022
> I would like to know the relevance when Thickness and orientation are involved.
What is relevance? How is that measured? If you are looking for accuracy or precision, computed from the "correct/not" column, then you don't need an ANOVA for that.
More importantly, what is the falsifiable question you're asking? Or, what is your null hypothesis?
Franck paulin Ludovig pehn Mayo
on 12 Jul 2022
Edited: Franck paulin Ludovig pehn Mayo
on 12 Jul 2022
@Adam Danz To be honest, your reply has me confused now. Knowing that thickness and orientation are constant values, it seems like the ANOVA test will not be relevant. And yes, I want it to be computed from the "correct/not" column.
For instance, let's take one specific case where I was seeking the recognition rate when thickness and orientation were involved. I found the mean over all participants and drew the bar graph. What I got was that there was no tremendous gap [in height among the bars], so I wanted to know if the data were relevant; that is why I thought an ANOVA test or t-test could be useful.
How can I know that my data would be "valid", "true", or go in the same direction if other participants were involved?
Any suggestion ?
Adam Danz
on 12 Jul 2022
I assume the "recognition rate" is the same as accuracy which is indicated by a "1" in the correct/not column. Let me rephrase your goal with how I interpret it and you can let me know if my interpretation is incorrect.
You've got two independent variables (thickness and orientation). Thickness is on a continuous scale and has 6 levels; orientation is categorical and has 3 levels (horz, vert, control). You've got 1 dependent variable which is binary, true/false, describing some kind of decision, so true (1) means correct and false (0) means not-correct.
There are 9 participants with 5 reps and 18 conditions (6*3) which would result in 810 data points (rows of table) if all participants repeated all conditions 5 times but I only see 721 rows of data.
I still don't know the research question that motivated this design so I can only guess at the null hypothesis. In general, the order of events in a research project is
- define the question (sometimes the hardest part)
- define the null hypothesis given the question
- decide on methodology given the question and null hypothesis
- collect data
- analyze and interpret
For example, perhaps thickness or orientation is the main variable under question while the other one is a control condition that is not expected to have an effect. Or perhaps you're wondering whether the horizontal and vertical orientations statistically differ from the control orientation condition. Another question might involve individual differences between participants. Each of those may call for completely different statistical tests.
It sounds like you want to know if there is a statistical difference between some groups and, given that the groups are similar, the difference might be small, but you need to find out if the small difference is significant. If you provide more detail on the question you're asking (and the null hypothesis would be nice to know, too), I could help further.
Franck paulin Ludovig pehn Mayo
on 12 Jul 2022
Edited: Franck paulin Ludovig pehn Mayo
on 12 Jul 2022
I have 720 cases in total.
I have four thicknesses technically: 0.02, 0.03, 0.04 and 0. The zero thickness represents the control (it was a blank, more like psychological data).
Actually, 0.02 = 0.04; 0.06 = 0.03; and 0.04 = 0.08 (it was due to the orientations).
So 4 thicknesses, 2 orientations, 2 amplitudes, 5 repetitions and 9 participants: 4*2*2*5*9 = 720 cases.
The paragraph in bold is what I am looking for:
"It sounds like you want to know if there is a statistical difference between some groups and, given the groups are similar, the difference might be small, but you need to find out if the small difference is significant, and perhaps you're wondering whether horizontal and vertical orientation statistically differ from the control orientation condition."
Adam Danz
on 12 Jul 2022
Oh, I see. I just looked at the number of unique values and assumed they were fully nested. My bad.
Could I convince you to use bootstrapped confidence intervals instead of a parametric test such as a t-test or ANOVA? The benefits are that non-parametric tests do not carry the same assumptions that parametric tests have, they are easier to read, and they rely less on subjective thresholds such as p-values. With confidence intervals, you can directly see whether they overlap or not.
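The idea can be sketched in a few lines. This is a minimal, hypothetical example (the vector `acc` and its values are made up, not taken from the posted file), assuming the Statistics and Machine Learning Toolbox's bootci is available:

```matlab
% Hypothetical 0/1 accuracy data for one condition (a column vector,
% since bootci resamples rows).
acc = [1; 1; 0; 1; 1; 0; 1; 1; 1; 0];
nBoot = 1000;                                     % number of bootstrap resamples
CI = bootci(nBoot, {@mean, acc}, 'Type', 'per');  % 95% percentile CI of the mean
fprintf('mean = %.2f, 95%% CI = [%.2f, %.2f]\n', mean(acc), CI(1), CI(2));
```

If the intervals for two conditions do not overlap, that is direct visual evidence that the condition means differ.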
Franck paulin Ludovig pehn Mayo
on 12 Jul 2022
@Adam Danz I am down. But it is my first time hearing about it.
Franck paulin Ludovig pehn Mayo
on 12 Jul 2022
I have attached the file.
Adam Danz
on 12 Jul 2022
I don't know how you made this bar plot or how you computed the means. The only thickness that nests with both horizontal and vertical orientations is 0.04. You mentioned in a previous comment that some thickness values could be combined, but that explanation was confusing. For example, you mentioned 0.02 = 0.04, but those are treated as separate conditions, and the data in your plot show that they have different summary values.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1062725/Anovan_Ptestdata.xlsx','VariableNamingRule','preserve');
Tc = groupcounts(T,["thicknesss","orientation"])
Tc = 7×4 table
thicknesss orientation GroupCount Percent
__________ ______________ __________ _______
0 {'Control' } 180 25
0.02 {'vertical' } 90 12.5
0.03 {'vertical' } 90 12.5
0.04 {'horizontal'} 90 12.5
0.04 {'vertical' } 90 12.5
0.06 {'horizontal'} 90 12.5
0.08 {'horizontal'} 90 12.5
Franck paulin Ludovig pehn Mayo
on 12 Jul 2022
@Adam Danz okay, let me try to summarize it...
Due to the screen that I was using, I found that the horizontal thickness was two times the vertical thickness when displayed. It was probably due to the resolution of the screen, so H = 2V (H = horizontal and V = vertical).
So even if it is treated separately, it is not a problem. I did it that way for other purposes...
I have many other graphs with different parameters. For example, the one attached will probably tell you more (thickness and amplitude are the independent variables; Level 1 and Level 2 represent the amplitudes).
Franck paulin Ludovig pehn Mayo
on 13 Jul 2022
@Adam Danz any help on that?
Adam Danz
on 13 Jul 2022
I have some ideas on how to proceed but am short on free time. My suggestion is to compute bootstrapped confidence intervals using bootci (I recommend setting "type" to "per"). This will be performed for each condition and will provide the confidence bounds. If the bounds do not overlap, you can conclude that the means (or whatever statistic you choose) come from different distributions. I demo'd this approach in this comment.
Franck paulin Ludovig pehn Mayo
on 13 Jul 2022
@Adam Danz I went through the demo. I would like to know how I incorporate the correct/not data.
Scott MacKenzie
on 17 Jul 2022
Edited: Scott MacKenzie
on 17 Jul 2022
@Franck paulin Ludovig pehn Mayo, oops, I meant to post my comment here, not as a comment to Adam's answer. In any event, thanks for your response, which is below.
But, could I ask that you re-post the modified Excel file and include the chart (generated from the data). I've fiddled with the data, but cannot seem to duplicate the chart you posted. Before doing the anova, it's important we are on the same page (i.e., my grouped bar chart looks like yours).
Franck paulin Ludovig pehn Mayo
on 17 Jul 2022
@Scott MacKenzie it is alright. The chart I posted was based on the mean over all nine participants. If you notice, in column "E" I have something labelled "correct/not"; 1 stands for successful and 0 stands for wrong. So the mean was taken over all the successful answers I got.
The work was done on another sheet. I have attached the file (two sheet ; first one corresponds to the whole data and second one corresponds to the mean)
Scott MacKenzie
on 17 Jul 2022
@Franck paulin Ludovig pehn Mayo, thanks for re-posting the spreadsheet. However, there is no chart in the spreadsheet. What is needed -- to ensure we have the same interpretation of the data -- is a spreadsheet with both the data and the chart. The chart must be generated from the data in the 1st worksheet, not from data that are manually entered and separate from the data in the 1st worksheet. It's important to understand how the chart is created from the data for which you are interested in doing an ANOVA.
BTW, I assume "recognition rate" is the mean for each condition of the 1s and 0s in the "correct/not correct" column, expressed as a percent (i.e., x100). Correct?
Also, my initial comment referred to the grouped bar chart you posted called "Exa.JPG". Seems you posted an additional bar chart called "a1.JPG", which is completely different. I'm focusing on Exa.JPG at the moment.
Franck paulin Ludovig pehn Mayo
on 17 Jul 2022
@Scott MacKenzie Yes, the recognition rate is expressed in %.
I have attached the raw data of all the participants. To understand the formulas and everything, use sheet P7. The mean sheet is the last sheet.
Exa.JPG corresponds to the fourth graph. I have drawn the graph in both P7 and the last sheet, "Mean sheet".
Scott MacKenzie
on 17 Jul 2022
@Franck paulin Ludovig pehn Mayo, sorry, but trying to figure out how your data were obtained and organized is just too much work. The bottom line is I can't recreate your chart and I can't figure out how you created it from the raw data. The problem (or part of the problem) is that your chart is based on data that were manually transcribed:
I don't know where the numbers in the formula came from, since they were manually entered. The first number is 90, which I assume is for the first participant (P1), but I'm just guessing. Elsewhere in this worksheet, or on the worksheet for P1, I don't see this number calculated anywhere, so it's a bit of a dead end.
Franck paulin Ludovig pehn Mayo
on 17 Jul 2022
@Scott MacKenzie if you read my previous comment, I said that with P7 you would understand how I got the data.
Scott MacKenzie
on 17 Jul 2022
@Franck paulin Ludovig pehn Mayo, I did look at the P7 worksheet, but I still can't sort things out. For example, the first manually-entered value in the formula for cell L17 in the MEAN worksheet is 90. Where does this number come from? The only 90 on the P7 worksheet is also manually entered and it is for a different orientation/thickness condition.
Franck paulin Ludovig pehn Mayo
on 17 Jul 2022
Edited: Franck paulin Ludovig pehn Mayo
on 17 Jul 2022
It is all done on averages.
We are focused on each participant's 4th graph and the Mean 4th graph. For instance, if I take only the first value (horizontal and 0.02), you will notice all the values are within the formula's average.
Example: AVERAGE(90, 20, 50, 50, 40, 90, 70, 100, 80)
My bad, the values are not in order, because I was using multiple files instead of one file with multiple sheets (I was relatively new to Excel and did not know how to link the sheets at that time).
The first value (90 = P5, 20 = P4, 50 = P1, 50 = P2, 40 = P3, 90 = P6, 70 = P7, 100 = P8, 80 = P9), 4th graph.
I manually entered those values, unfortunately. I understand it is very difficult to understand the file.
Scott MacKenzie
on 18 Jul 2022
@Franck paulin Ludovig pehn Mayo, I'm not sure how to move forward with your data for the purpose of an analysis of variance. Perhaps a different approach is appropriate. A new issue I just noticed is that some data are missing. From the bar chart you posted (copied below), I had the impression your design was 4x3:
But it's not. There were measurements on participants for only 6 of these 12 conditions. The six conditions yielding a recognition rate of 100% are just made up, or placeholders, or something. There's likely an explanation and it probably makes sense. But these bars do not reflect measurements on participants, as there are no corresponding data in the table. So perhaps the design (for a possible analysis of variance) is 3x2, but I'm not sure.
BTW, on a comment you made earlier -- I thought the participant's information were not needed -- knowing which data correspond to which participant is important and a central part of an analysis of variance.
Perhaps Adam's answer is useful to you. Good luck.
Franck paulin Ludovig pehn Mayo
on 18 Jul 2022
Edited: Franck paulin Ludovig pehn Mayo
on 18 Jul 2022
@Scott MacKenzie for you to have an idea of my data, let me explain the aim of the experiment. I designed a ring with haptic motors embedded in it. The motors run with just two amplitudes (Level 1 and Level 2, which are the lowest). On a touchscreen, each participant (blindfolded) had to find the line (vertical and horizontal orientations). The lines have different thicknesses: 0.02, 0.03, 0.04 and 0 (control). The latter corresponds to blank; nothing is displayed.
So I was aiming to find the impact of amplitude, thickness and orientation on the recognition rate. There were 5 repetitions, as I said before, but everything was done randomly. Finding the start and the end of each participant is easy, but determining the rank or order of the repetitions is likely impossible.
To draw the graphs, I used the mean over all participants, and seeing the graphs, there was not a tremendous difference among the bars; that's why I wanted to go for an ANOVA test to check on the variance. I came across ANOVA and the t-test just a week ago.
Adam Danz
on 18 Jul 2022
@Franck paulin Ludovig pehn Mayo, your question is about applying a statistic to the data, but the majority of this thread is back-and-forth questions trying to understand your data. It's really confusing to say that some conditions are actually other conditions. This thread currently has 267 views and almost 30 comments since it was posted 6 days ago, which suggests a lot of time has been put into this. It shouldn't be this difficult to explain 12 data points (the number of bars in your figure).
I want to see you succeed in this goal, so please let me give some advice.
In the future, it would benefit you to spend some time cleaning up the data so it's very easy to explain and understand before you ask the question. Also, whenever you generate a plot, provide the code so we don't have to figure out what you're doing. That adds additional tasks we must work through before we even get to your question. It looks like those bar plots were done outside of MATLAB, but taking the time to figure out how to do them in MATLAB so you can ask a clearer question would help out a lot.
Adam Danz
on 20 Jul 2022
Just FYI, I deleted Carlos' answer because it was spam. He merely copied content from this Investopedia article and embedded a spam link at the end. Since the entirety of his content is available in the link above and was not authored by him, his content was removed and his profile has been flagged as spam.
I saw you voted for his answer, so I wanted to explain why it's no longer here.
Franck paulin Ludovig pehn Mayo
on 20 Jul 2022
@Adam Danz okay, I understand. I just came across these statistics methods about a week ago and am trying to understand what is what and which one suits my issue.
The little info I have gathered seems to lead me towards confidence intervals.
Adam Danz
on 20 Jul 2022
Edited: Adam Danz
on 20 Jul 2022
I came across these statistical methods 15 years ago and am still trying to understand which ones suit different sets of data and questions. It wasn't until about 5 years ago that I realized my long-term confusion wasn't a problem with my understanding -- it's a problem in the field of statistics in general. So many peer-reviewed articles apply statistics incorrectly or do not show that the data are fit for the selected statistics. Worse yet, some people keep applying different statistics until they get the results they want, which is p-hacking. Three years ago, hundreds of scientists and statisticians around the globe supported a movement to change how we think about and practice statistics (see the list of articles at the bottom of this answer). What's nice about bootstrapped CIs is that they can be used to visualize how closely related two distributions are, rather than just providing a number such as p<0.005.
I'm not swaying you away from using an ANOVA method, but I am arguing that the movement mentioned is a big step forward in statistics.
Answers (1)
Adam Danz
on 13 Jul 2022
I recommend using bootstrapped confidence intervals. The idea is to resample your accuracy data with replacement and compute the mean of the sample for each condition. If you repeat this many times (1000, for example), you'll have a distribution of means which can be used to compute the middle 95% interval. Fortunately, MATLAB has a function that does most of the work: bootci, which is demo'd in this comment. After you have the CIs for each condition, you can plot them using errorbar. If the CIs do not overlap between two conditions, it is likely that the data from those conditions come from different distributions.
Here's a demo that performs bootstrapped CIs for a single condition in your data. I would set up the loop to compute CIs for all conditions, but I still do not understand which conditions to compare, since the data do not appear to be nested. Perhaps if the 'thickness' values were corrected in some way, it would be clearer. But first, give it a shot.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1062725/Anovan_Ptestdata.xlsx','VariableNamingRule','preserve');
thickIdx = T.thicknesss == 0.04;
orientIdx = strcmp(T.orientation, 'vertical');
CI = bootci(1000, {@mean, T.("correct/not")(thickIdx & orientIdx)}, 'Type', 'per')
CI = 2×1
0.7667
0.9111
mu = mean(T.("correct/not")(thickIdx & orientIdx));
bar(mu)
hold on
errorbar(1, mu, mu-CI(1), CI(2)-mu, 'k-','LineWidth',1)
14 Comments
Franck paulin Ludovig pehn Mayo
on 16 Jul 2022
Edited: Franck paulin Ludovig pehn Mayo
on 16 Jul 2022
@Adam Danz I have attached the new data sheet and a screenshot of the graph.
The thicknesses available are 0=control , 0.02 , 0.03 and 0.04.
I would like to compare all the thicknesses when all the orientations (vertical and horizontal) are involved.
Scott MacKenzie
on 17 Jul 2022
@Franck paulin Ludovig pehn Mayo, I'm just seeing your question now. There are comments and an answer from @Adam Danz, so perhaps we're done here. However, let me add a comment.
To me, the most informative part of your question is the grouped bar chart. It shows the relationship between two independent variables (x-axis) and a dependent variable (y-axis). The independent variables are orientation with 3 levels (horizontal, vertical, control) and thickness with 4 levels (0.02, 0.03, 0.04, and control). The dependent variable is recognition rate (%). This looks appropriate for an analysis of variance. And you are not alone in wondering how to do this in MATLAB: There are at least 5 MATLAB anova functions! The anova will help answer three questions:
- Is there a significant effect of orientation on recognition rate?
- Is there a significant effect of thickness on recognition rate?
- Is there a significant Orientation x Thickness interaction effect on recognition rate?
This can be set up fairly easily in MATLAB, but first there are some issues that need to be clarified. The experiment engaged nine participants ("individuals" in the question) with five repetitions of the measurements for each participant on each condition. But there is no column in the data set indicating which rows correspond to which participants. Ditto for repetition. Can you add columns for the participant codes and repetition numbers?
Also, I assume "0" in the thickness column corresponds to the "control" level for thickness, but please confirm.
Finally, note that there is a small labelling error in the bar chart. The x-axis label corresponds to the bar groups. This should be "Thickness", not "Orientation". Orientation appears via the bars within groups. So, if you wish to include "Orientation" in the chart, it should appear as the title for the legend entries.
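Once participant codes are added, one hedged sketch of the setup uses anovan with participant as a random factor. The column names here (correct, thickness, orientation, participant) are illustrative assumptions, not the names in the posted file:

```matlab
% Sketch: two-way ANOVA on recognition (0/1) with participant as a
% random factor. Column names are assumptions, not from the posted file.
y = T.correct;                                      % dependent variable (0/1)
g = {T.thickness, T.orientation, T.participant};    % grouping variables
terms = [1 0 0;                                     % main effect: Thickness
         0 1 0;                                     % main effect: Orientation
         0 0 1;                                     % main effect: Participant
         1 1 0];                                    % Thickness x Orientation interaction
[p, tbl] = anovan(y, g, 'model', terms, 'random', 3, ...
                  'varnames', {'Thickness','Orientation','Participant'});
```

Here p(1), p(2) and p(4) would speak to the two main effects and the interaction, respectively.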
Franck paulin Ludovig pehn Mayo
on 17 Jul 2022
"The experiment engaged nine participants ("individuals" in the question) with five repetitions of the measurements for each participant on each condition. But, there is no column in the data set indicating which rows correspond to which participants. Ditto for repetition"
I thought the participant's information was not needed. Secondly, concerning the participants, after every 80 rows comes a new participant. (I have attached a file; A = first participant, B = second...)
ex: from row 2 to row 81, first participant;
from row 82 to row 161, second participant, and so on...
Concerning the repetitions, it will actually be impossible to detect them, since the experiment was randomized each time.
ex: so from row 2 to row 81 there are 5 repetitions within, but I cannot tell you the order.
So what I did: I copied all the results from each participant and just pasted them into Excel.
Yes, indeed, the labelling in the bar chart is wrong; thank you for letting me know. Indeed, it is thickness instead of orientation.
The fact is that there are a lot of parameters for which I will have to check the "relevance", such as:
- Thickness - Orientation (Recognition Rate)
- Amplitude - Orientation (Recognition Rate)
- Thickness- amplitude (Recognition Rate)
- Recognition Time when amplitude is involved
- Recognition Time when Thickness is involved.
Yes "0" thickness corresponds to the Control
Franck paulin Ludovig pehn Mayo
on 19 Jul 2022
Edited: Franck paulin Ludovig pehn Mayo
on 19 Jul 2022
@Adam Danz I did something, but I don't quite know how to interpret the results. I did it based on your previous work.
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1068365/Newfile.xlsx');
thickIdx = T.Var2 == 0.04;
orientIdx = strcmp(T.Var4, 'vertical');
data = T.("Var5")(thickIdx & orientIdx);
% number of bootstraps
nBoot = 1000;
[bci,bmeans] = bootci(nBoot, {@mean,data}, 'Type', 'per');
% bootstrap sample mean
bmu = mean(bmeans);
%mu = mean(data);
% Now repeat that process with lower-level bootstrapping
% using the same sampling procedure and the same data.
bootMeans = nan(1,nBoot);
for i = 1:nBoot
bootMeans(i) = mean(data(randi(numel(data),size(data))));
end
CI = prctile(bootMeans,[2.5,97.5]);
mu = mean(bootMeans);
% Plot
figure()
ax1 = subplot(2,1,1);
histogram(bmeans);
hold on
xline(bmu, 'k-', sprintf('mu = %.2f',bmu),'LineWidth',2)
xline(bci(1),'k-',sprintf('%.1f',bci(1)),'LineWidth',2)
xline(bci(2),'k-',sprintf('%.1f',bci(2)),'LineWidth',2)
title('bootci()')
% plot the lower-level, direct computation results
ax2 = subplot(2,1,2);
histogram(bootMeans);
hold on
xline(mu, 'k-', sprintf('mu = %.2f',mu),'LineWidth',2)
xline(CI(1),'k-',sprintf('%.1f',CI(1)),'LineWidth',2)
xline(CI(2),'k-',sprintf('%.1f',CI(2)),'LineWidth',2)
title('Lower level')
linkaxes([ax1,ax2], 'xy')
% bar(bmu)
% hold on
% errorbar(1, mu, mu-CI(1), mu-CI(2), 'k-','LineWidth',1)
Adam Danz
on 19 Jul 2022
I'll break down your code below.
Here, you're looking at data in column "Var5" of your table, from rows corresponding to the conditions Var2 == 0.04 and Var4 == "vertical".
T = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1068365/Newfile.xlsx');
thickIdx = T.Var2 == 0.04;
orientIdx = strcmp(T.Var4, 'vertical');
data = T.("Var5")(thickIdx & orientIdx)
data = 90×1
0
0
1
1
1
1
0
1
1
1
Then you're bootstrapping the mean from that selection of data. "bci" is the 95% confidence interval (CI) of the mean and "bmeans" are the 1000 bootstrapped means. See bootci for details.
% number of bootstraps
nBoot = 1000;
[bci,bmeans] = bootci(nBoot, {@mean,data}, 'Type', 'per')
bci = 2×1
0.7667
0.9111
bmeans = 1000×1
0.8444
0.8889
0.8889
0.8556
0.9000
0.8556
0.8111
0.8556
0.8556
0.8111
I don't know why you want the mean of the bootstrapped means. Maybe you have good reason for this. The line you commented out computes the mean of the raw data.
% bootstrap sample mean
bmu = mean(bmeans);
%mu = mean(data);
I'm not sure what "lower level bootstrapping" is. Is that a term I used somewhere in another thread? The for-loop merely implements the same type of bootstrapping that the bootci function does above. I was probably comparing the bootci functionality to another method of directly implementing bootstrapping (still not sure where you saw this but it does look like mine). The randi function resamples the data with replacement which is important to do in bootstrapping. Then the prctile line computes the CIs with the percentile method in the same way bootci does when type='per'.
% Now repeat that process with lower-level bootstrapping
% using the same sampling procedure and the same data.
bootMeans = nan(1,nBoot);
for i = 1:nBoot
bootMeans(i) = mean(data(randi(numel(data),size(data))));
end
CI = prctile(bootMeans,[2.5,97.5]);
mu = mean(bootMeans);
This part plots the distribution of bootstrapped means from bootci
% Plot
figure()
ax1 = subplot(2,1,1);
histogram(bmeans);
This adds the mean of the bootstrapped means. Maybe you want to show the mean of the data instead.
hold on
xline(bmu, 'k-', sprintf('mu = %.2f',bmu),'LineWidth',2)
Here you add the bootci CIs
xline(bci(1),'k-',sprintf('%.1f',bci(1)),'LineWidth',2)
xline(bci(2),'k-',sprintf('%.1f',bci(2)),'LineWidth',2)
title('bootci()')
Then you repeat with the lower-level bootstrapping method, which unsurprisingly gives the same results.
% plot the lower-level, direct computation results
ax2 = subplot(2,1,2);
histogram(bootMeans);
hold on
xline(mu, 'k-', sprintf('mu = %.2f',mu),'LineWidth',2)
xline(CI(1),'k-',sprintf('%.1f',CI(1)),'LineWidth',2)
xline(CI(2),'k-',sprintf('%.1f',CI(2)),'LineWidth',2)
title('Lower level')
linkaxes([ax1,ax2], 'xy')
% bar(bmu)
% hold on
% errorbar(1, mu, mu-CI(1), mu-CI(2), 'k-','LineWidth',1)
But this isn't your initial goal. This code is useful for computing the CIs (use one method or the other; there is no need for both), but your initial goal is to compute the CIs, not to plot the distributions and such.
Once you have the CIs for each condition, you can add them to your bar plot using the errorbar function.
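As a sketch with made-up means and CI bounds for three conditions (none of these numbers come from the data), the lengths passed to errorbar are the distances from each mean down to its lower bound and up to its upper bound:

```matlab
% Sketch: grouped bar + CI error bars from hypothetical values.
% mu are condition means; lo/hi are lower/upper CI bounds.
mu = [0.62 0.71 0.84];
lo = [0.52 0.63 0.77];
hi = [0.71 0.79 0.90];
bar(mu)
hold on
% errorbar expects positive lengths below/above each point:
errorbar(1:3, mu, mu-lo, hi-mu, 'k', 'LineStyle', 'none', 'LineWidth', 1)
```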
Franck paulin Ludovig pehn Mayo
on 19 Jul 2022
Edited: Franck paulin Ludovig pehn Mayo
on 19 Jul 2022
@Adam Danz yes, you were doing a comparison between the two; I did not know which one to use. You are right, the mean of the bootstrapped means is not relevant.
How can I implement many conditions at the same time, like all the thicknesses [0 0.02 0.03 0.04] and orientations [horizontal vertical control]; control = 0?
Since they don't have the same size, is it going to be an issue?
I've been trying to implement it since yesterday, but was not successful.
Franck paulin Ludovig pehn Mayo
on 20 Jul 2022
Edited: Franck paulin Ludovig pehn Mayo
on 20 Jul 2022
@Adam Danz I wanted to do a loop over all the conditions, but I was just making mistakes, so I went at it "manually".
Also, concerning the graphs, how can I make the error bars more distinct? Also, concerning the "0" thickness, I couldn't implement it. Also, how can I add "horizontal"?
I was getting this:
"BOOTFUN returns a NaN or Inf." I guess it is because the mean cannot be computed with 0. Is there any way I can sort it out?
T = readtable('Newfile.xlsx');
%Var2 = thickness
%thickIdx1 = T.Var2 == 0;
thickIdx2 = T.Var2 == 0.02;
thickIdx3 = T.Var2 == 0.03;
thickIdx4 = T.Var2 == 0.04;
%Var4= orientation
%orientIdx1 = strcmp(T.Var4, 'vertical');
orientIdx2 = strcmp(T.Var4, 'vertical');
orientIdx3 = strcmp(T.Var4, 'vertical');
orientIdx4 = strcmp(T.Var4, 'vertical');
%var5= correct/not
%data1 = T.("Var5")(thickIdx1 & orientIdx1);
data2 = T.("Var5")(thickIdx2 & orientIdx2);
data3 = T.("Var5")(thickIdx3 & orientIdx3);
data4 = T.("Var5")(thickIdx4 & orientIdx4);
% number of bootstraps
nBoot = 1000;
%CI1 = bootci(nBoot, {@mean,data1}, 'Type', 'per');
CI2 = bootci(nBoot, {@mean,data2}, 'Type', 'per')
CI3 = bootci(nBoot, {@mean,data3}, 'Type', 'per')
CI4 = bootci(nBoot, {@mean,data4}, 'Type', 'per')
% mu1 = mean(data1);
mu2 = mean(data2);
mu3 = mean(data3);
mu4 = mean(data4);
% bar(mu1)
bar(mu2)
bar(mu3)
bar(mu4)
hold on
% errorbar(1, mu1, mu1-CI1(1), mu2-CI2(2), 'k-','LineWidth',1)
errorbar(1, mu2, mu2-CI2(1), CI2(2)-mu2, 'k-','LineWidth',1)
errorbar(1, mu3, mu3-CI3(1), CI3(2)-mu3, 'k-','LineWidth',1)
errorbar(1, mu4, mu4-CI4(1), CI4(2)-mu4, 'k-','LineWidth',1)
Adam Danz
on 20 Jul 2022
You're plotting the bars separately. Instead, plot them all together: bar([m1 m2 m3]). Then apply the error bars in the same way, so you are creating 1 errorbar object that has 3 error bars.
It should look like this,
bar([1 2 3])
hold on
errorbar([1 2 3], 1:3, rand(1,3), rand(1,3),'k-','LineStyle','none','LineWidth',1)
Franck paulin Ludovig pehn Mayo
on 20 Jul 2022
Thanks. Bars 1 and 2 are overlapping. To "fix" it, do I need to do more experiments, hoping that the standard deviation shrinks towards the mean?
Secondly, is it possible, and if yes how, to do the bootci with thickness vs. both orientations (horizontal and vertical), and not just with one parameter like "vertical", as in what's been done previously?
Adam Danz
on 20 Jul 2022
Trying to "fix it" can be hairy.
Sometimes an insufficient amount of data is collected, such that the sample of data does not reflect the unobservable full population. For example, if I'm calling people randomly to ask what their favorite ice cream is, maybe I accidentally called a disproportionately high number of lactose-intolerant people. In that case, then yes, collecting more data can reveal a more accurate picture of the population.
But if your sample of data already reflects the population, collecting more data will not change the outcome.
Most importantly, the amount of data you collect should not be decided from the resultant statistic. In other words, you should decide how much data to collect independently of the results. Otherwise, that's p-hacking, and it's really bad science.
If your data reflect the underlying population, and if your bars overlap, then that's the result; that's the answer to your question; that's reality. In that case, you cannot conclude that these two populations of means come from different distributions.
I did a study for 4 years and had those unexpected results - that two groups did not differ even though everyone expected them to differ. This is an opportunity to investigate why. Maybe previous studies had a different methodology or maybe the model should be viewed differently.
About comparing different conditions, all you have to do is change your indexing.
BTW I just noticed that your variables orientIdx2 orientIdx3 orientIdx4 are all the same thing. You only need one of those. Take some time to understand what these lines are doing.
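As a sketch of what changing the indexing could look like, here is one possible loop over both factors, using the Var2/Var4/Var5 names from the posted table (the control condition is left out for brevity, and the thickness and orientation values are those discussed above):

```matlab
% Sketch: bootstrapped CIs for every thickness x orientation condition,
% then one grouped bar plot with one errorbar object.
T = readtable('Newfile.xlsx');      % Var2 = thickness, Var4 = orientation, Var5 = correct/not
rng default                          % fix the seed so CIs are reproducible between runs
thicknesses  = [0.02 0.03 0.04];
orientations = {'vertical','horizontal'};
nBoot = 1000;
mu = nan(numel(thicknesses), numel(orientations));
lo = mu;  hi = mu;
for i = 1:numel(thicknesses)
    for j = 1:numel(orientations)
        idx = T.Var2 == thicknesses(i) & strcmp(T.Var4, orientations{j});
        if any(idx)
            d = T.Var5(idx);
            mu(i,j) = mean(d);
            ci = bootci(nBoot, {@mean, d}, 'Type', 'per');
            lo(i,j) = ci(1);  hi(i,j) = ci(2);
        end
    end
end
b = bar(mu);                         % grouped bars: thickness groups, orientation within
hold on
x = [b(1).XEndPoints; b(2).XEndPoints]';   % bar centers (R2019b+)
errorbar(x, mu, mu-lo, hi-mu, 'k', 'LineStyle', 'none', 'LineWidth', 1)
```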
Franck paulin Ludovig pehn Mayo
on 20 Jul 2022
@Adam Danz I have 3 main questions and I would like your input.
1) Recently I have asked myself: how do I know whether I have a representative sample of the whole population?
2) I have noticed that each time I run the code, I get a different error bar graph. Of the few I have seen, some were overlapping and some were not. How do I interpret that globally?
3) "About comparing different conditions, all you have to do is change your indexing." What I am trying to ask is: instead of having two error bar graphs, one for vertical and one for horizontal, is it not possible to have them in one graph?
Like analysing the thicknesses and orientations as a whole: not thickness vs. vertical or thickness vs. horizontal, but rather thickness vs. vertical & horizontal?
Adam Danz
on 20 Jul 2022
- This is more of an art form than a science. There is lots of advice out there on knowing when enough is enough. It's been obvious to me when I don't have enough data but less obvious when I've collected enough. I have used cross-validation to help make that decision. The main idea is: if I remove something like 10-20% of my data and get approximately the same results, then I have enough data.
- It wouldn't be surprising if the CIs differ by a very small amount between runs. bootci uses a random selection of your data, so the results can differ by a very small amount. If you're getting noticeably different results between runs, something is wrong. Either you're not running enough bootstraps (1000 should be enough, but you could try more) or you're not providing the same exact input data between runs. This is definitely something you want to investigate.
- I still don't understand your dataset enough to imagine this comparison. If any given data point has a thickness property and an orientation property and you want to know whether thickness or orientation has a stronger effect, then I don't think you can do that with this bootstrapping method which makes me fear that this entire multiple-day thread has nothing to do with your actual goals. The main lesson, if this is the case, is that the data and the goals must be crystal clear to you and to the readers before a useful answer can be written.
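The "remove 10-20% and compare" idea above can be sketched with bootci itself. This is a rough stability check, not a formal test; it assumes the `data1` vector and `nBoot = 1000` from the code later in this thread, and it seeds the random number generator so the resampling (and hence the run-to-run variation mentioned in point 2) is reproducible.

```matlab
% Rough stability check: if the bootstrap CI from a random 80% subsample
% is close to the CI from the full data, the sample size is probably
% adequate. 'data1' is assumed to be the vector built later in the thread.
rng(0)                                   % seed for reproducible resampling
nBoot = 1000;
keep = randperm(numel(data1), round(0.8*numel(data1)));
CIfull = bootci(nBoot, {@(x)mean(x,'omitnan'), data1},       'Type','per');
CIsub  = bootci(nBoot, {@(x)mean(x,'omitnan'), data1(keep)}, 'Type','per');
[CIfull CIsub]                           % compare the two intervals side by side
```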
I realized you previously asked about NaNs in your bootci results, but I forgot to address that question. By default, mean does not ignore NaNs, so if there is a NaN in the data, the mean will be NaN. You want to omit NaNs using
___ = bootci(nBoot, {@(x)mean(x,'omitnan'),data}, 'Type', 'per')
That's all the time I have for this thread @Franck paulin Ludovig pehn Mayo. I hope these ideas will be helpful to you even if you don't end up needing them.
Franck paulin Ludovig pehn Mayo
on 20 Jul 2022
@Adam Danz Thank you very much, I have grasped the concept. I have an idea of how I will go from here.
The last input I would like is how to fix the NaN issue. I have implemented it, but unfortunately I am still getting the same error:
BOOTFUN returns a NaN or Inf.
T = readtable('Newfile.xlsx');
% Var2 = thickness
thickIdx1 = T.Var2 == 0;
thickIdx2 = T.Var2 == 0.02;
thickIdx3 = T.Var2 == 0.03;
thickIdx4 = T.Var2 == 0.04;
% Var4 = orientation
orientIdx = strcmp(T.Var4, 'vertical');
% Var5 = correct/not
data1 = T.("Var5")(thickIdx1 & orientIdx);
data2 = T.("Var5")(thickIdx2 & orientIdx);
data3 = T.("Var5")(thickIdx3 & orientIdx);
data4 = T.("Var5")(thickIdx4 & orientIdx);
% number of bootstraps
nBoot = 1000;
% Use the NaN-omitting mean in EVERY bootci call, not just the first one;
% otherwise BOOTFUN still returns NaN for data2-data4.
CI1 = bootci(nBoot, {@(x)mean(x,'omitnan'),data1}, 'Type', 'per')
CI2 = bootci(nBoot, {@(x)mean(x,'omitnan'),data2}, 'Type', 'per')
CI3 = bootci(nBoot, {@(x)mean(x,'omitnan'),data3}, 'Type', 'per')
CI4 = bootci(nBoot, {@(x)mean(x,'omitnan'),data4}, 'Type', 'per')
mu1 = mean(data1, 'omitnan');
mu2 = mean(data2, 'omitnan');
mu3 = mean(data3, 'omitnan');
mu4 = mean(data4, 'omitnan');
% Plot all four means in one bar graph (separate bar(mu) calls would
% overwrite each other), with error bars derived from the bootstrap CIs
% instead of random placeholder values.
mus = [mu1 mu2 mu3 mu4];
CIs = [CI1 CI2 CI3 CI4];   % 2x4: row 1 = lower bounds, row 2 = upper bounds
bar(1:4, mus)
hold on
errorbar(1:4, mus, mus - CIs(1,:), CIs(2,:) - mus, 'k-', 'LineStyle', 'none', 'LineWidth', 1)
hold off