How to evaluate mean for column

I have data with 5 columns: X, Y, Z, A, B. The first three columns are coordinates; the last two are attributes of the objects.
Column A contains the values 0 or 1; column B contains values such as 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, ...
I would like to compute the mean of column A for each distinct value in column B, e.g. the mean of column A over only those points whose value in B is 1.
Does anybody know which function I should use for this? Thank you in advance!

 Accepted Answer

You can use arrayfun:
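dx=zeros(100,1);
dx(randi(100,10,1))=1;   % mark up to 10 random positions
B=cumsum(dx);            % non-decreasing group labels (example data)
A=(1:100).^2;            % example values to average
meanvals=arrayfun(@(x) mean(A(B==x)),unique(B));   % mean of A for each distinct value of B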
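As a small deterministic illustration of the same idea, here is a sketch with made-up values for A and B (they are not taken from the question), so the result can be checked by hand:

```matlab
% Made-up sample data: A holds 0/1 attributes, B holds group labels.
A = [0 1 1 0 0 1 1 1 1]';
B = [1 1 1 2 2 3 3 3 3]';

groups   = unique(B);                               % sorted distinct labels in B
meanvals = arrayfun(@(x) mean(A(B == x)), groups);  % one mean of A per label
```

Here the group with B==1 averages the values 0, 1, 1, the group with B==2 averages 0, 0, and the group with B==3 averages four ones.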

9 Comments

Karolina on 6 May 2015
Edited: Karolina on 6 May 2015
Thank you very much, it is exactly what I was looking for!
I have one additional question. Do you know how to add a column to the output file that shows which value of B each mean belongs to? I assume the means in the output are sorted from the lowest to the highest value in column B, but I would like to attach this information to the output file to avoid mistakes. Or is it perhaps possible to attach the mean to every individual (XYZ) point? Thank you in advance.
Exactly, the means are sorted by B, from the lowest to the highest value. If you want to write such a file, you can just use
dlmwrite('myfilename.txt',[meanvals unique(B)])
In case B is very large, it could be beneficial to save the result of unique(B) in an array and use it both in the arrayfun and in the dlmwrite function. Otherwise, unique will be executed twice, but I doubt this will be the bottleneck.
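A minimal sketch of that suggestion, computing unique(B) once and reusing it. The data is made up for the example and the variable name uB is my own choice:

```matlab
% Made-up sample data.
A = [0 1 1 0 0 1 1 1 1]';
B = [1 1 1 2 2 3 3 3 3]';

uB = unique(B);                                  % compute the distinct labels once
meanvals = arrayfun(@(x) mean(A(B == x)), uB);   % reuse uB here ...
dlmwrite('myfilename.txt', [meanvals uB]);       % ... and here: column 1 = mean of A, column 2 = value of B
```

Recent MATLAB releases mark dlmwrite as not recommended and offer writematrix instead, so that may be worth checking for your version.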
Thank you for your answer, the code works fine. However, when I saw the results I realized that I would rather have them together with my input data as an additional column, e.g. "M", so that the output file has the columns "X,Y,Z,A,B,M". Do you know how to put this together?
In this case, I would use and extend Andrei Bobrov's answer.
[uB,~,c] = unique(B);
finalResult=[A(:) B(:) out(c(:))];
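Put together with the accumarray line from Andrei Bobrov's answer (which defines out), the whole thing might look like the sketch below. The X, Y, Z, A, B values are made up, and M is only the column name suggested in the question:

```matlab
% Made-up sample data, one row per point.
X = (1:9)'; Y = (11:19)'; Z = (21:29)';
A = [0 1 1 0 0 1 1 1 1]';
B = [1 1 1 2 2 3 3 3 3]';

[uB, ~, c] = unique(B);                    % c(i) is the position of B(i) within uB
out = accumarray(c(:), A(:), [], @mean);   % mean of A for each entry of uB
M = out(c(:));                             % expand the group means back to one value per point
finalResult = [X Y Z A B M];               % columns: X, Y, Z, A, B, M
```

Every row of finalResult then carries the mean of A over all points that share its B value.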
Thank you! It works fine! The only thing I do not understand is what is happening here:
[uB,~,c] = unique(B);
I would also like to combine this result with the number of points for each individual value in the "B" column. To count the points I am using this:
values=unique(B);
Size=histc(B,values);
Do you know how to combine these two variables? E.g. if there are more than 5 points with the same value (in column B) and the computed mean (of column A) is smaller than 1.3, give 1, else 2.
I would rather refer you to the help of the unique() function; it is explained well there (most likely better than I could).
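As a rough illustration of the third output of unique, here is a tiny sketch with made-up labels:

```matlab
B = [3 3 7 7 7 9]';        % made-up labels
[uB, ~, c] = unique(B);    % uB: sorted distinct values; c: for each B(i), its position within uB
% uB is [3; 7; 9] and c is [1; 1; 2; 2; 2; 3]
same = isequal(uB(c), B);  % true: indexing uB with c rebuilds B
```

So c relabels the values of B as consecutive integers 1, 2, 3, ..., which is the form accumarray needs for its subscripts.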
I didn't get the second part, sorry. Using histc seems to be fine here, but I couldn't understand what you want to do then.
Thanks, I will read the documentation for the unique function.
I will try to explain the second part in another way. In my B column I would like to select only those values that appear more than 5 times (that is why I am using histc). Next I would like to check the computed mean (of column A) for these values. For the mean I want to use a threshold of 1.3, which will divide my data into two parts. In the end I would like to create a new column with the values 1 and 2 that represents my data divided by this threshold.
values=unique(B);
nums=histc(B,values);
small=nums>5 & out<1.3;
large=nums>5 & out>=1.3;
sl=small+2*large;
sl should then contain what you want. Please note that I used "nums" as the variable name, because "Size" is very close to "size", which is a MATLAB function. It is better not to name variables like this, to prevent errors from the beginning.
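Note that sl has one entry per distinct value of B, not one per point. If a per-point column is wanted, a sketch along the following lines might work. The data is made up, slPerPoint is a hypothetical name, and accumarray(c,1) is used here for the counts in place of histc:

```matlab
% Made-up data: label 1 appears 6 times, label 2 appears 7 times, label 3 twice.
B = [1 1 1 1 1 1 2 2 2 2 2 2 2 3 3]';
A = [1 1 1 1 1 1 2 2 2 2 2 2 2 1 1]';   % group means are 1, 2 and 1

[values, ~, c] = unique(B);
nums = accumarray(c, 1);                % number of points per distinct value of B
out  = accumarray(c, A, [], @mean);     % mean of A per distinct value of B
small = nums > 5 & out <  1.3;
large = nums > 5 & out >= 1.3;
sl = small + 2*large;                   % per value: 1, 2, or 0 when the value has 5 points or fewer
slPerPoint = sl(c);                     % expand to one entry per row of the input
```

Values of B with 5 or fewer points end up with 0 in sl, so they can be told apart from the two thresholded classes.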
Super! Thank you very much! Have a nice day!


More Answers (2)

[~,~,c] = unique(B);
out = accumarray(c(:),A(:),[],@mean);
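A quick sketch of why the index vector c from unique is used rather than B itself: with it, the labels in B do not have to be 1..n. The data below is made up, and the arrayfun line is only there as a cross-check:

```matlab
A = [0 1 1 0 0 1 1 1 1]';
B = [10 10 10 20 20 35 35 35 35]';          % made-up labels, not consecutive
[uB, ~, c] = unique(B);                     % c maps each label to 1, 2, 3
out = accumarray(c(:), A(:), [], @mean);    % out(k) is the mean of A where B == uB(k)
ref = arrayfun(@(x) mean(A(B == x)), uB);   % same means computed group by group
```

out and ref should agree, and out(k) belongs to the label uB(k).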
The line below handles the example you listed; change the value 1 to get the other values.
mean(A(B==1))

1 Comment

Thank you for your answer. Do you know if there is a way to do this automatically for all values in my B column? My B column contains values from "1" to "n" (I do not know the largest value), and for every value I would like to have the mean of column A.
