Get zeroes in a binned data set

I am trying to sort a large data set, stored as a matrix, into bins. However, the binned data set contains zeros and I don't know why. My program is a bit long. When I print one of the binned data sets, zeros appear at seemingly random positions. I would be grateful for any help.
clc;
clear all;
ncdisp('at.nc')
T = ncread('at.nc','T',[1 1], [Inf Inf], [1 1]);
year= ncread('at.nc','year',[1],[Inf], [1]);
month= ncread('at.nc','month',[1], [Inf], [1]);
alt= ncread('at.nc','altitude',[1], [Inf], [1]);
day= ncread('at.nc','day',[1],[Inf], [1]);
flag= ncread('at.nc','quality flag',[1 1], [Inf Inf], [1 1]);
for i=1:32025
x(i)=year(i);
d(i)=day(i);
m(i)=month(i);
end
for i=1:32025
if ((m(i)==1 )&& (d(i)>=0.5) && (d(i)<=15.25))
L(i)=1;
elseif d(i)>=15.26 && d(i)<=30.5 && m(i)==1
L(i)=2;
elseif (d(i)==31 && m(i)==1) || (d(i)<=14.75 && m(i)==2)
L(i)=3;
elseif (d(i)>=14.76 && m(i)==2) || (d(i)<=1.5 && m(i)==3)
L(i)=4;
elseif (d(i)>=16.5 && d(i)==31 && m(i)==3)
L(i)=5;
elseif (d(i)>=16.5 && m(i)==3) || (d(i)<=31 && m(i)==3)
L(i)=6;
elseif (d(i)==1 && m(i)==4) || (d(i)<=15.75 && m(i)==4)
L(i)=7;
elseif (d(i)>=15.75 && m(i)==4) || (d(i)==30 && m(i)==4)
L(i)=8;
elseif (d(i)>=1 && m(i)==5) || (d(i)<=16.75 && m(i)==5)
L(i)=9;
elseif (d(i)>=16.25 && m(i)==5) || (d(i)==31 && m(i)==5)
L(i)=10;
elseif (d(i)==1 && m(i)==6) || (d(i)<=15.75 && m(i)==6)
L(i)=11;
elseif (d(i)>=15.25 && m(i)==6) || (d(i)==30 && m(i)==6)
L(i)=12;
elseif (d(i)==1 && m(i)==7) || (d(i)<=16.75 && m(i)==7)
L(i)=13;
elseif (d(i)>=16.25 && m(i)==7) || (d(i)==30 && m(i)==7)
L(i)=14;
elseif (d(i)==1 && m(i)==8) || (d(i)<=15.75 && m(i)==8)
L(i)=15;
elseif (d(i)>=15.75 && m(i)==8) || (d(i)==31 && m(i)==8)
L(i)=16;
elseif (d(i)>=0.5 && m(i)==9) || (d(i)<=15.75 && m(i)==9)
L(i)=17;
elseif (d(i)==30 && m(i)==9) || (d(i)<=15.75 && m(i)==9)
L(i)=18;
elseif (d(i)==1 && m(i)==10) || (d(i)<=15.75 && m(i)==10)
L(i)=19;
elseif (d(i)<=31 && m(i)==10) || (d(i)>=15.75 && m(i)==10)
L(i)=20;
elseif (d(i)==1 && m(i)==11) || (d(i)<=15.75 && m(i)==11)
L(i)=21;
elseif (d(i)==30 && m(i)==11) || (d(i)>=15.75 && m(i)==11)
L(i)=22;
elseif (d(i)==1 && m(i)==12) || (d(i)<=15.75 && m(i)==12)
L(i)=23;
elseif (d(i)==31 && m(i)==12) || (d(i)>=15.75 && m(i)==12)
L(i)=24;
else L(i)
end
end
n=0;
for i=1:120
for m=1:24
w4(i,m)=1;
w5(i,m)=1;
w6(i,m)=1;
w7(i,m)=1;
w8(i,m)=1;
w9(i,m)=1;
w10(i,m)=1;
w11(i,m)=1;
w12(i,m)=1;
w13(i,m)=1;
end
end
for j=1:32025
if (x(j)==2004)
for i=1:120
if (flag(i,j)<=2)&& ((T(i,j))>0)&&(~isnan(T(i,j)))
E4(i,L(j),w4(i,L(j)))= T(i,j);
E4(i,L(j),w4(i,L(j)));
w4(i,L(j))= w4(i,L(j))+1;
end
end
elseif (x(j)==2005)
if (flag(i,j)<=2)&& ((T(i,j))>0)&&(~isnan(T(i,j)))
for i=1:120
E5(i,L(j),w5(i,L(j)))= T(i,j)
w5(i,L(j))= w5(i,L(j))+1;
end
end
elseif (x(j)==2006)
if(flag(i,j)<=2) && ((T(i,j))>0)&&(~isnan(T(i,j)))
for i=1:120
E6(i,L(j),w6(i,L(j)))= T(i,j);
w6(i,L(j))= w6(i,L(j))+1;
end
end
elseif (x(j)==2007)
if(flag(i,j)<=2)&& ((T(i,j))>0)&&(~isnan(T(i,j)))
for i=1:120
E7(i,L(j),w7(i,L(j)))= T(i,j);
w7(i,L(j))= w7(i,L(j))+1;
end
end
elseif (x(j)==2008)
if(flag(i,j)<=2)&& ((T(i,j))>0)&&(~isnan(T(i,j)))
for i=1:120
E8(i,L(j),w8(i,L(j)))= T(i,j);
w8(i,L(j))= w8(i,L(j))+1;
end
end
elseif (x(j)==2009)
if(flag(i,j)<=2)&& ((T(i,j))>0)&&(~isnan(T(i,j)))
for i=1:120
E9(i,L(j),w9(i,L(j)))= T(i,j);
w9(i,L(j))= w9(i,L(j))+1;
end
end
elseif (x(j)==2010)
if (flag(i,j)<=2)&& ((T(i,j))>0)&&(~isnan(T(i,j)))
for i=1:120
E10(i,L(j),w10(i,L(j)))= T(i,j);
w10(i,L(j))= w10(i,L(j))+1;
end
end
elseif (x(j)==2011)
if(flag(i,j)<=2)&& ((T(i,j))>0)&&(~isnan(T(i,j)))
for i=1:120
E11(i,L(j),w11(i,L(j)))= T(i,j);
w11(i,L(j))= w11(i,L(j))+1;
end
end
elseif (x(j)==2012)
if(flag(i,j)<=2)&& ((T(i,j))>0)&&(~isnan(T(i,j)))
for i=1:120
E12(i,L(j),w12(i,L(j)))= T(i,j);
w12(i,L(j))= w12(i,L(j))+1;
end
end
elseif (x(j)==2013)
if(flag(i,j)<=2)&& ((T(i,j))>0)&&(~isnan(T(i,j)))
for i=1:120
E13(i,L(j),w13(i,L(j)))= T(i,j);
w13(i,L(j))= w13(i,L(j))+1;
end
end
end
end
I printed E4. The results I get are attached to the question.

 Accepted Answer

I don't really know the answer to your problem, mostly because your program is so difficult to follow: with all these elseif branches it's hard to see where the bug, if any, could be. Possibly some of your elseif conditions are wrong, such as:
elseif (d(i)>=16.5 && d(i)==31 && m(i)==3)
If d == 31, it is obviously >= 16.5, so the d(i)>=16.5 test serves no purpose.
elseif (d(i)==1 && m(i)==4) || (d(i)<=15.75 && m(i)==4)
Whenever the first part (before the ||) is true, the second part is true as well, so the first part serves no purpose. There may be more elseif conditions like that.
So, I think your first task should be to simplify your program. Most of what you're doing can be achieved with a lot less code. For example:
  • copying matrix: I'm not sure why you copy year, day, month to new arrays with names which are less descriptive but you don't need a loop to do that,
x = year; % or x = year(1:32025) if year has more elements
works just as well.
Your L calculation looks like it's partitioning the year into several periods and finding which period a particular month/day combination falls in. You're basically finding which bin of a histogram a particular date falls in. The 2nd output of histc tells you that. You just need to transform your month/day pair into a single variable. This is easily done with datenum and a date vector (see datevec), e.g.:
dvdaymonth = [zeros(32025, 1) m' d']; %assuming m and d are row vector. Don't transpose if column
%dvdaymonth is a datevector where each row is year month day. year is always 0
dndaymonth = datenum(dvdaymonth); %transform into a single number
dvthresholds = [
0 0 0
0 1 15.25
0 1 30
0 2 14.75
... and so on
0 12 31]; %again, each row is year, month, day.
dnthresholds = datenum(dvthresholds);
[~, L] = histc(dndaymonth, dnthresholds);
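As a small illustration of the approach above, the sketch below bins a handful of made-up month/day pairs. The sample values and the four-row threshold table are assumptions for demonstration only, not the real edges for this data set.

```matlab
% Hypothetical sample data: month and (fractional) day as column vectors.
m = [1; 1; 2; 2];
d = [3; 20.5; 1; 20];

% Date vectors with a dummy year of 0, converted to serial date numbers.
dndaymonth = datenum([zeros(size(m)) m d]);

% Illustrative edges only: start of year, mid January, end of January,
% end of February. Each row is year, month, day.
dvthresholds = [0 1 0
                0 1 15.25
                0 1 31.5
                0 2 29.5];
dnthresholds = datenum(dvthresholds);

% The second output of histc is the bin index of each element
% (0 for values that fall outside the edges).
[~, L] = histc(dndaymonth, dnthresholds);
disp(L.')
```

Note that newer MATLAB releases document histc as not recommended; discretize and histcounts cover similar ground there, with slightly different edge conventions.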
You use loops to create matrices of ones; use the ones function instead:
w4 = ones(120, 24);
...
I'm not sure what you're doing next in the code; it looks like you're building a histogram. Maybe explain what you're trying to achieve and we'll tell you how to simplify it.
With simpler code, it'll be a lot easier to find where it's going wrong.

10 Comments

Sorry for the late question. I tried to do this by using data structures. But it was not successful. Anyway then I came back to your method. This is my code.
dvdaymonth = [month day]; %assuming m and d are row vector. Don't transpose if column
%dvdaymonth is a datevector where each row is year month day. year is always 0
dndaymonth = datenum(dvdaymonth); %transform into a single number
dvthresholds = [
1 15.25
1 30.5
2 14.75
3 1.5
3 16.5
3 31.5
4 15.5
4 30.5
5 16.5
5 31.5
6 15.5
6 31.5
7 16.5
7 30.5
8 15.5
8 31.5
9 15.5
9 30.5
10 15.5
10 31.5
11 15.5
11 30.5
12 15.5
12 31.5
];
dnthresholds = datenum(dvthresholds);
[~, L] = histc(dndaymonth, dnthresholds)
I know that the last line checks which bin the data is in. But I get this error.
Error using histc
Edge vector must be monotonically non-decreasing.
Error in Untitled (line 43)
[~, L] = histc(dndaymonth, dnthresholds)
datenum transforms a date vector into a single number. For it to work properly, the date vector needs to have 3 columns (year, month, day) or 6 columns (year, month, day, hour, minute, second). It won't work with only two columns.
Hence, in my example I set the year to 0 (the value does not matter as long as it's always the same):
dvdaymonth = [zeros(size(month)) month day];
dvthresholds = [
0 1 15.25
0 1 30.5
%... and so on
];
%or you can keep dvthresholds as you have written and do:
dnthresholds = datenum([zeros(size(dvthresholds, 1), 1) dvthresholds]);
For histc to work, dndaymonth and dnthresholds must be vectors, and dnthresholds must indeed be monotonically non-decreasing.
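Both conditions can be checked before calling histc. The following is only a sketch; the three-row [month day] table is a made-up stand-in for the real thresholds, and the variable names monthday and edgesAreOrdered are hypothetical.

```matlab
% Hypothetical thresholds given as [month day]; prepend a dummy year column
% so that datenum receives a valid three-column date vector.
monthday = [1 15.25
            1 30.5
            2 14.75];
dvthresholds = [zeros(size(monthday, 1), 1) monthday];
dnthresholds = datenum(dvthresholds);

% histc needs a vector of non-decreasing edges; the edges qualify when
% every consecutive difference is >= 0.
edgesAreOrdered = isvector(dnthresholds) && all(diff(dnthresholds) >= 0);
disp(edgesAreOrdered)
```

If the check fails, inspecting diff(dnthresholds) shows which row of the threshold table is out of order.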
I converted it into a double array because the error says:
Caused by:
Error using datenummx
The datenummx function only accepts double arrays.
A 'double' array is an array whose elements are of class double, not an array with two columns.
Most likely, your month and day variables are not double but some integer class.
dvdaymonth = [zeros(size(month)) double(month) double(day)]; %convert each column so fractional days are not rounded
should fix it.
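To see why the class matters, here is a small sketch with hypothetical int16 month values (the names monthRaw and dayRaw and all numbers are invented). It shows how to inspect the class, and that concatenating an integer array with doubles yields an integer array, which rounds fractional days.

```matlab
% Hypothetical inputs: month stored as an integer class, day as double.
monthRaw = int16([1; 2]);
dayRaw   = [15.25; 14.75];

disp(class(monthRaw))         % shows the storage class of monthRaw

% Concatenating integers with doubles produces an integer array,
% so the fractional part of the day values is rounded away here.
mixed = [monthRaw dayRaw];

% Converting each column to double first keeps the fractional days.
dv = [zeros(size(monthRaw)) double(monthRaw) double(dayRaw)];
dn = datenum(dv);
```

Here mixed is an int16 array, while dv stays double and can be passed to datenum without the datenummx error.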
[~, L] = histc(dndaymonth, dnthresholds);
What does this code do? I know it gives you an index for each bin. I tried to execute it and it gives an error. Up to now, everything you said has worked perfectly. Thanks.
Attempt to execute SCRIPT histc as a function.
What does this say
>> which histc
It sounds like you have your m-file called histc and then try to call histc inside it. Don't name your variables, functions, or m-files after built-in function names.
Does this give 0 as an index? I don't want an index to be 0. I checked and some values in L are zeros.
Nevermind I fixed it. Thanks.
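For reference, histc assigns index 0 to any value that lies outside the edge vector (below the first edge or above the last). The thread does not say how this was resolved here; the sketch below, with made-up numbers, shows one way such values could be detected and excluded.

```matlab
edges  = [0 10 20 30];        % hypothetical bin edges
values = [-5 3 12 30 42];     % -5 and 42 fall outside the edges

[~, L] = histc(values, edges);

% Elements with index 0 were not assigned to any bin.
outside = (L == 0);
binned  = values(~outside);   % values that landed in a bin
binIdx  = L(~outside);        % their bin indices
```

A value exactly equal to the last edge gets its own final bin, so only values strictly beyond the edges end up with index 0. Extending the first and last edges to cover the full data range avoids zeros altogether.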
Thank you again. Everything you said worked perfectly. The problem I have now is that, as I explained earlier, my data is organised by year. Since I have assigned an index to each bin according to the month and day as you said, I now want to separate the data by year and take the average of each bin. I have data for 10 years, and each year has 48 bins. Altogether that is 48*10 bins. I need to calculate the average of the values in each bin. Thanks for the help.
If you want to take the year into account, you just have to add it to the datenum calculation:
dvdate = double([year month day]);
For the thresholds, either you manually define them for all the years (a bit tedious), or you define it as:
dvthresholds = [
1 15.25
1 30.5
2 14.75
... %same as before
];
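years = double(min(year):max(year)); %one entry per year in the data
thresholdyears = repmat(years, size(dvthresholds, 1), 1); %one column per year
dvthresholds = [thresholdyears(:) repmat(dvthresholds, numel(years), 1)]; %repeat the month/day rows for every year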
Once you've got your bins distribution L, to calculate the average of T per bin:
Taverage = accumarray(L, double(T), [], @mean);
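accumarray expects the values to be a vector with one entry per row of subscripts. If T is a matrix with one row per altitude and one column per measurement, as in the question, one possible way to get a mean per row and per bin is to pass two subscript columns. The sizes and values below are invented for illustration, and the names rowIdx, colIdx and subs are hypothetical.

```matlab
% Hypothetical data: 2 rows (e.g. altitudes) by 5 measurements.
T = [10 20 30 40 50
      1  2  3  4  5];
L = [1 1 2 2 2];              % bin index of each measurement (column of T)
nBins = max(L);

% Build one (row, bin) subscript pair for every element of T,
% in the same column-major order as T(:).
[rowIdx, colIdx] = ndgrid(1:size(T, 1), 1:size(T, 2));
subs = [rowIdx(:) reshape(L(colIdx), [], 1)];

% Mean of T for each row and each bin; bins without data become NaN.
Taverage = accumarray(subs, double(T(:)), [size(T, 1) nBins], @mean, NaN);
disp(Taverage)
```

Since accumarray requires positive integer subscripts, measurements with L equal to 0, or with a rejected quality flag, would need to be removed from both subs and the values before the call.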

More Answers (1)

Image Analyst on 15 Sep 2014
What are the bin centers or edges? Is there a value that you know for a fact should have gone into a bin yet the bin is still zero? For example bin #123 covers values from 5460 to 5600 (or whatever) and you know for a fact that you have a data value of 5500, which should have got counted in bin #123 but bin #123 is zero?

5 Comments

anton fernando on 15 Sep 2014
Edited: anton fernando on 15 Sep 2014
It is like this. This is a huge matrix which gives you the temperature of a certain material. Data were taken for 32025 days and 120 volumes. The column tells you on which day the data was taken, and the row tells you the volume. For example, T(10,2) gives you the temperature of the material at the volume 10 m^3 on the 2nd day the data was taken. For the day, month and year can be obtained by day(2), month(2) and year(2). It gives you month/day/year. What I have to do is put the data into 24 bins for each year. So I assigned a number L(k) to each data point according to its bin (one bin is approximately 15 days). Then I tried to bin it for each year. Every data point has a flag. I have to remove the data when the flag is greater than 2. Later I have to do some statistics.
Image Analyst on 15 Sep 2014
Edited: Image Analyst on 15 Sep 2014
I didn't understand this "For the day, month and year can be obtained by day(2), month(2) and year(2). It gives you month/day/year. " And I don't know how you could have collected 87 years worth of data (32,025 days) unless it was a simulation. And I didn't see an answer to my question of a value that should be in a certain bin, but was not. Sorry.
anton fernando on 15 Sep 2014
Edited: anton fernando on 15 Sep 2014
I am sorry. The data were taken, let's say, a few times a day. So the first day is 2000/jan/1, the second day is 2000/jan/1, etc. For each data point there is a date. The year, month and day of that date can be obtained separately. If you want the date of the 300th data point, you can find it from year(300), month(300) and day(300). Then you can find the temperature taken at a certain volume (24 m^3) on the 300th day by T(24,300). I hope you get this.
I thought I explained how I binned the data. Let's say data taken on January 1st must go to bin 1. (I have 24 bins for each year. So the first 15.25 days of the year are assigned to bin 1, days 16 to 31 are bin 2, etc.) Because the data was taken in the first 15.25 days, it should be in bin 1. Just like that. I have to bin it for 10 different years. So the data will be put into bins like 2004, 1st bin; 2010, 12th bin; etc. I don't understand your question. (Bin #123 is zero? Maybe what I meant by bin was different.)
OK, look at your PDF. It shows bins 14 and up are all zeros. The bins are counts, correct? Like a histogram, right? What number in your data did you expect to be logged into bin #14?
anton fernando on 15 Sep 2014
Edited: anton fernando on 15 Sep 2014
The PDF gives you the data of the E4(volume, bin number, data) array. I am not supposed to get zeros in the data, because I have given the condition in my code that if a value is zero it should be discarded.
You can see in my code that I have divided the number of days in the year (365) by 24 and assigned a number to each 15.25 days. So if the date is 2004/feb/3, the bin number is 3. Then the data should go to E4(volume,3,data). Just like that, I checked the date of each data point and assigned it to the huge matrices E4, E5, etc. E4 holds the data for 2004. It is a 3-dimensional matrix, E4(volume,bin number,data). I am not an expert in MATLAB and I appreciate your patience. So if something I explain is not clear, let me know. I will do my best. Thank you for sparing time for this.
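One likely source of the zeros, offered here as an assumption since the thread does not confirm it: when a value is assigned beyond the current size of an array, MATLAB grows the array and fills the new, unassigned elements with 0. Because each (volume, bin) pair in E4 receives a different number of values, the shorter ones end up zero-padded along the third dimension. A tiny sketch with invented numbers:

```matlab
E = [];                       % grows on assignment, like E4 in the question
w = ones(1, 2);               % next free slot for each of two bins

% Bin 1 receives three values, bin 2 only one.
binOf = [1 1 2 1];
vals  = [7 8 9 6];
for k = 1:numel(vals)
    b = binOf(k);
    E(1, b, w(b)) = vals(k);  % growing E pads the other bins with zeros
    w(b) = w(b) + 1;
end

% Bin 2 holds one real value followed by padding zeros.
bin2 = squeeze(E(1, 2, :)).';
disp(bin2)

% w - 1 is the number of real values stored in each bin.
nValid = w - 1;
```

Under this assumption, the counters tell real data apart from padding: only E(i, bin, 1:w(i,bin)-1) holds measurements. Preallocating with NaN instead of letting the array grow would make the empty slots explicit.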

Asked: on 14 Sep 2014
Edited: on 24 Sep 2014
