How to use cumsum for data in a table which is not NaN?

Hi. I have a question about how to use cumsum function for data in a table which is not NaN.
'02-Nov-2017 10:50:49' NaN
'02-Nov-2017 10:50:53' NaN
'03-Nov-2017 00:00:00' NaN
'03-Nov-2017 08:00:00' NaN
'04-Nov-2017 00:00:00' NaN
'04-Nov-2017 08:00:00' NaN
'05-Nov-2017 00:00:00' NaN
'05-Nov-2017 08:00:00' NaN
'05-Nov-2017 14:00:41' 0.100000000000000
'05-Nov-2017 14:04:08' 0.100000000000000
'05-Nov-2017 14:06:40' 0.100000000000000
'05-Nov-2017 14:10:00' NaN
'05-Nov-2017 14:12:58' NaN
'05-Nov-2017 14:13:24' NaN
'05-Nov-2017 14:14:00' NaN
'05-Nov-2017 14:15:58' 0.100000000000000
'05-Nov-2017 14:16:24' 0.100000000000000
For example I have this above data. I want the cumsum function only calculate the value which is not NaN. The result should be like this:
'02-Nov-2017 10:50:49' NaN
'02-Nov-2017 10:50:53' NaN
'03-Nov-2017 00:00:00' NaN
'03-Nov-2017 08:00:00' NaN
'04-Nov-2017 00:00:00' NaN
'04-Nov-2017 08:00:00' NaN
'05-Nov-2017 00:00:00' NaN
'05-Nov-2017 08:00:00' NaN
'05-Nov-2017 14:00:41' 0.100000000000000
'05-Nov-2017 14:04:08' 0.200000000000000
'05-Nov-2017 14:06:40' 0.300000000000000
'05-Nov-2017 14:10:00' NaN
'05-Nov-2017 14:12:58' NaN
'05-Nov-2017 14:13:24' NaN
'05-Nov-2017 14:14:00' NaN
'05-Nov-2017 14:15:58' 0.100000000000000
'05-Nov-2017 14:16:24' 0.200000000000000
I try to find the example for cumsum, but I found nothing. Could you please hep me? Thank you very much for your help.

 Accepted Answer

Jan
Jan on 2 Dec 2017
Edited: Jan on 3 Dec 2017
idx = ~isnan(Val);
Val2 = Val;
Val2(idx) = cumsum(Val(idx));
Or:
Val2 = cumsum(Val, 'omitnan');
Val2(isnan(Val)) = NaN;
[EDITED] And with a reset of the sum after each NaN block:
Data = [NaN, NaN, 0.5, 0.1, 0.4, NaN, 0.1, 0.2];
X = Data;
idx = isnan(X);
new = strfind(idx(:).', [true, false]); % (:).' because STRFIND needs a row
Y = cumsum(X, 'omitnan');
X(new) = [0; -diff(Y(new))]; % [EDITED 2] ROUNDING PROBLEMS!!!
R = cumsum(X, 'omitnan');
R(idx) = NaN;
[EDITED 2] I've fixed the code, but it suffers from rounding problems. After a certain number of elements you get e.g. 0.0999999999997975 instead of 0.1 . So let's try a simply loop:
function R = cumsumResetNaN(R)
c = 0;
for k = 1:numel(R)
if isnan(R(k))
c = 0;
else
c = c + R(k);
R(k) = c;
end
end
end
This needs 0.00068 sec for input data with 64700 elements, which is faster than the vectorized method above (which has the severe rounding problem).
[EDITED 3] And for completeness a C-Mex function:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
// Jan Simon, 2017, License: CC BY-SA 3.0
// Create cumulative sum of a double vector, which is reset to 0
// at every NaN.
double *x, *xf, c;
if (!mxIsDouble(prhs[0]) || mxIsSparse(prhs[0]) || mxIsComplex(prhs[0]) ||
(mxGetM(prhs[0]) != 1 && mxGetM(prhs[0]) != 1)) {
mexErrMsgIdAndTxt("JSimon:cumsumRestartNaN:BadInput",
"Input must be a real full double vector.");
}
plhs[0] = mxDuplicateArray(prhs[0]);
x = mxGetPr(plhs[0]);
xf = x + mxGetNumberOfElements(plhs[0]);
c = 0;
while (x < xf) {
if (*x == *x) { // NaN==NaN is FALSE by defintion
c += *x;
*x++ = c;
} else {
c = 0;
x++;
}
}
}
This needs 0.26 sec compared to 0.91 sec for the M-version [EDITED 2] (input data: [64700] elements, 1000 iterations, MSCV2012, Win7/64, Matlab 2016b)

6 Comments

Thank you for your reply, Jan Simon. The code above works well at the beginning but when there is a NaN after the value and the value again, the cumsum will continue. What I mean in the question is for example:
'02-Nov-2017 10:50:49' NaN
'02-Nov-2017 10:50:53' NaN
'03-Nov-2017 00:00:00' NaN
'03-Nov-2017 08:00:00' NaN
'04-Nov-2017 00:00:00' NaN
'04-Nov-2017 08:00:00' NaN
'05-Nov-2017 00:00:00' NaN
'05-Nov-2017 08:00:00' NaN
'05-Nov-2017 14:00:41' 0.100000000000000
'05-Nov-2017 14:04:08' 0.100000000000000
'05-Nov-2017 14:06:40' 0.100000000000000
'05-Nov-2017 14:10:00' NaN
'05-Nov-2017 14:12:58' NaN
'05-Nov-2017 14:13:24' NaN
'05-Nov-2017 14:14:00' NaN
'05-Nov-2017 14:15:58' 0.100000000000000
'05-Nov-2017 14:16:24' 0.100000000000000
with the cumsum it should be:
'02-Nov-2017 10:50:49' NaN
'02-Nov-2017 10:50:53' NaN
'03-Nov-2017 00:00:00' NaN
'03-Nov-2017 08:00:00' NaN
'04-Nov-2017 00:00:00' NaN
'04-Nov-2017 08:00:00' NaN
'05-Nov-2017 00:00:00' NaN
'05-Nov-2017 08:00:00' NaN
'05-Nov-2017 14:00:41' 0.100000000000000
'05-Nov-2017 14:04:08' 0.200000000000000
'05-Nov-2017 14:06:40' 0.300000000000000
'05-Nov-2017 14:10:00' NaN
'05-Nov-2017 14:12:58' NaN
'05-Nov-2017 14:13:24' NaN
'05-Nov-2017 14:14:00' NaN
'05-Nov-2017 14:15:58' 0.100000000000000
'05-Nov-2017 14:16:24' 0.200000000000000
However with the code above, the NaN indeed remain in its place but the value after NaN increases. It should reset after NaN, so the value between NaN is considered as one event, which is not related to others (it has their own start value). Do you know how to do cumsum function that way? Thank you in advance
Jan
Jan on 2 Dec 2017
Edited: Jan on 2 Dec 2017
See [EDITED] in my answer to consider the resetting.
There are some implementations of cumsum with a restart in the FileExchange: https://www.mathworks.com/matlabcentral/fileexchange/?utf8=%E2%9C%93&term=cumsum . Does one of them matches your needs?
By the way: It would be useful, if you provide the inputs such, that they can be used by copy&paste. The date part of your data is clutter only for the actual question, isn't it?
Thanks for your reply Jan Simon. I try your EDITED example code above and it works well. But when I try in my code. I got an error, especially in strfind function "Error using strfind Input strings must have one row."
I try to convert the column to a row, and run your EDITED code. It works well at the beginning but when I scroll up to the last column, it does not work as expected. I attach the original data in this comment.
I tried one of the function from the link you gave, nancumsum. But when I tried it I got an error. I am sorry I am really new in matlab. If we download a matlab file in file exchange, we should run it first and then try the function in our matlab code file right? Thank you very much for your help.
Jan
Jan on 3 Dec 2017
Edited: Jan on 3 Dec 2017
You are right: strfind needs a row vector as input. With the input data I have created, it works fine, but your data might be a column vector. Therefore it is better, if you provide the input data. In addition posting the code you use and the complete error messages instead of a rough rephrasing is very helpful for a discussion in the forum. "It does not work as expected" is not as useful as describing what you got and what want. You have all required information on your monitor already, while I have to download you MAT file, move it to my Matlab data folder, load it, extract the needed column and run my code again with the assumed changes you have made. Please share the information you have already, because this makes the answering much easier.
Some FileExchange submissions contain C-Mex functions, which must be compiled at first. They can be very efficient and this matters, if you are working with billions of numbers. If a submission contains M-files only, you can use it directly.
See [EDITED2].
Oh okay sorry, Here is the code I try from your EDITED code:
X = rainfall.CH01;
X = X.';
idx = isnan(X);
new = strfind(idx, [true, false]);
Y = cumsum(X, 'omitnan');
X(new) = -Y(new);
R = cumsum(X, 'omitnan');
R(idx) = NaN;
When I run it, I saw many negative values up to the last column in "R variable as the result" (picture attached). In my original data, there is no negative values, That's why I wonder why.
EDITED2 works perfectly! Thank you very much, Jan Simon. You're a life saver!

Sign in to comment.

More Answers (2)

How about keeping position of NaN, applying cumsum with 'omitnan' option and finally putting NaN for the position, like:
Val = [NaN NaN 0.1 0.1 0.1 NaN]';
idx = isnan(Val);
Val2 = cumsum(Val,'omitnan');
Val2(idx) = NaN;

3 Comments

Hi Akira. Thanks for your answer. It will be a right answer if there is only one group of values between NaN. However what I want in this case is it should also do if there are also some values after NaN. For example :
'02-Nov-2017 10:50:49' NaN
'02-Nov-2017 10:50:53' NaN
'03-Nov-2017 00:00:00' NaN
'03-Nov-2017 08:00:00' NaN
'04-Nov-2017 00:00:00' NaN
'04-Nov-2017 08:00:00' NaN
'05-Nov-2017 00:00:00' NaN
'05-Nov-2017 08:00:00' NaN
'05-Nov-2017 14:00:41' 0.100000000000000
'05-Nov-2017 14:04:08' 0.200000000000000
'05-Nov-2017 14:06:40' 0.300000000000000
'05-Nov-2017 14:10:00' NaN
'05-Nov-2017 14:12:58' NaN
'05-Nov-2017 14:13:24' NaN
'05-Nov-2017 14:14:00' NaN
'05-Nov-2017 14:15:58' 0.100000000000000
'05-Nov-2017 14:16:24' 0.200000000000000
I am sorry my example in the question above is not really complete.
If you have Image Processing Toolbox, you can simply do that by using bwlabel function. Here is my second try!
Val = [NaN NaN 0.1 0.1 0.1 NaN NaN 0.1 0.1 0.1 NaN]';
idx = isnan(Val);
group = bwlabel(~idx);
tmp = splitapply(@(x) {cumsum(x)},Val(~idx), group(~idx));
Val2 = nan(size(Val));
for kk = 1:max(group)
Val2(group == kk) = tmp{kk};
end

Sign in to comment.

How to find the sum of every single number in a table? I searched every board I can find and all they had is the sum of rows or columns.

1 Comment

Please do not attach a new question in the section for answers of another question. Such thread-hijacking produces confusions, because it is not clear, to which question an answer belongs. Create a new question in your own thread and delete this "answer". Thanks.
By the way: "The sum of each single number" is not clear and should be elaborated.

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!