When the histcounts function is used, data that was not in the original data appears

My data and code are as follows:
load('data.mat')
[ecg_peak_counts, ecgEdges, ecgbinidx] = histcounts(ecg_filterpeak_index, 0:windowsN*Fs:limit*windowsN*Fs);
disp(ecg_peak_counts);
disp("-------------------------ecg_filterpeak_index")
disp(ecg_filterpeak_index)
disp("-------------------------ecgbinidx")
disp(ecgbinidx)
disp("------------------edges")
disp(ecgEdges)
disp(ecgbinidx==1)
disp(ecg_filterpeak_index(10))
ecg_filterpeak_index(ecgbinidx==1)
My data ecg_filterpeak_index is incremented.
The first value of ecg_peak_counts (that is, in the bin from 0 to 2000) is 10, but there are really only nine, and indeed ecgbinidx only has the first nine values of 1, but when I run the line ecg_filterpeak_index(ecgbinidx==1), In addition to the output of the first nine data, there is an additional 0.0008, which is not in ecg_filterpeak_index

Accepted Answer

Stephen23
Stephen23 on 12 Nov 2024 at 8:14
Edited: Stephen23 on 12 Nov 2024 at 8:24
"My data ecg_filterpeak_index is incremented."
Nope, your data are not monotonically increasing.
"The first value of ecg_peak_counts (that is, in the bin from 0 to 2000) is 10..."
Because the vector ecg_filterpeak_index contains ten values between 0 and 2000.
"... but there are really only nine..."
Nope, you are wrong. There are ten. Let's check.
format short G
S = load('data.mat') % raw data
S = struct with fields:
                      Fs: 200
    ecg_filterpeak_index: [187 395 589 794 1006 1217 1414 1611 1831 2058 2270 2471 2681 2888 3040 3281 3497 3712 3909 4100 4305 4521 4735 4943 5169 5395 5604 ... ] (1x409 double)
                   limit: 42
                windowsN: 10
B = 0:S.windowsN*S.Fs:S.limit*S.windowsN*S.Fs % bin edges
B = 1×43
0 2000 4000 6000 8000 10000 12000 14000 16000 18000 20000 22000 24000 26000 28000 30000 32000 34000 36000 38000 40000 42000 44000 46000 48000 50000 52000 54000 56000 58000
nnz(S.ecg_filterpeak_index>=B(1) & S.ecg_filterpeak_index<B(2)) % there are 10
ans =
10
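The manual count agrees with HISTCOUNTS because binning depends only on each value, not on where it sits in the vector. A minimal sketch on made-up numbers (a synthetic stand-in, not the contents of data.mat) that compares the two and also shows the bin-edge rules: each bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```matlab
% Synthetic stand-in for the peak indices (NOT the poster's data.mat).
x = [187 395 1831 2058 0.5 2000 4000 4500];
edges = 0:2000:4000;          % two bins: [0,2000) and [2000,4000]

[counts, ~, bin] = histcounts(x, edges);

% Manual count for the first bin, mirroring the nnz() check above.
manualFirst = nnz(x >= edges(1) & x < edges(2));

disp(counts)        % 4 3
disp(bin)           % 1 1 1 2 1 2 2 0  (0 means outside all bins)
disp(manualFirst)   % 4
```

Note that 0.5 is counted in the first bin even though it is the fifth element, and 4500 gets bin index 0 because it lies beyond the last edge.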
"and indeed ecgbinidx only has the first nine values of 1"
It actually has all ten. This is very easy to check. The consecutive nine at the start might be the only ones that you expect, but your data actually contain ten such values:
[ecg_peak_counts, ecgEdges, ecgbinidx] = histcounts(S.ecg_filterpeak_index, B);
X = find(ecgbinidx==1)
X = 1×10
1 2 3 4 5 6 7 8 9 71
"...In addition to the output of the first nine data, there is an additional 0.0008, which is not in ecg_filterpeak_index"
Actually it is. Let's check it:
S.ecg_filterpeak_index(X(end))
ans =
0.77594
As far as I can recall, 0 < 0.77594 < 2000. We can check this without HISTCOUNTS:
S.ecg_filterpeak_index(70:72)
ans = 1×3
1.0e+00 * 14540 0.77594 14940
plot(S.ecg_filterpeak_index)
Look at that plot. Do you notice the problem with your data?
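Out-of-order values like this one can also be located without a plot. A minimal sketch, again on a short synthetic vector rather than the real data, assuming the sequence is meant to be strictly increasing:

```matlab
% Synthetic stand-in: an increasing sequence with one corrupted element.
x = [187 395 589 14540 0.77594 14940 15130];

% False if any element fails to exceed its predecessor.
isIncreasing = issorted(x, 'strictascend');

% diff(x) <= 0 marks each step that does not increase; adding 1 gives
% the position of the element that is not larger than its predecessor.
badIdx = find(diff(x) <= 0) + 1;

disp(isIncreasing)   % 0
disp(badIdx)         % 5
disp(x(badIdx))      % 0.77594
```

Run on the real vector, the same two lines would list every position where the peak indices stop increasing, which is where the upstream peak-detection or filtering step is worth inspecting.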

More Answers (1)

Shashi Kiran
Shashi Kiran on 12 Nov 2024 at 8:12
After reproducing your issue and investigating further, I found that the 71st value in your data, 0.775938595158718, falls into the first bin (0 to 2000). Because ecgbinidx == 1 therefore includes this position, ecg_filterpeak_index(ecgbinidx==1) returns it along with the first nine values, and MATLAB displays it as 0.0008 under the common scale factor 1.0e+03 (that is, 1.0e+03 * 0.0008, roughly 0.8).
You can verify this with the following code:
load('data.mat')
[ecg_peak_counts, ecgEdges,ecgbinidx] = histcounts(ecg_filterpeak_index, 0:windowsN*Fs:limit*windowsN*Fs);
disp(find(ecgbinidx==1))
1 2 3 4 5 6 7 8 9 71
This will show the exact indices where ecgbinidx equals 1, helping confirm which values fall into the first bin.
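The 0.0008 in the original output comes from how the Command Window displays the vector, not from the stored value. A minimal sketch of ways to view the values without a shared scale factor, using a synthetic stand-in for the ten first-bin values (the first nine are copied from the struct display above, the last is the value quoted in this answer):

```matlab
% Synthetic stand-in for the ten values that land in the first bin.
v = [187 395 589 794 1006 1217 1414 1611 1831 0.775938595158718];

format short      % default format: may show one common scale factor
disp(v)

format short G    % picks fixed or exponential notation per element
disp(v)

% Formatting a single value explicitly avoids display scaling entirely.
lastValue = sprintf('%.6g', v(end));
disp(lastValue)   % 0.775939

format short      % restore the default display format
```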
Hope this helps.

Release

R2023b
