Finding area under peaks for drifting signal

EDITED, see bottom, (regarding numerical integration)
Dear all,
How can I find the area under the "larger" peaks, as seen below?
My idea is to:
1) locate where the peaks are (I have tried using the findpeaks() function)
2) add or subtract the minimum value of the particular peak to get a baseline for each peak at y = 0.
3) use some sort of trapezoidal integration (I saw that MATLAB has a built-in function that does this).
When using the findpeaks() function, I'm having a really hard time locating "interesting" peaks. For instance, the first "big" peak has some noise in it and then findpeaks() give me a lot of peaks on it - and then there are other small peaks I'm not interested in.
I hope someone can help!
(If you need it, my data is found in the attached CSV file - I'm only using the first two columns, "Time" and "Channel1V". I have 100 CSV files like the uploaded one, so I really need a smart code ;) )
EDIT:
Q: How to find area under peak using trapz()?
So, using the arguments 'MinPeakDistance' and 'MinPeakProminence' I have been able to figure out which peaks to use. I still need to find the area under the graphs; so far, I've been using the very rough estimate Prominence times Width (where width is the width of the peak taken at the half of the prominence). It should for most purposes be alright, but I'd really appreaciate if anyone has got a good idea of how to numerically integrate this.

11 Comments

- Woops, in case anyone is using my attached files: the CSV-file that the .m file is loading is not the one I put up. It shouldn't matter though.
Computers have a hard time understanding criteria like "interesting". For some silly reason, they just get confused.
That is why people were invented, to convert soft words like "interesting" into mathematics, and really into code to serve our computer overlords. ;-)
Seriously, you need to quantify what is an interesting peak. Otherwise ANY peak is interesting to a computer. Hey, its a peak. See it there? Wow. What a nice peak!
So, you need to quantify an interesting peak. What makes it big? What makes it small? Until then, you cannot write code to find what you cannot describe clearly and robustly in terms of mathematics.
Once you DO know what makes a peak an interesting peak, AND you have a set of rules that robustly define it as such, then finding the area is just a call to trapz.
This was not very constructive. I am somewhat comfortable in mathematics - it is the coding itself that causes me trouble. This is exactly why I put the words big, large, and interesting under quotation marks. I should be able to define "interesting" and "big" from a mathematical persepctive, I just don't know how to handle this in MATLAb.
Sorry, but your question was not very constructive. You cannot write code until you define what operation will be coded.
Not yet have we seen YOU doing the important part, that is defining what "interesting" means to you. Reading your mind is not easy. You say only that you SHOULD be able to do it, but we have not seen you do that. If you cannot define it, since you did not bother to do so, then how are we able to know what "interesting" means? How big is big? What is it relative to? What if there are two peaks that are directly adjacent, but separated by a small valley? Are they just one peak?
I SHOULD be able to solve the problem of finding peace in the Middle East, but am just too busy today. Can you do it for me? It should be easy.
Perhaps you should start here:
https://www.mathworks.com/matlabcentral/answers/6200-tutorial-how-to-ask-a-question-on-answers-and-get-a-fast-answer
My apologies. I thought I was clear and that you were being polemic, but from your questions in the recent post I can see you are willing to help. Sorry. I haven't analyzed these types of things before but I think I can now make it more clear:
Intuitively, everything that looks like a peak is what I want to somehow define as a peak and then find the area under it.
To be precise, I have used the arguments "MinPeakDistance" and "MinPeakProminence". The first argument is to ensure I don't have multiple "peaks" on what I consider to be a peak, e.g. the first big peak needs to be considered as one peak. The second argument has been used so that I only get the peaks of a certain size.
The actual problem arises because I would like to "stretch" or "smoothen" the data so that 1) I do not have negative values (for numerical integration), and 2) so that the "beginning" of the peaks start at zero - and I simply do not know how to that in a smart way, maybe through some sort of filtering?
The second problem is more of a coding nature: How to keep track of when a peak starts and ends. If I could figure that out, I could use the trapz() function for each of my peaks.
Thank you for taking your time and sorry for the misunderstanding.
If you have ONE peak, so perhaps you have simply looped over the list of peaks, then just call traps on the vector from the start to end of that defined peak. You should be able to decide on the index that defines where the peak starts and where it ends.
If you have multiple peaks, then you could create a list of start points for each peak and the end points. Now, just call cumtrapz on the ENTIRE vector. Then as one operation, index into that cumulative vector with the start and end points. Subtract the results of those two vectors, the result being the area for each peak.
If you have no clue as to how to create a vector of start points, or how to work with vectors in general, or how to use indexing, then you need to read the getting started tutorials for MATLAB. (Sorry, but it is not obvious what your problems are in all this. My gut tells me that you are very new to using MATLAB from the questions asked.)
As far as smoothing a sequence of values, you could start with the function smooth. However, ANY kind of smoothing will destroy those high narrow peaks. That is the nature of smoothing algorithms, dampening high frequency components in the signal. So any smooth will cost you exactly the signals that you are trying to measure. That argues you really do NOT want to do any smoothing.
Finally, to start the sequence at the beginning of the peaks, just choose some criteria that identifies that location. Again, that criteria can only come from your mind. Then delete the part of the signal before that point.
I think your problem here is that you are trying to solve a problem too large for your current computing skills (in MATLAB) to solve. You are faced with a programming elephant. When that happens, eat a programming elephant one byte at a time. Break the problem down into small problems, each of which is solvable by you. Solve each one. Then put it all together.
Don't worry about elegance or about efficiency. Screw elegance, in favor of getting the job done. Loops are fine. When you have something that works, if it is too slow, only then do you worry about efficiency.
Here, if findpeaks does what you want, then use it. Work on each peak in turn, using a loop. Step back from the peak until you find a point that fits your criterion for the end of the peak. Do the same moving forward. That gives you two points. Call trapz. WTP? So you need to define some algorithm to look in the vicinity of a peak, forward and back. Again, a simple loop could suffice, depending on whatever criterion you define.
The findpeaks() function has a large number of Name-Value Pair Arguments (link), such as 'MinPeakDistance' that can help you avoid multiple close peaks, and others that can help you define ‘interesting’ in a way that is computationally meaningful.
@John D'Errico, @Star Strider: Thanks for your replies, especially John D'Errico for taking your time.
I am indeed not an expert in computing. The dataset described is only one out of many obtained by my girlfriend. She is doing a research thing on something to do with molecules in ants' brains. Due to a software issue, she has transfered her data to excel and is now doing it all manually - looking at the peaks she knows is of interest, then scrolling down the data and applying trapezoidal integration - very tidyous since she has 100 files with 16000 data points in each.
@John D'Errico:
Thank you for your time again - I have finally been able to locate the peaks of interest - done! As you say, I need to find a criteria to determine the "beginning" of each peak. This is very, very difficult I find - the baseline for the peaks is not constant and some peaks "start" at one vertical/y - value but ends at another.
I only just learned about the islocalmin() function and was hoping I could use this to find the relevant intervals - however, due to "noise", it provided me with too many minima. HOWEVER:
Maybe I could somehow, by keeping track of indices, compare the indices of the islocalmin() and the findpeaks() functions' indices. My idea is to "catch" an index from findpeaks() in between two indices of islocalmin(). This is what is currently causing me trouble.
Hopefully you have an idea, maybe not, maybe this is tiresome - in any case THANK YOU for taking your time :)
One option would be to design a bandpass filter (first using the fft (link) function to determine the ‘useful’ frequencies, and Signal Processing Toolbox functions to design and implement the filter) to filter out low-frequency baseline variations and high-frequency noise), then use the filtered signal to determine the beginning and ending x-coordinates of the peaks. Use that information with either the original or filtered signal to integrate the peaks.
@Star Strider:
Thank you for taking your time. I have no experience in fft, signal processing and filtering, but I will use a few hours tomorrow to look into it and see if I can make anything out of it. Once again, thank you for your time, and if I get smarter reading up on fft, signal processing and the like and find a method, I will let you know with much appreciation!
My pleasure.
I can help with the signal processing. You posted one of your signals, so I will download it, and experiment with it.

Sign in to comment.

 Accepted Answer

I would use the Signal Processing Toolbox medfilt1() function to estimnate the base line drift, that you can then remove. (None of my other approaches, such as frequency-selective filtering or the Savitzky-Golay filter, gave satisfactory results.)
Try this to remove the baseline drift:
[D,S,R] = xlsread('420B1.1.csv');
T = D(:,1);
Ch1 = D(:,2);
Ch2 = D(:,3);
Ts = mean(diff(T));
L = length(T);
MF = medfilt1(Ch1, 750); % Median Filter
Ch1F = Ch1-MF; % Correct Baseline
figure
plot(T,Ch1F)
grid
I leave the peak detection to you. Here is one possibility:
dCh1F = gradient(Ch1F, Ts); % Derivative
[pkp,locp] = findpeaks(dCh1F, 'MinPeakHeight',0.04, 'MinPeakDistance',500); % Positive Derivative Peaks, ‘locp’ Are Indices
[pkn,locn] = findpeaks(-dCh1F, 'MinPeakHeight',0.03, 'MinPeakDistance',500); % Negative Derivative Peaks, ‘locn’ Are Indices
figure
plot(T,Ch1F)
hold on
plot(T([locp; locn]),Ch1F([locp; locn]), '+r')
hold off
grid
You will have to experiment to get the result you want. This may not be robust to all your data, so you may have to adjust it for each file.

2 Comments

Amazing! I really appreciate you (everyone here!) taking the time. Sorting out the baseline looks very promising! I shall do some experiments and play around with it, but determining peaks and start/end intervals for them now looks like a walk in the park!
Thanks!
As always, my pleasure!
(And for the rest of us, as well!)

Sign in to comment.

More Answers (1)

1. instead of using the function maxpeak use
[~,I] = max(f);
to obtain the index of the max peak
2. Nice! Now you know where the max peak is. The next step is to set up your boundaries, so you know where to start and stop your integration. There are several algorithms you can implement. the easiest would be simulated annealing. The algorithm goes as follows.
% Note: E is your data vector
% for the alg I ~= 1
if I == 1 % check condition
I = 2;
end
%
%
% Preform Alg
for t = 0:length(E)-1
% check if next point is
idx = I+t;
Delta_E = E(I+t) - E(I-1+t);
if E > 0, continue, end
%
% Note T: is your cooling function which
% Converges to 0 as t increases. now this
% is basic but T could be as simple as
% T = 1/t or T = 1/(t^2)
% now for tuning part to be complete you
% should say:
% if T == 0, break, end
% however only use it if you develop T
% properly.
%
T = 1/(t^2); % Temperature Function
if exp(Delta_E/T) < Thresh, break, end
end
Now you have found the Right-hand side boundary. For now, you can
3. To be complete, idx_2 = idx;
4. use the function E_New = flip(E);
5. solve for I_New = I - length(E) + 1;
6. then run alg two on E_new and I_New
7. To be complete, idx_1 = idx;
8. get index vector II you wish to integrate over
II = idx_1:idx_2
9. Then solve for the area using II
Area = trapz(E(II));
10. Grab a beer if of legal age and does not effect believes (Optional)
11. Finish beer
12. Explore other integration methods.

2 Comments

I was in a loop between point 10 and 11 when reading your answer ;-) I shall look into your answer soon and see if I can get it to work - thank you for taking your time!
Thank you once again - I looked at it and it really inspired me. Star Strider provided an answer involving a filter which should make the interval-detection much easier, so I have opted to use that.

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!