Group data into bins with irregular sizes / Track peaks
Hello,
I need to group data (peaks for many signals, below) into bins with irregular sizes:
[pks,locs,w,p] = findpeaks(data)
I know discretize(), but my bins are not equal-sized. The goal is to find peaks in a series of measurements and track shifts of the peak positions, so the peak positions +/- a tolerance define my bins. That is: run findpeaks() on the first dataset, use the peak positions +/- tolerance as bins, run findpeaks() on the next dataset, and sort its peaks into those bins. Then repeat the whole process, with the latest dataset giving the next bins. I am dealing with lots of peaks (~100 per measurement and 60,000 altogether).
So what I want in the end:
Peak 1
Pos 7.2 Height 5 Width 3
Pos 7.3 Height 5.3 Width 2.8
... ... ...
Peak 2
... ... ...
Coming from the C# world, where this would be solved with lists and loops, I appreciate every hint, including keywords for further research (and which data structures to use).
Thanks!
2 Comments
Image Analyst
on 6 Jul 2017
Don't make us imagine what your data looks like -- show us. Please plot it and upload a screenshot, and attach the data file with code to read it in.
Andreas Nagl
on 11 Jul 2017
Edited: Andreas Nagl
on 11 Jul 2017
Accepted Answer
More Answers (3)
Star Strider
on 5 Jul 2017
Edited: Star Strider
on 5 Jul 2017
Since you have a fixed number of peaks in your data (I assume they are always in the same order), I would store the peaks and other data in a cell array (or matrix) in a loop, for example:
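N = numel(data);                     % number of data vectors (data is a cell array)
[pks,locs,w,p] = deal(cell(1,N));    % preallocate one cell per data vector
for k1 = 1:N
    [pks{k1},locs{k1},w{k1},p{k1}] = findpeaks(data{k1});
end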
where ‘N’ is the number of data vectors you have.
If you want to use discretize (or possibly histcounts), note that the bins do not have to be equal sizes. You can specify the edges of the bins in a vector in both functions, so the bins can be different widths.
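To illustrate the point about unequal bins, here is a minimal sketch. The positions in `x` and the values in `edges` are made up for the example; only `discretize` and `histcounts` come from the answer.

```matlab
% Hypothetical peak positions and hand-picked, unequal bin edges
x     = [0.5 2 8 9.9];
edges = [0 1 3 7 10];            % bin widths: 1, 2, 4 and 3

binIdx = discretize(x, edges);   % bin number for each element of x
counts = histcounts(x, edges);   % number of elements in each bin
```

Values that fall outside all edges get NaN from discretize, which may be useful for spotting peaks that match no existing bin.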
EDIT —
The loop I posted will give you all the information you said you want in your edit.
For information on how to use cell arrays such as the ones in my code example, see the documentation on Cell Arrays.
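One possible way to turn per-dataset peak positions into per-peak tracks, along the lines described in the question, is nearest-neighbour matching within a tolerance. This is only a sketch under assumptions: the positions in `locs`, the tolerance `tol`, and the names `ref` and `track` are hypothetical, and it assumes every peak of interest already appears in the first dataset.

```matlab
% Hypothetical peak positions for three consecutive measurements
locs = {[7.2 12.0 20.5], [7.3 12.1 20.4], [7.5 20.6]};
tol  = 0.3;                        % assumed matching tolerance

ref   = locs{1};                   % current bin centres
nPk   = numel(ref);
track = nan(numel(locs), nPk);     % rows: measurements, columns: peaks
track(1,:) = ref;

for k = 2:numel(locs)
    for j = 1:nPk
        [d, idx] = min(abs(locs{k} - ref(j)));   % nearest peak in dataset k
        if d <= tol
            track(k,j) = locs{k}(idx);           % record the matched position
            ref(j)     = locs{k}(idx);           % the bin follows the peak
        end
    end
end
```

A peak with no match inside the tolerance stays NaN for that measurement. The sketch does not handle peaks that first appear later, and it does not stop two bins from claiming the same peak, so it would need extending for real data.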
8 Comments
Andreas Nagl
on 5 Jul 2017
Star Strider
on 5 Jul 2017
My pleasure.
I have done something similar, so I suggested the approach I used in my own application. I am not certain that a table will do what you want, since you have several values of every parameter for each peak and would therefore likely need a separate table to track each peak.
My approach would be to plot the peak positions as a function of the number of the data set (or some derivative value describing the data set). The plots for each peak (or all peaks on the same plot) will likely help you understand your data so you can decide how best to analyze it.
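A minimal sketch of such a plot, assuming the peak positions have already been collected into a matrix with one row per data set and one column per peak. The matrix `track` and its values are invented for illustration.

```matlab
% Hypothetical peak positions: rows are data sets, columns are peaks
track = [7.2 12.0 20.5;
         7.3 12.1 20.4;
         7.5 NaN  20.6];           % NaN marks a peak that was not found

setNo = (1:size(track,1)).';       % data set number
h = plot(setNo, track, 'o-');      % one line per peak (column)
xlabel('Data set number');
ylabel('Peak position');
names = arrayfun(@(j) sprintf('Peak %d', j), 1:size(track,2), ...
                 'UniformOutput', false);
legend(names);
```

plot skips NaN entries, so gaps in a track show up as breaks in the corresponding line.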
Andreas Nagl
on 6 Jul 2017
Star Strider
on 7 Jul 2017
Without your data and some knowledge of what data you are analyzing with findpeaks, I cannot determine anything about the peaks.
I assume that the first peak is always the first peak and represents the same information, although its position and amplitude may change, and the same for all the other peaks. Note that findpeaks has a number of name-value pair arguments, such as 'MinPeakHeight' and others, that can help you determine what constitutes a ‘true peak’ (as opposed to noise), and so may help you identify and track them.
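As an illustration of those name-value pairs, here is a sketch on a synthetic signal. The signal and all threshold values are assumptions chosen for the example and would have to be tuned to the real data.

```matlab
% Synthetic signal: two Gaussian peaks plus a little noise (illustrative only)
rng(0);
x = linspace(0, 10, 1000);
y = 5*exp(-(x-3).^2/0.1) + 3*exp(-(x-7).^2/0.2) + 0.05*randn(size(x));

[pks, locs, w, p] = findpeaks(y, x, ...
    'MinPeakHeight',     1,   ...   % ignore anything lower than 1
    'MinPeakProminence', 0.5, ...   % peak must stand out from its surroundings
    'MinPeakDistance',   0.5);      % peaks at least 0.5 x-units apart
```

With these settings the noise ripples should be rejected and only the two broad peaks near x = 3 and x = 7 kept.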
I have no idea what you are doing, what your signal components are, or what they represent. Without that information, it is very difficult for me to help you.
Andreas Nagl
on 11 Jul 2017
Star Strider
on 11 Jul 2017
I have no idea what you are measuring. It looks to me as though you have a signal (somewhere) corrupted by noise. I would do a Fourier transform of your data (use the fft function) to see if there is underlying periodicity in your data, and if at least some of your noise is band-limited. If there is, you can design a filter to recover it, and depending on the frequency characteristics of the noise, eliminate most of the noise.
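A minimal sketch of that check, using a synthetic signal because the real data are not shown. The sampling frequency `Fs`, the 50 Hz component, and the noise level are all assumptions made for the example.

```matlab
Fs = 1000;                          % assumed sampling frequency (Hz)
t  = (0:Fs-1)/Fs;                   % one second of samples
rng(0);
y  = sin(2*pi*50*t) + 0.5*randn(size(t));   % 50 Hz tone buried in noise

L  = numel(y);
Y  = fft(y);
P1 = abs(Y(1:L/2+1))/L;             % one-sided amplitude spectrum
P1(2:end-1) = 2*P1(2:end-1);
f  = Fs*(0:L/2)/L;                  % frequency axis (Hz)

[~, iMax] = max(P1);
fDominant = f(iMax);                % frequency with the largest amplitude
```

A clear, narrow maximum in `P1` suggests a periodic component; a flat spectrum suggests broadband noise that a frequency-selective filter cannot remove.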
It is common to have noise in measured data, usually requiring at least some pre-processing to recover the underlying signal.
Andreas Nagl
on 11 Jul 2017
Star Strider
on 11 Jul 2017
I have no idea what your peaks represent. I have no experience with crystallography.
If the peaks simply shift in amplitude but are always at the same positions with respect to your independent variable, then you can concatenate them in a matrix. That would allow you to track their amplitudes between experiments, and plot them.
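A minimal sketch of that concatenation, assuming each experiment yields the same number of peaks in the same order. The heights in `pks` are made up.

```matlab
% Hypothetical peak heights, three peaks per experiment at fixed positions
pks = {[5.0 2.1 3.3], [5.3 2.0 3.1], [5.1 2.2 3.6]};

A  = vertcat(pks{:});     % rows: experiments, columns: peaks
dA = diff(A, 1, 1);       % change in each peak from one experiment to the next
```

Each column of `A` is then the amplitude history of one peak and can be passed straight to plot.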
If the peak positions are never stable, so that the peaks shift in both amplitude and position along your independent variable, I know of no reliable way to track them between experiments. Perhaps summing them over the same ranges of your independent variable in each experiment would then be appropriate. You could do that with the reshape function, if the size of your vectors and the ranges of the independent variable match the requirements reshape imposes.
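The reshape idea can be sketched as follows, assuming equally spaced samples and a range length that divides the vector length exactly. The vector `y` and the length `binLen` are hypothetical.

```matlab
y      = 1:12;                               % hypothetical signal, 12 samples
binLen = 4;                                  % samples per range; must divide numel(y)

binSums = sum(reshape(y, binLen, []), 1);    % one sum per range of 4 samples
% binSums is [10 26 42]
```

If the vector length is not a multiple of the range length, reshape throws an error, which is the requirement mentioned above.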
Perhaps posting to a crystallography forum would provide you with participants with the necessary expertise and experience to give you the information you need.
Omit the details about crystallography or whether the numbers are peaks, because for MATLAB they are just numbers.
You have 60'000 sets of 100 numbers and want to find the clusters in them: the numbers that lie close together. The data sets need not contain all numbers. You want the position and width of the clusters. Correct? If so, how is "width" defined: standard deviation or maximum range? Can the clusters overlap -- or, in your terminology, can the positions of the peaks vary so much that a peak could be assigned to one or the other final peak? Then the order of the numbers might matter.
This might be a job for kmeans. Concatenate all numbers (peak positions) into one vector and determine the 100 clusters. But perhaps there are not exactly 100 clusters. Then this answer is not the solution, but perhaps it points you in the right direction. Or perhaps I have misunderstood your question.
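A minimal sketch of the kmeans idea on synthetic data, assuming the number of clusters is known in advance. The three cluster centres, the scatter, and `k` are invented for the example; kmeans requires the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic peak positions scattered around three assumed centres
rng(1);
allLocs = [ 7.2 + 0.1*randn(50,1);
           12.0 + 0.1*randn(50,1);
           20.5 + 0.1*randn(50,1)];

k = 3;                                              % assumed number of clusters
[idx, centres] = kmeans(allLocs, k, 'Replicates', 5);

% Two candidate definitions of cluster "width"
widthStd   = accumarray(idx, allLocs, [k 1], @std);
widthRange = accumarray(idx, allLocs, [k 1], @(v) max(v) - min(v));
```

The cluster numbering returned by kmeans is arbitrary, so sort `centres` before comparing runs. If the true number of clusters is unknown, a fixed `k` is the weak point of this approach.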
1 Comment
Star Strider
on 24 Jul 2017
I would agree; however, I was never able to determine whether: (1) the particular values of the independent variable are invariant and only the ‘peak’ values at those positions change; (2) the goal was to determine the total ‘energy’ (or whatever the dependent variable represents) in a given range of the independent variable; (3) the goal was the number of ‘peaks’ in a given range; (4) the goal was the pattern of the peaks in a range; or (5) something else.
Andreas Nagl
on 28 Jul 2017
Edited: Andreas Nagl
on 28 Jul 2017
