Script to remove polynomial/quadratic error from CSV data
[tl;dr: read a CSV, fit a curve, subtract it from the data and write the result back to the CSV]
Hello everyone,
For a research project I have large amounts of data coming off a profilometer. In case you don't know, this is a device that measures a surface profile (in my case, of a thin film on a piece of glass) and stores it as X/Y data in .csv form. Inherent to this data is an error caused by the curvature of the glass plate, which needs to be removed. A single measurement produces about 40000 lines of data.
I have determined that a quadratic compensation is good enough for what I want to measure. There are areas in front of and behind the film, as well as one in the middle, where there is no film; these can be used to fit a quadratic polynomial. The data is quite noisy, so you need to average over a couple of hundred points. What I would like to do is write a script that reads a CSV file, fits a quadratic polynomial to these areas that are known to be bare glass, and subtracts this polynomial from the data. I would then hopefully end up with data that is compensated for the curvature of the glass plate, which is added back to the CSV file, ideally as a third column, if that is even possible.
Unfortunately, I am quite new to MATLAB. Although I managed to cobble together a script in the past that could read a CSV file and plot it, I don't know where to even start with this one. Has anyone ever done this, or does anyone know how to do it?
Best, IJ
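The workflow described in the question can be sketched in base MATLAB as follows. This is a minimal sketch, not a tested solution for real profilometer files: the synthetic profile, the film positions and the glass-only X ranges in `refRanges` are hypothetical stand-ins. With a real file you would replace the synthetic section with something like `data = readmatrix('profile.csv');` and write the result with `writematrix(data,'profile_corrected.csv')` (both functions exist in recent releases; the file names are hypothetical). Writing to a new file avoids overwriting the raw measurement.

```matlab
% Synthetic stand-in for one profilometer trace (hypothetical numbers, not real data)
x = linspace(0, 4000, 40000)';                            % scan position
bow = 2e-5*(x - 2000).^2 - 30;                            % quadratic curvature of the glass plate
film = 50*((x > 800 & x < 1700) | (x > 2300 & x < 3200)); % two film stripes, 50 units thick
y = bow + film + 0.5*sin(0.37*(1:numel(x))');             % measured height plus deterministic ripple
data = [x y];                                             % same two-column layout as the CSV

% Hypothetical X ranges known to be bare glass: before, between and after the film
refRanges = [0 600; 1850 2150; 3400 4000];

% Logical mask that is true for every point inside any reference range
isRef = false(size(x));
for k = 1:size(refRanges, 1)
    isRef = isRef | (x >= refRanges(k,1) & x <= refRanges(k,2));
end

% Least-squares quadratic through the glass-only points;
% the three-output form centers and scales x for better numerical conditioning
[p, ~, mu] = polyfit(x(isRef), y(isRef), 2);

% Subtract the fitted baseline from every point and keep it as a third column
data(:,3) = y - polyval(p, x, [], mu);
```

Fitting all reference points at once by least squares already averages out much of the noise; averaging blocks of a few hundred points first is an alternative the thread also discusses.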
6 Comments
dpb
on 9 May 2021
Start with what you already had/have and work from there...
Lay out the steps in a logical order and then implement those steps. It's not as hard as it may seem.
There are built-in fitting tools in base MATLAB (polyfit, polyval) that will do the job easily enough; there are more sophisticated tools in the Curve Fitting and/or Statistics Toolboxes if you have those and want to do more with the fit, such as test statistics.
The first step will be to find some way to identify exactly which pieces of the data are the ones to fit, and then to look carefully at the kind of data you are getting to decide whether you need to smooth it first or the like.
We can't really say much about that without the data; attaching at least one, and preferably a few, of these profiles would make it much more likely that somebody does something specific.
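As a quick illustration of those two base functions, here is a sketch on exact quadratic samples (made-up numbers), so the recovered coefficients are known in advance. `polyfit` returns coefficients in descending powers, and `polyval` evaluates them.

```matlab
% Exact quadratic samples, so the fitted coefficients are known in advance
x = (0:10)';
y = 3*x.^2 - 2*x + 1;

p = polyfit(x, y, 2);      % [a b c] for a*x^2 + b*x + c, descending powers
yFit = polyval(p, x);      % evaluate the fitted polynomial at the same x
```

For long traces with large X values, the three-output form `[p,S,mu] = polyfit(x,y,2)` together with `polyval(p,x,[],mu)` centers and scales x, which improves the numerical conditioning of the fit.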
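One simple form of the averaging the question mentions is block averaging: reduce each reference region to one mean point per block of a few hundred samples before fitting. The sketch below uses hypothetical data and a hypothetical block length of 200; `movmean(y,200)` would instead give a running average of the same length as `y`.

```matlab
% Hypothetical noisy samples from one glass reference area
x = (1:1000)';
y = 0.01*x + sin(7*x);                 % slow trend plus fast deterministic "noise"

blockLen = 200;                               % points per averaging block (tuning value)
nBlocks = floor(numel(x) / blockLen);         % drop any incomplete trailing block
idx = 1:nBlocks*blockLen;
xMean = mean(reshape(x(idx), blockLen, nBlocks), 1)';   % one mean X per block
yMean = mean(reshape(y(idx), blockLen, nBlocks), 1)';   % one mean Y per block
```

The block means (`xMean`, `yMean`) can then be passed to `polyfit` in place of the raw points.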
Ivo Trausch
on 10 May 2021
dpb
on 10 May 2021
Just use an indexing vector; there's nothing particularly difficult in that. Finding an algorithm to fit something smooth to these data is going to be tricky, though, given their characteristics. I looked at just the first trace:
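N=size(G60X,1);              % number of points in the trace
N2=round(N/2);               % index of the middle of the trace (center of the middle glass region)
Npt=500;                     % number of points in each reference region
ix=[{1:Npt}; {N2+(-Npt/2:Npt/2)}; {N-Npt:N}]; % cell array of index vectors: start, middle, end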
Let's see what that gives us...
plot(G60X(:,1),G60X(:,2))    % plot the whole trace first
hold on                      % keep the trace while adding overlays
cellfun(@(k) plot(G60X(k,1),G60X(k,2),'r-'),ix,'UniformOutput',false); % overlay the reference regions in red
xlim([0 max(G60X(:,1))])     % zoom in so the regions of interest are visible
ylim([-200 200])
results in
[Figure: the full trace with the three reference regions overlaid in red]
We will need to look at those red areas much more closely, but simply fitting the raw data will not produce anything approximating the baseline, and the "hump" in the center area looks peculiar to me...
Ivo Trausch
on 10 May 2021
dpb
on 10 May 2021
Ah, that's a much less restrictive problem statement than I had inferred from the earlier posts... :)
Are the spikes "real", in the sense that they should influence this estimate across the sample, or should rejecting them be part of the algorithm?
I haven't looked at the rest yet, but there are relatively few really large spikes, from 2-3X to 5-6X the surrounding level, with extremely large excursions at their beginning/end, although they have some noise/structure at the peak (which may or may not be real?). Would it be desirable/acceptable to remove those and replace them with, say, a spline interpolant between the neighbouring points?
That could likely be done reasonably robustly. Then, having done so in your three selected areas, just fit the parabola through the means of those regions. You could also investigate the effect of fitting the raw data, but I suspect it wouldn't help much and would, in fact, reintroduce more noise than it removes.
I've got other tasks right now, but I'll try to look again later this evening; those are the things I'd probably try, though. If you have the Signal Processing Toolbox, findpeaks could be very helpful for locating the peaks.
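One way to do the spike removal, sketched here with a swapped-in technique (this is not code from the thread): flag points that stand far above a running median, widen the mask by a few samples, and bridge the gaps with `interp1(...,'spline')`. The synthetic trace, the window length `win`, the threshold factor 6 and the padding `pad` are all hypothetical tuning values that would need checking against real profiles.

```matlab
% Synthetic trace: smooth profile, small ripple and two narrow spikes (hypothetical data)
x = (1:2000)';
y = 5*sin(2*pi*x/2000) + 0.2*sin(0.9*x);
y(500:505)   = y(500:505)   + 40;        % spike 1
y(1500:1502) = y(1500:1502) + 60;        % spike 2

win = 101;                                % running-median window in points (tuning value)
base = movmedian(y, win);                 % local baseline that ignores narrow spikes
dev = y - base;                           % deviation of each point from that baseline
madDev = median(abs(dev - median(dev)));  % robust spread (median absolute deviation)
isSpike = abs(dev) > 6*madDev;            % threshold factor 6 is a tuning value

pad = 3;                                  % also reject a few neighbours on each side
isSpike = conv(double(isSpike), ones(2*pad+1, 1), 'same') > 0;

% Bridge each gap with a cubic spline through the remaining (accepted) samples
yClean = y;
yClean(isSpike) = interp1(x(~isSpike), y(~isSpike), x(isSpike), 'spline', 'extrap');
```

A running median is used rather than a running mean because a few very large samples barely move the median, so the spikes do not leak into the baseline they are compared against.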
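Fitting "the parabola through the means" can be sketched as follows: each reference region is reduced to one (mean X, mean Y) point, and three points determine a quadratic exactly. The trace and the index ranges in `ix` (laid out like the `ix` cell array used earlier in this thread) are hypothetical; on real data the regions would first be cleaned of spikes.

```matlab
% Hypothetical trace: quadratic bow, one film step and a deterministic ripple
x = (1:30000)';
y = 3e-7*(x - 15000).^2 + 40*(x > 3000 & x < 12000) + 0.5*sin(0.7*x);

% Index ranges of the three glass-only regions (hypothetical)
ix = {1:500; 14750:15250; 29501:30000};

% Mean position and mean height of each region: three (x, y) points
xm = cellfun(@(k) mean(x(k)), ix);
ym = cellfun(@(k) mean(y(k)), ix);

% Three points fix the parabola exactly; centering/scaling (mu) keeps it well conditioned
[p, ~, mu] = polyfit(xm, ym, 2);
yCorr = y - polyval(p, x, [], mu);       % curvature-corrected heights
```

Because the parabola is pinned to region means, the noise in each region averages out before the fit, which matches the suggestion not to fit the raw points directly.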
Ivo Trausch
on 10 May 2021
Answers (2)
Steven Lord
on 10 May 2021
4 Comments
Ivo Trausch
on 10 May 2021
dpb
on 10 May 2021
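detrend removes only a linear trend by default, so no curvature. Newer releases also accept a polynomial degree (detrend(y,2) for a quadratic), but the trend is then fitted to the whole trace, film included, rather than to just the glass reference areas.
I've never seen the "Remove Trends" task before; you can probably experiment with it, but I suspect you'll still have difficulty removing the spikes programmatically. What is clear to the eye isn't necessarily simple to code in general.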
Steven Lord
on 10 May 2021
Ivo Trausch
on 18 May 2021
Ivo Trausch
on 18 May 2021
Edited: Ivo Trausch
on 18 May 2021