Optimizing a Regression Learner App for an Electrochemical NO2 Sensor: Dealing with Drift and Input Variations
Hello,
I am currently using the Regression Learner App to develop a GPR Exponential model for my Electrochemical NO2 Sensor. This sensor outputs a voltage, and I use reference data alongside temperature and humidity measurements to train my model.
Initially, after creating a model with the App, I find that the GPR Exponential model aligns reasonably well with the sensor data. However, over time, I have noticed a slight drift in the data. I don't believe that this drift is a result of the sensor itself. Instead, it may be influenced by new combinations of sensor output voltage, temperature, humidity, and reference data values, which the model might not have encountered during the training process.
If I rerun the Regression Learner App to update or create a new GPR Exponential model, the sensor output appears to be accurate again. This leads me to believe that the need to retrain the model might be due to changes in the combination of the input parameters.
Considering the potential for a wide array of different parameter combinations, how can I optimize my model to predict more accurately?
Moreover, could the nature of my temperature input impact the prediction? Specifically, would there be a noticeable difference in the accuracy of predictions if I input the absolute temperature compared to inputting the temperature segmented into smaller blocks?
I'm curious to know if anyone else has had similar experiences with their models? Any insights or suggestions to enhance the performance of my GPR Exponential model would be greatly appreciated.
6 Comments
dpb
on 6 Aug 2023
First, why a GPR model rather than one built on the physical principles that govern the sensor's response to its environment?
"... the need to retrain the model might be due to changes in the combination of the input parameters."
A particularly apt quote for the circumstance above would be
"To find out what happens when you change something, it is necessary to change it."
A fundamental principle of experiment design is to vary the parameters of interest over the full region for which the derived correlation is to be used.
Also, have you plotted the residuals to see whether you have left out potentially important interaction or higher-order terms?
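As a rough illustration of that residual check, here is a minimal Python sketch on synthetic data (nothing here comes from the actual sensor). It fits a purely additive model in T and RH, then checks whether the residuals correlate with a candidate T*RH interaction term. The helper names `solve`, `fit_linear` and `pearson`, and all the coefficients, are made up for the example.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Synthetic data (assumption): the true response contains a T*RH interaction.
random.seed(0)
n = 200
T = [random.uniform(10, 35) for _ in range(n)]
RH = [random.uniform(20, 90) for _ in range(n)]
y = [0.5 + 0.02 * t + 0.01 * h + 0.002 * (t - 22.5) * (h - 55)
     + random.gauss(0, 0.02) for t, h in zip(T, RH)]

# Fit the additive model y ~ 1 + T + RH and compute its residuals.
X = [[1.0, t, h] for t, h in zip(T, RH)]
coef = fit_linear(X, y)
resid = [yi - sum(c * x for c, x in zip(coef, row)) for yi, row in zip(y, X)]

# Correlate the residuals with the omitted (centred) interaction term.
t_mean, rh_mean = sum(T) / n, sum(RH) / n
interaction = [(t - t_mean) * (h - rh_mean) for t, h in zip(T, RH)]
r_int = pearson(resid, interaction)
r_T = pearson(resid, T)
print(f"corr(resid, T*RH) = {r_int:.2f}, corr(resid, T) = {r_T:.2e}")
```

The residuals are uncorrelated with T by construction, so structure only shows up when they are plotted or correlated against a term the model does not contain.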
Dharmesh Joshi
on 6 Aug 2023
You design and execute an experiment that sets the conditions you want to measure; you don't just set something over in the corner and let it go -- that's "happenstance data" and is rife with trouble. More than likely you don't cover much range, and what you do cover will be serially correlated in time. Unless everything else in the environment is controlled, there may also be confounding extraneous variables that vary and affect the sensor response but aren't even being measured.
I've not used one of the electrochemical NO2 sensors, so I don't have real hands-on experience with them, but I did a search and found a couple of studies -- one did a "calibration" similar to what you describe by placing 16 sensors by a roadway near an official monitoring station and trying to calibrate against its data.
In the end, the best correlation they found used both the T and RH inputs plus a compensation for ozone levels. Leaving out ozone made a significant difference in the observed R-squared of the correlation, but since the devices didn't include an ozone measurement, that correlation couldn't be used in practice. A missing variable like that is quite possibly part of your issue as well. The temperature data above 30 C were simply ignored because the sensor elements become highly nonlinear at and above those temperatures. (Part of the issue in that specific setup was that the onboard electronics were not actively cooled, so the internal temperatures were quite a bit higher than the ambient air temperature.)
All in all, it was quite a complex problem to get something useful from the sensors. They also referenced some other studies that ended up using time-based drift compensation as well; that study, it seems, was not able to track down all the confounding variables and so could not compensate for their effects any other way.
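The compensation method those studies used isn't described here, so the following is only a generic sketch of what time-based drift compensation can look like. It fits a straight line in time to the residual (model prediction minus reference) and subtracts it. The data are synthetic, and the name `fit_line`, the drift rate and the noise level are assumptions made for the example.

```python
import random

def fit_line(t, r):
    """Least-squares slope and intercept of r against t."""
    n = len(t)
    t_mean = sum(t) / n
    r_mean = sum(r) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (ri - r_mean) for ti, ri in zip(t, r))
    slope = sxy / sxx
    return slope, r_mean - slope * t_mean

# Synthetic residuals (assumption): 0.05 units/day of drift plus noise.
random.seed(1)
days = list(range(60))
true_drift = 0.05
residual = [true_drift * d + random.gauss(0, 0.3) for d in days]

# Estimate the trend and remove it from the residuals.
slope, intercept = fit_line(days, residual)
corrected = [r - (slope * d + intercept) for d, r in zip(days, residual)]
print(f"estimated drift: {slope:.3f} per day")
```

A fit like this removes the trend without explaining it. If the drift really comes from an unmeasured variable, the correction only holds while that variable keeps changing the same way.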
Dharmesh Joshi
on 8 Sep 2023
A. You can always compute something outside the model range; how accurate it will be is clearly dependent upon how accurate the model is to begin with plus how well it does predict what the response will be outside that range.
Clearly, if a sensor's response were purely linear over the entire range, then it wouldn't matter; a straight line is a straight line. That is never the case in practice; how nonlinear the response is and how well the fitted model holds depend entirely on how the particular data and model relate to what the sensor output actually is for a given input. Higher-degree polynomials are particularly notorious for "blowing up" as the range gets larger; a quadratic term alone doubles for every 1.4X increase in input. In other words, a 40% increase in T would double the contribution of that term to the predicted sensor output. Even a smaller step matters: (38/32)^2 ≈ 1.41, so a 19% increase in the input raises the quadratic term by about 41%. Remember that the slope magnitude of a parabola keeps increasing away from its vertex, whether it opens up or down.
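The arithmetic is easy to check. This short Python sketch compares how a linear term and a quadratic term scale when the input is multiplied by the same ratio; `term_growth` is just an illustrative name.

```python
def term_growth(ratio, degree):
    """Factor by which a pure x**degree term grows when x is scaled by ratio."""
    return ratio ** degree

for ratio in (38 / 32, 1.4, 2.0):
    linear = term_growth(ratio, 1)
    quadratic = term_growth(ratio, 2)
    print(f"input x{ratio:.3f}: linear term x{linear:.3f}, "
          f"quadratic term x{quadratic:.3f}")
```

This applies to the quadratic term on its own, measured from zero input. The change in the full prediction also depends on the other terms and their coefficients.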
B. You clearly can't measure every single combination of all parameters; that's not what experiment design is about. You should, however, cover the range of every parameter over the combinations that can exist jointly. Picking that set of points is the subject of experiment design; one method that has generally been found helpful in fitting quadratic response surface models is the central composite design. Again, I recommend Box, Hunter and Hunter as essential background for getting an idea of the issues and of the techniques designed to avoid the pitfalls.
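To give an idea of what a central composite design looks like, here is a hedged Python sketch that lists the design points in coded units and maps them onto physical ranges. It uses the standard layout of 2^k factorial corners, 2k axial points and one or more centre points. The function names and the NO2, T and RH ranges are placeholders, not values from this thread.

```python
from itertools import product

def central_composite(k, alpha=None, n_center=1):
    """Coded design points for k factors: corners, axial points, centre."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # axial distance for a rotatable design
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center

def decode(point, lows, highs):
    """Map coded levels (-1 -> low, +1 -> high) to physical units."""
    return [(lo + hi) / 2 + c * (hi - lo) / 2
            for c, lo, hi in zip(point, lows, highs)]

# Hypothetical ranges: NO2 in ppb, temperature in C, relative humidity in %.
lows, highs = [0.0, 5.0, 20.0], [100.0, 30.0, 90.0]

# alpha=1 (face-centred) keeps every point inside the stated ranges.
design = central_composite(3, alpha=1.0)
for coded in design:
    print(decode(coded, lows, highs))
```

With the default rotatable alpha, the axial points lie outside the -1 to +1 cube. That suits factors that can go past their nominal limits, but not something like RH near 100%.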
Dharmesh Joshi
on 21 Sep 2023
Answers (1)
Kaustab Pal
on 19 Aug 2024
For the model to work well, it needs to see inputs that are similar to what it saw during training. For example, if the input and output had a linear relationship during training, the model will do well if this relationship stays the same. But if the relationship changes to something like exponential during testing, the model might not perform well, and you'll need to retrain it.
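One rough way to check whether a new input resembles the training data is sketched below in Python. It looks at each feature's training range, and also at the distance to the nearest training point in standardized units, because a combination can be new even when every individual value is in range. The function names and sample values are invented for illustration.

```python
import math
import statistics

def fit_scaler(rows):
    """Per-feature mean and standard deviation of the training rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return means, stds

def standardize(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def outside_range(row, train_rows):
    """Per-feature flag: True where the value lies outside the training range."""
    cols = list(zip(*train_rows))
    return [not (min(c) <= v <= max(c)) for v, c in zip(row, cols)]

def nearest_distance(row, train_rows, means, stds):
    """Euclidean distance to the closest training row, in standardized units."""
    z = standardize(row, means, stds)
    return min(math.dist(z, standardize(t, means, stds)) for t in train_rows)

# Hypothetical training rows (voltage, T in C, RH in %): hot days were dry.
train = [(0.40, 10, 80), (0.45, 15, 70), (0.50, 20, 60),
         (0.55, 25, 50), (0.60, 30, 40)]
means, stds = fit_scaler(train)

familiar = (0.52, 22, 56)   # close to the training pattern
new_combo = (0.45, 30, 80)  # each value in range, but hot AND humid is unseen

for name, row in (("familiar", familiar), ("new_combo", new_combo)):
    flags = outside_range(row, train)
    dist = nearest_distance(row, train, means, stds)
    print(f"{name}: out of range = {flags}, nearest distance = {dist:.2f}")
```

Here `new_combo` passes every per-feature range check but sits much farther from the training data than `familiar`. Retraining fixes that kind of case simply by adding the new combination to the data.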
To make your model more accurate, try to gather a large dataset that shows different types of input-output relationships. You can also improve the model by updating it regularly with new data it hasn't seen before.
The way you input temperature data can also affect how well the model works. It's important to use the same method of representing temperature both when training the model and when using it to make predictions. This consistency is key to maintaining accuracy.
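A simple way to keep the representation consistent is to send temperature through one shared function for both training and prediction. This Python sketch assumes a hypothetical helper, `encode_temperature`, that either passes the value through or replaces it with the centre of a fixed-width bin.

```python
import math

def encode_temperature(temp_c, bin_width=None):
    """Return the temperature unchanged, or its bin centre if bin_width is set."""
    if bin_width is None:
        return float(temp_c)
    return (math.floor(temp_c / bin_width) + 0.5) * bin_width

# The same call must be used when building training data and when predicting.
for t in (20.1, 24.9, 25.0):
    print(t, "->", encode_temperature(t), "or", encode_temperature(t, bin_width=5))
```

With 5-degree bins, 20.1 C and 24.9 C map to the same input value, so the model can't tell them apart. Whether that loss of resolution matters for this sensor would have to be tested on the data.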
I hope this helps clear things up!