Optimizing a Regression Learner App for an Electrochemical NO2 Sensor: Dealing with Drift and Input Variations

Hello,
I am currently using the Regression Learner App to develop a GPR Exponential model for my Electrochemical NO2 Sensor. This sensor outputs a voltage, and I use reference data alongside temperature and humidity measurements to train my model.
Initially, after creating a model with the App, I find that the GPR Exponential model aligns reasonably well with the sensor data. However, over time, I have noticed a slight drift in the data. I don't believe that this drift is a result of the sensor itself. Instead, it may be influenced by new combinations of sensor output voltage, temperature, humidity, and reference data values, which the model might not have encountered during the training process.
If I rerun the Regression Learner App to update or create a new GPR Exponential model, the sensor output appears to be accurate again. This leads me to believe that the need to retrain the model might be due to changes in the combination of the input parameters.
Considering the potential for a wide array of different parameter combinations, how can I optimize my model to predict more accurately?
Moreover, could the nature of my temperature input impact the prediction? Specifically, would there be a noticeable difference in the accuracy of predictions if I input the absolute temperature compared to inputting the temperature segmented into smaller blocks?
I'm curious to know if anyone else has had similar experiences with their models? Any insights or suggestions to enhance the performance of my GPR Exponential model would be greatly appreciated.
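Since the suspicion is that drift coincides with input combinations the model never saw, a quick sanity check is to compare the ranges of the deployed inputs against the training data. This is a minimal sketch, assuming the inputs are voltage, temperature, and humidity; the variable names (`Xtrain`, `Xnew`, etc.) are placeholders, not from the original post:

```matlab
% Sketch: flag deployed operating points that fall outside the training ranges.
% Column order assumed: [voltage, temperature, humidity].
Xtrain = [Vtrain, Ttrain, RHtrain];    % inputs used when training in the app
Xnew   = [Vnew,   Tnew,   RHnew];      % inputs seen in deployment

lo = min(Xtrain);                      % per-column training minima
hi = max(Xtrain);                      % per-column training maxima
outside = any(Xnew < lo | Xnew > hi, 2);   % true where the model must extrapolate

fprintf('%.1f%% of new points lie outside the training range\n', ...
        100*mean(outside));
```

If a meaningful fraction of points is flagged, the drift is consistent with extrapolation rather than sensor degradation, and widening the training coverage (or retraining on the new conditions) is the natural fix.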

6 Comments

First, why a GPR model instead of one built from physical principles, i.e., from the known physics of the sensor's response to its environment?
Second, see the section on "fitting happenstance data" in a classic text.
"... the need to retrain the model might be due to changes in the combination of the input parameters."
A particularly apt quote for the circumstance above would be
"To find out what happens when you change something, it is necessary to change it."
A fundamental principle of experiment design is to vary the parameter levels of interest over the region they need to cover, that is, the full range for which the derived correlation is to be used.
Also, have you plotted residuals to discover whether, perchance, you have left out potentially important interaction or higher-order terms?
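A residual check like the one suggested above can be done on a model exported from Regression Learner. This is a sketch, not a definitive recipe: the exported struct is named `trainedModel` by default and exposes a `predictFcn` handle, but the table and column names (`dataTable`, `NO2ref`, `Temperature`, `Humidity`) are assumptions here:

```matlab
% Sketch: residuals vs. each input, using a model exported from the app.
yhat = trainedModel.predictFcn(dataTable);   % predictions on the training table
res  = dataTable.NO2ref - yhat;              % residuals against reference data

subplot(2,1,1);
plot(dataTable.Temperature, res, '.');
xlabel('Temperature'); ylabel('Residual');   % curvature here hints at a missing T^2 term
subplot(2,1,2);
plot(dataTable.Humidity, res, '.');
xlabel('Humidity'); ylabel('Residual');      % a trend here hints at a missing RH or T*RH term
```

Structure in either panel (curvature, fanning, a trend) is the signal that a term is missing; featureless scatter around zero is what a well-specified model should leave behind.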
GPR was giving me the best results, so I have been using that as my model.
How do I vary the parameters? Is this done within the Regression Learner App, or do I do it with my actual raw data once the model is created?
You design and execute an experiment that sets the conditions you want to measure; you don't just set something over in the corner and let it go -- that's "happenstance data," and it is rife with trouble. More than likely you don't cover much range; what you do cover will be serially correlated in time; and unless everything else in the environment is controlled, there may be confounding extraneous variables that aren't even being measured but still affect the sensor response.
I've not used one of the electrochemical NO2 sensors, so I don't have real hands-on experience with them, but a quick search turned up a couple of studies -- one did a "calibration" similar to what you describe by placing 16 sensors by a roadway near an official monitoring station and trying to calibrate against its data.
In the end, their best correlation used both the T and RH inputs plus a compensation for ozone levels. Omitting ozone made a significant difference in the observed R-squared of the correlation, but since the devices didn't include an ozone measurement, it couldn't be used in practice. A missing variable like that is quite possibly part of your issue as well. Temperature data above 30 C were simply discarded, as the sensor elements become highly nonlinear at and above those temperatures. (Part of the issue in that specific setup was that the onboard electronics were not actively cooled, so the internal temperatures were quite a bit higher than the ambient air temperature.)
All in all, it was a quite complex effort to get something useful from the sensors, and they also referenced other studies that ended up using time-based drift compensation besides; that study was not able to track down all the confounding variables well enough to compensate for their effects otherwise, it seems.
Data Range Limitations
If my dataset includes variables such as sensor output, temperature, and humidity, and the temperature in the dataset ranges from 15 to 32 degrees, can the model predict outcomes for temperatures that exceed this range? Specifically, could the model provide accurate predictions for temperatures above 32 degrees, based on its learning from the 15-32 degree range? Furthermore, would the model's ability to make such predictions depend on whether the effect of temperature is linear or polynomial?
Input Variability
Would it be necessary to include every possible combination of temperature and humidity in the dataset for accurate modeling? To be specific, if our sensor shows 60% humidity at 22 degrees, do we need to generate a dataset that demonstrates humidity levels ranging from 1% to 100% across the entire temperature spectrum?
A. You can always compute something outside the model's range; how accurate it will be depends on how accurate the model is to begin with, plus how well it extrapolates the response beyond that range.
Clearly, if a sensor's response were purely linear over the entire range, it wouldn't matter; a straight line is a straight line. That is never the case in practice; just how nonlinear the response is, and how well the fitted model holds up, depends entirely on what the particular data/model predict relative to what the sensor output actually is for a given input. Polynomials of higher degree are particularly notorious for "blowing up" as the range grows: a quadratic term alone doubles for every 1.4X increase in input; in other words, a 40% increase in T would, through that term alone, double its contribution to the predicted sensor output. For your range, (38/32)^2 ==> about 1.41, so even a 19% step past the training maximum inflates a quadratic term by roughly 41%. Remember that the slope magnitude of a parabola is always increasing, whether it points up or down.
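The arithmetic in that paragraph is easy to check directly; this small sketch just reproduces the quadratic-term growth numbers (32 and 38 degrees are the figures from the discussion above):

```matlab
% Sketch: how a quadratic term alone grows just outside the fitted range.
T0 = 32;                   % top of the training temperature range
T1 = 38;                   % a temperature seen in deployment
growth = (T1/T0)^2;        % ratio of the quadratic term's contribution
% growth is about 1.41: a ~19% overshoot in T inflates a T^2 term by ~41%
```

The same ratio computed for a cubic term, (38/32)^3, is about 1.67, which is why higher-degree fits extrapolate even more badly.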
B. You clearly can't measure every single combination of all parameters; that's not what experiment design is about. You should, however, cover the RANGE of every parameter over the region where they can jointly exist. Picking that set of points is the subject of experiment design; one method that has generally been found helpful for fitting quadratic response-surface models is the central composite design. Again, I recommend Box, Hunter and Hunter as essential background on the issues and on the techniques designed to avoid pitfalls.
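Since this is a MATLAB workflow, it may help to know that the Statistics and Machine Learning Toolbox can generate a central composite design directly with `ccdesign`. A sketch, with the physical ranges below purely illustrative (they are assumptions, not measured limits of your sensor):

```matlab
% Sketch: a central composite design for 3 factors (voltage, T, RH),
% inscribed so all coded points stay within [-1, 1], then mapped to
% example physical ranges.
dCC = ccdesign(3, 'type', 'inscribed');   % coded design points in [-1, 1]

lo = [0.1, 15, 20];                       % assumed lower bounds: V, T (C), RH (%)
hi = [1.0, 32, 90];                       % assumed upper bounds

runs = lo + (dCC + 1)/2 .* (hi - lo);     % scale coded points to physical units
disp(runs);                               % each row is one experimental condition
```

Running the sensor at those conditions (in randomized order) gives a training set that spans the joint range efficiently, instead of the narrow, time-correlated slice that happenstance logging produces.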
Thanks for the update.
Yes, I can add some additional computation after the model if necessary. My concern is: if I train my model on data with temperature values below 20 degrees, but then, when the model is used in the real world, the temperature becomes 35 degrees, how would the model behave? Would it simply not know, or would it somehow learn and predict?
If I retrain the model, is it possible to see what new elements are learned from the new data?
I have a large amount of data being fed into the Regression Learner App, and it becomes very slow. When I want to see the effect of temperature using the "Partial Dependence Plot", can I simply import my model into my script, keep all variables (apart from temperature) fixed, and observe the effect of varying the temperature value?
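Yes, that kind of one-variable sweep can be done outside the app on the exported model. A sketch, assuming the exported struct is the default `trainedModel` with a `predictFcn` handle, and that `V` and `RH` are the training vectors; the table's variable names must match whatever names were used at training time, so the ones below are placeholders:

```matlab
% Sketch: sweep temperature on the exported model, holding the other
% inputs at their median training values.
n = 100;
Tgrid = linspace(15, 32, n)';                        % training T range, per the thread

tbl = table(repmat(median(V),  n, 1), ...            % fixed voltage
            Tgrid, ...                               % swept temperature
            repmat(median(RH), n, 1), ...            % fixed humidity
            'VariableNames', {'Voltage','Temperature','Humidity'});

plot(Tgrid, trainedModel.predictFcn(tbl));
xlabel('Temperature'); ylabel('Predicted NO2');
```

If the underlying model object is available (for GPR the export struct typically carries it, e.g. as `trainedModel.RegressionGP`), `plotPartialDependence` on that object gives the same kind of plot without the app, averaged over the data rather than pinned at medians.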


Answers (1)

For the model to work well, it needs to see inputs similar to what it saw during training. For example, if the input and output had a linear relationship during training, the model will do well as long as that relationship holds. But if the relationship changes, say to something exponential, during testing, the model may not perform well, and you'll need to retrain it.
To make your model more accurate, gather a training set that covers the full range of operating conditions (sensor voltage, temperature, humidity) the sensor will encounter in deployment. You can also improve the model by retraining it periodically with new data it hasn't seen before.
The way you encode the temperature input can also affect how well the model works. Whatever representation you choose, use the same one both when training the model and when using it to make predictions; that consistency is key to maintaining accuracy.
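The periodic-retraining suggestion can also be scripted rather than done through the app each time. A minimal sketch using `fitrgp`, which is what underlies the app's GPR models (the "Exponential" preset corresponds to the kernel below); all variable names are placeholders:

```matlab
% Sketch: retrain a GPR model outside the app by folding in new data.
Xall = [Xold; Xnew];                         % previous + newly observed inputs
yall = [yold; ynew];                         % matching reference NO2 values

mdl = fitrgp(Xall, yall, ...
             'KernelFunction', 'exponential', ...  % same kernel as the app's preset
             'Standardize', true);                 % z-score inputs before fitting

yhat = predict(mdl, Xquery);                 % predictions at new operating points
```

Scripting it this way makes the refresh repeatable and avoids re-importing a large dataset into the app every time drift appears.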
I hope this helps clear things up!

Release: R2023a
Asked: 6 Aug 2023
Answered: 19 Aug 2024
