What does a cross (horizontal line) in the regression plot of a neural network mean with multivariate input and output?

Hello everyone,
I have trained a neural network and got the below regression plot.
First of all, I normalized every input and output variable by subtracting its mean over the samples and dividing by its standard deviation, so that all inputs and outputs are on the same scale. Is that allowed, or does it introduce errors into my data?
I have tried several network structures and always get such a cross in the regression plot. My guess is that some of the output data is insensitive to the input data. Is that right? If so, how can I check that?
Thanks for your help! Best regards,
Pablo

Accepted Answer

Cris LaPierre on 2 Dec 2020
Edited: Cris LaPierre on 3 Dec 2020
Normalizing is a standard preprocessing step. It is helpful when your model has several inputs on different scales, because it keeps any one feature from dominating the model purely due to its scale. When you have a single input, this is unnecessary. Also, this is a preprocessing step. I don't think it makes sense to do it after the fact, and doing so could be affecting your visualization.
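As a rough illustration of the normalization being discussed, here is a minimal sketch of per-variable z-scoring in plain Python. The function names and the toy values are hypothetical, and the use of the sample standard deviation (dividing by n-1) is an assumption, since the thread does not say which variant was used. The statistics are stored so that network outputs can be mapped back to the original units.

```python
from statistics import mean, stdev

def zscore_fit(column):
    """Return (mean, std) of one variable measured over all samples."""
    mu = mean(column)
    sigma = stdev(column)  # assumption: sample standard deviation (n-1)
    if sigma == 0:
        raise ValueError("a constant variable cannot be z-scored")
    return mu, sigma

def zscore_apply(column, mu, sigma):
    """Normalize one variable with previously fitted statistics."""
    return [(value - mu) / sigma for value in column]

def zscore_invert(normalized, mu, sigma):
    """Map normalized values back to the original units."""
    return [value * sigma + mu for value in normalized]

# Hypothetical target variable with a very small range
raw_targets = [0.001, 0.002, 0.003, 0.004]
mu, sigma = zscore_fit(raw_targets)
z = zscore_apply(raw_targets, mu, sigma)
restored = zscore_invert(z, mu, sigma)
```

Fitting the statistics once, before training, and reusing them for any later data keeps the transformation a pure preprocessing step.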
A cross would suggest there are two different types of data in your data set: one with a relationship and one without. The horizontal part indicates data points that have no relationship between the Target and the Output.
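One way to see why unrelated targets end up on a horizontal line: if the network is trained with a mean-squared-error loss (an assumption here), the best any function of the input alone can do is predict the average target for each input value. The sketch below uses that conditional mean as a stand-in for a trained network; it is not a neural network, and the data and names are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def best_mse_prediction(x, t):
    """Per-sample conditional mean of t given x.

    Under mean-squared error this is the best prediction that any
    model seeing only x can produce.
    """
    groups = defaultdict(list)
    for xi, ti in zip(x, t):
        groups[xi].append(ti)
    return [mean(groups[xi]) for xi in x]

x = [0, 0, 1, 1, 2, 2]
t_linked = [3 * xi + 1 for xi in x]   # target that depends on x
t_unlinked = [-1, 1, -1, 1, -1, 1]    # target that varies independently of x

y_linked = best_mse_prediction(x, t_linked)      # follows the target
y_unlinked = best_mse_prediction(x, t_unlinked)  # constant for every sample
```

Plotting output against target, the linked variable falls on the diagonal, while the unlinked one has a spread of targets but a single output value, which is the horizontal bar of the cross.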
  1 Comment
Pablo Noever on 3 Dec 2020
Edited: Pablo Noever on 3 Dec 2020
Thank you Cris for your reply.
First of all, as I guessed and you confirmed, the cross in the regression plot is due to missing links between the inputs and outputs of the samples. By performing a sensitivity analysis on the normalized input and output data of the samples, I discarded every input that does not contribute to any output parameter and every output parameter that is not affected by any input parameter. With that, all data points on the horizontal line disappear. Thanks for that. The result is as follows:
So I get very good agreement between the target (sample output) and the ANN output.
Second, I agree that normalizing the sample inputs is necessary. But normalizing the sample outputs (targets) can also be helpful: with all data in the same range, you can better judge whether the approximation is good for all of it. I did the same without normalizing the targets, see below, and you can see good agreement (R=1). But the lowest values lie in a very small range compared to the others, so it is hard to evaluate their deviations, since they are not visible at this scale.
Nevertheless, this is just a detail. The main problem is solved. Insensitive target values cannot be represented by the ANN because they have no link to the input parameters, and thus they produce a horizontal line in the regression plots. A sensitivity analysis is a good method for discarding all insensitive input and output parameters.
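The thread does not say which sensitivity analysis was used. As one possible sketch, a simple correlation screen can flag inputs and outputs that show no linear link to anything on the other side. The function names, the toy data, and the threshold of 0.1 are all hypothetical, and a Pearson screen only detects linear dependence, so a variable with a purely nonlinear effect could be discarded by mistake.

```python
from math import sqrt
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant variable carries no usable signal
    return cov / sqrt(va * vb)

def screen(inputs, outputs, threshold=0.1):
    """Keep variables whose strongest |correlation| reaches the threshold.

    inputs and outputs map a variable name to its values over all samples.
    The threshold is an illustrative choice, not a recommended value.
    """
    strength = {
        (i, o): abs(pearson(xi, yo))
        for i, xi in inputs.items()
        for o, yo in outputs.items()
    }
    keep_in = [i for i in inputs
               if any(strength[i, o] >= threshold for o in outputs)]
    keep_out = [o for o in outputs
                if any(strength[i, o] >= threshold for i in inputs)]
    return keep_in, keep_out

# Toy data: y1 = 3*x1 + 1; x2 and y2 are linked to nothing
inputs = {"x1": [0, 0, 1, 1, 2, 2], "x2": [1, 1, -1, -1, 1, 1]}
outputs = {"y1": [1, 1, 4, 4, 7, 7], "y2": [-1, 1, -1, 1, -1, 1]}

keep_in, keep_out = screen(inputs, outputs)
```

With this toy data only `x1` and `y1` survive the screen, so the network would be trained on the linked pair and the insensitive output would no longer contribute a horizontal line.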
