Error trying to fit data to gaussian function: Inf computed by model function, fitting cannot continue

7 views (last 30 days)
I am trying to find out whether my data fit to a gaussian distribution or not and am using the fit function for that.
When I try to execute this line:
[~,gof]=fit((1:length(median(norm_spikes_flight_control,2,'omitnan')))', ...
median(norm_spikes_flight_control,2,'omitnan'),'gauss1');
I get the following error message:
Error using fit>iFit
Inf computed by model function, fitting cannot continue.
Try using or tightening upper and lower bounds on coefficients.
Error in fit (line 116)
[fitobj, goodness, output, convmsg] = iFit( xdatain, ydatain, fittypeobj, ...
I have attached the 3001x1 double that results from "median(norm_spikes_flight_control,2,'omitnan')" for a set of data that gives no error and one dataset where the code above works just fine.
If anyone is sufficiently knowledgeable what is happening here and how to find my error, I would be extremely grateful.

Answers (1)

dpb
dpb on 10 Aug 2023
Edited: dpb on 10 Aug 2023
d1=readmatrix('data_working.txt');
d2=readmatrix('data_error.txt');
plot([d1 d2]), xlim([1 numel(d1)]), legend('Working','Error')
Neither of those look anything at all like a single-peak Gaussian when plotted seqentially which is what the fit() algorithm tries to match.
You can use Statistics TB fitdist to consider the data as a whole without the given order; histfit does it and shows a figure when done...
histfit(d1)
fitdist(d1,'normal')
ans =
NormalDistribution Normal distribution mu = -0.0737158 [-0.0755425, -0.0718891] sigma = 0.0510356 [0.0497764, 0.0523606]
[mean(d1) std(d1)]
ans = 1×2
-0.0737 0.0510
The result for it is, you see simply the mean and standard deviation with error estimates.
histfit(d2)
What is the purpose behind trying to fit a Gaussian model to those data as shown originally? That model simply doesn't fit the data in that form and the histograms show the overall distributions are multi-mode and also far from being simply a single Gaussian distribution.
ADDENDUM
I forgot to add the look at what the fitted model actually did...
x=[1:numel(d1)].';
[f,gof]=fit(x,d1,'gauss1')
f =
General model Gauss1: f(x) = a1*exp(-((x-b1)/c1)^2) Coefficients (with 95% confidence bounds): a1 = 0.01526 (-0.02586, 0.05637) b1 = 607.3 (559.1, 655.5) c1 = 21.89 (-46.23, 90.02)
gof = struct with fields:
sse: 24.1150 rsquare: -2.0862 dfe: 2998 adjrsquare: -2.0882 rmse: 0.0897
plot(f,x,d1,'fit','residuals')
subplot(2,1,1), ylabel('D1'), xlim([1 numel(d1)])
legend location southwest
subplot(2,1,2), ylabel('Residuals'), xlim([1 numel(d1)])
legend location southwest
As the above shows, the fitted model is totally inadequate; an adjusted R-square of -2 is absurd. If one looks very carefully, one can see the model fit red line overlays the blue data at the peak around x=600; otherwise the model prediction is essentially zero. The plot of the model residuals is almost indistinguishable from the data plot although the magnitude of the peak at x=607 is almost removed; anywhere else farther away from that location parameter value the exponential term rapidly vanishes. The model simply doesn't represent the data at all in that form.
figure, hAx=subplot(2,1,1);
xlim([1 numel(x)])
plot(f)
xlim([1 numel(x)])
legend off
title('Model Full Range')
subplot(2,1,2)
xlim([400 800])
plot(f)
legend off
title('Model About Central Peak')
illustrates the model output itself without the original data; it shows that it is, indeed, a Gaussian, but that other than around the one central location the value is essentially zero because the exponential vanishes.
Again, would need to know what the end purpose here is in order to have any idea of what might be feasible/make sense; simply throwing data at a model without exploratory analysis is exceedingly dangerous.
  4 Comments
dpb
dpb on 7 Sep 2023
Edited: dpb on 7 Sep 2023
That would be one conclusion to be drawn, yes. I haven't dug into the details of the solution algorithm used as to the exact cause of the inf being returned; somewhere it's dividing by zero which most likely comes from underflow of the tails of the normal pdf function being evaluated. But, I'd contend that's certainly not a reliable test; it's simply a heuristic occurrence that is dependent upon some specific condition within a dataset.
For your case, given the sample data, I'd say neither is a normal distribution and that your analysis/observation of your data should be prepared to observe/find the multi-modal cases and identify them as opposed to a case that actually does show a normal-like distribution.
I <answered a previous Q? some time back> on testing for normality, I still haven't implemented an m-file to do the Shapiro-Wilk, however.
Christian Kraus
Christian Kraus on 14 Sep 2023
Thank you again for taking the time with this!
I will give this some thought and discuss this with some people in our group to see how we want to tackle this in the future. Definitely something to keep i
When it comes to the Shapiro-Wilk test, there is a function by Ahmed Ben Saïda in the community file exchange that I could use for testing for normality.

Sign in to comment.

Categories

Find more on Get Started with Curve Fitting Toolbox in Help Center and File Exchange

Products


Release

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!