Evaluating Goodness of Fit

How to Evaluate Goodness of Fit

After fitting data with one or more models, you should evaluate the goodness of fit. A visual examination of the fitted curve displayed in the Curve Fitter app should be your first step. Beyond that, the toolbox provides these methods to assess goodness of fit for both linear and nonlinear parametric fits: goodness-of-fit statistics, residual analysis, and confidence and prediction bounds.

As is common in statistical literature, the term goodness of fit is used here in several senses: A “good fit” might be a model

that your data could reasonably have come from, given the assumptions of least-squares fitting
in which the model coefficients can be estimated with little uncertainty
that explains a high proportion of the variability in your data, and is able to predict new observations with high certainty

A particular application might dictate still other aspects of model fitting that are important to achieving a good fit, such as a simple model that is easy to interpret. The methods described here can help you determine goodness of fit in all these senses.

These methods fall into two groups: graphical and numerical. Plotting residuals and prediction bounds are graphical methods that aid visual interpretation, while computing goodness-of-fit statistics and coefficient confidence bounds yields numerical measures that aid statistical reasoning.

Generally speaking, graphical measures are more beneficial than numerical measures because they allow you to view the entire data set at once, and they can easily display a wide range of relationships between the model and the data. The numerical measures are more narrowly focused on a particular aspect of the data and often try to compress that information into a single number. In practice, depending on your data and analysis requirements, you might need to use both types to determine the best fit.

Note that it is possible that none of your fits can be considered suitable for your data, based on these methods. In this case, it might be that you need to select a different model. It is also possible that all the goodness-of-fit measures indicate that a particular fit is suitable. However, if your goal is to extract fitted coefficients that have physical meaning, but your model does not reflect the physics of the data, the resulting coefficients are useless. In this case, understanding what your data represents and how it was measured is just as important as evaluating the goodness of fit.

Goodness-of-Fit Statistics

After using graphical methods to evaluate the goodness of fit, you should examine the goodness-of-fit statistics. Curve Fitting Toolbox™ software supports these goodness-of-fit statistics for parametric models:

The sum of squares due to error (SSE)
R-square
Degrees of freedom for error (DFE)
Adjusted R-square
Root mean squared error (RMSE)

For the current fit, these statistics are displayed in the Results pane in the Curve Fitter app. For all fits in the current curve-fitting session, you can compare the goodness-of-fit statistics in the Table Of Fits pane.

To examine goodness-of-fit statistics at the command line, either:

In the Curve Fitter app, export your fit and goodness of fit to the workspace. On the Curve Fitter tab, in the Export section, click Export and select Export to Workspace.
Specify the gof output argument with the fit function.

Sum of Squares Due to Error

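This statistic measures the total deviation of the observed response values from the fitted response values. It is also called the summed square of residuals and is usually labeled as SSE.

$SSE = \sum_{i = 1}^{n} w_{i} {(y_{i} - {\hat{y}}_{i})}^{2}$

where $y_{i}$ is the ith observed response value, ${\hat{y}}_{i}$ is the corresponding fitted value, $w_{i}$ is the weight applied to that data point, and n is the number of response values.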

A value closer to 0 indicates that the model has a smaller random error component, and that the fit will be more useful for prediction.
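As an illustration of the SSE definition, the following minimal Python sketch evaluates the weighted sum directly. It is independent of the toolbox: the function name `sum_squared_error` and the data values are made up for this example, and the weights default to 1 on the assumption of an unweighted fit.

```python
def sum_squared_error(y, y_hat, w=None):
    """Weighted sum of squared residuals: sum of w_i * (y_i - y_hat_i)**2."""
    if w is None:
        w = [1.0] * len(y)  # assumption: unweighted fit, every weight is 1
    return sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))


# Hypothetical observed and fitted responses (not from any real fit)
y = [1.0, 2.0, 3.0, 5.0]
y_hat = [1.0, 2.5, 3.0, 4.0]

sse_unweighted = sum_squared_error(y, y_hat)
# Down-weighting the last point reduces its contribution to the total
sse_weighted = sum_squared_error(y, y_hat, w=[1.0, 2.0, 1.0, 0.5])
print(sse_unweighted, sse_weighted)
```

Here the residuals are 0, -0.5, 0, and 1, so the unweighted SSE is 1.25, while the example weights give 1.0.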

R-Square

This statistic measures how successful the fit is in explaining the variation of the data. Put another way, R-square is the square of the correlation between the response values and the predicted response values. It is also called the square of the multiple correlation coefficient and the coefficient of multiple determination.

R-square is defined as the ratio of the sum of squares of the regression (SSR) to the total sum of squares (SST). SSR is defined as

$SSR = \sum_{i = 1}^{n} w_{i} {({\hat{y}}_{i} - \bar{y})}^{2}$

SST is also called the sum of squares about the mean, and is defined as

$SST = \sum_{i = 1}^{n} w_{i} {(y_{i} - \bar{y})}^{2}$

where SST = SSR + SSE. Given these definitions, R-square is expressed as

$\text{R-square} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$

R-square normally takes a value between 0 and 1, with a value closer to 1 indicating that a greater proportion of variance is accounted for by the model. For example, an R-square value of 0.8234 means that the fit explains 82.34% of the total variation in the data about the average.
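The relationship between SSR, SSE, SST, and R-square can be checked numerically. The sketch below is not toolbox code: it assumes an unweighted straight-line model with a constant term, fits it with the closed-form least-squares solution, and uses made-up data. The helper name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x using the closed form."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, roughly linear
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]
y_bar = sum(y) / len(y)

sse_val = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error
ssr_val = sum((fi - y_bar) ** 2 for fi in y_hat)           # regression
sst_val = sum((yi - y_bar) ** 2 for yi in y)               # total

r_square = 1.0 - sse_val / sst_val
print(sse_val, ssr_val, sst_val, r_square)
```

For this least-squares line, SSR + SSE reproduces SST up to rounding, so both forms of the R-square expression agree.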

If you increase the number of fitted coefficients in your model, R-square generally increases even though the fit might not improve in a practical sense. To account for this, use the degrees of freedom adjusted R-square statistic described below.

Note that it is possible to get a negative R-square for equations that do not contain a constant term. Because R-square is defined as the proportion of variance explained by the fit, if the fit is actually worse than just fitting a horizontal line then R-square is negative. In this case, R-square cannot be interpreted as the square of a correlation. Such situations indicate that a constant term should be added to the model.
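A small numerical case shows how a negative R-square can arise. This Python sketch uses made-up data that follow y = 4 - x exactly, and compares an assumed model without a constant term, y = b*x, against a straight line with a constant term. Both fits use closed-form least squares.

```python
# Hypothetical data lying exactly on y = 4 - x
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sst_val = sum((yi - y_bar) ** 2 for yi in y)

# Model without a constant term: y = b*x
b0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
sse_no_const = sum((yi - b0 * xi) ** 2 for xi, yi in zip(x, y))
r2_no_const = 1.0 - sse_no_const / sst_val

# Model with a constant term: y = a + b*x
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
a1 = y_bar - b1 * x_bar
sse_const = sum((yi - (a1 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r2_const = 1.0 - sse_const / sst_val

print(r2_no_const, r2_const)
```

Without the constant term, SSE exceeds SST and R-square is about -2.43, which is worse than a horizontal line at the mean. Adding the constant term recovers the exact line and an R-square of 1.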

Degrees of Freedom Adjusted R-Square

This statistic uses the R-square statistic defined above, and adjusts it based on the residual degrees of freedom. The residual degrees of freedom is defined as the number of response values n minus the number of fitted coefficients m estimated from the response values.

v = n – m

v indicates the number of independent pieces of information involving the n data points that are required to calculate the sum of squares. Note that if parameters are bounded and one or more of the estimates are at their bounds, then those estimates are regarded as fixed, and the degrees of freedom is increased by the number of such parameters. For example, with n = 20 response values and m = 3 fitted coefficients, v = 17; if one of those estimates is at its bound, v = 18.

The adjusted R-square statistic is generally the best indicator of the fit quality when you compare nested models, that is, a series of models in which each one adds coefficients to the previous model. It is defined as

$\text{adjusted R-square} = 1 - \frac{SSE\,(n - 1)}{SST\,(v)}$

The adjusted R-square statistic can take on any value less than or equal to 1, with a value closer to 1 indicating a better fit. Negative values can occur when the model contains terms that do not help to predict the response.
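To see how the adjustment penalizes extra coefficients, the following Python sketch evaluates the formula for two nested models. The SSE and SST inputs are illustrative numbers chosen for this example, not results from a real fit, and the function name `adjusted_r_square` is hypothetical.

```python
def adjusted_r_square(sse, sst, n, m):
    """1 - SSE*(n - 1) / (SST*v), with v = n - m residual degrees of freedom."""
    v = n - m
    return 1.0 - (sse * (n - 1)) / (sst * v)


n = 5        # number of response values (illustrative)
sst = 38.9   # total sum of squares (illustrative)

# Base model with 2 coefficients, and a nested model with 1 more coefficient
# that lowers SSE only slightly (both SSE values are made up)
sse_base, m_base = 0.091, 2
sse_more, m_more = 0.090, 3

r2_base = 1.0 - sse_base / sst
r2_more = 1.0 - sse_more / sst
adj_base = adjusted_r_square(sse_base, sst, n, m_base)
adj_more = adjusted_r_square(sse_more, sst, n, m_more)

# A poorly fitting model with few residual degrees of freedom goes negative
adj_negative = adjusted_r_square(9.0, 10.0, 5, 4)

print(r2_base, r2_more, adj_base, adj_more, adj_negative)
```

With these inputs, R-square rises slightly when the extra coefficient is added, while adjusted R-square falls, which favors the smaller model.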

Root Mean Squared Error

This statistic is also known as the fit standard error and the standard error of the regression. It is an estimate of the standard deviation of the random component in the data, and is defined as

$RMSE = s = \sqrt{MSE}$

where MSE is the mean square error or the residual mean square

$MSE = \frac{SSE}{v}$

Just as with SSE, an MSE value closer to 0 indicates a fit that is more useful for prediction.
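The two-step definition can be sketched in Python as follows. The inputs are illustrative values rather than output from a real fit, and the function name `root_mean_squared_error` is hypothetical.

```python
import math


def root_mean_squared_error(sse, n, m):
    """RMSE = sqrt(MSE), where MSE = SSE / v and v = n - m."""
    v = n - m          # residual degrees of freedom
    mse = sse / v      # residual mean square
    return math.sqrt(mse)


# Illustrative inputs: 5 response values, 2 fitted coefficients
rmse_val = root_mean_squared_error(sse=0.091, n=5, m=2)
print(rmse_val)
```

Note that this definition divides SSE by the residual degrees of freedom v rather than by n, so it can differ from an RMSE computed as the square root of the plain average of the squared residuals.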