'sequentialfs' simply compares the mean criterion values of the candidate subsets after performing the cross-validation. Below is the algorithm described in the blog post, with the third step reworded.
- Start by testing each possible predictor one at a time. Identify the single predictor that generates the most accurate model. This predictor is automatically added to the model.
- Next, one at a time, add each of the remaining predictors to a model that includes the single best variable. Identify the variable that improves the accuracy of the model the most.
- Compare the predictive accuracy of the two models. If the new model is not more accurate than the original model by more than a specified tolerance, stop the process. If, however, the new model has better predictive accuracy, go on to search for the third-best variable.
- Repeat this process until you can't identify a new variable that improves the predictive accuracy of the model.
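The steps above can be sketched in a few lines of Python. This is only an illustration of the loop as described in the list, not the actual 'sequentialfs' implementation. The names `forward_select`, `mean_criterion` and `toy_loss` are hypothetical, the criterion is assumed to be a loss (lower is better) that has already been averaged over the cross-validation folds, and the tolerance is assumed to be compared as an absolute difference.

```python
def forward_select(n_features, mean_criterion, tol_fun=1e-6):
    """Greedy forward selection, following the steps listed above.

    mean_criterion(subset) is assumed to return the mean cross-validated
    loss for a tuple of feature indices (lower is better).
    """
    selected = []
    remaining = list(range(n_features))
    best = None  # mean criterion of the current model
    while remaining:
        # Score every candidate obtained by adding one more predictor.
        scores = {j: mean_criterion(tuple(selected + [j])) for j in remaining}
        candidate = min(scores, key=scores.get)
        # The first predictor is added automatically; later ones must
        # improve the criterion by more than the tolerance (assumed absolute).
        if selected and best - scores[candidate] <= tol_fun:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected, best


# Hypothetical criterion: each predictor lowers the loss by a fixed amount.
GAINS = {0: 1.0, 1: 4.0, 2: 0.0, 3: 2.5}

def toy_loss(subset):
    return 10.0 - sum(GAINS[j] for j in subset)

default_run = forward_select(4, toy_loss)              # stops once nothing improves
strict_run = forward_select(4, toy_loss, tol_fun=2.0)  # demands larger improvements
```

With the default tolerance the toy search keeps adding predictors until the remaining one brings no improvement. Raising `tol_fun` makes the search stop earlier, because small improvements no longer count.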
There is a 'significance test' of sorts being performed here, in the sense that each improvement in model accuracy is measured against a tolerance. The tolerance is set through the 'TolFun' field of the 'options' struct that can be passed to 'sequentialfs'. It defaults to 1e-6 for a forward search and 0 for a backward search.