MATLAB Answers

Statistical test in sequentialfs?

Alexis Moscoso Rial on 23 Oct 2017
Hello,
I'm using sequentialfs to find the most suitable subset of features for my binary classification problem. As described in the sequentialfs documentation, with 10-fold cross-validation this algorithm adds a feature if its mean criterion (averaged over the 10 folds) is the smallest among the candidate features and is also smaller than the mean criterion of the model without that feature. My question is: does sequentialfs use some kind of statistical test to compare the criterion values of the model without the feature and the model with it, or is it just a comparison of mean criterion values (if the mean criterion with n features > the mean criterion with n+1 features, then add the feature)?
Thanks!

Scott Weidenkopf on 31 Oct 2017
From the 'sequentialfs' documentation:
After computing the mean criterion values for each candidate feature subset, sequentialfs chooses the candidate feature subset that minimizes the mean criterion value. This process continues until adding more features does not decrease the criterion.
I am not sure I understand your question. What sort of statistical test are you referring to?
Alexis Moscoso Rial on 2 Nov 2017
My question arises from the fact that although adding a feature might decrease the criterion, the decrease may not be statistically significant. This could be checked with an inference test such as the t-test.
This particular issue has been pointed out in the MathWorks blog: https://blogs.mathworks.com/loren/2011/11/21/subset-selection-and-regularization/
As you can see, in the 'Introducing Sequential Feature Selection' section, one of the steps is:
"Test the two models for statistical significance. If the new model is not significantly more accurate that the original model, stop the process. If, however, the new model is statistically more significant, go and search for the third best variable."
My question is: what type of statistical analysis is used to compare the models?
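To make the distinction concrete, here is a minimal Python sketch (not MATLAB code, and not what sequentialfs does internally) contrasting the plain mean comparison with a one-sided paired t-test on the per-fold criterion values. The function names and fold values are made up for illustration, and the critical value 1.833 assumes 10 folds (9 degrees of freedom) and a 5% one-sided level. Cross-validation folds share training data, so their results are not independent and such a test is only approximate.

```python
import math
import statistics

def mean_criterion_improves(crit_without, crit_with):
    """Plain rule: add the feature if its mean criterion over the folds is lower."""
    return statistics.mean(crit_with) < statistics.mean(crit_without)

def paired_t_statistic(crit_without, crit_with):
    """Paired t statistic of the per-fold decrease in the criterion.

    Positive values mean the model WITH the feature has a lower
    (better) criterion on average across the folds.
    """
    diffs = [a - b for a, b in zip(crit_without, crit_with)]
    mean_d = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        # Every fold changed by exactly the same amount; t is unbounded.
        return math.copysign(math.inf, mean_d) if mean_d != 0 else 0.0
    return mean_d / (sd / math.sqrt(len(diffs)))

def significant_improvement(crit_without, crit_with, t_crit=1.833):
    """One-sided paired t-test; 1.833 is the 0.95 quantile of t with 9 df (10 folds)."""
    return paired_t_statistic(crit_without, crit_with) > t_crit

# Made-up per-fold misclassification rates for 10-fold cross-validation.
crit_without = [0.20, 0.22, 0.18, 0.25, 0.21, 0.19, 0.23, 0.20, 0.24, 0.22]
crit_with = [0.17, 0.24, 0.17, 0.26, 0.19, 0.22, 0.19, 0.22, 0.23, 0.22]

print(mean_criterion_improves(crit_without, crit_with))  # True  (0.211 < 0.214)
print(significant_improvement(crit_without, crit_with))  # False (t is about 0.41)
```

Here the mean criterion drops slightly, so a pure mean comparison would add the feature, while the paired test finds the per-fold differences too noisy to call the improvement significant.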


Answers (1)

Scott Weidenkopf on 3 Nov 2017
'sequentialfs' simply compares the mean criterion values of the candidate subsets after performing the cross-validation. Below is the algorithm described in the blog post, with the third step reworded.
  1. Start by testing each possible predictor one at a time. Identify the single predictor that generates the most accurate model. This predictor is automatically added to the model.
  2. Next, one at a time, add each of the remaining predictors to a model that includes the single best variable. Identify the variable that improves the accuracy of the model the most.
  3. Test the two models for predictive accuracy. If the new model does not improve on the accuracy of the original model by more than a specified tolerance, stop the process. If, however, the new model has better predictive accuracy, go and search for the third best variable.
  4. Repeat this process until you can't identify a new variable that improves the predictive accuracy of the model.
There is a 'significance test' of sorts being performed here, in the sense that the improvement in the criterion is measured against a tolerance, but it is not a statistical hypothesis test. The tolerance can be specified with the 'TolFun' field of the 'options' structure (created with statset) that can be passed to 'sequentialfs'. It defaults to 1e-6 for a forward search and 0 for a backward search.
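The steps above, including the tolerance check, can be sketched as a generic greedy forward search. This is a minimal Python sketch under stated assumptions, not the sequentialfs source: `forward_select` and `toy_criterion` are hypothetical names, the toy criterion stands in for a real cross-validated error, and the exact form of the stopping comparison (stop unless the criterion drops by more than `tol`) is an assumption for illustration.

```python
import math
from typing import Callable, List, Sequence

def forward_select(n_features: int,
                   criterion: Callable[[Sequence[int]], float],
                   tol: float = 1e-6) -> List[int]:
    """Greedy forward selection driven by a cross-validated criterion (lower is better)."""
    selected: List[int] = []
    best_crit = math.inf  # no model yet, so the first feature is always added
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        # Try adding each remaining feature to the current subset.
        scores = {j: criterion(selected + [j]) for j in candidates}
        j_best = min(scores, key=scores.get)
        # Assumed stopping rule: stop unless the criterion drops by more than tol.
        if best_crit - scores[j_best] <= tol:
            break
        selected.append(j_best)
        best_crit = scores[j_best]
    return selected

def toy_criterion(subset: Sequence[int]) -> float:
    """Hypothetical stand-in for a cross-validated error: features 0 and 2 help,
    and every extra feature costs a small penalty."""
    return 0.5 - 0.2 * (0 in subset) - 0.1 * (2 in subset) + 0.01 * len(subset)

print(forward_select(4, toy_criterion))           # [0, 2]
print(forward_select(4, toy_criterion, tol=0.1))  # [0]
```

Raising the tolerance makes the search stop earlier: with `tol=0.1` the toy run keeps only feature 0, because adding feature 2 lowers the criterion by only 0.09. Note that no per-fold variability enters the decision, only the mean criterion.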

Alexis Moscoso Rial on 4 Nov 2017
Thank you very much for the detailed answer!
Bibhavari Bandyopadhyay on 22 Jul 2019
I am using the sequentialfs function. I get an error in line 363 of the sequentialfs.m file. It says incorrect assignment due to an incorrect number of rows. Kindly help me with this as soon as possible.
