Lasso/Elastic Net feature selection with kFold crossvalidation

Question

Juliana Corlier on 18 Apr 2018

Direct link to this question

https://uk.mathworks.com/matlabcentral/answers/395821-lasso-elastic-net-feature-selection-with-kfold-crossvalidation

Commented: Tyson on 23 Jul 2018

I want to understand how Lasso/Elastic Net regression selects the final features when using kFold cross-validation and using the function: [B, stats] = lasso(featData, classData, 'CV', 10) (from the Statistics & ML toolbox).

In my understanding, if the model is trained 10 times on different subsets of the total sample, this may result in different features selected/penalized in every fold. However, the cross-validated model output does not provide any insight on the variability of those features across different folds. Is the best model simply chosen among all folds and applied to the entire training set? Or are features averaged/weighted based on their stability across folds?

There was a related question previously, but nobody ever answered it:

https://www.mathworks.com/matlabcentral/answers/125357-understanding-k-fold-cross-validation

Thanks for your help!

1 Comment

Tyson on 23 Jul 2018

This is an important thread. We are also looking for clarification on this exact question. We do not find any info about the beta values for the k-folds in the FitInfo, only a single set of beta values for each lambda. Exactly how were these betas determined?

Answer 1

Bernhard Suhm on 22 Apr 2018

Direct link to this answer

https://uk.mathworks.com/matlabcentral/answers/395821-lasso-elastic-net-feature-selection-with-kfold-crossvalidation#answer_316558

Cross-validation applies only to assessing model performance. As described in the documentation, with k-fold cross-validation the average error across the k different partitions is reported. The model itself is trained on the complete dataset that you provide to the training function, in this case "lasso".
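A minimal sketch of how one might check this, assuming the Statistics and Machine Learning Toolbox is installed. The data here are synthetic and the variable names (Bfull, Bcv, infoFull, infoCV) are made up for illustration. It fits lasso with and without 'CV' on the same data, compares the two coefficient paths, and lists the fields that cross-validation adds to FitInfo. It prints the difference rather than assuming the result.

```matlab
% Synthetic data (hypothetical): 100 observations, 8 predictors, 2 informative.
rng(0);                                   % reproducible noise and fold assignment
X = randn(100, 8);
y = 3*X(:,1) - 2*X(:,4) + 0.5*randn(100, 1);

[Bfull, infoFull] = lasso(X, y);            % no cross-validation
[Bcv,   infoCV]   = lasso(X, y, 'CV', 10);  % 10-fold cross-validation

% Compare the coefficient paths returned by the two calls.
if isequal(size(Bfull), size(Bcv))
    maxDiff = max(abs(Bfull(:) - Bcv(:)));
    fprintf('Largest coefficient difference: %g\n', maxDiff);
else
    disp('The two calls returned coefficient paths of different sizes.');
end

% Fields that the cross-validated call adds to FitInfo.
disp(setdiff(fieldnames(infoCV), fieldnames(infoFull)));

% Coefficients at the lambda with the lowest cross-validated MSE.
bestB = Bcv(:, infoCV.IndexMinMSE);
selectedFeatures = find(bestB ~= 0)';
disp(selectedFeatures);
```

If the printed difference is zero or negligible, the coefficients in B come from the fit on the complete data, and the folds contribute only the error estimates (MSE, SE) and the lambda choices stored in FitInfo.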

3 Comments

Juliana Corlier on 23 Apr 2018

Thanks for your comment. However, the linked document clearly says that the original data set is partitioned and that only a subset (not the complete data set) is used to train the model. This is repeated 10 times (in my case), so the model would be trained on a slightly different subset each time, which could result in different selected features. I get the average error part, but in my understanding the models trained in each fold are still likely to be different. Here is the quote:

"[...] This is done by partitioning a dataset and using a subset to train the algorithm and the remaining data for testing. Because cross-validation does not use all of the data to build a model, it is a commonly used method to prevent overfitting during training.

Each round of cross-validation involves randomly partitioning the original dataset into a training set and a testing set. The training set is then used to train a supervised learning algorithm and the testing set is used to evaluate its performance. This process is repeated several times and the average cross-validation error is used as a performance indicator."

Please advise if I am missing something.

Bernhard Suhm on 30 Apr 2018

You are right, and I asked internally for additional clarification. If you use the kfold argument, you don't get a "final" model back with features weighted or averaged, but pointers to all k models, whose coefficients (or selected features) may differ slightly. If they do differ, that would be a sign those features aren't very strong, so you wouldn't want them in your final model. You can get additional information on the various fitted models in the FitInfo field of the output object, but you have to analyze the variability across the different objects yourself. Alternatively, you can retrain the model without k-fold, which will give you the best features using the complete data set.

Juliana Corlier on 11 May 2018

Thanks for clarifying this! This is very helpful. I have a practical follow-up question:

I was looking for these pointers, but I can't seem to find them. In the FitInfo struct I only get coefficients for the 72 different Lambda values (which I also get if I don't run crossvalidation). I would have expected a multidimensional struct/object for different kFolds, but my FitInfo is a 1x1 struct. Any ideas on that? Many thanks!
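If the per-fold coefficients are not available in FitInfo, one way to inspect feature stability is to refit lasso on each training fold by hand. The following is only a sketch under stated assumptions: the data are synthetic, the variable names (foldB, selFreq, coefSD) are hypothetical, and the penalty is fixed at the LambdaMinMSE value from a cross-validated fit on the same partition. It is not a description of what lasso does internally.

```matlab
% Synthetic data (hypothetical): 120 observations, 8 predictors, 2 informative.
rng(1);
X = randn(120, 8);
y = 3*X(:,1) - 2*X(:,4) + 0.5*randn(120, 1);

k = 10;
c = cvpartition(size(X, 1), 'KFold', k);   % one partition, reused below

% Cross-validated fit on that partition, used here only to choose lambda.
[~, info] = lasso(X, y, 'CV', c);
lambda = info.LambdaMinMSE;

% Refit on each training fold at the chosen lambda and keep the coefficients.
foldB = zeros(size(X, 2), k);
for i = 1:k
    idx = training(c, i);                  % logical index of training rows
    foldB(:, i) = lasso(X(idx, :), y(idx), 'Lambda', lambda);
end

% Summarize stability: how often each feature is selected, and its spread.
selFreq = mean(foldB ~= 0, 2);             % fraction of folds with nonzero coefficient
coefSD  = std(foldB, 0, 2);                % standard deviation across folds
disp(table((1:size(X, 2))', selFreq, coefSD, ...
    'VariableNames', {'Feature', 'SelectionFrequency', 'CoefficientSD'}));
```

Features with a selection frequency close to 1 are chosen in nearly every fold. Features that appear in only a few folds are the unstable ones discussed above. For elastic net, the same loop could pass the 'Alpha' option to each lasso call.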
