How to implement kNN imputation on test set without data leakage?
I am using knnimpute to handle missing data for machine learning. My data is split into a training set and a test set (mTrain and mTest). Using knnimpute on the training set is easy. For the test set, however, I need the algorithm to impute missing values using the nearest neighbors from the training set only, to prevent data leakage. How can I apply knnimpute to the test set in this way? Does anybody have an idea how to code that?
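A minimal sketch of the idea in Python, for illustration only. It does not call knnimpute or reproduce its exact behavior. For every missing entry in a test row, donors are searched only among training rows, so no test row ever influences another. The function name `knn_impute_from_reference` and the toy `m_train`/`m_test` matrices are hypothetical stand-ins for mTrain and mTest. The sketch assumes that rows are observations, that NaN marks a missing value, that distance is the root-mean-square difference over features observed in both rows, and that features are already on comparable scales. If they are not, standardize using statistics computed on the training set only.

```python
import math


def knn_impute_from_reference(reference, query, k=1):
    """Fill NaNs in each query row using the k nearest rows of `reference`.

    Hypothetical helper: only `reference` (e.g. the training set) is used as
    the neighbor pool, so no information from other query (test) rows leaks in.
    Entries with no usable donor are left as NaN.
    """
    imputed = []
    for row in query:
        observed = [j for j, v in enumerate(row) if not math.isnan(v)]
        new_row = list(row)  # copy so the caller's data is not mutated
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            candidates = []
            for ref in reference:
                if math.isnan(ref[j]):
                    continue  # this reference row cannot supply feature j
                shared = [c for c in observed if not math.isnan(ref[c])]
                if not shared:
                    continue  # no common features to measure distance on
                # Root-mean-square difference over the shared observed features
                d = math.sqrt(sum((row[c] - ref[c]) ** 2 for c in shared) / len(shared))
                candidates.append((d, ref[j]))
            if candidates:
                candidates.sort(key=lambda t: t[0])
                nearest = candidates[:k]
                # Average the donor values of the k closest reference rows
                new_row[j] = sum(v for _, v in nearest) / len(nearest)
        imputed.append(new_row)
    return imputed


if __name__ == "__main__":
    nan = float("nan")
    m_train = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0],
               [7.0, 8.0, 9.0]]
    m_test = [[1.1, nan, 3.1],
              [6.9, 8.2, nan]]
    print(knn_impute_from_reference(m_train, m_test, k=1))
    # prints [[1.1, 2.0, 3.1], [6.9, 8.2, 9.0]]
```

Because the neighbor pool is fixed to the training set, each test row is imputed independently of the others. This also matches deployment, where new samples often arrive one at a time and there is no "test set" to borrow neighbors from.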
Comment by Zexi Yang on 17 Aug 2022:
Why do you have to impute the test set using nearest neighbors from the training set? You can just use nearest neighbors from the test set without causing any data leakage. Data leakage occurs when you impute the training set using data from the test set.