Matlab: Error using classreg.learning.FitTemplate/fit with hyperparameter optimization of SVM
I am using Bayesian optimization (the bayesopt function) in MATLAB to tune the hyperparameters of an SVM classifier. The goal is to minimize the 10-fold cross-validation error. Here is the code that I use:
KernelFlag = 1;
c = cvpartition(size(XTrain,1),'KFold',10);
sigma = optimizableVariable('sigma',[1e-5,1e5],'Transform','log');
box = optimizableVariable('box',[1e-5,1e5],'Transform','log');
polyOrder = optimizableVariable('polyOrder',[2,4]);
fun = @(z)mysvmfunTest(z,XTrain,yTrain,c,classNames,KernelFlag);
results = bayesopt(fun,[sigma,box,polyOrder],'IsObjectiveDeterministic',true,...
    'PlotFcn',{@plotMinObjective},...
    'AcquisitionFunctionName','expected-improvement-plus');
and here is mysvmfunTest:
function [objective] = mysvmfunTest(z,X,Y,c,classNames,KernelFlag)
if KernelFlag == 1
  t = templateSVM('Standardize',1,'KernelFunction','RBF',...
      'BoxConstraint',z.box,'KernelScale',z.sigma,'RemoveDuplicates',true);
elseif KernelFlag == 2
  t = templateSVM('Standardize',1,'KernelFunction','polynomial',...
  'BoxConstraint',z.box,'KernelScale',z.sigma,'PolynomialOrder',z.polyOrder,...
  'RemoveDuplicates',true);
else
  t = templateSVM('Standardize',1,'KernelFunction','linear',...
  'BoxConstraint',z.box,'KernelScale',z.sigma,...
  'RemoveDuplicates',true);
end
SVMModel = fitcecoc(X,Y,'Learners',t,'ClassNames',classNames); 
cvModel = crossval(SVMModel,'CVPartition',c);
objective = kfoldLoss(cvModel);
I have used this code before with different datasets. However, when I run it on a new dataset, it throws an error:
Error using classreg.learning.FitTemplate/fit (line 249) You passed a cvpartition object for 27152 observations, but the input data have only 10395 observations. Some observations may have been removed because they have NaN values for all predictors, missing response values or zero weights. When cross-validating an existing object, consider using the RowsUsed property to determine what size partition is required.
I checked all the data, and there are no NaN or missing values. I even removed every sample that has any feature between 0 and 0.01 (all my features are positive), but I still get the same error. My guess is that some samples are so close to each other that many observations get removed, but I am not sure that is the case. Any idea where this error might come from, or any suggestion on how to solve it?
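For reference, the first two conditions named in the error message (all-NaN predictor rows and missing responses) can be counted directly. This is only a minimal sketch on made-up stand-in data with numeric labels; substitute your own XTrain and yTrain. Zero weights do not apply here, since the code above passes no weights.

```matlab
% Hypothetical stand-in data (not from the original post).
XTrain = [1 2; NaN NaN; 3 NaN; 4 5];
yTrain = [1; 2; NaN; 1];

% Rows where every predictor is NaN.
allNaNRows = all(isnan(XTrain), 2);

% Rows whose response is missing (NaN for numeric labels).
missingY = ismissing(yTrain);

% Rows that fail either check.
nDropped = nnz(allNaNRows | missingY);
fprintf('%d of %d rows fail the checks\n', nDropped, size(XTrain, 1));
```

If this count is zero on the real data, the removed observations must come from somewhere else.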
Accepted Answer
Ilya on 26 Oct 2018
You are passing ClassNames to fitcecoc - are your ClassNames a subset of all class names you have in yTrain?
Train one ECOC model using
SVMModel = fitcecoc(XTrain,yTrain,'Learners',t,'ClassNames',classNames);
and look at the size of property X in SVMModel. Does it have as many rows as XTrain does?
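The subset question can also be checked without training anything. This is a rough sketch on made-up data, assuming categorical labels and a classNames list that covers only two of three classes. The variable names keep, XKeep and yKeep are illustrative and not part of the original code.

```matlab
% Hypothetical stand-in data: three classes, only two listed in classNames.
rng(0);
XTrain = rand(30, 2);
yTrain = categorical(repmat({'a'; 'b'; 'c'}, 10, 1));
classNames = {'a', 'b'};   % assumed subset, for illustration only

% Observations whose label is not in classNames are left out of training.
keep = ismember(yTrain, classNames);
fprintf('%d of %d observations belong to the listed classes\n', ...
    nnz(keep), numel(keep));

% Size the partition to the rows that will actually be used.
XKeep = XTrain(keep, :);
yKeep = yTrain(keep);
c = cvpartition(nnz(keep), 'KFold', 10);
```

If nnz(keep) on the real data matches the smaller count in the error message, passing XKeep, yKeep and this c to mysvmfunTest should keep the partition and the training data the same size.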
More Answers (1)
Don Mathis on 26 Oct 2018
Edited: Don Mathis on 26 Oct 2018
Maybe your use of 'RemoveDuplicates' is causing observations to be removed?
I ran your code on some synthetic data that has no duplicates in XTrain and it works fine:
XTrain = rand(1000,10);
yTrain = categorical(round(XTrain(:,1)*3));
classNames = categories(yTrain);
KernelFlag = 1;
c           = cvpartition(size(XTrain,1),'KFold',10);
sigma       = optimizableVariable('sigma',[1e-5,1e5],'Transform','log');
box         = optimizableVariable('box',[1e-5,1e5],'Transform','log');
polyOrder   = optimizableVariable('polyOrder',[2,4]);
fun         = @(z)mysvmfunTest(z,XTrain,yTrain,c,classNames,KernelFlag);
results     = bayesopt(fun,[sigma,box,polyOrder],'IsObjectiveDeterministic',true,...
    'PlotFcn',{@plotMinObjective},...
    'AcquisitionFunctionName','expected-improvement-plus');
function [objective] = mysvmfunTest(z,X,Y,c,classNames,KernelFlag)
if KernelFlag == 1
    t = templateSVM('Standardize',1,'KernelFunction','RBF',...
        'BoxConstraint',z.box,'KernelScale',z.sigma,'RemoveDuplicates',true);
elseif KernelFlag == 2
    t = templateSVM('Standardize',1,'KernelFunction','polynomial',...
        'BoxConstraint',z.box,'KernelScale',z.sigma,'PolynomialOrder',z.polyOrder,...
        'RemoveDuplicates',true);
else
    t = templateSVM('Standardize',1,'KernelFunction','linear',...
        'BoxConstraint',z.box,'KernelScale',z.sigma,...
        'RemoveDuplicates',true);
end
SVMModel = fitcecoc(X,Y,'Learners',t,'ClassNames',classNames);
cvModel = crossval(SVMModel,'CVPartition',c);
objective = kfoldLoss(cvModel);
end
By the way, it's probably best to declare polyOrder to be an integer:
polyOrder   = optimizableVariable('polyOrder',[2,4],'Type','integer');
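To test the duplicates idea on real data, you can count the repeated rows in the predictor matrix. This is a minimal sketch on a made-up matrix. It looks only at the predictors, which may not match exactly how 'RemoveDuplicates' decides what counts as a duplicate.

```matlab
% Hypothetical stand-in data; rows 1 and 3 are identical.
XTrain = [1 2; 3 4; 1 2; 5 6];

% unique(..., 'rows') keeps one copy of each distinct row.
nUnique = size(unique(XTrain, 'rows'), 1);
nDuplicates = size(XTrain, 1) - nUnique;
fprintf('%d duplicate rows out of %d\n', nDuplicates, size(XTrain, 1));
```

A count near zero would suggest that duplicates are not what shrinks the data.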