Can I have checkpoints in Bayesian optimization for tuning hyperparameters of a neural network?

Hi there,
I'm trying to implement Bayesian Optimization on a BiLSTM network.
I'm planning to run this code on a university cluster, but they give us a maximum of 2 days (48 hours) per job; if it runs beyond that, they automatically kill it, which would mean wasted time and resources for me and for the other students waiting in the queue.
I was wondering if it would be possible to implement some kind of checkpoint for bayesopt() so it can continue from where the job left off.
Basically, what I'm asking is: can I save my previous runs (the variables bayesopt() observed), load them in my next run, and continue from where it stopped?
I have not seen any documentation related to this (I may have missed it).
My understanding of bayesopt() is that the more points it observes, the more accurate its answers become. Is this right? If so, I might want to run it for more than 2 days. The number of cores I can request is limited (the more I request, the longer I wait in the queue), and from what I'm estimating, the most complex combination of variables can take between 40 minutes and 1 hour to train and return a result (obviously, not every combination will take that long).
Any help is appreciated.
Thank you.

 Accepted Answer

Currently, there is no checkpointing argument. However, you can use the 'OutputFcn' argument along with the 'SaveFileName' argument to save to a file, and the resume function to restart the process, as follows:
x1 = optimizableVariable('x1',[-5,5]);
x2 = optimizableVariable('x2',[-5,5]);
fun = @rosenbrocks;
if exist('BayesoptResults.mat','file')
    load('BayesoptResults.mat');
    results = resume(BayesoptResults, ...
        'SaveFileName', 'BayesoptResults.mat', ...
        'OutputFcn', {@saveToFile});
else
    results = bayesopt(fun, [x1, x2], ...
        'AcquisitionFunctionName', 'expected-improvement-plus', ...
        'SaveFileName', 'BayesoptResults.mat', ...
        'OutputFcn', {@saveToFile});
end

function f = rosenbrocks(x)
    f = 100*(x.x2 - x.x1^2)^2 + (1 - x.x1)^2;
end
Note that this saves to the file on every iteration, so you might want to replace saveToFile with a custom function that saves only occasionally, for performance reasons.
The relevant docs are available here: resume and bayesopt.
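A custom output function that saves less often could look like the sketch below. This is only an illustration: the function name saveEveryN, the interval of 5, and the file name are made up; the (results, state) signature is the one bayesopt output functions use, and the variable is saved under the name BayesoptResults so that the load/resume pattern above still works.
```matlab
function stop = saveEveryN(results, state)
% Checkpoint the BayesianOptimization object only every 5th evaluation,
% instead of on every iteration as saveToFile does.
stop = false;  % never request early termination
if strcmp(state, 'iteration') && mod(results.NumObjectiveEvaluations, 5) == 0
    BayesoptResults = results; %#ok<NASGU> % same variable name that resume/load expect
    save('BayesoptResults.mat', 'BayesoptResults');
end
end
```
You would pass it as 'OutputFcn', {@saveEveryN} in place of {@saveToFile}.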

10 Comments

I’ll look into it, Aditya, thank you so much. So, does this method allow me to continue from where I stopped if I manually stop the run or close the MATLAB window entirely? That’s what will happen automatically after 48 hours. Thank you.
To be more clear, I’m trying to avoid re-running the same combinations of parameters from previous runs. Thank you.
Yes, as long as the file doesn't get deleted, you should be able to resume the process from where it stopped.
Great! Thank you so much Aditya, I really appreciate it. I haven’t had time to try it yet but, I’ll surely post an update on how it went.
I'm just here to say that it really worked. Thank you very much Aditya, it works great!
It even kept the points that were previously evaluated and did not re-evaluate them, and when plotFcn was active, I could see the previously evaluated points in the plot along with the new ones.
The only additional thing that would be nice to have is saving the verbose output from the command window (and appending each subsequent run to it) so that I can see the full picture, but I think I can find a solution for that (probably a diary file?).
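The diary idea mentioned above might look like this minimal sketch (the log file name is illustrative); diary appends to an existing file, so the verbose output of successive jobs accumulates in one place:
```matlab
% Append this run's command-window output to a shared log file.
diary('bayesopt_log.txt')   % appends if the file already exists
% ... run or resume bayesopt here ...
diary off                   % stop logging before the script exits
```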
But I should give a warning that it did not work the second time with the spmd option for parallel bayesopt (in case anyone is using it, or any of the similar options documented for parallel bayesopt), even though the file was saved.
The error was:
ObjectiveFcn is a parallel.pool.Constant, but its 'Value' property does not exist on the current pool.
When I checked inside C = parallel.pool.Constant(valErrorFun), I noticed that C.Value "does not exist", probably because it is a function handle. However, when I first ran it using the automatic options and then ran the same code using the spmd option for parallel, it worked fine.
For now, I can't tell exactly why this happened, but I think there is an easy solution that I just don't know about yet.
Regardless, the solution that Aditya gave worked fine.
Thank you again Aditya.
I was unable to reproduce the issue. The following code works for me:
% Define the optimization variables at the client (they are used in the
% bayesopt call below)
x1 = optimizableVariable('x1',[-5,5]);
x2 = optimizableVariable('x2',[-5,5]);
spmd
    fun = makeFun();
end
C = parallel.pool.Constant(fun);
if exist('bayespar.mat','file')
    load('bayespar.mat');
    results = resume(BayesoptResults, ...
        'SaveFileName', 'bayespar.mat', ...
        'OutputFcn', {@saveToFile});
else
    results = bayesopt(C, [x1, x2], ...
        'AcquisitionFunctionName', 'expected-improvement-plus', ...
        'SaveFileName', 'bayespar.mat', ...
        'UseParallel', true, ...
        'OutputFcn', {@saveToFile});
end

function f = makeFun()
    f = @rosenbrocks;
    function f = rosenbrocks(x)
        f = 100*(x.x2 - x.x1^2)^2 + (1 - x.x1)^2;
    end
end
Can you provide the release info (the output of the version command), and any sample code that reproduces the issue?
Hi Aditya,
I apologize for not getting back to you sooner. I was working on something else that was taking a lot of time and didn't notice how much time had passed. I finally posted a question on MathWorks about it.
Here is the code with the issue:
clear;
clc;
close all;
spmd
valErrorFun = makeObjFcn();
end
C = parallel.pool.Constant(valErrorFun);
% Classical Neural Network - Hyperparameter tuning using Bayesian Optimization
simplefitInputs = [0 0.0498 0.0996 0.1550 0.2103 0.2657 0.3210 0.3825 ...
0.4440 0.5123 0.5807 0.6566 0.7409 0.8347 0.9388 1.0674 1.2102 1.3690 ...
1.5453 1.7041 1.8469 1.9898 2.1326 2.2755 2.4183 2.5612 2.7041 2.8469 ...
2.9898 3.1326 3.2755 3.4342 3.5929 3.7693 3.9457 4.1220 4.2984 4.4748 ...
4.6511 4.8275 4.9862 5.1450 5.3037 5.4466 5.5894 5.7323 5.8910 6.0674 ...
6.2437 6.3866 6.5295 6.6452 6.7389 6.8233 6.8992 6.9675 7.0290 7.0905 ...
7.1458 7.2012 7.2565 7.3119 7.3617 7.4115 7.4613 7.5167 7.5720 7.6273 ...
7.6827 7.7442 7.8057 7.8740 7.9499 8.0343 8.1384 8.2813 8.4577 8.6005 ...
8.7162 8.8100 8.8943 8.9702 9.0461 9.1145 9.1828 9.2511 9.3195 9.3878 ...
9.4637 9.5396 9.6240 9.7177 9.8334 9.9763];
simplefitTargets = [5.0472 5.3578 5.6632 5.9955 6.3195 6.6343 6.9389 ...
7.2645 7.5753 7.9020 8.2078 8.5216 8.8366 9.1432 9.4289 9.7007 9.8995 ...
10.0000 9.9786 9.8589 9.6876 9.4722 9.2283 8.9701 8.7099 8.4579 8.2217 ...
8.0065 7.8153 7.6494 7.5084 7.3793 7.2770 7.1912 7.1319 7.0972 7.0866 ...
7.1014 7.1440 7.2169 7.3100 7.4287 7.5699 7.7102 7.8544 7.9901 8.1120 ...
8.1811 8.1424 8.0056 7.7556 7.4618 7.1617 6.8445 6.5222 6.2041 5.8970 ...
5.5721 5.2664 4.9500 4.6250 4.2937 3.9920 3.6889 3.3863 3.0529 2.7252 ...
2.4056 2.0968 1.7695 1.4619 1.1469 0.8345 0.5391 0.2564 0.0263 0 0.1787 ...
0.4413 0.7207 1.0154 1.3092 1.6244 1.9214 2.2266 2.5356 2.8438 3.1469 ...
3.4723 3.7799 4.0938 4.3986 4.6956 4.9132];
%%Choose Variables to Optimize
minHiddenLayerSize = 10;
maxHiddenLayerSize = 20;
hiddenLayerSizeRange = [minHiddenLayerSize maxHiddenLayerSize];
optimVars = [
optimizableVariable('Layer1Size',hiddenLayerSizeRange,'Type','integer')
optimizableVariable('Layer2Size',hiddenLayerSizeRange,'Type','integer')];
%%Perform Bayesian Optimization
%ObjFcn = makeObjFcn(simplefitInputs, simplefitTargets);
% Print command window results into a text file
% Save to file and resume from where it is left
if exist('BayesoptResults.mat', 'file')
    load('BayesoptResults.mat');
    BayesObject = resume(BayesoptResults, 'SaveFilename', 'BayesoptResults.mat', 'OutputFcn', {@saveToFile});
else
    BayesObject = bayesopt(C, optimVars, ...
        "MaxObj", 30, ...
        "PlotFcn", [], ...
        "MaxTime", 8*60*60, ...
        "IsObjectiveDeterministic", false, ...
        "UseParallel", true, ...
        'SaveFileName', 'BayesoptResults.mat', 'OutputFcn', {@saveToFile});
end
%%Evaluate Final Network
bestIdx = BayesObject.IndexOfMinimumTrace(end);
fileName = BayesObject.UserDataTrace{bestIdx};
load(fileName);
YPredicted = net(simplefitInputs);
testError = perform(net,simplefitTargets,YPredicted);
testError
valError
%%etc.
% ...
%%Objective Function for Optimization
function ObjFcn = makeObjFcn()
%%Input-Output Fitting with a Neural Network and Bayesian Optimization
%%Prepare Data
XTrain = [0 0.0498 0.0996 0.1550 0.2103 0.2657 0.3210 0.3825 ...
0.4440 0.5123 0.5807 0.6566 0.7409 0.8347 0.9388 1.0674 1.2102 1.3690 ...
1.5453 1.7041 1.8469 1.9898 2.1326 2.2755 2.4183 2.5612 2.7041 2.8469 ...
2.9898 3.1326 3.2755 3.4342 3.5929 3.7693 3.9457 4.1220 4.2984 4.4748 ...
4.6511 4.8275 4.9862 5.1450 5.3037 5.4466 5.5894 5.7323 5.8910 6.0674 ...
6.2437 6.3866 6.5295 6.6452 6.7389 6.8233 6.8992 6.9675 7.0290 7.0905 ...
7.1458 7.2012 7.2565 7.3119 7.3617 7.4115 7.4613 7.5167 7.5720 7.6273 ...
7.6827 7.7442 7.8057 7.8740 7.9499 8.0343 8.1384 8.2813 8.4577 8.6005 ...
8.7162 8.8100 8.8943 8.9702 9.0461 9.1145 9.1828 9.2511 9.3195 9.3878 ...
9.4637 9.5396 9.6240 9.7177 9.8334 9.9763];
YTrain = [5.0472 5.3578 5.6632 5.9955 6.3195 6.6343 6.9389 ...
7.2645 7.5753 7.9020 8.2078 8.5216 8.8366 9.1432 9.4289 9.7007 9.8995 ...
10.0000 9.9786 9.8589 9.6876 9.4722 9.2283 8.9701 8.7099 8.4579 8.2217 ...
8.0065 7.8153 7.6494 7.5084 7.3793 7.2770 7.1912 7.1319 7.0972 7.0866 ...
7.1014 7.1440 7.2169 7.3100 7.4287 7.5699 7.7102 7.8544 7.9901 8.1120 ...
8.1811 8.1424 8.0056 7.7556 7.4618 7.1617 6.8445 6.5222 6.2041 5.8970 ...
5.5721 5.2664 4.9500 4.6250 4.2937 3.9920 3.6889 3.3863 3.0529 2.7252 ...
2.4056 2.0968 1.7695 1.4619 1.1469 0.8345 0.5391 0.2564 0.0263 0 0.1787 ...
0.4413 0.7207 1.0154 1.3092 1.6244 1.9214 2.2266 2.5356 2.8438 3.1469 ...
3.4723 3.7799 4.0938 4.3986 4.6956 4.9132];
ObjFcn = @valErrorFun;
function [valError,cons,fileName] = valErrorFun(optVars)
% Solve an Input-Output Fitting problem with a Neural Network
% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. Suitable in low memory situations.
trainFcn = 'trainlm'; % Levenberg-Marquardt backpropagation.
% Create a Fitting Network
layer1_size = optVars.Layer1Size;
layer2_size = optVars.Layer2Size;
hiddenLayerSizes = [layer1_size layer2_size];
net = fitnet(hiddenLayerSizes,trainFcn);
% Setup Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
net.trainParam.showWindow = false;
net.trainParam.showCommandLine = false;
[net,~] = train(net,XTrain,YTrain);
% Test the Network
YPredicted = net(XTrain);
valError = perform(net,YTrain,YPredicted);
fileName = num2str(valError) + ".mat";
save(fileName,'net','valError')
cons = [];
end
end
The error for me is:
Error using bayesoptim.BayesoptParallel/ensureObjFcnIsOnWorkers (line 82)
ObjectiveFcn is a parallel.pool.Constant, but its 'Value' property does not exist on the current pool.
Error in bayesoptim.BayesoptParallel (line 42)
this = ensureObjFcnIsOnWorkers(this, objFcn, Verbose);
Error in BayesianOptimization/initializeParallel (line 2002)
this.Parallel = bayesoptim.BayesoptParallel(this.ObjectiveFcn, this.PrivOptions.NumCoupledConstraints, this.PrivOptions.Verbose);
Error in BayesianOptimization/resume (line 444)
Results = initializeParallel(Results);
Error in Bayesian_optimization_classical_NN (line 52)
BayesObject = resume(BayesoptResults, 'SaveFilename', 'BayesoptResults.mat', 'OutputFcn', {@saveToFile});
"Can you provide the release info(output of version command)": The matlab release is R2020a.
I wasn't able to reproduce the issue with either R2020a or R2020b.
That's fine.
Your suggested method still works in general (automatic) and that's what I need for the time being.
I really appreciate your answering my question; it was a life saver, and I'm forever grateful for it.
I also really appreciate your trying to reproduce the issue, even though it was not successful. There may be other, non-code-related reasons for the issue that I can't pinpoint right now, but at least it's good to know it worked fine for you. This knowledge might help me in the future.
Have a great day.


More Answers (1)

There is one other possible solution to your problem: surrogateopt from Optimization Toolbox™ can checkpoint automatically. It does not perform Bayesian optimization, but surrogate optimization is closely related and might be similar enough for your purposes.
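A minimal sketch of that checkpoint/resume pattern, assuming a toy objective and an illustrative checkpoint file name (surrogateopt's CheckpointFile option writes the state; passing the checkpoint file back to surrogateopt resumes it):
```matlab
if exist('surrogate_check.mat', 'file')
    % A checkpoint exists: resume the optimization from where it stopped
    [x, fval] = surrogateopt('surrogate_check.mat');
else
    % First run: enable checkpointing and start fresh
    opts = optimoptions('surrogateopt', 'CheckpointFile', 'surrogate_check.mat');
    [x, fval] = surrogateopt(@(x) sum(x.^2), [-5 -5], [5 5], opts);
end
```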
Alan Weiss
MATLAB mathematical toolbox documentation

2 Comments

Thank you so much Mr. Weiss,
I wasn't aware of surrogateopt option and its ability to checkpoint automatically.
I will have to look into it as well and see which one works best for me.
Also, I noticed that when I enable parallel in bayesopt, it uses multiple cores for each iteration (I initially thought it would send each run to a different core simultaneously, i.e., observe multiple points at once). I will look into surrogateopt, but do you by any chance know if it has a parallel option and whether it works similarly to bayesopt?
Thank you.
Reference page: surrogateopt
Algorithm description, including parallel: Surrogate Optimization Algorithm
Options description: Surrogate Optimization Options
Alan Weiss
MATLAB mathematical toolbox documentation

