MATLAB Answers


Is there a limit to the amount of data MBC toolbox can handle?

Asked by Nicholas
on 23 Mar 2016
Latest activity: Answered by Ian Noell on 15 Apr 2016
I'm currently building an engine model with 9 inputs using the Model-Based Calibration Toolbox. To get a good fit I've included 10,000 data points. The model has now been "Building response model..." for several hours. Is there an upper limit to the amount of data it can handle, or will it eventually converge? I don't mind letting the model run for days as long as I get a good fit out of it at the end!
Thank you

  3 Comments

Hi Nicholas,
What version of MATLAB and which MBC model type are you using? I would expect MBC to handle a dataset of this size, but the fit time will depend on the model type, the fit options you are using, and the amount of memory you have.
In particular, there are some options for Gaussian Process Models (available from R2015b) aimed at fitting larger datasets that can improve the fitting time. The default GPM fit algorithm changes when the dataset has more than 2000 points; you can find details in the documentation for fitrgp. I can also make suggestions about fitting other model types, such as RBFs, if needed.
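For reference, here is a minimal sketch of the large-dataset options referred to above. This is not the MBC workflow itself, just the underlying fitrgp call from the Statistics and Machine Learning Toolbox, and it assumes X is a 10000-by-9 matrix of inputs and y the response vector:
    % Minimal sketch, assuming X (10000-by-9 inputs) and y (responses) are in the workspace.
    % For more than 2000 points, fitrgp defaults to the 'sd' (subset of data) fit method;
    % these name-value pairs make that choice explicit and control the active-set size.
    gpm = fitrgp(X, y, ...
        'FitMethod', 'sd', ...        % subset-of-data approximation for large datasets
        'PredictMethod', 'sd', ...
        'ActiveSetSize', 2000, ...    % number of points used in the approximation
        'Standardize', true);

    % Check the in-sample fit.
    yhat = predict(gpm, X);
    rmse = sqrt(mean((y - yhat).^2));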
Let me know if you have any further questions,
Ian
Hi Ian,
Thank you for your answer. I'm using MBC Toolbox version 4.8.1 on MATLAB R2015a; is there any way I can get GPM fits without needing R2015b?
I've tried RBFs and have the same problem; the model does not find a solution even after 24 hours of running. Is there any documentation that recommends the best type of model based on the data available?
Many thanks,
Nicholas
Hi Nicholas,
You need R2015b to use GPM. If you use RBFs, you can try the Advanced button on the Model Setup dialog; there is help for the advanced options in the MBC documentation.
Some useful options include:
Maximum number of centers: min(nObs/3,1000)
Percentage of data to be used as centers: min(100,(2000/nObs)*100)
For a dataset of 10000 points, these defaults result in 2000 randomly selected points being considered as candidate centers, from which up to 1000 centers will be chosen. You could try reducing the percentage to be equivalent to 1000 points: min(100,(1000/nObs)*100). See the worked example below.
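As a quick worked check (a sketch for this dataset size, not code from the thread), here is what those defaults give for 10000 points:
    % Worked example of the default RBF center settings for nObs = 10000.
    nObs = 10000;
    maxCenters   = min(nObs/3, 1000);               % maximum number of centers -> 1000
    pctAsCenters = min(100, (2000/nObs)*100);       % percentage of data used as centers -> 20
    nCandidates  = round(nObs*pctAsCenters/100);    % candidate centers considered -> 2000

    % Suggested reduction so only ~1000 points are considered as candidates:
    pctReduced   = min(100, (1000/nObs)*100);       % -> 10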
Other options you could explore are reducing the number of trials, reducing the number of zooms, or changing the lambda algorithm directly.
Feel free to message me directly if you want more advice on this.
Ian


2 Answers

Answer by Ian Noell on 15 Apr 2016

After discussing this question offline with Nicholas, we identified that fitting the convex hull boundary model was taking a very long time with large datasets and a large number of inputs. A convex hull boundary model is fitted by default from R2014b onwards; you can uncheck the Fit boundary model option in the Fit Models dialog or wizard.
In R2015b we changed the default boundary model to pairwise convex hulls when there are more than 10 inputs or more than 2000 data points. The R2015b release notes provide details.



Answer by Ian Noell on 31 Mar 2016

Hi Nicholas,
Please see my answer in the comments.
Ian

  1 Comment

Hi Ian,
Thanks, I didn't know how to reply to the comment so I left a new comment below.
Thanks,
Nicholas
