fmincon: less optimal results with user supplied analytic gradient?
Hi there, I have a question concerning the fmincon optimization procedure in general: I am minimizing a very complicated (unfortunately not necessarily convex or continuous) functional using fmincon, currently with the interior-point algorithm.
Until a few days ago, I used finite differencing to estimate the gradients. Since I had an analytic expression for the functional, I went through the tedious work of deriving an analytic expression for the gradient, mainly to accelerate the optimization. The computation is indeed significantly faster, but the final function value is larger, i.e. less optimal, which I don't understand. I have already compared the finite-difference and analytic gradients, and that does not seem to be the problem (they agree to ~1e-6).
Do you know of any parameters I should adjust to tune the algorithm for user-supplied analytic gradients, or can you imagine where the problem might be? Thanks a lot for your help. And by the way: thanks a lot to the community in general. Although this is my first active participation, I have been using the forums extensively and successfully for years... :-)
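For reference, a minimal sketch of the setup being described: supplying the analytic gradient as a second output of the objective and asking fmincon to cross-check it against finite differences. The objective myFun here is a hypothetical placeholder, not the actual functional; option names follow the R2013-era interface (in current releases, GradObj and DerivativeCheck correspond to SpecifyObjectiveGradient and CheckGradients).

```matlab
% Contents of myFun.m (hypothetical placeholder objective, in its own file):
function [f, g] = myFun(x)
    f = sum(x.^2);   % placeholder objective value
    g = 2*x;         % its analytic gradient
end

% Calling script:
opts = optimoptions('fmincon', ...
    'Algorithm',       'interior-point', ...
    'GradObj',         'on', ...   % use the user-supplied gradient
    'DerivativeCheck', 'on');      % compare it against finite differences
x0 = [3; -2];
[x, fval] = fmincon(@myFun, x0, [], [], [], [], [], [], [], opts);
```

With DerivativeCheck on, fmincon reports the maximum discrepancy between the supplied and finite-difference gradients before starting, which is the same comparison described above (~1e-6 agreement).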
Best Mathias
4 Comments
unfortunately not necessarily convex and continuous
Maybe not convex, but the function and the constraints are required to be continuous (and twice differentiable).
As for the rest, we need to see the difference in output that you're talking about and how significant it is. One certainly doesn't expect an identical solution to the finite differencing case.
the function only becomes badly conditioned (noncontinuous) at the border of the search space.
Not sure what you mean by the "search space". Even when constraints are specified, fmincon searches over all of R^n, which has no borders. Constraints are only satisfied asymptotically.
Anyway, we still need to see more information. How much do the G_opts differ in each case? Why should I consider 0.0147 a large final value for the gradient? Maybe it started at 1e7. Also, you said the finite differencing gave "almost perfect results", yet in what you show above it stopped with a much larger max gradient 2.130e+02.
Similarly, why should I consider the difference between 9.62976 and 10.15389 significant? Maybe the typical/starting value of the function was 1e20.
It would help to actually show your objective function and constraints, and the code you used to initialize and call fmincon.
Answers (1)
My best appraisal, without seeing the details of your objective function, is that you have unbounded (or very large) second derivatives at/near your solution. That's the only thing that explains to me why the algorithm starts to take very tiny Newton-like steps even in regions where the gradients are still pretty large. Note that this happens both with the finite differencing and analytical case. It doesn't appear that either version successfully finished the optimization.
8 Comments
Mathias
on 13 Oct 2013
The result still doesn't look too great to me. The first-order optimality measure still seems suspiciously large, and it doesn't look like you traveled very far during the optimization. It also bothers me that you were getting super-large Hessian values, even with the default bfgs method. We still haven't gotten to the bottom of why they were as large as they were. Did you check that they got smaller when you switched to lbfgs?
The only way to test these things anyway is on simulated data sets where you know a priori what the solution should be.
A few other notes.
- For the interior-point algorithm, the documentation recommends setting the Hessian option to 'fin-diff-grads' when you supply the analytic gradient, as you are doing, instead of 'lbfgs'.
- Your TolCon = 2.5e-01 seems like an oddly large choice. No idea why you didn't keep the default value. Since you're safely in the interior, though, I guess it doesn't matter.
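The two suggestions above can be combined in one options call. This is a sketch using the R2013-era option names; note that the documentation requires SubproblemAlgorithm to be 'cg' when the Hessian is approximated by 'fin-diff-grads', and the TolCon value shown is simply the solver default rather than 2.5e-01.

```matlab
% Sketch: interior-point with a user-supplied gradient and a Hessian
% approximated by finite differences of that gradient.
opts = optimoptions('fmincon', ...
    'Algorithm',           'interior-point', ...
    'GradObj',             'on', ...              % analytic gradient supplied
    'Hessian',             'fin-diff-grads', ...  % instead of (l)bfgs updates
    'SubproblemAlgorithm', 'cg', ...              % required with fin-diff-grads
    'TolCon',              1e-6);                 % default constraint tolerance
```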
Mathias
on 14 Oct 2013
Maybe the optimality measure just has some problem with the fact that some variables move towards the edges of the non-continuous regions, as the local minimum is located there.
That sounds different from what you said before. Earlier you said that the solution is safely in the interior of the feasible region, away from the discontinuities. If the solution lies at a discontinuity, that would indeed explain why you are getting super-large Hessians.
Matt J
on 15 Oct 2013
Mathias commented:
Yeah, sorry, what I meant is that these areas generally lie at the border of the interior. In certain cases, however, the discontinuities may be located near the current point. I had actually assumed that if the optimizer interpreted this as a local minimum, it would stop varying the associated variable (the functional is separable to some extent). I didn't expect fmincon to take very tiny steps for all variables and thereby affect the whole optimization. Is there a way to, e.g., limit the maximum elements of the Hessian to a certain value?
Also, I was considering adjusting the functional so that the discontinuities are smoothed to some extent. Do you think that could help?
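One common way to do this kind of smoothing (an assumed illustration, not derived from the actual functional) is to replace nonsmooth primitives with smooth surrogates, e.g. |x| with sqrt(x^2 + eps^2) and a hard step with a steep logistic. The smoothing parameter trades off fidelity to the original functional against the conditioning of the Hessian near the former discontinuity:

```matlab
% Hypothetical smoothing surrogates; epsSmooth controls how closely they
% track the nonsmooth originals (smaller = closer, but worse conditioning).
epsSmooth  = 1e-4;
absSmooth  = @(x)   sqrt(x.^2 + epsSmooth^2);  % smooth replacement for |x|
stepSmooth = @(x,k) 1 ./ (1 + exp(-k*x));      % smooth step; k ~ 1/epsSmooth
```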
Mathias
on 15 Oct 2013
Matt J
on 15 Oct 2013
A mathematical description of the problem, then, showing what you're trying to minimize and why/where the discontinuities occur.