fmincon: less optimal results with user supplied analytic gradient?

Hi there, I have a question concerning the fmincon optimization procedure in general: I am minimizing a very complicated (unfortunately not necessarily convex and continuous) functional using fmincon, currently along with the interior-point algorithm.
Until some days ago, I used finite differencing to estimate the gradients. Since I had derived an analytic expression for the functional, I went through the tedious work of also deriving an analytic expression for the gradient, mainly to accelerate the optimization. In fact, the computation is significantly faster, but the final function value is bigger, i.e. less optimal, which I don't understand. I already compared the finite-differencing and analytic gradients, but this does not seem to be the problem (they agree to within ~1e-6).
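Roughly, my setup looks like this (a simplified sketch; myObjective is just a placeholder standing in for my actual functional):

```matlab
% Objective returning the function value and, when requested, the gradient.
% The body shown here is a dummy expression standing in for the real functional.
function [f, df] = myObjective(G)
    f = sum(G.^2);
    if nargout > 1
        df = 2*G;   % analytic gradient of the dummy expression
    end
end

% Tell fmincon that the objective supplies its own gradient
options = optimset('Algorithm', 'interior-point', ...
                   'GradObj',   'on', ...          % use the analytic gradient
                   'Display',   'iter');
[G_opt, fval] = fmincon(@myObjective, G0, [], [], [], [], lb, ub, [], options);
```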
Do you know if there are some parameters I should particularly adjust to tune the algorithm towards using analytic gradients, or can you imagine where the problem might be? Thanks a lot for your help. And by the way: thanks a lot to the community in general. Although this is my first active participation, I have been extensively and successfully using the forums for years... :-)
Best Mathias

4 Comments

unfortunately not necessarily convex and continuous
Maybe not convex, but the function and the constraints are required to be continuous (and twice differentiable).
As for the rest, we need to see the difference in output that you're talking about and how significant it is. One certainly doesn't expect an identical solution to the finite differencing case.
Hey, thanks for the reply,
the discontinuity is unfortunately problem-specific, but it's not that big a deal. Using finite differencing, I was already able to get almost perfect results, and the function only becomes badly conditioned (discontinuous) at the border of the search space.
For example the function values I am obtaining are:
Finite Differencing: 9.62976
First-order Norm of
Iter F-count f(x) Feasibility optimality step
0 139 1.190988e+01 0.000e+00 1.334e-02
1 278 1.190547e+01 0.000e+00 1.327e-02 7.001e-02
...
111 15789 9.629760e+00 0.000e+00 2.130e+02 2.617e-08
112 15933 9.629760e+00 0.000e+00 2.130e+02 1.309e-08
113 16079 9.629760e+00 0.000e+00 2.130e+02 1.636e-09
Optimization stopped because the relative changes in all elements of x are
less than options.TolX = 1.000000e-10, and the relative maximum constraint
violation, 0.000000e+00, is less than options.TolCon = 2.500000e-01.
Optimization Metric Options
max(abs(delta_x./x)) = 7.82e-11 TolX = 1e-10 (selected)
relative max(constraint violation) = 0.00e+00 TolCon = 3e-01 (selected)
and for the analytic gradient:
Analytic Gradients: 10.15389
First-order Norm of
Iter F-count f(x) Feasibility optimality step
0 1 1.190988e+01 0.000e+00 1.331e-02
1 2 1.190547e+01 0.000e+00 1.327e-02 6.679e-02
...
28 98 1.015389e+01 0.000e+00 1.464e-02 4.946e-06
29 108 1.015389e+01 0.000e+00 1.464e-02 1.082e-06
30 119 1.015389e+01 0.000e+00 1.464e-02 1.183e-07
Optimization stopped because the relative changes in all elements of x are
less than options.TolX = 1.000000e-10, and the relative maximum constraint
violation, 0.000000e+00, is less than options.TolCon = 2.500000e-01.
Optimization Metric Options
max(abs(delta_x./x)) = 5.06e-11 TolX = 1e-10 (selected)
relative max(constraint violation) = 0.00e+00 TolCon = 3e-01 (selected)
I really don't expect an identical output, but at least something comparable, and in particular, I would expect the analytic gradient to yield better results. I also checked the gradient and the neighborhood where the analytic-gradient optimization stops; there seems to be no problem such as discontinuous areas nearby. From my point of view, the algorithm simply does not expect the function value to decrease significantly anymore.
EDIT:
In particular, if you compute the gradient where the optimization stops, you still get entries that differ significantly from zero:
[~,df] = f(G_opt);
max(abs(df))
ans =
0.0147
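For reference, this is how I compared the analytic and finite-difference gradients: fmincon has a built-in check for that (optimset syntax of this MATLAB version; just a sketch):

```matlab
% Ask fmincon to compare the user-supplied gradient against its own
% finite-difference estimates and warn if the two disagree
options = optimset('Algorithm',       'interior-point', ...
                   'GradObj',         'on', ...
                   'DerivativeCheck', 'on');
```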
the function only becomes badly conditioned (noncontinuous) at the border of the search space.
Not sure what you mean by the "search space". Even when constraints are specified, fmincon searches over all of R^n, which has no borders. Constraints are only satisfied asymptotically.
Anyway, we still need to see more information. How much do the G_opts differ in each case? Why should I consider 0.0147 a large final value for the gradient? Maybe it started at 1e7. Also, you said the finite differencing gave "almost perfect results", yet in what you show above it stopped with a much larger max gradient of 2.130e+02.
Similarly, why should I consider the difference between 9.62976 and 10.15389 significant? Maybe the typical/starting value of the function was 1e20.
It would help to actually show your objective function and constraints, and the code you used to initialize and call fmincon.
Okay, yeah, sorry, that was stupid of me not to provide any details about the problem. The functional f depends on N variables; G varies within the bounds
G in [-Gmax,...,+Gmax]^N, Gmax = 25.0
The G_opt differs by something like 13.2403, which is more than a quarter of the full range of possible values given by Gmax. In addition, I added some lines of the iteration output to my first reply, to give some idea about the initial function values.
Oh, and the optimization has only these bound constraints:
lb = -Gmax;
ub = +Gmax;
No further constraints.


Answers (1)

Matt J
Matt J on 13 Oct 2013
Edited: Matt J on 13 Oct 2013
My best appraisal, without seeing the details of your objective function, is that you have unbounded (or very large) second derivatives at/near your solution. That's the only thing that explains to me why the algorithm starts to take very tiny Newton-like steps even in regions where the gradients are still pretty large. Note that this happens both with the finite differencing and analytical case. It doesn't appear that either version successfully finished the optimization.

8 Comments

Hey, okay, I just checked that by having a look at the Hessian output of fmincon: in fact there are some very big entries in the matrix, probably causing the tiny steps. However, I tested another approach to computing the Hessian:
options.Hessian = 'lbfgs';
% instead of
% options.Hessian = 'bfgs';
This brought a huge benefit to the optimization:
Analytic Gradients: 9.61685
First-order Norm of
Iter F-count f(x) Feasibility optimality step
0 1 1.190988e+01 0.000e+00 1.331e-02
1 2 1.190547e+01 0.000e+00 1.327e-02 6.679e-02
...
43 139 9.616847e+00 0.000e+00 8.225e-03 9.500e-07
44 148 9.616847e+00 0.000e+00 8.225e-03 4.156e-07
Optimization stopped because the relative changes in all elements of x are
less than options.TolX = 1.000000e-10, and the relative maximum constraint
violation, 0.000000e+00, is less than options.TolCon = 2.500000e-01.
Optimization Metric Options
max(abs(delta_x./x)) = 8.99e-11 TolX = 1e-10 (selected)
relative max(constraint violation) = 0.00e+00 TolCon = 3e-01 (selected)
To me that looks as if the optimization finally stopped where it is actually supposed to stop. I was also considering deriving an analytic expression for the Hessian as well, but since the results already look quite good, I guess I won't have to. From your point of view, does the optimality measure indicate that this is a local optimum?
Thanks a lot for the helpful hint, you made my sunday! :-)
The result still doesn't look too great to me. The first-order optimality measure still seems suspiciously large, and it doesn't look like you travelled very far through the optimization. It also bothers me that you were getting super-large Hessian values, even with the default bfgs method. We still haven't gotten to the bottom of why they were as large as they were. Did you check that they got smaller when you switched to lbfgs?
The only way to test these things anyway is on simulated data sets where you know a priori what the solution should be.
A few other notes.
  1. For the interior-point algorithm, the documentation recommends using 'fin-diff-grads' when you supply the analytic gradient, as you are, instead of 'lbfgs'.
  2. Your TolCon= 2.5e-01 seems like an oddly large choice. No idea why you didn't like the default value. Since you're safely in the interior, I guess it doesn't matter.
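If I recall the documentation correctly, the 'fin-diff-grads' option needs user-supplied gradients and the 'cg' subproblem algorithm; in the option-setting style you used above, something like:

```matlab
% Hessian approximated by finite differences of the (analytic) gradients;
% per the doc this requires supplied gradients and the 'cg' subproblem algorithm
options.GradObj             = 'on';
options.Hessian             = 'fin-diff-grads';
options.SubproblemAlgorithm = 'cg';
```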
Hey,
concerning your comments:
  1. I tried the 'fin-diff-grads' option before, but the benefit was not as large, and in addition the computation becomes quite slow.
  2. The TolCon was just chosen as 1% of Gmax, but you're right, I might as well use the default value instead.
However, I got another slight improvement by setting the subproblem algorithm:
options.SubproblemAlgorithm = 'cg';
With that the final output becomes
First-order Norm of
Iter F-count f(x) Feasibility optimality step
0 1 1.190988e+01 0.000e+00 1.331e-02
1 2 1.190547e+01 0.000e+00 1.327e-02 6.679e-02
...
56 127 9.590544e+00 0.000e+00 8.244e-03 9.437e-06
57 135 9.590544e+00 0.000e+00 8.244e-03 5.161e-07
Optimization stopped because the relative changes in all elements of x are
less than options.TolX = 1.000000e-20, and the relative maximum constraint
violation, 0.000000e+00, is less than options.TolCon = 1.000000e-06.
Optimization Metric Options
max(abs(delta_x./x)) = 5.68e-21 TolX = 1e-20 (selected)
relative max(constraint violation) = 0.00e+00 TolCon = 1e-06 (selected)
I am not quite sure about that; maybe the optimality measure just has a problem with the fact that some variables move towards the edges of the discontinuous regions, as the local minimum is located there. If the Hessian is computed (numerically) at such a point, the optimality measure may become relatively small, or again the big entries in the Hessian make the optimizer stop due to tiny step sizes.
I am not sure if the process of deriving the Hessian analytically is worth the effort, since the results already look quite good to me.
maybe the optimality measure just has some problem with the fact that some variables move towards the edges of the non-continuous regions, as the local minimum is located there.
That sounds different from what you said before. Earlier you said that the solution is safely in the interior of the feasible region, away from the discontinuities. If the solution lies at a discontinuity, that would indeed explain why you are getting super-large Hessians.
Mathias Commented:
Yeah, sorry, what I meant is that these areas generally lie at the border of the interior. In certain cases, however, the discontinuities may be located near the current point. I actually assumed that, if the optimizer interpreted this as a local minimum, it would stop varying the associated variable (the functional is separable to some extent). I didn't expect fmincon to take very tiny steps for all variables and hence stall the whole optimization. Is there a way of, e.g., limiting the maximum elements of the Hessian to a certain value?
Also, I was considering adjusting the functional so that the discontinuities are smoothed to some extent. Do you think that could help?
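For example, I had something generic like this in mind (just a sketch of the idea, replacing non-smooth primitives by smooth approximations):

```matlab
% Smooth approximations of common non-smooth primitives; eps_s controls
% how closely the smooth version follows the original
eps_s      = 1e-3;
abs_smooth = @(x)   sqrt(x.^2 + eps_s^2);            % ~ abs(x)
max_smooth = @(x,y) 0.5*(x + y + abs_smooth(x - y)); % ~ max(x,y)
```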
I really wouldn't know what to recommend without seeing the objective function.
Yeah, the objective function is unfortunately 1500 lines of code; it is a highly complicated and highly nonlinear thing... I will try to have a more detailed look at it tomorrow or the day after; maybe then I will be able to ask some more concrete questions. Right now I am not really sure how to proceed, but thank you very much for the moment; you already brought some helpful insight into the problems that I am probably having...
A mathematical description of the problem would do, then: what you're trying to minimize and why/where the discontinuities occur.


Asked on 13 Oct 2013. Last commented on 15 Oct 2013.