fmincon: less optimal results with user supplied analytic gradient?

Hi there, I have a question concerning the fmincon optimization procedure in general: I am minimizing a very complicated (unfortunately not necessarily convex and continuous) functional using fmincon, currently along with the interior-point algorithm.
Until some days ago, I used finite differencing to estimate the gradients. Since I had derived an analytic expression for the functional, I went through the tedious work of also deriving an analytic expression for the gradient, mainly to accelerate the optimization. In fact, the computation is significantly faster, but the final function value is bigger, i.e. less optimal, which I don't understand. I already compared the finite-differencing and analytic gradients, but this does not seem to be the problem (they agree to within ~1e-6).
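Roughly, my setup looks like this (a simplified sketch; myObjective is just a placeholder standing in for my actual functional):

```matlab
% Objective returning the function value and, when requested, the gradient.
% The body shown here is a dummy expression standing in for the real functional.
function [f, df] = myObjective(G)
    f = sum(G.^2);
    if nargout > 1
        df = 2*G;   % analytic gradient of the dummy expression
    end
end

% Tell fmincon that the objective supplies its own gradient
options = optimset('Algorithm', 'interior-point', ...
                   'GradObj',   'on', ...          % use the analytic gradient
                   'Display',   'iter');
[G_opt, fval] = fmincon(@myObjective, G0, [], [], [], [], lb, ub, [], options);
```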
Do you know if there are some parameters I should particularly adjust to tune the algorithm towards using analytic gradients, or can you imagine where the problem might be? Thanks a lot for your help. And by the way: thanks a lot to the community in general. Although this is my first active participation, I have been extensively and successfully using the forums for years... :-)
Best Mathias

4 Comments

unfortunately not necessarily convex and continuous
Maybe not convex, but the function and the constraints are required to be continuous (and twice differentiable).
As for the rest, we need to see the difference in output that you're talking about and how significant it is. One certainly doesn't expect an identical solution to the finite differencing case.
Hey, thanks for the reply,
the discontinuity is unfortunately problem-specific, but it's not that big a deal. Using finite differencing, I was already able to get almost perfect results, and the function only becomes badly conditioned (discontinuous) at the border of the search space.
For example the function values I am obtaining are:
Finite Differencing: 9.62976
First-order Norm of
Iter F-count f(x) Feasibility optimality step
0 139 1.190988e+01 0.000e+00 1.334e-02
1 278 1.190547e+01 0.000e+00 1.327e-02 7.001e-02
...
111 15789 9.629760e+00 0.000e+00 2.130e+02 2.617e-08
112 15933 9.629760e+00 0.000e+00 2.130e+02 1.309e-08
113 16079 9.629760e+00 0.000e+00 2.130e+02 1.636e-09
Optimization stopped because the relative changes in all elements of x are
less than options.TolX = 1.000000e-10, and the relative maximum constraint
violation, 0.000000e+00, is less than options.TolCon = 2.500000e-01.
Optimization Metric Options
max(abs(delta_x./x)) = 7.82e-11 TolX = 1e-10 (selected)
relative max(constraint violation) = 0.00e+00 TolCon = 3e-01 (selected)
and for the analytic gradient:
Analytic Gradients: 10.15389
First-order Norm of
Iter F-count f(x) Feasibility optimality step
0 1 1.190988e+01 0.000e+00 1.331e-02
1 2 1.190547e+01 0.000e+00 1.327e-02 6.679e-02
...
28 98 1.015389e+01 0.000e+00 1.464e-02 4.946e-06
29 108 1.015389e+01 0.000e+00 1.464e-02 1.082e-06
30 119 1.015389e+01 0.000e+00 1.464e-02 1.183e-07
Optimization stopped because the relative changes in all elements of x are
less than options.TolX = 1.000000e-10, and the relative maximum constraint
violation, 0.000000e+00, is less than options.TolCon = 2.500000e-01.
Optimization Metric Options
max(abs(delta_x./x)) = 5.06e-11 TolX = 1e-10 (selected)
relative max(constraint violation) = 0.00e+00 TolCon = 3e-01 (selected)
I really don't expect an identical output, but at least something comparable, and in particular, I would expect the analytic gradient to yield better results. I also checked the gradient and the neighborhood where the analytic-gradient optimization stops; there seems to be no problem such as discontinuous areas nearby. From my point of view, the algorithm simply does not expect the function value to decrease significantly anymore.
EDIT:
In particular, if you compute the gradient where the optimization stops, you still get entries that differ significantly from zero:
[~,df] = f(G_opt);
max(abs(df))
ans =
0.0147
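For reference, this is how I compared the analytic and finite-difference gradients: fmincon has a built-in check for that (optimset syntax of this MATLAB version; just a sketch):

```matlab
% Ask fmincon to compare the user-supplied gradient against its own
% finite-difference estimates and warn if the two disagree
options = optimset('Algorithm',       'interior-point', ...
                   'GradObj',         'on', ...
                   'DerivativeCheck', 'on');
```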
the function only becomes badly conditioned (noncontinuous) at the border of the search space.
Not sure what you mean by the "search space". Even when constraints are specified, fmincon searches over all of R^n, which has no borders. Constraints are only satisfied asymptotically.
Anyway, we still need to see more information. How much do the G_opts differ in each case? Why should I consider 0.0147 a large final value for the gradient? Maybe it started at 1e7. Also, you said the finite differencing gave "almost perfect results", yet in what you show above it stopped with a much larger max gradient of 2.130e+02.
Similarly, why should I consider the difference between 9.62976 and 10.15389 significant? Maybe the typical/starting value of the function was 1e20.
It would help to actually show your objective function and constraints, and the code you used to initialize and call fmincon.
Okay, yeah, sorry, that was stupid of me not to provide any details about the problem. The functional f depends on N variables; G varies within the bounds
G in [-Gmax,...,+Gmax]^N, Gmax = 25.0
The G_opt differs by something like 13.2403, which is more than a quarter of the full range of possible values given by Gmax. In addition, I added some lines of the iteration output to my first reply, to give some idea about the initial function values.
Oh, and the optimization has only these bound constraints:
lb = -Gmax;
ub = +Gmax;
No further constraints.


Answers (1)

Matt J
Matt J on 13 Oct 2013
Edited: Matt J on 13 Oct 2013
My best appraisal, without seeing the details of your objective function, is that you have unbounded (or very large) second derivatives at/near your solution. That's the only thing that explains to me why the algorithm starts to take very tiny Newton-like steps even in regions where the gradients are still pretty large. Note that this happens both with the finite differencing and analytical case. It doesn't appear that either version successfully finished the optimization.

8 Comments

Hey, okay, I just checked that by having a look at the Hessian output of fmincon: in fact there are some very big entries in the matrix, probably causing the tiny steps. However, I tested another approach to computing the Hessian:
options.Hessian = 'lbfgs';
% instead of
% options.Hessian = 'bfgs';
This brought a huge benefit to the optimization:
Analytic Gradients: 9.61685
First-order Norm of
Iter F-count f(x) Feasibility optimality step
0 1 1.190988e+01 0.000e+00 1.331e-02
1 2 1.190547e+01 0.000e+00 1.327e-02 6.679e-02
...
43 139 9.616847e+00 0.000e+00 8.225e-03 9.500e-07
44 148 9.616847e+00 0.000e+00 8.225e-03 4.156e-07
Optimization stopped because the relative changes in all elements of x are
less than options.TolX = 1.000000e-10, and the relative maximum constraint
violation, 0.000000e+00, is less than options.TolCon = 2.500000e-01.
Optimization Metric Options
max(abs(delta_x./x)) = 8.99e-11 TolX = 1e-10 (selected)
relative max(constraint violation) = 0.00e+00 TolCon = 3e-01 (selected)
To me that looks as if the optimization finally stopped where it is actually supposed to stop. I was also considering deriving an analytic expression for the Hessian as well, but since the results already look quite good, I guess I won't have to. From your point of view, does the optimality measure indicate that this is a local optimum?
Thanks a lot for the helpful hint, you made my sunday! :-)
The result still doesn't look too great to me. The first-order optimality measure still seems suspiciously large, and it doesn't look like you travelled very far through the optimization. It also bothers me that you were getting super-large Hessian values, even with the default bfgs method. We still haven't gotten to the bottom of why they were as large as they were. Did you check that they got smaller when you switched to lbfgs?
The only way to test these things anyway is on simulated data sets where you know a priori what the solution should be.
A few other notes.
  1. For the interior-point algorithm, the documentation recommends using 'fin-diff-grads' when you supply the analytic gradient, as you are, instead of 'lbfgs'.
  2. Your TolCon= 2.5e-01 seems like an oddly large choice. No idea why you didn't like the default value. Since you're safely in the interior, I guess it doesn't matter.
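If I recall the documentation correctly, the 'fin-diff-grads' option needs user-supplied gradients and the 'cg' subproblem algorithm; in the option-setting style you used above, something like:

```matlab
% Hessian approximated by finite differences of the (analytic) gradients;
% per the doc this requires supplied gradients and the 'cg' subproblem algorithm
options.GradObj             = 'on';
options.Hessian             = 'fin-diff-grads';
options.SubproblemAlgorithm = 'cg';
```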
Hey,
concerning your comments:
  1. I tried the 'fin-diff-grads' option before, but the benefit was not as large, and in addition the computation becomes quite slow.
  2. The TolCon was just chosen as 1% of Gmax, but you're right, I might as well use the default value instead.
However, I got another slight improvement by setting the subproblem algorithm:
options.SubproblemAlgorithm = 'cg';
With that the final output becomes
First-order Norm of
Iter F-count f(x) Feasibility optimality step
0 1 1.190988e+01 0.000e+00 1.331e-02
1 2 1.190547e+01 0.000e+00 1.327e-02 6.679e-02
...
56 127 9.590544e+00 0.000e+00 8.244e-03 9.437e-06
57 135 9.590544e+00 0.000e+00 8.244e-03 5.161e-07
Optimization stopped because the relative changes in all elements of x are
less than options.TolX = 1.000000e-20, and the relative maximum constraint
violation, 0.000000e+00, is less than options.TolCon = 1.000000e-06.
Optimization Metric Options
max(abs(delta_x./x)) = 5.68e-21 TolX = 1e-20 (selected)
relative max(constraint violation) = 0.00e+00 TolCon = 1e-06 (selected)
I am not quite sure about that; maybe the optimality measure just has a problem with the fact that some variables move towards the edges of the discontinuous regions, as the local minimum is located there. If the Hessian is computed (numerically) at such a point, the optimality measure may become relatively small, or again the big entries in the Hessian make the optimizer stop due to tiny step sizes.
I am not sure if the process of deriving the Hessian analytically is worth the effort, since the results already look quite good to me.
maybe the optimality measure just has some problem with the fact that some variables move towards the edges of the non-continuous regions, as the local minimum is located there.
That sounds different from what you said before. Earlier you said that the solution is safely in the interior of the feasible region, away from the discontinuities. If the solution lies at a discontinuity, that would indeed explain why you are getting super-large Hessians.
Mathias Commented:
Yeah, sorry, what I meant is that these areas generally lie at the border of the interior. In certain cases, however, the discontinuities may be located near the current point. I actually assumed that, if the optimizer interpreted this as a local minimum, it would stop varying the associated variable (the functional is separable to some extent). I didn't expect fmincon to take very tiny steps for all variables and hence stall the whole optimization. Is there a way of, e.g., limiting the maximum elements of the Hessian to a certain value?
Also, I was considering adjusting the functional so that the discontinuities are smoothed to some extent. Do you think that could help?
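For example, I had something generic like this in mind (just a sketch of the idea, replacing non-smooth primitives by smooth approximations):

```matlab
% Smooth approximations of common non-smooth primitives; eps_s controls
% how closely the smooth version follows the original
eps_s      = 1e-3;
abs_smooth = @(x)   sqrt(x.^2 + eps_s^2);            % ~ abs(x)
max_smooth = @(x,y) 0.5*(x + y + abs_smooth(x - y)); % ~ max(x,y)
```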
I really wouldn't know what to recommend without seeing the objective function.
Yeah, the objective function is unfortunately 1500 lines of code; it is a highly complicated and highly nonlinear thing... I will try to have a more detailed look at it tomorrow or the day after; maybe then I will be able to ask some more concrete questions. Right now I am not really sure how to proceed, but thank you very much for the moment; you already brought some helpful insight into the problems that I am probably having...
A mathematical description of the problem would do, then: what you're trying to minimize and why/where the discontinuities occur.


Asked on 13 Oct 2013. Last commented on 15 Oct 2013.