# Scaling a vector-valued non-linear function-numerical optimization

7 views (last 30 days)
Commented: Michael on 27 May 2020
I am trying to minimize a non-linear vector-valued objective (cost) function in MATLAB wherein one component of the solution vector is widely separated in scale.
As a test case for my code, I tried to minimize a function whose solution I know apriori.The apriori known solution vector is x_min = [175, 164, 854, 3.7e5, 6000]. As you can see, the 4th solution component (3.7e5) has a wide difference in scale with the rest of the solution vector. As a result, despite changing tolerances, algorithms etc., the best that MATLAB can find for my x(4) solution is 5.2e5.
I understand that using a diagonal (or close to diagonal) scaling matrix for the objective function can help in finding the minimizing solution , i.e. ideally one must scale the solutions such that the Hessian of the objective function becomes the identity matrix at/near the solution.
But the problem is that the Hessian of a vector-valued function is a higher-dimensional Tensor and not a 2D-matrix. Is there a solution, or any other method for handling this problem ?
##### 2 CommentsShowHide 1 older comment
Michael on 26 May 2020
I realize this question is over 4 years old, and you've almost certainly moved on from needing a solution to this, but I find the question interesting, and I somewhat disagree with Torsten's proposed solution.
In the end, it's not the magnitude of the input vector that affects how "close" the algorithm can get to the correct solution. It's the Jacobian of the target function near that solution. If the function is highly insensitive to the 4th input parameter, then 5.2e5 may give a result that is close enough to the minimum that the minimization function (fmincon or whatever you're using) sees no need in proceeding further (i.e. the step tolerance has been satisfied). In fact, if the function is sensitive to the 4th parameter, then you must have met some other stopping criteria for the minimization to quit. Look into the optimization tracing capabilities of the various minimization routines to see if you can get any insight from them.
If you know for a fact that your x_min exactly minimizes the function, then I think your only solution is to reduce the tolerance and/or increase the iteration limit and let the minimization function work harder to get to the final solution.

John D'Errico on 26 May 2020
To a large extent, I want to disagree with Michael in his recent comment, except that he is correct, in that a perusal of the gradient & Hessian, even the Jacobian before you convert this to a scalar problem can help. And since he has ressurrected this question, I might as well respond to a question I never saw in the first place.
My disagreement stems form the idea that just cranking down on the tolerance is a good idea. It may work. But usually not. Just cracking the whip on an optimizer too often just leads to longer run times, with little gain.
As far as the Hessian of a vector valued function being a higher dimensional thing, that is irrelevant. You cannot optimize a vector valued function anyway. You can only convert it by some scheme into a scalar function, and THEN you CAN indeed compute a Hessian. Note that in the case of most vector valued functions, these end up being nonlinear least squares problems that people are trying to solve. It may be a multi-criteria optimization. But if you are optimizing anything, in the end, you converted it to a scalar problem. So still, the Hessian of the final objective is well defined and is a simple matrix.
Once you have a scalar objective, you can look at gradients, at the Hessian. And you can always look at the Jacobian. Is there some magic formula that will tell you how to intelligently rescale the problem? Of course not. If there were, then nonlinear problems would be trivial to solve. An issue is that the starting point may be far away from the solution, and sensitivities of the objective can change greatly when you wander around the parameter space. That suggests the most important thing you can do is to choose intelligent starting values.
As such, that suggests the value of a multi-start solution. If you have no real clue as to the final parameters, then you want to use multiple start points. That will always improve your results. Either it will give you some confidance the solver is arriving at a consistent solution, or it will convince you there is an issue.
Good choices of constraints are also a valuable thing. As much as you can reduce the size of the search space in any multi-dimensional search, this is a huge benefit.
Finally, it is not a bad idea to look at the gradient and Hessian AFTER the result has been obtained. Be careful though. Don't look too carefully at the Hessian returned by a solver like fmincon, as it will only be an approximation to the final Hessian. In fact, I would ignore that output of fmincon completely. But a Hessian is not too difficult to compute at a point. And if you are really worried about the solution, you might decide to use that final Hessian to re-scale the parameters, and then resolve the problem.
But I would never advise someone to just crank down on the convergence tolerances. If I was sick, well chicken soup might help - hey, it can't hurt! But I'd rather give my doctor a call first. And that is how I see cranking down on the tolerances - as the computational equivalent of chicken soup.
Michael on 27 May 2020
I grant you that cranking down on the tolerance should never be the first choice in situations like this. This is why I suggested looking into the optimization tracing capabilities (or at least look at the stopping criteria details) to try to understand what was causing the optimization to give up.
But at the end of the day, if the real-world system you are modeling is insensitive to one of your inputs, and it is still important to you to arrive at the "right" answer, rather than just a "good enough" answer, then in some sense this ceases to be a minimization problem, and requires a more analytical approach to finding a solution. As you rightly point out, no minimization routine (at least that I know of) has any concept of a "right" answer. It is simply going to find an acceptable solution that meets the given criteria.
One question I would ask (assuming the original poster is even still interested): What is the difference in the output of the objective function when the 4th parameter is 5.2e5 vs 3.7e5. If that difference is unimportant, then I'd suggest that the model may be overparameterized.
Anyway, good discussion. I take your point.