To a large extent, I want to disagree with Michael's recent comment, except that he is correct on one point: a perusal of the gradient and Hessian, or even the Jacobian before you convert this to a scalar problem, can help. And since he has resurrected this question, I might as well respond to a question I never saw in the first place.
My disagreement stems from the idea that just cranking down on the tolerance is a good idea. It may work, but usually it does not. Cracking the whip on an optimizer too often just leads to longer run times, with little gain.
As far as the Hessian of a vector-valued function being a higher dimensional thing, that is irrelevant. You cannot optimize a vector-valued function anyway. You can only convert it by some scheme into a scalar function, and THEN you CAN indeed compute a Hessian. Note that most of the vector-valued functions people are trying to solve end up being nonlinear least squares problems. It may be a multi-criteria optimization, but if you are optimizing anything, in the end you have converted it to a scalar problem. So still, the Hessian of the final objective is well defined, and it is a simple matrix.
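To make that concrete, here is a small sketch (in Python/NumPy, as an analogue of the MATLAB workflow, with a hypothetical residual function) of how a vector-valued residual gets scalarized into a sum of squares, after which the Hessian is just an ordinary square matrix:

```python
# Sketch: a vector-valued residual has no Hessian you can optimize against,
# but its sum-of-squares scalarization does. The residual here is hypothetical.
import numpy as np

def residuals(x):
    # a 3-component residual vector in 2 unknowns
    return np.array([x[0] - 1.0, x[1] - 2.0, x[0]*x[1] - 2.0])

def objective(x):
    # scalarize: sum of squared residuals
    r = residuals(x)
    return r @ r

def jacobian(x):
    # Jacobian of the residual vector: a 3x2 matrix
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]]])

def gauss_newton_hessian(x):
    # Gauss-Newton approximation to the Hessian of the scalar objective:
    # H ~= 2*J'*J, a plain 2x2 matrix, nothing higher dimensional
    J = jacobian(x)
    return 2.0 * (J.T @ J)

x = np.array([1.0, 2.0])
print(objective(x))                    # 0.0 at the least squares solution
print(gauss_newton_hessian(x).shape)   # (2, 2)
```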
Once you have a scalar objective, you can look at the gradient and the Hessian. And you can always look at the Jacobian. Is there some magic formula that will tell you how to intelligently rescale the problem? Of course not. If there were, then nonlinear problems would be trivial to solve. One issue is that the starting point may be far away from the solution, and the sensitivities of the objective can change greatly as you wander around the parameter space. That suggests the most important thing you can do is to choose intelligent starting values.
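A quick finite-difference gradient check shows what I mean about sensitivities changing across the parameter space. This is only a sketch with a hypothetical, badly scaled objective:

```python
# Sketch: central-difference gradient, evaluated at two different points of a
# hypothetical badly scaled objective. The sensitivities differ wildly, so a
# rescaling that helps at one point may be useless at another.
import numpy as np

def fd_gradient(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0*h)
    return g

f = lambda x: np.exp(x[0]) + 1e-6 * x[1]**2

print(fd_gradient(f, np.array([0.0, 1.0])))    # roughly [1, 2e-6]
print(fd_gradient(f, np.array([10.0, 1.0])))   # roughly [2.2e4, 2e-6]
```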
That in turn suggests the value of a multi-start solution. If you have no real clue as to the final parameters, then you want to use multiple start points. That will always improve your results: either it will give you some confidence that the solver is arriving at a consistent solution, or it will convince you there is an issue.
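A multi-start loop is only a few lines. Here is a sketch using SciPy's `minimize` as a stand-in for fmincon, on a hypothetical objective with more than one minimizer:

```python
# Sketch of a multi-start strategy: run the same solver from several random
# starting points and keep the best result. The objective is hypothetical,
# with minimizers at x0 = +1 and x0 = -1.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0]**2 - 1.0)**2 + (x[1] - 0.5)**2

rng = np.random.default_rng(0)
best = None
for _ in range(10):
    x0 = rng.uniform(-3.0, 3.0, size=2)    # random start in the search box
    res = minimize(f, x0, method='BFGS')
    if best is None or res.fun < best.fun:
        best = res

print(best.fun)   # near 0; consistent results across starts build confidence
```

If several starts all land at essentially the same objective value, that is the confidence the multi-start buys you; scattered results are the warning sign.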
Good choices of constraints are also valuable. Any reduction you can make in the size of the search space of a multi-dimensional search is a huge benefit.
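For the simplest case, box bounds, this is a sketch of the idea (again in SciPy as an analogue; the objective and the bounds are hypothetical):

```python
# Sketch: simple box bounds shrink the search space. L-BFGS-B accepts bounds
# directly; here the box represents a region where we trust the parameters
# to be meaningful.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2.0)**2 + (x[1] + 3.0)**2

# unbounded search vs. a search restricted to a box
free = minimize(f, np.array([0.0, 0.0]), method='L-BFGS-B')
boxed = minimize(f, np.array([0.5, -0.5]), method='L-BFGS-B',
                 bounds=[(0.0, 1.0), (-1.0, 0.0)])

print(free.x)    # roughly [2, -3], the unconstrained minimum
print(boxed.x)   # roughly [1, -1], pinned to the nearest corner of the box
```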
Finally, it is not a bad idea to look at the gradient and Hessian AFTER the result has been obtained. Be careful though. Don't look too carefully at the Hessian returned by a solver like fmincon, as it will only be an approximation to the final Hessian. In fact, I would ignore that output of fmincon completely. But a Hessian is not too difficult to compute at a point. And if you are really worried about the solution, you might decide to use that final Hessian to re-scale the parameters, and then re-solve the problem.
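Here is a sketch of that last step: a central-difference Hessian computed at the returned point, then a diagonal rescaling derived from it. The helper names and the badly scaled quadratic are hypothetical, not part of any solver's API:

```python
# Sketch: finite-difference Hessian at the final point, then a diagonal
# rescaling so the curvature is O(1) along each axis before re-solving.
import numpy as np

def fd_hessian(f, x, h=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x+ei+ej) - f(x+ei-ej)
                       - f(x-ei+ej) + f(x-ei-ej)) / (4.0*h*h)
    return H

# hypothetical badly scaled quadratic: curvatures differ by a factor of 1e6
f = lambda x: 1e6 * x[0]**2 + x[1]**2
xsol = np.array([0.0, 0.0])         # the "solution" a solver returned

H = fd_hessian(f, xsol)
scale = 1.0 / np.sqrt(np.diag(H))    # per-parameter scale factors
g = lambda y: f(scale * y)           # re-solve this in the new variables y

print(np.diag(H))                    # roughly [2e6, 2]
print(np.diag(fd_hessian(g, xsol)))  # roughly [1, 1] after rescaling
```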
But I would never advise someone to just crank down on the convergence tolerances. If I were sick, well, chicken soup might help - hey, it can't hurt! But I'd rather give my doctor a call first. And that is how I see cranking down on the tolerances - as the computational equivalent of chicken soup.