Ensure that derivatives up to the order of the optimization algorithm are finite #184

@devmotion

Description

I recently came across a numerically quite challenging optimization problem where the gradient and Hessian would sometimes become non-finite (contain Inf or NaN) for unfortunate parameter configurations. It seemed that Optim sometimes used steps where the function value was finite but the gradient or Hessian was non-finite.

Maybe I missed something, but when I skimmed through the source code of LineSearches.jl I got the impression that LineSearches.jl might currently return such problematic steps, since in general it does not guarantee that the gradient and higher-order derivatives are finite.

For instance, MoreThuente checks whether both the value and the gradient of the objective function are finite, but only at the initial step:

```julia
while (!isfinite(f) || !isfinite(dg)) && iterfinite < iterfinitemax
```

whereas BackTracking checks only the function value at the initial step:

```julia
while !isfinite(ϕx_1) && iterfinite < iterfinitemax
```

Maybe checking only the initial step could be justified by some assumptions about how smooth/well-behaved the objective function is (even though it seems a bit unsafe), but in the case of e.g. BackTracking the gradient etc. is not even checked at the initial step.
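To make the gap concrete, here is a minimal sketch of what an initial-step loop that requires both a finite value and a finite slope could look like. The function name `find_finite_step` and the `ϕdϕ` value-and-slope callback are illustrative stand-ins, not the package's actual internals:

```julia
# Hypothetical sketch: shrink the trial step size until BOTH the line
# value ϕ(α) and the directional derivative ϕ'(α) are finite, instead
# of checking the value alone. `ϕdϕ` returns the tuple (ϕ(α), ϕ'(α)).
function find_finite_step(ϕdϕ, α₀; iterfinitemax = 50)
    α = α₀
    ϕα, dϕα = ϕdϕ(α)
    iterfinite = 0
    while !(isfinite(ϕα) && isfinite(dϕα)) && iterfinite < iterfinitemax
        iterfinite += 1
        α /= 2                 # halve the step and re-evaluate
        ϕα, dϕα = ϕdϕ(α)
    end
    return α, ϕα, dϕα
end
```

With an objective whose derivative blows up for large steps, this would keep halving past the region where only the value happens to be finite.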

Intuitively, I'd assume that the step returned by a line search algorithm for an optimization algorithm that uses derivatives up to order n should yield finite derivatives up to order n at the subsequent iterate. Of course, the optimization algorithm could currently (and probably does?) reject steps that yield iterates with non-finite values/derivatives. However, I think LineSearches.jl shouldn't propose such steps in the first place, if possible.

Maybe this could be achieved, regardless of the line search algorithm, by checking (at least for the initial step sizes) function values for NonDifferentiable functions, function values and gradients for OnceDifferentiable functions, and values, gradients, and Hessians for TwiceDifferentiable functions?
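A sketch of that proposal, with the differentiability order passed as an integer for brevity (in practice one would presumably dispatch on the NonDifferentiable / OnceDifferentiable / TwiceDifferentiable objective types; the helper name `allfinite` is made up):

```julia
# Hypothetical sketch: check finiteness of exactly the quantities an
# order-n method needs. order 0 => value only, order 1 => value and
# gradient, order 2 => value, gradient, and Hessian.
function allfinite(order::Int, f, g = nothing, H = nothing)
    isfinite(f) || return false                      # order ≥ 0: value
    order ≥ 1 && !all(isfinite, g) && return false   # order ≥ 1: gradient
    order ≥ 2 && !all(isfinite, H) && return false   # order ≥ 2: Hessian
    return true
end
```

A line search could then refuse to accept (or even propose) a step for which this predicate fails at the candidate iterate.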
