Replies: 3 comments 6 replies
-
Generally, optax stays at a higher level and leaves an optimization like this (e.g., an opportunity to reuse gradients) to the XLA compiler, thanks to jax.jit. Do you have a specific application in mind? There may be ways to express the computation reuse so that it's easier for the compiler to pick up on an optimization like that.
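As a concrete illustration of the point above: jax.value_and_grad expresses the loss and its gradient as a single computation, so under jax.jit XLA can share the forward pass between them instead of recomputing it. The loss here is a hypothetical toy quadratic, just a stand-in for a real objective:

```python
import jax
import jax.numpy as jnp

def loss(params):
    # Toy quadratic loss; a placeholder for a real objective.
    return jnp.sum((params - 1.0) ** 2)

# value_and_grad shares the forward pass between the loss value and
# the gradient; under jit, XLA can fuse and deduplicate that work.
value_and_grad = jax.jit(jax.value_and_grad(loss))

v, g = value_and_grad(jnp.zeros(3))
# v is 3.0 and g is [-2., -2., -2.] for this toy loss.
```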
-
I think you should be able to create a joint optimization objective (likely by summing the losses for all cases) and run a single global optax optimization; vmap plus compiling the whole step (or the whole run) should work well.
For a parallel line search you should be able to write your own optax linesearch, or we can explore adding it to optax.
-
Just to complete a trivial example, we will use the Branin function:
f(x, y) = a(y - bx² + cx - r)² + s(1 - t)cos(x) + s
with the standard constants a = 1, b = 5.1/(4π²), c = 5/π, r = 6, s = 10, t = 1/(8π). Then we know the 3 local (and global) minima are at (-π, 12.275), (π, 2.275), and (9.42478, 2.475), and the typical domain is x ∈ [-5, 10], y ∈ [0, 15]. But we could also just:
(still trying to understand those two values that are not local optima...)
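For reference, a plain-Python sketch of the Branin function with the standard constants, evaluated at the three minima quoted above; each evaluation should land near the known optimal value of about 0.397887:

```python
import math

def branin(x, y):
    # Branin function with the standard constants.
    a = 1.0
    b = 5.1 / (4.0 * math.pi ** 2)
    c = 5.0 / math.pi
    r = 6.0
    s = 10.0
    t = 1.0 / (8.0 * math.pi)
    return a * (y - b * x ** 2 + c * x - r) ** 2 + s * (1.0 - t) * math.cos(x) + s

# The three global minima in the usual domain x in [-5, 10], y in [0, 15].
minima = [(-math.pi, 12.275), (math.pi, 2.275), (9.42478, 2.475)]
values = [branin(x, y) for x, y in minima]
# Each value is approximately 0.397887.
```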
-
Is there benefit to using vmap to start multiple optimizations in parallel? That is, would there be computational "savings" with gradient reuse?