Efficiency of jit and vmap #21505
Replies: 1 comment 4 replies
-
Same outcome. In either case, the batched function is compiled. For example:
>>> import jax; import jax.numpy as jnp
>>> def f(m, v): return m @ v
...
>>> m, vs = jnp.ones((3, 4)), jnp.ones((7, 4))
>>> jax.make_jaxpr(jax.jit(jax.vmap(f, in_axes=(None, 0))))(m, vs)
{ lambda ; a:f32[3,4] b:f32[7,4]. let
    c:f32[7,3] = pjit[
      name=f
      jaxpr={ lambda ; d:f32[3,4] e:f32[7,4]. let
          f:f32[3,7] = dot_general[
            dimension_numbers=(([1], [1]), ([], []))
            preferred_element_type=float32
          ] d e
          g:f32[7,3] = transpose[permutation=(1, 0)] f
        in (g,) }
    ] a b
  in (c,) }
>>> jax.make_jaxpr(jax.vmap(jax.jit(f), in_axes=(None, 0)))(m, vs)
{ lambda ; a:f32[3,4] b:f32[7,4]. let
    c:f32[3,7] = pjit[
      name=f
      jaxpr={ lambda ; d:f32[3,4] e:f32[7,4]. let
          f:f32[3,7] = dot_general[
            dimension_numbers=(([1], [1]), ([], []))
            preferred_element_type=float32
          ] d e
        in (f,) }
    ] a b
    g:f32[7,3] = transpose[permutation=(1, 0)] c
  in (g,) }
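Both jaxprs contain the same batched dot_general inside the pjit; the only difference is whether the final transpose happens inside or outside the compiled call. As a rough sketch, the jit-wrapped version can also be lowered explicitly to inspect what reaches the compiler (continuing the same session, with the same f, m and vs):
>>> lowered = jax.jit(jax.vmap(f, in_axes=(None, 0))).lower(m, vs)  # trace + lower, but don't compile yet
>>> print(lowered.as_text())  # the StableHLO module for the whole batched computation
>>> compiled = lowered.compile()  # hand that module to XLA; the executable is cached per input shape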
The batching transformation would happen every time, i.e. not only for different batch sizes but also for previously seen batch sizes. There are no caching guarantees here, so this could change. But the batching transformation is also efficient and inline, and there isn't much overhead to it over standard evaluation: the system is essentially bookkeeping batch dimensions and calling into simple batching rules for evaluation. If you'll be compiling anyway, then the cost of re-running the batching transformation is unlikely to matter in practice.
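With jit on the outside, tracing (which is where the vmap machinery runs) and compilation happen only for input shapes that haven't been seen before. A minimal sketch to observe this, continuing the session above and using a Python-side print that fires only at trace time:
>>> def batched(m, vs):
...   print("tracing with vs.shape =", vs.shape)  # runs only while tracing, not on cached calls
...   return jax.vmap(f, in_axes=(None, 0))(m, vs)
...
>>> batched_jit = jax.jit(batched)
>>> _ = batched_jit(m, jnp.ones((32, 4)))  # traced and compiled
tracing with vs.shape = (32, 4)
>>> _ = batched_jit(m, jnp.ones((32, 4)))  # cache hit: no print, no recompilation
>>> _ = batched_jit(m, jnp.ones((64, 4)))  # new batch size: traced and compiled again
tracing with vs.shape = (64, 4)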
-
As demonstrated by the docs, jax.jit and jax.vmap can be composed to produce a batched, compiled version of the target function. What I want to know is whether there is a difference between jax.jit(jax.vmap(func)) and jax.vmap(jax.jit(func)) behind the scenes. It seems to me that the former should generate more efficient code, since the batched func is compiled as a whole.
A second question is about vmap alone. Does vmap produce a new batched version of the function each time it encounters a different batch size? For example, given jax.vmap(jax.jit(func))(x), will there be extra overhead if the input x is shaped (32, D) the first time and (64, D) the second? Will any re-mapping or re-compilation take place at the second call?
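One rough way to compare the two orderings empirically is to time them after a warm-up call, so the one-off tracing and compilation cost is excluded; a minimal sketch (the function, shapes, and iteration counts below are arbitrary):

import timeit
import jax
import jax.numpy as jnp

def func(v):                            # per-example function, applied to one row of x
    return jnp.sin(v).sum()

x = jnp.ones((32, 128))

jit_outside = jax.jit(jax.vmap(func))   # compile the batched function as a whole
jit_inside = jax.vmap(jax.jit(func))    # batch a compiled per-example function

# Warm up so tracing/compilation is not included in the timings.
jit_outside(x).block_until_ready()
jit_inside(x).block_until_ready()

# Any remaining difference is per-call Python dispatch/batching overhead;
# the batched computation itself is compiled in either case (see the reply above).
print(timeit.timeit(lambda: jit_outside(x).block_until_ready(), number=1000))
print(timeit.timeit(lambda: jit_inside(x).block_until_ready(), number=1000))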