vmap memory usage - is this expected behaviour? #6194
-
I have tried to batch my training data using a vmap-based approach, rather than manually selecting the batch size. However, if I do this, in terms of memory usage it seems to act as if the data isn't batched at all, and in my example it causes an out-of-memory exit.
My presumption was that vmap would vectorize in such a manner that only the available memory is filled. Is this an incorrect assumption, or is there something going wrong with vmap?
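Roughly, the pattern I mean is the following (a minimal sketch; `params`, `dataset`, and `loss_fn` are placeholders, not my actual code):

```python
import jax
import jax.numpy as jnp

# Placeholder model state and training data.
params = jnp.ones(4)
dataset = jnp.ones((100_000, 4))  # stands in for a much larger training set

def loss_fn(params, example):
    # Placeholder per-example loss.
    return jnp.sum((params - example) ** 2)

# vmap over the full dataset: the batching is logical, so the vectorized
# computation materializes per-example intermediates for all examples at once.
losses = jax.vmap(loss_fn, in_axes=(None, 0))(params, dataset)
```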
Replies: 1 comment
-
I think you have the wrong mental model of what `vmap` is doing. `vmap` is about logical batching, and does not imply anything about sequential computation of the batches. In the simplest cases, using `vmap` is identical to using standard numpy-style arguments in functions. Here is a quick example showing this:

```python
from jax import vmap, make_jaxpr
import jax.numpy as jnp

x = jnp.ones((3, 4))

make_jaxpr(vmap(jnp.sum))(x)
# { lambda ; a.
#   let b = reduce_sum[ axes=(1,) ] a
#   in (b,) }

make_jaxpr(lambda x: jnp.sum(x, axis=-1))(x)
# { lambda ; a.
#   let b = reduce_sum[ axes=(1,) ] a
#   in (b,) }
```

Calling `vmap` on `sum` for 2D input is identical to calling an unmapped `sum` with an axis argument: t… The closest thing JAX has to what you're after is …
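As a sketch of that sequential alternative, assuming the truncated recommendation above refers to something like `jax.lax.map` (which evaluates the mapped function once per element of the leading axis instead of vectorizing it):

```python
import jax
import jax.numpy as jnp

x = jnp.ones((3, 4))

# vmap: logical batching; the whole mapped axis is computed at once.
vectorized = jax.vmap(jnp.sum)(x)

# lax.map: sequential batching; jnp.sum is evaluated once per row,
# so peak memory is bounded by a single element of the mapped axis.
sequential = jax.lax.map(jnp.sum, x)

assert jnp.allclose(vectorized, sequential)
```

The trade-off is runtime: the sequential version avoids materializing the whole batch at once, at the cost of scanning over the leading axis one element at a time.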