jitted-functions slow performance due to copying of arrays #13225
Hello, do jit-compiled functions always copy every array, even when the function is effectively an identity mapping?

```python
import jax
import jax.numpy as jnp

def gen_time_me(dim1, dim2, jit_outer, jit_inner):
    model = (jnp.zeros((dim1, dim1)), jnp.zeros((dim2, dim2)))

    def inner(model):
        # change only parts of the model, e.g. only some states
        return (model[0], model[1] + 1.0)

    if jit_inner:
        inner = jax.jit(inner)
    # run once
    inner(model)

    def outer(model):
        for _ in range(100):
            model = inner(model)
        return model

    if jit_outer:
        outer = jax.jit(outer)
    # run once
    outer(model)

    def time_me():
        outer(model)

    return time_me

time_me = gen_time_me(1000, 1, False, False)
%timeit time_me()
# 449 µs ± 11.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

time_me = gen_time_me(1000, 1, False, True)
%timeit time_me()
# Why is this so slow compared to the unjitted version?
# Is it because at every inner step we have to make
# an actual copy of the 1000x1000 array?
# 178 ms ± 6.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

time_me = gen_time_me(1000, 1, True, True)
%timeit time_me()
# 638 µs ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

time_me = gen_time_me(1, 1000, False, False)
%timeit time_me()
# 152 ms ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

time_me = gen_time_me(1, 1000, False, True)
%timeit time_me()
# 175 ms ± 29.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

time_me = gen_time_me(1, 1000, True, True)
%timeit time_me()
# 2.33 ms ± 184 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
To directly answer your question: yes, jit-compiled functions will generally allocate new memory for their outputs, even if the function happens to be an identity. This is because in JAX arrays are immutable and cannot share memory with other arrays, and the jit-of-identity case is not important enough to warrant an exception to the normal XLA computation path.

There are generally two ways around this. First, you could wrap an outer jit around the function that repeatedly calls the identity; XLA will then optimize away the repeated identity calls and the copies they would generate (you can see this in your example). Second, on GPU or TPU (not CPU) you could use buffer donation to tell XLA that you want the output of the function to share its buffer with the input, which should avoid the intermediate copies.

A side note: you might look over Benchmarking in JAX for some tips on running more robust micro-benchmarks in JAX. In particular, you should use …
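A sketch of what a more robust micro-benchmark might look like, assuming the key tip here is to block on the result: JAX dispatches work asynchronously, so a timing loop that never blocks can end up measuring only the dispatch rather than the computation. The small jitted `outer` below is illustrative, not the one from the question.

```python
import jax
import jax.numpy as jnp

@jax.jit
def outer(x):
    return x + 1.0

x = jnp.zeros((1000, 1000))
outer(x)  # warm-up call, so compilation time is excluded from the timing

def time_me():
    # Blocking on the result ensures the measurement covers the computation
    # itself, not just the time taken to enqueue the work.
    outer(x).block_until_ready()

# %timeit time_me()
```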
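And going back to the buffer-donation workaround: a minimal sketch adapted from the `inner` function in the question (the shapes and the loop are illustrative). Note that donation only takes effect on GPU/TPU; on CPU it is ignored with a warning.

```python
import jax
import jax.numpy as jnp

def inner(model):
    # Only the second array actually changes; the first is passed through.
    return (model[0], model[1] + 1.0)

# donate_argnums=0 tells XLA it may reuse the buffers of the first argument
# for the outputs, so the untouched 1000x1000 array need not be copied.
# A donated input must not be used again after the call.
inner_donated = jax.jit(inner, donate_argnums=0)

model = (jnp.zeros((1000, 1000)), jnp.zeros((1, 1)))
for _ in range(100):
    # Rebinding `model` means the donated input is never reused afterwards.
    model = inner_donated(model)
```

The outer-jit workaround is already visible in the timings above: with `jit_outer=True`, XLA fuses the 100 inner calls and removes the redundant copies.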