jitting a vmapped function or vmapping a jitted function #20505
Which one is better practice? In my test, the vmap of a jitted function consistently performed better (almost a 2x speedup compared to jitting a vmapped function).
Hello,

Short answer: `jit` 'almost' always has to be the outer transformation.

Long answer: there are three small mistakes in your test.

**Asynchronous dispatch**

JAX runs everything asynchronously, so in your code the values are not guaranteed to be 'doubled' until you actually use them. This means:

```python
start1 = timer()
result1 = jitted_vmapped_fn(x_md)
end1 = timer()
# jitted_vmapped_fn might still be running
```

The correct thing to do is:

```python
start1 = timer()
result1 = jitted_vmapped_fn(x_md).block_until_ready()
end1 = timer()
```

For more info: async_dispatch
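The snippets in this thread assume a function `fn` and a `timer` that are not shown; a minimal self-contained sketch, where `fn` is a hypothetical stand-in that doubles its input, could look like this:

```python
import jax
from timeit import default_timer as timer

def fn(x):
    # hypothetical stand-in for the thread's (unshown) function
    return 2.0 * x

# jit on the outside, vmap on the inside
jitted_vmapped_fn = jax.jit(jax.vmap(fn))

x_md = jax.random.normal(jax.random.PRNGKey(0), (1024, 100))

# warm-up call so the timing below excludes compilation
jitted_vmapped_fn(x_md).block_until_ready()

start1 = timer()
result1 = jitted_vmapped_fn(x_md).block_until_ready()  # wait for the actual computation
end1 = timer()
print(f"Time for jit vmapped function: {end1 - start1}")
```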
**Don't profile the jit time**

Jitting more complex code takes more time: functions are jit-compiled the first time you run them, and subsequent executions run much faster. For example:

```python
start1 = timer()
result1 = jitted_vmapped_fn(x_md).block_until_ready()
end1 = timer()
print(f"Time for jit vmapped function: {end1-start1}")

start1 = timer()
result1 = jitted_vmapped_fn(x_md).block_until_ready()
end1 = timer()
print(f"Time for jit vmapped function second time: {end1-start1}")
```

gives this result:

```
Time for jit vmapped function: 2.795020341873169
Time for jit vmapped function second time: 0.01996016502380371
```

You see, much much faster!
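One caveat worth adding to the point above (not from the thread, just how `jax.jit` caching behaves): the compiled executable is cached per input shape and dtype, so calling with a new shape triggers a fresh compilation. A small sketch with an illustrative function:

```python
import jax
import jax.numpy as jnp

@jax.jit
def double(x):
    # illustrative stand-in function
    return 2.0 * x

a = double(jnp.ones((4,)))   # first call: traces and compiles for shape (4,)
b = double(jnp.ones((4,)))   # same shape/dtype: reuses the cached executable
c = double(jnp.ones((8,)))   # new shape: traces and compiles again
```

So when benchmarking, time the second call with the same input shape to measure pure execution.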
**Don't include CPU copy time**

You are using NumPy (CPU) arrays, so you are including the CPU-to-GPU copy time. Do this instead:

```python
# x_md = np.random.rand(10240, 1000)  # CPU array
# x_md = jnp.array(x_md)              # GPU array
x_md = jax.random.normal(jax.random.PRNGKey(0), (10240, 1000))  # array created directly on the GPU

start1 = timer()
result1 = jitted_vmapped_fn(x_md).block_until_ready()
end1 = timer()
print(f"Time for jit vmapped function: {end1-start1}")

start1 = timer()
result1 = jitted_vmapped_fn(x_md).block_until_ready()
end1 = timer()
print(f"Time for jit vmapped function second time: {end1-start1}")
```

This gives:

```
Time for jit vmapped function: 0.020798683166503906
Time for jit vmapped function second time: 0.0008347034454345703
```

So much faster!
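If the data really does start out as a NumPy array, an explicit transfer (here assumed to use `jax.device_put`, done before any timing) keeps the copy out of the compute measurement. A small sketch:

```python
import numpy as np
import jax

x_cpu = np.random.rand(1024, 100)   # host (CPU) array
x_dev = jax.device_put(x_cpu)       # explicit host-to-device transfer
x_dev.block_until_ready()           # wait for the transfer to finish
# ...then time the jitted computation on x_dev, with the copy excluded
```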
**Full comparison**

```python
x_md = jax.random.normal(jax.random.PRNGKey(0), (10240, 1000))  # array created directly on the GPU

start1 = timer()
result1 = jitted_vmapped_fn(x_md).block_until_ready()
end1 = timer()
print(f"Time for jit vmapped function: {end1-start1}")

start1 = timer()
result1 = jitted_vmapped_fn(x_md).block_until_ready()
end1 = timer()
print(f"Time for jit vmapped function second time: {end1-start1}")

jitted_fn = jax.jit(fn)
vmapped_jitted_fn = jax.vmap(jitted_fn)

start2 = timer()
result2 = vmapped_jitted_fn(x_md).block_until_ready()
end2 = timer()
print(f"Time for vmap jitted function: {end2-start2}")

start2 = timer()
result2 = vmapped_jitted_fn(x_md).block_until_ready()
end2 = timer()
print(f"Time for vmap jitted function second time: {end2-start2}")
```

Results:

```
Time for jit vmapped function: 0.022216796875
Time for jit vmapped function second time: 0.00091552734375
Time for vmap jitted function: 0.016040325164794922
Time for vmap jitted function second time: 0.0011944770812988281
```

On the compiled (second) runs, the jitted vmap is about 30% faster than the vmap of jit.
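It is also easy to sanity-check that both orderings compute the same values, so only the compilation strategy differs (again using a hypothetical `fn` in place of the one from the thread):

```python
import jax
import jax.numpy as jnp

def fn(x):
    # hypothetical per-example function; the thread's fn is not shown
    return jnp.sin(x) * 2.0

jit_of_vmap = jax.jit(jax.vmap(fn))   # jit outermost: XLA sees the whole batched computation
vmap_of_jit = jax.vmap(jax.jit(fn))   # vmap outermost

x = jax.random.normal(jax.random.PRNGKey(0), (8, 16))
a = jit_of_vmap(x).block_until_ready()
b = vmap_of_jit(x).block_until_ready()
```

Since the results match, the choice is purely about performance, which is why `jit` is usually recommended as the outer transformation.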