-
I'm currently working on an implementation of autoregressive decoding for transformers in JAX. When my models are small, everything works fine. However, as models become larger, jit compilation time grows. What is especially surprising is that jit compilation time for a model with the same number of transformer blocks and heads, but with large MLP dimensions, is significantly higher than for exactly the same model with small MLP dims. I would have imagined that jit complexity should be independent of tensor shapes if the computation graph is the same. I should also mention that the compilation time of a forward or a forward-backward pass is small and doesn't seem to depend on parameter dimensions. Is this something that can plausibly happen, or does it more likely indicate a bug in the jit implementation?
-
Here’s a way that you could write a function such that compilation time grows with the shape of the input:

```python
from jax import jit

@jit
def f(x):
    y = 0
    # Python-level loop: jit tracing unrolls it, so the traced program
    # contains one copy of the body per element along x.shape[0].
    for i in range(x.shape[0]):
        y += do_something(x)  # do_something: placeholder for an arbitrary array op
    return y
```

The reason for this is that JAX’s JIT unrolls Python control flow, so here the effective size of the program grows proportionally to the size of the input array, and compilation time grows with program size. But if your program consists only of simple array operations without this kind of shape-dependent Python control flow, I wouldn’t generally expect compilation time to change with the shape of the input.
TL;DR: it seems the problem was caused by the model weights being captured by the decoding function as a closure instead of being passed in as an argument, so from jit's perspective they were a huge constant and, I guess, the constant-folding optimization wasn't particularly happy about them.
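To illustrate the pattern, here is a minimal sketch of the two ways of handing weights to a jitted function; the names (`decode_closure`, `decode_arg`) and the plain matmul standing in for the real decoding step are hypothetical, not the original code:

```python
import jax
import jax.numpy as jnp

# Stand-in for the real model weights: one large matrix instead of a full transformer.
params = jnp.ones((4096, 4096))

# Closure-captured weights: `params` is baked into the jaxpr as a constant,
# which the compiler may try to constant-fold, inflating compile time.
@jax.jit
def decode_closure(tokens):
    return tokens @ params

# Weights passed as an argument: the compiler only sees their shape/dtype,
# not their values, so there is nothing to fold.
@jax.jit
def decode_arg(params, tokens):
    return tokens @ params

tokens = jnp.ones((1, 4096))
decode_closure(tokens)       # weights embedded as a constant
decode_arg(params, tokens)   # weights traced as an abstract argument
```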
How I found this out: