Replies: 1 comment
I've made some progress: after looking at a trace of the inference code for a small model with the TensorBoard memory profiler, I've discovered that XLA decided to transpose all the matrices of the fully connected layers, which constitute the majority of the parameters, thus effectively doubling memory consumption. It might have to do with the structure of the model: my dense params are of shape … What is the right way to control this?
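For anyone trying to reproduce that kind of trace: below is a minimal sketch of capturing a profile that TensorBoard's memory profiler can open. The `f` and `params` here are toy placeholders, not the actual model from this thread.

```python
import jax
import jax.numpy as jnp

# Placeholder model: a small stack of fully connected layers.
# The real `f` and `params` from this thread are assumed, not shown.
@jax.jit
def f(params, x):
    for w in params:
        x = jnp.tanh(x @ w)
    return x

params = [jnp.ones((1024, 1024), dtype=jnp.float32) for _ in range(4)]
x = jnp.ones((32, 1024), dtype=jnp.float32)

# Write a profiler trace that the TensorBoard profiler plugin
# (including its memory viewer) can load from this log directory.
with jax.profiler.trace("/tmp/jax-trace"):
    f(params, x).block_until_ready()

# Inspect with: tensorboard --logdir /tmp/jax-trace
```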
I'm currently facing the following problem:

I have a jitted JAX function, say `f(params, input) -> output`, which performs inference for a large model. The model is so large that it's only possible to store one copy of `params` in GPU memory.

According to my calculations, both the parameters and the intermediate tensors in `f` (assuming XLA discards them as soon as they aren't needed anymore) should fit into the memory of a GPU. However, when I actually run `f` on an input, XLA tries to allocate a huge chunk of memory (suspiciously similar in size to `params`, as if `f` were making a copy of the `params` argument), which it cannot do because there isn't enough memory.

I'm trying to figure out why my computation would require this chunk of memory. I've looked at the generated code and checked some obvious suspects, but none of them explains the allocation. So I need to dig deeper. Is there some way to figure out what XLA allocates this tensor for? I've tried finding something that can answer this question in the generated code (`lower(f).compile().as_text()`), but from what I can tell there is no information about allocations there.

Any suggestions appreciated.
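Not a full answer, but a sketch of two ways to get closer to allocation information than `as_text()` alone, again using a placeholder model. Note the assumptions: `memory_analysis()` on the compiled object is only available in newer JAX releases and may return `None` on some backends, and the device memory profile needs `pprof` to view.

```python
import jax
import jax.numpy as jnp

# Placeholder inference function and parameters (not the real model).
def f(params, x):
    for w in params:
        x = jnp.tanh(x @ w)
    return x

params = [jnp.ones((1024, 1024), dtype=jnp.float32) for _ in range(4)]
x = jnp.ones((32, 1024), dtype=jnp.float32)

jitted = jax.jit(f)
compiled = jitted.lower(params, x).compile()

# Optimized HLO, as already mentioned in the question:
# print(compiled.as_text())

# Aggregate memory statistics for the executable (argument, output
# and temporary buffer sizes), where the backend supports it.
print(compiled.memory_analysis())

# Snapshot of live device buffers after running the computation,
# in pprof format (view with `pprof --web /tmp/memory.prof`).
out = jitted(params, x)
out.block_until_ready()
jax.profiler.save_device_memory_profile("/tmp/memory.prof")
```

Separately, setting XLA's `--xla_dump_to` flag via the `XLA_FLAGS` environment variable dumps the optimized HLO modules to files, which can sometimes make it easier to grep for a buffer whose size matches the suspicious allocation.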