Replies: 2 comments 1 reply
-
Thanks for the question! Is this on GPU or TPU?
1 reply
-
Any progress?
-
Hi, in my mixture-of-experts transformer block I have a computation that looks like this:
To be more specific, my embeddings' shape/sharding is the following:
[batch, expert, capacity, embedding] : [data, None, None, None]
I'm performing the all-to-all by swapping axes 0 and 1 and resharding across the new 0th axis (expert); a minimal sketch follows the layout below.
[batch, expert, capacity, embedding] -> [expert, batch, capacity, embedding]
[expert, batch, capacity, embedding] : [data, None, None, None]
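A minimal JAX sketch of the layout and reshard described above (this is not the original snippet; the mesh, shapes, and names are illustrative assumptions):

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D mesh with a single "data" axis over all devices.
mesh = Mesh(np.array(jax.devices()), ("data",))
shard_dim0 = NamedSharding(mesh, P("data", None, None, None))

# [batch, expert, capacity, embedding] : [data, None, None, None]
# (batch and expert are assumed to be divisible by the device count)
x = jax.device_put(jnp.zeros((8, 8, 16, 128)), shard_dim0)

@jax.jit
def to_expert_major(x):
    # [batch, expert, capacity, embedding] -> [expert, batch, capacity, embedding]
    y = jnp.swapaxes(x, 0, 1)
    # Reshard so the new leading (expert) axis is split over "data"; under jit
    # XLA realizes this layout change as an all-to-all.
    return jax.lax.with_sharding_constraint(y, shard_dim0)

y = to_expert_major(x)
```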
In large-scale setups the all-to-all communication is expected to become very slow, so to overcome this I'm trying to pipeline the computation over the "capacity" axis and overlap communication with computation. I've implemented this simply by splitting the capacity axis by some pipeline factor, running the entire computation per chunk, and then stacking the chunks back together (sketched below).
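A hedged sketch of that capacity-axis pipelining, using the same 1-D "data" mesh as above; `expert_fn` and `num_chunks` are placeholders rather than the original code, and the function is meant to be traced under `jax.jit`:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(np.array(jax.devices()), ("data",))
shard_dim0 = NamedSharding(mesh, P("data", None, None, None))

def pipelined_moe_block(x, expert_fn, num_chunks=4):
    # x: [batch, expert, capacity, embedding]; capacity must be divisible by num_chunks.
    chunks = jnp.split(x, num_chunks, axis=2)
    outs = []
    for chunk in chunks:
        # Dispatch all-to-all: go expert-major and reshard dim 0 over "data".
        y = jax.lax.with_sharding_constraint(jnp.swapaxes(chunk, 0, 1), shard_dim0)
        y = expert_fn(y)  # per-expert computation on this capacity slice
        # Combine all-to-all: back to batch-major, resharded over "data" again.
        y = jax.lax.with_sharding_constraint(jnp.swapaxes(y, 0, 1), shard_dim0)
        outs.append(y)
    # Stack the capacity slices back together.
    return jnp.concatenate(outs, axis=2)
```

The intent is that the dispatch all-to-all of chunk i+1 can overlap with the expert compute of chunk i once the collectives run asynchronously.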

I've also enabled the XLA flags for the latency-hiding scheduler and async all-to-all communication (roughly as below).
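Roughly how such flags can be passed (assuming a GPU backend; the exact flag names, especially the async all-to-all one, are assumptions and have changed across XLA releases, so adjust for your version):

```python
# Must be set before JAX initializes its XLA backend.
import os

os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "")
    + " --xla_gpu_enable_latency_hiding_scheduler=true"
    + " --xla_gpu_enable_async_all_to_all=true"  # assumed flag name; varies by XLA version
)

import jax  # the backend picks up XLA_FLAGS on first initialization
```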
But when I check what is actually going on, I can see that XLA has fused all of the first all-to-all communications into a single blocking one, so no computation happens until it finishes. The second all-to-all, on the other hand, is well overlapped.
How can I avoid that? Is there a way to put boundaries on XLA's fusion mechanism, or somehow disable fusions for a specific part of my function?