Add async collectives RFC. #2897

Open
mwhittaker wants to merge 1 commit into openxla:main from mwhittaker:async_collectives_rfc

@mwhittaker
Member

No description provided.

@mwhittaker mwhittaker requested a review from GleasonK February 9, 2026 20:49
@mwhittaker
Member Author

@mattjj @hawkinsp

Member

@felixwqp felixwqp left a comment

Can I assume implicitly that this will only work for shard_map-based sharding, like how ragged_all_to_all is used?

@mwhittaker
Member Author

> Can I assume implicitly that this will only work for shard_map-based sharding, like how ragged_all_to_all is used?

I'm not sure. I haven't thought that far ahead. When I support these async collectives in JAX, though, I do plan on only supporting shard_map at first.

@mwhittaker mwhittaker force-pushed the async_collectives_rfc branch from 89da850 to 3926874 Compare March 13, 2026 18:08
@mwhittaker mwhittaker self-assigned this Mar 13, 2026
@fhoushmand
Member

fhoushmand commented Mar 17, 2026

Would this RFC extend to async dynamic-slice/dynamic-update-slice?
We want to add those to JAX as well. If this RFC covers it, great. If not, we can talk about extending the support.

@GleasonK
Member

> Would this RFC extend to async dynamic-slice/dynamic-update-slice?

This should naturally extend to these ops. I'm OK with bringing them into scope under the umbrella of "known ops that we want to have an async decomposition by a backend".

@GleasonK GleasonK requested a review from fhoushmand March 20, 2026 20:10

This RFC introduces an `async_start` op and an `async_done` op that allow you to
run a collective asynchronously. We also introduce a new future type (e.g.,
`future<tensor<2xf32>>`) to represent the output of a start operation.

Can we add some details on how "in the future we are likely to consider adding scheduling dependencies between async ops and other ops to enforce an execution ordering, but in the meantime async ops are used to denote that a backend should use an async decomposition for a given op"?
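
For readers new to the start/done pattern in the RFC excerpt, here is a minimal Python sketch of the idea: a start call returns a future immediately, and a done call waits for the value. It is only an analogy built on the standard library's `concurrent.futures`. The Python functions `async_start`, `async_done`, and `all_reduce_sum` are hypothetical stand-ins written for this illustration; they are not StableHLO ops, not a JAX API, and not the RFC's proposed syntax.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-ins for illustration only; not StableHLO or JAX APIs.
_pool = ThreadPoolExecutor(max_workers=1)


def all_reduce_sum(shards):
    """Toy 'collective': element-wise sum across per-device shards."""
    return [sum(vals) for vals in zip(*shards)]


def async_start(collective, *args) -> Future:
    """Launch the collective in the background and return a future at once."""
    return _pool.submit(collective, *args)


def async_done(fut: Future):
    """Block until the collective has finished and return its result."""
    return fut.result()


shards = [[1.0, 2.0], [3.0, 4.0]]          # two "devices", two elements each
fut = async_start(all_reduce_sum, shards)  # plays the role of future<tensor<2xf32>>
local = sum(x * x for x in shards[0])      # independent work that may overlap
result = async_done(fut)                   # the reduced value is available here
```

The point of splitting the operation in two is that independent work can be placed between the start and the done. Nothing in this sketch models scheduling dependencies between async ops and other ops.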

@mattjj mattjj left a comment

Thanks!

7 participants