Multiple device meshes within the same jitted computation #19883

hr0nix · 2024-02-19T22:26:25Z

Imagine the following hypothetical scenario. I have 16 devices. I also have some jitted computation, some parts of which I would like to shard as if my mesh was 8x2, but other parts should be sharded as if it was 4x4.

Is it true that there is currently no way to achieve this because jit assumes a single fixed device mesh? If so, what can I do to work around this limitation?

I can give some rationale for why I'd need to shard the computations in this fashion if needed, but basically it comes down to minimizing communications in a MoE-like model.

hr0nix · 2024-02-23T17:33:09Z

OK, I've realized that I don't have to set up a mesh globally. I can specify a mesh with each sharding instead, which lets me pass different meshes for different parts of the computation.
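
For anyone landing here later, a minimal sketch of what this might look like. It is an illustration under assumptions, not code from the thread: the axis names `"x"` and `"y"`, the function `two_stage`, and the toy arithmetic are all hypothetical. It also assumes both meshes are built by reshaping the same flat device list, so they cover the same devices in the same order; I have not checked this against every JAX release. The fallback mesh shapes exist only so the sketch runs on machines without 16 devices.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())
n = devices.size

# With 16 devices these are the 8x2 and 4x4 layouts from the question.
# The fallback shapes (an assumption for illustration) keep the sketch
# runnable on any device count, including a single CPU.
shape_a, shape_b = ((8, 2), (4, 4)) if n == 16 else ((n, 1), (1, n))

# Both meshes reshape the same flat device list, so they contain the
# same devices in the same order, just arranged differently.
mesh_a = Mesh(devices.reshape(shape_a), ("x", "y"))
mesh_b = Mesh(devices.reshape(shape_b), ("x", "y"))

# Each sharding carries its own mesh; no global mesh context is set.
sharding_a = NamedSharding(mesh_a, P("x", "y"))
sharding_b = NamedSharding(mesh_b, P("x", "y"))


@jax.jit
def two_stage(v):
    # First part of the computation, constrained to the layout of mesh_a.
    h = jax.lax.with_sharding_constraint(v * 2.0, sharding_a)
    # Second part, constrained to the layout of mesh_b.
    return jax.lax.with_sharding_constraint(h + 1.0, sharding_b)


# Pick a side length divisible by every mesh axis size used above.
side = 8 * n
x = jnp.arange(side * side, dtype=jnp.float32).reshape(side, side)
y = two_stage(x)
```

Inspecting `y.sharding` should show which layout the output ended up with. Moving data between the two layouts inside the jitted function implies a resharding step, so whether this actually reduces communication depends on the model.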
