  1. is the scan operation distributed across the mesh devices?

Yes. Each device in the mesh runs the shard_mapped function (ring_attention_standard in this case). In other words, you pass a "per-device" function to shard_map, and exactly what is in that function runs on each device. This is unlike jit, where you pass a "global" function that is then automatically rewritten to run across multiple devices. So the scan runs on each device, but it is not automatically distributed; it just runs as specified.
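To make the "per-device function" point concrete, here is a minimal sketch. It is not the ring_attention_standard code from the question: `per_device_running_sum`, the axis name `"i"`, and the chunk size of 4 are made up for illustration, and the import fallback assumes shard_map lives either at `jax.shard_map` (newer releases) or `jax.experimental.shard_map` (older ones). Each device scans only over its own shard, so the running sum restarts at every shard boundary instead of spanning the whole array.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax import lax
from jax.sharding import Mesh, PartitionSpec as P

try:
    from jax import shard_map  # newer JAX releases
except ImportError:
    from jax.experimental.shard_map import shard_map  # older JAX releases

# 1-D mesh over whatever devices are available (works with a single CPU too).
devices = np.array(jax.devices())
n_dev = devices.size
mesh = Mesh(devices, axis_names=("i",))  # "i" is an illustrative axis name


def per_device_running_sum(x_local):
    """Hypothetical per-device function: x_local is this device's shard only."""

    def step(carry, x):
        carry = carry + x
        return carry, carry

    # Derive the initial carry from the local shard so that it has the same
    # per-device type as the carry produced by `step`.
    init = x_local[0] * 0
    _, running = lax.scan(step, init, x_local)
    return running


# Global array of 4 elements per device, split along axis 0 across the mesh.
x = jnp.arange(4 * n_dev, dtype=jnp.float32)
sharded_scan = shard_map(
    per_device_running_sum, mesh=mesh, in_specs=P("i"), out_specs=P("i")
)
y = sharded_scan(x)

# Reference: an independent running sum inside each device's chunk.
expected = jnp.cumsum(x.reshape(n_dev, 4), axis=1).reshape(-1)
```

If the same function were instead written as a "global" function under jit, the scan would be over the full array and the running sum would not restart per shard; with shard_map you get exactly the per-shard behaviour you wrote.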

  1. if so, is the scan sequential, i.e. does each device wait for the output of the previous device?

It is sequential. The devices can run independently of each other though, …

skye Apr 22, 2024
Maintainer

Answer selected by jpilaul
Category
Q&A
2 participants