Pipeline parallelism hello world #17246
Unanswered
dionhaefner asked this question in Q&A
Replies: 2 comments 5 replies
-
I think you have to rely on something like alpa for now.
1 reply
-
Thank you for your question. Indeed, there is no ergonomic recipe for this today; however, a reference implementation you might want to look at is in Praxis. I don't know enough about your use case; do you mind sharing more details?
4 replies
-
I'm experimenting with pipeline parallelism, where subsequent computations are executed on different devices.
I haven't been able to jit a simple function that takes parameters living on different devices. I assume this needs some explicit sharding information, but I got confused by the tutorials, which seem to be written for the more advanced case of sharding individual axes rather than placing entire arrays on different devices.
Example code:
Error:
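Neither the original snippet nor the error message survived the page load. As an illustration only (not the poster's code), a minimal sketch of the scenario: a single `jax.jit` over arguments committed to different devices fails, and one workaround is to split the pipeline into per-stage jitted functions with an explicit activation transfer between them. All names (`stage1`, `w1`, etc.) are hypothetical:

```python
import jax
import jax.numpy as jnp

devices = jax.devices()
d0, d1 = devices[0], devices[-1]  # falls back to the same device on a 1-device host

# Commit each stage's parameters to its own device.
w1 = jax.device_put(jnp.ones((4, 4)), d0)  # stage-1 weights on d0
w2 = jax.device_put(jnp.ones((4, 4)), d1)  # stage-2 weights on d1

# One jitted function per pipeline stage; each runs on the device
# its (committed) inputs live on.
stage1 = jax.jit(lambda x, w: x @ w)
stage2 = jax.jit(lambda h, w: jax.nn.relu(h) @ w)

x = jax.device_put(jnp.ones((2, 4)), d0)
h = stage1(x, w1)          # runs on d0
h = jax.device_put(h, d1)  # explicit activation transfer between stages
y = stage2(h, w2)          # runs on d1
```

With more than one device, passing `x` (on `d0`) and `w2` (on `d1`) to a single jitted function is what triggers the incompatible-devices complaint; the per-stage split above sidesteps it at the cost of a host-driven transfer between stages.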