[docs] CP #12331
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks! I think we should merge this PR in #11941.
I will try to gather some benchmarks to include in the docs.
```py
pipeline.transformer.parallelize(config=ContextParallelConfig(ring_degree=2))
pipeline.transformer.set_attention_backend("flash")
```
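To make the `ring_degree` argument above more concrete, here is a minimal, hypothetical sketch of the data movement behind ring attention: each of `ring_degree` devices holds one sequence shard of K/V, and over `ring_degree` steps each rank receives its neighbor's shard, so every query shard eventually attends to every key/value shard. The function name and schedule representation below are illustrative, not the diffusers implementation.

```python
def ring_schedule(ring_degree):
    """Return, for each step, which original KV shard each rank holds.

    At step s, rank r holds the shard originally owned by (r - s) mod ring_degree,
    i.e. shards rotate one hop around the ring per step.
    """
    schedule = []
    for step in range(ring_degree):
        schedule.append([(rank - step) % ring_degree for rank in range(ring_degree)])
    return schedule

# With ring_degree=2: step 0 -> ranks hold their own shards [0, 1],
# step 1 -> shards have rotated to [1, 0], so both ranks have seen both shards.
for step, holders in enumerate(ring_schedule(2)):
    print(f"step {step}: rank -> shard {holders}")
```

After all `ring_degree` steps, every rank has attended to the full sequence, which is why the memory per device shrinks while the result matches full attention.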
Is it better to call parallelize() after loading the model, or is it better to pass a parallel config when initializing the model? Or are both approaches same?
Oh, I think it's just enable_parallelism() now.
[`ContextParallelConfig`] also supports Ulysses Attention through the `ulysses_degree` argument. This determines the number of devices to use for Ulysses Attention.

```py
pipeline.transformer.parallelize(config=ContextParallelConfig(ulysses_degree=2))
```
Where is ParallelConfig used?
I didn't include ParallelConfig because it seems like you just pass ContextParallelConfig to it. So I opted to use ContextParallelConfig directly.
Is the ParallelConfig class meant to support other parallelism strategies not yet implemented?
@sayakpaul So, the intention with ParallelConfig is to make it easy to support different kinds of parallelism. If you pass just a ContextParallelConfig, it automatically creates a ParallelConfig from it.
I think the current example is sufficient, but we can of course revise it once more parallelisms are supported natively.
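The auto-wrapping behavior described above can be sketched in a few lines. This is an illustrative sketch only — the class fields and the `normalize_config` helper are hypothetical, not the actual diffusers code — but it shows the pattern of accepting either a bare `ContextParallelConfig` or a full `ParallelConfig`:

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class ContextParallelConfig:
    # degrees for the two context-parallel strategies discussed in this thread
    ring_degree: int = 1
    ulysses_degree: int = 1

@dataclass
class ParallelConfig:
    # umbrella config; other parallelism configs could be added as fields later
    context_parallel_config: Optional[ContextParallelConfig] = None

def normalize_config(
    config: Union[ContextParallelConfig, ParallelConfig],
) -> ParallelConfig:
    """Accept either config type; wrap a bare ContextParallelConfig automatically."""
    if isinstance(config, ContextParallelConfig):
        return ParallelConfig(context_parallel_config=config)
    return config

cfg = normalize_config(ContextParallelConfig(ring_degree=2))
print(type(cfg).__name__)  # ParallelConfig
```

This kind of wrapper keeps the user-facing call simple today while leaving room for additional parallelism configs on `ParallelConfig` later.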
Sure, thanks! Could we add an example with ParallelConfig as well? 👀
Also, I don't see any parallelize() method in #11941
Sorry for the delay! Please LMK if I can help with anything :) The CP PR is currently blocked because I can't make updates to it (the branch is in the diffusers repo and not a personal fork, so I can't push changes). Hopefully someone can address the tests there and we can proceed here too.
Thanks @stevhliu! LGTM in general, but the examples are a bit outdated. The latest inference snippet removes enable_parallelism and handles that internally.
The final code looks like this: #11941 (comment)
Sorry for the inconvenience! I forgot to update the description of that PR
Ah, my bad, I missed that! The code snippet should be updated now. Let me know if there are any more changes :)
[`ContextParallelConfig`] supports Ulysses Attention through the `ulysses_degree` argument. This determines how many devices to use for Ulysses Attention.

```py
pipeline.transformer.parallelize(config=ContextParallelConfig(ulysses_degree=2))
```
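For context on what `ulysses_degree` controls: Ulysses attention shards the attention *heads* across `ulysses_degree` devices (an all-to-all exchange regathers the full sequence per head before attention is computed). The sketch below illustrates only the head assignment; the function name is hypothetical and this is not the diffusers implementation.

```python
def ulysses_head_assignment(num_heads, ulysses_degree):
    """Map each rank to the contiguous block of attention heads it computes."""
    assert num_heads % ulysses_degree == 0, "heads must divide evenly across devices"
    per_device = num_heads // ulysses_degree
    return {
        rank: list(range(rank * per_device, (rank + 1) * per_device))
        for rank in range(ulysses_degree)
    }

# e.g. 8 heads over 2 devices: rank 0 computes heads 0-3, rank 1 computes heads 4-7
print(ulysses_head_assignment(8, 2))
```

This also explains the usual constraint that the number of attention heads must be divisible by `ulysses_degree`.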
```diff
- pipeline.transformer.parallelize(config=ContextParallelConfig(ulysses_degree=2))
+ pipeline.transformer.enable_parallelism(config=ContextParallelConfig(ulysses_degree=2))
```
Just one last change and this should be good I think. Off to you @sayakpaul
Looking good. I would also link the distributed_inference doc from parallel.md.
Thanks for iterating!


companion docs for context parallelism with Ring/Ulysses attention (see #11941)
cc @a-r-r-o-w