[docs] Regional compilation docs #11556
Merged
6 commits:
- b87a962 add regional compilation docs. (sayakpaul)
- 581cba4 minor. (sayakpaul)
- ea889c1 Merge branch 'main' into regional-compilation-docs (sayakpaul)
- c7eb7fe Merge branch 'main' into regional-compilation-docs (sayakpaul)
- 8881dc6 reviwer feedback. (sayakpaul)
- bacd403 Update docs/source/en/optimization/torch2.0.md (sayakpaul)
@@ -78,6 +78,24 @@ For more information and different options about `torch.compile`, refer to the [

> [!TIP]
> Learn more about other ways PyTorch 2.0 can help optimize your model in the [Accelerate inference of text-to-image diffusion models](../tutorials/fast_diffusion) tutorial.

### Regional compilation
Compiling the whole model usually creates a large problem space for optimization. Models are often composed of multiple repeated blocks. [Regional compilation](https://pytorch.org/tutorials/recipes/regional_compilation.html) compiles the repeated block first (a transformer encoder block, for example) so that the compiler can reuse its cached/optimized generated code for the other blocks, often massively reducing the cold-start compilation time observed on the first inference call.
Enabling regional compilation might require simple yet intrusive changes to the modeling code. However, 🤗 Accelerate provides a utility [`compile_regions()`](https://huggingface.co/docs/accelerate/main/en/usage_guides/compilation#how-to-use-regional-compilation) which automatically _only_ compiles the repeated blocks of the provided `nn.Module`.
```py
# Make sure you're on the latest `accelerate`: `pip install -U accelerate`.
from accelerate.utils import compile_regions

pipe.unet = compile_regions(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

Review comments on this line: "Merge after" / "released !"
As you may have noticed, `compile_regions()` takes the same arguments as `torch.compile()`, allowing flexibility.
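The cold-start saving can be illustrated with a framework-free toy sketch (a hypothetical `compile_region` stand-in, not the real `torch.compile` or Accelerate API): because the repeated blocks all share the same code, "compiling" the first one populates a cache that every identical block reuses.

```python
compile_cache = {}
compile_calls = 0

def compile_region(fn):
    """Pretend-compiler: caches compiled code keyed by the function's code object."""
    global compile_calls
    key = fn.__code__
    if key not in compile_cache:
        compile_calls += 1          # expensive cold-start work would happen here
        compile_cache[key] = fn     # stand-in for optimized generated code
    return compile_cache[key]

def block(x):  # one repeated block; every region shares this code
    return x * 2 + 1

# A "model" made of 8 identical regions: the compile cost is paid only once.
regions = [compile_region(block) for _ in range(8)]

x = 3
for region in regions:
    x = region(x)

print(compile_calls)  # → 1
```

With eight identical regions the expensive compile step runs only once; in real regional compilation what is reused is the compiler's generated kernels rather than the Python function itself.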
## Benchmark

We conducted a comprehensive benchmark with PyTorch 2.0's efficient attention implementation and `torch.compile` across different GPUs and batch sizes for five of our most used pipelines. The code is benchmarked on 🤗 Diffusers v0.17.0.dev0 to optimize `torch.compile` usage (see [here](https://github.com/huggingface/diffusers/pull/3313) for more details).
No, we actually compile the rest of the model as well 😅 I found out in my post that some people thought only the encoder/decoder block would be compiled in regional compilation, which is not true.
I changed the docs to be more explicit: huggingface/accelerate#3572 (comment)
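A toy sketch of the behaviour described in this comment (hypothetical names, not the actual Accelerate internals): every distinct region is compiled once, so the repeated block costs a single compilation and the remaining head is also compiled rather than being left in eager mode.

```python
compiled = {}

def compile_region(fn):
    # Cache by code object: identical blocks share one compiled artifact.
    if fn.__code__ not in compiled:
        compiled[fn.__code__] = fn  # stand-in for real code generation
    return compiled[fn.__code__]

def block(x):   # repeated transformer-like block
    return x + 1

def head(x):    # task-specific head: different code, still worth compiling
    return x * 10

model = [block] * 6 + [head]
model = [compile_region(fn) for fn in model]

# Exactly two compilations: one for the repeated block, one for the head.
print(len(compiled))  # → 2
```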
👁️ But https://docs.pytorch.org/tutorials/recipes/regional_compilation.html suggests a completely different recipe, no? No full compilation, only regional, and I always thought that is what should be done.
What am I missing?
Regional compilation is simply: cut into regions and then compile those regions. I didn't compare the two approaches, but I believe in the context of the PyTorch tutorial they were simply trying to reduce cold start, not trying to keep inference optimized as well (they didn't benchmark inference).
So, is my understanding right, or is it still fragmented?
Do you think providing an option to NOT compile the rest of the blocks could still make sense?
Yes, that is how it works!
It doesn't make sense to me personally, since you would miss out on the tuning of the task-specific head. Do you have any specific cases where we don't want to compile the rest of the model?