[docs] Regional compilation docs #11556
@@ -78,6 +78,23 @@ For more information and different options about `torch.compile`, refer to the [

> [!TIP]
> Learn more about other ways PyTorch 2.0 can help optimize your model in the [Accelerate inference of text-to-image diffusion models](../tutorials/fast_diffusion) tutorial.

### Regional compilation

Compiling the whole model usually presents a large problem space for optimization. Models are often composed of multiple repeated blocks. [Regional compilation](https://pytorch.org/tutorials/recipes/regional_compilation.html) compiles one repeated block first (a transformer encoder block, for example) so that the compiler can reuse its cached, optimized code for the remaining blocks, reducing (often massively) the cold-start compilation time observed on the first inference call.
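The mechanism can be sketched by hand. The following is a minimal illustration (not the Accelerate implementation) assuming PyTorch 2.x, using a toy `Block` class as a stand-in for a repeated transformer block; `backend="eager"` is used only to keep the sketch cheap to run without a full compiler toolchain.

```python
# Minimal sketch of regional compilation, assuming PyTorch 2.x.
# torch.compile caches compiled artifacts per code object, so compiling one
# instance of a repeated block class warms the cache for all the others;
# only the first block pays the cold-start cost.
import torch
import torch.nn as nn

class Block(nn.Module):  # stand-in for a repeated transformer block
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.ff(x)

model = nn.Sequential(*[Block(64) for _ in range(4)])

# Regional compilation: wrap each repeated block instead of the whole model.
# backend="eager" keeps this example lightweight; drop it for real use.
for i, block in enumerate(model):
    model[i] = torch.compile(block, backend="eager")

out = model(torch.randn(2, 64))
print(out.shape)  # torch.Size([2, 64])
```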
Enabling regional compilation might require simple yet intrusive changes to the modeling code. However, 🤗 Accelerate provides a utility, [`compile_regions()`](https://huggingface.co/docs/accelerate/main/en/usage_guides/compilation#how-to-use-regional-compilation), which automatically compiles the repeated blocks of the provided `nn.Module` and separately compiles its remaining, non-repeating parts. This reduces not only the cold-start compilation time but also inference latency.

```py
# Make sure you're on the latest `accelerate`: `pip install -U accelerate`.
from accelerate.utils import compile_regions

pipe.unet = compile_regions(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

> **Review comment:** Merge after […] released!
As you may have noticed, `compile_regions()` takes the same arguments as `torch.compile()`, allowing flexibility.
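The argument forwarding works as you would expect from a thin wrapper. The sketch below is illustrative only (the function name `compile_blocks` and its behavior are assumptions, not Accelerate's implementation): it shows how a `compile_regions()`-style utility can pass `torch.compile()` keyword arguments through unchanged.

```python
# Hedged sketch of forwarding torch.compile kwargs from a wrapper
# (illustrative; not Accelerate's actual compile_regions implementation).
import torch
import torch.nn as nn

def compile_blocks(module: nn.Sequential, **compile_kwargs) -> nn.Sequential:
    """Compile each child block, forwarding torch.compile kwargs unchanged."""
    for name, child in module.named_children():
        setattr(module, name, torch.compile(child, **compile_kwargs))
    return module

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
# Same keyword arguments torch.compile accepts; backend="eager" keeps this
# cheap to run in the sketch.
model = compile_blocks(model, backend="eager", dynamic=False)
print(model(torch.randn(1, 8)).shape)  # torch.Size([1, 8])
```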
## Benchmark

We conducted a comprehensive benchmark with PyTorch 2.0's efficient attention implementation and `torch.compile` across different GPUs and batch sizes for five of our most used pipelines. The code was benchmarked on 🤗 Diffusers v0.17.0.dev0 to optimize `torch.compile` usage (see [here](https://github.com/huggingface/diffusers/pull/3313) for more details).
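For intuition about what such a benchmark measures, here is a hedged sketch of a typical latency-measurement loop (illustrative only; this is not the actual benchmarking script from the linked PR). Warmup iterations absorb the compilation cold start so steady-state latency is reported.

```python
# Illustrative latency measurement for a compiled model (assumption: not the
# actual Diffusers benchmark code). Warmup runs absorb torch.compile's
# cold-start cost; CUDA synchronization ensures accurate GPU timings.
import time
import torch

def measure_latency(fn, *args, warmup=3, runs=10):
    for _ in range(warmup):  # warmup absorbs the compilation cold start
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs  # mean seconds per call

layer = torch.nn.Linear(64, 64)  # stand-in for a compiled pipeline component
latency = measure_latency(layer, torch.randn(1, 64))
print(f"{latency * 1e6:.1f} us per call")
```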