
Commit 54cd4a3

cjluo-nv and kevalmorabia97 authored and committed
nvidia-modelopt 0.13 examples release
1 parent 06a1553 commit 54cd4a3


67 files changed: +2,401 additions, −471 deletions

README.md

Lines changed: 10 additions & 3 deletions
@@ -16,6 +16,7 @@

## Latest News

+- \[2024/06/03\] Model Optimizer now has an experimental feature to deploy to vLLM as part of our effort to support popular deployment frameworks. Check out the workflow [here](./llm_ptq/README.md#deploy-fp8-quantized-model-using-vllm)
- \[2024/05/08\] [Announcement: Model Optimizer Now Formally Available to Further Accelerate GenAI Inference Performance](https://developer.nvidia.com/blog/accelerate-generative-ai-inference-performance-with-nvidia-tensorrt-model-optimizer-now-publicly-available/)
- \[2024/03/27\] [Model Optimizer supercharges TensorRT-LLM to set MLPerf LLM inference records](https://developer.nvidia.com/blog/nvidia-h200-tensor-core-gpus-and-nvidia-tensorrt-llm-set-mlperf-llm-inference-records/)
- \[2024/03/18\] [GTC Session: Optimize Generative AI Inference with Quantization in TensorRT-LLM and TensorRT](https://www.nvidia.com/en-us/on-demand/session/gtc24-s63213/)
@@ -45,7 +46,7 @@ Model Optimizer is available for free for all developers on [NVIDIA PyPI](https:

### [PIP](https://pypi.org/project/nvidia-modelopt/)

```bash
-pip install "nvidia-modelopt[all]~=0.11.0" --extra-index-url https://pypi.nvidia.com
+pip install "nvidia-modelopt[all]~=0.13.0" --extra-index-url https://pypi.nvidia.com
```

See the [installation guide](https://nvidia.github.io/TensorRT-Model-Optimizer/getting_started/2_installation.html) for more fine-grained control over the installation.
@@ -67,6 +68,8 @@ docker run --gpus all -it --shm-size 20g --rm docker.io/library/modelopt_example
python -c "import modelopt"
```

+Alternatively, for PyTorch you can also use the [NVIDIA NGC PyTorch container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags), which comes with Model Optimizer pre-installed starting from the 24.06 PyTorch container. Make sure to update Model Optimizer to the latest version if it is not already.
+
## Techniques

### Quantization
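For reference, a minimal shell sketch of that container workflow (an editor's illustration, not part of the commit; the `nvcr.io/nvidia/pytorch:24.06-py3` tag follows standard NGC naming and should be adjusted to the latest release):

```bash
# Start the NGC PyTorch container (24.06 is the first tag with Model Optimizer pre-installed)
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:24.06-py3
# Inside the container, update Model Optimizer to the latest release
pip install -U "nvidia-modelopt[all]~=0.13.0" --extra-index-url https://pypi.nvidia.com
```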
@@ -79,8 +82,12 @@ Sparsity is a technique to further reduce the memory footprint of deep learning

## Examples

-- [PTQ for LLMs](./llm_ptq/README.md) covers how to use Post-training quantization (PTQ) for popular pre-trained [NVIDIA NeMo](https://github.com/NVIDIA/NeMo) and [Hugging Face](https://huggingface.co/docs/hub/en/models-the-hub) models, export to [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) for deployment.
-- [PTQ for Diffusers](./diffusers/README.md) walks through how to quantize a diffusion model with FP8 or INT8, export to ONNX, and deploy with [TensorRT](https://github.com/NVIDIA/TensorRT/tree/release/10.0/demo/Diffusion). The Diffusers example in this repo is complementary to the [demoDiffusion example in TensorRT repo](https://github.com/NVIDIA/TensorRT/tree/release/9.3/demo/Diffusion#introduction) and includes FP8 plugins as well as the latest updates on INT8 quantization.
+- [PTQ for LLMs](./llm_ptq/README.md) covers how to use Post-training quantization (PTQ) and export to [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) for deployment for popular pre-trained models from frameworks like
+  - [Hugging Face](https://huggingface.co/docs/hub/en/models-the-hub)
+  - [NVIDIA NeMo](https://github.com/NVIDIA/NeMo)
+  - [NVIDIA Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
+  - [Medusa](https://github.com/FasterDecoding/Medusa)
+- [PTQ for Diffusers](./diffusers/quantization/README.md) walks through how to quantize a diffusion model with FP8 or INT8, export to ONNX, and deploy with [TensorRT](https://github.com/NVIDIA/TensorRT/tree/release/10.0/demo/Diffusion). The Diffusers example in this repo is complementary to the [demoDiffusion example in TensorRT repo](https://github.com/NVIDIA/TensorRT/tree/release/10.0/demo/Diffusion#introduction) and includes FP8 plugins as well as the latest updates on INT8 quantization.
- [QAT for LLMs](./llm_qat/README.md) demonstrates the recipe and workflow for Quantization-aware Training (QAT), which can further preserve model accuracy at low precisions (e.g., INT4, or 4-bit in [NVIDIA Blackwell platform](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/)).
- [Sparsity for LLMs](./llm_sparsity/README.md) shows how to perform Post-training Sparsification and Sparsity-aware fine-tuning on a pre-trained Hugging Face model.
- [ONNX PTQ](./onnx_ptq/README.md) shows how to quantize the ONNX models in INT4 or INT8 quantization mode. The examples also include the deployment of quantized ONNX models using TensorRT.
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# Cache Diffusion

## News

- [Utilizing DeepCache to Accelerate Stable Diffusion-XL Benchmarks in MLPerf Yields Leading Results](https://developer.nvidia.com/blog/nvidia-h200-tensor-core-gpus-and-nvidia-tensorrt-llm-set-mlperf-llm-inference-records/)

## Introduction

| Supported Framework | Supported Models |
|----------|----------|
| **PyTorch** | [**PixArt-α**](https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS), [**Stable Diffusion - XL**](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), [**SVD**](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) |
| **TensorRT** | **WIP** |

Cache Diffusion methods, such as [DeepCache](https://arxiv.org/abs/2312.00858), [Block Caching](https://arxiv.org/abs/2312.03209) and [T-Gate](https://arxiv.org/abs/2404.02747), optimize performance by reusing cached outputs from previous steps instead of recalculating them. This **training-free** caching approach is compatible with a variety of models, like **DiT** and **UNet**, enabling considerable acceleration without compromising quality.

<p align="center">
<img src="./assets/sdxl_cache.png" width="900"/>
</p>
<p align="center">
This diagram shows the default SDXL cache compute graph used in this example.
A significant speedup is achieved by skipping certain blocks at specific steps.
</p>
## Quick Start

1. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

2. Refer to the provided [example.ipynb](./example.ipynb) for detailed instructions on using cache diffusion.

Using our API, users can create various compute graphs by simply adjusting the parameters. For instance, the default configuration for SDXL is:

```python
SDXL_DEFAULT_CONFIG = [
    {
        "wildcard_or_filter_func": lambda name: "up_blocks.2" not in name,
        "select_cache_step_func": lambda step: (step % 2) != 0,
    }
]

cachify.prepare(pipe, num_inference_steps, SDXL_DEFAULT_CONFIG)
```
Two parameters are essential: `wildcard_or_filter_func` and `select_cache_step_func`.

`wildcard_or_filter_func`: This can be a **str** or a **function**. If a module matches the given string or filter function, the cache operation is applied to it. For example, the string `*up_blocks*` matches every module name containing `up_blocks` (strings are matched with `fnmatch`), so all of those modules will be cached. If you pass a function instead, each module name is passed to your function, and the module is cached if the function returns True.

`select_cache_step_func`: During inference, the code checks at each step whether to perform the cache operation, based on the `select_cache_step_func` you provided. If `select_cache_step_func(current_step)` returns True, the module is cached at that step; otherwise, it is not.

Multiple configurations can be set up, but make sure each `wildcard_or_filter_func` matches what you intend. If you pass more than one configuration with the same `wildcard_or_filter_func`, the later one in the list overwrites the earlier ones. A combined example is sketched below.
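For illustration, a minimal sketch of a two-entry configuration mixing the string and function forms. This is an editor's sketch, not part of the commit: the `cache_diffusion.cachify` import path, the `mid_block` pattern, and the step counts are assumptions.

```python
import torch
from diffusers import StableDiffusionXLPipeline

from cache_diffusion import cachify  # assumed import path for this example

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

num_inference_steps = 30
CUSTOM_CONFIG = [
    {
        # str form: an fnmatch wildcard; matches every module name containing "up_blocks"
        "wildcard_or_filter_func": "*up_blocks*",
        # cache on odd steps, recompute on even steps
        "select_cache_step_func": lambda step: (step % 2) != 0,
    },
    {
        # function form: match mid-block modules (hypothetical pattern), cache 2 of every 3 steps
        "wildcard_or_filter_func": lambda name: "mid_block" in name,
        "select_cache_step_func": lambda step: (step % 3) != 0,
    },
]

cachify.prepare(pipe, num_inference_steps, CUSTOM_CONFIG)
image = pipe("a photo of a corgi", num_inference_steps=num_inference_steps).images[0]
```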
## Demo

The following demo images were generated with `PyTorch==2.3.0` on a single Ada 6000 GPU. TensorRT support will be available in the next ModelOpt release.

Compared with naively reducing the number of generation steps, cache diffusion achieves the same speedup with much better image quality, close to that of the reference image. If the image quality does not meet your needs or product requirements, you can replace the default configuration with your own customized settings.

### Stable Diffusion - XL

<p align="center">
<img src="./assets/SDXL_Cache_Diffusion_Img.png" />
</p>
Binary image assets added (1.52 MB and 445 KB).
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: MIT
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

import fnmatch

from diffusers.models.attention import FeedForward
from diffusers.models.attention_processor import Attention
from diffusers.models.resnet import ResnetBlock2D, TemporalResnetBlock
from diffusers.pipelines.pixart_alpha.pipeline_pixart_alpha import PixArtAlphaPipeline
from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl import (
    StableDiffusionXLPipeline,
)
from diffusers.pipelines.stable_video_diffusion.pipeline_stable_video_diffusion import (
    StableVideoDiffusionPipeline,
)

from .module import CachedModule
from .utils import replace_module

SUPPORTED_METHODS = {PixArtAlphaPipeline, StableDiffusionXLPipeline, StableVideoDiffusionPipeline}


def cachify(model, num_inference_steps, config_list):
    # Wrap every matching Attention/ResNet/FeedForward block in a CachedModule.
    for name, module in model.named_modules():
        for config in config_list:
            if _pass(name, config["wildcard_or_filter_func"]) and isinstance(
                module, (Attention, ResnetBlock2D, TemporalResnetBlock, FeedForward)
            ):
                replace_module(
                    model,
                    name,
                    CachedModule(module, num_inference_steps, config["select_cache_step_func"]),
                )


def disable(pipe):
    # Turn caching off for every CachedModule in the pipeline's model.
    model = get_model(pipe)
    for _, module in model.named_modules():
        if isinstance(module, CachedModule):
            module.disable_cache()


def enable(pipe):
    # Turn caching back on for every CachedModule in the pipeline's model.
    model = get_model(pipe)
    for _, module in model.named_modules():
        if isinstance(module, CachedModule):
            module.enable_cache()


def _pass(name, wildcard_or_filter_func):
    # A str is treated as an fnmatch wildcard; a callable is applied to the module name.
    if isinstance(wildcard_or_filter_func, str):
        return fnmatch.fnmatch(name, wildcard_or_filter_func)
    elif callable(wildcard_or_filter_func):
        return wildcard_or_filter_func(name)
    else:
        raise NotImplementedError(f"Unsupported type {type(wildcard_or_filter_func)}")


def get_model(pipe):
    # UNet-based pipelines expose `unet`; DiT-based pipelines expose `transformer`.
    if hasattr(pipe, "unet"):
        model = pipe.unet
    elif hasattr(pipe, "transformer"):
        model = pipe.transformer
    else:
        raise KeyError

    return model


def prepare(pipe, num_inference_steps, config_list):
    assert pipe.__class__ in SUPPORTED_METHODS, f"{pipe.__class__} is not supported!"

    model = get_model(pipe)

    cachify(model, num_inference_steps, config_list)
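Editor's usage note, not part of the commit: after `prepare`, caching can be toggled on a prepared pipeline with the `enable`/`disable` helpers defined above. A minimal sketch, assuming `pipe` and `SDXL_DEFAULT_CONFIG` from the README example and a `cache_diffusion.cachify` import path:

```python
from cache_diffusion import cachify  # assumed import path

cachify.prepare(pipe, 30, SDXL_DEFAULT_CONFIG)

cachify.disable(pipe)  # every CachedModule recomputes each step (and resets its step counter)
reference = pipe("a photo of a corgi", num_inference_steps=30).images[0]

cachify.enable(pipe)  # restore cached execution on the configured steps
accelerated = pipe("a photo of a corgi", num_inference_steps=30).images[0]
```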
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: MIT
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

from torch import nn


class CachedModule(nn.Module):
    def __init__(self, block, num_inference_steps, select_cache_step_func) -> None:
        super().__init__()
        self.block = block
        self.num_inference_steps = num_inference_steps
        self.select_cache_step_func = select_cache_step_func
        self.cur_step = 0
        self.cached_results = None
        self.enabled = True

    def __getattr__(self, name):
        # Fall back to the wrapped block for attributes nn.Module does not own.
        try:
            return super().__getattr__(name)
        except AttributeError:
            return getattr(self.block, name)

    def if_cache(self):
        return self.select_cache_step_func(self.cur_step) and self.enabled

    def enable_cache(self):
        self.enabled = True

    def disable_cache(self):
        self.enabled = False
        self.cur_step = 0

    def reset_num_inference_steps(self, new_step):
        self.num_inference_steps = new_step

    def forward(self, *args, **kwargs):
        # On non-cache steps, run the wrapped block and refresh the cache;
        # on cache steps, skip the computation and return the stored result.
        if not self.if_cache():
            self.cached_results = self.block(*args, **kwargs)
        if self.enabled:
            self.cur_step = (self.cur_step + 1) % self.num_inference_steps
        return self.cached_results
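To make the stepping logic concrete, a small editor's sketch (not part of the commit) that counts how often the wrapped block actually runs under the default odd-step cache schedule; `CountingBlock` is a hypothetical stand-in for a diffusion block:

```python
import torch
from torch import nn


class CountingBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.calls = 0

    def forward(self, x):
        self.calls += 1
        return x + 1


# Cache on odd steps; recompute on even steps (mirrors SDXL_DEFAULT_CONFIG).
cached = CachedModule(CountingBlock(), 4, lambda step: (step % 2) != 0)

x = torch.zeros(1)
for _ in range(4):
    cached(x)

print(cached.block.calls)  # 2: steps 0 and 2 recompute; steps 1 and 3 reuse the cache
```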
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: MIT
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

import re

SDXL_DEFAULT_CONFIG = [
    {
        "wildcard_or_filter_func": lambda name: "up_blocks.2" not in name,
        "select_cache_step_func": lambda step: (step % 2) != 0,
    }
]

PIXART_DEFAULT_CONFIG = [
    {
        "wildcard_or_filter_func": lambda name: not re.search(
            r"transformer_blocks\.(2[1-7])\.", name
        ),
        "select_cache_step_func": lambda step: (step % 3) != 0,
    }
]

SVD_DEFAULT_CONFIG = [
    {
        "wildcard_or_filter_func": lambda name: "up_blocks.3" not in name,
        "select_cache_step_func": lambda step: (step % 2) != 0,
    }
]


def replace_module(parent, name_path, new_module):
    # Walk the dotted module path and swap the leaf submodule in place.
    path_parts = name_path.split(".")
    for part in path_parts[:-1]:
        parent = getattr(parent, part)
    setattr(parent, path_parts[-1], new_module)
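A quick editor's sketch (not part of the commit) of how `replace_module` swaps a nested submodule by its dotted name path:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Sequential(nn.ReLU(), nn.Linear(4, 2)))
# Submodule names here are "0", "1.0", "1.1"; swap the inner Linear for an Identity.
replace_module(model, "1.1", nn.Identity())
assert isinstance(model[1][1], nn.Identity)
```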
