docs/source/en/api/pipelines/kandinsky5_image.md (+5, -1)
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License.
[Kandinsky 5.0](https://arxiv.org/abs/2511.14993) is a family of diffusion models for Video & Image generation.
-Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters)
+Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters).
The model introduces several key innovations:
- **Latent diffusion pipeline** with **Flow Matching** for improved training stability
@@ -21,10 +21,14 @@ The model introduces several key innovations:
The original codebase can be found at [kandinskylab/Kandinsky-5](https://github.com/kandinskylab/Kandinsky-5).
+> [!TIP]
+> Check out the [Kandinsky Lab](https://huggingface.co/kandinskylab) organization on the Hub for the official model checkpoints for text-to-video generation, including pretrained, SFT, no-CFG, and distilled variants.
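For orientation, here is a minimal loading sketch. The checkpoint id and prompt are illustrative assumptions (check the Kandinsky Lab organization for the actual repository names), and [`DiffusionPipeline.from_pretrained`] is used so the concrete pipeline class is resolved from the checkpoint:

```python
import torch
from diffusers import DiffusionPipeline

# Hypothetical checkpoint id; see the Kandinsky Lab org on the Hub for the real repositories.
pipeline = DiffusionPipeline.from_pretrained(
    "kandinskylab/Kandinsky-5.0-Image-Lite", torch_dtype=torch.bfloat16
).to("cuda")

image = pipeline(prompt="A watercolor painting of a lighthouse at dawn").images[0]
image.save("kandinsky5_image.png")
```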
[Z-Image](https://huggingface.co/papers/2511.22699) is a powerful and highly efficient image generation model with 6B parameters. Currently only one model is available, with two more to be released:
Z-Image-Turbo is a distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations). It offers sub-second inference latency on enterprise-grade H800 GPUs and fits comfortably within 16 GB of VRAM on consumer devices. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
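A similarly hedged sketch for the distilled model follows. The checkpoint id is an assumption, and the 8 inference steps simply mirror the 8 NFEs mentioned above, assuming the pipeline accepts the standard `num_inference_steps` argument:

```python
import torch
from diffusers import DiffusionPipeline

# Hypothetical checkpoint id; replace with the official Z-Image-Turbo repository.
pipeline = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
).to("cuda")

# The distilled model targets roughly 8 function evaluations, so 8 steps is a natural setting.
image = pipeline(
    prompt="A neon sign over a rainy street that reads 'Open 24 hours'",
    num_inference_steps=8,
).images[0]
image.save("z_image_turbo.png")
```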
docs/source/en/optimization/cache.md (+31, -0)
@@ -66,4 +66,35 @@ config = FasterCacheConfig(
tensor_format="BFCHW",
)
pipeline.transformer.enable_cache(config)
```

## TaylorSeer Cache

[TaylorSeer Cache](https://huggingface.co/papers/2403.06923) accelerates diffusion inference by using Taylor series expansions to approximate and cache intermediate activations across denoising steps. The method predicts future outputs based on past computations, reusing them at specified intervals to reduce redundant calculations.

This caching mechanism delivers strong results with minimal additional memory overhead. For detailed performance analysis, see [our findings here](https://github.com/huggingface/diffusers/pull/12648#issuecomment-3610615080).

To enable TaylorSeer Cache, create a [`TaylorSeerCacheConfig`] and pass it to your pipeline's transformer:

- `cache_interval`: Number of steps to reuse cached outputs before performing a full forward pass
- `disable_cache_before_step`: Number of initial steps that run full computations to gather data for the approximations
- `max_order`: Approximation accuracy (in theory, higher values improve quality but also increase memory usage; we recommend setting it to `1`)

```python
import torch
from diffusers import FluxPipeline, TaylorSeerCacheConfig
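# Illustrative continuation: the checkpoint id and the parameter values below are
# assumptions chosen to match the parameter descriptions above, not documented defaults.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Run the first few steps at full compute to seed the Taylor approximation, then reuse
# cached predictions for a few steps at a time with a first-order expansion.
config = TaylorSeerCacheConfig(
    cache_interval=4,
    disable_cache_before_step=4,
    max_order=1,
)
pipeline.transformer.enable_cache(config)

image = pipeline("A photograph of a lighthouse at golden hour").images[0]
image.save("taylorseer_flux.png")
```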
docs/source/en/quantization/modelopt.md (+3, -3)
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License. -->
# NVIDIA ModelOpt
-[NVIDIA-ModelOpt](https://github.com/NVIDIA/TensorRT-Model-Optimizer) is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
+[NVIDIA-ModelOpt](https://github.com/NVIDIA/Model-Optimizer) is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
Before you begin, make sure you have nvidia_modelopt installed.
@@ -57,7 +57,7 @@ image.save("output.png")
>
> The quantization methods in NVIDIA-ModelOpt are designed to reduce the memory footprint of model weights using various QAT (Quantization-Aware Training) and PTQ (Post-Training Quantization) techniques while maintaining model performance. However, the actual performance gain during inference depends on the deployment framework (e.g., TRT-LLM, TensorRT) and the specific hardware configuration.
>
-> More details can be found [here](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples).
+> More details can be found [here](https://github.com/NVIDIA/Model-Optimizer/tree/main/examples).
## NVIDIAModelOptConfig
@@ -86,7 +86,7 @@ The quantization methods supported are as follows:
|**NVFP4**|`nvfp4 weight only`, `nvfp4 block quantization`|`quant_type`, `quant_type + channel_quantize + block_quantize`|`channel_quantize = -1 is only supported for now`|
-Refer to the [official modelopt documentation](https://nvidia.github.io/TensorRT-Model-Optimizer/) for a better understanding of the available quantization methods and the exhaustive list of configuration options available.
+Refer to the [official modelopt documentation](https://nvidia.github.io/Model-Optimizer/) for a better understanding of the available quantization methods and the exhaustive list of configuration options available.
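To make the configuration options above more concrete, here is a minimal sketch of quantizing a transformer at load time. The checkpoint id, the `quant_type` string, and the block size are illustrative assumptions rather than documented defaults; `channel_quantize=-1` follows the constraint noted in the table, and the parameter names come from the table above:

```python
import torch
from diffusers import FluxTransformer2DModel, NVIDIAModelOptConfig

# Assumed NVFP4 block-quantization settings: channel_quantize=-1 per the table above,
# block_quantize chosen only for illustration.
quant_config = NVIDIAModelOptConfig(
    quant_type="NVFP4",
    channel_quantize=-1,
    block_quantize=16,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```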