Commit 2b46355

update

1 parent 3c70be2 commit 2b46355

2 files changed: +5 -10 lines changed

docs/source/en/quantization/gguf.md

Lines changed: 3 additions & 3 deletions
@@ -67,13 +67,13 @@ image.save("flux-gguf.png")
 
 ## CUDA kernels
 
-Optimized CUDA kernels can accelerate GGUF model inference by ~10%. It requires a compatible GPU with `torch.cuda.get_device_capability` greater than 7 and the [kernels](https://huggingface.co/docs/kernels/index) library.
+Optimized CUDA kernels accelerate GGUF model inference by ~10%. You need a compatible GPU with `torch.cuda.get_device_capability` greater than 7 and the [kernels](https://huggingface.co/docs/kernels/index) library.
 
 ```bash
 pip install -U kernels
 ```
 
-Set `DIFFUSERS_GGUF_CUDA_KERNELS=true` to enable optimized kernels. CUDA kernels may introduce minor numerical differences compared to the original GGUF implementation, potentially causing subtle visual variations in generated images.
+Set `DIFFUSERS_GGUF_CUDA_KERNELS=true` to enable optimized kernels. CUDA kernels introduce minor numerical differences compared to the original GGUF implementation, which may cause subtle visual variations in generated images.
 
 ```python
 import os
@@ -95,7 +95,7 @@ Use the Space below to convert a Diffusers checkpoint into a GGUF file.
   height="450"
 ></iframe>
 
-GGUF files stored in the [Diffusers format](../using-diffusers/other-formats) require the model's `config` path. If the model config is inside a subfolder, provide the `subfolder` argument as well.
+GGUF files stored in the [Diffusers format](../using-diffusers/other-formats) require the model's `config` path. Provide the `subfolder` argument if the model config is inside a subfolder.
 
 ```py
 import torch
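
For context, a minimal sketch of how the updated gguf.md instructions combine: enabling the optimized kernels via the environment variable, then loading a GGUF checkpoint, including the Diffusers-format case with `config`/`subfolder`. The checkpoint URL, the `flux1-dev-Q2_K-diffusers.gguf` filename, and the repo id are illustrative assumptions, not part of this commit.

```py
import os

# Opt in to the optimized CUDA kernels before loading the model
# (assumes `pip install -U kernels` and a GPU with compute capability > 7).
os.environ["DIFFUSERS_GGUF_CUDA_KERNELS"] = "true"

import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# Illustrative GGUF checkpoint; swap in the file you actually use.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# For a GGUF file stored in the Diffusers format, also pass the model's
# `config` (and `subfolder`, when the config lives inside one).
transformer = FluxTransformer2DModel.from_single_file(
    "flux1-dev-Q2_K-diffusers.gguf",  # hypothetical Diffusers-format file
    config="black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```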

docs/source/en/quantization/quanto.md

Lines changed: 2 additions & 7 deletions
@@ -13,14 +13,9 @@ specific language governing permissions and limitations under the License.
 
 # Quanto
 
-[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend for [Optimum](https://huggingface.co/docs/optimum/index). It has been designed with versatility and simplicity in mind:
+[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend inside the [Optimum](https://huggingface.co/docs/optimum/index) ecosystem. It's designed to work in eager mode, automatically inserts quantization/dequantization steps, supports a variety of weights and activations, and features accelerated matrix multiplication on CUDA devices.
 
-- All features are available in eager mode (works with non-traceable models)
-- Supports quantization aware training
-- Quantized models are compatible with `torch.compile`
-- Quantized models are Device agnostic (e.g CUDA,XPU,MPS,CPU)
-
-Although the Quanto library does allow quantizing `nn.Conv2d` and `nn.LayerNorm` modules, currently, Diffusers only supports quantizing the weights in the `nn.Linear` layers of a model.
+Quanto doesn't quantize `nn.Conv2d` and `nn.LayerNorm` modules because Diffusers can only quantize weights in `nn.Linear` layers.
 
 Make sure Quanto and [Accelerate](https://huggingface.co/docs/accelerate/index) are installed.

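And a minimal sketch of the Quanto path the quanto.md changes describe: quantizing a model's `nn.Linear` weights through diffusers' `QuantoConfig`. The `int8` weights dtype and the FLUX.1-dev repo id are illustrative assumptions, not part of this commit.

```py
# pip install optimum-quanto accelerate
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

# Quantize the nn.Linear weights (the only layers Diffusers quantizes) to int8.
quantization_config = QuantoConfig(weights_dtype="int8")

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative model id
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```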