Commit 2b46355

update

1 parent 3c70be2 commit 2b46355

2 files changed: +5 -10 lines changed

docs/source/en/quantization/gguf.md

Lines changed: 3 additions & 3 deletions
@@ -67,13 +67,13 @@ image.save("flux-gguf.png")
 
 ## CUDA kernels
 
-Optimized CUDA kernels can accelerate GGUF model inference by ~10%. It requires a compatible GPU with `torch.cuda.get_device_capability` greater than 7 and the [kernels](https://huggingface.co/docs/kernels/index) library.
+Optimized CUDA kernels accelerate GGUF model inference by ~10%. You need a compatible GPU with `torch.cuda.get_device_capability` greater than 7 and the [kernels](https://huggingface.co/docs/kernels/index) library.
 
 ```bash
 pip install -U kernels
 ```
 
-Set `DIFFUSERS_GGUF_CUDA_KERNELS=true` to enable optimized kernels. CUDA kernels may introduce minor numerical differences compared to the original GGUF implementation, potentially causing subtle visual variations in generated images.
+Set `DIFFUSERS_GGUF_CUDA_KERNELS=true` to enable optimized kernels. CUDA kernels introduce minor numerical differences compared to the original GGUF implementation, which may cause subtle visual variations in generated images.
 
 ```python
 import os
@@ -95,7 +95,7 @@ Use the Space below to convert a Diffusers checkpoint into a GGUF file.
   height="450"
 ></iframe>
 
-GGUF files stored in the [Diffusers format](../using-diffusers/other-formats) require the model's `config` path. If the model config is inside a subfolder, provide the `subfolder` argument as well.
+GGUF files stored in the [Diffusers format](../using-diffusers/other-formats) require the model's `config` path. Provide the `subfolder` argument if the model config is inside a subfolder.
 
 ```py
 import torch
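
For context, a minimal sketch of how the updated gguf.md instructions combine: enabling the optimized kernels via the environment variable, then loading a GGUF checkpoint, including the Diffusers-format case with `config`/`subfolder`. The checkpoint URL, the `flux1-dev-Q2_K-diffusers.gguf` filename, and the repo id are illustrative assumptions, not part of this commit.

```py
import os

# Opt in to the optimized CUDA kernels before loading the model
# (assumes `pip install -U kernels` and a GPU with compute capability > 7).
os.environ["DIFFUSERS_GGUF_CUDA_KERNELS"] = "true"

import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# Illustrative GGUF checkpoint; swap in the file you actually use.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# For a GGUF file stored in the Diffusers format, also pass the model's
# `config` (and `subfolder`, when the config lives inside one).
transformer = FluxTransformer2DModel.from_single_file(
    "flux1-dev-Q2_K-diffusers.gguf",  # hypothetical Diffusers-format file
    config="black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```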

docs/source/en/quantization/quanto.md

Lines changed: 2 additions & 7 deletions
@@ -13,14 +13,9 @@ specific language governing permissions and limitations under the License.
 
 # Quanto
 
-[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend for [Optimum](https://huggingface.co/docs/optimum/index). It has been designed with versatility and simplicity in mind:
+[Quanto](https://github.com/huggingface/optimum-quanto) is a PyTorch quantization backend inside the [Optimum](https://huggingface.co/docs/optimum/index) ecosystem. It's designed to work in eager mode, automatically inserts quantization/dequantization steps, supports a variety of weights and activations, and features accelerated matrix multiplication on CUDA devices.
 
-- All features are available in eager mode (works with non-traceable models)
-- Supports quantization aware training
-- Quantized models are compatible with `torch.compile`
-- Quantized models are Device agnostic (e.g CUDA,XPU,MPS,CPU)
-
-Although the Quanto library does allow quantizing `nn.Conv2d` and `nn.LayerNorm` modules, currently, Diffusers only supports quantizing the weights in the `nn.Linear` layers of a model.
+Quanto doesn't quantize `nn.Conv2d` and `nn.LayerNorm` modules because Diffusers can only quantize weights in `nn.Linear` layers.
 
 Make sure Quanto and [Accelerate](https://huggingface.co/docs/accelerate/index) are installed.

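And a minimal sketch of the Quanto path the quanto.md changes describe: quantizing a model's `nn.Linear` weights through diffusers' `QuantoConfig`. The `int8` weights dtype and the FLUX.1-dev repo id are illustrative assumptions, not part of this commit.

```py
# pip install optimum-quanto accelerate
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

# Quantize the nn.Linear weights (the only layers Diffusers quantizes) to int8.
quantization_config = QuantoConfig(weights_dtype="int8")

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative model id
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```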